Shearer Software

Andrew Shearer’s Drivel

99 and 44/100 percent pure.

Tuesday, September 11, 2007

RI Nexus Beta

Check out the beta of the new RI Nexus site, full of news and resources about information technology and digital media in Rhode Island.

I wrote some new Drupal modules to support it. Feedback and suggestions are welcome!

   PHP, Drupal, Providence, Open Source, Technology, General  Posted at 10:11 PM    Add a comment
Tuesday, March 27, 2007

Providence PHP Meetup: Tuesday, April 3

We have a new location and a guest speaker for our April meetup. Nate Abele of the CakePHP project will be here to show off the rapid web development framework and answer questions. We’ll make time for discussion too.

The new location is a really nice conference room at the Johnson & Wales Academic Center, with everything that implies (i.e. a projector).

All programming skill levels welcome. If you’re going to be in the Providence, RI area and can make it, please see here for more details and to RSVP:

Providence PHP April Meetup [meetup.com]

   Providence, Open Source, Technology, Software, General  Posted at 6:28 PM    Add a comment
Monday, January 8, 2007

Providence PHP Meetup: Tuesday, Feb. 6

Lots of geeky news. I took over officially as organizer of the Providence PHP meetup this month, and our next event is at 729 Hope St on Tuesday, February 6 at 7 PM. So join us for coffee, pastry, a wide-ranging, informal discussion of anything related to programming with PHP, or all three.

This time, we’ll probably share some of the projects we’re working on, so bring some screenshots or a quick demo if you’d like. (If this starts to run long, we can always go into more depth next month.)

Please RSVP.

   Providence, Technology, Software, General  Posted at 10:29 PM    Add a comment

Web Developers Lunch Hour this Thursday, Jan. 11

If your usual lunch crowd doesn’t talk enough about computers for your taste, escape with us to the monthly Providence Web Developers Lunch Hour. (Chris, the usual organizer, won’t be able to make it, so I’m hosting in his place.) Please RSVP here.

   Technology, Software, General  Posted at 10:19 PM    Add a comment
Thursday, November 9, 2006

Providence Web Developers Lunch Hour

If you’re interested in web development topics, work in the Providence area, and eat food, you’d be perfect for the lunch hour meetup event happening downtown today (Thursday, Nov. 9) at noon. I’ll be filling in for the regular organizer.

For more information and to RSVP, please visit the event’s Meetup page.

   Providence, Technology, General  Posted at 2:06 AM    Add a comment
Thursday, February 24, 2005

91% of Windows PCs are infected?

Why does Windows still suck?

The most surprising figure in the article–that 91% of PCs are infected (or maybe the word should be “infested”)–sounds high, but gets some anecdotal support in comments in Brent Simmons’ weblog.

   Technology, Software, General  Posted at 9:24 PM    Add a comment
Thursday, June 10, 2004

Why We Aren’t Safe

Broken Windows: With viruses, worms, and vulnerabilities in the news, John Gruber wrote an excellent piece. “Here’s a billion-dollar question: Why are Windows users besieged by security exploits, but Mac users are not?”

And, like clockwork, here comes the latest Windows vulnerability:

Internet Explorer Carved Up By Zero-Day Hole:
“Two new vulnerabilities have been discovered in Internet Explorer which allow a complete bypass of security and provide system access to a computer, including the installation of files on someone’s hard disk without their knowledge, through a single click.

Worse, the holes have been discovered from analysis of an existing link on the Internet and a fully functional demonstration of the exploit have been produced and been shown to affect even fully patched versions of Explorer.

It has been rated ‘extremely critical’ by security company Secunia, and the only advice is to disable Active Scripting support for all but trusted websites.”

The article goes on to say that the code exploits three holes in Internet Explorer for Windows, including one that has been known since August 2003, and there’s no patch available for any of them. (You could turn off Active Scripting, which breaks functionality on many sites, or stop browsing web sites you don’t trust completely. If that’s not acceptable, you have to switch to another browser such as Mozilla, or switch to a Mac.)

   Mac OS X, Technology, General  Posted at 8:24 AM    Comments (1)
Sunday, January 18, 2004

Counterfeiting Restrictions and Unintended Consequences

Macintouch has some interesting commentary on anti-counterfeiting measures that Adobe quietly slipped into Photoshop CS. The program now detects images containing currency and prevents you from working with them, even though doing so is perfectly legal, as long as you don’t then make a printout that’s double-sided or very close in size to the original.

[Tim Wright] It would be fairly easy to create other documents which would mistrigger this pattern [described in eurion.pdf].
Now the cat is out of the bag, I fully expect this to start appearing on magazine page backgrounds, books, any documents considered “sensitive”, grocery coupons, etc, which will rapidly render colour photocopiers pretty useless until they disable this feature.
For more amusement, why not put it onto t-shirts or baseball caps, which will neatly prevent people from printing (or editing) photos of you? I’m sure more inventive people will be able to think of plenty of other uses, like car decorations, wallpaper, badges and so on…

   Society, Technology  Posted at 3:41 PM    Add a comment
Wednesday, January 14, 2004

The Hole in Postel’s Law

“Be conservative in what you do, be liberal in what you accept from others.”

This law is making the rounds again, with arguments both pro and con. Here are my thoughts.

Postel’s Law is a great, useful principle for writing programs that communicate. However, the law is so elegant and successful that it’s easy to regard it as an absolute. And then, because be liberal in what you accept is such an open-ended goal, people go too far. Here’s an analysis of the problem, followed by a suggestion.

The first half of Postel’s Law, be conservative in what you transmit, is a well-specified rule with a clearly defined goal. The tools to achieve it are specs and validators. But the vague goal of the other half, to be liberal in what you accept, can turn into a bottomless hole. There’s hardly any limit to how loose an interpretation of the spec can get, how cleverly the code can guess at the sender’s intent, and how much code for special cases you can write to fix invalid data. Because such code can provide an immediate user benefit and a market advantage, it turns into an arms race. Often, the code ends up violating the spec itself, intentionally or unintentionally, which we’ll see below.

The Growing Hole

Plenty has been written praising be liberal in what you accept. So I won’t repeat it. Here are some of the problems:

It enlarges the spec. Every additional error condition fixed by a market leader becomes an (undocumented) part of the spec. Senders come to rely on it. The senders probably don’t even realize that their output is wrong because of the way software is written.

In the edit-run-debug cycle of the typical software development process, testing is often done just by trying the program out, not through any mathematical process or formal validation suite. HTML authoring tends to be done the same way. Though modern XP [Extreme Programming] test-first practices call for a thorough suite of test cases to be written before the actual code, most software still doesn’t have this advantage. HTML is an easy case for validation, with scores of easily accessible validators already written, much easier to test than most program code, yet the bulk of new pages in the world have probably never been through an HTML validator.

The problem is that, even after removing all obvious bugs, the product of this run-test-debug cycle can only run at the “seems-to-work” level. There’s no guarantee that it’s really working, and specifically no proof that the program or web page is being conservative in what it sends. If it’s a program that communicates with other types of programs, the developer will test it with real examples of those programs. So, when a developer writing program Z needs to interoperate with programs such as A and B, and A and B are silently fixing errors in the output of program Z, Z’s developer will declare the code “working” (because to all appearances, it is), and say “ship it!”. And everything will be fine until an edge case comes along that program A or program B either can’t fix or interprets differently. Or until someone tries program Z with program C, which didn’t get the memo about all the particular types of errors that programs A and B fix. All this because Z had a latent bug, due to the second half of Postel’s Law, because:

It hides violations of the other half of Postel’s Law. In other words, by being more liberal on the receiver, it becomes more difficult to find bugs in the sender.

As an example, Microsoft Internet Explorer sports what some have called a “ridiculous tolerance for errors in HTML markup”. Microsoft FrontPage has a well-known tendency to silently create invalid HTML markup. (One of the bugs: FrontPage 98 and 2000 will occasionally go through a valid page with spacer images and replace all of their alt="" attributes with the lone word alt, which is invalid HTML. A developer familiar with the SGML foundations of HTML might think the fix is to parse this as a boolean attribute, alt="alt", but IE and other browsers choose to interpret it as alt="".) Though I doubt that any such bugs are intentional, the tendencies of the two products feed on each other. If the developers of FrontPage were testing with a browser that flagged such errors, it’s likely that the bugs wouldn’t have made it to release.

The bind here is that Postel’s Law tries to make things work as often as possible for users, but people trying to test other programs are users too, and errors are also covered up for them. One way out of this would be some kind of Postel Kill Switch, a strict mode intended for interoperability testing. (Turning off the other half of the law at the same time, causing the program to send out data malformed in various ways, would be harder to switch on programmatically.) Though the strict mode might do some good, it has some drawbacks: it would require a different code path, making it prudent to test both modes; and even without the extra work that would entail, testers might not bother turning the feature on every time in the first place.

Market Forces

Even though it’s usually more work to be more liberal, developers with time or money on their hands will still do it. They are often motivated just to provide convenience for their users, but with competitors in the same market, it has a predictable effect:

It increases the cost of entry. Accepting everything is a greedy strategy. It rewards the incumbents, and makes more work for newcomers. Not only do the newcomers have to catch up with all the error-fixing logic that the market leaders have been writing since the beginning, they have to somehow figure out what all those error conditions are. They’re not in the spec, and it’s almost certain that they’re not publicly documented anywhere. Even if the types of errors to be fixed were known, the new programs would have to fix them exactly the same way as the old ones, even in the face of multiple overlapping errors or ambiguous edge cases. And in some cases, this may require disregarding the spec, deliberately misinterpreting a valid document to match an overzealous fix.

Safety

This leads to one of the most damning consequences:

It makes software unreliable. Even the safest-looking fix can have unexpected consequences once others depend on it. (Which they will, and, unless the fix was added purely on speculation, already do.)

For instance, if you’re writing an HTML parser, and you see a lone ampersand (technically illegal–it should be encoded as &amp;) the liberally accepting thing to do is to display an ampersand, just as if it had been encoded properly. Which is fine, at that moment. If the users knew what had happened, they would probably thank you for soldiering on through the rest of the document and not giving up right there. But in reality they don’t even know it happened, and as the years go by, they will keep turning out pages with unencoded ampersands. (It’s the testers-are-also-users problem again.) New high-end content management systems will be deployed without anyone working with the system even knowing that they’re entering raw HTML into some of the text fields, and that they have to be careful with ampersands (yes, this already happens). A validator may catch the problem if it happens to crop up on the page at the time it’s checked, but most likely, no one will notice until the unlucky day that someone writes a classified for an electric guitar setup saying “For Sale: guitar&amp $200.” Then the amp will just mysteriously disappear on the post, putting a guitar and $200 on sale. (If you think the example is contrived, note that in another attempt to apply Postel’s Law, real-world browsers end up expanding the error domain even further: “guitars&amplifiers” will have three letters dropped out of it, because the first browsers judged that to be most likely what the author intended. However, if you added spaces around the punctuation, whole words would show up. This is the kind of bizarre behaviour that makes people distrust computers.)
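The disappearing amp is easy to reproduce outside a browser. Here is a small sketch using Python's standard html.unescape, which (as far as I can tell) follows the same HTML5 rules for character references, including the legacy forms without a trailing semicolon. The sample strings are just the classified-ad examples from above.

```python
import html

# A correctly encoded ampersand: the entity is consumed, the word "amp" survives.
properly_encoded = html.unescape("guitar &amp; amp $200")

# The semicolon is missing, but "&amp" is still treated as an entity,
# so the word the author typed silently vanishes.
missing_semicolon = html.unescape("guitar&amp $200")

# The longest known entity prefix ("amp") is consumed out of "amplifiers".
run_together = html.unescape("guitars&amplifiers")

# With spaces around the ampersand there is nothing to misread.
spaced_out = html.unescape("guitars & amplifiers")

for text in (properly_encoded, missing_semicolon, run_together, spaced_out):
    print(text)
```

Three letters go missing in the third case and nothing at all in the fourth, which is exactly the kind of inconsistency a user has no way to predict.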

At its root, the ampersand problem is really just confusion over a weakly specified input format. (You can find similar examples on display in comment forums across the web, which often treat visitors to the spectacle of a web developer repeatedly trying to describe an HTML tag, only to have the tag itself disappear.) However, in this case being liberally accepting didn’t fix the problem; it just made its symptoms more rare, and therefore the real problem harder to find, more capricious, and more puzzling.

In an effort to do the right thing, some programs intentionally go against the spec. Internet Explorer (and therefore Outlook, when opening HTML mail) will disbelieve the content type specified by the web server, and choose a different type itself based on heuristics, a behaviour which is even documented. An XHTML document might not be rendered if it starts with a comment that’s too long, or a plain text file might be parsed as HTML because it contained a tag-like sequence of characters. The HTTP spec specifically forbids browsers to second-guess the content type provided by the server, but IE does it anyway. This makes IE compatible with many badly-configured web servers. It also frustrates the owners of well-configured web servers for whom IE always guesses wrongly.
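To make the failure mode concrete, here is a toy sketch of content sniffing. It is not IE's actual algorithm; the regular expression, the 256-character window, and both function names are assumptions made up for illustration. It only shows how a heuristic that rescues misconfigured servers will also misread a correctly labeled plain text file.

```python
import re

# Hypothetical heuristic: anything that looks like a tag near the start of the body.
TAG_LIKE = re.compile(r"<[A-Za-z!/][^>]*>")


def sniffed_type(declared_type: str, body: str) -> str:
    """Second-guess the server: treat tag-like text/plain bodies as HTML."""
    if declared_type == "text/plain" and TAG_LIKE.search(body[:256]):
        return "text/html"
    return declared_type


def respected_type(declared_type: str, body: str) -> str:
    """Do what the header says and nothing more."""
    return declared_type


# A well-configured server sending a plain text note about HTML.
note = "To make bold text, write <b>like this</b>."
print(sniffed_type("text/plain", note))
print(respected_type("text/plain", note))
```

The server did everything right here, and the sniffing client still renders the note as markup.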

In certain cases, outright bugs in complex code designed to tolerate many errors have the ironic effect of limiting the spec. For example, RSS is based on XML, but because of the existence of RSS feeds with invalid XML, liberal RSS parsers can’t be based on real XML parsers. Real XML parsers are thoroughly tested and widely deployed. But instead, the developers have to roll their own quasi-XML parsers (increasing the barrier to entry). The chance of getting some part of the XML spec wrong is high (making the software unreliable). This in turn has made feed developers reluctant at various times to begin using any XML features that don’t already appear in the most common feeds, such as CDATA blocks in the description element, namespaces, and XML comments, because they might break regexp-based parsers. (Mark Pilgrim’s Ultra Liberal Feed Parser is a solution for Python programmers, and while it gets everything right as far as I know, it still doesn’t much help developers in other languages.)

In this example, XML is special, because the XML spec itself violates Postel’s Law. It calls for clients to terminate parsing entirely when they encounter malformed content. While it may have been better if this decision hadn’t been made, that’s the current reality of XML parsers. Replacing them all with less flighty ones would be nice. (Any takers?)

Security

Finally, security. A whole class of security vulnerabilities results from automatically fixing errors in input data. Because the set of errors to be fixed is ill-defined, software downstream can take a radically different action than what the software upstream thought possible. Malicious users can exploit this.

Think of the difficulty just of reliably filtering out dangerous HTML tags and attributes from a comment left on a web site. The browser is working as hard as possible to be liberal in its definition of an HTML tag, working by unknown rules to fix almost-tags. Can the author of such a filter ever be truly certain that nothing gets through? (Thinking about this, the only sure way around it without writing an entire HTML validator would be to fully parse the HTML input into an intermediate HTML-free representation, then write it back out as guaranteed-valid HTML code. The only thing left to worry about: an overzealous fix that would cause the valid code to be misinterpreted.)
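The parse-and-rewrite approach can be sketched with Python's standard html.parser module. This is a minimal illustration, not a vetted sanitizer: the whitelist, the class name, and the decision to drop every attribute are assumptions of mine. The point is that nothing from the input reaches the output as markup; tags are regenerated from a known list and all text is re-escaped.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}  # hypothetical whitelist


class CommentSanitizer(HTMLParser):
    """Parse a comment into events, then emit only freshly generated markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # Attributes are dropped entirely; only whitelisted tag names survive.
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close a tag only if it is the most recently opened one.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # All text is re-escaped, so stray < and & cannot form new tags.
        self.out.append(escape(data, quote=False))

    def result(self):
        # Close anything the author left open.
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def sanitize(comment: str) -> str:
    parser = CommentSanitizer()
    parser.feed(comment)
    parser.close()
    return parser.result()


print(sanitize('<b onclick="evil()">hi</b> & <script>alert(1)</script>'))
```

The contents of the script element come through as inert text rather than disappearing, which seems like the conservative choice: the output is canonical, and the filter never has to guess how a browser would repair an almost-tag.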

The rule: Arbitrary fixes to bad input data will thwart any previous filtering or security checking of the data.

What to do?

A Suggestion

Future specs could require implementations to report whenever they encounter and correct errors, with an interface that could be as simple and non-intrusive as an exclamation point icon. (A newsreader, for instance, would place it next to a suspect newsfeed and link it to the Feed Validator.) There’s nothing particularly new about this kind of interface; several products, such as Opera, already do something similar. The trick would be that it would be required by the spec. The market leaders would be compelled to adopt it, not just the smaller products.

This behavior wouldn’t hamper a program’s ability to accept liberally; it would just let testers and other interested users know that the data had not been sent conservatively. It would thus remove the conflict of interest between the two parts of the law. The feature would be on by default, so testers wouldn’t need to activate it, but it wouldn’t be so annoying that users wanted it off (as a modal alert box would be).

This doesn’t mean that each implementation has to have a full-fledged validator aboard. Only errors detectable by reasonably straightforward means and cases where the implementation goes to extra lengths to make sense of the input would have to be flagged. That does give implementations some wiggle room.

It’s important that this minor error display mechanism be required in order to comply with the spec. It can’t be voluntary on the part of the implementors. There’s nothing in it for them, at least not directly. To record the error as it’s fixed and display the fact takes extra code, albeit not much. Considering that the benefit goes mainly to future implementors as well as users of less liberal implementations that don’t know how to handle the same error, implementors will tend not to write that code unless nudged.

And the developers can be nudged, even for specs without trademarks or an official logo program. Having the requirement enshrined in the spec at least provides some social pressure for implementors to comply.

And some other things that seem to make sense right now:

  • Developers should also take great care to hold back and not misinterpret technically valid input in an attempt to do the right thing. Internet Explorer’s habit of second-guessing the Content-Type header is the kind of thing to avoid.
  • By the same token, to provide tolerant XML parsing, use a real standards-compliant XML parser first, and fall back to a handcoded quasi-XML parser only when that fails. (Or, if you can absolutely guarantee that the result will be identical, use the quasi-XML parser alone, but that guarantee is hard to make.)
  • To avoid unintentionally thwarting security filters, all heroic fixes to input should be made as far upstream in the call chain as possible. If there’s still a danger the downstream code will try to outsmart the upstream code, the upstream code could rewrite the input to be canonical and unmisinterpretable.
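Here is a rough sketch of how the strict-first rule and the error report could fit together, using Python's standard xml.etree.ElementTree as the real XML parser. The ParseResult type, the parse_feed name, and the single repair (escaping bare ampersands) are assumptions for illustration; a real newsreader would have a longer list of repairs.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Matches an ampersand that does not begin a character or entity reference.
BARE_AMP = re.compile(r"&(?!#\d+;|#x[0-9A-Fa-f]+;|\w+;)")


@dataclass
class ParseResult:
    root: ET.Element
    repairs: list = field(default_factory=list)  # empty means the input was valid


def parse_feed(text: str) -> ParseResult:
    # Valid input always goes through the real parser, untouched.
    try:
        return ParseResult(ET.fromstring(text))
    except ET.ParseError as strict_error:
        # Fall back to a repair only after strict parsing has failed,
        # and record what was done so the interface can flag it.
        repaired, count = BARE_AMP.subn("&amp;", text)
        root = ET.fromstring(repaired)  # still raises if the repair was not enough
        note = f"escaped {count} bare ampersand(s) after: {strict_error}"
        return ParseResult(root, [note])


good = parse_feed("<rss><title>A &amp; B</title></rss>")
bad = parse_feed("<rss><title>A & B</title></rss>")

for result in (good, bad):
    flag = "!" if result.repairs else " "
    print(flag, result.root.find("title").text)
```

Both feeds display the same title, but only the malformed one gets the exclamation point, so the reader is served and the feed's author still has a chance to find out.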
   Technology  Posted at 7:14 AM    Comments (1)
Sunday, September 28, 2003

John Cleese on Technology

Some good quotes here from a recent speech.

   Technology, General  Posted at 3:03 PM    Add a comment

Listamatic

CSS techniques for great-looking HTML lists, demonstrated.

   Technology, General  Posted at 3:03 PM    Add a comment
Saturday, September 6, 2003

Moving to a Mac

A small integration project dispels some preconceptions about OS X [InfoWorld: Application Development]

   Technology, General  Posted at 5:05 PM    Add a comment

Clay Shirky: Fame vs Fortune: Micropayments…

Clay Shirky: Fame vs Fortune: Micropayments and Free Content. The answer is simple: creators are not publishers, and putting the power to publish directly into their hands does not make them publishers. It makes them artists with printing presses. This matters because creative people crave attention in a way publishers do not. [Tomalak’s Realm]

   Society, Technology, General  Posted at 1:01 AM    Add a comment
Thursday, August 14, 2003

Google calculator

Google’s newest feature is full of Easter eggs. Finally, something that could replace Python as my desktop calculator, at least when I’m online.

Wouldn’t an offline version be a great idea? I don’t know of any current calculator programs that have such a simple natural-language interface. [dive into mark]

   Technology, General  Posted at 9:09 PM    Add a comment
Wednesday, August 13, 2003

Googleholes and the Unofficial Voice

Slate: Digging for Googleholes.

If you’re searching for something that can be sold online, Google’s top results skew very heavily toward stores, and away from general information…

The same goes for searching for specific products: Type in the make and model of a new DVD player, and you’ll get dozens of online electronic stores in the top results, all of them eager to sell you the item. But you have to burrow through the results to find an impartial product review that doesn’t appear in an online catalog.

Running into this problem myself, I’ve sometimes wished for a “comments-only” search option, to only show me unofficial information. The official information is always just copies of the same thing. Google Groups comes close, but it misses the vast majority of discussions nowadays, which tend to happen on web-based forums and review sites.

This idea comes surprisingly close to Andrew Orlowski’s blog-separation fixation.

   Technology, General  Posted at 11:11 PM    Add a comment
Wednesday, August 6, 2003

WiFi is too expensive when it’s not free

Scott Rafer writes:

The fully loaded cost of offering free Wi-Fi access is less than $6/day. Operating a billable hotspot costs over $30/day…. Here’s the irony in Wi-Fi public access pricing: retailers can be profitable by offering free Wi-Fi as a customer acquisition tool. But when they charge for Wi-Fi access, these retailers, and the WISPs serving them, almost certainly lose money. According to a market study coming out this summer, retailers are quickly learning this lesson: up to 30% of US location owners who plan to deploy commercial hotspots in 2004 intend those hotspots to be free or free-with-purchase.

[via WiFi Networking News and Boing Boing Blog]

   Technology, General  Posted at 1:01 AM    Add a comment
Tuesday, August 5, 2003

RFID Chips are Here. Or not.

Here one week, gone the next.

The Register: RFID Chips are Here.

CNET News: Wal-Mart cancels ’smart-shelf’ trial. The retail giant cancels testing for an experimental wireless inventory control system, ending one of the most closely watched efforts to bring RFID technology to store shelves.

   Technology, General  Posted at 6:06 PM    Add a comment
Thursday, July 31, 2003

NY Times: A Safer System for…

NY Times: A Safer System for Home PC’s Feels Like Jail to Some Critics. But by entwining PC software and data in an impenetrable layer of encryption, critics argue, the companies may be destroying the very openness that has been at the heart of computing in the three decades since the PC was introduced. [via Tomalak’s Realm]

   Technology, General  Posted at 3:03 AM    Add a comment

Winning the browser peace

Jon Udell on cross-browser JavaScript programming:

Mozilla has emerged from its long nuclear winter to become a pillar of the Linux desktop. Alpha geeks everywhere (including Sun and Microsoft) are running Safari on their PowerBooks. But here’s the reality check you knew was coming: cross-browser and cross-OS compatibility remains nearly as elusive as ever. I won’t bore you with the details. Let’s just say that testing CSS and JavaScript effects on the three major OS platforms, in six different browsers, isn’t a good use of anybody’s time. [Full story at InfoWorld.com]

[Jon’s Radio]

   Technology, General  Posted at 2:02 AM    Comments (1)
Saturday, June 7, 2003

Rage against the fruit machines

A UK activist clade is taking on the insidious digital fruit machine (AKA, “slot machine”). These things are supposed to be random and fair, but by design or by glitch, the pubside gambling systems are anything but.

Fruit machines cheat you on practically every spin of the reels. Almost every spin is entirely predetermined - which symbols are going to drop in, whether you’re going to be awarded nudges, which numbers the “random” stop will land on, the lot. Ever had two cherries on the win line, not held them, then watched the third one drop in on the next spin and thought, “Damn, if I’d held them I’d have won”? Well, you wouldn’t. If you’d held the two cherries, the machine would have dropped in a different symbol. And now we can prove it.

(via NTK)

[via Boing Boing Blog]

Nice technique for proving program behavior without looking at its code. To summarize: they run the machine’s software inside a program on a regular desktop computer, and save the state of RAM at various points. Then they can “go back in time” at will by restoring RAM, and try making different choices.

By the way, wouldn’t it be interesting if you could do that with the real world? You could try out different things to see how people, or the world, would react, without risk of harm. Physics with “undo”. Heaven for opportunists.

   Technology, General  Posted at 2:02 PM    Add a comment
Recent Reading

A Heartbreaking Work of Staggering Genius, by Dave Eggers

Harry Potter and the Order of the Phoenix, by J. K. Rowling

Player Piano, by Kurt Vonnegut

Bad News, by Donald E. Westlake

The Blank Slate: The Modern Denial of Human Nature, by Steven Pinker

The Jungle, by Upton Sinclair

Gödel, Escher, Bach: An Eternal Golden Braid, by Douglas R. Hofstadter

Speaking With the Angel, by Nick Hornby (Editor)

In Progress

The Language Instinct, by Steven Pinker

The Corrections, by Jonathan Franzen