XMLFilter

Parse and modify XML anywhere

XMLFilter is a module for Python programs. It augments or stands in for xml.sax in four ways:

  1. It provides an xml.sax-compatible XML parser even when the installed version of Python lacks a working copy of xml.sax. This lets your scripts work when expat is missing, as in the version of Python bundled with Mac OS X 10.2 (Jaguar), which would otherwise fail with the error message:
    xml.sax._exceptions.SAXReaderNotAvailable: No parsers found
    XMLFilter gets around the problem by falling back to the pure-Python xmllib and adapting it to match xml.sax’s callbacks. A test suite verifies call-for-call compatibility, so once you substitute XMLFilter for xml.sax.handler.ContentHandler, your existing scripts should run without further changes.
  2. It allows subclasses to filter, modify, add and delete content from an XML file with minimal disruption to the rest of the file. Multiple filters can be chained in series.
  3. It allows output to be written to an XML file. This works even without xml.sax, and avoids an xml.sax.saxutils.XMLGenerator bug in Python 2.2.
  4. It allows programs to hint that they want to write particular chunks of content to an XML file as CDATA, using a method that’s fully compatible with code (SAX handlers, filters, or output handlers) that doesn’t have any special CDATA support. This is useful for RSS files embedding HTML or other data that would be unwieldy after XML entity encoding.
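
To make the filtering and chaining idea in points 2 and 3 concrete, here is a minimal sketch. It does not use XMLFilter's own API, which this page does not document. It assumes a modern Python 3 with a working xml.sax and uses the standard library's xml.sax.saxutils.XMLFilterBase and XMLGenerator instead. The class names DropElements and RenameElements, the helper run_chain, and the element names are all hypothetical.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class DropElements(XMLFilterBase):
    """Delete each element called `name`, with its child elements and text."""

    def __init__(self, parent, name):
        super().__init__(parent)
        self._name = name
        self._depth = 0  # nesting depth inside a dropped element; 0 = outside

    def startElement(self, name, attrs):
        if self._depth or name == self._name:
            self._depth += 1  # swallow the event instead of forwarding it
        else:
            super().startElement(name, attrs)

    def endElement(self, name):
        if self._depth:
            self._depth -= 1
        else:
            super().endElement(name)

    def characters(self, content):
        if not self._depth:
            super().characters(content)


class RenameElements(XMLFilterBase):
    """Rename elements called `old` to `new`; forward everything else as-is."""

    def __init__(self, parent, old, new):
        super().__init__(parent)
        self._old, self._new = old, new

    def _map(self, name):
        return self._new if name == self._old else name

    def startElement(self, name, attrs):
        super().startElement(self._map(name), attrs)

    def endElement(self, name):
        super().endElement(self._map(name))


def run_chain(xml_bytes):
    """Parse xml_bytes through both filters and return the rewritten XML."""
    out = io.StringIO()
    reader = xml.sax.make_parser()
    # Filters are chained in series: reader -> DropElements -> RenameElements.
    chain = RenameElements(DropElements(reader, "draft"), "item", "entry")
    # The last filter in the chain feeds an XML writer.
    chain.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    chain.parse(io.BytesIO(xml_bytes))
    return out.getvalue()
```

Calling run_chain(b"<feed><item>keep</item><draft>drop</draft></feed>") returns the document with the draft element removed and item renamed to entry, while the rest of the file passes through untouched. Each filter only overrides the events it cares about, which is what keeps the disruption to the rest of the file small.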

If xml.sax is working, the code uses it in preference to the older xmllib for a factor-of-3 performance boost. Namespaces are fully supported, and can be switched on or off. Character encodings are supported on Unicode-aware versions of Python.
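
The choice between xml.sax and a fallback depends on detecting whether xml.sax can actually build a parser. The following is a hedged sketch of one way to perform that check with the standard library; it is not taken from XMLFilter's source, and the function name sax_parser_available is hypothetical. It is written for Python 3, where xmllib no longer exists, so only the detection step is shown, not the fallback itself.

```python
import xml.sax
from xml.sax import SAXReaderNotAvailable


def sax_parser_available():
    """Return True if xml.sax can construct a parser on this interpreter."""
    try:
        xml.sax.make_parser()
    except SAXReaderNotAvailable:
        # This is the "No parsers found" condition: no usable driver
        # (normally expat) is installed.
        return False
    return True


if __name__ == "__main__":
    if sax_parser_available():
        print("xml.sax is usable; prefer it")
    else:
        print("xml.sax has no parser; a pure-Python fallback is needed")
```

Probing once at import time and remembering the result would keep the check out of the parsing path.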

XMLFilter has been successfully tested with versions of Python ranging from 1.5.2 to 2.3.

It is distributed under a Python license.

XMLFilter 1.1 Download

[Download .zip Archive] Windows CRLF format, 14K

[Download .tgz Archive] Unix/Linux/Mac OS X LF format, 12K

(The XML test suite has been stripped down to a couple of files for distribution. If you want, you can put your own XML files into the test folder, and the test suite will verify that the generated SAX event sequences are identical whether or not xml.sax is used.)

Example

For sample code using XMLFilter as a safe drop-in replacement for xml.sax, look at my PList reader.

For a more complex example, see RSSFilter, which uses XMLFilter’s filter chaining to perform operations on an RSS file in place, such as getting all posts that match given criteria, or adding, modifying, or deleting a post.