HTMLFilter, in its first public standalone release, is a module for Python
programs. It parses an HTML 4 document, allowing subclasses to
pass through or modify the text and tags as
they go by. The resulting copy will be an
otherwise exact replica of the original, including whitespace
and comments. ASP, PHP, JSP, or other server-side code will
generally survive the round trip. (The only exception is
code embedded inside an HTML tag that you’re actually
modifying rather than just passing through; even then, tag attributes you do not explicitly modify are safe in most cases.)
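To make the pass-through idea concrete, here is a minimal sketch. It does not use HTMLFilter's own API, which is not shown here. It assumes Python 3 and the standard library's html.parser module as a stand-in, and the names PassThroughFilter, AddMetaFilter, and run_filter are hypothetical. The base class echoes every event back out unchanged, and a subclass overrides one handler to modify the stream.

```python
from html.parser import HTMLParser


class PassThroughFilter(HTMLParser):
    """Hypothetical stand-in (not HTMLFilter's API): echo the input back out."""

    def __init__(self):
        # convert_charrefs=False keeps references such as &amp; as separate
        # events, so they can be written back out instead of being decoded.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag exactly as it appeared in the
        # source, including attribute order, quoting, and whitespace.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Approximation: the original case and spacing of end tags are lost.
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


class AddMetaFilter(PassThroughFilter):
    """Insert a <meta> tag right after <head>; leave everything else alone."""

    def handle_starttag(self, tag, attrs):
        super().handle_starttag(tag, attrs)
        if tag == "head":
            self.out.append('<meta name="generator" content="example">')


def run_filter(filter_cls, html):
    """Feed a complete document through a filter and return the result."""
    f = filter_cls()
    f.feed(html)
    f.close()
    return "".join(f.out)


if __name__ == "__main__":
    page = "<html>\n<head>\n<title>Demo</title>\n</head>\n<body>Hi</body>\n</html>\n"
    print(run_filter(AddMetaFilter, page))
```

This sketch is only faithful for reasonably well-formed input. Unusual end tags (for example, uppercase or with internal whitespace) and references missing their semicolon come out normalized rather than byte-for-byte identical.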
Usage can be as simple as adding a <meta> tag to an
existing web page without disturbing the rest, or as complex as merging two HTML pages (as
is done in ShearerSite,
which intelligently merges content pages into template pages).
You can also use it to generate HTML from scratch, with HTMLFilter
taking care of the attribute encoding for tags.
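As an illustration of what attribute encoding involves when generating tags from scratch, the sketch below builds a start tag with escaped attribute values. It is not HTMLFilter's interface. The function start_tag is a hypothetical helper, and it assumes Python 3's standard html.escape.

```python
from html import escape


def start_tag(name, attrs=()):
    """Build an HTML start tag, escaping each attribute value.

    Hypothetical helper for illustration; attrs is a sequence of
    (key, value) pairs.
    """
    parts = [name]
    for key, value in attrs:
        # quote=True also escapes double and single quotes, so the value
        # cannot terminate the surrounding "..." early.
        parts.append('%s="%s"' % (key, escape(str(value), quote=True)))
    return "<%s>" % " ".join(parts)


if __name__ == "__main__":
    print(start_tag("a", [("href", "/search?q=a&b=c"), ("title", 'Say "hi"')]))
```

Escaping at the point where the tag is assembled means callers can pass raw strings and never have to remember which characters are special inside an attribute.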
HTMLFilter is Python-licensed, Unicode- and encoding-savvy, and tested with Python 1.5.2 through 2.3.