cgi.escape seems like one possible choice. Does it work well? Is there something that is considered better?
cgi.escape is fine. It escapes:
< to &lt;
> to &gt;
& to &amp;
That is enough for all HTML.
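To see the replacements in action, here is a minimal sketch. It assumes a current Python 3, where cgi.escape is no longer available, so html.escape with quote=False stands in for it (it replaces the same three characters and leaves quotes alone). The sample string is made up.

```python
import html

# Made-up sample containing all three characters that get escaped.
sample = 'if a < b & b > c: print("ok")'

# quote=False mimics the default behaviour of the old cgi.escape:
# only &, < and > are replaced, double quotes are left untouched.
escaped = html.escape(sample, quote=False)
print(escaped)
```

The & is replaced first, so the entities produced for < and > are not escaped a second time.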
EDIT: If you have non-ASCII chars you also want to escape, for inclusion in another encoded document that uses a different encoding, like Craig says, just call .encode("ascii", "xmlcharrefreplace") on the escaped text.
Don't forget to decode the data to unicode first, using whatever encoding it was encoded in.
However, in my experience that kind of encoding is useless if you just work with unicode all the time from the start. Just encode at the end to the encoding specified in the document header (utf-8 for maximum compatibility).
>>> cgi.escape(u"<a>bá</a>").encode("ascii", "xmlcharrefreplace")
'&lt;a&gt;b&#225;&lt;/a&gt;'
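For reference, a rough Python 3 version of the same idea. This is a sketch under a few assumptions: html.escape with quote=False stands in for cgi.escape, and the latin-1 input bytes are made up purely to show the decode-first step.

```python
import html

# Made-up input: bytes in some legacy encoding (latin-1 is an assumption here).
raw = b"<a>b\xe1</a>"

# Decode to str (unicode) first, using the encoding the data was written in.
text = raw.decode("latin-1")

# Escape the markup characters, then turn every non-ASCII character into a
# numeric character reference so the result contains ASCII bytes only.
ascii_bytes = html.escape(text, quote=False).encode("ascii", "xmlcharrefreplace")
print(ascii_bytes)
```

The result is a bytes object, so it can be embedded in a document of any ASCII-compatible encoding without further conversion.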
Also worth noting (thanks Greg) is the extra quote parameter cgi.escape takes. With it set to True, cgi.escape also escapes double quote chars (") so you can use the resulting value in an XML/HTML attribute.
EDIT: Note that cgi.escape was deprecated in Python 3.2 in favor of html.escape, which does the same except that quote defaults to True.
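As a sketch of the attribute case with html.escape: the helper name make_link and the sample values below are hypothetical. One difference worth knowing is that html.escape with quote=True escapes single quotes as well as double quotes, which cgi.escape did not.

```python
import html

def make_link(url, label):
    """Hypothetical helper: build an <a> tag from untrusted pieces."""
    # Attribute value: quotes must be escaped too (quote=True is the default).
    safe_url = html.escape(url, quote=True)
    # Element text: escaping &, < and > is sufficient.
    safe_label = html.escape(label, quote=False)
    return '<a href="{}">{}</a>'.format(safe_url, safe_label)

tag = make_link('/search?q="x"&page=2', 'Results for <x>')
print(tag)
```

Escaping only protects the markup structure; it does not validate the URL itself (for example its scheme), which is a separate concern.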
In Python 3.2 a new html module was introduced, which is used for escaping reserved characters from HTML markup.
It shipped with one function, escape():
>>> import html
>>> html.escape('x > 2 && x < 7 single quote: \' double quote: "')
'x &gt; 2 &amp;&amp; x &lt; 7 single quote: &#x27; double quote: &quot;'