Friday, 29 June 2012

Python - encode XML escape characters

Anyway as part of my iPlayer podcast project I need to create an RSS feed,which is nothing more complicated than a XML file which conforms to a specific schema.  Its a pretty simple schema so I decided rather than using an XML parser, I would just write strings to a file - this created a problem, I need to encode any escape characters which appeared in the text, otherwise the xml created was invalid and most RSS readers wouldn't accept it.

There are 5 escape characters which need to be encoded:

"      ->     "
'      ->     '
<      ->     &lt;
>      ->     &gt;
&      ->     &amp;

So I created a really simple python function which I added to my create RSS feed program.

# encode xml escape characters
def encodeXMLText(text):
    text = text.replace("&", "&amp;")
    text = text.replace("\"", "&quot;")
    text = text.replace("'", "&apos;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    return text

Which is called by passing the text I need to make xml safe:

xmlText = encodeXMLText("Some text with escape characters & " ' < > you want to make xml safe")

No comments:

Post a Comment