Encoding types in etree
Setting encoding in the header of ElementTree generated XML.
Not that I'm picking on ElementTree or anything but ...
Normally an XML document declares its character encoding in the XML declaration on its first line. And, normally, this will be UTF-8. (In the absence of a declaration or byte-order mark, the XML specification has parsers assume UTF-8.) So this is a common sight:
<?xml version='1.0' encoding='utf-8'?>
An oddity in ElementTree is that it assumes 'us-ascii' as the default encoding. The obvious answer is to explicitly provide an alternative encoding if this doesn't suit. This brings up another oddity: ElementTree silently leaves the declaration out of the output whenever the encoding is 'utf-8' or 'us-ascii', even if you asked for that encoding explicitly:
def write(self, file, encoding="us-ascii"):
    # the file is opened and then ...
    if not encoding:
        encoding = "us-ascii"
    elif encoding != "utf-8" and encoding != "us-ascii":
        file.write("<?xml version='1.0' encoding='%s'?>\n" % encoding)
    self._write(file, self._root, encoding, {})
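The behaviour is easy to check for yourself. The sketch below assumes the xml.etree.ElementTree module bundled with Python 3 rather than the standalone package quoted above, and serialize is a hypothetical helper name, not part of the library:

```python
import io
import xml.etree.ElementTree as ET


def serialize(root, encoding):
    """Write the tree into an in-memory buffer and return the raw bytes."""
    buf = io.BytesIO()
    ET.ElementTree(root).write(buf, encoding=encoding)
    return buf.getvalue()


root = ET.Element("doc")
root.text = "hello"

# No declaration is written for 'utf-8' or 'us-ascii' ...
utf8_bytes = serialize(root, "utf-8")
ascii_bytes = serialize(root, "us-ascii")

# ... but one is written for any other encoding.
latin1_bytes = serialize(root, "iso-8859-1")

print(utf8_bytes)
print(latin1_bytes)
```

Only the last call should produce output that starts with an XML declaration.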
There's probably a good reason for this, but in the meantime it's easy to work around. Write your own simple XML tree printing function that writes the header itself:
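def write_etree(etree, file, encoding="us-ascii"):
    # etree is an ElementTree
    # the file is opened and then ...
    # always write the declaration, whatever the encoding
    file.write("<?xml version='1.0' encoding='%s'?>\n" % encoding)
    # _write and _root are private members of ElementTree
    etree._write(file, etree._root, encoding, {})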
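Newer ElementTree releases (1.3, as bundled with Python 2.7 and 3.x) accept an xml_declaration argument to write, which avoids reaching into private methods. A minimal sketch, assuming Python 3's bundled xml.etree.ElementTree; write_with_header is a hypothetical name chosen for illustration:

```python
import io
import xml.etree.ElementTree as ET


def write_with_header(tree, file, encoding="utf-8"):
    """Serialize tree to a binary file object, always emitting the XML declaration."""
    # xml_declaration=True forces the header even for 'utf-8' and 'us-ascii'
    tree.write(file, encoding=encoding, xml_declaration=True)


tree = ET.ElementTree(ET.Element("doc"))
buf = io.BytesIO()
write_with_header(tree, buf)
output = buf.getvalue()
print(output.decode("utf-8"))
```

The file object has to be opened in binary mode here, since an explicit byte encoding is being requested.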