How do I parse XML in Python?


Question

I have many rows in a database that contain XML, and I'm trying to write a Python script that goes through those rows and counts how many instances of a particular node attribute show up. For instance, my tree looks like:

<foo>
   <bar>
      <type foobar="1"/>
      <type foobar="2"/>
   </bar>
</foo>

How can I access the attribute values "1" and "2" in the XML using Python?

1
909
4/9/2015 11:47:42 AM

Accepted Answer

I suggest ElementTree. There are other compatible implementations of the same API, such as lxml. The standard library used to ship a separate cElementTree as well, but since Python 3.3 xml.etree.ElementTree uses the C accelerator automatically, and cElementTree was removed in Python 3.9. In this context, what the alternatives chiefly add is speed; the ease of programming depends on the API, which ElementTree defines.

First build an Element instance root from the XML, e.g. with the XML function, or by parsing a file with something like:

import xml.etree.ElementTree as ET
root = ET.parse('thefile.xml').getroot()

Or use any of the many other ways shown in the ElementTree documentation. Then do something like:

for type_tag in root.findall('bar/type'):
    value = type_tag.get('foobar')
    print(value)

Similar tasks follow the same kind of code pattern, and it is usually just as simple.
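
Since the XML in the question lives in database rows rather than files, here is a minimal sketch that parses from strings and tallies the attribute values. The rows list and the count_foobar helper are made up for illustration; in practice the strings would come from your own database query.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for XML strings fetched from database rows.
rows = [
    '<foo><bar><type foobar="1"/><type foobar="2"/></bar></foo>',
    '<foo><bar><type foobar="1"/></bar></foo>',
]

def count_foobar(xml_strings):
    """Tally the values of the foobar attribute across all documents."""
    counts = Counter()
    for text in xml_strings:
        root = ET.fromstring(text)  # parse a string instead of a file
        for type_tag in root.findall('bar/type'):
            value = type_tag.get('foobar')  # None if the attribute is absent
            if value is not None:
                counts[value] += 1
    return counts

counts = count_foobar(rows)
print(counts)
```

ET.fromstring returns the root element directly, so there is no getroot() call as there is with ET.parse. If a row may hold malformed XML, wrapping the fromstring call in a try/except for ET.ParseError is probably worth it.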

703
3/27/2019 8:25:49 PM

minidom is quick to get started with and pretty straightforward:

XML:

<data>
    <items>
        <item name="item1"></item>
        <item name="item2"></item>
        <item name="item3"></item>
        <item name="item4"></item>
    </items>
</data>

PYTHON:

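from xml.dom import minidom

# Parse the file and collect every <item> element in document order
xmldoc = minidom.parse('items.xml')
itemlist = xmldoc.getElementsByTagName('item')
print(len(itemlist))  # number of <item> elements
print(itemlist[0].attributes['name'].value)  # name of the first item
for s in itemlist:
    print(s.attributes['name'].value)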

OUTPUT:

4
item1
item1
item2
item3
item4
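
If the XML is already in memory as a string (as it would be when read from a database row), minidom can parse it directly. This is only a sketch: the xml_text value is a shortened, made-up version of the document above, and getAttribute is used as a slightly shorter alternative to attributes['name'].value.

```python
from xml.dom import minidom

# Hypothetical XML string; in practice this would come from your own data.
xml_text = '<data><items><item name="item1"/><item name="item2"/></items></data>'

# parseString builds the DOM from a string instead of a file path.
xmldoc = minidom.parseString(xml_text)

# getAttribute returns the attribute value, or an empty string if it is missing.
names = [item.getAttribute('name') for item in xmldoc.getElementsByTagName('item')]
print(names)
```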

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow