How to find elements by 'id' field in SVG file using Python


Question

Below is an excerpt from an .svg file (which is xml):

   <text
       xml:space="preserve"
       style="font-size:14.19380379px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:start;line-height:125%;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;font-family:DejaVu Sans Mono;-inkscape-font-specification:DejaVu Sans Mono"
       x="109.38555"
       y="407.02847"
       id="libcode-00"
       sodipodi:linespacing="125%"
       inkscape:label="#text4638"><tspan
         sodipodi:role="line"
         id="tspan4640"
         x="109.38555"
         y="407.02847">12345678</tspan></text>

I'm learning Python and have no clue how can I find all such text elements that have an id field equal to libcode-XX where XX is a number.

I've loaded this .svg file using minidom's parser and tried to find elements using getElementById. However I'm getting None result.

    svgTemplate = minidom.parse(svgFile)
    print svgTemplate
    print svgTemplate.getElementById('libcode-00')

Going after other SO question I've tried using setIdAttribute('id') on svgTemplate object with no luck.

Bottom line: please give a hint for a smart way to extract all of these text elements that have ids in form of libcode-XX. After that it should be no problem to get tspan text and substitute it with generated content.

1
3
3/1/2010 9:49:49 PM

Accepted Answer

Sorry, I don't know my way around minidom. Also, I had to find the namespace declarations from a sample svg document so that your excerpt could load.

I personally use lxml.etree. I'd recommend that you use XPATH for addressing parts of your XML document. It's pretty powerful and there's help here on SO if you're struggling.

There are lots of answers on SO about XPATH and etree. I've written several.

from lxml import etree
data = """
 <svg
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:cc="http://web.resource.org/cc/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:svg="http://www.w3.org/2000/svg"
    xmlns="http://www.w3.org/2000/svg"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
    xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
    width="50"
    height="25"
    id="svg2"
    sodipodi:version="0.32"
    inkscape:version="0.45.1"
    version="1.0"
    sodipodi:docbase="/home/tcooksey/Projects/qt-4.4/demos/embedded/embeddedsvgviewer/files"
    sodipodi:docname="v-slider-handle.svg"
    inkscape:output_extension="org.inkscape.output.svg.inkscape">
    <text
       xml:space="preserve"
       style="font-size:14.19380379px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:start;line-height:125%;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;font-family:DejaVu Sans Mono;-inkscape-font-specification:DejaVu Sans Mono"
       x="109.38555"
       y="407.02847"
       id="libcode-00"
       sodipodi:linespacing="125%"
       inkscape:label="#text4638"><tspan
         sodipodi:role="line"
         id="tspan4640"
         x="109.38555"
         y="407.02847">12345678</tspan></text>
    </svg>
"""

nsmap = {
    'sodipodi': 'http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd',
    'cc': 'http://web.resource.org/cc/',
    'svg': 'http://www.w3.org/2000/svg',
    'dc': 'http://purl.org/dc/elements/1.1/',
    'xlink': 'http://www.w3.org/1999/xlink',
    'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'inkscape': 'http://www.inkscape.org/namespaces/inkscape'
    }


data = etree.XML(data)

# All svg text elements
>>> data.xpath('//svg:text',namespaces=nsmap)
[<Element {http://www.w3.org/2000/svg}text at b7cfc9dc>]
# All svg text elements with id="libcode-00"
>>> data.xpath('//svg:text[@id="libcode-00"]',namespaces=nsmap)
[<Element {http://www.w3.org/2000/svg}text at b7cfc9dc>]
# TSPAN child elements of text elements with id="libcode-00"
>>> data.xpath('//svg:text[@id="libcode-00"]/svg:tspan',namespaces=nsmap)
[<Element {http://www.w3.org/2000/svg}tspan at b7cfc964>]
# All text elements with id starting with "libcode"
>>> data.xpath('//svg:text[fn:startswith(@id,"libcode")]',namespaces=nsmap)
[<Element {http://www.w3.org/2000/svg}text at b7cfcc34>]
# Iterate text elements, access tspan child
>>> for elem in data.xpath('//svg:text[fn:startswith(@id,"libcode")]',namespaces=nsmap):
...     tp = elem.xpath('./svg:tspan',namespaces=nsmap)[0]
...     tp.text = "new text"

open("newfile.svg","w").write(etree.tostring(data))
10
3/2/2010 8:05:09 AM

Adding a little bit to MattH's great example when you use xpath and you know the namespace you can do things like

pub_name = data.xpath('//dc:publisher/cc:Agent/dc:title',
                            namespaces=nsmap)[0].text

This will give direct access to the element tag text that you want.


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon