How to Update XML Files in Python?

Author: neptune | 01st-Jul-2024
🏷️ #Python

XML (Extensible Markup Language) is widely used for representing structured data. Python provides robust libraries for handling XML files. This article covers how to update XML files in Python, focusing on updating element values, deleting elements, and handling empty or null values.

Libraries Used

The primary library used in this tutorial is `xml.etree.ElementTree`, which is part of Python's standard library.


 

    import xml.etree.ElementTree as ET


Sample XML File

Here is a sample XML file we'll be working with:


    <?xml version='1.0' encoding='utf-8'?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
        <url>
            <loc>https://neptuneworld.in/blog/rest-graphql-future-api-design</loc>
            <lastmod>2024-02-25</lastmod>
            <changefreq>always</changefreq>
            <priority>1.0</priority>
            <image:image>
                <image:loc>https://neptuneworld.in/media/static/website/blogs/GraphQL.JPG</image:loc>
            </image:image>
        </url>
    </urlset>
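Because the sample declares a default namespace, ElementTree stores every tag together with its namespace URI, which is why the later snippets search with a namespace dictionary. The following is a minimal sketch of that behaviour; it parses a trimmed copy of the sample from a string (the `SAMPLE` constant is introduced here only for illustration) rather than reading `sitemap.xml`.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the sample sitemap, parsed from a string for illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
    <url>
        <loc>https://neptuneworld.in/blog/rest-graphql-future-api-design</loc>
        <priority>1.0</priority>
    </url>
</urlset>"""

root = ET.fromstring(SAMPLE)

# ElementTree stores each tag as "{namespace-uri}localname".
print(root.tag)  # {http://www.sitemaps.org/schemas/sitemap/0.9}urlset

# A search for the bare tag name finds nothing ...
print(root.findall('url'))  # []

# ... while the fully qualified name matches.
qualified = root.findall('{http://www.sitemaps.org/schemas/sitemap/0.9}url')
print(len(qualified))  # 1
```

A namespace dictionary is a shorthand for the qualified form: a path such as `default:url` is expanded to the `{uri}url` name before matching.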




Updating Element Values

To update an element's value, locate the element and assign the new value to its `text` attribute.


    tree = ET.parse('sitemap.xml')
    root = tree.getroot()

    # Namespace dictionary used when searching
    namespaces = {
        'default': 'http://www.sitemaps.org/schemas/sitemap/0.9',
        'image': 'http://www.google.com/schemas/sitemap-image/1.1'
    }

    # Register the prefixes so the output keeps them instead of ns0/ns1
    ET.register_namespace('', namespaces['default'])
    ET.register_namespace('image', namespaces['image'])

    # Update the value of every <priority> element
    for priority in root.findall('.//default:priority', namespaces):
        priority.text = '0.8'

    tree.write('updated_sitemap.xml', encoding='utf-8', xml_declaration=True)
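The loop above changes every `<priority>` in the file. When only one entry should change, the element can be located through a sibling value first. The sketch below is one possible way to do that, not the only one: `set_priority` is a hypothetical helper, the `example.com` URLs are placeholders, and the XML is parsed from a string so the snippet runs without a file.

```python
import xml.etree.ElementTree as ET

NS = {'default': 'http://www.sitemaps.org/schemas/sitemap/0.9'}

# Placeholder data with two entries (not the article's sitemap.xml).
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>https://example.com/a</loc>
        <priority>1.0</priority>
    </url>
    <url>
        <loc>https://example.com/b</loc>
        <priority>1.0</priority>
    </url>
</urlset>"""


def set_priority(root, target_loc, new_priority):
    """Set <priority> on each <url> whose <loc> equals target_loc.

    Returns the number of elements that were updated.
    """
    updated = 0
    for url in root.findall('default:url', NS):
        if url.findtext('default:loc', namespaces=NS) == target_loc:
            priority = url.find('default:priority', NS)
            if priority is not None:
                priority.text = new_priority
                updated += 1
    return updated


root = ET.fromstring(SAMPLE)
count = set_priority(root, 'https://example.com/b', '0.5')
priorities = [p.text for p in root.findall('.//default:priority', NS)]
print(count, priorities)  # 1 ['1.0', '0.5']
```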




Deleting an Element

To delete an element, you need to find its parent first, and then remove the child element.


    tree = ET.parse('sitemap.xml')
    root = tree.getroot()

    # Find the parent element of <image:image> and remove the <image:image> child
    for url in root.findall('default:url', namespaces):
        image_elem = url.find('image:image', namespaces)
        if image_elem is not None:
            url.remove(image_elem)

    tree.write('updated_sitemap.xml', encoding='utf-8', xml_declaration=True)
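The snippet above works because the parent (`<url>`) is known in advance. ElementTree elements do not hold a reference to their parent, so for elements at arbitrary depth a common workaround is to build a child-to-parent mapping first. The following is a rough sketch under that assumption; `parent_map` is a name chosen here, and the XML is placeholder data parsed from a string.

```python
import xml.etree.ElementTree as ET

NS = {
    'default': 'http://www.sitemaps.org/schemas/sitemap/0.9',
    'image': 'http://www.google.com/schemas/sitemap-image/1.1',
}

# Placeholder data (not the article's sitemap.xml).
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
    <url>
        <loc>https://example.com/a</loc>
        <image:image>
            <image:loc>https://example.com/a.jpg</image:loc>
        </image:image>
    </url>
</urlset>"""

root = ET.fromstring(SAMPLE)

# Map every child element to its parent; ElementTree does not keep this link.
parent_map = {child: parent for parent in root.iter() for child in parent}

# findall() returns a list, so removing elements inside the loop is safe.
for image_loc in root.findall('.//image:loc', NS):
    parent_map[image_loc].remove(image_loc)

remaining = root.findall('.//image:loc', NS)
print(len(remaining))  # 0
```

The mapping reflects the tree at the moment it was built, so it should be rebuilt if elements are added or moved afterwards.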



Handling Empty or Null Values

To update empty or null elements, you can check if the element's text is `None` or an empty string before updating it.


    tree = ET.parse('sitemap.xml')
    root = tree.getroot()

    # Update <lastmod> element if it is empty or null
    for lastmod in root.findall('.//default:lastmod', namespaces):
        if lastmod.text is None or lastmod.text.strip() == '':
            lastmod.text = '2024-06-30'

    tree.write('updated_sitemap.xml', encoding='utf-8', xml_declaration=True)
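The check above covers elements that exist but carry no text. An entry may also lack the element entirely, in which case `findall` never sees it. The sketch below, offered as one possible approach, handles all three cases on placeholder data: a self-closing `<lastmod/>`, a whitespace-only value, and a missing element that is created with `ET.SubElement`. The helper `is_blank` and the constant names are introduced here for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9'
NS = {'default': SITEMAP_NS}
DEFAULT_DATE = '2024-06-30'

# Placeholder data: filled, self-closing, whitespace-only, and missing <lastmod>.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><lastmod>2024-02-25</lastmod></url>
    <url><lastmod/></url>
    <url><lastmod>   </lastmod></url>
    <url><loc>https://example.com/no-lastmod</loc></url>
</urlset>"""


def is_blank(element):
    """Return True when the element has no text or only whitespace."""
    return element.text is None or not element.text.strip()


root = ET.fromstring(SAMPLE)
for url in root.findall('default:url', NS):
    lastmod = url.find('default:lastmod', NS)
    if lastmod is None:
        # Create the element with its fully qualified "{uri}name" tag.
        lastmod = ET.SubElement(url, f'{{{SITEMAP_NS}}}lastmod')
    if is_blank(lastmod):
        lastmod.text = DEFAULT_DATE

dates = [elem.text for elem in root.findall('.//default:lastmod', NS)]
print(dates)  # ['2024-02-25', '2024-06-30', '2024-06-30', '2024-06-30']
```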



Reading an Element's Value

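To read an element's value, first check that `find()` returned an element, since it returns `None` when nothing matches. If the element exists, read its `text`; otherwise report that it was not found. The snippet below assumes `root` is a parsed document that may contain a `MsgId` element, and that `ns` is a namespace dictionary mapping the prefix `ns` to that document's namespace URI.

    msgid_element = root.find('.//ns:MsgId', ns)

    if msgid_element is not None:
        msgid_value = msgid_element.text
        print(f'MsgId: {msgid_value}')
    else:
        print('MsgId element not found')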
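For simple reads, `findtext()` combines the lookup and the `None` check by accepting a default value. The sketch below applies it to placeholder sitemap data rather than the `MsgId` document; note that an element that exists but is empty yields an empty string, not the default.

```python
import xml.etree.ElementTree as ET

NS = {'default': 'http://www.sitemaps.org/schemas/sitemap/0.9'}

# Placeholder data: <loc> has text, <lastmod> is empty, <changefreq> is absent.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>https://example.com/a</loc>
        <lastmod/>
    </url>
</urlset>"""

root = ET.fromstring(SAMPLE)

# Element present with text: its text is returned.
loc = root.findtext('.//default:loc', default='not found', namespaces=NS)

# Element absent: the default is returned.
changefreq = root.findtext('.//default:changefreq', default='not found', namespaces=NS)

# Element present but empty: an empty string is returned, not the default.
lastmod = root.findtext('.//default:lastmod', default='not found', namespaces=NS)

print(repr(loc), repr(changefreq), repr(lastmod))
```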


Complete Example

Here is a complete example that demonstrates updating an element value, deleting an element, and handling empty or null values.


    import xml.etree.ElementTree as ET

    # Load the XML file
    tree = ET.parse('sitemap.xml')
    root = tree.getroot()

    # Namespace dictionary used when searching
    namespaces = {
        'default': 'http://www.sitemaps.org/schemas/sitemap/0.9',
        'image': 'http://www.google.com/schemas/sitemap-image/1.1'
    }

    # Register the prefixes so the output keeps them instead of ns0/ns1
    ET.register_namespace('', namespaces['default'])
    ET.register_namespace('image', namespaces['image'])

    # Update the value of every <priority> element
    for priority in root.findall('.//default:priority', namespaces):
        priority.text = '0.8'

    # Find the parent element of <image:image> and remove the <image:image> child
    for url in root.findall('default:url', namespaces):
        image_elem = url.find('image:image', namespaces)
        if image_elem is not None:
            url.remove(image_elem)

    # Update <lastmod> element if it is empty or null
    for lastmod in root.findall('.//default:lastmod', namespaces):
        if lastmod.text is None or lastmod.text.strip() == '':
            lastmod.text = '2024-06-30'

    # Save the updated XML file
    tree.write('updated_sitemap.xml', encoding='utf-8', xml_declaration=True)



Conclusion

Handling XML files in Python is straightforward with the `xml.etree.ElementTree` module. You can update element values, delete elements, and handle empty or null values efficiently. This tutorial provides a foundation for manipulating XML files, allowing you to adapt these methods to more complex XML structures as needed.



