How to parse XML with PHP5

Feeds are streams of content that people can use to share pieces of information across websites. PHP5’s SimpleXML functions dramatically simplify the process of turning those feeds into something useful for your web pages.

I recently worked on a widget that displayed a live feed of songs from one of our radio stations’ playlists. The system that runs our radio station outputs XML, which is then fed to a database. A program turns that data into feeds that get pulled by other programs.
Live feed from our station's playlist

It’s common to use XML to share data between servers that use different technologies. The system that generates the playlist is proprietary, so XML provides an easy format through which this data can be shared.

What is XML?

The eXtensible Markup Language is a way to structure your data for sharing across sites. Some of the technologies that are crucial to the web, like RSS (Really Simple Syndication) and podcast feeds, are special flavors of XML. The beautiful thing about XML is that you can easily roll your own for anything you need.

XML is easy to create because it’s a lot like HTML, except you can make up your own tags. Let’s say, for example, that you’re putting together a feed for a list of songs playing at your own radio station. We’ll keep this simple, so we’ll just encode the name of the artist, the title of the song, plus the time when the song was played. We make up a couple of tags called <title> and <artist> and wrap each pair of them in a <song> tag. We’ll create a dateplayed attribute for each song with the date and time the song was played. You might encode something like that in this manner.

<songs>
    <song dateplayed="2011-07-24 19:40:26">
        <title>I left my heart on Europa</title>
        <artist>Ship of Nomads</artist>
    </song>
    <song dateplayed="2011-07-24 19:27:42">
        <title>Oh Ganymede</title>
        <artist>Beefachanga</artist>
    </song>
    <song dateplayed="2011-07-24 19:23:50">
        <title>Kallichore</title>
        <artist>Jewitt K. Sheppard</artist>
    </song>
</songs>

There are some rules that you have to adhere to when creating XML data. If you’re familiar with XHTML, you’ll be right at home with some of these, but let’s review them:

  • XML is case sensitive, so <Title> is not the same as <title>.
  • All XML elements must have closing tags.
  • XML requires a root element (the <songs> tag above serves as our root element).
  • Attributes must be quoted.
  • Special characters such as &, < and > must be encoded (as &amp;, &lt; and &gt;).

XML is a bit stricter than HTML, but it is really easy to create and deal with.
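
As a rough sketch of the encoding rule, PHP’s htmlspecialchars() can escape special characters before they go into an XML string, and simplexml_load_string() returns false when a document breaks the rules above. The artist name and the two sample documents are made up for illustration.

```php
<?php
// Hypothetical artist name containing characters that must be encoded.
$artist = 'Tom & Jerry <live>';

// ENT_XML1 escapes &, < and > (plus quotes, with ENT_QUOTES) for use in XML.
$encoded = htmlspecialchars($artist, ENT_XML1 | ENT_QUOTES, 'UTF-8');

// Collect parse errors quietly instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Well-formed: one root element, closed tags, encoded text.
$good = simplexml_load_string("<songs><song><artist>$encoded</artist></song></songs>");

// Not well-formed: a raw & and mismatched case (<Artist> closed by </artist>).
$bad = simplexml_load_string('<songs><song><Artist>Tom & Jerry</artist></song></songs>');
libxml_clear_errors();

echo $encoded, "\n";             // the escaped text as it appears in the XML
echo $good->song->artist, "\n";  // SimpleXML decodes it again when reading
var_dump($bad);                  // the broken document fails to load
```

Checking the return value against false before using the object is a cheap way to catch a malformed feed early.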

Introducing SimpleXML

With SimpleXML, it’s as easy as reading the XML and then accessing its contents through an easy-to-read object. Assuming we’ve got our XML above saved as a file called songs.xml in the same folder as our PHP file, we can read the whole feed into an object with the following code.

<?php
    $mysongs = simplexml_load_file('songs.xml');
?>

That’s it! The file can even be the URL of a feed on the web and not just a file on your hard drive. We now have an object that is a representation of our file: the root <songs> element is available through the $mysongs variable. If we want to output the name of the first artist in our list, we can refer to it like this:

<?php
    $mysongs = simplexml_load_file('songs.xml');
    echo $mysongs->song[0]->artist;
?>

Showing the first artist in our XML document

Notice that our XML tags are mapped as properties of the object, so we can get to any element simply by typing its name. Remember that arrays are 0 indexed in PHP, so our first song is song[0]. Now, let’s output the third song title.

<?php
    $mysongs = simplexml_load_file('songs.xml');
    echo $mysongs->song[2]->title;
?>

Showing the third song's title
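
One detail worth knowing: what comes back from these lookups is an object, not a plain string. A minimal sketch, using an inline copy of part of the sample feed so it runs without songs.xml, shows how you might cast a value, count the songs, and check whether an index exists.

```php
<?php
// Inline copy of part of the sample feed so the sketch runs without songs.xml.
$xml = <<<XML
<songs>
    <song dateplayed="2011-07-24 19:40:26">
        <title>I left my heart on Europa</title>
        <artist>Ship of Nomads</artist>
    </song>
    <song dateplayed="2011-07-24 19:27:42">
        <title>Oh Ganymede</title>
        <artist>Beefachanga</artist>
    </song>
</songs>
XML;

$mysongs = simplexml_load_string($xml);

// Each element is a SimpleXMLElement; cast it to get a plain string.
$firstArtist = (string) $mysongs->song[0]->artist;

// count() reports how many <song> elements sit under the root.
$total = count($mysongs->song);

// isset() tells us whether a given index exists before we use it.
$hasSixth = isset($mysongs->song[5]);

echo $firstArtist, " (1 of ", $total, ")\n";
```

The cast matters when you compare values or use them as array keys, since echo converts the object for you but most other operations do not.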

Working with Attributes

In order to get to our dates, we’ll need to know how to access attributes. The notation is slightly different from the one for tags, but just as easy. As a matter of fact, it works just like accessing an array element. Let’s see how you would take a look at the second song’s date.

<?php
    $mysongs = simplexml_load_file('songs.xml');
    echo $mysongs->song[1]['dateplayed'];
?>

Showing the second song's date played
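
Once you have the attribute, you will often want to reformat it. The following is a small sketch rather than part of the original widget: it assumes the dateplayed values carry no timezone and treats them as UTC, and it also shows attributes() for looping over every attribute of an element.

```php
<?php
$xml = '<songs><song dateplayed="2011-07-24 19:27:42"><title>Oh Ganymede</title></song></songs>';
$mysongs = simplexml_load_string($xml);
$song = $mysongs->song[0];

// Array-style access returns an object, so cast it before further use.
$raw = (string) $song['dateplayed'];

// Assumption: the timestamp has no timezone, so we treat it as UTC.
$played = DateTime::createFromFormat('Y-m-d H:i:s', $raw, new DateTimeZone('UTC'));
$pretty = $played->format('g:i a \o\n F j, Y');

// attributes() lets you loop over every attribute on the element.
$names = array();
foreach ($song->attributes() as $name => $value) {
    $names[] = $name;
}

echo $pretty, "\n";
```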

Making a list of songs

So now that we’ve got the basics of accessing elements, let’s write the code to make a complete list of our songs parsed by interpreting our XML file.

<?php
    // Read the whole feed into an object.
    $mysongs = simplexml_load_file('songs.xml');
    echo "<ul id=\"songlist\">\n";
    // Looping over the root element visits each <song> in turn.
    foreach ($mysongs as $songinfo):
        $title=$songinfo->title;
        $artist=$songinfo->artist;
        $date=$songinfo['dateplayed'];
        echo "<li><div class=\"title\">",$title,"</div><div class=\"artist\">by ",$artist,"</div><time>",$date,"</time></li>\n";
    endforeach;
    echo "</ul>";
?>

We’re using a foreach statement here to go through each song and turn its information into an item in a simple HTML list. You can use that in a regular HTML document or use it as a widget to display a list of songs.

Generating a list of songs with simpleXML
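
If the feed comes from a source you don’t fully control, it may be worth escaping each value before it lands in your HTML. Here is one possible variation, with a hypothetical render_song_list() helper that builds the list as a string; the sample song title is made up to show the escaping.

```php
<?php
// Hypothetical helper: builds the list as a string and escapes every value.
function render_song_list(SimpleXMLElement $songs)
{
    $html = "<ul id=\"songlist\">\n";
    foreach ($songs->song as $song) {
        $title  = htmlspecialchars((string) $song->title, ENT_QUOTES, 'UTF-8');
        $artist = htmlspecialchars((string) $song->artist, ENT_QUOTES, 'UTF-8');
        $date   = htmlspecialchars((string) $song['dateplayed'], ENT_QUOTES, 'UTF-8');
        $html .= "<li><div class=\"title\">$title</div>"
               . "<div class=\"artist\">by $artist</div>"
               . "<time>$date</time></li>\n";
    }
    return $html . "</ul>";
}

// Made-up song whose title contains markup once SimpleXML decodes it.
$feed = simplexml_load_string(
    '<songs><song dateplayed="2011-07-24 19:23:50">'
    . '<title>Rock &amp; Roll &lt;b&gt;</title><artist>Jewitt K. Sheppard</artist>'
    . '</song></songs>'
);

$list = render_song_list($feed);
echo $list, "\n";
```

Returning a string instead of echoing directly also makes the list easier to cache or drop into a template.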

Parsing a Flickr feed from a set

There are lots of XML feeds available for you to parse online. We can, for example, get a feed from a Flickr set for inclusion on a website. That way, when you update your Flickr set, the widget will automatically display this on your site. I’ve prepared a special set with some pictures of kittens. In order to get the XML for this feed, we can go to our page on Flickr and look for the XML icon at the bottom left of the screen.

My kittens set on Flickr

We want to examine the feed to determine its structure. You can right click on that feed’s link on the Flickr page and save it to your hard drive, then rename the file photoset.xml and open it up with your browser.

Reading the XML document

I’ve opened the XML file with Safari, which lets me take a look at the structure. (Safari will normally try to push an RSS link to its built-in reader, so saving the file, giving it an XML extension and opening it lets us view the structure.) I can see that each photo is stored as an <entry> tag in our XML. Inside this tag, there are two <link> tags. The first one has a link to our image on Flickr. The second one has a link to a medium sized version of our image.

I’m going to modify the code slightly and make an adjustment on the second link to get a small thumbnail of all of our images. Before I do that, I’ll go back to Flickr and right click on the link to the XML file and this time copy it to the clipboard, then adjust the code:

<?php
    // Load the feed for the photo set straight from Flickr.
    $mypix = simplexml_load_file('http://api.flickr.com/services/feeds/photoset.gne?set=72157627229375826&nsid=73845487@N00&lang=en-us');
    foreach ($mypix->entry as $pixinfo):
        $title=$pixinfo->title;
        // With no index, link refers to the first <link> tag.
        $link=$pixinfo->link['href'];
        // Swap the size suffix on the second link to get a small thumbnail.
        $image=str_replace("_b.jpg","_s.jpg",$pixinfo->link[1]['href']);
        echo "<a href=\"",$link,"\"><img src=\"",$image,"\" alt=\"",$title,"\" /></a>\n";
    endforeach;
?>

We’re going to use the <title> tag as the alt text of our <img> tag. We’re also going to use the first of the <link> tags as the href attribute of our anchor tag. Getting to our second link is a bit trickier. We can use array notation to get to the second link’s href attribute ($pixinfo->link[1]['href']). However, that gives us a much larger picture than we need, so I’m using a simple str_replace call to change the end of the link URL from _b.jpg to _s.jpg. That gives us thumbnails, and the result looks something like this.

The finished feed of kittens from Flickr
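
Relying on the position of the <link> tags works until the feed changes their order. A slightly more defensive sketch picks each link by an attribute instead. The feed below is a simplified, made-up stand-in, and the rel values and example.com URLs are assumptions, so check them against the structure of your own feed.

```php
<?php
// Simplified, made-up feed; the rel values and example.com URLs are assumptions.
$xml = <<<XML
<feed>
    <entry>
        <title>Kitten one</title>
        <link rel="alternate" href="http://example.com/photos/1/"/>
        <link rel="enclosure" href="http://example.com/img/1_b.jpg"/>
    </entry>
</feed>
XML;

$mypix = simplexml_load_string($xml);
$items = array();
foreach ($mypix->entry as $pixinfo) {
    $page = '';
    $image = '';
    // Pick each link by its rel attribute instead of its position.
    foreach ($pixinfo->link as $link) {
        $rel = (string) $link['rel'];
        if ($rel === 'enclosure') {
            $image = str_replace('_b.jpg', '_s.jpg', (string) $link['href']);
        } elseif ($rel === 'alternate') {
            $page = (string) $link['href'];
        }
    }
    $items[] = array(
        'title' => (string) $pixinfo->title,
        'page'  => $page,
        'image' => $image,
    );
}

print_r($items);
```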

Conclusion

SimpleXML is super easy and even fun to use to parse complicated feeds like the example from Flickr. Notice that it had no problems getting to the second instance of the <link> tag inside the <entry> tag. This would have been really difficult to do with previous versions of PHP. There’s a lot more to SimpleXML, which you can read about in the PHP manual. If you’re interested in an object-oriented approach, PHP5 also provides the SimpleXMLElement class.
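
For a taste of the object-oriented route, here is a minimal sketch of creating a SimpleXMLElement directly from a string. Unlike the simplexml_load_* functions, the constructor throws an exception when the XML cannot be parsed; the sample strings are made up.

```php
<?php
// Build the object straight from a string of XML.
$xml = '<songs><song><title>Kallichore</title></song></songs>';
$songs = new SimpleXMLElement($xml);
$title = (string) $songs->song[0]->title;

// Keep libxml warnings quiet; we rely on the exception instead.
libxml_use_internal_errors(true);

$failed = false;
try {
    // Deliberately broken sample: <song> is never closed.
    new SimpleXMLElement('<songs><song></songs>');
} catch (Exception $e) {
    $failed = true;
}
libxml_clear_errors();

echo $title, "\n";
```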

If you’d like to learn more PHP, feel free to head over to the Think Vitamin Membership PHP course.

Comments

45 comments on “How to parse XML with PHP5”

  1. While SimpleXML works for a lot of use cases you should keep in mind that it keeps a complete document object model (DOM) in memory which can easily kill your application when handling big XML data.

    An alternative that is harder to write but lighter and faster though not always applicable is XMLReader.

    Read about when to use it here:
    http://test.ical.ly/2010/06/29/working-with-php-and-xml-consider-xmlreader-and-xmlwriter/

    And about how to use it here:
    http://test.ical.ly/2011/03/08/simply-iterate-over-xml-with-plain-php-using-little-memory-and-cpu/

    • Hi Caefer,

      Those are some good articles and points on XMLReader. I really like SimpleXML’s ease of use, but you’re definitely right about the memory/performance. If you have an XML file that is extremely large, then you might want to look at an alternative way of parsing your XML. Sometimes, ease of programming is a good thing though.

      • Just to clarify: I’m not arguing against SimpleXML. It’s a working solution and I use it often. It just isn’t a solution for everything.
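
To make the trade-off in this thread concrete, here is a rough sketch of streaming with XMLReader: it walks the document node by node and only hands one <song> at a time to SimpleXML. The inline sample stands in for a large file; for a real file or URL you would call open() with the path instead of XML().

```php
<?php
// Small inline sample standing in for a very large feed.
$xml = '<songs>'
     . '<song dateplayed="2011-07-24 19:40:26"><title>I left my heart on Europa</title></song>'
     . '<song dateplayed="2011-07-24 19:27:42"><title>Oh Ganymede</title></song>'
     . '</songs>';

$reader = new XMLReader();
$reader->XML($xml);

$titles = array();
// read() advances one node at a time, so the whole tree is never built.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'song') {
        // Only this one <song> is turned into a SimpleXML object.
        $song = new SimpleXMLElement($reader->readOuterXml());
        $titles[] = (string) $song->title;
    }
}
$reader->close();

print_r($titles);
```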

  3. Hey guys…thanks for your comments. My jaw dropped the first time I read about this method since it’s something I have to deal with often. 

  5. Ray, thank you so much! I remember trying to use SimpleXML a few years back (it looks as though it has been updated since then as it involved a lot more than just loading the XML file in one line).

    When I discovered it, I got really excited and then really dejected when it seemed to be almost impossible to use. I then pretty much cast it aside and forgot about it.

    Now I’m really looking forward to using it again.

  7. Hi Ray,

    I’m very interested in how to generate XML from a PHP data structure, i.e. an array.

    Would you recommend any existing PHP library for building XML?

    Here the data structure is not flat. I mean, the array could be 1-d or n-dimensional.

    Thanks
    Sachin

    • If rolling my own generator proved too cumbersome, I would probably go with the XMLWriter functions then. http://www.php.net/manual/en/ref.xmlwriter.php. Caefer has linked some good articles on that subject in the comments above.
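
As a sketch of the XMLWriter route for nested arrays, a small recursive function can write each level as child elements. The write_array() helper, the sample data and the element names are all made up for illustration, and the sketch assumes every string key is already a valid XML element name.

```php
<?php
// Hypothetical helper: writes a nested array as XML elements.
function write_array(XMLWriter $writer, array $data, $itemName = 'item')
{
    foreach ($data as $key => $value) {
        // Numeric keys are not valid element names, so fall back to $itemName.
        $name = is_int($key) ? $itemName : $key;
        if (is_array($value)) {
            $writer->startElement($name);
            write_array($writer, $value, $itemName);
            $writer->endElement();
        } else {
            // writeElement() encodes special characters in the text for us.
            $writer->writeElement($name, (string) $value);
        }
    }
}

// Made-up data with two levels of nesting.
$data = array(
    'station' => 'Example FM',
    'songs'   => array(
        array('title' => 'Oh Ganymede', 'artist' => 'Beefachanga'),
        array('title' => 'Kallichore', 'artist' => 'Jewitt K. Sheppard'),
    ),
);

$writer = new XMLWriter();
$writer->openMemory();
$writer->startDocument('1.0', 'UTF-8');
$writer->startElement('playlist');
write_array($writer, $data, 'song');
$writer->endElement();
$writer->endDocument();
$output = $writer->outputMemory();

echo $output;
```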

  8. Nice article.  These two tips should help out people as well:

    1)  If you come across a XML feed with many nested elements xpath can come in handy because it goes straight to the part of the structure that you care about:

    foreach ($xml->xpath('//player') as $item) {
        echo $item->firstname;
    }
    http://php.net/manual/en/simplexmlelement.xpath.php

    2) SimpleXML does not use gzip compression when fetching.  Don’t waste your own bandwidth and the owner’s.  So use cURL with gzip compression if the file is not local, then pass the result to SimpleXML.

    • Thanks Brant. XPath is a great way to search through a very long XML document, so the approach in the article is not always the best way to do things. Still, it’s one of the easiest, especially for very short feeds.
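
A self-contained version of the xpath() tip might look like the sketch below. The nested roster document is invented for illustration; the second query adds a predicate to pick out a single player.

```php
<?php
// Made-up document with <player> elements buried a few levels deep.
$xml = simplexml_load_string(
    '<league><team><roster>'
    . '<player><firstname>Ada</firstname></player>'
    . '<player><firstname>Grace</firstname></player>'
    . '</roster></team></league>'
);

// '//player' matches every <player> element, however deeply nested.
$names = array();
foreach ($xml->xpath('//player') as $item) {
    $names[] = (string) $item->firstname;
}

// A predicate narrows the match to players with a given first name.
$match = $xml->xpath('//player[firstname="Grace"]');

print_r($names);
```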

  10. Hi, can anybody help me? What if I want to get the content of the tag . What must I do to get this info? Thank you.

  12. Problem is when the file gets big, simple xml uses a lot of memory because it stores the whole file there. It is very inefficient, why would I want to store the whole file if I want one bit?

  13. Awesome! Thanks for putting this together. Really helped out with a project I’m currently working at!