XML Parsing using PHP {Easy}
         by Jubba

Introduction
If you are looking at this tutorial then you probably have a good idea in your head about what XML is and what you can do with it. However, for those of you who stumbled across this tutorial by accident and are interested in learning more, I will briefly explain what XML is and what it is used for. XML stands for eXtensible Markup Language and is used primarily for data storage and organization. It is useful for many things but the main thing I like about it is that there are no predefined tags; the author of the document makes up the tags as he goes along. A sample XML document could look like this:

<?xml version="1.0"?>
<friends>
<joe/>
<karen/>
<bob/>
</friends>

As you can see the code is easy to read and understand. We can clearly see every tag and easily read them in plain English, or whatever language you are most comfortable with. Unfortunately this tutorial isn't about XML so if you wish to learn more about it then there are some good resources to be found all over the internet. Here are a couple of them:

-XML.com
-w3.org/XML
-XML FAQ

 
Formatting XML

Ok, so you're probably asking, "Now what?" We have our XML code but when you view the XML page in the browser all you see is the XML code! That won't do! We need to find a way to format it. There are actually many ways to do this. I prefer to use PHP to get the job done. And that is what this tutorial is all about: using PHP to parse and format an XML document. If you don't know anything about PHP then you might not want to start here. A good place to start is php.net. They have a great beginner tutorial for dealing with the basics of PHP.

Now, before I continue I want to let everyone know that the following methods are simply the ways in which I prefer to go about getting the job done. They are in no way the only complete and comprehensive ways to do anything, ever, for any reason. If you think you have a better way then by all means, do it your way and if you like, please let me know about your way. With that said...

Creating our XML
As with any project, organization is the key. I'm not very good at being organized so most of my projects take forever to complete. Kind of like this tutorial which I have been trying to write for weeks now and I keep putting it off because of one thing or another. My apartment is a disaster and I should be cleaning that right now, but instead I am typing this up. Oh, well... Organization... yeah that. Well, since this is the easy level tutorial about XML parsing the actual XML I am going to use is super-duper simple:

<?xml version="1.0"?>
<numbers>
<num>1</num>
<num>2</num>
<num>3</num>
<num>4</num>
<num>5</num>
<num>6</num>
<num>7</num>
<num>8</num>
<num>9</num>
<num>10</num>
</numbers>

A simple list of numbers 1-10 with opening and closing tags for each number and then the higher level group tag "numbers" that contains all of the information in the XML document. That's really all for the XML. The majority of this tutorial is about PHP and introducing you to the functions that are necessary for formatting the XML.

Creating our PHP
To make this easy on us, I will post the code I used and then explain what each line does after.

<?php

$file = "xml_beginner.xml";

function contents($parser, $data){
    echo $data;
}

function startTag($parser, $data){
    echo "<b>";
}

function endTag($parser, $data){
    echo "</b><br />";
}

$xml_parser = xml_parser_create();

xml_set_element_handler($xml_parser, "startTag", "endTag");

xml_set_character_data_handler($xml_parser, "contents");

$fp = fopen($file, "r");

$data = fread($fp, 80000);

if(!(xml_parse($xml_parser, $data, feof($fp)))){
    die("Error on line " . xml_get_current_line_number($xml_parser));
}

xml_parser_free($xml_parser);

fclose($fp);

?>

Ok, the basic set up is easy. You need 3 major things:

-A function to handle the start tags
-A function to handle the data between the tags
-A function to handle the end tags

That's what we have here:

function startTag($parser, $data){
    echo "<b>";
}

function contents($parser, $data){
    echo $data;
}

function endTag($parser, $data){
    echo "</b><br />";
}


When the script comes across a start tag it will replace it with a <b>. When the script reads data between the tags it will simply output the data; and when an end tag is read it outputs </b><br /> to close the bold tag and go to the next line.
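
If you want the handlers to do a little more, it helps to know what PHP actually passes to them. The following is only a rough sketch: the function names and the $out variable are made up for illustration and are not part of the script above. It shows that the start handler also receives the tag name (upper-cased by default) and an array of attributes, and that the data handler is called for the whitespace between tags as well as for the numbers.

```php
<?php
// Illustration only: collect the output in $out instead of echoing it.
$out = "";

// The start handler receives the parser, the tag name, and the attributes.
function startTagNamed($parser, $name, $attribs) {
    global $out;
    $out .= "<b title=\"" . $name . "\">";
}

// The end handler receives the parser and the tag name.
function endTagNamed($parser, $name) {
    global $out;
    $out .= "</b><br />";
}

// Line breaks between tags arrive here too, so skip whitespace-only data.
function contentsTrimmed($parser, $data) {
    global $out;
    if (trim($data) !== "") {
        $out .= $data;
    }
}

$parser = xml_parser_create();
xml_set_element_handler($parser, "startTagNamed", "endTagNamed");
xml_set_character_data_handler($parser, "contentsTrimmed");
xml_parse($parser, "<numbers>\n<num>1</num>\n</numbers>", true);
// On PHP 8 the parser is released automatically; on older versions
// call xml_parser_free($parser) here.

echo $out;
```

Run on its own, this should print <b title="NUMBERS"><b title="NUM">1</b><br /></b><br />. The tag names come out in capitals because the parser folds them to upper case unless told otherwise.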

Now for the XML functions, we need to:

-Create the parser
-Set the start and end tag handlers
-Set the data handler
-Open the XML file
-Read the XML file
-Parse the XML data
-Destroy the parser
-Close the XML file

Creating the parser is easy:

$xml_parser = xml_parser_create();

Setting the start tag, end tag, and data handlers is pretty easy as well:

xml_set_element_handler($xml_parser, "startTag", "endTag");

xml_set_character_data_handler($xml_parser, "contents");


The first argument for both of these functions is always the parser we created in the previous step. The remaining arguments are the names of the handler functions we created a little earlier, passed as strings. Next up is opening and reading the XML file:

$fp = fopen($file, "r");

$data = fread($fp, 80000);


These are basic file handling functions that you should be familiar with by now. If you need to learn more or just refresh your memory you can check out the great tutorials on php.net.

The next bit of code is the most complex and the last complex thing about this script. Basically this code does two things: 1) it parses through the XML data from the XML file, and 2) if the parse fails it outputs an error message complete with line number.

if(!(xml_parse($xml_parser, $data, feof($fp)))){
    die("Error on line " . xml_get_current_line_number($xml_parser));
}


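Again the first argument of the function is our parser. The second argument is the data to be parsed, in this case the variable $data. The third argument tells the parser whether this is the last piece of data it is going to get; feof($fp) returns true once the file has been read to its end, which lets the parser finish up and complain about anything left unclosed. Keep in mind that fread() reads at most 80000 bytes here, so a larger file would only be partly parsed.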
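
For files that might not fit in a single read, one option is to feed the parser in chunks. This is a sketch under my own assumptions, not the script from this tutorial: the temporary file, the tiny 16-byte chunk size, and the countStart/countEnd handlers exist only so the example can run by itself. It also checks that fopen() worked and adds the parser's own error message to the line number.

```php
<?php
// Illustration only: write a small XML file to a temporary location.
$file = tempnam(sys_get_temp_dir(), "xml");
file_put_contents($file, "<numbers><num>1</num><num>2</num></numbers>");

// Hypothetical handlers that just count the start tags.
$count = 0;
function countStart($parser, $name, $attribs) {
    global $count;
    $count++;
}
function countEnd($parser, $name) {
}

$parser = xml_parser_create();
xml_set_element_handler($parser, "countStart", "countEnd");

$fp = fopen($file, "r");
if (!$fp) {
    die("Could not open " . $file);
}

// Read a chunk at a time; the third argument is true only for the last chunk.
while (!feof($fp)) {
    $data = fread($fp, 16);
    if (!xml_parse($parser, $data, feof($fp))) {
        die("Error on line " . xml_get_current_line_number($parser)
            . ": " . xml_error_string(xml_get_error_code($parser)));
    }
}

fclose($fp);
unlink($file);

echo $count;
```

In a real script a chunk size of a few kilobytes makes more sense; 16 bytes is used here only to force several passes through the loop.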

The next two lines just wrap up the script. The first one frees the memory used by the parser and the second closes the XML file. PHP cleans both up on its own when the script ends (and from PHP 8.0 on, xml_parser_free() no longer does anything), but releasing what you open is a good habit, especially in longer scripts, so do not forget to include them.

xml_parser_free($xml_parser);

fclose($fp);
 

Conclusion
Well again, you are probably asking, "Now what?" Well the only thing left to do is save your PHP file and run the script. Make sure that the PHP file and the XML file are in the same directory and that you have permission to read the XML file on the server. The output for this file isn't anything exciting, but if you did it correctly you should get this as your source:

<b>
    <b>1</b><br />
    <b>2</b><br />
    <b>3</b><br />
    <b>4</b><br />
    <b>5</b><br />
    <b>6</b><br />
    <b>7</b><br />
    <b>8</b><br />
    <b>9</b><br />
    <b>10</b><br />
</b><br />
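
Echoing straight to the page is only one thing handlers can do. If you would rather collect the values first and format them later, the handlers can also be methods of an object. This is a hedged sketch: the NumberCollector class is hypothetical and not part of my source files, and it assumes the default behaviour where tag names arrive in upper case.

```php
<?php
// Hypothetical helper class that gathers the numbers into an array.
class NumberCollector {
    public $numbers = array();
    private $buffer = "";

    public function startTag($parser, $name, $attribs) {
        $this->buffer = "";
    }

    // Data may arrive in several pieces, so append rather than assign.
    public function contents($parser, $data) {
        $this->buffer .= $data;
    }

    public function endTag($parser, $name) {
        if ($name === "NUM") {
            $this->numbers[] = (int) trim($this->buffer);
        }
        $this->buffer = "";
    }
}

$collector = new NumberCollector();
$parser = xml_parser_create();
// A handler can be given as array(object, "methodName") instead of a function name.
xml_set_element_handler($parser, array($collector, "startTag"), array($collector, "endTag"));
xml_set_character_data_handler($parser, array($collector, "contents"));
xml_parse($parser, "<numbers>\n<num>1</num>\n<num>2</num>\n<num>3</num>\n</numbers>", true);

print_r($collector->numbers);
```

After the parse, $collector->numbers holds 1, 2 and 3 as integers, ready to be looped over and formatted however you like.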


That is pretty much it. There are a couple things to remember when working with XML.

1. Always free the parser memory
2. Always close the file
3. Always escape illegal XML characters
a. <
b. >
c. &
d. '
e. "
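
If you are generating the XML from PHP, one way to take care of point 3 is htmlspecialchars() with the ENT_QUOTES and ENT_XML1 flags, which replaces all five characters. A minimal sketch, assuming UTF-8 text; the sample string and the <title> tag are made up for illustration:

```php
<?php
// Illustration only: a string containing all five special characters.
$raw = "Tom & Jerry's <\"best\"> episodes";

// ENT_QUOTES covers both quote styles; ENT_XML1 makes ' come out as &apos;.
$safe = htmlspecialchars($raw, ENT_QUOTES | ENT_XML1, "UTF-8");

echo "<title>" . $safe . "</title>";
```

This should print <title>Tom &amp; Jerry&apos;s &lt;&quot;best&quot;&gt; episodes</title>, which the parser can read back without choking.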

You can download my source files for this tutorial to look at the commented code here, and if you have any questions the best place to ask would be on the forums in the Server-side Scripting Forum.

Jubba

 



