XML Parsing using PHP {Intermediate}
by Jubba
Introduction
This tutorial is a continuation of the previous XML tutorial
I have written. Because that tutorial already covers some
(very brief) background information, I will not repeat it
here, to save space and time. The other tutorial can be found
at this link.
Formatting XML
Since I already went over the basics of formatting XML
data and of PHP/XML parsing, I'm just going to jump right
into the XML and PHP without much explanation. For this
project I decided to create a mock news-headline parser.
Basically, we have an XML file that holds news headlines and
a brief description of each story. Many of the news tickers
that you see on websites use a similar process (often called
RSS). Now, on to our XML file.
Creating our XML
Just as with the last tutorial this XML file is quite
simple. We have our highest level "news" tags which encase
everything. The next level down is our "story" tags which
split up each different news headline that we have and
contained within that are the "headline" tag and the
"description" tag. See? Simple...
<?xml version="1.0"?>
<news>
    <story>
        <headline> Godzilla Attacks LA! </headline>
        <description>Equipped with a Japanese Mind-control device, the giant monster has attacked important harbours along the California coast. President to take action. </description>
    </story>
    <story>
        <headline> Bigfoot Spotted at M.I.T. Dining Area </headline>
        <description>The beast was seen ordering a Snapple in the dining area on Tuesday. In a related story, Kirupa Chinnathambi, an MIT engineering student has been reported missing. </description>
    </story>
    <story>
        <headline> London Angel Saves England </headline>
        <description>The "London Angel" known only as "Kit" has saved the U.K. yet again. Reports have stated that she destroyed every single Churchill bobble-head dog in the country. A great heartfilled thank you goes out to her. </description>
    </story>
    <story>
        <headline> Six-eyed Man to be Wed to an Eight-armed Woman </headline>
        <description>Uhhhmmm... No comment really... just a little creepy to see them together... </description>
    </story>
    <story>
        <headline> Ahmed's Birthday Extravaganza! </headline>
        <description>The gifted youngster's birthday party should be a blast. He is turning thirteen and has requested a large cake, ice cream, and a petting zoo complete with pony rides. </description>
    </story>
</news>
Creating our PHP
To make this easy on us, I will post the code I used and
then explain what each line does after.
<?php
$xml_file = "xml_intermediate.xml";
$xml_headline_key = "*NEWS*STORY*HEADLINE";
$xml_description_key = "*NEWS*STORY*DESCRIPTION";
$story_array = array();
$counter = 0;

class xml_story{
    var $headline, $description;
}

function startTag($parser, $data){
    global $current_tag;
    $current_tag .= "*$data";
}

function endTag($parser, $data){
    global $current_tag;
    $tag_key = strrpos($current_tag, '*');
    $current_tag = substr($current_tag, 0, $tag_key);
}

function contents($parser, $data){
    global $current_tag, $xml_headline_key, $xml_description_key, $counter, $story_array;
    switch($current_tag){
        case $xml_headline_key:
            $story_array[$counter] = new xml_story();
            $story_array[$counter]->headline = $data;
            break;
        case $xml_description_key:
            $story_array[$counter]->description = $data;
            $counter++;
            break;
    }
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startTag", "endTag");
xml_set_character_data_handler($xml_parser, "contents");

$fp = fopen($xml_file, "r") or die("Could not open file");
$data = fread($fp, filesize($xml_file)) or die("Could not read file");

if(!(xml_parse($xml_parser, $data, feof($fp)))){
    die("Error on line " . xml_get_current_line_number($xml_parser));
}

xml_parser_free($xml_parser);
fclose($fp);
?>
<html>
<head>
<title>CNT HEADLINE NEWS</title>
</head>
<body bgcolor="#FFFFFF">
<?php
for($x=0;$x<count($story_array);$x++){
    echo "\t<h2>" . $story_array[$x]->headline . "</h2>\n";
    echo "\t\t<br />\n";
    echo "\t<i>" . $story_array[$x]->description . "</i>\n";
}
?>
</body>
</html>
This project needs a bit more setup than
before. In addition to our 3 main functions:
- A function to handle the start tags
- A function to handle the data between the tags
- A function to handle the end tags
We need a few more things:
- Our XML tag keys
- An array to store information
- A counter
- A class
That's what we have in the following explanations:
$xml_headline_key = "*NEWS*STORY*HEADLINE";
$xml_description_key = "*NEWS*STORY*DESCRIPTION";
These are our tag keys. Each key is the full path of nested
tags that leads to an element in our XML file, with an
asterisk in front of every tag name. The names are in upper
case because PHP's XML parser folds tag names to upper case
by default. Because the "news" and "story" tags don't hold
any text of their own, we don't need keys for them. Our main
focus is on the "headline" and the "description" tags. We
will use these keys later on in the script.
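To see where the upper-case names in the keys come from, here is a small sketch that builds the same kind of path with case folding switched on (the default) and off. The collect_names() helper, the $names array and the tiny XML string are made up for this illustration and are not part of the tutorial's script.

```php
<?php
// Sketch only: collect_names() and $names are hypothetical helpers.
$names = array();

function startTag($parser, $name) {
    global $names;
    $names[] = $name;               // remember every tag name the parser reports
}

function endTag($parser, $name) {
    // nothing to do for this illustration
}

function collect_names($fold) {
    global $names;
    $names = array();
    $parser = xml_parser_create();
    // 1 = fold tag names to upper case (the default), 0 = leave them alone
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $fold);
    xml_set_element_handler($parser, "startTag", "endTag");
    xml_parse($parser, "<news><story/></news>", true);
    xml_parser_free($parser);
    return "*" . implode("*", $names);
}

$folded = collect_names(1);
$unfolded = collect_names(0);

echo $folded, "\n";     // *NEWS*STORY
echo $unfolded, "\n";   // *news*story
```

If you turn folding off in your own script, the tag keys have to be written in the same case as the tags in the XML file.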
$story_array = array();
$counter = 0;
Here we are simply initializing our array and
our counter for later use.
Now we come to our 3 major functions for parsing and
formatting our data. Same as before, they are the "startTag",
"endTag", and "contents" functions. Ultimately they do the
same things as before: they perform their designated actions
when they are called by the parser. The only change in
this file is that the actions are a bit more complex. In
this tutorial we'll go through each function fully before
moving on to the next. We'll start with "startTag":
function startTag($parser, $data){
    global $current_tag;
    $current_tag .= "*$data";
}
When the script hits a start tag it will add
the tag that it is currently reading to the string $current_tag.
There is a "global" in front of
our variable because we will be using the variable in all
three of our functions and in order to use it the way we
want, we need to declare it as a global variable instead of
a local variable.
function endTag($parser, $data){
    global $current_tag;
    $tag_key = strrpos($current_tag, '*');
    $current_tag = substr($current_tag, 0, $tag_key);
}
Again we declare $current_tag as global
so we can use it just as in "startTag". The variable $tag_key
marks the position of the last occurrence of an asterisk (*)
in the string $current_tag. Then substr() cuts $current_tag
back to that position, which removes the last XML tag from
the string. The purpose of this function is to take a step
back, just as the purpose of "startTag" is to take a step
forward.
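The step forward and step back are easier to follow with a trace. The sketch below reuses the two functions from above and records $current_tag after every start and end tag. The $path_log array and the one-story XML string are assumptions added only for this illustration.

```php
<?php
// Sketch only: $path_log and the sample XML string are made up for illustration.
$current_tag = "";
$path_log = array();

function startTag($parser, $data) {
    global $current_tag, $path_log;
    $current_tag .= "*$data";                          // step forward
    $path_log[] = "open  " . $current_tag;
}

function endTag($parser, $data) {
    global $current_tag, $path_log;
    $tag_key = strrpos($current_tag, '*');             // last asterisk
    $current_tag = substr($current_tag, 0, $tag_key);  // step back
    $path_log[] = "close " . $current_tag;
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startTag", "endTag");
xml_parse($xml_parser, "<news><story><headline>Hi</headline></story></news>", true);
xml_parser_free($xml_parser);

// Prints one line per tag, for example "open  *NEWS*STORY*HEADLINE"
echo implode("\n", $path_log), "\n";
```

After the last end tag the string is empty again, which shows that every step forward was matched by a step back.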
function contents($parser, $data){
    global $current_tag, $xml_headline_key, $xml_description_key, $counter, $story_array;
    switch($current_tag){
        case $xml_headline_key:
            $story_array[$counter] = new xml_story();
            $story_array[$counter]->headline = $data;
            break;
        case $xml_description_key:
            $story_array[$counter]->description = $data;
            $counter++;
            break;
    }
}
The first line of this function declares all
of our global variables: $current_tag, $xml_headline_key,
$xml_description_key, and so on. The next line begins our
switch statement. switch() is basically another way of
writing a series of if() statements. For more on switch() in
PHP visit php.net. This switch statement compares the
variable $current_tag to the variables $xml_headline_key and
$xml_description_key, and if it finds a match it performs
the scripted actions.
If $current_tag matches $xml_headline_key
the script defines $story_array[$counter]
as a new xml_story() object. Then we assign our data to our
new object's "headline" property with this line:
$story_array[$counter]->headline = $data;
Then it breaks out of the switch statement. If $current_tag
matches $xml_description_key it assigns the data to
the object's "description" property and adds 1 to our
$counter variable, then breaks out of the switch
statement.
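One thing to keep in mind is that the parser is allowed to hand the text of a single element to the data handler in several pieces, for example when the text contains a line break or an entity such as &amp;. The function above assumes one call per element. A possible variation, sketched below, collects the pieces in a buffer and only stores the story when the end tag arrives. The $buffer and $stories names, the use of plain arrays instead of the xml_story class, and the sample XML are my assumptions for this sketch, not the tutorial's method.

```php
<?php
// Sketch only: the buffering approach and the $buffer / $stories names are
// assumptions for illustration, not part of the original script.
$current_tag = "";
$buffer = "";
$stories = array();

function startTag($parser, $name) {
    global $current_tag, $buffer;
    $current_tag .= "*$name";
    $buffer = "";                    // start collecting text for this element
}

function contents($parser, $data) {
    global $buffer;
    $buffer .= $data;                // may run several times for one element
}

function endTag($parser, $name) {
    global $current_tag, $buffer, $stories;
    switch ($current_tag) {
        case "*NEWS*STORY*HEADLINE":
            $stories[] = array("headline" => trim($buffer), "description" => "");
            break;
        case "*NEWS*STORY*DESCRIPTION":
            $stories[count($stories) - 1]["description"] = trim($buffer);
            break;
    }
    $buffer = "";
    $current_tag = substr($current_tag, 0, strrpos($current_tag, '*'));
}

// The headline contains an entity and a line break on purpose.
$xml = "<news><story><headline> Fish &amp; Chips\nDay </headline>"
     . "<description>Served hot.</description></story></news>";

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startTag", "endTag");
xml_set_character_data_handler($xml_parser, "contents");
xml_parse($xml_parser, $xml, true) or die("Parse error");
xml_parser_free($xml_parser);

echo $stories[0]["headline"], "\n";
```

The trade-off is a little more bookkeeping in "startTag" and "endTag" in exchange for not depending on how the parser splits the text.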
Just to clarify, I prefer to use objects
because they make it a little easier to keep track of my
data. This is just what works for me. Other people may have
other methods that work better for them. It's all about your
own personal preferences. Next up are the XML functions,
which are exactly the same as in the
{Easy} tutorial.
XML functions
For the XML functions, we need to:
- Create the parser
- Set the start and end tag handlers
- Set the data handler
- Open the XML file
- Read the XML file
- Parse the XML data
- Destroy the parser
- Close the XML file
Creating the parser is easy:
$xml_parser = xml_parser_create();
Setting the start tag, end tag, and data handlers is pretty
easy as well:
xml_set_element_handler($xml_parser, "startTag", "endTag");
xml_set_character_data_handler($xml_parser, "contents");
The first argument for both of these functions is always the
parser we created in the previous step. The next arguments
are the names of the functions we created a little earlier.
Next up is opening and reading the XML file:
$fp = fopen($xml_file, "r") or die("Could not open file");
$data = fread($fp, filesize($xml_file)) or die("Could not read file");
These are basic file handling functions that you should be
familiar with by now. If you need to learn more or just
refresh your memory you can check out the great tutorials on
php.net.
The following if statement does two
things: 1) it parses through the XML data from the XML file,
and 2) if the parse fails it outputs an error message
complete with line number.
if(!(xml_parse($xml_parser, $data, feof($fp)))){
    die("Error on line " . xml_get_current_line_number($xml_parser));
}
Again the first argument of the function is our parser. The
second argument is the data to be parsed, in this case the
variable $data. The third argument tells the parser whether
this is the final piece of data; we pass feof($fp), which
reports whether the end of the file has been reached.
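The third argument matters most when the file is read in pieces rather than all at once. The sketch below reads a file in chunks and passes true only with the last one, and it also prints the parser's own error text next to the line number. The temporary file, the 4096-byte chunk size and the $open_count counter are assumptions made for this illustration.

```php
<?php
// Sketch only: the temporary file, chunk size and $open_count are made up.
$xml_file = tempnam(sys_get_temp_dir(), "xml");
file_put_contents($xml_file, "<news><story><headline>Hi</headline></story></news>");

$open_count = 0;

function startTag($parser, $name) {
    global $open_count;
    $open_count++;                   // count start tags so we can see it worked
}

function endTag($parser, $name) {
    // nothing to do for this illustration
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startTag", "endTag");

$fp = fopen($xml_file, "r") or die("Could not open file");
while (!feof($fp)) {
    $data = fread($fp, 4096);
    // feof($fp) becomes true once the end of the file has been reached,
    // so the parser is told when it has received the final piece.
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s on line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}

xml_parser_free($xml_parser);
fclose($fp);
unlink($xml_file);

echo "Start tags seen: $open_count\n";
```

Reading in chunks keeps memory use flat for large files, and xml_error_string() turns the numeric error code into a readable message.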
The next two lines just wrap up the
script. The first one frees up the memory used by the server
to create the parser and the second closes the XML file.
Both of these lines are very
important so do not forget to include them in
your script. Failure to do so could result in problems with
your server.
xml_parser_free($xml_parser);
fclose($fp);
Wrapping it up
Well, that's all fine and good. Now we have our XML parsed
and the data is stored in our objects. All we have left to do
is format it. We can pretty much do whatever we want with it
now. I prefer to keep this simple for the example, so the
code is just a simple for() loop that outputs our
information as headers.
<html>
<head>
<title>CNT HEADLINE NEWS</title>
</head>
<body bgcolor="#FFFFFF">
<?php
// A simple for loop that outputs our final data.
for($x=0;$x<count($story_array);$x++){
    echo "\t<h2>" . $story_array[$x]->headline . "</h2>\n";
    echo "\t\t<br />\n";
    echo "\t<i>" . $story_array[$x]->description . "</i>\n";
}
?>
</body>
</html>
Most of this is simple HTML, which I'm sure you can figure
out on your own. The for loop is fairly easy as well,
outputting our data in the format that I have specified. The
"\t" and "\n" are escape sequences in PHP strings. They
don't show up on the rendered page; they merely add a tab or
a line break to the HTML source that the script prints. If
you aren't sure how to use a for loop, more information
can be found at
php.net.
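The parser hands back plain text, so a headline that contained &amp; in the XML file arrives as a bare & character. If such text might contain characters that are special in HTML, one option is to run it through htmlspecialchars() on the way out. The sketch below shows the same loop body with that change. The sample story and the $html variable are made up for this illustration.

```php
<?php
// Sketch only: the sample story and the $html variable are made up.
$story_array = array(
    (object) array(
        "headline"    => "Cats & Dogs <Unite>",
        "description" => "Fur \"everywhere\".",
    ),
);

$html = "";
foreach ($story_array as $story) {
    // htmlspecialchars() converts &, <, > and " into HTML entities
    $html .= "\t<h2>" . htmlspecialchars($story->headline) . "</h2>\n";
    $html .= "\t\t<br />\n";
    $html .= "\t<i>" . htmlspecialchars($story->description) . "</i>\n";
}

echo $html;
```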
Conclusion
That is pretty much it. There are many ways to accomplish
this and there are many uses for this project. Oh, yeah,
here is what our output looks like:
<html>
<head>
<title>CNT HEADLINE NEWS</title>
</head>
<body bgcolor="#FFFFFF">
<h2>Godzilla Attacks LA!</h2>
<br />
<i>Equipped with a Japanese Mind-control device, the
giant monster has attacked important harbours along the
California coast. President to take action. </i>
<h2>Bigfoot Spotted at M.I.T. Dining Area</h2>
<br />
<i>The beast was seen ordering a Snapple in the dining
area on Tuesday. In a related story, Kirupa Chinnathambi, an
MIT engineering student has been reported missing.</i>
<h2>London Angel Saves England</h2>
<br />
<i>The "London Angel" known only as "Kit" has saved the
U.K. yet again. Reports have stated that she destroyed every
single Churchill bobble-head dog in the country. A great
heartfilled thank you goes out to her.</i>
<h2>Six-eyed Man to be Wed to an Eight-armed Woman</h2>
<br />
<i>Uhhhmmm... No comment really... just a little creepy
to see them together...</i>
<h2>Ahmed's Birthday Extravaganza!</h2>
<br />
<i>The gifted youngster's birthday party should be a
blast. He is turning thirteen and has requested a large
cake, ice cream, and a petting zoo complete with pony
rides.</i>
</body>
</html>
There are a couple of things to remember when working
with XML.
1. Always free the parser memory
2. Always close the file
3. Always escape illegal XML characters
   a. <
   b. >
   c. &
   d. '
   e. "
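If you build the XML file from PHP, the five characters in point 3 can be escaped in one call. This is a minimal sketch, assuming UTF-8 text; xml_escape() is a hypothetical helper name, and it simply wraps PHP's built-in htmlspecialchars().

```php
<?php
// Sketch only: xml_escape() is a hypothetical helper name.
function xml_escape($text) {
    // ENT_QUOTES escapes both ' and ", ENT_XML1 selects XML entity names
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, "UTF-8");
}

$headline = "Tom & Jerry's \"Big\" <Adventure>";
$xml = "<headline>" . xml_escape($headline) . "</headline>";

echo $xml, "\n";
// <headline>Tom &amp; Jerry&apos;s &quot;Big&quot; &lt;Adventure&gt;</headline>
```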
You can download my source files for this tutorial to
look at the commented code
here, and if you have any questions the best place to
ask would be on the forums in the
Server-side Scripting Forum.
Jubba