CXXVII. XML Parser Functions
XML (eXtensible Markup Language) is a data format for structured
document interchange on the Web. It is a standard defined by the
World Wide Web Consortium (W3C). Information about XML and related
technologies is available from the W3C.
This PHP extension adds support for James Clark's expat to PHP.
This toolkit parses XML documents but does not validate them.
PHP supports three source character encodings: US-ASCII,
ISO-8859-1 and UTF-8. UTF-16 is not supported.
This extension lets you create XML parsers and define handlers
for different XML events. Each XML parser also has a small number of
parameters that you can adjust.
This extension uses expat.
The Makefile that comes with expat does not build a library by
default. You can use the following make rule for that:
libexpat.a: $(OBJS)
	ar -rc $@ $(OBJS)
	ranlib $@
A source RPM package of expat is also available.
These functions are enabled by default, using the bundled expat
library. You can disable XML support by specifying --disable-xml.
If you compile PHP as a module for Apache 1.3.9 or later, PHP
automatically uses the expat library bundled with Apache.
If you do not want to use the bundled expat library, run PHP's
configure with --with-expat-dir=DIR, where DIR is the base directory
in which expat is installed.
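The Windows version of PHP has built-in support for this extension.
You do not need to load any additional extension in order to use
these functions.
This extension defines no configuration directives.
This extension defines no resource types.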
The constants below are defined by this extension, and are only
available when the extension has either been compiled into PHP or
dynamically loaded at runtime.
- XML_ERROR_NONE
(integer)
- XML_ERROR_NO_MEMORY
(integer)
- XML_ERROR_SYNTAX
(integer)
- XML_ERROR_NO_ELEMENTS
(integer)
- XML_ERROR_INVALID_TOKEN
(integer)
- XML_ERROR_UNCLOSED_TOKEN
(integer)
- XML_ERROR_PARTIAL_CHAR
(integer)
- XML_ERROR_TAG_MISMATCH
(integer)
- XML_ERROR_DUPLICATE_ATTRIBUTE
(integer)
- XML_ERROR_JUNK_AFTER_DOC_ELEMENT
(integer)
- XML_ERROR_PARAM_ENTITY_REF
(integer)
- XML_ERROR_UNDEFINED_ENTITY
(integer)
- XML_ERROR_RECURSIVE_ENTITY_REF
(integer)
- XML_ERROR_ASYNC_ENTITY
(integer)
- XML_ERROR_BAD_CHAR_REF
(integer)
- XML_ERROR_BINARY_ENTITY_REF
(integer)
- XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF
(integer)
- XML_ERROR_MISPLACED_XML_PI
(integer)
- XML_ERROR_UNKNOWN_ENCODING
(integer)
- XML_ERROR_INCORRECT_ENCODING
(integer)
- XML_ERROR_UNCLOSED_CDATA_SECTION
(integer)
- XML_ERROR_EXTERNAL_ENTITY_HANDLING
(integer)
- XML_OPTION_CASE_FOLDING
(integer)
- XML_OPTION_TARGET_ENCODING
(integer)
- XML_OPTION_SKIP_TAGSTART
(integer)
- XML_OPTION_SKIP_WHITE
(integer)
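XML event handlers are registered per parser with functions such as
xml_set_element_handler() and xml_set_character_data_handler().
The element handler functions may receive their element names
case-folded. Case-folding is defined by the XML standard as "a
process applied to a sequence of characters, in which those
identified as non-uppercase are replaced by their uppercase
equivalents". In other words, when it comes to XML, case-folding
simply means uppercasing.
By default, all the element names that are passed to the handler
functions are case-folded. This behaviour can be queried and
controlled per XML parser with the xml_parser_get_option() and
xml_parser_set_option() functions, respectively.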
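As a minimal sketch of the behaviour described above, the following compares the names a start-element handler receives with case-folding switched on and off. The helper name collectNames() is hypothetical and not part of the extension, and the closures assume a PHP version that accepts them as handlers.

```php
<?php
// Hypothetical helper: returns the element names passed to the start handler.
function collectNames(string $xml, bool $caseFolding): array
{
    $names = [];
    $parser = xml_parser_create();
    // 1 enables case-folding (the default), 0 disables it.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding ? 1 : 0);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$names) {
            $names[] = $name;
        },
        function ($parser, $name) {
            // Nothing to do when an element ends.
        }
    );
    xml_parse($parser, $xml, true);
    // On PHP versions before 8.0, call xml_parser_free($parser) here.
    return $names;
}

$xml = '<doc><item/></doc>';
print_r(collectNames($xml, true));   // names arrive upper-cased
print_r(collectNames($xml, false));  // names arrive as written in the document
```

Handlers that compare element names against fixed strings need to match whichever setting is in effect.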
The following constants are defined for XML error codes (as
returned by xml_get_error_code() after xml_parse() reports a failure):
XML_ERROR_NONE, XML_ERROR_NO_MEMORY, XML_ERROR_SYNTAX,
XML_ERROR_NO_ELEMENTS, XML_ERROR_INVALID_TOKEN,
XML_ERROR_UNCLOSED_TOKEN, XML_ERROR_PARTIAL_CHAR,
XML_ERROR_TAG_MISMATCH, XML_ERROR_DUPLICATE_ATTRIBUTE,
XML_ERROR_JUNK_AFTER_DOC_ELEMENT, XML_ERROR_PARAM_ENTITY_REF,
XML_ERROR_UNDEFINED_ENTITY, XML_ERROR_RECURSIVE_ENTITY_REF,
XML_ERROR_ASYNC_ENTITY, XML_ERROR_BAD_CHAR_REF,
XML_ERROR_BINARY_ENTITY_REF, XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF,
XML_ERROR_MISPLACED_XML_PI, XML_ERROR_UNKNOWN_ENCODING,
XML_ERROR_INCORRECT_ENCODING, XML_ERROR_UNCLOSED_CDATA_SECTION,
XML_ERROR_EXTERNAL_ENTITY_HANDLING
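A short sketch of how an error code is typically retrieved and turned into a message. The helper name describeXmlError() is hypothetical. Which code a given malformed document produces can depend on the underlying parser library, so none is assumed here.

```php
<?php
// Hypothetical helper: returns null if $xml is well-formed, otherwise a message.
function describeXmlError(string $xml): ?string
{
    $parser = xml_parser_create();
    if (xml_parse($parser, $xml, true)) {
        return null;
    }
    // xml_parse() only signals failure; the code itself comes from the parser.
    $code = xml_get_error_code($parser);
    return sprintf(
        'XML error %d (%s) at line %d',
        $code,
        (string) xml_error_string($code),
        xml_get_current_line_number($parser)
    );
}

var_dump(describeXmlError('<a><b/></a>'));  // a well-formed document
echo describeXmlError('<a><b></a>'), "\n";   // a mismatched closing tag
```

Comparing the code against the XML_ERROR_* constants rather than against literal numbers keeps such checks readable.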
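PHP's XML extension supports the Unicode character set through
different character encodings. There are two types of character
encodings: source encoding and target encoding. PHP's internal
representation of the document is always encoded with UTF-8.
Source encoding is applied when an XML document is parsed. When
creating an XML parser, a source encoding can be specified (this
encoding cannot be changed later in the XML parser's lifetime). The
supported source encodings are ISO-8859-1, US-ASCII and UTF-8. The
former two are single-byte encodings, which means that each
character is represented by a single byte. UTF-8 can encode
characters composed of a variable number of bits (up to 21) in one
to four bytes. The default source encoding used by PHP is
ISO-8859-1.
Target encoding is applied when PHP passes data to XML handler
functions. When an XML parser is created, the target encoding is set
to the same as the source encoding, but this may be changed at any
point. The target encoding affects character data as well as tag
names and processing instruction targets.
If the XML parser encounters characters outside the range that its
source encoding is capable of representing, it returns an error.
If PHP encounters characters in the parsed XML document that cannot
be represented in the chosen target encoding, the problem characters
are "demoted". Currently, this means that such characters are
replaced by a question mark.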
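The sketch below shows, under the description above, how the target encoding can be queried and changed after a parser has been created. The sample document and variable names are illustrative only. How the bytes are transcoded may vary with the PHP version and the underlying parser library, so the script prints what the handler received instead of assuming a result.

```php
<?php
// The encoding given at creation time also becomes the initial target encoding.
$parser = xml_parser_create('ISO-8859-1');
$initialTarget = xml_parser_get_option($parser, XML_OPTION_TARGET_ENCODING);

// Unlike the source encoding, the target encoding may be changed later.
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
$newTarget = xml_parser_get_option($parser, XML_OPTION_TARGET_ENCODING);

// Handlers receive their data in the target encoding; collect it for inspection.
$text = '';
xml_set_character_data_handler($parser, function ($parser, $data) use (&$text) {
    $text .= $data;
});

// "\xE9" is the ISO-8859-1 byte for e-acute; the XML declaration names the encoding.
xml_parse($parser, "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>caf\xE9</a>", true);

echo "$initialTarget -> $newTarget\n";
echo bin2hex($text), "\n"; // hex dump of what the handler received
```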
Here are some example PHP scripts that parse XML documents.
This first example displays the structure of the start elements in
a document with indentation.
Example 1. Show XML Element Structure
<?php
$file = "data.xml";
// A single parser is used here, so one counter is enough to track nesting.
$depth = 0;

function startElement($parser, $name, $attrs)
{
    global $depth;
    // Indent by two spaces per nesting level, then print the element name.
    print str_repeat("  ", $depth) . "$name\n";
    $depth++;
}

function endElement($parser, $name)
{
    global $depth;
    $depth--;
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
                    xml_error_string(xml_get_error_code($xml_parser)),
                    xml_get_current_line_number($xml_parser)));
    }
}
fclose($fp);
xml_parser_free($xml_parser);
?>
Example 2. Map XML to HTML
This example maps tags in an XML document directly to HTML tags.
Elements not found in the "map array" are ignored. Of course, this
example will only work with a specific XML document type.
<?php
$file = "data.xml";
$map_array = array(
    "BOLD"     => "B",
    "EMPHASIS" => "I",
    "LITERAL"  => "TT"
);

function startElement($parser, $name, $attrs)
{
    global $map_array;
    // Only elements listed in $map_array produce output.
    if (isset($map_array[$name])) {
        print "<" . $map_array[$name] . ">";
    }
}

function endElement($parser, $name)
{
    global $map_array;
    if (isset($map_array[$name])) {
        print "</" . $map_array[$name] . ">";
    }
}

function characterData($parser, $data)
{
    print $data;
}

$xml_parser = xml_parser_create();
// use case-folding so we are sure to find the tag in $map_array
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
                    xml_error_string(xml_get_error_code($xml_parser)),
                    xml_get_current_line_number($xml_parser)));
    }
}
fclose($fp);
xml_parser_free($xml_parser);
?>
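This example highlights XML code. It illustrates how to use an
external entity reference handler to include and parse other
documents, as well as how PIs can be processed, and a way of
determining "trust" for PIs containing code.
The XML documents used in this example are the example files
(xmltest.xml and xmltest2.xml).
Example 3. External Entity Example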
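<?php
$file = "xmltest.xml";
function trustedFile($file){
// only trust local files owned by ourselves
// (preg_match() replaces eregi(), which newer PHP versions no longer provide)
if (!preg_match("#^([a-z]+)://#i", $file)
&& fileowner($file) == getmyuid()) {
return true;
}
return false;
}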
function startElement($parser, $name, $attribs){
print "<<font color=\"#0000cc\">$name</font>";
if (sizeof($attribs)) {
// foreach replaces the older while (list() = each()) idiom
foreach ($attribs as $k => $v) {
print " <font color=\"#009900\">$k</font>=\"<font color=\"#990000\">$v</font>\"";
}
}
print ">";
}
function endElement($parser, $name){
print "</<font color=\"#0000cc\">$name</font>>";
}
function characterData($parser, $data){
print "<b>$data</b>";
}
function PIHandler($parser, $target, $data){
switch (strtolower($target)) {
case "php":
global $parser_file;
// If the parsed document is "trusted", we say it is safe
// to execute PHP code inside it. If not, display the code
// instead.
if (trustedFile($parser_file[$parser])) {
eval($data);
} else {
printf("Untrusted PHP code: <i>%s</i>",
htmlspecialchars($data));
}
break;
}
}
function defaultHandler($parser, $data){
if (substr($data, 0, 1) == "&" && substr($data, -1, 1) == ";") {
printf('<font color="#aa00aa">%s</font>',
htmlspecialchars($data));
} else {
printf('<font size="-1">%s</font>',
htmlspecialchars($data));
}
}
function externalEntityRefHandler($parser, $openEntityNames, $base, $systemId,
$publicId){
if ($systemId) {
if (!list($parser, $fp) = new_xml_parser($systemId)) {
printf("Could not open entity %s at %s\n", $openEntityNames,
$systemId);
return false;
}
while ($data = fread($fp, 4096)) {
if (!xml_parse($parser, $data, feof($fp))) {
printf("XML error: %s at line %d while parsing entity %s\n",
xml_error_string(xml_get_error_code($parser)),
xml_get_current_line_number($parser), $openEntityNames);
xml_parser_free($parser);
return false;
}
}
xml_parser_free($parser);
return true;
}
return false;
}
function new_xml_parser($file) {
global $parser_file;
$xml_parser = xml_parser_create();
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 1);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
xml_set_processing_instruction_handler($xml_parser, "PIHandler");
xml_set_default_handler($xml_parser, "defaultHandler");
xml_set_external_entity_ref_handler($xml_parser, "externalEntityRefHandler");
if (!($fp = @fopen($file, "r"))) {
return false;
}
if (!is_array($parser_file)) {
settype($parser_file, "array");
}
$parser_file[$xml_parser] = $file;
return array($xml_parser, $fp);
}
if (!(list($xml_parser, $fp) = new_xml_parser($file))) {
die("could not open XML input");
}
print "<pre>";
while ($data = fread($fp, 4096)) {
if (!xml_parse($xml_parser, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d\n",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
}
}
print "</pre>";
print "parse complete\n";
xml_parser_free($xml_parser);
?>
Example 4. xmltest.xml
<?xml version='1.0'?>
<!DOCTYPE chapter SYSTEM "/just/a/test.dtd" [
<!ENTITY plainEntity "FOO entity">
<!ENTITY systemEntity SYSTEM "xmltest2.xml">
]>
<chapter>
<TITLE>Title &plainEntity;</TITLE>
<para>
<informaltable>
<tgroup cols="3">
<tbody>
<row><entry>a1</entry><entry morerows="1">b1</entry><entry>c1</entry></row>
<row><entry>a2</entry><entry>c2</entry></row>
<row><entry>a3</entry><entry>b3</entry><entry>c3</entry></row>
</tbody>
</tgroup>
</informaltable>
</para>
&systemEntity;
<section id="about">
<title>About this Document</title>
<para>
<!-- this is a comment -->
<?php print 'Hi! This is PHP version '.phpversion(); ?>
</para>
</section>
</chapter>
This file is included from xmltest.xml:
Example 5. xmltest2.xml
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY testEnt "test entity">
]>
<foo>
<element attrib="value"/>
&testEnt;
<?php print "This is some more PHP code being executed."; ?>
</foo>
john at etechdata dot com dot au
08-Mar-2005 10:42
This code uses CURL to connect to a server and post XML info to it and then capture the response from the server. I spent hours trying to find the solution and it's really quite simple. Easy to say AFTER you find the answer. Hope it helps someone.
<?php
$XPost = "<?xml version='1.0' encoding='UTF-8'?>";
$XPost .= "<XMLCodeBody>";
$XPost .= "<MessageInfo>";
$XPost .= "<messageID>8af793f9af34bea0ecd7eff71c94d6</messageID>";
$XPost .= "<messageTimestamp>20040710050758444000+600</messageTimestamp>";
$XPost .= "<timeoutValue>60</timeoutValue>";
$XPost .= "<apiVersion>spxml-3.0</apiVersion>";
$XPost .= "</MessageInfo>";
$XPost .= "</XMLCodeBody>";
$url = "https://www.urltopost.data.to";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 4);
curl_setopt($ch, CURLOPT_POSTFIELDS, $XPost);
$result = curl_exec($ch);
echo "<pre>";
print_r($result);
echo "</pre>";
?>
pille at hbr1 dot com
05-Mar-2005 01:52
A simple xml parser +regenerator class without handling attributes:
<?
class XMLSimpleParser {
function XMLSimpleParser($data,$encoding='') {
$this->document = new stdClass();
$this->current =& $this->document;
$xml_parser = xml_parser_create($encoding);
xml_set_object($xml_parser, &$this);
xml_set_element_handler($xml_parser, 'startElement', 'endElement');
xml_set_character_data_handler($xml_parser, 'characterData');
xml_parse($xml_parser, $data, true);
$this->encoding = xml_parser_get_option( $xml_parser, XML_OPTION_TARGET_ENCODING );
xml_parser_free($xml_parser);
unset( $this->document->_ITEMS );
unset( $this->current );
}
function startElement($parser, $tag, $attributeList) {
if( is_object( $this->current->$tag ) ) {
$obj = $this->current->$tag;
$this->current->$tag = array();
array_push( $this->current->$tag, $obj );
}
if( is_array( $this->current->$tag ) ) {
$obj =& new stdClass;
$obj->_PARENT =& $this->current;
$obj->_ITEMS = 0;
array_push( $this->current->$tag, &$obj );
$this->current =& $obj;
}
else {
$this->current->$tag->_PARENT =& $this->current;
$this->current =& $this->current->$tag;
$this->current->_ITEMS = 0;
}
$this->current->_PARENT->_ITEMS ++;
}
function endElement($parser, $tag) {
$parent =& $this->current->_PARENT;
if( $this->current->_DATA != '' || $this->current->_ITEMS == 0 ) {
$this->current = $this->current->_DATA;
}
else {
unset( $this->current->_PARENT );
unset( $this->current->_ITEMS );
unset( $this->current->_DATA );
}
$this->current =& $parent;
}
function characterData($parser, $data) {
$this->current->_DATA = trim( $data );
}
function generateXML( $encoding = '' ) {
if( ! $encoding ) $encoding = $this->encoding;
$this->xml = '<?xml version="1.0"';
if( $encoding ) $this->xml .= ' encoding="' . $encoding . '"';
$this->xml .= "?>\n";
$this->xml .= $this->_generateXML( $this->document, 0 );
return $this->xml;
}
function _generateXML( $item, $level ) {
$xml = '';
if( is_object( $item ) ) {
$vars = get_object_vars( $item );
foreach( $vars as $key => $val ) {
if( is_array( $val ) ) {
foreach( $val as $entry ) {
for( $i = 0; $i < $level; $i ++ )
$xml .= "\t";
$xml .= '<' . $key;
if( $xml2 = $this->_generateXML( $entry, $level + 1 ) ) {
$xml .= ">\n" . $xml2;
for( $i = 0; $i < $level; $i ++ )
$xml .= "\t";
$xml .= '</' . $key . '>' . "\n";
}
else {
$xml .= " />\n";
}
}
}
else if( is_object( $val ) ) {
for( $i = 0; $i < $level; $i ++ )
$xml .= "\t";
$xml .= '<' . $key;
if( $xml2 = $this->_generateXML( $val, $level + 1 ) ) {
$xml .= ">\n" . $xml2;
for( $i = 0; $i < $level; $i ++ )
$xml .= "\t";
$xml .= '</' . $key . ">\n";
}
else {
$xml .= " />\n";
}
}
else {
for( $i = 0; $i < $level; $i ++ )
$xml .= "\t";
$xml .=
'<' . $key . '>' .
$val .
'</' . $key . ">\n";
}
}
}
return $xml;
}
}
?>
php at NOSPAM dot stratos-online dot nl
09-Feb-2005 08:48
XML_OPTION_SKIP_WHITE didn't work for me, or I don't fully understand what it is supposed to do.
Either way, if you want to get rid of all excess white space, newlines and tabs in your formatted XML, the following code snippet might help.
<?php
$buffer = preg_replace('/\>(\n|\r|\r\n| |\t)*\</','><',$buffer);
?>
I'm sure it isn't efficient, but at least it works (for me).
However, if you are parsing CDATA with SGML-style tags (< >) in it, I'm sure this will horribly mess it up.
compu_global_hyper_mega_net_2 at yahoo dot com
19-Sep-2004 08:35
The documentation regarding white space was never complete I think.
The XML_OPTION_SKIP_WHITE doesn't appear to do anything. I want to preserve the newlines in a cdata section. Setting XML_OPTION_SKIP_WHITE to 0 or false doesn't appear to help. My character_data_handler is getting called once for each line. This obviously should be reflected in the documentation as well. When/how often does the handler get called exactly? Having to build separate test cases is very time consuming.
Inserting newlines myself in my cdata handler is no good either. For non actual CDATA sections that cause my handler to get called, long lines are split up in multiple calls. My handler would not be able to tell the difference whether or not the subsequent calls would be due to the fact that the data is coming from the next line or the fact that some internal buffer is long enough for it to 'flush' out and call the handler.
This behaviour also needs to be properly documented.
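One common way to cope with the handler being called several times for one element is to append every chunk to a buffer and only use the buffer when the element ends. This is a minimal sketch under that assumption. The helper name collectText() is hypothetical, and the exact points at which the parser splits the data are not specified here.

```php
<?php
// Hypothetical helper: returns the complete text of each element, keyed by name.
function collectText(string $xml): array
{
    $buffer = '';
    $texts = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$buffer) {
            $buffer = ''; // start a fresh buffer for every element
        },
        function ($parser, $name) use (&$buffer, &$texts) {
            $texts[$name] = $buffer; // the element is complete, so the text is too
            $buffer = '';
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$buffer) {
            $buffer .= $data; // may run many times per element; never overwrite
        }
    );
    xml_parse($parser, $xml, true);
    return $texts;
}

$texts = collectText("<note><body><![CDATA[line one\nline two]]></body></note>");
echo $texts['body'], "\n";
```

Because the chunks are only concatenated, newlines inside the CDATA section survive no matter how many calls they were spread over.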
andrewcare at execulink dot com
01-Jul-2004 11:24
I've been working on a similar tree-based generator (although somewhat simpler), and I thought that it might be helpful to a developer just starting out:
Simplified source:
<?
$file = 'data.xml'; // hypothetical placeholder path; the original value was omitted
$elements = $stack = array();
$count = $depth = 0;
class element{
var $name = '';
var $attributes = array();
var $data = '';
var $depth = 0;
}
function start_element_handler($parser, $name, $attribs){
global $elements, $stack, $count, $depth;
$id = $count;
$element = new element;
$elements[$id] = $element;
$elements[$id]->name = $name;
while(list($key, $value) = each($attribs))
$elements[$id]->attributes[$key] = $value;
$elements[$id]->depth = $depth;
array_push($stack, $id);
$count++;
$depth++;
}
function end_element_handler($parser, $name){
global $stack, $depth;
array_pop($stack);
$depth--;
}
function character_data_handler($parser, $data){
global $elements, $stack;
$elements[$stack[count($stack)-1]]->data .= $data;
}
$xml_parser = xml_parser_create('');
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($xml_parser, "start_element_handler", "end_element_handler");
xml_set_character_data_handler($xml_parser, "character_data_handler");
if(!file_exists($file))
die("\n<p>\"$file\" does not exist.</p>\n</body>\n</html>");
if(!($handle = fopen($file, "r")))
die("<p>Cannot open \"$file\".</p>\n</body>\n</html>");
while($contents = fread($handle, 4096))
xml_parse($xml_parser, $contents, feof($handle));
fclose($handle);
xml_parser_free($xml_parser);
echo "<hr />\n";
$depth = $offset = 0;
while(list($key_a) = each($elements)){
$depth--;
$offset = 0;
if($elements[$key_a]->depth < $depth){
while($elements[$key_a]->depth != (($elements[$key_a - $offset]->depth) - 1) || $offset == 0){
$offset++;
if($elements[$key_a]->depth == (($elements[$key_a - $offset]->depth) - 1))
echo "<dl>\n<dt><strong>Element Closed:</strong></dt>\n<dd>" . $elements[$key_a - $offset]->name . "</dd>\n</dl>\n<hr />\n";
}
$depth--;
}
if($elements[$key_a]->depth == $depth && $depth != 0){
while($elements[$key_a]->depth != $elements[$key_a - $offset]->depth || $offset == 0){
$offset++;
if($elements[$key_a]->depth == $elements[$key_a - $offset]->depth)
echo "<dl>\n<dt><strong>Element Closed:</strong></dt>\n<dd>" . $elements[$key_a - $offset]->name . "</dd>\n</dl>\n<hr />\n";
}
$depth--;
}
$depth++;
echo "<dl>\n<dt><strong>Element:</strong></dt>\n<dd>" . $elements[$key_a]->name . "</dd>\n</dl>\n";
echo "<dl>\n<dt><strong>Attributes:</strong></dt>\n";
if(empty($elements[$key_a]->attributes))
echo "<dd>No attributes specified</dd>\n";
else{
while(list($key_b, $value) = each($elements[$key_a]->attributes))
echo "<dd>$key_b=\"$value\"</dd>\n";
}
echo "</dl>\n<dl>\n<dt><strong>Data:</strong></dt>\n";
if(trim($elements[$key_a]->data) == '')
echo "<dd>No data specified</dd>\n";
else
echo "<dd>" . $elements[$key_a]->data . "</dd>\n";
echo "</dl>\n<dl>\n<dt><strong>Depth:</strong></dt>\n<dd>" . $elements[$key_a]->depth . "</dd>\n</dl>\n<hr />\n";
$depth++;
}
$depth--;
for($i = $depth; $i >= 0; $i--){
$offset = 0;
$count = count($elements) - 1;
for($j = 0; $j <= $count; $j++){
if($elements[$count - $j]->depth == $depth){
echo "<dl>\n<dt><strong>Element Closed:</strong></dt>\n<dd>" . $elements[$count - $j]->name . "</dd>\n</dl>\n<hr />\n";
break;
}
}
$depth--;
}
?>
A few good tutorials on the subject of parsing XML with PHP:
talraith at withouthonor dot com
29-Jun-2004 01:11
If you are looking for some heavy duty code to parse or create XML documents, then may I suggest taking a look at a class module I am working on. The module is complete except for support of namespaces and XPath.
The class takes a string of XML code and creates a TRUE object tree. Likewise, you can create a tree in your code and generate an XML document. There are no eval() statements used at all unlike some of the other examples shown here.
I posted this a while ago, but it has since been buried by a number of posts and I believe it to be beneficial to anyone looking to use XML / PHP to see this information.
for the source code. Sample usage can be found in my post below.
torsten at jserver dot de
07-Jun-2004 10:43
I expanded the function below a little bit, cause I wasn't really happy with the array created. This version creates an array, which has the same structure as the XML-Tree
<?php
$XML_LIST_ELEMENTS = array( "concert", "song" );
function makeXMLTree($file)
{
$open_file = fopen($file, "r");
$data = "";
while ($r=fread($open_file,8192) ) {
$data .= $r;
}
$parser = xml_parser_create();
xml_parser_set_option($parser,XML_OPTION_CASE_FOLDING,0);
xml_parser_set_option($parser,XML_OPTION_SKIP_WHITE,1);
xml_parse_into_struct($parser,$data,$values,$tags);
xml_parser_free($parser);
$hash_stack = array();
$ret = array();
foreach ($values as $key => $val) {
switch ($val['type']) {
case 'open':
array_push($hash_stack, $val['tag']);
if (isset($val['attributes']))
$ret = composeArray($ret, $hash_stack, $val['attributes']);
else
$ret = composeArray($ret, $hash_stack);
break;
case 'close':
array_pop($hash_stack);
break;
case 'complete':
array_push($hash_stack, $val['tag']);
$ret = composeArray($ret, $hash_stack, $val['value']);
array_pop($hash_stack);
if (isset($val['attributes']))
{
while(list($a_k,$a_v) = each($val['attributes']))
{
$hash_stack[] = $val['tag']."_attribute_".$a_k;
$ret = composeArray($ret, $hash_stack, $a_v);
array_pop($hash_stack);
}
}
break;
}
}
return $ret;
}
function &composeArray($array, $elements, $value=array())
{
global $XML_LIST_ELEMENTS;
$element = array_shift($elements);
if (in_array($element,$XML_LIST_ELEMENTS))
{
if(sizeof($elements) > 0)
{
$array[$element][sizeof($array[$element])-1] = &composeArray($array[$element][sizeof($array[$element])-1], $elements, $value);
}
else {
$array[$element][sizeof($array[$element])] = $value;
}
}
else
{
if(sizeof($elements) > 0)
{
$array[$element] = &composeArray($array[$element], $elements, $value);
}
else
{
$array[$element] = $value;
}
}
return $array;
}
echo "<pre>";
$res = makeXMLTree($xml_file);
var_dump($res);
echo "</pre>";
?>
juliano at setor4 dot com
25-May-2004 04:56
An update of rcotta at ig dot com dot br. The function below will not overwrite an existent element.
<?
function makeXMLTree($file) {
$open_file = fopen($file, "r");
$data = fread($open_file, filesize($file));
$ret = array();
$parser = xml_parser_create();
xml_parser_set_option($parser,XML_OPTION_CASE_FOLDING,0);
xml_parser_set_option($parser,XML_OPTION_SKIP_WHITE,1);
xml_parse_into_struct($parser,$data,$values,$tags);
xml_parser_free($parser);
$hash_stack = array();
$a=0;
foreach ($values as $key => $val) {
switch ($val['type']) {
case 'open':
array_push($hash_stack, $val['tag']);
break;
case 'close':
array_pop($hash_stack);
break;
case 'complete':
array_push($hash_stack, $val['tag']);
eval("
\$ret[\$a][" . implode($hash_stack, "][") . "] = '{$val[value]}';
\$a++;");
array_pop($hash_stack);
break;
}
}
return $ret;
}
$res = makeXMLTree($xml_file);
print_r($res);
?>
johnt at divector dot net
12-May-2004 08:40
When I first read this documentation and tried the examples, none of them seemed to work. The contributed ones in the notes were a bit "long-winded" for a simple demonstration of the required functions. So I came up with this; while not perfect, it does execute and gives an idea of the functions and callback functions needed to create an XML parser.
<?PHP
$file = "xmldata.xml";
$feed = array();
$key = "";
$info = "";
$in_HEAD = false;
function startElement($xml_parser, $name, $attrs ) {
global $feed, $key, $in_HEAD;
$key = $name;
if( $name == "HEAD" )
$in_HEAD = true; }
function endElement($xml_parser, $name) {
global $feed, $key, $info, $in_HEAD;
if( $name == "HEAD" )
$in_HEAD = false;
if($in_HEAD==false)
$key = $name;
elseif( $in_HEAD )
$key = "HEAD_".$name;
$feed[$key] = $info;
$info = ""; }
function charData($xml_parser, $data ) {
global $info;
$info .= $data; }
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "charData" );
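$fp = fopen($file, "r");
while ($data = fread($fp, 8192)) {
// Stop on the first parse error instead of silently ignoring it.
if (!xml_parse($xml_parser, $data, feof($fp)))
die("XML error: " . xml_error_string(xml_get_error_code($xml_parser)));
}
fclose($fp);
xml_parser_free($xml_parser);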
echo "<HTML>\n";
echo "<HEAD>\n";
echo "<TITLE>".$feed['HEAD_TITLE']."</TITLE>\n";
echo "</HEAD>\n";
echo "<BODY>\n";
echo "<CENTER><H1>".$feed['HEAD_TITLE']."</H1></CENTER>\n";
echo "<HR>\n";
foreach( $feed as $assoc_index => $value )
{
echo "\$assoc_index = $assoc_index<BR> \$value = $value<BR><BR>\n";
}
echo "</BODY>\n";
echo "</HTML>\n";
?>
<?xml version="1.0" encoding="UTF-8"?>
<XML>
<HEAD>
<TITLE>XML Data Demo</TITLE>
<DESCRIPTION>XML Data Demo for testing XML parsers. A Simple demo for demonstrating the PHP Call Back functions.</DESCRIPTION>
</HEAD>
<FUNCLIST>The functions necessary for Parser Creation are: xml_parser_create(); xml_set_element_handler($xml_parser, "startElement", "endElement");xml_set_character_data_handler($xml_parser, "charData" );</FUNCLIST>
<RECAP>The Really neat thing here is allowing the programmer complete control over these call back functions to parse virtually any XML file. In my opinion, an extra variable in the Call Back functions allowing an array to be passed would be better, this would keep globals from being used.</RECAP>
</XML>
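The wish expressed in the RECAP element, keeping state out of globals, can be approximated by registering methods of an object as the handlers. This is only a sketch: the class name FeedCollector is hypothetical, and it assumes that array callables are accepted as handlers by the PHP version in use.

```php
<?php
// Hypothetical class: parser state lives in properties instead of globals.
final class FeedCollector
{
    private $feed = [];
    private $text = '';

    public function parse(string $xml): array
    {
        $parser = xml_parser_create();
        // Array callables bind the handlers to this object.
        xml_set_element_handler($parser, [$this, 'startElement'], [$this, 'endElement']);
        xml_set_character_data_handler($parser, [$this, 'charData']);
        if (!xml_parse($parser, $xml, true)) {
            throw new RuntimeException((string) xml_error_string(xml_get_error_code($parser)));
        }
        return $this->feed;
    }

    // The handlers are public because the extension invokes them from outside the class.
    public function startElement($parser, $name, $attrs): void
    {
        $this->text = '';
    }

    public function endElement($parser, $name): void
    {
        $this->feed[$name] = trim($this->text);
        $this->text = '';
    }

    public function charData($parser, $data): void
    {
        $this->text .= $data;
    }
}

$feed = (new FeedCollector())->parse('<XML><HEAD><TITLE>XML Data Demo</TITLE></HEAD></XML>');
echo $feed['TITLE'], "\n";
```

Each FeedCollector instance carries its own state, so several documents can be parsed independently.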
moc.oohay@mijito
06-May-2004 05:21
I found a typo in the XMLTag->addChild function. I re-examined the code and changed the function so it is a little cleaner.
Also, as an interesting side-note. I ran the script on a 1 MB XML file. The php.exe memory usage exceeded 50 MB during runtime. I did a print_r($XML_data) dumping into a plain-text file which resulted in a 70 MB text file. However, after I removed all the [spaces] used for formatting and readability the file size was reduced to 5 MB.
This script may not be efficient for very large data sets. ;)
I am very pleased that the script parsed the file without error. A very successful "real world" test.
Repaired addChild:
function addChild($XMLTag_obj) {
$key = $XMLTag_obj->name;
// If this tag *name* is not already a child initialize it.
if ( !isset($this->children[$key][0]) ) {
$this->children[$key][0] = 0;
}
// Get the next array index. This is the next available location to store the child
$index = $this->children[$key][0] + 1;
// Add the child and update the tag count
$this->children[$key][$index] = $XMLTag_obj;
$this->children[$key][0]++;
// Return the Child Tag
return $this->children[$key][$index];
}
otijim at AT at yahoo dot dot dot com
03-May-2004 07:47
After going through all the XML-to-data-structure examples posted here and having problems with all of them, I came up with my own. It's not thoroughly tested but works very well for me.
This example will not give any 'deprecated pass by reference' errors and returns false on malformed XML.
Example of using the Classes:
<?
$myXMLParser = new XMLStructParser;
$cds_XMLTag = $myXMLParser->parse('<cd_list><cd title="Best of PBS"><track number="1">Sesame Street Theme</track></cd></cd_list>');
// cd_list -> cd -> track: each nesting level is one children[...] lookup.
print $cds_XMLTag->children['cd_list'][1]->children['cd'][1]->children['track'][1]->cdata;
?>
Here are the two class files:
XMLStructParse.phpclass
<?
require_once("XMLTag.phpclass");
class XMLStructParser {
var $index;
var $obj_data;
var $stack;
function XMLStructParser() {
}
function parse($data) {
$this->index = 0;
$this->obj_data = new XMLTag("XML");
$this->stack[$this->index] = &$this->obj_data;
$xml_parser = xml_parser_create();
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, false);
xml_set_object($xml_parser, $this);
xml_set_element_handler($xml_parser, "tag_open", "tag_close");
xml_set_character_data_handler($xml_parser, "cdata");
$parse_results = xml_parse($xml_parser, $data);
xml_parser_free($xml_parser);
if ($parse_results) {
return $this->obj_data;
} else {
return false;
}
}
function tag_open($parser, $tag, $attributes) {
$theTag = new XMLTag($tag);
$theTag->addAttributes($attributes);
$childTag = &$this->stack[$this->index]->addChild($theTag);
$this->index++;
$this->stack[$this->index] = &$childTag;
}
function cdata($parser, $cdata) {
$this->stack[$this->index]->cdata = $cdata;
}
function tag_close($parser, $tag) {
$this->index--;
}
}
?>
XMLTag.phpclass
<?
class XMLTag {
var $name;
var $cdata;
var $children;
function XMLTag($name, $cdata="") {
$this->name = $name;
$this->cdata = $cdata;
}
function addAttributes($attribute_array) {
foreach ($attribute_array as $key => $value) {
if ( !isset($this->children[$key][0]) ) {
$index = 1;
$this->children[$key][0] = 0;
} else {
$index = $this->children[$key][0] + 1;
}
$this->children[$key][$index] = new XMLTag($key, $value);
$this->children[$key][0]++;
}
return;
}
function addChild($XMLTag_obj) {
$key = $XMLTag_obj->name;
if ( !isset($this->children[$key][0]) ) {
$index = 1;
$this->children[$key][0] = 0;
} else {
$index = $$this->children[$key][0] + 1;
}
$this->children[$key][$index] = $XMLTag_obj;
$this->children[$key][0]++;
return $this->children[$key][$index];
}
}
?>
nate at adeptisoft dot com
22-Apr-2004 01:51
just a slight modification to info at b1g dot de's wonderful RDFParse class... I have changed "titel" to "title" and added the description.. so output should look like:
[1] => Array
(
[description] => Some story here
[title] => A title
[link] =>
)
-------
<?php
class RDFParser {
var $_item;
var $_url;
function RDFParser($url) {
$this->_url = $url;
}
function ParseRDF() {
$this->_item = array('i' => 0);
$parser = xml_parser_create();
xml_set_object($parser, &$this);
xml_set_element_handler($parser, "_startElement", "_endElement");
xml_set_character_data_handler($parser, "_charHandler");
$fp = fopen($this->_url, "r");
while(!feof($fp)) {
$line = fgets($fp, 4096);
xml_parse($parser, $line);
}
fclose($fp);
xml_parser_free($parser);
return($this->_item['items']);
}
function _startElement($parser, $name, $attrs) {
$this->_item['maychar'] = true;
if($name=="ITEM") {
$this->_item['mayparse'] = true;
$this->_item['i']++;
} elseif($name=="TITLE") {
$this->_item['akt'] = "TITLE";
} elseif($name=="LINK") {
$this->_item['akt'] = "LINK";
} elseif($name=="DESCRIPTION") {
$this->_item['akt'] = "DESCRIPTION";
} else {
$this->_item['maychar'] = false;
}
}
function _endElement($parser, $name) {
if($name=="ITEM") {
$this->_item['mayparse'] = false;
} elseif($name=="TITLE" || $name=="LINK" || $name=="DESCRIPTION") {
$this->_item['maychar'] = false;
}
}
function _charHandler($parser, $data) {
if($this->_item['maychar'] && $this->_item['mayparse']) {
if($this->_item['akt']=="TITLE") {
$this->_item['items'][$this->_item['i']]['title'] = $data;
}
if($this->_item['akt']=="LINK") {
$this->_item['items'][$this->_item['i']]['link'] = $data;
}
if($this->_item['akt']=="DESCRIPTION") {
$this->_item['items'][$this->_item['i']]['description'] = $data;
}
}
}
}
?>
chibo at gmx dot de
15-Apr-2004 11:09
TO: jon at gettys dot org (the simple xml parser)
For german language change the function:
function characterData($parser, $data) {
global $obj;
eval($obj->tree.'->data=\''. $data .'\';');
}
to:
function characterData($parser, $data) {
global $obj;
eval($obj->tree.'->data.=\''. $data .'\';');
}
to get the whole value of the element! Otherwise you get only the last piece of the entire string.
Greets,
Chi
askgopal [AT] sify [PERIOD] com
06-Apr-2004 10:47
A simple XML parser that would allow us to retrieve a value of an element using its path.
-- cut here --
<?
$_elements = array();
$_cur_path = '';
function parse_xml_config($file, &$elems)
{
global $_elements;
$e = error_reporting(0);
if (($fp = fopen($file, 'r')) === false) {
// Restore the previous error level before giving up.
error_reporting($e);
return;
}
$xph = xml_parser_create();
if (is_resource($xph)) {
xml_parser_set_option($xph, XML_OPTION_CASE_FOLDING, true);
if (!xml_set_element_handler($xph,
'start_elem_handler', 'end_elem_handler')) {
fclose($fp);
error_reporting($e);
return;
}
while (($data = fread($fp, 4096)))
xml_parse($xph, $data, feof($fp));
xml_parser_free($xph);
}
fclose($fp);
// $elems is taken by reference, so the caller's array receives the result.
$elems = $_elements;
error_reporting($e);
}
function start_elem_handler($xph, $name, $attrs)
{
global $_elements, $_cur_path;
$e = error_reporting(0);
$_cur_path .= "/$name";
while (list($key,$val) = each($attrs)) {
$index = "$_cur_path/$key";
if (isset($_elements[$index])) {
$tmp = $_elements[$index];
$_elements[$index] = array();
array_push($_elements[$index], $tmp);
array_push($_elements[$index], $val);
} else
$_elements[$index] = $val;
}
error_reporting($e);
}
function end_elem_handler($xph, $name)
{
global $_elements, $_cur_path;
$_cur_path = dirname($_cur_path);
}
$config = array();
parse_xml_config('/usr/local/etc/myconfig.xml', &$config);
print_r($config);
?>
-- paste --
if the input is:
<config>
<db host="localhost" username="foo" password="bar" db="test"/>
<column name="x" value="x1"/>
<column name="y" value="y1"/>
</config>
the output would be:
Array
(
[/CONFIG/DB/HOST] => localhost
[/CONFIG/DB/USERNAME] => foo
[/CONFIG/DB/PASSWORD] => bar
[/CONFIG/DB/DB] => test
[/CONFIG/COLUMN/NAME] => Array
(
[0] => x
[1] => y
)
[/CONFIG/COLUMN/VALUE] => Array
(
[0] => x1
[1] => y1
)
)
odders
19-Mar-2004 06:36
I wrote a simple XML parser, mainly to deal with RSS version 2. I found lots of examples on the net, but they were all massive, bloated and hard to manipulate.
Output is sent to an array, which holds arrays containing the data for each item.
Obviously, you will have to modify the code to suit your needs, but there isn't a lot of code there, so that shouldn't be a problem.
<?php
$currentElements = array();
$newsArray = array();
$itemCount = 0;
readXML("./news.xml");
echo("<pre>");
print_r($newsArray);
echo("</pre>");

function readXML($xmlFile)
{
    $xmlParser = xml_parser_create();
    xml_parser_set_option($xmlParser, XML_OPTION_CASE_FOLDING, false);
    // handler names must be passed as quoted strings
    xml_set_element_handler($xmlParser, "startElement", "endElement");
    xml_set_character_data_handler($xmlParser, "characterData");
    $fp = fopen($xmlFile, "r");
    while ($data = fread($fp, filesize($xmlFile))) {
        xml_parse($xmlParser, $data, feof($fp));
    }
    fclose($fp);
    xml_parser_free($xmlParser);
}

function startElement($parser, $name, $attrs)
{
    global $currentElements, $itemCount;
    array_push($currentElements, $name);
    if ($name == "item") { $itemCount += 1; }
}

function characterData($parser, $data)
{
    global $currentElements, $newsArray, $itemCount;
    $currentCount = count($currentElements);
    if ($currentCount < 2) { return; } // text directly inside the root element
    $parentElement = $currentElements[$currentCount-2];
    $thisElement = $currentElements[$currentCount-1];
    if ($parentElement == "item") {
        // the handler can be called more than once per element, so append
        if (!isset($newsArray[$itemCount-1][$thisElement])) {
            $newsArray[$itemCount-1][$thisElement] = "";
        }
        $newsArray[$itemCount-1][$thisElement] .= $data;
    } else {
        // channel-level elements: add your own handling here
        switch ($thisElement) {
            case "title":
                break;
            case "link":
                break;
            case "description":
                break;
            case "language":
                break;
            case "item":
                break;
        }
    }
}

function endElement($parser, $name)
{
    global $currentElements;
    $currentCount = count($currentElements);
    if ($currentElements[$currentCount-1] == $name) {
        array_pop($currentElements);
    }
}
?>
talraith at withouthonor dot com
03-Feb-2004 10:27
I have created a class set that parses XML into an object structure and can also generate XML code from that structure. It is mostly finished, but I thought I would post it here as it may help someone out, or serve as a base for someone's own parser. The method for creating the object differs from the posts before this one.
The object tree is built by creating a separate tag object for each tag inside the main document object and associating them by way of object references. An index table is created so that each tag is assigned an ID number (in numerical order from 0) and can be accessed directly using that ID. Each tag holds object references to its children. There are no uses of eval() in this code.
The code is too long to post here, so I have made a HTML page that has it:
Sample code would look something like this:
<?
$xml = new xml_doc($my_xml_code);
$xml->parse();
$root_tag =& $xml->xml_index[0];
$children =& $root_tag->children;
$my_xml = new xml_doc();
$root_tag = $my_xml->CreateTag('ROOTTAG');
$my_xml->CreateTag('CHILDTAG',array(),'',$root_tag);
$out_xml = $my_xml->generate();
?>
bradparks at bradparks dot com
17-Dec-2003 10:38
Hey;
If you need to parse XML on an older version of PHP (e.g. 4.0), or if you can't get the expat extension enabled on your server, you might want to check out the Saxy and DOMIT! XML parsers from Engage Interactive. They're open source and pure PHP, so no extensions or changes to your server are required. I've been using them for over a month on some projects with no problems whatsoever!
Check em out at:
DOMIT!, a DOM based xml parser, uses Saxy (included)
or
Saxy, a sax based xml parser
Brad
condor33NOSPAM at tiscali dot it
09-Dec-2003 07:01
This is a variation to the routine posted here by
jon at gettys dot org to convert an XML file
into a php structure.
I did not find a cleaner method than "eval"
as he asks, but anyway his way is not so bad.
<?php
Class xmlread
{
var $tree = '$this->ogg';
var $ogg ;
var $cnt = 0;
function change_to_array($test,$is_arr) {
    if ($test and !$is_arr):
        eval('$tmp = '.$this->tree.';');
        eval('unset('.$this->tree.');');
        eval(''.$this->tree.'= array();');
        eval('array_push('.$this->tree.',$tmp);');
        return true;
    endif;
    if ($is_arr)
        return true;
}
function startElement($parser, $name, $attrs)
{
    $this->tree = $this->tree."->".$name;
    eval('$is_arr = is_array('.$this->tree.');');
    eval('$test = isset('.$this->tree.');');
    $is_arr = $this->change_to_array($test,$is_arr);
    if ($is_arr):
        $this->cnt = $this->cnt+1;
        $this->tree = $this->tree.'['.$this->cnt.']';
    endif;
    return true;
}
function characterData($parser, $data)
{
if (trim($data)!=''):
$data = addslashes($data);
eval($this->tree."='".trim($data)."';");
endif;
return true;}
function endElement($parser, $name)
{ $pos = strrpos($this->tree, ">");
$leng = strlen($this->tree);
$pos1 = ($leng-$pos)+1;
$this->tree = substr($this->tree, 0, -$pos1);
return true;}
function get_data
($doc,$st_el='startElement',
$end_el='endElement',
$c_data='characterData') {
$this->mioparser = xml_parser_create();
xml_set_object($this->mioparser, $this);
xml_set_element_handler
($this->mioparser, $st_el,$end_el);
xml_set_character_data_handler
($this->mioparser,$c_data);
xml_parser_set_option
($this->mioparser, XML_OPTION_CASE_FOLDING, false);
xml_parse($this->mioparser,$doc);
if (xml_get_error_code($this->mioparser)):
print "<b>XML error at line n. ".
xml_get_current_line_number
($this->mioparser)." -</b> ";
print xml_error_string
(xml_get_error_code($this->mioparser));
endif;
return true; }
function __construct($doc) {
// $doc is the path of the XML file to read
$xml = file_get_contents($doc);
$this->get_data($xml);
}
} ?>
chris at hitcatcher dot com
07-Nov-2003 10:48
Regarding jon at gettys dot org's XML object: the data should be trim()ed to remove any whitespace that could appear in CDATA entered as:
<xml_tag>
cdata here. cdata here. cdata here. cdata here.
</xml_tag>
So, after applying fred at barron dot com's suggested change to the characterData function, the function should appear as:
function characterData($parser, $data)
{
global $obj;
$data = addslashes($data);
eval($obj->tree."->data.='".trim($data)."';");
}
SIDE NOTE: I'm fairly new to XML, so perhaps it is considered bad form to enter CDATA as I did in my example. Is this true, or is the extra whitespace acceptable for the sake of readability?
ml at csite dot com
02-Jul-2003 03:29
A fix for the fread breaking thing:
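$cache = '';
while ($data = fread($fp, 4096)) {
    $data = $cache . $data;
    $cache = '';
    if (!feof($fp)) {
        $split = false;
        if (preg_match_all("(</?[a-z0-9A-Z]+>)", $data, $regs)) {
            $lastTagname = $regs[0][count($regs[0])-1];
            for ($i=strlen($data)-strlen($lastTagname); $i>=strlen($lastTagname); $i--) {
                if ($lastTagname == substr($data, $i, strlen($lastTagname))) {
                    // keep everything from the last complete tag onwards for the next round
                    $cache = substr($data, $i);
                    $data = substr($data, 0, $i);
                    $split = true;
                    break;
                }
            }
        }
        if (!$split) {
            // no safe split point found: hold the whole chunk back
            $cache = $data;
            $data = '';
        }
    }
    if ($data !== '' && !xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d", xml_error_string(xml_get_error_code($xml_parser)), xml_get_current_line_number($xml_parser)));
    }
}
// if the file ended exactly on a read boundary, flush what is still cached
if ($cache !== '' && !xml_parse($xml_parser, $cache, true)) {
    die(sprintf("XML error: %s at line %d", xml_error_string(xml_get_error_code($xml_parser)), xml_get_current_line_number($xml_parser)));
}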
panania at 3ringwebs dot com
20-May-2003 10:12
The above example doesn't work when you're parsing a string returned from a curl operation (why, I don't know!). I kept getting undefined offsets at the highest element number in both the start and end element functions. I know it wasn't the string itself, because I substringed it to death with the same results. But I fixed the problem by adding these lines of code...
function defaultHandler($parser, $name) {
global $depth;
@ $depth[$parser]--;
}
xml_set_default_handler($xml_parser, "defaultHandler");
Hope this helps 8-}
fred at barron dot com
23-Apr-2003 12:28
regarding jon at gettys dot org's nice XML to Object code, I've made some useful changes (IMHO) to the characterData function... my minor modifications allow multiple lines of data and it escapes quotes so errors don't occur in the eval...
function characterData($parser, $data)
{
global $obj;
$data = addslashes($data);
eval($obj->tree."->data.='".$data."';");
}
software at serv-a-com dot com
17-Feb-2003 05:10
2. Pre Parser Strings and New Line Delimited Data
One important thing to note at this point is that the xml_parse function requires a string variable, and as we all know, the content of any string variable is easy to manipulate.
A better approach to removing newlines than the str_replace() from my earlier post:
while ($data = fread($fp, 4096)) {
$data = preg_replace("/\n|\r/","",$data); //flarp
if (!xml_parse($xml_parser, $data, feof($fp))) {...
The above works for all three line-ending conventions (\n, \r, \r\n). But it could potentially (or will most likely) damage or scramble data contained in, for example, CDATA areas. As far as I am concerned, end-of-line characters should not be used _within_ XML tags. What seems to be the ultimate solution is to pre-parse the loaded data. This would require tracking the position within the XML document and adding or removing data (using a temporary variable between fread calls) based on conditions like "is within tag", "is within CDATA" etc. before feeding it to the parser. This of course opens up a new can of worms (as in: parse data for the parser...). The above procedure would take place between the fread and xml_parse calls, so this method would be compatible with the general usage examples at the top of the page.
3. The Answer to parsing arbitrary XML and Preprocessor Revisited
You can't just feed any XML document to the parser you constructed and assume that it will work! You have to know how the data is stored: for example, is there end-of-line delimited data in the file? Are there any carriage returns in the tags? XML files come formatted in different ways: some are just one long string of characters without any end-of-line markers, others have newlines, carriage returns or both (Microsloth Windows). They may or may not contain spaces and other whitespace between tags. For this reason it is important to do what I call normalizing the data before feeding it to the parser. You can perform this with regular expressions or plain old str_replace and concatenation. In many cases this can be done to the file itself, sometimes to the string data on the fly (as shown in the example above). But I feel it is important to normalize the data before even calling the function that calls xml_parse. If you have access to all the data before that call, you can convert it to what you feel the data should have been in the first place, and avoid many surprises and expensive regular expression substitutions (in a tight spot) while fread'ing the data.
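As a rough sketch of the "normalize first" idea described above: load the document into a string, convert all line endings to \n, and only then hand it to xml_parse(). The helper name normalize_xml_source() and the sample data are made up for this example, and whether you need any normalization at all depends on your data.

```php
<?php
// Hypothetical helper: convert \r\n and lone \r to \n before parsing.
function normalize_xml_source($xml)
{
    // "\r\n" must be replaced first, otherwise it would become "\n\n".
    return str_replace(array("\r\n", "\r"), "\n", $xml);
}

// Invented sample with mixed line endings.
$raw = "<list>\r\n<item>one</item>\r<item>two</item>\n</list>";
$xml = normalize_xml_source($raw);

$items = array();
$parser = xml_parser_create();
xml_set_character_data_handler($parser, function ($parser, $data) use (&$items) {
    // Skip the whitespace-only text between elements.
    if (trim($data) !== '') {
        $items[] = $data;
    }
});

if (!xml_parse($parser, $xml, true)) {
    die(sprintf("XML error: %s at line %d",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)));
}
print_r($items);
```

Doing the replacement once on the complete string keeps it out of the read loop, which is the point made above about avoiding substitutions while fread'ing.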
software at serv-a-com dot com
17-Feb-2003 05:09
My previous XML post (software at serv-a-com dot com/22-Jan-2003 03:08) resulted in some visitors e-mailing me with questions on the carriage return stripping issue. I'll try to make the following mumble as brief and easy to understand as possible.
1. Overview of the 4096 fragmentation issue
As you know, the following freads the file 4096 bytes (4 KB) at a time. This is perhaps OK for testing expat and figuring out how things work, but it is rather dangerous in a production environment. Data may not be fully understandable due to fread fragmentation, and may be improperly formatted due to the numerous sources (formats) of data contained within (e.g. end-of-line delimited CDATA).
while ($data = fread($fp, 4096)) {
if (!xml_parse($xml_parser, $data, feof($fp))) {
Sometimes, to save time, one may want to load it all into one big variable and leave all the worries to expat. I think anything under 500 KB is OK (as long as nobody knows about it). Some may argue that larger variables are acceptable or even necessary because of the magic that takes place while parsing with xml_parse. Our XML parser (expat) works and can be successfully used only when we know what type of XML data we are dealing with: its average size, the structure of its general layout, and the data contained within tags. For example, if the tags are followed by a line delimiter like a newline, we can read the file with fgets and, with minimal effort, make sure that no data is sent to the function that does not end with an end tag. But this requires fair knowledge of how the file stores its XML data and tags (and a bit of code between reading the data and xml_parse'ing it).
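The fgets idea could look roughly like this, assuming every line of the file ends after a complete tag. A php://memory stream stands in for the real file so the sketch runs on its own; the sample data is invented.

```php
<?php
// Stand-in for a real file: one element per line (assumed layout).
$fp = fopen('php://memory', 'w+');
fwrite($fp, "<rows>\n<row>alpha</row>\n<row>beta</row>\n</rows>\n");
rewind($fp);

$rows = array();
$parser = xml_parser_create();
xml_set_character_data_handler($parser, function ($parser, $data) use (&$rows) {
    // Skip the newlines between elements.
    if (trim($data) !== '') {
        $rows[] = $data;
    }
});

// Read one line at a time, so no chunk ends in the middle of a tag.
while (($line = fgets($fp)) !== false) {
    if (!xml_parse($parser, $line, false)) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)));
    }
}
// Tell the parser that the document is complete.
xml_parse($parser, '', true);
fclose($fp);

print_r($rows);
```

This only helps when the file really is laid out line by line; a document delivered as one long line would be read in a single fgets call.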
software at serv-a-com dot com
22-Jan-2003 10:08
use:
while ($data = str_replace("\n","",fread($fp, 4096))){
instead of:
while ($data = fread($fp, 4096)) {
It will save you a headache.
and in response to (simen at bleed dot no 11-Jan-2003 04:27) "If the 4096 byte buffer fills up..."
Please take better care of your data: don't just shove it into xml_parse(). Check and make sure that the tags are not sliced in the middle; use a temporary variable between fread and xml_parse.
simen at bleed dot no
11-Jan-2003 11:27
I was experiencing really weird behaviour loading a large XML document (91k), because reading the file through a 4096-byte buffer doesn't take the following into consideration:
<node>this is my value</node>
If the 4096 byte buffer fills up at "my", the handler set with xml_set_character_data_handler() receives the string in two pieces.
The only solution I've found so far is to read the whole document into a variable and then parse.
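One way to cope with split character data without loading the whole file, sketched here under the assumption that an element's text should be delivered in one piece: append each fragment to a buffer in the character data handler and only use the buffer when the end tag arrives. The document is fed in deliberately tiny pieces to force a split; the variable names and the sample document are illustrative.

```php
<?php
$buffer = '';        // text collected for the element currently open
$values = array();   // completed <node> values

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$buffer) {
        $buffer = '';                 // new element: start with an empty buffer
    },
    function ($parser, $name) use (&$buffer, &$values) {
        if ($name === 'node') {
            $values[] = $buffer;      // the text is complete only now
        }
        $buffer = '';
    }
);
xml_set_character_data_handler($parser, function ($parser, $data) use (&$buffer) {
    $buffer .= $data;                 // may run several times per element
});

$xml = '<doc><node>this is my value</node><node>second</node></doc>';
$chunks = str_split($xml, 8);         // 8-byte chunks instead of 4096
foreach ($chunks as $i => $chunk) {
    $isFinal = ($i === count($chunks) - 1);
    if (!xml_parse($parser, $chunk, $isFinal)) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)));
    }
}
print_r($values);
```

The buffer is reset on every start tag, so this sketch suits leaf elements that contain only text; mixed content would need one buffer per nesting level.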
mreilly at ZEROSPAM dot MAC dot COM
14-Nov-2002 06:01
I wanted a way to reference the XML tree by path. I couldn't find exactly what I wanted, but using examples here and on phpbuilder.com came up with this. This results in a nested associative array, so elements can be accessed in the manner:
echo $ary_parsed_file['path']['to']['value'];
<?php
echo '<PRE>';
$ary_path = array();
$ary_parsed_file = array();
$int_starting_level = 1;
$xml_file = 'label.xml';
$type = 'UTF-8';
$xml_parser = xml_parser_create($type);
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
xml_parser_set_option($xml_parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
xml_set_element_handler($xml_parser, 'startElement','endElement');
xml_set_character_data_handler($xml_parser, 'characterData');
if (!($fp = fopen($xml_file, 'r'))) {
die("Could not open $xml_file for parsing!\n");
}
while ($data = fread($fp, 4096)) {
if (!($data = utf8_encode($data))) {
echo 'ERROR'."\n";
}
if (!xml_parse($xml_parser, $data, feof($fp))) {
die(sprintf( "XML error: %s at line %d\n\n",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
}
}
xml_parser_free($xml_parser);
print_r($ary_parsed_file);
function startElement($parser, $name, $attrs=''){
global $ary_path;
array_push($ary_path, $name);
}
function endElement($parser, $name, $attrs=''){
global $ary_path;
array_pop($ary_path);
}
function characterData($parser, $data){
    global $ary_parsed_file, $ary_path, $int_starting_level;
    $str_trimmed_data = trim($data);
    if ($str_trimmed_data !== '') {
        // Walk down the nested array by reference instead of building code
        // for eval(), so quotes in the data cannot break the assignment.
        $ref =& $ary_parsed_file;
        for ($i = $int_starting_level; $i < count($ary_path); $i++) {
            if (!isset($ref[$ary_path[$i]]) || !is_array($ref[$ary_path[$i]])) {
                $ref[$ary_path[$i]] = array();
            }
            $ref =& $ref[$ary_path[$i]];
        }
        $ref = $str_trimmed_data;
    }
}
?>
sfaulkner at hoovers dot com
04-Nov-2002 08:29
Building on... This allows you to return the value of an element using an XPath reference. This code would of course need error handling added :-)
function GetElementByName ($xml, $start, $end) {
$startpos = strpos($xml, $start);
if ($startpos === false) {
return false;
}
$endpos = strpos($xml, $end);
$endpos = $endpos+strlen($end);
$endpos = $endpos-$startpos;
$endpos = $endpos - strlen($end);
$tag = substr ($xml, $startpos, $endpos);
$tag = substr ($tag, strlen($start));
return $tag;
}
function XPathValue($XPath,$XML) {
$XPathArray = explode("/",$XPath);
$node = $XML;
foreach ($XPathArray as $value) {
$node = GetElementByName($node, "<$value>", "</$value>");
}
return $node;
}
print XPathValue("Response/Shipment/TotalCharges/Value",$xml);
guy at bhaktiandvedanta dot com
27-Sep-2002 07:01
For a simple XML parser you can use this function. It doesn't require any extensions to run.
<?
function GetElementByName ($xml, $start, $end) {
global $pos;
$startpos = strpos($xml, $start);
if ($startpos === false) {
return false;
}
$endpos = strpos($xml, $end);
$endpos = $endpos+strlen($end);
$pos = $endpos;
$endpos = $endpos-$startpos;
$endpos = $endpos - strlen($end);
$tag = substr ($xml, $startpos, $endpos);
$tag = substr ($tag, strlen($start));
return $tag;
}
$file = "data.xml";
$pos = 0;
$Nodes = array();
if (!($fp = fopen($file, "r"))) {
die("could not open XML input");
}
$data = "";
while ($getline = fread($fp, 4096)) {
$data = $data . $getline;
}
fclose($fp);
$count = 0;
$pos = 0;
while ($node = GetElementByName($data, "<XML_TAG>", "</XML_TAG>")) {
$Nodes[$count] = $node;
$count++;
$data = substr($data, $pos);
}
for ($i=0; $i<$count; $i++) {
$code = GetElementByName($Nodes[$i], "<Code>", "</Code>");
$desc = GetElementByName($Nodes[$i], "<Description>", "</Description>");
$price = GetElementByName($Nodes[$i], "<BasePrice>", "</BasePrice>");
}
?>
Hope this helps! :)
Guy Laor
dmarsh dot NO dot SPAM dot PLEASE at spscc dot ctc dot edu
18-Sep-2002 07:27
Some reference code I am working on as an "XML Library", which I am folding into an object. Notice the use of the DEFINE:
Mainly Example 1 and parts of 2 & 3 re-written as an object:
--- MyXMLWalk.lib.php ---
<?php
if (!defined("PHPXMLWalk")) {
define("PHPXMLWalk",TRUE);
class XMLWalk {
var $p;
var $e;
function prl($x,$i=0) {
ob_start();
print_r($x);
$buf=ob_get_contents();
ob_end_clean();
return join("\n".str_repeat(" ",$i),explode("\n",$buf));
}
function XMLWalk() {
$this->p = xml_parser_create();
$this->e = array();
xml_parser_set_option($this->p, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler($this->p, array(&$this, "startElement"), array(&$this, "endElement"));
xml_set_character_data_handler($this->p, array(&$this, "dataElement"));
register_shutdown_function(array(&$this, "free")); }
function startElement($parser, $name, $attrs) {
if (count($attrs)>=1) {
$x = $this->prl($attrs, $this->e[$parser]+6);
} else {
$x = "";
}
print str_repeat(" ",$this->e[$parser]+0). "$name $x\n";
$this->e[$parser]++;
$this->e[$parser]++;
}
function dataElement($parser, $data) {
print str_repeat(" ",$this->e[$parser]+0). htmlspecialchars($data, ENT_QUOTES) ."\n";
}
function endElement($parser, $name) {
$this->e[$parser]--;
$this->e[$parser]--;
}
function parse($data, $fp) {
if (!xml_parse($this->p, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($this->p)),
xml_get_current_line_number($this->p)));
}
}
function free() {
xml_parser_free($this->p);
}
} } ?>
--- end of file ---
Calling code:
<?php
...
require("MyXMLWalk.lib.php");
$file = "x.xml";
$xme = new XMLWalk;
if (!($fp = fopen($file, "r"))) {
die("could not open XML input");
}
while ($data = fread($fp, 4096)) {
$xme->parse($data, $fp);
}
...
?>
jon at gettys dot org
14-Aug-2002 08:59
[Editor's note: see also xml_parse_into_struct().]
Very simple routine to convert an XML file into a PHP structure. $obj->xml contains the resulting PHP structure. I would be interested if someone could suggest a cleaner method than the evals I am using.
<?
$filename = 'sample.xml';
$obj = new stdClass; // create the object explicitly before assigning properties
$obj->tree = '$obj->xml';
$obj->xml = '';
function startElement($parser, $name, $attrs) {
global $obj;
eval('$test=isset('.$obj->tree.'->'.$name.');');
if ($test) {
eval('$tmp='.$obj->tree.'->'.$name.';');
eval('$arr=is_array('.$obj->tree.'->'.$name.');');
if (!$arr) {
eval('unset('.$obj->tree.'->'.$name.');');
eval($obj->tree.'->'.$name.'[0]=$tmp;');
$cnt = 1;
}
else {
eval('$cnt=count('.$obj->tree.'->'.$name.');');
}
$obj->tree .= '->'.$name."[$cnt]";
}
else {
$obj->tree .= '->'.$name;
}
if (count($attrs)) {
eval($obj->tree.'->attr=$attrs;');
}
}
function endElement($parser, $name) {
global $obj;
for($a=strlen($obj->tree);$a>0;$a--) {
if (substr($obj->tree, $a, 2) == '->') {
$obj->tree = substr($obj->tree, 0, $a);
break;
}
}
}
function characterData($parser, $data) {
global $obj;
eval($obj->tree.'->data=\''.$data.'\';');
}
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($filename, "r"))) {
die("could not open XML input");
}
while ($data = fread($fp, 4096)) {
if (!xml_parse($xml_parser, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
}
}
xml_parser_free($xml_parser);
print_r($obj->xml);
return 0;
?>
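One possible eval-free variant, offered as a sketch rather than a drop-in replacement: keep a stack of references to the nodes that are currently open, let the start handler push a new child and the end handler pop it. The node layout (name, attr, data, children) and the function name xml_to_tree() are choices made for this example, not part of the code above.

```php
<?php
// Hypothetical helper: returns the root element as a nested array, or false on error.
function xml_to_tree($xml)
{
    $root  = array('name' => null, 'attr' => array(), 'data' => '', 'children' => array());
    $stack = array(&$root);   // references to the nodes that are currently open

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$stack) {
            $parent =& $stack[count($stack) - 1];
            $parent['children'][] = array(
                'name' => $name, 'attr' => $attrs, 'data' => '', 'children' => array(),
            );
            // Push a reference to the child that was just appended.
            $stack[] =& $parent['children'][count($parent['children']) - 1];
        },
        function ($parser, $name) use (&$stack) {
            array_pop($stack);   // the element is closed
        }
    );
    xml_set_character_data_handler($parser, function ($parser, $data) use (&$stack) {
        // Append, because the handler may be called several times per element.
        $stack[count($stack) - 1]['data'] .= $data;
    });

    if (!xml_parse($parser, $xml, true)) {
        return false;
    }
    return $root['children'][0];
}

$tree = xml_to_tree('<config><db host="localhost"/><column name="x">first</column></config>');
echo $tree['name'], "\n";                         // config
echo $tree['children'][0]['attr']['host'], "\n";  // localhost
echo $tree['children'][1]['data'], "\n";          // first
```

Because no PHP code is generated from the document, quotes or other special characters in element names and text cannot break the assignment the way they can with eval().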
danielc at analysisandsolutions dot com
15-Apr-2002 09:23
I put up a good, simple, real world example of how to parse XML documents. While the sample grabs stock quotes off of the web, you can tweak it to do whatever you need.
jason at N0SPAM dot projectexpanse dot com
22-Mar-2002 09:16
In reference to the earlier note about parsing entities:
I could be wrong, but since it is possible to define your own entities within an XML DTD, the cdata handler function parses these individually to allow for your own implementation of those entities within your cdata handler.
jason at NOSPAM_projectexpanse_NOSPAM dot com
27-Feb-2002 12:11
For newbies wanting a good tutorial on how to actually get started and where to go from this listing of functions, then visit:
It shows an excellent example of how to read the XML data into a class file so you can actually process it, not just display it all pretty-like, like many tutorials on PHP/XML seem to be doing.
hans dot schneider at bbdo-interone dot de
24-Jan-2002 04:43
I had to trim() the data when I passed one large string containing a well-formed XML file to xml_parse. The string was read by cURL, which apparently put a blank at the end of the string. This blank produced an "XML not well-formed" error in xml_parse!
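A small sketch of that workaround: trim() the downloaded string before handing it to xml_parse(). The string below only imitates a cURL response, and the helper name parses_ok() is invented. Leading whitespace in front of the XML declaration is used here because it is a stray-whitespace case that makes the parser complain, which may or may not be exactly what happened above.

```php
<?php
// Hypothetical helper: true when $xml parses cleanly, false otherwise.
function parses_ok($xml)
{
    $parser = xml_parser_create();
    return xml_parse($parser, $xml, true) === 1;
}

// Imitation of a response body with stray whitespace around the document.
$response = " \n<?xml version=\"1.0\"?>\n<reply><status>ok</status></reply> ";

var_dump(parses_ok($response));        // whitespace before the declaration
var_dump(parses_ok(trim($response)));  // trimmed copy
```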
morgan_rogers at yahoo dot com
06-Oct-2000 08:37
There's a really good article on XML parsing with PHP at
sam at cwa dot co dot nz
28-Sep-2000 02:39
I've discovered some unusual behaviour in this API when ampersand entities are parsed in cdata; for some reason the parser breaks up the section around the entities, and calls the handler repeatedly, once for each of the sections. If you don't allow for this oddity and you are trying to put the cdata into a variable, only the last part will be stored.
You can get around this with a line like:
$foo .= $cdata;
If the handler is called several times from the same tag, it will append them, rather than rewriting the variable each time. If the entire cdata section is returned, it doesn't matter.
May happen for other entities, but I haven't investigated.
Took me a while to figure out what was happening; hope this saves someone else the trouble.
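To see the effect described above, this sketch records every piece the character data handler receives and then joins them. How many pieces you get depends on the underlying parser library, so only the joined result should be relied on; the sample text is invented.

```php
<?php
$pieces = array();
$parser = xml_parser_create();
xml_set_character_data_handler($parser, function ($parser, $cdata) use (&$pieces) {
    $pieces[] = $cdata;   // one entry per handler call
});
xml_parse($parser, '<dish>Fish &amp; chips &lt;large&gt;</dish>', true);

print_r($pieces);               // the individual fragments
$foo = implode('', $pieces);    // same effect as $foo .= $cdata in the handler
echo $foo, "\n";
```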
Daniel dot Rendall at btinternet dot com
07-Jul-1999 05:21
When using the XML parser, make sure you're not using the magic quotes option (e.g. use set_magic_quotes_runtime(0) if it's not the compiled default), otherwise you'll get 'not well-formed' errors when dealing with tags with attributes set in them.