In progress

small PHP project: parse HTML files, write XML

Hey PHP experts!

We need a PHP 5 program that parses HTML files, extracts specific content, and writes it to an XML file.
A zip file is attached; please have a look at the sample files.

Here is a task description in pseudocode:

// PHP5
class Utilities
{

    public $pathToSourceDirectory = 'someSourceDirectory';
    public $pathToTargetDirectory = 'someTargetDirectory';
    public $nameXMLfile = 'newXMLFile';
    public $targetNode = "misc_texts";
    public $targetTag = 'body';
    public $ignoreDate; // Boolean

    $xmlFile = $pathToTargetDirectory.'/'.$nameXMLfile;
    if exists, open $xmlFile
    else create $xmlFile first

    write one or several methods that perform the following routine:

    loop through all files inside $pathToSourceDirectory and all its subdirectories
        if the file is an HTML file (any extension like .html || .htm || .HTML etc.)
            if date of the file is newer than date of $xmlFile || $ignoreDate == true
                open file

                parse it: loop through all the top-level tags (do not loop through children tags)
                    if div does not have the class 'private'
                        extract content of div
                        write it to $xmlFile
                            as a child of the $targetNode
                            if node with this page name already exists (compare page name) replace content
                            else add new node

    structure of the resulting $xmlFile:

        // $targetNode

        // path to file (replace slashes (/) with double underscore ('__')) + filename (without extension)
        // id of first div
        content to be extracted
        ]]>

        // id of second div
        content to be extracted
        ]]>

        content of file B to be extracted
        ]]>

    save $xmlFile
    close all files
    return success or error

}

Looking forward to hearing from you!
Andreas
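
Two steps of the routine above, naming the page node and extracting the non-private divs, could be sketched roughly as follows. This is only a minimal sketch under assumptions: the function names pageNameFromPath and extractPublicDivs are hypothetical and not part of the task description, the path is assumed to be relative to $pathToSourceDirectory, and "top-level tags" is read as the direct children of the body element ($targetTag). It uses PHP's bundled DOM extension and PHP 5 compatible syntax.

```php
<?php
// Hypothetical helper (name not in the task description): derive the page
// name from a path relative to the source directory by dropping the file
// extension and replacing slashes with a double underscore.
function pageNameFromPath($relativePath)
{
    $withoutExtension = preg_replace('/\.[^.\/]+$/', '', $relativePath);
    return str_replace('/', '__', $withoutExtension);
}

// Hypothetical helper: return an array of "div id => inner markup" for every
// direct <div> child of <body> that does not carry the class 'private'.
// Assumption: divs have unique ids; divs sharing an id overwrite each other.
function extractPublicDivs($html)
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; suppress the parser warnings.
    @$doc->loadHTML($html);

    $result = array();
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $result;
    }

    // Only direct children are visited, so nested divs are not looped over.
    foreach ($body->childNodes as $node) {
        if ($node->nodeType !== XML_ELEMENT_NODE
            || strtolower($node->nodeName) !== 'div') {
            continue;
        }
        $classes = preg_split('/\s+/', trim($node->getAttribute('class')));
        if (in_array('private', $classes, true)) {
            continue;
        }
        // Serialize the children so the div's own tag is left out.
        $inner = '';
        foreach ($node->childNodes as $child) {
            $inner .= $doc->saveXML($child);
        }
        $result[$node->getAttribute('id')] = $inner;
    }
    return $result;
}
```

For example, pageNameFromPath('docs/intro/page.html') returns 'docs__intro__page'. Walking the directories (for instance with RecursiveDirectoryIterator), comparing file dates, and writing the extracted content into $xmlFile are left out of this sketch.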

Skills: PHP, XML

Employer information:
(31 reviews) Zürich, Switzerland

Project ID: #4857621