Over the past week, I have been working on a web-based management system for the NHS site. I am trying to make it as easy as possible to update the site and add pages to it, for whoever maintains it when I am no longer there next year. To be completely honest, I don't know whether anyone besides me actually visits the site, but it has been very useful to me: it makes it easy to keep up with NHS activities.
I am going to store page data as XML. I had planned to use SimpleXML to read the data back, but SimpleXML requires PHP 5, which I don't have access to, so I am essentially building my own XML parser. It should not be too difficult, since I plan to store everything opaquely: each value is run through PHP's htmlspecialchars() function before it is written to the XML file, so the stored text contains no markup of its own.
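As a rough sketch of this storage scheme, the PHP below writes a page as XML with every value passed through htmlspecialchars(), then reads a single field back with a regular expression. The element names (page, title, body) and the function names are hypothetical, not taken from the actual NHS site. Because escaped values can never contain a literal '<', the pattern [^<]* captures a field's full text. The sketch also avoids htmlspecialchars_decode(), which requires PHP 5.1, by reversing the escaping with get_html_translation_table() and strtr().

```php
<?php
// Hypothetical helpers: element and function names are illustrative only.

// Escape a value so it can sit inside an XML element as opaque text.
function xmlEscape($value)
{
    return htmlspecialchars($value, ENT_QUOTES);
}

// Reverse xmlEscape() without htmlspecialchars_decode() (a PHP 5.1 function):
// flip the entity table and let strtr() replace each entity in one pass.
function xmlUnescape($value)
{
    $table = array_flip(get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES));
    return strtr($value, $table);
}

// Build a <page> document from an associative array of field => value.
function buildPageXml($fields)
{
    $xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<page>\n";
    foreach ($fields as $name => $value) {
        $xml .= "  <$name>" . xmlEscape($value) . "</$name>\n";
    }
    return $xml . "</page>\n";
}

// Read one field back; returns null if the field is not present.
// Escaped values contain no '<', so [^<]* captures the whole text.
function readPageField($xml, $name)
{
    $tag = preg_quote($name, '/');
    if (preg_match('/<' . $tag . '>([^<]*)<\/' . $tag . '>/', $xml, $m)) {
        return xmlUnescape($m[1]);
    }
    return null;
}
```

For example, a title of "Officers & Members" is stored as Officers &amp; Members and comes back unchanged from readPageField().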
The first thing I am working on (before implementing the code that writes the XML file) is validating user input. Because the site is served as "application/xhtml+xml" to browsers that support it, the input must be well-formed and properly nested; otherwise those browsers display a draconian XML error instead of the page (which is good in one respect: it forces authors to produce well-formed code). Checking for well-formedness was quite a challenge, and I eventually came up with a slow, recursive function that checks the input. It isn't foolproof, but it is good enough for my purposes, because I recognize only a predefined list of markup elements and encode the HTML special characters of any markup I don't recognize as their corresponding entities.
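A minimal sketch of the whitelist idea follows. It assumes a hypothetical list of allowed elements (the post does not say which ones the site recognizes) and tags of the form <name attr="value">. Recognized tags pass through unchanged; everything else, including unknown tags and stray < or & characters, goes through htmlspecialchars(). It only decides which tags survive: nesting still has to be checked separately, attribute values on allowed tags are passed through as-is, and entities already present in the text (such as &amp;) are escaped a second time.

```php
<?php
// Hypothetical whitelist: the post does not list the elements it recognizes.
function allowedElements()
{
    return array('p', 'br', 'em', 'strong', 'a', 'ul', 'ol', 'li');
}

// Pass whitelisted tags through untouched and escape everything else, so
// unknown markup shows up as literal text instead of breaking the XML.
function escapeUnknownMarkup($content)
{
    // One capture group: start, end, or self-closing tags with name="value" attributes.
    $tagPattern = '/(<\/?[a-z][a-z0-9]*(?:\s+[a-z][a-z0-9]*="[^"]*")*\s*\/?>)/';
    $parts = preg_split($tagPattern, $content, -1, PREG_SPLIT_DELIM_CAPTURE);
    $allowed = allowedElements();
    $out = '';
    foreach ($parts as $i => $part) {
        // With PREG_SPLIT_DELIM_CAPTURE, odd indices hold the captured tags.
        if ($i % 2 == 1) {
            preg_match('/^<\/?([a-z][a-z0-9]*)/', $part, $m);
            if (in_array($m[1], $allowed)) {
                $out .= $part;
                continue;
            }
        }
        // Text and unrecognized tags: encode HTML special characters as entities.
        $out .= htmlspecialchars($part);
    }
    return $out;
}
```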
I do hope that this site, which I have spent so much time on, is actually used and appreciated.
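In case anyone is interested, here is the function I came up with to check whether the input is well-formed:

    function validateInput($content) {
        // Ignore comments; they do not need to be checked.
        $content = preg_replace('/<!--.*?-->/s', '', $content);
        // Plain text with no markup is trivially well-formed.
        if (strpos($content, '<') === false)
            return true;
        // Match complete elements: an opening tag with optional name="value"
        // attributes, its content (group 3), and the matching closing tag.
        $fullElements = preg_match_all(
            '/<([a-z][a-z0-9]*)(\\s+[a-z][a-z0-9]*="[^"]*")*\\s*>(.*?)<\/\1>/is',
            $content, $matches, PREG_PATTERN_ORDER);
        // Markup is present, but no complete element was found.
        if ($fullElements == 0)
            return false;
        // Recursively check the content of every matched element.
        foreach ($matches[3] as $value) {
            if (!validateInput($value))
                return false;
        }
        return true;
    }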
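Edit 27 May 2007 23:59:38 EDT: there is a problem with this function; it incorrectly reports proper markup such as "<div>text<div>text</div>text</div>" as invalid. The non-greedy (.*?) pairs the outer <div> with the first </div> it finds, which actually closes the inner element, so the recursive call receives the unclosed fragment "text<div>text" and rejects it. I am working on this problem.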
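One possible way around this, sketched below as an alternative technique rather than a patch to the function above, is to drop the recursive regular expression and scan the input once from left to right, keeping a stack of open element names: a start tag pushes its name, an end tag must match the name on top of the stack, and the input is well-formed only if the stack is empty at the end. The tag syntax is assumed to match the original regex (name="value" attributes), except that names must be lowercase as XHTML requires, and self-closing tags such as <br /> are also accepted. Like the original, it does not check entity references. It accepts the <div> example above, and it also rejects input such as <b>x</b><i>, which leaves an element open but which validateInput() lets through.

```php
<?php
// Alternative sketch (not the post's validateInput()): a single left-to-right
// scan with a stack of open element names instead of recursive regexes.
function isWellFormed($content)
{
    // Drop comments first, as the original function does.
    $content = preg_replace('/<!--.*?-->/s', '', $content);

    // One capture group matching an end tag, or a start/self-closing tag with
    // name="value" attributes. Names must be lowercase, as XHTML requires.
    $tagPattern = '/(<\/[a-z][a-z0-9]*\s*>|<[a-z][a-z0-9]*(?:\s+[a-z][a-z0-9]*="[^"]*")*\s*\/?>)/';
    $parts = preg_split($tagPattern, $content, -1, PREG_SPLIT_DELIM_CAPTURE);

    $stack = array();
    foreach ($parts as $i => $part) {
        if ($i % 2 == 0) {
            // Even indices are text; a leftover '<' is a tag we could not parse.
            if (strpos($part, '<') !== false) {
                return false;
            }
            continue;
        }
        preg_match('/^<\/?([a-z][a-z0-9]*)/', $part, $m);
        $name = $m[1];
        if (substr($part, 1, 1) == '/') {
            // End tag: it must close the most recently opened element.
            if (count($stack) == 0 || array_pop($stack) != $name) {
                return false;
            }
        } elseif (substr($part, -2) != '/>') {
            // Start tag: remember it until its end tag appears.
            $stack[] = $name;
        }
        // Self-closing tags such as <br /> need no end tag, so they are not pushed.
    }
    // Well-formed only if every opened element was closed.
    return count($stack) == 0;
}
```

Because each tag is examined exactly once, this also avoids re-scanning nested content at every level of recursion.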