James Cassell's Blog

Sunday, May 13, 2007

Current Work on NHS Site

Over the past week, I have been working on a web-based management system for the NHS site. I am trying to make it as easy as possible to update and add pages to the site for whoever will maintain it when I am not there next year. To be completely honest, I don't know if anyone actually visits the site besides me; it has been very useful to me in that I can easily keep up with NHS activities.

I am going to store page data as XML. I was going to use SimpleXML to read back the data, but I don't have access to PHP 5, so I am essentially building my own XML parser. It should not be too difficult; I plan to store everything opaquely by running it through PHP's htmlspecialchars() function before storing it in the XML file.
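As a rough illustration of that storage idea, here is a minimal sketch. The element names ("page", "title", "body") and both function names are hypothetical, since the actual file layout is not settled here. It also relies on htmlspecialchars_decode(), which needs PHP 5.1 or later, so an older install would need a substitute for the decoding step.

```php
<?php
// Hypothetical layout: one <page> with <title> and <body> children.
function buildPageXml($title, $body) {
    // Encoding every field means the stored text never contains a literal "<".
    return "<page>\n"
         . ' <title>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . "</title>\n"
         . ' <body>' . htmlspecialchars($body, ENT_QUOTES, 'UTF-8') . "</body>\n"
         . "</page>\n";
}

// Hypothetical reader: pulls one field back out without a real XML parser.
function readPageField($xml, $field) {
    $name = preg_quote($field, '/');
    // Because the content is encoded, the first closing tag is the right one.
    if (!preg_match('/<' . $name . '>(.*?)<\/' . $name . '>/s', $xml, $m)) {
        return null;
    }
    return htmlspecialchars_decode($m[1], ENT_QUOTES);
}
```

Because the stored content has no markup of its own, the "parser" only has to find a handful of known wrapper elements, which is why a regular expression is enough in this sketch.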

The first thing that I am working on (before implementing the code that writes the XML file) is validating user input. Since the site is served as "application/xhtml+xml" to browsers that support it, I need to make sure that the input is well-formed and properly nested to avoid a draconian XML error, where the browser shows a parse error instead of the page (these errors are good in that they force authors to produce well-formed code). Checking for well-formedness was quite a challenge. I eventually came up with a slow, recursive function that checks the input. It isn't foolproof, but it is good enough for my purposes because I encode all HTML special characters to their corresponding entities in any markup that I don't recognize. I only recognize a predefined list of markup elements.
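The "encode whatever I don't recognize" step might look something like the sketch below. The whitelist contents and the function name are assumptions made for illustration, and this version only restores attribute-free tags; it does not check nesting, which is left to the validation function.

```php
<?php
// Hypothetical whitelist; the real list of recognized elements is not given.
$allowedElements = array('p', 'em', 'strong');

function encodeUnrecognizedMarkup($content, $allowed) {
    // Encode everything first, so nothing counts as markup by default.
    $encoded = htmlspecialchars($content, ENT_QUOTES, 'UTF-8');
    // Then restore only opening and closing tags whose names are whitelisted.
    // Matching is case-sensitive because XHTML element names are lowercase.
    $names = implode('|', array_map('preg_quote', $allowed));
    return preg_replace('/&lt;(\/?)(' . $names . ')&gt;/', '<$1$2>', $encoded);
}
```

With this approach an unknown element such as "script" stays visible as text on the page instead of being interpreted by the browser.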

I do hope that this site that I have spent much time on is actually used and appreciated.

In case anyone is interested, here is the function that I came up with to check if the input is well formed:

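function validateInput($content) {
 // Comments cannot contain elements, so strip them before checking.
 $content = preg_replace('/<!--.*?-->/s', '', $content);
 // Text without any "<" has no markup left to check.
 if (strpos($content, '<') === false) {
  return true;
 }
 // Match <name attr="value">...</name>; group 3 captures the element's content.
 $fullElements = preg_match_all(
  '/<([a-z][a-z0-9]*)(\s+[a-z][a-z0-9]*="[^"]*")*\s*>(.*?)<\/\1>/is',
  $content, $matches, PREG_PATTERN_ORDER);
 // A "<" is present, but no complete element was found.
 if ($fullElements == 0) {
  return false;
 }
 // The content of every matched element must itself be well formed.
 foreach ($matches[3] as $value) {
  if (!validateInput($value)) {
   return false;
  }
 }
 return true;
}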

Edit 27 May 2007 23:59:38 EDT: this function has a problem: it incorrectly identifies proper markup such as "<div>text<div>text</div>text</div>" as invalid. I am working on this problem.
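The failure appears to come from the non-greedy "(.*?)" in the pattern: it stops at the first "</div>", so the outer element is paired with the inner closing tag and the captured content "text<div>text" then fails the recursive check. One possible way around this, sketched below under a hypothetical name (isWellNested), is to drop the recursion and walk the tags in order while keeping a stack of open elements. This is only a sketch: it accepts lowercase element names and double-quoted attributes, and it does not check names against a whitelist.

```php
<?php
function isWellNested($content) {
    // Drop comments first, as validateInput() does.
    $content = preg_replace('/<!--.*?-->/s', '', $content);
    // Groups: 1 = "/" for a closing tag, 2 = element name,
    // 3 = attributes, 4 = "/" for a self-closing tag.
    $tagPattern = '/<(\/?)([a-z][a-z0-9]*)((?:\s+[a-z][a-z0-9]*="[^"<]*")*)\s*(\/?)>/';
    // Any "<" that is not part of a recognizable tag makes the input invalid.
    if (strpos(preg_replace($tagPattern, '', $content), '<') !== false) {
        return false;
    }
    preg_match_all($tagPattern, $content, $tags, PREG_SET_ORDER);
    $open = array();
    foreach ($tags as $tag) {
        list(, $closing, $name, $attributes, $selfClosing) = $tag;
        if ($closing === '/') {
            // A closing tag must be plain and match the innermost open element.
            if ($attributes !== '' || $selfClosing === '/' || array_pop($open) !== $name) {
                return false;
            }
        } elseif ($selfClosing !== '/') {
            // Remember an opening tag until its closing tag arrives.
            $open[] = $name;
        }
    }
    // Anything still on the stack was never closed.
    return count($open) === 0;
}
```

Since each tag is visited once, this also avoids the repeated rescanning that makes the recursive version slow.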
