PHP 5 includes a great built-in class, DOMDocument, for DOM parsing of HTML/XML documents. The class includes a number of methods for easily traversing a DOM.
However, it has a few shortcomings: it fails to handle encoding correctly, and it adds some tags that developers often find irritating.
Artem Russakovskii has written an extension of this class (named SmartDOMDocument) to eliminate these shortcomings. His class inherits from the built-in DOMDocument and adds a few extra methods that may make developers' lives easier.
Extra Features of SmartDOMDocument
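DOMDocument has an extremely badly designed “feature” where if the HTML code you are loading does not contain <html> and <body> tags, it adds them automatically (yup, in the PHP versions this class was written for there are no flags to turn this behavior off; the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD options only arrived with PHP 5.4).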
Thus, when you call $doc->saveHTML(), your newly saved content now has <html><body> and DOCTYPE in it. Not very handy when trying to work with code fragments (XML has a similar problem).
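To see the behavior for yourself, here is a minimal sketch using only the stock DOMDocument class. The fragment and variable names are made up for illustration.

```php
<?php
// Load a bare fragment that has no <html> or <body> of its own.
$doc = new DOMDocument();
$doc->loadHTML('<p>Hello</p>');

// saveHTML() serializes the whole document, so the DOCTYPE and the
// automatically added <html><body> wrapper show up around the fragment.
$output = $doc->saveHTML();
echo $output;
```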
SmartDOMDocument contains a new function called saveHTMLExact() which does exactly what you would want – it saves HTML without adding that extra garbage that DOMDocument does.
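If you want to get the same effect with plain DOMDocument, one possible approach is sketched below. This is not necessarily how saveHTMLExact() is implemented; save_html_fragment() is a hypothetical helper, and it assumes PHP 5.3.6 or later, where saveHTML() accepts a node argument.

```php
<?php
// Hypothetical helper (not part of SmartDOMDocument): serialize only the
// children of <body>, skipping the DOCTYPE and the <html><body> wrapper.
function save_html_fragment(DOMDocument $doc) {
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $html = '';
    foreach ($body->childNodes as $child) {
        // Passing a node makes saveHTML() dump just that subtree.
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
$doc->loadHTML('<p>Hello <b>world</b></p>');
$fragment = save_html_fragment($doc);
echo $fragment;
```

Walking the child nodes avoids string surgery on the serialized output, at the cost of requiring a PHP version where saveHTML() takes a node.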
DOMDocument notoriously doesn’t handle encoding (at least UTF-8) correctly and garbles the output.
SmartDOMDocument tries to work around this problem by enhancing loadHTML() to deal with encoding correctly. This behavior is transparent to you – just use loadHTML() as you would normally.
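For illustration, here is one common workaround for the encoding problem with plain DOMDocument: prepend a charset hint so the parser treats the input as UTF-8. This is a sketch under that assumption, not necessarily what SmartDOMDocument does internally, and load_utf8_html() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: prepend a <meta> charset hint so that the HTML
// parser decodes the input as UTF-8 instead of guessing the encoding.
function load_utf8_html(DOMDocument $doc, $html) {
    $hint = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
    return $doc->loadHTML($hint . $html);
}

$doc = new DOMDocument();
load_utf8_html($doc, '<p>Café</p>');

// DOM strings are UTF-8 in PHP, so the text should come back unchanged.
$text = $doc->getElementsByTagName('p')->item(0)->textContent;
echo $text, "\n";
```

The hint ends up in the document's <head>, so it does not show up if you only serialize the contents of <body>.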
SmartDOMDocument Object As String
You can use a SmartDOMDocument object as a string; doing so prints out its contents.
echo "Here is the HTML: $smart_dom_doc";
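String conversion like this is what PHP's __toString() magic method provides. The sketch below shows the mechanism on a hypothetical subclass, PrintableDOMDocument, which simply returns saveHTML(); the real SmartDOMDocument may return something different.

```php
<?php
// Hypothetical subclass showing how a DOMDocument can be made usable
// inside a string via the __toString() magic method.
class PrintableDOMDocument extends DOMDocument {
    public function __toString() {
        // Assumption: return the full serialized document.
        return $this->saveHTML();
    }
}

$doc = new PrintableDOMDocument();
$doc->loadHTML('<p>Hi</p>');

// Interpolating the object calls __toString() behind the scenes.
$message = "Here is the HTML: $doc";
echo $message;
```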
Requirements
- PHP 5
- DOMDocument, which is built into the PHP core.