All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
- Comments that start with `>` or `->` are now considered malformed in accordance with section 12.1.6 of the HTML specification. Comments may still contain the strings `<!--` or `--!>` and they may still end with `<!-` contrary to the specification.
Important: This is a release candidate, which means some features might not yet be stable or might exhibit unexpected behavior. Please don't hesitate to report broken or unstable features.
- Added a `README` file.
- Added a `composer` file.
- Added `.travis.yml` for automated unit tests with Travis-CI.
- Added the magic method `__debugInfo` to `HtmlDocument` and `HtmlNode` in order to reduce the memory footprint and to prevent recursion errors when using `print_r` and `var_dump`.
- Added the magic method `__call` to `HtmlDocument` and `HtmlNode` as a wrapper for deprecated methods using the lowercase calling convention (see below).
- Added unit tests `attribute_test.php`, `callback_test.php`, `debug_info_test.php`, `doctype_test.php`, `script_test.php`, `server_side_script_test.php`, `style_test.php` and `dom_manipulation_test.php`.
- Added and extended unit tests for `cdata_test.php` and `comment_test.php`.
- Added a new `Debug` class to inform users about deprecated functions, malformed documents and parsing issues.
- Added full support for `script` element parsing.
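
To illustrate the `__debugInfo` entry above, here is a minimal, self-contained sketch of how that magic method can keep debug output small for a tree whose nodes reference each other. The `ExampleNode` class and its properties are hypothetical and are not simplehtmldom's actual `HtmlNode` implementation.

```php
<?php
// Hypothetical tree node for illustration; not simplehtmldom's HtmlNode.
class ExampleNode
{
    public $tag;
    public $parent = null;
    public $children = [];

    public function __construct($tag, $parent = null)
    {
        $this->tag = $tag;
        $this->parent = $parent;
        if ($parent !== null) {
            // Parent and child now reference each other (a cycle).
            $parent->children[] = $this;
        }
    }

    // Used by var_dump() and print_r() (PHP 5.6+) instead of dumping
    // every property, so the parent/children cycle is never walked.
    public function __debugInfo()
    {
        return [
            'tag'      => $this->tag,
            'parent'   => $this->parent === null ? null : $this->parent->tag,
            'children' => count($this->children),
        ];
    }
}

$root  = new ExampleNode('div');
$child = new ExampleNode('span', $root);

// Prints a three-entry summary rather than the whole tree.
print_r($child);
```

The summary deliberately replaces object references with scalars (a tag name and a count), which is what keeps the output flat in this sketch.
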
- Renamed unit test `simple_html_dom_test.php` to `htmldocument_test.php`.
- Renamed unit test `simple_html_dom_node_test.php` to `htmlnode_test.php`.
- Changed the implementation of destructors for better garbage collection.
- Changed how literal elements (`script`, `style`, `cdata`, "comment" and `code`) are handled by `HtmlDocument`.
- `HtmlDocument::clear()` has been deprecated and will be removed in the next major version of simplehtmldom. Use `unset()` instead.
- `HtmlDocument::load_file()` has been deprecated and will be removed in the next major version of simplehtmldom. Use `HtmlDocument::loadFile()` instead.
- `HtmlNode::children()` has been deprecated and will be removed in the next major version of simplehtmldom. Use `HtmlNode::childNodes()` instead.
- `HtmlNode::first_child()` has been deprecated and will be removed in the next major version of simplehtmldom. Use `HtmlNode::firstChild()` instead.
- `HtmlNode::has_child()` has been deprecated and will be removed in the next major version of simplehtmldom. Use `HtmlNode::hasChild()` instead.
- `HtmlNode::last_child()` has been deprecated and will be removed in the next major version of simplehtmldom. Use `HtmlNode::lastChild()` instead.
- `HtmlNode::next_sibling()` has been deprecated and will be removed in the next major version of simplehtmldom. Use `HtmlNode::nextSibling()` instead.
- `HtmlNode::prev_sibling()` has been deprecated and will be removed in the next major version of simplehtmldom. Use `HtmlNode::previousSibling()` instead.
- Support for Smarty scripts has been deprecated and will be removed in the next major version of simplehtmldom.
- Support for server-side scripts has been deprecated and will be removed in the next major version of simplehtmldom.
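
The deprecations above pair each lowercase method name with a camelCase replacement, and the changelog mentions a `__call` wrapper for the old names. The following is a minimal sketch of how such a wrapper could forward deprecated calls. The `LegacyAwareNode` class, its mapping table and the `$notices` property are hypothetical and do not reproduce simplehtmldom's actual code.

```php
<?php
// Hypothetical stand-in for a DOM node; not simplehtmldom's actual HtmlNode.
class LegacyAwareNode
{
    // Deprecated lowercase name => camelCase replacement.
    private static $renamed = [
        'first_child'  => 'firstChild',
        'last_child'   => 'lastChild',
        'next_sibling' => 'nextSibling',
        'prev_sibling' => 'previousSibling',
    ];

    // Collected notices; a real wrapper might report them elsewhere.
    public $notices = [];

    public function firstChild()
    {
        return 'first';
    }

    public function lastChild()
    {
        return 'last';
    }

    // PHP invokes __call only when the requested method is not accessible.
    public function __call($name, $arguments)
    {
        if (isset(self::$renamed[$name]) && method_exists($this, self::$renamed[$name])) {
            $target = self::$renamed[$name];
            $this->notices[] = "$name() is deprecated, use $target() instead";
            return call_user_func_array([$this, $target], $arguments);
        }
        throw new BadMethodCallException("Call to undefined method $name()");
    }
}

$node = new LegacyAwareNode();
echo $node->first_child(), "\n"; // forwarded to firstChild()
echo $node->notices[0], "\n";
```

Calling code keeps working while the notices point to the new names, so migrating is a matter of replacing each lowercase call with its camelCase counterpart.
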
- Removed the `testcase/` folder as all tests are covered by unit tests inside `tests/`.
- Fixed a bug with boolean attributes that were incorrectly represented with a value of "1" when saving the DOM.
- Fixed a bug with comment and CDATA parsing that could cause an infinite loop if any of these elements contained `script`, `style`, `code`, server-side PHP or Smarty tags.
- Fixed a bug with comment and CDATA parsing that resulted in whitespace and newlines being removed when loading a document with `$stripRN = true` (default setting).
- Fixed a bug with attribute values that resulted in incorrectly encoded content when using `outertext()`, `innertext()` or `save()`.
- Fixed a bug with charset encoding that resulted in partially encoded documents depending on the use of `outertext()` and `innertext()` (#178).
- Fixed multiple bugs related to DOM manipulation when using `HtmlDocument::createElement()`, `HtmlDocument::createTextNode()` and `HtmlNode::appendChild()`.
Important: This is a release candidate, which means some features might not yet be stable or might exhibit unexpected behavior. Please don't hesitate to report broken or unstable features.
- Added unit tests
  - Added tests for whitespace handling.
  - Added tests for entity decoding.
  - Added tests for node functions after calling remove().
  - Added tests for `maxLen` in `file_get_html`.
  - Added tests for `simple_html_dom_node`.
  - Added tests for `HtmlWeb`.
  - Added test for bug #172
- Added optional argument `$trim = true` to `$node->text()`
- Added attribute value normalization
- Added automatic HTML entity decoding when loading documents [feature:#52]
- Added the negation pseudo-class
- Added `simple_html_dom::expect()`.
- Added `simple_html_dom_node::expect()`.
- Added the ability to parse CDATA sections.
- Added `HtmlWeb` to directly load webpages via cURL or fopen as DOM.
- Added `HtmlDocument`, `HtmlNode`, `HtmlWeb` and `constants` to namespace `simplehtmldom`.
- Added a new element type `HDOM_TYPE_CDATA` for CDATA sections.
- Added full support for parsing comments and CDATA sections.
- `simple_html_dom::doc` is now unset after loading the DOM.
- `simple_html_dom::restore_noise()` now clears restored elements.
- `simple_html_dom_node::_[HDOM_INFO_ENDSPACE]` now only exists if needed.
- `simple_html_dom_node::_[HDOM_INFO_SPACE]`
  - Now stores elements by attribute names.
  - Now only exists if needed (defaults to `array(' ', '', '')`).
- `simple_html_dom_node::_[HDOM_INFO_QUOTE]`
  - Now stores elements by attribute names.
  - Now only exists if needed (defaults to `HDOM_QUOTE_DOUBLE`).
- `simple_html_dom_node::text()` now supports all block and inline level elements.
- `simple_html_dom_node::text()` now skips empty block elements.
- `simple_html_dom_node::text()` now properly handles `&nbsp;` characters.
- `simple_html_dom_node::removeChild()` now removes all types of children.
- Increased `MAX_FILE_SIZE` from 0.6 MB (600000 Bytes) to 2.5 MiB (2621440 Bytes)
- `HDOM_INFO_INNER` (innertext) is now stored as part of the owning element.
- Moved and renamed `simple_html_dom` to `HtmlDocument`.
- Moved and renamed `simple_html_dom_node` to `HtmlNode`.
- Moved constants to `constants.php`
- Moved `HDOM_TYPE_*`, `HDOM_INFO_*` and `HDOM_QUOTE_*` constants into `HtmlNode`.
- Removed `/example/scraping/example_scraping_general.php`.
- Removed `/example/simple_html_dom_utility.php`.
- Removed `/app`.
- Removed `/testcase/reader`.
- Removed `simple_html_dom_node::tag_start`.
- Fixed fatal error when removing nodes from the DOM (#172)
- Fixed `simple_html_dom::parse()` to work after removing elements from the DOM.
- Fixed `simple_html_dom_node::text()` to properly handle UTF-8 characters.
- Fixed all scripts in the example folder.
- Fixed `file_get_html` to return false if the file size is larger than `maxLen`.
- Fixed a bug that caused the parser to convert UTF-8 to UTF-8 by mistake.
- Fixed `simple_html_dom::loadFile` to properly forward arguments to `simple_html_dom::load_file`.
- Fixed handling of optional closing tags to end on the last element.
- Fixed broken support for `text` nodes when using `find` (#175).
- Added unit test for bug reports
- Added unit test for character sets UTF-8, CP1251 and CP1252 (#142)
- Added support for meta charset to parse_charset
- Added detection for CP1251 to parse_charset, using iconv
- Added LICENSE file (MIT) to the project root
- Added functions to `simple_html_dom_node`
  - `remove`: Removes the current node recursively from the DOM tree
  - `removeChild`: Removes a child node recursively from the DOM tree
  - `hasClass`: Checks if the current node has the specified class name
  - `addClass`: Adds one or more classes to the current node
  - `removeClass`: Removes one or more classes from the current node
  - `save`: Saves the current node to disk
- Changed manual from custom implementation to MkDocs (https://www.mkdocs.org/)
- Fixed warning when trying to clear() the DOM on a null nodes list (#153)
- Fixed missing whitespace when returning plaintext (#163)
- Fixed broken detection of duplicate attributes (#166)
- Fixed broken detection of CP1252 (ISO-8859-1) documents (#142)
- Fixed error using next-sibling combinator ('E + F') on last child
- Fixed selector parsing for attribute selectors ending on "s" or "i" (#169)
- Fixed various bugs related to parsing classes and ids
- Added documentation for `simple_html_dom_node::find`
- Added documentation for `simple_html_dom_node::parse_selector`
- Added documentation for `simple_html_dom_node::seek`
- Added documentation for `simple_html_dom_node::match`
- Added unit tests for bug reports
- Added unit tests for CSS selectors
- Added ability to define constants before simple_html_dom does
  - 'DEFAULT_TARGET_CHARSET'
  - 'DEFAULT_BR_TEXT'
  - 'DEFAULT_SPAN_TEXT'
  - 'MAX_FILE_SIZE'
- Added support for CSS combinators
  - Added support for Child Combinator (`>`)
  - Added support for Next Sibling Combinator (`+`)
  - Added support for Subsequent Sibling Combinator (`~`)
- Added support for multiclass selectors (`.class.class.class`)
- Added support for multiattribute selectors (`[attr1][attr2][attribute3]`)
- Added support for attribute selectors
  - Added support for pipe selectors (`|=`)
  - Added support for tilde selectors (`~=`)
  - Added support for case sensitivity selectors (`i` and `s`)
- Added unit tests for PHP compatibility to PHP 5.6+
- Added coding standard using PHP_CodeSniffer
- Removed automatic filtering of 'tbody' selectors (#79)
Remove 'tbody' from all selectors to maintain the previous state!
- Coding standard using PHP_CodeSniffer
- Fixed broken CSS selector attributes with value "0" (#62)
- Fixed broken simple_html_dom::load_file
- Fixed forward slashes in CSS selectors breaking value matching using '*=' (#144)
- Fixed Universal Selectors
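
The entry above about defining constants before simple_html_dom does can be pictured with the common "define only if not yet defined" guard. This is a sketch under assumptions: the guard pattern and the application value are illustrative, and only the constant names and the 600000 byte figure appear in this changelog.

```php
<?php
// Application code: choose a custom limit before the library is loaded.
// The value 1000000 is an arbitrary example.
define('MAX_FILE_SIZE', 1000000);

// Library side (assumed pattern): only define constants that the
// application has not defined already.
defined('MAX_FILE_SIZE') || define('MAX_FILE_SIZE', 600000);
defined('DEFAULT_BR_TEXT') || define('DEFAULT_BR_TEXT', "\r\n");

// The application's value wins; the untouched constant gets the fallback.
echo MAX_FILE_SIZE, "\n";
echo json_encode(DEFAULT_BR_TEXT), "\n";
```

Because PHP constants cannot be redefined, the order matters: the application has to define its values before the library file is included.
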
- Added code documentation to improve readability
- Added unit tests for `simple_html_dom::$self_closing_tags`
- Added unit tests for `simple_html_dom::$optional_closing_tags`
- Added unit tests for bug reports
- Added unit tests for memory management of the parser
- Added bit flags to `simple_html_dom::load()`
  - Added bit flag `HDOM_SMARTY_AS_TEXT` to optionally filter Smarty scripts (#154)
    Note: Smarty scripts are no longer filtered by default!
- Added build script to automate releases
- Added support for attributes without whitespace to separate them
- Improved documentation and readability for `$self_closing_tags`
- Improved documentation and readability for `$block_tags`
- Improved documentation and readability for `$optional_closing_tags`
- Updated list of `simple_html_dom::$self_closing_tags`
  - Removed 'spacer' (obsolete)
  - Added 'area'
  - Added 'col'
  - Added 'meta'
  - Added 'param'
  - Added 'source'
  - Added 'track'
  - Added 'wbr'
- Updated list of `simple_html_dom::$optional_closing_tags`
  - Removed "nobr" (obsolete)
  - Added 'th' as closable element to 'td'
  - Added 'td' as closable element to 'th'
  - Added 'optgroup' with 'optgroup' and 'option' as closable elements
  - Added 'optgroup' as closable element to 'option'
  - Added 'rp' with 'rp' and 'rt' as closable elements
  - Added 'rt' with 'rt' and 'rp' as closable elements
- Clarified meaning of `simple_html_dom->parent`
- Changed default `$offset` for `file_get_html()` from -1 to 0 (#161)
- Changed `simple_html_dom::load()` to remove script tags before replacing newline characters
- `simple_html_dom_node::text()` no longer adds whitespace to top level span elements (only to sub-elements)
- `simple_html_dom_node::text()` adds blank lines between paragraphs
- Normalized line endings in the repository to LF via `.gitattributes`
- Improved performance of `simple_html_dom::parse_charset()` by approximately 25%
- Improved performance of `simple_html_dom::parse()` by approximately 10%
- `str_get_html()` is deprecated and should be replaced by `new simple_html_dom()`
- Removed protected function `simple_html_dom::copy_until_char_escaped()`
- Fixed compatibility issues with PHP 7.3
- Fixed typo (#147)
- Fixed handling of incorrectly escaped text (#160)
- Restored functionality of `$maxLen` in `file_get_html()`
- Fixed load_file breaking if an error occurred in another script
- Added some ability to insert and create nodes
- Add ability to search the "noise" array
- Added flag: LOCK_EX while calling "file_put_contents()"
- Added support for detecting the source html character set. This is used to convert characters when plaintext is requested.
- Other little fixes and features, too numerous to categorize
- Error of "file_get_contents()" will be thrown as an exception
- Fixed the typo of "token_blank_t"
- Memory leak fixed
- Supports xpath generated from Firebug
- New method "dump" of "simple_html_dom_node"
- New attribute "xmltext" of "simple_html_dom_node"
- Removed preg_quote from the selector match function: `[attribute*=value]`
- Element "Comment" is now treated as a child
- Fixed the problem with `<pre>`
- Fixed bug #2207477 (does not load some pages properly)
- Fixed bug #2315853 (Error with character after < sign)
- Negative index support in the "find" method, thanks to Vadim Voituk
- Constructor now automatically loads contents from either text or file/URL, thanks to Antcs
- Fully supports wildcard in selectors
- Fixed bug where the parser was confused by the < symbol inside the text
- Fixed bug of dash in selectors
- Fixed bug of `<nobr>`
- Fixed bug #2155883 (Nested List Parses Incorrectly)
- Fixed bug #2155113 (error with unclosed html tags)
- New method "getAllAttributes" of "simple_html_dom_node"
- Supports full JavaScript string in selector: `$e->find("a[onclick=alert('hello')]")`
- Changed selector "*=" to case-insensitive
- Fixed the bug of selector in some critical conditions
- Fixed the bug of stripping php tags
- Fixed the bug of remove_noise()
- Fixed the bug of noise in attributes
- Performance tuning (boost 10%)
- Memory requirement reduced by 25%
- Changed function name from "file_get_dom()" to "file_get_html()"
- Changed function name from "str_get_dom()" to "str_get_html()"
- Fixed bug #2011286 (Error with unclosed html tags)
- Fixed bug #2012551 (Error parsing divs)
- Fixed bug #2020924 (Error for missed tag)
- Fixed bug (problem with `<body>` tag's innertext)
- Supports "multiple class" selector feature: `<div class="a b c"></div>`
- New "callback function" feature
- New "multiple selectors" feature: $dom->find('p,a,b')
- New examples
- Supports extract contents from HTML features: $dom->plaintext
- Performance tuning (boost 20%)
- Changed simple_html_dom_node method name from "text()" to "makeup()"
- Fixed the bug of $dom->clear()
- Fixed the bug of text nodes' innertext
- Fixed the bug of comment nodes' innertext
- Fixed the bug of descendant selector with optional tags
- New node type "comment" (eg. $dom->find('comment'))
- Add self-closing tags: 'base', 'spacer'
- New example "simple_html_dom_utility.php"
- File and class name changed (html_dom_parser->simple_html_dom)
- ($dom->save_file) is no longer supported
- Remove example "example_customize_parser.php"
- Fixed the bug of outertext (th)
- Fixed the bug of regular expression escaping chars ($dom->find)
- Fixed the bug with line breaks and "\t" in tags
- Reference section in manual
- Added traverse section in manual
- Added a solution for servers behind a proxy to the FAQ (Thanks to Yousuke Shaggy)
- New method to remove attribute.
- New DOM operations(first_child, last_child, next_sibling, previous_sibling) (Request #1936000)
- Now file_get_dom supports full file_get_contents parameters
- Fixed the bug of self-closing tags in the end of file
- Fixed the bug of blanks in the end of tag
- Fixed some typos in the testcase
- Supports tag name with namespace
- New attribute filters (Thanks to Yousuke Kumakura)
- Refine structure of testcase
- Fix the bug of optional-closing tags
- Fix the bug of parsing the line break next to the tag's name
- Add FAQ section in manual
- Fixed infinite loop when the source content is bad HTML
- Fixed the bug of adding new attributes to self closing tags
- Fixed the bug of customize parser without $dom->remove_noise()