PHP DOM textContent vs nodeValue?

PHP DOMNode objects have both a textContent and a nodeValue property, and both seem to hold the innerHTML of the node.

nodeValue: The value of this node, depending on its type

textContent: This attribute returns the text content of this node and its descendants.

What is the difference between these two properties? When is it proper to use one instead of the other?


Solution 1:

I finally wanted to know the difference as well, so I dug into the source and found the answer; in most cases there will be no discernible difference, but there are a bunch of edge cases you should be aware of.

Both ->nodeValue and ->textContent are identical for the following classes (node types):

  • DOMAttr
  • DOMText
  • DOMElement
  • DOMComment
  • DOMCharacterData
  • DOMProcessingInstruction

The ->nodeValue property yields NULL for the following classes (node types):

  • DOMDocumentFragment
  • DOMDocument
  • DOMNotation
  • DOMEntity
  • DOMEntityReference

The ->textContent property is non-existent for the following classes:

  • DOMNameSpaceNode (not documented, but can be found with the XPath expression //namespace::*)

The ->nodeValue property is non-existent for the following classes:

  • DOMDocumentType

See also: dom_node_node_value_read() and dom_node_text_content_read()
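
As a quick check of the first two lists, here is a minimal sketch that compares the two properties on a few node types. The XML string and its element and attribute names are made up for illustration, and only some of the listed classes are covered.

```php
<?php
// Illustrative document; the names "root", "attr" and "child" are hypothetical.
$doc = new DOMDocument();
$doc->loadXML('<root attr="a">text<child>inner</child></root>');
$root = $doc->documentElement;

// Node types for which both properties are expected to match.
$nodes = [
    'DOMAttr'    => $root->getAttributeNode('attr'),
    'DOMText'    => $root->firstChild,
    'DOMElement' => $root,
];

foreach ($nodes as $label => $node) {
    printf(
        "%-11s nodeValue=%s textContent=%s\n",
        $label,
        var_export($node->nodeValue, true),
        var_export($node->textContent, true)
    );
}

// Node types for which nodeValue is NULL.
$fragment = $doc->createDocumentFragment();
$fragment->appendXML('<a>frag</a>');

var_dump($doc->nodeValue);      // NULL
var_dump($fragment->nodeValue); // NULL
```

If you need the text of a document or fragment, textContent is the property to read, since nodeValue carries nothing for those node types.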

Solution 2:

Hope this will make sense:

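// loadXML() is an instance method; calling it statically fails on PHP 8.
$doc = new DOMDocument();
$doc->loadXML('<body><!-- test --><node attr="test1">old content<h1>test</h1></node></body>');
var_dump($doc->textContent);              // document node: text of all descendants
var_dump($doc->nodeValue);                // document node: no value of its own
var_dump($doc->firstChild->textContent);  // <body> element
var_dump($doc->firstChild->nodeValue);    // <body> element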

Output:

string(15) "old contenttest"
NULL
string(15) "old contenttest"
string(15) "old contenttest"

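Because: nodeValue is "the value of this node, depending on its type". A DOMDocument node has no value of its own, so its nodeValue is NULL, while textContent returns the concatenated text of its descendants. For a DOMElement such as <body>, both properties return the same string.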
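
To see the "depending on its type" part more concretely, the following sketch walks the direct children of <body> in the same document and prints what each one reports. It is only an illustration. Note that the comment has its own nodeValue, yet its text does not appear in the textContent of the surrounding element.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<body><!-- test --><node attr="test1">old content<h1>test</h1></node></body>');

$body = $doc->documentElement;

// Each child reports a value according to its own node type.
foreach ($body->childNodes as $child) {
    printf("%s: nodeValue=%s\n", get_class($child), var_export($child->nodeValue, true));
}
// DOMComment: nodeValue=' test '
// DOMElement: nodeValue='old contenttest'

// The comment's text is not part of the element's text content.
var_dump($body->textContent); // string(15) "old contenttest"
```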