Scala - modifying nested elements in xml
All this time, and no one actually gave the most appropriate answer! Now that I have learned of it, though, here's my new take on it:
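import scala.xml._
import scala.xml.transform._
// t1 replaces the content of any <version> element it is handed with "2"
object t1 extends RewriteRule {
  override def transform(n: Node): Seq[Node] = n match {
    case Elem(prefix, "version", attribs, scope, _*) =>
      // false is minimizeEmpty; the 5-argument Elem(...) is deprecated since Scala 2.10
      Elem(prefix, "version", attribs, scope, false, Text("2"))
    case other => other
  }
}
// rt1 applies t1 to every node of the tree it is given
object rt1 extends RuleTransformer(t1)
// t2 hands each <subnode> element (and only those) over to rt1
object t2 extends RewriteRule {
  override def transform(n: Node): Seq[Node] = n match {
    case sn @ Elem(_, "subnode", _, _, _*) => rt1(sn)
    case other => other
  }
}
object rt2 extends RuleTransformer(t2)
// InputXml stands for the input document (the XML from the question)
rt2(InputXml)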
Now, for a few explanations. The class RewriteRule is abstract. It defines two methods, both called transform: one of them takes a single Node, the other a Seq of Node. Because it's an abstract class, we can't instantiate it directly. By adding a definition, in this case overriding one of the transform methods, we create a concrete subclass of it (here a singleton object; new RewriteRule { ... } would create an anonymous one). Each RewriteRule need only concern itself with a single task, though it can handle many.
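A minimal sketch of the anonymous-subclass route, assuming Scala 2.13 with the scala-xml module on the classpath (it ships with Scala up to 2.10 and is a separate dependency from 2.11 on). The rule name bumpVersion and the sample elements are made up for illustration. It also shows that a RewriteRule used on its own only looks at the node it is handed; descending into children is what RuleTransformer adds.

```scala
import scala.xml._
import scala.xml.transform._

object RewriteRuleSketch {
  // Anonymous subclass of RewriteRule: overriding the single-Node
  // transform is enough to get a usable rule.
  val bumpVersion: RewriteRule = new RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case Elem(prefix, "version", attribs, scope, _*) =>
        // false is minimizeEmpty (the 5-argument Elem(...) is deprecated)
        Elem(prefix, "version", attribs, scope, false, Text("2"))
      case other => other
    }
  }

  // Handed a <version> directly, the rule rewrites it...
  val direct = bumpVersion.transform(<version>1</version>)

  // ...but it does not look inside children, so this nested one is untouched.
  val nested = bumpVersion.transform(<subnode><version>1</version></subnode>)
}
```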
Next, class RuleTransformer takes as parameters a variable number of RewriteRule instances. Its transform method takes a Node and returns a Seq of Node: it recurses through the node's children and applies each and every RewriteRule used to instantiate it, so the rules reach every level of the tree.
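To illustrate, here is a sketch of one RuleTransformer built from two rules, under the same assumptions (Scala 2.13, scala-xml on the classpath). Both rules, bumpVersion and renameSubnode, and the new element name section are hypothetical. Because RuleTransformer recurses through the tree, each rule reaches elements at any depth.

```scala
import scala.xml._
import scala.xml.transform._

object RuleTransformerSketch {
  // Hypothetical rule: replaces the content of every <version> with "2".
  object bumpVersion extends RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case Elem(prefix, "version", attribs, scope, _*) =>
        Elem(prefix, "version", attribs, scope, false, Text("2"))
      case other => other
    }
  }

  // Hypothetical rule: renames <subnode> to <section>, keeping
  // attributes and children as they are.
  object renameSubnode extends RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case e: Elem if e.label == "subnode" => e.copy(label = "section")
      case other => other
    }
  }

  // Variable number of rules: both are applied to every node in the tree.
  val both = new RuleTransformer(bumpVersion, renameSubnode)

  val input = <root><subnode><a><version>1</version></a></subnode></root>
  val output: Node = both(input)
}
```

Each rule stays focused on one task, and the transformer takes care of walking the tree.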
Both classes derive from BasicTransformer, which defines a few methods one need not worry about at a higher level. Its apply method calls transform and returns the resulting single Node, though, so both RuleTransformer and RewriteRule can use the syntactic sugar associated with it. In the example, the former does (rt1(sn)) and the latter does not.
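A small sketch of the difference between calling transform and using the apply sugar, under the same assumptions; the rule and the sample document are illustrative only. transform hands back a sequence of nodes, while rt(doc), which expands to rt.apply(doc), returns a single Node.

```scala
import scala.xml._
import scala.xml.transform._

object ApplySketch {
  object bumpVersion extends RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case Elem(prefix, "version", attribs, scope, _*) =>
        Elem(prefix, "version", attribs, scope, false, Text("2"))
      case other => other
    }
  }

  val rt = new RuleTransformer(bumpVersion)
  val doc = <subnode><version>1</version></subnode>

  // transform returns a sequence of nodes...
  val viaTransform = rt.transform(doc)

  // ...while rt(doc), i.e. rt.apply(doc), returns a single Node.
  val viaApply: Node = rt(doc)

  // A RewriteRule inherits the same apply, but on its own it does not
  // recurse, so the nested <version> keeps its old content.
  val ruleOnly: Node = bumpVersion(doc)
}
```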
Here we use two levels of RuleTransformer: the outer one (rt2, built from t2) acts as a filter on higher-level nodes, picking out subnode elements, and the inner one (rt1, built from t1) applies the change to whatever passes the filter.
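For a version you can run end to end, here is the same two-level setup applied to a made-up input document (the question's InputXml is not reproduced here), again assuming Scala 2.13 with scala-xml. The version element directly under root is outside any subnode, so the outer filter leaves it alone, while the one nested two levels below subnode is rewritten.

```scala
import scala.xml._
import scala.xml.transform._

object TwoLevelSketch {
  // Inner level: rewrite any <version> it sees.
  object t1 extends RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case Elem(prefix, "version", attribs, scope, _*) =>
        Elem(prefix, "version", attribs, scope, false, Text("2"))
      case other => other
    }
  }
  object rt1 extends RuleTransformer(t1)

  // Outer level: only <subnode> subtrees are passed to rt1.
  object t2 extends RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case sn @ Elem(_, "subnode", _, _, _*) => rt1(sn)
      case other => other
    }
  }
  object rt2 extends RuleTransformer(t2)

  // Hypothetical input document.
  val input =
    <root>
      <version>1</version>
      <subnode><deeper><version>1</version></deeper></subnode>
    </root>

  val output: Node = rt2(input)
}
```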
The extractor Elem is also used, so there is no need to concern oneself with details such as namespaces or whether there are attributes or not. Note that the content of the version element is completely discarded and replaced with 2. It can be matched against too, if needed.
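If only some contents should be replaced, the child can be matched inside the same extractor. A sketch under the same assumptions; the rule name bumpOnlyV1 and the build attribute are hypothetical. It rewrites a version element only when its whole content is the text 1, and because prefix, attributes and scope are passed straight through, the attributes survive the rewrite.

```scala
import scala.xml._
import scala.xml.transform._

object ContentMatchSketch {
  // Matches a <version> whose only child is the text "1";
  // any other <version> falls through to `other`.
  object bumpOnlyV1 extends RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case Elem(prefix, "version", attribs, scope, Text("1")) =>
        Elem(prefix, "version", attribs, scope, false, Text("2"))
      case other => other
    }
  }

  val rt = new RuleTransformer(bumpOnlyV1)

  val input =
    <root><version build="a">1</version><version build="b">5</version></root>
  val output: Node = rt(input)
}
```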
Note also that the last parameter of the extractor is _*, and not _. That means these elements can have any number of children; with a plain _, the pattern only matches an element with exactly one child. If you forget the *, the match may fail. In the example, the match would not fail if there were no whitespace, but whitespace is translated into Text nodes, so a single whitespace under subnode gives it an extra child and would cause the match to fail.
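The effect of _ versus _* is easy to check in isolation. This sketch (same assumptions; the helper names matchesSingle and matchesAny are made up) parses the same subnode with and without indentation using XML.loadString, which keeps whitespace as Text children.

```scala
import scala.xml._

object WildcardSketch {
  // `_` as the last pattern: the element must have exactly one child.
  def matchesSingle(n: Node): Boolean = n match {
    case Elem(_, "subnode", _, _, _) => true
    case _                           => false
  }

  // `_*` as the last pattern: any number of children, including zero.
  def matchesAny(n: Node): Boolean = n match {
    case Elem(_, "subnode", _, _, _*) => true
    case _                            => false
  }

  val compact = XML.loadString("<subnode><version>1</version></subnode>")

  // The newlines and indentation become Text nodes around <version>.
  val indented = XML.loadString("<subnode>\n  <version>1</version>\n</subnode>")
}
```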
This code is bigger than the other suggestions presented, but it has the advantage of relying on much less knowledge of the structure of the XML than the others. It changes any element called version that is below an element called subnode (no matter how many levels down), regardless of namespaces, attributes, etc.
Furthermore... well, if you have many transformations to do, recursive pattern matching quickly becomes unwieldy. Using RewriteRule and RuleTransformer, you can effectively replace XSLT files with Scala code.
You can use Lift's CSS Selector Transforms and write:
"subnode" #> ("version *" #> 2)
See http://stable.simply.liftweb.net/#sec:CSS-Selector-Transforms
I think the original logic is good. This is the same code with (shall I dare to say?) a more Scala-ish flavor:
import scala.xml._
def updateVersion( node : Node ) : Node = {
  // recursively apply updateVersion to every child node
  def updateElements( seq : Seq[Node]) : Seq[Node] =
    for( subNode <- seq ) yield updateVersion( subNode )
  node match {
    case <root>{ ch @ _* }</root> => <root>{ updateElements( ch ) }</root>
    case <subnode>{ ch @ _* }</subnode> => <subnode>{ updateElements( ch ) }</subnode>
    case <version>{ contents }</version> => <version>2</version>
    case other @ _ => other
  }
}
It looks more compact (but is actually the same :) )
- I got rid of all the unnecessary brackets
- If a bracket is needed, it starts on the same line
- updateElements just defines a var and returns it, so I got rid of that and returned the result directly
If you want, you can get rid of updateElements too. You want to apply updateVersion to every element of the sequence, and that's exactly what the map method does. With it, you can rewrite the line
case <subnode>{ ch @ _* }</subnode> => <subnode>{ updateElements( ch ) }</subnode>
with
case <subnode>{ ch @ _* }</subnode> => <subnode>{ ch.map(updateVersion (_)) }</subnode>
As updateVersion takes only one parameter, you can also drop the placeholder (Scala eta-expands the method into a function automatically) and write:
case <subnode>{ ch @ _* }</subnode> => <subnode>{ ch.map(updateVersion) }</subnode>
And end with:
import scala.xml._
def updateVersion( node : Node ) : Node = node match {
  case <root>{ ch @ _* }</root> => <root>{ ch.map(updateVersion) }</root>
  case <subnode>{ ch @ _* }</subnode> => <subnode>{ ch.map(updateVersion) }</subnode>
  case <version>{ contents }</version> => <version>2</version>
  case other @ _ => other
}
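Under the same assumptions (Scala 2.13, scala-xml on the classpath), here is the final version exercised on a small hypothetical document. One caveat worth knowing: XML literal patterns such as <subnode>{ ch @ _* }</subnode> match regardless of attributes, and the rebuilt <subnode> literal has none, so this approach drops attributes on the elements it reconstructs; the check on the id attribute shows that.

```scala
import scala.xml._

object PatternSketch {
  def updateVersion(node: Node): Node = node match {
    case <root>{ ch @ _* }</root>       => <root>{ ch.map(updateVersion) }</root>
    case <subnode>{ ch @ _* }</subnode> => <subnode>{ ch.map(updateVersion) }</subnode>
    case <version>{ contents }</version> => <version>2</version>
    case other @ _                       => other
  }

  // Hypothetical input; the id attribute is there to show it gets lost.
  val input = <root><subnode id="s1"><version>1</version></subnode></root>
  val output: Node = updateVersion(input)
}
```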
What do you think?