XSLT in Groovy: Quick and Dirty

February 8, 2013

As soon as you start writing Extensible Stylesheet Language Transformations (XSLT) you’ll start to wonder if there is an alternative to the coarse language. Even after you’ve written many XSLT files and the sharpness of the angled brackets and other ugly XML syntax has dulled, you may still wish for the relaxed style of indentations and semi-colons in your favorite programming language. A natural question with a popularly sought answer for XSLT programmers is “is there a better syntax to be had that matches the utility of XSLT?”

Do some Google research and you’ll find that this is a common question, and one that programmers have answered in ways ranging from disgust to gusto, with everything from pointless rants on StackOverflow to constructive solutions in the form of new languages and frameworks. For all the questioning and the attempts at innovation, XSLT remains a powerful tool for its intended purpose despite its apparent flaws. Groovy, however, does offer some features that replace common XSLT tasks and make for smoother reading too. This article demonstrates the features of XSLT that can easily be replaced by Groovy.

Where to Begin?

The Groovy language is not a perfect replacement for XSLT. XSLT is a useful language that is well suited to its specific purpose and, despite the desire of many to replace it, has not been uprooted by any alternatives. So what I’m showing you in this article is not a way to discard XSLT but a way to reduce the amount of XSLT you have to write if you do simple transformations. Use Groovy only as long as it makes your work easier. If you find it difficult to get a transformation to work in Groovy, look again at what XSLT offers. It will probably have a feature that solves your complex problem in a way that Groovy does not.

Admonishment administered, let’s roll this article out as a set of questions and answers. If you’ve done even a few XSLT programs before, you’ll have a grasp on the basics of that language. If you’ve done even a few Groovy programs, you’ll have a grasp on the basics of that language too. And here is the sweet spot for Groovy. It does the common, basic work of an XSLT program but with Groovy’s flair for unceremonious readability and conciseness. The rest of the article unceremoniously lays out the common XSLT tasks to be done and concisely demonstrates how to do them with Groovy.

What is the “XSLT Processor” in Groovy?

XSLT is an interpreted language that requires a processing engine to figure out how to interpret the code and perform the transformation. This processing includes opening the input XML file, performing the encoded transformations and then outputting the result to the console or another file. Listing WAT-1 shows an Ant script that invokes an XSLT transformation using the popular Saxon XSLT processor. There are other ways to call the processor, such as through a command line call, but you are always dealing with your chosen XSLT processor. So where is the XSLT processor in Groovy?

The answer is the Groovy language, mostly, with support from some classes in the standard Groovy API. The front end of the processor is the XmlSlurper and GPathResult classes. Using XmlSlurper to parse an XML file results in a GPathResult object, which is a convenient way of navigating the XML structure. The back end is the MarkupBuilder class, which is designed to make outputting XML and HTML easy on the eye for programmers.

<?xml version="1.0" encoding="UTF-8"?>
<project name="XSLTFromAntExample">
    <path id="saxon9.classpath" location="D:\\saxon9he.jar"/>

    <!-- Minimal clean target so that depends="clean" below resolves -->
    <target name="clean">
        <delete dir="output"/>
    </target>

    <target name="transform" depends="clean">
        <!-- Transform every file under input/ with ArticleTransform.xsl -->
        <xslt destdir="output"
              extension=".xml"
              failOnTransformationError="false"
              processor="trax"
              style="ArticleTransform.xsl"
              useImplicitFileset="false"
              classpathref="saxon9.classpath"
        >
            <outputproperty name="method" value="xml"/>
            <outputproperty name="indent" value="yes"/>
            <fileset dir="input"/>
            <!-- Select Saxon as the TrAX implementation -->
            <factory name="net.sf.saxon.TransformerFactoryImpl"/>
        </xslt>
    </target>
</project>

Listing WAT-1: XSLT transformation called from an Ant script

Why Don’t I Show You An Example?

As an example, Listing WAT-2 provides the steps required to parse the XML file in Listing WAT-3 and obtain a GPathResult object.

def xml = new XmlSlurper().parse(new File("example.xml"))

Listing WAT-2: Obtaining a GPathResult object

<?xml version="1.0" encoding="UTF-8"?>
    <article>
         <front>
             <journal-meta>
                 <publisher>
                    <publisher-name name-type="full">Nick's Academic Journal</publisher-name>
                </publisher>
            </journal-meta>
            <article-meta>
                <pub-date pub-type="ppub">
                    <day>9</day>
                    <month>6</month>
                    <year>2010</year>
                </pub-date>
                <pub-date pub-type="epub">
                    <day>1</day>
                    <month>6</month>
                    <year>2010</year>
                </pub-date>
                <volume>39</volume>
                <fpage>265</fpage>
                <lpage>289</lpage>
            </article-meta>
        </front>
        <body>
        ...
        </body>
    </article>

Listing WAT-3: A sample of an XML file for an academic journal article

Oops, I guess that was just one step. So now you have a representation of the entire XML structure in memory, held in the xml variable. With convenient and powerful GPath expressions you can walk the XML structure as easily as with XPath and use them to perform your transform.
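To make the walking concrete, here is a minimal sketch that parses a cut-down copy of Listing WAT-3 from a string, so it runs without a file on disk. The inline sample and the variable names are my own choices for illustration. I am also assuming Groovy 3 or later, where XmlSlurper lives in the groovy.xml package; on older versions it is in groovy.util and needs no import.

```groovy
import groovy.xml.XmlSlurper

// Inline, cut-down copy of the sample article so the snippet is self-contained
def sample = '''
<article>
  <front>
    <article-meta>
      <volume>39</volume>
      <fpage>265</fpage>
      <lpage>289</lpage>
    </article-meta>
  </front>
</article>
'''

// parseText works like parse, but takes the XML as a String
def xml = new XmlSlurper().parseText(sample)

// Walk from the root element (article) down to the metadata
def meta = xml.front.'article-meta'

// text() returns the element content as a String
def volume = meta.volume.text()

// toInteger() converts the element content to a number
def firstPage = meta.fpage.toInteger()
def lastPage = meta.lpage.toInteger()
def pageCount = lastPage - firstPage + 1

println "Volume ${volume}, pages ${firstPage}-${lastPage} (${pageCount} pages)"
```

Every step in the path is just a property access, so intermediate results such as meta can be stored and reused like any other variable.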

The MarkupBuilder class is the output side of the transform processing in Groovy. With this class you can almost literally write out your output XML with Groovy in between to fill in the dynamic bits. Listing WAT-4 expands on Listing WAT-2 and generates a simple XML structure.

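import groovy.xml.MarkupBuilder

// Parse the input XML into a GPathResult
def xml = new XmlSlurper().parse(new File("example.xml"))
// With no arguments, MarkupBuilder writes to standard output
def builder = new MarkupBuilder()
builder.ArticleSet {
  Article {
    Journal {
      // Each GPath supplies the text content of the new tag
      Volume(xml.front."article-meta".volume)
      Issue(xml.front."article-meta".issue)
    }
  }
}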

Listing WAT-4: A simple example of using the MarkupBuilder class

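Listing WAT-5 shows the XML that is output from running the code in Listing WAT-4. The values depend on the input file: the excerpt in Listing WAT-3 has a volume of 39 and no issue element, so it would not produce the 1 and 2 shown here.

<ArticleSet>
  <Article>
    <Journal>
      <Volume>1</Volume>
      <Issue>2</Issue>
    </Journal>
  </Article>
</ArticleSet>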

Listing WAT-5: XML output from the previous listing’s code

I could talk about how this works with closures and all, but that would just confuse the issue. It’s plain to see from Listing WAT-4 both how to build your XML output from the parsed XML and why it is more attractive to look at than XSLT. However, I will explain one thing. Notice that to get the value for the Volume and Issue tags, I had to access properties on the xml object (of type GPathResult). That chain of properties is a GPath. xml holds the in-memory XML representation parsed from the example.xml file and I use dot notation to “walk” the XML structure. I could get the same value with the XPath expression /article/front/article-meta/volume. The one difference to notice here is that the GPath begins with a child of the root tag, because the xml variable already represents the root tag itself, which is article in this case. That’s why article is not in the GPath expression.
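If you want to convince yourself that the GPath and the XPath really land on the same node, a quick side-by-side check is possible with the XPath support that ships with the JDK. This is only a sketch under my own assumptions: the one-line XML string stands in for the sample file, and the import of groovy.xml.XmlSlurper assumes Groovy 3 or later.

```groovy
import groovy.xml.XmlSlurper
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathFactory

// Stand-in for the sample file, reduced to the one value we care about
def source = '<article><front><article-meta><volume>39</volume></article-meta></front></article>'

// GPath: start at the root element (article) and walk down with dots
def root = new XmlSlurper().parseText(source)
def viaGPath = root.front.'article-meta'.volume.text()

// XPath: parse into a DOM document and evaluate the full path from the root
def docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
def doc = docBuilder.parse(new ByteArrayInputStream(source.getBytes('UTF-8')))
def xpath = XPathFactory.newInstance().newXPath()
def viaXPath = xpath.evaluate('/article/front/article-meta/volume', doc)

println "GPath: ${viaGPath}, XPath: ${viaXPath}"
```

The XPath names article explicitly because it starts from the document, while the GPath starts from the root element and so skips it.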

I’ve covered enough ground already that you could go off and do some transformations in Groovy. There are a couple of things to watch out for though, so I’ll cover those before finishing.

Don’t You Have to Replace XPath Too?

That’s true. XPath is a part of XSLT. In response to the desire to handle XML gracefully, Groovy has kindly provided us with the GPathResult class. GPathResult implements the Groovy concept of a GPath, which replaces XPath with an object-oriented syntax. The basic idea is that an XML document is turned into an object model dynamically once the XML has been parsed by the XmlSlurper class. This translation has a similar result to the one an Object-Relational Mapping tool has when converting a SQL result set to a hierarchy of objects.

How Do I Get the Value of an Attribute?

Listing WAT-4 covers how to get the value out of any XML tag. Getting the value of an attribute of a tag is just as easy. For example, to get the value of the `name-type` attribute of the `publisher-name` tag from the XML in Listing WAT-3, you need only the simple Groovy expression in Listing WAT-6, which prefixes the attribute name with an @ sign.

def pubNameType = xml.front."journal-meta".publisher."publisher-name".@"name-type"

Listing WAT-6: Obtaining the value of an XML element’s attribute
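Here is a small runnable sketch of attribute access, using an inline copy of the publisher part of Listing WAT-3. It shows the two spellings I know of for reading an attribute, plus what happens when the attribute is absent. The variable names are mine, and the import assumes Groovy 3 or later.

```groovy
import groovy.xml.XmlSlurper

def xml = new XmlSlurper().parseText('''
<article>
  <front>
    <journal-meta>
      <publisher>
        <publisher-name name-type="full">Nick's Academic Journal</publisher-name>
      </publisher>
    </journal-meta>
  </front>
</article>
''')

def pubName = xml.front.'journal-meta'.publisher.'publisher-name'

// Attribute shorthand: .@ followed by the (quoted) attribute name
def viaAt = pubName.@'name-type'.text()

// Map-style lookup: the attribute name is prefixed with @ inside the string
def viaIndex = pubName['@name-type'].text()

// A missing attribute does not throw; its text is simply empty
def missing = pubName.@'no-such-attribute'.text()

println "name-type via .@: ${viaAt}"
println "name-type via []: ${viaIndex}"
println "missing attribute is empty: ${missing.isEmpty()}"
```

The attribute expression returns a GPath node rather than a String, which is why the sketch calls text() before using the value.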

How Do I Reference an XML Element by Attribute Value?

This is one area where Groovy’s solution is slightly less elegant than the one that XSLT provides. To access an element with a specific attribute value you have to use the find method of the GPathResult class. For example, the XPath expression to access only pub-date tags from the XML in Listing WAT-3 with an attribute of pub-type="ppub" is /article/front/article-meta/pub-date[@pub-type='ppub']. Groovy requires a syntax that is Groovy-esque but harder to read. Listing WAT-7 shows the Groovy equivalent of the prior XPath expression. Note that find returns only the first match; use findAll if you want every matching element, as the XPath expression would give you.

xml.front."article-meta"."pub-date".find {it.@"pub-type" == 'ppub'}

Listing WAT-7: The Groovy way to access XML elements by attribute value

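If you’re not as worried about the exact path within the XML, you can use the shorter XPath expression //pub-date[@pub-type='ppub']. The equivalent in Groovy, which uses the quoted '**' shorthand for a depth-first search and checks the tag name explicitly, is shown in Listing WAT-8.

xml.'**'.find {it.name() == 'pub-date' && it.@"pub-type" == 'ppub'}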

Listing WAT-8: The Groovy way to access XML elements by attribute value with a broader path
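The difference between find and findAll, and between a fixed path and a depth-first search, is easier to see when the pieces run together. The following is a sketch only: the inline XML is a trimmed copy of the pub-date elements from Listing WAT-3, the variable names are invented for the example, and the import assumes Groovy 3 or later.

```groovy
import groovy.xml.XmlSlurper

def xml = new XmlSlurper().parseText('''
<article>
  <front>
    <article-meta>
      <pub-date pub-type="ppub"><day>9</day><month>6</month><year>2010</year></pub-date>
      <pub-date pub-type="epub"><day>1</day><month>6</month><year>2010</year></pub-date>
    </article-meta>
  </front>
</article>
''')

def dates = xml.front.'article-meta'.'pub-date'

// find: the first element for which the closure is true
def printDate = dates.find { it.@'pub-type'.text() == 'ppub' }

// findAll: every element for which the closure is true
def juneDates = dates.findAll { it.month.text() == '6' }

// '**' visits every element depth-first, so no exact path is needed
def anywhere = xml.'**'.findAll { it.name() == 'pub-date' }
def types = anywhere.collect { it.@'pub-type'.text() }

println "Print date falls on day ${printDate.day.text()}"
println "Dates in June: ${juneDates.size()}"
println "pub-type values found anywhere: ${types}"
```

Because the predicate is an ordinary closure, you can combine conditions with && and || in ways that would need more elaborate XPath predicates.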

The fact that I feel compelled to place the Groovy code from this section in listings while the XPath remains in line with the text proves that XPath wins on conciseness and readability here.

Did You Notice the XML Tags Containing Dashes?

If you happen to have an XML file with tag names containing dashes, you’ll have a little trouble referencing them in your GPath expressions in Groovy. Plainly, an expression like xml.front.article-meta won’t work, because Groovy reads the dash as a minus sign and tries to subtract meta from xml.front.article. As I’ve done in the code in this article, just wrap the dashed tag names in quotes and Groovy will compensate accordingly: xml.front."article-meta".
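A short sketch shows both the working form and the failure. The one-line XML and the variable names are mine, and the import assumes Groovy 3 or later. The try block is there only to capture the error message instead of stopping the script.

```groovy
import groovy.xml.XmlSlurper

def xml = new XmlSlurper().parseText(
    '<article><front><article-meta><volume>39</volume></article-meta></front></article>')

// Quoted: the whole name, dash included, is treated as one property name
def quoted = xml.front.'article-meta'.volume.text()

// The quoted name can also come from a variable via a GString
def tag = 'article-meta'
def viaVariable = xml.front."${tag}".volume.text()

// Unquoted: parsed as (xml.front.article) minus (meta), and meta is undefined
def failure = null
try {
    def result = xml.front.article-meta
} catch (MissingPropertyException e) {
    failure = e.message
}

println "Quoted lookup: ${quoted}"
println "Lookup via variable: ${viaVariable}"
println "Unquoted lookup failed with: ${failure}"
```

Single or double quotes both work for a fixed name; the GString form is the one to reach for when the tag name is only known at run time.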

How Do I Make an XSLT Template in Groovy?

Groovy (remember that it is a general-purpose language) does not have a direct equivalent to the XSLT template element. Since the template is one of XSLT’s strongest and most common features, we should find a good replacement. It turns out that a simple loop can step in for the template. In XSLT, a template is activated for each node that matches its “match” pattern. So you could, for instance, transform all of the <pub-date> tags in the input XML into <PubDate> tags in the output with the XSLT template shown in Listing WAT-9.

<xsl:template match="pub-date">
    <PubDate>
        <Year><xsl:value-of select="year"/></Year>
        <Month><xsl:value-of select="month"/></Month>
        <Day><xsl:value-of select="day"/></Day>
    </PubDate>
</xsl:template>

Listing WAT-9: An example of an XSLT template

Listing WAT-10 adds to the previous Groovy example by adding the <PubDate> tags as children of the <Journal> tag. The resulting XML is provided in Listing WAT-11.

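...
Journal {
  Volume(xml.front."article-meta".volume)
  Issue(xml.front."article-meta".issue)
  // One PubDate element for each pub-date in the input
  xml.front."article-meta"."pub-date".each { date ->
    PubDate {
      Year(date.year.text())
      Month(date.month.text())
      Day(date.day.text())
    }
  }
}
...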

Listing WAT-10: An each loop is the replacement for an XSLT template

<Journal>
  <Volume>1</Volume>
  <Issue>2</Issue>
  <PubDate>
    <Year>2010</Year>
    <Month>6</Month>
    <Day>1</Day>
  </PubDate>
  <PubDate>
    <Year>2010</Year>
    <Month>6</Month>
    <Day>9</Day>
  </PubDate>
</Journal>

Listing WAT-11: XML output that includes the `PubDate` tag
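If you miss the way a template keeps the per-node markup in one named place, a plain Groovy method can play a similar role. The sketch below is one possible arrangement, not a built-in feature: pubDateTemplate is a hypothetical helper name, the inline XML is a trimmed copy of Listing WAT-3, and the import of groovy.xml.XmlSlurper assumes Groovy 3 or later. The output goes to a StringWriter so that it can be inspected as well as printed.

```groovy
import groovy.xml.MarkupBuilder
import groovy.xml.XmlSlurper

// Hypothetical helper playing the role of <xsl:template match="pub-date">
def pubDateTemplate(builder, date) {
    builder.PubDate {
        Year(date.year.text())
        Month(date.month.text())
        Day(date.day.text())
    }
}

def xml = new XmlSlurper().parseText('''
<article>
  <front>
    <article-meta>
      <pub-date pub-type="ppub"><day>9</day><month>6</month><year>2010</year></pub-date>
      <pub-date pub-type="epub"><day>1</day><month>6</month><year>2010</year></pub-date>
    </article-meta>
  </front>
</article>
''')

// Capture the markup in a StringWriter instead of writing to standard output
def writer = new StringWriter()
def builder = new MarkupBuilder(writer)

builder.Journal {
    // The loop plays the role of <xsl:apply-templates select="pub-date"/>
    xml.front.'article-meta'.'pub-date'.each { date ->
        pubDateTemplate(builder, date)
    }
}

def output = writer.toString()
println output
```

The elements come out in document order, so with this input the print date (day 9) is written before the electronic one (day 1).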

Where to End?

We began with some stern finger-shaking about how far you can go when substituting Groovy for XSLT, so let’s end with some more. In general, XSLT provides a thoughtful method of processing XML that is specific to that task. XSLT as a language supports that methodology. On the other hand, Groovy is a general-purpose programming language that has some convenient XML processing features built-in. You cannot always find direct correlations between XSLT features and Groovy. However, if you comprehend each system fundamentally, you can benefit by utilizing the terseness of Groovy to perform some of your XSLT tasks.
