Sunday, July 6, 2008

In Response to the Easiest Java XML Binding. What About XStream?

Originally I asked the following question in this posting:

I have an XML document and I want to use that document to populate a corresponding set of Java objects. This is a commonly encountered scenario when working with Java, so what is the easiest method for Java XML Binding that requires the least amount of code?

What do I want? The 5 requirements of Java XML laziness.

I then went on to explain past methods of XML binding in Java and exactly what I was looking for. To summarize that endeavor, I was unable to locate something that fit my exact needs:

  1. I don't want to have to manually iterate through XML documents and manually map values.
  2. I don't want to have to use XSD files.
  3. I don't want to have to use binding compilers or external binding definitions to map XML to Java classes.
  4. I don't want to have to modify existing XML documents so that I can use them as beans.
  5. I want to be able to have class and class field names that are different than their corresponding XML node names.

What is available?

I was aware that something had to exist that did this; I just couldn't find it. I searched the usual places such as sourceforge.net, Google, java.sun.com and so on, but everywhere I went I was bombarded with results that didn't fit my exact needs. After I posted about this conundrum and my solution to it, which was to contribute to the problem of too many Java XML bindings by creating my own, several recommendations were made to me. The following is a list of Java XML bindings that I have used or that were recommended to me:

  • JAXB - Requires Java Web Services Developer Pack v1.1 or later and the use of XSD's to bind the corresponding XML to Java interfaces (http://java.sun.com/developer/technicalArticles/WebServices/jaxb/).
  • JiBX - Open source project that uses binding definition documents to describe how XML is mapped to Java objects at binding runtime, where enhanced class files generated by the binding compiler build objects from an XML input document and can be outputted as XML documents (http://jibx.sourceforge.net/).
  • SAX/DOM/JAXP/Xerces-J - These are all types of Java XML parsers that require the manual handling of XML documents, meaning that you have to [in general] manually map the values from an XML document to the corresponding Java object (http://www.cafeconleche.org/books/xmljava/chapters/ch05.html).
  • Spring Beans - Uses the Inversion of Control container from the Spring Framework in order to bind values from special "bean" XML documents to Java classes. This is not useful for reading a specific XML document, but it is the binding that is of interest: the automatic mapping of XML value to Java object field (http://jvalentino.blogspot.com/2007/10/introduction-to-spring-framework.html).
  • EJB Beans - Enterprise Java Beans are a subject unto themselves, far too broad for discussion here. What is important, though, as with the Inversion of Control container from the Spring Framework, is the ability to bind values from XML documents to Java object fields (http://java.sun.com/javaee/5/docs/tutorial/doc/bnblr.html).
  • Castor - Uses marshalling and unmarshalling to go from XML to POJO and POJO to XML. The documentation is unfinished, but from what I can gather, in order to go from XML to POJO and POJO to XML the POJO needs to implement Serializable, and if the XML node names do not match class field names, a binding file has to be used (http://www.castor.org/xml-framework.html).
  • Codehaus XFire - Another method for going from XML to POJO and POJO to XML, and requires the use of XSD files to do so (http://xfire.codehaus.org/Aegis+Binding).
  • XStream - Another method for going from XML to POJO and POJO to XML, but instead of having to use XSD files and other forms of external mappings it allows the use of annotations for aliasing (http://xstream.codehaus.org/).
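To make the "manual handling" in the SAX/DOM bullet above concrete, here is a minimal sketch using only the JDK's built-in DOM parser. The class name `ManualDomMapping`, the `Item` holder, and the inline XML are assumptions made up for illustration. Note how every field has to be copied by hand, which is exactly what requirement 1 is trying to avoid.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical names throughout; only the JDK's DOM API is used.
final class ManualDomMapping {

    /** Plain holder for one <item>; every field must be mapped by hand. */
    static final class Item {
        String title;
        String link;
    }

    /** Walks the DOM tree and copies each value into an Item manually. */
    static List<Item> readItems(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("item");
            List<Item> items = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                Item item = new Item();
                item.title = childText(element, "title");
                item.link = childText(element, "link");
                items.add(item);
            }
            return items;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Returns the text of the first descendant element with that name, or null. */
    private static String childText(Element parent, String name) {
        NodeList children = parent.getElementsByTagName(name);
        return children.getLength() == 0 ? null : children.item(0).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<channel><item><title>Star City</title>"
                + "<link>http://example.com/a</link></item></channel>";
        for (Item item : readItems(xml)) {
            System.out.println(item.title + " -> " + item.link);
        }
    }
}
```

Each new field in the XML means another hand-written line in `readItems`, which is the kind of bookkeeping the binding libraries in this list try to remove.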

What about XStream?

As it turns out, XStream was exactly what I was looking for; it meets my 5 requirements of Java XML laziness. So how does it work?

Consider an RSS 2.0 document and what it represents:

<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Lift Off News</title>
    <link>http://liftoff.msfc.nasa.gov/</link>
    <description>Liftoff to Space Exploration.</description>
    <language>en-us</language>
    <pubDate>Tue, 10 Jun 2003 04:00:00 GMT</pubDate>
    <lastBuildDate>Tue, 10 Jun 2003 09:41:01 GMT</lastBuildDate>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <generator>Weblog Editor 2.0</generator>
    <managingEditor>editor@example.com</managingEditor>
    <webMaster>webmaster@example.com</webMaster>
    <ttl>5</ttl>
 
    <item>
      <title>Star City</title>
      <link>http://liftoff.msfc.nasa.gov/news/2003/news-starcity.asp</link>
      <description>How do Americans get ready to work with Russians aboard the
        International Space Station? They take a crash course in culture, language
        and protocol at Russia's Star City.</description>
      <pubDate>Tue, 03 Jun 2003 09:39:21 GMT</pubDate>
      <guid>http://liftoff.msfc.nasa.gov/2003/06/03.html#item573</guid>
    </item>
 
    <item>
      <title>Space Exploration</title>
      <link>http://liftoff.msfc.nasa.gov/</link>
      <description>Sky watchers in Europe, Asia, and parts of Alaska and Canada
        will experience a partial eclipse of the Sun on Saturday, May 31st.</description>
      <pubDate>Fri, 30 May 2003 11:06:42 GMT</pubDate>
      <guid>http://liftoff.msfc.nasa.gov/2003/05/30.html#item572</guid>
    </item>
 
    <item>
      <title>The Engine That Does More</title>
      <link>http://liftoff.msfc.nasa.gov/news/2003/news-VASIMR.asp</link>
      <description>Before man travels to Mars, NASA hopes to design new engines
        that will let us fly through the Solar System more quickly.  The proposed
        VASIMR engine would do that.</description>
      <pubDate>Tue, 27 May 2003 08:37:32 GMT</pubDate>
      <guid>http://liftoff.msfc.nasa.gov/2003/05/27.html#item571</guid>
    </item>
 
    <item>
      <title>Astronauts' Dirty Laundry</title>
      <link>http://liftoff.msfc.nasa.gov/news/2003/news-laundry.asp</link>
      <description>Compared to earlier spacecraft, the International Space
        Station has many luxuries, but laundry facilities are not one of them.
        Instead, astronauts have other options.</description>
      <pubDate>Tue, 20 May 2003 08:56:02 GMT</pubDate>
      <guid>http://liftoff.msfc.nasa.gov/2003/05/20.html#item570</guid>
    </item>
  </channel>
</rss>

Document from http://en.wikipedia.org/wiki/RSS_(file_format)

An "rss" node has a "channel" node, a "channel" node has "item" nodes, and every node has its own properties and attributes. These relationships can then be represented in terms of objects using a class diagram (ignoring methods for getters and setters): an Rss2 class that holds Channel objects, each of which holds Item objects.

From this class diagram it is then possible to generate the corresponding Java classes, assuming that the class fields have the same names as their corresponding XML nodes. This can't always work, though, since class fields can be represented in XML as either nodes or attributes of nodes, so it is necessary to designate which one in Java. You may also not want the Java class field names to be the same as the XML node names, so there needs to be a way to designate this as well. This is why some form of mapping will always be needed, and why XStream provides both annotations and methods for specifying aliases at runtime.

Bad use of annotations?

With my previous posting where I used annotations in the same manner as XStream, there was discussion regarding whether this use of annotations was necessary. The answer is absolutely yes for two reasons:

  1. For when XML node names are not the same as class and class field names.
  2. In order to translate the representation of a class field as an XML node or XML node attribute.

Consider the following class and the following XML document:

public class Example {
 public String myValue;
 public String version;
}

<example ver="1.0">
 <my_value>value</my_value>
</example>

How would one map the XML to the class when the class name and the field names are different than the XML node names? While some XML documents may be able to have their class field names inferred based on similarities, types, and positions, this would not work for all cases. This is one reason why annotations or other mechanisms are needed to map between class fields and XML nodes.
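As a rough illustration of how annotations can carry that mapping, here is a minimal sketch built only on JDK reflection and DOM. The annotations `@XmlName` and `@AsAttribute` and the `bind` method are hypothetical names invented for this example; they are not XStream's or JAXB's API, and the sketch only handles public String fields on a single root element.

```java
import java.io.StringReader;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class AnnotationBindingSketch {

    // Hypothetical annotation: the XML name to use instead of the field name.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface XmlName {
        String value();
    }

    // Hypothetical marker: read the field from an attribute, not a child node.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface AsAttribute {
    }

    public static class Example {
        @XmlName("my_value")
        public String myValue;

        @XmlName("ver")
        @AsAttribute
        public String version;
    }

    /** Fills the public String fields of a new T from the root element of xml. */
    static <T> T bind(String xml, Class<T> type) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            T target = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getFields()) {
                XmlName alias = field.getAnnotation(XmlName.class);
                // Fall back to the field name when no alias is declared.
                String name = alias != null ? alias.value() : field.getName();
                if (field.isAnnotationPresent(AsAttribute.class)) {
                    if (root.hasAttribute(name)) {
                        field.set(target, root.getAttribute(name));
                    }
                } else {
                    // Takes the first descendant element with that name.
                    NodeList nodes = root.getElementsByTagName(name);
                    if (nodes.getLength() > 0) {
                        field.set(target, nodes.item(0).getTextContent());
                    }
                }
            }
            return target;
        } catch (Exception e) {
            throw new IllegalStateException("Could not bind XML to " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Example example = bind(
                "<example ver=\"1.0\"><my_value>value</my_value></example>", Example.class);
        System.out.println(example.myValue + " / " + example.version); // prints "value / 1.0"
    }
}
```

Without the two annotations the binder would have no way of knowing that "ver" feeds `version`, or that it lives in an attribute rather than a child node.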

Also consider when converting the class to XML, how would one determine whether a class field should be represented as an XML node or a node attribute? The following XML documents all could be used to represent the data in the Example class:

<example>
 <ver>1.0</ver>
 <my_value>value</my_value>
</example>

<example ver="1.0" my_value="value" />

<example my_value="value">
 <ver>1.0</ver>
</example>
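The three documents above differ only in which fields were chosen to be attributes. A small sketch of that choice, assuming a hypothetical `toXml` helper that takes the set of attribute names as a parameter (plain string building with no escaping, so for illustration only):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class NodeOrAttribute {

    /**
     * Renders fields under a root element; names listed in asAttributes
     * become attributes, all others become child elements.
     * No XML escaping is done, so this is for illustration only.
     */
    static String toXml(String root, Map<String, String> fields, Set<String> asAttributes) {
        StringBuilder attributes = new StringBuilder();
        StringBuilder children = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String name = field.getKey();
            String value = field.getValue();
            if (asAttributes.contains(name)) {
                attributes.append(' ').append(name).append("=\"").append(value).append('"');
            } else {
                children.append('<').append(name).append('>').append(value)
                        .append("</").append(name).append('>');
            }
        }
        if (children.length() == 0) {
            return "<" + root + attributes + " />";
        }
        return "<" + root + attributes + ">" + children + "</" + root + ">";
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("ver", "1.0");
        fields.put("my_value", "value");
        System.out.println(toXml("example", fields, Set.of()));
        System.out.println(toXml("example", fields, Set.of("ver", "my_value")));
        System.out.println(toXml("example", fields, Set.of("my_value")));
    }
}
```

The same data produces all three shapes (ignoring whitespace); only the extra piece of mapping information, here the `asAttributes` set, decides which one comes out.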

How does XStream do what you need?

XStream allows the use of annotations or object aliasing methods to specify mappings and attributes. For example, here is my previous RSS 2.0 POJO example marked up for XStream using annotations:

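import java.util.List;
import com.thoughtworks.xstream.annotations.XStreamAlias;
import com.thoughtworks.xstream.annotations.XStreamAsAttribute;
import com.thoughtworks.xstream.annotations.XStreamImplicit;

@XStreamAlias("rss")
public class Rss2 {
 @XStreamImplicit
 private List<Channel> channel;
 @XStreamAsAttribute
 private String version;
 
 //getters and setters go here...
}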

@XStreamAlias("channel")
public class Channel {
 private String title;
 private String link;
 private String description;
 private String language;
 private String pubDate;
 private String lastBuildDate;
 private String docs;
 private String generator;
 private String managingEditor;
 private String webMaster;
 private int ttl;
 @XStreamImplicit
 private List<Item> item;
 
 //getters and setters go here...
}

@XStreamAlias("item")
public class Item {
 private String title;
 private String link;
 private String description;
 private String pubDate;
 private String guid;
 
 //getters and setters go here...
}

The following code can be used to populate an RSS object using XML:

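import java.io.File;
import java.io.FileReader;
import com.thoughtworks.xstream.XStream;
import com.thoughtworks.xstream.io.xml.DomDriver;

XStream xstream = new XStream(new DomDriver());
xstream.processAnnotations(Rss2.class);

//Create an RSS object and populate it using an XML file
Rss2 rssA = new Rss2();
xstream.fromXML(new FileReader(new File("test/Rss2Test.xml")), rssA);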

The following code can then be used to convert a POJO to XML:

String xml = xstream.toXML(rssA);

Why did you reinvent the wheel with your RE: JAXB project?

As I have said before, I created a factory for binding XML to POJOs and back because I could not find anything at the time that suited my exact needs. To me it is really not that big of a deal, because it only took a single afternoon and only consisted of several hundred lines of code.

 

9 comments:

Chrigel said...

I was considering mentioning XStream in your previous post, but I decided not to because XStream considers itself to be a serialization tool:

"Q:Is XStream a data binding tool?
A: No. It is a serialization tool." (see xstream FAQ)

The main difference is, from my understanding, that in XStream the object is the source and not an XML schema.

Your requirements start with an XML document, and it *might* be that you will have trouble using XStream. If not, I can highly recommend XStream: really easy to use, good performance and well documented.

John Valentino said...

That is correct, I was not sticking to the true definition of "binding" as in the sense of JAXB and its use of XSD. What I was really after was serialization.

ybanrab said...

The other option that springs to mind is Commons Digester. It's read-only, so you can't write your objects back out as XML, but I've found it to be very flexible once you get your head around what it's doing. I had to read in a 4-level deep XML file into objects and it ran to about 14 lines of code.

http://commons.apache.org/digester/

Barny

Karl Garske said...

At last! An XML framework which allows for maximum laziness! I read once that this is a virtue amongst programmers :)

But seriously, use of annotations is brilliant. And, IMHO exactly what meta-data is all about. .NET uses similar techniques in XML serialization, but I've been having trouble tracking down anything noteworthy in Java.

zaeffi said...

Why does JAXB not fit your needs? As I understand it, XML Schema is only really needed if you want to generate your Java classes. If you want to create them yourself, just add the annotations, just like in your own binder or XStream.

With JAXB annotations your classes would look like this:


@XmlRootElement(name = "rss")
public class Rss2 {
   private List<Channel> channel;
   private String version;

   @XmlAttribute
   public String getVersion() {
      return version;
   }

   // other getter/setter/logic
}


@XmlType(name = "channel")
public class Channel {
   private String title;
   private String link;
   private String description;
   private String language;
   private String pubDate;
   private String lastBuildDate;
   private String docs;
   private String generator;
   private String managingEditor;
   private String webMaster;
   private int ttl;
   private List<Item> item;

   // getter/setter/logic
}

@XmlType(name = "item")
public class Item {
   private String title;
   private String link;
   private String description;
   private String pubDate;
   private String guid;

   // getter/setter/logic
}


Just the transformation is not that nice to look at, but with two generic helper methods even this can be solved (roughly):


public static <E> E fromXML(File xmlFile, Class<E> rootElement) throws JAXBException, IOException {
   JAXBContext context = JAXBContext.newInstance(rootElement);
   Unmarshaller um = context.createUnmarshaller();
   um.setEventHandler(new javax.xml.bind.helpers.DefaultValidationEventHandler());
   FileInputStream fin = new FileInputStream(xmlFile);
   try {
      // cast() avoids the unchecked (E) cast
      return rootElement.cast(um.unmarshal(fin));
   } finally {
      // always release the file handle, even if unmarshalling fails
      fin.close();
   }
}

public static void toXML(Object root, File xmlFile) throws JAXBException, IOException {
   JAXBContext context = JAXBContext.newInstance(root.getClass());
   Marshaller marshaller = context.createMarshaller();
   marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
   FileOutputStream fos = new FileOutputStream(xmlFile);
   try {
      marshaller.marshal(root, fos);
   } finally {
      // always release the file handle, even if marshalling fails
      fos.close();
   }
}


with these you can do the transformations with 1 line:


Rss2 rss2 = Helper.fromXML(new File("Rss2Test.xml"), Rss2.class);
Helper.toXML(rss2, new File("Rss2Test.xml"));


Or am I missing something here?

John Valentino said...

I had only ever used JAXB with XSD, and thought that it was required for any marshalling; guess I should have looked into it more. In any case, if you use your JAXB method you have to write some helpers, and if you use XStream you don't have to.

Basil Vandegriend said...

This was a really helpful post. I had the same idea that parsing XML should be just a matter of throwing some annotations on my java objects, like Hibernate persistence, but the 'official' Java XML APIs don't seem to mention this case anywhere in their documentation.

@zaeffi Thanks for the tip on how to do this with JAXB. I'm probably going to try that.

Basil Vandegriend said...

I wrote a post about using JAXB to do XML parsing, inspired by this article and zaeffi's comment:
Simple XML Parsing using JAXB

dontcare said...

You may also want to look at vtd-xml, the latest and most advanced XML processing API

http://vtd-xml.sf.net