Data Conversion Between XML and JSON (w/ Scala)
I don’t write technical posts often because I have had to dig through more than my fair share of irrelevant blog posts from 2007. However, every once in a while I run across a problem that has a solution so convoluted and inconvenient that I feel the need to put it in one place; so here we are.
I have been working in Scala for the past year or so (but am not an expert or particularly fond of it). If you haven’t touched Scala before, it’s very similar to Java but the syntax is not exactly the same (and also it’s a functional language, so you end up creating a lot of vals). I modified a lot of Java snippets to get this working in Scala. It is not perfect Scala; if you have a better solution, feel free to share it.
Also, I’ve pulled these code snippets out of multiple files in a much larger project, so you will run into issues if you try to just copy/paste them.
If you’re working with third-party APIs that have been around for a while then there’s a pretty good chance that they’ll be returning XML when you’re using JSON. If you’re unfamiliar with both formats, let me try to explain why this is what’s been keeping me up for the past five nights.
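For a concrete picture, here is a made-up record (purely illustrative, not taken from any real API) written in both formats and held as Scala strings. The names SimpleFormats, asXml, and asJson are placeholders.

```scala
object SimpleFormats {
  // A made-up record as XML: nested tags carry the structure.
  val asXml: String =
    """<person>
      |  <name>Ada</name>
      |  <city>London</city>
      |</person>""".stripMargin

  // The same made-up record as JSON: braces group the fields,
  // colons separate each key from its value.
  val asJson: String =
    """{
      |  "person": {
      |    "name": "Ada",
      |    "city": "London"
      |  }
      |}""".stripMargin
}
```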
Right off the bat you can see that they’re organized a little bit differently. XML uses tags; JSON uses braces, brackets, and colons. The examples above are extremely simple and can pretty easily be parsed by libraries like LiftWeb, no big deal.
Unfortunately, in real life, the examples are rarely so simple. Your XML is probably going to look a little bit more like this:
Suddenly there’s stuff inside the angle brackets of some tags but not all of them (those are called attributes), some of the tags don’t have anything between them, and the XML is a lot less readable. The equivalent JSON might look like this:
You can see how the similarities between XML and the JSON models start to fall apart due to convention as the model gets more complicated. In XML you might have a variable data model (the types of data can change according to what data is available) but in JSON you will rarely run into a key that has a value that is occasionally a list, often a string, and sometimes just doesn’t exist, all on the same API call.
Parsing XML into JSON
For this first part we used scala.xml.NodeSeq to extract the information we wanted and place it into objects accordingly.
- You can pull out nodes by using \ followed by the node name
- You can pull out attributes by using \@ followed by the attribute name
After we figured out how to do it, this became simple and even enjoyable. scala.xml.NodeSeq allows us to walk down the XML tree structure, grab the exact text and attributes that we want, then reformulate them in JSON objects that we’ve defined. If the node is blank or doesn’t exist, it returns an empty string instead of a parsing error. You just have to make sure that in your pre-defined JSON objects that every field is an
Voila, the problem of parsing the weird, ambiguous XML structure has been solved.
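Here is a minimal sketch of that walk, not our project code. The job element, the Job case class, and the nonBlank helper are all made up for illustration, and I am assuming Option and List fields for data that may be missing. It needs the scala-xml module on the classpath (the version in the first line is just one I know exists).

```scala
//> using dep org.scala-lang.modules::scala-xml:2.1.0
import scala.xml.{Elem, XML}

object XmlToModel {
  // Hypothetical model: every field tolerates missing data (Option or an empty List).
  final case class Job(order: Option[String], title: Option[String],
                       company: Option[String], skills: List[String])

  // NodeSeq hands back "" for a blank or missing node, so map that to None.
  private def nonBlank(s: String): Option[String] =
    Option(s.trim).filter(_.nonEmpty)

  def parseJob(job: Elem): Job = Job(
    order   = nonBlank(job \@ "order"),           // \@ reads an attribute
    title   = nonBlank((job \ "title").text),     // \ selects child nodes by name
    company = nonBlank((job \ "company").text),   // empty or absent both give ""
    skills  = (job \ "skill").map(_.text).toList  // zero, one, or many nodes
  )

  // Made-up input with an attribute, an empty tag, and a repeatable tag.
  val sample: Elem = XML.loadString(
    """<job order="2"><title>Consultant</title><company/><skill>XML</skill></job>""")
}
```

With this sample, XmlToModel.parseJob(XmlToModel.sample) equals Job(Some("2"), Some("Consultant"), None, List("XML")): the empty company tag turns into None instead of blowing up.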
Parsing JSON as XML
This is where it gets weird. Unfortunately, it seems like it’s a lot harder to make elegant code that reliably parses your complex JSON objects back into XML.
The Scala Elem type that’s found in scala.xml._ allows you to create XML structures and mix in values in an incredibly simple way:
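As a small sketch of what that looks like (the jobXml function and its fields are hypothetical, and scala-xml has to be on the classpath), an XML literal can embed Scala values in curly braces, both as attribute values and as element content:

```scala
//> using dep org.scala-lang.modules::scala-xml:2.1.0
import scala.xml.Elem

object ElemLiteral {
  // Curly braces splice Scala values into the literal:
  // order becomes an attribute, title becomes the text of a child element.
  def jobXml(order: Int, title: String): Elem =
    <job order={order.toString}><title>{title}</title></job>
}
```

Every tag and attribute has to be spelled out in the source, which is fine for a fixed shape and quickly gets awkward once fields are optional or repeated.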
If you’re dealing with elements that have lists or fields that may or may not exist, then Elem isn’t going to cut it. You want something that can parse your JSON into XML with attributes and a minimum amount of typing on your part. A lot of the libraries will cleanly parse JSON objects into XML even if they have complex organizations, but it was a struggle to find a library that would also dynamically parse attributes.
This is where the Staxon library comes in (you can find GitHub documentation here). They have examples on their wiki for converting XML to JSON and JSON to XML, so I won’t steal their thunder by copying and pasting their exact code here, but I will show you what we did.
Staxon solves the attribute issue by changing the way you name the keys in your JSON objects.
A key that starts with an @ symbol is treated as an attribute of the containing key (so in the example below, if you had a list of jobs, the XML would look like <job order="2"><title>Con... etc. etc.</job>).
Unfortunately, Scala being the finicky beast that it is, you can’t use an @ symbol at the beginning of a field name in the model behind a JSON object. If you use backticks (`) to escape the @ symbol your IDE will likely not give you any errors, but it will probably throw a runtime error. Our way around this was to add underscores ( _ ) in the model where we wanted the @ symbol to be, and then when we stringified the object we simply did a replaceAll("_", "@") to get the desired format.
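A minimal sketch of that swap, using a made-up JSON string in place of a real stringified model. The object and function names are hypothetical. The keysOnly variant is not what we did; it is a narrower alternative I am assuming you might want if underscores can also show up inside values.

```scala
object AttributeKeys {
  // Made-up output of a model whose attribute field was named _order.
  val stringified: String = """{"job":{"_order":"2","title":"a_b"}}"""

  // The blunt approach: every underscore becomes @, values included.
  def swapAll(json: String): String = json.replace('_', '@')

  // Narrower variant (an assumption, not the original code): only rewrite a
  // leading underscore on a quoted key, i.e. a string followed by a colon.
  def keysOnly(json: String): String =
    json.replaceAll("\"_(\\w+)\"(\\s*:)", "\"@$1\"$2")
}
```

On the sample string, swapAll also turns the value a_b into a@b, while keysOnly leaves it alone and only renames _order to @order.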
We also swapped the input and output streams from the original Staxon documentation (plain Java InputStream and OutputStream) for ByteArrayInputStream/ByteArrayOutputStream so we could pass in and parse out Strings instead of just printing to a file or the command line.
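To show just the stream handling, here is a stripped-down sketch modeled on the JSON to XML example from the Staxon wiki. It is not our project code: the JsonToXml object and convert function are hypothetical names, and I am assuming Staxon 1.3 (de.odysseus.staxon:staxon) is on the classpath.

```scala
//> using dep de.odysseus.staxon:staxon:1.3
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import javax.xml.stream.XMLOutputFactory
import de.odysseus.staxon.json.{JsonXMLConfigBuilder, JsonXMLInputFactory}

object JsonToXml {
  // Takes a JSON String that follows Staxon's "@key means attribute"
  // convention and returns the XML as a String.
  def convert(json: String): String = {
    val config = new JsonXMLConfigBuilder().multiplePI(false).build()

    // In-memory streams instead of files or System.out.
    val input  = new ByteArrayInputStream(json.getBytes("UTF-8"))
    val output = new ByteArrayOutputStream()

    // Staxon exposes the JSON as a stream of XML events;
    // a standard StAX writer serializes those events as XML.
    val reader = new JsonXMLInputFactory(config).createXMLEventReader(input)
    val writer = XMLOutputFactory.newInstance().createXMLEventWriter(output, "UTF-8")
    try writer.add(reader)
    finally {
      reader.close()
      writer.close()
    }
    output.toString("UTF-8")
  }
}
```

Calling JsonToXml.convert on a string like {"job":{"@order":"2","title":"Consultant"}} should give back a job element with order as an attribute. The sketch deliberately leaves out the Spray formatters and the Future wrapper mentioned in the disclaimer below, so it is only the conversion core.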
Disclaimer 2.0: To re-emphasize before I get 50 code reviews, this snippet is not code complete: we declare implicit values of objects, translators, and JSON formatters with Spray in other files in our code.
The base of this function is usually intended to return the result of an API call, not to just transform an object (that’s where the Future() comes in at the end).
The functionality of this snippet is spread out over at least 4-5 files and multiple functions; I ordered it this way for simplicity in reading, not for efficiency.
There you go. It’s not the prettiest way to parse something with all of the transformations, but trust me when I say it is super effective.
- Learn more about XML
- Learn more about JSON
- High level documentation for scala.xml._
- Documentation for Staxon
- Documentation for LiftWeb’s parsing library with net.liftweb.json
- Converting OutputStreams into ByteArrayOutputStream
Also, a large amount of credit goes to the technical lead on my project who did a lot of research and was the one who eventually found the Staxon library. When I say “we” in this article, the research that went into finding and implementing this solution was truly a team effort.