Data Conversion Between XML and JSON (w/ Scala)

I don’t write technical posts often because I have had to dig through more than my fair share of irrelevant blog posts from 2007. However, every once in a while I run across a problem that has a solution so convoluted and inconvenient that I feel the need to put it in one place, so here we are.

Disclaimers:
I have been working in Scala for the past year or so (but am not an expert or particularly fond of it). If you haven’t touched Scala before, it’s very similar to Java but the syntax is not exactly the same (it also leans heavily on functional programming, so you end up creating a lot of vals). I modified a lot of Java snippets to get this working in Scala. It is not perfect Scala; if you have a better solution, feel free to share it.
Also, I’ve pulled these code snippets out of multiple files in a much larger project, so you will run into issues if you try to just copy/paste them.

The Problem

If you’ve ever had to handle data, then you know the biggest headache is in the structure of it. JSON and XML are both ways to describe and structure how data is organized. Work on XML started around 1996 (XML 1.0 became a W3C Recommendation in 1998), and JSON was created in the early 2000s. In my experience, JSON is generally seen as the more “modern” and usable approach (it is the data format used most often in JavaScript), but some people are still diehard XML fans.

If you’re working with third-party APIs that have been around for a while, then there’s a pretty good chance that they’ll be returning XML when you’re using JSON. If you’re unfamiliar with both formats, let me try to explain why this is what’s been keeping me up for the past five nights.

XML

<root>
	<person>
		<name>Jones</name>
		<age>23</age>
		<occupation>Consultant</occupation>
	</person>
</root>

JSON

{
	"person": {
		"name": "Jones",
		"age": 23,
		"occupation": "Consultant"
	}
}

Right off the bat you can see that they’re organized a little bit differently. XML uses tags, JSON uses braces and colons. The examples above are extremely simple and can pretty easily be parsed by libraries like LiftWeb, no big deal.

Unfortunately, real-life examples are rarely so simple. Your XML is probably going to look a little bit more like this:

<root>
	<People>
		<Person id="1">
			<PersonalDetails>
				<Name>Jones</Name>
				<Age>23</Age>
				<Locations>
					<Location reason="Born">Alaska</Location>
					<Location reason="Work">Texas</Location>
					<Location reason="Study">Moscow</Location>
					<Location reason="Study">Beijing</Location>
				</Locations>
			</PersonalDetails>
			<WorkDetails>
				<Job>
					<JobTitle>Consultant</JobTitle>
					<Company>Credera</Company>
					<HireDate>2016</HireDate>
					<EndDate></EndDate>
					<Skills>
						<Skill expertise="1">Scala</Skill>
						<Skill expertise="3">Javascript</Skill>
						<Skill>AngularJS</Skill>
					</Skills>
				</Job>
				<Job>
					<JobTitle>Personal Assistant</JobTitle>
					<Company>SMU</Company>
					<HireDate>2014</HireDate>
					<EndDate>2016</EndDate>
					<Skills>
						<Skill>Dewey Decimal System</Skill>
					</Skills>
				</Job>
			</WorkDetails>
		</Person>
	</People>
</root>

Suddenly there’s stuff inside some of the opening tags (those are called attributes), some of the tags don’t have anything between them, and the XML is a lot less readable. The equivalent JSON might look like this:

{ "people": [
	{
		"id": 1,
		"personalDetails": {
			"name": "Jones",
			"age": 23,
			"locations": [
				{"reason": "Born", "location": "Alaska"},
				{"reason": "Work", "location": "Texas"},
				{"reason": "Study", "location": "Moscow"},
				{"reason": "Study", "location": "Beijing"}
			]
		},
		"workDetails": {
			"jobs": [
				{
					"jobTitle": "Consultant",
					"company": "Credera",
					"hireDate": 2016,
					"endDate": null,
					"skills": [
						{"expertise": 1, "skill": "Scala"},
						{"expertise": 3, "skill": "Javascript"},
						{"expertise": null, "skill": "AngularJS"}
					]
				},
				{
					"jobTitle": "Personal Assistant",
					"company": "SMU",
					"hireDate": 2014,
					"endDate": 2016,
					"skills": [
						{"expertise": null, "skill": "Dewey Decimal System"}
					]
				}
			]
		}
	}
]}

You can see how the similarities between XML and the JSON models start to fall apart due to convention as the model gets more complicated. In XML you might have a variable data model (the types of data can change according to what data is available) but in JSON you will rarely run into a key that has a value that is occasionally a list, often a string, and sometimes just doesn’t exist, all on the same API call.

Parsing XML into JSON

For this first part we used scala.xml.NodeSeq to extract the information we wanted and place it into objects accordingly.

  • You can pull out nodes by using the \ followed by the node name
  • You can pull out attributes by using \@ followed by the attribute name
 import scala.xml.NodeSeq

 // Person and Job are case classes defined elsewhere in the project.
 def toJSONObject(xmlObject: NodeSeq): List[Person] = {
   // Select every <Person> under <People>
   val people = xmlObject \ "People" \ "Person"

   people.map { person =>
     // \@ already returns the attribute value as a String
     val personId = person \@ "id"
     val personalDetails = person \ "PersonalDetails"
     val personName = personalDetails \ "Name"
     val personAge = personalDetails \ "Age"

     // Each <Job> sits directly under <WorkDetails>
     val jobs = (person \ "WorkDetails" \ "Job").map { job =>
       Job(jobTitle = (job \ "JobTitle").text, companyName = (job \ "Company").text)
     }.toList

     Person(id = personId, name = personName.text, age = personAge.text, jobs = jobs)
   }.toList
 }

After we figured out how to do it, this became simple and even enjoyable. scala.xml.NodeSeq allows us to walk down the XML tree structure, grab the exact text and attributes that we want, then reassemble them into the objects we’ve defined for our JSON. If a node is blank or doesn’t exist, .text returns an empty string instead of throwing a parsing error. You just have to make sure that every field in your pre-defined JSON objects is an Option[].
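
Because a missing or blank node comes back as an empty string, something has to turn that string into an Option before it lands in a field. A minimal sketch of one way to do it is below. It is not code from our project: TextFields and nonBlank are names made up for illustration, and it only uses the standard library.

```scala
object TextFields {
  // Hypothetical helper: turn a node's text into an Option,
  // treating null, empty, and whitespace-only strings as "no value".
  def nonBlank(text: String): Option[String] =
    Option(text).map(_.trim).filter(_.nonEmpty)
}
```

With a helper like this, the text of an empty tag such as <EndDate></EndDate> would become None, while the text of <HireDate>2016</HireDate> would become Some("2016").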

Voilà, the problem of parsing the weird, ambiguous XML structure has been solved.

Parsing JSON as XML

This is where it gets weird. Unfortunately, it seems like it’s a lot harder to make elegant code that reliably parses your complex JSON objects back into XML.

The Scala Elem type that’s found in scala.xml._ allows you to create XML structures and mix in values in an incredibly simple way (any Scala expression wrapped in curly braces is evaluated and inserted):

val person = Person(id = 1, name = "Jones", age = 23, occupation = "Consultant")

val xmlObject = 
	<root>
		<people>
			<person>
				<id>{person.id}</id>
				<personaldetails>
					<name>{person.name}</name>
					<age>{person.age}</age>
				</personaldetails>
				<workdetails>
					<title>{person.occupation}</title>
				</workdetails>
			</person>
		</people>
	</root>

If you’re dealing with elements that have lists or fields that may or may not exist, then Elem isn’t going to cut it. You want something that can parse your JSON into XML with attributes and a minimum amount of typing on your part. A lot of the libraries will cleanly parse JSON objects into XML even if they have complex organizations, but it was a struggle to find a library that would also dynamically parse attributes.

Staxon

This is where the Staxon library comes in (you can find GitHub documentation here). They have examples on their wiki for converting XML to JSON and JSON to XML, so I won’t steal their thunder by copying and pasting their exact code here, but I will show you what we did.

Staxon solves the attribute issue by changing the way you name the keys in your JSON objects. A key that starts with an @ symbol denotes an attribute of the containing key (so in the example below, the XML for a job in the list would look like <job order="2"><title>Con... etc. etc.</job>).

{
	"person": {
		"@id" : 1,
		"name" : "Jones",
		"job" :
		[{
			"@order" : 2,
			"title" : "Consultant",
			"company" : "Credera" },
		 {
			 "@order" : 1,
			 "title" : "UX Consultant",
			 "company" : "New Economic School of Moscow" }
		]
	}
}

Unfortunately, Scala being the finicky beast that it is, you can’t start a field name with an @ symbol in the Scala model that backs your JSON object. If you use backticks (`) to escape the @ symbol your IDE will likely not give you any errors, but it will probably throw a runtime error. Our way around this was to add underscores ( _ ) in the model where we wanted the @ symbol to be, and then when we stringified the object we simply did a replaceAll("_", "@") to get the desired format.
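
One thing to watch out for: a blanket replace also rewrites underscores that show up inside values or in the middle of key names. If that is a concern, a narrower replacement could look like the sketch below. This is not what our project does. AttributeKeys and markAttributes are hypothetical names, and it assumes attribute keys are exactly the keys whose names begin with an underscore.

```scala
import scala.util.matching.Regex

object AttributeKeys {
  // Matches a JSON key that starts with an underscore, such as "_id":
  // group 1 is the rest of the key, group 2 is any whitespace plus the colon.
  private val UnderscoreKey: Regex = """"_([^"\\]*)"(\s*:)""".r

  // Hypothetical helper: rewrite "_key": as "@key": and leave everything else alone.
  def markAttributes(json: String): String =
    UnderscoreKey.replaceAllIn(
      json,
      m => Regex.quoteReplacement("\"@" + m.group(1) + "\"" + m.group(2))
    )
}
```

The pattern only fires on a quoted string that begins with an underscore and is followed by a colon, so a key like "first_name" or a value like "a_b" is left untouched.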

We also changed the input and output streams (a generic Java InputStream and OutputStream in the original Staxon documentation) to ByteArrayInputStream/ByteArrayOutputStream so we could pass in and parse out Strings instead of just printing to a file or the command line.
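
To make the String-in, String-out idea concrete, here is a small standalone sketch using only the standard library. StringStreams and its methods are hypothetical names, and the copy function is just a stand-in for a real converter such as Staxon. The charset is spelled out as UTF-8 rather than left to the platform default, which is an assumption about what your data uses.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream}
import java.nio.charset.StandardCharsets

object StringStreams {
  // Wrap a String in an InputStream so stream-based APIs can read it.
  def toInputStream(s: String): ByteArrayInputStream =
    new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8))

  // Turn everything that was written to the stream back into a String.
  def asString(out: ByteArrayOutputStream): String =
    new String(out.toByteArray, StandardCharsets.UTF_8)

  // Stand-in for a converter: copies the input to the output unchanged.
  def copy(in: InputStream, out: ByteArrayOutputStream): Unit = {
    val buffer = new Array[Byte](4096)
    var read = in.read(buffer)
    while (read != -1) {
      out.write(buffer, 0, read)
      read = in.read(buffer)
    }
  }
}
```

Anything that reads from an InputStream and writes to an OutputStream can sit where copy is, and the caller still only ever deals with Strings.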

Disclaimer 2.0: To re-emphasize before I get 50 code reviews, this snippet is not code complete:

  • We declare implicit values of objects, translators, and JSON formatters with Spray in other files in our code.
  • The base of this function is usually intended to return the result of an API call, not to just transform an object (that’s where the Future at the end comes in).
  • The functionality of this snippet is spread out over at least 4-5 files and multiple functions. I ordered it this way for simplicity in reading, not for efficiency.

import javax.xml.stream.XMLEventReader
import javax.xml.stream.XMLEventWriter
import javax.xml.stream.XMLOutputFactory
import javax.xml.stream.XMLStreamException

import de.odysseus.staxon.json.JsonXMLConfig
import de.odysseus.staxon.json.JsonXMLConfigBuilder
import de.odysseus.staxon.json.JsonXMLInputFactory
import de.odysseus.staxon.xml.util.PrettyXMLEventWriter

import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream

import scala.concurrent.Future
// This may not be the complete list you need ^ so don't hate me if you still have to import some other libraries
// (the Spray imports for toJson, HttpEntity, MediaTypes, and Put are not listed here)

def editPerson(person: Person): Future[Unit] = {

    // toPerson() is a separate transformation function (not included in this snippet)
    // where we add default attribute values such as the namespace
    val formattedPerson = toPerson(person)

    /* EDIT: Because this has been mentioned in the comments
     * We put this object into a JSON format. This JSON -> String -> XML chain can (and should) be put into a separate modularized function;
     * I'm laying it out this way so you can see the linear process and not have to jump between functions.
     * We use implicit values (the JSON format for case class Person, etc.) to get "toJson" to work.
     */
    val jsonPerson = formattedPerson.toJson

    // We stringify the JSON and replace every _ with an @ sign to indicate an attribute
    // (note that this also rewrites any underscores inside values)
    val stringPerson = jsonPerson.toString().replaceAll("_", "@")

    // The input is a ByteArrayInputStream so it works with the Staxon methods;
    // the charset is given explicitly instead of relying on the platform default
    val input = new java.io.ByteArrayInputStream(stringPerson.getBytes(java.nio.charset.StandardCharsets.UTF_8))

    // We send it to a translator that parses it to XML
    val requestBody = toXml(input)

    // We have to add the content type to the HttpEntity before sending it off
    val httpEntity = HttpEntity.apply(MediaTypes.`application/xml`, requestBody)

    // Attach it to the request and send it
    val request = Put("/api/call?action=edit", httpEntity)
    val result = pipeline(request)

    // The response is not used yet (error handling is still undecided),
    // so hand back an already-completed Future[Unit]
    Future.successful(())
}



// This is a modified version of what is on the Staxon GitHub to allow for stringification
def toXml(json: ByteArrayInputStream): String = {

    val config = new JsonXMLConfigBuilder().multiplePI(false).build()
    val output = new ByteArrayOutputStream()
    try {
        // Create reader (JSON)
        val reader = new JsonXMLInputFactory(config).createXMLEventReader(json)

        // Create writer (XML) and wrap it so the output is formatted
        val writer = XMLOutputFactory.newInstance().createXMLEventWriter(output)
        val prettyWriter = new PrettyXMLEventWriter(writer)

        // Copy events from reader to writer
        prettyWriter.add(reader)

        // Close reader/writer, going through the outermost writer
        reader.close()
        prettyWriter.close()

        // Assumes the writer produced UTF-8, the usual default for an OutputStream target
        output.toString("UTF-8")
    } finally {
        // As per the StAX specification, XMLEventReader/Writer.close() doesn't close
        // the underlying stream
        json.close()
        output.close()
    }
}
Copying JSON to XML via StAX Event API (Modified)

There you go. It’s not the prettiest way to parse something with all of the transformations, but trust me when I say it is super effective.

Also, a large amount of credit goes to the technical lead on my project who did a lot of research and was the one who eventually found the Staxon library. When I say “we” in this article, the research that went into finding and implementing this solution was truly a team effort.



2 thoughts on “Data Conversion Between XML and JSON (w/ Scala)”

  • Few things.

    First, there is a mistake in your first JSON example. The structure does not match your XML. Person should not be a list in that case.

    Second, I don’t think you should be using XML literals in your Scala code since they are being deprecated.

    Also, you have a fundamental problem with the design. If you now want to implement a new conversion between a different representation (let’s call it bin) you will have to do xml to bin, bin to xml AND json to bin and bin to json. This increases as you add more representations. You could easily solve this by abstracting over typeclasses.

    First, you need to define a data structure that represents your concept such as person.

    case class Person(name: String, age: Int)

    And then you will have a type class to convert from/to person.

    trait PersonEncoder[A] {

    def from(a: A): Person
    def to(person: Person): A
    }

    From here it is as simple as creating instances of the type class and using them.

    For instance. From xml to json, which is your example.

    object Person {
    implicit val xmlEncoder = new PersonEncoder[Xml] …

    implicit val jsonEncoder = new PersonEncoder[Json] …

    }

    val json = Person.from(xml).to()

    Of course you will need to have the instances implicitly available.

    The last thing is that you should really try to remove mutations in your code. ListBuffer? Could you use recursion instead?

    There is also the function editPerson. It should return a new edited person instead of mutating the one you pass. Second, that function is not being executed async, despite the fact that it should be. You need to do

    def editPerson(…) = Future {….}

    Instead of returning a Future in the last line. If you change it, please make sure you don’t modify the passed person, return a new one with the edits.

    Hope you find this code review useful.

  • I appreciate your comment! The JSON isn’t supposed to be exact, it’s just an example of structure. I’ll fix it though because I’m sure you’re not the only one that will bother 🙂

    As per my disclaimer – this isn’t code complete and was done in the middle of development of a larger project. Don’t worry, we do define implicit data structures and these functions are split over multiple files.

    “Person” is an oversimplified example in this case. Part of the issue is that there is not a direct correlation between the XML and JSON models and variable names, so we have to do a translation between them regardless. The easiest way we have found is to do a translation between the object models and then implicitly convert it to JSON and then XML. I’ll look back over my examples to make sure I explained that correctly, as there should be no need for double conversion.

    You’re right about recursion. There is probably a more Scala-y way to do it, but if there is I’m not aware/comfortable with it yet.

    The editPerson return type is also a holdover from our original code. Normally we’d return the result of the request; in this particular case we are not, and we are still determining error handling. There is a future in our method definition 🙂
