More on XML parsing using XmlSerializer

February 03, 2010

...

In my previous post, I talked about how we could use System.Xml.Serialization.XmlSerializer and System.Runtime.Serialization.DataContractSerializer to parse XML into an object tree. I pointed out that DataContractSerializer in my mind has some advantages in that types, properties, and members to use with the deserialization do not need to be public. On the downside, DataContractSerializer puts a quite limiting constraint on the XML format in that it cannot parse XML attributes. This is as far as I know, an absolute constraint that cannot easily be circumvented. Thus, regrettably, if we are not in control of the XML format, DataContractSerializer is sometimes useless.

In those situations, we can still use XmlSerializer. In order to achieve the same encapsulation with XmlSerializer, we have to adjust our model a bit. Here’s one suggestion on how to do this:

My approach to this is inspired by Josh Bloch‘s Builder pattern. The idea is changing the classes used for deserialization from being domain objects to being builder objects that build domain objects. This has another advantage in that our domain objects are not “polluted” with attributes and interfaces related to deserialization and are 100% plain old CLR objects (POCOs). So, lets first do a change our “deserialization” class (formerly ‘Country’) to a builder object, like so:

public class CountryBuilder
{
   [XmlElement(ElementName = "name")]
   public string Name;

   [XmlElement(ElementName = "iso-3166-alpha-2-code")]
   public string Code;

   public Country Build()
   {
      return new Country(Name, Code);
   }
}

Note here that our builder object has a Build() method which returns the domain object ‘Country’. This is the object that we will pass on to our clients. The Country class now represents our domain object:

public class Country
{
   private readonly string _name, _code;
            
   internal Country(string name, string code)
   {
      _name = name;
      _code = code;
   }

   public string Name { get { return _name; } }
   public string Code { get { return _code; } }
}

We have now restricted access to the creation of Country objects in that the constructor is internal, and it is also immutable (cannot change state once created) though making its fields readonly. It only exposes getters for its internal state. We can then do the same to our list of countries. The builder object for countries would look like this:

[XmlRoot("countries")]
public class Countries
{
   [XmlElement(ElementName="country")]
   public CountryBuilder[] countries;

   public IEnumerable<Country> Build()
   {
      return countries.Select<CountryBuilder , Country>(x => x.Build());
   }
}

The code for doing the serialization will then look like this:

string xml = ...;
XmlSerializer xmlSerializer = new XmlSerializer(typeof(CountriesBuilder));
var builder = xmlSerializer.Deserialize(new StringReader(inputXml)) as CountriesBuilder;
IEnumerable<Country> cs = builder.Build();

What we have achieved now is that we now can control the accessibility and encapsulation of our domain model. The builder objects are still public to anyone, but that really does not matter much in my mind. Thus, a relatively swift parsing of XML of various formats into a well designed object model.