If you want to parse XML in .NET, you have a lot of options to choose from. You can use XmlDocument to parse the XML into a DOM tree, you can use the XmlReader to write an efficient “pull” parser, or you can leverage some of the features provided with various serialization APIs.
Given a fairly straightforward XML document (not too deep a document tree, not too complex a set of attributes and elements) that maps pretty well to your domain model, the serialization option is, to my mind, a good choice that requires little coding. Compared with this approach, using XmlDocument seems like overkill if you don’t need advanced traversal of the document, and writing a parser by hand using XmlReader requires quite a bit of coding.
So, given the following sample XML document, I will investigate the serialization options:
<countries>
  <country>
    <iso-3166-alpha-2-code>AF</iso-3166-alpha-2-code>
    <name>Afghanistan</name>
  </country>
  <country>
    <iso-3166-alpha-2-code>AX</iso-3166-alpha-2-code>
    <name>Åland Islands</name>
  </country>
  <country>
    <iso-3166-alpha-2-code>AL</iso-3166-alpha-2-code>
    <name>Albania</name>
  </country>
</countries>
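For comparison, here is a rough sketch of what a hand-written pull parser for this document could look like. The class name PullParserSketch and the choice to return code/name pairs are my own assumptions for illustration; the sketch also assumes every country element contains plain text in both child elements.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class PullParserSketch
{
    // Returns (code, name) pairs, one per <country> element.
    public static List<KeyValuePair<string, string>> Parse(string xml)
    {
        var result = new List<KeyValuePair<string, string>>();
        string current = null; // name of the element we are currently inside
        string code = null;
        string name = null;

        using (var stringReader = new StringReader(xml))
        using (var reader = XmlReader.Create(stringReader))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        current = reader.Name;
                        break;

                    case XmlNodeType.Text:
                        // Text content belongs to the most recently opened element.
                        if (current == "iso-3166-alpha-2-code") code = reader.Value;
                        else if (current == "name") name = reader.Value;
                        break;

                    case XmlNodeType.EndElement:
                        if (reader.Name == "country")
                        {
                            result.Add(new KeyValuePair<string, string>(code, name));
                            code = null;
                            name = null;
                        }
                        current = null;
                        break;
                }
            }
        }
        return result;
    }
}
```

Even for a document this small, the state tracking is manual, which is the coding overhead the serialization approaches below avoid.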
Using System.Xml.XmlSerializer
The first option that came to mind was to use the XmlSerializer class to deserialize the XML into C# (or VB for that matter) objects. It first requires that I annotate my object model in order to tell the serializer how to map the XML onto it:
using System.Xml.Serialization;

[XmlRoot("countries")]
public class Countries
{
    // Each <country> child element becomes one array entry.
    [XmlElement(ElementName = "country")]
    public Country[] countries;
}

public class Country
{
    [XmlElement(ElementName = "name")]
    public string Name;

    [XmlElement(ElementName = "iso-3166-alpha-2-code")]
    public string Code;
}
Then, I can use the serializer to deserialize the XML:
string xml = ...;
XmlSerializer xmlSerializer = new XmlSerializer(typeof(Countries));
Countries c = xmlSerializer.Deserialize(new StringReader(xml)) as Countries;
Pretty sweet, heh? Definitely. However, this has some drawbacks. If I want my Country class to be a well-designed domain object that follows good OO design principles, I probably want to encapsulate my data. Furthermore, I might want to restrict the creation of such objects from other parts of the code. For XmlSerializer to create my object, it requires that my types are public and that all properties or fields to be set are public as well. So what do I do if I want my objects to be immutable once they are handed off to other parts of the code?
Using System.Runtime.Serialization.DataContractSerializer
Luckily, the serialization API that comes with Windows Communication Foundation (WCF) has some neat features that fit like a glove. When defining my data model, it requires neither the types nor the properties or fields being set to be public. Actually, I can restrict access to the type, its default constructor, and any of the properties or fields that I want to be deserialized! w00t!
So, this is what the Country class looks like:
using System.Runtime.Serialization;

[DataContract(Name = "country", Namespace = "")]
internal class Country : IExtensibleDataObject
{
    private Country() { }

    [DataMember(Name = "name")]
    public string Name { get; private set; }

    [DataMember(Name = "iso-3166-alpha-2-code")]
    public string Code { get; private set; }

    public ExtensionDataObject ExtensionData { get; set; }
}
The XML file contains a list of countries, and luckily, we have the CollectionDataContractAttribute to denote an element that is a list of elements. It supports generics, so that we can define our class as a strongly typed list:
[CollectionDataContract(Name = "countries", Namespace = "")]
internal class Countries : List<Country>, IExtensibleDataObject
{
    public ExtensionDataObject ExtensionData { get; set; }
}
And that’s it. Now we can deserialize:
string xml = ...;
DataContractSerializer ser = new DataContractSerializer(typeof(Countries));
using (StringReader stringReader = new StringReader(xml))
{
    using (XmlReader xmlReader = XmlReader.Create(stringReader))
    {
        Countries countries = (Countries)ser.ReadObject(xmlReader);
    }
}
Alternatively, our result could be typed as a list of countries:
IList<Country> countries = (IList<Country>)ser.ReadObject(xmlReader);
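Putting the pieces above together, a minimal end-to-end sketch might look like the following. The CountryLoader helper is a hypothetical name of my own, and I have left out IExtensibleDataObject to keep the sketch short. It assumes the elements inside each country appear in the same order as in the sample document.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

[DataContract(Name = "country", Namespace = "")]
internal class Country
{
    // Prevents other code from constructing a Country directly.
    private Country() { }

    [DataMember(Name = "name")]
    public string Name { get; private set; }

    [DataMember(Name = "iso-3166-alpha-2-code")]
    public string Code { get; private set; }
}

[CollectionDataContract(Name = "countries", Namespace = "")]
internal class Countries : List<Country> { }

// Hypothetical helper, not part of any framework API.
internal static class CountryLoader
{
    internal static Countries Load(string xml)
    {
        var serializer = new DataContractSerializer(typeof(Countries));
        using (var stringReader = new StringReader(xml))
        using (var xmlReader = XmlReader.Create(stringReader))
        {
            return (Countries)serializer.ReadObject(xmlReader);
        }
    }
}
```

Callers can read country.Name and country.Code, but an assignment such as country.Name = "x" outside the Country class will not compile, which is the encapsulation I was after.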
Note that the DataContractSerializer approach has a limitation: deserializing XML attributes is not supported. Thus, an XML document like the following would not work:
<countries>
  <country iso-3166-alpha-2-code="AF">
    <name>Afghanistan</name>
  </country>
  <country iso-3166-alpha-2-code="AX">
    <name>Åland Islands</name>
  </country>
  <country iso-3166-alpha-2-code="AL">
    <name>Albania</name>
  </country>
</countries>
This will, however, work using the XmlSerializer.
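As a sketch of how that could look with XmlSerializer, the attribute can be mapped with XmlAttribute instead of XmlElement. The type names CountriesWithAttributes, CountryWithAttribute and AttributeExample are hypothetical, chosen only to keep this example separate from the classes above.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

[XmlRoot("countries")]
public class CountriesWithAttributes
{
    [XmlElement(ElementName = "country")]
    public CountryWithAttribute[] Items;
}

public class CountryWithAttribute
{
    // Read from the attribute on <country> rather than from a child element.
    [XmlAttribute(AttributeName = "iso-3166-alpha-2-code")]
    public string Code;

    [XmlElement(ElementName = "name")]
    public string Name;
}

// Hypothetical helper, not part of any framework API.
public static class AttributeExample
{
    public static CountriesWithAttributes Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(CountriesWithAttributes));
        using (var reader = new StringReader(xml))
        {
            return (CountriesWithAttributes)serializer.Deserialize(reader);
        }
    }
}
```

The trade-off is the same as before: the types and the members being set are public again.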