XML in SQL Server 2005

Michael Rys on XML in SQL Server 2005

Part two of an interview with Dr Michael Rys, program manager for XML in SQL Server at Microsoft. To jump back to part 1, click here.

Use cases for XML data

I asked Rys for some examples of when you might want to use the XML features of SQL Server 2005.

"There are three or 4 major scenarios that I see. The first, which is still probably about the 60% scenario, is that you use XML on the wire. You have your messages that might be SOAP or XML over HTTP, and you need to unpack the data. So you need a mechanism to get at the data, shred it out, and put it back into relational form; or going the other way, to take your relational data and put it into XML form.

"The interesting part is that often all or part of that message doesn’t really fit the relational model that well. Then people want to manage their XML messages as XML documents. That might simply be because they are doing routing, and don’t want to decompose and recompose the data; or it might be because the XML structure is really not relational.

"A third scenario is where people have actual business documents in XML form for example from Microsoft Office, Infopath, or Open Office, or Adobe documents. Many of these documents have useful structural information, and it is useful to use XPath expressions on them.

"Another scenario that I see is the ad-hoc use of XML, where you use XML to model something because the relational model doesn’t give you the capability. Examples of that are open-ended property bags, where you have an XML column where you store name-value pairs which are so instance-specific that it doesn’t make sense to put them into a column format. The rest of the structured information is clustered together using columns. You have your normal structural aspect and then you have an XML column that contains the varying properties.

"A further use is to store programming objects, or at least the state of programming objects, in the database. It’s very hard to represent the state of lots of varying objects in the database if you’re a programmer and want to change the schema all the time. Previously you might have put them into a binary blob, and then couldn’t access any of the properties in the database without getting the whole object back.

"You could use the CLR [Common Language Runtime] support that we have now in SQL Server 2005 to do that, but to be able to do that you have to know beforehand exactly what objects you want to put into the database, and you will not be able to manage heterogeneous sets of such objects. The CLR, at least in this release, also has a slight limitation which the XML data type doesn’t have. CLR objects can only be 8K. If people decide to store their objects in an XML data type, they still can access the properties using XQuery, while not having to bother about very strict object registration requirements. You just put the state of the object in the database, and keep the programming logic on the mid-tier."

XQuery and XPath Standards

The XQuery 1.0 and XPath 2.0 standards are critically important to SQL Server, but will they be finalised in time for its release? "No," says Rys. "We are going to ship before the standard gets its final recommendation. In April 2005 the W3C moved into what’s called the “last call” phase. This phase will take up until the summer. There is a summer break in August, and then the next phase is the Candidate Recommendation. During this phase the different vendors and software developers are invited to build prototypes based on the specification, and to make sure that individual features that the specification describes are interoperable, in the sense that they return the same results for the same syntax. That phase will probably take at least 4 to 5 months. Then you will have the inevitable discussions inside the working group whether certain tests should be included or not, and what the expected result for a certain test should be, and to determine what are the exit criteria out of the Candidate Recommendation phase. I would expect this to take 5 months. Assuming that we might get into Candidate Recommendation by October, then getting to the Proposed Recommendation phase will probably be late Q1 2006."

This being the case, the question is how SQL Server will be updated to achieve full conformity? "We tried to scope the XQuery support for this release to a subset of the specification that we felt was stable. Many of the more controversial features of the spec we have not implemented. We have also not implemented many of the functions, especially all the date-time functions. The goal that I have with the XQuery support in SQL Server is that we are shipping the subset with SQL Server 2005 now, and for the next release the standard will be fixed and finalised. We will still provide backward compatibility support for people that have been building on SQL Server 2005, in case something which we have supported changes."

Performance

Finally I asked Rys about the performance implications of using an XML data type versus shredding XML data into columns. "The general recommendation today is if your data fits the relational framework, and you want to have relational type queries over the data, then it is more efficient to shred it into relational form, and run your queries there. Even though we have indexes on our XML structures, the query expressions are internally more complex than if you query relational content. In an XML data type, even if you have fairly structured data, it is still such that you are not guaranteed that a specific element is necessarily always at the same place. So it is a bit more effort to execute those queries.

"The sweet spot for XML is not replacing relational processing. The sweet spot is to enable you to query data that you either had a hard time shredding before, or that you couldn’t query at all before."

Links

Some useful links are below:

XQuery at the W3C
Michael Rys Blog
Wikipedia entry on database normalization
Book: A first look at SQL Server 2005 for developers by Bob Beauchemin
More books on SQL Server 2005

Tim Anderson's ITWriting