[xquery-talk] SQL Server 2005

Sun Jan 22 10:59:07 PST 2006

On Jan 21, 2006, at 11:55 PM, Ronald Bourret wrote:

> Frank Cohen wrote:
>
>> Where I have a problem is with complex XML documents like those
>> created using UBL for ebXML solutions. When a service or
>> application receives an XML document containing hundreds to
>> thousands of elements, lots of nesting, and many different schema
>> versions, then I think it's time to look at adding an XML database
>> to the datacenter.
>
> This is an interesting problem.
>
> One assumes that the business applications involved are already  
> based on relational databases, so what happens when you stop  
> supplying those applications with their data? Do you rewrite the  
> applications to use native XML technology? Build a relational  
> wrapper over the data, in effect shredding it at the query level  
> instead of the storage level? Or are these simply brand new  
> applications built from the ground up?

General Motors commissioned a study with me to answer these  
questions. They wanted to know if the existing Java tools were  
appropriate for the ebXML applications they intended to build.

In GM's case, they formed a standards body with the other automotive
manufacturers to define a schema for automotive retailing (STAR).
Among other things, the schema can represent a request for a purchase
order (GPO). Even without any data in the elements, the GPO is a
7,500-character string, which gives you an idea of the complexity here.
A GPO can order a windshield wiper (about 10 Kbytes) or a
Suburban SUV (about 10 Mbytes).

The results showed that the Java app servers had problems dealing
with the complexity (scalability problems from DOM approaches and
out-of-memory exceptions). The more complex the data, the worse things get.
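
To make the DOM problem concrete, here is a minimal Python sketch (not the
Java code from the study; the Order/LineItem/Part element names are made up
for illustration). It counts line items two ways: once by parsing the whole
document into a tree first, and once by streaming through it and discarding
each item as soon as it has been handled.

```python
import io
import xml.etree.ElementTree as ET


def count_items_dom(source, tag="LineItem"):
    """DOM-style: the whole document is parsed into memory before counting."""
    root = ET.parse(source).getroot()
    return sum(1 for _ in root.iter(tag))


def count_items_streaming(source, tag="LineItem"):
    """Streaming: handle each element when its end tag is seen, then empty it."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Drop the item's children and text. The emptied element stays
            # attached to its parent, so this reduces memory growth rather
            # than eliminating it.
            elem.clear()
    return count


# Hypothetical document: 1000 identical line items.
sample = b"<Order>" + b"<LineItem><Part>wiper</Part></LineItem>" * 1000 + b"</Order>"

dom_count = count_items_dom(io.BytesIO(sample))
stream_count = count_items_streaming(io.BytesIO(sample))
```

Both functions give the same answer; the difference is only in how much of
the document has to be held in memory at once, which is what matters when
the document is 10 Mbytes rather than 10 Kbytes.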

> Do you rewrite the applications to use native XML technology?

If your system is already using a relational approach, then I propose
putting an XQuery engine and a native XML DB in the mid-tier of a service to
mitigate the complexity. You can do all sorts of cool things in the
mid-tier: shredding, transformation, caching, policies.
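
As a rough illustration of two of those mid-tier jobs, shredding and
caching, here is a small Python sketch. It stands in for what an XQuery
mid-tier would do and is not an XQuery implementation; the document layout,
the line_items table, and the function names are all assumptions made up
for the example.

```python
import hashlib
import sqlite3
import xml.etree.ElementTree as ET

_shred_cache = {}  # content hash -> list of rows already extracted


def shred(xml_text):
    """Turn one order document into (order_id, part, qty) rows.

    Results are cached by content hash, so a document that arrives twice
    is only parsed once.
    """
    key = hashlib.sha256(xml_text.encode("utf-8")).hexdigest()
    if key not in _shred_cache:
        root = ET.fromstring(xml_text)
        order_id = root.get("id")
        _shred_cache[key] = [
            (order_id, item.findtext("Part"), int(item.findtext("Qty")))
            for item in root.iter("LineItem")
        ]
    return _shred_cache[key]


def store(conn, rows):
    """Hand the shredded rows to the existing relational back end."""
    conn.executemany("INSERT INTO line_items VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE line_items (order_id TEXT, part TEXT, qty INTEGER)")

doc = (
    '<Order id="42">'
    "<LineItem><Part>wiper</Part><Qty>2</Qty></LineItem>"
    "<LineItem><Part>blade</Part><Qty>1</Qty></LineItem>"
    "</Order>"
)
store(conn, shred(doc))
total_qty = conn.execute("SELECT SUM(qty) FROM line_items").fetchone()[0]
```

The relational applications behind the mid-tier keep seeing ordinary rows,
while the XML handling stays in one place.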

> Build a relational wrapper over the data, in effect shredding it at  
> the query level instead of the storage level?

The app developer is always going to have more semantic knowledge of
the data than a db does, and so can write cleaner, more efficient code. So it
makes sense to shred at the query level. Also, most of the XQuery
implementations I have seen have extensions to make Java calls or
JDBC calls. So why not do it at the query level?
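
A minimal sketch of what shredding at the query level can look like, in
Python with SQLite rather than XQuery and JDBC: the document is stored
whole, and a user-defined SQL function pulls out only the fields a given
query needs. The docs table, the xml_text function, and the element names
are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def xml_text(body, path):
    """Return the text of the first element matching `path`, or None."""
    return ET.fromstring(body).findtext(path)


conn = sqlite3.connect(":memory:")
# Expose the extractor to SQL so queries can shred documents on demand.
conn.create_function("xml_text", 2, xml_text)
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute(
    "INSERT INTO docs (body) VALUES (?)",
    ("<Order><Buyer><Name>Acme</Name></Buyer><Total>19.95</Total></Order>",),
)

# The stored document is untouched; the query decides which parts become columns.
rows = conn.execute(
    "SELECT id, xml_text(body, 'Buyer/Name'), xml_text(body, 'Total') FROM docs"
).fetchall()
```

The trade-off is that every query re-parses the document, so this suits
cases where the developer knows which few fields matter and the storage
schema should not have to track every schema version.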

> Or are these simply brand new applications built from the ground up?

It took the better part of the '80s and '90s to get this far with
relational approaches. It's likely to take another 20 years to get
this XML thing down. :-)

>
> (The one argument you've made so far that would strongly push me  
> into the native camp is many schema versions, which seem to be more  
> painful in the relational world than the native XML world, although  
> still painful nonetheless.)
>
> Out of curiosity, what is the nature of the documents? In
> particular, how deeply are they nested (excluding wrapper elements  
> that wouldn't map to relational structures)? Do they contain  
> repeating high-level structures, such as a document containing  
> multiple sales orders, which would be easily split into many  
> smaller documents? And how much of the data simply provides context  
> and doesn't need to be stored in the database? For example, a sales  
> order would probably include customer information, but there's a  
> good bet this is already in the database.
>

I'll post the three use cases that I think most developers confront  
in a separate message.

-Frank

> -- Ron