[xquery-talk] performance gain due to using xquery

Michael Kay mhk at mhk.me.uk
Thu May 25 16:14:06 PDT 2006


I tried this on the 100K and 1M versions of the XMark database and got
run-times of 20s and 270s respectively. 

Your "where" clause seems to have the effect of eliminating paths of length
1, that is, the path of the document element itself. So I changed the query
to use "for $n in $j/*//*", which starts below the document element and so
never collects paths of length 1; this removes the need for the "where"
clause. This reduced the run times dramatically, to 0.7 and 6.6 seconds
respectively. 
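To see why the two forms select the same paths, here is a small Python model using the standard library's xml.etree.ElementTree (not Saxon; the sample document and the helper names are made up for illustration). Paths are built with a leading "/" as in the original function. Filtering out single-step paths, as the "where" clause does, leaves exactly the paths of the elements below the document element, which is what $j/*//* selects.

```python
import xml.etree.ElementTree as ET

def collect(elem, prefix, out):
    """Append the path of elem and of every descendant, in document order."""
    path = prefix + "/" + elem.tag
    out.append(path)
    for child in elem:
        collect(child, path, out)
    return out

def old_selection(root):
    # Mirrors: for $n in $j//* ...
    #          where count(tokenize(substring-after($p, "/"), "/")) > 1
    paths = collect(root, "", [])
    return [p for p in paths if len(p[1:].split("/")) > 1]

def new_selection(root):
    # Mirrors: for $n in $j/*//*  (start below the document element,
    # so a path of length 1 is never produced and no filter is needed)
    out = []
    for child in root:
        collect(child, "/" + root.tag, out)
    return out

# Hypothetical sample document.
root = ET.fromstring(
    "<site><regions><africa><item/></africa></regions><people/></site>")
print(old_selection(root) == new_selection(root))  # True
```

The saving comes from never generating the discarded paths at all, rather than generating them and testing each one afterwards.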

Your function local:pathOfNode() puts a "/" at the start of each path, which
you then carefully remove whenever you use the path. I changed the function
to avoid adding the "/".

You're using two different expressions to get the path to a node: a
recursive function for element nodes, string-join(ancestor-or-self) for text
nodes. The string-join appears to be faster. I changed both to use this: run
time for the 1M file is now 5.49 seconds.
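A Python sketch of the two path computations, using xml.etree.ElementTree (an assumption for illustration; the function names are hypothetical). The recursive version mirrors the original local:pathOfNode, leading "/" included; the join version mirrors string-join($node/ancestor-or-self::*/local-name(), '/'), which yields the same path without the "/". For a text node, ancestor::* is the ancestor-or-self axis of its parent, so one expression serves both node kinds. The sketch assumes un-namespaced names, so the tag equals local-name().

```python
import xml.etree.ElementTree as ET

def path_recursive_old(elem, parent):
    # Original local:pathOfNode: "" at the top, then "/" + name at each level,
    # so every path starts with "/" (e.g. "/site/people").
    up = parent.get(elem)
    prefix = "" if up is None else path_recursive_old(up, parent)
    return prefix + "/" + elem.tag

def path_join(elem, parent):
    # Revised form: string-join($node/ancestor-or-self::*/local-name(), '/')
    chain = []
    while elem is not None:            # climb to the document element
        chain.append(elem.tag)
        elem = parent.get(elem)
    return "/".join(reversed(chain))   # outermost first, no leading "/"

root = ET.fromstring(
    "<site><people><person><name>Ann</name></person></people></site>")
# ElementTree keeps no parent links, so build a child -> parent map once.
parent = {child: p for p in root.iter() for child in p}

print(all(path_join(e, parent) == path_recursive_old(e, parent)[1:]
          for e in root.iter()))       # True
name = root.find("people/person/name")
print(path_join(name, parent))         # site/people/person/name
```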

I tried moving the string-join() call out of the user-defined function and
putting it inline in place of the function call. Surprisingly, this
increased the execution time to 11 seconds. Putting the call for text nodes
inline is fine, but not the call for element nodes. This has me completely
baffled at the moment: it's something I need to examine more closely. It
shows how important it is when tuning to try different things and make
measurements to see which performs best.
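When a rewrite that ought to be neutral changes the timing, as inlining did here, measurement is the only reliable guide. Below is a minimal Python harness for comparing variants: it first checks that all variants return the same result, then reports the best of several timed runs to reduce noise. The two candidate functions are hypothetical stand-ins, and nothing here predicts how Saxon itself will behave.

```python
import timeit

def compare(candidates, number=1000, repeat=5):
    """Check that all variants agree, then return best-of-`repeat` seconds each.

    Taking the minimum of several runs reduces noise from other processes.
    """
    outputs = {name: fn() for name, fn in candidates.items()}
    if len(set(outputs.values())) > 1:
        raise ValueError(f"variants disagree: {outputs}")
    return {name: min(timeit.repeat(fn, number=number, repeat=repeat))
            for name, fn in candidates.items()}

# Hypothetical variants: the same path string built via a helper or inline.
NAMES = ["site", "regions", "africa", "item"]

def path_of(names):
    return "/".join(names)

def via_helper():
    return path_of(NAMES)

def inline():
    return "/".join(NAMES)

for name, secs in compare({"helper": via_helper, "inline": inline}).items():
    print(f"{name}: {secs:.6f} s")
```

Checking the outputs before timing guards against "speeding up" a query by accidentally changing what it computes.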

Your logic here:

{for $val in  distinct-values( $leafs)
let $kval := normalize-space($val)
return <value-per-path value='{$kval}' 
count='{count($leafs[. eq  $kval ])}'/>}

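looks faulty, because you're comparing the normalized value $kval with the
unnormalized values of the text nodes in $leafs, so any value with leading,
trailing or repeated whitespace gets a count of zero. I changed it to remove
the normalize-space().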
Runtime for the 1M file is now 4.87 seconds.
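A short Python model of the two loops makes the miscount visible (the sample values and function names are made up). normalize_space() here approximates XPath normalize-space(); Python's str.split() also treats other Unicode whitespace as separators, whereas XPath uses only space, tab, CR and LF.

```python
def normalize_space(s):
    """Approximate XPath normalize-space(): trim and collapse whitespace runs."""
    return " ".join(s.split())

def counts_buggy(leafs):
    # Original loop: the key is normalized, the values it is compared with are not.
    out = []
    for val in dict.fromkeys(leafs):       # distinct-values, first-seen order
        kval = normalize_space(val)
        out.append((kval, sum(1 for x in leafs if x == kval)))
    return out

def counts_fixed(leafs):
    # Corrected loop: no normalization on either side of the comparison.
    return [(val, sum(1 for x in leafs if x == val))
            for val in dict.fromkeys(leafs)]

leafs = [" Ann ", "Bob", " Ann "]
print(counts_buggy(leafs))  # [('Ann', 0), ('Bob', 1)]
print(counts_fixed(leafs))  # [(' Ann ', 2), ('Bob', 1)]
```

Normalizing both sides would be the other consistent choice, if values that differ only in whitespace should be counted together.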

I haven't tried writing an XSLT equivalent. I suspect the performance will
not be very different - though there's always room for surprises.

I suspect there's still quite a bit of room for further tuning on this
query: there's still a lot of redundancy (for example, $leafs rescans every
text node in the document once for each distinct path). My current version
of the query is:

declare function local:pathOfNode($node)
{ string-join($node/ancestor-or-self::*/local-name(), '/') };

let $j := .

(: $j/*//* skips the document element, so no path of length 1 is collected :)
let $paths := for $n in $j/*//* return local:pathOfNode($n)

for $p in distinct-values($paths)

let $papa := replace($p, '/[^/]*$', '')
(: non-blank text nodes whose parent element has path $p :)
let $leafs := $j//text()[normalize-space()]
                 [string-join(ancestor::*/local-name(), '/') eq $p]

return
<STATISTICS>
  <PATH> {string($p)} </PATH>
  <RATIO> {let $c := count($paths[. = $papa])
           return string(round(count($paths[. = $p])
                               div (if ($c = 0) then 1 else $c) * 100))}</RATIO>
  {for $val in distinct-values($leafs)
   return <value-per-path value='{$val}'
                          count='{count($leafs[. eq $val])}'/>}
</STATISTICS>
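One way to remove the redundancy is to do all the work in a single pass: count each path once, and bucket every non-blank text node under its parent element's path, instead of rescanning //text() for each distinct path. The Python sketch below (standard-library ElementTree; the function name is hypothetical) models the query above under that approach. It assumes un-namespaced names, so the tag equals local-name(), and takes the leaf values to be text nodes whose parent element has path $p. It illustrates the algorithm only; whether the same restructuring pays off in Saxon would need measuring.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

def path_statistics(root):
    """One-pass model of the STATISTICS query.

    Returns a list of (path, ratio, [(value, count), ...]) in first-seen order.
    """
    path_counts = Counter()          # count($paths[. = $p]) for every $p
    texts = defaultdict(list)        # parent-element path -> non-blank text values

    def visit(elem, path):
        path_counts[path] += 1
        if elem.text and elem.text.strip():
            texts[path].append(elem.text)
        for child in elem:
            visit(child, path + "/" + child.tag)
            if child.tail and child.tail.strip():  # tail text belongs to elem
                texts[path].append(child.tail)

    for child in root:               # $j/*//* : everything below the document element
        visit(child, root.tag + "/" + child.tag)

    stats = []
    for p, n in path_counts.items():           # distinct-values($paths)
        papa = p.rsplit("/", 1)[0]             # replace($p, '/[^/]*$', '')
        c = path_counts.get(papa, 0)
        d = c if c else 1
        ratio = (200 * n + d) // (2 * d)       # round(n div d * 100), half-up, exact
        values = list(Counter(texts[p]).items())
        stats.append((p, ratio, values))
    return stats

doc = ET.fromstring(
    "<site><people>"
    "<person><name>Ann</name></person>"
    "<person><name>Bob</name></person>"
    "<person><name>Ann</name></person>"
    "</people></site>")
for p, ratio, values in path_statistics(doc):
    print(p, ratio, values)
# site/people 100 []
# site/people/person 300 []
# site/people/person/name 100 [('Ann', 2), ('Bob', 1)]
```

The ratio uses integer arithmetic so that ties round upward exactly as XQuery's round() does for positive numbers, without floating-point surprises.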

Michael Kay
http://www.saxonica.com/


> -----Original Message-----
> From: talk-bounces at xquery.com 
> [mailto:talk-bounces at xquery.com] On Behalf Of fatma helmy
> Sent: 24 May 2006 18:21
> To: talk at xquery.com
> Subject: [xquery-talk] performance gain due to using xquery
> 
> Dear all
> Thanks to Michael Kay's comments, my XQuery is improved. I ran it
> on Saxon against a 12 MB file and it took 14 minutes to finish.
> This is my optimized query:
> 
> declare function local:pathOfNode($node)
> {if(empty($node/..)) then "" else
> concat(local:pathOfNode($node/..), "/",
> local-name($node))};
> let $j:= doc("try.XML") 
> 
> let $paths := for $n in $j//* return
> local:pathOfNode($n) 
> 
> for $p in distinct-values($paths) 
>  
> let $papa:= replace($p,'/[^/]*$','')
> let $leafs :=$j//text()[normalize-space()]
> [string-join(ancestor-or-self::element()/name(),'/')
> eq substring-after(string($p),"/") ] 
> 
> where count
> (tokenize(substring-after(string($p), "/"),"/")) >1 return 
> <STATISTICS> <PATH> {string($p)} </PATH> <RATIO> {string( 
> round( count($paths[.=$p]) div
> count($paths[.=$papa]) * 100 ) )}
> </RATIO>
> {for $val in  distinct-values( $leafs)
> let $kval := normalize-space($val)
> return <value-per-path value='{$kval}' 
> count='{count($leafs[. eq  $kval ])}'/>} </STATISTICS> 
> 
> Now I have the following questions:
> If I implemented the same function in XSLT, or by using an API
> from Java or .NET, would I get a bigger performance gain than
> by executing it on an XQuery engine?
> If XQuery turns out to be best, is that due to XQuery's features
> in general, or is it due to Saxon?
> 
> _______________________________________________
> talk at xquery.com
> http://xquery.com/mailman/listinfo/talk


