[xquery-talk] (no subject)

fatma helmy fatmahelmy2000 at yahoo.com
Mon Apr 17 00:46:59 PDT 2006


I would like your advice on optimizing the code below.
I am using Stylus Studio, Professional Edition.

I run this code to produce statistics on an XML file.
With a 4 MB file, the program finished after 30 minutes.
With an 11 MB file, it did not finish. The code is as
follows:

declare function local:pathOfNode($node)
{
  (: Build a path such as "/a/b/c" by walking up to the document node. :)
  if (empty($node/..)) then ""
  else concat(local:pathOfNode($node/..), "/", local-name($node))
};

let $j := doc("book_sample.xml")
(: One path string per element in the document. :)
let $paths := for $n in $j//* return local:pathOfNode($n)
(: Keep only paths with more than one step, i.e. non-root elements. :)
let $childpaths :=
  for $item in $paths
  where count(tokenize(substring-after(string($item), "/"), "/")) > 1
  return $item
for $p in distinct-values($childpaths)
let $toks := tokenize(string($p), "/")
(: Parent path: everything before the last step. :)
let $papa := string-join(subsequence($toks, 1, count($toks) - 1), "/")
let $var := substring-after(string($p), "/")
(: Non-blank text nodes whose element path equals $p.
   local-name() is used here so the comparison matches pathOfNode. :)
let $leafs :=
  $j//text()[normalize-space()]
    [string-join(ancestor-or-self::element()/local-name(), "/") eq $var]
(: Normalize once, so distinct values and counts compare the same strings. :)
let $vals := for $t in $leafs return normalize-space($t)
return
  <STATISTICS>
    <PATH>{string($p)}</PATH>
    <RATIO>{
      string(round(count($childpaths[. = $p])
                   div count($paths[. = $papa]) * 100))
    }</RATIO>
    {
      for $val in distinct-values($vals)
      return <value-per-path value='{$val}'
                             count='{count($vals[. eq $val])}'/>
    }
  </STATISTICS>

This code produces all paths and then calculates, for
each path, the ratio of the node's frequency to its
parent's frequency.
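
To make the ratio concrete, here is a minimal sketch on a made-up
two-book document. The inline document and the two hard-coded paths
are assumptions for illustration only; they are not taken from
book_sample.xml.

```xquery
declare function local:pathOfNode($node)
{
  if (empty($node/..)) then ""
  else concat(local:pathOfNode($node/..), "/", local-name($node))
};

(: Hypothetical two-book document, used only for illustration. :)
let $j := document {
  <books>
    <book><title>A</title><price>10</price></book>
    <book><title>B</title></book>
  </books>
}
let $paths := for $n in $j//* return local:pathOfNode($n)
let $p := "/books/book/price"   (: path being measured :)
let $papa := "/books/book"      (: its parent path :)
(: 1 price element under 2 book elements :)
return round(count($paths[. = $p]) div count($paths[. = $papa]) * 100)
```

There is one price element and there are two book elements, so the
query evaluates to 50.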

My question: is there any unnecessary code that slows
the query down to that extent?

Also, how can I change my code to produce only the paths
whose ratio is greater than a certain value, so that
infrequent paths are pruned from the start and not
explored any further?
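
One possible shape for such pruning, given only as a rough sketch and
not as a tested solution: walk the tree top-down, group the children
of the current node set by name, and recurse only into groups whose
ratio passes the cut-off. The function name local:walk and the
variable $threshold are hypothetical, and the per-value statistics
are left out to keep the sketch short.

```xquery
(: Hypothetical cut-off, in percent. :)
declare variable $threshold := 50;

(: $nodes holds every element found at $path. :)
declare function local:walk($nodes as element()*, $path as xs:string)
  as element()*
{
  for $name in distinct-values($nodes/*/local-name(.))
  let $children := $nodes/*[local-name(.) eq $name]
  let $ratio := round(count($children) div count($nodes) * 100)
  (: Paths at or below the cut-off are dropped and never descended into. :)
  where $ratio gt $threshold
  return (
    <STATISTICS>
      <PATH>{concat($path, "/", $name)}</PATH>
      <RATIO>{$ratio}</RATIO>
    </STATISTICS>,
    local:walk($children, concat($path, "/", $name))
  )
};

let $root := doc("book_sample.xml")/*
return local:walk($root, concat("/", local-name($root)))
```

Each level here touches only the children of nodes that survived the
previous level, instead of building a path string for every element
and rescanning the whole document for each distinct path.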


