[xquery-talk] Removing douplicate elements
Michael Kay
mhk at mhk.me.uk
Fri Aug 12 00:14:36 PDT 2005
The reason this can't use tail-call optimization is that the result of the
recursive call is combined with another value using the "and" operator - so
the recursive call isn't a tail call. It's quite possible to write a
non-recursive version of notIn, for example using "every $x in $list
satisfies not(deep-equal($pivot, $x)"
However, your algorithm is executing deep-equal n^2/2 times, which is going
to give horrible performance as the files get bigger. Improving it isn't
very easy. I think I would write a function that computes a checksum, such
that two elements that are deep-equal will always have the same checksum;
then sort by the checksum, and only use deep-equals to compare elements with
the same checksum. But choosing a good checksum algorithm probably depends
on knowing something about your data.
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: talk-bounces at xquery.com
> [mailto:talk-bounces at xquery.com] On Behalf Of EXTERNAL Kruse
> Peter (Praktikant;CR/AEA1-Si)
> Sent: 10 August 2005 13:37
> To: talk at xquery.com
> Subject: [xquery-talk] Removing douplicate elements
>
> Hi folks
>
> I have an xml file with douplicate elements like this
>
> <connect>
> <a start="luigi" target="mario"/>
> <b start="luigi" target="princess"/>
> <a start="princess" target="mario"/>
> <a start="joshi" target="luigi"/>
> <a start="luigi" target="mario"/>
> <b start="luigi" target="mario"/>
> </connect>
>
> I want to remove all douplicates, so my result would look like:
>
> <connect>
> <a start="luigi" target="mario"/>
> <b start="luigi" target="princess"/>
> <a start="princess" target="mario"/>
> <a start="joshi" target="luigi"/>
> <b start="luigi" target="mario"/>
> </connect>
>
>
> I managed to develope the following:
> -------------------
> declare function local:notIn($pivot as node(),$list as node()*) as
> xs:boolean {
> if (count($list)=0) then
> true()
> else
> (not( deep-equal($pivot,$list[1]))
> and local:notIn($pivot, $list[position()>1]) )
> };
>
> for $c at $cpos in $connections
> where local:notIn($c,$connections[position()<$c])
> return $c
> -------------------
>
> With small files, this works, but with larger files, i get
>
> Error on line 20 of file:/.../connections.xq:
> Too many nested function calls. May be due to infinite recursion.
> Query processing failed: Run-time errors were reported
>
>
>
> Any hints ?
>
> Thanks a lot
> Peter
>
> _______________________________________________
> talk at xquery.com
> http://xquery.com/mailman/listinfo/talk
>
More information about the talk
mailing list