From gfourny at inf.ethz.ch Fri Dec 20 06:51:32 2019
From: gfourny at inf.ethz.ch (Ghislain Fourny)
Date: Fri, 20 Dec 2019 14:51:32 +0000
Subject: [xquery-talk] Rumble 1.4.2 Willow Oak
Message-ID:

Dear all,

Right on time for Christmas, I am happy to share news with you on the latest release of the free and open source Rumble, 1.4.2 Willow Oak.

Rumble is a JSONiq engine with which you can query nested, heterogeneous data (e.g., JSON) on Spark, but it carefully hides Spark's RDDs and DataFrames from the user for a productive experience.

Since the last announcement on this list (Rumble 1.1), the following is new:

1. Types

Type expressions are all supported: cast as, castable as, treat as, instance of, typeswitch. Item types are checked in parallel if the input sequence is big.

Rumble supports new types (you will recognize standard XML Schema types): date, time, dateTime, hexBinary, base64Binary, duration, and more. Grouping and ordering on these new types are supported (even in parallel, with DataFrames under the hood).

2. Functions

User-defined functions are supported, and recursion works out of the box.

declare function fibonacci($i) {
  if ($i le 2) then 1 else fibonacci($i - 1) + fibonacci($i - 2)
};
for $i in 1 to 20 return fibonacci($i)

Higher-order functions are also supported (this follows the XQuery 3.0 standard). Functions can be passed as values to expressions and other functions and called dynamically. They are also automatically serialized, shipped to the Spark cluster, deserialized and called if you build big sequences of functions.

let $x := function($x as integer) as integer { $x * $x }
return $x(4)

Type checking is performed on all function parameters and return values (if types are provided), and sequence types are automatically checked in parallel if sequences are large.

3. Parquet

We started adding other formats that have a similar data model to JSON, for example with parquet-file(), you can open...
Parquet files. This maps the file to a sequence of objects that can then be queried in parallel. More formats will follow.

It is also possible, for convenience, to open small, local JSON files spread over multiple lines with json-doc(). (json-file() requires one object per line and is meant for parallel processing of large files on HDFS, S3 or a local drive.)

4. Bugfixes and enhancements

Many small things that our students found have been fixed: variables bound by an outer FLWOR are visible in inner FLWOR expressions as well. We throw more user-friendly exceptions if you nest a big FLWOR inside a big FLWOR (which, for obvious reasons, will "break" on the cluster). The FLWOR count clause is more stable (that one was not easy to get to work on top of DataFrames).

5. Predicates

position() and last() are supported in predicates, and predicates are executed in parallel on big sequences. It is very easy to get a subsequence from a big sequence (this otherwise requires more effort in Java or Scala):

json-file("file.json")[1]

json-file("file.json")[position() ge 10 and position() le last() - 20]

6. Counting optimizations

Rumble auto-detects when a non-grouping variable is only counted, and discards the items early to keep only the count. This significantly improves the performance of queries such as:

for $object in json-file("file.json")
group by $country := $object.country
return { "country" : $country, "count" : count($object) }

And finally, for those interested in the nitty-gritty, we uploaded a paper here: https://arxiv.org/pdf/1910.11582.pdf

Enjoy!

Kind regards and happy slide into 2020,
Ghislain

From gfourny at inf.ethz.ch Fri Dec 20 06:54:12 2019
From: gfourny at inf.ethz.ch (Ghislain Fourny)
Date: Fri, 20 Dec 2019 14:54:12 +0000
Subject: [xquery-talk] Rumble 1.4.2 Willow Oak
In-Reply-To:
References:
Message-ID: <1F9FEBB6-5BDA-425B-9639-4371CBAD49F3@inf.ethz.ch>

...
and the address to download and try out: http://rumbledb.org/

Kind regards
Ghislain

From mixich.andreas at gmail.com Sat Dec 21 10:14:42 2019
From: mixich.andreas at gmail.com (Andreas Mixich)
Date: Sat, 21 Dec 2019 19:14:42 +0100
Subject: [xquery-talk] Building a tree from sequence of maps
Message-ID:

Hi,

I feel like I am trying to get hold of a piece of wet soap with this...

Background: Atom Syndication has an extension[1] that allows threading of entries. These entries are ordered in a flat sequence, one by one. As a result, we end up with an Atom feed that has a bunch of entries, where each entry can reference the ID of another entry, which is then its parent. No nesting is done.

A simplified input could look like this:

declare variable $local:example :=
let $xml :=

The task I want to accomplish is to create an output *tree* of *nested* sections, resembling the natural flow of replies:
One of the many queries I tried is:

declare function local:rec($data) {
  if (empty($data))
  then ()
  else (
    let $current := head($data)
    let $children := tail($data)[@refid = $current/@id]
    return (

      {
        $current/*
        (: , prof:dump("current: " || $current/@id/data() || " children: " || $children/@id/data() => string-join()) :)
        , for $child in $children
          return local:rec($children)
      }

      , local:rec(tail($data))
    )
  )
};

{ local:rec($local:example/item) }

Of course, this does not yet have any logic to keep out the already processed items (besides other issues). When I tried to add that by removing them from the returned sequence, I found no way to break out of scope and pass the modified sequence on to the next recursion.

The previous example results in this, btw.:
I can't believe that there is no super easy way to do it. Any help would be greatly appreciated!

--
Minden jót, all the best, Alles Gute,
Andreas Mixich

From mike at saxonica.com Sat Dec 21 10:46:43 2019
From: mike at saxonica.com (Michael Kay)
Date: Sat, 21 Dec 2019 18:46:43 +0000
Subject: [xquery-talk] Building a tree from sequence of maps
In-Reply-To:
References:
Message-ID:

Start with a function that gets the children of an item:

declare variable $children := function($item) { $xml/item[@refid = $item/@id] };

Decide where to start:

declare variable $root := $xml/item[1];

Now process the items recursively:

declare function local:process-item($item, $get-children) {

  { $item/(@*, $get-children($item)/local:process-item(., $get-children)) }

};

and put it together like this:

local:process-item($root, $children)

I've deliberately written it this way using XQuery 3.1 higher-order functions to keep the recursion logic separate from the details of how you find the logical children of an item. But in XQuery 1.0 the same logic would work using a fixed function instead of a dynamic one.

A version that detects cycles in the data is a little bit trickier, but still quite doable.

Michael Kay
Saxonica

> On 21 Dec 2019, at 18:14, Andreas Mixich wrote:
>
> Hi,
>
> I feel like I try to get a hold on a piece of wet soap with this...
>
> Background: Atom Syndication has an extension[1], which allows threading of entries. These entries are ordered in a flat sequence, one by one.
> As a result we end up with an Atom feed, that has a bunch of entries, where each entry could have a reference to the ID of another entry, which would then be it's parent.
> No nesting is done.
>
> A simplified input could look like this:
>
> declare variable $local:example :=
> let $xml :=
>
> The task I want to accomplish is to create an output tree of nested sections, resembling the natural flow of replies:
>
>
> One of the many queries I tried is:
>
> declare function local:rec($data) {
>   if (empty($data))
>   then ()
>   else (
>     let $current := head($data)
>     let $children := tail($data)[@refid = $current/@id]
>     return (
>
>       {
>         $current/*
>         (: , prof:dump("current: " || $current/@id/data() || " children: " || $children/@id/data() => string-join())
>         :)
>         , for $child in $children
>           return local:rec($children)
>       }
>
>       , local:rec(tail($data))
>     )
>   )
> };
>
> { local:rec($local:example/item) }
>
> Of course, this has not yet any logic, to keep out the already processed items (besides other issues).
> When I tried that, however, by removing them from the return sequence, I found no way to break out
> of scope and have that modified return sequence go back to the next recursion.
>
> Previous example results in this, btw.:
>
>
> I can't believe, that there is no super easy way to do it. Any help would be greatly appreciated!
>
> --
> Minden jót, all the best, Alles Gute,
> Andreas Mixich
> _______________________________________________
> talk at x-query.com
> http://x-query.com/mailman/listinfo/talk

From mixich.andreas at gmail.com Sat Dec 21 11:47:43 2019
From: mixich.andreas at gmail.com (Andreas Mixich)
Date: Sat, 21 Dec 2019 20:47:43 +0100
Subject: [xquery-talk] Building a tree from sequence of maps
In-Reply-To:
References:
Message-ID:

Hello,

thank you for the example. I am totally fine with the HOF style. I have access to XQuery 3.1 (BaseX and Saxon-PE/EE with oXygen).

There is still one thing that I need to solve, namely how to get aligned, so I will follow up on this in the next few days, since I am drained for now. :-)

Thanks to Mr. Hager, as well.

--
Minden jót, all the best, Alles Gute,
Andreas Mixich

From mixich.andreas at gmail.com Mon Dec 30 22:46:37 2019
From: mixich.andreas at gmail.com (Andreas Mixich)
Date: Tue, 31 Dec 2019 07:46:37 +0100
Subject: [xquery-talk] Building a tree from sequence of maps
In-Reply-To:
References:
Message-ID:

Staying away from an issue for a few days can really clear things up! When I revisited the task, I saw it right away! Or so I think...

This is the solution I came up with:

declare variable $local:xml := ;

declare function local:get-parents($item) {
  $local:xml/item[not(@refid != $item/@id)]
};

declare function local:get-children($item) {
  $local:xml/item[@refid = $item/@id]
};

declare function local:process-item($item) {

  {
    let $t := $item/(@*, local:get-children($item)/local:process-item(.))
    return $t
  }

};

let $root := local:get-parents($local:xml/item)
for $item in $root
return local:process-item($item)

which produces:
and that seems to match the case. Thank you.

--
Minden jót, all the best, Alles Gute,
Andreas Mixich
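
The archive dropped the XML markup from the messages in this thread, so the queries above cannot be run as archived. The following is a minimal, self-contained sketch of the recursive approach discussed, under stated assumptions: the sample data, the <section> output element, the <result> wrapper, and the helper name local:get-roots are hypothetical and are not recovered from the original messages. Roots are assumed to be the items that carry no refid attribute.

```xquery
xquery version "3.1";

(: Hypothetical sample data. The names item, id and refid follow the
   thread; the <items> wrapper and all values are made up. :)
declare variable $local:xml :=
  <items>
    <item id="1"/>
    <item id="2" refid="1"/>
    <item id="3" refid="1"/>
    <item id="4" refid="2"/>
    <item id="5"/>
  </items>;

(: Assumption: an item without a refid attribute is a top-level entry. :)
declare function local:get-roots() as element(item)* {
  $local:xml/item[not(@refid)]
};

(: The logical children of an item are the items that reference its id. :)
declare function local:get-children($item as element(item)) as element(item)* {
  $local:xml/item[@refid = $item/@id]
};

(: Copy the item's attributes onto a new element, then nest its children.
   <section> is an assumed output element name. :)
declare function local:process-item($item as element(item)) as element(section) {
  <section>{
    $item/@*,
    local:get-children($item) ! local:process-item(.)
  }</section>
};

<result>{ local:get-roots() ! local:process-item(.) }</result>
```

With this sample data, item 4 ends up nested inside item 2, which is nested inside item 1 next to item 3, while item 5 stays at the top level. The sketch does not guard against cycles in the refid references; because it only descends from items without a refid, items that reference each other in a loop are simply never reached.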