From joewiz at gmail.com Fri Mar 24 12:06:48 2017
From: joewiz at gmail.com (Joe Wicentowski)
Date: Fri, 24 Mar 2017 15:06:48 -0400
Subject: [xquery-talk] XQuery Update Facility and unwanted whitespace
In-Reply-To: <4FFBE117.70204@saxonica.com>
References: <4FFBE117.70204@saxonica.com>
Message-ID:

Hi all,

Have there been any further developments in the area of unwanted reformatting of entire documents after applying XQuery Update operations to just a portion of a document?

I'm using oXygen 18.1 with Saxon-EE XQuery 9.6.0.7, with "Strip whitespaces" set to "None ("none")", XQuery 3.0 support enabled, and XQuery Update enabled.

For example, lines 1-2 of the source document began as this:

But after running the XQuery Update, these two lines are now merged onto a single line:

As you can imagine, this wreaks havoc with diff tools, so I would like to find a way, if possible, to limit the scope of whitespace changes to just where the query applies updates.

Apologies if this turns out to be a product-specific question, but I'm not quite sure how to distinguish in this question between XQuery Update, Saxon, and oXygen.

Thank you,
Joe

On Tue, Jul 10, 2012 at 4:00 AM, Michael Kay wrote:
>
> Andrew has answered the whitespace questions.
>
> It's a Saxon (not an Oxygen) restriction that XQuery 3.0 and XQuery Update
> can't currently be used together in the same query. It happened that way
> because both are implemented as extensions to the "core" XQuery 1.0 parser,
> built using subclassing. (Done that way partly because of the code
> separation between different Saxon editions). We need to fix this
> mechanism, which is becoming pretty unmanageable with the number of
> different language dialects supported. Ideally, I suppose, we should make a
> complete break and move to a bottom-up table driven parser; but XQuery
> parsing is so fragile with the number of context-dependent decisions that
> need to be made, it's a risky change to contemplate.
>
> Michael Kay
> Saxonica
>
>
> On 09/07/2012 19:45, Joe Wicentowski wrote:
>
>> Hi all,
>>
>> I'm having a problem with a query I wrote that makes use of the XQuery
>> Update Facility. The problem is that unwanted whitespace is inserted
>> into the results of my query. Here is my source XML (a TEI-like
>> list), the query in question, and the output showing the unwanted
>> whitespace:
>>
>> source.xml:
>> -----------------
>>
>> See Middle East
>> Middle East > target="#d68">68
>>
>>
>> fix-ids.xq:
>> --------------
>> let $doc := doc('source.xml')
>> for $item-id at $count in $doc//item/@xml:id
>> let $new-id := concat('in', $count)
>> let $new-target := concat('#', $new-id)
>> let $targets := $doc//ref[@target = concat('#', $item-id)]/@target
>> return
>>     (
>>     (: fix @xml:ids :)
>>     replace value of node $item-id with $new-id
>>     ,
>>     (: fix @targets :)
>>     for $target in $targets
>>     return
>>         replace value of node $target with $new-target
>>     )
>>
>> output:
>> ----------
>>
>> See Middle East
>>
>> Middle East 68
>>
>>
>>
>> Note that while the query only modifies attribute values, the results
>> of the query are somehow re-indented. (Specifically, in the source,
>> there was no whitespace between and , but in the
>> results, is on a new line.)
>>
>> Is this a serialization issue? Is there a way for me to declare some
>> options that will prevent the unwanted whitespace from being inserted?
>>
>> I'm not sure whether this is a general XQuery issue or an
>> implementation-specific issue, so let me know if this isn't the right
>> forum for this question. I'm using oXygen 13 in XQuery Debugger mode
>> with Saxon EE-XQuery 9.3.0.5.
>>
>> (On a related note, I see that XQuery 3.0 has new support for
>> serialization options --
>> http://www.w3.org/TR/xquery-30/#id-serialization -- but oXygen doesn't
>> seem to allow combining XQuery 3.0 with XQuery Update Facility and
>> Saxon EE.
>> This forum post instructs users to disable XQuery 1.1/3.0
>> support in order to use XQUF:
>> http://www.oxygenxml.com/forum/topic6615.html.)
>>
>> Thanks,
>> Joe
>> _______________________________________________
>> talk at x-query.com
>> http://x-query.com/mailman/listinfo/talk
>>
>>
>

From mike at saxonica.com Sat Mar 25 05:14:13 2017
From: mike at saxonica.com (Michael Kay)
Date: Sat, 25 Mar 2017 12:14:13 +0000
Subject: [xquery-talk] XQuery Update Facility and unwanted whitespace
In-Reply-To:
References: <4FFBE117.70204@saxonica.com>
Message-ID: <7EEC3B31-6D52-4513-8E75-A88EA7F9BEE7@saxonica.com>

Whitespace in certain places isn't reported by the XML parser to the XQuery processor, so there is no way the XQuery processor can preserve it. Examples are whitespace between the XML declaration and the first element node, and whitespace within a start or end tag. Other things that aren't reported by the parser (and therefore can't be retained) include the choice of single-vs-double quotes around attribute values, entity references, CDATA section boundaries, redundant namespace declarations, and the order of attributes within a start tag.

Using textual diff tools on XML documents isn't really a viable strategy - you need to do the diff in a way that is XML-aware. One way is to canonicalize the two documents and compare their canonical forms. Canonicalizing takes a very similar view to XDM - though not 100% identical - as to what's significant in an XML document and what isn't.

Michael Kay
Saxonica
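A minimal sketch of the canonicalize-and-compare approach, assuming Python 3.8 or later, whose standard library provides `xml.etree.ElementTree.canonicalize` (an implementation of W3C Canonical XML 2.0). The helper names `c14n` and `same_canonical_form` and the two sample documents are hypothetical, made up for illustration; they are not from the thread. `strip_text=True` is a Python option rather than part of the C14N specification: it trims leading and trailing whitespace from every text node, so indentation-only changes like the ones described in this thread also compare equal. Drop it if whitespace in mixed content is significant for you.

```python
import xml.etree.ElementTree as ET


def c14n(xml_text: str) -> str:
    """Return the Canonical XML 2.0 form of an XML document given as a string.

    Comments are dropped (with_comments defaults to False), and
    strip_text=True trims leading/trailing whitespace from every text node,
    so indentation-only whitespace disappears from the result.
    """
    return ET.canonicalize(xml_data=xml_text, strip_text=True)


def same_canonical_form(a: str, b: str) -> bool:
    """XML-aware comparison: compare canonical forms, not raw text."""
    return c14n(a) == c14n(b)


# Hypothetical "before" and "after" versions of one document. They differ only
# in details the parser does not report: whitespace after the XML declaration,
# attribute quote style, whitespace inside a start tag, attribute order, and a
# CDATA section boundary, plus indentation.
before = """<?xml version="1.0"?>

<list>
  <item xml:id='in1' n="1"><![CDATA[Middle East]]></item>
</list>"""

after = '<?xml version="1.0"?><list><item n="1"   xml:id="in1">Middle East</item></list>'

if __name__ == "__main__":
    print(before == after)                     # a textual diff sees a change
    print(same_canonical_form(before, after))  # the canonical forms match
    print(c14n(after))                         # attributes come out sorted
```

Canonical XML sorts attributes, normalizes quotes, and drops the XML declaration and CDATA markers, which is why it lines up with the list of unreported details above; it still differs from XDM in places (for example, whitespace text nodes are kept unless you strip them as done here).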