[xquery-talk] The value does not conform to the lexical constraints defined for the xs:anyURI type

Benito van der Zander benito at benibela.de
Wed Dec 19 03:13:00 PST 2012


Hi,

By the way, does anyone have a regular expression that matches exactly 
the anyURI values allowed by XSD 1.0?

I tried to make one by translating the BNF in RFC 2396 and RFC 2732 
into a regex: I wrote a regex for every token and substituted it 
everywhere the token is used in the BNF.
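
A minimal sketch of that substitution in XQuery, covering only a few 
RFC 2396 tokens rather than the whole grammar. The variable names are 
my own, chosen to mirror the BNF, and the test strings are made up:

```xquery
(: Sketch only: one variable per BNF token, composed with concat(). :)
declare variable $escaped    := "%[a-fA-F0-9]{2}";
declare variable $unreserved := "[a-zA-Z0-9\-_.!~*'()]";
(: reserved includes "[" and "]" as added by RFC 2732. The ampersand
   has to be written as &amp; inside an XQuery string literal. :)
declare variable $reserved   := "[;/?:@&amp;=+\$,\[\]]";
(: uric = reserved | unreserved | escaped :)
declare variable $uric :=
    concat("(", $reserved, "|", $unreserved, "|", $escaped, ")");
(: fragment = *uric; matches() is unanchored, so anchor explicitly. :)
declare variable $fragment := concat("^", $uric, "*$");

(
  matches("section%201", $fragment),
  matches("section 1", $fragment)
)
```

This should evaluate to (true, false), since a raw space is in none of 
the three character classes. Keeping the tokens as named variables at 
least leaves the pieces readable, even if the expanded pattern is not.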

But the resulting regex:

((((([a-zA-Z][a-zA-Z0-9+-.]*:)?((//(((([a-zA-Z0-9-_.!~*''();:&=+$,]|%[a-fA-F0-9]{2})*@)?((([a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])?.)*[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?.?)|([0-9]+(.[0-9]+){3})|\[(([0-9a-fA-F]{1,4}(:[0-9a-fA-F]{1,4})*)?|([0-9a-fA-F]{1,4}(:[0-9a-fA-F]{1,4})*)?::([0-9a-fA-F]{1,4}(:[0-9a-fA-F]{1,4})*)?)(:[0-9]{1,3}(.[0-9]{1,3}){3})?\])(:[0-9]*)?)?|([a-zA-Z0-9-_.!~*''()$,;:@&=+]|%[a-fA-F0-9]{2})+)(/([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*(;([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*)*(/([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*(;([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*)*)*)?)|(/([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*(;([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*)*(/([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*(;([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*)*)*)))|(([a-zA-Z0-9-_.!~*''();@&=+$,]|%[a-fA-F0-9]{2})+(/([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*(;([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*)*(/([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*(;([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*)*)*)?))([?]([;/?:@&=+$,\][a-zA-Z0-9-_.!~*''()]|%[a-fA-F0-9]{2})*)?)|([a-zA-Z][a-zA-Z0-9+-.]*:([a-zA-Z0-9-_.!~*''();?:@&=+$,]|%[a-fA-F0-9]{2})(([;/?:@&=+$,\][a-zA-Z0-9-_.!~*''()]|%[a-fA-F0-9]{2}))*))?(#(([;/?:@&=+$,\][a-zA-Z0-9-_.!~*''()]|%[a-fA-F0-9]{2})*))?

is just horrible
(and it might not even work well with Unicode).
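
For the query quoted below, escaping the id before the cast should 
sidestep the difference between implementations. A minimal sketch 
using the standard encode-for-uri and iri-to-uri functions; which of 
the two fits depends on whether the input is a single path segment or 
an already assembled URI:

```xquery
declare function local:getPipUri($id as xs:string) as xs:anyURI {
    (: encode-for-uri percent-encodes the space in $id as %20,
       so the cast no longer sees a raw space :)
    xs:anyURI(concat("http://www.nxp.com/pip/", encode-for-uri($id)))
};

(
  local:getPipUri("CX24483 14LZ"),
  (: iri-to-uri escapes spaces and non-ASCII characters but leaves
     URI delimiters such as "/" and ":" untouched :)
  iri-to-uri("http://www.nxp.com/pip/CX24483 14LZ")
)
```

Both items should come out as http://www.nxp.com/pip/CX24483%2014LZ. 
Note that encode-for-uri also escapes "/" and ":", so it is only 
suitable for a single segment such as the id here.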

Benito


On 12/19/2012 10:31 AM, Michael Kay wrote:
> The validation rules for xs:anyURI in the XSD 1.0 specification are 
> notoriously troublesome, and it is not surprising that different 
> implementors interpret them differently.
>
> This is what XSD 1.0 says:
>
> <quote>
> The ·lexical space· of anyURI is finite-length character sequences 
> which, when the algorithm defined in Section 5.4 of [XML Linking 
> Language] is applied to them, result in strings which are legal URIs 
> according to [RFC 2396], as amended by [RFC 2732].
>
> Note:  Spaces are, in principle, allowed in the ·lexical space· of 
> anyURI, however, their use is highly discouraged (unless they are 
> encoded by %20).
> </quote>
>
> The "Note" here suggests that Sedna is wrong to reject the value (it 
> also suggests that your query is wrong to supply it, but that you 
> should be able to get away with it).
>
> The "algorithm" referred to in this rule is basically the escaping of 
> special characters such as space.
>
> Note that in XSD 1.1, the spec gives up trying to define what's valid 
> in an xs:anyURI and what isn't - all strings are now valid in the 
> lexical space of xs:anyURI.
>
> Michael Kay
> Saxonica
>
> On 19/12/2012 09:11, Robby Pelssers wrote:
>> Hi all,
>>
>>
>> I tested the following XQuery with Sedna and Zorba:
>>
>> declare function local:getPipUri($id as xs:string) as xs:anyURI {
>>     xs:anyURI(concat("http://www.nxp.com/pip/", $id))
>> };
>>
>> local:getPipUri("CX24483 14LZ")
>>
>>
>> Sedna throws an exception:
>> 2012/12/19 10:07:09 database query/update failed (SEDNA Message: 
>> ERROR FORG0001
>> Invalid value for cast/constructor.
>> Details: The value does not conform to the lexical constraints 
>> defined for the xs:anyURI type.
>> Query line: 6, column:4
>> )
>>
>>
>> http://www.zorba-xquery.com/html/demo happily returns 
>> "http://www.nxp.com/pip/CX24483 14LZ"
>>
>> So how does the xs:anyURI cast work? Is the developer supposed to 
>> encode the string before passing it to xs:anyURI, or is the 
>> xs:anyURI constructor supposed to do this?
>>
>> Thx in advance,
>> Robby
>>
>> _______________________________________________
>> talk at x-query.com
>> http://x-query.com/mailman/listinfo/talk
>>