[xquery-talk] The value does not conform to the lexical constraints defined for the xs:anyURI type

Michael Kay mike at saxonica.com
Wed Dec 19 03:41:17 PST 2012


Michael Sperberg-McQueen has defined types that match
different flavours of URI in

http://www.w3.org/2011/04/XMLSchema/TypeLibrary-URI-RFC3986.xsd

and

http://www.w3.org/2011/04/XMLSchema/TypeLibrary-IRI-RFC3987.xsd

To see the way these complex regular expressions are constructed, view
these documents at the raw XML level using (for example) curl.
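
As for Robby's original question about who should do the escaping: a
minimal sketch, assuming the caller percent-encodes the path segment
with fn:encode-for-uri before the cast, so that the result is a valid
xs:anyURI even under a strict XSD 1.0 processor. The function name and
base URL are taken from the query quoted below; nothing here is specific
to Sedna or Zorba.

```xquery
declare function local:getPipUri($id as xs:string) as xs:anyURI {
    (: encode-for-uri escapes the space in $id as %20 :)
    xs:anyURI(concat("http://www.nxp.com/pip/", encode-for-uri($id)))
};

local:getPipUri("CX24483 14LZ")
(: yields http://www.nxp.com/pip/CX24483%2014LZ :)
```

Only the identifier is encoded, not the whole string, because
encode-for-uri also escapes "/" and ":".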

Michael Kay
Saxonica


On 19/12/2012 11:13, Benito van der Zander wrote:
> Hi,
>
> By the way, does anyone have a regular expression that matches exactly 
> the anyURI values allowed by XSD 1.0?
>
> I tried to make one by translating the BNF in RFC 2396 and 2732 to 
> regex, by having a regex for every token, and substituting them 
> everywhere the token is used in the BNF.
>
> But the resulting regex:
>
> ((((([a-zA-Z][a-zA-Z0-9+-.]*:)?((//(((([a-zA-Z0-9-_.!~*''();:&=+$,]|%[a-fA-F0-9]{2})*@)?((([a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])?.)*[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?.?)|([0-9]+(.[0-9]+){3})|\[(([0-9a-fA-F]{1,4}(:[0-9a-fA-F]{1,4})*)?|([0-9a-fA-F]{1,4}(:[0-9a-fA-F]{1,4})*)?::([0-9a-fA-F]{1,4}(:[0-9a-fA-F]{1,4})*)?)(:[0-9]{1,3}(.[0-9]{1,3}){3})?\])(:[0-9]*)?)?|([a-zA-Z0-9-_.!~*''()$,;:@&=+]|%[a-fA-F0-9]{2})+)(/([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*(;([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*)*(/([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*(;([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*)*)*)?)|(/([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*(;([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*)*(/([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*(;([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*)*)*)))|(([a-zA-Z0-9-_.!~*''();@&=+$,]|%[a-fA-F0-9]{2})+(/([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*(;([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*)*(/([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*(;([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*)*)*)?))([?]([;/?:@&=+$,\][a-zA-Z0-9-_.!~*''()]|%[a-fA-F0-9]{2})*)?)|([a-zA-Z][a-zA-Z0-9+-.]*:([a-zA-Z0-9-_.!~*''();?:@&=+$,]|%[a-fA-F0-9]{2})(([;/?:@&=+$,\][a-zA-Z0-9-_.!~*''()]|%[a-fA-F0-9]{2}))*))?(#(([;/?:@&=+$,\][a-zA-Z0-9-_.!~*''()]|%[a-fA-F0-9]{2})*))? 
>
>
> is just horrible.
> (And it might not even work well with Unicode.)
>
> Benito
>
>
> On 12/19/2012 10:31 AM, Michael Kay wrote:
>> The validation rules for xs:anyURI in the XSD 1.0 specification are 
>> notoriously troublesome, and it is not surprising that different 
>> implementors interpret them differently.
>>
>> This is what XSD 1.0 says:
>>
>> <quote>
>> The ·lexical space· of anyURI is finite-length character sequences 
>> which, when the algorithm defined in Section 5.4 of [XML Linking 
>> Language] is applied to them, result in strings which are legal URIs 
>> according to [RFC 2396], as amended by [RFC 2732].
>>
>> Note:  Spaces are, in principle, allowed in the ·lexical space· of 
>> anyURI, however, their use is highly discouraged (unless they are 
>> encoded by %20).
>> </quote>
>>
>> The "Note" here suggests that Sedna is wrong to reject the value (it 
>> also suggests that your query is wrong to supply it, but that you 
>> should be able to get away with it).
>>
>> The "algorithm" referred to in this rule is basically percent-encoding: 
>> characters that are not allowed in a URI, such as space, are converted 
>> to UTF-8 and each resulting byte is escaped as %HH.
>>
>> Note that in XSD 1.1, the spec gives up trying to define what's valid 
>> in an xs:anyURI and what isn't - all strings are now valid in the 
>> lexical space of xs:anyURI.
>>
>> Michael Kay
>> Saxonica
>>
>> On 19/12/2012 09:11, Robby Pelssers wrote:
>>> Hi all,
>>>
>>>
>>> I tested the following XQuery with Sedna and Zorba:
>>>
>>> declare function local:getPipUri($id as xs:string) as xs:anyURI {
>>>     xs:anyURI(concat("http://www.nxp.com/pip/", $id))
>>> };
>>>
>>> local:getPipUri("CX24483 14LZ")
>>>
>>>
>>> Sedna throws an exception:
>>> 2012/12/19 10:07:09 database query/update failed (SEDNA Message: 
>>> ERROR FORG0001
>>> Invalid value for cast/constructor.
>>> Details: The value does not conform to the lexical constraints 
>>> defined for the xs:anyURI type.
>>> Query line: 6, column:4
>>> )
>>>
>>>
>>> http://www.zorba-xquery.com/html/demo happily returns 
>>> "http://www.nxp.com/pip/CX24483 14LZ"
>>>
>>> So how does the xs:anyURI cast work? Is the developer supposed to 
>>> encode the String before passing it to xs:anyURI or is the anyURI 
>>> function supposed to do this?
>>>
>>> Thx in advance,
>>> Robby
>>>
>>> _______________________________________________
>>> talk at x-query.com
>>> http://x-query.com/mailman/listinfo/talk
>>>




