[xquery-talk] [xsl] Re: backticks in regex - tales of the unexpected part II

David Carlisle davidc at nag.co.uk
Mon Apr 7 11:32:33 PDT 2014


On 07/04/2014 18:04, Ihe Onwuka wrote:
> On Mon, Apr 7, 2014 at 5:49 PM, David Carlisle <davidc at nag.co.uk>
> wrote:
>>
>> No just that if you are writing vocabulary specific regex you need
>>  to use vocabulary specific regex terms. If I'm looking for words
>> in English I tend to use [a-z] even if some people try to sneak
>> accents into cafe or naive :-)
>>
>
> Well mine is not a regional vocabulary scenario. The backtick
> appears in a title which is used to create a url which (I believe)
> will not tolerate such characters.

Well, then the grave accent is the least of your concerns with \w.

URI letters are defined as ALPHA (%x41-5A and %x61-7A), i.e. [a-zA-Z], so
a URI doesn't allow accented letters, Greek, Cyrillic, or the tens of
thousands of other characters included in \w.

https://tools.ietf.org/html/rfc3986
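
As a rough illustration of how narrow that set is, here is a minimal Python
sketch that reduces a title to RFC 3986 "unreserved" characters (ALPHA, DIGIT,
"-", ".", "_", "~"). The helper name slug and the choice of "-" as the
replacement are assumptions for the example, not anything from the thread.
The class is spelled out explicitly because Python's \w is defined differently
from the XSD/XPath \w.

```python
import re

# RFC 3986 "unreserved" set: ALPHA / DIGIT / "-" / "." / "_" / "~".
# Anything outside it (backtick, accented letters, spaces, ...) is matched here.
NOT_UNRESERVED = re.compile(r"[^A-Za-z0-9._~-]+")


def slug(title: str) -> str:
    """Hypothetical helper: replace each run of disallowed characters with "-"."""
    return NOT_UNRESERVED.sub("-", title).strip("-")


print(slug("Tales of the `unexpected` caf\u00e9"))
# prints: Tales-of-the-unexpected-caf
```

Note that this throws information away (the accented letter is lost), which
is acceptable for a display slug but not if the original title must be
recoverable from the URL.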

Of course, most user-facing systems such as HTML or XML allow a much
wider set of characters in href attributes and SYSTEM identifiers and
leave it to the system to %-encode according to the somewhat arcane URI
rules; cf. IRI or LEIRI syntax.
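
For comparison, a small sketch of the %-encoding step itself, using Python's
standard urllib.parse.quote. The function name to_uri_path_segment is
hypothetical, and the sketch assumes the text is destined for a single path
segment and encoded as UTF-8 (quote's default).

```python
from urllib.parse import quote


def to_uri_path_segment(text: str) -> str:
    """Hypothetical helper: percent-encode text for use as one URI path segment."""
    # safe="" means "/" is encoded as well; quote() never encodes
    # ASCII letters, digits, or the characters "_", ".", "-", "~".
    return quote(text, safe="")


print(to_uri_path_segment("caf\u00e9 `menu`"))
# prints: caf%C3%A9%20%60menu%60
```

Unlike stripping characters, this keeps the title recoverable: decoding the
result gives back the original string, backticks included.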

David




