Monday, 6 June 2016

JavaScript regex oddities.

Open a JS console and try matching the string 'a' against the regex '.'.

As expected, the regex '.' matches the single character 'a'.

Now try the same thing with a complex Unicode character, such as an emoji:

> ["�"]

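The JS regex matches only half of the Unicode character. JS strings are sequences of UTF-16 code units, an emoji like this is stored as a surrogate pair of two code units, and '.' consumes just the first one.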
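Here is a minimal sketch of that behaviour, assuming an engine with ES2015 support. The emoji U+1F600 is an arbitrary choice for illustration, and the u flag is shown only as one possible way to make '.' consume a whole code point.

```javascript
// Without the u flag, '.' consumes one UTF-16 code unit, not one code point.
const plain = "a".match(/./);
console.log(plain[0]); // "a"

// U+1F600 (grinning face) is stored as the surrogate pair \uD83D \uDE00.
const emoji = "\u{1F600}";
console.log(emoji.length); // 2

// The match is only the high surrogate, which consoles tend to print as "�".
const half = emoji.match(/./);
console.log(half[0] === "\uD83D"); // true

// With the ES2015 u flag, '.' consumes the whole code point.
const whole = emoji.match(/./u);
console.log(whole[0] === emoji); // true
```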

What is interesting is that if you ask for a two-character match, JS finds the whole character.
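A small sketch of the two-character case, again using U+1F600 as a stand-in emoji. The pattern '..' is an assumption about what a "2 letter match" looks like; any pattern that consumes two code units should behave the same way.

```javascript
const emoji = "\u{1F600}"; // one code point, two UTF-16 code units

// Two dots consume both halves of the surrogate pair.
const pair = emoji.match(/../);
console.log(pair[0] === emoji); // true

// A pattern anchored to exactly one '.' cannot cover the string at all.
const exactlyOne = emoji.match(/^.$/);
console.log(exactlyOne); // null
```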



In other, unrelated regex oddities: \w does not understand accented letters:

> null
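As a sketch of what is going on: \w in JS is just shorthand for [A-Za-z0-9_], so an accented letter ends the match. The word "café" below is a hypothetical example. The last line assumes an engine that supports Unicode property escapes (ES2018), which is newer than this post, and is only one possible workaround.

```javascript
// "café" with é as the single code point U+00E9 (hypothetical example word).
const word = "caf\u00E9";

// \w only covers ASCII letters, digits and underscore, so é is rejected.
const allWordChars = word.match(/^\w+$/);
console.log(allWordChars); // null

// An unanchored match stops just before the accented letter.
const prefix = word.match(/\w+/);
console.log(prefix[0]); // "caf"

// Assumes ES2018 property escapes: match any Unicode letter or number.
const unicodeAware = /^[\p{L}\p{N}_]+$/u.test(word);
console.log(unicodeAware); // true
```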

I have been reading more about crazy Unicode in JavaScript. Note that some accented characters can be represented either as a letter followed by a combining accent (2 characters) or as a single precomposed letter_with_accent (1 character). Of course, when this happens the two strings have different lengths and do not compare as equal, even though they look identical on screen.
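A minimal sketch of the two representations, using é as the example character. It also shows String.prototype.normalize (ES2015) as one way to compare such strings, assuming canonical normalization to NFC is acceptable for the data at hand.

```javascript
const composed = "\u00E9";    // é as one precomposed code point
const decomposed = "e\u0301"; // e followed by U+0301 combining acute accent

console.log(composed.length);   // 1
console.log(decomposed.length); // 2
console.log(composed === decomposed); // false

// Normalizing both sides to the same form makes them comparable.
const sameAfterNfc = composed.normalize("NFC") === decomposed.normalize("NFC");
console.log(sameAfterNfc); // true
```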

People are upset about Python's handling of Unicode too.

