I’ve noticed a common problem with regular expressions and hex characters, so I thought I’d blog about it. The most common way to match a UUID, a SHA1, or some other hex-encoded binary value with a regex is this (I’ve seen it in Perl libraries and StackOverflow answers):

[a-f0-9] or [A-F0-9]

Neither of these is correct: hex is case insensitive, and both of these regexes are case sensitive. Hex is most commonly written in lowercase (unless you’re Data::UUID), but that’s an aesthetic choice, not a requirement. The best way to match hex is with a POSIX character class.

[[:xdigit:]] or \x

This matches [A-Fa-f0-9] in a more readable, intent-driven manner.

As a side note, it’s "\\p{XDigit}" in a regex string in Java.
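To make the difference concrete, here is a minimal Java sketch. The class name, method names, and sample values are my own and purely illustrative. It compares a lowercase-only class against \p{XDigit}, and uses a UUID pattern that assumes the usual 8-4-4-4-12 layout.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class HexMatch {
    // The case-sensitive class discussed above: lowercase hex only.
    static final Pattern LOWER_ONLY = Pattern.compile("[a-f0-9]+");

    // \p{XDigit} is Java's POSIX hex-digit class; by default it covers [0-9a-fA-F].
    static final Pattern HEX = Pattern.compile("\\p{XDigit}+");

    // Assumed UUID layout: 8-4-4-4-12 hex digits separated by hyphens.
    static final Pattern UUID_PATTERN = Pattern.compile(
        "\\p{XDigit}{8}-\\p{XDigit}{4}-\\p{XDigit}{4}-\\p{XDigit}{4}-\\p{XDigit}{12}");

    static boolean isLowerHex(String s) { return LOWER_ONLY.matcher(s).matches(); }
    static boolean isHex(String s)      { return HEX.matcher(s).matches(); }
    static boolean isUuid(String s)     { return UUID_PATTERN.matcher(s).matches(); }

    public static void main(String[] args) {
        System.out.println(isLowerHex("DEADBEEF")); // false: uppercase is rejected
        System.out.println(isHex("DEADBEEF"));      // true
        System.out.println(isHex("deadbeef"));      // true
        // Mixed case is still valid hex, so this matches as well.
        System.out.println(isUuid("123E4567-e89b-12D3-a456-426614174000")); // true
    }
}
```

The point is only that the case-sensitive class silently rejects perfectly valid input, while the POSIX class accepts either case without having to spell out both ranges.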