Why is my regex working intermittently?

Davide de Paolis - Jan 13 '20 - Dev Community

wtf + facepalm
also

Why is my Regular Expression failing every other time it is called?

Why is my RegEx working only the first time but not the second (and working again the third)?

Nasty weird bugs caused by silly things


TLDR

When the RegExp test method is run with the global flag (/g), the regex internally keeps the state of the search. Therefore, at each invocation the regular expression will be run from the last index that was found previously.

const regex = /a/gi  // const regex = RegExp('a', 'gi')
regex.test("abc")  // --> true
regex.test("abc")  // --> false
regex.test("abc")  // --> true

Solution

  • If not strictly necessary, avoid the global flag or
  • Use String.match(RegExp) instead of RegExp.test(String) or
"abc".match(regex) // --> ["a"]
"abc".match(regex) // --> ["a"]
"abc".match(regex) // --> ["a"]
  • Recreate the regex at each invocation (avoid a reference to a constant or to any regex defined elsewhere)
/a/gi.test("abc")  // --> true
/a/gi.test("abc")  // --> true
/a/gi.test("abc")  // --> true

When we want to know whether a pattern is found in a string we can most commonly use two approaches:

we can check if the string matches with the regex (myString.match(myRegex))
or
we can test the regex against the string (myRegex.test(myString))

If I am not interested in the matches themselves and just want to know whether something was found, I prefer RegExp.test, which is simpler and returns a boolean instead of an array (and it's also slightly faster).
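As a quick illustration of the difference in return values, here is a minimal sketch. The variable names are only for illustration, and the regex deliberately has no global flag:

```javascript
const myRegex = /a/i; // no global flag, so no state is kept between calls
const myString = "abc";

// test() returns a plain boolean
const found = myRegex.test(myString);
console.log(found); // true

// match() returns an array describing the match, or null when nothing matches
const matches = myString.match(myRegex);
const noMatches = "xyz".match(myRegex);
console.log(matches[0]); // "a"
console.log(noMatches); // null
```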

The other day I noticed a weird behavior in one of the lambdas we recently deployed to staging.

In that Lambda we have a p-waterfall (https://github.com/sindresorhus/p-waterfall) with the following steps:

  • parse and validate User Input
  • load data from DynamoDB
  • load configuration from AWS ParameterStore (SSM)
  • manipulate the data from dynamoDB together with the user input
  • compose a URL using the loaded configuration and the user data and validate it
  • fetch data from that URL
  • save the result to an AWS SQS queue

We are still in the MVP stage and we have some unit tests and integration tests in place.
Everything was working fine in the tests and even after deployment. But we noticed that the behavior when deployed was a bit weird: the lambda was returning an error every now and then, intermittently and for no apparent reason, since the payload was always the same.

After activating some logging I realized that the composed URL was invalid, therefore I started looking at the configuration and data being loaded from DynamoDB or SSM - maybe some Permissions/Policies missing? (remember that when running locally with serverless offline the credentials and permissions are your own - and therefore different from those in the lambda container).
After some investigation, I found out that the composed URL was always the same and what was failing was the url-validation method - even though the input URL was exactly the same...

I could immediately recall some behaviour of regular expressions related to a shifting index during the search, so I opened the RegExp.test docs. Gotcha!

Using test() on a regex with the global flag

If the regex has the global flag set, test() will advance the lastIndex of the regex. A subsequent use of test() will start the search at the substring of str specified by lastIndex. It is worth noting that the lastIndex will not reset when testing a different string.

What does it mean exactly?

It means, for example, that the regex /a/gi will test the string "abc" and find a match at the first character. When the regex is run again, it will start testing from that point on, therefore on "bc". Since the regex can't find a match in "bc", it will restart from zero the next time, and here you go: "abc" matches again.
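To make the shifting index visible, this small sketch records the result of test() together with the regex's lastIndex property after each call (the trace array is just an illustrative helper):

```javascript
const regex = /a/gi;
const trace = [];

for (let i = 0; i < 3; i++) {
  const result = regex.test("abc");
  // lastIndex is where the NEXT search on this regex will start
  trace.push({ result, lastIndex: regex.lastIndex });
}

console.log(trace);
// call 1: { result: true,  lastIndex: 1 }  match at index 0, next search starts at "bc"
// call 2: { result: false, lastIndex: 0 }  no "a" in "bc", the failed search resets the index
// call 3: { result: true,  lastIndex: 1 }  back to the start, so it matches again
```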

Consider that this also happens when you test a different string, like "axy", and that with a string containing more than one "a" the rhythm of successful matches becomes irregular: such behaviour can lead to quite nasty bugs.

const regex2 = /a/gi  // const regex2 = RegExp('a', 'gi')
regex2.test("abcad")  // --> true
regex2.test("abcad")  // --> true
regex2.test("abcad")  // --> false
regex2.test("abcad")  // --> true
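The same carry-over happens across different strings. In this sketch (sharedRegex is just an illustrative name), the index left behind by "abc" makes the first check of "axy" fail:

```javascript
const sharedRegex = /a/gi;

const first = sharedRegex.test("abc");  // true, lastIndex is now 1
const second = sharedRegex.test("axy"); // false: the search starts at index 1, i.e. on "xy"
const third = sharedRegex.test("axy");  // true: the failed search reset lastIndex to 0

console.log(first, second, third); // true false true
```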

banging head on desk

As I said, this is documented pretty well and while writing this I tried out some other references and found of course lots of similar questions on StackOverflow - as old as 10 years! - so it should not have been a surprise, but it indeed caught us off guard.

I adjusted my method - wrote a couple more unit tests to check this edge case and that was it.
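The fix and the kind of check that catches this edge case could look roughly like the sketch below. The pattern and the function names are hypothetical, made up for illustration, and are not the actual validation code from our lambda:

```javascript
// Hypothetical pattern, for illustration only.
const URL_PATTERN_GLOBAL = /^https:\/\/[\w.-]+\/\S*$/g; // shared regex WITH the g flag
const URL_PATTERN = /^https:\/\/[\w.-]+\/\S*$/;         // same pattern, no g flag

// Buggy version: the shared regex remembers lastIndex between calls.
const isValidUrlBuggy = (url) => URL_PATTERN_GLOBAL.test(url);

// Adjusted version: without the g flag, test() is stateless.
const isValidUrl = (url) => URL_PATTERN.test(url);

// Edge case to cover in a unit test: validate the same URL several times.
const url = "https://example.com/items?id=42";
const buggyResults = [1, 2, 3, 4].map(() => isValidUrlBuggy(url));
const fixedResults = [1, 2, 3, 4].map(() => isValidUrl(url));

console.log(buggyResults); // [ true, false, true, false ]
console.log(fixedResults); // [ true, true, true, true ]
```

Calling the validator only once per test run, as our original tests did, would never have revealed the problem.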
awesome

For some more fun/insights about RegEx, check out https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/

Hope it helps.


Photo by Mr Cup / Fabien Barral on Unsplash
