Regex taking long time to evaluate

At login, I need to accept only a username (alphanumeric plus some special characters), an email address, or the username\domain format. For this I used a single regex with alternation (|). I also need to allow characters from other languages such as Japanese and Chinese, so I included those in the same regex. The problem is that when I enter 30 or more characters followed by @ or some other special character, evaluating the regex takes several seconds and the browser hangs.

export const usernameRegex = /(^[a-zA-Z0-9._~^#!%+\-]+@[a-z0-9.-]+\.[a-z]{2,4})+|^[a-zA-Z0-9._~^#!\-]+\\([._-~^#!]|[\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF])+|^([._-~^#!]|[\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF])+$/gu;

When I remove the other-language character sets, such as [\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF])+|^([._-~^#!]|[\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF], it works fine.

I understand that a regex generally looks simple but does a lot under the hood. Is there any modification I can make to this regex so that it evaluates quickly? Any help is much appreciated!

Valid texts:

stackoverflow,
stackoverflow1~,
stackoverflow!#~^-,
[email protected],
stackoverflow!#~^[email protected],
こんにちは,
你好,
tree\guava

EDIT:

For example, this input causes the issue: stackoverflowstackoverflowstackoverflow@

With the above text, evaluation takes a long time.

https://imgur.com/T2Vg4lg


Solution 1:

Your regex consists of three alternatives joined with |:

(^[a-zA-Z0-9._~^#!%+\-]+@[a-z0-9.-]+\.[a-z]{2,4})+

^[a-zA-Z0-9._~^#!\-]+\\([._-~^#!]|[\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF])+

^([._-~^#!]|[\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF])+$
  • The first regex is (^...)+. How many times do you think a pattern that is anchored to the beginning of the string can repeat? A match is either a second occurrence or it starts at the beginning of the string; it can't be both. The + on the group is therefore pointless.

    So ^[a-zA-Z0-9._~^#!%+\-]+@[a-z0-9.-]+\.[a-z]{2,4}

  • Parts 2 and 3 are mostly identical; part 2 just starts with the block [a-zA-Z0-9._~^#!\-]+\\ and is followed by what is otherwise part 3.

    So let's combine them: ^(?:[a-zA-Z0-9._~^#!\-]+\\)? ... and make sure to use non-capturing groups when possible.

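  • ([abc]|[def])+ can be simplified to [abcdef]+. This, by the way, is the part that's killing your performance: your two classes overlap (inside a character class, _-~ is a range from U+005F to U+007E, which includes a-z), so every lowercase letter matches both alternatives, and when the overall match fails the engine retries every combination of choices.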

  • Your regex ends with a $. It was only part of the last alternative, but I assume you always want to match the entire string? So let's anchor all 3 (now 2) parts as ^ ... $
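To illustrate the point about alternating character classes, here is a minimal sketch. The first pattern is the symbol class from the question, tested on its own; the `alternated` and `merged` patterns and the sample strings are made up for this demo.

```javascript
// The first alternative of the question's group, tested on single characters.
const symbols = /^[._-~^#!]$/u;

// `_-~` is a range (U+005F to U+007E), so the class also covers a-z.
const matchesLowercase = symbols.test("a"); // true
const matchesUppercase = symbols.test("A"); // false
const matchesPipe = symbols.test("|");      // true, probably unintended

// Hypothetical demo patterns: two overlapping classes vs. one merged class.
const alternated = /^(?:[a-c]|[b-f])+$/;
const merged = /^[a-f]+$/;

const samples = ["abcdef", "fedcba", "abcx", "", "a"];
const agree = samples.every((s) => alternated.test(s) === merged.test(s));

console.log({ matchesLowercase, matchesUppercase, matchesPipe, agree });
```

Both demo patterns accept and reject the same samples, but the merged class gives the engine only one way to consume each character, so there is nothing to retry when the match fails.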

Summary:

/^[a-zA-Z0-9._~^#!%+-]+@[a-z0-9.-]+\.[a-z]{2,4}$|^(?:[a-zA-Z0-9._~^#!-]+\\)?[._-~^#!\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF]+$/u
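As a quick sanity check, the combined regex can be run against the valid texts from the question and against the input that caused the hang. This is only a sketch: `user@example.com` is a placeholder address chosen for the demo, and the variable names are not from the original code.

```javascript
const usernameRegex = /^[a-zA-Z0-9._~^#!%+-]+@[a-z0-9.-]+\.[a-z]{2,4}$|^(?:[a-zA-Z0-9._~^#!-]+\\)?[._-~^#!\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF]+$/u;

// Valid texts from the question, plus a placeholder email address.
const validInputs = [
  "stackoverflow",
  "stackoverflow1~",
  "stackoverflow!#~^-",
  "user@example.com", // assumption: stand-in for the question's email samples
  "こんにちは",
  "你好",
  "tree\\guava",
];

const allValid = validInputs.every((input) => usernameRegex.test(input));

// The input that made the original regex hang: it should be rejected quickly.
const slowInput = "stackoverflowstackoverflowstackoverflow@";
const started = Date.now();
const slowInputAccepted = usernameRegex.test(slowInput);
const elapsedMs = Date.now() - started;

console.log({ allValid, slowInputAccepted, elapsedMs });
```

The g flag from the original is left out here, as in the summary above. With g, test() remembers lastIndex between calls, which can make repeated validation with the same regex object give inconsistent results. Also note that the `_-~` range is kept as written, so characters such as `|` and `{` are accepted; if that is not intended, escape the hyphen.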

Here is a JS example of how a simple regex engine would try to match a string: how it fails, backtracks, retries with the other side of the |, and so on.

// let's implement what `/([a-z]|[\p{Ll}])+/u` would do, 
// how it would try to match something.
const a = /[a-z]/; // left part
const b = /[\p{Ll}]/u; // right part

const string = "abc,";

const testNextCharacter = (index) => {
  if (index === string.length) {
    return true;
  }

  const pattern = index + "  ".repeat(index + 1) + "%o.test(%o)";
  const character = string.charAt(index);
  console.log(pattern, a, character);

  // checking the left part && if successful checking the next character
  if (a.test(character) && testNextCharacter(index + 1)) {
    return true;
  }

  // checking the right part && if successful checking the next character
  console.log(pattern, b, character);
  if (b.test(character) && testNextCharacter(index + 1)) {
    return true;
  }

  return false;
}

console.log("result", testNextCharacter(0));

And that is with only 4 characters. Try it with 5 or 6 characters to get an impression of how much work this will be at 20 characters.
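To put numbers on that, here is a variant of the same naive matcher that counts the single-character tests instead of logging them. It is a sketch of the idea, not of how a real regex engine is implemented, and the helper name `countAttempts` is made up for this example.

```javascript
const left = /[a-z]/;     // left side of the |
const right = /\p{Ll}/u;  // right side of the |

// Count how many single-character tests the naive matcher performs
// before it either matches the whole input or gives up.
const countAttempts = (input) => {
  let attempts = 0;

  const matchFrom = (index) => {
    if (index === input.length) {
      return true;
    }
    const character = input.charAt(index);

    attempts++; // try the left alternative
    if (left.test(character) && matchFrom(index + 1)) {
      return true;
    }

    attempts++; // backtrack and try the right alternative
    if (right.test(character) && matchFrom(index + 1)) {
      return true;
    }
    return false;
  };

  matchFrom(0);
  return attempts;
};

// n lowercase letters followed by a character that neither side matches
for (const n of [3, 4, 5, 10, 20]) {
  console.log(n, countAttempts("a".repeat(n) + ","));
}
```

For n letters followed by a comma the count is 2^(n+2) - 2: 30 tests for "abc,", 126 for five letters, and over four million for twenty. A string that matches, such as "abc", needs only 3.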