How can I validate an email address using a regular expression?
#1
Has anyone else struggled to craft the ultimate regular expression for validating email addresses? I've been refining mine for years, and it seems to handle most cases well. Still, every now and then someone reports an address that requires a tweak; the most recent fix was for four-character TLDs, which an earlier version rejected. For simplicity's sake, I prefer a single, albeit complex, regex over several short ones. Here's the pattern I'm currently using:

Code:
/\A(?=.{1,254}\z)([a-zA-Z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-zA-Z0-9!#$%&'*+\/=?^_`{|}~-]+)*)@((?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z]{2,})\z/

It works, but I'm sure it can be improved. Can someone provide a more comprehensive pattern that follows the full RFC 5322 specification? Support for IPv4 address literals can be left out.
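One way to sanity-check a pattern like this is to run it over a handful of sample addresses. Below is a minimal Python sketch of the pattern above. It assumes the leading lookahead is meant to cap the whole address at 254 characters (written here as `(?=.{1,254}\Z)`) and that the domain group closes just before the end anchor. In Python's `re`, `\Z` matches only at the very end of the string, like PCRE's `\z`. The names `EMAIL_RE` and `is_valid` are illustrative.

Code:
```python
import re

# RFC 5322 "atext": characters allowed in an unquoted local part
ATEXT = r"[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]"

EMAIL_RE = re.compile(
    r"\A(?=.{1,254}\Z)"  # whole address must be 1-254 characters
    rf"({ATEXT}+(?:\.{ATEXT}+)*)"  # group 1: dot-separated local part
    r"@"
    # group 2: hostname labels followed by an alphabetic TLD of 2+ letters
    r"((?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z]{2,})"
    r"\Z"
)


def is_valid(address: str) -> bool:
    """Return True if the whole address matches EMAIL_RE."""
    return EMAIL_RE.match(address) is not None


samples = ["john.doe@example.com", "info@example.info",
           "a" * 250 + "@example.com", "john..doe@example.com"]
for sample in samples:
    print(sample[:30], is_valid(sample))
```

The four-character `.info` address passes, while the 262-character address is rejected by the length lookahead rather than by the character classes.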
#2
I see where the challenge lies. Your pattern doesn't support several forms that RFC 5322 allows: quoted local parts (e.g. `"john doe"@example.com`), domain literals (e.g. `user@[IPv6:2001:db8::1]`), and comments (e.g. `john.smith(comment)@example.com`). A complete RFC 5322-compliant regex is extremely long and rarely used in practice because of its complexity. For most applications, a concise regex that handles the vast majority of real-world addresses is the better choice.
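Comments are the awkward part: RFC 5322 comments can nest, e.g. `(a (nested) comment)`, and a classic regular expression can't match arbitrarily deep nesting (recursive patterns in engines such as PCRE are the exception). One workaround, sketched below in Python under the assumption that you only need to discard comments rather than keep them, is to strip them with a small scanner and then apply a simpler pattern. `strip_comments` is a hypothetical helper, and it does not deal with folding whitespace.

Code:
```python
def strip_comments(address: str) -> str:
    """Remove (possibly nested) parenthesized comments outside quoted strings."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # True while inside a "quoted string"
    i = 0
    while i < len(address):
        ch = address[i]
        if ch == "\\" and (depth > 0 or in_quotes):
            # quoted-pair: keep it inside quotes, drop it inside a comment
            if depth == 0:
                out.append(address[i:i + 2])
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif not in_quotes and ch == "(":
            depth += 1                  # open (or nest) a comment
        elif not in_quotes and depth > 0 and ch == ")":
            depth -= 1                  # close the innermost comment
        elif depth == 0:
            out.append(ch)              # ordinary character outside comments
        i += 1
    if depth != 0 or in_quotes:
        raise ValueError("unbalanced comment or quoted string")
    return "".join(out)


print(strip_comments("john.smith(comment)@example.com"))
print(strip_comments('"a(b)"@example.com'))
```

Parentheses inside a quoted string are left alone, since there they are ordinary characters rather than comment delimiters.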
Here is a pattern that is more inclusive than yours. It still doesn't cover every address RFC 5322 allows (comments, for example), but it should validate the most common addresses correctly:

Code:
/\A(?:[a-zA-Z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-zA-Z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x20\x21\x23-\x5b\x5d-\x7e]|\\[\x20-\x7e])*")@(?:(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?|\[[\x21-\x5a\x5e-\x7e]*\])\z/

This pattern tries to strike a balance between the full RFC 5322 grammar and the practical needs of everyday email validation.
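A more inclusive pattern along these lines can also be assembled in Python from named pieces, which makes each alternative easier to read and test. This is a sketch assuming Python's `re` module: it accepts a dot-atom or quoted-string local part and a hostname or bracketed domain literal, and it still rejects comments. The piece names (`DOT_ATOM`, `QUOTED`, `HOSTNAME`, `LITERAL`) and `looks_like_email` are illustrative.

Code:
```python
import re

ATEXT = r"[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_ATOM = rf"{ATEXT}+(?:\.{ATEXT}+)*"
# Quoted string: printable ASCII or space except " and \, or a backslash escape
QUOTED = r'"(?:[\x20\x21\x23-\x5b\x5d-\x7e]|\\[\x20-\x7e])*"'
LABEL = r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"  # 1-63 chars
HOSTNAME = rf"(?:{LABEL}\.)+{LABEL}"
# Domain literal: [...] holding printable ASCII except [, ] and \
LITERAL = r"\[[\x21-\x5a\x5e-\x7e]*\]"

EMAIL_RE = re.compile(rf"(?:{DOT_ATOM}|{QUOTED})@(?:{HOSTNAME}|{LITERAL})")


def looks_like_email(address: str) -> bool:
    """Return True if the entire string matches EMAIL_RE."""
    return EMAIL_RE.fullmatch(address) is not None


for a in ["john.doe@example.com", '"john doe"@example.com',
          "user@[IPv6:2001:db8::1]", "john(comment)@example.com"]:
    print(a, looks_like_email(a))
```

Using `fullmatch` instead of explicit anchors avoids the trailing-newline quirk of `$`.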
#3
I agree with OliviaSmith: trying to cover every corner of the RFC in a single regex may not be practical. However, it's also important that your pattern isn't too restrictive. For example, consider allowing international characters in the domain part, since many domains now use them (e.g. `münchen.de`).
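One common approach is to convert an internationalized domain to its ASCII (Punycode) form before running an ASCII-only check. A minimal Python sketch, assuming only the domain needs converting (Unicode in the local part is a separate matter covered by SMTPUTF8, RFC 6531). It uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules; the third-party `idna` package implements IDNA 2008. `to_ascii_address` is a hypothetical helper name.

Code:
```python
def to_ascii_address(address: str) -> str:
    """Return the address with its domain converted to ASCII (Punycode).

    Uses Python's built-in "idna" codec (IDNA 2003). The local part is
    returned unchanged.
    """
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("address has no '@'")
    # Raises UnicodeError for invalid labels (e.g. empty or over 63 chars)
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


print(to_ascii_address("user@münchen.de"))
```

The converted address can then be passed to an ASCII-only pattern unchanged.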
Here is a snippet using PHP's `filter_var`, which provides a reasonable level of email validation without a complex hand-written regex:

Code:
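<?php
// filter_var() returns the filtered address on success and false on failure.
// Add FILTER_FLAG_EMAIL_UNICODE (PHP 7.1+) to also allow Unicode in the local part.
$email = "user@example.com";
if (filter_var($email, FILTER_VALIDATE_EMAIL) !== false) {
    echo "This email address is considered valid.";
} else {
    echo "This email address is considered invalid.";
}
?>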

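If you must use a regex, consider updating your expression to accept internationalized domains, or convert them to their ASCII Punycode form first. Also make sure the top-level domain part has no small upper length limit, so newer long TLDs like `.photography` pass. Your `[a-zA-Z]{2,}` already allows long TLDs, but it rejects Punycode TLDs such as `xn--p1ai` because they contain digits and hyphens.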
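For the TLD point, here is a minimal Python sketch of a domain check whose last label accepts either 2 to 63 letters or an ASCII-encoded IDN label starting with `xn--`. It assumes internationalized domains have already been converted to Punycode, and `domain_ok` is an illustrative name.

Code:
```python
import re

LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"  # 1-63 chars, no edge hyphens
# Last label: plain letters (e.g. photography) or a Punycode label (xn--...)
TLD = r"(?:[a-z]{2,63}|xn--[a-z0-9-]{0,58}[a-z0-9])"
DOMAIN_RE = re.compile(rf"(?:{LABEL}\.)+{TLD}", re.IGNORECASE)


def domain_ok(domain: str) -> bool:
    """Return True if the whole domain matches DOMAIN_RE."""
    return DOMAIN_RE.fullmatch(domain) is not None


for d in ["example.photography", "example.xn--p1ai", "example.c0m"]:
    print(d, domain_ok(d))
```

Because the Punycode branch is a separate alternative, the plain-letter branch can stay strict and still reject TLDs like `c0m`.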
To summarize, while crafting a 100% accurate email validation regex is theoretically possible, it might not be practical. Using a simpler, widely used method such as PHP's `filter_var` is often the better choice. If a regex is required for some reason, make sure it is up to date with current email address formats, including longer TLDs and international domains.