One of the most common things for a user to enter into a form is an email address. Having users' email addresses lets you do things like send users their password if they happen to forget it. This saves users from having to create a new account on your site, and it also saves you some space in your database.
But sometimes a user will fill out a form and not put in a valid email. Have you ever received a form submission with something along the lines of "qwert" in the email field? Using regular expression matching, you can at least check the user's email address entry to see if it is technically valid.
A technically valid email address consists of a username, the "@" sign, and a server name. Valid usernames can contain letters, numbers, the underscore ("_"), the minus sign ("-"), and periods ("."). Valid server names are almost the same, except that server names cannot contain an underscore. Finally, the end of the domain name must have a "." in it followed by two or more letters, such as ".com", ".it", or ".info". Using our regular expression operators, we can build a regular expression that matches against valid email addresses.
Let's state that a valid username must start with at least one letter or one number:
^[a-z0-9]+
This is followed by zero or more letters or numbers, underscores "_", or minus signs "-":
[a-z0-9_-]*
Then it can be followed by zero or more groups, each made up of a single "." followed by one or more letters, numbers, underscores "_", or minus signs "-":
(\.[a-z0-9_-]+)*
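The three username pieces built so far can be tried out on their own. The following is a minimal sketch, not part of the original script: it assumes a current PHP with preg_match(), and the helper name looksLikeUsername() is made up for illustration. A closing "$" is added so that the pattern has to cover the whole test string.

```php
<?php
// Hypothetical helper: checks only the username part of an address.
function looksLikeUsername(string $name): bool
{
    // The three username pieces in order, anchored at both ends for this test.
    // The "D" modifier keeps "$" from matching before a trailing newline.
    $pattern = '/^[a-z0-9]+[a-z0-9_-]*(\.[a-z0-9_-]+)*$/D';
    return preg_match($pattern, $name) === 1;
}

foreach (['chris_2', 'joe.smith', '-fred', 'joe..smith', 'joe.'] as $name) {
    echo $name, ': ', looksLikeUsername($name) ? 'ok' : 'rejected', "\n";
}
```

The last two names are rejected because every "." must be followed by at least one more character from the allowed set.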
The username is then followed by the "@" sign:
@
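The server name then starts with at least one letter, number, underscore "_", or minus sign "-" (the pattern is a little more lenient here than the rule stated earlier, since it lets an underscore through in the server name):
[a-z0-9_-]+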
Then it can be followed by zero or more groups, each made up of a single "." followed by one or more letters, numbers, underscores "_", or minus signs "-":
(\.[a-z0-9_-]+)*
Finally, it must be followed by another ".", then two or more letters (the end of the domain name), and then the email string must end:
\.([a-z]+){2,}$
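The server-name half can be checked in isolation in the same way. This is a sketch under assumptions: it uses preg_match() from current PHP, the helper name looksLikeServerName() is hypothetical, and the pieces are taken as they appear in the finished eregi() call, which allows "_" and "-" in every label. A leading "^" is added so the pattern has to cover the whole test string.

```php
<?php
// Hypothetical helper: checks only the part of an address after the "@".
function looksLikeServerName(string $host): bool
{
    // First label, optional ".label" groups, then "." plus two or more letters.
    $pattern = '/^[a-z0-9_-]+(\.[a-z0-9_-]+)*\.([a-z]+){2,}$/D';
    return preg_match($pattern, $host) === 1;
}

foreach (['company.com', 'x.y.z.com', 'bad.a', '44.44', 'localhost'] as $host) {
    echo $host, ': ', looksLikeServerName($host) ? 'ok' : 'rejected', "\n";
}
```

Note that ([a-z]+){2,} accepts exactly the same endings as the shorter [a-z]{2,}: both require at least two letters after the final ".".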
Put it all together and you get:
^[a-z0-9]+[a-z0-9_-]*(\.[a-z0-9_-]+)*@[a-z0-9_-]+(\.[a-z0-9_-]+)*\.([a-z]+){2,}$
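One way to convince yourself that nothing was dropped while assembling the expression is to let PHP glue the pieces together. The sketch below is illustrative only: the variable names are made up, and the server-name pieces are the ones used in the finished eregi() call.

```php
<?php
// The pieces from the walkthrough, in order (variable names are illustrative).
$pieces = [
    '^[a-z0-9]+',        // username starts with a letter or number
    '[a-z0-9_-]*',       // then any letters, numbers, "_" or "-"
    '(\.[a-z0-9_-]+)*',  // then zero or more ".something" groups
    '@',                 // the "@" sign
    '[a-z0-9_-]+',       // first label of the server name
    '(\.[a-z0-9_-]+)*',  // zero or more further ".label" groups
    '\.([a-z]+){2,}$',   // final "." plus two or more letters, then end of string
];

// Join the pieces with nothing in between to get the full expression.
$pattern = implode('', $pieces);
echo $pattern, "\n";
```

Keeping the pieces in an array like this also makes it easier to change a single rule later without retyping the whole expression.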
But we haven't taken into account the case of the letters! The expressions above only allow for lowercase letters in the email address, but uppercase letters are valid as well. To save some time, we didn't include the "A–Z" ranges in our expressions, because we can use a slightly different form of ereg(), which is called eregi(). eregi() works exactly the same as ereg() except that it is not case-sensitive. That way you don't have to search for both uppercase and lowercase characters in your expressions, using the [a-zA-Z] syntax. You only need to use [a-z].
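The effect of ignoring case can be seen by running the same expression both ways. Because ereg() and eregi() were deprecated in PHP 5.3 and removed in PHP 7.0, this sketch assumes a current PHP and uses preg_match() instead: the "i" modifier plays the role that eregi() plays in the text. The variable names are illustrative.

```php
<?php
// The full expression, written with lowercase ranges only.
$pattern = '^[a-z0-9]+[a-z0-9_-]*(\.[a-z0-9_-]+)*@[a-z0-9_-]+(\.[a-z0-9_-]+)*\.([a-z]+){2,}$';

// Without "i" the match is case-sensitive, like ereg().
$caseSensitive = preg_match('/' . $pattern . '/D', 'CC@c.com');

// With "i" the match ignores case, like eregi().
$caseInsensitive = preg_match('/' . $pattern . '/iD', 'CC@c.com');

var_dump($caseSensitive);   // int(0): uppercase letters are rejected
var_dump($caseInsensitive); // int(1): uppercase letters are accepted
```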
Our eregi() function now looks like this:
eregi("^[a-z0-9]+[a-z0-9_-]*(\.[a-z0-9_-]+)*@[a-z0-9_-]+(\.[a-z0-9_-]+)*\.([a-z]+){2,}$", $email);
It's very ugly, but it does a very good job of checking for valid email addresses.
Here is a short script that demonstrates the prowess of this function:
Script for checkemail.php
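<?php
// Note: eregi() was removed in PHP 7.0, so this script needs an older PHP.
$email = array(
    "chris_2@company.com", "-fred@broken.org", "joe.smith@works.it",
    "Busted@bad.a", "strange@44.44", "works.fine.all_day@x.y.z.com",
    "is-dashed-line@d-a-s-h-e-d.com", "CC@c.com", "this.works.fine@also.ok"
);
// The same expression as above, kept in a variable so the condition stays readable.
$pattern = '^[a-z0-9]+[a-z0-9_-]*(\.[a-z0-9_-]+)*@[a-z0-9_-]+(\.[a-z0-9_-]+)*\.([a-z]+){2,}$';
// Check each address in turn and report the result.
for ($i = 0; $i < sizeof($email); $i++) {
    if (eregi($pattern, $email[$i])) {
        echo "<p>$email[$i] is valid.</p>";
    } else {
        echo "<p>$email[$i] is <b>not</b> valid.</p>";
    }
}
?>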
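On PHP 7.0 and later, eregi() no longer exists, so the script above will not run as written. The following is a minimal sketch of the same check using preg_match(), assuming the same expression and the same nine test addresses. The function name isValidEmail() is hypothetical and not part of the original script.

```php
<?php
// Hypothetical helper: a preg_match() port of the eregi() check.
function isValidEmail(string $address): bool
{
    // "i" ignores case, as eregi() did.
    // "D" keeps "$" from matching just before a trailing newline.
    $pattern = '/^[a-z0-9]+[a-z0-9_-]*(\.[a-z0-9_-]+)*@[a-z0-9_-]+(\.[a-z0-9_-]+)*\.([a-z]+){2,}$/iD';
    return preg_match($pattern, $address) === 1;
}

$emails = [
    'chris_2@company.com', '-fred@broken.org', 'joe.smith@works.it',
    'Busted@bad.a', 'strange@44.44', 'works.fine.all_day@x.y.z.com',
    'is-dashed-line@d-a-s-h-e-d.com', 'CC@c.com', 'this.works.fine@also.ok',
];

// Print one plain-text line per address.
foreach ($emails as $address) {
    echo $address, isValidEmail($address) ? ' is valid.' : ' is not valid.', "\n";
}
```

With these inputs, the three addresses reported as not valid are "-fred@broken.org" (starts with a minus sign), "Busted@bad.a" (only one letter after the final ".") and "strange@44.44" (digits where the closing letters should be).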