When to Use Character Entities in XHTML


The Universal Character Set covers a lot of characters, including the standard ASCII characters on your keyboard. You may be wondering when to use entities and when it's acceptable to type the character you want. After all, if this book were on a Web page, it could have been written entirely with decimal or hexadecimal entities.
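If you want to see what the decimal and hexadecimal forms of a character look like, a small script can generate them. The following is a minimal sketch in Python, not part of XHTML itself; the function name numeric_references is made up for this illustration.

```python
def numeric_references(ch):
    """Return the decimal and hexadecimal character references for one character."""
    code_point = ord(ch)  # position of the character in the Universal Character Set
    return "&#{};".format(code_point), "&#x{:X};".format(code_point)


print(numeric_references("A"))       # ('&#65;', '&#x41;')
print(numeric_references("\u00e9"))  # e with acute accent: ('&#233;', '&#xE9;')
```

Either form refers to the same character, so which one you use is a matter of preference.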

In practice, most Web browsers display the common characters you type without any help. It's okay to type a comma or a period rather than escaping it, and you can type all of the letters of the alphabet and the numbers without worrying. After that, it gets trickier. If you're not sure about a certain character, it's best to escape it, and you should always check how the character looks in the Web browsers you wish to support. You should always replace the following five items with their respective entities:

1. Quotation marks (" ")
2. The less-than and greater-than signs (< and >)
3. The ampersand (&)
4. Any characters not commonly found in English
5. Mathematical symbols

Quotation Marks

Quotation marks play an important role in XHTML because they surround attribute values in an XHTML element, for example: <img src="foo.gif" alt="Foo" />. Of course, quotation marks are also used to quote text on the page.

Generally speaking, it's best to escape all of the quotation marks in the general text. The most commonly used entity for this is the named one, &quot;. Here's an example:

<body>
<p>&quot;I love XHTML.&quot;</p>
</body>
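If your page text is generated by a program rather than typed by hand, the escaping can be done for you. As a rough sketch, assuming the text is produced in Python, the standard library's html.escape function replaces quotation marks with &quot; when its quote argument is true. The variable names here are only illustrative.

```python
from html import escape

quote = '"I love XHTML."'

# quote=True also converts double quotation marks to &quot;
paragraph = "<p>" + escape(quote, quote=True) + "</p>"
print(paragraph)  # <p>&quot;I love XHTML.&quot;</p>
```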

The Greater-Than and Less-Than Signs

Greater-than and less-than signs should always be escaped when used in text because, when unescaped, they mark the beginning and end of XHTML tags. More recent browsers treat an unescaped sign with a space around it as a literal less-than or greater-than sign. That feature can actually cause a problem if you use < and > to surround an XHTML tag but accidentally leave a space inside the brackets, because the tag may then appear as text. Older browsers will ignore everything between the < and > signs, assuming that it is an XHTML tag.

The most common way to escape these characters is by using the named entities: &lt; (less than) and &gt; (greater than). Here's an example:

<body>
<p>3 &lt; 5<br />
5 &gt; 3</p>
</body>

Other Mathematical Symbols

You should also always use entities to represent other mathematical symbols such as +, =, ×, -, and ÷. Although the Web was initially invented as a way for scientists to easily share information, browsers vary quite a bit in how they interpret mathematical symbols, so it's best to use entities.
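To find out which entity stands for a given symbol, you can look it up by code point. The sketch below assumes Python and its html.entities table of HTML 4 entity names, which XHTML 1.0 shares; the helper name named_entity is hypothetical. It assumes the symbols you care about include the typographic multiplication, division, and minus signs, and it falls back to a decimal reference when no named entity is listed.

```python
from html.entities import codepoint2name


def named_entity(ch):
    """Return a named entity for ch, or a decimal reference if none is listed."""
    name = codepoint2name.get(ord(ch))
    if name:
        return "&" + name + ";"
    return "&#{};".format(ord(ch))


# Multiplication sign, division sign, minus sign, and the ASCII plus sign.
for symbol in "\u00d7\u00f7\u2212+":
    print(symbol, named_entity(symbol))
```

The plus sign has no entry in that table, so the sketch produces the numeric reference &#43; for it.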

The Ampersand

Because the ampersand (&) indicates the beginning of an entity, it should also be escaped. Of all the named entities, &amp; is used most often. Here's an example of the ampersand named entity:

<body>
<p>The &amp; symbol is frequently escaped by using the &amp;amp;
entity.</p>
</body>

This example is a bit trickier than the others you've seen. Note that the second ampersand is escaped and then immediately followed by amp;. Viewed in a browser, &amp;amp; is displayed as the literal text "&amp;" rather than as a single ampersand. Try it out for yourself.
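The doubled escape is easier to follow if you watch it happen step by step. Here is a minimal sketch in Python, with a hand-written function whose name, escape_text, is made up for this illustration. The standard library's html.unescape function stands in, roughly, for what a browser does when it displays the text.

```python
from html import unescape


def escape_text(text):
    """Escape the characters that have special meaning in XHTML text."""
    # The ampersand must be replaced first; otherwise the "&" that begins
    # "&lt;", "&gt;", and "&quot;" would itself be escaped a second time.
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;"))


source = escape_text("&amp;")
print(source)            # &amp;amp;
print(unescape(source))  # &amp;
```

Escaping the ampersand before the other characters is the design choice that matters here; any other order would escape the entities that the function itself has just written.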

Characters Not Commonly Found in English

The early Internet was primarily built by the U.S. government as a defense-agency project. Although many browsers exist, the primary ones used throughout the world were built to support ASCII automatically and other characters less well. ASCII was developed in the 1960s by a committee of the American Standards Association (now ANSI), with contributors from U.S. companies such as IBM, and it covers only the unaccented Latin letters, digits, and punctuation used in English.

As a result, the Web handles the letters and punctuation of U.S. English most easily. It's not that other languages cannot be supported or represented, but you usually have to escape any non-English characters, even European characters such as Ĺ. Remember, if you can't type it directly from your keyboard without using special keys, it probably needs to be escaped.
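Escaping every non-ASCII character by hand gets tedious, so it can help to let a script do it. As a sketch, assuming your text is available as a Python string, encoding it to ASCII with the built-in xmlcharrefreplace error handler turns each non-ASCII character into a decimal character reference and leaves the ASCII characters alone. The sample word is only an illustration.

```python
# "Angstrom" spelled with A-ring (U+00C5) and o-umlaut (U+00F6).
text = "\u00c5ngstr\u00f6m"

# Characters outside ASCII become decimal references such as &#197;
escaped = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(escaped)  # &#197;ngstr&#246;m
```

Note that this handler does not touch &, <, >, or quotation marks, so those still need to be escaped separately.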