A character encoding maps a character set to units of a specific width and defines byte
serialization and ordering rules. Many character sets have more than one encoding.
For example, Java programs can represent Japanese character sets using the EUC-JP
or Shift-JIS encodings, among others. Each encoding has rules for representing and serializing
a character set. Two of the more popular character encodings are the following:
The ISO 8859 series. This series defines 15 character encodings that
can represent texts in dozens of languages. Each ISO 8859 character encoding can
have up to 256 characters. ISO 8859-1 (Latin-1) comprises the ASCII character set,
characters with diacritics (accents, diaereses, cedillas, circumflexes, and so on), and additional symbols.
UTF-8 (Unicode Transformation Format, 8-bit form). This is a variable-width character encoding
that encodes each Unicode code point as one to four bytes. A byte in UTF-8 is equivalent to 7-bit
ASCII if its high-order bit is zero; otherwise, the byte is part of a multibyte sequence that encodes a single character.
UTF-8 is compatible with the majority of existing web content and provides access to the Unicode
character set. Current versions of browsers and email clients support UTF-8. In addition, many new web
standards specify UTF-8 as their character encoding. For example, UTF-8 is one of the two
encodings that every XML processor is required to support (the other is UTF-16).
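The variable-width behavior described above can be checked with a few lines of Java. This is a minimal sketch that uses only the standard library; the class name EncodingWidths and its helper methods are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

class EncodingWidths {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the text occupies when encoded as ISO 8859-1.
    // Characters outside Latin-1 are replaced with '?' by getBytes.
    static int latin1Length(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // True if the byte's high-order bit is zero, that is, a 7-bit ASCII byte.
    static boolean isAsciiByte(byte b) {
        return (b & 0x80) == 0;
    }

    public static void main(String[] args) {
        // Code points U+0041, U+00E9, U+20AC, and U+1F600 (a surrogate pair in a Java String).
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            System.out.printf("U+%04X: %d byte(s) in UTF-8, %d in ISO 8859-1, first byte ASCII? %b%n",
                    s.codePointAt(0), utf8.length, latin1Length(s), isAsciiByte(utf8[0]));
        }
    }
}
```

The four samples occupy one, two, three, and four bytes in UTF-8 respectively, and only the first begins with a byte whose high-order bit is zero.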
To produce an internationalized web application, you need to set the following character encodings:
Request Character Encoding. The character encoding in which parameters in an
incoming request are interpreted.
The web container uses this encoding to convert the parameters to string objects.
Page Character Encoding. The character encoding in which the JSP file is written.
Unless the page character encoding is set correctly, the JSP parser,
web container, or web server that reads the page cannot interpret its characters correctly before processing them,
for example when translating the JSP file into a servlet.
Page character encoding is used for the rendering of JSP files only if the
response character encoding has not been set separately.
Response Character Encoding. The character encoding of the textual
response generated by a web component. This lets you control the encoding that the page uses
when it is sent to the browser. The response encoding must be set appropriately so that the
characters are rendered correctly for a given locale.
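The effect of the request and response encodings can be illustrated without a web container. The following is a rough sketch using only the standard library: the class EncodingRoundTrip and its methods are hypothetical stand-ins for what a container does, not part of any servlet or JSP API. It assumes a client that sends a parameter as UTF-8 bytes and shows what happens when the request encoding matches and when it does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingRoundTrip {
    // Bytes a client would put on the wire for a parameter value (hypothetical helper).
    static byte[] clientSends(String value, Charset clientEncoding) {
        return value.getBytes(clientEncoding);
    }

    // Interpret the raw parameter bytes using the request character encoding.
    static String readParameter(byte[] raw, Charset requestEncoding) {
        return new String(raw, requestEncoding);
    }

    // Content-Type header value that advertises the response character encoding.
    static String contentType(Charset responseEncoding) {
        return "text/html; charset=" + responseEncoding.name();
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the final letter, sent as UTF-8.
        byte[] raw = clientSends("caf\u00E9", StandardCharsets.UTF_8);

        // Matching request encoding: the original string is recovered.
        String good = readParameter(raw, StandardCharsets.UTF_8);
        // Mismatched request encoding: the two UTF-8 bytes of the accented
        // letter are read as two separate Latin-1 characters.
        String garbled = readParameter(raw, StandardCharsets.ISO_8859_1);

        System.out.println(good.length() + " characters with UTF-8");
        System.out.println(garbled.length() + " characters with ISO 8859-1");
        System.out.println(contentType(StandardCharsets.UTF_8));
    }
}
```

In a servlet container, the corresponding settings are typically made with calls such as ServletRequest.setCharacterEncoding and ServletResponse.setCharacterEncoding, and with the pageEncoding attribute of the JSP page directive; the sketch above only mimics their effect on the bytes.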
All modern web browsers understand UTF-8, so it is a safe encoding to pick for the response.
It is a good encoding at the page level too, which is why UTF-8 is the default page character encoding
and also the default response character encoding for JSP files created in the IDE.