Chapter 3. Servlet Best Practices Jason Hunter

3.1 Working Effectively with Servlets

3.1.2 Use Pre-Encoded Characters

One of the first things you learn when programming servlets is to use a PrintWriter for writing characters and an OutputStream for writing bytes. And while that's stylistically good advice, it's also a bit simplistic. Here's the full truth: just because you're outputting characters doesn't mean you should always use a PrintWriter!

A PrintWriter has a downside: specifically, it has to encode every character from a char to a byte sequence internally. When you have content that's already encoded (such as content in a file, URL, or database, or even in a String held in memory), it's often better to stick with streams. That way you can enable a straight byte-to-byte transfer. Except for those rare times when there's a charset mismatch between the stored encoding and the required encoding, there's no need to first decode the content into a String and then encode it again to bytes on the way to the client. Use the pre-encoded characters and you can save a lot of overhead.

O’Reilly – Java Enterprise Best Practices 52

To demonstrate, the servlet in Example 3-1 uses a reader to read from a text file and a writer to output text to the client. Although this follows the mantra of using Reader/Writer classes for text, it involves a wasteful, needless conversion.

Example 3-1. Chars in, chars out

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class WastedConversions extends HttpServlet {

  // Random file, for demo purposes only
  String name = "content.txt";

  public void doGet(HttpServletRequest req, HttpServletResponse res)
      throws ServletException, IOException {
    String file = getServletContext().getRealPath(name);

    res.setContentType("text/plain");
    PrintWriter out = res.getWriter();

    returnFile(file, out);
  }

  public static void returnFile(String filename, Writer out)
      throws FileNotFoundException, IOException {
    Reader in = null;
    try {
      // FileReader decodes the file's bytes into chars
      in = new BufferedReader(new FileReader(filename));
      char[] buf = new char[4 * 1024];  // 4K char buffer
      int charsRead;
      while ((charsRead = in.read(buf)) != -1) {
        // The writer encodes the chars back into bytes for the client
        out.write(buf, 0, charsRead);
      }
    }
    finally {
      if (in != null) in.close();
    }
  }
}

The servlet in Example 3-2 is more appropriate for returning a text file. This servlet recognizes that file content starts as bytes and can be sent directly as bytes, as long as the encoding matches what's expected by the client.

Example 3-2. Bytes in, bytes out

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class NoConversions extends HttpServlet {

  String name = "content.txt";  // Demo file to send

  public void doGet(HttpServletRequest req, HttpServletResponse res)
      throws ServletException, IOException {
    String file = getServletContext().getRealPath(name);

    res.setContentType("text/plain");
    OutputStream out = res.getOutputStream();

    returnFile(file, out);
  }

  public static void returnFile(String filename, OutputStream out)
      throws FileNotFoundException, IOException {
    InputStream in = null;
    try {
      in = new BufferedInputStream(new FileInputStream(filename));
      byte[] buf = new byte[4 * 1024];  // 4K buffer
      int bytesRead;
      while ((bytesRead = in.read(buf)) != -1) {
        out.write(buf, 0, bytesRead);  // Bytes pass through unchanged
      }
    }
    finally {
      if (in != null) in.close();
    }
  }
}

How much performance improvement you get by using pre-encoded characters depends on the server. Testing these two servlets against a 2 MB file accessed locally shows a 20% improvement under Tomcat 3.x and a whopping 50% improvement under Tomcat 4.x. Although those numbers sound impressive, they of course assume that the application does nothing except transfer text files. Real-world numbers depend on the servlet's business logic. This technique (illustrated in Figure 3-2) is most helpful for applications that are bandwidth-bound or server CPU-bound.

Figure 3-2. Taking advantage of pre-encoded characters
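If you want to get a rough feel for the difference on your own machine, a small harness like the one below can help. It is only a sketch under stated assumptions: the class name ConversionTiming and its methods are hypothetical, the content is held in memory rather than served by a container, ISO-8859-1 is assumed on both sides, and a single unwarmed timing run is far too crude to reproduce the Tomcat figures quoted above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical harness: compares the chars-in/chars-out path with the
// bytes-in/bytes-out path on the same in-memory content.
public class ConversionTiming {

  // Char path: decode bytes to chars, then encode the chars back to bytes.
  public static byte[] copyAsChars(byte[] content) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream(content.length);
    Reader in = new InputStreamReader(
        new ByteArrayInputStream(content), StandardCharsets.ISO_8859_1);
    Writer out = new OutputStreamWriter(sink, StandardCharsets.ISO_8859_1);
    char[] buf = new char[4 * 1024];
    int charsRead;
    while ((charsRead = in.read(buf)) != -1) {
      out.write(buf, 0, charsRead);
    }
    out.flush();  // Push any buffered, encoded bytes into the sink
    return sink.toByteArray();
  }

  // Byte path: straight byte-to-byte transfer, no conversion.
  public static byte[] copyAsBytes(byte[] content) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream(content.length);
    InputStream in = new ByteArrayInputStream(content);
    byte[] buf = new byte[4 * 1024];
    int bytesRead;
    while ((bytesRead = in.read(buf)) != -1) {
      sink.write(buf, 0, bytesRead);
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] content = new byte[2 * 1024 * 1024];  // 2 MB of ASCII 'a'
    Arrays.fill(content, (byte) 'a');

    long t0 = System.nanoTime();
    byte[] viaChars = copyAsChars(content);
    long t1 = System.nanoTime();
    byte[] viaBytes = copyAsBytes(content);
    long t2 = System.nanoTime();

    System.out.println("Same output: " + Arrays.equals(viaChars, viaBytes));
    System.out.println("Char path (ns): " + (t1 - t0));
    System.out.println("Byte path (ns): " + (t2 - t1));
  }
}
```

Both paths produce identical bytes when the encodings match; the only difference is the work done along the way. For numbers you can trust, measure inside your real server under realistic load.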

The principle "Use Pre-encoded Characters" applies whenever a large majority of your source content is pre-encoded, such as with content from files, URLs, and even databases. For example, using the ResultSet getAsciiStream( ) method instead of getCharacterStream( ) can avoid conversion overhead for ASCII strings—both when reading from the database and writing to the client. There's also the potential for cutting the bandwidth in half between the server and database because ASCII streams can be half the size of UCS-2 streams. How much benefit you actually see depends, of course, on the database and how it internally stores and transfers data.
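The size claim is easy to check for yourself. The sketch below uses a hypothetical EncodedSizes class and stands in UTF-16BE for UCS-2, which is an assumption that holds for characters in the Basic Multilingual Plane, where the two encodings produce the same bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: compares the encoded size of an ASCII-only string
// as single-byte ASCII and as two-byte-per-char UTF-16BE (standing in for UCS-2).
public class EncodedSizes {

  public static int asciiSize(String s) {
    return s.getBytes(StandardCharsets.US_ASCII).length;
  }

  public static int ucs2Size(String s) {
    // UTF_16BE writes no byte-order mark, so the count is exactly 2 per char here
    return s.getBytes(StandardCharsets.UTF_16BE).length;
  }

  public static void main(String[] args) {
    String s = "Content current as of";
    System.out.println(asciiSize(s));  // prints 21
    System.out.println(ucs2Size(s));   // prints 42
  }
}
```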

In fact, some servlet developers pre-encode their static String contents with String.getBytes( ) so that they're encoded only once. Whether the performance gain justifies going to that extreme is a matter of taste. I advise it only when performance is a demonstrated problem without a simpler solution.
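A minimal sketch of pre-encoding static content follows. The class name PreEncodedContent, the markup, and the choice of ISO-8859-1 are all assumptions made for illustration; in a servlet you would pass res.getOutputStream( ) as the out argument and make sure the Content-Type charset matches the encoding used here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: static page fragments are encoded once, when the
// class loads, instead of on every request.
public class PreEncodedContent {

  private static final byte[] HEADER =
      "<html><body><h1>Status</h1>\n".getBytes(StandardCharsets.ISO_8859_1);
  private static final byte[] FOOTER =
      "</body></html>\n".getBytes(StandardCharsets.ISO_8859_1);

  // Writes the pre-encoded fragments around an already-encoded body.
  public static void writePage(OutputStream out, byte[] body)
      throws IOException {
    out.write(HEADER);
    out.write(body);
    out.write(FOOTER);
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    byte[] body = "<p>All systems go</p>\n".getBytes(StandardCharsets.ISO_8859_1);
    writePage(sink, body);
    System.out.print(new String(sink.toByteArray(), StandardCharsets.ISO_8859_1));
  }
}
```

The trade-off is readability: the byte arrays are less convenient to edit and inspect than plain strings, which is one reason to reserve this for measured hot spots.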

To mix bytes and characters on output is actually easier than it probably should be. Example 3-3 demonstrates how to mix output types using the ServletOutputStream and its combination write(byte[ ]) and println(String) methods.

Example 3-3. AsciiResult.java

import java.io.*;
import java.sql.*;
import java.util.Date;
import javax.servlet.*;
import javax.servlet.http.*;

public class AsciiResult extends HttpServlet {

  public void doGet(HttpServletRequest req, HttpServletResponse res)
      throws ServletException, IOException {
    res.setContentType("text/html");
    ServletOutputStream out = res.getOutputStream();

    // ServletOutputStream has println() methods for writing strings.
    // The println() call works only for single-byte character encodings.
    // If you need multibyte, make sure to set the charset in the Content-Type
    // and use, for example, out.write(str.getBytes("Shift_JIS")) for Japanese.
    out.println("Content current as of");
    out.println(new Date().toString());

    // Retrieve a database ResultSet here.
    // (The resultSet variable below is assumed to come from that query.)

    try {
      InputStream ascii = resultSet.getAsciiStream(1);
      returnStream(ascii, out);
    }
    catch (SQLException e) {
      throw new ServletException(e);
    }
  }

  public static void returnStream(InputStream in, OutputStream out)
      throws IOException {
    byte[] buf = new byte[4 * 1024];  // 4K buffer
    int bytesRead;
    while ((bytesRead = in.read(buf)) != -1) {
      out.write(buf, 0, bytesRead);
    }
  }
}

Although mixing bytes with characters can provide a performance boost because the bytes are transferred directly, I recommend you use this technique sparingly because it can be confusing to readers and can be error-prone if you're not entirely familiar with how charsets work. If your character needs extend beyond ASCII, be sure you know what you're doing. Writing non-ASCII characters to an output stream should not be attempted by a novice.
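For those who do venture beyond ASCII, the safest habit is to name the charset explicitly at every conversion. The sketch below is illustrative only: the ExplicitCharsets class and its methods are hypothetical, and UTF-8 and ISO-8859-1 are stand-ins for whatever your stored content and your response actually use. It shows two cases: encoding a string yourself before writing it to a stream, and transcoding on the rare occasion when the stored encoding differs from the one the client expects. In a servlet, the charset you encode to must match the one declared in the Content-Type, for example text/html; charset=UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers for writing non-ASCII text to a byte stream.
public class ExplicitCharsets {

  // Encodes the text with the named charset and writes the bytes directly.
  public static void writeEncoded(OutputStream out, String text, Charset charset)
      throws IOException {
    out.write(text.getBytes(charset));
  }

  // Charset mismatch: decode from the stored encoding and re-encode to the
  // required one. This is the one case where the conversion is needed.
  public static void transcode(InputStream in, Charset stored,
                               OutputStream out, Charset required)
      throws IOException {
    Reader reader = new InputStreamReader(in, stored);
    Writer writer = new OutputStreamWriter(out, required);
    char[] buf = new char[4 * 1024];
    int charsRead;
    while ((charsRead = reader.read(buf)) != -1) {
      writer.write(buf, 0, charsRead);
    }
    writer.flush();  // Flush encoded bytes without closing the caller's stream
  }

  public static void main(String[] args) throws IOException {
    String text = "caf\u00e9";  // "café": the last character is non-ASCII

    ByteArrayOutputStream utf8 = new ByteArrayOutputStream();
    writeEncoded(utf8, text, StandardCharsets.UTF_8);
    System.out.println(utf8.size());  // prints 5: the accented letter takes 2 bytes

    ByteArrayOutputStream latin1 = new ByteArrayOutputStream();
    transcode(new ByteArrayInputStream(utf8.toByteArray()),
        StandardCharsets.UTF_8, latin1, StandardCharsets.ISO_8859_1);
    System.out.println(latin1.size());  // prints 4: one byte per character
  }
}
```

The differing byte counts for the same four characters are exactly why a mismatch between the bytes you send and the charset you declare shows up as garbled text on the client.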