
URI::Escape

In document Beginning Web Development with Perl (Page 56-60)

Working with CGI scripts sometimes means working closely with Uniform Resource Identifiers (URIs) and Uniform Resource Locators (URLs). It also means playing by a certain set of rules or standards for characters that are acceptable in a URI or URL.1 RFCs 2396 and 2732 define the characters that are restricted when they appear in a URL.

1. The difference between a URL and URI is subtle. A URL is a type of URI meant to show the location of the resource. The Internet Engineering Task Force (IETF) has published a number of Request For Comments (RFC) documents that define these and many other Internet standards. For more information, see the IETF’s web site (http://www.ietf.org/) or the RFC Editor’s web site (http://www.rfc-editor.org/).

In essence, you must escape reserved and unsafe characters if they appear in the query string of the URI. Usually, you escape characters by changing the value for the reserved character to its hexadecimal (hex) equivalent preceded by a % instead of 0x. For example, the hex equivalent for a dollar sign ($) in a URL is %24; the URI hex for a space character is %20. Programmers familiar with Microsoft Windows web design might recognize the %20 as a space, since it’s more common to see spaces in filenames on Windows systems than on Unix and Unix-like systems.
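The hex conversion described above can be checked with Perl’s built-in ord and sprintf functions. This is only a minimal sketch: the helper name hex_escape is hypothetical, introduced here for illustration, and it handles single characters with a code of 255 or below.

```perl
use strict;
use warnings;

# hex_escape is a hypothetical helper for illustration only.
# It returns the %XX form of a single character (codes 0-255).
sub hex_escape {
    my ($char) = @_;
    return sprintf("%%%02X", ord($char));
}

my $dollar = hex_escape('$');   # dollar sign is hex 24
my $space  = hex_escape(' ');   # space is hex 20

print "dollar sign: $dollar\n";
print "space: $space\n";
```

Running the sketch should print %24 for the dollar sign and %20 for the space, matching the values given in the text.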

In Perl, there’s more than one way to accomplish a given task, and escaping characters is no exception. There’s nothing preventing you from manually escaping each invalid character within a URI, and, in fact, a regular expression wizard could account for all instances of reserved and unsafe characters, and substitute them with their hex equivalents in one line of code. That’s an enjoyable exercise for learning regular expressions, but I’ve found that the URI::Escape module saves a lot of time in this area.
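As a rough sketch of the manual approach, a single substitution can replace every character outside a chosen safe set with its hex form. The safe set used here (letters, digits, and the marks - _ . ~) is an assumption made for illustration, and the code handles only single-byte characters; it is not a substitute for the module.

```perl
use strict;
use warnings;

my $string = "\$5/[3454]/this is a windows filename.asp";

# Replace each character that is NOT in the assumed safe set with
# a percent sign followed by its two-digit uppercase hex value.
# The /e modifier evaluates the replacement as Perl code.
(my $escaped = $string) =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;

print "$escaped\n";
# prints: %245%2F%5B3454%5D%2Fthis%20is%20a%20windows%20filename.asp
```

The copy-then-substitute idiom leaves $string untouched, so both the original and the escaped forms remain available.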

You can download the URI::Escape module from your favorite CPAN mirror (find mirrors at http://www.cpan.org/). This is probably one of the easiest Perl modules to use.

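The URI::Escape module includes two primary functions: uri_escape($string) and uri_unescape($string). The uri_escape() function accepts an optional second argument containing a set of characters to be escaped, which replaces the default set. By default, the function escapes every character outside the unreserved set (letters, digits, and a few marks such as -, _, and .). The characters escaped by default therefore include the reserved characters ;, /, ?, :, @, &, =, +, $, ,, [, and ], along with others such as the space and the double quote ("). Whether marks such as !, *, ', (, and ) are escaped depends on which RFC the installed version of the module follows.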
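The effect of the optional second argument can be seen in a short sketch. The sample string is made up for illustration, and the second argument shown here (a space and an ampersand) is just one possible choice of characters to escape.

```perl
use strict;
use warnings;
use URI::Escape;

my $string = "name=Tom & Jerry";

# Default behavior: the equals sign, spaces, and ampersand are all escaped.
my $default = uri_escape($string);

# Second argument: the characters to escape, written as they would
# appear inside a regular expression character class. Only spaces
# and ampersands are escaped here, so the equals sign survives.
my $custom = uri_escape($string, " &");

print "$default\n";   # name%3DTom%20%26%20Jerry
print "$custom\n";    # name=Tom%20%26%20Jerry
```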

When a string containing one of these characters is passed to the uri_escape() function, it will return a string with the restricted characters replaced with their safe counterparts. Conversely, when the uri_unescape() function receives a string with escaped characters, it will replace those escaped characters with their restricted, unsafe counterparts. Sometimes, the best way to explain things is with an example. Consider the code in Listing 2-4.

Listing 2-4. A Safe String Example with uri_escape

#!/usr/bin/perl -T

use strict;
use URI::Escape;
use CGI qw/:standard/;

# The backslash keeps Perl from interpolating $5 as a variable.
my $unsafestring = "\$5/[3454]/this is a windows filename.asp";
my $safestring = uri_escape($unsafestring);

# Print the HTTP header and a page showing both strings.
print header,
    start_html("Making URLs Safe Is Our Business"),
    p("The string that is unsafe for a URL is: $unsafestring\n"),
    p("When fed through the uri_escape() function it becomes: $safestring\n"),
    end_html;
exit;

CHAPTER 2 ■ POPULAR CGI MODULES

The code is pretty simple but illustrates the uri_escape() function very well. As usual, the URI::Escape functions are imported into the namespace with this code:

use URI::Escape;

From there, a string is created with all sorts of unsafe characters, including a $, brackets, and spaces. Notice the \ included in the string. The backslash doesn’t actually appear in the output, since it’s used to escape the $5, so that Perl doesn’t interpret the $5 as a variable!

my $unsafestring = "\$5/[3454]/this is a windows filename.asp";

The string is then run through the uri_escape() function, with the results placed into a variable called $safestring:

my $safestring = uri_escape($unsafestring);

The next lines of code in the example are ones that you’ve seen in earlier examples, beginning the web page output and so on. Two lines of output to the resulting web page are based on the output from the uri_escape() function:

p("The string that is unsafe for a URL is: $unsafestring\n"),
p("When fed through the uri_escape() function it becomes: $safestring\n"),

First, you’re shown the string as it would appear before any escaping of unsafe characters (the variable $unsafestring). Next, the result of the uri_escape() function is shown as the contents of the $safestring variable. Viewing the page through a browser, as shown in Figure 2-3, illustrates the results of the program.

Parsing an escaped URI string is a useful task, not only when programming for the Web, but also when performing forensics or monitoring security logs. Attackers and malicious code will frequently disguise their code by escaping it using the hex equivalent. Feeding that encoded string into the uri_unescape() function can help reveal the intent of such an attack. Listing 2-5 shows an example of using uri_unescape().

Figure 2-3. An escaped string viewed through a web browser

Listing 2-5. Using uri_unescape to Make a String Without Escape Characters

#!/usr/bin/perl -T

use strict;
use URI::Escape;
use CGI qw/:standard/;

my $unsafestring = "\$5/[3454]/this is a windows filename.asp";
my $safestring = uri_escape($unsafestring);

# Reverse the escaping to recover the original string.
my $unescstring = uri_unescape($safestring);

print header,
    start_html("Making URLs Safe Is Our Business"),
    p("The string that is unsafe for a URL is: $unsafestring\n"),
    p("When fed through the uri_escape() function it becomes: $safestring\n"),
    p("When the escaped string is unescaped, it becomes: $unescstring\n"),
    end_html;
exit;


Figure 2-4. An unescaped string viewed through a web browser


This code is similar to that shown in Listing 2-4. The additions to this code show the uri_unescape() function being run, as well as the results of that function call. As you can see in Figure 2-4, the string is indeed unescaped correctly.
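The forensic use of uri_unescape() mentioned earlier can be sketched outside of a CGI page as well. The request path below is hypothetical, made up to resemble something that might appear in a web server log; it is not taken from a real log.

```perl
use strict;
use warnings;
use URI::Escape;

# Hypothetical request path, invented for illustration.
my $logged = "/search?q=%3Cscript%3Ealert(1)%3C%2Fscript%3E";

# Decoding the hex escapes reveals what the request actually carried.
my $decoded = uri_unescape($logged);

print "$decoded\n";   # /search?q=<script>alert(1)</script>
```

Once decoded, the embedded script tag is plain to see, which is much harder to spot in the escaped form.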

One other function within URI::Escape enables the developer to escape characters with a code above 255: uri_escape_utf8($string). This function encodes the characters as UTF-8 prior to escaping them. As with the normal uri_escape() function, the uri_escape_utf8() function also accepts an optional second argument containing a string of unsafe characters. The module does not appear to provide a matching uri_unescape_utf8() function; to reverse the operation, pass the escaped string to uri_unescape() and then decode the resulting bytes as UTF-8.
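A short sketch shows uri_escape_utf8() applied to a character with a code above 255. The sample string is made up for illustration, and the round trip through uri_unescape() and the core Encode module is an assumption about how the escaping might be reversed, not something taken from the text.

```perl
use strict;
use warnings;
use URI::Escape;
use Encode qw(decode);

# U+263A (a smiley face) has a code well above 255.
my $string = "smile \x{263A}";

# The character is encoded as the UTF-8 bytes E2 98 BA, then escaped.
my $escaped = uri_escape_utf8($string);

# Assumed reverse path: unescape to raw bytes, then decode as UTF-8.
my $roundtrip = decode('UTF-8', uri_unescape($escaped));

print "$escaped\n";   # smile%20%E2%98%BA
print "round trip matches\n" if $roundtrip eq $string;
```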
