Wednesday, March 10, 2010

Preventing Cross-Site Scripting in Java

You can read about XSS here: Cross-site scripting

I've been trying to figure out how to prevent it using HTML encoding, and which Java libraries are available for that.

OWASP's site has an article on this:

To quote:

"Injection attacks rely on the fact that interpreters take data and execute it as commands. If an attacker can modify the data that's sent to an interpreter, they may be able to make it misbehave. One way to help prevent this from happening is to encode the attacker's data in such a way that the interpreter will not get confused. HTML entity encoding is just such an encoding mechanism for many interpreters."

There are two ways to encode the data, viz. character entity references and numeric character references:

From Wikipedia:

An entity reference uses the "&" symbol:

&quot; (double) quotation mark
&amp; ampersand
&apos; apostrophe (= apostrophe-quote)
&lt; less-than sign
&gt; greater-than sign
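To make the entity-reference approach concrete, here is a minimal sketch of an encoder that replaces only these five characters and leaves everything else alone. The class and method names are hypothetical; this is not the OWASP or ESAPI code. Note that &apos; comes from XML/XHTML and is not defined in HTML 4, which is one reason many encoders emit a numeric reference for the apostrophe instead.

```java
// Hypothetical helper: replaces the five characters from the list above with
// their named entity references and leaves every other character untouched.
final class NamedEntityEncoder {
    private NamedEntityEncoder() {}

    static String encode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("&quot;"); break;
                case '&':  out.append("&amp;");  break;
                case '\'': out.append("&apos;"); break; // XML/XHTML entity, not in HTML 4
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("<a href='x'>Tom & \"Jerry\"</a>"));
    }
}
```

Because the input is scanned once, an "&" produced by an earlier replacement is never re-encoded.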

A numeric character reference refers to a character by its Universal Character Set/Unicode code point, and uses the format

&#nnnn; or &#xhhhh;

where nnnn is the code point in decimal form, and hhhh is the code point in hexadecimal form.
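Both forms can be built directly from a code point. This is a minimal sketch with hypothetical names; the hex form is emitted in lowercase here, but HTML accepts either case for the hex digits.

```java
// Hypothetical helpers that build a numeric character reference for a
// single Unicode code point, in decimal (&#nnnn;) or hex (&#xhhhh;) form.
final class NumericRef {
    private NumericRef() {}

    static String decimal(int codePoint) {
        return "&#" + codePoint + ";";
    }

    static String hex(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint) + ";";
    }

    public static void main(String[] args) {
        int lt = '<';                        // code point 60 (0x3C)
        System.out.println(decimal(lt));     // &#60;
        System.out.println(hex(lt));         // &#x3c;
        int euro = "\u20AC".codePointAt(0);  // euro sign, U+20AC
        System.out.println(decimal(euro));   // &#8364;
        System.out.println(hex(euro));       // &#x20ac;
    }
}
```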

Although the OWASP article mentioned above talks about entity references, the code sample it includes actually uses numeric character references, i.e.

<script></script>

encodes as:

&#60;script&#62;&#60;&#47;script&#62;
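The numeric approach can be sketched as follows: keep ASCII letters and digits as they are and replace every other character with a decimal numeric reference. This is a hypothetical class written in the same spirit as the OWASP sample, not the article's code; it iterates by code point so a character outside the Basic Multilingual Plane becomes one reference rather than two broken surrogate halves.

```java
// Sketch of a "numeric encoding" approach: keep ASCII letters and digits,
// replace every other character with a decimal numeric character reference.
final class DecimalEntityEncoder {
    private DecimalEntityEncoder() {}

    static String encode(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(s.length() * 2);
        // Iterate by code point so supplementary characters become one reference
        s.codePoints().forEach(cp -> {
            boolean safe = (cp >= 'a' && cp <= 'z')
                    || (cp >= 'A' && cp <= 'Z')
                    || (cp >= '0' && cp <= '9');
            if (safe) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // prints &#60;script&#62;&#60;&#47;script&#62;
        System.out.println(encode("<script></script>"));
    }
}
```

Encoding everything that is not alphanumeric is deliberately conservative: the output is longer, but there is no need to reason about which punctuation is safe in which context.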

Some further research around this issue leads to AntiXSS for Java, a Java port of the Microsoft Anti-Cross Site Scripting (AntiXSS) library for .NET applications, and to the Open Web Application Security Project (OWASP), which has an Enterprise Security API (ESAPI).

On the ESAPI page, click the "Java EE" tab. There are two ways to invoke the encoding functionality. The first uses the codec classes directly:

import org.owasp.esapi.ESAPI;
import org.owasp.esapi.codecs.HTMLEntityCodec;

public static String esapiCodecHtml(String s) {
    HTMLEntityCodec hec = new HTMLEntityCodec();
    StringBuilder b = new StringBuilder(s.length());
    // Characters that are passed through without encoding
    char[] immune = { ',', '.', '-', '_', ' ' };

    // Canonicalize (decode) the input first; printed here for inspection
    String clean = ESAPI.encoder().canonicalize(s);
    System.out.println("Cleaned result is " + clean);

    // Encode the input one character at a time and collect the results
    for (int i = 0; i < s.length(); i++) {
        char ch = s.charAt(i);
        b.append(hec.encodeCharacter(immune, ch));
    }
    return b.toString();
}

Note: ESAPI canonicalizes input before validation to prevent bypassing filters with encoded attacks. Failure to canonicalize input is a very common mistake when implementing validation schemes. Canonicalization is automatic when using the ESAPI Validator.
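To see why the order matters, here is a hypothetical demo that uses only the JDK, not ESAPI. The "validator" is deliberately naive (it just looks for a literal "<script"), and percent-decoding stands in for canonicalization; ESAPI's canonicalize() handles many more encodings than this. URLDecoder.decode(String, Charset) requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical demo of why input must be canonicalized before validation.
// The validator is intentionally naive and is NOT how ESAPI validates.
final class CanonicalizeDemo {
    private CanonicalizeDemo() {}

    // Naive check: reject input containing a literal "<script" tag
    static boolean looksSafe(String input) {
        return !input.toLowerCase(Locale.ROOT).contains("<script");
    }

    // Stand-in for canonicalization: decodes percent-encoding only
    static String percentDecode(String input) {
        return URLDecoder.decode(input, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String attack = "%3Cscript%3Ealert(1)%3C%2Fscript%3E";
        System.out.println(looksSafe(attack));                // true: the filter is bypassed
        System.out.println(looksSafe(percentDecode(attack))); // false: caught once decoded
    }
}
```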

The other way uses the encoder wrapper:

import org.owasp.esapi.ESAPI;

public static String esapiEncodeForHTML(String s) {
    // Canonicalize (decode) the input; printed here for inspection
    String clean = ESAPI.encoder().canonicalize(s);
    System.out.println("Cleaned result is " + clean);

    // The wrapper handles the per-character HTML encoding
    return ESAPI.encoder().encodeForHTML(s);
}

They both convert <script></script> to &lt;script&gt;&lt;&#x2f;script&gt;

Interestingly, this is a combination of both reference types: named entity references for "<" and ">", and a hexadecimal numeric reference for "/".
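A mix like this falls out naturally from a simple rule: use a named entity when one is known, otherwise fall back to a hex numeric reference. Here is a minimal sketch of that rule. The class name and its four-entry lookup table are hypothetical and this is not ESAPI's implementation (real HTML defines far more named entities), but for <script></script> it produces the same string ESAPI does.

```java
import java.util.Map;

// Sketch of a mixed encoder: named entity where a (small, illustrative)
// table has one, otherwise a hexadecimal numeric character reference.
final class MixedEncoder {
    private MixedEncoder() {}

    private static final Map<Integer, String> NAMED = Map.of(
            (int) '<', "lt",
            (int) '>', "gt",
            (int) '&', "amp",
            (int) '"', "quot");

    // Passed through unencoded, mirroring the immune array in the ESAPI example
    private static final String IMMUNE = ",.-_ ";

    static String encode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            boolean asciiAlnum = cp < 128 && Character.isLetterOrDigit(cp);
            if (asciiAlnum || IMMUNE.indexOf(cp) >= 0) {
                out.appendCodePoint(cp);
            } else if (NAMED.containsKey(cp)) {
                out.append('&').append(NAMED.get(cp)).append(';');
            } else {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints &lt;script&gt;&lt;&#x2f;script&gt;
        System.out.println(encode("<script></script>"));
    }
}
```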

Just to note: the example at the top converted "/" to &#47;, whereas ESAPI converts it to &#x2f;. Both refer to the same code point (47 decimal is 2F hex); one is written in decimal and the other in hex.
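A quick way to convince yourself that the two forms are equivalent is to decode them. This is a hypothetical decoder for numeric references only (named entities are left untouched), with no handling of out-of-range values; Matcher.appendReplacement(StringBuilder, String) needs Java 9 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical decoder for numeric character references only:
// decimal (&#nnnn;) and hex (&#xhhhh;). Named entities are not decoded.
final class NumericRefDecoder {
    private NumericRefDecoder() {}

    private static final Pattern REF =
            Pattern.compile("&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));");

    static String decode(String s) {
        Matcher m = REF.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Group 1 holds decimal digits, group 2 holds hex digits
            int cp = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 10)
                    : Integer.parseInt(m.group(2), 16);
            String ch = new String(Character.toChars(cp));
            m.appendReplacement(out, Matcher.quoteReplacement(ch));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("&#47;"));   // /
        System.out.println(decode("&#x2f;"));  // /
        System.out.println(47 == 0x2f);        // true
    }
}
```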


Refer to:

XSS (Cross Site Scripting) Prevention Cheat Sheet

Refer to my SO question:

Java - XSS - HTML encoding - Character entity reference vs. Numeric entity reference


1 comment:

Anonymous said...

I tried using your code but it didn't work. These lines:
String clean = ESAPI.encoder().canonicalize(s);
System.out.println("Cleaned result is " + clean);
didn't get executed. Do you know why?