Archive for the ‘Unicode’ Category

Unicode security vulnerabilities – presentation from Internationalization and Unicode Conference 33

October 20th, 2009 by Chris Weber

I'm attaching my slides from the Unicode conference last week in San Jose, California. I've been getting a lot of requests for code-level action items. Details for code review and static analysis are in the works, with a focus on major frameworks such as ICU, .NET, and Java.

You can download the presentation here.

Unibomber tool for specialized XSS testing

July 28th, 2009 by Chris Weber

John Hernandez has been working hard at Casaba to build a specialized testing tool that automates some of the unique techniques we use to find cross-site scripting (XSS) bugs. At Black Hat I'm planning to demo what we have so far. It automates much of the testing process by auto-injecting a canary and an ID into each input, be it a query string, HTTP header, or POST parameter. By combining injection with 'output encoding' detection, you get automation that helps pen-testers find vulnerability hotspots.

Because it basically bombs a web app with a slew of Unicode characters to find XSS bugs, we named it the Unibomber.

Appended to the canary is a special character – special because it can transform into a 'dangerous' character through normalization, casing, or best-fit mapping operations. So we end up injecting these special characters all over the place and then detecting where they get transformed and displayed as output.
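A minimal Python sketch of the idea, not the Unibomber's actual code. The canary string, the ID format, the character table, and the function names are all assumptions made for illustration. It appends a character that turns into a dangerous one under NFKC normalization, then checks which form shows up in the output.

```python
import unicodedata

CANARY = "xsscanary"  # hypothetical marker string, not the tool's real canary

# Characters that become "dangerous" after NFKC normalization.
SPECIALS = {
    "\uff1c": "<",   # FULLWIDTH LESS-THAN SIGN -> "<"
    "\ufe64": "<",   # SMALL LESS-THAN SIGN -> "<"
    "\uff02": '"',   # FULLWIDTH QUOTATION MARK -> '"'
}

def make_probe(input_id, special):
    """Build canary + ID + special character for one input."""
    return f"{CANARY}{input_id}{special}"

def classify(output, input_id, special):
    """Report how the probe for input_id shows up in the output."""
    marker = f"{CANARY}{input_id}"
    if marker + SPECIALS[special] in output:
        return "transformed"   # special char became its dangerous form
    if marker + special in output:
        return "reflected"     # echoed back unchanged
    if marker in output:
        return "altered"       # canary present, special char changed some other way
    return "absent"

# Simulate an application that normalizes input before echoing it.
probe = make_probe(7, "\uff1c")
page = "<p>" + unicodedata.normalize("NFKC", probe) + "</p>"
result = classify(page, 7, "\uff1c")
```

A "transformed" result is the interesting one: it marks an output location where a filter-evading character arrived as live markup.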

The beauty is that we can find both reflected and persistent XSS bugs this way. It's not a one-click tool, though. It is intended for an experienced person who knows how to find and exploit a clever XSS bug.

Anyone who looks for XSS will likely find some good bugs with the Unibomber. We sure have!

32nd Internationalization and Unicode Conference presentation on Exploiting Unicode-enabled Software

September 11th, 2008 by Chris Weber

I'm glad to have had the chance to present at the Unicode conference yesterday, and meet all the wonderful people there.
You can download the presentation slides here for Exploiting Unicode-enabled software.

 

Generating test cases for Unicode-enabled software

September 10th, 2008 by Chris Weber

When it comes to Unicode implementations, there’s a rich set of test
cases to perform. Realizing it is the start. Automating it is the next
step.

At a high level, Unicode-related security bugs can be categorized by the following root causes:

Canonicalization

  • Interpreting non-shortest form (e.g. UTF-8 encoding trickery)
  • Other decoding issues
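As a rough illustration of the non-shortest form problem, here is a small Python sketch. The `naive_decode_2byte` helper is hypothetical and buggy on purpose; it stands in for any decoder that skips the overlong check.

```python
def naive_decode_2byte(b0, b1):
    """Decode a 2-byte UTF-8 sequence WITHOUT rejecting overlong forms."""
    return chr(((b0 & 0x1F) << 6) | (b1 & 0x3F))

overlong_slash = bytes([0xC0, 0xAF])  # non-shortest encoding of "/" (U+002F)

# A filter that scans the raw bytes for "/" sees nothing suspicious...
filter_sees_slash = b"/" in overlong_slash

# ...but a lenient decoder later produces the slash anyway.
naive_result = naive_decode_2byte(*overlong_slash)

# A conforming decoder refuses the sequence outright.
try:
    overlong_slash.decode("utf-8")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True
```

The test case to generate is any security-relevant ASCII character (slash, dot, angle bracket, quote) in its 2-byte, 3-byte, and 4-byte overlong encodings.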

Absorption (over-consumption)

  • Over-consuming invalid byte sequences or correcting rather than failing
  • When <41 C2 C3 B1 42> becomes <41 42>
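The byte sequence above can be reproduced with a short Python sketch. `absorbing_decode` is a hypothetical, deliberately flawed decoder that only handles ASCII and 2-byte sequences; Python's built-in decoder is shown for contrast.

```python
def absorbing_decode(data):
    """Flawed decoder: consumes the byte after a lead byte without checking it."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
        elif 0xC2 <= b <= 0xDF and i + 1 < len(data):
            trail = data[i + 1]
            if 0x80 <= trail <= 0xBF:
                out.append(chr(((b & 0x1F) << 6) | (trail & 0x3F)))
            # BUG: the trail byte is swallowed even when it was not a valid trail
            i += 2
        else:
            i += 1  # stray byte silently dropped
    return "".join(out)

data = bytes.fromhex("41c2c3b142")

absorbed = absorbing_decode(data)                  # the valid C3 B1 is eaten
replaced = data.decode("utf-8", errors="replace")  # only the bad C2 is replaced

try:
    data.decode("utf-8")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True
```

Here the flawed decoder yields "AB", while the built-in decoder either fails or keeps the valid character and marks the bad byte with U+FFFD.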

Character deletion and swallowing

  • “deletion of noncharacters” (UTR-36)
  • <scr[U+FEFF]ipt> becomes <script>
  • Use replacement characters instead!
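A short Python sketch of why deletion is risky and replacement is safer. The `blocks_script_tag` filter is hypothetical and far simpler than a real one.

```python
BOM = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE, also used as the byte order mark

def blocks_script_tag(text):
    """Hypothetical filter: flag input containing a literal <script> tag."""
    return "<script>" in text.lower()

payload = "<scr" + BOM + "ipt>alert(1)</scr" + BOM + "ipt>"

# The filter runs first and finds nothing to block.
filter_blocked = blocks_script_tag(payload)

# A later stage that deletes the character rebuilds the tag.
after_deletion = payload.replace(BOM, "")

# Replacing with U+FFFD keeps the tag broken.
after_replacement = payload.replace(BOM, "\ufffd")
```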

Interpreting Syntax replacements

  • white space and line feeds
  • E.g. when U+180E acts like U+0020
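A sketch of the mismatch in Python, assuming a filter that only knows ASCII white space and a hypothetical downstream consumer that also treats U+180E as a separator. Which real components behave this way is not claimed here.

```python
ASCII_WS = " \t\n\r\f"
LENIENT_WS = ASCII_WS + "\u180e"  # assumption: consumer treats U+180E as a space

def split_on(text, separators):
    """Split text into tokens on any character in separators."""
    tokens, current = [], []
    for ch in text:
        if ch in separators:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens

payload = "<img\u180esrc=x>"

filter_view = split_on(payload, ASCII_WS)      # one opaque token
consumer_view = split_on(payload, LENIENT_WS)  # a tag name and an attribute
```

The test case is the disagreement itself: any character that one layer treats as a separator and another treats as ordinary text.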

Best-fit mappings

  • When σ becomes s
  • When ′ becomes '

Buffer overruns

  • Incorrect assumptions about string sizes (chars vs. bytes)
  • Improper width calculations
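The chars-versus-bytes gap is easy to see in Python. The helper names below are illustrative only.

```python
samples = ["A", "\u00f1", "\u20ac", "\U0001d11e"]  # A, n-tilde, euro sign, musical G clef

def sizes(s):
    """Count the same string three different ways."""
    return {
        "code_points": len(s),
        "utf8_bytes": len(s.encode("utf-8")),
        "utf16_units": len(s.encode("utf-16-le")) // 2,
    }

table = {s: sizes(s) for s in samples}

def required_utf8_buffer(s):
    """Bytes needed to hold s as a NUL-terminated UTF-8 C string."""
    return len(s.encode("utf-8")) + 1
```

A buffer sized from the code point count is too small for three of the four samples once they are encoded as UTF-8, and the last sample also needs two UTF-16 code units rather than one.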

Timing issues

  • handling Unicode after security gates
  • Sometimes handling Unicode before a gate can be a problem too! E.g. BOM handling

Unicode formatter characters lead to cross-site scripting in popular browsers

September 5th, 2008 by Chris Weber

I'll be discussing some of the issues recently reported to Opera, Apple, and Mozilla at the 32nd Unicode Conference in San Jose next week. We discovered some issues with the way certain Unicode characters could be leveraged to enable cross-site scripting attacks in popular web browsers (aka User-Agents). These issues involve utilizing Unicode characters in ways which might bypass most filters, IPS, and IDS systems.

Handling Unicode when marshalling from .Net to a platform invoke

April 22nd, 2008 by Chris Weber

By default, the .Net runtime will marshal a string (and string fields in a value type) as an LPStr to a platform invoke (p/invoke) function. The .Net framework and runtime handle strings as UTF-16. That's two bytes per code unit: most characters fit in a single code unit, while code points beyond U+FFFF take two (a surrogate pair). An LPStr, on the other hand, is a pointer to a null-terminated string of ANSI characters, so in order to convert, the runtime will perform a best-fit conversion to the system's ANSI code page, classically windows-1252. This conversion is well-documented here:

http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt

This might not be so surprising to people in tune with Unicode, but it can lead to huge security problems when security filters are involved. For example, if you're performing HTML filtering or file canonicalization, you need to do so after the conversion to LPStr.
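Python has no counterpart to the .Net marshaller, so the sketch below only emulates the ordering problem. The two table entries are taken from the bestfit1252.txt file linked above; the filter and function names are hypothetical.

```python
# Two entries from bestfit1252.txt; the real table is much larger.
BEST_FIT_1252 = {
    "\u03c3": "s",  # GREEK SMALL LETTER SIGMA -> s
    "\u2032": "'",  # PRIME -> apostrophe
}

def best_fit(text):
    """Emulate the lossy Unicode-to-ANSI step for the mapped characters only."""
    return "".join(BEST_FIT_1252.get(ch, ch) for ch in text)

def has_quote(text):
    """Hypothetical filter: flag input containing an apostrophe."""
    return "'" in text

untrusted = "x\u2032 OR 1=1"

checked_before = has_quote(untrusted)           # filter sees the Unicode string
checked_after = has_quote(best_fit(untrusted))  # filter sees what the callee receives
```

Only the check made after the conversion catches the apostrophe, which is the point of filtering on the converted string.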

This default marshalling behavior is documented at: http://msdn2.microsoft.com/en-us/library/system.runtime.interopservices.marshalasattribute(VS.71).aspx

To deal with this properly and more safely, you can use the MarshalAsAttribute class to specify an LPWStr type instead of an LPStr. For example:

[MarshalAs(UnmanagedType.LPWStr)]

Because LPWStr is a pointer to a null-terminated array of Unicode characters, this ensures the Unicode code points are preserved across the marshalling.

I18N input validation whitelist filter with System.Globalization and GetUnicodeCategory

April 24th, 2007 by Chris Weber

Maybe you’re building internationalized code and wondering how to build a whitelist filter that will support all the different character sets you’re planning to support. If you support more than ten, especially some of the larger East Asian sets, this might seem like an unwieldy or tricky process.
Luckily, it’s easier than most people would think. Building a good input validation filter can be simplified with .Net’s GetUnicodeCategory. Use the method on CharUnicodeInfo in the System.Globalization namespace, though, as the other one in System.Char looks like it may become the subordinate.

With GetUnicodeCategory you can simply build a whitelist of the character categories you want to allow. So get away from thinking you have to write a regex filter and list out all the character ranges you want to allow in each character set. It’s much simpler than that!

The Unicode standard assigns every character to one of about 30 general categories. They make sense too: for example, Control (Cc), Lowercase Letter (Ll), Uppercase Letter (Lu), and Math Symbol (Sm). So you might, for example, want to allow only letters, digits, combining marks, and a little punctuation in your whitelist. This could be achieved with the following snippet:


using System.Globalization;

char cUntrustedInput = 'a'; // stand-in value; in real code this is the untrusted user input
// Look up the Unicode general category of the character
UnicodeCategory cTestCategory = CharUnicodeInfo.GetUnicodeCategory(cUntrustedInput);
if (cTestCategory == UnicodeCategory.LowercaseLetter ||
    cTestCategory == UnicodeCategory.UppercaseLetter ||
    cTestCategory == UnicodeCategory.DecimalDigitNumber ||
    cTestCategory == UnicodeCategory.TitlecaseLetter ||
    cTestCategory == UnicodeCategory.OtherLetter ||
    cTestCategory == UnicodeCategory.NonSpacingMark ||
    cTestCategory == UnicodeCategory.DashPunctuation ||
    cTestCategory == UnicodeCategory.ConnectorPunctuation)
{
    // character looks safe, continue
}
else
{
    // character is not allowed, fail
}

Not too bad, eh?
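For comparison, the same category whitelist can be sketched outside .Net. This Python version uses the standard unicodedata module and the two-letter category codes that correspond to the categories allowed above; the function names are illustrative.

```python
import unicodedata

# Ll, Lu, Nd, Lt, Lo, Mn, Pd, Pc: the same eight categories as the snippet above
ALLOWED_CATEGORIES = {"Ll", "Lu", "Nd", "Lt", "Lo", "Mn", "Pd", "Pc"}

def is_allowed(ch):
    """True if the character's general category is on the whitelist."""
    return unicodedata.category(ch) in ALLOWED_CATEGORIES

def validate(text):
    """True only if every character in text is allowed."""
    return all(is_allowed(ch) for ch in text)

ok_latin = validate("M\u00fcller-Meier_42")   # letters, dash, connector, digits
ok_cjk = validate("\u65e5\u672c\u8a9e")       # CJK ideographs are Other Letter (Lo)
bad_markup = validate("<script>")             # "<" and ">" are Math Symbol (Sm)
bad_space = validate("a b")                   # space is Space Separator (Zs)
```

Note that an empty string passes `validate`, so a length check belongs alongside a filter like this.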