While auditing some new changes to an Open Source project I often use, I wrote a test suite to surface edge cases and discovered something I had never noticed before: the ASCII control characters made it safely across the entire stack as-is. They didn’t raise any alarms or trigger escaping in Form controls, User Generated Content sanitizers, or HTML-safe markup rendering.
Form control systems generally only raise an error on ASCII control characters if the input validator is built with a Regular Expression that allows specific characters or ranges. Most just check for valid ASCII characters. ASCII control characters do a lot of useful things, like providing “newlines” and “tabs”. Most are irrelevant, but one is particularly problematic: the backspace character.
“Backspaces” are problematic because Web Browsers, Text Editors and Developer IDEs typically consider most ASCII control characters to be non-printing or “invisible” and don’t render them onscreen, but they are still part of the “document”. If you copy a document or part of a document to your clipboard, the “invisible” characters are copied too. If you then paste them into an interpreter window, they will be interpreted and executed in real time, allowing a malicious user to carefully construct an evil payload.
Consider this trivial example with Python. It looks like we’re loading the yosemite library, right?
import yosemite
Wrong. Pasted into a Python interpreter, this will appear:
import os
How? There are ASCII backspace control characters after certain letters. When pasted into an interpreter, they immediately “activate”. Written with escape sequences, the actual string above looks like:
import y\bose\bm\bi\bt\be\b
If you want to try this at home, you can generate a payload with this line:
open('payload.txt', 'w').write("import y\bose\bm\bi\bt\be\b")
Given a few lines of text and some creativity, it is not hard to construct “hidden” code that deletes entire drives or carries out other nefarious activities.
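To make the mechanism concrete, here is a minimal sketch that contrasts what a reader sees with what an interpreter ends up with. It assumes the interpreter's line editor treats each backspace as "delete the previous character" (actual behavior depends on the terminal and line editor in use), and the function names are hypothetical helpers written for this illustration.

```python
def apply_backspaces(text: str) -> str:
    """Simulate a line editor that treats each backspace as 'delete previous character'."""
    result = []
    for ch in text:
        if ch == "\b":
            if result:
                result.pop()  # erase the character entered just before the backspace
        else:
            result.append(ch)
    return "".join(result)


def strip_backspaces(text: str) -> str:
    """Approximate what a browser shows: the backspaces are simply not rendered."""
    return text.replace("\b", "")


# Every letter that should disappear is followed by a backspace.
payload = "import y\bose\bm\bi\bt\be\b"

print(strip_backspaces(payload))  # what the reader sees on the page
print(apply_backspaces(payload))  # what the interpreter ends up with
```

Under those assumptions the first line prints `import yosemite` and the second prints `import os`.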
It’s an Edge Case
This really only affects situations that meet both of the following requirements:
- People retrieve code published via User Generated Content (or potentially main site articles)
- The code is copy/pasted directly into an interpreter or terminal (I’ve tested this against Python, Ruby, and Node.js)
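If you are on the receiving end of such a paste, one defensive habit is to inspect the copied text before it reaches an interpreter. The following is a rough sketch, not a complete defense: it makes control characters visible as escapes, and both function names and the choice to tolerate only newlines and tabs are assumptions made for this illustration.

```python
import unicodedata

# Control characters that legitimately appear in pasted code (an assumption for this sketch).
ALLOWED_CONTROLS = {"\n", "\t"}


def reveal_controls(text: str) -> str:
    """Return text with disallowed control characters shown as visible hex escapes."""
    out = []
    for ch in text:
        # Unicode category "Cc" covers the ASCII control characters, including backspace.
        if unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS:
            out.append(f"\\x{ord(ch):02x}")
        else:
            out.append(ch)
    return "".join(out)


def has_hidden_controls(text: str) -> bool:
    """True if the text contains any control character outside the allowed set."""
    return reveal_controls(text) != text


suspicious = "import y\bos"
print(reveal_controls(suspicious))
print(has_hidden_controls(suspicious))
```

Running a snippet through `reveal_controls` before pasting turns an invisible backspace into a visible `\x08`, which is usually enough to notice that something is off.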
The target audience for this type of vulnerability is developers and “technology organizations”. The potential vectors for bad payloads are:
- posts / articles / status
- bug-tracking / ticketing software
- “how to reproduce” instructions on security vulnerability report forms (ironic, I know)
What is and isn’t affected
I tested dozens of “internet scale” websites; only two filtered out the backspace control character: StackOverflow and Twitter. WordPress.com filters out the backspace, but WordPress open source posts do not.
I focused my search on the types of sites that people would copy/paste code from: coding blogs, “engineering team” notes, discussion software, bugtrackers, etc. Everything but the two sites mentioned above failed.
Bad payloads were also accepted in:
- Multiple Pastebin and ‘fiddle’ services
Most Python/Ruby/PHP libraries allowed bad payloads through Form validation or UGC sanitization.
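As an illustration of the gap, here is a hedged sketch that contrasts a naive "is it ASCII?" check with an allowlist-style validator, plus a sanitizer that strips the problematic characters. None of this reproduces any particular library's behavior: the function names, the allowed character ranges, and the decision to strip rather than escape are all assumptions chosen for the example.

```python
import re

# Allowlist: printable ASCII plus tab, newline, and carriage return (an assumption).
SAFE_ASCII = re.compile(r"[\x20-\x7E\t\n\r]*")
# ASCII control characters other than tab, newline, and carriage return, plus DEL.
UNSAFE_CONTROLS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]")


def naive_is_valid(text: str) -> bool:
    """Accepts anything that is ASCII, so backspaces pass through."""
    return text.isascii()


def strict_is_valid(text: str) -> bool:
    """Accepts only characters on the allowlist, so backspaces are rejected."""
    return SAFE_ASCII.fullmatch(text) is not None


def sanitize(text: str) -> str:
    """Remove unsafe control characters, leaving exactly what a reader would see."""
    return UNSAFE_CONTROLS.sub("", text)


payload = "import y\bose\bm\bi\bt\be\b"
print(naive_is_valid(payload))   # the naive check lets the payload through
print(strict_is_valid(payload))  # the allowlist check rejects it
print(sanitize(payload))         # the stripped text matches what is displayed
```

Stripping makes the stored text match what is displayed. Escaping the characters, or rejecting the input outright, are equally reasonable choices depending on the application.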
Did I file reports with anyone?
I previously filed security reports with a handful of projects and websites. I also offered some projects patches that escape the backspace character.
I decided to post this publicly because:
- Open Source projects overwhelmingly believed the issue should be addressed by another component.
- Commercial Websites believed this is a social engineering hack and dismissed it.