A security vulnerability affecting most websites and Open Source projects, even Facebook

While auditing some new changes to an Open Source project I often use, I wrote a test-suite to surface edge cases and discovered something I never noticed before: the ASCII Control Characters made it safely across the entire stack as-is. It didn’t raise any alarms or trigger escaping from Form controls, User Generated Content Sanitizers, or HTML Safe Markup rendering.

Form control systems generally only reject ASCII control characters if the input validator is built with a Regular Expression that allows only specific characters/ranges; most just check for valid ASCII characters. ASCII control characters do a lot of useful things, like providing “newlines” and “tabs”. Most are irrelevant, but one is particularly problematic — the backspace character. “Backspaces” are problematic because Web Browsers, Text Editors and Developer IDEs typically consider most ASCII control characters to be non-printing or “invisible” and don’t render them onscreen, but they are still part of the “document”. If you copy a document or part of a document to your clipboard, the “invisible” characters copy too. If you then paste them into an interpreter window, they are interpreted/executed in real-time — allowing a malicious user to carefully construct an evil payload.

An Example

Consider this trivial example with Python. It looks like we’re loading the yosemite library, right?:

import yosemite

Wrong. Pasted into a Python interpreter, this will appear:

import os

How? There are ASCII backspace control characters after certain letters. When pasted into an interpreter, they immediately “activate”. Using the escape sequences, the actual string above looks like:

import y\bose\bm\bi\bt\be

If you want to try this at home, you can generate a payload with this line:

open('payload.txt', 'w').write("import y\bose\bm\bi\bt\be ")

Given a few lines of text and some creativity, it is not hard to construct “hidden” code that deletes entire drives or does other nefarious activities.

It’s an Edge Case

This really only affects situations meeting both of the following two requirements:

  1. People retrieve code published via User Generated Content (or potentially main site articles)
  2. The code is copy/pasted directly into an interpreter or terminal (I’ve tested this against Python, Ruby, and Node.JS)

The target audience for this type of vulnerability is developers and “technology organizations”. The potential vectors for bad payloads are:

  • posts / articles / status
  • pastebins
  • bug-tracking / ticketing software
  • “how to reproduce” instructions on security vulnerability report forms (ironic, I know)

What is and isn’t affected

I tested dozens of “internet scale” websites; only two filtered out the backspace control character: StackOverflow and Twitter. WordPress.com filters out the backspace, but posts on open source WordPress installs do not.

I focused my search on the types of sites that people would copy/paste code from: coding blogs, “engineering team” notes, discussion software, bugtrackers, etc. Everything but the two sites mentioned above failed.

Bad payloads were also accepted in:

  • Facebook
  • Multiple Pastebin and ‘fiddle’ services

Most Python/Ruby/PHP libraries allowed bad payloads through Form validation or UGC sanitization.
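
If you maintain one of these, the fix is tiny. Here is a minimal sketch of the kind of filter I'm suggesting (my own illustration, not code from any particular framework): strip the ASCII control characters that aren't ordinary whitespace before accepting UGC.

import re

# control characters other than \t, \n and \r; \x08 is the backspace
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

def sanitize_ugc(text):
    """Remove non-printing ASCII control characters from user-submitted text."""
    return CONTROL_CHARS.sub("", text)

assert sanitize_ugc("import y\bose\bm\bi\bt\be") == "import yosemite"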

Did I file reports with anyone?

I previously filed security reports with a handful of projects and websites, and offered patches to escape the backspace character to some of those projects.

I decided to post this publicly because:

  • Open Source projects overwhelmingly believed the issue should be addressed by another component.
  • Commercial Websites believed this is a social engineering hack and dismissed it.

Small gotcha under Python’s PYTHONOPTIMIZE feature

A long, long, long time ago I started using interpreter optimizations to help organize code. If a block of code is guarded by a constant (or an expression of constants) that evaluates to False, then the block (or line) is optimized away.

if (false) {
    # this is optimized away
}

Perl was very generous, and would allow for constants, true/false, and operations between the two to work.

Thankfully, Python has some optimizations available via the PYTHONOPTIMIZE environment variable (which can also be set via the -O and -OO flags). These control the __debug__ constant, which is True (by default) or False (when optimizations are enabled), and will omit documentation strings if the level is 2 (or -OO) or higher. Using these flags in a production environment allows for very verbose documentation and extra debugging logic in a development environment.
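
A quick way to see the flag flip, assuming a file named check_debug.py (my example name):

# check_debug.py
# python check_debug.py     -> prints True, and the guarded block runs
# python -O check_debug.py  -> prints False, and the guarded block is compiled away
print(__debug__)

if __debug__:
    print("this block disappears under -O / PYTHONOPTIMIZE")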

Unfortunately, I implemented this incorrectly in some places. To fine-tune some development debugging routines I had some lines like this:

if __debug__ and DEBUG_CACHE:
    pass

Python’s interpreter doesn’t optimize that away, even when DEBUG_CACHE is False (I tested under 2.7 and 3.5). I should have realized this (or at least tested for it). I didn’t notice it until I profiled my app and saw a bunch of additional statistics logging that should not have been compiled in.

The correct way to write the above is:

if __debug__:
    if DEBUG_CACHE:
        pass

Here is a quick test

Try running it with optimizations turned on:

export PYTHONOPTIMIZE=2
python test.py

and with them off

unset PYTHONOPTIMIZE
python test.py

As far as the interpreter is concerned during the optimization phase, only a bare `if __debug__:` test can be stripped; an expression like `__debug__ and False` is left to be evaluated at runtime.

import dis

def foo():
    """docstring"""
    if __debug__:
        print("debug message")
    return True

def bar():
    """docstring"""
    if __debug__ and False:
        print("debug message")
    return True

# ============

# this will be a lean function
dis.dis(foo)

print "- - - - - - - -"
# this will show extended logic
dis.dis(bar)

The default version of foo should look like this:

21           0 LOAD_CONST               1 ('debug message')
             3 PRINT_ITEM          
             4 PRINT_NEWLINE       

22           5 LOAD_GLOBAL              0 (True)
             8 RETURN_VALUE        

But you should see that the optimized version of foo creates some very lean code:

22           0 LOAD_GLOBAL              0 (True)
             3 RETURN_VALUE        

Now let’s look at the bar function when unoptimized

26           0 LOAD_GLOBAL              0 (__debug__)
             3 POP_JUMP_IF_FALSE       20
             6 LOAD_GLOBAL              1 (False)
             9 POP_JUMP_IF_FALSE       20

27          12 LOAD_CONST               1 ('debug message')
            15 PRINT_ITEM          
            16 PRINT_NEWLINE       
            17 JUMP_FORWARD             0 (to 20)

28     >>   20 LOAD_GLOBAL              2 (True)
            23 RETURN_VALUE        

And the optimized version of bar generates exactly the same code:

26           0 LOAD_GLOBAL              0 (__debug__)
             3 POP_JUMP_IF_FALSE       20
             6 LOAD_GLOBAL              1 (False)
             9 POP_JUMP_IF_FALSE       20

27          12 LOAD_CONST               1 ('debug message')
            15 PRINT_ITEM          
            16 PRINT_NEWLINE       
            17 JUMP_FORWARD             0 (to 20)

28     >>   20 LOAD_GLOBAL              2 (True)
            23 RETURN_VALUE

So let’s add a new function, biz

def biz():
    """docstring"""
    if __debug__:
        if False:
            print("debug message")
    return True

The unoptimized code:

33           0 LOAD_GLOBAL              0 (False)
             3 POP_JUMP_IF_FALSE       14

34           6 LOAD_CONST               1 ('debug message')
             9 PRINT_ITEM          
            10 PRINT_NEWLINE       
            11 JUMP_FORWARD             0 (to 14)

35     >>   14 LOAD_GLOBAL              1 (True)
            17 RETURN_VALUE    

And a lot of that gets optimized away:

35           0 LOAD_GLOBAL              0 (True)
             3 RETURN_VALUE        

Not many people use/abuse how the interpreter compiles code to their advantage, but if you do — pay attention to constructs like this.

Using __debug__ is a great way to hide logging code in a production environment. The evaluation of __debug__ only happens once, when Python first compiles the code, so there is very little overhead.

Optimized Archiving of Historical Time-Series Data for Analytics

The (forthcoming) Aptise platform features a web-indexer that tabulates a lot of statistical data each day for domains across the global internet.

While we don’t currently need this data, we may need it in the future. Keeping it in the “live” database is not really a good idea – it’s never used and just becomes a growing stack of general crap that automatically gets backed-up and archived every day as part of standard operations. It’s best to keep production lean and move this unnecessary data out-of-sight and out-of-mind.

An example of this type of data is a daily calculation on the relevance of each given topic to each domain that is tracked. We’re primarily concerned with conveying the current relevance, but in the future we will address historical trends.

Let’s look at the basic storage requirements of this as a PostgreSQL table:

| Column    | Format  | Size    |
| --------- | ------- | ------- |
| date      | Date    | 8 bytes |
| domain_id | Integer | 4 bytes |
| topic_id  | Integer | 4 bytes |
| count     | Integer | 4 bytes |

Each row is taking 20 bytes, 8 of which are due to date alone.

Storing this in PostgreSQL across thousands of domains and tags every day takes a significant chunk of storage space – and this is only one of many similar tables that primarily have archival data.

An easy way to save some space for archiving purposes is to segment the data by date, and move the storage of date information from a row and into the organizational structure.

If we were to keep everything in Postgres, we would create an explicit table for the date. For example:

CREATE TABLE report_table_a__20160101 (domain_id INT NOT NULL, topic_id INT NOT NULL, count INT NOT NULL DEFAULT 0);

| Column    | Format  | Size    |
| --------- | ------- | ------- |
| domain_id | Integer | 4 bytes |
| topic_id  | Integer | 4 bytes |
| count     | Integer | 4 bytes |

This structure conceptually stores the same data, but instead of repeating the date in every row, we record it only once within the table’s name. This simple shift will lead to a nearly 40% reduction in size.

In our use-case, we don’t want to keep this in PostgreSQL because the extra data complicates automated backups and storage. Even if we wanted this data live, spreading it across hundreds of tables is overkill. So for now, we’re going to export the data from a single date into a new file.

SELECT domain_id, topic_id, count FROM table_a WHERE date = '2016-01-01';

And we’re going to save that into a comma delimited file:

table_a-20160101.csv

I skipped a lot of steps above because I do this in Python — for reasons I’m about to explain.

As a raw csv file, my date-specific table is still pretty large at 7556804 bytes — so let’s consider ways to compress it:

Using standard zip compression, we can drop that down to 2985257 bytes. That’s ok, but not very good. If we use xz compression, it drops to a slightly better 2362719.
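
If you want to reproduce these measurements yourself, a rough sketch (it uses zlib's deflate as a stand-in for zip, and Python 3's lzma module for xz; the filename is just the one from this example):

import os
import zlib
import lzma  # Python 3 standard library

raw = open("table_a-20160101.csv", "rb").read()

print("raw: %d bytes" % os.path.getsize("table_a-20160101.csv"))
print("zip: %d bytes (approximated with zlib/deflate)" % len(zlib.compress(raw, 9)))
print("xz:  %d bytes" % len(lzma.compress(raw, preset=9)))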

We’ve already cut the data to 60% of its original size by eliminating the date column (a 40% reduction), so these numbers are a pretty decent overall improvement — but considering the type of data we’re storing, the compression is just not very good. We’ve got to do better — and we can.

We can do much better and actually it’s pretty easy (!). All we need to do is understand compression algorithms a bit.

Generally speaking, compression algorithms look for repeating patterns. When we pulled the data out of PostgreSQL, we just had random numbers.

We can help the compression algorithm do its job by giving it better input. One way is through sorting:

SELECT domain_id, topic_id, count FROM table_a WHERE date = '2016-01-01' ORDER BY domain_id ASC, topic_id ASC, count ASC;

As a raw csv, this sorted date-specific table is still the same exact 7556804 bytes.

Look what happens when we try to compress it:

Using standard zip compression, we can drop that down to 1867502 bytes. That’s pretty good – we’re at 24.7% the size of the raw file AND about 60% the size of the non-sorted zip. That is a huge difference! If we use xz compression, we drop down to 1280996 bytes. That’s even better at 17%.

17% compression is honestly great, and remember — this is compressing the data that is already 40% smaller because we shifted the date column out. If we consider what the filesize with the date column would be, we’re actually at 10% compression. Wonderful.

I’m pretty happy with these numbers, but we can still do better — and without much more work.

As I said above, compression software looks for patterns. Although the sorting helped, we’re still a bit hindered because our data is in a “row” storage format. Consider this example:

1000,1000,1
1000,1000,2
1000,1000,3
1001,1000,1
1001,1000,2
1001,1000,3

There are lots of repeating patterns there, but not as many as if we represented the same information in a “column” storage format:

1000,1000,1000,1001,1001,1001
1000,1000,1000,1000,1000,1000
1,2,3,1,2,3

This is the same data, but as you can see it’s much more “machine” friendly – there are larger repeating patterns.

This transformation from row to column is an example of “transposing an array of data”; performing it (and reversing it) is incredibly easy with Python’s standard functions.

Let’s see what happens when I use transpose_to_csv below on the data from my csv file

def transpose_to_csv(listed_data):
    """given an array of `listed_data` will turn a row-store into a col-store (or vice versa)
    reverse with `transpose_from_csv`"""
    zipped = zip(*listed_data)
    list2 = [','.join([str(i) for i in zippedline]) for zippedline in zipped]
    list2 = '\n'.join(list2)
    return list2

def transpose_from_csv(string_data):
    """given a string of csvdata, will revert the output of `transpose_to_csv`"""
    destringed = string_data.split('\n')
    destringed = [line.split(',') for line in destringed]
    unzipped = zip(*destringed)
    return unzipped
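
And a rough sketch of how I wire those helpers up against the exported file (the filenames are just the ones from this example):

# read the sorted row-store export, transpose it, and write a column-store version
with open("table_a-20160101.csv") as f:
    rows = [line.split(",") for line in f.read().splitlines()]

col_store = transpose_to_csv(rows)
with open("table_a-20160101.transposed.csv", "w") as f:
    f.write(col_store)

# round-trip check: transposing back should give us the original rows
assert [list(row) for row in transpose_from_csv(col_store)] == rows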

As a raw csv, my transposed file is still the exact same size at 7556804 bytes.

However, if I zip the file – it drops down to 1425585 bytes.

And if I use xz compression… I’m now down to 804960 bytes.

This is a HUGE savings without much work.

* The raw data in PostgreSQL was probably about 12594673 bytes (an estimate based on the savings; the file was deleted).
* Stripping out the date information and storing it in the filename generated a 7556804 byte csv file – 60% of the original size.
* Without thinking about compression, just lazily “zipping” the file produced 2985257 bytes.
* When we thought about compression – sorting the input, transposing the data into a column store, and applying xz compression – we ended up with 804960 bytes: 10.7% of the csv size and 6.4% of the estimated size in PostgreSQL.

This considerably smaller amount of data can now be archived onto something like Amazon’s Glacier and worried about at a later date.

This may seem like a trivial amount of data to worry about, but keep in mind that this is a DAILY snapshot, and one of several tables. At 12MB a day in PostgreSQL, one year of data takes over 4GB of space on a system that is subject to high-priority data backups. This strategy turns that year of snapshots into under 300MB of information that can be warehoused on 3rd party systems. This saves us a lot of time and even more money! In our situation, this strategy is applied to multiple tables. Most importantly, the benefits cascade across our entire system as we free up space and resources.

These numbers could be improved upon further by finding an optimal sort order, or even using custom compression algorithms (such as storing the deltas between columns, then compressing). This was written to illustrate a quick way to easily optimize archived data storage.
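
As an illustration of the kind of custom trick I mean (purely a sketch, not what the results table below measures), delta-encoding a sorted column before compression turns big repeated integers into long runs of tiny numbers, which compressors handle very well:

def delta_encode(values):
    """Keep the first value, then store only the difference from the previous value."""
    deltas = [values[0]]
    for prev, curr in zip(values, values[1:]):
        deltas.append(curr - prev)
    return deltas

def delta_decode(deltas):
    """Reverse of delta_encode: a running sum restores the original values."""
    values = [deltas[0]]
    for d in deltas[1:]:
        values.append(values[-1] + d)
    return values

domain_ids = [1000, 1000, 1000, 1001, 1001, 1001]
assert delta_encode(domain_ids) == [1000, 0, 0, 1, 0, 0]
assert delta_decode(delta_encode(domain_ids)) == domain_ids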

The results:

| Format   | Compression | Sorted? | Size     | % csv+date |
| -------- | ----------- | ------- | -------- | ---------- |
| csv+date |             |         | 12594673 | 100        |
| row csv  |             |         | 7556804  | 60         |
| row csv  | zip         |         | 2985257  | 23.7       |
| row csv  | xz          |         | 2362719  | 18.8       |
| row csv  |             | Yes     | 7556804  | 60         |
| row csv  | zip         | Yes     | 1867502  | 14.8       |
| row csv  | xz          | Yes     | 1280996  | 10.2       |
| col csv  |             | Yes     | 7556804  | 60         |
| col csv  | zip         | Yes     | 1425585  | 11.3       |
| col csv  | xz          | Yes     | 804960   | 6.4        |

Note: I purposefully wrote out the ASC on the sql sorting above, because the sort order (and column order) does actually factor into the compression ratio. On my dataset, this particular column and sort order worked the best — but that could change based on the underlying data.

The Dangers of URL Shorteners

In preparation for the next release, I just did an audit of all the errors that Aptise/Cliquedin encountered while indexing content.

Out of a few million URLs, there were 35k “well formed” urls that couldn’t be parsed due to “critical errors”.

Most of these 35k errors are due to URL shorteners. A small percentage of them are from shorteners that have dead/broken links. A much larger percentage of them are from shorteners that just do not seem to perform well at all.

I hate saying bad things about any company, but speaking as a former publisher — I feel the need to talk bluntly about this type of stuff. After pulling out some initial findings, I ran more tests on the bad urls from multiple unrelated IP addresses to make sure I wasn’t being singled out for “suspicious” activity. Unfortunately, the behavior was consistent.

The worst offenders are:

* wp.me from WordPress
* ctx.ly from Adobe Social Marketing
* trib.al from SocialFlow when used with custom domains from Bitly

The SocialFlow+Bitly system was overrepresented because of the sheer number of their clients and the urls they handle — and I understand that may skew things — but they have some interesting architecture decisions that seem to be reliably bad for older content. While I would strongly recommend that people NOT use any of these url shortening services, I more strongly recommend that people do not use SocialFlow’s system with a Custom Domain through Bitly. I really hate saying this, but the performance is beyond terrible — it’s totally unacceptable.

The basic issue across the board is this: these systems perform very well in redirecting people to newer content (they most likely have the mapping of newer unmaksed urls in a cache), but they all start to stall when dealing with older content that probably needs a database query. I’m not talking about waiting a second or two for a redirect to happen — I’m consistently experiencing wait times from these systems that are reliably over 20 seconds long — and often longer. When using a desktop/mobile web browser, the requests reliably timeout and the browser just gives up.

While wp.me and ctx.ly have a lag as they lookup a url to redirect users to a masked url, the SocialFlow + Bitly combo has a design element that works differently:

1. Request a custom shortened url `http://m.example.com/12345`
2. The shortened url is actually served by Bitly, and you will wait x seconds for lookup + redirect
3. Redirected to a different short url on trib.al (SocialFlow) `http://trib.al/abcde`
4. Wait x seconds for the second lookup + redirect
5. Redirected to the end url `http://example.com/VERY+LONG+URL`
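
If you want to measure this yourself, a small sketch with the `requests` library (the short url is a made-up example; it just walks each redirect hop without following it automatically, and times the lookup):

import requests

def time_redirect_chain(url, max_hops=10):
    """Follow a short-url redirect chain one hop at a time, timing each lookup."""
    hops = []
    for _ in range(max_hops):
        resp = requests.get(url, allow_redirects=False, timeout=30)
        hops.append((url, resp.status_code, resp.elapsed.total_seconds()))
        if resp.status_code not in (301, 302, 303, 307, 308):
            break
        url = resp.headers["Location"]
    return hops

# hypothetical shortened url, not a real link
for hop_url, status, seconds in time_redirect_chain("http://m.example.com/12345"):
    print("%s -> %s in %.2fs" % (hop_url, status, seconds))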

All the shorteners could do a better job with speeding up access to older and “less popular” content. But if I were SocialFlow, I would remove Bitly from the equation and just offer custom domains myself. Their service isn’t cheap, and from the viewpoint of a publisher — wait times like this are totally unacceptable.

trouble installing psycopg2 on OSX ?

I had some trouble with a new virtualenv — psycopg2 wouldn’t install.

I remembered going through this before, but couldn’t find my notes. I ended up fixing it before finding my notes ( which point to this StackOverflow question http://stackoverflow.com/questions/2088569/how-do-i-force-python-to-be-32-bit-on-snow-leopard-and-other-32-bit-64-bit-quest ), but I want to share this with others.

psycopg2 was showing compilation errors in relation to some PostgreSQL libraries

> ld: warning: in /Library/PostgreSQL/8.4.5/lib/libpq.dylib, missing required architecture x86_64 in file

so then I checked how the file was built:

$ file /Library/PostgreSQL/8.4.5/lib/libpq.dylib

> /Library/PostgreSQL/8.4.5/lib/libpq.dylib: Mach-O universal binary with 2 architectures
> /Library/PostgreSQL/8.4.5/lib/libpq.dylib (for architecture ppc): Mach-O dynamically linked shared library
> /Library/PostgreSQL/8.4.5/lib/libpq.dylib (for architecture i386): Mach-O dynamically linked shared library i386

Crap. It’s built for ppc and i386 only, with no x86_64. The fix is easy, right? We just need to export ARCHFLAGS and build.

$ export ARCHFLAGS="-arch i386"
$ pip install --upgrade psycopg2

That works perfect, right?

Wrong.

> File "/environments/example-2.7.5/lib/python2.7/site-packages/psycopg2/__init__.py", line 50, in
> from psycopg2._psycopg import BINARY, NUMBER, STRING, DATETIME, ROWID
> ImportError: dlopen(/environments/example-2.7.5/lib/python2.7/site-packages/psycopg2/_psycopg.so, 2): no suitable image found. Did find:
> /environments/example-2.7.5/lib/python2.7/site-packages/psycopg2/_psycopg.so: mach-o, but wrong architecture

I was dumbfounded for a few seconds, then I realized — Python was trying to run 64-bit (x86_64), but I only had a 32-bit library.
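
A quick way to confirm what your interpreter is actually running as (standard library only, nothing psycopg2-specific):

import platform
import struct

print(platform.architecture())    # e.g. ('64bit', '') vs ('32bit', '')
print(struct.calcsize("P") * 8)   # pointer size in bits: 64 or 32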

The right fix? Rebuild PostgreSQL to support 64-bit.

My PostgreSQL was a prebuilt package, and I don’t have time to fix that, so I need to do a few hacky/janky things.

Basically, we’re going to force Python to run as i386 (and not 64-bit).

go to our virtualenv…

cd /environments/example-2.7.5/bin

back it up

cp python python-original

strip it…

# note that the last arg is the input, and the 2nd to last is the output
lipo -thin i386 -output python-i386 python

replace it

rm python
mv python-i386 python

now install psycopg2

export ARCHFLAGS="-arch i386"
pip install --upgrade psycopg2

yay this works !

now get some work done and save some time so you can build a 64bit PostgreSQL

What a Product Manager Is and Isn't, and Why You Should Probably Stop Trying to Hire One.

I’ve had a lot of people contact me over the past two years trying to recruit me for a Product Manager role or looking for referrals to qualified candidates. I have a solid network and am well respected in NY Technology, Advertising and Publishing circles — so I’m used to constant pings by Executives I’ve consulted with or recruiters I’ve worked with, and am happy to help when I can.

I feel compelled to write a post because out of several dozen inquiries for positions titled with some variation of “Product Manager”, only one actually involved any sort of product management. The rest? Sigh…

There’s been a huge conflation of terms with regard to “product management” in the past few years, and it seems to be over-represented in the NYC area. This conflation really needs to stop. Now.

The role of a Product Manager has a bit of variation in its definition, but it’s usually something along the lines of “the person who is ultimately responsible for a product”. In a large organization, Product Managers are essentially divisional GMs or ‘micro-CEOs’; in smaller (and tech) organizations, they tend to be inter-disciplinary people who might report to a “head of product” or directly to the CEO.

Generally speaking: Product Managers are highly skilled and highly experienced professionals, often with extensive background across one or more areas, who are tasked with developing or fine-tuning what a ‘product’ should be to best achieve business goals.

Most “Product Managers” I’ve known can be categorized like this:

* Most have 10+ years of professional experience, with pretty impressive track records; rarely do they have less than 5 years of experience;
* They either have advanced degrees like an MBA, MS, or PhD, or a work-based equivalent, i.e. a C/VP/D level employee who has done some stellar work;
* All are experts / authorities in at least one discipline — and can somewhat function in whatever roles they oversee/interact with, as they’ve quite a bit of experience working across them. They understand when the Engineers are slacking off or overworking, when the Marketers have a ridiculous request, and when the project managers are over/under promising.

Sometimes people have a strong technical background – but that’s not a requirement, it’s a bonus over their experience leading teams and deeply understanding the marketplace they’re working in.

To give some quick examples:

1. I was recently at eConsultancy’s Digital Cream NYC event, in a room full of 150 people who were mostly Chief Marketing Officers / VPs of Marketing. If I were a technology company in the advertising space or a publisher looking to sell innovative new ad solutions, I would want to recruit a Product Manager from the attendee list. This is rather simple – the person who could best manage my advertising product, would be an expert in advertising. Few (if any) people there had any coding experience whatsoever.

2. Several publications that I know of built out Editorial Product departments staffed with former Senior Editors and Operational Editors. What better way to deliver on editorial needs than by hiring a seasoned journalist ?

3. A friend literally wrote the book on a certain technology, and is often called in to advise on different implementations of it — addressing the costs to scale/iterate, user behaviors, implementations, etc. He tends to advise people in a very “product management” capacity.

4. When Facebook buys a startup, their executive staff tend to be acquired as Product Managers to own a section of the Facebook experience.

Some of the things a Product Manager typically does are:

* Understand and manage the business goals: identify the best business opportunities, create and push products to address them.
* Understand the functionality and scope of the product: if it’s technology, they can code; if it’s a marketing product, they understand how and why advertising is bought.
* Understand the customers: make sure people will want to consume the product.
* Make decisions and be qualified to make them: balance a mix of Strategic Decisions (into markets or users) and Operations (costs to iterate – both financially and in team morale).
* Manage the process: work with P&L sheets, quarterback the scope/design/build/deploy/sales process.
* Other things I’m too tired to note. Product Managers are tasked with balancing the goals of the Organization against the needs of multiple types of Consumers and the people/resources to build them. It’s a lot of work, but it’s amazing fun for a lot of us.

The scores of “Product Manager” positions that are plentiful in NYC right now are nothing like my descriptions above – they tend to be a hybrid of skills belonging to a Digital Producer (in the advertising world) and a Project Manager (in, well, any industry). They are mostly what I consider entry level – with a max of 3 total years of work experience, but often in the 1-2 range.

These positions tend to be highly administrative, require no expertise or inter-disciplinary skills, and don’t even have access to seeing budgets — much less managing them or trying to affect revenue operations. Sometimes they’ll include a bit of customer development work, but most often they don’t. These positions completely lack a “Strategy” component, tending to be either a very entry level position or a mislabeling of the most incredibly experienced and talented Project Manager you’ve ever met.

Almost always, these roles become filled by someone who honestly shouldn’t have that job. One of my favorite “Product Manager” interactions was with someone who had just assumed the new role as their second-ever job, with their first job being several years as a Customer Service representative. If the company provided Customer Service, it would have been a really good fit — but the company provided a very technical service, and their “Product Manager” was really functioning more like a mix of an “Account Manager” and a “Digital Producer”; they were visibly out of their element and unable to understand the needs of their clients or the capabilities of their team.

This is really a dis-service to everyone involved.

* It makes a potential employer look foolish to actual Product Managers, and gets it labeled as a company to avoid.
* It skips over a huge pool of extremely talented Digital Producers and Project Managers who would excel at these roles.
* It creates a generation of early-career professionals with the title of a Product Manager, but without the relevant experience or skills to back it up.

Because “Product Manager” is so often a role that an experienced professional transitions into, it’s not uncommon to see someone with 1-2 years of “Product Manager” in their title, but a resume that shows 3 years as a Vice President and 5 years as a Director at a previous employer. You might even see someone with 3 years of “Product Manager” as a title — but an additional 9 years of “Digital Producer” or “Project Manager” experience behind them as well. Plenty of professionals from the Production side transition into Product Management too, once they’re well versed in their respective industries.

Mindless recruiters (and certain nameless conglomerates) in NYC don’t understand this though. They just focus on buzz-words: if someone has been in “product” for 2 years, they target them as if they’ve only been a professional for that long. It’s all too common for the salary cap of a not-really-a-product-manager position to be 1/4 the targeted recruit’s current salary. The compensation package and role should be commensurate with the full scope of someone’s work — i.e. 12 years, not 3 years.

So my point is simple – if you’re hiring a “Product Manager” you should really think about what you expect out of the role.

* If you’re really looking for a “Project Manager” or “Digital Producer” — which you most likely are — change your posting and recruit that person. You’ll find a great employee and give them a job they really want and care about. If you manage to get a Product Manager in that role, they’re going to be miserable and walk out the door.

* If you realize that you’re looking for a role that is both strategic and operational — and is going to be one of the most important hires for your organization or division, then either hire someone with relevant Product Management experience OR hire a relevant expert to be your “Product Manager”.

Dreamhost UX Creates Security Flaw

Last week I found a security flaw on Dreamhost caused by the User Experience on their control panel. I couldn’t find a security email, so I posted a message on Twitter. Their Customer Support team reached out and assured me that an emailed report would be addressed. Six days later I’ve heard nothing from them, so I feel forced to do a public disclosure.

I was hoping that they would do the responsible thing, and immediately fix this issue.

## The issue:

If you create a Subversion repository, there is a checkbox option to add on a “Trac” interface – which is a really great feature, as it can be a pain to set up on their servers yourself (something I’ve usually done in the past).

The exact details of how the “one-click” Trac install works aren’t noted though, and the integration doesn’t “work as you would probably expect” from the User Experience path.

If you had previous experience with Trac, and you were to create a “Private” SVN repository on Dreamhost – one that limits access to a set of username/passwords – you would probably assume that access to the Trac instance is handled by the same credentials as the SVN instance, as Trac is tightly integrated into Subversion.

If you had no experience with Trac, you would probably be oblivious to the fact that Trac has its own permissions system, and assume your repository is secured by the option above.

The “one click” Trac install from Dreamhost is entirely unsecured – the immediate result of checking the box to enable Trac on a “private” repository is that you are publicly publishing that repo through the Trac source browser.

For example, if you were to install a private subversion and one-click Trac install onto a domain like this:

my.domain.com/svn
my.domain.com/trac

The /svn source would be private; however, it would be publicly available under /trac/browser due to the default one-click install settings.

Here’s a marked-up screenshot of the page that shows the conflicting options ( also on http://screencast.com/t/A2VQT5gOVkK )

I totally understand how the team at Dreamhost that implemented the Trac installer would think their approach was a good idea, because in a way it is. A lot of people who are familiar with Trac want to fine-tune the privileges using Trac’s own very-robust permissions system, deciding who can see the source / file tickets / etc. The problem is that there is absolutely no mention of an alternate permissions system contained within Trac – or that someone may need to fine-tune the Trac permissions. People unfamiliar with Trac have NO IDEA that their code is being made public, and those familiar with Trac would not necessarily realize that a fully unsecured setup is being created. I’ve been using Trac for over 8 years, and the thought of the default integrations being set up like this is downright silly – it’s the last thing I would expect a host to do.

I think it would be totally fine if there were just a “Warning!” sign next to “enable Trac” — with a link to Trac’s wiki for customization, or instructions (maybe even a checkbox option) on how a user can have Trac use the same authorization file as Subversion.

But, and this is a huge BUT, people need to be warned that clicking the ‘enable Trac’ button will publish code until Trac is configured. People who are running Trac via an auto-install need to be alerted of this immediately.

This can be a huge security issue depending on what people store in Subversion. Code put in Subversion repositories tends to contain Third Party Account Credentials ( Amazon AWS Secrets/Keys, Facebook Connect Secrets, Paypal/CreditCard Providers, etc ), SSH Keys for automated code deployment, full database connection information, administrator/account default passwords — not to mention the exact algorithms used for user account passwords.

## The fix

If you have a one-click install of Trac tied to Subversion on Dreamhost and you did not manually set up permissions, you need to do the following IMMEDIATELY:

### Secure your Trac installation

If you want to use Trac’s own privileges, you should create this .htaccess file in the meantime to disable all access to the /trac directory

deny from all

Alternatively, you can restrict access to your Trac install with the Subversion password file, using a .htaccess like this:

AuthType Basic
AuthUserFile /home/##SHELL_ACCOUNT_USER##/svn/##PROJECT_NAME##.passwd
AuthName "##PROJECT_NAME##"
require valid-user

### Audit your affected code and services.

* All Third Party Credentials should be immediately trashed and regenerated.
* All SSH Keys should be regenerated
* All Database Accounts should be reset.
* If you don’t have a secure password system in place, you need to upgrade.

## What are the odds of me being affected ?

Someone would need to figure out where your trac/svn repos are to exploit this. Unless you’ve got some great obscurity going on, it’s pretty easy to guess. Many people still like to deploy using files served out of Subversion (it was popular with developers 5 years ago, before build/deploy tools became the standard); if that’s the case and Apache/Nginx aren’t configured to reject .svn directories, your repo information is public.
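
A quick way to spot-check a domain you control (a sketch with the `requests` library; my.domain.com is the placeholder from above, and only probe hosts that are yours):

import requests

# hypothetical domain from the example above; substitute your own install
for path in ("/trac/browser", "/.svn/entries"):
    url = "http://my.domain.com" + path
    resp = requests.get(url, timeout=10)
    print(url, resp.status_code)  # a 200 here means the path is publicly readable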

When it comes to security, play it safe. If your repo was accidentally public for a minute, you should wipe all your credentials.

Want to win? Make it easier, not harder.

In March of 2011 I represented Newsweek & The Daily Beast at the Harvard Business School / Committee of Concerned Journalists “Digital Leaders Summit”. Just about every major media property sent an executive there, and I was privileged enough to represent the newly formed NewsBeast (Newsweek+TheDailyBeast had recently merged, but have since split).

Over the course of two days, we covered a lot of concerns across the industry – analyzing who was doing things right and how/why others were making mistakes.

On the first day of the summit we looked at how Amazon was posturing itself for digital book sales – where they hoped their profits would be, where their losses were expected, and strategies for finding the optimal price structure for digital goods.

Inevitably, the conversation sidetracked to the Apple Ecosystem, which had just announced Subscriptions and their eBooks plan — consequently becoming Amazon’s new competitor.

One of the other 30 or so people in attendance was Jeffrey Zucker from NBC, who went into his then-famous “digital pennies vs. analog dollars” diatribe. He made a compelling, intelligent, and honest argument that captivated the minds and attention of the entire room. Well, most of the room.

I vehemently disagreed with all his points and quickly spoke up to grab the attention of the floor… “apologizing” for breaking with the conventional view of this subject, and asking people to look at the situation from another point of view. Yes, it was true as Zucker stated that Apple standardized prices for digital downloads and set the pricing on their terms – not the producer’s. Yet, it was true that Apple allowed for records to be purchased “in part” and not as a whole – shifting purchase patterns, and yes to a lot of other things.

And yes – Jeffrey Zucker didn’t say anything that was “wrong” – everything he said was right. But it was analyzed from the wrong perspective. Simply put, Zucker and most of the other delegates were only looking at a portion of the scenario and the various mechanics at play. The prevailing wisdom in the room was way off the mark… by miles.

Apple didn’t gain dominance in online music because of their pricing system or undercutting retailers – which everyone believed. Plain and simple, Apple took control of the market because they made it fundamentally easier and faster for someone to legally buy music than to steal it. When they first launched (and still in 2012) it takes under a minute for someone to find and buy an Album or Single in the iTunes store. Let me stress that – discovery, purchase and delivery takes under a minute. Apple’s servers were relatively fast at the start as well – an entire album could be downloaded within an hour.

In contrast, legally purchasing an album in a store would take at least two hours – and at the time iTunes first launched, encoding an album to work on an MP3 player would take another hour. Downloading a record at that time would take even longer: services like Napster (already dead by the iTunes launch) could take a day; torrent systems could take a day; and while file upload sites were generally faster, they suffered from another issue that torrents and other options did as well – mislabeled and misdirected files.

Possibly the only smart thing the Media Industry has ever done to curb piracy is what I call the “I Am Spartacus” method — wherein “crap” files are mislabeled to look like Top 40 hits. For example: in expectation of a new Jay-Z record, internet filesharing sites are flooded with uploads that bear the name of the record… but contain white noise, another record, or an endless barrage of insults (ok, maybe not the last one… but they should).

I pretty much shut the room up at that point, and began a diatribe of my own – which I’ll repeat and continue here…

At the conference, Jeffrey Zucker and some other media executives tended to look at the digital economy like this: if there are 10 million Apple downloads of the new Beyonce record or the 2nd Season of “Friends”, those represent 10 million diverted sales of a $17.99 CD – or 10MM diverted sales of a $39.99 DVD. If Apple were to sell the CD for 9.99 with a 70% cut, they’re only seeing $7 in revenue for every $17.99 — 10 million times. Similarly, if 10MM people are watching Friends for $13.99 (or whatever cost) on AppleTV instead of buying $29.99 box sets, that’s about $20 lost per viewer — 10 million times.

To this point, I called bullshit.

Digital goods such as music and movies have incredibly diminished costs for incremental units, and for most of these products they are a secondary market — records tend to recoup their various costs within the first few months, and movies/tv-shows tend to have been wildly profitable on-TV / in-Theaters. The music recording costs 17.99 and the DVD 29.99 , not because of fixed costs and a value chain… but because $2 of plastic, or .02¢ of bandwidth, is believed by someone to be able to command that price.

Going back to our real-life example, 10MM downloads of “Friends” for 13.99 doesn’t equate to 10MM people who would have purchased the DVD for $39.99. While a percentage of the 10MM may have been willing to purchase the DVDs for the higher price, another — larger — percentage would not have. By lowering the price from 39.99 to 13.99, the potential market had likely grown from 1MM consumers to 10MM. Our situation is not an “apples-to-apples” comparison — while we’re generating roughly one third the revenue per unit, we’re moving ten times as many units and at a significantly lower cost (no warehousing, mfg, transit, buybacks, etc).
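
To put rough numbers on that (my own back-of-the-envelope math, using the prices above and assuming the tenfold market):

# back-of-the-envelope: fewer dollars per unit, far more units
dvd_buyers, dvd_price = 1000000, 39.99
digital_buyers, digital_price = 10000000, 13.99

print(dvd_buyers * dvd_price)          # ~40 million dollars
print(digital_buyers * digital_price)  # ~140 million dollars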

While hard copies are priced to cover the actual costs associated with manufacturing and distributing the media, digital media is flexibly priced to balance convenience with maximized revenue.

Typical retail patterns release a product at a given introductory price (e.g. $10) for promotional period, raise it to a sustained premium for an extended period of time (e.g. $17), then lower it via deep discounted promotions for holiday sales or clearance attempts (e.g. $5). Apple ignored the constant re-pricing and went for a standardized plan at simple price-points.

Apple doesn’t charge .99¢ for a song, or $1.99 for a video, because of some nefarious plan to undervalue media — they came up with those prices because those numbers can generate significant revenue while being an inconsequential purchase. At .99¢ a song or $9.99 an album, consumers simply don’t think. We’re talking about a dollar for a song, or a ten dollar bill for a record.

Let me rephrase that, we’re talking about a fucking dollar for a song. A dollar is a magical number, because while it’s money, it’s only a dollar. People lose dollar bills all the time, and rationalize the most ridiculous of purchases away… because it’s only a dollar. It’s four quarters. You could find that in the street or in your couch. A dollar is not a barrier or a thought. You’ll note that a dollar is not far off from the price of a candy bar, which retailers incidentally realized long ago that “Hey – let’s put candy bars next to the cash registers and keep the prices relatively low, so people make impulse buys and just add it onto their carts”.

Do you know what happens when you charge a dollar for something? People just buy it. At 13.99 – 17.99 for a cd, people look at that as a significant purchase — one that competes with food, vacations, their children’s college savings. When you charge a dollar a song – or ten dollars a record – people don’t make those comparisons… they just buy.

And buy, and buy, and buy. Before you know it, people end up buying more goods — spending more money overall on media than they would have under the old model. Call me crazy, but I’d rather sell 2 items with little incremental cost at $9.99 each than 1 item at $13.99 — or even 1 item at $17.99.

Unfortunately, the current stable of media executives – for the most part – just don’t get this. They think a bunch of lawyers, lobbyists and payoffs to politicians for sweetheart legislation are the best solution. Maybe that worked 50 years ago, but in this day and age of transparency and immediacy, it just doesn’t.

Today: you need to swallow your pride, realize that people are going to steal, that the ‘underground’ will always be ahead of you, and instead of wasting time + money + energy with short-term bandaids which try to remove piracy (and need to be replaced every 18 months) — you should invest your time and resources into making it easier and cheaper to legally consume content. Piracy of goods will always exist; it is an economic and human truth. You can fight it head-on, but why? There will always be more pirates to fight; they’re motivated to free content, and they’re doubly motivated to outsmart a system. Fighting piracy is like a Chinese finger trap.

Instead of spending millions of dollars chasing 100% market share that will never happen (and I can’t stress that enough, it will never happen), you could spend thousands of dollars addressing the least-likely pirates and earn 90% of the market share — in turn generating billions more in revenue each year.

Until decision makers swallow their pride and admit they simply don’t understand the economics behind a digital world, media companies are going to constantly and mindlessly waste money. Almost every ( if not EVERY ) attempt at Digital Rights Management by major media companies has been a catastrophe – with most just being a waste of money, while some have resulted in long term compliance costs. I can’t say this strongly enough: nearly the entire industry of Digital Rights Management is a complete failure and not worth addressing.

Today, the media industry is at another crossroads. Intellectual property rights holders are getting incredibly greedy, and trying to manipulate markets which they clearly don’t understand. In the past 12 hours I’ve learned how streaming rights to Whitney Houston movies were pulled from major digital services after her death to increase DVD sales [ I would have negotiated with digital companies for an incremental ‘fad’ premium, expecting the hysteria to die down before physical goods could be made ], and read a dead-on comic by The Oatmeal on how it has – once again – become easier to steal content than to legally purchase it [ http://theoatmeal.com/comics/game_of_thrones ].

As I write this (Feb 2012) it is faster to steal a high quality MP3 (or FLAC) of a record than it is to either: a) rip the physical CD to the digital version or b) download the item from iTunes (finding/buying is still under a minute). Regional release dates for music, movies and TV are unsynchronized (on purpose!), which results in the perverse scenario where people in different regions become incentivized to traffic content to one another — i.e. a paying subscriber of a premium network in Europe will illegally download an episode when it first airs on the affiliate in the United States, one month before the European date.

Digital economics aren’t rocket science, they’re drop-dead simple:

  1. If you make things fast and easy to legally purchase, people will purchase it.
  2. If you make things cheap enough, people will buy them – without question, concern, or weighing the purchase into their financial plans.
  3. If you make it hard or expensive for people to legally purchase something, they will turn to “the underground” and illegal sources.
  4. Piracy will always exist, innovators will always work to defy Digital Rights Management, and as much money as you throw at creating anti-piracy measures… there will always be a large population of brilliant people working to undermine them.

My advice is simple: pick your battles wisely. If you want to win in digital media, focus on the user experience and maximizing your revenue generating audience. If your content is good, people will either buy it or steal it – if your content is bad, they’re going somewhere else.

I’m glad to no longer be in corporate publishing. I’m glad to be back in a digital-only world, working with startups , advertising agencies, and media companies that are focused on building the future… not trying to save an ancient business model.

2016 Update

Re-reading this, I can’t help but draw the parallels to the explosion of Advertising and Ad Blocking technologies in recent years. Publishers have gotten so greedy trying to extract every last cent of Advertising revenue and including dozens of vendor/partner javascript tags, that they have driven even casual users to use Ad Blocking technologies.

Python Fun : Upgrading to 2.7 on OSX ; Installing Mysql-Python on OSX against MAMP ( ruby gem too)

I needed to upgrade from Python 2.6 to 2.7 and ran into a few issues along the way. Learn from my encounters below.

# Upgrading Python
Installing the new version of Python is really quick. Python.org publishes a .dmg installer that does just about everything for you. Let me repeat “Just about everything”.

You’ll need to do 2 things once you install the dmg, however the installer only mentions the first item below:

1. Run “/Applications/Python 2.x/Update Shell Profile.command”. This will update your .bash_profile to look in “/Library/Frameworks/Python.framework/Versions/2.x/bin” first.

2. After you run the app above, in a NEW terminal window do the following:

* Check to see that you’re running the new python with `python --version` or `which python`
* Once you’re running the new python, re-install anything that installed an executable in bin. THIS INCLUDES SETUPTOOLS, PIP, and VIRTUALENV

It’s that second thing that caught me. I make use of virtualenv a lot, and while I was building new virtualenvs for some projects I realized that my installs were building against `virtualenv` and `setuptools` from the stock Apple install in “/Library/Python/2.6/site-packages” , and not the new Python.org install in “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages”.

It’s worth noting that if you install setuptools once you’ve upgraded to the Python.org distribution, it just installs into the “/Library/Frameworks/Python.framework” directory — leaving the stock Apple version untouched (basically, you can roll back at any time).
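
A quick sanity check that you’re really on the new interpreter and its packages (this assumes you’ve already re-installed setuptools and virtualenv for it):

import sys
import setuptools
import virtualenv

print(sys.executable)        # should live under /Library/Frameworks/Python.framework/Versions/2.7/
print(setuptools.__file__)   # should NOT point into /Library/Python/2.6/site-packages
print(virtualenv.__file__)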

# Installing Mysql-Python (or the ruby gem) against MAMP

I try to stay away from MySQL as much as I can [ i <3 PostgreSQL ], but occasionally I need to run it: when I took over at TheDailyBeast.com, they were midway through a relaunch on Rails, and I have a few consulting clients who are on Django.

I tried to run cinderella a while back ( http://www.atmos.org/cinderella/ ) but ran into too many issues. Instead of going with MacPorts or Homebrew, I’ve opted to just use MAMP ( http://www.mamp.info/en/index.html ).

There’s a bit of a problem though - the people responsible for the MAMP distribution decided to clear out all the mysql header files, which you need in order to build the Python and Ruby modules. You have 2 basic options:

1. Download the “MAMP_components” zip (155MB), and extract the mysql source files. I often used to do this, but the Python module needed a compiled lib and I was lazy so...
2. Download the tar.gz version of MySQL compiled for OSX from http://dev.mysql.com/downloads/mysql/

Whichever option you choose, the next steps are generally the same:

## Copy The Files

### Where to copy the files?

mysql_config is your friend. At least the MAMP one is. Make sure you can call the right mysql_config, and it’ll tell you where the files you copy should be stashed. Since we’re building against MAMP we need to make sure we’re referencing MAMP’s mysql_config
iPod:~jvanasco$ which mysql_config
/Applications/MAMP/Library/bin/mysql_config

iPod:~jvanasco$ mysql_config
Usage: /Applications/MAMP/Library/bin/mysql_config [OPTIONS]
Options:
--cflags [-I/Applications/MAMP/Library/include/mysql -fno-omit-frame-pointer -D_P1003_1B_VISIBLE -DSIGNAL_WITH_VIO_CLOSE -DSIGNALS_DONT_BREAK_READ -DIGNORE_SIGHUP_SIGQUIT -DDONT_DECLARE_CXA_PURE_VIRTUAL]
--include [-I/Applications/MAMP/Library/include/mysql]
--libs [-L/Applications/MAMP/Library/lib/mysql -lmysqlclient -lz -lm]
--libs_r [-L/Applications/MAMP/Library/lib/mysql -lmysqlclient_r -lz -lm]
--plugindir [/Applications/MAMP/Library/lib/mysql/plugin]
--socket [/Applications/MAMP/tmp/mysql/mysql.sock]
--port [3306]
--version [5.1.44]
--libmysqld-libs [-L/Applications/MAMP/Library/lib/mysql -lmysqld -ldl -lz -lm]

### Include

Into /Applications/MAMP/Library/include you need to place the mysql include files, in a subdirectory called “mysql”:


mkdir -p /Applications/MAMP/Library/include
cp -Rp MySQL-Distribution/include /Applications/MAMP/Library/include/mysql

### Lib

Into /Applications/MAMP/Library/lib you need to place the mysql lib files


mkdir -p /Applications/MAMP/Library/lib
cp -Rp MySQL-Distribution/lib /Applications/MAMP/Library/lib/mysql

## Configure the Env / Installers

Note: if you’re installing for a virtualenv, this needs to be done after it’s been activated.

Set the archflags on the commandline:


export ARCHFLAGS="-arch $(uname -m)"

### Python Module

I found that the only way to install the module is by downloading the source (off SourceForge!).

I edited site.cfg to have this line:


mysql_config = /Applications/MAMP/Library/bin/mysql_config

Basically, you just need to tell the installer to use the MAMP version of mysql_config, and it figures everything else out itself.

the next steps are simply:


python setup.py build
python setup.py install

If you get any errors, pay close attention to the first few lines.

If you see something like the following within the first 10-30 lines, it means the various files we placed in the step above are not where the installer wants them to be:


_mysql.c:36:23: error: my_config.h: No such file or directory
_mysql.c:38:19: error: mysql.h: No such file or directory
_mysql.c:39:26: error: mysqld_error.h: No such file or directory
_mysql.c:40:20: error: errmsg.h: No such file or directory

If you look up a few lines, you might see something like this:


building '_mysql' extension
gcc-4.0 -fno-strict-aliasing -fno-common -dynamic -g -O2 -DNDEBUG -g -O3 -arch i386 -Dversion_info=(1,2,3,'final',0) -D__version__=1.2.3 -I/Applications/MAMP/Library/include/mysql -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c _mysql.c -o build/temp.macosx-10.5-i386-2.7/_mysql.o -fno-omit-frame-pointer -D_P1003_1B_VISIBLE -DSIGNAL_WITH_VIO_CLOSE -DSIGNALS_DONT_BREAK_READ -DIGNORE_SIGHUP_SIGQUIT -DDONT_DECLARE_CXA_PURE_VIRTUAL

Note how we see “/Applications/MAMP/Library/include/mysql” in there. When I quickly followed some instructions online that had all the files in /include — and not in that subdir — this error popped up. Once I changed the directory structure to match what my mysql_config wanted, the package installed perfectly.

### Ruby Gem

Assuming you’re using bundler:


bundle config build.mysql \
  --with-mysql-include=/Applications/MAMP/Library/include/mysql/ \
  --with-mysql-lib=/Applications/MAMP/Library/lib \
  --with-mysql-config=/Applications/MAMP/Library/bin/mysql_config

and then before you do a `bundle install`, set the env vars


> export ARCHFLAGS="-arch x86_64"

or


> export ARCHFLAGS="-arch $(uname -m)"

## Test it

If things install nicely, let’s make sure it works…


iPod:~jvanasco$ python
>>> import _mysql

Oh, crap:


Traceback (most recent call last):
File "", line 1, in
File "build/bdist.macosx-10.5-i386/egg/_mysql.py", line 7, in
File "build/bdist.macosx-10.5-i386/egg/_mysql.py", line 6, in __bootstrap__
ImportError: dlopen(/Users/jvanasco/.python-eggs/MySQL_python-1.2.3-py2.7-macosx-10.5-i386.egg-tmp/_mysql.so, 2): Library not loaded: libmysqlclient.18.dylib
Referenced from: /Users/jvanasco/.python-eggs/MySQL_python-1.2.3-py2.7-macosx-10.5-i386.egg-tmp/_mysql.so
Reason: image not found

Basically what’s happening is that as you run it, mysql_python drops a shared object in your userspace. That shared object references the location where the MySQL distribution placed all the library files — which differs from where we placed them in MAMP.

There’s a quick fix — add this to your bash profile, or run it before starting mysql/your app:

export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:/Applications/MAMP/Library/lib/mysql"
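
Once that’s set, a minimal smoke test (this assumes MAMP’s default root/root credentials and the socket path reported by mysql_config above; adjust for your setup):

import MySQLdb  # provided by the MySQL-python package we just built

conn = MySQLdb.connect(
    user="root",
    passwd="root",                                         # MAMP's default password
    unix_socket="/Applications/MAMP/tmp/mysql/mysql.sock"  # from `mysql_config --socket`
)
print(conn.get_server_info())
conn.close()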

# Conclusion

There are too many posts on this subject matter to thank. A lot of people posted variations of this method – to which I’m very grateful – however no one addressed troubleshooting the Python process, which is why I posted this.

I also can’t stress the simple fact that if the MAMP distribution contained the header and built library files, none of this would be necessary.