Small gotcha under Python’s PYTHONOPTIMIZE feature

A long, long, long time ago I started using interpreter optimizations to help organize code. If a block of code is within a constant (or expression of constants) that evaluate to False, then the block (or line) is optimized away.

if (false){ # this is optimized away }

Perl was very generous, and would allow for constants, true/false, and operations between the two to work.

Thankfully, Python has some optimizations available via the PYTHONOPTIMIZE environment variable (which can also be adjusted via the -O and -OO flags). They control the __debug__ constant, which is True (by default) or False (when optimizations are enabled), and will omit documentation strings if the level is 2 (-OO) or higher. Using these flags in a production environment allows for very verbose documentation and extra debugging logic in a development environment.
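A quick way to see what each level changes is to ask a child interpreter. This is a minimal sketch, assuming Python 3; the helper name `probe` and the `SNIPPET` string are made up for illustration.

```python
import os
import subprocess
import sys

# Code run in the child interpreter: report __debug__ and whether the
# function's docstring survived compilation.
SNIPPET = "def f():\n    'docstring'\nprint(__debug__, f.__doc__)\n"

def probe(*flags):
    """Run SNIPPET under the given interpreter flags and return its output."""
    env = dict(os.environ)
    env.pop("PYTHONOPTIMIZE", None)  # start from an unoptimized baseline
    cmd = [sys.executable] + list(flags) + ["-c", SNIPPET]
    return subprocess.check_output(cmd, env=env).decode().strip()

if __name__ == "__main__":
    print(probe())       # no optimization
    print(probe("-O"))   # __debug__ becomes False
    print(probe("-OO"))  # docstrings are dropped as well
```

Running the probes side by side makes it easy to confirm which level a deployment is actually using.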

Unfortunately, I implemented this incorrectly in some places. To fine-tune some development debugging routines I had some lines like this:

if __debug__ and DEBUG_CACHE:
    pass

Python’s interpreter doesn’t optimize that away if DEBUG_CACHE is equal to False, or even if it IS False (I tested under 2.7 and 3.5). I should have realized this (or at least tested for it). I didn’t notice this until I profiled my app and saw a bunch of additional statistics logging that should not have been compiled.

The correct way to write the above is:

if __debug__:
    if DEBUG_CACHE:
        pass
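The difference between the two forms can also be checked without leaving the interpreter. The sketch below assumes Python 3.4 or newer (for `dis.get_instructions` and the `optimize` argument of `compile`); `loads_constant`, `COMBINED` and `NESTED` are hypothetical names. What happens to the combined form under optimization may depend on the interpreter version, so the sketch only reports it.

```python
import dis

COMBINED = """
def f():
    if __debug__ and DEBUG_CACHE:
        print("debug message")
"""

NESTED = """
def f():
    if __debug__:
        if DEBUG_CACHE:
            print("debug message")
"""

def loads_constant(source, value, optimize):
    """Compile `source` and report whether any bytecode loads `value`."""
    pending = [compile(source, "<sketch>", "exec", optimize=optimize)]
    while pending:
        code = pending.pop()
        for ins in dis.get_instructions(code):
            if ins.opname == "LOAD_CONST" and ins.argval == value:
                return True
        # Function bodies are stored as nested code objects in co_consts.
        pending.extend(c for c in code.co_consts if hasattr(c, "co_code"))
    return False

if __name__ == "__main__":
    for label, src in (("combined", COMBINED), ("nested", NESTED)):
        for level in (0, 1):
            kept = loads_constant(src, "debug message", level)
            print(label, "optimize=%d" % level, "kept" if kept else "removed")
```

Passing `optimize=1` to `compile` corresponds to running with -O, so no environment variable is needed for the check.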

Here is a quick test.

Try running it with optimizations turned on:

export PYTHONOPTIMIZE=2
python test.py

and with them off:

unset PYTHONOPTIMIZE
python test.py

As far as the interpreter is concerned during the optimization phase, __debug__ (False) and False is not a constant it can discard, so the block is compiled as if the condition might be True.

import dis

def foo():
    """docstring"""
    if __debug__:
        print("debug message")
    return True

def bar():
    """docstring"""
    if __debug__ and False:
        print("debug message")
    return True

# ============

# this will be a lean function
dis.dis(foo)

print("- - - - - - - -")
# this will show extended logic
dis.dis(bar)

The default version of foo should look like this:

21           0 LOAD_CONST               1 ('debug message')
             3 PRINT_ITEM          
             4 PRINT_NEWLINE       

22           5 LOAD_GLOBAL              0 (True)
             8 RETURN_VALUE        

But you should see that the optimized version of foo creates some very lean code:

22           0 LOAD_GLOBAL              0 (True)
             3 RETURN_VALUE        

Now let’s look at the bar function when unoptimized

26           0 LOAD_GLOBAL              0 (__debug__)
             3 POP_JUMP_IF_FALSE       20
             6 LOAD_GLOBAL              1 (False)
             9 POP_JUMP_IF_FALSE       20

27          12 LOAD_CONST               1 ('debug message')
            15 PRINT_ITEM          
            16 PRINT_NEWLINE       
            17 JUMP_FORWARD             0 (to 20)

28     >>   20 LOAD_GLOBAL              2 (True)
            23 RETURN_VALUE        

The optimized version generates exactly the same code:

26           0 LOAD_GLOBAL              0 (__debug__)
             3 POP_JUMP_IF_FALSE       20
             6 LOAD_GLOBAL              1 (False)
             9 POP_JUMP_IF_FALSE       20

27          12 LOAD_CONST               1 ('debug message')
            15 PRINT_ITEM          
            16 PRINT_NEWLINE       
            17 JUMP_FORWARD             0 (to 20)

28     >>   20 LOAD_GLOBAL              2 (True)
            23 RETURN_VALUE

So let’s add a new function, biz

def biz():
    """docstring"""
    if __debug__:
        if False:
            print("debug message")
    return True

The unoptimized code:

33           0 LOAD_GLOBAL              0 (False)
             3 POP_JUMP_IF_FALSE       14

34           6 LOAD_CONST               1 ('debug message')
             9 PRINT_ITEM          
            10 PRINT_NEWLINE       
            11 JUMP_FORWARD             0 (to 14)

35     >>   14 LOAD_GLOBAL              1 (True)
            17 RETURN_VALUE    

And a lot of that gets optimized away:

35           0 LOAD_GLOBAL              0 (True)
             3 RETURN_VALUE        

Not many people use/abuse how the interpreter compiles code to their advantage, but if you do — pay attention to constructs like this.

Using __debug__ is a great way to hide logging code in a production environment. The evaluation of __debug__ only happens once, when Python first compiles the code, so there is very little overhead.

Optimized Archiving of Historical Time-Series Data for Analytics

The (forthcoming) Aptise platform features a web-indexer that tabulates a lot of statistical data each day for domains across the global internet.

While we don’t currently need this data, we may need it in the future. Keeping it in the “live” database is not really a good idea – it’s never used and just becomes a growing stack of general crap that automatically gets backed-up and archived every day as part of standard operations. It’s best to keep production lean and move this unnecessary data out-of-sight and out-of-mind.

An example of this type of data is a daily calculation on the relevance of each given topic to each domain that is tracked. We’re primarily concerned with conveying the current relevance, but in the future we will address historical trends.

Let’s look at the basic storage requirements of this as a PostgreSQL table:

| Column    | Format  | Size    |
| --------- | ------- | ------- |
| date      | Date    | 8 bytes |
| domain_id | Integer | 4 bytes |
| topic_id  | Integer | 4 bytes |
| count     | Integer | 4 bytes |

Each row is taking 20 bytes, 8 of which are due to date alone.

Storing this in PostgreSQL across thousands of domains and tags every day takes a significant chunk of storage space – and this is only one of many similar tables that primarily have archival data.

An easy way to save some space for archiving purposes is to segment the data by date, and move the storage of date information from a row and into the organizational structure.

If we were to keep everything in Postgres, we would create an explicit table for the date. For example:

CREATE TABLE report_table_a__20160101 (domain_id INT NOT NULL, topic_id INT NOT NULL, count INT NOT NULL DEFAULT 0);

| Column    | Format  | Size    |
| --------- | ------- | ------- |
| domain_id | Integer | 4 bytes |
| topic_id  | Integer | 4 bytes |
| count     | Integer | 4 bytes |

This structure conceptually stores the same data, but instead of repeating the date in every row, we record it only once within the table’s name. This simple shift will lead to a nearly 40% reduction in size.

In our use-case, we don’t want to keep this in PostgreSQL because the extra data complicates automated backups and storage. Even if we wanted this data live, having it within hundreds of tables is overkill. So for now, we’re going to export the data from a single date into a new file.

SELECT domain_id, topic_id, count FROM table_a WHERE date = '2016-01-01';

And we’re going to save that into a comma delimited file:

table_a-20160101.csv

I skipped a lot of steps above because I do this in Python — for reasons I’m about to explain.
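The export step might look roughly like the sketch below. It assumes Python 3 and that the rows have already been fetched (for example from a database cursor); `write_daily_csv` and its parameters are hypothetical names, not the code used for the numbers in this post.

```python
import csv
import os

def write_daily_csv(rows, table_name, day, directory="."):
    """Write (domain_id, topic_id, count) rows to `<table_name>-<YYYYMMDD>.csv`.

    `rows` is any iterable of 3-tuples; `day` is a datetime.date.
    Returns the path of the file that was written.
    """
    filename = "%s-%s.csv" % (table_name, day.strftime("%Y%m%d"))
    path = os.path.join(directory, filename)
    with open(path, "w", newline="") as handle:
        # The date lives in the filename, so it is not repeated per row.
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerows(rows)
    return path
```

Keeping the date in the filename mirrors the per-date table name used in the PostgreSQL variant.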

As a raw csv file, my date-specific table is still pretty large at 7556804 bytes — so let’s consider ways to compress it:

Using standard zip compression, we can drop that down to 2985257 bytes. That’s ok, but not very good. If we use xz compression, it drops to a slightly better 2362719.

We’ve already reduced the data to 60% of the original size by eliminating the date column, so these numbers are a pretty decent overall improvement – but considering the type of data we’re storing, the compression is just not very good. We’ve got to do better – and we can.

We can do much better and actually it’s pretty easy (!). All we need to do is understand compression algorithms a bit.

Generally speaking, compression algorithms look for repeating patterns. When we pulled the data out of PostgreSQL, we just had random numbers.

We can help the compression algorithm do its job by giving it better input. One way is through sorting:

SELECT domain_id, topic_id, count FROM table_a WHERE date = '2016-01-01' ORDER BY domain_id ASC, topic_id ASC, count ASC;

As a raw csv, this sorted date-specific table is still the same exact 7556804 bytes.

Look what happens when we try to compress it:

Using standard zip compression, we can drop that down to 1867502 bytes. That’s pretty good – we’re at 24.7% the size of the raw file AND it’s about 63% the size of the non-sorted zip. That is a huge difference! If we use xz compression, we drop down to 1280996 bytes. That’s even better at 17%.

17% compression is honestly great, and remember — this is compressing the data that is already 40% smaller because we shifted the date column out. If we consider what the filesize with the date column would be, we’re actually at 10% compression. Wonderful.
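The effect of sorting can be reproduced on made-up data. This sketch generates synthetic rows (the id ranges and row count are arbitrary assumptions), then compares compressed sizes using `zlib` (the deflate algorithm that zip typically uses) and `lzma` (the algorithm behind xz). The exact byte counts will differ from the figures above.

```python
import lzma
import random
import zlib

def make_rows(n, seed=0):
    """Generate `n` synthetic (domain_id, topic_id, count) rows in random order."""
    rng = random.Random(seed)
    return [
        (rng.randrange(1000, 1100), rng.randrange(1000, 1020), rng.randrange(1, 50))
        for _ in range(n)
    ]

def to_csv(rows):
    """Serialize rows as comma-delimited lines of bytes."""
    return "\n".join(",".join(str(v) for v in row) for row in rows).encode("ascii")

def compressed_sizes(rows):
    """Return (deflate_size, xz_size) for the CSV form of `rows`."""
    raw = to_csv(rows)
    return len(zlib.compress(raw, 9)), len(lzma.compress(raw))

if __name__ == "__main__":
    rows = make_rows(20000)
    print("raw bytes:", len(to_csv(rows)))
    print("unsorted: ", compressed_sizes(rows))
    print("sorted:   ", compressed_sizes(sorted(rows)))
```

Sorting the tuples in Python orders them by domain_id, then topic_id, then count, which matches the ORDER BY clause used in the query.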

I’m pretty happy with these numbers, but we can still do better — and without much more work.

As I said above, compression software looks for patterns. Although the sorting helped, we’re still a bit hindered because our data is in a “row” storage format. Consider this example:

1000,1000,1
1000,1000,2
1000,1000,3
1001,1000,1
1001,1000,2
1001,1000,3

There are lots of repeating patterns there, but not as many as if we represented the same information in a “column” storage format:

1000,1000,1000,1001,1001,1001
1000,1000,1000,1000,1000,1000
1,2,3,1,2,3

This is the same data, but as you can see it’s much more “machine” friendly – there are larger repeating patterns.

This transformation from row to column is an example of “transposing an array of data”; performing it (and reversing it) is incredibly easy with Python’s standard functions.

Let’s see what happens when I use transpose_to_csv below on the data from my csv file

def transpose_to_csv(listed_data):
    """given an array of `listed_data` will turn a row-store into a col-store (or vice versa)
    reverse with `transpose_from_csv`"""
    # zip(*rows) regroups the values so each output line holds one column
    zipped = zip(*listed_data)
    lines = [','.join(str(i) for i in zippedline) for zippedline in zipped]
    return '\n'.join(lines)

def transpose_from_csv(string_data):
    """given a string of csvdata, will revert the output of `transpose_to_csv`
    note: values come back as strings, not integers"""
    destringed = [line.split(',') for line in string_data.split('\n')]
    # materialize the result so it is a list on Python 3 as well as Python 2
    return list(zip(*destringed))

As a raw csv, my transposed file is still the exact same size at 7556804 bytes.

However, if I zip the file – it drops down to 1425585 bytes.

And if I use xz compression… I’m now down to 804960 bytes.
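Put together, the sort, transpose and xz steps could be sketched as follows. This assumes Python 3, integer values, and at least one row; the function names are hypothetical and the standard-library `lzma` module stands in for the xz command-line tool.

```python
import lzma

def rows_to_column_csv(rows):
    """Transpose row tuples and serialize one column per line."""
    columns = zip(*rows)
    return "\n".join(",".join(str(v) for v in col) for col in columns)

def column_csv_to_rows(text):
    """Reverse `rows_to_column_csv`; values come back as integers."""
    columns = [[int(v) for v in line.split(",")] for line in text.split("\n")]
    return list(zip(*columns))

def archive(rows):
    """Sort, transpose, and xz-compress the rows; returns bytes."""
    return lzma.compress(rows_to_column_csv(sorted(rows)).encode("ascii"))

def restore(blob):
    """Decompress an archive and return the rows in sorted order."""
    return column_csv_to_rows(lzma.decompress(blob).decode("ascii"))
```

The round trip returns the rows in sorted order rather than their original order, which is fine for an archive where row order carries no meaning.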

This is a HUGE savings without much work.

The raw data in Postgres was probably about 12594673 bytes (an estimate based on the savings; the original file was deleted).
Stripping out the date information and storing it in the filename generated a 7556804 byte csv file – a 40% savings.
Without thinking about compression, just lazily “zipping” the file created a file of 2985257 bytes.
But when we thought about compression (we sorted the input, transposed that data into a column store, and applied xz compression) we ended up with a file size of 804960 bytes – 10.7% of the csv size and 6.4% of the estimated size in PostgreSQL.

This considerably smaller amount of data can now be archived onto something like Amazon’s Glacier and worried about at a later date.

This may seem like a trivial amount of data to worry about, but keep in mind that this is a DAILY snapshot, and one of several tables. At 12MB a day in PostgreSQL, one year of data takes over 4GB of space on a system that is treated for high-priority data backups. This strategy turns that year of snapshots into under 300MB of information that can be warehoused on 3rd party systems. This saves us a lot of time and even more money! In our situation, this strategy is applied to multiple tables. Most importantly, the benefits cascade across our entire system as we free up space and resources.
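The yearly figures follow from simple multiplication. A small sketch, assuming 365 daily snapshots of one table and decimal megabytes:

```python
DAYS_PER_YEAR = 365
POSTGRES_BYTES_PER_DAY = 12594673  # estimated size with the date column
ARCHIVE_BYTES_PER_DAY = 804960     # sorted, transposed, xz-compressed

def yearly_megabytes(bytes_per_day, days=DAYS_PER_YEAR):
    """Return a year of daily snapshots in decimal megabytes."""
    return bytes_per_day * days / 1e6

if __name__ == "__main__":
    print("PostgreSQL: %.0f MB" % yearly_megabytes(POSTGRES_BYTES_PER_DAY))
    print("Archive:    %.0f MB" % yearly_megabytes(ARCHIVE_BYTES_PER_DAY))
```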

These numbers could be improved upon further by finding an optimal sort order, or even using custom compression algorithms (such as storing the deltas between columns, then compressing). This was written to illustrate a quick way to easily optimize archived data storage.
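As one possible reading of the delta idea, the sketch below delta-encodes the values within a single sorted column before compression. This is an illustration only, not something measured for this post; `delta_encode` and `delta_decode` are hypothetical helpers.

```python
def delta_encode(values):
    """Replace each value with its difference from the previous one."""
    previous = 0
    deltas = []
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas

def delta_decode(deltas):
    """Reverse `delta_encode` by accumulating the differences."""
    total = 0
    values = []
    for delta in deltas:
        total += delta
        values.append(total)
    return values
```

On a sorted id column most deltas are 0 or small, which gives the compressor even longer runs to work with.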

The results:

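| Format   | Compression | Sorted? | Size (bytes) | % csv+date |
| -------- | ----------- | ------- | ------------ | ---------- |
| csv+date |             |         | 12594673     | 100        |
| row csv  |             |         | 7556804      | 60         |
| row csv  | zip         |         | 2985257      | 23.7       |
| row csv  | xz          |         | 2362719      | 18.8       |
| row csv  |             | Yes     | 7556804      | 60         |
| row csv  | zip         | Yes     | 1867502      | 14.8       |
| row csv  | xz          | Yes     | 1280996      | 10.2       |
| col csv  |             | Yes     | 7556804      | 60         |
| col csv  | zip         | Yes     | 1425585      | 11.3       |
| col csv  | xz          | Yes     | 804960       | 6.4        |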
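The percentage column can be recomputed from the sizes. A short sketch, using the estimated csv+date size as the baseline (the dictionary keys are just labels chosen for this example):

```python
BASELINE = 12594673  # estimated csv+date size in bytes

SIZES = {
    "row csv": 7556804,
    "row csv, zip": 2985257,
    "row csv, xz": 2362719,
    "row csv, zip, sorted": 1867502,
    "row csv, xz, sorted": 1280996,
    "col csv, zip, sorted": 1425585,
    "col csv, xz, sorted": 804960,
}

def percent_of_baseline(size, baseline=BASELINE):
    """Return `size` as a percentage of `baseline`, rounded to one decimal."""
    return round(100.0 * size / baseline, 1)

if __name__ == "__main__":
    for label, size in SIZES.items():
        print("%-22s %5.1f" % (label, percent_of_baseline(size)))
```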

Note: I purposefully wrote out the ASC on the sql sorting above, because the sort order (and column order) does actually factor into the compression ratio. On my dataset, this particular column and sort order worked the best — but that could change based on the underlying data.

trouble installing psycopg2 on OSX ?

I had some trouble with a new virtualenv — psycopg2 wouldn’t install.

I remembered going through this before, but couldn’t find my notes. I ended up fixing it before finding them (they point to this StackOverflow question: http://stackoverflow.com/questions/2088569/how-do-i-force-python-to-be-32-bit-on-snow-leopard-and-other-32-bit-64-bit-quest ), but I want to share this with others.

psycopg2 was showing compilation errors in relation to some PostgreSQL libraries

> ld: warning: in /Library/PostgreSQL/8.4.5/lib/libpq.dylib, missing required architecture x86_64 in file

So then I checked how the file was built:

$ file /Library/PostgreSQL/8.4.5/lib/

> /Library/PostgreSQL/8.4.5/lib/libpq.dylib: Mach-O universal binary with 2 architectures
> /Library/PostgreSQL/8.4.5/lib/libpq.dylib (for architecture ppc): Mach-O dynamically linked shared library
> /Library/PostgreSQL/8.4.5/lib/libpq.dylib (for architecture i386): Mach-O dynamically linked shared library i386

Crap. It’s built for ppc and i386 only, with no x86_64. The fix is easy, right? We just need to export ARCHFLAGS and build.

$ export ARCHFLAGS="-arch i386"
$ pip install --upgrade psycopg2

That works perfect, right?

Wrong.

> File "/environments/example-2.7.5/lib/python2.7/site-packages/psycopg2/__init__.py", line 50, in <module>
> from psycopg2._psycopg import BINARY, NUMBER, STRING, DATETIME, ROWID
> ImportError: dlopen(/environments/example-2.7.5/lib/python2.7/site-packages/psycopg2/_psycopg.so, 2): no suitable image found. Did find:
> /environments/example-2.7.5/lib/python2.7/site-packages/psycopg2/_psycopg.so: mach-o, but wrong architecture

I was dumbfounded for a few seconds, then I realized — Python was trying to run 64 bit (x86_64) , but I only have the 32 bit library.
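A quick way to confirm which mode the interpreter is running in is to ask it directly. A minimal sketch using only the standard library (`interpreter_bits` is a made-up helper name):

```python
import platform
import struct
import sys

def interpreter_bits():
    """Return 32 or 64 depending on the running interpreter's pointer size."""
    return struct.calcsize("P") * 8

if __name__ == "__main__":
    print("pointer size:", interpreter_bits(), "bits")
    print("machine:", platform.machine())
    print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
```

Run it with the virtualenv’s python; if it reports 64 bits, extensions linked against a 32-bit-only libpq will fail to load as shown above.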

the right fix? Rebuild PostgreSQL to support 64bit.

My PostgreSQL was a prebuilt package, and I don’t have time to fix that, so I need to do a few hacky/janky things

basically , we’re going to force python to run in i386 ( and not 64bit )

go to our virtualenv…

cd /environments/example-2.7.5/bin

back it up

cp python python-original

strip it…

# note that our last arg is the input, and the 2nd to last is the output
lipo -thin i386 -output python-i386 python

replace it

rm python
mv python-i386 python

now install psycopg2

export ARCHFLAGS="-arch i386"
pip install --upgrade psycopg2

yay this works !

now get some work done and save some time so you can build a 64bit PostgreSQL

Dreamhost UX Creates Security Flaw

Last week I found a security flaw on Dreamhost caused by the User Experience on their control panel. I couldn’t find a security email, so I posted a message on Twitter. Their Customer Support team reached out and assured me that I would receive an email response. Six days later I’ve heard nothing from them, so I feel forced to do a public disclosure.

I was hoping that they would do the responsible thing, and immediately fix this issue.

## The issue:

If you create a Subversion repository, there is a checkbox option to add on a “Trac” interface – which is a really great feature, as it can be a pain to set up on their servers yourself (something I’ve usually done in the past).

The exact details of how the “one-click” Trac install works aren’t noted though, and the integration doesn’t “work as you would probably expect” from the User Experience path.

If you had previous experience with Trac, and you were to create a “Private” SVN repository on Dreamhost – one that limits access to a set of username/passwords – you would probably assume that access to the Trac instance is handled by the same credentials as the SVN instance, as Trac is tightly integrated into Subversion.

If you had no experience with Trac, you would probably be oblivious to the fact that Trac has its own permissions system, and assume your repository is secured from the option above.

The “one click” Trac install from Dreamhost is entirely unsecured – the immediate result of checking the box to enable Trac on a “private” repository, is that you inherently are publicly publishing that repo from within the Trac browser.

For example, if you were to install a private subversion and one-click Trac install onto a domain like this:

my.domain.com/svn
my.domain.com/trac

The /svn source would be private; however, it would be publicly available under /trac/browser due to the default one-click install settings.

Here’s a marked-up screenshot of the page that shows the conflicting options ( also on http://screencast.com/t/A2VQT5gOVkK )

I totally understand how the team at Dreamhost that implemented the Trac installer would think their approach was a good idea, because in a way it is. A lot of people who are familiar with Trac want to fine-tune the privileges using Trac’s own very-robust permissions system, deciding who can see the source / file tickets / etc. The problem is that there is absolutely no mention of an alternate permissions system contained within Trac – or that someone may need to fine-tune the Trac permissions. People unfamiliar with Trac have NO IDEA that their code is being made public, and those familiar with Trac would not necessarily realize that a fully unsecured setup is being created. I’ve been using Trac for over 8 years , and the thought of the default integrations being setup like this is downright silly – it’s the last thing I would expect a host to do.

I think it would be totally fine if there is just a “Warning!” sign next to the “enable Trac” — with a link to Trac’s wiki for customization , or instructions ( maybe even a checkbox option ) on how a user can have Trac use the same authorization file as subversion.

But, and this is a huge BUT, people need to be warned that clicking the ‘enable Trac’ button will publish code until Trac is configured. People who are running Trac via an auto-install need to be alerted of this immediately.

This can be a huge security issue depending on what people store in Subversion. Code put in Subversion repositories tends to contain Third Party Account Credentials ( Amazon AWS Secrets/Keys, Facebook Connect Secrets, Paypal/CreditCard Providers, etc ), SSH Keys for automated code deployment, full database connection information, administrator/account default passwords — not to mention the exact algorithms used for user account passwords.

## The fix

If you have a one-click install of Trac tied to Subversion on Dreamhost and you did not manually set up permissions, you need to do the following IMMEDIATELY:

### Secure your Trac installation

If you want to use Trac’s own privileges, you should create this .htaccess file in the meantime to disable all access to the /trac directory

deny from all

Alternately, you can map access to your Trac install to the Subversion password file with a .htaccess like this:

AuthType Basic
AuthUserFile /home/##SHELL_ACCOUNT_USER##/svn/##PROJECT_NAME##.passwd
AuthName "##PROJECT_NAME##"
require valid-user
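After applying either .htaccess, it is worth confirming that an anonymous request is actually refused. The sketch below assumes Python 3; `classify_status` and `check_url` are hypothetical helpers, and the URL is the example address from this post, to be replaced with your own.

```python
import urllib.error
import urllib.request

def classify_status(status):
    """Map an HTTP status code to a coarse access label."""
    if status in (401, 403):
        return "protected"
    if 200 <= status < 300:
        return "public"
    return "unknown"

def check_url(url, timeout=10):
    """Request `url` without credentials and classify the response.

    Connection failures (urllib.error.URLError) are left to propagate.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as error:
        return classify_status(error.code)

if __name__ == "__main__":
    # Example address from this post; substitute your own Trac location.
    print(check_url("http://my.domain.com/trac/browser"))
```

A result of "public" for the /trac/browser path means the source is still readable without a password.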

### Audit your affected code and services.

* All Third Party Credentials should be immediately trashed and regenerated.
* All SSH Keys should be regenerated
* All Database Accounts should be reset.
* If you don’t have a secure password system in place, you need to upgrade

## What are the odds of me being affected ?

Someone would need to figure out where your trac/svn repos are to exploit this. Unless you’ve got some great obscurity going on, it’s pretty easy to guess. Many people still like to deploy using files served out of Subversion (it was popular with developers 5 years ago, before build/deploy tools became the standard). If that’s the case and Apache/Nginx aren’t configured to reject .svn directories, your repo information is public.

When it comes to security, play it safe. If your repo was accidentally public for a minute, you should wipe all your credentials.

Want to win? Make it easier, not harder.

In March of 2011 I represented Newsweek & The Daily Beast at the Harvard Business School / Committee of Concerned Journalists “Digital Leaders Summit”. Just about every major media property sent an executive there, and I was privileged enough to represent the newly formed NewsBeast (Newsweek+TheDailyBeast had recently merged, but have since split).

Over the course of two days, we covered a lot of concerns across the industry – analyzing who was doing things right and how/why others were making mistakes.

On the first day of the summit we looked at how Amazon was positioning itself for digital book sales – where they hoped their profits would be, where their losses were expected, and strategies for finding the optimal price structure for digital goods.

Inevitably, the conversation sidetracked to the Apple Ecosystem, which had just announced Subscriptions and their eBooks plan – making Apple Amazon’s new competitor.

One of the other 30 or so people in attendance was Jeffrey Zucker from NBC, who went into his then-famous “digital pennies vs. analog dollars” diatribe. He made a compelling, intelligent, and honest argument that captivated the minds and attention of the entire room. Well, most of the room.

I vehemently disagreed with all his points and quickly spoke up to grab the attention of the floor… “apologizing” for breaking with the conventional view of this subject, and asking people to look at the situation from another point of view. Yes, it was true as Zucker stated that Apple standardized prices for digital downloads and set the pricing on their terms – not the producer’s. Yes, it was true that Apple allowed for records to be purchased “in part” and not as a whole – shifting purchase patterns, and yes to a lot of other things.

And yes – Jeffrey Zucker didn’t say anything that was “wrong” – everything he said was right. But it was analyzed from the wrong perspective. Simply put, Zucker and most of the other delegates were only looking at a portion of the scenario and the various mechanics at play. The prevailing wisdom in the room was way off the mark… by miles.

Apple didn’t gain dominance in online music because of their pricing system or undercutting retailers – which everyone believed. Plain and simple, Apple took control of the market because they made it fundamentally easier and faster for someone to legally buy music than to steal it. When they first launched (and still in 2012) it takes under a minute for someone to find and buy an Album or Single in the iTunes store. Let me stress that – discovery, purchase and delivery takes under a minute. Apple’s servers were relatively fast at the start as well – an entire album could be downloaded within an hour.

In contrast, to legally purchase an album in the store would take at least two hours – and at the time they first launched, encoding an album to work on an MP3 player would take another hour. To download a record at that time would be even longer: services like Napster (already dead by the iTunes launch) could take a day to download; torrent systems could take a day; while file upload sites were generally faster, they suffered from another issue that torrents and other options did as well – mislabeled and misdirected files.

Possibly the only smart thing the Media Industry has ever done to curb piracy is what I call the “I Am Spartacus” method — wherein “crap” files are mislabeled to look like Top 40 hits. For example: in expectation of a new Jay-Z record, internet filesharing sites are flooded with uploads that bear the name of the record… but contain white noise, another record, or an endless barrage of insults (ok, maybe not the last one… but they should).

I pretty much shut the room up at that point, and began a diatribe of my own – which I’ll repeat and continue here…

At the conference, Jeffrey Zucker and some other media executives tended to look at the digital economy like this: If there are 10 million Apple downloads of the new Beyonce record or the 2nd Season of “Friends”, those represent 10 million diverted sales of a $17.99 CD – or 10MM diverted sales of a $39.99 DVD. If Apple were to sell the CD for $9.99 with a 70% cut, they’re only seeing $7 in revenue for every $17.99 – 10 million times. Similarly, if 10MM people are watching Friends for $13.99 (or whatever cost) on AppleTV instead of buying $29.99 box sets, that’s about $20 lost per viewer – 10 million times.

To this point, I called bullshit.

Digital goods such as music and movies have incredibly diminished costs for incremental units, and for most of these products they are a secondary market — records tend to recoup their various costs within the first few months, and movies/tv-shows tend to have been wildly profitable on-TV / in-Theaters. The music recording costs 17.99 and the DVD 29.99 , not because of fixed costs and a value chain… but because $2 of plastic, or .02¢ of bandwidth, is believed by someone to be able to command that price.

Going back to our real-life example, 10MM downloads of “Friends” for 13.99 doesn’t equate to 10MM people who would have purchased the DVD for $39.99. While a percentage of the 10MM may have been willing to purchase the DVDs for the higher price, another — larger — percentage would not have. By lowering the price from 39.99 to 13.99, the potential market had likely changed from 1MM consumers to 10MM. Our situation is not an “apples-to-apples” comparison — while we’re generating one third the revenue, we’re moving ten times as many units and at a significantly lower cost (no warehousing, mfg, transit, buybacks, etc).

While hard copies are priced to cover the actual costs associated with manufacturing and distributing the media, digital media is flexibly priced to balance convenience with maximized revenue.

Typical retail patterns release a product at a given introductory price (e.g. $10) for promotional period, raise it to a sustained premium for an extended period of time (e.g. $17), then lower it via deep discounted promotions for holiday sales or clearance attempts (e.g. $5). Apple ignored the constant re-pricing and went for a standardized plan at simple price-points.

Apple doesn’t charge 99¢ for a song, or $1.99 for a video because of some nefarious plan to undervalue media – they came up with those prices because those numbers can generate significant revenue while being an inconsequential purchase. At 99¢ a song or $9.99 an album, consumers simply don’t think. We’re talking about a dollar for a song, or a ten dollar bill for a record.

Let me rephrase that, we’re talking about a fucking dollar for a song. A dollar is a magical number, because while it’s money, it’s only a dollar. People lose dollar bills all the time, and rationalize the most ridiculous of purchases away… because it’s only a dollar. It’s four quarters. You could find that in the street or in your couch. A dollar is not a barrier or a thought. You’ll note that a dollar is not far off from the price of a candy bar; retailers realized long ago, “Hey – let’s put candy bars next to the cash registers and keep the prices relatively low, so people make impulse buys and just add them onto their carts”.

Do you know what happens when you charge a dollar for something? People just buy it. At 13.99 – 17.99 for a cd, people look at that as a significant purchase — one that competes with food, vacations, their children’s college savings. When you charge a dollar a song – or ten dollars a record – people don’t make those comparisons… they just buy.

And buy, and buy, and buy. Before you know it, people end up buying more goods — spending more money overall on media than they would have under the old model. Call me crazy, but I’d rather sell 2 items with little incremental cost at $9.99 each than 1 item at $13.99 — or even 1 item at $17.99.

Unfortunately, the current stable of media executives – for the most part – just don’t get this. They think a bunch of lawyers, lobbyists and paying off politicians for sweetheart legislations are the best solution. Maybe that worked 50 years ago, but in this day and age of transparency and immediacy, it just doesn’t.

Today: you need to swallow your pride, realize that people are going to steal, that the ‘underground’ will always be ahead of you, and instead of wasting time + money + energy with short-term bandaids which try to remove piracy (and need to be replaced every 18 months), you should invest your time and resources into making it easier and cheaper to legally consume content. Piracy of goods will always exist, it is an economic and human truth. You can fight it head-on, but why? There will always be more pirates to fight; they’re motivated to free content, and they’re doubly motivated to outsmart a system. Fighting piracy is like a Chinese finger trap.

Instead of spending millions of dollars chasing 100% market share that will never happen (and I can’t stress that enough, it will never happen), you could spend thousands of dollars addressing the least-likely pirates and earn 90% of the market share — in turn generating billions more in revenue each year.

Until decision makers swallow their pride and admit they simply don’t understand the economics behind a digital world, media companies are going to constantly and mindlessly waste money. Almost every ( if not EVERY ) attempt at Digital Rights Management by major media companies has been a catastrophe – with most just being a waste of money, while some have resulted in long term compliance costs. I can’t say this strongly enough: nearly the entire industry of Digital Rights Management is a complete failure and not worth addressing.

Today, the media industry is at another crossroads. Intellectual property rights holders are getting incredibly greedy, and trying to manipulate markets which they clearly don’t understand. In the past 12 hours I’ve learned how streaming rights to Whitney Houston movies were pulled from major digital services after her death to increase DVD sales [ I would have negotiated with digital companies for an incremental ‘fad’ premium, expecting the hysteria to die down before physical goods could be made ], and read a dead-on comic by The Oatmeal on how it has – once again – become easier to steal content than to legally purchase it [ http://theoatmeal.com/comics/game_of_thrones ].

As I write this (Feb 2012) it is faster to steal a high quality MP3 (or FLAC) of a record than it is to either: a) rip the physical CD to the digital version or b) download the item from iTunes (finding/buying is still under a minute). Regional release dates for music, movies and TV are unsynchronized (on purpose!), which ends up in the perverse scenario where people in different regions become incentivized to traffic content to one another – i.e. a paying subscriber of a premium network in Europe would illegally download an episode when it first airs on the affiliate in the United States, one month before the European date.

Digital economics aren’t rocket science, they’re drop-dead simple:

  1. If you make things fast and easy to legally purchase, people will purchase it.
  2. If you make things cheap enough, people will buy them – without question , concern, or weighing the purchase into their financial plans.
  3. If you make it hard or expensive for people to legally purchase something, they will turn to “the underground” and illegal sources.
  4. Piracy will always exist, innovators will always work to defy Digital Rights Management, and as much money as you throw at creating anti-piracy measures… there will always be a large population of brilliant people working to undermine them.

My advice is simple: pick your battles wisely. If you want to win in digital media, focus on the user experience and maximizing your revenue generating audience. If your content is good, people will either buy it or steal it – if your content is bad, they’re going somewhere else.

I’m glad to no longer be in corporate publishing. I’m glad to be back in a digital-only world, working with startups , advertising agencies, and media companies that are focused on building the future… not trying to save an ancient business model.

2016 Update

Re-reading this, I can’t help but draw the parallels to the explosion of Advertising and Ad Blocking technologies in recent years. Publishers have gotten so greedy trying to extract every last cent of Advertising revenue and including dozens of vendor/partner javascript tags, that they have driven even casual users to use Ad Blocking technologies.

Python Fun : Upgrading to 2.7 on OSX ; Installing Mysql-Python on OSX against MAMP ( ruby gem too)

I needed to upgrade from Python 2.6 to 2.7 and ran into a few issues along the way. Learn from my encounters below.

# Upgrading Python
Installing the new version of Python is really quick. Python.org publishes a .dmg installer that does just about everything for you. Let me repeat “Just about everything”.

You’ll need to do 2 things once you install the dmg, however the installer only mentions the first item below:

1. Run “/Applications/Python 2.x/Update Shell Command”. This will update your .bash_profile to look in “/Library/Frameworks/Python.framework/Versions/2.x/bin” first.

2. After you run the app above, in a NEW terminal window do the following:

* Check to see you’re running the new python with `python --version` or `which python`
* Once you’re running the new python, re-install anything that installed an executable in bin. THIS INCLUDES SETUPTOOLS, PIP, and VIRTUALENV

It’s that second thing that caught me. I make use of virtualenv a lot, and while I was building new virtualenvs for some projects I realized that my installs were building against `virtualenv` and `setuptools` from the stock Apple install in “/Library/Python/2.6/site-packages” , and not the new Python.org install in “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages”.

It’s worth noting that if you install setuptools once you’ve upgraded to the Python.org distribution, it just installs into the “/Library/Frameworks/Python.framework” directory — leaving the stock Apple version untouched (basically, you can roll back at any time).
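To see which install a given interpreter is actually pulling `setuptools`, `pip` and `virtualenv` from, a small script can report it. This is only a sketch: the helper names `describe_environment` and `format_report` are made up for this post, and it only reports what the interpreter running it can import.

```python
import importlib
import sys


def describe_environment(module_names=("setuptools", "pip", "virtualenv")):
    """Report the running interpreter and the file each module is imported from.

    A module that this interpreter cannot import is reported as None.
    """
    report = {"executable": sys.executable, "prefix": sys.prefix, "modules": {}}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report["modules"][name] = None
        else:
            report["modules"][name] = getattr(module, "__file__", None)
    return report


def format_report(report):
    """Render the report as printable lines."""
    lines = [
        "interpreter: %s" % report["executable"],
        "prefix: %s" % report["prefix"],
    ]
    for name in sorted(report["modules"]):
        location = report["modules"][name] or "not importable"
        lines.append("%s -> %s" % (name, location))
    return "\n".join(lines)
```

Running `print(format_report(describe_environment()))` in a new terminal should show paths under “/Library/Frameworks/Python.framework” if the re-install worked, and paths under “/Library/Python/2.6” if you are still picking up the stock Apple packages.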

# Installing Mysql-Python (or the ruby gem) against MAMP

I try to stay away from Mysql as much as I can [ i <3 PostgreSQL ], but occasionally I need to run it: when I took over at TheDailyBeast.com, they were midway through a relaunch on Rails, and I have a few consulting clients who are on Django.

I tried to run cinderella a while back ( http://www.atmos.org/cinderella/ ) but ran into too many issues. Instead of going with MacPorts or Homebrew, I've opted to just use MAMP ( http://www.mamp.info/en/index.html ).

There's a bit of a problem though - the persons who are responsible for the MAMP distribution decided to clear out all the mysql header files, which you need in order to build the Python and Ruby modules.

You have 2 basic options:

1. Download the "MAMP_components" zip (155MB), and extract the mysql source files. I often used to do this, but the Python module needed a compiled lib and I was lazy so...
2. Download the tar.gz version of Mysql compiled for OSX from http://dev.mysql.com/downloads/mysql/

Whichever option you choose, the next steps are generally the same:

## Copy The Files

### Where to copy the files ?

mysql_config is your friend. At least the MAMP one is. Make sure you can call the right mysql_config, and it'll tell you where the files you copy should be stashed.

Since we're building against MAMP we need to make sure we're referencing MAMP's mysql_config:

iPod:~jvanasco$ which mysql_config
/Applications/MAMP/Library/bin/mysql_config

iPod:~jvanasco$ mysql_config
Usage: /Applications/MAMP/Library/bin/mysql_config [OPTIONS]
Options:
--cflags [-I/Applications/MAMP/Library/include/mysql -fno-omit-frame-pointer -D_P1003_1B_VISIBLE -DSIGNAL_WITH_VIO_CLOSE -DSIGNALS_DONT_BREAK_READ -DIGNORE_SIGHUP_SIGQUIT -DDONT_DECLARE_CXA_PURE_VIRTUAL]
--include [-I/Applications/MAMP/Library/include/mysql]
--libs [-L/Applications/MAMP/Library/lib/mysql -lmysqlclient -lz -lm]
--libs_r [-L/Applications/MAMP/Library/lib/mysql -lmysqlclient_r -lz -lm]
--plugindir [/Applications/MAMP/Library/lib/mysql/plugin]
--socket [/Applications/MAMP/tmp/mysql/mysql.sock]
--port [3306]
--version [5.1.44]
--libmysqld-libs [-L/Applications/MAMP/Library/lib/mysql -lmysqld -ldl -lz -lm]

### Include

Into /Applications/MAMP/Library/include you need to place the mysql include files into a subdirectory called “mysql”


mkdir -p /Applications/MAMP/Library/include
cp -Rp MySQL-Distribution/include /Applications/MAMP/Library/include/mysql

### Lib

Into /Applications/MAMP/Library/lib you need to place the mysql lib files


mkdir -p /Applications/MAMP/Library/lib
cp -Rp MySQL-Distribution/lib /Applications/MAMP/Library/lib/mysql

## Configure the Env / Installers

Note: if you’re installing for a virtualenv, this needs to be done after it’s been activated.

Set the archflags on the commandline:


export ARCHFLAGS="-arch $(uname -m)"

### Python Module

I found that the only way to install the module is by downloading the source ( off sourceforge! ).

I edited site.cfg to have this line:


mysql_config = /Applications/MAMP/Library/bin/mysql_config

Basically, you just need to tell the installer to use the MAMP version of mysql_config, and it will figure everything else out itself.

The next steps are simply:


python setup.py build
python setup.py install

If you get any errors, pay close attention to the first few lines.

If you see something like the following within the first 10-30 lines, it means the various files we placed in the step above are not where the installer wants them to be:


_mysql.c:36:23: error: my_config.h: No such file or directory
_mysql.c:38:19: error: mysql.h: No such file or directory
_mysql.c:39:26: error: mysqld_error.h: No such file or directory
_mysql.c:40:20: error: errmsg.h: No such file or directory

If you look up a few lines, you might see something like this:


building '_mysql' extension
gcc-4.0 -fno-strict-aliasing -fno-common -dynamic -g -O2 -DNDEBUG -g -O3 -arch i386 -Dversion_info=(1,2,3,'final',0) -D__version__=1.2.3 -I/Applications/MAMP/Library/include/mysql -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c _mysql.c -o build/temp.macosx-10.5-i386-2.7/_mysql.o -fno-omit-frame-pointer -D_P1003_1B_VISIBLE -DSIGNAL_WITH_VIO_CLOSE -DSIGNALS_DONT_BREAK_READ -DIGNORE_SIGHUP_SIGQUIT -DDONT_DECLARE_CXA_PURE_VIRTUAL

Note how we see “/Applications/MAMP/Library/include/mysql” in there. When I quickly followed some instructions online that had all the files in /include — and not in that subdir — this error popped up. Once I changed the directory structure to match what my mysql_config wanted, the package installed perfectly.
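If you'd rather check the directory layout before kicking off a build, something like the following sketch can do it. It assumes `mysql_config --include` prints `-I` flags as shown earlier; the function names and the `REQUIRED_HEADERS` tuple are my own, with the headers taken straight from the compiler errors above.

```python
import os
import shlex
import subprocess

# Headers named in the compiler errors above.
REQUIRED_HEADERS = ("my_config.h", "mysql.h", "mysqld_error.h", "errmsg.h")


def include_dirs(flags):
    """Extract the -I directories from a string of compiler flags."""
    return [
        token[2:]
        for token in shlex.split(flags)
        if token.startswith("-I") and len(token) > 2
    ]


def missing_headers(directories, headers=REQUIRED_HEADERS):
    """Return the headers that are not present in any of the directories."""
    return [
        header
        for header in headers
        if not any(os.path.isfile(os.path.join(d, header)) for d in directories)
    ]


def check_mysql_config(mysql_config="/Applications/MAMP/Library/bin/mysql_config"):
    """Ask mysql_config for its include flags and list the headers it cannot find."""
    flags = subprocess.check_output([mysql_config, "--include"]).decode()
    return missing_headers(include_dirs(flags))
```

An empty list from `check_mysql_config()` means the headers are where mysql_config says they should be; anything else is the list of files the compiler is going to complain about.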

### Ruby Gem

Assuming you’re using bundler:


bundle config build.mysql \
  --with-mysql-include=/Applications/MAMP/Library/include/mysql/ \
  --with-mysql-lib=/Applications/MAMP/Library/lib \
  --with-mysql-config=/Applications/MAMP/Library/bin/mysql_config

and then before you do a `bundle install` , set the env vars


> export ARCHFLAGS="-arch x86_64"

or


> export ARCHFLAGS="-arch $(uname -m)"

## Test it

If things install nicely, let’s make sure it works…


ipod:~ jvanasco$ python
>>> import _mysql

Oh , crap:


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.macosx-10.5-i386/egg/_mysql.py", line 7, in <module>
  File "build/bdist.macosx-10.5-i386/egg/_mysql.py", line 6, in __bootstrap__
ImportError: dlopen(/Users/jvanasco/.python-eggs/MySQL_python-1.2.3-py2.7-macosx-10.5-i386.egg-tmp/_mysql.so, 2): Library not loaded: libmysqlclient.18.dylib
  Referenced from: /Users/jvanasco/.python-eggs/MySQL_python-1.2.3-py2.7-macosx-10.5-i386.egg-tmp/_mysql.so
  Reason: image not found

Basically what’s happening is that as you run it, mysql_python drops a shared object in your userspace. That shared object is referencing the location that the Mysql.org distribution placed all the library files — which differs from where we placed them in MAMP.

There’s a quick fix — add this to your bash profile, or run it before starting mysql/your app:

export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:/Applications/MAMP/Library/lib/mysql"
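To confirm the variable is doing what you expect, you can look for the library along the search path yourself. This is a rough sketch: `find_library` and `check_dyld_path` are names I made up, and the default filename is simply the one from the ImportError above. It only checks that the file exists in one of the listed directories, not that the loader will accept it.

```python
import os


def find_library(filename, search_path):
    """Return the first directory in a colon-separated search path that holds filename."""
    for directory in search_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None


def check_dyld_path(filename="libmysqlclient.18.dylib"):
    """Look for the library named in the ImportError along DYLD_LIBRARY_PATH."""
    return find_library(filename, os.environ.get("DYLD_LIBRARY_PATH", ""))
```

If `check_dyld_path()` returns `None`, the export either didn't take effect in the current shell or points at a directory that doesn't contain the dylib.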

# Conclusion

There are too many posts on this subject matter to thank. A lot of people posted variations of this method – for which I’m very grateful – however no one addressed troubleshooting the Python process, which is why I posted this.

I also can’t stress enough the simple fact that if the MAMP distribution contained the header and built library files, none of this would be necessary.

Facebook Developer Notes – Javascript SDK and Asynchronous Woes

I’m quickly prototyping something that needs to interact with Facebook’s API and got absolutely lost by all their documentation – which is plentiful, but poorly curated.

I lost a full day of time trying to figure out why my code wasn’t doing what I wanted it to do, trying to understand how it works so I could figure out what I was actually telling it to do. I eventually hit the “ah ha!” moment where I realized that by following the Facebook “getting started” guides, I was telling my code to do embarrassingly stupid things. This all tends to dance around the execution order, which isn’t really documented at all. Everything below should have been very obvious — and would have been obvious, had I not gone through the “getting started” guides, which really just throw you off track.

Here’s a collection of quick notes that I’ve made.

## Documentation Organization

Facebook has made *a lot* of API changes over the past few years, and all the information is still up on their site… and out on the web. While they’re (thankfully) still supporting deprecated features, their documentation doesn’t always say which method is preferred – and the countless 3rd party tutorials and StackOverflow activity don’t either. The “Getting Started” documentation and on-site + github code samples also don’t tie together well with the API documentation. If you go through the tutorials and demos, you’ll see multiple ways to handle a login/registration button… yet none seem to resemble what is going on in the API. There’s simply no uniformity, consistency, or ‘official recommendations’.

I made the mistake of going through their demos and trying to “learn” their API. That did more damage than good. Just jump into the [Javascript SDK API Reference Documentation](https://developers.facebook.com/docs/reference/javascript/) itself. After 20 minutes reading the API docs themselves, I realized what was happening under the hood… and got everything I needed to do working perfectly within minutes.

## Execution Order

The Javascript SDK operates in the following manner:

1. Define what happens on window.fbAsyncInit – the function the SDK will call once Facebook’s javascript code is fully loaded. This requires, at the very least, calling the FB.init() routine. FB.init() registers your app against the API and allows you to actually do things.
2. Load the SDK. This is the few lines of code that start “(function(d){ var js, id = ‘facebook-jssdk’;…” .
3. Once loaded, the SDK will call “window.fbAsyncInit”
4. window.fbAsyncInit will call FB.init() , enabling the API for you.

The important things to learn from this are :

1. If you write any code that touches the FB namespace _before_ the SDK is fully loaded (Step 3), you’ll get an error.
1. If you write any code that touches the FB namespace _before_ FB.init() is called (Step 4), you’ll get an error.
1. You should assume that the entire FB namespace is off-limits until window.fbAsyncInit is executed.
1. You should probably not touch anything in the FB namespace until you call FB.init().

This means that just about everything you want to do either needs to be:

1. defined or run after FB.init()
2. defined or run with some sort of callback mechanism, after FB.init()

That’s not hard to do, once you actually know that’s what you have to do.

## Coding Style / Tips

The standard way the Facebook API is ‘instructed’ to be integrated is to drop in a few lines of script. The problem is that the how&why this works isn’t documented well, and is not linked to properly on their site. Unless you’re trying to do exactly what the tutorials are for – or wanting to code specific Facebook API code on every page – you’ll probably get lost trying to get things to run in the order that you want.

Below I’ll mark up the Facebook SDK code and offer some of ideas on how to get coding faster than I did… I wasted a lot of time going through the Facebook docs, reading StackOverflow and reverse engineering a bunch of sites that had good UX integrations with Facebook to figure this out.

// before loading the Facebook SDK, load some utility functions that you will write

One of the more annoying things I encountered is that Facebook has that little, forgettable line in their examples that reads:

// Additional initialization code here

You might have missed that line, or not understood its meaning. It’s very easy to do, as it’s quite forgettable.

That line could really be written better as :

// Additional initialization code here
// NEARLY EVERYTHING YOU WRITE AGAINST THE FACEBOOK API NEEDS TO BE INITIALIZED / DEFINED / RUN HERE.
// YOU EITHER NEED TO INCLUDE YOUR CODE IN HERE, OR SET IT TO RUN AFTER THIS BLOCK HAS EXECUTED ( VIA CALLBACKS, STACKS, ETC ).
// (sorry for yelling, but you get the point)

So, let’s explore some ways to make this happen…

In the code above I called fb_Utils.initialize() , which would have been defined in /js/fb_Utils.js (or any other file) as something like this:

// grab a console for quick logging
var console = window['console'];


// i originally ran into a bunch of issues where a function would have been called before the Facebook API inits.
// the two ideas i had were to either:
// 1) pass calls through a function that would ensure we already initialized, or use a callback to retry on intervals
// 2) pass calls through a function that would ensure we already initialized, or pop calls into an array to try after initialization
// seems like both those ideas are popular, with dozens of variations on each used across popular sites on the web
// i'll showcase some of them below

var fb_Utils= {
	_initialized : false
	,
	isInitialized: function() {
		return this._initialized;
	}
	,
	// wrap all our facebook init stuff within a function that runs post async, but is cached across the site
	initialize : function(){
		// if you wanted to , you could migrate into this section the following codeblock from your site template:
		// -- FB.init({
		// --    appId : 'app_id'
		// --    ...
		// -- });
		// i looked at a handful of sites, and people are split between calling the facebook init here, or on their templates
		// personally i'm calling it from my templates for now, but only because i have the entire section driven by variables


		// mark that we've run through the initialization routine
		this._initialized= true;

		// if we have anything to run after initialization, do it.
		while ( this._runOnInit.length ) { (this._runOnInit.pop())(); }
	}
	,
	// i checked StackOverflow to see if anyone had tried a SetTimeout based callback before, and yes they did.
	// link - http://facebook.stackoverflow.com/questions/3548493/how-to-detect-when-facebooks-fb-init-is-complete
	// this works like a charm
	// just wrap your facebook API commands in a fb_Utils.ensureInit(function_here) , and they'll run once we've initialized
	ensureInit :  function(callback) {
		if(!fb_Utils._initialized) {
			setTimeout(function() {fb_Utils.ensureInit(callback);}, 50);
		} else {
			if(callback) { callback(); }
		}
	}
	,
	// our other option is to create an array of functions to run on init
	_runOnInit: []
	,
	// we can then wrap items in fb_Utils.runOnInit(function_here) , and they'll run during initialize() (or immediately, if we've already initialized)
	runOnInit: function(f) {
		if(this._initialized) {
			f();
		} else {
			this._runOnInit.push(f);
		}
	},
	// a few of the Facebook demos use a function like this to illustrate the api
	// here, we'll just wrap the FB.getLoginStatus call , along with our standard routines, into fb_Utils.handleLoginStatus()
	// the benefit/point of this, is that you have this routine nicely compartmentalized, and can call it quickly across your site
	handleLoginStatus : function(){
			FB.getLoginStatus(
				function(response){
					console.log('FB.getLoginStatus');
					console.log(response);
					if (response.authResponse) {
						console.log('-authenticated');
					} else {
						console.log('-not authenticated');
					}
				}
			);
		}
	,
	// this is a silly debug tool , which we'll use below in an example
	event_listener_tests : function(){
		FB.Event.subscribe('auth.login', function(response){
		  console.log('auth.login');
		  console.log(response);
		});
		FB.Event.subscribe('auth.logout', function(response){
			  console.log('auth.logout');
			  console.log(response);
		});
		FB.Event.subscribe('auth.authResponseChange', function(response){
			  console.log('auth.authResponseChange');
			  console.log(response);
		});
		FB.Event.subscribe('auth.statusChange', function(response){
			  console.log('auth.statusChange');
			  console.log(response);
		});
	}
}

So, with some fb_Utils code like the above, you might do the following to have all your code nicely asynchronous:

1. Within the body of your html templates, you can call functions using ensureInit()

fb_Utils.ensureInit(fb_Utils.handleLoginStatus)
fb_Utils.ensureInit(function(){alert("I'm ensured, but not insured, to run sometime after initialization occurred.");})

2. When you activate the SDK – probably in the document ‘head’ – you can decree which commands to run after initialization:

window.fbAsyncInit = function() {
	// just for fun, imagine that FB.init() is located within the fb_Utils.initialize() function
	FB.init({});
	fb_Utils.runOnInit(fb_Utils.handleLoginStatus)
	fb_Utils.runOnInit(function(){alert("When the feeling is right, i'm gonna run all night. I'm going to run to you.");})
	fb_Utils.initialize();
};

## Concluding Thoughts

I’m not sure if I prefer the timeout based “ensureInit” or the stack based “runOnInit” concept more. Honestly, I don’t care. There’s probably a better method out there, but these both work well enough.

In terms of what kind of code should go into the fb_Utils and what should go in your site templates – that’s entirely a function of your site’s traffic patterns — and your decision of whether-or-not a routine is something that should be minimized for the initial page load or tossed onto every visitor.

OMG! Apple is trying to patent someone's app! [ no they're not ]

A tumblr posting just popped up on my radar about Apple trying to patent an app that is identical to one by the company Where-To [Original Posting Here]

The author shows an image comparing a line drawing in Apple’s patent to a screenshot of an application called “Where-To”. The images are indeed strikingly similar.

The author then opens:
> It’s pretty easy to argue that software patents are bad for the software industry.

Well yes, it is pretty easy to argue that. It’s also pretty easy to argue that Software Patents are really good for the software industry. See, you can cherry-pick edge cases for both arguments and prove either point. You can make an easy argument out of anything, because it’s easier to do that and argue on black&white philosophical beliefs than it is to think about complex systems.

That’s a huge problem with bloggers though– they don’t like to think. They just like to react.

The author continues:

>Regardless of where you stand on that issue, however, it must at least give you pause when Apple, who not only exercises final approval over what may be sold on the world’s largest mobile software distribution platform, but also has exclusive pre-publication access (by way of that approval process) to every app sold or attempted to be sold there, quietly starts patenting app ideas.

> But even if you’re fine with that, how about this: one of the diagrams in Apple’s patent application for a travel app is a direct copy, down to the text and the positions of the icons, of an existing third-party app that’s been available on the App Store for years.

Believe it or not this happens ALL THE TIME. It’s not uncommon to see major technology companies have images from their biggest competitors in their patent diagrams. Patent diagrams are meant to illustrate concepts, and if someone does something very clear — then you copy it. So you might see a Yahoo patent application that shows advertising areas that read “Ads by Google” ( check out the “interestingness” application Flickr filed a few years ago ), or you might have an Apple patent application that shows one very-well-done user interface by another company being used as an example to convey an idea. This isn’t “stealing” ( though I wonder how someone can argue both against and for intellectual property in the same breath ) – it’s just conveying a concept. Conveying a concept or an interface in a patent doesn’t mean that you’re patenting it, it just means you’re using it to explain a larger concept.

The blogger failed to mention a few really key facts:

1. This was 1 image out of 10 images.
2. Other screenshots included a sudoku game, an instant message, a remote control for an airline seat’s console, a barcoded boarding pass, and a bunch of other random things.
3. The Patent Application is titled “Systems And Methods For Accessing Travel Services Using A Portable Electronic Device” — it teaches about integrating travel services through a mobile device. Stuff like automating check-in, boarding, inflight services and ground options for when you land. The Where-To app shows interesting things based on geo-location.

You don’t need to read the legalese claims to understand the two apps are entirely unrelated — you could just read the title, the abstract, or the layman’s description. If someone did that, they might learn this was shown as an interface to navigate airport services:

> In some embodiments, a user can view available airport services through the integrated application. As used herein, the term “airport services” can refer to any airport amenities and services such as shops, restaurants, ATM’s, lounges, shoe-shiners, information desks, and any other suitable airport services. Accordingly, through the integrated application, airport services can be searched for, browsed, viewed, and otherwise listed or presented to the user. For example, an interface such as interface 602 can be provided on a user’s electronic device. Through interface 602, a user can search for and view information on the various airport services available in the airport.

Apple’s patent has *nothing* to do with the design or functionality of the Where-To app. They’re not trying to patent someone else’s invention, nor are they trying to patent a variation of the invention or any portion of the app. They just made a wireframe of a user interface that they liked (actually, it was probably their lawyer or draftsman) to illustrate an example screen.

One of 2 things happened:

1. The blogger didn’t bother reading the patent, and just rushed to make conclusions of his own.
2. The blogger read the patent, but didn’t care — because there was something in there that could be controversial.

Whichever it was doesn’t matter — both illustrate my underlying point that 99% of people who are talking about software patents should STFU because they’re unable or unwilling to address complex concepts. Whenever patent issues come up, the outspoken masses have knee-jerk reactions based on ideology (on all sides of the issue), and fail to actually read or investigate an issue.

There was even a comment where someone noted:

> Filing date is December 2009….which means Apple’s priority date is December 2008. From what I can see, this app went on sale in mid 2009….going to be hard to argue it is prior art.

They didn’t bother reading the application either. On the *very first line* , we see:

> [0001]This application claims the benefit of U.S. Provisional Patent Application No. 61/147,644, filed on Jan. 27, 2009, which is hereby incorporated by reference herein in its entirety.

How the commenter decided that *December 2008* was a priority date bewilders me. The actual priority date is written in that very first line! They also brought up the concept of ‘Priority’ – which is interesting because that suggests they understand how the USPTO works a bit. “Priority” lets an applicant use an earlier date as their official filing date under certain conditions — either a provisional application is turned into a non-provisional application, or a non-provisional application is split into multiple applications. In both of these cases no new material can be submitted to the USPTO after the ‘priority date’ – it’s just a convenient way to let inventors file information about their invention quickly, and have a little more time to get the legal format in full compliance. A provisional application does have 1 year to be turned into a non-provisional application — but there’s no backwards clock to claim priority based on your filing date.

I’ve been growing extremely unsatisfied with Apple over the past few years, and I’d love to see them get ‘checked’ by the masses over an issue. Unfortunately, there is simply no issue here.

*Update: The brilliant folks at TechCrunch have just stoked the fire on this matter too, citing the original posting and then improperly jumping to their own conclusions. They must be really desperate for traffic today. Full Article Here*

Don't Get Too Excited About the LOC's Copyright Decisions

Today the Library of Congress announced new laws ( perhaps more accurately interpretations of existing laws, as their rulings created ‘exemptions’ to the DMCA laws ) designed to strengthen the concepts of “Fair Use” as it applies to the corpus of U.S. Copyright law.

The LOC’s decisions are both shocking and enlightening — few expected such an interpretation could ever be possible given the extensive amount of lobbying special interests spend before lawmakers.

Honestly, while I’d agree that their decisions are “correct” and within the spirit of the law, I’m completely fucking floored they had the balls to do the right thing. This is – undoubtedly – a HUGE day in U.S. law.

In a nutshell, the Library of Congress said that it does not violate the DMCA or Copyright Law to circumvent digital protections — that is to say that one is free to descramble a DVD for legal use, jailbreak a digital device (ie: iphone), or circumvent a hardware dongle for legally obtained software. For years people have said that a common-sense and fair interpretation of the law should allow for these things — but industry lobbies used highly paid lawyers with bizarre reasonings and countless campaign donations to influence the development of laws to suit their interests.

While I’m very excited about this win for democracy and fairness, I’m not entirely sure that the decisions are anything to be excited about in terms of ‘resolution’ to these issues.

While the Library of Congress has clarified the law to allow for these types of uses as *not* a violation of Copyright, they have not ensured (nor are they probably able to ensure) that these rights cannot be given up through contract law.

For decades, lawyers have relied upon contract law to make up for deficiencies in copyright law – creating new protections for their clients by sidestepping any arguments around copyright. For example, while it would not be a violation of US Copyright Law (under the new interpretations) for a user to modify Apple’s software, there could exist a contractual clause — like an End User License Agreement [EULA] or Terms of Service [TOS] between a consumer and Apple or their cellphone carrier to make modification of the device prohibited. Apple could then sue customers based not on Copyright, but on Contract Law.

If you don’t think these types of contracts would come into play, look at the full text of TOS and EULA of software that you buy… or websites that you use like Facebook or MySpace. You might note numerous passages that talk about who can access the servers and under what conditions — large media companies like these routinely use Contract law to chip away at access to fair-use content. Expecting industries to become more relaxed at this practice, while they lose certain copyright protections they believed they had, is nothing short of ridiculous.

It would have been truly remarkable if Congress were to ensure that people have irrevocable rights to circumvent copy protections and modify devices — rights that can not be given up or outlawed within any contract. Sadly we don’t have that yet. However, this decision also means that the numerous lawsuits that the media lobby might bring up in these areas would not be in federal courts and handled by federal investigative agencies — but that they would be in civil courts with the plaintiffs responsible for their entire bill. I’ll drink to that!