The Dangers of URL Shorteners

In preparation for the next release, I just did an audit of all the errors that Aptise/Cliquedin encountered while indexing content.

Out of a few million URLs, there were 35k “well formed” URLs that couldn’t be resolved due to “critical errors”.

Most of these 35k errors are due to URL shorteners. A small percentage of them are from shorteners that have dead/broken links. A much larger percentage of them are from shorteners that just do not seem to perform well at all.

I hate saying bad things about any company, but speaking as a former publisher — I feel the need to talk bluntly about this type of stuff. After pulling out some initial findings, I ran more tests on the bad URLs from multiple unrelated IP addresses to make sure I wasn’t being singled out for “suspicious” activity. Unfortunately, the behavior was consistent.

The worst offenders are:

* wp.me from WordPress
* ctx.ly from Adobe Social Marketing
* trib.al from SocialFlow when used with custom domains from Bitly

The SocialFlow+Bitly system was overrepresented because of the sheer number of clients and URLs they handle — and I understand that may skew things — but they have made some interesting architecture decisions that seem to be reliably bad for older content. While I would strongly recommend that people NOT use any of these URL shortening services, I more strongly recommend that people do not use SocialFlow’s system with a custom domain through Bitly. I really hate saying this, but the performance is beyond terrible — it’s totally unacceptable.

The basic issue across the board is this: these systems perform very well in redirecting people to newer content (they most likely have the mapping of newer unmasked URLs in a cache), but they all start to stall when dealing with older content that probably needs a database query. I’m not talking about waiting a second or two for a redirect to happen — I’m consistently experiencing wait times from these systems that are reliably over 20 seconds long, and often longer. When using a desktop/mobile web browser, the requests reliably time out and the browser just gives up.

While wp.me and ctx.ly simply lag as they look up the destination behind a shortened URL, the SocialFlow + Bitly combo has a design element that works differently:

1. Request a custom shortened URL `http://m.example.com/12345`
2. The shortened URL is actually served by Bitly, and you wait x seconds for the lookup + redirect
3. You are redirected to a different short URL on trib.al (SocialFlow): `http://trib.al/abcde`
4. You wait x seconds for a second lookup + redirect
5. You are redirected to the end URL `http://example.com/VERY+LONG+URL`
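
A rough way to see this chain for yourself is to walk the redirects hop by hop and time each lookup. A minimal sketch (Node 18+, using the example URLs from the list above):

// rough sketch: follow a short URL hop by hop and time each redirect
// (Node 18+ built-in fetch; the URLs are just the examples above)
async function traceRedirects(url, maxHops = 10) {
	for (let hop = 0; hop < maxHops; hop++) {
		const started = Date.now();
		const response = await fetch(url, { redirect: 'manual' });
		const elapsed = (Date.now() - started) / 1000;
		console.log(`${elapsed.toFixed(1)}s  ${response.status}  ${url}`);
		const next = response.headers.get('location');
		if (!next) { break; }                  // no more redirects; we've hit the end URL
		url = new URL(next, url).toString();   // resolve relative Location headers
	}
}

traceRedirects('http://m.example.com/12345');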

All of these shorteners could do a better job of speeding up access to older and “less popular” content. But if I were SocialFlow, I would remove Bitly from the equation and just offer custom domains myself. Their service isn’t cheap, and from the viewpoint of a publisher, wait times like this are totally unacceptable.

Facebook Developer Notes – Javascript SDK and Asynchronous Woes

I’m quickly prototyping something that needs to interact with Facebook’s API and got absolutely lost in all their documentation – which is plentiful, but poorly curated.

I lost a full day trying to figure out why my code wasn’t doing what I wanted it to do, and trying to understand how the SDK works so I could figure out what I was actually telling it to do. I eventually hit the “ah ha!” moment where I realized that by following the Facebook “getting started” guides, I was telling my code to do embarrassingly stupid things. This all comes down to the execution order, which isn’t really documented at all. Everything below should have been very obvious — and would have been obvious, had I not gone through the “getting started” guides, which really just throw you off track.

Here’s a collection of quick notes that I’ve made.

## Documentation Organization

Facebook has made *a lot* of API changes over the past few years, and all the old information is still up on their site… and out on the web. While they’re (thankfully) still supporting deprecated features, their documentation doesn’t always say which is the preferred method – and the countless 3rd party tutorials and StackOverflow activity don’t either. The “Getting Started” documentation and the on-site + github code samples don’t tie into the API documentation well either. If you go through the tutorials and demos, you’ll see multiple ways to handle a login/registration button… yet none seem to resemble what is going on in the API. There’s simply no uniformity, consistency, or ‘official recommendation’.

I made the mistake of going through their demos and trying to “learn” their API. That did more damage than good. Just jump into the [Javascript SDK API Reference Documentation](https://developers.facebook.com/docs/reference/javascript/) itself. After 20 minutes reading the API docs themselves, I realized what was happening under the hood… and got everything I needed to do working perfectly within minutes.

## Execution Order

The Javascript SDK operates in the following manner:

1. Define what happens on window.fbAsyncInit – the function the SDK will call once Facebook’s javascript code is fully loaded. This requires, at the very least, calling the FB.init() routine. FB.init() registers your app against the API and allows you to actually do things.
2. Load the SDK. These are the few lines of code that start with “(function(d){ var js, id = ‘facebook-jssdk’;…”.
3. Once loaded, the SDK will call “window.fbAsyncInit”.
4. window.fbAsyncInit will call FB.init(), enabling the API for you.

The important things to learn from this are:

1. If you write any code that touches the FB namespace _before_ the SDK is fully loaded (Step 3), you’ll get an error.
2. If you write any code that touches the FB namespace _before_ FB.init() is called (Step 4), you’ll get an error.
3. You should assume that the entire FB namespace is off-limits until window.fbAsyncInit is executed.
4. You should probably not touch anything in the FB namespace until you call FB.init().

This means that just about everything you want to do either needs to be:

1. defined or run after FB.init()
2. defined or run with some sort of callback mechanism, after FB.init()

That’s not hard to do, once you actually know that’s what you have to do.

## Coding Style / Tips

The standard way the Facebook API is ‘integrated’ is to drop in a few lines of script. The problem is that the how & why of this isn’t documented well, and is not linked to properly on their site. Unless you’re trying to do exactly what the tutorials are for – or want to write Facebook-specific API code on every page – you’ll probably get lost trying to get things to run in the order that you want.

Below I’ll mark up the Facebook SDK code and offer some ideas on how to get coding faster than I did… I wasted a lot of time going through the Facebook docs, reading StackOverflow, and reverse engineering a bunch of sites that had good UX integrations with Facebook to figure this out.

// before loading the Facebook SDK, load some utility functions that you will write
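// (roughly, the rest of this markup follows the standard snippet; the app id and file paths are placeholders)
// e.g. <script src="/js/fb_Utils.js"></script>

// Step 1: define what the SDK should do once it has loaded
window.fbAsyncInit = function() {
	FB.init({
		appId  : 'YOUR_APP_ID',   // placeholder
		status : true,
		cookie : true,
		xfbml  : true
	});
	// Additional initialization code here
	fb_Utils.initialize();
};

// Step 2: load the SDK asynchronously
(function(d){
	var js, id = 'facebook-jssdk', ref = d.getElementsByTagName('script')[0];
	if (d.getElementById(id)) { return; }
	js = d.createElement('script'); js.id = id; js.async = true;
	js.src = "//connect.facebook.net/en_US/all.js";
	ref.parentNode.insertBefore(js, ref);
}(document));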

One of the most annoying things I encountered is that Facebook has that little, forgettable line in their examples that reads:

// Additional initialization code here

You might have missed that line, or not understood its meaning. That’s very easy to do, as it’s quite forgettable.

That line could really be written better as:

// Additional initialization code here
// NEARLY EVERYTHING YOU WRITE AGAINST THE FACEBOOK API NEEDS TO BE INITIALIZED / DEFINED / RUN HERE.
// YOU EITHER NEED TO INCLUDE YOUR CODE IN HERE, OR SET IT TO RUN AFTER THIS BLOCK HAS EXECUTED ( VIA CALLBACKS, STACKS, ETC ).
// (sorry for yelling, but you get the point)

So, let’s explore some ways to make this happen…

In the code above I called fb_Utils.initialize(), which would have been defined in /js/fb_Utils.js (or any other file) as something like this:

// grab a console for quick logging
var console = window['console'];


// i originally ran into a bunch of issues where a function would have been called before the Facebook API inits.
// the two ideas i had were to either:
// 1) pass calls through a function that would ensure we already initialized, or use a callback to retry on intervals
// 2) pass calls through a function that would ensure we already initialized, or pop calls into an array to try after initialization
// seems like both those ideas are popular, with dozens of variations on each used across popular sites on the web
// i'll showcase some of them below

var fb_Utils= {
	_initialized : false
	,
	isInitialized: function() {
		return this._initialized;
	}
	,
	// wrap all our facebook init stuff within a function that runs post async, but is cached across the site
	initialize : function(){
		// if you wanted to , you could migrate into this section the following codeblock from your site template:
		// -- FB.init({
		// --    appId : 'app_id'
		// --    ...
		// -- });
		// i looked at a handful of sites, and people are split between calling the facebook init here, or on their templates
		// personally i'm calling it from my templates for now, but only because i have the entire section driven by variables


		// mark that we've run through the initialization routine
		this._initialized= true;

		// if we have anything to run after initialization, do it.
		while ( this._runOnInit.length ) { (this._runOnInit.pop())(); }
	}
	,
	// i checked StackOverflow to see if anyone had tried a SetTimeout based callback before, and yes they did.
	// link - http://facebook.stackoverflow.com/questions/3548493/how-to-detect-when-facebooks-fb-init-is-complete
	// this works like a charm
	// just wrap your facebook API commands in a fb_Utils.ensureInit(function_here), and they'll run once we've initialized
	ensureInit :  function(callback) {
		if(!fb_Utils._initialized) {
			setTimeout(function() {fb_Utils.ensureInit(callback);}, 50);
		} else {
			if(callback) { callback(); }
		}
	}
	,
	// our other option is to create an array of functions to run on init
	_runOnInit: []
	,
	// we can then wrap items in fb_Utils.runOnInit(function_here), and they'll run on init (or immediately, if we've already initialized)
	runOnInit: function(f) {
		if(this._initialized) {
			f();
		} else {
			this._runOnInit.push(f);
		}
	},
	// a few of the Facebook demos use a function like this to illustrate the api
	// here, we'll just wrap the FB.getLoginStatus call , along with our standard routines, into fb_Utils.handleLoginStatus()
	// the benefit/point of this, is that you have this routine nicely compartmentalized, and can call it quickly across your site
	handleLoginStatus : function(){
			FB.getLoginStatus(
				function(response){
					console.log('FB.getLoginStatus');
					console.log(response);
					if (response.authResponse) {
						console.log('-authenticated');
					} else {
						console.log('-not authenticated');
					}
				}
			);
		}
	,
	// this is a silly debug tool , which we'll use below in an example
	event_listener_tests : function(){
		FB.Event.subscribe('auth.login', function(response){
		  console.log('auth.login');
		  console.log(response);
		});
		FB.Event.subscribe('auth.logout', function(response){
			  console.log('auth.logout');
			  console.log(response);
		});
		FB.Event.subscribe('auth.authResponseChange', function(response){
			  console.log('auth.authResponseChange');
			  console.log(response);
		});
		FB.Event.subscribe('auth.statusChange', function(response){
			  console.log('auth.statusChange');
			  console.log(response);
		});
	}
};

So, with some fb_Utils code like the above, you might do the following to have all your code nicely asynchronous:

1. Within the body of your html templates, you can call functions using ensureInit()

fb_Utils.ensureInit(fb_Utils.handleLoginStatus)
fb_Utils.ensureInit(function(){alert("I'm ensured, but not insured, to run sometime after initialization occurred.");})

2. When you activate the SDK – probably in the document ‘head’ – you can decree which commands to run after initialization:

window.fbAsyncInit = function() {
	// just for fun, imagine that FB.init() is located within the fb_Utils.initialize() function
	FB.init({});
	fb_Utils.runOnInit(fb_Utils.handleLoginStatus)
	fb_Utils.runOnInit(function(){alert("When the feeling is right, i'm gonna run all night. I'm going to run to you.");})
	fb_Utils.initialize();
};

## Concluding Thoughts

I’m not sure if I prefer the timeout based “ensureInit” or the stack based “runOnInit” concept more. Honestly, I don’t care. There’s probably a better method out there, but these both work well enough.

In terms of what kind of code should go into fb_Utils and what should go in your site templates – that’s entirely a function of your site’s traffic patterns, and your decision of whether or not a routine is something that should be minimized for the initial page load or tossed onto every visitor.

Everyone's talking about the need for a privacy oriented Open Source solution for an open social graph

And a lot of people are asking me “Weren’t you doing that four years ago?”

Well yes, I was. In fact I still do.

Back in 2006, my company FindMeOn open sourced a lot of technology that enables a private and security-based open social graph.

The [findmeon node standard](http://findmeon.org/projects/findmeon_node_standard/index.html) allows people to create ad-hoc links between nodes in a graph. Cryptographic key signing allows publicly unconnected links to be verifiably joined together for trusted parties.

Our commercial service manages node generation and traversing the graph. Even when using an account linked to a third party, such as ourselves, privacy is maintained.

– [A syntax highlighted example is on the corporate site](http://findmeon.com/tour/?section=illustrated_example)
– [The way the commercial + open source stuff melds is explained in this image](http://findmeon.com/tour/?section=abstracted)

There’s also a bunch of graphics and images related to security-based inter-network social graphs on my/our Identity research site. A warning though: half of it is about monetizing multi-network social graphs:

– [IdentityResearch](http://www.destructuring.net/IdentityResearch)

Twitter is worth a lot, Twitter advertising is not, Bad journalism is worthless

I set out to write a quick correction on a bad article that was discussed on the NY-Tech mailing list earlier this week, but this ends up being half about why Technology journalists and bloggers should just stop – as they rarely know what they’re talking about.

The article “How Much Are Twitter’s Tweets Really Worth?” on BusinessWeek.com has been gaining a bit of buzz across the industry this week. It’s a pretty good summation of how advertising works on Twitter – not because it’s a concise overview, but because it’s about as mindless and poorly conceived an article as the concepts that it speaks about. The writer, Spencer E. Ante, is an associate editor for Business Week. He has an impressive resume and articles behind him, so perhaps this was a postmodern experiment, or maybe he was just hungover from New Year’s Eve. Whatever the explanation is, I’d love to hear it – as it’s the worst written article I’ve read in ages. The article is no longer online, so I’ll have to use quotes from a cached version in my criticism below. Let’s all take a moment and thank the “Fair Use” clause of US Copyright Law.

# UPDATE
The article’s disappearance was not because of a paywall issue, but because it was – indeed – a steaming pile of shit. Businessweek now states:
> This story contained a factual error that rendered its premise incorrect. The story is no longer available. We regret the error.

I’m keeping this up, not to “rub it in”, but to note that “factual errors” and “incorrect premises” are pandemic to technology journalism. Writers at BusinessWeek, TechCrunch, Mashable, etc. rarely know what they’re talking about – and giving them a podium to stand on is just… dangerous.

# Bad journalism is worthless , Twitter is worth a lot

The first half of Ante’s story is a schizophrenic overview of the recent search deals Twitter signed with Google and Microsoft. Ante starts:
> Google and Microsoft are paying Twitter $25 million to crawl the short posts, or tweets, that users send out on the micro-blogging service. It sounds like big money.

Sounds like big money? That **is** big money – Twitter is making $25 million to give two search engines a ToS license and access to index their data. In a world where Search Engine Optimization is a skillset or service, Twitter is getting paid by the major engines so they can optimize themselves. This is pretty much unheard of.

For whatever reason though, Ante then goes on to comment:
> But do the math and the payments look less impressive. Last year, Twitter’s 50 million users posted 8 billion tweets, according to research firm Synopsos, which means Google and Microsoft are paying roughly 3¢ for every 1,000 tweets. That’s a pittance in the world of online advertising.

This is where Ante shows that he must be drunk, hungover, or a complete idiot: This deal has absolutely nothing to do with online advertising. Google and Microsoft aren’t paying to advertise on Twitter, they’re paying to be able to show tweets in their own search engines. In fact, given how the integration of this deal works – where Tweets appear in the search engine results with a link back to Twitter – it should be Twitter who is paying the search engines. This is a syndication deal, not an advertising one. And this is to syndicate user-generated-content, not editorial! Twitter now has a giant ad, at the top of most search engine pages as syndicated content, and they got **paid** for it! Getting paid to advertise your brand, instead of paying for it, isn’t a pittance – it’s brilliant, revolutionary, and (dare I say) mavericky.

One of my companies is a media site. We’re not a “top media site” yet, but we’re hoping to grow there. Handling technology and operations, I deal with advertising networks from the publisher side a lot. Another one of my companies is advertising oriented, with a focus on optimizing online media buying and selling. Suffice to say, I know the industry well – which is why I find Ante’s next bit of information troubling:
> Top media sites often get $10 or $20 per thousand page views; even remnant inventory, leftover Web pages that get sold through ad networks, goes for 50¢ to $1 per thousand.

Here’s a quick primer. If you’re a media site with a decent enough brand or demographic, regardless of being at the “top”, you’re getting a fairly decent CPM. I don’t think Ante’s numbers are “right” for “top media sites” – in reality, top media destinations are a bit higher per inventory slot. Additionally, most web pages have multiple slots, which together create a “Page CPM” that is the sum of the individual slots. While each slot might get $10-20, an average of 2 slots on a page would net $20-40. If you look at ad networks that publish their rates (like the premier blog network FederatedMedia.net), or speak to a friend in the industry, you’ll get instant confirmation on this.

In terms of the remnant inventory, I think these numbers are even more off. Remnant inventory for random, run-of-the-mill websites and social networks will absolutely run in the 10¢ to $1 range. “Top” media sites are of a different caliber, and will monetize their remnant inventory at a higher range, usually in the $2-8 range, or utilize a behavioral tracking system that will net CPMs in that similar $2-8 range.

My main issue with this passage has nothing to do with numbers. What I find even more inappropriate, and wholly irresponsible, is that Twitter is not a “Top Media Site”. Twitter is undoubtedly a “Top Site”, however it is a social network or service. Twitter is not about providing media or content, it is about transactional activity and user-generated content. This is a big difference in terms of online advertising. For a variety of reasons (which mostly tie in to consumer attention span and use cases), Social Networks have a significantly lower CPM – with most monetizing at a sub-$2 CPM rate, and a few occasionally breaking into the $2-8 range.

Ante’s comparisons just aren’t relevant in the slightest bit. Across the entirety of his article. But hey, there’s a quote to support this:
> The deals put “almost no value” on Twitter’s data, says Donnovan Andrews, vice-president of strategic development for the digital marketing agency Tribal Fusion.

Really? Really? A $25 million deal to syndicate user-generated content puts “almost no value” on that data? Either this quote was taken out of context, Donnovan Andrews has no idea what he’s talking about, or I just haven’t been given keys to the kool-aid fountain yet. Since Donnovan and I have a lot of friends in common (we’ve never met), and journalists tend to do this sort of thing… I’m going to guess that the quote is out of context.

# Twitter advertising is not (worth a lot)

The second half of Ante’s article is a bit more interesting, and shows the idiocy of Twitter advertisers:
> A few entrepreneurs are showing ways to advertise via Twitter. Sean Rad, chief executive of Beverly Hills-based ad network Ad.ly, has signed up 20,000 Twitter users who get paid for placing ads in their tweets. To determine the size of the payments, the startup has developed algorithms that measure a person’s influence. Reality TV star Kim Kardashian, with almost 3 million followers, gets $10,000 per tweet, while business blogger Guy Kawasaki fetches $900 per tweet to his 200,000 fans.

Using Twitter for influence marketing like “Paid Tweets” is a great idea – however these current incarnations are heavily favoring the advertising network, not the advertiser.

There is absolutely no way, whatsoever, to measure “reach” on Twitter – the technology, the service, and the usage patterns render this completely impossible. The number of Followers/Fans is a figure that merely represents “potential reach”; trying to discern the effective reach of each tweet is just a crapshoot.

When an advertiser purchases a CPM for an ad, they purchase 1000 impressions of the ad in a user’s browser. Software calculates the delivery of each ad to a browser, and those programs are routinely audited by respected accounting firms to ensure accuracy. Most advertisers, and all premium-rate advertisers (as above), have strict requirements as to how many ads can be on a page (standard: max 2-3) and the position (requiring ads to be “above the fold”). 1000 deliveries roughly equates to 1000 impressions.

When an advertiser purchases a CPM on an email, they purchase 1000 deliveries of the email, featuring their ad, to users’ inboxes. When emails bounce or are undeliverable, they don’t count against this number – only valid addresses do. The 1000 deliveries are, usually, successful email handoffs. A term called the “Open Rate” refers to the percentage of those 1000 emails that are actually opened by the user and load the pixel tracking software (this method usually works; it is not absolute, but good enough). Typical Open Rates vary by industry, but tend to hover around a global 25%, with content-based emails around 35% and marketing messages at 15%. With these figures in mind, 1000 email deliveries roughly equates to 250 impressions.

When an advertiser purchases a CPM on Twitter, they merely purchase a branded endorsement (which is very valuable in its own right) that has a potential reach of X followers. This number of followers does not equate to the number of people who will see the tweet “above the fold”, nor does it equate to the number of people who will see the tweet on their page at all. Twitter has absolutely no offerings (at the current time) to count the number of people exposed to a tweet on their website – either at all, or in accordance with an optimal advertising situation. Twitter has itself stated that 80% of their traffic comes from their API – which makes those capabilities technically impossible for that traffic.

Gauging the number of Tweets sent out over the API won’t work either — Twitter applications built on the API tend to have “filtering” capabilities, designed to help users make sense of the potentially hundreds of Tweets that come in every hour. When these client-side lists or filters are used, sponsored tweets may be delivered to the application — but they are never rendered on screen.

Looking at common use-patterns of Twitter users, if someone is following a handful of active users, all Tweets that are at least an hour old will fall below the fold… and tweets that are older than two hours will fall onto additional pages. This means that Twitter users would effectively need to be “constantly plugged in” to ensure a decent percentage of impressions on the sponsored tweets.

A lot of research has gone into understanding usage patterns on Twitter, as people try to figure out what the “real” users are: a significant number of Twitter accounts are believed to be “inactive” or “trials” – users who are following or followed-by fewer than 5-10 users; the projected numbers for “spam” accounts fluctuate daily. Even with the most conservative figures, these numbers are well into the double digits.

Social Marketing company Hubspot did a “State of the Twittersphere, June 2009” report. Some of their key findings make these “pay per tweet” concepts based on the number of followers even more questionable. Most notably, Hubspot determined that a “real” Twitter user tweets about once per day (the actual number is .97). Several different Twitter audits have pegged the average number of accounts followed by ‘seemingly real’ accounts (based on the number of followers, followings, engagements with the platform, etc.) at around 50 – so an average user should expect about 50 subscribed Tweets daily as well. The Twitter.com site shows 5 tweets “above the fold” (and the website represents 20% of their traffic; a quick poll of Twitter clients shows an average of 7). Assuming Tweets are spread out evenly during the day, an average user would need to visit Twitter about 9 times a day in order to ensure seeing sponsored Tweets. In the online publishing and social media world, expecting 9 visits per day, every day, by users is… ridiculously optimistic. Realistically, users likely experience a backlog of older, unseen tweets on login – and sponsored tweets get lost in the mix.
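
To make that math explicit, here is a back-of-envelope sketch using the figures above:

// back-of-envelope, using the figures above
var tweetsPerDay = 50;                                // ~50 followed accounts x ~1 tweet/day each (Hubspot)
var visiblePerVisit = { website: 5, clients: 7 };     // tweets "above the fold" per visit

console.log(tweetsPerDay / visiblePerVisit.clients);  // ~7 visits/day
console.log(tweetsPerDay / visiblePerVisit.website);  // 10 visits/day
// somewhere in between, call it "about 9", just to have a shot at seeing every subscribed tweet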

As I stated before, the “celebrity advocacy” concept of a sponsored Tweet is a very desirable concept for advertisers — and one that would decidedly command a higher rate than other forms of advertising. However, the concept of “Actual Reach” on Twitter is nebulous at best. A better pricing metric for Twitter-based advertising would be CPC (cost per click) or CPA (cost per action), where tweeters would be paid based on how many end-users clicked a link or fully completed a conversion process.

The "Bra Colors Facebook Status Meme" isn't really about Breast Cancer.

A bunch of Social Media blogs and journalists are reporting that there is a viral Social Media Breast Cancer Awareness Campaign, in which women post the colors of their bras as a Facebook status.

It’s a neat idea for a story, but it’s not true.

Aside from the fact that this viral campaign isn’t organized by any Breast Cancer Awareness non-profit or advertising agency, and that it’s a really bad idea for a Breast Cancer awareness campaign [ a) it’s more appropriate for lingerie designers, b) it dilutes the association with pink that the Susan G. Komen foundation has been fighting for ], one only needs to do a quick web search to discover that this is really a weeks-old chain-letter meme that is constantly morphing and getting hijacked.

A week ago, someone posted a question on [Answers.Yahoo.com](http://answers.yahoo.com/question/index;_ylt=Ahbp.aJJpkF6aYN1JYX.rdIjzKIX;_ylv=3?qid=20091229223537AAHTqYE) , and a respondent copy-pasted the text as it appeared then:
> right girls let’s have some fun. write the color of the bra you’re wearing right now as your status on fb and dont tell the boys. they will be wondering what all the girls are doing with colors as their status. forward this to all the girls online

Several other respondents confirmed this was the letter in that posting, and it is only one of dozens of similar accounts of this meme across the internet dated last week.

At some point over the last few days, someone decided to hijack the meme and make it a little more socially responsible – and they added the Breast Cancer bit to it. It’s nice, and it’s sweet, and it’s a great way to turn a stupid internet joke into something serious. If someone looks at one of these Facebook status postings today, no matter the author’s intent, they’ll associate it with a tie-in to Breast Cancer, since that’s what the current media coverage states.

Nevertheless, the meme is not necessarily about Breast Cancer awareness. It’s currently getting interpreted as such, but only some participants share that intent.

OpenID is bad for Registration

OpenID is a really useful protocol that allows users to login and authenticate — and I’m all for providing users with services based on it — but I’ve ultimately decided that it’s a bad idea when Registration is involved.

The reason is simple: in 99% of implementations, OpenID merely creates a consumer of your services; it does not create a true user of your system — it does not create a customer.

Allowing for OpenID registrations only gives you a user that is authenticated to another service. That’s it. You don’t have an authenticated contact method – like an email address, phone number, screen name, inbox, etc; you don’t have a channel to contact that customer for business goals like customer retention / marketing, or legal issues like security alerts or DMCA notices.

The other 1% of implementations are a tricky issue. OpenID 1.0 has something called the “Simple Registration Extension”; some of this has been bundled into 2.0 along with “Attribute Exchange”. These protocols allow for the transfer of profile data, such as an email address, from one party to another — so the fundamental technology is there.

What does not exist is a concept of verifiability or trust. There is no way to ensure that the email address or other contact method provided to you is valid — the only thing that OpenID proves is that the user is authoritatively bound to their identity URL.

The only solution to this problem is for websites to limit what systems can act as trusted OpenID providers — meaning that my website may trust an OpenID registration or data from a large provider like MySpace or Facebook, but not from a self-hosted blog install.

While this seems neat on some levels, it quickly reduces OpenID to merely being a mechanism for interacting with established social sites — or, perhaps better stated, a more Open Standards way of implementing “Facebook Connect” across multiple providers. A quick audit of sites providing users with OpenID logins limited to trusted partners showed them overwhelmingly offering logins only through OpenID board members. In itself, this isn’t necessarily bad. My company FindMeOn has been offering similar registration bootstrapping services based on a proprietary stack mixed with OpenID for several years; this criticism is partially just a retelling of how others had criticized our products — that it builds as much user loyalty into the Identity Providing Party as it does into the Identity Requesting Party. In layman’s terms, that means that offering these services strengthens the loyalty of the consumer to the company you authenticate against as much as it offers you a chance to convert that user. In some situations this is okay – but as these larger companies continue to grow and compete with the startups and publishers that build off their platforms, questions arise as to whether this is really a good idea.

This also means that if you’re looking at OpenID as a registration method with some sort of customer contact method ensured, you’re inherently limited to a subset of major trusted providers OR going out and signing contracts with additional companies to ensure that they can provide you with verified information. In either situation, OpenID becomes more about being a Standards Based way of doing authentication than it is about being a Distributed Architecture.

But consider this — if you’re creating some sort of system that leverages large-scale social networks to provide identity information, OpenID may be too limiting. You may get to work with more networks by using the OpenID standard, but your interaction will be minimal; if you were to use the networks’ integration APIs, you could support fewer networks, but you’d be able to have a richer — and more viral — experience.

Ultimately, using OpenID for registration is a business decision that everyone needs to make for their own company — and that decision will vary dependent upon a variety of factors.

My advice is to remember these key points:

– If the user interaction you need is simply commenting or ‘responding’ to something, binding to an authoritative URL may suffice

– If the user interaction you need requires creating a customer, you absolutely need a contact method: whether it’s an email, a verified phone number, an ability to send a message to the user on a network, etc.

– If you need a contact method, OpenID is no longer a Distributed or Decentralized framework — it is just a standards based way of exchanging data, and you need to rely on B2B contracts or published public policies of large-scale providers to determine trust.

– Because of limited trust, Network Specific APIs may be a better option for registration and account linking than OpenID — they can provide for a richer and more viral experience.

On that 'Zombie Photos' report…

CNN and BBC have both covered something called ‘Attack of the Zombie Photos’ – an experiment out of the University of Cambridge that tested whether a photo deleted from a website was really deleted, and how long that took.

I found the experiment to be incredibly flawed and misleading.

The researchers tested not the networks themselves, but internet cache copies. So a network could very well have deleted the image from their servers, but that change (the deletion) had not yet propagated to their Content Delivery Network (CDN) — i.e. the photo was primarily deleted from their servers, but a distribution copy on another (possibly 3rd party) server had yet to be deleted or timed out.
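
To illustrate the difference, here is a hypothetical check; the hostnames are placeholders, following the cdn.img / img example in my comment below:

// hypothetical check: has the "deleted" photo really disappeared everywhere?
// (Node 18+ or browser fetch; hostnames are placeholders for an origin server and its CDN)
async function checkDeletion(path) {
	const origin = 'http://img.network.com'     + path;   // the network's own photo server
	const cdn    = 'http://cdn.img.network.com' + path;   // the (possibly 3rd party) cache/CDN

	const [originRes, cdnRes] = await Promise.all([fetch(origin), fetch(cdn)]);
	console.log('origin:', originRes.status);   // 404 once the primary record is gone
	console.log('cdn:   ', cdnRes.status);      // may still be 200 until the cached copy times out
}

checkDeletion('/a');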

While the researchers did indicate in a graph that they were testing the CDN, their text barely mentioned it, their analysis not at all, and they routinely called the CDNs “photo servers” — as if they were the primary record. It seems as if the report was more about FUD (fear, uncertainty, doubt) than about examining real issues.

You can view the report here : [Attack of the Zombie Photos](http://www.lightbluetouchpaper.org/2009/05/20/attack-of-the-zombie-photos)

My comment is below

> I think this experiment is great in theory, but flawed in practice and conclusions.
> You are not testing to see if an image is deleted from the Social Network, but from their CDN. That is a HUGE difference. These social networks may very well delete them from their own servers immediately, but they are not exposed to the general internet because an (often third party) cache is employed to proxy images from their servers to the greater internet. Some of these caches do not have delete functionality through an API; the content – whatever it is – just times out after x hours of not being accessed. It also is often ‘populated’ into the cache by just mapping the cache address onto the main site. Example: http://cdn.img.network.com/a may be showing content for several hours that was deleted from http://img.network.com/a

>Perhaps you know this already – but in that case you are presenting these findings in a way that serves your point more than the truth of the architecture.

>In terms of your inference of the EU and UK acts, I wouldn’t reach the same conclusions that you have. Firstly, one would have to decide that an unmarked photo, living at an odd cache address with no links in from a network identifying it or its content, would be deemed “personally-identifiable data” — I would tend to disagree. Secondly, while the purpose of it may be to “share it”, it would really be “share it online” – and dealing with cache servers and the inherent architecture of the internet, I think the amount of time it takes for changes to propagate after a request for deletion would easily satisfy that requirement. I also wonder if the provision to access ‘user data’ means that it is done in real time or in general. I’m pretty sure all these sites store metrics about me that I can’t see.

>Again, I will also reiterate that we are talking about ‘cached’ data here — and that the primary records of the requested data have been deleted. At what point do you feel that privacy acts and litigation should force the ability for a user to access / view *every* bit of stored data:
> – primary record
> – server caches
> – data center/isp caches
> – network (university, business, building, etc.) caches
> – computer / browser caches

> Your arguments open up a ‘can of worms’ with the concepts of network optimization. I wouldn’t be surprised if your university operates a server on its internet gateway that caches often-requested images — would they too be complicit in this scheme for failing to delete them immediately? How would they even know to do so? How could the network operator identify and notify every step in the chain that has ever cached an instance of the image?

On Terms of Service and Privacy Policy

Elias Bizannes and some other folks from the DataPortability project have started working on a way to unify network legal contracts.

A little over a year ago I set out on the same path, trying to bootstrap a “Social Media Standards” organization. We’ve both come to many of the same conclusions, with some differences, and focused on some different areas — he spent much of his time with networks, while I spent more time with startups and ad firms.

## Here are some key points

– I don’t think that a universal legal doc is possible, recommended, or even a good idea. All the portability concepts tie in very strongly to a company’s business operations — it’s both unrealistic and arrogant to mandate ‘you will do this!’. However I think clarity and guidelines are in order.

– I propose the following:

– a ‘legend’ of datapoints / concepts, where there is a menu of set options that network operators can choose from
– each datapoint and option has an iconic, easy to read, representation… very much like the CC licenses
– there are several recommended configurations of datapoints & options that have trademarked names
– operators may also create customized configurations that reference individual icons

This approach would give users the ability to identify and easily read TOS agreements, while affording network operators flexibility. In other words — adopting this system could never conceivably hurt their business.

– Enforceability is an issue, as are the differences in legal concepts and wording between countries. Who can sue? How? Where? My idea has been to use SocialMediaStandards as a non-profit licensing group: networks would be able to say that their legal contracts are compatible with specific legal concepts or iconic configurations offered by the group; in doing so and displaying the trademarked images, they would be liable to the group under contract law should they make false claims. This would allow the group to litigate on behalf of end users who would be otherwise unable to do so, and greatly simplify enforcement as other legal concepts get thrown into the mix. Users would still be able to sue for breach-of-contract, fraud, and misrepresentation as well — and this would give the group the ability to file suit too.

– I identified common sets of datapoints, and broke them each into 2 categories: Content and Activity. I think each should be treated differently. Content is what a user directly enters into a network; Activity is the network’s value-add. I.e.: I can upload a bio (content) and then there is the number of times that bio has been viewed (activity). For every datapoint, I believe there should be the same – but independent – options to regulate content and activity (a rough sketch of this follows the list).

– Elias did something similar, breaking things down into ‘nouns’ and ‘verbs’. There is a bit of overlap between our concepts, but they’re still quite different.
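
To make the datapoint idea concrete, here is a purely illustrative sketch of how one such configuration might be expressed; the names and options are made up, not part of any proposed standard:

// purely illustrative: datapoints from the legend, each with independent
// options for Content (what the user entered) and Activity (the network's value-add)
var exampleConfiguration = {
	name : "Example Configuration",   // a recommended, trademark-able preset
	datapoints : {
		bio : {
			content  : { exportable: true,  deletable: true,  shared_with_partners: false },
			activity : { exportable: false, deletable: false, shared_with_partners: false }   // e.g. view counts
		},
		photos : {
			content  : { exportable: true,  deletable: true,  shared_with_partners: false },
			activity : { exportable: true,  deletable: false, shared_with_partners: false }
		}
	}
};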

You can see my concepts [by clicking here](http://www.destructuring.net/IdentityResearch/Essays/2008_06/2008-06-SocialMediaStandards-PrivacyAndTos-InitialThoughts/2008-06-SocialMediaStandards-PrivacyAndTos-InitialThoughts.html)

You can see Elias’s concepts [by clicking here](http://wiki.dataportability.org/display/work/Elias+conceptual+framework)

## On ownership of data

Elias and I fought on this for a while. Then we realized that we were both a little drunk, and talking about the same thing: that a user does, and always should, own their content — as afforded by copyright law in most countries.

Where we differ is as follows, and it’s a bit of a controversial topic:

I believe that it is more than reasonable – and should be required – for a user to enter into a contract with a network that grants an irrevocable non-exclusive license for the network to use and redistribute uploaded content in its original context once it has been interacted with by others. I don’t believe in this clause/point for the sake of the network, but for the sake of other users. This concept is akin to publishing a letter-to-the-editor in a newspaper or magazine: you can’t undo things once published; people still have the ability to make clippings of the content. It’s also like loaning a photo to a friend — they may return the original, but copies may have been made and are floating around.

I believe this doesn’t affect the concept or legality of ownership. The user still owns their content, and has all legal rights to it. There is simply a non-exclusive license granted to the network to keep content active… such as in the event that a photo is commented on or added to another’s virtual photobook, or so that a thread of discourse on a topic doesn’t end up with key sections missing.

Some people believe that this concept strips them of rights. I believe they are attempting to create rights where none existed before. Once something has become public, become shared, it is impossible to undo — one cannot take back their own words once they have been heard by others. I believe proof of this exists in the simple fact that other users could simply screencap or print-screen this content, and that while technology can allow things to be ‘undone’, it doesn’t mean that they should be.

That being said, I believe that networks should require users to enter into a covenant with one another, and with the network, agreeing that items, once published, stay published. I also stress that contracts like this are more important for the other users than they are for the network.

Why Portability?

Last week I had the pleasure of meeting up with Elias Bizannes of the DataPortability.org project a few times.

One day he asked me: Why portability?

This was my answer:

Data portability is a trick, and a really good one at that. It’s not the be-all/end-all solution some make it out to be; while it offers some groups important advantages, to others it is more than threatening to their business concerns. What makes the concept of portability incredibly interesting, and brilliant in some ways, is the necessary balance of all concerned parties to actually pull it off. At its core, portability is far less about the concept of portability itself than it is about the democratization and commodification of Social Networks.

Portability is presented as a good thing for users, which it undoubtedly is on the surface. But – and this is a huge “but” – there is an all-important sell-in to the networks, who actually have to implement ways for users to port in and port out. This offering to networks is complicated, because while porting ‘in’ makes sense, porting ‘out’ is an entirely different matter — and one that may be detrimental to a business. More importantly, while open standards and ‘libraries’ may be free, there are real and serious costs to implementing portability:
– engineering and coding costs: using architects, developers and network engineers to integrate these libraries and APIs
– administrative costs: making sure portability works within current legal contracts, creating new contracts, etc.

Small / niche networks look towards portability as an amazing opportunity — with a few clicks they can import thousands of new users, and for small sites integration can be a matter of hours. Under this premise, it makes sense for smaller groups to abide by the democratic principles of portability, and allow for information to port out as freely as it ports in. There is no real downside.

For medium networks, or large networks that are past their prime, portability is a chance to streamline customer retention methods. By keeping profiles up to date, these networks can seem more lively to new users (i.e. no more messages that read “Last updated in 2004”) — and they offer existing users the ability to browse the same unified & standardized data in a comfortable environment.

The concept of unifying & standardizing data resonates very well with me — I first tried to convince people this would happen in 2006, and in 2009 it has finally started to catch on. It’s really amazing seeing this happen. Before the advent of social networking, networks competed with one another based on their userbase — people migrated from network to network because of who was on it, a mixture of critical mass and critical usage; the popularity of online networking, portability, and network integration efforts have completely shifted that. Users and content are now the same no matter where you go – and this trend is only accelerating. Networks now compete as a layer of user experience and user interface on top of this data.

For network operators this can — and should — be liberating. The emancipation of users allows networks to stop wasting resources on antagonistic retention methods that lock people into their network… freeing internal resources that can be spent on product improvements, making it easier and better for users to share, connect and interact with others.

Simply put, networks should focus on making products that consumers WANT to use, not products that consumers dislike or despise yet are locked into using for some reason. Whether they’re pushing for portability or not, virtually every social network and consumer website is still doing the latter right now, and it’s sad.

The allure of portability to large networks is an entirely different story. On the surface, portability offers little or no advantage to large networks. As shepherds and herders of massive userbases, networks rightfully fear openness as a way to lose the attention of their users. In deliberate steps, and under carefully controlled conditions, large networks have begun to test the waters… dictating how people can use their network off-site through platforming and ‘connecting’, and offering incredibly limited export options.

Pundits like to use the term ‘opening the gates’ or ‘tearing down the walls’. I liken this form of tempered portability to ‘testing the waters’ and ‘opening a window’. Large networks are not embracing portability, they’re trying to simulate it on their terms , in ways that best leverage their brand identity and commercial offerings to retain consumer loyalty.

I personally think this is great — but it shouldn’t be called portability or ‘opening up’; it is simply a more relaxed posture.

What I dislike are the grand PR and marketing initiatives around large-scale ‘portability’ efforts. The large firms are all stuck in a cyclical pattern where one group ‘opens up’ a bit more than the last, forcing another group to try and outdo the last. This behavior of metered and restrained openness, and the creation and advocating of new ‘open’ standards that primarily drive the creator’s brand instead of users… this isn’t portability, this is sportability.

Portability and true Openness aren’t about half-assed, ill-conceived standards and initiatives that were designed to create PR buzz and be just open enough to seem like a viable option. Portability is about getting stuff done with the right product, and putting the user front and foremost. We’re unfortunately left with a market-driven approach, where the large networks are in competition to release the least open standards they can, while still outdoing their competition.

While all of this is happening ‘on the surface’, there is a seedy underbelly to it all. Large networks realized an opportunity that they have all been looking towards and investing in — one which may not be so user friendly. Increased portability and inter-connectedness mean an opportunity for better consumer profiling — which translates to better audience measurement and targeting, offering the chance for significant improvements in advertising performance. Portability offers networks a diamond in the rough. I spent several years at FindMeOn developing audience profiling/targeting concepts, and quantifying the market opportunity and potential effects — they are huge. This should be rather unsurprising — you may have noticed that the largest proponents of portability efforts over the past few months are subsidiaries or sister companies of some of the world’s largest advertising networks and inventories.

As a quick primer: Social Networks make their money (if ever) either through subscription or advertising models; most are forced into ad-supported models because consumers just won’t pay. Ad-supported models are at an odd moment in history right now: users have become so accustomed to ads that they tune them out completely — dropping CPMs sharply. The transactional model of ‘do a task, watch an ad, repeat’ was overused so much that it became ‘do a task, ignore an ad, do the first phase, ignore another ad, do another phase, ignore another ad’; no matter what networks do, the previous over-advertising has made a generation of users wholly oblivious to advertising — so some social networks can only get 5-10¢ to show 1k ads of remnant inventory, while others can charge $3 to show the same amount of targeted ads. While that might look like a decent improvement, online advertising elsewhere is doing far better. Behavioral networks can often charge a $10 CPM if they hit a user on a content site, and niche sites or strongly branded properties where ads are purchased as a mixture of direct and endemic advertising can generate CPMs of $40 or more.

Social networks are left at an odd crossroads today: once a network grows to millions of users, the brand simply isn’t focused enough to offer reputable or effective endemic advertising; nor is the property likely to be niche enough to command premium CPMs for placement next to highly relevant content. Networks are unfortunately left with behavioral advertising – which should (and would) be doing better right now, if it weren’t for the overexposure/fatigue that users feel. However, portability efforts offer networks the chance to greatly improve behavioral advertising relevance.

So to summarize my answer to the original question posed by Elias… “why portability?”

> 1. If you’re a small or medium network, you’re going to pick up users.
> 2. If you’re a larger network, having your standard/platform adopted can result in market domination
> 3. If you’re a larger network, you have the potential to improve advertising revenue

Perhaps more than a decade in online business and advertising has left me a bit jaded, but I see little that is particularly grand or noble in these efforts. We’re not talking about curing cancer… we’re talking about making it easier to share photos and comment on things, and about improving advertising. For industry professionals like myself, these are really exciting times — but let’s do each other a favor and tone down the idealism a bit, and admit to / talk about the factors that are really driving all this. Maybe then we can start taking some real strides, instead of all these tiny little baby steps.

Collecting my thoughts on data portability & open systems

Last week I had the pleasure of meeting up with Elias Bizannes of the DataPortability.org project a few times.

We got to nerd out about different concepts – and our positions – on the overarching theme of integrated networks… and I thought I’d use my photographic memory (even when drinking Booker’s all night) to share my thoughts and some of his comments. That didn’t work out too well — or perhaps it did, as my recollections were fueled with bourbon.

In all seriousness, I haven’t spoken with most people about any of these concepts in at least a year, so it was completely fun for me… and given all the recent developments in this area, it’s nice to see how some attitudes have changed and new concepts have begun to take shape.

Over the next 3 days I’ll release a section of my thoughts in different areas. I like planning out postings like this — it gives me something to look forward to in terms of writing!