Everyone's talking about the need for a privacy-oriented, Open Source solution for an open social graph.

And a lot of people are asking me “Weren’t you doing that four years ago?”

Well, yes, I was. In fact, I still am.

My company FindMeOn open sourced a lot of technology that enables a private, security-based open social graph back in 2006.

The [findmeon node standard](http://findmeon.org/projects/findmeon_node_standard/index.html) allows people to create ad-hoc links between nodes in a graph. Cryptographic key signing allows publicly unconnected links to be verifiably joined together by trusted parties.
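
The mechanics are easier to see in a toy sketch. To be clear, this is *not* the FindMeOn node format, just an illustration of the underlying idea: sign two otherwise-unconnected profile URLs with one private key, and anyone holding the public key can verify the link (here via the Python `cryptography` package's Ed25519 primitives):

```python
# Toy illustration of verifiable node linking -- NOT the FindMeOn node format.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# Two profile nodes with no visible connection to one another (made-up URLs).
link = b"http://network-a.example.com/users/jdoe|http://network-b.example.com/~jd"

# The keyholder signs the claim that both nodes are theirs.
signature = private_key.sign(link)

# A trusted party holding the public key can verify the claim;
# verify() raises InvalidSignature if the link was forged.
public_key.verify(signature, link)
print("both nodes verifiably belong to the same keyholder")
```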

Our commercial service manages node generation and graph traversal. Even when using an account linked to a third party, such as ourselves, privacy is maintained.

– [A syntax highlighted example is on the corporate site](http://findmeon.com/tour/?section=illustrated_example)
– [The way the commercial + open source stuff melds is explained in this image](http://findmeon.com/tour/?section=abstracted)

There’s also a bunch of graphics and images related to security-based inter-network social graphs on my/our Identity research site. A warning, though: half of it is about monetizing multi-network social graphs:

– [IdentityResearch](http://www.destructuring.net/IdentityResearch)

Use Case Scenarios are important for product development: The "Search" Feature

Whenever a new project starts, we do a few standard things:

– Identify the general product / idea
– Identify several classes of users it appeals to
– Draft Use Case Scenarios for each user class

If, for example, your project is a “game”:

– you might identify the general idea as a game played on a court where two teams each try to sink a ball into a basket;
– the user classes would be children; competitive sports players – high school, college, professional; and casual adults;
– a use case scenario might be an adult goes to a gym to work out and sees 5 other friends who want to play a game together.

Use cases can really help you focus on specific product features — figuring out which have the greatest utility, broadest appeal, or largest differentiators against competitive goods and services. They’re often created both during team brainstorming sessions and as homework for the various client ‘stakeholders’ in a project. The stakeholders who best represent the end-consumers should create at least 1/3 of the Use Cases, and should sign off on all of them. In a startup/corporate environment, that would mean the Product Manager and perhaps some C-Level executives; in an agency environment that would mean the Client and their team, not the internal strategist or team. Why? Because when the stakeholders drive the Use Case creation, you have better insight into the core business goals, market opportunity, and targeted user demographics.

Like everything else in your project, your Use Cases will shift with time as your product matures and you get a better idea of who your actual audience is — so you’ll always have to revisit them to update and add new scenarios. Despite this changing nature, it is unbelievably important to really think things through and create detailed use cases. In the past year alone, I’ve been part of three projects that all became seriously derailed and stressed because of bad Use Case Scenarios on the same exact product feature — the “Search” function — so I’ll use that as a paradigm.

In every situation, the original use cases described something very simple, like:

– “I type in /chocolate/ and it shows me a list of recipes that match chocolate. Like in the title.”

But then they progressed as the stakeholders used the first version:

– “When I type in /chocolate/ it should show me a list of recipes that have chocolate in the title, or as an ingredient.”

And then they progressed a little more:

– “There is chocolate in the description of this item, and it’s not showing up in search. I meant for the description to be part of it too.”

And then…:

– “Someone commented and said this recipe could be good with chocolate, that should be in the search results. But it should go later in the results.”

Oh no:

– “Wait a second… why am I not seeing chefs/authors who write about chocolate? They’re most certainly relevant.”

And then, overload…:

– “This kinda works. But I should be able to narrow these results down, like in Yahoo or Google. And we should show more info from the recipe in here. What about a picture? And misspellings / near spellings? It should detect those. People spell certain ingredients differently. We have a lot of Europeans searching; how will é, ç and other characters match in search or recipes? This seems to be broken. It is broken. This sucks, you’re wasting my time and money.”

To the stakeholder, there is no difference between these requests — they specified a search function, and they expected it to work a certain way; the product team failed at each interval to deliver on their expectations. To the stakeholder, the search function is a “black box” — they don’t know and don’t care if the mechanics behind each iteration are different… it’s a search box!

To the product team though, each iteration was a completely different product and each one required vastly different amounts of resources.

The first iteration — searching on the title — was a simple and straightforward search on a single field… and described as such, a team would just search directly on the database. The resources allocated to this would be minimal – it’s literally a few lines of code to implement.

As the search use case gets refined, the product design moves from searching on a single field to searching on multiple fields — probably using joins and views — and calculating search results. By the end of the product refinement, it’s quite clear that a simple in-house search solution can’t deliver the experience or results the stakeholder actually wants, so we need to look into other solutions like Solr/Lucene, Sphinx or Xapian. These advanced options aren’t terribly difficult to implement — but they go beyond a single search function into running and maintaining separate search servers, configuring the engines, creating services to index documents, creating resultset rules for sorting, creating error-handlers for when the search system is down, etc etc etc. The simple “Search” button grew from a few lines of code into a considerable undertaking that requires dedicated people, days of work, and a constant tailoring of the resultset rules.
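
To make that gap concrete, here's a sketch against a hypothetical `recipes` schema. The first query is the "few lines of code" version; the second is a middle iteration with joins and crude result weighting, which still can't rank, stem, or handle misspellings and accented characters the way a dedicated engine would:

```python
import sqlite3

conn = sqlite3.connect("recipes.db")  # hypothetical database / schema

# Iteration 1: title-only search -- genuinely just a few lines of code.
rows = conn.execute(
    "SELECT id, title FROM recipes WHERE title LIKE ?",
    ("%chocolate%",),
).fetchall()

# A few refinements later: title, ingredients, description and comments,
# each weighted differently so comment matches sort below title matches.
rows = conn.execute(
    """
    SELECT r.id, r.title,
           MAX(r.title LIKE ?) * 4
         + IFNULL(MAX(i.name LIKE ?), 0) * 3
         + IFNULL(MAX(r.description LIKE ?), 0) * 2
         + IFNULL(MAX(c.body LIKE ?), 0) AS score
    FROM recipes r
    LEFT JOIN ingredients i ON i.recipe_id = r.id
    LEFT JOIN comments    c ON c.recipe_id = r.id
    GROUP BY r.id, r.title
    HAVING score > 0
    ORDER BY score DESC
    """,
    ("%chocolate%",) * 4,
).fetchall()
```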

Eventually the product teams will scream “Feature Creep!” and a manager will flatly say “Out of Scope.” Items like this are unfortunately both — but they shouldn’t be. The intent and expectations of the stakeholder rarely changed in this process; they just failed to articulate their wants and expectations. The blame, however, is shared: the client should have better described their needs; the product manager should have asked better questions and better managed the stakeholders’ “Use Case homework”.

With a properly written out Use Case Scenario — in which the stakeholder actually illustrates the experience they expect — the product team would likely recommend the latter scenario from the start, and offer tiered suggestions leading up to the desired expectations, with the resources/costs at each point.

Unfortunately the status quo is for stakeholders to half-ass the Use Case. Few product or project managers will pick up on the shortcoming, and the tech team will never pick up on it. So “Search” — and any other feature — gets reduced to a line item with little description or functional specification, and when development begins it gets built in the easiest / simplest way to satisfy that request. This predictably results in failed expectations and a derailed project. Not only do the simplest and the most robust solutions to “search” get built, but every single step in between as well — costing dollars and immeasurable team spirit and energy.

The old adage about medicine — an ounce of prevention is worth a pound of cure — holds extremely well as a truth about product development. Articulating exactly what you want and need to accomplish before development begins will save dollars and countless hours of stress.

10 Startup / Interactive Lessons ( which I learned the hard way )

Over the past 12 years, I learned these 10 things the hard way.

# 10 You and your team are not your core audience.
You’re a super user, which probably corresponds to a 5-10% demographic of product traffic, and to where you want your users to one day be. You’ve got great insights and direction, but you can’t make a product that is only “for you”. Remember that other 90%. Unless your business model suggests you can ignore them all! Generally speaking, 20% of your users will account for 80% of your traffic – so try to remember the 10-15% of heavy users who aren’t like you. You should also track metrics every few weeks — see how your users break down into usage patterns, and see where your team falls in there.

# 9 If your team isn’t using your product on a daily basis – you need a new team, a new product, or both.
You’ve got a huge issue if your team isn’t using your product on a daily basis. They’re going to have different usage patterns than your core demographic, but if you’re not building something that they want to, or can, use on a daily basis… you’ve either got the wrong team, the wrong product, or both. Don’t accept excuses, don’t try to rationalize behavior. The bottom line is that if your team isn’t full of passionate and dedicated users of your product, and you can’t sell them on it… how can you expect your team to convince consumers and investors? You can’t.

# 8 “If you build it they will come” == bullsh*t
You need a solid marketing plan, for your site or your new features. Just putting something out there won’t suffice — people need to learn that your product is awesome. If you don’t have the resources to drive people to your product, rethink your resource allocations *immediately* — maybe you can scale back your vision to save some resources for marketing. People need to know that your product exists, and they’ll learn how to use it by good example — those are two tasks that your team needs to lead on. Also remember that despite what you think and how hard you work, whatever you build won’t be the most amazing thing in the world — so make sure you have resources budgeted to be nimble and respond to users…

# 7 Jack be Nimble, Jack be Quick…
If you’re a consumer oriented product, you’ll often need to change direction, add features, etc. many times after launching. You need a technology platform and internal process that lets you do that. People love to talk about getting their startup going by outsourcing and offshoring the development. This is an incredibly bad idea. To illustrate, try to count the number of startups you know of that outsourced their product development and had a successful exit. I can count them all on a single hand — and still have fingers left.
Why? If you go the outsource route, it means you’ve decided “This is what our product MUST be” — but when your users help you realize what your product SHOULD be… you’re facing change orders, new contracts, and even trying to reserve some other company’s time. Then you have to deal with the transfer of knowledge and technology when you eventually need to move in-house — figuring out how you can have your internal team support and extend a product that someone else built. If you’re going to contract something out, do a prototype or a microsite or a feature — but don’t have someone else build your core product for you; it’s a proven recipe for failure.
In simpler terms, you can’t outsource your core business competency.

# 6 Listen to your lawyers, don’t obey them.
It’s easy to forget that lawyers give legal advice, not legal rules — and that at their very cores, lawyers mitigate risk while entrepreneurs take risks. I don’t mean to suggest that you should be doing anything specifically “risky” or illegal, but that you remember your lawyers will always push you towards solutions approaching 0% risk – which means you may miss many marketing, product, and business opportunities. Good marketing and successful products often push the limits of what is allowed; opening up your company to some amount of liability may be a risk that offers a far greater reward than any penalty you can incur.

# 5 Product Management is not Project Management
This confusion seems to afflict folks on the East Coast and in the Advertising / Interactive fields. ( if you’re from a West Coast software background, you’re probably immune ). A Project Manager handles resource allocation and making sure that deliverables and commitments keep to a schedule. A Product Manager makes sure that the deliverables actually make sense, and represent/understand the Business Goals, Market Opportunity, Competitive Advantage, and End Users. Product Management is a role — Project Management is a task. Whether you’re working on a startup, online product, or interactive campaign: you need to have a capable Product Manager who is part of the day-to-day checkin process. You also need to make sure that the people who handle resource allocation understand the roles, responsibilities, and workflow of each person they’re managing — otherwise you have some departments slacking off while others are completely overloaded trying to meet deadlines that were either unreasonably imposed on them, or that they agreed to without understanding the full scope.

# 4 If you have a good idea, it’ll probably get stolen.
This is just how things work – people are often inspired by someone else, or they’re ruthless and copy it verbatim. The exception is when someone else had the same good idea on their own — but then you’ll probably have people trying to steal that idea too, effectively doubling the rampant thievery going on. Arrgh! If you’ve been out in the market for a while and no one is competing with you, you may want to ask yourself why. Competition doesn’t just validate your idea, it also gives you the chance to better measure the market opportunity and how the audience responds by looking at your competitors. If you stole your idea from someone else you know all this already, so there’s no need to address you too.

# 3 Nothing is confidential. Trust is an arbitrary term. Respect is earned.
The only people that you can trust to keep a secret are your lawyers, because they’d be disbarred and lose their careers otherwise. Proving that someone leaked a secret, shared a “confidential” presentation, violated an NDA, etc is not only hard to do, but very costly — which is why people do it all the time. If you’re honest and forthcoming in all your dealings, word will spread and you’ll increasingly meet more people who are similar. You shouldn’t expect that anyone will keep a secret just because you asked them to — and you should always be prepared for the worst and expect the opposite.
This isn’t to say that you shouldn’t bother with privacy contracts, but that you should be smart about what you share. The vast majority of potential partners and investors will scoff at an NDA in preliminary meetings, but as your relationship progresses and they need access to more proprietary information — your internal numbers, market research, bookkeeping, etc — negotiating an NDA is commonplace. You should always ask yourself whether this group is serious about working with you, or just doing market research of their own for another project or investment with a competitor.

# 2 When it comes to a market opportunity, you can trust your gut – the experts aren’t always right.
Two charming examples of “I was right” and “they were wrong” involved music and net experts telling me that “there will only be MySpace, and no other sites will ever be relevant for music”, and internet experts mandating that social network walls will never come down, so portable identity / users will never happen. I’m not trying to flatter myself with this — neither of those companies had a successful exit, just a series of patent applications, and legal headaches for one of them trying to keep the products afloat. I do mean to suggest that this is a very common situation — and Bessemer Venture Partners has a quirky take on it: they maintain an “anti-portfolio” of successful projects they turned down. As a word of caution – while the experts may be wrong about your market opportunity… they may be right about the monetization / business viability. Any time someone shoots down your ideas, you should use their arguments both to try and build a better/stronger product, and to test the viability yourself — because they could be right, and may have just saved you from a lot of headaches, grief and capital losses.

# 1 Listen to your users — but be smart about how you proceed.
These days everyone says “Listen to your users” — and you should; it’s a good mantra. However, please remember that you need to analyze what your users say, not just take it at face value. One of my companies makes a lot of product decisions based on user feedback, and we do extensive “User Acceptance Testing” and Focus Groups whenever we want to test out an idea, or launch something new. We always profile / qualify the users who give us feedback to determine what kind of user they are ( ie: super user, industry insider, mass market, etc ), and make note of both what they say and what they do. It never ceases to amaze me how many people think that they’re a super-user — when they’re barely a casual/incidental user; or how many users say that they really love a particular feature, that it is the most important, and they want more things like it — while their usage patterns and other interview questions show a strong preference and reliance on another feature. Listening to your users isn’t just keeping track of what they say — it encompasses understanding what they mean, discovering what they forgot to say, and working with them to enrich their experience.

# Note

I didn’t learn these all at once, and I didn’t make all the mistakes myself. I did make some myself; others were imposed on me by management or partners. In every situation my life was complicated by these issues – and I can only hope others don’t repeat these mistakes.

OpenID is bad for Registration

OpenID is a really useful protocol that allows users to login and authenticate — and I’m all for providing users with services based on it — but I’ve ultimately decided that it’s a bad idea when Registration is involved.

The reason is simple: in 99% of implementations, OpenID merely creates a consumer of your services; it does not create a true user of your system — it does not create a customer.

Allowing for OpenID registrations only gives you a user that is authenticated to another service. That’s it. You don’t have an authenticated contact method – like an email address, phone number, screen name, inbox, etc; you don’t have a channel to contact that customer for business goals like customer retention / marketing, or legal issues like security alerts or DMCA notices.

The other 1% of implementations are a tricky issue. OpenID 1.0 has something called “Simple Registration Extensions”; some of this has been bundled into 2.0 along with “Attribute Exchange”. These protocols allow for the transfer of profile data, such as an Email Address, from one party to another — so the fundamental technology is there.
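
For example, with Simple Registration the relying party just appends a few namespaced fields to the authentication request. A sketch of the raw parameters (the field names come from the SREG extension; all the URLs here are made up):

```python
from urllib.parse import urlencode

# Sketch of an OpenID auth request asking for profile data via
# Simple Registration; every URL below is hypothetical.
params = {
    "openid.mode": "checkid_setup",
    "openid.identity": "http://someuser.example-blog.com/",
    "openid.return_to": "http://my-site.example.com/openid/return",
    "openid.ns.sreg": "http://openid.net/extensions/sreg/1.1",
    "openid.sreg.required": "email",         # fields we insist on
    "openid.sreg.optional": "nickname,dob",  # fields we would like
}
auth_url = "http://provider.example.com/openid/server?" + urlencode(params)

# A cooperating provider sends back openid.sreg.email=... with the
# positive assertion -- but nothing proves that the address is real.
```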

What does not exist is a concept of verifiability or trust. There is no way to ensure that the email address or other contact method provided to you is valid — the only thing that OpenID proves is that the user is authoritatively bound to their identity URL.

The only solution to this problem is for websites to limit what systems can act as trusted OpenID providers — meaning that my website may trust an OpenID registration or data from a large provider like MySpace or Facebook, but not from a self-hosted blog install.
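
In practice, that limiting is rarely more exotic than an allowlist check against the claimed identifier. A minimal sketch, with made-up provider domains:

```python
from urllib.parse import urlparse

# Hypothetical allowlist: only accept OpenID registrations whose identity
# URL is hosted by a provider whose data we have decided to trust.
TRUSTED_PROVIDERS = {"myspace.com", "facebook.com"}

def is_trusted(claimed_id: str) -> bool:
    host = urlparse(claimed_id).netloc.lower()
    # Accept the provider's domain or any subdomain of it.
    return any(host == p or host.endswith("." + p) for p in TRUSTED_PROVIDERS)

print(is_trusted("http://www.myspace.com/someuser"))  # True
print(is_trusted("http://my-blog.example.com/"))      # False -- self-hosted
```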

While this seems neat on some levels, it quickly reduces OpenID to being merely a mechanism for interacting with established social sites — or, perhaps better stated, a more Open Standards way of implementing “Facebook Connect” across multiple providers. A quick audit of sites providing users with OpenID logins limited to trusted partners showed them overwhelmingly offering logins only through OpenID board members. In itself, this isn’t necessarily bad. My company FindMeOn has been offering similar registration bootstrapping services based on a proprietary stack mixed with OpenID for several years; this criticism is partially just a retelling of how others had criticized our products — that it builds as much user loyalty into the Identity Providing Party as it does into the Identity Requesting Party. In layman’s terms – offering these services strengthens the consumer’s loyalty to the company you authenticate against as much as it offers you a chance to convert that user. In some situations this is okay – but as these larger companies continue to grow and compete with the startups and publishers that build off their platforms, questions arise as to whether this is really a good idea.

This also means that if you’re looking at OpenID as a registration method with some sort of customer contact method ensured, you’re inherently limited to a subset of major trusted providers OR going out and signing contracts with additional companies to ensure that they can provide you with verified information. In either situation, OpenID becomes more about being a Standards Based way of doing authentication than it is about being a Distributed Architecture.

But consider this — if you’re creating some sort of system that leverages large-scale social networks to provide identity information, OpenID may be too limiting. You may get to work with more networks by using the OpenID standard, but your interaction will be minimal. If you were to use the network integration APIs, you could support fewer networks; however, you’d be able to have a richer — and more viral — experience.

Ultimately, using OpenID for registration is a business decision that everyone needs to make for their own company — and that decision will vary dependent upon a variety of factors.

My advice is to remember these key points:

– If the user interaction you need is simply commenting or ‘responding’ to something, binding to an authoritative URL may suffice

– If the user interaction you need requires creating a customer, you absolutely need a contact method: whether it’s an email, a verified phone number, an ability to send a message to the user on a network, etc

– If you need a contact method, OpenID is no longer a Distributed or Decentralized framework — it is just a standards based way of exchanging data, and you need to rely on B2B contracts or published public policies of large-scale providers to determine trust.

– Because of limited trust, Network Specific APIs may be a better option for registration and account linking than OpenID — they can provide for a richer and more viral experience.

Why Portability?

Last week I had the pleasure of meeting up with Elias Bizannes of the DataPortability.org project a few times.

One day he asked me: Why portability?

This was my answer:

Data portability is a trick, and a really good one at that. It’s not the be-all/end-all solution some make it out to be; while it offers some groups important advantages, to others it is downright threatening to their business concerns. What makes the concept of portability incredibly interesting, and brilliant in some ways, is the necessary balance of all concerned parties to actually pull it off. At its core, portability is far less about the concept of portability than it is about the democratization and commodification of Social Networks.

Portability is presented as a good thing for users, which it undoubtedly is on the surface. But… and this is a huge “but”: there is an all-important sell-in to the networks — they’re the ones who actually have to implement ways for users to port in and port out. This offering to networks is complicated, because while porting ‘in’ makes sense, porting ‘out’ is an entirely different matter — and one that may be detrimental to a business. More importantly, while open standards and ‘libraries’ may be free, there are real and serious costs with implementing portability:
– engineering and coding costs: using architects, developers and network engineers to integrate these libraries and APIs
– administrative costs: making sure portability works within current legal contracts, creating new contracts, etc

Small / Niche networks look towards portability as an amazing opportunity — with a few clicks they can import thousands of new users, and for small sites integration can be a matter of hours. Under this premise, it makes sense for smaller groups to abide by the democratic principles of portability, and allow for information to port out as freely as it ports in. There is no real downside.

For Medium networks, or Large networks that have lost their prime, portability is a chance to streamline customer retention methods. By keeping profiles up to date, these networks can seem more lively to new users ( i.e. no more messages that read “Last updated in 2004” ) — and they offer existing users the ability to browse the same unified & standardized data in a comfortable environment.

The concept of unifying & standardizing data resonates very well with me — I first tried to convince people this would happen in 2006, and in 2009 it has finally started to catch on. It’s really amazing seeing this happen. Before the advent of social networking, networks competed with one another based on their userbase — people migrated from network to network because of who was on it, a mixture of critical mass and critical usage; the popularity of online networking, portability, and network integration efforts have completely shifted that. Users and content are now the same no matter where you go – and increasingly so. Networks now compete as a layer of user experience and user interface for this data.

For network operators this can — and should — be liberating. The emancipation of users allows networks to stop wasting resources on antagonistic retention methods that lock people into their network… freeing internal resources that can be spent on product improvements, making it easier and better for users to share, connect and interact with others.

Put simply, networks should focus on making products that consumers WANT to use, not products that consumers dislike or despise yet are locked into using for some reason. Whether they’re pushing for portability or not, virtually every social network and consumer website is relying on that lock-in right now, and it’s sad.

The allure of portability to large networks is an entirely different story. On the surface, portability offers little or no advantage to large networks. As shepherds and herders of massive userbases, networks rightfully fear openness as a way to lose the attention of their users. In deliberate steps, and under carefully controlled conditions, large networks have begun to test the waters… dictating how people can use their network off-site through platforming and ‘connecting’, and offering incredibly limiting export options.

Pundits like to use the terms ‘opening the gates’ or ‘tearing down the walls’. I liken this form of tempered portability to ‘testing the waters’ and ‘opening a window’. Large networks are not embracing portability, they’re trying to simulate it on their terms, in ways that best leverage their brand identity and commercial offerings to retain consumer loyalty.

I personally think this is great — but it shouldn’t be called portability or ‘opening up’; this is simply a relaxed posturing.

What I dislike are the grand PR and marketing initiatives around large-scale ‘portability’ efforts. The large firms are all stuck in a cyclical pattern where one group ‘opens up’ a bit more than the last, forcing another group to try and outdo the last. This behavior of metered and restrained openness, and the creation and advocating of new ‘open’ standards that primarily drive the creator’s brand instead of users… this isn’t portability, this is sportability.

Portability and the true Open isn’t about half-assed, ill-conceived standards and initiatives that were designed to create PR buzz and be just open enough to seem like a viable option. Portability is about getting stuff done with the right product, and putting the user first and foremost. We’re unfortunately left with a market-driven approach, where the large networks are in competition to release the least open standards they can, while still outdoing their competition.

While all of this is happening ‘on the surface’, there is a seedy underbelly to it all. Large networks realized an opportunity that they have all been looking towards and investing in — one which may not be so user friendly. Increased portability and inter-connectedness mean an opportunity for better consumer profiling — which translates to better audience measurement and targeting, offering the chance for significant improvements in advertising performance. Portability offers networks a diamond in the rough. I spent several years at FindMeOn developing audience profiling/targeting concepts, and quantifying the market opportunity and potential effects — they are huge. This should be rather unsurprising — you may have noticed that the largest proponents of portability efforts over the past few months are subsidiaries or sister companies of some of the world’s largest advertising networks and inventories.

As a quick primer: Social Networks make their money (if ever) through either subscription or advertising models; most are forced into ad-supported models because consumers just won’t pay. Ad-supported models are at an odd moment in history right now: users have become so accustomed to ads that they tune them out completely — dropping CPMs sharply. The transactional model of ‘do a task, watch an ad, repeat’ was overused so much that it became ‘do a task, ignore an ad, do the next phase, ignore another ad, do another phase, ignore another ad’; no matter what networks do, the previous over-advertising has made a generation of users wholly oblivious to advertising — so some social networks can only get 5-10¢ to show 1k ads of remnant inventory, while others can charge $3 to show the same amount of targeted ads. While that might look like a decent improvement, online advertising elsewhere is doing far better. Behavioral networks can often charge a $10 CPM if they hit a user on a content site, and niche sites or strongly branded properties where ads are purchased as a mixture of direct and endemic advertising can command CPMs of $40 or more.

Social networks are left at an odd crossroads today: once a network grows to millions of users, the brand simply isn’t focused enough to offer reputable or effective endemic advertising; nor is the property likely to be niche enough to command premium CPMs for placement next to highly relevant content. Networks are unfortunately left with behavioral advertising – which should (and would) be doing better right now, if it weren’t for the overexposure/fatigue that users feel. However, portability efforts offer networks the chance to greatly improve behavioral advertising relevance.

So, to summarize my answer to the original question posed by Elias… “why portability?”

> 1. If you’re a small or medium network, you’re going to pick up users.
> 2. If you’re a larger network, having your standard/platform adopted can result in market domination.
> 3. If you’re a larger network, you have the potential to improve advertising revenue.

Perhaps more than a decade in online business and advertising has left me a bit jaded, but I see little that is particularly grand or noble in these efforts. We’re not talking about curing cancer… we’re talking about making it easier to share photos, comment on things, and improving advertising. For industry professionals like myself, these are really exciting times — but let’s do each other a favor and tone down the idealism a bit and admit to / talk about the factors that are really driving all this. Maybe then we can start taking some real strides, instead of all these tiny little baby steps.

A Primer on Web-based Business P&L Operations & Optimization

# Overview

A lot of what I have been doing professionally over the past few years has involved the management of P&L sheets ( Profit & Loss ). For those unfamiliar with the term, P&L sheets are basically a management guidebook for charting how your organization, department, or product spends and generates revenue. Not only do they provide you with the figures that you need for annual reports, for budgeting, and for investment opportunities, but they also facilitate your understanding of your financial strengths and weaknesses, and suggest where you may be able to improve.

The first time that I had to deal with P&L sheets was during the infancy of FindMeOn. I was attempting to forecast corporate overhead and monthly / yearly expenses during our investment pitching, and this stuff was *completely* foreign to me. I resorted to recruiting a specialist onto the team to help me handle this P&L stuff. Unfortunately, this person was only a self-described ‘specialist’, and didn’t have the expertise that they had claimed — so the work fell on me to compute all of the Cost of Sales for overhead and fixed costs. The ‘specialist’ would then review and standardize the data, which amounted to nothing more than a sign-off as ‘correct’. Potential investors were not impressed. Our spreadsheets were ultimately chicken scratches made by a madman (me!) with math that barely made sense to the untrained eye… and I didn’t know any better until a VC friend was kind enough to sit me down and give me a stern “talking to”.

For some insane reason, there isn’t much information out there for budding startup folks to reference on this topic.

Before continuing, I’d like to thank some friends for giving feedback on an early draft of this document:

– Avi Deitcher, founder of Atomic Inc, an operations / management consulting service. Aside from standard consulting , Avi routinely functions as an interim CIO, COO, CTO for startups. He’s one of the few people that I know who recognizes that both Microsoft and Open Source products have their places, and who maintains a brilliant blog on operational strategy at http://blog.atomicinc.com/.
– Rick Webb, Partner and COO of The Barbarian Group, recently named Digital Agency of the Year … again. Brands, agencies, startups, and even Internet Week and the Webby’s turn to Rick for sage advice as the guru of all things online. Rick often blogs about smart stuff over at http://www.barbariangroup.com/employees/rick_webb.

A full P&L sheet for a web oriented company is a huge document, taking into account labor, taxes, overhead, infrastructure, vendors, multiple departments etc etc etc… to give a full summation of the company’s activities. To handle this data, I use an Excel document that has a half-dozen sheets with hundreds of columns and rows to account for everything a company spends / earns / generates. The document is never quite complete, nor is it easy to describe. Explaining a full P&L sheet in one article would be a daunting task, and likely overwhelming.

Instead, this article is designed as a primer — very limited and with a focus on the Cost of Goods Sold and Sales with relation to scalability issues on fixed cost items like bandwidth.

If you really are interested in knowing about P&Ls for tech companies in-depth, in the near future you’ll be treated to some articles on this subject by Avi. You might get a series of articles co-written by us as well — we’re currently working out the details.

Before I move forward, I want to admit to an ‘error’, as my friend Avi has pointed out to me: I conflate several concepts in this overview in my attempts to make this more understandable to the non-Business School crowd. I also gloss over certain important items to make this article more digestible. Specifically, those items are:

– Cost of Sales
  – I focus on the cost of serving up Web content of any type in regards to bandwidth.
  – I do not cover hosting related costs: power, software / cpu licenses, hardware, colocation fees, management overhead, engineering staff, etc.
  – I do not cover corporate infrastructure costs: technologists who build / maintain your product, content creators for your product, a sales force, other employees, office overhead, etc.
– Optimization of Cost of Sales
  – I limit myself to the sections that I cover above
– Revenues from Advertising businesses
  – I introduce this topic due to how it + COS can affect product strategy; however, this may not be applicable to your particular business

Avi also mentioned a very important differentiation that people should keep in mind, which I’ll quote below:

> You should be discussing COS (cost of sales), which is what you use in service, and not COGS (cost of goods sold), which is what you use in products. Yes, I know, many people mix them up, but not the real experts. Allow it to be part of your education of the readers.

So … Avi nailed it, and I admit it — I constantly conflate COS and COGS. In my defense, it’s because the web projects that I work on tend to straddle the two – and we often might build verticals off the same infrastructure as a Service to institutional clients and a Product to individuals (or vice versa).

# Core Concepts

The approach that I will take with this article is to provide a quick and simple primer for people with little or no business background so that they are better able to understand the costs of operating a web based business. Understanding this stuff is *extremely* important to both startups and to established businesses alike:

– Costs are a function of scalability in userbase and in usage. It obviously costs more money to deliver your product to more consumers more times.
– Costs are a function of product design. Your product, at its heart, is really a User Experience. Depending on the way you have designed the product, the architecture, caching, etc, this User Experience that you want to deliver may be either cost effective or cost prohibitive.

Here’s an important rule for you to remember:
> _Your online property – whether it is a service, publication, network, or marketing exercise – is a product._

If you are able to better understand the costs related to your product, you may be able to alter the User Experience or the Product Design in order to make it more profitable. If your web-property is a service, you need to calculate how much you must charge clients in order to break even. In one of my proposed B2B projects at FindMeOn, we realized that a campaign that we wanted to launch with a non-profit was prohibitively expensive — the client couldn’t even cover the projected bandwidth costs, much less any break-even or profit for our labor. In another B2C project I worked on, we realized that we would have to charge users a high monthly fee to be able to provide the service level that we wanted, which led us back to the drawing board to fine-tune the product — in terms of web engineering, application design, and user experience / capabilities.

I want everyone reading this article to walk away with at least one more concept ingrained in their minds:

> _Understanding scaling costs is important, because you may have to rethink your Product Design and User Experience to meet Business Goals._

It does not matter if you don’t understand the technical intricacies of scaling costs for your product — you can always hire a specialist like me to assist you with that. You simply need to understand that beyond labor and servers and overhead required to produce your product, it costs money to deliver the final digital product to consumers — and those costs vary.

# Understanding Costs Through Real World Examples

I’ve always found that the most efficient methods to both learn and to teach are by example, so to illustrate these concepts I’ll utilize a few real world examples that I have encountered over recent months.

In the following examples, I *mostly* focus on making products more cost-effective on a per-page basis based on ad support. Let me repeat this – two of the examples focus on ad supported business models, and in both I refrain from discussing types of ad optimization / targeting strategies. One could easily offset high costs with very efficient advertising or charge-models that allow for a product with a more resource-intensive user experience. Similarly, these examples are based on ad supported revenue as a concern — and note the word concern. This does not mean that the ads are the only revenue, nor that they are profitable or even cover the associated costs — this simply means that it costs money to deliver content, and this cost is at least partially offset through the display of ads.

## Example 1 – A Publishing Oriented Website

A publishing oriented website wanted to deliver a rich user experience through an innovative and rewarding site design that featured lots of high quality images. I was brought in after the prototype was built to their original specs, and I took the prototype for a test drive to derive potential usage patterns and formulate their COGS. What I discovered was not good for their bottom line.

The original version of the site followed a fairly standard paradigm in web publishing: there were 10 articles on the front page, with each one consisting of a block of text and one or more images. What wasn’t standard was the pageload — the articles’ aggregate content and images ended up at around 1.3MB of bandwidth, and the site’s HTML, CSS, JavaScript and associated images were an additional 600k.

The site’s current revenue stream was advertising based, and there were 2 IAB standard ad blocks — that meant that every unique visit to their website cost them an initial 1.9MB in bandwidth, but only generated revenue from displaying two ad units.

Similarly, the performance for repeat users was equally disappointing. Repeat visits usually receive a discount for cached files – html pages, support files (js / css), images or flash media – however, in this instance, the site integrated new content in such a way that caching was not optimized. I calculated that someone who visits their website after an update would incur 700kb of bandwidth and only be exposed to 2 ads on their front page.
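
Some back-of-envelope arithmetic shows how bad this was. The bandwidth price below is illustrative (it matches the 50¢ / Gigabyte figure I use in the spreadsheet examples later in this piece):

```python
# What those pageloads cost to serve, assuming $0.50 / GB bandwidth.
PRICE_PER_GB = 0.50

first_visit_mb = 1.9   # initial pageload, per the audit above
ads_per_visit = 2      # IAB ad blocks shown on the front page

cost_per_visit = first_visit_mb / 1024 * PRICE_PER_GB   # ~$0.00093 per visit
# CPM the two ad units must earn just to cover first-visit bandwidth:
required_cpm = cost_per_visit * 1000 / ads_per_visit    # ~$0.46

print(f"bandwidth cost per visit: ${cost_per_visit:.5f}")
print(f"ad CPM needed to break even on bandwidth alone: ${required_cpm:.2f}")
```

Against the 5-10¢ remnant CPMs mentioned earlier, a ~46¢ break-even CPM means every untargeted front-page visit was served at a loss before any other cost was even counted.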

By understanding the associated costs and business interests, I was able to work with their product design team to reshape the User Experience (UX) and the site-design into something that was more cost-effective – all while maintaining a positive user experience.

– The initial pageload was dropped down to a total of 400k
– Repeat pageloads were dropped down to approximately 75k (after content update)
– ‘Full Story’ content was replaced by a mix of ‘Full Story’ with Teaser / Excerpt content — designed to entice people to ‘click for more’ and create new ad impressions through focus pages.
– Images were tiered to include previews on the front page, medium resolution photos within the focus pages, and a new class of image-focus pages that were ad supported and included the hi-res photo.
– The site was slightly redesigned and optimized to lower the size of support files to 250k

## Example 2 – A Niche Social Network with Rich Media

A pre-beta niche Social Network included the usual mix of social networking fare (profiles, friends, messages) along with some premium content (high resolution photos & videos, etc.) for their target demographic.

Social Networking projects can be difficult to balance because of the nature of the experience and the advertising that they provide — different classifications of pages generate different usage patterns, bandwidth needs and advertising rates. As a very basic overview: Home pages, Profile pages, and published content garner significantly higher rates as they demand more of a user’s attention; “messaging” functionality, modal pages, and gaming/application pages simply don’t monetize well, as they don’t foster user attention – people tend to tune out the ads.

To address this company’s needs, I worked with the product team to achieve the following goals:

– Allowed for higher quality images and a richer user experience on the ‘Tier 1’ advertising pages (profiles and published content)
– Dropped the modal pages to the simplest html possible, while maintaining good design / UX. Migrated as much of the css / js / etc. into external files; aggressively cached and recycled this content onto the function / modal pages.
– Limited the amount of AJAX used. While it affords an amazing UX, AJAX also (typically) means “interactions without ads”. By reviewing and selectively limiting AJAX usage, we were able to drive more Ad impressions without hurting the User Experience.
– Determined ideal video sizes (in terms of bandwidth) to balance revenue goals against. Used interstitial ads and segments to handle videos in excess of this number.
– Decided which products and functionalities could be used as loss-leaders to maintain brand loyalty, and which could not be; set parameters for performance review.

Through simple techniques like those mentioned above, we were able to make the network’s operations considerably more cost-effective — making their P&L sheets more profitable.

## Example 3 – A Social Media Campaign for a Brand

Not all web properties are business entities — many are produced as Marketing / ROI campaigns for brands. Because (micro)sites like these rarely have any sort of revenue stream and exist mostly as an advertising expense for brand loyalty, they’re a bit of an anomaly in terms of streamlining costs.

In instances such as this, I work with the Creative Team ( Digital / Production Agency & Advertising Agency ) and the Brand’s Internal Marketing Team to clearly define a User Story that best meets the brand’s goals (ie: what is the user’s takeaway / brand impact), to list out any ROI indicators, and to define a User Story that best meets the Creative team’s user experience goals.

Many people assume that the brand’s Creative Team and Internal Marketing Team want the same thing — that which is best for the brand. This is really a half truth — while they are both indeed focused on what is best for the brand, the Creative Team almost always thinks about this within the scope of the current Campaign / Project, while the brand’s internal marketing team is also thinking about long term growth and how the current campaign or medium ties into other campaigns or mediums that are simultaneously being marketed to users.

When streamlining a branded social media campaign, you need to first know what the brand’s own goals are (how many people they want to reach, what is the messaging) and then what the Creative Team’s user experience goals are. You are then able to create a technical user story, map it to a clickpath, calculate the cost of each impression, and suggest optimizations that won’t derail the Creative Team’s UX and Campaign while simultaneously meeting the brand’s goals.

# Calculating Costs

To calculate costs for a web-based product, I break things down into 3 phases:

– Define Typical User Paths
– Profile User Path Elements
– Input everything into generic spreadsheet software that can handle multiple worksheets which reference one another.

## Define Typical User Paths

As a start, let’s clearly define what I mean by the term user path:

> A user path (in this context) refers to how a typical user will interact with your web property during an engagement; it is the finite list of pages they encounter in a single visit.

The word / phrase userpath can refer to many things in different contexts – this is what it refers to within the scope of this article.

Note how I said typical user. If you’re a startup founder, I may be about to break your heart…

The internet isn’t a Field of Dreams. While it’s nice to hope that “If you build it, they will come” – you need to realistically know that people won’t be flocking to you in droves like you’re Kevin Costner with a mystical ballpark and the children of the corn. Users are going to trickle in, and more importantly they’re not going to have the usage patterns that you either want or expect.

If you’re reading this article, here’s a surprise — you’re not a typical internet user. (What a great self-selecting audience I have here!). You may be a “Power User”, a “Proficient” user, or some other marketing-speak styled distinction for someone who works in digital media or uses the internet as their profession. Regardless, it doesn’t matter what you *are*… it matters what you are *not*, and you are not typical nor emblematic of internet users as a whole. You think of, use, and interact with websites very differently than most – and may have problems grasping typical user patterns. Let’s try to fix that.

A realistic goal for user experience is this:
– A user will visit your website 3 times a month
– Each one of these visits includes viewing 3 pages

(it is realistic, with a good product, to achieve 5 visits of 9 pages each, but that depends entirely upon the product – these numbers are a safer starting point)

Unless you’re Facebook, Twitter, or some other “Crack App” of the moment, these numbers will apply to more than 90% of your users. I’ve rarely found more than 6% of site visitors to be ‘active’ users for the measured period — and keep in mind that even the most successful web services tout their active memberships to be only about 25% of their registration base.

Fanatical users are a limited bunch — and while your application may be designed for their experience, most of your web traffic will come from casual users.

As a rule of thumb, I aim for numbers that represent the 80th percentile of users — ie, the common experience shared by 80% of your users.

It’s really hard believing numbers like this. I still don’t entirely… but I’m “in the industry” and think of technology differently than most others. I cannot imagine not checking into my major apps a few times a day — and I’m a “member” of dozens more websites than non-industry friends. But I also admit that I too have that old Friendster account that I use for 10 minutes a few times a year, a Perlmonks account that gets a few 3-page visits from me each month when I’m troubleshooting code, and I even jump onto MySpace for a few minutes every now and then. At times I’ve been a fanatical user of each of those services — now I barely think of them in passing.

Truth be told, that’s how most people are… and for every person that reloads your website 100 times in a single day, there are dozens that don’t get past the first page.

If you want hard data on this, I suggest the following:

1. Look at your own server logs
2. Look at any Quantcast / Compete / Alexa score

You’ll see that the numbers come out the same. You’ll likely also see another characteristic — I’ve found that there is a sharp transition between the casual user and the diehard user. It’s fairly amazing… people either love your product or they are ambivalent.

Getting back to the userpath…

With realistic numbers in mind, you need to create some sample clickpaths that entice those 80% of your users to interact with your site. If you’re feeling fancy, make another userpath for the powerusers that you hope to attract. Generating the userpaths is dead-simple — you just click around your site, assuming that you visited the index page, or got to it from a search engine / ad — and try to figure out what users are likely to click on.

Even better — if you’ve already launched (or just don’t want to estimate / forecast these numbers), you can install some tracking & analytics packages then generate a report based on real data. This is my preferred method, however, it cannot be done before there is a beta — and this sort of work is really beneficial when you’re developing an initial product. For pre-launch products, I use information from sites like Alexa, Compete and Quantcast to determine usage patterns of similar-in-functionality or competitive landscape websites.

Userpaths aren’t very complex — here’s one from my publishing example above:

Publishing site userpath:

1. User visits homepage
2. Clicks on Article Focus
3. Clicks on Video Focus
4. Clicks on Related Content button

## Profile User Path Elements

You can profile User Path elements relatively simply with a wordprocessing / spreadsheet program and a browser like Safari or Firefox. Note: Firebug in Firefox and “developer mode” in Safari make this much easier.

For every page in your userpath, you should create a new page / worksheet to include the page elements. I’ll break them down into 5 categories:

– The actual Page HTML
– Included Images
– External JS Files
– External CSS Files
– Rich Media (flash, video, etc.)

I’ll also break down each category into usage classifications:

– is only used on page
– is used on some pages (ie: section pages)
– is used throughout the site

The reason why we classify the elements is that it allows us to understand and to forecast caching for re-usability.

## The Spreadsheet

I use the same basic setup for every spreadsheet that I build: one sheet for the userpath, which references a sheet for each page in the userpath.

The userpath sheet really just contains a listing of all the different pages in a userpath, along with some functions / calculations. We include references to data contained in sheets that are dedicated to breakdowns of each actual page in the userpath. Each page in the userpath gets its own sheet to handle a breakdown of all the content contained within.

### Sample Sheet: Userpath

This is what a Userpath worksheet in my spreadsheet software usually looks like:

| Page Type | Delivery Bandwidth (per page) | Impressions (per Gigabyte) | CPM @ 50¢ | # of pages in visit | Delivery bandwidth per visit |
| --- | --- | --- | --- | --- | --- |
| Index Page | (ref) | (func) | (func) | ? | (func) |
| Article Page | (ref) | (func) | (func) | ? | (func) |
| Focus Image Page | (ref) | (func) | (func) | ? | (func) |
| Focus Video Page | (ref) | (func) | (func) | ? | (func) |

This userpath sheet lists the different ‘classes’ of pages that the users will encounter, along with their calculated bandwidth and the number of times each page is encountered — per-visit. This turns an almost intangible user experience into a graphed representation.

Here is what you need to know about the specific columns:

– we reference the ‘Delivery Bandwidth’ per page type, which lives in its own sheet (see below)
– we calculate the ‘Impressions per Gigabyte’ – how many impressions of this page fit in 1 gigabyte of data
– we calculate the CPM for delivery with bandwidth priced at 50¢ / Gigabyte – how much it costs to deliver 1,000 impressions of this page at that rate
– we enter the # of pages per visit as a number. We can quickly change this when trying to optimize new paths. This is traffic forecasting. Some pages might only get 1 view, while others receive 20 views.
– we calculate the amount of bandwidth used for this page type per visit

I typically compute a few more values on this page too:

– Subtotal for all bandwidth per visit
– how many site visits you can get per gigabyte
– how many site visits you can get at different CPM levels

The userpath page acts as both an index and as a conclusion — mapping out many referenced sheets and summarizing them as well. The only data entered / manipulated on this page is the number of pages per visit — everything else is calculated based on data referenced from other sheets.

The use of many references makes this a live document. As the site progresses, or as I start optimizing (by altering planned site traffic and directing users onto different types of pages to consume the premium content, or by adjusting the bandwidth on each page), the underlying costs and associated income are quickly measured. This approach makes it easy not just to gauge performance, but also to set goals and to devise new ways to meet them.
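
If spreadsheets aren’t your thing, the same userpath math is only a few lines of code. A minimal sketch, with made-up page sizes and visit counts:

```python
# Userpath sheet math; page sizes (KB) and pages-per-visit are made up.
PRICE_PER_GB = 0.50
KB_PER_GB = 1024 * 1024

userpath = {
    "Index Page":       (400, 1),   # (delivery KB per page, pages per visit)
    "Article Page":     (250, 2),
    "Focus Image Page": (150, 1),
    "Focus Video Page": (900, 1),
}

for page, (kb, views) in userpath.items():
    impressions_per_gb = KB_PER_GB / kb
    delivery_cpm = 1000 * kb / KB_PER_GB * PRICE_PER_GB  # cost to serve 1k pages
    print(f"{page}: {impressions_per_gb:,.0f} impressions/GB, "
          f"delivery CPM ${delivery_cpm:.3f}, {kb * views} KB per visit")

total_kb = sum(kb * views for kb, views in userpath.values())
print(f"per visit: {total_kb} KB -> {KB_PER_GB / total_kb:,.0f} visits per GB")
```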

### Sample Sheet: Page Type

For every classification of page, I create a worksheet that reads a little something like this:

| Asset | Kilobytes | Re-use discount | Total cost (in K) |
| --- | --- | --- | --- |
| HTML Size | ? | 1 | (func) |
| Images Used Across Site | ? | .35 | (func) |
| Images Used On Parts Of Site | ? | .7 | (func) |
| Images Specific To Page | ? | .95 | (func) |
| JS Used Across Site | ? | .35 | (func) |
| JS Used On Parts Of Site | ? | .7 | (func) |
| JS Specific To Page | ? | .95 | (func) |
| CSS Used Across Site | ? | .35 | (func) |
| CSS Used On Parts Of Site | ? | .7 | (func) |
| CSS Specific To Page | ? | .95 | (func) |
| Flash Used Across Site | ? | .35 | (func) |
| Flash Used On Parts Of Site | ? | .7 | (func) |
| Flash Specific To Page | ? | .95 | (func) |
| Size of Video File / Stream | ? | 1 | (func) |

Huh?!? You ask. This looks confusing! Well, that’s because it is! Unless, of course, it isn’t! Or maybe it is? N’est-ce pas?

The above chart is really just a template that I use for every classification of a webpage in a userpath.

Classifications are groups of ‘similar’ pages: pages built off of the same general template, or pages with similar size & function on a sitemap. On publishing sites you would have classifications such as Index, Archives, Entries, Entry-Photos, Entry-Videos, and Ancillary; on social networking sites, classifications would be Dashboards, Profiles, Modal / Messaging, Ancillary, etc.

For each page classification, I break down the client-downloaded assets into specific components: the actual HTML, external images, JS, CSS, and rich media. I then classify each of those components by its usage type: items loaded only for that page, items used throughout the page class, and items used site-wide.

By tallying up each type of site content and classifying its usage, we get a clear idea of what is actually sent to the browser on each request, as well as a good reference for optimizing the site further down the road.

Let’s think of a real-world example: imagine a publishing site where you read an article about a celebrity and view a couple of photos of said celebrity at a press function. On this ‘Article’ classification, we can think of page assets as such:

– Used Across Site

The main css / js files; the site design template images, etc.

– Used on Parts of Site

‘Section’ images on the site design – like ‘active’ status on the nav, in-site promotions, category specific design files, any special js / css used for this template, etc.

– Page Specific

The HTML, plus the photos of the celebrity that appear only on this page

For those unfamiliar with caching, here’s a quick overview: every time a user visits a website, they download its CSS / JS files unless there is an active cache. Once a file has been cached, it will not be downloaded again until some sort of expiry (closing the browser, waiting 10 months, the file changing on the server, etc.). If a user clicks on the index page, then a category page, and then two discrete articles, they download each unique file only once.

In order to get these numbers you can do one of two things:

1. For an active site, you can use server logs and analysis software (or regular expressions and some clever scripting) to get accurate numbers for your 80th percentile; see the sketch below.
2. Safari and Firefox both have features that report the media on a page, along with each item’s size and URI (uniform resource identifier). These reports can be coerced into a spreadsheet page with some light scripting or a bit of “copy and find / replace” fun. Using these tools, it becomes fairly simple to figure out what the file loads are, and where the current and potential re-usabilities lie.
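Here is a rough sketch of option 1, assuming a standard Apache/Nginx “combined” log format (request path in the 7th whitespace-separated field, response size in bytes in the 10th); the log file name is hypothetical, and a real pass would also filter by status code:

```python
# Tally the bytes served per asset from an access log, so the heaviest
# (and most cacheable) files float to the top of the optimization list.

from collections import Counter

bytes_by_asset = Counter()

with open("access.log") as log:            # hypothetical log file
    for line in log:
        fields = line.split()
        if len(fields) < 10:
            continue                       # skip malformed lines
        path, size = fields[6], fields[9]  # 0-based indexes: fields 7 and 10
        if size.isdigit():
            bytes_by_asset[path] += int(size)

for path, total in bytes_by_asset.most_common(20):
    print(f"{total / 1024:10.1f} KB  {path}")
```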

To account for caching, I use a discount rate, shown in the above example as .35, .7 and .95. This means that *on average* these items are served from our servers on only 35%, 70% and 95% of requests. Note the word served: the numbers we care about are how many times the content is fetched from our servers, not how many times it is needed. Users have their own caches running locally in their browsers or machines, and networks / ISPs and offices often run shared caches too, so there are many times when a browser requests a file but that request is served by a server / service that we don’t pay for.
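As a minimal sketch of how the discount plays out, here is the ‘Article’ example from above in Python; the component sizes are invented for illustration, and the usage classes map to the discount rates just described:

```python
# Effective per-view delivery for one page classification, after
# applying re-use discounts to each component. Sizes are made up.

DISCOUNTS = {"site-wide": 0.35, "section": 0.70, "page-specific": 0.95}

# (component, kilobytes, usage class); None means always served fresh
article_page = [
    ("HTML",              45, None),
    ("Site CSS / JS",    180, "site-wide"),
    ("Section images",    90, "section"),
    ("Celebrity photos", 350, "page-specific"),
]

effective_kb = sum(
    kb * (DISCOUNTS[usage] if usage else 1.0)
    for _, kb, usage in article_page
)
# 45 + 180*.35 + 90*.70 + 350*.95 = 503.5 KB actually served, on average
print(f"Effective delivery per view: {effective_kb:.1f} KB")
```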

# A Fun Note On: Bandwidth

A lot of people wonder “How much should bandwidth cost?”

While the answer is always a joking “as little as possible”, the actual question is a bit more difficult to answer.

Every service provider charges for bandwidth using a different price, and a different method.

It’s not uncommon to see one hosting company charge 10¢ for a gigabyte of bandwidth, while another charges $3 or more per gigabyte.

Bandwidth is also charged by different units.

Managed hosting companies (they provide hardware, and you pay monthly for exclusive or shared usage) tend to charge in (giga)bytes: you pay for each actual bit of data transferred. Co-location facilities (you provide hardware, and they provide a place to house it) tend to charge in (mega)bits: you pay for the size of the ‘pipe’ needed to deliver your content 95% of the time. Content Delivery Networks (CDNs) use a mix of the two; it’s a crap shoot to guess. Some firms also charge connection or per-request fees for clients accessing the data.

I personally like to price things out at the exact costs of the vendors I am using at the time, but when projecting ‘expected’ expenses, I use 50¢/GB as a ballpark. It’s the midpoint of most companies that I work with (across data-centers, hosting firms, and CDNs), so it gives me a figure in the range my expenses are likely to run.

I also like to convert bandwidth to gigabytes for simplicity. That’s how I serve content, think about storage, and handle CPMs. Call me lazy, but it’s easier to standardize all of my bandwidth-oriented concerns in a single unit of measurement.

So how do you standardize units?

## Quick Reminder: Bits and Bytes

– there are 8 bits in a byte
– when abbreviated, a lowercase b means bit and an uppercase B means byte.

## Quick analogies for pricing models:

GigaByte based pricing – this is like the water company charging you each month based on the total number of gallons you consume.

Connection-Fee charges – In addition to the per-gallon charges, you’re also charged a fee every time you turn the faucet on.

Throughput charges – Instead of billing you for the water, you’re billed for the size of the pipe your house needs to receive all its water.

## About: Throughput-based pricing

Many companies charge for bandwidth based on throughput: the rate, in bits per second, at which you push data out. Your throughput is measured at the datacenter every 5 minutes or so and logged to a database. At the end of the billing cycle, they use something called 95th percentile billing: they drop the top 5% of usage spikes and charge you for the size of the pipe needed to deliver the rest.

If you do a bit of math, you’ll find that a 1 megabit connection has the theoretical potential to deliver about 320 gigabytes a month. That’s at full speed, nonstop, which is unrealistic. In practice, you’re likely to see a 1 megabit connection drive somewhere between 150 and 200 gigabytes a month between standard usage and periodic bursts. So when you need to convert the monthly cost of a megabits-per-second pipe to gigabytes, use 175.
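If you want to sanity-check that conversion, or turn a pipe’s monthly price into an effective per-gigabyte rate, the arithmetic looks like this (the $500 / 10 Mbps figures are just example inputs):

```python
# Convert a megabit pipe to monthly gigabytes, and a pipe's monthly
# price into an effective $/GB using the practical 175 GB/Mbps factor.

def theoretical_gb_per_month(mbps: float, days: int = 30) -> float:
    seconds = days * 24 * 60 * 60
    megabits = mbps * seconds       # total megabits at full speed, nonstop
    return megabits / 8 / 1024      # megabits -> megabytes -> gigabytes

def effective_price_per_gb(monthly_cost: float, mbps: float,
                           gb_per_mbps: float = 175) -> float:
    return monthly_cost / (mbps * gb_per_mbps)

print(f"{theoretical_gb_per_month(1):.0f} GB/month ceiling")   # ~316 GB
print(f"${effective_price_per_gb(500, 10):.2f}/GB")            # ~$0.29/GB
```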

# A Note On: Optimizing

Once you can estimate your initial page bandwidth in a spreadsheet full of related cells, you’ll have a good tool to use when setting goals and working to achieve them:

– Your page class sheets will give you a good idea of what needs to be optimized, and what does not.
– Your userpath sheet can have the page-visit counts for each item altered to model different ways of herding users
– You can make multiple workbooks, userpath sheets or page-class sheets to test out different concepts.

My personal tips:

– I often clone a page and experiment with how I can better leverage cached images on the site to try and lower the bandwidth cost
– On my userpath sheet, I’ll often have multiple user paths that each look like that first table. They reference different pages (often alternatives to existing pages) and carry different pages-per-visit numbers, as I model different potential user experiences.
– On ad supported websites, on the main P&L sheet, I group these bandwidth costs with the costs for the server infrastructure that supports them AND the ad revenue generated since they all affect one another. I generally leave the labor out of this equation and keep it with the corporate labor costs.

# A Note On: Advertising

I use the phrase “Ad supported” quite a bit, so I feel the need to remind you about the types of advertising revenue that you can realistically expect to generate. Realistically is the key word.

If you’re a niche site and you have a good team driving your company, you could be aiming for a $25 effective page-CPM (the aggregate of all ads on the page) when you are functioning at your peak. While you may command $75-100 as an ideal page-CPM rate, after you deal with sales and inventory concerns (you’re not selling 100% of your adspace at the numbers you command) your average will be around $25 — if you’re lucky.

There are plenty of other costs associated with running your company too – you have to pay for content production, licensing fees, your office space, lawyers, accountants, marketing, etc etc etc. Making your ad supported operations as efficient as possible is really important, because you not only have to cover the costs of delivery — but of production, overhead, and then hopefully profit as well.

Here’s another startup reality check:

For all other sites, and especially for the non-publisher content pages of niche sites, you should aim for a $2 CPM and feel happy if you get it. You should be ecstatic if you make anything more. Even with the most advanced targeting and tracking, you might bump up to $4-8 if you’re very lucky. The industry standard right now is well under $2 for ads that very few people notice on websites that very few people care about; unless you have a massive PR initiative behind you, that describes your web project.

Many large social networks are selling overflow / non-targeted CPMs in the 10¢ to 75¢ range, and only command more money on their less-used advanced targeting buys or site sponsorships. The more targeting you have, or push for, the more infrastructure you need on your side: either specialists to handle sales, or technologists to implement third-party tools. So while it is possible to make more, you will absolutely need to spend more in order to do so.

In essence, you will not get a $25 CPM out of the gate for premium content pages, and you will likely be making under a $2 CPM for everything else.
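To tie this back to the bandwidth spreadsheet, here is a back-of-the-envelope sketch comparing page-CPM revenue against delivery cost; the 500 KB page weight and the CPM figures are illustrative assumptions:

```python
# Does a page's ad revenue cover its delivery cost? A quick margin
# check per 1000 impressions, using the same 50¢/GB bandwidth price.

KB_PER_GB = 1024 * 1024

def delivery_cpm(page_kb: float, price_per_gb: float = 0.50) -> float:
    """Cost to serve 1000 impressions of a page weighing `page_kb` KB."""
    return 1000 * (page_kb / KB_PER_GB) * price_per_gb

def margin_per_1000(page_cpm: float, page_kb: float) -> float:
    return page_cpm - delivery_cpm(page_kb)

# A premium niche page at a $25 page-CPM vs a generic page at $2,
# both assumed to weigh 500 KB after caching discounts:
print(f"${margin_per_1000(25.0, 500):.2f} net per 1000 views")  # ~$24.76
print(f"${margin_per_1000(2.0, 500):.2f} net per 1000 views")   # ~$1.76
```

Notice that delivery is a rounding error next to the revenue spread: the hard problem is earning the CPM, not paying for the bandwidth.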

# A Note On: Application Priorities

I noted before that 90% or more of your users will likely be casual, accounting for about 3 visits of 3 pages each per month. What I should also add is that it is not uncommon to see 70% or more of a site’s traffic driven by one or two URLs, such as the homepage, a profile/dashboard page, or a latest-news page.

It’s important to use analytics tools to figure out exactly what these pages are, and to be extremely aggressive in your optimization and monetization strategies there: performance on these pages can make or break your business.
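A quick sketch of what spotting that concentration can look like, assuming a hypothetical analytics export with `url` and `pageviews` columns:

```python
# Rank URLs by share of total pageviews, with a running cumulative
# percentage, to see how few pages drive most of the traffic.

import csv

pageviews = {}
with open("analytics_export.csv") as f:      # hypothetical export
    for row in csv.DictReader(f):
        url = row["url"]
        pageviews[url] = pageviews.get(url, 0) + int(row["pageviews"])

total = sum(pageviews.values())
cumulative = 0.0
for url, views in sorted(pageviews.items(), key=lambda kv: -kv[1])[:10]:
    share = 100.0 * views / total
    cumulative += share
    print(f"{share:5.1f}%  (cumulative {cumulative:5.1f}%)  {url}")
```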

It’s also important to identify these pages for prioritization in your Product Management and Design. These pages can drive most of your revenue, and set the tone for your user experience – leading to stronger connections with your userbase. You will often want to prioritize the integration of new user and management features into these pages.

You’ll also want to keep constant metrics on your application’s usage — your userbase demographics will change as you scale, and users often shift their own patterns as your competitive landscape changes.

A company I recently worked with realized that 72% of their traffic was coming from 3 pages that shared the same admin tool, despite earlier patterns of heavy usage on another part of the site. This resulted both from a 20x scaling of users between their startup and production phases, and from the introductions and failures of other players in their space. By keeping tabs on their metrics, they were able to fine-tune their product: instead of spending their finite cash and resources to enrich the overall user experience throughout the site, they focused on streamlining the management operations for the new high-traffic pages. The result? Those pages became even more popular with users and cheaper to maintain, providing increased revenue to overhaul the rest of the site and a larger, more loyal userbase to advertise the new features to.

# A Note On: The Numbers

This is not an exact science.

Nothing in this article is exact. Many of the figures are based on forecasting usage patterns. When actual server logs are used to show actual user / download patterns, their numbers vary — and are accurate for their site and their site alone.

The point of this exercise and skill is to determine the likely *average* usage patterns, and to optimize a web service to meet them.

These figures also do not take into account a lot of VERY important ideas:

– You may need to scale servers (adding machines to offset CPU and memory stress); each additional server can cost between $20 and $1000 a month to operate
– This does not include any sort of labor costs — no engineers, content producers, management, or anything of the like. You will often need to scale your labor to accommodate more consumers.
– There are tens, hundreds, thousands of obvious things that I’ve omitted from this overview

# A Note On: Startups – Being Scared (or Not) and Entrepreneurial Spirit; or, How Much Should You Budget?

I wrote this article to address the concerns of companies focused on generating revenue through web applications. I did not write it to talk about making cool and kick-ass webapps.

If you’re trying to build a business – one with employees and healthcare and offices and a water cooler – you need to be thinking operationally and about income streams. You can have the world’s greatest webapp, but if it can’t support itself or its staff — you just don’t have a business, you have a hobby.

There’s nothing wrong with that; not everyone is trying to build a company at the outset. If you’re a startup person, there are some good, proven ideas in this article that can help you cut costs and be more efficient. Are you trying to build the next kick-ass webapp for fun, or in your spare time? If so, you can forget about 90% of the things I’ve written here: your application can probably scale to ~100,000 users without you having to worry about bills.

My friend Rick Webb recently blogged about Dodgeball.com’s demise: their product was great, the users were loyal, and the costs were low enough that even he could cover them himself. But the company wasn’t able to be financially self-sufficient, so the company that acquired them shut it down. This is probably a bad example, as it has more to do with the parent company being inept in its handling of the acquisition, its technologies / capabilities, and, most importantly, its team.

If you’re building something like Dodgeball just to build it, then do it. Don’t look back. Have fun, learn some things, and maybe turn it into a company down the road when you meet the right Sales and Operations team. But if you are a company, or you have a technology that you’re trying to turn into one — think for a moment on how you can be profitable, and if you can even achieve that.

I lost a lot of money with my startup FindMeOn, and I learned some tough lessons: we focused on product development when we should have focused on business development, and we focused on business development when we should have focused on product development. It sounds like a Catch-22, but it’s a simple idea: much like a seesaw on a playground, you need to find a balance between what you want your product to become and what you need it to be as a source of actual income. So instead of capitalizing on the technology we invented to integrate social networks and port data, I found myself pulled every which way, trying to make the product better and to create revenue streams at the same time. Small teams can’t handle that successfully; you need focus. The result was us stopping work on the product and just amassing an army of lawyers to handle a portfolio of patents.

If I knew in 2005 what I know now, things would have worked out very differently.

When Rick first gave me feedback on this, he recommended that I address the ballpark costs involved in covering a startup – so I wanted to conclude on that note.

I thought about the issue for some time, and came up with my final thought: getting a startup off the ground will cost you everything you’re willing to put into it; nothing less, nothing more. If you care about your business, you’ll find a way to make things work, or fail trying.

For example: If you are a small firm, or a beta product, you’ll easily find a host that gives you ‘unmetered’ bandwidth until you serve 500GB for well under $50 a month, and not much more after that. If you’re a large firm with an inept IT department, you’ll get the same package for more than $500/month from a ‘premier’ host — one that also charges you $5/GB for bandwidth over your allocation.

Technology and service options have grown so much in the past 4 years that you can find ways to make anything work. In 2005, we spent nine weeks coding a scalable image and document serving / storage system from scratch; in 2006, it was rendered completely useless by Amazon’s S3, which handled not only serving and storage but backup as well. Their solution wasn’t just better than ours; it was far cheaper, even before adding in all the engineering, maintenance, and hardware that we no longer needed.

Of course, many people thought that Amazon charged “too much” to serve images and was unreliable in general. Panther Express came out of nowhere (founded by the original DoubleClick team) and offered CDN services for the masses, targeting disgruntled Amazon customers as one of their demographics. Panther quickly rose to prominence, bringing an enterprise-level CDN like Akamai to the masses at a much better price point and integration cost. Today there are also companies like Slicehost that offer virtual servers for $20/month, versus $200/month for similarly specced dedicated hardware not too long ago.

So if you’re a startup: where there’s a will, there’s a way. Covering your fixed costs will almost never be an issue. If you do ever have this problem, you’re probably wildly successful and in prime position to quickly receive funding or a selloff based on your userbase alone, and you can worry about becoming profitable then. (On the flip side, you might also have an app that wastes bandwidth or is not realistically profitable, and you should rethink your product and approach.)

# A Final Note

Just to illustrate how crazy numbers can be: In a single month, friends who had a design firm spent $7,000 on their web hosting charges; other friends who run a global advertising network spent under $2,000. I’ve seen amazing amounts of traffic served on $20 virtual servers and on $150 dedicated servers.

If you are a startup and you want to build an application, just find a way to do it. It exists, I assure you. The costs of scaling your product are negligible; the only substantial costs will be your own labor and time. If you get to the point where you need to worry about cost optimization, you already have a VC check in your hands.

If you are a startup trying to be profitable / sufficient from day 1 — and there are a lot of you out there now — your tech costs should be well under $1,000 a month if you have under 100,000 active users. If you have more than that, optimize your application and get an advertising specialist or new VP of business development in your office — because you should be making money. You need to think about these numbers not for daily operations, but for making yourself look profitable for investment or acquisition.

And finally – if you are a brand thinking about running a digital project, think of your scaling costs in terms of CPMs — how much are you willing to pay per user impression for the visit? You’ll probably have to use some sort of ‘premier’ web host and CDN, which will cost between $1,000 and $3,000 on setup / monthly fees alone. Beyond that, you’ll be paying significantly less per CPM to maintain a good relation with your customers than you would on advertising CPMs to attract new ones.

Exploring Affiliate Marketing

A few of the startups I’m working with are going the Affiliate Marketing route to create new revenue streams.

As an experiment, I’ve decided to try and integrate that on my blog as well.

People who read me tend to have money and like shiny things – so why not put a big ‘BUY ME NOW!’ button here? Just to the right…

Thoughts on Open Source, Open Standards, and Online Advertising : Data Sportability Pt 2

In 2005 I started FindMeOn after noticing some serious flaws in the use of OpenID. The base of the system grew out of the identity & publisher-syndication components of a music website I had been working on with friends for a few years. When the music project went on hiatus, I decided to flesh out the identity system into its own entity. I wanted FindMeOn to be a full-fledged standalone / open source project enabling secure online identity management and syndication, because I truly cared about that and no one else did at the time. On the flip side, years in marketing had taught me the marketing value that identity information could deliver, so the system was designed around a revenue model that gives brands & ad agencies better insight into their consumer distribution across networks.

From late 2005 to mid 2006 I met with dozens of agency execs, online experts and VC investors to vet my concept, and I learned that my monetization scheme wasn’t enough: everyone required a higher monetization potential from it. By April 2006 the answer was clear: FindMeOn was not just going to offer cross-site information for dispersion intel, but for social demographics and online advertising as well… selling targeted advertising or media-planning services.

I spent the next few months learning how the entrepreneur in me could reconcile open source beliefs with unadulterated American capitalism.

Maybe I’m wrong, won’t you tell me if I’m coming on too strong
==============================================================

With this in mind, I offer the following industry commentary. Keep in mind that this is pure conjecture from research and analysis; I can offer this only as insight not fact — but I am certain that it is accurate.

As I mentioned in my followup to DataPortability Podcast #5, the Facebook management team was absolutely brilliant in conceiving their API strategy. I will easily credit them with getting the whole portability thing rolling by releasing their API, which set the precedent of a platform API that users and developers would adopt en masse. It was working so well that Facebook was gaining tons of user activity within-site, and gaining new developers to build applications FOR them. Facebook was also becoming a much bigger threat to their competitors than previously thought…

MySpace and the other major social networks suddenly had an entirely new level to compete on. While those networks were constantly shifting between friend & foe with third-party developers (blocking their widgets, announcing partnership deals, repeat), Facebook, who had previously kept all widgets off their network, suddenly had a robust *platform* dedicated to widget/app developers that was the darling of the internet community. Facebook was suddenly making developers happy, making users happy, and, most threatening of all, setting the bar with a giant head start in this new ‘economy’.

Lurking in the background was a stealthy figure realizing it would soon need to compete against Facebook: Google. Why? Well, the search/advertising giant wasn’t worried so much about Facebook as a social media competitor, but about what intelligence gleaned from social media could power: online advertising.

Here are some neat facts about the social media advertising market in the US in the Summer of 2007.

– Social media advertising is the fastest-growing segment of internet advertising (as social media is the fastest-growing segment of the internet).

– The 2008 projections for social media ad spends are around $800MM; the 2009 projections are $1,300MM; and $1,900MM in 2010.

– Social media is probably the worst-performing sector of online advertising. As an illustrative figure: it’s responsible for 90% of impressions, but only 10% of revenue.

A well-optimized online publisher, like the New York Times, commands hefty eCPMs (effective costs per 1,000 ad impressions) upwards of $20, with a rumored $85 eCPM page monetization. MySpace is somewhere between a 10¢ eCPM for a generic buy and $2.00 for an ultra-optimized query; not very impressive.

Facebook has long been one of the best-monetized social networks, consistently demanding eCPMs in the $1.50 to $8 range. A rumor circulated in the summer of 2007 that the Palo Alto firm was developing an off-site advertising network to display ads across the internet based on cookied data about their users. This is what Google was scared of.

As more population demographics adopted the Facebook platform, this rumored ad system increasingly jeopardized Google’s position as the internet’s premier ad network. Even more troubling, Google knew that Facebook had the talent and power to develop this competition: they weren’t just a large firm, they were recruiting the new employees Google wanted first, and even hiring key staff members away from Mountain View.

There Ain’t No Second Chance Against The Thing With Forty Eyes
==============================================================

Google and MySpace had to respond, and act fast. So they came up with a daring little plan: they teamed up to sketch out a competing platform, roped in a couple of other networks threatened by the burgeoning Facebook, and set out to win with sheer numbers. Since Facebook had a ‘closed’ platform, Google decided to ‘open’ things up to foster adoption, with tons of “open standards” and “open source”, even calling their system ‘OpenSocial’. Through the use of the word “Open” everywhere, plus multi-network capabilities, this new alliance of once-enemies-now-friends gathered against the mighty Facebook would hopefully woo more developers to the ‘OpenSocial’ market, stagnating Facebook’s platform growth.

As a quick side note, Google’s OpenSocial project sounds a whole lot like FindMeOn’s “Open SN (Open Social Network)” in both function and name. One would think their army of patent and trademark lawyers would have ‘googled’ their own product ideas for clearance…

Since everyone was trading punches over being more open and more awesome than the other guy, Facebook quickly had an equally brilliant reply — they subsidized free hosting through a partnership program with Sun and Joyent, started giving out cash grants to spur development, and their backing investors started a new VC fund focused solely on Facebook applications. Take that! said Facebook as a sea of developers eagerly built products for their platform.

You’ve got to roll with the punches to get to whats real
========================================================

Over the next four months, a plethora of large scale announcements would come from Google and Facebook as new players jumped into the fray.

Google decided to make OpenSocial a non-profit venture to bolster PR, even pulling Yahoo into the relaunched initiative; the announcement was met with praise from many tech pundits, who talked about how wonderful the concept of a non-profit was. Predictably, everyone likened the initiative to civic-minded non-profits, and no one suggested the more relevant correlation: registered non-profit industry lobby fronts like the ‘National Smokers Alliance’ or ‘Global Climate Council’ that pipe tobacco and oil dollars into misleading consumer campaigns. Who can forget 2007’s hit webformercial “Carbon Dioxide: Some call it pollution, we call it Life”?

Nothing short of a ‘pissing match’ started between the large tech giants. In an almost round-robin fashion, each company would announce a new product that somehow ‘outdoes’ the last announcement from a competitor. Facebook expanded privacy controls, Google announced a ‘Social Graph API’, Microsoft jumped in with their ‘Windows Live API’, and MySpace teamed with Yahoo and eBay to do ‘Data Availability’. Every other week a new batch of PR announcements and partnerships is released, each accompanied by a hastily created set of documents and big-name backers, incorporating one or more open standards while creating a few new ones.

These initiatives have been designed so hastily, and so half-assedly, that I wouldn’t be surprised if we soon learn that half of these products came solely out of the marketing departments, and that the technology teams never saw anything until after a press announcement.

Today, *everyone* has an Open Standard and an Open Platform — myself included — which begs the obvious question: what good are open standards and platforms, if everyone has a different one? And are things really open when their main purpose is to further a proprietary system?

Perhaps more importantly – how many of the tech giants have collaborated with third-party developers to define these new Open platforms?

The industry’s modus operandi seems to be

1. BigTech decides what to open up and how
2. BigTech invites top widget makers / networks to be launch partners
3. Third-party developers are then told: “So this is how you’ll use it. Welcome to the new status quo. Happier?”

Now I could be wrong — I’m three thousand miles removed from the SF bubble where all the ‘Open’ decisions are made — but I’ve yet to hear of any interactive agencies, dev shops, or brands who build/finance most of the ‘widget’ development being included in these conversations. I’ve been meeting with them non-stop to try and rectify that — and as of yet, no one I’ve met has even been polled by a large ‘platform’ for their input.

Thoughts on Open Source, Open Standards, and Online Advertising : Data Sportability Pt 1


Note

This is the first part of a series that I have been working on for a few weeks. The current combined text is 6,000 words – so I’m releasing it in sections.

Apologies to those who have been expecting this sooner: I originally wrote it in early/mid May, but have been too busy with business to work on editing.

Preface
=======

I’ve been using a new term when I talk to people of the internets: Data Sportability. I use it to describe how sporty and flashy ‘data portability’ is, and how that flashiness and sportiness is the true essence of this new ‘movement’ (note: I mean the general movement of data portability, not the DataPortability working group).

The utopian picture of interconnected networks, with data sharing, integration and portability abound, is indeed something beautiful, but it’s just a veneer. Beneath the surface, or more aptly ‘under the hood’, it’s a vicious fight over who has the fastest car, the biggest engine, the latest fuel-injected cooling systems… you get the idea.

Like most services on the internet, Data Sportability isn’t about the end user, it’s about the big networks and service providers… and who has the coolest car.

I’m hoping it picks up, so people other than my friends know what I’m talking about.

Interested? Read on!

Too hot to handle
=================

Unless you’ve been living under a rock, “Data Portability” is the hottest thing to hit the internet since the Paris Hilton sex tape… and as we all know, in Paris’ own words, “That’s Hott!”. Also very much like Ms Hilton, portability is nice and pretty on the outside, but deeply troubled on the inside.

Here’s a quick history lesson:

Two years ago, the internet was a pretty different place than it is today. There were only a handful of major social networks, and most people (users, pundits, experts) looked at minor networks, niche ones (example: CafeMom), and social applications (example: LastFM, Flickr) with utter contempt. The major networks were also doing everything in their power to ‘lock’ users into their systems, completely blocking images/videos/widgets etc. from appearing on user pages whenever a service like YouTube or PhotoBucket had a popularity spike.

Thanks to technical innovations that lowered the barriers to entry, and whitelabel services like Ning and KickApps, everyone and their mother has a social network of their own today.

To maintain the loyalty of their userbases, then in the tens of millions, all the major players are quickly adapting with standards, platforms, and press releases touting how ‘open’ they are. Companies that recently charged users subscription fees to access their walled gardens are suddenly embracing openness and pushing for new paradigms in the industry. And the pundits and network evangelists… they simply *love* talking about integration, open standards, and data portability (as either the base concept or the new standards group ‘DataPortability.org’). That only raises the obvious question: why have so many groups made a complete 180° turn?

The popular response ( aka: the public relations soundbite ) is that the networks are now proudly putting their users first; that we’ve all grown together, learned from our mistakes, and the old marketing department heads / decision makers have been replaced with new evangelists… embracing open standards and cooperation; Rainbows are everywhere and unicorns have magically appeared, frolicking in the streets.

Kool-Aid seems to be the most popular drink around.

It’s all about the benjamins
============================

Let’s be real for a second- the social internet isn’t about connecting people, it’s about monetizing their experience. Anyone who tells you otherwise is lying or stupid.

Once upon a time (or just nine months ago), social networks weren’t all that different from cellphone carriers in the way they operated: they locked you into a contract/network, made it a pain-in-the-ass to communicate with people on other networks, and basically held you hostage so you wouldn’t leave. If you finally managed to figure a way out of their maze, they would magically offer you every single premium imaginable to stay.

A few years ago the US cellphone industry got regulated, and users could finally port their phone numbers from one carrier to another. Citizens embraced this as progress at last… but they didn’t realize it came at the expense of some shady stuff behind the scenes, thanks to line items and back-room deals from industry lobbyists. After years of resistance the networks didn’t actually ‘cave in’; they knew they eventually *had* to give in, so they figured out ways to handle it on their terms, protecting their own interests.

Data portability is pretty much the same, perhaps a bit more duplicitous… as a ton of extremely corporate interests are neatly packaged in a pretty little user friendly PR campaign. Data portability isn’t about empowering a user, or promoting open source and open standards — it’s about data mining, user tracking, and advertising efficiency.

I know because I’ve been there and done that; I helped write the playbook. My company FindMeOn was one of the first out of the gate selling the ‘Data Portability’ illusion, and over the past 9 months every single big tech firm has gone through the exact same growing pains and learning curves that we did: they’ve released the same exact technologies, in roughly the same order, even using roughly the same names.

So I’m going to talk about what FindMeOn was really up to all along, and explain what the new players in this arena are really doing — it’s anything but the grand illusion of user control. In the process I’ll predict the next few developments from bigtech, dispel some illusions, and recontextualize this faux openness into what it really is – internet marketing, plain and simple.

Some may point out dozens of pundits and developers who have only the best intentions. To that I say: sure they do, but look at who pays their bills and funds their research; it’s that way for a reason!