Digital publishing is a tough business. Monetizing content is done through the dual streams of advertising and audience revenue that have supported offline publishers for centuries, but new middlemen and platforms have disrupted both of these revenue categories, with each new entrant to the value chain claiming their piece of the digital revenue pie.
Digital advertising is rife with new technology vendors that promise to increase revenue through targeting and extending reach, but the effective cost-per-thousand for digital impressions continues to fall. Audience revenue is challenged by the plethora of free content that competes with paid offerings. Although improving, pure digital audience revenue is still in its infancy as a category, and the volume of pure digital subscribers is still small relative to historical print levels. The question facing publishers is how to maximize total revenue from their digital distribution channels across both primary revenue streams. An important tool available to publishers trying to answer this question is dynamic customer lifetime value scoring.
Customer lifetime value (CLV) is a (relatively) old concept. Calculating the expected operating margins received from a customer was done long before the birth of the Internet. For a digital publisher, measuring CLV requires knowledge of the revenue received from a customer and the revenue received from advertising delivered to that customer. In most cases, the direct costs for a digital customer are close to zero, which is part of the magic of digital delivery platforms. Thus, digital CLV is really a revenue measurement and active lifetime forecasting exercise. Where most digital publishers run into trouble calculating CLV for their audience is in obtaining data on both revenue streams for each customer and in forecasting a customer’s active lifetime.
Let’s Talk About the Data Challenge First
Digital advertising revenue data typically comes from the advertising server and the billing system. Many publishers use Google’s DFP advertising server to deliver digital impressions and their DART sales manager (DSM) to record revenue. Other ad servers and billing systems are similar in their capabilities. The digital impression delivery data is organized in a manner consistent with how the impressions are sold and delivered.
1. New Analytics Tool now Available
Digital audience traffic data often comes from Google Analytics, which has both a paid and free version. Omniture is another common source for digital audience traffic data. The data from both of these products are organized according to how the site is tagged, which is rarely consistent with the way the advertising data is structured. Merging data from these sources often requires aggregating both data sets to a common level of reporting, often to the day, which is not a very granular level for analysis. To address this problem, Mather Economics has developed a tagging system that captures all the data needed for CLV and other analysis for each unique visitor. We call this tool Listener™.
The second challenge for measuring digital CLV is forecasting the expected active lifetime of a subscriber. In many cases, publishers use historical retention as a guide to future behavior, which is a reasonable approximation in most cases. A best practice is to develop a forecast algorithm using survival modeling, an econometric technique developed in the health care field, that can adjust active lifetime forecasts for subscribers in response to changes in factors that affect retention, such as price changes, retention campaigns and product enhancements. If a CLV score is calculated using a forecast algorithm instead of a static historical retention curve, we call it a “dynamic CLV” score.
2. Monetizing the new Intelligence
Once digital publishers have an effective CLV metric in place, they can monetize that investment through their customer acquisition efforts, which is an increasingly important component of the digital publishing business model (see linked article on digital publishers eschewing ad tech.) An effective customer acquisition tactic for digital publishers is to offer content for free to prospective subscribers. This can be done through a “freemium” model or metered access. In addition to these models on their own site, social media platforms are now offering publishers a revenue share on advertising impressions delivered adjacent to content publishers provide to the social media companies. CLV can inform publishers as they decide what content should be only for paying customers and what content should be offered for free.
To walk through a sample CLV calculation, let us evaluate the revenue from a non-subscribing visitor who reads an average of 50 article pages a month, each with four advertising positions at an average eCPM of $8. This visitor is generating $1.60 in digital advertising revenue per month (50*4*$8/1,000). If this publisher limits access to 20 articles per month and charges $9.99 for monthly access, they are putting 120 advertising impressions at risk (the 30 articles above the limit, with four positions each), or $0.96 per month (120*$8/1,000). If there is a 3 percent conversion rate for subscription offers at a 20-article level of free access, the publisher is putting $0.96 in advertising revenue at risk for $0.30 in expected monthly subscription revenue ($9.99*3%). We can add a time dimension to this analysis by estimating how many months of active subscription life this publisher can expect from the subscriber versus how likely the non-subscribing reader is to continue coming to the site in the future.
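The arithmetic in this example can be reproduced in a few lines of Python. This is only a sketch of the single-month comparison described above; the variable and function names are ours, and it assumes, as the example does, that every page view above the meter limit is lost.

```python
# Inputs taken from the worked example in the text.
ARTICLES_PER_MONTH = 50
AD_POSITIONS_PER_PAGE = 4
ECPM_DOLLARS = 8.00        # revenue per 1,000 impressions
METER_LIMIT = 20           # free articles per month
MONTHLY_PRICE = 9.99
CONVERSION_RATE = 0.03


def ad_revenue(pages: int, positions: int, ecpm: float) -> float:
    """Monthly advertising revenue in dollars for a number of page views."""
    return pages * positions * ecpm / 1000


current_ad_revenue = ad_revenue(ARTICLES_PER_MONTH, AD_POSITIONS_PER_PAGE, ECPM_DOLLARS)

# Page views above the meter are assumed to be lost entirely.
pages_at_risk = max(ARTICLES_PER_MONTH - METER_LIMIT, 0)
ad_revenue_at_risk = ad_revenue(pages_at_risk, AD_POSITIONS_PER_PAGE, ECPM_DOLLARS)

expected_subscription_revenue = MONTHLY_PRICE * CONVERSION_RATE

print(f"Current ad revenue:            ${current_ad_revenue:.2f}")
print(f"Ad revenue at risk:            ${ad_revenue_at_risk:.2f}")
print(f"Expected subscription revenue: ${expected_subscription_revenue:.2f}")
```

Changing the meter limit, price or conversion rate shows how quickly the balance between the two revenue streams can shift for a given visitor.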
3. Examples of Applying CLV
To illustrate an application of CLV by a publisher, we can review the case of a digital publisher in a major metropolitan market in the United States with two Major League Baseball teams. This publisher has a large digital sports audience, and we helped them analyze the audience to evaluate the potential for a digital sports content product. We found that the audience was large enough to support a digital sports product as an add-on to their core publication. The most interesting finding from the project was how much the economics differed for the two baseball team fan bases, which determined how much content should be offered to each group for free to maximize total digital revenue.
One of the teams had a digital audience that was largely national in its distribution, while the other team’s digital audience was almost exclusively local. From an advertising revenue perspective, that meant that the team with a local audience generated much more traffic that could be sold through the direct sales force to local advertisers at higher eCPMs. The other team had about half of their digital traffic from fans living outside the local metropolitan area, which was sold through programmatic channels at a lower eCPM.
From an audience perspective, the team with the national audience had a higher propensity to subscribe, in part because the out-of-town audience was eager to have access to the coverage, in part because they were outside of the print distribution area and their local sports coverage likely does not cover this team in detail. They also had demographic characteristics that were found to be indicative of subscription buyers. The other team’s fans could read the coverage through the print platform or through other local coverage, so they had less demand for access to digital coverage. Also, they tended to have characteristics indicative of a group less likely to subscribe, such as a younger age profile and a greater share of mobile content consumption.
So, How does CLV Help this Publisher Decide What to Do?
The CLV calculation demonstrated that fans from the team with more local audience should get more free content than fans from the team with a more national audience. We found that the opportunity cost of lost advertising revenue from a more restrictive access policy to the local audience outweighed the likely additional subscription revenue that would be realized. The opposite was true of the more national team’s audience.
Of course, the level of free access to the sports content is not set by team affiliation. It is possible for this publisher to determine the level of free access to this enhanced sports coverage for each unique visitor (grouped into segments) and for the level of free access offered to be a function of whether the visitor is in-market or out-of-market, what platform he is coming from, his overall digital engagement and other characteristics. The revenue-maximizing level of access also can be determined by what his likely retention would be once he was acquired as a subscriber and how he would likely react to future price increases once he had reached the end of the promotional offer.
As we like to say, data by itself is worthless. What you do with the data makes it valuable. A dynamic CLV calculation is a robust analytical approach for making profit-maximizing strategic and tactical decisions. The incremental profit created by these decisions should yield a substantial return on the investments in data and analytics made by digital publishers.
Three Primary Items Needed to Properly Measure CLV
1. Get the data
But not just a report of how many visitors came to your site over a 30-day period, which any of the analytics tools out there will show. The key here is to have granularity in both user behavior (data for each user and each page that the user consumed) and advertising delivery (data for each ad impression as it is being delivered to each user). Using Listener™ (which was built specifically for the publishing industry), detailed information on online activity can be captured. It works similarly to Google Analytics or Omniture, but the focus is really on individual users and how they engage with the site. Every event on a page is tracked and reconciled to an individual user, whether he is anonymous or registered. The primary events captured are ad impressions and paywall activity (i.e., which users were prompted to subscribe and how far into the conversion funnel they went), but really anything that can happen on a website is in scope. The primary reason for preferring Listener as the tool of choice is the ability to get a unique list of users (both known and anonymous) to analyze and match with other data.
To get a complete view of advertising performance at the user level, an integration with an ad server is required. Many ad servers (such as Google’s DoubleClick) will allow third-party tagging in a custom creative, and Listener uses this functionality to capture detailed information about the ad. If you are using DFP Premium, a creative wrapper makes this easy to set up just once rather than tagging every single campaign. However, the tagging itself will only capture the IDs of the ad and where it was delivered (for example, line item ID 123456 was delivered Aug. 21 at 12:01 p.m. on a specific URL to a specific user cookie). While this improves the granularity of tracking ad delivery, an API call can be made to most ad servers to pull key metadata about the ad and bring even more context to it. DFP has a robust reporting API (or if you don’t have time to set up a script, even using the query tool in the dashboard will get you everything you need), which can be used to pull metadata at the unique Line Item ID and Creative ID level to match a CPM, an ad size, an advertiser, the line item type, etc., to each ad. The custom tagging will capture a Line Item ID and a Creative ID and create an event for each ad being delivered, while the API is used to enrich the information for each line item.
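The enrichment step described above amounts to a join between tagged ad events and ad server metadata. The sketch below uses made-up records and field names (`user_cookie`, `cpm`, `line_item_type` and so on are our assumptions, not Listener's or DFP's actual schema) to show how an ID-only event can be turned into a per-user revenue figure.

```python
# Hypothetical ad events as captured by page tagging: IDs and context only.
ad_events = [
    {"user_cookie": "u1", "line_item_id": 123456, "creative_id": 9001, "url": "/sports/a"},
    {"user_cookie": "u1", "line_item_id": 123456, "creative_id": 9001, "url": "/sports/b"},
    {"user_cookie": "u2", "line_item_id": 654321, "creative_id": 9002, "url": "/news/c"},
]

# Hypothetical metadata pulled separately from ad server reporting,
# keyed by (line item ID, creative ID).
line_item_metadata = {
    (123456, 9001): {"cpm": 12.00, "ad_size": "300x250", "line_item_type": "direct"},
    (654321, 9002): {"cpm": 2.50, "ad_size": "728x90", "line_item_type": "programmatic"},
}


def enrich(events, metadata):
    """Attach metadata and a per-impression revenue estimate to each event."""
    enriched = []
    for event in events:
        key = (event["line_item_id"], event["creative_id"])
        meta = metadata.get(key, {})
        # CPM is revenue per 1,000 impressions, so one impression earns cpm / 1000.
        revenue = meta.get("cpm", 0.0) / 1000
        enriched.append({**event, **meta, "revenue": revenue})
    return enriched


def revenue_by_user(enriched_events):
    """Sum estimated ad revenue for each user cookie."""
    totals = {}
    for event in enriched_events:
        cookie = event["user_cookie"]
        totals[cookie] = totals.get(cookie, 0.0) + event["revenue"]
    return totals


user_ad_revenue = revenue_by_user(enrich(ad_events, line_item_metadata))
print(user_ad_revenue)
```

Events whose line item is missing from the metadata are kept with zero revenue rather than dropped, so gaps in the metadata pull remain visible in the user-level totals.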
Lastly, to complete a full view of the customer, a unique login ID is captured by Listener upon authentication that can be translated to a customer database. Some publishers use email as the ID, others use a hashed ID along with a translation table to link with an account number and customer database. This enables very rich offline customer information to be matched back to the customer’s online activity. Of course, offline data is only linked to an online user if they have registered or subscribed and actually logged in at least once.
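As a rough illustration of the hashed-ID approach, the following sketch links an online login to an offline customer record through a translation table. Hashing a normalized email address with SHA-256 is our assumption for the example, and the account numbers and customer fields are invented; a publisher's actual ID scheme may differ.

```python
import hashlib


def hashed_login_id(email: str) -> str:
    """Derive a stable hashed ID from an email address (assumed scheme)."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical translation table: hashed login ID -> account number.
translation_table = {
    hashed_login_id("reader@example.com"): "ACCT-0001",
}

# Hypothetical offline customer database keyed by account number.
customer_database = {
    "ACCT-0001": {"plan": "print+digital", "tenure_weeks": 130},
}


def offline_profile(hashed_id: str):
    """Return the offline record for a hashed ID, or None if the user is unknown."""
    account_number = translation_table.get(hashed_id)
    if account_number is None:
        return None
    return customer_database.get(account_number)


print(offline_profile(hashed_login_id(" Reader@Example.com ")))
print(offline_profile("not-a-known-id"))
```

Users who have never logged in have no entry in the translation table, so they return `None` here, which matches the limitation noted above.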
2. Calculate an Operating Margin
Perhaps the most important part of doing a CLV study is to correctly assign a dollar amount to each event and user. Once a complete profile is built for the user (as shown above), it should be clear what revenues and costs each user is generating on a weekly or monthly basis. For a publisher who has paid subscriptions, revenues include the subscription price (digital-only or print+digital), print advertising revenues (converted and applied to a per-day per-copy level), and digital advertising revenues (usually averaged over a one- to two-month period). Costs include delivery and production costs for a print subscriber, but digital costs are generally zero. When calculating operating margin, typically only variable costs are considered since a new start or a customer stop on the margin will not have an impact on any fixed costs. The approach is fairly straightforward once the user database is built. Revenues and costs are converted to a weekly customer level. While the above operating margin is assigned to a paid subscriber, for anonymous users, only the digital ad revenue is included as the operating margin.
This diagram shows a fairly typical distribution of newspaper subscribers by operating margin. The first spike is likely the top-end of a weekend subscription while the second spike likely represents a full daily delivery schedule.
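The operating margin described in this section can be sketched as a small calculation at the weekly customer level. The class name, field names and dollar amounts below are illustrative assumptions, not figures from a real publisher; only variable costs are included, as discussed above.

```python
from dataclasses import dataclass


@dataclass
class WeeklyProfile:
    """Weekly revenues and variable costs for one user (all in dollars)."""
    subscription_revenue: float = 0.0
    print_ad_revenue: float = 0.0
    digital_ad_revenue: float = 0.0
    variable_print_cost: float = 0.0  # delivery and production; digital cost assumed to be zero

    def operating_margin(self) -> float:
        revenue = (
            self.subscription_revenue
            + self.print_ad_revenue
            + self.digital_ad_revenue
        )
        return revenue - self.variable_print_cost


# A print+digital subscriber with made-up weekly figures.
print_digital_subscriber = WeeklyProfile(
    subscription_revenue=6.00,
    print_ad_revenue=3.50,
    digital_ad_revenue=0.40,
    variable_print_cost=4.25,
)

# An anonymous user contributes digital ad revenue only.
anonymous_user = WeeklyProfile(digital_ad_revenue=0.40)

print(f"Subscriber weekly margin: ${print_digital_subscriber.operating_margin():.2f}")
print(f"Anonymous weekly margin:  ${anonymous_user.operating_margin():.2f}")
```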
3. Use predictive modeling
This is where most analysts will stop and be content with the operating margin above. But the way to really understand user value is to apply a statistical model that predicts retention for existing subscribers and acquisition probability for non-subscribers. Without a statistical model, the operating margin really measures a snapshot in time or possibly past revenue spend, not necessarily what a customer will do in the future. In effect, CLV without predictive modeling is really Customer Lifetime Revenue and should be used with that caution in mind.
For current subscribers, a survival model (originally developed for the healthcare industry and somewhat similar to a discrete choice model but with a time component) is best. This model measures all the various customer characteristics and their impact on long-term retention. For example, a user’s subscription price, tenure, demographics, digital usage, start source, etc., all have some impact on his probability to “survive” over some period of time. A parametric survival model run over several years of history generates coefficients for each variable, which can be applied to current subscribers to forecast expected retention. Once the survival model is complete and applied to the current customer user database, you can take the operating margin and cumulate it against the expected retention over some amount of time. Generally, this is a two- to five-year forecast of value with some discounting of future revenues.
The x-axis here is the number of days from a subscription start and the y-axis is the survival probability. Taking the operating margin times the expected survival probability and cumulating this value over time (the area under the curve) yields the raw CLV.
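To make the cumulation concrete, here is a minimal sketch that sums margin times survival probability over a forecast horizon, with optional discounting. It assumes a Weibull survival curve as one possible parametric form and works in weeks rather than days to match the weekly operating margin; the `scale` and `shape` values are invented for illustration and would in practice come from a model fitted to subscriber history.

```python
import math


def weibull_survival(t_weeks: float, scale: float, shape: float) -> float:
    """Probability that a subscriber is still active after t_weeks."""
    return math.exp(-((t_weeks / scale) ** shape))


def raw_clv(weekly_margin: float, horizon_weeks: int, scale: float,
            shape: float, annual_discount_rate: float = 0.0) -> float:
    """Sum of weekly margin x survival probability, discounted to today."""
    weekly_discount = (1 + annual_discount_rate) ** (1 / 52) - 1
    total = 0.0
    for week in range(1, horizon_weeks + 1):
        survival = weibull_survival(week, scale, shape)
        discount_factor = (1 + weekly_discount) ** week
        total += weekly_margin * survival / discount_factor
    return total


# Illustrative inputs: $5.65 weekly margin over a three-year horizon.
undiscounted = raw_clv(5.65, horizon_weeks=156, scale=120.0, shape=0.9)
discounted = raw_clv(5.65, horizon_weeks=156, scale=120.0, shape=0.9,
                     annual_discount_rate=0.08)

print(f"Three-year CLV, undiscounted: ${undiscounted:.2f}")
print(f"Three-year CLV, 8% discount:  ${discounted:.2f}")
```

A dynamic CLV score would recompute the survival parameters for each subscriber as price, engagement and other retention drivers change, rather than holding one curve fixed.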
On a related note, a churn model also can be built to forecast short-term stops. This is usually a discrete choice model (logit or probit). While not really part of the CLV score, churn modeling can help on a day-to-day basis for prioritizing high/low risk subscribers for retention campaigns. The churn model is used as an overlay against the CLV score to identify any high-value subscribers who are at risk of stopping. A retention campaign can be targeted to these high-value, high-churn subscribers to try to prevent a significant loss in subscriber value.
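The overlay can be pictured as a simple filter on two scores per subscriber. In the sketch below, the logit coefficients, feature names, thresholds and subscriber records are all hypothetical; a real churn model would be estimated from historical stop data.

```python
import math


def churn_probability(features: dict, coefficients: dict, intercept: float) -> float:
    """Logit model: probability of a short-term stop given subscriber features."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1 / (1 + math.exp(-z))


def retention_targets(subscribers, clv_threshold: float, churn_threshold: float):
    """IDs of subscribers who are both high value and at high risk of stopping."""
    return [
        s["id"]
        for s in subscribers
        if s["clv"] >= clv_threshold and s["churn_prob"] >= churn_threshold
    ]


# Hypothetical coefficients: recent price increases raise risk, engagement lowers it.
coefficients = {"recent_price_increase": 1.2, "weekly_visits": -0.3}
intercept = -1.0

subscribers = [
    {"id": "A", "clv": 620.0, "features": {"recent_price_increase": 1, "weekly_visits": 0}},
    {"id": "B", "clv": 580.0, "features": {"recent_price_increase": 0, "weekly_visits": 6}},
    {"id": "C", "clv": 90.0, "features": {"recent_price_increase": 1, "weekly_visits": 0}},
]
for s in subscribers:
    s["churn_prob"] = churn_probability(s["features"], coefficients, intercept)

targets = retention_targets(subscribers, clv_threshold=500.0, churn_threshold=0.5)
print(targets)
```

Subscriber C has the same churn risk as A but is excluded because the CLV score is low, which is the point of combining the two measures.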
What about Anonymous Users?
The exercise above will generate a CLV value for current subscribers, but what about the anonymous users? While we noted above that digital advertising revenue was the only input for anonymous users, this is not entirely accurate. For every anonymous user, there is a dynamically updating forecast of conversion probability, which discounts potential subscription revenue. Behind this is yet another model, a discrete choice model forecasting the expected acquisition probability of each anonymous user. Some of the variables in this model might be engagement (time on site, time per article, number of articles, return frequency, content anchor, etc.), device usage, geography, time of day, etc., and by measuring paywall activity on historical users, coefficients are generated and applied to current anonymous users on the site.
Thus, the CLV of anonymous users is still the expected operating margin of subscription and advertising revenues, but it is discounted by the probability of acquisition. This is important because while a user may look very valuable from an advertising operating margin perspective, his CLV considers the probability of his actually paying a subscription fee and remaining a subscriber. Therefore, perhaps a website’s most valuable anonymous users are not those who generate high ad revenue, but those who might generate low to medium ad revenue and have a relatively higher conversion probability.
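One simple way to express this, under our own assumption that advertising value and probability-weighted subscription value are additive, is shown below. The dollar values and conversion probabilities are invented to illustrate the comparison, not drawn from a fitted model.

```python
def anonymous_clv(ad_value: float, conversion_probability: float,
                  subscriber_value: float) -> float:
    """Advertising value plus subscription value weighted by conversion probability."""
    return ad_value + conversion_probability * subscriber_value


# Assumed value of a converted subscriber over the forecast horizon.
SUBSCRIBER_VALUE = 400.0

# A heavy reader who is unlikely to convert.
high_ad_user = anonymous_clv(ad_value=40.0, conversion_probability=0.005,
                             subscriber_value=SUBSCRIBER_VALUE)

# A lighter reader with a higher conversion probability.
likely_converter = anonymous_clv(ad_value=15.0, conversion_probability=0.10,
                                 subscriber_value=SUBSCRIBER_VALUE)

print(f"High ad revenue, low conversion:    ${high_ad_user:.2f}")
print(f"Medium ad revenue, high conversion: ${likely_converter:.2f}")
```

With these inputs the lighter reader carries the higher score, which is the situation the paragraph above describes.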