In an E-Commerce company, work revolves around product data. That applies not only to search filters, bidding strategies, and on- and offsite advertising, but also to data analysis. Do our best-price products really convert better than the rest? In which price ranges are we selling the most in our wine section?
Are the Christmas deals driving as much revenue as expected? This two-part series looks at methods of getting product data into Adobe or Google Analytics – beyond the traditional, but limited, on-site tracking via a Data Layer.
Whether it is your Product Information Management (PIM) tool or the article database in your CMS – the raw material of a digital business is its database. As Digital Analysts, in order to analyze performance with regard to product attributes (e.g. orders by product category), we need to transfer those product attributes into our Analytics tools.
So say we want to use a new piece of information, like the author of an article, the user rating score, the size of a product, or a “current deals” label. Traditionally, we asked the website developer to add that piece of information to the Data Layer as a new variable (e.g. prod_rating: “4.5”). Then our Tag Management System could easily pick that variable up and send it to any Marketing or Analytics tool that needed it.
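Conceptually, the Data Layer is just a key-value structure the website exposes and the Tag Management System reads. A minimal sketch (the variable names besides prod_rating are invented for illustration, not an actual schema):

```python
# A Data Layer is essentially a key-value structure on the page that the
# Tag Management System reads. Variable names here are illustrative.
data_layer = {
    "prod_id": "SKU-1001",       # hypothetical product key
    "prod_name": "Example Wine", # hypothetical attribute
    "prod_rating": "4.5",        # the new variable the developer would have to add
}

# The TMS would pick the new variable up roughly like this:
rating = data_layer.get("prod_rating", "(not set)")
print(rating)  # 4.5
```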
Yet Another Data Layer Variable…?
But having the developer code all that information into the Data Layer requires the usual release cycles, i.e. a lot of waiting and communication effort. That quickly gets to a point where you think twice about whether you want to bother the dev team with yet another Data Layer variable. Besides, every additional attribute increases the request size, so more data has to flow between the tracking tool’s server and the browser, and Google Analytics’ order tracking becomes more likely to break when someone orders many products. The shop also has to query and generate more and more data, which slows down website performance. And confidential data (e.g. the profit margin of a product) should not be in the public Data Layer at all.
Other and Better Methods
So the classic Data Layer is great, but it has its limits. Luckily, there are other methods of getting data into your Analytics tools. At the Swiss online marketplace siroop.ch, we have tried several. Some are rather new like Tealium’s “Dynamic Data Layer Management” which we will look at in the second part of this article series. This first part will look at the well-established Adobe Analytics Classifications and compare them to Google Analytics’ Data Imports aka “Dimension Widening”.
Adobe’s “Classifications” versus Google’s “Dimension Widening”
Adobe Analytics Classifications, aka “SAINT”, have been an extremely useful Adobe/Omniture/SiteCatalyst feature for a long time. Google, however, added a similar feature, “Dimension Widening”, four years ago and has enhanced it constantly.
Both features make it possible for you to track only a key value on the website, e.g. the product SKU (aka the product ID), and then import all the other data for this SKU (the product brand, the category etc.). In Google Analytics, the dimensions you import into are common “custom dimensions”, whereas in Adobe’s case, you set up “classifications” – a special form of dimensions that work almost exactly like normal dimensions, but their number is not limited the way eVars and props are. This means you can create as many classification dimensions as you wish.
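Under the hood, both features boil down to a key-based join at reporting time. A minimal sketch in Python (the SKUs and attribute names are made up for illustration, this is not either vendor’s implementation):

```python
# Hits as they arrive from the website: only the key (the SKU) is tracked.
hits = ["SKU-1001", "SKU-1002", "SKU-1001"]

# Import file (classification / data import table), keyed by SKU.
# Column names are illustrative, not an official schema.
import_table = {
    "SKU-1001": {"brand": "Acme", "category": "Wine"},
    "SKU-1002": {"brand": "Globex", "category": "Electronics"},
}

# At reporting time, each hit is widened with the imported attributes.
widened = [{"sku": sku, **import_table[sku]} for sku in hits]
print(widened[0])  # {'sku': 'SKU-1001', 'brand': 'Acme', 'category': 'Wine'}
```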
Google does not allow you to widen all types of dimensions: you can widen the Product SKU, for example, but not product dimensions. In our case, this is important as we also have a Product Variant ID which determines certain attributes, like the exact price range or the exact model of a product (e.g. an iPhone 8 with 64GB or 256GB). Another example is the product’s Merchant ID, which helps us import further merchant data as classifications (like the merchant’s name or status, or the account manager responsible for this merchant).
Google also does not offer a Classification Rule Builder, which lets you automatically fill classification values without having to import or upload anything. For example, you can generate classification values via RegExp extractions like “get the string from the second slash to the next one and write it into the dimension ‘Product Deals Label’”. For data uploads in Google Analytics, you also cannot use an FTP server as in Adobe; you have to upload via the Google Analytics API, which often means depending on developers.
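The Rule Builder example above (“get the string from the second slash to the next one”) corresponds to a simple regular expression. A sketch in Python, with an invented key format:

```python
import re

# Invented key format for illustration: category/subcategory/label/product.
key = "electronics/deals/best-price/iphone-8"

# "Get the string from the second slash to the next one":
# skip two slash-delimited segments, then capture the third.
match = re.match(r"(?:[^/]*/){2}([^/]*)", key)
deals_label = match.group(1) if match else "(not set)"
print(deals_label)  # best-price
```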
Data Imports/Classifications Example:
Product Price Range per Category as a Determinant of Conversion Rate
To give you a simple example of what can be done thanks to Classifications/Data Imports: We import the Level 1, 2, and 3 categories as well as the product price into Classifications in Adobe and then group these prices into range buckets (<50 CHF, 50-99.99 CHF, etc.) via the Classification Rule Builder. With just a few clicks, we can then see how strongly the price range influences the Product Conversion Rate (which we calculate as “Orders / PDP Views (1pV)”, i.e. visits with orders of a product from this category divided by visits with at least one product detail view of this category). However, in some categories (e.g. Category 1), the higher price ranges, in spite of their low Conversion Rates, create a much bigger chunk of revenue than the low-priced but highly converting products:
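The bucketing rule and the conversion-rate formula described above can be sketched like this (the bucket edges beyond the two named in the text are my own assumption):

```python
def price_bucket(price_chf: float) -> str:
    """Group a product price into range buckets like the ones built via the
    Classification Rule Builder. Edges above 100 CHF are illustrative."""
    if price_chf < 50:
        return "<50 CHF"
    elif price_chf < 100:
        return "50-99.99 CHF"
    elif price_chf < 200:
        return "100-199.99 CHF"
    else:
        return ">=200 CHF"

def product_conversion_rate(visits_with_order: int, visits_with_pdp_view: int) -> float:
    """Product Conversion Rate as defined in the text: visits with an order
    of a product from a category divided by visits with at least one
    product detail page (PDP) view of that category."""
    if visits_with_pdp_view == 0:
        return 0.0
    return visits_with_order / visits_with_pdp_view

print(price_bucket(79.90))                # 50-99.99 CHF
print(product_conversion_rate(30, 1000))  # 0.03
```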
The Key: Retroactive or Not?
Let’s return to the comparison though: Google’s and Adobe’s approaches used to differ most importantly in one aspect: Google’s Data Imports were not retroactive, i.e. they affected data only from the moment it was imported and processed, while Adobe’s Classifications were and are retroactive. So in Adobe, importing the product classifications (e.g. product names) today will show product names in the reports for yesterday and last month, while Google would show names for today, but not for the past (see below for why I am using the past tense for Google).
Each approach has its pros and cons: Google punishes you for not syncing data right away. When new products come into the shop and people view them immediately, their data will be missing until the next product data import to GA has been processed. Furthermore, imports tend to rely on complex backend processes which sometimes fail. In that case, your data is gone entirely until the bug is fixed, and it cannot be recovered.
If that wasn’t risky enough, wrongly imported values cannot be erased or changed. Since Google’s import is not retroactive, your reports always show the value as it was at the time of the import (e.g. the then-current name of an attribute) for the period from that import until the next import that changed the value. That stinks for frequent cases like corrected typos in product attributes, where you would prefer one consistent value over time instead of a new row each time the product name is changed or corrected.
But retroactivity is not useful for everything, e.g. for temporary labels like “best price” or “special deal of the week”. Here Adobe’s striving for retroactive consistency hurts, because as soon as a product stops being a “special deal of the week”, it is no longer discernible as such in the reports.
“Query Time” Makes Google Analytics Reports Work (Almost) Like Adobe Classifications
Adobe’s retroactive approach used to fit most cases better: the number of consistent attributes (think name, brand, category, producer, size, color) is usually much higher than that of temporary or changing attributes (think deal labels or prices). This may be why, about a year ago, Google Analytics added the “Query Time” import feature (in “beta” and for GA360 clients only). It works more or less like Adobe’s Classifications: the imported values are retroactive, so the newest value always overrides the previous one, no matter which time range you report on.
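The difference between the classic Processing Time behaviour and the retroactive Query Time behaviour can be sketched as a toy model (my own simplification, not how either vendor actually stores the data):

```python
from datetime import date

# Imports for one key over time, e.g. a product name with a typo
# that gets corrected a month later. Dates and names are invented.
imports = [
    (date(2017, 3, 1), "iPhoen 8"),  # typo at first import
    (date(2017, 4, 1), "iPhone 8"),  # corrected a month later
]

def processing_time_value(hit_date: date) -> str:
    """Processing-time behaviour: a hit keeps the value that was most
    recently imported *before* the hit, so old hits keep the typo."""
    value = "(not set)"
    for import_date, v in imports:
        if import_date <= hit_date:
            value = v
    return value

def query_time_value(hit_date: date) -> str:
    """Query-time behaviour: the newest imported value applies
    retroactively to every hit, whatever its date."""
    return imports[-1][1]

print(processing_time_value(date(2017, 3, 15)))  # iPhoen 8 (typo preserved)
print(query_time_value(date(2017, 3, 15)))       # iPhone 8 (retroactively corrected)
```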
But: Query Time Import is Limited to 1 GB – for all files together!
So as a GA360 client, you can now either use the classic Processing Time Import for temporary attributes or the Query Time Import for attributes that are consistent over time. I would love to tell you more about them, but we have not been able to get the Query Time Import to work, despite trying for almost a year now together with ProductsUp (which generates all of our product feeds, see my tiny review here), our agency, and the Google Analytics 360 support. It turned out that, while the Processing Time Imports can occupy up to 1 Terabyte of data in total, the Query Time Imports are limited to 1 GB of imported data – and that limit applies not, as we thought, to single files, but to all files together.
So if you first upload a file of 600 MB and then another one of the same size, you are over 1 GB in total and get rejected. In conclusion, the Query Time Import is simply not Enterprise-ready at the moment, as the product data we want to import goes way over 1 GB and will only grow over time. Google told us this may be one of the “Beta” restrictions and may at some point be removed. So I will keep hoping for that Beta phase to end; for the time being, we will rely on Adobe only in terms of product data imports.
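The failure mode is effectively a cumulative size check, something like the following sketch (whether Google counts decimal or binary gigabytes here is an assumption of mine):

```python
GB = 1_000_000_000  # assuming decimal gigabytes; the exact unit is not documented to us

def upload_allowed(existing_file_sizes: list, new_file_size: int, limit: int = GB) -> bool:
    """Query Time Imports are limited to 1 GB across *all* files
    together, not per file - a sketch of that cumulative check."""
    return sum(existing_file_sizes) + new_file_size <= limit

# Two 600 MB files: the first fits, the second pushes the total over 1 GB.
print(upload_allowed([], 600_000_000))             # True
print(upload_allowed([600_000_000], 600_000_000))  # False
```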
Google Analytics Dimension Widening vs. Adobe Analytics Classifications
Row Limits Or “why is ‘other’ our top category if we have only 9 product categories?”
One more paragraph on row or value limits: if you have many values, i.e. many products and thus many product SKUs (we have almost a million products, for example), you will soon run into these limits. The limits I am talking about here are general ones, not specific to product data or data imports. They exist for the sake of computing speed, so Google’s/Adobe’s servers don’t have to go through too many rows before serving you a report.
Google and Adobe handle these limits very differently. There are cases (like URL parameters) where Google’s approach is more useful. When it comes to product data however, Adobe’s way seems more useful in every-day work.
The free GA can show you 50,000 different values per metric-dimension combination per day in a normal report (75,000 in GA360). You can get all the values by waiting a bit and doing an “unsampled” CSV export, or via the pre-aggregated, unsegmentable “Custom Tables”. But in standard or Custom Reports – the ad-hoc drill-downs, segmentation, and other typical analysis work that end users do inside the tool every day – values over the limit are simply grouped into “(other)”.
Now this limit applies not only to the keys (the different Product SKUs) themselves, but also to all the imported dimensional values based on those keys! As an example, if you view a report for yesterday with product categories, only the categories for the first 75,000 SKUs that got traffic yesterday will be shown. All the other SKUs will show a product category of “(other)”.
So GA simply cuts off after 75,000 :(. In our case, we thus often have up to 30% of our Revenue in the unidentifiable Category “(other)” in GA. The more products we get into our shop, the worse this gets.
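A toy simulation of that cutoff, with a tiny limit instead of 75,000 so the effect is visible (SKUs and category names are invented):

```python
# Toy model of GA's daily row limit: SKUs beyond the limit - and every
# dimension imported for them - collapse into "(other)".
LIMIT = 3  # stand-in for the real 75,000 (GA360)

skus_with_traffic = ["SKU-1", "SKU-2", "SKU-3", "SKU-4", "SKU-5"]
imported_category = {sku: f"Category for {sku}" for sku in skus_with_traffic}

rows = []
for i, sku in enumerate(skus_with_traffic):
    if i < LIMIT:
        rows.append((sku, imported_category[sku]))
    else:
        # Both the key and its imported category become unidentifiable.
        rows.append(("(other)", "(other)"))

print(rows[-1])  # ('(other)', '(other)')
```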
Adobe limits you to 500,000 different values per dimension per month (independent of metrics). For products, this limit can supposedly be increased on special request (we don’t need that yet because, even though we have almost a million products, only a part of them is actually viewed in a given month, and many are inactive or out of stock). But if you want all the rows, Adobe is more tedious than GA, as this is only possible via the feels-like-2001 “Data Warehouse” interface, where you also cannot use the powerful segment-based Calculated Metrics or Custom Date Ranges.
Sorting By Traffic is the Key!
The main advantage, however, is the sorting that Adobe applies to its limits. Adobe does not simply cut off once 500,000 values have been reached in a month; it always sorts the values by their amount of traffic and shows the 500,000 items with the most traffic (the most Hits) in a month. The rest (and their Classification values with them) show up as “Low Value”. But since the values are ordered by traffic, it has never happened to us that a product that was bought or added to the cart ended up under “Low Value”: the buying process alone (product detail page, add to cart, cart view, checkout, order) creates so many Hits for a product SKU that the product easily jumps out of the “Low Value” swamp with all those unimportant one-hit-wonder products.
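That traffic-sorted cutoff can be sketched the same way (again with a tiny limit and invented SKUs):

```python
# Sketch of Adobe's behaviour as described above: values are sorted by
# traffic (Hits), the top N survive, the rest become "Low Value".
LIMIT = 2  # stand-in for the real 500,000 values per dimension per month

hits_per_sku = {"SKU-A": 900, "SKU-B": 5, "SKU-C": 120, "SKU-D": 1}

# Keep the LIMIT keys with the most Hits; everything else is "Low Value".
top = sorted(hits_per_sku, key=hits_per_sku.get, reverse=True)[:LIMIT]
reported = {sku: (sku if sku in top else "Low Value") for sku in hits_per_sku}
print(reported)
# {'SKU-A': 'SKU-A', 'SKU-B': 'Low Value', 'SKU-C': 'SKU-C', 'SKU-D': 'Low Value'}
```

Because a purchase generates many Hits across the funnel, purchased products reliably land in the top group rather than in “Low Value”.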