Google Analytics is free, offers so much, and there is actually no reason whatsoever to pay for an Analytics tool, right? In this post, I am sharing the three things in GA that bother me the most – coincidentally, three things that I usually get in paid Analytics tools. And no, it is not about privacy.
We all love Google Analytics. So much has been written about the great features it offers. And rightfully so. Google Analytics has brought Web/Digital Analytics/Intelligence/YourBuzzword into the mainstream. Its user interface is probably the most user-friendly on the market, its AdWords integration is just plain awesome, and ground-breaking features like Ad-hoc Segmentation, Intelligence Events or Multi-Channel Funnels have forced other vendors to innovate faster.
Some things you pay for because they really are important
Google Analytics offers you so much awesome stuff that you may wonder: “Why pay for an analytics tool if GA gives us all this for free?” Of course, the answer to this question depends on your business requirements and cannot be given in general. But still, there are some things that most paid tools offer which GA does not. And the more I have been working with different tools, the more I have come to understand that these things are really important.
It is not about privacy concerns
So amidst all the Google Analytics laurels, I thought it was time for a post on the biggest drawbacks of the “free” GA. No, I am not talking about the never-ending privacy concerns, although, around here, privacy issues are usually the reason why companies opt for a paid tool over GA. I will restrict myself to what you experience when using the tool on a daily basis.
1. Sampling
Many people, from Google itself to Avinash Kaushik, have seemed to downplay sampling. But for me, as for some of our clients, it is a huge issue and the one I lament the most.
How does sampling work? Instead of reading every row in your data, GA reads only every 2nd or 3rd or 4th or xth row (depending on the sampling rate). That helps speed up your report generation (a real issue for some of the paid tools) and reduces the computing load on Google Analytics’ servers.
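To make the "every xth row" idea concrete, here is a minimal Python sketch. It follows the simplified description above and is not GA's actual sampling algorithm; the function names and the tiny `rows` data set are made up for illustration.

```python
from collections import Counter


def sample_every_kth(rows, k):
    """Keep every k-th row (systematic sampling), starting with the first."""
    return rows[::k]


def estimate_counts(rows, k):
    """Count values in the sample and scale by k to estimate full totals."""
    sampled = sample_every_kth(rows, k)
    return {term: n * k for term, n in Counter(sampled).items()}


# Hypothetical hit log: one on-site search term per row.
rows = ["shoes", "hats", "shoes", "bags", "shoes", "hats", "shoes", "bags"]

full = Counter(rows)                 # what an unsampled report would show
estimate = estimate_counts(rows, 2)  # what a 50 percent sample suggests
```

With this deliberately periodic toy data, the unsampled counts are 4 "shoes", 2 "hats" and 2 "bags", while the 50 percent sample estimates 8 "shoes" and misses the other two terms entirely. Real data is rarely this unlucky, but it shows how scaling up a sample can misrepresent the less frequent values.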
Sample rates of 0.003 percent can be just good enough – but not in Google Analytics
Sampling is a common method in statistics. Think of how few people (“rows”) are needed to get valid electoral forecasts: In Germany, Emnid, one of the bigger opinion research institutes, usually asks between 1,800 and 3,000 people, a sample of just 0.003-0.005 percent of the population entitled to vote. That sounds ridiculous in comparison to the 15 to 60 or more percent you usually get in Google Analytics, but of course it depends on what you analyze and how representative your sample is. Emnid, for example, selects the people it interviews carefully to reflect the overall population as well as possible.
Sampling works on a property level (not a profile level, so creating a profile with a very reduced data set won’t help you).
For many reports, sampling does not cause much of a distortion. Think of the electoral forecast again, where there are only about six parties with a meaningful share of the vote. Here, your data can have only six different values. In that case, a sample of 0.003 percent or even less can be enough. But the more values you can have, the more traffic you track, the smaller the segments you want to analyze and the more complex your reports, the worse it gets with sampling in Google Analytics.
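A rough way to see why few large values survive sampling while many small ones do not is the textbook margin of error for a proportion. The sketch below assumes simple random sampling and a 95 percent confidence level (z = 1.96), which neither an opinion poll nor GA follows exactly; the share values passed in are illustrative, not measured data.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a share p estimated from n sampled rows."""
    return z * math.sqrt(p * (1 - p) / n)


def relative_error(p, n, z=1.96):
    """Margin of error expressed as a fraction of the share itself."""
    return margin_of_error(p, n, z) / p


# A big party with a 30 percent share, polled with 2,000 people.
big_share = relative_error(0.30, 2000)

# A long-tail search term with a 0.1 percent share, same sample size.
tiny_share = relative_error(0.001, 2000)
```

Under these assumptions, the 30 percent share is off by roughly 7 percent of its own value at most, while the 0.1 percent share can be off by more than its entire value. That is the situation you are in when a report has thousands of distinct values or a very narrow segment.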
1. Go to your on-site search report and apply a boring standard segment like “Mobile and Tablet Traffic”. GA now informs you with a hint in yellow that your data has been sampled to contain only 42 percent of the data (you might see a different number).
2. Use the little slider on the top right to change the sampling rate:
Even if you go all the way to the right to get the highest precision possible, you never get a 100-percent sample. The maximum for the property I am looking at right now is just under 60 percent in standard reports with one segment applied, and if I take longer data sets (90 days is now the maximum), more segments or Custom Reports, I usually get little more than a 30 percent sampling rate.
3. Compare the report with three different rates: the lowest sampling rate, the medium (default) rate, and the highest one. See how much the data changes even for the Top 10 items, which have the most traffic and should therefore be least affected by sampling.
In my case, I get the following three views of the same report (top 10 On-Site Search Terms for the last month):
As you can see, the lowest sample rate on the right produces outright rubbish, so I often wonder why GA even offers it, because every report with this sampling rate becomes just nonsense. None of the true top 10 terms appear here. But even the differences between the highest and the medium (pre-defined) sample rate are considerable. Again, it is a TOP 10 list, not the 30th page of your highly segmented report with very tiny data chunks! For example, “chau” advances from 5th to 3rd and from 289 searches to 347 (+20%), and apart from positions 1, 2 and 8, no terms stay in the same position.
So depending on which sampling rate they use, two people looking at the same report might draw different conclusions. I admit this is a rather mild example. Lunametrics has some more extreme cases to share, as well as some common remedies. Those remedies won’t solve your sampling problems; they only mitigate them a bit, and the more they mitigate them, the more cumbersome they get. One example is using the API to export data on a daily basis (because a shorter date range means less sampling) and then reaggregating it somewhere else (see also “Analytics Canvas” on this).
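The daily export and reaggregation remedy could look roughly like the Python sketch below. It deliberately does not call any real GA API: `fetch_day` is a hypothetical callable you would have to implement yourself, and `fake_fetch` is only a stand-in so the example runs.

```python
from collections import Counter
from datetime import date, timedelta


def daily_ranges(start, end):
    """Yield each day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def reaggregate(start, end, fetch_day):
    """Sum per-day {term: count} results into one total for the whole range.

    fetch_day is a hypothetical callable that returns the report for a
    single day, for example by wrapping your own export API request.
    """
    total = Counter()
    for day in daily_ranges(start, end):
        total.update(fetch_day(day))  # Counter.update adds counts
    return total


def fake_fetch(day):
    """Stand-in for a real export call; returns made-up numbers."""
    return {"shoes": day.day, "hats": 1}


totals = reaggregate(date(2020, 1, 1), date(2020, 1, 3), fake_fetch)
```

Note that simple summing like this only works for additive metrics such as searches or pageviews. Anything based on uniqueness, like unique visitors, cannot be rebuilt by adding up daily numbers, which is part of what makes this remedy cumbersome.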
So much for sampling. What else do I lament in GA?
2. No vendor support
There are tons of resources about Google Analytics out there. But there is nothing like being able to call someone who knows how the tool works deep down, someone who can look at your data and maybe even your raw data in the vendor’s data centers and explain to you exactly why your data looks so weird. Yes, even the paid vendors’ support sometimes doesn’t understand what has happened in their tools, but at least then you know it’s not your fault and you can stop spending hours searching for explanations.
3. No raw data
I hate solving data discrepancy problems (tool A reports 4,000, but GA/Webtrends/Adobe Analytics/Webtrekk only 3,000). But unfortunately you get those problems, no matter which tool you use. Even with a Tag Management System, where you can ensure that two tags/tools are fired by exactly the same trigger (= according to the same rule), these problems persist, and they cause companies to not trust their web analytics data. This then becomes a fatal analytics decelerator. Yes, I have read all about how to “slay the data quality dragon” and how to value precision over accuracy (long live Jim Novo’s vintage post).
But still, grave data quality problems abound and need to be fixed. Tag Management takes care of the point where data is collected (when the HTTP request to the collection server is generated), but then you still have to solve the mystery of what happened between the request and your report, i.e. data logging and data processing. Here, it can help tremendously if you can get the raw log files to find out whether what you collected was logged correctly. If you have access to raw log files, you can also get other sometimes helpful data like the exact timestamp and the IP address. The mystery of data processing is then still ahead of you, although here your tool vendor should be able to help you, if it offers support, of course. Note that this is just one benefit of having raw data.
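As an illustration of that logging check, here is a small Python sketch that counts hits per page in a raw log and lists where they differ from the reported numbers. The tab-separated log format (timestamp, IP, page), the sample lines and the `reported` figures are all assumptions for the example; real vendor log formats differ.

```python
import csv
import io
from collections import Counter


def count_logged_hits(log_text):
    """Count hits per page in a tab-separated log: timestamp, ip, page."""
    reader = csv.reader(io.StringIO(log_text), delimiter="\t")
    return Counter(row[2] for row in reader if len(row) == 3)


def discrepancies(logged, reported):
    """Return {page: (logged, reported)} wherever the two counts differ."""
    pages = set(logged) | set(reported)
    return {
        page: (logged.get(page, 0), reported.get(page, 0))
        for page in pages
        if logged.get(page, 0) != reported.get(page, 0)
    }


# Hypothetical raw log excerpt and hypothetical report numbers.
raw_log = (
    "2020-01-01T10:00:00\t192.0.2.1\t/home\n"
    "2020-01-01T10:00:05\t192.0.2.2\t/home\n"
    "2020-01-01T10:01:00\t192.0.2.1\t/search\n"
)
reported = {"/home": 2, "/search": 2}

diff = discrepancies(count_logged_hits(raw_log), reported)
```

Here `diff` contains only `/search`, logged once but reported twice, which tells you the problem lies somewhere in processing rather than in collection or logging.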
The difference between paid and free GA
To conclude this post: I found my biggest confirmation that these three things are really important when I checked again which features you have to pay for in GA Premium. The three main differentiators you get in Google Analytics Premium are exactly the ones I have just written about:
- Unsampled data
- Vendor Support
- Raw Data (with BigQuery)
What is it you miss most in Google Analytics? I am looking forward to your comments.
[Thank you to my colleague Matthew Brandt for contributing to this article]