If, like me, you’ve spent a large part of your career analysing search engine traffic, you’ll be constantly looking for new perspectives on your data to help you gain more understanding and insight (and therefore improve your strategy and results).
For me, mainly focusing on PPC traffic, the most common analysis I’ll perform is to look at keyword ‘segments’ – simply finding commonalities within keywords and slicing up the data to pick out the trends or opportunities that appear when keywords are grouped in this way.
For example, what I’ll usually do is get keyword data for metrics such as cost/conversion and ROI and measure the deviation at ‘segment’ level:
Whilst a well-structured search marketing campaign should help keep keywords with similar performance metrics together in campaigns and ad groups, looking at these ‘segments’ still provides me with one of the most actionable pieces of analysis on a regular basis.
It can help in cases where there are a large number of keywords with a relatively low amount of traffic/data – we can’t draw conclusions from minimal data, so this low-level aggregation allows us to get better insight into these terms.
Within the variation of keywords we start to tie actual performance metrics to underlying ‘intent’ – one of the key targets for search marketing should always be to reach as many users as possible with the requisite amount of intent to allow profitability.
For the purpose of this article, we’ll consider a dummy set of keywords and metrics, from which I’ll illustrate various processes.
Up until recently, choosing which segments I was interested in was done on the fly, using the SUMIF function in Excel to grab the information:
This formula simply looks for occurrences of the phrase ‘hotels at’ within our keyword list and sums up all the rows which have a match.
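If you’d rather do the same lookup outside a spreadsheet, here’s a minimal pandas sketch of the same idea – the keywords, figures and column names below are dummy values I’ve made up for illustration, not the data from the screenshots:

```python
import pandas as pd

# Dummy keyword data - keywords, metrics and column names are purely
# illustrative, not taken from the original spreadsheet
df = pd.DataFrame({
    "keyword": ["hotels at heathrow", "hotels at gatwick",
                "cheap flights to rome", "hotels near luton airport"],
    "cost": [120.0, 85.0, 60.0, 40.0],
    "conversions": [6, 4, 2, 1],
})

# Equivalent of a wildcard SUMIF: sum the metrics for every row whose
# keyword contains the phrase we're interested in
segment = "hotels at"
mask = df["keyword"].str.contains(segment, regex=False)
totals = df.loc[mask, ["cost", "conversions"]].sum()

print(totals)
print("cost/conversion:", totals["cost"] / totals["conversions"])
```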
By manually spotting terms and phrases which could be used to slice up the data, I end up with a list of segments which help me understand whether there is anything interesting going on:
Then it occurred to me that there is a problem with this ad-hoc way of creating segments: it’s very possible to miss out groups altogether which it would be useful to have visibility of. A more formal approach would make the whole process more robust and useful.
Breaking Keywords Into N-Grams
In the field of Natural Language Processing, these ‘stems’ that I’ve been using have a formal name – the n-gram. A simple definition of an n-gram is “a contiguous sequence of n items from a given sequence of text or speech”.
For example, the 2-gram (bigram) sequence of “How now brown cow” would be: “how now”, “now brown”, “brown cow”. [Note: A popular application which gives further examples is Google’s ‘Ngram Viewer’]
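As a quick illustration, generating those bigrams only takes a few lines of Python – a sketch using simple whitespace tokenisation:

```python
def ngrams(text, n=2):
    """Return the contiguous n-grams of a whitespace-tokenised string."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(ngrams("How now brown cow"))
# ['how now', 'now brown', 'brown cow']
```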
So thinking about our list of keywords, we would want to analyse the text within each variant and categorise it in terms of n-grams – rather than my ad-hoc approach to creating segments, this covers all bases and makes the analysis more complete.
This is something which I would do by programming the process in Python or R, but for the purpose of this article, we’ll try to use this online tool to generate our list of n-grams. All we need to do is grab the list of keywords and paste the data in:
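(If you’d rather stay in Python than use an online tool, a rough sketch of the equivalent step – producing bigram counts from a keyword list – could look like the following; the keywords are again dummy values for illustration.)

```python
from collections import Counter

keywords = ["hotels at heathrow", "hotels at gatwick",
            "cheap flights to rome", "hotels near luton airport"]

def ngrams(text, n=2):
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Count occurrences of each bigram across the whole keyword list
bigram_counts = Counter(bg for kw in keywords for bg in ngrams(kw))
for bigram, count in bigram_counts.most_common():
    print(bigram, count)
```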
So now that we have a list of all the bigrams (phrases of length two), ranked by the number of occurrences in our keyword list, we can quickly combine these with some dummy data in Excel using the previously shown SUMIF formula:
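The same combination can also be sketched in pandas if you want to avoid manual SUMIFs – once more, every keyword, metric and column name below is made up for illustration:

```python
import pandas as pd

# Dummy data again - keywords, metrics and column names are illustrative
df = pd.DataFrame({
    "keyword": ["hotels at heathrow", "hotels at gatwick",
                "cheap flights to rome", "hotels near luton airport"],
    "cost": [120.0, 85.0, 60.0, 40.0],
    "conversions": [6, 4, 2, 1],
})

def ngrams(text, n=2):
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# One row per (keyword, bigram) pair, then aggregate the metrics per bigram
df["bigram"] = df["keyword"].apply(ngrams)
per_bigram = (df.explode("bigram")
                .groupby("bigram")[["cost", "conversions"]]
                .sum()
                .sort_values("cost", ascending=False))
per_bigram["cost_per_conversion"] = per_bigram["cost"] / per_bigram["conversions"]
print(per_bigram)
```

A strict SUMIF equivalent would filter the keyword column with `str.contains` for each bigram instead; the explode/groupby approach just avoids looping and gives the same totals provided a bigram doesn’t repeat within a single keyword.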
Although the keyword lists used here are reasonably limited in size, hopefully this gives you an illustration of how utilising NLP techniques can help you get a better understanding of keyword performance at an n-gram level.
Going through this process has certainly proved useful for me – I’d be really interested to hear other examples of how text analysis techniques can be utilised.