08 11 / 2011

Startups: Hacking a Cohort Analysis with Google Analytics

At a recent Seedcamp day I was talking with a few teams about how to do a cohort analysis quickly and easily. We all know the value of actionable metrics over vanity metrics (thanks in part to Eric Ries’s new book), but getting them out is often surprisingly difficult, especially for a small team. At Crashpadder we do two kinds of cohort analysis: one is a custom-built tool that runs off our database, and the other is a Google Analytics hack that I’ll show below.

The advantage of the database report is that it’s accurate, it uses our more complex business logic and KPIs, and, because we built it, it does exactly what we need. The downside is that it’s slowish rather than realtime (obvious N+1 issues with grouping data over and over), and it needs to be run on our server (i.e. by me), so our new datalytics team member can’t just get stuck in. It’s not the kind of thing you do in a web request or live page, but we can run it once a day/week/month in a few minutes.

Google Analytics is great as everyone can access it from anywhere, we already use it for loads of tracking, and it has a relatively simple web interface. Downsides include accuracy and customisation. And did I mention accuracy?

So we use the database report for detailed analysis, and the Google Analytics hack to quickly spot trends, compare with other data, have data available on request, and so on. 

So if you want to get started with some cohort analysis at your cool new startup, but don’t want to slow down your lightning-fast build-test-learn loop and MVP building in the early days, try this.

The hack

The essence of this is to push custom variables that describe the current user to GA, along with your normal page data, and then create advanced segments in GA to limit what you’re seeing to these users. We push 5 bits of data to GA at Crashpadder; these will obviously be different for every startup:

  • Is the visitor currently logged in
  • Are they a member of Crashpadder
  • Are they a host or a guest
  • The year they joined (for this cohort analysis)
  • The month they joined (for this cohort analysis)

So here’s your normal GA code:


<script type="text/javascript" defer="defer">

  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-XXXXXX-1']);
  
  _gaq.push(['_setDomainName', 'www.mydomain.com']);
  _gaq.push(['_trackPageview']);
  _gaq.push(['_trackPageLoadTime']);
  
    
  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();

</script>

We add to this our custom variables as below. You should be able to pull this out of your current user variable/model/object (we’re using Ruby/Rails); simplified a bit, our code looks like this:


<script type="text/javascript" defer="defer">

  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-XXXXXX-1']);
  
  _gaq.push(['_setDomainName', 'www.mydomain.com']);
  // Custom variables need to be set before _trackPageview so they are sent with the pageview hit
  _gaq.push(['_setCustomVar', 1, 'Logged in', <%= current_user.blank? ? "'no'" : "'yes'" %>, 2]);
  _gaq.push(['_setCustomVar', 2, 'Member', <%= current_user.blank? ? "'no'" : "'yes'" %>, 1]);
  _gaq.push(['_setCustomVar', 3, 'Host/guest', <%= (current_user.blank? || current_user.is_a_host?) ? "'host'" : "'guest'" %>, 1]);
  _gaq.push(['_setCustomVar', 4, 'Join month', <%= current_user.blank? ? "'0'" : "'#{current_user.created_at.month}'" %>, 1]);
  _gaq.push(['_setCustomVar', 5, 'Join year', <%= current_user.blank? ? "'0'" : "'#{current_user.created_at.year}'" %>, 1]);
  
  _gaq.push(['_trackPageview']);
  _gaq.push(['_trackPageLoadTime']);
  
  
  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();

</script>

This then renders in your browser something like this:


<script type="text/javascript" defer="defer">

  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-XXXXXX-1']);
  
  _gaq.push(['_setDomainName', 'www.mydomain.com']);
  // Custom variables need to be set before _trackPageview so they are sent with the pageview hit
  _gaq.push(['_setCustomVar', 1, 'Logged in', 'yes', 2]);
  _gaq.push(['_setCustomVar', 2, 'Member', 'yes', 1]);
  _gaq.push(['_setCustomVar', 3, 'Host/guest', 'guest', 1]);
  _gaq.push(['_setCustomVar', 4, 'Join month', '8', 1]);
  _gaq.push(['_setCustomVar', 5, 'Join year', '2009', 1]);
  
  _gaq.push(['_trackPageview']);
  _gaq.push(['_trackPageLoadTime']);
  
  
  
  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();

</script>

(A keen observer will notice that the first two seem to do the same thing: whether they’re a member or logged in. The number at the end of each line is the variable’s scope, which controls how long GA remembers the value: 1 is visitor-level (kept in GA’s cookie across visits), 2 is session-level (the current visit only) and 3 is page-level. We set ‘Logged in’ at session level and ‘Member’ at visitor level. So after the first login, we know you’re a member, even if you don’t log in on your next visit.)
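If you would rather build the calls in JavaScript than in your server template, here is a minimal sketch of the same idea. The `customVarCommands` function and the shape of the `user` object (`isHost`, `joinedAt`) are made up for illustration and are not our production code. It also skips the visitor-level variables for logged-out visitors, on the assumption that setting a slot again replaces whatever value was stored for it earlier.

```javascript
// Scope values passed as the last argument of _setCustomVar in ga.js.
var SCOPE_VISITOR = 1; // remembered across visits
var SCOPE_SESSION = 2; // current visit only

// `user` is a hypothetical plain object { isHost: Boolean, joinedAt: Date },
// or null when nobody is logged in. Returns an array of _gaq commands.
function customVarCommands(user) {
  var commands = [
    ['_setCustomVar', 1, 'Logged in', user ? 'yes' : 'no', SCOPE_SESSION]
  ];
  if (user) {
    // Only set visitor-level values when we know who the visitor is, so a
    // logged-out visit does not replace what an earlier login stored.
    commands.push(['_setCustomVar', 2, 'Member', 'yes', SCOPE_VISITOR]);
    commands.push(['_setCustomVar', 3, 'Host/guest', user.isHost ? 'host' : 'guest', SCOPE_VISITOR]);
    // getMonth() is zero-based, so add 1 to get 1-12.
    commands.push(['_setCustomVar', 4, 'Join month', String(user.joinedAt.getMonth() + 1), SCOPE_VISITOR]);
    commands.push(['_setCustomVar', 5, 'Join year', String(user.joinedAt.getFullYear()), SCOPE_VISITOR]);
  }
  return commands;
}
```

On the page you would push each returned command onto `_gaq` before the `_trackPageview` call, for example with `customVarCommands(currentUser).forEach(function (c) { _gaq.push(c); });`, where `currentUser` is whatever your app exposes to the page.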

Google Analytics

Now head over to your GA account, and click ‘Advanced Segments’. Under ‘custom segments’, click ‘New Custom Segment’.

Give this segment a title, say ‘November 2011 signups’.

Drop down the variable type (default is usually ‘Ad Content’), and select the corresponding ‘Custom Variable (Value XX)’ for the join year. The XX is the slot number used in the tracking code, so in this case Value 04 is our join month, and Value 05 is our join year.

Note: you need to choose the value rather than the key - in the above example, the key is ‘join month’ or ‘join year’, and the value is 2008/2009, 1-12 etc.

In the ‘containing’ box, enter the year you’re grouping by (here 2011). Click the ‘Add AND statement’ link, and repeat with the month, this time choosing Value 04 and the month number, here 11. This creates a segment that isolates everyone who joined in year=2011 and month=11.

Save the segment, and you’re now looking at your data only for people who joined in November 2011.
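To make the segment logic concrete, here is a rough model of it in JavaScript. The `visits` array and its field names are hypothetical, and this only mimics the AND condition; it is not how GA stores or queries data. It also shows something to watch with a ‘containing’ match: a single-digit month such as 1 is also contained in 10, 11 and 12, so for those months an exact-match condition (if your segment editor offers one) or zero-padded month values would be safer.

```javascript
// Two ways a segment condition might compare a stored value with your input.
function contains(value, needle) { return value.indexOf(needle) !== -1; }
function matchesExactly(value, needle) { return value === needle; }

// A visit is in the cohort when BOTH the year and the month condition match.
function inCohort(visit, year, month, test) {
  return test(visit.joinYear, year) && test(visit.joinMonth, month);
}

// Hypothetical visits carrying the custom variable values set on the page.
var visits = [
  { joinYear: '2011', joinMonth: '1' },
  { joinYear: '2011', joinMonth: '10' },
  { joinYear: '2011', joinMonth: '11' },
  { joinYear: '2010', joinMonth: '11' }
];

// November 2011: 'containing' works, only the 2011/11 visit matches.
var nov = visits.filter(function (v) { return inCohort(v, '2011', '11', contains); });

// January 2011 with 'containing': months 1, 10 and 11 all contain "1".
var janLoose = visits.filter(function (v) { return inCohort(v, '2011', '1', contains); });

// January 2011 with an exact match: only the 2011/1 visit matches.
var janExact = visits.filter(function (v) { return inCohort(v, '2011', '1', matchesExactly); });
```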

We create a new one every month for the incoming users, and can quickly filter all data by a particular cohort, or compare two cohorts with one another.

If you jump over to your goal tracking, you can start to see which cohort is converting better, what their profile looks like in the months after they joined, estimate their lifetime value, value half-life and so on.
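If you copy the numbers for each cohort out of GA, comparing conversion is simple arithmetic. Here is a small sketch, assuming you have collected rows of visits and goal completions per cohort yourself; the `conversionByCohort` function, the row shape and the ‘YYYY-MM’ cohort key are all my own illustrative choices, not something GA gives you in this form.

```javascript
// `rows` is a hypothetical array of { cohort: 'YYYY-MM', visits: Number, goals: Number },
// one entry per report row copied out of GA with a cohort segment applied.
// Returns one summary per cohort, sorted by cohort key, with a conversion rate.
function conversionByCohort(rows) {
  var totals = {};
  rows.forEach(function (row) {
    var t = totals[row.cohort] || (totals[row.cohort] = { visits: 0, goals: 0 });
    t.visits += row.visits;
    t.goals += row.goals;
  });
  return Object.keys(totals).sort().map(function (cohort) {
    var t = totals[cohort];
    return {
      cohort: cohort,
      visits: t.visits,
      goals: t.goals,
      // Guard against dividing by zero for a cohort with no visits yet.
      rate: t.visits === 0 ? 0 : t.goals / t.visits
    };
  });
}
```

Zero-padding the month in the cohort key keeps the plain string sort in chronological order.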

Feel free to leave comments below!
