Welcome to TQ CMS knowledge base. Here you will find useful information for all sorts of things.

Merging Google Analytics with your Data Warehouse

Many of the projects that I’ve been working on over the last 6 to 9 months have evolved to include more and more data integration. Specifically companies are looking to import more Google Analytics data into other systems to do various types of analysis.

Data integration is nothing new, companies have been doing it for a long time. I’ve written in the past about integrating Google Analytics with CRM data.

Other analytics vendors have been supporting this functionality for a while. But I think as more organizations move to Google Analytics they’re looking to get the same functionality as their other tools. Hence this uptick in data integration.

So I figured this would be a good time for a high-level overview of the most common way to integrate Google Analytics with a data warehouse.

What is a Data Warehouse?

In reality a data warehouse is just a big database that pulls from lots of different sources. A businesses would build a data warehouse to do more complex types of analysis. Other types of data you might find in a data warehouse is customer data, purchase history information or demographic information.

When all of this data is put together you can do some interesting types of analysis. For example, you can do better analysis about online behavior and offline conversion. Or tie long-term purchase history to online behavior.

Another benefit of building a data warehouse is the ability to create very custom marketing segments. Adding information about conversion behavior and marketing activities can help augment historical sales data and create new marketing programs.

Gary Angel wrote a great post on some of the reason why you would want to build a data warehouse. Avinash Kaushik also wrote an interesting counter-point, why you might not want to build a data warehouse.

My opinion, this is very useful, but do it in an agile, flexible way that does not take years. This stuff changes too fast to rebuild.

Almost all data integration projects revolve around a primary key. A primary key is a unique number that exists in different data sets. Because this number exists in multiple data sets you can connect the data sets using the key.

To add Google Analytics data to a data warehouse you need to add some type of primary key to Google Analytics. In most of the work that I’ve done this key is a visitor ID. This anonymous identifier usually comes from some other system like a CRM.

When we add the customer ID to Google Analytics we store in a visitor scoped custom variable. This means that the ID is stored in a cookie on the visitor’s machine. We normally set the cookie when the visitor makes a purchase or logs into the site. Basically any time we can positively identify the visitor.

For the sake of this example I’m going to use a little piece of JavaScript to create a unique ID for visitors. I’m actually going to extract a unique ID from the Google Analytics __utma cookie. Then I’m going to set the unique ID in a custom variable, specifically in Custom Variable slot 5.

NOTE: The function _ugc() is a generic function that get’s a cookie value. In this case it’s extracting part of the __utma cookie.

Once the ID is in the custom variable we should start to see data in Google Analytics, specifically the Custom Variables report. Here’s the custom var:

And here are all the visitor IDs:

I know what you’re thinking, “You can’t store personally identifiable information in Google Analytics!” But this isn’t personally identifiable information. And besides, this is the same as the transaction ID stored in the ecommerce data.

Now that we’re done debating PII, we need to merge this data with other data. This is where things get complicated. The rest of the process is all programming. And all the programming is custom. The language you use, how you store the data, it all depends on your IT department and their platform of choice. The rest of this post is meant to be a high-level overview.

First, let’s extract the data from GA via the API.

This query will generate a list of all the unique IDs stored in the custom variable. The key dimension in this query is ga:customVarValue5, that’s the value of the Custom Variable in slot 5 which holds the IDs. I’m using the Data Feed Query Explorer to test these queries.

The basic query is something like this:

https://www.google.com/analytics/feeds/data?
ids=XXXXXXXXX&
dimensions=ga:customVarValue5&
metrics=ga:visits&
start-date=2011-04-20&
end-date=2011-05-04&
max-results=50

And the data coming back would look something like this:

So where do you put this query? You need to create some type of code that runs automatically. You might pull data from Google Analytics daily or weekly. It all depends on the volume of your data and your specific needs.

Once I have a list of IDs I can create some looping logic, again part of an automated program, that iterates over the list of IDs and makes an API request for each ID number found above. This second query will get detailed information about each individual ID.

The secret to the second query is adding a custom segment that includes the ID number. Look for 1041533899 in the query below. That’s the ID number I pulled from the first API call above.

https://www.google.com/analytics/feeds/data?
ids=XXXXXXXXXXX&
dimensions=ga:customVarValue5,ga:date,ga:source,ga:medium,ga:keyword,ga:pagePath&metrics=ga:visits&
segment=dynamic::ga:customVarValue5==1041533899&
start-date=2011-04-20&
end-date=2011-05-04&
max-results=50

With this query I’m getting the data, the traffic source, keyword and all pages that the visitor viewed. I could also pull things like revenue, conversions… But I’m looking for data that I don’t already have via some other system. So I would stick to things that are happening on the site, like where the visitor came from and what they did on the site (watch a video, use a calculator, etc.)

Once this query comes back it’s time to do the data integration. You might want to manipulate the data before you store it in a database or you might choose to insert the raw data. Again, a lot of this depends on your specific environment.

The important thing is that you now have the data. Integrate.

Gotchas To Watch For

There are some definite gotchas when implementing this type of soltion. Most notably, you need to deal with data sampling applied via the API. From the Google Analytics API docs:

Google Analytics calculates certain combinations of dimensions and metrics on the fly. To return the data in a reasonable time, Google Analytics only processes a sample of the data. In these situations, the metric data returned in each entry includes a confidence interval.

So massively huge data sets might need more frequent queries. Or you might need to break the data down into multiple profiles and then query it via the API. Again, this is what makes this type of solution so custom.

Another limitation is that each query is limited to a maximum of 7 dimensions and 10 metrics. But you can get around that with some creative queries.

Thanks for sticking through this rather vague post. I know it lacks some detail, but this is a very specific technique that can differ from one client to another.

There is 56 guides in the database.

More news

Bootstrap
25 February, 2014

Bootstrap

I have implemented bootstrap into the project and changed all code in the administration to use this new framework and I really love the result that I see after a couple of hours of coding. To make things look so good has never been so easy as with
Guides
20 February, 2014

Guides

I have added some of my guides for different things when it comes to the different areas in the computer, I will continue to add guides and especially when it comes to things that I am using every now and then in my own work.
First phase completed
17 February, 2014

First phase completed

Then I had time to finish the dirty work with the page and now it only remains little touches here and there and I need to upload information about what I'm doing and my progress on the project as well.
Time for a change
16 February, 2014

Time for a change

Today I've decided to update my own personal website with the latest version of my project called TQ CMS and with a completely new design that is responsive. My goal with this project is to make it as easy as possible for the end user while

Spotlight

The next step

The next step

I have now resumed my programming after a couple of months of vacation which was very much needed. I got a call one day from a friend that needed a system which can handle peoples reports on various problems like kitchen problems, water problems etc and I decided to make a system for it. Little did I know that it would totally revolutionise my own project and that it would teach me much needed jquery skills to further enhance the CMS. I also