Whether for my own sites or while consulting for other businesses, I’ve often run into a nagging feeling that the ubiquitous Google Analytics (GA) is not entirely properly installed, and that it might not be tracking the whole site. Here’s a checklist to know for sure while remaining sane.
1. Why the anguish? First, establish why you think you may have a problem
1.1. Different tools report different results? Check settings, pinpoint where discrepancies occur, accept imperfection
Just about everyone uses Google Analytics, but many organizations also use one or several other tools as a complement. For instance, if you’re interested in tracking individual users using their personal data, GA’s terms of service prohibit it and you will have to use something like Mixpanel. Here are a few pointers if your tools report very different results:
- Make sure all your tools are set on the same time zone.
- Be careful not to compare apples and oranges, because different tools – even within the Google toolbox – define and measure similar-sounding metrics in a different way, and deal with timeframes differently.
- Try to pinpoint where the discrepancy specifically comes from: is it a section of the site that’s not getting tracked in one specific tool, or a type of user agent, or a country?
- Are the numbers lining up directionally, but GA’s numbers are consistently smaller? Ad blockers might explain this. Also, some countries such as China are more likely to block a big player like Google than niche startups without political exposure.
If all your tools are properly aligned and you manage to get data that goes in the same direction and reflects similar numbers over time, accept that to some extent, data quality sucks. There are plenty of technical reasons, starting with shaky network connections on your users’ devices, that will prevent tools from capturing exactly the same traffic numbers.
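To pinpoint where a discrepancy comes from, it can help to export the same metric per segment (country, site section, user agent) from both tools and rank the segments by gap. The sketch below is only an illustration: the function name, the data shape and the numbers are all made up.

```javascript
// Hypothetical helper: compare per-segment counts exported from two tools.
// Returns one row per segment, sorted by relative gap, largest first.
function compareBySegment(gaCounts, otherCounts) {
  const segments = new Set([...Object.keys(gaCounts), ...Object.keys(otherCounts)]);
  const rows = [];
  for (const segment of segments) {
    const ga = gaCounts[segment] || 0;
    const other = otherCounts[segment] || 0;
    const larger = Math.max(ga, other);
    // Relative gap: 0 means identical, 1 means one tool saw nothing at all.
    const gap = larger === 0 ? 0 : Math.abs(ga - other) / larger;
    rows.push({ segment, ga, other, gap });
  }
  return rows.sort((a, b) => b.gap - a.gap);
}

// Made-up session counts per country, for illustration only.
const segmentReport = compareBySegment(
  { US: 980, FR: 450, CN: 40 },
  { US: 1000, FR: 460, CN: 400 }
);
console.log(segmentReport[0].segment); // the segment with the widest gap
```

A gap of a few percent everywhere is the normal imperfection discussed above. One segment that stands far apart from the others is where to dig.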
- [Google] Data discrepancies between AdWords and Analytics
- [Google] Google Analytics and DFP discrepancies
- [Segment] Mixpanel and Google Analytics: Debugging Reporting Discrepancies in Four Steps
- [Hubspot] Why do HubSpot and Google Analytics not match?
- [Heap] Resolving Data Discrepancies
- [Optimizely] Troubleshooting: Analytics discrepancies, Google Analytics and Universal Analytics
- [Adroll] Discrepancies Between AdRoll and Google Analytics
- [Twitter] Common analytics discrepancies
- [Facebook] What are the differences between third-party ad reporting and Facebook Ads?
1.2. Some reported traffic seems bogus? Exclude bots, test & rogue sites
The following settings will help, depending on the problem you might be facing (you don’t need to do everything, pick your battles):
- Enabling Bot Filtering in Admin > View settings.
- Excluding internal traffic
- Excluding “rogue” sites that somehow use your GA property.
- Blocking referral spammers
Ideally you should set up a different reporting view (formerly known as a “profile”) for dev/test if you’re constantly working on your site’s code. If your developers are disciplined they are already using a configuration file or environment variables to differentiate the various environments your site is running on. Google Tag Manager (more on that later) supports this natively.
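As a rough sketch of the environment-variable approach, the helper below picks a tracking ID per environment. Sending dev/test hits to a separate property is only one way to keep them apart (a hostname-based view filter is another), and the IDs and environment names are placeholders.

```javascript
// Hypothetical mapping of environments to GA property IDs (placeholders).
const PROPERTY_BY_ENV = {
  production: "UA-XXXXX-1",
  staging: "UA-XXXXX-2",
  development: "UA-XXXXX-3",
};

// Pick the property for the given environment. Unknown environments fall back
// to the development property so stray test hits never reach production data.
function propertyFor(env) {
  const known = Object.prototype.hasOwnProperty.call(PROPERTY_BY_ENV, env);
  return known ? PROPERTY_BY_ENV[env] : PROPERTY_BY_ENV.development;
}

console.log(propertyFor("staging"));
```

In a Node-based build, the argument would typically come from something like `process.env.NODE_ENV`.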
Alternatively you can use the debug version of the analytics.js library, which logs messages to the JS console in your browser’s dev tools. But make sure not to use analytics_debug.js in production, as it is heavier than the regular code. Some people do run the debug code live, knowingly or not.
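One way to avoid shipping the debug build by accident is to make the choice of script depend on the environment. This is a minimal sketch: the helper name and the environment labels are assumptions, and the two URLs are the standard locations of the regular and debug builds.

```javascript
const ANALYTICS_JS = "https://www.google-analytics.com/analytics.js";
const ANALYTICS_DEBUG_JS = "https://www.google-analytics.com/analytics_debug.js";

// Hypothetical helper: the debug build is only ever returned outside
// production, even if someone leaves the debug flag switched on.
function analyticsScriptUrl(env, wantDebug) {
  if (wantDebug && env !== "production") {
    return ANALYTICS_DEBUG_JS;
  }
  return ANALYTICS_JS;
}

console.log(analyticsScriptUrl("development", true));
console.log(analyticsScriptUrl("production", true));
```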
The 3rd-party open source Google Analytics Debugger lets you see tracking beacon data in the Chrome, Firefox and Opera dev tools. Make sure to click the red button to start recording data or else the GA Debugger tab will remain empty.
WASP.inspector is also pretty neat for analytics profiling, and supports Google Tag Manager and its (optional) data layer.
The most “hardcore” way to debug GA code involves HTTP proxies such as Fiddler (how to) or Charles (how to). By now we’re in fairly technical territory, but these are the most powerful and versatile solutions, especially if you load GA through a plugin or library as discussed later.
If after some forensics it appears that some bot farm operated out of Ukraine is really messing with your site, read up on Improving Analytics Integrity. Some of the more serious risks that site owners face go way beyond protecting your traffic stats and should involve your system administration folks.
1.3. Our setup is so complex, we don’t know where to start? Establish a baseline
In case your GA account incrementally grew into a full-fledged install that stretches the tool to its limits, with all advanced features enabled, things can become a bit overwhelming. Consider adding a different profile for your site that just sticks to the basics, with no filters, integrations, rewriting or anything else that might interfere with raw data collection.
This way you’ll have a “GA baseline” for the big metrics such as users or pageviews. If there’s a large discrepancy between the baseline profile and the advanced one, then again try to pinpoint where it’s happening.
1.4. We just don’t know that GA has been properly set up? Keep reading
Maybe your developer left a work in progress behind and no one on the team knows much about GA, or maybe you just “feel” your traffic numbers are low. Either way, you don’t trust your numbers but can’t point to a specific reason. In that case, start with the basic troubleshooting steps in the following section.
2. GA basics: is this thing even on?
This is the part where the tech support guy on the phone asks you whether your computer is plugged in! Start from whether you’re seeing data in GA at all, then read:
- Check your web tracking code setup
- Get started with Tag Assistant Recordings
- Install the Tag Assistant Chrome extension
- Troubleshoot common tracking setup mistakes
- 15 Ways Your Google Analytics Might Be Broken, Part 1
Make sure to tweak Tag Assistant settings as shown in the screenshot to the right.
Depending on your website’s architecture and size, you may audit all pages with Tag Assistant, test one page per page template, or use some form of automation. GA’s real-time tracking also comes in handy for live debugging.
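As a very rough idea of what “some form of automation” could look like, the sketch below scans a page’s HTML source for GA-related markers. The function name and the sample markup are made up. It only sees what is in the static source, so tags injected at runtime (for instance by a tag manager) will not show up.

```javascript
// Hypothetical audit helper: look for tracking markers in a page's HTML source.
function auditPage(html) {
  const ids = html.match(/UA-\d+-\d+/g) || [];
  return {
    propertyIds: [...new Set(ids)], // distinct property IDs found in the source
    hasAnalyticsJs: /google-analytics\.com\/analytics(_debug)?\.js/.test(html),
    hasLegacyGaJs: /google-analytics\.com\/ga\.js/.test(html),
    hasGtm: /googletagmanager\.com\/gtm\.js/.test(html),
  };
}

// Abbreviated, made-up page source with a placeholder property ID.
const sampleHtml = `
<script>
  // ...loader snippet referencing https://www.google-analytics.com/analytics.js
  ga('create', 'UA-12345-1', 'auto');
  ga('send', 'pageview');
</script>`;

const pageAudit = auditPage(sampleHtml);
console.log(pageAudit);
```

Running this over one page per template should quickly reveal pages with no property ID, several different IDs, or a mix of libraries.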
Some tracking issues have organizational or administrative root causes, when an organization has not kept track of its GA accounts, properties, and views, and ended up with overlapping or incomplete tagging. You obviously want to iron this type of mess out so that you end with a clean GA account organization that all stakeholders understand and comply with.
3. Google concepts that can lead to misunderstanding: time tracking, pageviews vs sessions, sampling
There are a few things in how Google Analytics works that can lead you to falsely think it doesn’t work, when in reality it does “work”, but within its own design assumptions. Some results can be misleading if you don’t understand these underpinnings.
3.1. Time Tracking
GA’s default time tracking is relatively crude, and downright useless on 1-page sessions where it assumes that people didn’t stay on that one page at all, for lack of “engagement hits”. In other words, no time is recorded for sessions that “bounce” on the first page, even if in reality the user actively read that page for 20 minutes. This makes Time on Page a pretty inaccurate metric. Engagement tracking is a whole topic unto itself that’s beyond the scope of this entry, but these articles should help:
- REAL Time On Page in Google Analytics. Pretty sophisticated approach that takes into account whether the browser window/tab is currently in focus, which is really important given some people’s propensity to open dozens of tabs.
- Riveted – A Google Analytics plugin for measuring active time on site.
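To see why a bounce records no time, here is a simplified model of the default behavior described above. It is an assumption for illustration, not GA’s actual code: the time credited to a page is the gap between its hit and the next hit in the same session.

```javascript
// Simplified model: time on a page = timestamp of the next hit minus the
// timestamp of this page's hit. Timestamps are in seconds.
function timeOnPages(hits) {
  return hits.map((hit, i) => {
    const next = hits[i + 1];
    return {
      page: hit.page,
      // The last hit has no successor, so no time can be measured for it.
      seconds: next ? next.time - hit.time : 0,
    };
  });
}

// A reader who spends 20 minutes on a single page and then leaves.
const bounceSession = timeOnPages([{ page: "/article", time: 0 }]);

// The same reader clicking through to a second page after 20 minutes.
const twoPageSession = timeOnPages([
  { page: "/article", time: 0 },
  { page: "/pricing", time: 1200 },
]);

console.log(bounceSession, twoPageSession);
```

In this model the 20 minutes only become visible once a second hit exists, and the exit page itself still gets nothing. The articles above work around this by sending extra engagement hits.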
3.2. Sessions vs. Users vs Pageviews
The custom report UI lets you mix metrics that don’t really make sense together, which can lead to the display of seemingly impossible results. Here again this is because of what Sessions mean in GA. Calculating session-level metrics on hit-level dimensions may result in seeing more users than sessions, which makes sense only when you understand how GA calculates these metrics. If you’re working with page-level dimensions or the Hour dimension then, counter-intuitively, you want to look at Unique Pageviews rather than Sessions.
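The following sketch shows how “more users than sessions” can happen. It relies on a simplified model that is an assumption on my part: a session is credited only to the page of its first hit, while users and unique pageviews are credited to every page seen. The hit data is made up.

```javascript
// Build a per-page report from hits listed in chronological order.
function pageReport(hits) {
  const rows = new Map();
  const startedSessions = new Set();
  for (const { user, session, page } of hits) {
    if (!rows.has(page)) {
      rows.set(page, { users: new Set(), sessions: 0, sessionsWithView: new Set() });
    }
    const row = rows.get(page);
    row.users.add(user);
    row.sessionsWithView.add(session);
    if (!startedSessions.has(session)) {
      startedSessions.add(session);
      row.sessions += 1; // only the first hit of a session is counted here
    }
  }
  const report = {};
  for (const [page, row] of rows) {
    report[page] = {
      users: row.users.size,
      sessions: row.sessions,
      uniquePageviews: row.sessionsWithView.size,
    };
  }
  return report;
}

// Two users who both land on /home and then visit /pricing.
const pageRows = pageReport([
  { user: "u1", session: "s1", page: "/home" },
  { user: "u1", session: "s1", page: "/pricing" },
  { user: "u2", session: "s2", page: "/home" },
  { user: "u2", session: "s2", page: "/pricing" },
]);
console.log(pageRows);
```

Under this model /pricing ends up with two users and zero sessions, because neither session started there, while Unique Pageviews gives the number you probably wanted.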
3.3. Sampling
GA uses sampling, which applies at different thresholds depending on the type of report you’re consulting and whether you use the free GA or Analytics 360. Google would say this is a feature, not a bug, but if you have high traffic you need to understand when and how sampling might affect you. The limit on the number of hits for a standard Analytics account is 10M hits/month while it’s at 1B+ hits/month for paying customers. AnalyticsPros has a tool to assess whether you might be going over that limit.
Also look out for specific reports that are based on a very small sample, as explained in this article.
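For a back-of-the-envelope check against the hit limit quoted above, you can estimate monthly hits from daily pageviews and the average number of events fired per pageview. The function and the traffic figures below are hypothetical, and the estimate assumes that every pageview and every event counts as one hit.

```javascript
// Monthly hit limit quoted above for a standard (free) GA account.
const STANDARD_MONTHLY_HIT_LIMIT = 10000000;

// Hypothetical estimate: each pageview is one hit, plus its events.
function estimateMonthlyHits(pageviewsPerDay, eventsPerPageview, daysInMonth = 30) {
  return pageviewsPerDay * (1 + eventsPerPageview) * daysInMonth;
}

// Made-up traffic: 150,000 pageviews a day, 1.5 events per pageview.
const estimatedHits = estimateMonthlyHits(150000, 1.5);
const overStandardLimit = estimatedHits > STANDARD_MONTHLY_HIT_LIMIT;
console.log(estimatedHits, overStandardLimit);
```

Heavy event tracking is often what pushes a site over the limit, since pageviews alone would stay well under it in this example.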
Make sure not to mix up legacy (ga.js) and current (analytics.js aka Universal Analytics, introduced in 2013) code. Migration from ga.js involves retagging, especially if you used deprecated features such as custom variables. Don’t get confused by older content, especially if you’re googling around to find implementation guidance.
Whether you want to use the alternative async version of the analytics.js script is your call depending on whether a sizable part of your audience still uses old browsers.
If you’re using a CMS plugin (e.g. one of the several popular WordPress plugins for GA), make sure it’s up-to-date, its settings are accurate, and that its authentication with GA is current. Check the support forums for your plugin (example) to see if other people are reporting issues similar to yours (assuming you’re facing a specific issue rather than a vague feeling that GA doesn’t work well).
5. Tag management & other consolidation methods
Centralized tag managers such as Google Tag Manager (GTM) – whose arrival killed many of its independent competitors – or Tealium, can both cause and help solve problems with your tracking. This entry is long enough as it is, so I’ll only touch on this briefly, because you could write a whole post just about GTM.
If you’re already using GTM to load GA, here are a couple links about its debugging:
- Troubleshooting GTM
- 10 Ways Your Google Tag Manager Setup Might Be Broken
- GTM Developer Guide: Avoiding common pitfalls
Even though they do introduce a bit of extra complexity, tag managers are for the most part meant to be part of the solution by making it easier to manage and maintain several tracking codes in one place. To get started, see Install Google Analytics via Google Tag Manager and Safely Migrating To Google Tag Manager.
GTM includes versioning, with a UI showing differences between versions, à la Git diff. This alone will make it easier to isolate and roll back errors that may have been introduced at some point during the life of your tracking code.
Hopefully at some point GTM will auto-introduce annotations in GA. In the meantime, do use annotations to document configuration changes that may explain inflection points in trends reported by GA.
An alternative is to use a data hub such as Segment, an analytics library such as Angulartics, or cloud integration software such as Zapier (though that type of software is usually used alongside the GA script on your site rather than as a substitute for it). To get a sense of the pros and cons, here’s how Segment positions itself against tag managers. Technically you could even load GA via Segment via GTM, though it’s not necessarily a good idea.
Again, adding more pieces to the puzzle obviously adds another layer of complexity that introduces its own potential bugs and outages. Verify that the interaction points do work, and avoid running the native GA code in parallel with the third-party tool in charge of collecting data. And understand that some of the troubleshooting tools such as the Tag Assistant may or may not work with your setup.
So if you’re only using a relatively simple GA setup and nothing else, you might not need a tag manager. On the other hand, the more moving pieces you have, and the more people involved in their setup and maintenance, the more it makes sense to use a centralized tool. There’s never a perfect one-size-fits-all solution in the tech world. Instead, it’s all about understanding the trade-offs and making informed decisions based on your existing technical infrastructure, functional needs, human resources at hand, and plans for the future.
You might want to start from a functional GA setting before transitioning to a tag manager or data hub, unless you’ve concluded that you need to start from a clean slate.
6. GA features prone to breakage: cross-domain tracking, view filters, goals/funnels, ecommerce
6.1. Cross-domains and subdomains
Cross-domain tracking can be tricky to get right and varies depending on your setup:
- one vs. several subdomains
- one vs. several top-level domains
- whether (sub)domains are supposed to be treated as separate sites or the same one as far as GA is concerned
There’s not one true way to organize your GA tracking, it depends to a large extent on how you are organized (e.g. centralized or not across departments or countries).
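For the case of several top-level domains treated as one site, analytics.js provides a linker plugin. The sketch below issues the relevant commands through whatever `ga` function it is given. The wrapper function, the property ID and the domain are placeholders, and your own setup may call for different options.

```javascript
// Issue the analytics.js commands that enable cross-domain linking.
// `ga` is the analytics.js command queue function, passed in as a parameter.
function setUpCrossDomain(ga, propertyId, otherDomains) {
  ga("create", propertyId, "auto", { allowLinker: true });
  ga("require", "linker");
  ga("linker:autoLink", otherDomains);
  ga("send", "pageview");
}

// Outside a browser, a recording stub stands in for the real ga() function.
const recordedCommands = [];
setUpCrossDomain(
  (...args) => recordedCommands.push(args),
  "UA-XXXXX-1",
  ["example-shop.com"]
);
console.log(recordedCommands);
```

Passing `ga` as a parameter keeps the sequence easy to check without a browser. On a real page you would pass the global `ga` function and run the same setup on each domain involved, listing the other domain(s).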
6.2. View filters
This is possibly the one feature most likely to break your reporting, so check that you are well-versed in filter syntax before implementing live. It is very easy to write a regular expression that matches more than intended. You can use the free RegExp Tester Chrome extension, the RegExr website, or RegexBuddy (Windows, $40) to learn, write and test your regular expressions. Be warned, there is a learning curve, so you may want to start with this guide if you’re new to view filters. Do not confuse them with table filters, which are just temporary visual filters, while view filters are non-reversible data filters.
If you are going to use view filters, let me insist on my earlier recommendation to use separate test and baseline views. Right from Google’s documentation:
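Here is a small illustration of a pattern that catches more than intended, using JavaScript’s regular expressions. GA’s filter syntax may differ in details, and the hostnames are made up.

```javascript
// Hypothetical hostnames a filter pattern might be tested against.
const hostnames = [
  "example.com",
  "www.example.com",
  "example.com.evil.test",
  "myexample.com",
];

const loosePattern = /example.com/;             // unanchored, unescaped dot
const strictPattern = /^(www\.)?example\.com$/; // anchored, dots escaped

const looseMatches = hostnames.filter((h) => loosePattern.test(h));
const strictMatches = hostnames.filter((h) => strictPattern.test(h));

console.log(looseMatches.length, strictMatches.length);
```

The loose pattern matches all four hostnames, including the two that have nothing to do with your site, while the anchored one only matches the first two.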
“Filters are destructive. Filtering your incoming hits permanently changes those hits in that view, according to the type of filter. Therefore, you should ALWAYS maintain an unfiltered view of your data.”
If you are using several filters, bear in mind they are executed serially, meaning the output of the first one becomes the input of the second one, and so on. The order in which filters are chained therefore matters. Using more than one Include filter can lead to data loss and should be done with caution.
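The data loss from stacking Include filters is easy to reproduce with a toy model. This is an assumption for illustration, not GA’s implementation: each filter only sees the hits that survived the previous one, and an Include filter drops everything that does not match. Hostnames are made up.

```javascript
// Apply a chain of include-style filters, one after the other.
function applyFilters(hits, filters) {
  return filters.reduce((remaining, keep) => remaining.filter(keep), hits);
}

const sampleHits = [
  { hostname: "www.example.com", uri: "/blog/post" },
  { hostname: "shop.example.com", uri: "/cart" },
];

const includeWww = (hit) => hit.hostname === "www.example.com";
const includeShop = (hit) => hit.hostname === "shop.example.com";
const includeEither = (hit) => /^(www|shop)\.example\.com$/.test(hit.hostname);

// Two Include filters on the same field: nothing can satisfy both.
const afterTwoIncludes = applyFilters(sampleHits, [includeWww, includeShop]);

// A single Include filter with one pattern covering both hostnames.
const afterOneInclude = applyFilters(sampleHits, [includeEither]);

console.log(afterTwoIncludes.length, afterOneInclude.length);
```

In this model the two chained Include filters leave no hits at all, whereas one filter covering both hostnames keeps them.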
Also use the filter verification functionality provided by Google.
6.3. Goals & Ecommerce
Goals can be improperly set up for similar reasons as filters, typically because of URL typos or regexp errors. Funnels can also appear to have leakage in case they include (sub)domains but you haven’t set up cross-domain tracking. See Troubleshooting goals for details, and this entry on funnel problems.
Ecommerce transaction tracking can break if you don’t properly escape or remove apostrophes, or if you pass along incorrect values (e.g. a non-numerical value where a number is expected) to GA.
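A minimal sketch of cleaning transaction data before handing it to GA could look like this. The function name and the input shape are hypothetical. Here apostrophes are simply removed and revenue is validated as a plain number.

```javascript
// Hypothetical sanitizer for transaction data coming from a shop backend.
function sanitizeTransaction(raw) {
  // Drop currency symbols, thousands separators and whitespace.
  const cleaned = String(raw.revenue).replace(/[$,\s]/g, "");
  if (!/^-?\d+(\.\d+)?$/.test(cleaned)) {
    throw new Error("Revenue is not numeric: " + raw.revenue);
  }
  return {
    id: String(raw.id),
    // Remove apostrophes so the name cannot break a single-quoted script.
    affiliation: String(raw.affiliation).replace(/'/g, ""),
    revenue: Number(cleaned).toFixed(2),
  };
}

const cleanTransaction = sanitizeTransaction({
  id: 1234,
  affiliation: "Joe's Shop",
  revenue: "$1,299.50",
});
console.log(cleanTransaction);
```

Failing loudly on a non-numeric revenue is a design choice: a thrown error in your logs is easier to notice than transactions silently missing from reports.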
6.4. Campaigns & Remarketing
Remarketing involves meeting several account requirements that are not enabled by default. Nothing too complicated though.
7. Recap: Useful Chrome Extensions
There’s no such thing as perfect data in the online world. Narrow down the issues and break them down into small tractable bits, and don’t spend too much time overthinking your measurement infrastructure as opposed to analyzing, and more importantly acting on, the data. If you can be 95% confident in the direction and order of magnitude of your data, then the extra work to converge towards ever-elusive perfection might not be worth it relative to spending the same energy on growing your business. You want to have sound and powerful tracking, but keep in mind there are diminishing returns here as in every endeavor.
Don’t hesitate to contact me if you’re in the middle of assessing your web tracking and marketing toolset, procedure, and resources.