What Good Is Bad Data?

By Noah Jenkins // Business Analyst
There’s a common saying that, when it comes to data collection, measurement without optimization and actionable insights is wasteful. Data has to be associated with something to work toward; otherwise it becomes collection for collection’s sake. While that is absolutely true, the reverse also holds: if the data itself isn’t reliable and accurate, acting effectively on it becomes nearly impossible.
Obviously, collecting inaccurate data is never the intent, but the nature of data collection is that there are inherent obstacles to work around to keep data as accurate as possible. In implementing an analytics strategy, we do whatever we can to avoid these pitfalls, but our main line of defense is data validation. Leading up to deployment, immediately following it, and on an ongoing basis afterward, there’s a level of scrutiny that has to be employed to be sure that all the work put into the collection pays off in analysis. Any time analytics strategies are reworked or adjusted, a new baseline for analysis is established, making it imperative that the baseline reflects the true data as much as possible so that the standard going forward isn’t built on bad data.
Regardless of implementation method, whether through a tag manager like Google Tag Manager or Adobe Launch, directly through the site code itself, or via a third-party plug-in that provides event data, every recordable action in analytics originates from a hit. Each type of hit, from page loads to specific clicks and form submissions to timed events, plays a role in creating a complete, cohesive view of website interactions to report on. However, each has its own set of caveats and fundamental logic that, when used correctly, can greatly help fine-tune an analytics setup. When used haphazardly without a solid supporting strategy, it can result in misleading and inaccurate data collection.
Hit type is the basis on which all events are collected. Is all the data we want to collect directly associable with this action? Is this action truly indicative of a user interaction, and how does that affect other metrics like bounce rate? It all comes back to how the hit is identified.
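To make the bounce-rate question concrete, here is a minimal sketch of how a hit's interaction flag can change a stock metric. The `Hit` class, field names, and the single-interaction bounce definition are illustrative assumptions, not any platform's actual schema:

```python
from dataclasses import dataclass

# Hypothetical model: a hit carries a type plus a flag for whether
# it should count as a genuine user interaction.
@dataclass(frozen=True)
class Hit:
    hit_type: str              # e.g. "pageview", "event", "timing"
    non_interaction: bool = False

def is_bounce(session_hits):
    """A session bounces if it contains exactly one interaction hit."""
    interactions = [h for h in session_hits if not h.non_interaction]
    return len(interactions) == 1

# A timer fired as an interaction hit silently erases the bounce:
session = [Hit("pageview"), Hit("timing", non_interaction=False)]
print(is_bounce(session))   # False

# Marked correctly as non-interaction, the bounce is preserved:
session = [Hit("pageview"), Hit("timing", non_interaction=True)]
print(is_bounce(session))   # True
```

The takeaway is that a single misconfigured flag on an automatic event can shift bounce rate sitewide without any change in visitor behavior.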
Then it comes down to what those hits actually send and collect. Every piece of data collected should, ideally, provide some insight that could help optimize the site, improve the user experience, and drive the online business forward. Most often, the register of a simple hit isn’t enough. More granular, descriptive data helps provide a clearer picture of overall site activity, but again, that data has to be reliable to be usable. If the metadata associated with a hit doesn’t provide useful context, it loses its value as part of the setup.
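One way to enforce that hits carry useful context is a simple required-fields check per hit type. This is a sketch under assumptions: the hit-type names and field names below are hypothetical, not drawn from any specific platform:

```python
# Illustrative only: each hit type declares the metadata it must
# carry for the collected data to be usable in analysis.
REQUIRED_CONTEXT = {
    "pageview": {"page_path", "page_title"},
    "form_submit": {"form_id", "page_path"},
    "click": {"link_url", "page_path"},
}

def missing_context(hit_type, payload):
    """Return the required fields a hit's payload fails to provide."""
    required = REQUIRED_CONTEXT.get(hit_type, set())
    return required - payload.keys()

# A click hit sent without its link URL loses most of its value:
print(missing_context("click", {"page_path": "/pricing"}))
```

A check like this can run in automated QA before deployment, so hits that would arrive without context are caught before they pollute the baseline.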
Analytics setups begin and end with tag management, regardless of how they’re deployed. From the fundamental logic behind when the event is captured down to the details of what data is being recorded, every part of the setup is crucial in building out that holistic view of performance. Therefore, it’s critical that the setup is constantly put under a microscope.
The most reliable place to check for any discrepancies, blind spots, or other implementation issues is the data itself. Analytics platforms provide standard measurements that can be great indicators of a potential issue in an analytics setup. Drastic spikes and drops, or questionable trends in stock metrics like bounce rate, pages per session, and total vs. unique hit counts, can tie directly back to a setup issue.
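The total vs. unique comparison in particular makes a cheap sanity check. As a hypothetical sketch (the tuple shape and threshold interpretation are assumptions), a tag that fires twice per page load inflates total counts but not uniques, so the ratio between them flags duplicate firing:

```python
# Hypothetical: pageview hits keyed by (session_id, page_path).
# Duplicate-firing tags raise totals without raising uniques.
def total_vs_unique(hits):
    total = len(hits)
    unique = len(set(hits))
    ratio = total / unique if unique else 0.0
    return total, unique, ratio

hits = [("s1", "/home"), ("s1", "/home"),   # same hit fired twice
        ("s1", "/pricing"), ("s2", "/home")]
total, unique, ratio = total_vs_unique(hits)
print(total, unique, round(ratio, 2))   # 4 3 1.33
```

A ratio that jumps right after a deployment is a strong hint that the new tag logic, not user behavior, changed.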
In addition, analytics platforms typically have the ability to implement alert matrices to call out any of these questionable anomalies specifically. Day-over-day, week-over-week, month-over-month, and year-over-year analysis within the platform provides a ton of value when it comes to analyzing site performance, but both sides of that comparison need to be intact for the analysis to be worth anything. Alerts and historical reporting help keep us honest, but at the end of the day, one quick look at data at one point in time isn’t enough to establish a reliable baseline.
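The mechanics of such an alert matrix can be sketched in a few lines. The metric names and tolerance values here are invented for illustration; real thresholds would be tuned to the site's own variability:

```python
# Minimal alert-matrix sketch: flag any metric whose day-over-day
# relative change exceeds its configured tolerance.
ALERTS = {
    "sessions":    0.30,   # alert on more than ±30% change
    "bounce_rate": 0.15,   # alert on more than ±15% relative change
}

def triggered_alerts(today, yesterday):
    fired = []
    for metric, tolerance in ALERTS.items():
        prev = yesterday.get(metric)
        curr = today.get(metric)
        if prev and curr is not None:
            change = abs(curr - prev) / prev
            if change > tolerance:
                fired.append(metric)
    return fired

print(triggered_alerts(
    {"sessions": 5200, "bounce_rate": 0.41},
    {"sessions": 9800, "bounce_rate": 0.44},
))   # ['sessions'] -- a ~47% drop fires; bounce rate moved only ~7%
```

Note that a naive day-over-day rule like this would also fire on ordinary weekend dips or seasonal swings, which is exactly why the comparison baseline matters as much as the threshold.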
What happens when a site sees a spike or drop off based around the weekend? Is a cold weather clothing retailer really going to have comparable site traffic in January as opposed to July? Do we really want to use an unprecedented situation that a global pandemic has presented as a comparison point for year-over-year analysis?
Logic and data validation are continual efforts. One snapshot at one point in time can’t be taken as the source of truth if it’s an outlier, but neither can it automatically be assumed that that point in time is the problem. Businesses are always looking for ways to improve, and in an increasingly digital age, website performance figures more and more prominently in those efforts. With that, strategies shift, implementations evolve, tests take place, and optimizations roll out. Validation can’t exist in one of these phases and not the others.
Defined validation checkpoints and timing cadences need to be established to make sure every phase of an implementation is accounted for and collecting reliable, usable data. Any number of site issues can arise, especially following an implementation. There has to be a level of paranoia in combing through the setup and the resulting data. As useful as tools like an alert matrix can be, they should be the last line of defense.
At the end of the day, it comes down to us. Our ability to find and diagnose anomalies. Our willingness to scrutinize our own work. And our proficiency in responding and correcting accordingly.
Even with all this scrupulous validation, eventually we get to a point where there is no fine-tuning left to be done with a particular build. That doesn’t mean monitoring ends. Rather, we transition to relying more on alert matrices, ad hoc validation, and collaboration with other involved analysts.
As much as there has to be some paranoia in validation, there is a simultaneous level of trust in our ability to implement that, at some point, we have to rely on. Otherwise, the quality assurance and validation involved in a build would simply never end. At the end of the day, implementation is part of our job, and if there weren’t a level of expertise involved, we wouldn’t be doing it. The nature of that process means that mistakes will be made, so as crucial and tangible as the actual implementation is, none of it is worth a thing without a thorough, dependable validation process behind it.