Data quality: Why is it so hard to get right?

By Jason Helmick on January 18, 2016

Data is a vital asset for any organization today, regardless of industry or size. We expect information to always be available when making decisions; we depend on data to provide insight into our customers, to fine-tune our products and to design marketing strategies.

But simply making data available for business decisions is far from enough: this kind of valuable, actionable data also needs to be accurate. Unfortunately, as data is collected at never-before-seen speeds, inaccuracy grows with it. Worse, the effects of inaccurate data are not always immediately perceived. Instead, they tend to creep subtly across the organization, translating into negative customer experiences, poor feedback scores, unexpected revenue loss, marketing failures and much more.

Thankfully, organizations are aware that their data quality needs improvement. According to IBM, 88 percent of companies worldwide have some type of data quality solution in place today, and it’s estimated that the vast majority of companies plan to make data quality a priority for their ecosystems in the next 12 months. However, while increased awareness of data quality is a step in the right direction, most of these initiatives will fail. Here’s why.

Why is it so hard to get data quality right?

1. It’s just not easy

Data quality may be one of the most challenging aspects to implement within the data domain. The advance of Hadoop and other distributed technologies has created data at a scale we haven’t seen before, and we’re still playing catch-up. We can now collect massive amounts of data, but what do we do with it? And how do we gauge its accuracy? It’s the kind of stuff that keeps you up at night: Not only are we dealing with massive volumes of information, but we’re also trying to grasp new types of data, including unstructured chat logs, support tickets and social media feeds.

2. Traditional tools no longer work

Not too long ago, marketers and business analysts had a myriad of specific, sophisticated tools at their disposal to help with data quality. Take, for instance, credit card and address validation services; it made perfect sense to use these to clean up data for those particular channels. Now? Not so much.

3. Data is now a global asset

Back to the address service example: Using a specific API made sense when data was used only for operational purposes and was constrained to specific departments. This approach hardly works today, not only due to the mind-bending volume and lack of data structure, but also because data has become a global asset distributed across the entire business. Several entities, both internal and external, consume and generate data that needs to be understood and incorporated back into the organization. This circular aspect of data—constantly streaming in and out of the business—highlights the need to invest in a centralized data management architecture. Even though centralized data management is an emerging strategy, companies following this blueprint have seen considerable benefits. According to Experian, businesses investing in a centralized approach to data management saw the highest profits in 2014.

4. It’s really about the people

Put the elephant aside for a moment. A successful data quality program is not so much about the technologies used, but instead it’s about us, the people. This means that we need to invest time and money to build a team that truly understands and appreciates the business. While the investment in technology is absolutely necessary, we need to cultivate a people-oriented culture around data management and quality, because people are the ones who make it work.

5. Data moves at (almost) supersonic speeds

Organizations expect information to be used for intelligence the moment it arrives. Leaders want to know, without delay, how their businesses are trending on social media or which micro-segmentation strategy to use in the next marketing campaign. It sounds logical until we realize that, for the most part, raw data and quality are like oil and water: they don’t mix.

How to create “good” data

Organizations need to look at data quality as a strategic project that transcends technology and leaps into the business, drawing success from its resources, its culture, and most importantly, its people. It is also vital not to aim too high: Start small by using known data sets, prototyping, and building towards a sophisticated data quality platform.
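As one way to picture "starting small," here is a minimal Python sketch that profiles a tiny, known data set for two basic quality measures: completeness (how many values are filled in) and validity (how many filled-in values look right). Everything in it is an assumption made for illustration: the sample records, the field names, the `completeness` and `validity` helpers, and the deliberately simple email pattern. It is a prototype-level starting point, not a description of any particular data quality platform.

```python
import re

# Hypothetical sample of a "known" data set: customer records with typical defects.
records = [
    {"id": 1, "email": "ana@example.com", "country": "US"},
    {"id": 2, "email": "", "country": "US"},
    {"id": 3, "email": "not-an-email", "country": None},
    {"id": 4, "email": "li@example.org", "country": "CN"},
]

# Deliberately simple pattern for illustration; not a full email validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, field):
    """Share of rows where the field is present and non-empty."""
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows) if rows else 0.0


def validity(rows, field, pattern):
    """Share of non-empty values that match the expected pattern."""
    values = [row[field] for row in rows if row.get(field) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.match(v)) / len(values)


# Collect the scores in one place so they can be tracked over time.
report = {
    "email_completeness": completeness(records, "email"),
    "email_validity": validity(records, "email", EMAIL_PATTERN),
    "country_completeness": completeness(records, "country"),
}

for metric, score in report.items():
    print(f"{metric}: {score:.0%}")
```

Tracking even a handful of scores like these on a data set the team already understands gives a baseline to improve on before investing in anything more sophisticated.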

Want more information on data quality? Stay tuned for my next post in which I’ll discuss the different stages of data quality, the benchmarks for each, and why it’s virtually impossible to skip stages. In the meantime, start small, get to know your data—and have fun!

Contributor

Jason Helmick

Jason Helmick is an author for Pluralsight. His IT career spans more than 25 years of enterprise consulting on a variety of technologies, with a focus on strategic IT business planning. He’s a highly successful IT author, columnist, lecturer, and instructor, specializing in automation practices for the IT pro. Jason is a leader in the IT professional community, and serves as board member and COO/CFO of PowerShell.Org. Jason’s publications include Learn Windows IIS in a Month of Lunches, and he has contributed to numerous industry publications and periodicals, including PowerShell Deep Dives and Microsoft TechNet Magazine. He is a sought-after speaker at numerous technical conferences and symposia, and has been a featured presenter in the Microsoft Virtual Academy.