Data Equality or Data Quality

We hear so much about equality. Wage equality, gender equality, racial and ethnic equality and, these days, especially marriage equality. We like to believe that all people are created equal. Whether it’s true or not, the same cannot be said for business data. Data equality is not something we hear much about. But then why do we spend so much time trying to make all data equal? We structure and scrub and filter and black-boxify to get all of our data sources to fit the way our business intelligence systems think.

But that is not the nature of business data. It is static and streamy. Structured and unstructured. Transaction based and participatory. It comes in many formats from many sources, some trusted, others flowing from the great unknown. Data inequality does not, however, mean that data quality is low.

Of course, we would rather use only clean, structured data. But that doesn’t reflect reality. Business, like life, is chaotic. Models and projections often fail to accurately predict the future when you feed them with neat packages of sanitized data. Some of the most creative solutions in business and other fields are produced when someone says, ‘Work with what you have’. It forces us to imagine a path through the woods that is not there yet.

The key is always figuring out what is important and what is junk. And as the dataset grows, this becomes more complicated. If your analytics solution cannot handle data inequality, it will (1) cause you to waste time and resources scrubbing and sorting and (2) give you projections based on the perfect world that you created to get it to work.

Treating all data as equal will give you and your executive team a lovely, clean set of outputs. But what happens when you try to apply those outputs to your business and markets? Consider this scenario with a single data source. You know those people in the supermarket who hand out free food samples? Let’s say they actually had a system for reporting the results of their shifts. They might show that 100 people chose the chocolate and 50 took the vanilla sample. Clean input. Clean output. But what do you really know?

What about the 400 people who passed by without taking a sample? A secondary survey could indicate what personal characteristics would discourage a shopper from taking a free sample – diet, health or religious factors, brand loyalty, embarrassment or the desire to just be left alone, etc. You could then review video recordings of each interaction to get more feedback.

Clearly, only the most intrepid marketers are going to employ these sources of data. For everyone else, there is no way to get a truly complete and accurate picture, however valuable it would be to the brand. On the other hand, by passively recording some, but not all, of the raw data – did/did not take a sample, male/female, shopping with/without kids, time of day, zip code, etc. – you start to get a picture that can add value to the free sample model beyond exposure.
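To make that concrete, here is a rough sketch in Python. The three counts come straight from the supermarket scenario above; the individual observations, the field names (`took_sample`, `with_kids`, `hour`) and the helper `take_rate_by` are made up purely for illustration. It is one possible way to work with partially recorded data, not a description of how any particular product does it.

```python
from collections import Counter

# Counts taken from the supermarket scenario in the text.
chocolate, vanilla, passed_by = 100, 50, 400

takers = chocolate + vanilla
shoppers = takers + passed_by

take_rate = takers / shoppers                # share of all shoppers who took any sample
chocolate_among_takers = chocolate / takers  # the view the clean shift report gives you
chocolate_among_all = chocolate / shoppers   # the view once passers-by are counted

# Hypothetical, partially recorded observations (illustration only).
# A value of None means that attribute was not captured for that shopper.
observations = [
    {"took_sample": True, "with_kids": True, "hour": 17},
    {"took_sample": False, "with_kids": None, "hour": 9},
    {"took_sample": False, "with_kids": False, "hour": None},
    {"took_sample": True, "with_kids": True, "hour": 18},
    {"took_sample": False, "with_kids": True, "hour": 12},
]


def take_rate_by(records, field):
    """Take rate per value of `field`, skipping records where it is missing."""
    seen, took = Counter(), Counter()
    for record in records:
        value = record.get(field)
        if value is None:
            continue  # keep the record for other questions; ignore it for this one
        seen[value] += 1
        took[value] += record["took_sample"]  # True counts as 1, False as 0
    return {value: took[value] / seen[value] for value in seen}


if __name__ == "__main__":
    print(f"Took a sample: {take_rate:.1%} of shoppers")
    print(f"Chocolate among takers: {chocolate_among_takers:.1%}")
    print(f"Chocolate among all shoppers: {chocolate_among_all:.1%}")
    print(take_rate_by(observations, "with_kids"))
```

The point of the sketch is that an incomplete record is skipped only for the question it cannot answer. Nothing is thrown away up front just because it does not fit a tidy schema.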

If you want your analytics to be clean, presentable and easily decipherable by data experts and executives alike, you should not have to worry about whether all the sources of your data are clean and equally structured. Your business intelligence solution should take care of that.

But what if not all your data is equal? Well, it never is. But if your IT team needs to understand all the data so that they can ‘manage’ it before it can be analyzed, your business decision makers will not have on-the-fly insights, and they probably will not get a realistic picture of the environment that they need to understand.

For now, only some advanced business intelligence solutions can integrate diverse data sources into valuable, insightful models. Wouldn’t it be great if we had a solution like that for society? Maybe someday.

You can easily integrate new data environments and leverage existing ones in Necto 15. Check it out at
