What does it mean to see data clearly? An interview with Patricia Gardner.

What does ‘learning to see data’ really mean to you?

It means understanding the nature of data itself. Data is rarely linear, so you have to look at it with fresh eyes each and every time. You can't try to fit the new data into some framework you have encountered before, and you have to be careful about presumptions.

That said, awareness of past experiences can be super helpful as long as you understand that this data is not that data and everything is relative. Learning to see data requires an open mind and an inquisitive nature.

If you approach data with a pre-existing notion of what's in there, you may miss information that is unique to that data set: patterns, relationships within the data, and so on.

How has your approach changed over the years?

Well, I started my career in the tech sector dealing with data, and I've always had an inquisitive nature, so I think I've always been pretty open-minded when approaching new data. But, as we all know, the amount of data available has grown exponentially over the last couple of decades, and new tools have been developed for dealing with this volume in a more comprehensive fashion. The trick is having a solid understanding of how these tools work, what they actually bring to the table, and how they can be used.

As an example, data visualization, clustering, concept searching, and the like have been around for many years, but the industry struggled to find a way to use the technology so that it actually provided substantive value. Today the focus is on active or continuous learning, which, technically, is built on the same concept/clustering technology but is deployed in a way that can provide substantive value.
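The core idea behind the active-learning review loop described above can be illustrated with a minimal sketch. This is a hypothetical, simplified illustration in plain Python, not any vendor's actual implementation: documents are crude bag-of-words vectors, and a reviewer's "relevant"/"not relevant" piles are used to surface the document the system is least certain about.

```python
from collections import Counter

def vectorize(text):
    """Crude bag-of-words representation of a document."""
    return Counter(text.lower().split())

def similarity(a, b):
    """Cosine-style overlap between two term-count vectors."""
    shared = set(a) & set(b)
    num = sum(a[t] * b[t] for t in shared)
    den = (sum(v * v for v in a.values()) * sum(v * v for v in b.values())) ** 0.5
    return num / den if den else 0.0

def next_to_review(unlabeled, relevant, not_relevant):
    """Uncertainty sampling: pick the unlabeled document whose
    relevance score sits closest to the boundary between the
    reviewer's 'relevant' and 'not relevant' piles."""
    def score(doc):
        pos = max((similarity(doc, r) for r in relevant), default=0.0)
        neg = max((similarity(doc, n) for n in not_relevant), default=0.0)
        return pos - neg
    # The most uncertain document is the one whose score is nearest zero.
    return min(unlabeled, key=lambda d: abs(score(d)))
```

In a real tool the classifier retrains after each reviewer decision; the point of the sketch is only that the technology routes reviewer attention to where a human call adds the most information.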

The tools available to analyze data today are all relatively the same (most vendors are using the same tools). Even among software vendors who have developed their own tools, the underlying infrastructure is the same. They may have developed their own algorithm, or a user interface that presents data in a slightly different fashion, but the foundation of the technology is still the same.

Given all this, the biggest differentiator is the team you have deploying and managing these tools. That's the secret sauce, that's what makes the biggest difference, and we are the secret sauce. We understand and are well versed in using these tools, and because we understand the underlying technology, we are able to think of innovative ways to use them that, again, can provide substantive value.

How does technology influence the process?

Technology can influence the process in a big way or a small way, depending on your data and what you need to accomplish. The type and volume of data, deadlines, production requests, and so on all factor into deciding which technology options are available to accomplish the goals.

Sometimes iterative searching and analysis of results may be all you need. Other times you may need more advanced technology that can not only help you identify potentially relevant data but also reduce the time spent reviewing, increase the consistency and accuracy of your review calls, and get you to production in time to hit your deadlines. So, technology matters. Making an informed decision matters as well, and having a partner to guide you through the options, deployment, and management of these processes can make a huge difference.
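The iterative searching mentioned above can be sketched in a few lines. This is a hypothetical illustration (the function names and scoring are invented, not drawn from any real product): run a keyword pass, then mine the hits for frequently co-occurring terms so the next pass can be broadened deliberately rather than staying locked to the original terms.

```python
from collections import Counter

def search(docs, terms):
    """Return the documents containing at least one search term."""
    terms = {t.lower() for t in terms}
    return [d for d in docs if terms & set(d.lower().split())]

def suggest_terms(hits, current_terms, top_n=3):
    """Propose new terms that co-occur frequently in the hits,
    so the next search iteration can be widened deliberately."""
    current = {t.lower() for t in current_terms}
    counts = Counter(
        w for d in hits for w in d.lower().split() if w not in current
    )
    return [w for w, _ in counts.most_common(top_n)]
```

A real workflow would add stemming, stop-word handling, and human judgment on each suggested term; the sketch only shows the loop of search, analyze, refine, search again.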

This data is not that data. You have to start with a fresh perspective each time. 

Can you give an example of a mistake you have seen, or a classic 'what NOT to do'?

Two things: overly limited, targeted data collections that later require re-collection of a broader set of data; and overly rigid search terms, with a reluctance to consider new information revealed during the analysis phase.

There are times when clients, in an effort to control processing and hosting costs, will perform a very narrow data collection. This might mean searching in an application with limited or poor search capabilities, or allowing custodians to pick and choose which data they think is relevant. While this may keep costs down at the beginning, a lack of information in the initial set often forces a re-collection, which can end up costing more than if a broader data set had been collected initially.

Another what-not-to-do is applying overly rigid search terms, with no allowance for terms or concepts that present themselves as potentially relevant during the analysis phase. Granted, there are times when search terms are fixed by agreement between the parties, but when they are not, being closed to newly discovered information that may point to relevant data can lead to issues and cost increases down the line.

So it’s not a mistake in looking, it’s a mistake in what to look for?

Yes. Exactly.