We had a visit not long ago from Harvey Spencer, an analyst in the capture and data extraction fields, and got into an interesting discussion about big data and the ever-evolving methods of finding useful patterns in a stack of information. Harvey once worked with us in the document capture business at Digital Check, but has since started his own consulting firm, Harvey Spencer Associates, and branched out into photo, voice and video, among other areas.

The latest thing he’s working on has to do with voice recognition: After the 2007-08 financial crisis, new regulations contained in the Dodd-Frank Act required that every phone call related to certain securities markets be recorded for compliance purposes and provided to auditors on demand. The intention behind those rules, obviously, was to leave a data trail that investigators could later follow if necessary – which, by extension, ought to encourage honesty and transparency on the part of the involved parties. On the other hand, it also created a huge amount of data to sift through in the event that there actually was an investigation. One might literally have to listen to hundreds of hours of audio recordings to find a specific conversation.

Managing all this data suddenly takes on huge importance when you consider that the firm being audited has to foot the bill for going through the audio files – and the ones doing the listening aren’t college students earning $10 an hour; they’re trained professionals drawing significant salaries. So the cost of a single investigation can easily spiral into five or even six figures just for the time spent listening to voice recordings. Audio in particular also presents some specific challenges that make it more labor-intensive: It can’t be flipped through in quick succession like images; it can’t simply be sped up and skipped through like video; and you can’t look at a series of sample points and determine what happened before and after.

This is where big data is supposed to come in and save the day, identifying the useful bits and keeping human involvement to a manageable level, which is worth big bucks to those who would use it. And before you feel bad about the lost jobs and the cold nature of business, remember that these weren’t jobs that had always existed and suddenly went away thanks to automation; they came about because of recent regulations that proved to be hugely expensive. In fact, one might go so far as to say that the real outcome is that a lot of highly trained professionals are having to waste their time. It is definitely a good thing if machines can someday take over the boring part of this job.

But until recently, a limiting factor was the availability of voice-recognition technology that could account for all the variables in a phone call. For one thing, the industry language used by traders is not the same as what two “regular” people would say to each other in the street. Throw in other muddling factors like call quality, different accents, the speed at which different people talk, even the number of swear words used by certain individuals, and you’ve got quite a mess to clean up.

What does this have to do with our business, check imaging? Well, over the past decade, banks have developed automated sorting abilities that let them sift through mountains of documents to find the one that will help resolve an issue – largely with the help of advances in OCR and MICR. As Harvey puts it, voice recognition is where OCR was 10 years ago – still sorting out the how-tos of producing clean data and managing lots of it but, just as importantly, possessing the same obvious potential for huge cost and labor savings. Will voice get to the same place? We have little doubt that it will, just as image recognition evolved (and continues to evolve) to meet many of the same challenges. Here’s looking forward to a future where the machines can do the grunt work in that field too.