Modelling Processes - DNA Profiling - Data Science

I once led a small team in the U.K. that was responsible for the service development of new DNA profiling initiatives. For the most part, our mission was to make the processes quicker, easier (less training), and better value. My team had already introduced the DNA profiling service for reference samples, and within six months it was under a lot of pressure because it wasn't delivering the anticipated benefits; most of the criticism focused on throughput and instrument uptime.
The process of profiling DNA was complex and involved seven distinct stages (Batch, Extract, Quantify, Amplify, Split, Analyse, and Interpret). All stages took variable amounts of time and utilized varying numbers of operators and instruments. When delivering the new "line" we had done basic arithmetic in Excel: we added up the time and volume at each stage and checked that no bottlenecks were evident. What could go wrong?
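That Excel-style check can be sketched in a few lines. The stage names below follow the article, but the cycle times and unit counts are invented for illustration, not the real figures from the line:

```python
# Naive capacity check: each stage's theoretical daily throughput,
# assuming 100% uptime. Numbers are illustrative placeholders.
stages = {
    # stage: (hours per batch, parallel units at that stage)
    "Batch":     (0.5, 2),
    "Extract":   (2.0, 3),
    "Quantify":  (1.0, 2),
    "Amplify":   (3.0, 1),
    "Split":     (0.5, 1),
    "Analyse":   (2.0, 2),
    "Interpret": (1.5, 3),
}

def batches_per_day(hours_per_batch, units, shift_hours=8.0):
    """Theoretical daily capacity of one stage, ignoring downtime."""
    return units * shift_hours / hours_per_batch

rates = {name: batches_per_day(h, u) for name, (h, u) in stages.items()}
bottleneck = min(rates, key=rates.get)
print(f"Slowest stage: {bottleneck} at {rates[bottleneck]:.1f} batches/day")
# → Slowest stage: Amplify at 2.7 batches/day
```

On paper this kind of sum looks reassuring, which is exactly why it misled us: it assumes every instrument is available every hour of every shift.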
The most obvious issue was that instruments had not only variable operating times but also variable uptime, meaning that availability was limited; one theory was that we needed more instruments. What should have been easier to spot was the barrier of training. In R&D, we could all run every instrument, but as we rolled the line out, operators were trained on just one or two, so another theory was raised: improve training. Both hypotheses would increase costs. I was lucky to have a manager with experience in statistics and modelling, who introduced me to a number of software packages that could bring all these variables together into a single model; ultimately we selected Witness from Lanner. Using this, we could test the addition of instruments, improved training, and even scheduled checks by operators.
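The difference uptime makes can be illustrated with a small Monte Carlo sketch, in plain Python rather than Witness. The failure probability and repair time here are invented numbers, purely to show why the static spreadsheet figure overstates throughput:

```python
import random

random.seed(42)

def simulated_rate(n_days=10_000, cycle_h=3.0, shift_h=8.0,
                   p_fail=0.15, repair_h=4.0):
    """Average daily batches from a single instrument when, on some days,
    it fails and sits idle for repair_h hours awaiting an engineer.
    All parameters are illustrative, not real service figures."""
    total = 0.0
    for _ in range(n_days):
        lost = repair_h if random.random() < p_fail else 0.0
        total += max(shift_h - lost, 0.0) / cycle_h
    return total / n_days

static = 8.0 / 3.0  # the naive spreadsheet figure: assumes 100% uptime
with_downtime = simulated_rate()
print(f"static {static:.2f} vs with downtime {with_downtime:.2f} batches/day")
```

Even this toy model shows effective capacity sitting well below the paper figure, and a real discrete-event package such as Witness adds the queueing knock-on effects between stages that a per-stage average cannot capture.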
We never called ourselves data scientists, but what we did was science. My entire department had graduated in science, many with PhDs, and we worked at the Forensic Science Service, so the scientific method was applied without a second thought. Theories were put forward; the team collected data; the data was organized and analyzed; and the output was either supporting evidence with probabilities, or arguments against the theory and, often, enhancements or adjustments to it.
What we discovered was that many of the operators' theories were not supported, and that only a handful of changes could make a significant difference. One failed theory was that the line needed an additional amplification instrument. This theory was becoming increasingly prevalent, and senior leaders were jumping on it because the instrument was the only one at that stage and its downtime was high. That seemed a fair conclusion. The only problem was that these instruments were very expensive, and if we purchased another, the question was whether uptime would improve enough to solve the overall output problem.
As a result of the models I built with a colleague, we introduced a team of multi-skilled robotics engineers. This was an expensive overhead, but the model demonstrated that the benefits to throughput would justify the additional operating cost. We could never demonstrate the need for a second amplification instrument, as it too would have had downtime; only the addition of skills could address the issue.
On reflection, I think such changes might well have evolved in time, but by having this modelled and presented using the software, senior leaders were easily persuaded to invest in the skills. We'll never know how many months or years of evolution it would have taken to reach a similar conclusion in real life.
Summary
- A theory was that capital investment was required to reduce bottlenecks.
- The data showed wide variances, and there were patterns of failure across all instruments.
- The model showed that adding one instrument would not resolve the issue; several would be required, adding up to several million.
- The alternative theory was tested: dedicated engineers were deployed. They worked across all instruments, and their immediate availability improved uptime significantly enough to meet the throughput requirements for the line.