Sometimes, for the CTO, data analytics alone is not enough

The CTO and data analytics are formidable allies, and over recent years their relationship has become increasingly important in many markets, including industrial manufacturing.

When combined with domain knowledge, analytics is often instrumental in finding the sources of margin leakage and uptime losses. However, results are typically sensitive to the context of the data, and analysis performed without that context can often produce faulty outcomes. The truth is that while data analysis techniques, including machine learning, are portable across industries, domain knowledge is not, and you need both to succeed.

Any analytical solution needs to separate causation from simple correlation correctly and provide alerts only about real impending issues. But no data analysis, not even machine learning, can ever be a silver bullet. Only with ‘guiderails’ can analysis techniques find accurate answers. Otherwise, silly correlations emerge, such as the one suggesting that increased consumption of margarine leads to divorce in the state of Maine. The guiderails come from domain knowledge, translated into contextual data limits that establish reasonable expectations of behaviour and exclude the meaningless correlations that machine learning can find when it works in isolation.

Machine learning will find a wide range of correlations in data, but some are inevitably meaningless. Understanding causation typically depends on knowledge and experience. So ask what time, skills and experience you will need to attempt a solution, how long it will take, and whether it will scale. In that sense, machine learning on its own can only go so far.

Using “clustering” techniques, a form of unsupervised learning, machine learning can detect and learn groups of similar patterns. In the oil and gas sector, for example, clustering can learn what normal operational behaviour looks like from the signals coming from sensors on and around machines. Any deviations from normal, called anomalies, are useful for highlighting operational issues with a piece of equipment.
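As a rough illustration of clustering used this way, the sketch below learns two "normal" operating modes from historical readings and flags anything far from both. It is a minimal example, not a production method: the sensor pairs (temperature, vibration), the synthetic data, the choice of k-means, and the 1.5x safety margin are all assumptions made for illustration.

```python
import math
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means on a list of equal-length tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members (keep it if empty).
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids

def distance_to_normal(point, centroids):
    """Distance from a reading to the closest learned operating mode."""
    return min(math.dist(point, c) for c in centroids)

def fit_threshold(points, centroids, margin=1.5):
    """Largest distance seen in normal data, widened by an assumed margin."""
    return margin * max(distance_to_normal(p, centroids) for p in points)

# Synthetic history: (temperature, vibration) in two hypothetical modes.
# Real data would normally be scaled first so no single sensor dominates.
rng = random.Random(42)
idle = [(rng.gauss(40.0, 1.0), rng.gauss(0.5, 0.05)) for _ in range(200)]
full_load = [(rng.gauss(80.0, 1.0), rng.gauss(2.0, 0.1)) for _ in range(200)]
normal_history = idle + full_load

centroids = kmeans(normal_history, k=2)
threshold = fit_threshold(normal_history, centroids)

def is_anomaly(reading):
    """True when a reading is far from every learned normal mode."""
    return distance_to_normal(reading, centroids) > threshold

print(is_anomaly((80.2, 2.0)))   # a reading close to the full-load mode
print(is_anomaly((60.0, 5.0)))   # a reading unlike either mode
```

Note that the algorithm only reports "unlike anything seen before". Deciding whether that deviation matters still falls to someone who knows the equipment.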

Another machine learning technique, supervised learning, requires a human to declare an event: the time and date when something happened. Machine learning has no concept of what happened other than the date and time it occurred. It takes domain knowledge and an understanding of the data context to attach meaning to the event. But once an event is declared, the algorithm learns the signature of the precise patterns leading up to it. For example, in heavy industry an event could be a machine failure due to a precise cause such as a bearing failure. With its learned knowledge of the exact degradation and failure pattern, the AI then tests new incoming patterns to discover recurrences well before the failure occurs. Such early notification allows action to avoid the degradation completely, or provides time to arrange a repair before major damage occurs. The results are lower maintenance costs and more uptime producing valuable products.
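The step from a declared event to a learned signature can be sketched as follows. This is a simplified illustration under stated assumptions: the vibration series is synthetic, the event index is supplied by hand (standing in for the human-declared date and time), the window width and lead-up horizon are arbitrary choices, and a nearest-centroid comparison stands in for whatever learner a real product uses.

```python
import math

def make_windows(series, width):
    """Slice readings into overlapping windows; returns (end_index, window) pairs."""
    return [(end, tuple(series[end - width:end])) for end in range(width, len(series) + 1)]

def label_windows(windows, event_indices, horizon):
    """Label a window 1 if it ends within `horizon` samples before a declared event."""
    labelled = []
    for end, window in windows:
        is_precursor = any(0 <= event - end < horizon for event in event_indices)
        labelled.append((window, 1 if is_precursor else 0))
    return labelled

def mean_window(windows):
    """Element-wise average of several equal-length windows."""
    return tuple(sum(vals) / len(vals) for vals in zip(*windows))

def train(labelled):
    """Learn one average 'signature' per class (0 = healthy, 1 = lead-up to failure)."""
    return {
        label: mean_window([w for w, y in labelled if y == label])
        for label in (0, 1)
    }

def predict(model, window):
    """Return the class whose signature is closest to the new window."""
    return min(model, key=lambda label: math.dist(window, model[label]))

# Synthetic vibration history: steady running, then a ramp before a failure.
history = [1.0] * 40 + [1.0 + 0.2 * i for i in range(1, 11)]
declared_events = [50]  # hypothetical: an engineer declares a bearing failure here

windows = make_windows(history, width=5)
labelled = label_windows(windows, declared_events, horizon=6)
model = train(labelled)

print(predict(model, (1.0, 1.0, 1.1, 1.0, 0.9)))  # steady readings
print(predict(model, (1.8, 2.0, 2.2, 2.4, 2.6)))  # rising readings
```

The only domain input here is `declared_events`. Everything the model learns about "failure" comes from where the engineer placed that label, which is why mislabelled events lead directly to wrong signatures.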

At a plant site, expert staff understand the relationships between machine behaviour and subsequent degradation mechanisms. The staff provide that insight to direct machine learning towards the proper causation patterns. In addition, we are discovering that our complex first-principles and empirical models can forecast the likely ‘neighbourhood’ of specific results, and can consequently also provide guiderails that help machine learning discover exact patterns of degradation. All in all, data context is critically important for correctly labelling events, selecting variables and directing the data cleanup.

Effective solutions always require a marriage of what you know about the processes emitting the data with expertise in the analytical techniques. Thus, the guiderails need to be tough and robust.

Putting a plan in place

So how does all this work in practice? Take a two-phased approach. First, do the engineering. Learn about the process producing the data, correctly label the important events, and perhaps calculate some imperative constraints such as known physical limitations. Use this information as guiderails to cleanse the data and the subsequent event patterns, with an understanding of operating modes. Then, when the engineering effort is done, switch into data scientist mode.
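The engineering phase described above might look something like the following sketch, in which expert-supplied limits and an operating-mode check filter the raw records before any algorithm sees them. The tag names, limit values and the running-speed cut-off are hypothetical and chosen only to make the example concrete.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Guiderail:
    """A contextual limit supplied by domain experts (values here are illustrative)."""
    tag: str
    low: float
    high: float

def cleanse(records, guiderails, running_tag, running_min):
    """Keep records taken while the unit is running and inside every limit."""
    kept, rejected = [], []
    for record in records:
        # Operating mode first: readings from a stopped machine are not
        # comparable with readings taken under load.
        if record[running_tag] < running_min:
            rejected.append((record, "not in running mode"))
            continue
        violated = [g.tag for g in guiderails if not g.low <= record[g.tag] <= g.high]
        if violated:
            rejected.append((record, "outside limits: " + ", ".join(violated)))
        else:
            kept.append(record)
    return kept, rejected

# Hypothetical limits for a pump; a real site would take these from its engineers.
guiderails = [
    Guiderail("bearing_temp_c", 0.0, 150.0),
    Guiderail("pressure_bar", 0.0, 40.0),
]

records = [
    {"speed_rpm": 0.0, "bearing_temp_c": 22.0, "pressure_bar": 0.1},      # shut down
    {"speed_rpm": 2950.0, "bearing_temp_c": 65.0, "pressure_bar": 12.1},  # plausible
    {"speed_rpm": 2960.0, "bearing_temp_c": -40.0, "pressure_bar": 12.0}, # sensor fault?
    {"speed_rpm": 2940.0, "bearing_temp_c": 70.0, "pressure_bar": 999.0}, # impossible
]

kept, rejected = cleanse(records, guiderails, running_tag="speed_rpm", running_min=500.0)
for record, reason in rejected:
    print(reason, record)
```

Recording the reason for each rejection, instead of silently dropping rows, keeps the cleanup auditable by the same experts who defined the limits.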

Once there, you’ve supplied the data context: now the algorithms aren’t concerned about your particular problem domain. In the analytical depths, the data, algorithms and patterns do not know where they came from: it’s just data! Scales, engineering units and data sources can be diverse, and at this stage they do not matter. In this context, we do not strictly need the rigour of engineering models and the complex differential equations they imply.
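One way to see why engineering units stop mattering at this stage is to standardise each signal before analysis. The sketch below uses z-score standardisation, which is an assumption for illustration and not something the approach above prescribes, to show that the same temperatures recorded in Celsius and in Fahrenheit become identical once standardised.

```python
import statistics

def standardize(values):
    """Rescale a signal to zero mean and unit spread (assumes it is not constant)."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return [(v - mean) / spread for v in values]

# The same hypothetical bearing temperatures in two unit systems.
temps_c = [60.0, 65.0, 70.0, 75.0, 80.0]
temps_f = [c * 9 / 5 + 32 for c in temps_c]

z_from_c = standardize(temps_c)
z_from_f = standardize(temps_f)

# Largest difference between the two standardised series (floating-point noise only).
max_difference = max(abs(a - b) for a, b in zip(z_from_c, z_from_f))
print(max_difference < 1e-9)
```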

In summary, the guiderails on the data inputs do matter. You always need carefully “framed” data sets to secure precise outcomes, and it is an understanding of the process that frames the data with context. So, learn the pertinent process details for each solution, and then transition from engineering to data science using the guiderails.

Written by Mike Brooks, senior director, APM Consulting, AspenTech
