
### Predictive analytics in incident prevention

Predictive analytics can support risk management by identifying where failures are likely to occur and what can be done to prevent them.

**WILLIAM R BROKAW**

Kestrel Management


**Article Summary**

Companies are generating ever-increasing amounts of data associated with business operations, leading to renewed interest in predictive analytics, a field that analyses large data sets to identify patterns, predict outcomes, and guide decision-making. Companies also face a complex and ever-expanding array of operational risks that must be proactively identified and mitigated. While many companies have begun using predictive analytics to identify marketing and sales opportunities, similar strategies are less common in risk management, including safety.

Classification algorithms, one general class of predictive analytics, could be particularly beneficial to the refining and petrochemical industries by predicting the time frame and location of safety incidents based on safety-related inspection and maintenance data, which are essentially leading indicators. There are two main challenges associated with this method: (1) ensuring that the leading indicators being measured are actually predictive of incidents, and (2) measuring the leading indicators frequently enough to have predictive value.

A case study to illustrate this process is discussed in this article. Using regularly updated inspection data, the author developed a model to predict where broken rails are likely to occur in the railroad industry. The model was created using a logistic regression modified by Firth’s penalised likelihood method, and predicts broken rail probabilities for each mile of track. Probabilities are updated as additional data are collected.

In addition to predicted broken rail probabilities, the model identifies the variables with the most predictive validity (those that significantly contribute to broken rails). Using the model results, the railroad was able to identify exactly where to focus maintenance, inspection, and capital improvement resources and what factors to address during these activities. Validation tests of the model revealed 70% of the actual broken rail incidents occurred on the 20% of segments at highest risk for broken rails.
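The validation result above (70% of incidents on the 20% of segments ranked riskiest) is a standard gains-style check. The sketch below shows how such a check is computed from predicted probabilities and actual outcomes; the data here are made up purely for illustration.

```python
# Gains-style validation: what share of actual incidents falls on the
# top 20% of segments when ranked by predicted probability?
# (Toy data; the probabilities and outcomes are illustrative only.)

def incidents_captured_in_top(probabilities, incidents, top_fraction=0.2):
    """Rank segments by predicted risk, then return the fraction of actual
    incidents that occurred on the highest-risk `top_fraction` of segments."""
    ranked = sorted(zip(probabilities, incidents),
                    key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    captured = sum(hit for _, hit in ranked[:n_top])
    total = sum(incidents)
    return captured / total if total else 0.0

# Ten segments; incidents cluster on the segments the model rates riskiest.
probs     = [0.90, 0.80, 0.10, 0.05, 0.04, 0.03, 0.02, 0.02, 0.01, 0.01]
incidents = [1,    1,    0,    1,    0,    0,    0,    0,    0,    0]
print(incidents_captured_in_top(probs, incidents))  # top 2 segments hold 2 of 3 incidents
```

A well-calibrated model concentrates incidents in the top-ranked bucket; a value near the base rate (20% here) would mean the ranking adds nothing.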

The same methodology could be used in the refining and petrochemical industries to manage risks by predicting and preventing incidents, provided that organisations:

• Identify leading indicators with predictive validity

• Regularly measure leading indicators (inspection, maintenance, and equipment data)

• Create a predictive model based on measured indicators

• Update the model as data are gathered

• Use the outputs to prioritise maintenance, inspections, and capital improvement projects and review operational processes/practices.
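The five steps above can be sketched as a simple measure-model-update-prioritise loop. The class below is a minimal stand-in, using per-indicator incident rates in place of a full logistic regression; all field and equipment names are hypothetical.

```python
# A minimal sketch of the workflow above. A per-indicator incident rate
# stands in for the fitted model; every name here is illustrative.

from collections import defaultdict

class LeadingIndicatorModel:
    """Predicts unit-level risk from regularly measured leading indicators."""

    def __init__(self):
        self.events = defaultdict(int)   # indicator value -> incidents observed
        self.counts = defaultdict(int)   # indicator value -> total observations

    def update(self, records):
        """Step 4: fold newly gathered (indicator, had_incident) data in."""
        for indicator, had_incident in records:
            self.counts[indicator] += 1
            self.events[indicator] += int(had_incident)

    def risk(self, indicator):
        """Estimated incident probability for a given indicator value."""
        n = self.counts[indicator]
        return self.events[indicator] / n if n else 0.0

    def prioritise(self, units):
        """Step 5: rank units (e.g. equipment items) highest risk first."""
        return sorted(units, key=lambda unit: self.risk(unit[1]), reverse=True)

model = LeadingIndicatorModel()
# Steps 1-3: measured indicators (here, a defect-severity grade) with outcomes.
model.update([("high", True), ("high", True), ("high", False),
              ("low", False), ("low", False), ("low", True)])
units = [("pump-A", "low"), ("pump-B", "high")]
print(model.prioritise(units))  # pump-B (high-severity defects) ranked first
```

The essential pattern is that `update` runs every time new inspection or maintenance data arrive, so the priority ranking always reflects the current risk profile.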

Predictive analytics is a broad field encompassing aspects of various disciplines, including machine learning, artificial intelligence, statistics, and data mining. Predictive analytics uncovers patterns and trends in large data sets. One type of predictive analytics, classification algorithms, could be particularly beneficial to the refining and petrochemical industries.

Classification algorithms can be categorised as supervised machine learning. With supervised learning, the user has a set of data that includes predictive variable measurements which can be tied to known outcomes. In the model discussed in the case study section of this article, various track measurements (for instance, curvature, crossings) were taken during a two-year period for each mile of rail. The known outcome, in this case, is whether a broken rail occurred on each rail mile during that two-year period.

An appropriate modelling algorithm is then selected and used to analyse the data, identify the relationships between the variable measurements and the outcomes, and create predictive rules (a model). Once created, the model is given a new data set containing predictive variable measurements with unknown outcomes, and it calculates the outcome probability based on the model rules. This contrasts with unsupervised learning, in which algorithms detect patterns and trends in a data set with no specific direction from the user beyond the choice of algorithm.

Common classification algorithms include linear regression, logistic regression, decision tree, neural network, support vector machine/flexible discriminants, naïve Bayes classifier, and many more. Linear regressions provide a simple example of how a classification algorithm works. In a linear regression, a ‘best-fit’ line is calculated based on the existing data points, providing a y = mx + b line equation. Inputting the known variable (x) gives a prediction for the unknown variable (y).
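A best-fit line can be computed directly from the closed-form least-squares formulas, as the short sketch below shows using made-up data points that lie roughly on y = 2x.

```python
# Least-squares fit of the 'best-fit' line y = mx + b described above,
# using the closed-form slope and intercept formulas (no libraries).

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x.
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - m * mean_x           # line passes through (mean_x, mean_y)
    return m, b

xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.0, 8.1]             # roughly y = 2x
m, b = fit_line(xs, ys)
predict = lambda x: m * x + b          # input a known x, get a predicted y
print(round(m, 2))                     # recovered slope, close to 2
```

The same fit-then-predict pattern carries over to every algorithm listed above; only the shape of the fitted rule changes.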

Most real-world relationships between variables are not linear, but complex and irregularly shaped, so linear regression is often not useful. Other classification algorithms can model more complex relationships, such as curvilinear or logarithmic ones. For example, a logistic regression algorithm can model complex relationships, can incorporate non-numerical variables (for instance, categories), and can often create realistic and statistically valid models. The typical output of a logistic regression model is the predicted probability of the outcome/event occurring. Other classification algorithms provide output similar to logistic regression's, but the required inputs differ between algorithms.
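The probability output of a fitted logistic regression comes from passing a linear score through the logistic function, p = 1 / (1 + e^-(b0 + b1·x1 + …)). The sketch below uses invented coefficients purely to illustrate the mechanics; in practice the coefficients come from fitting the model to measured data.

```python
# How a fitted logistic regression turns a linear score into a predicted
# probability. The coefficients and variable names here are hypothetical.

import math

def predict_probability(coefficients, intercept, features):
    """Logistic-regression output: probability that the event occurs."""
    score = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-score))

# Illustrative model: risk rises with defect count and with curvature,
# the latter a categorical variable encoded as a 0/1 indicator.
coef = [0.8, 1.2]            # weights for [defect_count, is_curved]
intercept = -3.0
low_risk  = predict_probability(coef, intercept, [0, 0])  # few defects, straight
high_risk = predict_probability(coef, intercept, [3, 1])  # many defects, curved
print(low_risk < high_risk)  # True: the probabilities rank units by risk
```

Note how the categorical variable enters the model as an indicator term, which is what lets logistic regression mix numerical and non-numerical inputs.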

The modelling of complex relationships is particularly useful in risk management, where risk is typically prioritised based on the likelihood and potential severity of a particular outcome. Modelling the risk factors that contribute to that outcome results in a precise and statistically valid estimate of outcome likelihood. In contrast, many risk assessments measure ‘likelihood’ on a categorical scale (once in ten years, once a year, multiple times per year), which is less precise, more subjective, and makes it impossible to distinguish between risks that are in the same broad category. Other techniques exist to quantifiably assess potential severity in a risk assessment, but that is beyond the purview of this article.

**Case study**

The author developed a predictive broken rail model for railroad application. Broken rails are a significant driver of derailment risk in railroad operations. Derailments caused by broken rails tend to have more severe consequences compared to other derailment causes since broken rail derailments typically occur at higher speed with little or no warning. The ability to predict where broken rails are likely to occur would allow for more effective management of broken rail derailment risk through targeted track inspections, maintenance, and capital improvement programmes.

The author developed and validated a predictive model of broken rail derailments on a mile-by-mile basis. The objectives were to:

• Identify the various factors that drive broken rail risk

• Quantify how each risk factor affects broken rail risk

• Develop a risk profile for each mile of track based on current and historical risk factors

• Translate the model results into easily understood language, thereby allowing field managers and engineers to prioritise corrective actions in real time based on current risk profiles.
