Those working in manufacturing and supply chain will be well aware of Predictive Maintenance (PdM) and its effect on the cost base of such businesses. What might not be as well understood is exactly how PdM solutions are delivered. A key idea underlying many PdM solutions is the ‘Remaining Useful Life’ of machine parts: put simply, a prediction of the time remaining before a machine part is likely to require repair or replacement. With this prediction in hand, maintenance teams can optimise the maintenance schedule with the goal of reducing both unplanned downtime and needless preventative maintenance.
The inputs to a Remaining Useful Life (RUL) model depend largely on the data available, and can all be thought of as indicators of the condition of the machine parts. As with most artificial intelligence applications, the more data the better. This blog post focuses on a Lifetime Data approach to RUL, using a statistical technique known as survival analysis.
What is Lifetime Data and Survival Analysis?
Survival analysis is a method for estimating the lifetime of a part using a probability model. From historical data, it calculates the probability that the part will 'survive' a given number of days, considering both observed and unobserved durations before part failure. Observed means that the entire life of the part was recorded from start to finish. Unobserved (often called censored) means that only part of the life was recorded, but we still use the portion that overlapped with the records. For example, if a machine was installed before data recording began, we know only that the part survived at least as long as the time from the start of records to its first failure.
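To make the idea concrete, here is a minimal sketch that assumes the Kaplan-Meier estimator, a standard non-parametric way to build a survival curve from observed and unobserved durations. The post does not say which estimator was actually used, and the function name and the five data points below are purely illustrative.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct failure time.

    durations: days each part was followed.
    observed:  True if the part failed at that time, False if the
               record was cut off while the part was still running.
    """
    failures = Counter(d for d, o in zip(durations, observed) if o)
    leaving = Counter(durations)  # failed and cut-off parts both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        if failures[t]:
            # Multiply by the fraction of at-risk parts that got past day t.
            survival *= 1.0 - failures[t] / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Illustrative data only: five parts, two of them unobserved (censored).
durations = [100, 200, 200, 300, 400]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
```

Here the estimated survival steps down to roughly 0.8, 0.6 and 0.3 at days 100, 200 and 300. The unobserved parts never cause a step themselves, but they do count towards the number of parts still at risk up to the day their record ends.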
Starting with a dataset with the columns Machine, Failed_Part and Date, and focussing on a single part, e.g. compressors, we derive each observed and unobserved duration by working through each machine, counting the failures of that part and measuring the time between them. We then use a combination of machine learning methods and probability distributions to estimate the likelihood of failure.
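The data preparation step might look something like the sketch below. It assumes the log is available as (Machine, Failed_Part, Date) tuples, that a failed part is replaced on the day it fails, and that the start and end dates of the recording window are known. The function name, the window dates and the sample records are all hypothetical.

```python
from datetime import date


def build_lifetimes(records, part, start_of_records, end_of_records):
    """Turn a failure log into (duration_days, observed) pairs for one part.

    records: iterable of (machine, failed_part, date) tuples.
    A lifetime is 'observed' only when it both starts and ends at a
    recorded failure. Spans touching either edge of the recording
    window are kept as unobserved lower bounds on the lifetime.
    """
    by_machine = {}
    for machine, failed_part, day in records:
        if failed_part == part:
            by_machine.setdefault(machine, []).append(day)

    lifetimes = []
    for days in by_machine.values():
        days.sort()
        # Part was already installed when recording began: unobserved.
        lifetimes.append(((days[0] - start_of_records).days, False))
        # Failure to next failure: fully observed lifetimes.
        for earlier, later in zip(days, days[1:]):
            lifetimes.append(((later - earlier).days, True))
        # Part still running when recording ended: unobserved.
        lifetimes.append(((end_of_records - days[-1]).days, False))
    return lifetimes


# Made-up records for illustration.
records = [
    ("M1", "compressor", date(2020, 3, 1)),
    ("M1", "compressor", date(2021, 3, 1)),
    ("M2", "compressor", date(2020, 7, 1)),
    ("M1", "pump", date(2020, 5, 1)),
]
lifetimes = build_lifetimes(
    records, "compressor", date(2020, 1, 1), date(2021, 12, 31)
)
```

Note that a machine whose compressor never failed does not appear in a failure log at all, so this sketch would miss it. In practice a separate list of installed machines would be needed to include those parts as unobserved durations.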
It can help to understand this approach through an example. Remi AI had a client with a network of more than 80 mobile construction machines that served as points of sale across Australia. These machines required compressed air to operate. Using historical data, the team at Remi AI built a Remaining Useful Life forecast as a function of the number of days since the last failure or replacement, the results of which are shown in figure 1 below.
It is interesting to note in figure 1 that the probability of survival decreases by ~10% in the first 100 days of use, followed by a fairly linear decline out to 1200-1300 days, where the probability of survival drops away quite quickly. The curve is somewhat sigmoidal, and the approach becomes more valuable the less linear the curve is.
This plot is then translated into an estimate of ‘Remaining Useful Life’. This allowed the maintenance teams to better focus their preventative maintenance efforts on parts that had only 100 days before predicted failure. It also allowed their preventative maintenance to be timed more accurately relative to failure, significantly reducing the cost of refurbishing the entire compressor.
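One common way to turn a survival curve into a remaining-life figure is the median remaining life: for a part that has already survived to a given age, find how many more days pass before its conditional survival, S(age + x) / S(age), falls to 50%. The sketch below assumes that definition, which may differ from the one used in the client project, and the curve values are invented to loosely resemble the shape described above.

```python
def survival_at(curve, t):
    """Step-function lookup of S(t) for a curve of sorted (day, S) pairs."""
    s = 1.0
    for day, value in curve:
        if day > t:
            break
        s = value
    return s


def median_remaining_life(curve, age):
    """Days until conditional survival S(age + x) / S(age) reaches 0.5 or less.

    Returns None when the curve never drops that far, in which case
    no estimate can be made from the data available.
    """
    s_now = survival_at(curve, age)
    if s_now <= 0.0:
        return 0
    for day, value in curve:
        if day > age and value / s_now <= 0.5:
            return day - age
    return None


# Illustrative curve, not the client's data.
curve = [(100, 0.9), (600, 0.7), (1200, 0.5), (1300, 0.2), (1400, 0.05)]

rul_new_part = median_remaining_life(curve, 0)      # 1200
rul_at_1000 = median_remaining_life(curve, 1000)    # 300
rul_at_1250 = median_remaining_life(curve, 1250)    # 50
```

Conditioning on the current age matters: a part that has already reached day 1000 is not expected to last another 1200 days, and the estimate shrinks quickly once the part is on the steep section of the curve.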
What do I do with my Remaining Useful Life Estimate?
Once you have a reliable estimate of remaining useful life, the maintenance schedule can be optimised to reduce downtime and maintenance cost. A decision policy can be built so that a part is replaced if it has not failed before a specific number of days. This decision can be made by a human or, as a more advanced option, by a reinforcement learning algorithm.
1. Human Decision:
A team could select a threshold number of days for a machine part such that a replacement process is initiated if the part has not failed or been replaced by then. That number of days corresponds to a survival probability threshold, which will depend on the nature of the part in question: e.g. a part that is critical to machine functionality might have a survival threshold of 60%, whereas a less critical part might have a survival threshold of 20%. Dashboards and alarm systems can be built to allow maintenance staff to respond to pending part failures.
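As a rough sketch of such a rule, the code below converts a survival threshold into a trigger day and flags the machines whose part is within a warning window of that day. The function names, the 100-day warning window and the sample ages are assumptions for illustration, and the curve is invented rather than taken from real data.

```python
def replacement_day(curve, survival_threshold):
    """First day on which estimated survival is at or below the threshold."""
    for day, survival in curve:
        if survival <= survival_threshold:
            return day
    return None


def parts_to_flag(ages, curve, survival_threshold, warning_days=100):
    """Machines whose part age is within `warning_days` of the trigger day."""
    trigger = replacement_day(curve, survival_threshold)
    if trigger is None:
        return []
    return sorted(m for m, age in ages.items() if age >= trigger - warning_days)


# Illustrative survival curve and days in service per machine.
curve = [(100, 0.9), (600, 0.7), (1200, 0.5), (1300, 0.2), (1400, 0.05)]
ages = {"M1": 1150, "M2": 400, "M3": 1250}

critical = parts_to_flag(ages, curve, survival_threshold=0.6)      # ['M1', 'M3']
non_critical = parts_to_flag(ages, curve, survival_threshold=0.2)  # ['M3']
```

The higher threshold for a critical part triggers replacement earlier (day 1200 here rather than day 1300), which is why more machines are flagged under it.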
2. Reinforcement Learning Decision:
Finding the optimal survival probability threshold is an iterative process, and is a problem domain where Reinforcement Learning (RL) can produce powerful results. Using a simulation of the maintenance programme, an RL model can take the RUL data as an input and optimise towards the replacement policy that maximises machine uptime and minimises unexpected maintenance time. The resulting policy is then tested against historical breakdown data to confirm that the RL model has found an applicable maintenance schedule.
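The sketch below is not a reinforcement learning model. It shows only the kind of simulation such a model could be trained against, with a simple search over candidate replacement ages standing in for the learner. Everything in it is an assumption made for illustration: lifetimes are drawn from a Weibull distribution rather than from a fitted survival curve, an unplanned failure is taken to cost ten times a planned replacement, and the candidate ages are arbitrary.

```python
import random


def simulate_cost_per_day(replace_at, n_parts=5000, seed=0,
                          planned_cost=1.0, failure_cost=10.0,
                          scale=1000.0, shape=3.0):
    """Average cost per day of service under an age-replacement policy.

    Each simulated part is replaced at `replace_at` days (planned) or
    when it fails (unplanned), whichever comes first.
    """
    rng = random.Random(seed)
    total_cost = 0.0
    total_days = 0.0
    for _ in range(n_parts):
        # Assumed lifetime model: Weibull with the given scale and shape.
        lifetime = rng.weibullvariate(scale, shape)
        if lifetime < replace_at:
            total_cost += failure_cost
            total_days += lifetime
        else:
            total_cost += planned_cost
            total_days += replace_at
    return total_cost / total_days


# Evaluate a handful of candidate policies on the same simulated parts.
candidates = [200, 400, 600, 800, 1000, 1500, 3000]
costs = {age: simulate_cost_per_day(age) for age in candidates}
best_age = min(costs, key=costs.get)
```

Replacing too early wastes useful life and replacing too late incurs expensive failures, so the cost per day is lowest somewhere in between. An RL agent would explore this same trade-off, but could also condition its decision on richer state than part age alone.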
If you’d like to learn more about how Preventative Maintenance can generate significant savings in loss of sales, breakdowns and general maintenance costs, please don't hesitate to reach out through our website.
Stay tuned on the Remi AI blog as we build out the complete supply chain offering!
Or, if you're ready to start seeing the benefits of AI-powered inventory management, start the journey here.