In today’s industrial world, even a small failure can mean a big loss for a business. As technology and industrial automation advance, faults are tolerated less and less, and a larger failure can cause significant downtime.
Reliability Techniques for Analyzing Fault Tolerance
So, it is necessary to reduce or eliminate downtime in order to increase efficiency and reliability. If a system fails, there must be a quick way to identify the fault and fix it before any irreparable loss occurs. This gives rise to reliability techniques for analyzing faults and resolving them quickly.
In this post, we will have a look at some of the most widely used techniques for fault management in industries.
Fault Tree Analysis (FTA)
Fault Tree Analysis, commonly abbreviated as FTA, is a way of working down to the root cause of a failure and fixing it. It deals mainly with diagnosing the problem first and then tracing its root cause, so it is closely related to root cause analysis.
FTA draws on experience and expert judgment about how a system can fail. As a simple example, suppose the lift in your building is not working. You will call either an electrician or a mechanic to look into it. If it turns out to be an electrical problem, the electrician will check for possible causes such as broken wires, motor overload, short circuits, etc.
From these, the electrician will find the exact root cause. Here, we start from the top-level failure and work down to the root cause by going through the possible causes and checklists. This is nothing but FTA.
In industrial and larger applications, implementing FTA requires a detailed study of the machine and the parts in it. Each part is then broken down into its sub-parts, and for each sub-part, the issues that can occur are listed. This takes us to the root cause.
In FTA, as the name suggests, a tree diagram is drawn to represent the whole failure behavior. It comprises two main components, events and gates, as shown in the figure below.
As seen, the top-level failure event breaks down into various possibilities, and each possibility occurs either when all of its sub-events occur together (AND gate) or when any one of its sub-events occurs (OR gate). Gates are a term borrowed from digital electronics, where they are used to build logic.
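To make the gate logic concrete, here is a minimal Python sketch of how a top-event probability could be computed for the lift example. It assumes the basic events are independent, and every probability value is made up for illustration. The power-supply branch is a hypothetical addition, included only to show an AND gate.

```python
from math import prod

def and_gate(probs):
    """Output fails only if ALL inputs fail (independent events assumed)."""
    return prod(probs)

def or_gate(probs):
    """Output fails if ANY input fails (independent events assumed)."""
    return 1.0 - prod(1.0 - p for p in probs)

# Hypothetical failure probabilities for the basic events (illustrative only)
broken_wire = 0.01
motor_overload = 0.02
short_circuit = 0.005
main_supply_fails = 0.05
backup_supply_fails = 0.10

# Any one electrical cause is enough to create an electrical fault
electrical_fault = or_gate([broken_wire, motor_overload, short_circuit])

# Power is lost only if the main AND the backup supply both fail
power_loss = and_gate([main_supply_fails, backup_supply_fails])

# Top event: the lift stops if either branch occurs
lift_not_working = or_gate([electrical_fault, power_loss])

print(f"Electrical fault:  {electrical_fault:.6f}")
print(f"Power loss:        {power_loss:.6f}")
print(f"Lift not working:  {lift_not_working:.6f}")
```

Reading the tree from the bottom up gives the probability of the top event. Reading it from the top down, as the electrician does, points to the branch that contributes most to the failure.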
Failure Mode and Effects Analysis (FMEA)
Failure Mode and Effects Analysis is a worksheet-based method, laid out in rows and columns like an Excel spreadsheet, for tracking the failure management process. It helps identify failure causes, effects, and severity, with the aim of greatly reducing future downtime.
It follows these steps in order:
- Planning and preparation
- Structure analysis
- Function analysis
- Failure analysis
- Risk analysis
- Optimization
- Result documentation
Basically, a sheet is prepared with entries such as the system name, how it can fail, the causes, the effects, how frequently the failure occurs, how it is detected, how severe it is, the resulting risk level, and how to resolve it.
Once the entries for every system are made, the sheet is reviewed and kept on record. When a failure happens, the issue is traced using this chart, which helps in troubleshooting and solving it quickly. Each risk number is color-coded for quick identification.
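One common way to arrive at the risk number is the Risk Priority Number (RPN), which multiplies severity, occurrence, and detection ratings, each on a 1 to 10 scale. The sketch below assumes that convention. The equipment rows, ratings, and color thresholds are hypothetical and would be set by each plant's own FMEA guidelines.

```python
from dataclasses import dataclass

@dataclass
class FmeaRow:
    item: str
    failure_mode: str
    severity: int    # 1 (negligible) to 10 (hazardous)
    occurrence: int  # 1 (rare) to 10 (almost certain)
    detection: int   # 1 (always detected) to 10 (practically undetectable)

    @property
    def rpn(self) -> int:
        """Risk Priority Number = severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection

    @property
    def color(self) -> str:
        # Hypothetical thresholds, chosen only for this illustration
        if self.rpn >= 200:
            return "red"
        if self.rpn >= 100:
            return "yellow"
        return "green"

# Hypothetical worksheet entries
rows = [
    FmeaRow("Proximity sensor", "False trigger", 4, 6, 3),
    FmeaRow("Conveyor motor", "Overheating", 8, 5, 6),
    FmeaRow("Hydraulic pump", "Seal leak", 7, 4, 4),
]

# Highest risk first, so the most urgent items appear at the top of the sheet
ranked = sorted(rows, key=lambda r: r.rpn, reverse=True)

for r in ranked:
    print(f"{r.item:<18} {r.failure_mode:<14} RPN={r.rpn:<4} {r.color}")
```

Sorting by RPN puts the items that need attention first, which is the same purpose the color coding serves on a printed sheet.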
Monte Carlo Simulation
The Monte Carlo method is named after the Monte Carlo casino district in Monaco, and as the name suggests, it relies on chance: it samples the many possibilities and probabilities that can make a system fail. It is basically a simulation method in which many scenarios are run with randomly generated faults, to estimate how likely a failure is and to evaluate ways to address it. The simulation results are recorded for future use.
It draws on probability distributions such as the beta, PERT, multi-peak (multimodal), discrete, and symmetric triangular distributions to estimate whether a failure is likely to occur before or after a scheduled maintenance. This method is complex to learn because of its extensive use of formulae, predictions, and graphs, but it is widely used in industry for fault management.
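As a rough illustration of the idea, the following Python sketch estimates the chance that a component fails before its scheduled maintenance. It assumes the time to failure follows a symmetric triangular distribution between 500 and 1500 operating hours. These figures, the 800-hour maintenance interval, and the function name are all hypothetical.

```python
import random

def failure_before_maintenance(maintenance_hours, trials=100_000, seed=42):
    """Estimate P(time to failure < maintenance interval) by random sampling."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    early_failures = 0
    for _ in range(trials):
        # Assumed symmetric triangular distribution: low=500, high=1500, mode=1000
        time_to_failure = rng.triangular(500.0, 1500.0, 1000.0)
        if time_to_failure < maintenance_hours:
            early_failures += 1
    return early_failures / trials

p_early = failure_before_maintenance(800.0)
print(f"Estimated probability of failure before maintenance: {p_early:.3f}")
```

For this particular distribution the exact answer can be worked out by hand as 0.18, so the simulated estimate should land close to that value. The simulation approach becomes useful when the distributions and their interactions are too complex for a hand calculation.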
Markov Chain Model
This is a fully prediction-based method, but it uses proper probability formulae. As can be seen in the image below, the model defines which state the system can move to from its present state, using the numbers shown on the lines. The numbers are the probabilities of where the state will go next (either to a different state or remaining in the same state). They are organized in a transition matrix, which allows a close estimate to be calculated.
With this method, failure management becomes easier because the likely state of the system before or after a failure is known, along with how it can be addressed. The chain diagram uses lines and numbers to record the probabilities. It is a very powerful method, as it predicts system failure over time. It relies on the current state alone to determine future outcomes.
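The state-to-state probabilities can be sketched in Python as a transition matrix that is applied repeatedly to predict where the system is likely to be after a number of time steps. The three states and all probability values below are hypothetical. The failed state is modeled as permanent (no repair) to keep the example short.

```python
STATES = ["operating", "degraded", "failed"]

# Hypothetical transition matrix: row = current state, column = next state.
# Each row sums to 1.
P = [
    [0.90, 0.08, 0.02],  # operating -> operating / degraded / failed
    [0.00, 0.70, 0.30],  # degraded  -> operating / degraded / failed
    [0.00, 0.00, 1.00],  # failed stays failed (no repair modeled)
]

def step(dist, matrix):
    """Advance the state probabilities by one time step."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def distribution_after(steps, start, matrix=P):
    """State probabilities after the given number of steps."""
    dist = list(start)
    for _ in range(steps):
        dist = step(dist, matrix)
    return dist

start = [1.0, 0.0, 0.0]  # the system begins in the operating state
after_one = distribution_after(1, start)
after_ten = distribution_after(10, start)

for name, p in zip(STATES, after_ten):
    print(f"P({name} after 10 steps) = {p:.4f}")
```

Each step uses only the current state probabilities, which is the defining property of a Markov chain. Adding a repair transition would mean giving the failed row a non-zero probability of returning to the operating state.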
In this way, we have looked at some general reliability methods for fault management.