Analysing the Character of Failures
Designing your strategy accordingly
Sporadic or chronic events?
Component defects acceptable or excessive?
Preventive Maintenance effective or not?
Analysing Failures
In order to collect the right data to analyse failures, Grothus searches for chronic and sporadic failures. He uses the sampling method below.
The Chronic and Sporadic Incidents in your Process
Examine what your incidents really look like in the way described below. This will help you develop your strategy to make “Zero Failure Management” a reality.
Take a sample
For just a few days (or a few weeks at most), record every single loss event. Don’t rely on existing records; most probably they will not be complete enough. You do not necessarily need computers for this job; they will not save you time. Manually sorting a couple of hundred bills will be fast and cheap enough.
Loss Report |
a) Date: b) Time: c) Department: |
d) Process/equipment affected by the event: |
e) Keyword of event: |
f) Kind of loss (circle): breakdown/malfunction – reduction of output – accident – damage to the environment – damage to company’s image – quality defect/customer complaint – psychic damage to personnel – damage to plant component (not triggering any of the previous kinds of loss) |
g) If “damage to plant component” was circled, or if another kind of loss was triggered by a defect (intolerable deterioration) of plant equipment, then circle: yes |
h) If previous item g) is “yes” and the damaged component is a “short-life-component”, fill in “Standard Short-Life-Component Code”: |
i) Has the same kind of loss already occurred at least once before at the same spot (circle): yes no |
k) Length of time of loss event (circle): no interruption – 0.1 hr – 0.5 hr – 1 hr – 2 hrs – 4 hrs – 8 hrs – if more than 12 hrs, then fill in: hrs |
l) Total number of work hours spent on recovering the normal function (circle): 0.5 hr – 1 hr – 2 hrs – 4 hrs – 8 hrs – if more than 12 hrs, then fill in: hrs |
m) Additional information from: n) Dept: |
a) and b) refer to the moment when the loss event was first recognized,
c) and d) to the place at which the event occurred.
e) A keyword, just to remember the event later on.
f) These are all the kinds of loss events that may occur. Circle one single item. “Damage to plant component” counts as a separate kind of loss event only if the damage has not triggered one of the other kinds listed.
g) Mark “yes” here if this loss event (irrespective of its kind) was triggered by a defect of plant equipment.
h) If, according to the previous item, the loss event was triggered by a defective plant component and this component is listed as a “Short-Life-Component” (SLC, i.e. wear part) in our standard, enter the SLC code here. If the component is not listed in this standard, leave this item blank.
i) If anybody remembers that exactly the same loss event has occurred before at exactly the same spot, circle “yes”.
k) If the process was interrupted by this loss event, this is the total period of time that elapsed until the process was back in normal operation. Circle the single figure closest to the actual number. Only if it exceeds 12 hours, fill in the actual period of time.
l) This is the total number of work hours spent on recovering the normal function, taking into account all employees who were involved. Circle the single figure closest to the actual number. Only if it exceeds 12 hours, fill in the actual number of work hours.
m) and n) refer to a person who could, when needed, give more information about the details of this event.
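The rule for items k) and l), circling the closest preprinted figure and writing in the actual value only above 12 hours, can be sketched in a few lines of Python for anyone who does transfer the bills to a computer later. This is only an illustration: the function name `duration_entry` and the constant names are assumptions, and the preprinted figures are taken from the form above.

```python
# Preprinted figures of the Loss Report, in hours.
# 0.0 stands for "no interruption" in item k).
ITEM_K_CHOICES = (0.0, 0.1, 0.5, 1.0, 2.0, 4.0, 8.0)
ITEM_L_CHOICES = (0.5, 1.0, 2.0, 4.0, 8.0)


def duration_entry(actual_hours, choices=ITEM_K_CHOICES):
    """Return the figure to record on the bill for an actual duration.

    Above 12 hours the actual value is written in; otherwise the
    closest preprinted figure is circled.
    """
    if actual_hours < 0:
        raise ValueError("a duration cannot be negative")
    if actual_hours > 12:
        return actual_hours
    # Pick the preprinted figure with the smallest distance to the actual value.
    return min(choices, key=lambda figure: abs(figure - actual_hours))
```

For example, an interruption of about 3.5 hours is recorded as 4 hrs, and one of 20 hours is written in as 20.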
Analysing the data
I advise transferring the following list into a spreadsheet (e.g. MS Excel) and entering the formulas given in the column “Source”.
- Count from the entire sample the total number of bills, the period of time and the work hours and put these figures into lines [1] through [3].
- Sort the bills by the kind of loss event circled, count their numbers per kind and fill them into lines [4] through [10].
- Count the bills marked “yes” under item g) and fill their number into line [11].
- Count the bills carrying a “Short-Life-Component Code” under item h) and fill their number into line [12].
- Count the bills carrying under item h) a “Short-Life-Component Code” that, according to our SLC standard, is “controllable by periodic inspection”, and fill their number into line [13].
- Enter into line [15] the number of bills that are marked “yes” under item g), carry no “Short-Life-Component Code” under item h) and are marked “yes” under item i).
- Line [16] is the number of bills that are marked “yes” under item g), carry no “Short-Life-Component Code” under item h) and are marked “no” under item i). It results from the formula in column “Source”; counting the bills serves as a cross-check.
- Enter into line [17] the number of bills marked “yes” under item i).
- Calculate the values of the remaining lines by applying the formulas given in column “Source”.
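Manual sorting is quite sufficient for a small sample. If the bills have nevertheless been typed in, the counting steps above could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the record type `LossReport`, its field names, the keyword strings for the kinds of loss and the function `count_lines`. The line numbers are those of the table below; note that “reduction of output” has no line of its own there.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keywords for item f), mapped to the table lines [4]..[10].
KIND_LINE = {
    "breakdown": 4, "accident": 5, "quality": 6, "environment": 7,
    "psychic": 8, "image": 9, "plant": 10,
}


@dataclass
class LossReport:
    kind: str                 # item f), one of the keywords above
    hours_lost: float         # item k)
    work_hours: float         # item l)
    defect: bool              # item g) circled "yes"
    slc_code: Optional[str]   # item h), None if left blank
    repeated: bool            # item i) circled "yes"


def count_lines(bills, inspectable_codes):
    """Count the lines that are taken directly from the Loss Reports.

    inspectable_codes: SLC codes that the SLC standard lists as
    "controllable by periodic inspection".
    """
    n = {line: 0 for line in KIND_LINE.values()}
    n[1] = len(bills)
    n[2] = sum(b.hours_lost for b in bills)
    n[3] = sum(b.work_hours for b in bills)
    for b in bills:
        if b.kind in KIND_LINE:          # kinds without a table line are skipped
            n[KIND_LINE[b.kind]] += 1
    n[11] = sum(b.defect for b in bills)
    n[12] = sum(b.defect and b.slc_code is not None for b in bills)
    n[13] = sum(b.defect and b.slc_code in inspectable_codes for b in bills)
    n[15] = sum(b.defect and b.slc_code is None and b.repeated for b in bills)
    n[17] = sum(b.repeated for b in bills)
    return n
```

The result is a dictionary keyed by line number, which can be copied into the column “Value” of the table.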
Item | Description | Dimension | Source | Value (example) |
[1] | Total number loss events | 1 | Loss Report | 852 |
[2] | Total period of time of loss events | h | Loss Report | 1,500 |
[3] | Total work hours spent on recovering | h | Loss Report | 2,000 |
[4] | Number of breakdowns/malfunctions | 1 | Loss Report | 704 |
[5] | Number of accidents | 1 | Loss Report | 3 |
[6] | Number of quality defects/customer complaints | 1 | Loss Report | 46 |
[7] | Number of damages to environment | 1 | Loss Report | 2 |
[8] | Number of psychic damages | 1 | Loss Report | 0 |
[9] | Number of damages to company image | 1 | Loss Report | 20 |
[10] | Number of damages to plant triggering only this kind of loss | 1 | Loss Report | 29 |
[11] | Number of damages triggering any kind of loss | 1 | Loss Report | 187 |
[12] | Number of defects with Short Life Components | 1 | Loss Report | 43 |
[13] | Number of defects with Short Life Components, that could have been timely controlled, but hadn’t and therefore triggered a loss event | 1 | Loss Report | 11 |
[14] | Number of defects not at Short Life Components | 1 | [11]-[12] | 144 |
[15] | Number of defects not at Short Life Components however repeatedly at the same spot | 1 | Loss Report | 46 |
[16] | Number of defects not at Short Life Components however the first time at this spot | 1 | [11]-[12]-[15] | 98 |
[17] | Number of loss events having occurred repeatedly at the same spot | 1 | Loss Report | 252 |
[18] | Number of loss events not having occurred repeatedly at this spot | 1 | [1]-[17] | 600 |
[19] | Percentage of chronic loss events | % | 100*[17]/[1] | 30 |
[20] | Percentage of sporadic loss events | % | 100*[18]/[1] | 70 |
[21] | Percentage of loss events triggered by defects at plant equipment | % | 100*[11]/[1] | 22 |
[22] | Percentage of acceptable defects (at Short Life Components) | % | 100*[12]/[11] | 23 |
[23] | Percentage of acceptable defects that could have been timely controlled, however have not, figured from the total number of loss events at Short Life Components | % | 100*[13]/[12] | 26 |
[24] | Percentage of not acceptable defects, figured from the total number of loss events triggered by defects | % | 100*[14]/[11] | 77 |
[25] | Percentage of chronic and not acceptable defects figured from the total number of loss events triggered by defects | % | 100*[15]/[11] | 25 |
[26] | Percentage of sporadic and not acceptable defects figured from the total number of loss events triggered by defects | % | 100*[16]/[11] | 52 |
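The calculated lines can also be reproduced outside a spreadsheet. The following Python sketch is one possible way, not a prescribed tool; the function name `derive_lines` and the dictionary layout are assumptions made for illustration. Lines [18] to [20] are calculated from lines [17] and [18], which is what reproduces the example values 600, 30 and 70.

```python
def derive_lines(counted):
    """Add the calculated lines [14]..[26] to the counted lines.

    counted: dict keyed by line number; lines 1, 11, 12, 13, 15 and 17
    must be present (they come from the Loss Reports).
    """
    v = dict(counted)

    def pct(part, whole):
        # Whole-number percentage; 0 if there is nothing to relate to.
        return round(100 * part / whole) if whole else 0

    v[14] = v[11] - v[12]           # defects not at Short-Life-Components
    v[16] = v[11] - v[12] - v[15]   # ... occurring for the first time at the spot
    v[18] = v[1] - v[17]            # loss events that did not repeat at the spot
    v[19] = pct(v[17], v[1])        # chronic loss events
    v[20] = pct(v[18], v[1])        # sporadic loss events
    v[21] = pct(v[11], v[1])        # triggered by defects at plant equipment
    v[22] = pct(v[12], v[11])       # acceptable defects (at SLC)
    v[23] = pct(v[13], v[12])       # acceptable, but could have been controlled
    v[24] = pct(v[14], v[11])       # not acceptable defects
    v[25] = pct(v[15], v[11])       # chronic and not acceptable
    v[26] = pct(v[16], v[11])       # sporadic and not acceptable
    return v


# Counted values from the example column of the table.
example = derive_lines({1: 852, 11: 187, 12: 43, 13: 11, 15: 46, 17: 252})
```

With the counted values of the example, `example` contains the same figures as the column “Value” for lines [14] through [26].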
Define your new strategy
Priority of the Kinds of Loss Events
Lines [4] through [10] tell you about the importance of the various kinds of losses.
In the above example, “breakdowns” clearly dominate. Loss events of this kind can be recorded easily and reliably, which will be very helpful for controlling your “Zero Failure Management”. Don’t forget that your success in reducing losses of this kind will produce the same effect with all other kinds of losses as well.
Eliminating Chronic Loss Events
Chronic Loss Events, line [19], indicate weak spots.
They should definitely not exceed 30% of all Loss Events. Higher percentages occur when
1) your processes or plant equipment have only recently been installed and still suffer from many infant illnesses,
2) or you have not given sufficient attention to repetitive faults in the past.
Then you should record the spots involved, find out which ones failed repeatedly and eliminate the causes.
With 1), excessive infant illnesses, you should use “Zero Failure Management” to analyze which Basic Risk Factor during planning, designing, installing, and putting into operation triggered so many of these illnesses.
With 2), you should use “Zero Failure Management” to analyze which Basic Risk Factor caused so many repetitive losses to remain undetected and/or not eliminated.
Eliminating Sporadic Loss Events
Up to now you probably haven’t given sufficient attention to Sporadic Loss Events, line [20]. You will not find them by putting much time and money into recording loss events, filing them in expensive process/equipment histories and searching for repeatedly failing spots (by definition, Sporadic Loss Events do not repeat anyway).
Rather, you have to use “Zero Failure Management” to find the dominating Basic Risk Factor that mainly triggers these events.
The larger the percentage of Sporadic Loss Events the more important it will be for your organization to employ this new approach.
Eliminating defects triggering Loss Events
This percentage, line [21], will be larger the more plant equipment you employ and the more it is subject to severe deterioration. That’s why you cannot refer to a universal benchmark.
The larger this figure, the more important your Plant Maintenance is.
Eliminating Short-Life-Components triggering Loss Events
Line [22] gives the percentage of loss events triggered by defects at Short-Life-Components (SLC). These defects are acceptable and nothing to complain about.
However, most defects developing with Short-Life-Components can be – by periodic inspections – detected and repaired in time before loss events occur. Refer to the last column in our standard “controllable by periodic inspection”. Only very few SLC (e.g. lamps) develop defects that cannot be predicted in time.
Line [23] tells you the percentage of loss events triggered by those SLC that could have been controlled reliably but weren’t. Here you should improve your Preventive Maintenance inspections.
Fight defects that are not acceptable
You should tolerate failures only of Short-Life-Components before the entire piece of equipment has reached the end of its lifetime. Other components are not supposed to fail.
Line [24] tells you the percentage of defects you should not accept according to our defect standard.
Line [25] refers to those loss events that were triggered by defects and occurred repetitively; we call the failing components “Weak Components”. You should detect and eliminate these Weak Components. One simple way is to post process layout schemes directly at the pieces of equipment and ask your workers to stick a marker (e.g. a pin) into the spot that has failed. You immediately find the Weak Components where several pins cluster. Of course, you can also record the events in computer histories. However, watch out: setting up a sufficiently deep hierarchical equipment structure could produce enormous cost.
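If the failed spots are noted on the bills anyway, the pin method has a very small computer equivalent that needs no equipment hierarchy at all. The following Python sketch is only an illustration; the function name `weak_components`, the free-text spot labels and the threshold of two events are assumptions.

```python
from collections import Counter


def weak_components(failed_spots, min_count=2):
    """Return (spot, number of events) for spots that failed repeatedly.

    failed_spots: one free-text label per defect event that was not at a
    Short-Life-Component, e.g. copied from item d) of the Loss Report.
    """
    counts = Counter(failed_spots)
    # most_common() sorts by number of events, largest first.
    return [(spot, c) for spot, c in counts.most_common() if c >= min_count]


# Hypothetical sample: each entry corresponds to one pin on the layout scheme.
pins = [
    "pump P1 seal", "conveyor C3 bearing", "pump P1 seal",
    "valve V7", "pump P1 seal", "conveyor C3 bearing",
]
suspects = weak_components(pins)
```

Here `suspects` lists the pump seal with three events and the conveyor bearing with two, while the valve, which failed only once, is not reported.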
What remains, in line [26], is the percentage of Sporadic Loss Events triggered by non-repetitive defects that are not acceptable. You can find and eliminate the Basic Risk Factor causing them by means of your “Zero Failure Management”.