The Importance of Correct Design and Management for Data Center Fire Safety Systems
By: Barry Elliott, Director, Capitoline
CEng MIET MCIBSE MBA BSc(Hons) DCE ATD
The ANSI/TIA-942-B-2017 Telecommunications Infrastructure Standard for Data Centers recognizes the need for an adequate fire detection and suppression strategy in a functional data center. It calls for:
- Fire resistant walls and other structures, up to four hours in some cases
- Compliance with NFPA 75 or, alternatively, the data center fire protection standard applicable to the location
- Fire separation from computer room of the UPS/battery room and other areas of the data center
- Exit corridors
- A Computer Room Emergency Power Off (EPO) System if allowed or required by the Authority Having Jurisdiction (AHJ)
- An Early Warning Smoke Detection System for computer rooms and entrance rooms containing active Information and Communications Technology (ICT) equipment, and fire detection in all other areas
- A gaseous suppression system for computer rooms and entrance rooms containing active ICT equipment. When used, clean agents should be allowed by local code. Alternative systems (e.g., hypoxic, mist) are allowed
- A sprinkler or water mist system which, if used, must be of the ‘pre-action’ type
This is a comprehensive set of requirements. TIA-942 does not go into detail itself but relies on other standards, such as NFPA 75:2020 Standard for the Fire Protection of Information Technology Equipment, to provide it.
TIA-942 allows other applicable codes. These might include British Standard BS 6266 Fire protection for electronic equipment installations – Code of practice and, from the European Union, the ‘F-Gas’ Regulations, which dictate which fluorinated gases can be used in fire suppression systems, and the Construction Products Regulation, which calls for low flammability cabling to be used in buildings.
The insurance industry is another influence on fire safety, e.g. FM Global (Factory Mutual) Property Loss Prevention Data Sheet 5-32, Data Centers and Related Facilities. This data sheet acknowledges fire as the greatest threat to data centers and as such has a great deal to say about fire alarm, fire detection and fire suppression.
Fires, and mishaps relating to fire suppression systems, remain one of the leading causes of catastrophic outages amongst data centers. According to our statistics they come third, after Information Technology (IT) related problems and power outages.
Over a ten-year period we recorded 53 major data center outages caused by fire incidents. It is safe to assume that there are many more incidents that never make it to the public domain.
Even small fire incidents can have catastrophic consequences as fire suppression systems are released and power is turned off to the computer room.
Our statistics show that the recovery time after a fire incident is appallingly long, averaging 25 hours for real fires and 17 hours for false alarm fire suppression incidents.
Data center fires can be dangerous to life as well as property. 2018 saw the first recorded fatalities due to a data center fire. A fire at a data center in Tokyo killed five and injured 50. The data center is reported to belong to Amazon, and the cause was given as welding in the building, i.e. ‘hot works’.
Not enough information has been collected on the causes of all fires, but enough is known to draw up a list of prime suspects:
- IT equipment catching fire. These fires tend to be small but smoky – certainly enough to set off a fire suppression system
- UPS (Uninterruptible Power Supply) units and batteries catching fire
- Transformers catching fire
- Hot works in the building e.g. welding, brazing, paint stripping, bitumen roof laying etc.
- Malfunctioning humidifier units in air conditioning systems
- External smoke from brush fires and fireworks setting off data center fire suppression systems
The spread of a fire in a data center is exacerbated by high air flows from the air conditioning, large volumes of plastic insulated cables and build-up of flammable rubbish, especially paper and cardboard packaging.
Other fire related ‘incidents’ revolve around accidental release of fire suppression gas, usually during maintenance, accidental power shutdown caused by a fire detection system and even disk drives destroyed by the dust kicked up by a gas discharge event. There are even some instances where the immense sound and pressure of a gas discharge event has damaged disk drives.
Causes of data center outages, excluding IT. Source: Capitoline Ltd
We have one recorded incident where a humidifier heating element was not replaced at the manufacturer’s recommended interval and a build-up of calcite on the element caused it to emit smoke. The smoke set off the fire suppression gas and, because the data center had never been cleaned, the resulting cloud of dust and grit was ingested by the disk drive arrays, causing millions of Euros of damage.
To minimize fire risk, data center managers should address the problem in five stages:
- Correct building design and construction
- An appropriate fast-reacting smoke detection system
- An appropriate automatic fire suppression system
- An integrated cause and effect plan that coordinates the reaction of critical systems to a fire event
- A recovery plan. The lack of a recovery plan is the main reason why so many small fire incidents lead to such long outages
Let’s look at these five key areas in more detail:
Correct building design and construction
We can summarize the requirements as:
- Appropriate fire ratings for walls, ceilings and doors
- Appropriate physical separation and compartmentalization of computer rooms, electrical rooms, battery rooms, fuel storage etc.
- Correct fire exits, signage and emergency lighting in place
- Low flammability materials used, especially in cabling
An appropriate fast-reacting smoke detection system
The best smoke detection system for the data center environment is known as aspirating smoke detection, or ASD. This method employs plastic tubes that are deployed around key areas of the data center, principally the computer room, and follow the main air flows. This would usually mean sensing tubes following the cold aisles under a raised floor and then again following the hot aisles at high level and then over the air intakes of air conditioning equipment.
The tubes have small holes cut in them at regular intervals and are terminated in a device that draws air in through the holes (hence the term ‘aspirating’) and then passes the sampled air through a laser detection chamber. If the laser beam is deflected by smoke particles, the detector raises an alarm that smoke has been detected.
The advantages of this method over traditional ‘point’ detectors based on ionization or optical detection of smoke are very fast reaction times, reaction to lower levels of smoke and multiple stages of alarm (not just ‘on’ or ‘off’).
For smoke detection systems connected to fire suppression systems it is important to reduce the risk of false alarms and so ‘coincident’ or ‘double-knock’ detection is required. This essentially means that two smoke detectors have to be activated before a signal can be sent to fire suppression equipment.
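The coincidence rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the behavior of any particular fire panel: the `DetectorState` class, the zone names and the three outcomes ("normal", "alert", "release") are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DetectorState:
    """Alarm state of one smoke detector (hypothetical model)."""
    detector_id: str
    zone: str
    in_alarm: bool


def evaluate_zone(detectors: Sequence[DetectorState], zone: str) -> str:
    """Apply coincidence (double-knock) logic to one zone.

    0 detectors in alarm -> "normal"
    1 detector in alarm  -> "alert"   (warn staff, no suppression signal)
    2 or more in alarm   -> "release" (signal may go to the suppression system)
    """
    knocks = sum(1 for d in detectors if d.zone == zone and d.in_alarm)
    if knocks >= 2:
        return "release"
    if knocks == 1:
        return "alert"
    return "normal"


# Illustrative readings: only one detector in the computer room is in alarm.
readings = [
    DetectorState("ASD-1", "computer-room", True),
    DetectorState("ASD-2", "computer-room", False),
    DetectorState("PT-7", "ups-room", False),
]
print(evaluate_zone(readings, "computer-room"))  # alert
```

A single detector in alarm therefore warns staff but cannot, on its own, send a signal to the suppression equipment.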
An appropriate automatic fire suppression system
‘Appropriate’ means something that is effective, allowed or recommended by data center standards and not disallowed by any regulations. This approach essentially leads to one of three solutions:
- Water mist
- Inert gas
- Halogenated gas
Water mist is similar to a sprinkler system but the water comes out under high pressure to form a mist rather than a ‘shower’. A water mist uses less water than a sprinkler and can be applied locally to a fire rather than dousing an entire computer room as a gas-suppression system would.
Although these systems are now widespread in the industry there is little experience of what actually happens to IT equipment after being doused by a water-mist fire-suppression system.
An inert gas system uses mainly nitrogen to lower the oxygen content of a room to below the point where combustion can take place. The inert gas has no toxic effects on people or IT equipment. However a lot of it is required, which can make it expensive, and there are overpressure impact effects on the protected room, and some IT equipment, that have to be taken into account.
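The dilution effect can be illustrated with simple arithmetic. The sketch below assumes normal air at 20.9% oxygen by volume and an inert gas that is fully mixed into the room atmosphere. The 40% agent concentration is an illustrative value only and is not taken from any standard; real design concentrations come from the system designer and the applicable code.

```python
AMBIENT_O2_PERCENT = 20.9  # oxygen content of normal air, by volume


def residual_oxygen(agent_percent: float) -> float:
    """Oxygen concentration (vol %) once the room atmosphere contains
    `agent_percent` (vol %) of well-mixed inert gas."""
    if not 0.0 <= agent_percent <= 100.0:
        raise ValueError("agent_percent must be between 0 and 100")
    return AMBIENT_O2_PERCENT * (1.0 - agent_percent / 100.0)


# Illustrative value only (an assumption for this example).
print(round(residual_oxygen(40.0), 2))  # 12.54
```

The same arithmetic shows why so much gas is needed: a large fraction of the room atmosphere has to be replaced before the oxygen level falls significantly.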
A halogenated gas is stored as a liquid which rapidly expands when it is released and evaporates. This lowers the temperature of the room and takes energy out of the fire situation. It also reduces the oxygen level by displacing some of the air in the room. The advantage of this method is that less fire suppressant material is needed, compared to inert gas. If used correctly there are no toxic effects on people or IT equipment but there are still overpressure impact effects on the protected room and some IT equipment such as disk drives.
Halogenated gases now have to be chosen for their low ODP, or Ozone Depletion Potential, and low GWP, or Global Warming Potential. Certain gases, such as Halon, are now banned in Europe and America for these reasons.
It is important to pick a gas which is approved by the US NFPA 2001 Standard on Clean Agent Fire Extinguishing Systems and not disallowed by the European Union ‘F-Gas’ Regulations.
An integrated cause and effect plan
Many data center operators run into trouble because nobody has really considered what will happen when smoke is detected in a data center. Owners and operators presume that a fire system contractor will have done something ‘sensible’ and that it isn’t the concern or the responsibility of the data center operator. This is not the case: fire system contractors will install and commission their part of the jigsaw but it is not their responsibility to integrate disparate systems together unless their contract specifically requires it.
An integrated cause and effect algorithm must consider:
- Where all the smoke detectors are (there may be dozens) and which ones have to be activated to cause an alarm and then a fire suppression activation command
- What audible and visual alarms are activated?
- What signals are sent to the Building Management System (BMS)?
- What signals are sent to remote or general building fire alarms?
- What signals are sent to off-site third parties e.g. fire services?
- What happens to ventilation input and extract systems?
- What happens to the room air conditioning? (Ideally it is turned off to prevent the air movement from fanning the flames)
- What happens to the room power supply? (Ideally it is turned off as an electrical fault is often the cause of the fire)
This issue requires high level management planning and cannot be left to the fire system contractors. Failure to address it leads on to the next major management issue: recovery!
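One way to make such a plan explicit is to write it down as a cause and effect matrix that can be reviewed, tested and kept under change control. The Python sketch below is a minimal illustration of that idea. Every event name, action name and sequencing choice in it is a hypothetical placeholder, to be replaced by whatever the site's own plan, codes and Authority Having Jurisdiction require.

```python
# Hypothetical cause-and-effect matrix: each fire-system event maps to the
# actions that other building systems should take. All names and the
# ordering are illustrative assumptions, not taken from any standard.
CAUSE_AND_EFFECT = {
    "first_detector_alarm": [
        "sound_local_alert",
        "notify_bms",
    ],
    "second_detector_alarm": [
        "sound_evacuation_alarm",
        "notify_bms",
        "notify_building_fire_panel",
        "notify_fire_service",
        "close_ventilation_dampers",
        "stop_room_air_conditioning",
        "start_suppression_countdown",
    ],
    "suppression_released": [
        "notify_bms",
        "isolate_room_power",
    ],
}


def actions_for(events):
    """Return the de-duplicated actions for a sequence of events, in order."""
    seen = set()
    ordered = []
    for event in events:
        if event not in CAUSE_AND_EFFECT:
            raise KeyError(f"No cause-and-effect entry for event: {event}")
        for action in CAUSE_AND_EFFECT[event]:
            if action not in seen:
                seen.add(action)
                ordered.append(action)
    return ordered


# Walk through an escalation from the first to the second detector alarm.
for action in actions_for(["first_detector_alarm", "second_detector_alarm"]):
    print(action)
```

Raising an error for an unknown event is a deliberate choice in this sketch: a fire-system signal with no planned response is exactly the kind of gap the plan is meant to expose.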
A recovery plan
The lack of a recovery plan is the main reason why so many small fire incidents lead to such long outages and is often a follow-on from a lack of understanding of the fire cause and effect plan.
When the fire is out and the fire service has declared the site safe, the main elements of a recovery plan should be:
- Purging the room of smoke and fire suppression gas. This requires manual control of the ventilation system so that it can be set to a ‘purge’ mode that will remove and replace the contaminated air within a few hours at most
- Manually resetting, after the fire incident, the fire detection systems, UPS/power, air conditioning, ventilation and IT equipment
- Identification and removal of damaged equipment
- Replacing fire suppression gas consumed during the fire incident
Case Study
As a case study we propose to look at the fire and subsequent destruction of the OVH data center in Strasbourg, France on 9th March 2021. The report on the incident has just been published by the French Government department Bureau of Investigation and Analysis on Industrial Risks (BEA-RI).(1)
(1) Rapport d’enquête sur l’incendie au sein du centre de stockage de données OVH situé à Strasbourg (67) le 10 mars 2021. BEA-RI, Paris, 24/05/2022
OVH Strasbourg was a five-building campus with a 12.5 MW capability. On the night of 9/10th March 2021 one building was totally destroyed by fire and the adjacent building declared beyond use. The fire caused no casualties and no injuries. The fire started in the rooms that house the batteries and the UPS (Uninterruptible Power Supply). The outbreaks of fire occurred almost simultaneously on batteries and on an inverter. The precise causes of the fire are not known at the date of the report although the presence of a liquid is implied by the rise in humidity readings in the area of the UPS at the time of the fire.
The report has some positive things to say about the incident:
- The aspirating smoke detection system worked as expected and immediately raised an alarm
- The engineers and security staff on site reacted to the alarm, ensured the building was evacuated and summoned the emergency services
The report also has some other comments and recommendations about future designs, construction and data center management.
CCTV capture showing the start of the fire in the battery and UPS rooms(1)
Building Construction
The SBG2 building, the one where the fire started, was a modular building on six levels, with walls of prefabricated concrete backed by a steel frame. The floors were made of raw wood and the exterior walls were clad in single-skin or aluminum strip cladding. The objective of this construction was to promote heat exchange with the outside and reduce the energy consumed in cooling computer and electrical equipment. In terms of fire protection, the internal structure had been treated to ensure fire stability for one hour, and the floors had received a one-hour fire protection treatment by application of intumescent paint or flocking.
“Some smoke detectors located on the upper floors triggered minutes after the fire started on the ground floor. This observation demonstrates a great permeability of the building to the outside air and therefore to fire and smoke which spread quickly and without difficulty.
In addition to the choice to promote the circulation of air within its facilities by means of openings in the façade, the materials chosen to build SBG2 and the absence of overrun floors on the façade did not sufficiently slow down the progression of the fire with regard to the time required to massively commit the water resources needed to bring the fire under control and then extinguish it.(1)”
To ensure a higher survivability of the data center the report also discusses the benefits of greater compartmentalization of rooms with a high energy storage or fuel load by physical separation and/or greater fire ratings of the walls.
“That the intervention of the emergency services will be facilitated if the energy rooms are located in a separate building or on the ground floor of buildings that guarantee fire resistance sufficient to allow the intervention.(1)” The document also concludes that this is an area likely to be affected by the lack of local regulations requiring this.
The building totally engulfed by fire after about two hours(1)
Fire Suppression
The building was not equipped with any kind of automatic fire suppression system.
The public emergency services only had a local fire hydrant available for fire fighting and this delivered an insufficient flow (less than 60 m³/h). The data center operator did not have a water supply available for fire fighting either, or even a means of pumping water from the adjacent Rhine canal. It took the arrival of a fire-fighting pump boat at 3:00 a.m. to finally contain the fire.
Cutting off the power
For the safety of the emergency services and to reduce the likelihood of further fire, it is essential to cut the power to the affected area.
The absence of a general site cut-off device easily accessible in the event of a fire required the intervention of the manager of the electrical network to cut off the power supply from the source substation.
After the power was cut the emergency generators appear to have started, because they were reacting only to the overall power cut to the site and not to an Emergency Power Off (EPO) command. Even then, the batteries of the UPS continued to energize the system for another twenty minutes.
Although the fire started at 12:35 a.m., the site was only electrically secure from 2:30 a.m., nearly two hours later, by which time SBG2 was fully ablaze and the fire had spread to adjacent buildings.
We would quote here advice from FM Global (Factory Mutual) Property Loss Prevention Data Sheet 5-32, Data Centers and Related Facilities:
2.7.2 Power Isolation Plan
2.7.2.1 Develop a detailed power-isolation plan that includes formal procedures for de-energizing data and/or HVAC (Heating, Ventilation, Air Conditioning) equipment to reduce damage, contamination from smoke, and prevent reignition.
General management
Other key points from the report are:
- The batteries were not equipped with a polling or supervision system. This suggests that malfunctioning batteries would have been detected more quickly if a dedicated battery monitoring system had been in place.
- The emergency services reported that fire doors had been propped open at the time of the building evacuation which had the effect of degrading the effectiveness of building fire barriers
- Sprinklers in battery rooms are recommended to address the fire and also battery overheating and thermal runaway
- Carry out an audit of a data center’s facilities to study the vulnerability of the site to the risk of fire
- Develop procedures to address fire situations and test these plans regularly
Conclusion
Fire safety in the data center environment is a serious concern in respect of human safety, public inconvenience and commercial loss. TIA-942 points in the right direction for establishing fire safety in the design at the earliest stages of a project, and a large number of other publicly available standards provide all the necessary detail.
Data center operators must realize that successful fire safety management can only come about through an integrated approach that takes in the requirements of:
- Building with low flammability materials
- Fire detection
- Fire suppression
- Integration of BMS, power and HVAC services
- Management plans that minimize operational risk
The ideas and views expressed in this guest blog article are those of the author and not necessarily those of TIA or its member companies.