Inert gas data centre fire protection and hard disk drive damage
From their advent in the mid-to-late 1950s, data centres have faced a universal threat: fire. One data centre fire in particular sent a clear message to both the computer industry and the fire prevention community. On July 2, 1959, a fire started in a computer room operated by the United States Air Force inside the Pentagon building complex. The fire burned for more than five hours and caused an estimated $7 million in damage. Fire investigators attributed the ignition to an incandescent light bulb in a magnetic-tape storage area. The National Fire Protection Association (NFPA) responded by drafting the first edition of NFPA Standard 75: Fire Protection for Electronic Data Processing Equipment. The standard has been revised repeatedly to keep pace with the evolution of information technology electronics, advances in fire detection and protection, and end-user requirements for system availability.
NFPA Standard 75
The NFPA Standard for the Protection of Information Technology Equipment is a comprehensive document, the purpose of which is to define the minimum requirements for protecting equipment, and the areas where that equipment is installed, from fire as well as associated threats such as heat, smoke and water.
The NFPA Standard was developed in the United States, but it is widely accepted and adopted in many countries. Independent, country-specific standards also exist and should be consulted for the regions in which they apply.
Water is a universally accepted method of extinguishing a fire. As such, NFPA 75 and other standards state that if the building structure that houses the data/technology room is fire protected with a water sprinkler system, so must be the data/technology room. If the data/technology room is not in a sprinkler-protected building, then the data/technology room owner can elect to use a water sprinkler system, a gaseous clean-agent system or a combination of both.
Standards also state that the user can always upgrade from the basic requirements with acceptable methods or technology. In other words, even if a building has water sprinklers installed, and the standard prescribes water sprinklers for the data/technology room, the user can elect to use a suitable gaseous fire suppression agent.
The ability to choose the method of fire suppression gives the end user options to address risk factors that vary from one site to another. Water, and the potential for subsequent water damage, has been viewed as a risk since the time of early computer rooms. In the 1960s, bromotrifluoromethane, also known as Halon 1301, was introduced as an effective fire suppression gas agent. Halon 1301 became the total flooding gaseous agent of choice for data centres, since it was effective at fire suppression and was considered safe for human occupants at properly engineered concentration levels. Halon is an ozone-depleting chemical, however; consequently, new production ceased on January 1, 1994.
The United States Environmental Protection Agency created the Significant New Alternatives Policy (SNAP) to evaluate suitable replacements for Halon. Keep in mind that the cessation of new Halon production did not make Halon systems illegal. System owners have the choice of when and how to convert to a different fire suppression system. Owners must decide between installing a new fire suppression system, retrofitting the existing system with a new gas agent or waiting for a Halon discharge event before making a decision.
The list of alternatives provides a number of viable options. Each gaseous agent has some unique characteristics and differences that include concentration level, delivery system, discharge pressure and storage volumes. The gaseous agents are classified in two general categories: chemical agents and inert gases.
Inert Gas Fire Suppression Systems
The Inert Gas Fire Suppression System (IGFSS) uses argon (Ar) gas, nitrogen (N2) gas or a blend of the two. Both are inert, unreactive gases that present no danger to electronics, hardware or human occupants. The systems extinguish a fire by quickly flooding the protected area and diluting the oxygen level to about 13–15%. Combustion requires at least 16% oxygen. The reduced oxygen level is still sufficient for personnel to function and safely evacuate the area. Since their debut in the mid-1990s, these systems have proven to be safe for use with information technology equipment.
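The dilution arithmetic can be illustrated with a rough sketch. It assumes a perfectly mixed room that vents freely while gas is added (so the oxygen level decays exponentially with the injected volume), an ambient oxygen level of 20.9%, and an injected volume of about one-third of the room volume, the figure quoted later in this article. The function names and the mixing model are assumptions for illustration only, not a design method.

```python
import math

AMBIENT_O2_PERCENT = 20.9  # typical oxygen content of air, % by volume


def oxygen_after_discharge(injected_fraction: float) -> float:
    """Residual O2 (% by volume) after injecting inert gas.

    injected_fraction is the injected gas volume divided by the room volume.
    Assumes perfect mixing with free venting at constant pressure.
    """
    if injected_fraction < 0:
        raise ValueError("injected_fraction must be non-negative")
    return AMBIENT_O2_PERCENT * math.exp(-injected_fraction)


def injected_fraction_for(target_o2_percent: float) -> float:
    """Injected volume fraction needed to reach a target O2 level (same model)."""
    if not 0 < target_o2_percent <= AMBIENT_O2_PERCENT:
        raise ValueError("target must be between 0 and the ambient O2 level")
    return math.log(AMBIENT_O2_PERCENT / target_o2_percent)


if __name__ == "__main__":
    print(f"O2 after injecting 1/3 of room volume: {oxygen_after_discharge(1 / 3):.1f}%")
    print(f"Fraction needed for 13% O2: {injected_fraction_for(13.0):.2f}")
```

Under these assumptions, injecting one-third of the room volume leaves roughly 15% oxygen, at the upper end of the range quoted above. A real system is sized according to the applicable standard and the manufacturer's design data.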
The systems store the inert gas blends in high-pressure cylinders that can be connected in parallel. Depending on the manufacturer, cylinder pressure ranges from 2,200–2,900 pounds per square inch (PSI), or 152–200 bar, and the gas is discharged at pressures of 870–1,000 PSI (60–69 bar). Because a large volume of inert gas is suddenly added to the protected area, automatic venting is part of the design to prevent pressure buildup.
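The PSI and bar figures above can be cross-checked with a short conversion. This is a minimal sketch; the helper name `psi_to_bar` is introduced here for illustration.

```python
BAR_PER_PSI = 0.0689476  # 1 pound per square inch expressed in bar


def psi_to_bar(psi: float) -> float:
    """Convert a pressure from PSI to bar."""
    return psi * BAR_PER_PSI


if __name__ == "__main__":
    # Cylinder storage range, then discharge range, as quoted in the text.
    for psi in (2200, 2900, 870, 1000):
        print(f"{psi} PSI is about {psi_to_bar(psi):.0f} bar")
```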
Hard disk drives
Around 2007, reports started to surface of hard disk drives (HDDs) being damaged during gas discharges from IGFSSs. Since then there has been significant speculation and misinformation about the cause of failure, so it helps to first rule out what is not the cause.
• Damage is not due to a chemical reaction—the gases used are inert and naturally occurring in our atmosphere and pose no danger to HDDs.
• Damage is not due to air pressure in the data centre. IGFSSs do release a large volume of gas, but as previously mentioned, the data centre is designed with a vent system that limits pressure buildup to less than one PSI. The drives have a “breather hole” that equalizes the pressure inside the HDD with the higher pressure outside, and the air first passes through a multistage filter in the HDD, which prevents contamination. Work done by Siemens and failure analysis of failed drives by HDD suppliers show that air pressure is not the issue.
• Damage is not due to temperature change. IGFSSs release about one-third of the data centre’s volume in gas when deployed. The gas is stored at high pressure, so when it is released and falls to normal atmospheric pressure, its temperature drops somewhat. If you enter the data centre after the release, you will notice it is slightly cooler than before, but this change is not a problem for the HDDs.
What is the real problem? Acoustic noise. When the gas is released from the pressurised cylinders, it moves through the pipes at very high velocity. On exit through multiple nozzles in the data centre, it generates high-level acoustic noise. The noise reaches the HDDs, where it causes vibration, which in turn pushes the read/write element off the data track. Current-generation HDDs have up to about 250,000 data tracks per inch on their disks. To read and write, the element must stay within ±15% of the data track spacing. This means the HDD can tolerate an offset of less than 1/1,000,000 of an inch from the centre of the data track; any more than that will halt reads and writes.
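The tolerance figure follows directly from the track density. The sketch below reproduces the arithmetic using the 250,000 tracks per inch and ±15% values from the text; the function names and the nanometre conversion are added here for illustration.

```python
NM_PER_INCH = 25_400_000  # 1 inch = 25.4 mm = 25,400,000 nm


def track_pitch_inches(tracks_per_inch: float) -> float:
    """Centre-to-centre spacing between adjacent data tracks, in inches."""
    return 1.0 / tracks_per_inch


def off_track_limit_inches(tracks_per_inch: float, tolerance: float = 0.15) -> float:
    """Largest offset from the track centre that still allows reads and writes."""
    return tolerance * track_pitch_inches(tracks_per_inch)


if __name__ == "__main__":
    tpi = 250_000
    limit = off_track_limit_inches(tpi)
    print(f"Track pitch: {track_pitch_inches(tpi) * 1e6:.1f} millionths of an inch")
    print(f"Off-track limit: {limit * 1e6:.1f} millionths of an inch")
    print(f"Off-track limit: {limit * NM_PER_INCH:.1f} nm")
```

The track pitch works out to 4 millionths of an inch, so a 15% tolerance is 0.6 millionths of an inch (about 15 nm), which is consistent with the "less than 1/1,000,000 of an inch" figure.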
Early disk storage had much greater spacing between data tracks because it held less data, which is a likely reason why this issue was not apparent until recently. Figure 2 shows the performance of multiple HDDs in a storage device as a percentage of normal I/O (with no noise). After 60 seconds, the IGFSS was deployed, causing the performance of all HDDs in the storage device to degrade. Because the gas valve is simply opened and the pressure drops as the cylinders empty, the worst degradation typically occurs at the beginning (at 60 seconds, when the system was deployed). The system gradually recovers, except for two HDDs that failed (the red and bright green traces in the graph).
How much a specific HDD model is affected by an IGFSS release is a complicated question. Each drive has a unique set of noise frequencies, or spectral sensitivities, that adversely affect it, and these are governed by the specifics of its design. The performance impact depends on the noise levels the fire protection system generates at those frequencies and on how much of that noise actually reaches the HDD. It is similar to a singer hitting exactly the right note to break a wine glass. The difficulty is that each IGFSS is different, each computer room is different, and each HDD model has a different spectral sensitivity, making it nearly impossible to predict the exact results of a discharge.
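The interaction between the three factors (the noise the system generates per frequency band, how much of it reaches the drive, and the drive's own sensitivity) can be sketched as a simple per-band margin check. Everything here is hypothetical: the band centres, the decibel values and the function name are made up for illustration and do not describe any real nozzle, room or HDD model.

```python
def worst_band_margin(source_db, attenuation_db, threshold_db):
    """Return (band, margin_db) for the band with the smallest margin.

    source_db:      noise level generated in each band at the nozzle side (dB)
    attenuation_db: reduction between the source and the drive in each band (dB)
    threshold_db:   level in each band at which the drive starts to degrade (dB)
    A negative margin means the level at the drive exceeds the drive's threshold.
    """
    margins = {}
    for band, threshold in threshold_db.items():
        level_at_drive = source_db.get(band, float("-inf")) - attenuation_db.get(band, 0.0)
        margins[band] = threshold - level_at_drive
    worst = min(margins, key=margins.get)
    return worst, margins[worst]


if __name__ == "__main__":
    # Hypothetical values keyed by band centre frequency in Hz.
    source = {500: 110.0, 2000: 120.0, 8000: 115.0}
    attenuation = {500: 3.0, 2000: 10.0, 8000: 20.0}
    threshold = {500: 115.0, 2000: 105.0, 8000: 100.0}
    band, margin = worst_band_margin(source, attenuation, threshold)
    print(f"Worst band: {band} Hz, margin {margin:+.1f} dB")
```

In this made-up case the 2,000 Hz band is the problem even though it is neither the least attenuated band nor the band where the drive is most sensitive, which is the kind of interaction that makes the outcome hard to predict.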
Perhaps the greatest confirmation that acoustic noise is the issue is that when a recording of an IGFSS release is played back over a high-performance audio system at appropriate volume, the I/O performance of the HDD mimics what was seen in the actual IGFSS event. Figure 3 shows an audio test stand being used to test the response of an HDD to the noise generated by an IGFSS release. The stand can also be used to develop a detailed understanding of the frequency sensitivity for each HDD.
How can you avoid HDD damage from an IGFSS event?
As mentioned earlier, the IGFSS, the computer room and the HDD all interact, so it is very difficult to accurately predict if a problem will occur. There are, however, three categories of actions that can significantly mitigate the impact on data centre HDDs.
1. Avoid the exposure
All IGFSSs sound an alarm before the gas is released so personnel can react and exit the data centre. If storage systems can be shut down during this period so they are not running during the release, data loss and damage will be avoided entirely.
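Whether a shutdown fits inside the pre-discharge alarm period depends on how long each storage system needs to stop. The sketch below is a hypothetical planning helper: the system names, the stop times, the 30-second delay and the safety margin are all assumed values, and it assumes the systems are shut down in parallel.

```python
def plan_shutdown(stop_seconds_by_system, pre_discharge_delay_s, safety_margin_s=5.0):
    """Split systems into those that can stop before the discharge and those that cannot.

    stop_seconds_by_system: mapping of system name to estimated time to stop (s)
    pre_discharge_delay_s:  time between the alarm and the gas release (s)
    safety_margin_s:        time held back in case a stop takes longer than estimated (s)
    """
    budget = pre_discharge_delay_s - safety_margin_s
    can_stop = sorted(n for n, t in stop_seconds_by_system.items() if t <= budget)
    cannot_stop = sorted(n for n, t in stop_seconds_by_system.items() if t > budget)
    return can_stop, cannot_stop


if __name__ == "__main__":
    # Hypothetical storage systems and stop times.
    systems = {"array-a": 20.0, "array-b": 45.0}
    can_stop, cannot_stop = plan_shutdown(systems, pre_discharge_delay_s=30.0)
    print("Can stop before discharge:", can_stop)
    print("Need another mitigation:", cannot_stop)
```

Systems that cannot stop in time would have to rely on the noise-reduction measures described in the following sections.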
2. Reduce the noise generated by the nozzles
Testing has shown that two smaller nozzles, each carrying half the gas flow, generate less noise and have less impact on HDDs than one larger nozzle. The nozzle design also determines whether it produces pure tones (a whistle-like sound), which can be more damaging than broadband noise (a sound similar to wind). A number of proactive IGFSS manufacturers have redesigned their nozzles to significantly reduce acoustic noise and protect HDDs. Finally, some pneumatic sirens have also been shown to degrade HDD performance. Electronic alarms are generally much easier on HDDs than the pneumatic type.
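The comparison between one large nozzle and two smaller ones can be made concrete with decibel arithmetic. The sketch below assumes the nozzles act as independent (incoherent) noise sources whose acoustic powers add, and the decibel levels are made-up values for illustration rather than measurements of any real nozzle.

```python
import math


def combine_levels_db(levels_db):
    """Combined level (dB) of several incoherent sources, summing acoustic power."""
    if not levels_db:
        raise ValueError("at least one level is required")
    total_power = sum(10 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total_power)


if __name__ == "__main__":
    large_nozzle_db = 120.0          # hypothetical single large nozzle
    small_nozzle_db = 114.0          # hypothetical smaller nozzle at half the flow
    combined = combine_levels_db([small_nozzle_db, small_nozzle_db])
    print(f"Two small nozzles combined: {combined:.1f} dB")
    print(f"Quieter than the large nozzle: {combined < large_nozzle_db}")
```

Two equal sources combine to about 3 dB above either one alone, so under this assumption the two-nozzle arrangement is quieter overall only if each smaller nozzle is more than about 3 dB quieter than the large nozzle it replaces.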
3. Reduce the noise to the HDD
This step relates to the noise characteristics of the data centre and of the rack in which the HDDs are installed. Noise mitigation actions include noise-reducing baffles in the data centre, as Figure 4 shows, or acoustic covers for the front of racks, such as in Figure 5. Equipment manufacturers design these baffles and covers to limit the noise generated by internal air-moving devices to comply with employee health and safety limits for noise exposure. The levels generated by the IGFSS discharge are far below these limits, but the acoustic cover design provides additional protection for the HDD. It is not recommended that anyone other than the equipment manufacturer add baffles or noise-absorption treatment to any equipment. Adding a cover or treatment yourself might restrict cooling airflow and cause the electronics to overheat, thereby reducing reliability. Check with the equipment manufacturer to determine whether acoustic cover options are available.