Title | Maintaining Mission Critical Systems in a 24/7 Environment |
---|---|
Author | Peter M. Curtis |
Genre | Physics |
Series | |
Publisher | Physics |
Year of publication | 0 |
ISBN | 9781119506140 |
There are about ten times as many UPS systems in use today as there were 10 years ago, and many more companies are still discovering their worth after losing data during a power line disturbance. Do you want electrical outages to be scheduled or unscheduled? Serious facilities engineers use comprehensive preventative maintenance procedures to avoid being caught off‐guard.
Many companies do not consider installing backup equipment until after an incident has already occurred. During the months following the U.S. Northeast Blackout of 2003, the industry experienced a boom in the installation of UPS systems and standby generators. Small and large businesses alike learned how susceptible they are to power disturbances and the associated costs of not being prepared. Some businesses that are not typically considered mission critical learned that they could not afford to be unprotected during a power outage. For example, the Blackout of 2003 destroyed $250 million of perishable food in New York City alone.1 Businesses everywhere, and of every type, are reassessing their level of risk tolerance and cost of downtime.
3.3 Identifying the Appropriate Redundancy in a Mission Critical Facility
Mission critical facilities cannot be susceptible at any time to an outage, including during maintenance of the subsystems. Therefore, careful consideration must be given in evaluating and implementing redundancy in systems design. Examples of redundancy are classified as (N+1) and (N+2) configurations and are normally applied to the systems below:
Utilities service
Power distribution
UPS
Emergency generator
Fuel system supplying emergency generator
Mechanical Systems
HVAC distribution, including piping, AHUs, and CRAH units
Fire Protection Systems
A standard system (N+1) is a combination of two basic schemes that meet the criteria of furnishing an essential component plus one additional component for backup. This design provides the best of both worlds at a steep price, with no economies of scale considered. A standard system protects critical equipment and provides long‐term protection to critical operations. In a true (N+1) design, each subsystem is configured in a parallel redundant arrangement such that full load may be served even if one system is off‐line due to scheduled maintenance or a system failure. The alternate (+1) component is also fed from a different power source.
The next level of reliability is a premium system. The premium system meets the criteria of an (N+2) design by providing the essential component plus two components for backup. It also utilizes diverse electric services from different power distribution boards and two different utility substations. Under this configuration, any one of the backup components can be taken off‐line for maintenance and still retain (N+1) reliability.
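The relationship between the (N+1) and (N+2) schemes can be illustrated with a small calculation. The sketch below is only an illustration of the counting argument in the text: the function names and the example plant of five UPS modules serving a three-module load are hypothetical and do not come from the book.

```python
def redundancy_label(installed: int, required: int) -> str:
    """Return 'N', 'N+1', 'N+2', ... for a subsystem.

    `required` is N, the number of units needed to carry full load;
    `installed` is the number of units available to serve it.
    """
    if required < 1:
        raise ValueError("required must be at least 1")
    spare = installed - required
    if spare < 0:
        raise ValueError("fewer units available than the load requires")
    return "N" if spare == 0 else f"N+{spare}"


def label_with_units_offline(installed: int, required: int, offline: int) -> str:
    """Redundancy that remains while `offline` units are out for maintenance or failure."""
    return redundancy_label(installed - offline, required)


# Hypothetical UPS plant: the load needs 3 modules (N = 3) and 5 are installed.
print(redundancy_label(5, 3))             # N+2
print(label_with_units_offline(5, 3, 1))  # N+1
print(label_with_units_offline(5, 3, 2))  # N
```

With one backup module off‐line for maintenance, the hypothetical (N+2) plant still reports (N+1), which is the property the premium system is described as providing.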
Another level of resiliency can be gained by having building electric services installed underground rather than overhead. Aerial services are susceptible to lightning strikes, fallen trees, and damaged utility poles.
Facilities can also be classified into different Tiers based on their required level of reliability and maintainability. Tiers for mission critical facilities range from I to IV, with Tier IV being the most redundant, reliable, and maintainable.
Table 3.3 Uptime Tiers
(Reference: Uptime Institute)
Tier I – Basic Non‐Redundant | Tier II – Basic Redundant | Tier III – Concurrently Maintainable | Tier IV – Fault and Failure Tolerant |
---|---|---|---|
No redundancy. Susceptible to interruptions from planned and unplanned activities. Equipment configurations minimum required for equipment to operate. Operation errors or failures will cause an interruption in service. | Limited backup and redundancy report. Susceptible to disruptions from planned and unplanned activities. May contain limited criticality functions that can be shut down properly without adverse effects on business. UPS and/or generator backup may be installed for parts of the building. Failures may cause a disruption in facility service. | Full single system backup and redundancy (N+1). Planned preventative and programmable maintenance activities, repairs, testing, etc. can be conducted without interruption of service. Errors in operation or spontaneous failures of infrastructure may cause disruption of power to the load. | Facility functions cannot tolerate any downtime. No single points of failure, and multiple system backup with automated recovery (2N). Capable of withstanding one or more component failures, errors, or other events without disrupting power to the load. Full load can be supported on one path without disruption while maintenance/testing is performed on the other. |
Load Classifications
Critical Load: Requires 100% uptime. Must have uninterrupted power input to safeguard against facility damage or losses, prevent danger and injury to personnel, or keep critical business functions on line.
Essential Load: Supports routine site operations. Able to tolerate power failures without data loss or affecting overall business continuity. Critical cooling can be classified as essential load when critical spaces have a “ride through time,” allowing chillers, cooling towers, and pumps to restart automatically in a limited period.
Discretionary Load: Load that indirectly supports the operation of the facility, such as administrative, cafeteria/pantries/fitness centers/retail spaces, and office functions. Can be “shed” without affecting overall business continuity in order to keep critical loads on line.
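The three load classes imply a shedding order when capacity is limited: discretionary load goes first, essential load next, and critical load is never shed. The following is a minimal sketch of that idea under stated assumptions. The `Load` type, the `shed_loads` function, the largest‐first ordering within a class, and all kW figures are hypothetical and are not taken from the book.

```python
from dataclasses import dataclass

# Shedding order implied by the classifications above; critical load is never shed.
SHED_ORDER = ["discretionary", "essential"]


@dataclass
class Load:
    name: str
    kw: float
    category: str  # "critical", "essential", or "discretionary"


def shed_loads(loads, available_kw):
    """Return (kept, shed) so that the kept loads fit within available_kw."""
    kept = list(loads)
    shed = []
    for category in SHED_ORDER:
        # Assumption: within a class, drop the largest load first.
        candidates = sorted(
            (load for load in kept if load.category == category),
            key=lambda load: load.kw,
            reverse=True,
        )
        for load in candidates:
            if sum(item.kw for item in kept) <= available_kw:
                return kept, shed
            kept.remove(load)
            shed.append(load)
    if sum(item.kw for item in kept) > available_kw:
        raise RuntimeError("critical load exceeds available capacity")
    return kept, shed


# Hypothetical facility with 850 kW of connected load and 700 kW available.
facility = [
    Load("servers", 400, "critical"),
    Load("chillers", 250, "essential"),
    Load("cafeteria", 80, "discretionary"),
    Load("offices", 120, "discretionary"),
]
kept, shed = shed_loads(facility, available_kw=700)
print([load.name for load in kept])  # ['servers', 'chillers']
print([load.name for load in shed])  # ['offices', 'cafeteria']
```

In this example both discretionary loads are shed and the essential chillers stay on line. A real load‐shedding scheme would also account for restart sequencing and the ride‐through time of the critical spaces, which this sketch ignores.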
3.4 Improving Reliability, Maintainability, and Proactive Preventative Maintenance
The average human heart beats approximately 70 times a minute, or a bit more than once per second. Imagine if a heart missed three beats in a minute. This would be considered a major power line disturbance if we were to compare it to an electrical distribution system. Take the electrical distribution system that feeds your facility, or better yet, the output of the UPS system, and interrupt it for 3 seconds. This is an eternity for computer hardware. The critical load is disrupted, your computers crash, and your business loses two days’ worth of labor or, worse, is fined $10 to $20 million by the federal government because they did not receive the quota of $500 billion of transaction reports by the allocated time.
All this could have been prevented if electrical maintenance and testing were performed on a routine basis, and the failed electrical connections were detected and repaired. Repairs could have been quickly implemented during the annual infrared scanning program that takes place before building maintenance shutdowns.
What can the data processing or facility manager do to ensure that their electrical system is as reliable as possible?
The seven steps to improved reliability and maintainability