Surviving Electric Squirrels and UPS Failures

July 16, 2012

SOURCE: DataCenterKnowledge.com

Folks who’ve worked in the data center industry for a while tend to have their squirrel stories. Mike Christian, who runs business continuity for Yahoo, shared his recently during a keynote at the O’Reilly Velocity conference in a presentation titled “Frying Squirrels and Unspun Gyros,” which examined the many ways that data centers can fail.

“A frying squirrel took out half of our Santa Clara data center two years back,” Christian said, noting squirrels’ propensity to interact with electrical equipment, with unfortunate results.
If you enter “squirrel outage” in either Google News or Google web search, you’ll find a lengthy record of both recent and historic incidents of squirrels causing local power outages.

Yahoo houses its servers in 29 different data centers, which explains Christian’s familiarity with the many ways they can fail. These include:

  • Inadvertent fire suppression: When an electrical problem triggered smoke detectors at a Texas data center hosting Yahoo Launch (Broadcast.com), staffers didn’t realize they could override the next phase of the system – power shutdown and a “dump” of FM200 fire suppressant.
  • HVAC Failure: A cooling system failure in an N+1 Yahoo facility in Reston, Virginia caused a temperature spike in part of the data center, which triggered the fire suppression system – which then shut down the remaining HVAC units. The resulting “thermal runaway” pushed temperatures in the data center to 130 degrees F. Yahoo was able to shift the load and avoided any downtime. That’s one reason Yahoo built its Lockport, N.Y. “chicken coop” data center to use fresh air instead of mechanical cooling. “That’s one less failure point,” said Christian.
  • UPS Meltdowns: Yahoo had a small UPS setup in its Sunnyvale data center fail three times in five years. Christian cites a recent survey indicating that up to 29 percent of unplanned data center outages are caused by UPS failures. “Our UPS causes as many problems as it solves,” said Christian. “Complexity is introduced by adding all these multiple systems. They actually introduce additional failure cases.”

How do you prepare for these kinds of events? Focus on storing data in more than one location and routing around facility failures. How does Yahoo know this will work? It conducts full-scale failover testing with live loads, shifting millions of users between data centers with no visible impact.
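
The idea of routing around a failed facility can be illustrated with a small sketch. This is not Yahoo’s actual mechanism, which the talk does not describe in detail; the DataCenter class, the route and failover_drill functions, and the site names are all hypothetical, chosen only to show the principle that each user has a preferred site and falls through to the next healthy one.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence


@dataclass
class DataCenter:
    """Hypothetical stand-in for a facility; 'healthy' is set by monitoring."""
    name: str
    healthy: bool = True


def route(user_id: int, sites: Sequence[DataCenter]) -> DataCenter:
    """Return a healthy site for a user, preferring the user's home site."""
    if not sites:
        raise ValueError("no sites configured")
    home = user_id % len(sites)
    # Walk the list starting at the home site and skip failed facilities.
    for offset in range(len(sites)):
        candidate = sites[(home + offset) % len(sites)]
        if candidate.healthy:
            return candidate
    raise RuntimeError("all sites are down")


def failover_drill(
    sites: Sequence[DataCenter], failed_name: str, user_ids: Iterable[int]
) -> Dict[int, str]:
    """Mark one site as failed and report where each user ends up."""
    for site in sites:
        if site.name == failed_name:
            site.healthy = False
    return {uid: route(uid, sites).name for uid in user_ids}


if __name__ == "__main__":
    # Assumed three-site layout, purely for illustration.
    sites = [DataCenter("east"), DataCenter("central"), DataCenter("west")]
    print(failover_drill(sites, "east", range(6)))
```

A drill like this only shows that traffic has somewhere to go. It says nothing about whether the surviving sites hold a copy of the data or have the capacity to absorb the shifted load, which is what testing with live loads is meant to establish.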
