Surviving Electric Squirrels and UPS Failures

July 16, 2012

SOURCE: DataCenterKnowledge.com

Folks who’ve worked in the data center industry for a while tend to have their squirrel stories. Mike Christian, who runs business continuity for Yahoo, shared his recently during a keynote at the O’Reilly Velocity conference in a presentation titled “Frying Squirrels and Unspun Gyros,” which examined the many ways that data centers can fail.

“A frying squirrel took out half of our Santa Clara data center two years back,” Christian said, noting squirrels’ propensity to interact with electrical equipment, with unfortunate results.
If you enter “squirrel outage” in either Google News or Google web search, you’ll find a lengthy record of both recent and historic incidents of squirrels causing local power outages.

Yahoo houses its servers in 29 different data centers, explaining Christian’s familiarity with the many ways they can fail. These include:

  • Inadvertent fire suppression: When an electrical fault triggered smoke detectors at a Texas data center hosting Yahoo Launch (Broadcast.com), staffers didn’t realize they could override the next phase of the system – power shutdown and a “dump” of FM200 fire suppressant.
  • HVAC Failure: A cooling system failure in an N+1 Yahoo facility in Reston, Virginia caused a temperature spike in part of the data center, which triggered the fire suppression system – which then shut down the remaining HVAC units. The result was a “thermal runaway” that pushed temperatures in the data center to 130 degrees F. Yahoo was able to shift the load, so there was no downtime. That’s one reason Yahoo built its Lockport, N.Y. “chicken coop” data center to use fresh air instead of mechanical cooling. “That’s one less failure point,” said Christian.
  • UPS Meltdowns: Yahoo had a small UPS setup in its Sunnyvale data center fail three times in five years. Christian cites a recent survey indicating that up to 29 percent of unplanned data center outages are caused by UPS failures. “Our UPS causes as many problems as it solves,” said Christian. “Complexity is introduced by adding all these multiple systems. They actually introduce additional failure cases.”

How do you prepare for these kinds of events? Focus on storing data in more than one location, and routing around facility failures. How does Yahoo know this will work? It conducts full-scale live failover testing with live loads, shifting millions of users between data centers with no visible impact.
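
The idea of keeping data in more than one location and routing around a failed facility can be illustrated with a minimal sketch. This is not Yahoo's system: the `DataCenter` class, the `pick_site` function, and the site names are all hypothetical, and real traffic steering involves DNS, load balancers, and replication lag that are ignored here.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str             # hypothetical site label, not a real facility
    healthy: bool = True  # would come from monitoring in a real system


def pick_site(replicas, preferred):
    """Return the preferred site if it is healthy, else the first healthy replica."""
    for dc in replicas:
        if dc.name == preferred and dc.healthy:
            return dc.name
    for dc in replicas:
        if dc.healthy:
            return dc.name
    raise RuntimeError("no healthy data center holds this data")


# Data is assumed to be replicated to every site in this list.
sites = [DataCenter("west"), DataCenter("east"), DataCenter("central")]

print(pick_site(sites, "west"))   # normal operation: the preferred site serves

# A failover drill: take one facility out on purpose and confirm
# that requests are routed to another location.
sites[0].healthy = False
print(pick_site(sites, "west"))
```

Deliberately marking a site unhealthy, as in the last lines, mirrors the live failover testing described above: the routing logic is only trusted because it is exercised before a real outage.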
