Tune into the Cloud

Gregor Petri




Tune into the Cloud – Losing My Religion


Tune into: a cloud mindset

One of the tenets of the cloud religion is that it should be possible – through the use of intelligent software – to build reliable systems on top of unreliable hardware, just as you can build reliable and affordable storage systems using RAID (Redundant Arrays of Inexpensive Disks). One of the largest cloud providers even evangelizes to its application development customers that they should assume that “everything that can go wrong, will go wrong”. In fact, its SLA only kicks in after at least two zones become unavailable. Quite a surprising, but nonetheless typical, cloud approach.
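The RAID analogy can be made concrete with a little arithmetic. The sketch below is only an illustration, not any provider's actual SLA math: it assumes that replicas fail independently of each other, and the function name `combined_availability` and the 99% figure are made up for the example.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Chance that at least one of `replicas` units is up.

    Assumes failures are independent, which real outages
    (shared power, shared operators) often are not.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be a probability between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # All replicas must be down at the same time for the system to be down.
    return 1.0 - (1.0 - single) ** replicas


if __name__ == "__main__":
    # Hypothetical unit that is up 99% of the time.
    for n in (1, 2, 3):
        print(n, f"{combined_availability(0.99, n):.6f}")
```

Under that independence assumption, every extra replica multiplies the chance of a total outage by the failure probability of a single unit. A mistake that takes out all replicas at once breaks the assumption, and with it the arithmetic.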

Nowadays most of the large cloud providers buy very reliable hardware. When running several hundred thousand servers, a failure rate of 1 PPM versus 2 PPM (parts per million) makes quite a difference. And using memory chips that are too cheap can cause a lot of problems that are very difficult to pinpoint. These providers also increase uptime by buying simpler (purpose-optimized) equipment and by thinking carefully about what exactly is important for reliability. For example, one of the big providers routinely removes the overload protection from its transformers. It prefers that a transformer costing a few thousand dollars occasionally breaks down over regularly having whole aisles lose power because a transformer manufacturer was worried about possible warranty claims. And not to worry, they do not remove the fire safety breakers.

That does not imply that assuring reliability at stack levels above the hardware is no longer necessary. Even the best quality hardware can (and will) fail sometimes. Not to mention human errors (Oops, wrong plug!) that still regularly take complete data centers off the air (or rather, out of the cloud).

The real question continues to be what happens to your application when something like this happens. Does it simply remain operational, does it gracefully degrade to a slightly simpler, slightly slower but still usable version of itself, or does it just crash and burn? And for how long? For end users bringing their own applications to the cloud it is clear where the responsibility for addressing this lies (with themselves). But end users who outsource their applications to a so-called “managed cloud provider” may (and should) expect that the provider who provides that management takes responsibility. Recently several customers of a reputable IT provider – which earned its stripes largely in the pre-cloud era and now offers cloud services from a large number of regionally distributed DCs – lost access to their applications for several days because one operator in one data center did something fairly stupid with just one plug.
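What "gracefully degrading" could look like in application code is sketched below. This is a minimal illustration under assumptions, not a recipe from any provider: the names `first_available`, `full_service` and `cached_service` are hypothetical, and the simulated outage is hard-coded.

```python
from typing import Callable, Optional, Sequence


def first_available(handlers: Sequence[Callable[[], str]]) -> str:
    """Try each handler in order and return the first result that succeeds."""
    last_error: Optional[Exception] = None
    for handler in handlers:
        try:
            return handler()
        except Exception as exc:  # broad on purpose: any failure means "try the next one"
            last_error = exc
    # Every option failed: this is the "crash and burn" case.
    raise RuntimeError("all handlers failed") from last_error


def full_service() -> str:
    # Hypothetical primary path; here it simulates an unreachable data center.
    raise ConnectionError("primary data center unreachable")


def cached_service() -> str:
    # Hypothetical simpler fallback that still gives the user something usable.
    return "cached results (possibly stale)"


if __name__ == "__main__":
    print(first_available([full_service, cached_service]))
```

The order of the handlers is the design decision: the application states up front which simpler version of itself it is willing to fall back to, instead of discovering it during an outage.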

Luckily we do see the rate of such human mistakes decline as cloud providers gain more experience (and add more process). Experience counts, especially in the cloud. But an outage like this simply is not acceptable. If a provider boasts that it has more local cloud data centers than others, but then is unable to move specific customer workloads to those other data centers within an acceptable timeframe, it is not really a “managed” cloud provider. Simply lifting and shifting customer applications to a cloud instance without “pessimistically” looking at what could go wrong is as stupid as putting all your data on a single inexpensive disk without RAID and without backup. And if reengineering the applications is too expensive to create a feasible cloud business case, then users should ask themselves whether cloud is really the right solution in that case.

In the words of R.E.M.: “I think I thought I saw you try” is really not enough assurance for success. The cloud is not about technology or hardware, it’s about mindset. And providers who do not change their mindset may see their customers losing faith in the cloud (or at least in their cloud). Quite quickly.

Losing My Religion (1991) was the biggest commercial hit of alternative rock band R.E.M. The song was written more or less accidentally as the band's guitarist, Peter Buck, was teaching himself to play a mandolin he had recently bought.

More Stories By Gregor Petri

Gregor Petri is a regular expert or keynote speaker at industry events throughout Europe and wrote the cloud primer “Shedding Light on Cloud Computing”. He was also a columnist at ITSM Portal, a contributing author to the Dutch “Over Cloud Computing” book and a member of the Computable expert panel, and his LeanITmanager blog is syndicated across many sites worldwide. Gregor was named by Cloud Computing Journal as one of The Top 100 Bloggers on Cloud Computing.

Follow him on Twitter @GregorPetri or read his blog at blog.gregorpetri.com