Tuesday, December 23, 2008

SLAs, availability, and expectations vs. guarantees

I recently wrote a Service Level Agreement (SLA)-style document to set expectations about the kind of service one can expect from the Eclipse Webmasters and from the Eclipse servers. You can read it here. It wasn't a pleasant experience to write, and the document is not groundbreaking. It's actually quite boring.

I was then pointed to this blog posting, which, as I understand it, essentially states that SLAs are useless when they rely on a blanket metric we call 'availability'. The author makes an interesting point, but when I tried to post my rather lengthy reply, the blog software decided my comment was too long, so I figured I'd post it here.

Interesting article.  I'd appreciate seeing an example of an effective SLA that you have authored, and that you are being held accountable for.

I mean, let's face it -- I'd love to guarantee that every time you load a web page, it will come in under 20 seconds, and that all your email will be in your Inbox within 15 minutes.  But there are those things that, as you say, are not easy to measure.  When will I get hit by the next DoS attack? When will the most important server decide to crash?  When will an IT guy make a critical human error and take some key system down?  If your systems are connected to the Internet, then you're open to a whole world of unknowns.

So, how can I guarantee, with absolute certainty, that email will not go down for a full day on the last day of a quarter (which, I agree, could be extremely damaging to a business)? Perhaps we spend millions of dollars on redundant high-end hardware, redundant points of presence, more process for staff, and more staff (to help maintain all this hardware). Easy enough. But now I must raise my prices to afford this wonderful SLA, driving customers away to cheaper competitors, which is just as damaging to my business.

Then again, sometimes spending massive amounts of money on infrastructure isn't even enough.

So the alternative is to write an "effective" SLA whose service expectation metrics are so forgiving that they make little more sense than the availability metric, and that certainly match users' expectations no more accurately.

Of course, the odds of a catastrophic failure in a properly executed IT infrastructure are very low, which makes it easy to set reasonable expectations. You *should* expect your email within 15 minutes. But between expectation and guarantee lies this thing those out-of-alignment IT people call 'budget'.

Wednesday, December 10, 2008

Eclipse.org homepage: the saga continues

After years of getting 'the Eclipse.org homepage is awful' feedback, we decided to do a complete 180 and redesign our homepage to remove clutter and make it easier to find interesting content based on who-you-are, instead of the conventional what-we-offer.

I guess the new page works for total newcomers, who can now get to Eclipse and new technologies more easily, but for old-timers, the new homepage generates more of the same 'the Eclipse.org homepage is awful' feedback. You can read all the feedback in this bug, but to save you time, here are some highlights:

  • The old homepage showed latest announcements, latest spotlights, latest community news and foremost latest plugins (new one and update one). This was very very informative.

  • There is no content on the new home page.

  • This is the first time I've visited your site and was curious to see if the old home page was less bad than the new one. I wasn't getting much info from the new one.

  • I suggest you seek design inspiration from google.com not pets.com.

  • I appreciate that Eclipse wants our opinion.

Armed with such great feedback, Nathan challenged himself to solve all our problems. I can only give you this blurred preview, but take it as a sign of good things to come.