Friday, January 24, 2014

Results: EclipseCon 2014 is returning to California... What are your thoughts?

The results are in!!  Earlier this week I posted a whacky poll asking for your thoughts on EclipseCon's return to the Bay Area.  Here are the results:


36 Im-glad-its-back-in-SF-I-hope-the-weather-is-nice

32 Im-still-hopeful-for-an-EclipseCon-in-Maui-make-it-happen

30 I-prefer-the-east-coast-less-time-on-a-plane

No surprise there.  I enjoyed driving to the last two EclipseCons but I'm looking forward to sunny California!

16 doesnt-matter-where-ECon-is-I-will-be-there

11 Im-glad-its-back-in-SF-I-will-visit-with-family-and-friends



6 what-no-beer

I have to keep my material fresh, so there were no choices for beer.  Sorry.  But that didn't stop some poll veterans from making up some choices.

SF-is-the-best-so-are-Eclipse-webmasters
Im-glad-its-back-in-SF-and-twice-as-glad-to-join-the-EF-staff

Hi-Dennis-This-is-me-again-will-you-give-me-a-beer-somedays

Looks like I owe some kind folks a beer or two.  Less than 2 months to EclipseCon North America, see you there!

Wednesday, January 22, 2014

Poll: EclipseCon 2014 is returning to California... What are your thoughts?

There was a time when EclipseCon was as West Coast USA as the Golden Gate Bridge.  However, unlike the bridge, EclipseCon has been to Washington, D.C. and to Boston, Massachusetts to broaden its horizons.  This March, the favourite gathering point for the Eclipse community is returning to the Bay Area for its 10th anniversary.

So here's the poll: What do you think about going back to San Francisco?  Clickety-click the links below to cast your votes:


http://eclipsecon.org/webmaster/Im-glad-its-back-in-SF-I-hope-the-weather-is-nice


http://eclipsecon.org/webmaster/Im-glad-its-back-in-SF-I-will-visit-with-family-and-friends


http://eclipsecon.org/webmaster/I-prefer-the-east-coast-less-time-on-a-plane


http://eclipsecon.org/webmaster/doesnt-matter-where-ECon-is-I-will-be-there


http://eclipsecon.org/webmaster/Im-still-hopeful-for-an-EclipseCon-in-Maui-make-it-happen


Next week I'll post the results!  Remember, no ballot stuffing or reverse-engineering of the polling process!

Friday, November 29, 2013

Using Gerrit code review without code review

Committers who typically push to HEAD (the master branch) can hesitate to enable Gerrit simply because they don't want to add the extra step of reviewing code. Fortunately, with Gerrit, you don't have to.

In fact, EGit makes it easy to push directly to master on a regular basis yet still submit a change to Gerrit for code review.  Here's how:


1. Ask Webmaster to enable Gerrit on your repo, and explicitly state that your project committers should be able to bypass code review.  Moving forward, Gerrit is the only system that can write to your Git repo, even if you're pushing directly to master.


2. Using EGit, clone your repo using the Gerrit URLs, not the Git URLs.  These typically look like this:
ssh://uid@git.eclipse.org:29418/cbi/parent
https://uid@git.eclipse.org/r/cbi/parent

I like to use the "Import from Git repository" functionality in Eclipse, since it will clone the repository and create a project at the same time.


3. By default, EGit will be configured to push directly to the master branch.  I use the Commit and Push button to bypass code review, and the Commit button to (later) push the changes to Gerrit for review.

EGit "Commit" dialog.



If Webmaster didn't activate the Push to Master permission, you'll likely see this:

Gerrit prohibits direct push to master


4. You're done.  Use the Commit and Push button to push directly to the master branch.  If you do wish to create a code review for your local commits, choose the Push to Gerrit item:


Right-click > Team shows Push to Upstream vs. Push to Gerrit

Push to Gerrit lets you specify the code review branch: in this case refs/for/master, which differs from the master branch, refs/heads/master


5. If Push to Gerrit succeeds, you've effectively created a change, and Gerrit will provide a URL:



6. In the Gerrit UI, you can review your change, then Submit it. Gerrit then merges your review branch into master and your change status becomes Merged.  Or, you can request review from others, who can then Publish review commentary or Publish and Submit.

Gerrit code review


7. Watch your new Gerrit project!  Contributors may be submitting changes to Gerrit right now, and they are awaiting your review.
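For anyone working outside EGit, the difference between the two push targets in steps 3 to 5 comes down to the refspec. Here is a minimal sketch that builds the equivalent command line. The remote name "origin" and the helper name gerrit_push_command are assumptions for illustration, not something EGit or Gerrit define.

```python
def gerrit_push_command(branch="master", for_review=True, remote="origin"):
    """Build the git command that pushes HEAD to a Gerrit-managed repo.

    refs/for/<branch>   creates (or updates) a change for code review.
    refs/heads/<branch> writes to the branch directly, which only works
                        if the direct-push permission has been granted.
    The remote name "origin" is an assumption for this sketch.
    """
    namespace = "refs/for" if for_review else "refs/heads"
    return ["git", "push", remote, f"HEAD:{namespace}/{branch}"]


if __name__ == "__main__":
    # Roughly what "Push to Gerrit" does
    print(" ".join(gerrit_push_command(for_review=True)))
    # Roughly what "Commit and Push" to master does
    print(" ".join(gerrit_push_command(for_review=False)))
```

The function only assembles the arguments, so it can be inspected safely before being handed to something like subprocess.run.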


Monday, November 04, 2013

HIPP: Making Hudson work at Eclipse.org

At Eclipse, we've been using the Hudson CI server since 2008 when it was originally set up by members of the Eclipse community -- a handful of committers who wanted to take advantage of a great CI system and put their release engineering online, available to all.

After a while, the service was deemed important and useful, and management responsibility was transferred to the Webmaster team.  Projects signed up, the number of servers grew, and the variety of plugins increased as build methodologies and complexity varied from project to project.  Meanwhile, performance and stability were not usually on our side, and we struggled to keep the service problem-free for any amount of time.

If you build it, they will come (and destroy it all)

To be fair, any piece of code will get a stress-test when exposed to a large, open community such as Eclipse, where bots, scripts, users and search engines will use and abuse every nook and cranny, 24/7.  Nifty features, such as viewing logs or downloading workspaces as a ZIP file, work great on internal or small-scale CI systems, but for a master with eight slaves, hundreds of jobs and thousands of users, it can be a bit overwhelming.

The Hudson project team has always been very responsive to our requests.  They've examined stack traces, bugs and build jobs.  However, in the end, problems typically came down to one of the following:
  • Hudson isn't designed to be a web server, responding to tons of HTTP requests.
  • Hudson isn't designed to be a file server, serving 100+ MB files over the network.
  • Hudson isn't designed to be a log viewer, where some build logs can get quite large.
  • The design behind the slave delegation isn't great.
  • None of this is easy to fix.

As my Java Developer days are but a faint memory, picking up the codebase and hacking out patches was likely only going to make things worse.  Instead, I thought -- why not create a Hudson setup where it actually shines: in a smaller, more focused build environment? Enter bug 403843: Hudson Instance Per Project, or HIPP, as I named the idea of providing Eclipse projects with their own dedicated Hudson instance.

Old school solutions to new problems.

Having dozens of Hudson processes on a single server, all performing specific tasks for individual projects didn't seem as scary as having a single Hudson process performing tasks for dozens of projects.  After all, threading, concurrency and resource contention are all issues that are resolved at the Operating System level.  The Linux Kernel will always be better at load-balancing multiple tasks than Hudson (and Jenkins) will ever be.

The Webmaster team got to work setting up the first HIPP server, and by leveraging Puppet we ensured we could easily create dozens more just like it.  By late July 2013, the first HIPP instance was set up as a proof-of-concept for the Sapphire project, and a quick "ps aux" command shows me the same Hudson instance has been online and operational since August 9.  So far, so good on the stability front.

RAM -- for breakfast, lunch AND dinner.

Of course, the proposed solution had some potentially serious hardware implications.  An idle Hudson master can take about 600 MB of RAM, and 4 to 8 GB (and more) while running a build.  Banking on the fact that not every project builds at the same time, we went the conservative route and decided to allocate about 12 HIPP instances to a single 64 GB server (24 instances to a 128 GB box).  Casual observations lead me to believe we could easily double those numbers without oversubscribing the box, but we're going into this cautiously. Stability first! We'll readjust our targets as we gain experience with the new setup.
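To put rough numbers on that bet, here is a back-of-the-envelope sketch. It uses only the figures quoted above (about 600 MB idle, up to 8 GB per running build) and assumes 1 GB = 1024 MB and no memory reserved for the operating system, so the real headroom is somewhat smaller. The function name is made up for this example.

```python
IDLE_MB = 600          # idle Hudson master, figure quoted in the post
BUILD_MB = 8 * 1024    # upper figure quoted for an instance running a build


def max_concurrent_builds(server_gb, instances, idle_mb=IDLE_MB, build_mb=BUILD_MB):
    """How many instances can build at once before RAM runs out.

    Every instance is charged its idle footprint; each building instance
    is charged the difference between its build and idle footprints.
    OS and filesystem cache overhead are ignored (an assumption).
    """
    headroom_mb = server_gb * 1024 - instances * idle_mb
    if headroom_mb < 0:
        return 0
    extra_per_build_mb = build_mb - idle_mb
    return min(instances, headroom_mb // extra_per_build_mb)


if __name__ == "__main__":
    print(max_concurrent_builds(64, 12))    # 7 builds at 8 GB each
    print(max_concurrent_builds(128, 24))   # 15 builds at 8 GB each
    print(max_concurrent_builds(64, 12, build_mb=4 * 1024))  # all 12 at 4 GB each
```

Under these assumptions, more than half of the instances on a box can run a worst-case build at the same time, which is why the allocation looks conservative.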

Hudson -- he's a member of your project now.

Get to know the Hudson Butler -- that gray-haired, mustache-adorning gentleman wearing a suit and red bow tie, because under HIPP, he can be a committer in your project group, with the same rights to tag and branch your Git repo, as well as drop signed files directly into your downloads area.  A large, monolithic Hudson serving multiple projects can't do such things, since I've always had reservations about storing committer credentials inside a web app, or allowing one app to delete files in anyone's project.

Since a HIPP is now working specifically on your project, we've also allowed a host of plugins on HIPP that are still forbidden on the "shared" Hudson, such as the Gerrit plugin, which allows Hudson to be an active participant in your code review process.

What about Hudson Team Support?

Before diving into HIPP, I had some very interesting discussions at EclipseCon with some of the Hudson project members, and that's where I was introduced to the soon-to-be-released multi-tenant aspect of Hudson 3.1.  The promises were, well, promising, but it was not released yet, and I needed to fix stability years ago. I couldn't wait. Besides, I wasn't entirely sure that the Hudson team support could give me the levels of user/project separation I was after, so I forged ahead with HIPP.

As I write this, Thanh is provisioning HIPP instance #37, I'm working on web-based HIPP self-serve start/stop/restart, and we're getting positive feedback that it all Just Works.  That makes me a happy camper.

Keep on building.

Wednesday, March 20, 2013

Poll results: How do you prepare your EclipseCon talks

The results are in! In yesterday's poll I asked how you prepared for your talk(s) at EclipseCon. Here were your answers:



30 i-prepare-my-talks-on-the-plane-to-econ

27 oh-crap-ill-be-back-later-theres-something-i-must-go-do

22 i-prepared-my-talk-in-2009-maybe-i-should-update-it

19 i-prepare-my-talks-while-i-drive-to-econ

16 i-prepare-my-talk-when-i-get-there



124 I-assumed-Denis-was-preparing-my-talks-for-me

If you think I am in some way qualified to prepare your talks, the attendees are in for a painful experience.


56 like-Eric-I-prepare-mine-on-the-toilet


That is way too much information.  I'd be leery of printed handouts at those talks.



For some reason, there were more answers on the topic of beer than any other.   I am shocked!

9 this.is.my.first.eclipsecon.ever.its.going.to.be.great---they.say.you.buy.beers

Congratulations -- you are in for a great week. Catch me at the bar and I'll buy you a beer.  You'll need to be fast since I am not at the bar very often.


6 please.set.up.a.better.404.page.for.eclipsecon


In this age of Drupal-enabled websites, this is out of my control.

 

5 ive.been.to.many.eclipsecons---why-have-you-never-bought-me-a-beer

I am sorry.  I will try harder this year.




5 /webmaster.ive.been.to.all.the.eclipsecons---and.I.know.you.buy.beers


I know who you are.


5 /webmaster.ive.been.to.a.couple.of.eclipsecons---I-prefer-to-run-than-drink-beer


I will run with you -- to the beer store!


 


Thanks to everyone who participated!  I look forward to seeing everyone again at EclipseCon 2013!

Tuesday, March 19, 2013

EclipseCon Poll: How are you preparing your talk(s) for EclipseCon

It's time for another Webmaster Whacky poll, where failure is the only sign of success.  In less than a week we'll all be at our favourite hangout, EclipseCon 2013.  This time I ask a simple question:

How do you prepare for your talk(s) at EclipseCon?  Cast your votes by clicking the links below:


http://eclipsecon.org/webmaster-i-prepare-my-talks-on-the-plane-to-econ

http://eclipsecon.org/webmaster-i-prepare-my-talks-while-i-drive-to-econ

http://eclipsecon.org/webmaster-i-prepared-my-talk-in-2009-maybe-i-should-update-it

http://eclipsecon.org/webmaster-i-prepare-my-talk-when-i-get-there

http://eclipsecon.org/webmaster-oh-crap-ill-be-back-later-theres-something-i-must-go-do

http://eclipsecon.org/webmaster-what-no-beer?


I'll tally up the results tomorrow afternoon.  Remember -- no ballot stuffing!!

Wednesday, February 27, 2013

Cisco CSS: Load balancing from the inside too

Disclaimer: I'm not a Cisco expert.  Years ago, then-webmaster Karl Matthias convinced me that I was almost smart enough to barely understand this gear.  Turns out he was right.

The skinny

We use a Cisco CSS to load-balance client requests to multiple servers.  For years we couldn't load-balance requests from our inside network, only from the outside.


The setup

If you read the Cisco docs, the predominant use-case for a load balancer appears to be a single CSS with a single server group serving all the content. However, as at most shops, we have multiple server groups to serve different content.  For example, we have three servers to serve www.eclipse.org, two for Bugzilla, three for Git, three for wiki.eclipse.org, and so on.




Here, the load balancer acts as the gateway -- all inside servers use RFC 1918 private IPs and use 172.16.0.1 as their default gateway.  One /24 subnet is used: 172.16.0.X. The CSS has multiple "virtual" IPs: the real, Internet-routable IP addresses that represent the services.

For Internet clients, this setup works beautifully.  When you consider that a CSS is nothing more than a heavy Spoofing device, you can easily follow the flow of traffic from a client, say 108.10.50.81:

  • Client 108.10.50.81 sends SYN packet to www.eclipse.org, which is 198.41.30.199
  • The CSS immediately responds with a SYN-ACK, which is ACK'ed by the client, thus completing the three-way handshake
  • Meanwhile, the CSS spoofs the connection to one of the real servers in the group.  It crafts a new SYN packet --  Source: 108.10.50.81  Dest: 172.16.0.7 (it happened to pick that one)
  • The real server responds with a SYN-ACK: Source: 172.16.0.7 Dest: 108.10.50.81.  Since the Destination is remote, the packet is sent to the Default Gateway, which is the CSS (172.16.0.1)
  • The CSS simply discards the SYN-ACK since it has already established a socket with the real client. It ACKs the real server and completes the three-way handshake on the backend.
  • Everyone is happy, and traffic is free to flow from the client to the real server.


The problem

Problems arise when a server on the "inside" becomes a client to a load-balanced service, also on the inside. For some reason, it just doesn't work.  Years ago, the Cisco experts (not Karl) just told me it was how Cisco devices worked -- the load balancer is not meant to be accessed from the inside network.  The Cisco forums provided no particular guidance, other than to essentially "NAT" the inside servers as clients.  While that solution works, I didn't find it particularly pretty.

We originally worked around the issue with hosts file entries, then with internal DNS.  Since all our servers share a backend network connection, server-to-server connections would flow over it.  It worked, but it was error-prone and confusing.  If one load-balanced node died or was taken offline, we'd need to remember to update DNS.


Why it doesn't work

The years passed and I didn't spend much time thinking about it, but as our services grew in number, size and traffic volume, the problems became more frequent and annoying.

Understanding the root cause of the problem was key to developing a solution, which happened haphazardly while explaining it to a bunch of Linux students.  A light bulb lit up and I saw the light.

The following day I spent a bit of time with tcpdump and webalizer, and I noticed that the internal "client" trying to reach an internal server through the CSS was eventually receiving two SYN-ACK packets.  The client, understandably confused, would RST the connection, leading to failure.  Bingo.

Following the flow of traffic from the inside "client", the problem becomes apparent.  Say Bugzilla server 172.16.0.15 wants to talk to server www.eclipse.org, using the virtual IP:


  • Internal Bugzilla server (the client) 172.16.0.15 sends SYN packet to www.eclipse.org, which, through the magic of DNS, resolves to 198.41.30.199. Remember, that IP is the CSS.
  • The CSS immediately responds with a SYN-ACK, which is ACK'ed by the bugzilla server (the client), thus completing the three-way handshake.  So far, so good.
  • Meanwhile, the CSS spoofs the connection to one of the real servers in the group.  It crafts a new SYN packet --  Source: 172.16.0.15 (the bugzilla "client")  Dest: 172.16.0.6 (one of the www nodes)
  • The real server responds with a SYN-ACK: Source: 172.16.0.6 Dest: 172.16.0.15.  Are you seeing this?  Unlike earlier, this time the Destination IP is not remote, so the packet is not sent to the Default Gateway.  The SYN-ACK is sent directly to the Destination.
  • Two things happen:  1) The "client" receives a second SYN-ACK -- one from the CSS, which is the spoofed connection, and now one from the real server.   2) The CSS is not "seeing" the response.  The CSS must "see" all the traffic.
  • Bugzilla server (the client), confused by the two SYN-ACKs, issues a connection RST and the connection fails.
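The whole failure hinges on one routing decision made by the real server: is the destination of my SYN-ACK on my own subnet or not? Here is a minimal sketch of that decision using Python's standard ipaddress module and the addresses from the two walkthroughs. The function name reply_next_hop is made up for this illustration, and real hosts consult a full routing table rather than a single rule.

```python
import ipaddress


def reply_next_hop(server_ip, client_ip, prefix_len=24, gateway="172.16.0.1"):
    """Return the IP the server hands its reply to.

    If the client is on the server's own subnet, the reply is delivered
    directly and the gateway (the CSS) never sees it. Otherwise the reply
    goes to the default gateway.
    """
    server_net = ipaddress.ip_interface(f"{server_ip}/{prefix_len}").network
    if ipaddress.ip_address(client_ip) in server_net:
        return client_ip
    return gateway


if __name__ == "__main__":
    # Internet client: the SYN-ACK goes back through the CSS.
    print(reply_next_hop("172.16.0.7", "108.10.50.81"))   # 172.16.0.1
    # Inside client on the same /24: the SYN-ACK bypasses the CSS.
    print(reply_next_hop("172.16.0.6", "172.16.0.15"))    # 172.16.0.15
```

In the second case the reply never touches 172.16.0.1, which is exactly why the Bugzilla "client" ends up with two SYN-ACKs.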


The solution

For internal load-balancing to work, the CSS must see all the traffic coming in-and-out.  The easiest solution here is to isolate the servers in content groups to their own subnet.  Consider this:

The changes may be hard to spot:

  1. No changes to the "www" servers.  They remain on the 172.16.0 subnet, with a 24-bit mask.
  2. Bugzilla server IP addresses change from 172.16.0.X to 172.16.1.X.  Also with a 24-bit subnet mask, they are now on a different IP subnet than the www servers.  Physically, no wiring or VLAN changes are needed.  Their Default Gateway changes to 172.16.1.1.
  3. On the CSS, a new IP address is assigned to the inside circuit: 172.16.1.1.  It will be the Default Gateway address for the Bugzilla group.
  4. Service rules for the Bugzilla servers are updated to reflect their new IP addresses.

Clients on the outside don't see a thing -- they are still happily talking to the CSS via virtual IP 198.41.30.X. However, on the inside, Bugzilla and "www" can now talk to each other using their load-balanced virtual IP 198.41.30.X since the CSS must be used to route all traffic between them.  If one node fails, the CSS continues to use the remaining nodes, and service remains functional for inside clients too.
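One way to sanity-check an addressing plan like this is to confirm that no two server groups share a subnet, since any pair that does can still reply to each other directly and bypass the CSS. The sketch below does that with the standard ipaddress module. The host addresses are illustrative (the post only gives 172.16.0.X and 172.16.1.X), and the function name is hypothetical.

```python
import ipaddress
from itertools import combinations


def groups_sharing_a_subnet(plan):
    """Return pairs of group names whose subnets overlap.

    plan maps a group name to a list of "address/prefix" strings.
    An empty result means traffic between any two groups must be routed,
    and so must pass through the gateway.
    """
    nets = {
        group: {ipaddress.ip_interface(addr).network for addr in addrs}
        for group, addrs in plan.items()
    }
    clashes = []
    for a, b in combinations(sorted(nets), 2):
        if any(n1.overlaps(n2) for n1 in nets[a] for n2 in nets[b]):
            clashes.append((a, b))
    return clashes


if __name__ == "__main__":
    # Illustrative host addresses, not the real ones.
    before = {"www": ["172.16.0.6/24", "172.16.0.7/24"], "bugzilla": ["172.16.0.15/24"]}
    after = {"www": ["172.16.0.6/24", "172.16.0.7/24"], "bugzilla": ["172.16.1.15/24"]}
    print(groups_sharing_a_subnet(before))  # [('bugzilla', 'www')]
    print(groups_sharing_a_subnet(after))   # []
```

Servers within the same group still share a subnet, so this check says nothing about a node calling its own group's virtual IP.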

Friday, February 15, 2013

Big Server Move Reason #4: Big Savings

This is the final part of a blog series on why Eclipse.org moved to a new datacenter.

See Also: Reason #1: Bigger Pipe
See Also: Reason #2: Big Power
See Also: Reason #3: Big Cooling

Reason #4: Big Savings

The new colo facility was eager to have our business.  Very eager.  They kept sweetening the pot until it was practically impossible to say 'no'.  The end result of this move: more bandwidth, more AC power, better cooling, more cabinet space, and a lower monthly bill for the Foundation.

What's there not to like?

Thursday, February 14, 2013

Big Server Move Reason #3: Big Cooling

This is part of a blog series on why Eclipse.org moved to a new datacenter.

See Also: Reason #1: Bigger Pipe
See Also: Reason #2: Big Power

Reason #3: Big Cooling

The by-product of consumed electricity is heat -- lots of it.  We felt we had outgrown our previous location since our cabinet temperatures were very high, even though the cabinets themselves still had vacancies.  In the last six months, we replaced no fewer than eight failed hard drives, all in relatively young servers.  Not an efficient use of our time.

The new facility has cabinets that are not only deeper, but also equipped with large chimneys which are ducted into the facility's air return (the ceiling).  Hot exhaust air is literally sucked out of our cabinets, drawing cool air from the perforated floor tiles in front.  Wayne's blog post has some neat pictures.

The result is a set of hot chimneys, cool servers and remarkably uniform temperatures inside the cabinets. 

Next up: Reason #4: Big Savings

Wednesday, February 13, 2013

Big Server Move Reason #2: Big Power

See Also: Reason #1: Bigger Pipe

This is part of a blog series on why Eclipse.org moved to a new datacenter.

Reason #2: Big Power

Today, small servers can deliver greater computational power than the bigger servers of only a few years ago.  But they are power-hungry: 700W to 1000W power supplies in small 1U servers mean server cabinets now require plenty of power distribution units (PDUs).

Since efficiency drops as AC current increases, it became clear that North America's standard 120V AC power was an inefficient use of yesteryear's technology (the rest of the world figured this out long ago).  As DC power is simply not there yet for colocation, 208V 3-phase AC was the way to go.  Not only do servers consume a bit less power at that higher voltage, but our new PDU bars also have built-in current monitors that display instant power usage on each phase, and can be graphed remotely thanks to SNMP.
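The arithmetic behind the voltage argument is short enough to sketch. For the same power draw, current falls as voltage rises, and resistive loss in the wiring scales with the square of the current. The example assumes a simple single-phase load at unity power factor and identical wiring in both cases, so treat the numbers as illustrative rather than as measurements from our cabinets.

```python
def line_current_amps(watts, volts):
    """Current drawn by a load of the given power (unity power factor assumed)."""
    return watts / volts


def relative_resistive_loss(volts_low, volts_high):
    """Wiring loss at the higher voltage as a fraction of the loss at the lower one.

    Loss is I^2 * R, and for the same power I is proportional to 1 / V,
    so with the same wiring the ratio is (volts_low / volts_high) squared.
    """
    return (volts_low / volts_high) ** 2


if __name__ == "__main__":
    print(f"{line_current_amps(1000, 120):.2f} A at 120V")   # 8.33 A
    print(f"{line_current_amps(1000, 208):.2f} A at 208V")   # 4.81 A
    print(f"{relative_resistive_loss(120, 208):.0%} of the 120V wiring loss")  # 33%
```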

Additionally, the new facility is providing us with A+B redundant power circuits that we must keep under 40% capacity.  This allows for the total loss of one circuit while still remaining within acceptable loads (below 80%) on the remaining circuit.
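The 40% rule is easy to check: if one circuit is lost, its load lands on the survivor. Here is a minimal sketch, assuming both circuits have the same capacity and that every server is fed from both (the function names are made up for this example):

```python
def load_after_failover(load_a_pct, load_b_pct):
    """Load on the surviving circuit, as a percentage of its capacity,
    once it has picked up everything from the failed circuit."""
    return load_a_pct + load_b_pct


def survives_circuit_loss(load_a_pct, load_b_pct, ceiling_pct=80):
    """True if the surviving circuit stays below the acceptable ceiling."""
    return load_after_failover(load_a_pct, load_b_pct) < ceiling_pct


if __name__ == "__main__":
    print(survives_circuit_loss(39, 39))  # True: 78% on the survivor
    print(survives_circuit_loss(40, 45))  # False: 85% on the survivor
```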

Next up: Reason #3: Big Cooling