Monday, November 04, 2013

HIPP: Making Hudson work at Eclipse.org

At Eclipse, we've been using the Hudson CI server since 2008 when it was originally set up by members of the Eclipse community -- a handful of committers who wanted to take advantage of a great CI system and put their release engineering online, available to all.

After a while, the service was deemed important and useful, and management responsibility was transferred to the Webmaster team.  Projects signed up, the number of servers grew, and the variety of plugins increased as build methodologies and complexity varied from project to project.  Meanwhile, performance and stability were not usually on our side, and we struggled to keep the service problem-free for any amount of time.

If you build it, they will come (and destroy it all)

To be fair, any piece of code will get a stress-test when exposed to a large, open community such as Eclipse, where bots, scripts, users and search engines will use and abuse every nook and cranny, 24/7.  Nifty features, such as viewing logs or downloading workspaces as a ZIP file, work great on internal or small-scale CI systems, but for a master with eight slaves, hundreds of jobs and thousands of users, they can be a bit overwhelming.

The Hudson project team has always been very responsive to our requests.  They've examined stack traces, bugs and build jobs.  However, in the end, problems typically came down to one or more of the following:
  • Hudson isn't designed to be a web server, responding to tons of http requests.
  • Hudson isn't designed to be a file server, serving 100+ MB files over the network.
  • Hudson isn't designed to be a log viewer, where some build logs can get quite large.
  • The design behind the slave delegation isn't great.
  • None of this is easy to fix.
As my Java Developer days are but a faint memory, picking up the codebase and hacking out patches was likely only going to make things worse.  Instead, I thought -- why not create a Hudson setup where it actually shines: in a smaller, more focused build environment. Enter bug 403843: Hudson Instance Per Project, or HIPP as I named the idea of providing Eclipse projects with their own dedicated Hudson instance.

Old school solutions to new problems.

Having dozens of Hudson processes on a single server, all performing specific tasks for individual projects didn't seem as scary as having a single Hudson process performing tasks for dozens of projects.  After all, threading, concurrency and resource contention are all issues that are resolved at the Operating System level.  The Linux Kernel will always be better at load-balancing multiple tasks than Hudson (and Jenkins) will ever be.

The Webmaster team got to work setting up the first HIPP server, and by leveraging Puppet we ensured we could easily create dozens more just like it.  By late July 2013, the first HIPP instance was set up as a proof-of-concept for the Sapphire project, and a quick "ps aux" command shows me the same Hudson instance has been online and operational since August 9.  So far, so good on the stability front.

RAM -- for breakfast, lunch AND dinner.

Of course, the proposed solution had some potentially serious hardware implications.  An idle Hudson master can take about 600M of RAM, and 4 to 8GB (and more) while running a build.  Banking on the fact that not every project builds at the same time, we went the conservative route and decided to allocate about 12 HIPP instances to a single 64GB server (24 instances to a 128 GB box).  Casual observations lead me to believe we could easily double those numbers without oversubscribing the box, but we're going into this cautiously. Stability first! We'll readjust our targets as we gain experience with the new setup.
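For the curious, here is a rough back-of-the-envelope sketch of that arithmetic in Python. It only uses the figures quoted above (about 600M idle, 4 to 8GB while building); the function names are made up for illustration, and it assumes memory use is simply additive and ignores the operating system's own overhead, so treat the results as ballpark numbers rather than a sizing guarantee.

```python
# Rough capacity sketch using the figures from the post.
# Assumptions (not measurements): memory use is additive across instances,
# and OS / filesystem cache overhead is ignored.

IDLE_GB = 0.6        # idle Hudson master, "about 600M"
BUILD_GB_LOW = 4.0   # low end of a running build
BUILD_GB_HIGH = 8.0  # high end of a running build


def memory_needed_gb(instances, building, build_gb):
    """RAM needed when `building` of `instances` HIPPs are running a build."""
    if not 0 <= building <= instances:
        raise ValueError("building must be between 0 and instances")
    idle = instances - building
    return idle * IDLE_GB + building * build_gb


def max_concurrent_builds(instances, server_gb, build_gb):
    """Largest number of simultaneous builds that still fits in RAM.

    Returns -1 if even a fully idle set of instances does not fit.
    """
    for building in range(instances, -1, -1):
        if memory_needed_gb(instances, building, build_gb) <= server_gb:
            return building
    return -1


if __name__ == "__main__":
    for build_gb in (BUILD_GB_LOW, BUILD_GB_HIGH):
        fits = max_concurrent_builds(12, 64, build_gb)
        print(f"12 HIPPs on 64GB at {build_gb:.0f}GB per build: "
              f"{fits} simultaneous builds fit")
```

Under these assumptions, all 12 instances could build at once at 4GB each, while at 8GB each about 7 could build simultaneously before the box is oversubscribed, which is why banking on staggered builds matters.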

Hudson -- he's a member of your project now.

Get to know the Hudson Butler -- that gray-haired, mustachioed gentleman in a suit and red bow tie -- because under HIPP, he can be a committer in your project group, with the same rights to tag and branch your Git repo, as well as drop signed files directly into your downloads area.  A large, monolithic Hudson serving multiple projects can't do such things, since I've always had reservations about storing committer credentials inside a web app, or allowing one app to delete files in anyone's project.

Since a HIPP is now working specifically on your project, we've also allowed a host of plugins on HIPP that are still forbidden on the "shared" Hudson, such as the Gerrit plugin, which allows Hudson to be an active participant in your code review process.

What about Hudson Team Support?

Before diving into HIPP, I had some very interesting discussions at EclipseCon with some of the Hudson project members, and that's where I was introduced to the soon-to-be-released multi-tenant aspect of Hudson 3.1.  The promises were, well, promising, but it was not released yet, and I needed to fix stability years ago. I couldn't wait. Besides, I wasn't entirely sure that the Hudson team support could give me the levels of user/project separation I was after, so I forged ahead with HIPP.

As I write this, Thanh is provisioning HIPP instance #37 and I'm working on web-based HIPP self-serve start/stop/restart. We're getting positive feedback that it all Just Works, and that makes me a happy camper.

Keep on building.

2 Comments:

Blogger Konstantin Komissarchik said...

Moving Sapphire build to HIPP has been remarkably successful. Once the instance was properly configured, I literally cannot recall when intervention was necessary because something broke or got stuck. On the shared instance, it was almost a daily occurrence. Failed builds due to random infra issues, stuck slaves, having to reassign the job to a different slave, etc.

A very happy HIPP participant here.

12:56 PM  
Blogger Denis Roy said...

Thanks for the feedback, Konstantin.

3:46 PM  
