Thursday, March 08, 2007

Unintended consequences: local build servers

I had always thought project builds would be much more efficient if the build servers were physically on the same LAN as Eclipse CVS and downloads, instead of pulling and pushing code over the Internet. After all, bandwidth isn't free. Well, that turned out to be true: several projects now use some of our server infrastructure to run their builds, and we do save on bandwidth. Now for the unintended consequences:

a) local builds are much, much faster thanks to gigabit networks and quad-cpu monster servers.
b) faster builds mean teams can "afford" to build more often. Nary a minute goes by when I don't see WTP build something.
c) more builds from powerful monster build servers put more strain on our storage server.

Back in January we had a problem where our storage server would simply give up on life, and we didn't know why. We concluded that when the monster build server sends a few gigabytes of data to an already busy storage server, it gets overwhelmed to the point where it sometimes simply stops serving files. Ouch.

A busy disk shouldn't kill a server, but we have no fix in sight, other than some hacked scripts to monitor and kick what needs to be kicked to avoid a crash. They work great, but the problem isn't fixed per se, so where does that leave us? I thought of two solutions:

a) rearchitect our storage backend to handle massive disk writes. This will require hardware and money, so it won't happen tomorrow.

b) throttle the network for builds. The Internet introduces a natural throttling process, so if the build server talks slower, maybe the storage will have time to write everything down without falling over.

For now I've throttled the build network -- although the build server can still read files at gigabit speed, it can only write at about 100 Mbit/s. Let's see how that works.
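To make the throttling idea concrete, here is a minimal sketch of a token-bucket rate limiter in Python. It is not the mechanism used on the build network (the throttle described above lives at the network level); it only illustrates how capping a writer's rate gives the receiving side time to catch up. The names `TokenBucket` and `throttled_copy` are hypothetical, and the 100 Mbit/s figure is taken as roughly 12,500,000 bytes per second.

```python
import time


class TokenBucket:
    """Token-bucket limiter. One token = one byte; refilled at a fixed rate."""

    def __init__(self, rate_bytes_per_sec, burst_bytes,
                 clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate_bytes_per_sec)
        self.capacity = float(burst_bytes)
        self.tokens = float(burst_bytes)   # start with a full bucket
        self.clock = clock                 # injectable for testing
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def consume(self, nbytes):
        """Block until nbytes may be sent; return the seconds spent waiting."""
        if nbytes > self.capacity:
            raise ValueError("chunk larger than burst size")
        waited = 0.0
        self._refill()
        while self.tokens < nbytes:
            delay = (nbytes - self.tokens) / self.rate
            self.sleep(delay)
            waited += delay
            self._refill()
        self.tokens -= nbytes
        return waited


def throttled_copy(src, dst, bucket, chunk_size=64 * 1024):
    """Copy src to dst in chunks, never exceeding the bucket's rate."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        bucket.consume(len(chunk))
        dst.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # Assumed numbers: ~100 Mbit/s cap, 1 MB burst allowance.
    bucket = TokenBucket(rate_bytes_per_sec=12_500_000, burst_bytes=1_000_000)
    print("bucket ready, rate:", bucket.rate, "bytes/s")
```

With a cap like this, a build that pushes a few gigabytes takes minutes instead of seconds to land on storage, which is exactly the trade being made: slower publishing in exchange for a storage server that stays upright.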

Oh the joys of running servers for a pretty busy site never end...

