Wednesday, February 27, 2013

Cisco CSS: Load balancing from the inside too

Disclaimer: I'm not a Cisco expert.  Years ago, then-webmaster Karl Matthias convinced me that I was almost smart enough to barely understand this gear.  Turns out he was right.

The skinny

We use a Cisco CSS to load-balance client requests to multiple servers.  For years we couldn't load-balance requests from our inside network, only from the outside.


The setup

If you read the Cisco docs, the predominant use case for a load balancer appears to be a single CSS with a single server group serving all the content. However, like most shops, we have multiple server groups serving different content.  For example, we have three servers for www.eclipse.org, two for Bugzilla, three for Git, three for wiki.eclipse.org, and so on.




Here, the load balancer acts as the gateway -- all inside servers use RFC 1918 private IPs, with 172.16.0.1 as their default gateway.  A single /24 subnet is used: 172.16.0.X. The CSS holds multiple "virtual" IPs: the real, Internet-routable IP addresses that represent the services.

For Internet clients, this setup works beautifully.  Once you consider that a CSS is little more than an elaborate spoofing device, you can easily follow the flow of traffic from a client, say 108.10.50.81:

  • Client 108.10.50.81 sends SYN packet to www.eclipse.org, which is 198.41.30.199
  • The CSS immediately responds with a SYN-ACK, which is ACK'ed by the client, thus completing the three-way handshake
  • Meanwhile, the CSS spoofs the connection to one of the real servers in the group.  It crafts a new SYN packet --  Source: 108.10.50.81  Dest: 172.16.0.7 (it happened to pick that one)
  • The real server responds with a SYN-ACK: Source: 172.16.0.7 Dest: 108.10.50.81.  Since the Destination is remote, the packet is sent to the Default Gateway, which is the CSS (172.16.0.1)
  • The CSS simply discards the SYN-ACK since it has already established a socket with the real client. It ACKs the real server and completes the three-way handshake on the backend.
  • Everyone is happy, and traffic is free to flow from the client to the real server.
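The key step above is the real server's own routing decision. A minimal sketch using Python's ipaddress module (the addresses are the hypothetical ones from the walkthrough):

```python
import ipaddress

# Hypothetical addresses from the walkthrough above
www_node = ipaddress.ip_interface("172.16.0.7/24")   # the real server
client = ipaddress.ip_address("108.10.50.81")        # the Internet client

# The real server's routing decision for its SYN-ACK: the client is
# outside 172.16.0.0/24, so the packet goes to the default gateway,
# which is the CSS (172.16.0.1) -- exactly what the CSS needs to see.
if client in www_node.network:
    next_hop = "delivered directly (same subnet)"
else:
    next_hop = "sent to default gateway 172.16.0.1 (the CSS)"

print(next_hop)
```

Because the destination is off-subnet, the server has no choice but to hand the packet to its gateway, so the CSS sees every reply and can discard or rewrite it as needed.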


The problem

Problems arise when a server on the "inside" becomes a client to a load-balanced service, also on the inside. For some reason, it just doesn't work.  Years ago, the Cisco experts (not Karl) told me that's just how Cisco devices work -- the load balancer is not meant to be accessed from the inside network.  The Cisco forums provided no particular guidance, other than to essentially "NAT" the inside servers as clients.  While that solution works, I didn't find it particularly pretty.

We first worked around the issue with hosts file entries, then with internal DNS.  Since all our servers share a backend network connection, server-to-server traffic would flow over it.  It worked, but it was error-prone and confusing: if a load-balanced node died or was taken offline, we'd need to remember to update DNS.


Why it doesn't work

The years passed and I didn't spend much time thinking about it, but as our services grew in number, size and traffic volume, the problems became more frequent and annoying.

Understanding the root cause of the problem was key to developing a solution -- and that understanding came haphazardly, while I was explaining the problem to a class of Linux students.  A light bulb went on.

The following day I spent a bit of time with tcpdump and webalizer, and noticed that an internal "client" trying to reach an internal service through the CSS was eventually receiving two SYN-ACK packets.  The client, understandably confused, would RST the connection, leading to failure.  Bingo.

Following the flow of traffic from the inside "client", the problem becomes apparent.  Say Bugzilla server 172.16.0.15 wants to talk to server www.eclipse.org, using the virtual IP:


  • Internal Bugzilla server (the client) 172.16.0.15 sends SYN packet to www.eclipse.org, which, through the magic of DNS, resolves to 198.41.30.199. Remember, that IP is the CSS.
  • The CSS immediately responds with a SYN-ACK, which is ACK'ed by the Bugzilla server (the client), thus completing the three-way handshake.  So far, so good.
  • Meanwhile, the CSS spoofs the connection to one of the real servers in the group.  It crafts a new SYN packet --  Source: 172.16.0.15 (the Bugzilla "client")  Dest: 172.16.0.6 (one of the www nodes)
  • The real server responds with a SYN-ACK: Source: 172.16.0.6 Dest: 172.16.0.15.  Are you seeing this?  Unlike earlier, this time the Destination IP is not remote, the packet is not sent to the Default Gateway.  The SYN-ACK is sent directly to the Destination.
  • Two things happen:  1) The "client" receives a second SYN-ACK -- one from the CSS (the spoofed connection), and now one from the real server.   2) The CSS never "sees" the response -- and the CSS must see all the traffic.
  • Bugzilla server (the client), confused by the two SYN-ACKs, issues a connection RST and the connection fails.
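The broken step can be sketched the same way as before, with Python's ipaddress module (again, the addresses are the hypothetical ones from the walkthrough):

```python
import ipaddress

# Hypothetical addresses from the walkthrough above
bugzilla = ipaddress.ip_address("172.16.0.15")       # internal "client"
www_node = ipaddress.ip_interface("172.16.0.6/24")   # real www node

# The www node's routing decision for its SYN-ACK: the Bugzilla server
# is on the same /24, so the reply is delivered directly on the local
# subnet, bypassing the default gateway (the CSS) entirely.
same_subnet = bugzilla in www_node.network

print(same_subnet)  # True -- the CSS never sees this SYN-ACK
```

That single `True` is the whole bug: the real server's reply short-circuits the CSS, so the client hears from both of them.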


The solution

For internal load balancing to work, the CSS must see all the traffic, in and out.  The easiest solution is to isolate each content group on its own subnet.  Consider this:

The changes may be hard to spot:

  1. No changes to the "www" servers.  They remain on the 172.16.0 subnet, with a 24-bit mask.
  2. Bugzilla server IP addresses change from 172.16.0.X to 172.16.1.X.  Still with a 24-bit subnet mask, they are now on a different IP subnet than the www servers.  Physically, no wiring or VLAN changes are needed.  The default gateway changes to 172.16.1.1.
  3. On the CSS, a new IP address is assigned to the inside circuit: 172.16.1.1.  It will be the Default Gateway address for the Bugzilla group.
  4. Service rules for the Bugzilla servers are updated to reflect their new IP addresses.

Clients on the outside don't see a thing -- they are still happily talking to the CSS via virtual IP 198.41.30.X. On the inside, though, Bugzilla and "www" can now talk to each other using their load-balanced virtual IPs 198.41.30.X, since the CSS must route all traffic between the two subnets.  If one node fails, the CSS simply uses the remaining nodes, and the service remains functional for inside clients too.
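Rerunning the earlier subnet check after the renumbering shows why the fix works (hypothetical addresses as above, with Bugzilla moved to 172.16.1.X):

```python
import ipaddress

# Hypothetical addresses after the renumbering
bugzilla = ipaddress.ip_interface("172.16.1.15/24")  # moved to the new subnet
www_node = ipaddress.ip_interface("172.16.0.6/24")   # unchanged www node

# The two groups no longer share a subnet, so every packet between
# them must go through each side's default gateway -- which is the
# CSS (172.16.0.1 for www, 172.16.1.1 for Bugzilla).
same_subnet = www_node.ip in bugzilla.network

print(same_subnet)  # False -- traffic is forced through the CSS
```

With `False` here, the real server's SYN-ACK once again lands on the CSS, which discards it and keeps its spoofed sockets in sync -- the same mechanics that already worked for outside clients.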

2 Comments:

Blogger K Matthias said...

Well done! Tcpdump to the rescue! See, you da man. ;-) Glad you got those things kicked into submission.

3:50 AM  
Blogger Denis Roy said...

Yep, thanks for the kick in the butt years ago. It has proved very helpful.

10:05 AM  
