Load balancing is a method of distributing incoming socket connections across several servers. It is not distributed computing, where a job is broken up into a series of sub-jobs so that each server does a fraction of the overall work. Instead, each incoming connection is handed to one node, and the entire interaction with that client takes place on that node. No node is aware of the other nodes’ existence.
Why do you need load balancing?
Simple answer: Scalability and Redundancy.
Scalability
If your application becomes busy, resources such as bandwidth, CPU, memory, disk space, and disk I/O may reach their limits. To remedy this, you have two options: scale up or scale out. Load balancing is a scale-out technique. Rather than adding resources to a single server, you add cost-effective commodity servers, creating a “cluster” of servers that perform the same task. Scaling out is usually more cost-effective, because commodity hardware provides the most bang for the buck. High-end supercomputers come at a premium and can be avoided in many cases.
Redundancy
Servers crash; this is the rule, not the exception. Your architecture should be designed to reduce or eliminate single points of failure (SPOF). Load balancing a cluster of servers that perform the same role lets you take a server out manually for maintenance without taking down the system, and it also lets you withstand a server crash. This is called High Availability, or HA for short. Load balancing is a tactic that assists with High Availability, but it is not High Availability by itself. To achieve high availability, you need automated monitoring that checks the status of the applications in your cluster and automatically takes servers out of rotation when a failure is detected. These tools are often bundled into load balancing software and appliances, but sometimes they need to be programmed independently.
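As a rough illustration of that automated monitoring, here is a minimal Python sketch of a pool that takes servers out of rotation when a health check fails and puts them back when it passes again. `ServerPool` and the `check` callable are hypothetical names. Real load balancers add timeouts, failure thresholds, and re-entry delays, which this sketch leaves out.

```python
from typing import Callable, Dict, List


class ServerPool:
    """Keeps in rotation only the servers that passed their latest health check."""

    def __init__(self, servers: List[str], check: Callable[[str], bool]) -> None:
        self.check = check  # hypothetical probe: returns True if the server is healthy
        self.status: Dict[str, bool] = {s: True for s in servers}

    def run_checks(self) -> None:
        # Probe every server, including ones already out of rotation,
        # so that a recovered server is put back automatically.
        for server in self.status:
            self.status[server] = self.check(server)

    def in_rotation(self) -> List[str]:
        return [s for s, healthy in self.status.items() if healthy]


# Usage: web2 fails its check and is pulled, then returns after maintenance.
down = {"web2"}
pool = ServerPool(["web1", "web2", "web3"], check=lambda s: s not in down)
pool.run_checks()
print(pool.in_rotation())  # ['web1', 'web3']
down.clear()
pool.run_checks()
print(pool.in_rotation())  # ['web1', 'web2', 'web3']
```

Because the pool keeps probing servers that are already out of rotation, recovery needs no manual step, which is the part that turns plain load balancing into HA.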
How to perform load balancing?
There are three well-known ways:
- DNS based
- Hardware based
- Software based
DNS based
This is also known as round robin DNS. You publish multiple A records for the same hostname, and the nameserver returns the list in a rotating (or random) order, so different clients end up connecting to different addresses. If you want to weight the distribution (say serverA can take twice as many requests as serverB), give serverA additional IP addresses and publish an A record for each one. Simply repeating an identical A record does not work, because nameservers discard duplicate records.
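The Python sketch below models a nameserver that rotates its answer list on every query, which is one of the orderings nameservers such as BIND can use. It weights serverA by giving it two distinct addresses. The class name and the 192.0.2.x documentation addresses are hypothetical. The sketch also assumes each client connects to the first address in the answer, which is common but not guaranteed.

```python
from collections import Counter
from typing import List


class RotatingNameserver:
    """Toy nameserver that returns an A-record set in cyclic (rotating) order."""

    def __init__(self, addresses: List[str]) -> None:
        self.addresses = list(addresses)
        self.offset = 0

    def resolve(self) -> List[str]:
        n = len(self.addresses)
        answer = [self.addresses[(self.offset + i) % n] for i in range(n)]
        self.offset = (self.offset + 1) % n  # next query starts one record later
        return answer


# Weighting: serverA answers on two addresses, serverB on one (hypothetical IPs).
owner = {"192.0.2.10": "serverA", "192.0.2.11": "serverA", "192.0.2.20": "serverB"}
ns = RotatingNameserver(list(owner))

# Each simulated client connects to the first address it receives.
hits = Counter(owner[ns.resolve()[0]] for _ in range(300))
print(hits)  # Counter({'serverA': 200, 'serverB': 100})
```

Three distinct records give serverA two of every three "first" positions, which is the 2:1 split described above.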
Hardware based
There are many commercial vendors out there selling appliances to perform load balancing.
- Cisco ACE Application Control Engine Module
- Barracuda Load Balancer
- JetNexus Accelerating Load Balancer Extreme
- KEMP LoadMaster 2000
- Many, many more
Hardware-based load balancing is the best way to go if you have the budget for it. These appliances provide the latest features with little fuss.
Software based
This is where it gets fun, if you’re a technology enthusiast. If your budget doesn’t allow for a load balancing appliance, or if you just like doing things yourself, software-based load balancing is for you. You can turn a Linux server into your own load balancing appliance. You could presumably also use a Windows server, or even a Mac, but this article doesn’t cover those. On RHEL-based systems, the “piranha” package provides Linux Virtual Server (LVS) and piranha itself (a web-based GUI for managing LVS). Just run “yum install piranha” and you’ll have everything you need to get started. Other options include BalanceNG (commercial) and balance, its basic freeware counterpart.
balance
balance is super simple to use: just download it and run the program. There are a few basic input parameters, and you can be load balancing in no time. This is a no-frills binary program. There are no configuration files, no startup/shutdown scripts, and no logging or reporting. It does, however, have a nifty console that you can get runtime statistics from, and you could build your own tools around balance to monitor it and gather statistics.
LVS and piranha on RHEL (or better yet, CentOS)
piranha is a GUI that makes configuring Linux Virtual Server (LVS) easy. Here are some of the virtual server scheduling algorithms it supports:
- Round robin
- Weighted least-connections
- Weighted round robin
- Least-connection
- Locality-Based Least-Connection Scheduling
- Locality-Based Least-Connection Scheduling with Replication
- Destination Hash Scheduling
- Source Hash Scheduling
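To make the `wlc` scheduler (used in the lvs.cf example later on) more concrete, here is a simplified Python sketch of weighted least-connection selection. Each new connection goes to the server with the lowest ratio of active connections to weight. This is an approximation, not the kernel code: the real LVS scheduler also factors inactive connections into its overhead figure. `RealServer`, `pick_wlc`, and the weight of 2 for v6test2 are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RealServer:
    name: str
    weight: int
    active: int = 0  # open connections currently assigned to this server


def pick_wlc(servers: List[RealServer]) -> Optional[RealServer]:
    """Weighted least-connection: lowest active/weight wins; weight 0 gets no new work."""
    best = None
    for s in servers:
        if s.weight <= 0:
            continue  # a zero weight takes the server out of the schedule
        # Compare s.active/s.weight < best.active/best.weight without dividing.
        if best is None or s.active * best.weight < best.active * s.weight:
            best = s
    return best


servers = [RealServer("v6test1", weight=1), RealServer("v6test2", weight=2)]
for _ in range(6):
    chosen = pick_wlc(servers)
    chosen.active += 1  # pretend every connection stays open
print([(s.name, s.active) for s in servers])  # [('v6test1', 2), ('v6test2', 4)]
```

Cross-multiplying instead of dividing keeps the comparison in integers, and the server with twice the weight ends up carrying twice the open connections.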
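LVS can forward packets to the real servers in three ways: NAT, IP tunneling, and direct routing, which is a form of Direct Server Return. The two worth comparing here are NAT and Direct Server Return.
Direct Server Return usually scales better, because responses from the real servers go directly back to the requesting client and don’t have to be routed back through the LVS funnel.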
Direct Server Return:
See the RHEL documentation for a general description of Direct Server Return.
There are some specific requirements to make Direct Server Return work. The LVS server relays each incoming packet to the selected real server. The real servers also have the VIP bound to them, but they ignore ARP requests for it, using iptables or arptables_jf. So the VIP is bound to both the LVS server and the real servers, yet only the LVS server responds to ARP requests, and any incoming packets destined for the VIP therefore go to the LVS server. The LVS server routes these packets to the real servers, which respond as if they had received the packets directly!
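That packet flow can be modeled in a few lines of Python. This is a toy simulation, not networking code. `Frame`, `Host`, and the MAC labels are hypothetical, and the client address comes from the 198.51.100.0/24 documentation range. The VIP and server addresses are taken from the proof of concept below. The model shows the two facts Direct Server Return relies on: only the director answers ARP for the VIP, and the director rewrites only the destination MAC. Because of that, the real server can reply from the VIP straight to the client.

```python
from dataclasses import dataclass, replace
from typing import Optional, Set

VIP = "208.109.98.243"     # virtual IP from the lvs.cf example
CLIENT = "198.51.100.7"    # hypothetical client address
ROUTER_MAC = "mac-router"  # hypothetical default gateway


@dataclass(frozen=True)
class Frame:
    dst_mac: str  # layer 2: which box on the LAN receives the frame
    src_ip: str   # layer 3: left untouched by the director in DSR
    dst_ip: str


@dataclass
class Host:
    name: str
    mac: str
    local_ips: Set[str]
    drop_vip_arp: bool = False  # True on real servers (arptables ... -j DROP)

    def arp_reply(self, ip: str) -> Optional[str]:
        if ip not in self.local_ips or (ip == VIP and self.drop_vip_arp):
            return None
        return self.mac

    def accepts(self, frame: Frame) -> bool:
        return frame.dst_mac == self.mac and frame.dst_ip in self.local_ips


director = Host("lvs", "mac-lvs", {"208.109.98.248", VIP})
real = Host("v6test1", "mac-rs1", {"208.109.98.246", VIP}, drop_vip_arp=True)

# 1. The router asks "who has the VIP?" and only the director answers.
arp_answers = [m for m in (h.arp_reply(VIP) for h in (director, real)) if m]

# 2. The client's packet therefore lands on the director.
request = Frame(dst_mac=arp_answers[0], src_ip=CLIENT, dst_ip=VIP)
assert director.accepts(request)

# 3. Direct routing: the director rewrites only the destination MAC.
forwarded = replace(request, dst_mac=real.mac)

# 4. The real server owns the VIP too, so it accepts the packet and replies
#    from the VIP straight to the router, bypassing the director.
assert real.accepts(forwarded)
reply = Frame(dst_mac=ROUTER_MAC, src_ip=forwarded.dst_ip, dst_ip=forwarded.src_ip)
print(arp_answers, forwarded, reply)
```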
Proof of Concept
At the time of writing, I had access to three dedicated servers running CentOS 5. I installed piranha on the third and load balanced the other two as real servers!
Here is the /etc/sysconfig/ha/lvs.cf file that piranha_gui made easy to create:
serial_no = 30
primary = 208.109.98.248
service = lvs
backup = 0.0.0.0
heartbeat = 1
heartbeat_port = 539
keepalive = 6
deadtime = 18
network = direct
debug_level = NONE
virtual v6LB {
    active = 1
    address = 208.109.98.243 eth0:1
    port = 80
    persistent = 0
    send = "GET / HTTP/1.0\r\n\r\n"
    expect = "HTTP"
    use_regex = 0
    load_monitor = none
    scheduler = wlc
    protocol = tcp
    timeout = 6
    reentry = 15
    quiesce_server = 0
    server v6test1 {
        address = 208.109.98.246
        active = 1
        weight = 1
    }
    server v6test2 {
        address = 208.109.98.247
        active = 1
        weight = 1
    }
}
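The `send` and `expect` lines tell piranha’s monitoring daemon (nanny) how to probe each real server. A rough Python equivalent of that check is sketched below. `send_expect_check` is a hypothetical helper, and the 6-second socket timeout is an arbitrary choice for the sketch. Treating a reply that begins with the expect string as healthy is an assumption; nanny’s actual matching rules may differ.

```python
import socket


def send_expect_check(host: str, port: int,
                      send: bytes = b"GET / HTTP/1.0\r\n\r\n",
                      expect: bytes = b"HTTP",
                      timeout: float = 6.0) -> bool:
    """Return True if the server answers `send` with a reply starting with `expect`."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(send)
            reply = b""
            # Read until there are enough bytes to compare, or the peer closes.
            while len(reply) < len(expect):
                chunk = sock.recv(4096)
                if not chunk:
                    break
                reply += chunk
    except OSError:  # refused, unreachable, or timed out: treat as down
        return False
    return reply.startswith(expect)


# Example (not run here): send_expect_check("208.109.98.246", 80)
```

A monitor would call this on a timer for every real server and pull a server from the LVS table after it keeps failing.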
I installed arptables_jf and configured v6test1 like this:
yum install arptables_jf
arptables -A IN -d 208.109.98.243 -j DROP
arptables -A OUT -s 208.109.98.243 -j mangle --mangle-ip-s 208.109.98.246
service arptables_jf save
chkconfig --level 2345 arptables_jf on
Then v6test2 like this:
yum install arptables_jf
arptables -A IN -d 208.109.98.243 -j DROP
arptables -A OUT -s 208.109.98.243 -j mangle --mangle-ip-s 208.109.98.247
service arptables_jf save
chkconfig --level 2345 arptables_jf on
I pointed www.ipv6poc.com to 208.109.98.243 in DNS.
And presto, load balancing with Direct Server Return is working.
Even though I had read that LVS is IPv6 compatible, I couldn’t get it to work. Either piranha doesn’t know how to write the config file with IPv6 addresses, or the LVS version that ships with piranha for RHEL 5/CentOS 5 doesn’t support IPv6. So instead, I used round robin DNS for the IPv6 addresses:
Here is the BIND zone file for ipv6poc.com
[root@v6test1 named]# cat ipv6poc.com.zone
$TTL 1800
$ORIGIN ipv6poc.com.
@ IN SOA ns1 admin (
        42 ; serial (d. adams)
        3H ; refresh
        15M ; retry
        1W ; expiry
        1D ) ; minimum
    IN NS ns1
    IN NS ns2
    IN MX 10 mail
    IN A 208.109.98.243
    IN AAAA 2607:f208:1:1000::101
    IN AAAA 2607:f208:1:1000::102
ns1 IN A 208.109.98.246
ns1 IN AAAA 2607:f208:1:1000::101
ns2 IN A 208.109.98.247
ns2 IN AAAA 2607:f208:1:1000::102
www IN A 208.109.98.243
www IN AAAA 2607:f208:1:1000::101
www IN AAAA 2607:f208:1:1000::102
ipv6 IN AAAA 2607:f208:1:1000::101
v6test1 IN A 208.109.98.246
v6test1 IN AAAA 2607:f208:1:1000::101
v6test2 IN A 208.109.98.247
v6test2 IN AAAA 2607:f208:1:1000::102
v6test3 IN A 208.109.98.248
v6test3 IN AAAA 2607:f208:1:1000::103
And the result is visible at www.ipv6poc.com. At least it was at the time of writing; these servers are not mine alone. If it’s not working, leave a comment and I’ll see what I can do to get it back up.

Just testing out the comments, wondering if they’re broken for non-logged-in folks. Hardly anybody ever leaves a comment on my articles. Stats show a good amount of traffic, 30 to 60 unique visitors a day, but only one comment every other month, if that. Huh. Not sure how to change that. I want comments. How do I get them?
At http://www.ipv6poc.com/, if you’re hitting it over IPv6, you’re going to be fairly sticky, meaning you’ll keep hitting the same server over and over. This is because of DNS caching. With round robin DNS, my DNS server hands out the answers in a different order each time, but your caching nameserver only asks again once the 30-minute TTL expires.
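To see why round robin DNS ends up sticky, here is a small Python simulation. It assumes a caching resolver that honors the zone’s 1800-second TTL and a client that always uses the first address returned. The class names are hypothetical, and real resolvers and clients may reorder answers themselves, so treat this as a simplified model.

```python
from typing import List, Optional

TTL = 1800  # seconds, from "$TTL 1800" in the zone file


class Authoritative:
    """Rotates the AAAA set on every query, like round robin DNS."""

    def __init__(self, addrs: List[str]) -> None:
        self.addrs, self.queries = list(addrs), 0

    def query(self) -> List[str]:
        k = self.queries % len(self.addrs)
        self.queries += 1
        return self.addrs[k:] + self.addrs[:k]


class CachingResolver:
    """Answers from cache until the TTL expires, then asks upstream again."""

    def __init__(self, upstream: Authoritative) -> None:
        self.upstream = upstream
        self.cached: Optional[List[str]] = None
        self.expires = 0

    def resolve(self, now: int) -> List[str]:
        if self.cached is None or now >= self.expires:
            self.cached = self.upstream.query()
            self.expires = now + TTL
        return self.cached


auth = Authoritative(["2607:f208:1:1000::101", "2607:f208:1:1000::102"])
resolver = CachingResolver(auth)

# A browser hitting www.ipv6poc.com once a minute for an hour, using the first address.
picks = [resolver.resolve(now=t)[0] for t in range(0, 3600, 60)]
print(auth.queries)  # 2
print(picks[0], picks[29], picks[30])  # ::101 twice, then ::102 after the TTL expires
```

Sixty page loads produce only two upstream queries, so each client sees one server for a full half hour.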
If you want to force Direct Server Return over IPv4, use the IP:
http://208.109.98.243/
Hope the comments are working alright. On the subject of load balancing, why not get the highest availability while not getting caught in high prices? Kemp’s got some great load balancers that are low priced and high in quality:
http://www.kemptechnologies.com/?utm_source=blog&utm_medium=pv&utm_content=zs&utm_campaign=home
Do your computers go down easily? If so, this can cause you to lose a lot of money. Think about the costs associated with lost work. If you are experiencing this problem, you should look into getting a hardware load balancer. It’s definitely a cost-effective way to help combat server issues. I use the LoadMaster 2000 and it has worked out great… it was one of the cheapest ones I have found, but it has really helped our company out, both in saving money in the long run and in getting rid of the “downtime” that we were experiencing.
http://www.kemptechnologies.com/?utm_source=blog&utm_medium=pv&utm_content=zs&utm_campaign=home
Bonnie – your comments smell like advertising, but they are on topic, and I’m just so excited to have somebody post a comment on my blog! I added Kemp to the list of Hardware Load Balancers in the article. Good luck.
Hi,
I’d like to ask for some help with this. I’m working on a POC myself and have the same setup as you, except DNS is not in place yet (it will be soon). For now I’m using the /etc/hosts file on the node servers to work around this.
My problem is that when I hit the VIP in a browser, it returns the real hostname of the nodes instead of keeping the VIP hostname like your POC does. Could DNS cause this kind of behavior? Any help is appreciated.
Hi, I actually figured out my problem. I had an index.html page that redirected using the real server’s hostname. I replaced it with the VIP hostname and things started to work, so it was not DNS related.
Or you could try HAProxy: http://haproxy.1wt.eu/
I’ve used it for two years to balance 500MBS+ of HTTP traffic.
Dave-
How can I check whether ARP is enabled on the web servers? The config for lo:0 shows that ARP is on, but I am being told that it is on yet not ARPing.
I can’t find anything that explains how to test this.
Mike: I’d have to look it up, but I think there is a way to see it with tcpdump. The idea is that your web servers should not respond to ARP requests for the VIP. Did you turn on arptables_jf?