How To Handle A Large Burst Of Traffic To Your Website With Load Balancing

Published

18 October 2023

Website owners spend large sums of money making sure their sites are running on the biggest, best equipment available. But, what happens when the server hosting all of your website services cannot cope with an overabundance of inbound traffic?

Nothing is more frustrating than having a website break during peak usage times or worse, during a large burst of traffic from a sales event. It’s a problem every website owner perpetually dreads, especially before launching some piece of new content, sale, or product.

Whether the surge in traffic comes from significant growth, a planned event, or social media buzz overloading the system, the result is the same: an unresponsive, unusable website and a terrible end-user experience.

The damage to your reputation alone can cost millions, and that doesn’t count the revenue lost during the downtime. A portion of those missed opportunities will never return due to the bad taste left by a poor website experience. So what is a website owner to do when they have reached the limits of the highest-tier hardware available to them? Luckily, we have some tips to scale WordPress sites for site owners, but most of the time it’s a lot more complicated.

Enterprise Hosting Is The Answer To Prevent Downtime Woes

The enterprise hosting space has the answers to these issues. After all, how do the likes of Amazon, Google, Netflix, PayPal, and thousands of other top-tier online businesses seem to keep their websites always-on without ever having such major failures from their web server? The answer is not a definitive one, as there are many different approaches to making your website downtime resilient through enterprise-grade High-Availability solutions.

However, one constant that almost all of these large companies employ is some form of load balancing. It is generally the first step into horizontal scaling, and for good reason: it gets results.

Load Balancing can do wonders for a lonely web server by giving it a coworker, or four, to help share the workload and keep everyone happy. After all, what would happen if our friends in the food service industry only hired a single employee to handle the lunch shift on a busy campus? Rhetorical exposition aside, the truth of the matter is you cannot achieve 100% uptime with only a single server. It does not matter how expensive the hardware was; every system inevitably fails. The measure of a website’s resilience is not how good the hardware is. It’s how well your infrastructure combats both planned and unplanned downtime events.

What Are High-Availability Products?

The enterprise hosting world is built on High-Availability products. But, what are High-Availability products? These are solutions that combine clever applications of software to have multiple servers emulate an always online single-server website or service.

One approach High-Availability solutions utilize is the hot-spare principle. The hot-spare principle is the practice of having a secondary device, instance, or server ready to go and handle production workloads, but left out of rotation, similar to the spare tire built into most car models.

This hot spare can be manually or automatically introduced into production in an instant, reinforcing the public-facing hardware in the event the original live servers fail. This is standard practice for most enterprise-grade solutions on the market.

What is Load Balancing?

Load balancing is the practice of using a special network device, called a Load Balancer, to distribute incoming requests across a pool of two or more back-end servers. Load balancing is a necessary component in horizontal scaling as it provides the means for websites, applications, or services to remain available to end-users despite failures that would otherwise cause downtime. To understand how load balancing works, we must first understand the difference between vertical and horizontal scaling.
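As a rough illustration of the idea, here is a minimal sketch of a load balancer that hands each incoming request to the next back-end server in turn (round-robin). The class name, the node names web01 to web03, and the request paths are hypothetical; a real load balancer device also handles health checks, connection proxying, and more.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next back-end server in turn."""

    def __init__(self, backends):
        if len(backends) < 2:
            raise ValueError("a pool needs two or more back-end servers")
        self._pool = cycle(backends)

    def route(self, request_path):
        # Each request goes, whole, to exactly one back-end server.
        backend = next(self._pool)
        return backend, request_path


# Hypothetical pool of three web nodes receiving six requests.
balancer = RoundRobinBalancer(["web01", "web02", "web03"])
assignments = [balancer.route(f"/page/{i}")[0] for i in range(6)]
print(assignments)
```

Six requests land on the three nodes twice each, so no single server carries the whole load.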

Monolithic Servers vs Server Clusters (Vertical vs Horizontal Scaling)

To understand how load balancing can help provide 100% website uptime, we must first define the difference between vertical and horizontal scaling practices: two very different approaches to resolving the same problem of resource starvation from an overloaded website or application.

Vertical Scaling – Monolithic Servers

Vertical Scaling is the practice of handling website growth through continuously adding additional resources into a single server or by migrating/resizing to a larger single server. Just like Mario with his magic mushroom, the server starts out small, sometimes as a shared-server hosting plan. It becomes bigger and bigger over each iteration of its upgrades.

Vertical scaling is the standard path most websites undertake as they grow in size. After all, it’s the logical approach. Site needs more resources? Let’s buy a bigger server with all the resources!

Unfortunately, there are diminishing returns with this practice, as doubling resources doesn’t necessarily double the sustainable workload. A single server also carries multiple single points of failure: hardware failure, web service failure, database service failure, operating system failure, network failure, and so on. All of this makes vertical scaling far from downtime resilient, and it is a practice any serious website should shed as soon as fiscally viable.

Horizontal Scaling – Server Clusters

Horizontal scaling is a convention for website growth that relies on server clusters in combination with load balancing techniques to split traffic across multiple backend servers or devices. Instead of continuously increasing the size and resources of an all-in-one monolithic server, horizontal scaling solves growth by adding similar-sized servers along with some form of load balancer to split inbound traffic evenly.

Note: Horizontal scaling typically requires that database traffic is split off the nodes in a web cluster and into a dedicated database server or database server cluster.

Horizontal scaling is much more downtime resilient. However, it comes with a much bigger price tag and requires more technical acumen to administer properly.

These are two fundamentally different solutions to the same problem. However, when looking at the situation from a 100% uptime standpoint, vertical scaling is destined to eventually hit a wall, where it simply cannot move forward. This is generally when site owners get introduced to the practice of horizontal scaling, which has no comparable upper limit. You can just keep adding servers and additional load balancers as needed.

How Many Nodes Should I Have In My Cluster?

The +1 Spare Rule

Always have one spare node in your clusters beyond what is minimally viable for that cluster to run at full capacity during peak operational hours.

Horizontal clusters don’t mean a whole lot if they cannot prevent downtime. Running with only what is minimally viable merely ensures that when you encounter a node failure, there will be downtime. The +1 Spare Rule is the antidote to node failures.

A common mistake I often see when site owners plan out their horizontal infrastructure is neglecting the importance of The +1 Spare Rule. It’s not enough to have the bare minimum number of nodes needed for daily operations. To achieve a true 100% uptime, you will always need to have extra capacity in your clusters beyond what is necessary. The extra capacity can be one or more spare nodes configured in either a Hot-Spare or Live-Spare configuration to provide a failsafe when the inevitable hardware failure rears its ugly head.
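The +1 Spare Rule can be expressed as simple arithmetic. The sketch below assumes you can estimate peak traffic and what a single node can sustain; the function names and the figures (900 requests per second at peak, 250 per node) are made up for illustration.

```python
import math


def nodes_required(peak_rps, per_node_rps, spares=1):
    """Minimum nodes needed to carry peak load, plus the spare(s)."""
    minimum = math.ceil(peak_rps / per_node_rps)
    return minimum + spares


def survives_failures(total_nodes, peak_rps, per_node_rps, failed=1):
    """True if the remaining nodes can still carry peak load."""
    return (total_nodes - failed) * per_node_rps >= peak_rps


# Hypothetical figures: 900 req/s at peak, 250 req/s per node.
cluster_size = nodes_required(900, 250)               # 4 minimum + 1 spare
with_spare = survives_failures(cluster_size, 900, 250)
without_spare = survives_failures(cluster_size - 1, 900, 250)
print(cluster_size, with_spare, without_spare)
```

With these numbers, four nodes are the bare minimum, and only the five-node cluster can lose a node during peak hours without running short of capacity.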

Hot-Spare Cluster Members

A hot spare is an additional node that is fully configured, tested, and standing by to be deployed at any given moment. This can be configured a couple of ways, but it is usually a failover node set up to handle traffic automatically when errors are encountered in the primary load balancing pool. Some prefer to leave the node idle to preserve its hardware lifespan and only put it into rotation when necessary. You can think of this like the spare tire in your car. It’s always there, on hand, ready to take up traffic.
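A minimal sketch of the automatic failover behavior might look like the following. The FailoverPool class and the node names are hypothetical, and real load balancers detect failures through health checks rather than a manual mark_down call.

```python
class FailoverPool:
    """Primary pool plus one idle hot spare used only when a primary fails."""

    def __init__(self, primaries, hot_spare):
        self.primaries = list(primaries)
        self.hot_spare = hot_spare
        self.healthy = {node: True for node in self.primaries}

    def mark_down(self, node):
        self.healthy[node] = False

    def mark_up(self, node):
        self.healthy[node] = True

    def active_nodes(self):
        live = [n for n in self.primaries if self.healthy[n]]
        # The spare stays out of rotation until a primary is missing.
        if len(live) < len(self.primaries):
            live.append(self.hot_spare)
        return live


pool = FailoverPool(["web01", "web02", "web03"], hot_spare="spare01")
before = pool.active_nodes()
pool.mark_down("web02")
after = pool.active_nodes()
print(before, after)
```

The spare takes the failed node's place in rotation, so the pool keeps the same number of serving nodes.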

Live-Spare Cluster Members

A live spare is essentially a Hot-Spare, but instead of lying dormant, waiting to be put into use when needed, it is added into production along with the other live nodes. Continuing our spare-tire analogy, a Live-Spare is akin to the extra tires you see on every big-rig semi-truck. They have split rims with 2 sets of tires all running in service. If one has a blow-out, the Live-Spare is already in place keeping the truck moving until it can be properly serviced.

In either case, whether you opt for Hot-Spare or Live-Spare, the general principle is the same. That is, being able to sustain full capacity even while you’re down a single node in the cluster. The more spare nodes you have, the more downtime resilient your infrastructure becomes.

What Other Considerations Are There For Horizontal-Scaling Server Clusters?

There are several other items to consider when load balancing your websites. Overlooking these items can and will cause additional pain when trying to move into a load-balancing configuration. These are the additional challenges that are often overlooked when site owners first transition from their single monolithic server into the realm of clusters and load balancers.

File Replication

For multiple web nodes to function in a load-balancing configuration, there must be some level of file system replication between the nodes. Otherwise, changes made on one node will only appear on that same node.

Example: In a 3-node setup, when a visitor uploads a picture, that upload will only exist on the specific node that the visitor was routed to at the time of the upload. Any subsequent attempts to view the uploaded image will result in a 404 on each of the other two nodes, making the picture appear and disappear to visitors depending on the node they happen to land on when requesting the URL of that image file.
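The scenario above can be simulated in a few lines. This is only a toy model, with each node's file system represented by a set of paths; the node names and the upload path are hypothetical.

```python
# Each node has its own, unreplicated file system (modeled as a set of paths).
nodes = {"node01": set(), "node02": set(), "node03": set()}


def upload(node, path):
    """Store the file only on the node that handled the upload."""
    nodes[node].add(path)


def fetch(node, path):
    """Return the HTTP status a visitor would see from this node."""
    return 200 if path in nodes[node] else 404


# The visitor happened to be routed to node02 for the upload.
upload("node02", "/uploads/cat.jpg")
statuses = {name: fetch(name, "/uploads/cat.jpg") for name in nodes}
print(statuses)
```

Only node02 returns 200. A visitor routed to node01 or node03 gets a 404 for the same URL.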

Method 1 – Simple One-Way File Replication & Traffic Pinning

One-Way file replication runs on the primary (master) node in a cluster, usually node01. Any changes made to the filesystem on the master node are detected and subsequently synced across to the other cluster members in near real-time. Traffic pinning on the load balancer is required for this type of replication to work correctly. Since only changes to the master node are synchronized, it becomes necessary to configure traffic scripts on the load balancer that detect URLs that perform file uploads or changes and route those requests to the master node, ensuring the changes are replicated to the whole cluster.
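A traffic-pinning rule can be sketched roughly as follows. The URL prefixes and the use of HTTP methods to spot write requests are assumptions made for illustration; the actual rules depend on your application and on the scripting features your load balancer offers.

```python
import random

MASTER = "node01"
REPLICAS = ["node02", "node03"]

# Hypothetical rules for spotting requests that change files.
WRITE_PREFIXES = ("/wp-admin/", "/upload")
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def choose_node(method, path, rng=random):
    """Pin write traffic to the master; spread everything else."""
    if method in WRITE_METHODS or path.startswith(WRITE_PREFIXES):
        return MASTER
    return rng.choice([MASTER] + REPLICAS)


print(choose_node("POST", "/upload/image"))           # always node01
print(choose_node("GET", "/wp-admin/media-new.php"))  # always node01
print(choose_node("GET", "/blog/hello-world"))        # any node
```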

Uneven Load Distribution – An unfortunate side-effect of traffic pinning is the imbalance in traffic handled by the nodes in the cluster. Since traffic that performs write operations is always pinned to the master node, the master node will always be servicing more requests than the rest of the cluster members. This can be offset by planning for beefier hardware for your master node or by using weighted load-balancing algorithms to assign less read traffic to the master node.

Single Point of Failure – Traffic Pinning inherently introduces a single point of failure. This is due to the reliance on a primary node for one-way replication. If the master node goes offline for any reason, write traffic to the site will start to fail resulting in errors on the end-user’s browser.
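The weighted approach to offsetting the uneven load can be sketched as a weighted random choice for read traffic. The weights below (1 for the master, 3 for each other node) are arbitrary placeholders; in practice they would be tuned to how much write traffic the master already carries.

```python
import random
from collections import Counter

# Hypothetical read weights: the master gets a smaller share of reads
# because it already handles all pinned write traffic.
READ_WEIGHTS = {"node01": 1, "node02": 3, "node03": 3}


def choose_read_node(rng):
    """Pick a node for a read request in proportion to its weight."""
    names = list(READ_WEIGHTS)
    weights = list(READ_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(42)  # seeded so the run is repeatable
counts = Counter(choose_read_node(rng) for _ in range(7000))
print(counts)
```

Over many requests, node01 should receive roughly one seventh of the read traffic, leaving it headroom for the pinned writes.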

Method 2 – Network Storage Devices & File System Locking

Another common solution to file system replication in a horizontal scaling scenario is using a network storage device like a SAN in conjunction with a network-aware file system like NFS or OCFS2 to provide shared storage that each web node can directly mount and manipulate. This kind of solution introduces no replication delay between nodes but brings a new problem to contend with in the form of file system locking trouble.

File Locking – As with all things, network-aware file systems have their limits too. One common problem you may encounter using a shared network storage device is PHP processes stacking up due to the higher latency and lower throughput of network storage versus a local disk. It’s best to make sure your session files, cache files, and socket files are all on local disks; otherwise they will significantly hinder site performance.

Single Point of Failure – The network device becomes the single point of failure in this scenario. Since it’s one device that all nodes access directly, when it goes offline, it will impact all the cluster members simultaneously.

Caching / CDN

The move into horizontal scaling can be a tricky one, as tried and true industry staples like disk-based caching do not work well with file replication systems, so another solution for caching is usually necessary. This usually means switching from disk-based caching to solutions like Memcached, Redis, Varnish, or an Nginx reverse proxy, which are designed to work in-memory and over a load-balanced cluster of nodes.

Employing a CDN is an excellent strategy for reducing the overall server load in your clusters. Since static content will be delivered by the CDN, your servers won’t have to deal with those requests. This frees up capacity on the load balancer as well as the nodes themselves, as images, CSS, and other static files are served by the content delivery network.

Sessions

Another vital component of website operations is session storage. You will need to reconfigure your applications’ session save path so that it saves the session files into a central location that all members of the cluster can access. If not, visitors will be unable to navigate properly on the website. They will not be able to stay logged in which will impact many other important site features like checkout, carts, and user dashboards. Session save path can be moved into a database or onto a network storage location available to all nodes.
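The difference between per-node and central session storage can be shown with a toy model. Here a plain dictionary stands in for the session store; in the shared case it represents the database or network storage location mentioned above. The Node class, session ID, and user name are hypothetical.

```python
class Node:
    """A web node that keeps sessions in whatever store it is given."""

    def __init__(self, name, session_store):
        self.name = name
        self.sessions = session_store

    def login(self, session_id, user):
        self.sessions[session_id] = user

    def current_user(self, session_id):
        return self.sessions.get(session_id)


# Per-node storage: each node has its own private session store.
local_a, local_b = Node("node01", {}), Node("node02", {})
local_a.login("abc123", "alice")
seen_with_local = local_b.current_user("abc123")

# Central storage: both nodes read and write the same store.
central_store = {}
shared_a, shared_b = Node("node01", central_store), Node("node02", central_store)
shared_a.login("abc123", "alice")
seen_with_shared = shared_b.current_user("abc123")

print(seen_with_local, seen_with_shared)
```

With private stores, node02 has no record of the login made on node01, so the visitor appears logged out. With the central store, either node recognizes the session.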

Firewalls & DDoS Mitigation

Load Balancers are not firewalls and do not protect against DDoS attacks. Load balancing does increase the baseline capacity of your website, thanks to the extra headroom that running multiple nodes in a cluster provides. However, it is no substitute for a proper firewall and DDoS mitigation solution. DDoS attacks can easily cripple load balancer devices just like any other network device, depending on the device’s capacity and the size of the attack. Relying on solutions that stand in front of your load balancer for DDoS mitigation is a basic requirement in the cyber world of today. Services like Sucuri, Cloudflare, or Arbor are designed to combat malicious traffic, and that is true for either vertical or horizontal scaling setups.

Database Split

It becomes necessary to separate your database from your web nodes when moving into horizontal scaling. Having multiple web nodes that each host their own copy of your databases is in effect running multiple independent sites instead of a proper load-balanced solution. So each web node will need to be configured to connect to a central database server or cluster of servers. Just like web nodes, database servers can be split off into a replication cluster, dispersing traffic across several servers instead of a single one.

DNS Round-robin

An inexpensive way to split traffic between multiple servers without the cost of a Load Balancer device is DNS Round-robin. Essentially, if you add more than one A or AAAA record for the same name to your DNS zone, DNS servers will typically rotate the order of those records on each subsequent lookup, so clients are spread across the servers. While this technique can definitely be put to use early on in the transition from vertical to horizontal scaling, it’s often not sophisticated enough to handle sites that require advanced Layer 7 load balancing features like traffic pinning, alternate load balancing algorithms, taking nodes out of service, or custom maintenance pages. The biggest failing of DNS Round-robin for load balancing is that it has no way to know if a node is offline and will keep sending requests to down nodes. Load balancer devices, by contrast, are specifically designed to address this problem.
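The node-state problem can be illustrated with a simplified simulation. It models DNS Round-robin as a plain rotation through the records and compares it to a rotation that skips nodes known to be down, which is roughly what a health-checking load balancer does. The IP addresses come from a range reserved for documentation, and real DNS behavior also depends on resolver caching and client behavior, which this sketch ignores.

```python
from itertools import cycle

# Hypothetical A records for the same hostname (documentation address range).
records = ["203.0.113.10", "203.0.113.11", "203.0.113.12"]
down = {"203.0.113.11"}  # this node has failed


def dns_round_robin(records, lookups):
    """Rotate through every record, with no knowledge of node health."""
    rotation = cycle(records)
    return [next(rotation) for _ in range(lookups)]


def health_checked(records, down, lookups):
    """Rotate only through nodes that are currently up."""
    rotation = cycle([r for r in records if r not in down])
    return [next(rotation) for _ in range(lookups)]


dns_answers = dns_round_robin(records, 6)
lb_answers = health_checked(records, down, 6)
failed_via_dns = sum(1 for ip in dns_answers if ip in down)
failed_via_lb = sum(1 for ip in lb_answers if ip in down)
print(failed_via_dns, failed_via_lb)
```

In this model, one in three lookups still points visitors at the dead node under DNS Round-robin, while the health-checked rotation sends none there.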

Summary

Load Balancing is the gateway to horizontal scaling. Without it, the cyber world of today would be a very different landscape. Sites could not conceivably handle such large traffic surges, and every visitor would have to wait in line for their turn on the website. However, with load balancing and a few other key enterprise-grade solutions, we can safely operate sites reaching millions or even tens of millions of page views daily.