We have put together a few points to think about when making sure your infrastructure will be up to the challenge ahead of a Black Friday or Cyber Monday stampede.
Load balancing does exactly what it says on the tin. Typically you will have two or more application servers sitting behind a load balancer. When traffic hits the load balancer, it decides which server should handle each request, for example by picking the least busy one or by taking each server in turn. This gives you two main advantages: it spreads the load across all the servers sitting behind it, and it gives you some redundancy. If one of the servers were to fail for any reason (a hardware issue, for example), the site would stay up, as the load balancer would simply redirect users to the remaining working servers. This buys you time to boot up another instance to replace the failed server.
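To make the idea concrete, here is a minimal sketch of the "send traffic to the least busy server" decision, with failed servers skipped. The `Server` class, the `pick_server` function and the server names are our own illustrative assumptions; a real load balancer does this for you and will usually offer several strategies.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical view of one application server behind the load balancer."""
    name: str
    active_connections: int = 0
    healthy: bool = True


def pick_server(servers):
    """Return the healthy server with the fewest active connections."""
    candidates = [s for s in servers if s.healthy]
    if not candidates:
        raise RuntimeError("no healthy servers available")
    return min(candidates, key=lambda s: s.active_connections)


servers = [
    Server("app-1", active_connections=12),
    Server("app-2", active_connections=5),
]
print(pick_server(servers).name)  # app-2, the least busy server

servers[1].healthy = False        # simulate app-2 failing
print(pick_server(servers).name)  # app-1, traffic is redirected
```

The second call shows the redundancy benefit: once a server is marked unhealthy it is simply never chosen, so visitors carry on being served by whatever is left.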
Generally used in conjunction with load balancing, autoscaling allows you to set predefined thresholds at which your infrastructure will scale up or down. This is especially useful during periods of unexpectedly high traffic: when the increased workload is detected, the system automatically scales up (essentially booting a new instance of your web server), which then takes some of the load off the original servers. Once the load has died down, and after a certain period has elapsed, it can scale back down to its original size. This is cost-efficient because you only run the capacity you need, and it doesn’t require any manual intervention once the initial configuration is done.
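The scaling rule described above can be sketched in a few lines. This is only an illustration of the logic, not how any particular hosting provider implements it; the function name, the CPU thresholds, the instance limits and the five-minute cooldown are all assumed values that you would tune for your own site.

```python
def desired_instances(current, cpu_percent, seconds_since_last_change,
                      scale_up_at=70.0, scale_down_at=30.0,
                      min_instances=2, max_instances=10, cooldown=300):
    """Return how many instances should be running given the current load.

    All thresholds and limits are hypothetical defaults for illustration.
    """
    # Scale up as soon as average CPU crosses the upper threshold.
    if cpu_percent >= scale_up_at:
        return min(current + 1, max_instances)
    # Scale down only when load is low AND the cooldown period has elapsed,
    # so a brief lull does not remove capacity you are about to need.
    if cpu_percent <= scale_down_at and seconds_since_last_change >= cooldown:
        return max(current - 1, min_instances)
    return current


print(desired_instances(2, cpu_percent=85, seconds_since_last_change=0))    # 3
print(desired_instances(3, cpu_percent=20, seconds_since_last_change=60))   # 3
print(desired_instances(3, cpu_percent=20, seconds_since_last_change=600))  # 2
```

The minimum of two instances is deliberate: it keeps the redundancy you get from load balancing even when traffic is quiet.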
You may be expecting additional load over the Black Friday to Cyber Monday period from genuine shoppers hoping to grab a bargain, but you should also be aware that attackers may use this time to try to take your site offline. Because the demand on your site will already be elevated, some may see this as an opportunity to piggyback on that extra traffic and cause havoc. There are several ways you can mitigate attacks like these, from paid-for services available from your hosting provider to manual steps you can put in place on your server. These will depend on your server setup and configuration, but they all essentially do the same thing: work out which requests are bogus, take action as necessary, and ensure that genuine requests are still processed so that visitors are unaffected.
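One common manual step is rate limiting, where any single client that sends far more requests than a real shopper plausibly would is turned away for a while. The sketch below shows the idea using a sliding time window per IP address. The `RateLimiter` class and its limits are our own assumptions for illustration, and rate limiting is only one of several possible techniques; in practice it is usually handled by your web server, firewall or hosting provider rather than application code.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most max_requests per client within a sliding time window."""

    def __init__(self, max_requests=100, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client IP -> recent request times

    def allow(self, client_ip, now):
        """Return True if the request should be processed, False if blocked.

        `now` is the current time in seconds (for example time.monotonic()).
        """
        hits = self._hits[client_ip]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = RateLimiter(max_requests=3, window_seconds=10)
for t in (0, 1, 2, 3):
    print(t, limiter.allow("203.0.113.7", now=t))  # True, True, True, False
print(limiter.allow("203.0.113.8", now=3))         # True, other visitors unaffected
```

Because the limit is tracked per client, a noisy address gets blocked while genuine visitors from other addresses are still served.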
A website is a collection of many different parts, from the code that drives it, to the database that houses all the information, to the assets (images and styles) that bring it to life. If you have spent time load balancing and autoscaling your application servers, but your assets or database are running on a single separate server, then if that server goes down, your site goes down too. It’s therefore important to understand your infrastructure and to take steps to ensure that there are no weak links in the chain. Give every part of your website the same level of care, so that when (not if) the worst happens, you are able to handle it.
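A simple way to spot weak links is to write down how many instances back each part of the site and flag anything with fewer than two. The sketch below does exactly that; the tier names and counts are made-up examples, not a recommendation for any particular setup.

```python
def single_points_of_failure(tiers):
    """Return the names of tiers that would take the site down if one server failed.

    `tiers` maps a tier name to the number of instances serving it.
    """
    return sorted(name for name, instances in tiers.items() if instances < 2)


# Hypothetical infrastructure: the app tier is redundant, the rest is not.
infrastructure = {"application": 3, "database": 1, "assets": 1}
print(single_points_of_failure(infrastructure))  # ['assets', 'database']
```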
Even if you follow all of the above and put every best practice into place, there is no guarantee that something entirely out of your control won’t cause your site to go down. When this happens, it’s important that you know immediately, so you can investigate and hopefully resolve the issue before too much damage is done. For this reason, it is essential to have some form of uptime monitoring in place that will alert you when the worst happens. Uptime Robot is a good example.
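To show what an uptime monitor does under the hood, here is a minimal sketch using only the Python standard library. It is not how Uptime Robot or any other service is implemented; the function names, the timeout and the `alert` callback are assumptions for illustration, and a hosted service has the big advantage of checking from outside your own infrastructure.

```python
import urllib.error
import urllib.request


def is_up(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the URL responds with a success or redirect status.

    `opener` is injectable so the check can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP error statuses, DNS failures, refused connections,
        # timeouts and malformed URLs.
        return False


def check_and_alert(url, alert, **kwargs):
    """Run one check and call alert(message) if the site looks down."""
    if not is_up(url, **kwargs):
        alert(f"{url} appears to be down")
        return False
    return True
```

You could run `check_and_alert("https://example.com", print)` every minute from a scheduler, swapping `print` for something that sends an email or text message.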
Essentially it comes down to a balancing act: having enough power to deal with the increase in visitors, versus the increased infrastructure costs involved in providing this power. To get this balance right, it’s important to understand as much as possible about expected visitor numbers and the resulting server load. You can use historical data from analytics, server logs, or even order volume. However you do it, always err on the side of caution, because it’s better to have too much power than not enough. You may be paying out more for infrastructure, but the damage a crash could do to your standing with new and existing customers does not bear thinking about.
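The arithmetic behind this can be as simple as the sketch below: take your expected peak, add a safety margin, and divide by what one server can comfortably handle. The function name, the 50% headroom and the example figures are all hypothetical; plug in numbers from your own analytics and load tests.

```python
import math


def servers_needed(peak_requests_per_second, requests_per_second_per_server,
                   headroom=0.5, min_servers=2):
    """Estimate how many servers to run for an expected peak.

    headroom=0.5 means "plan for 50% more traffic than expected".
    """
    if requests_per_second_per_server <= 0:
        raise ValueError("per-server capacity must be positive")
    required = (peak_requests_per_second * (1 + headroom)
                / requests_per_second_per_server)
    # Round up, and never drop below the minimum needed for redundancy.
    return max(min_servers, math.ceil(required))


# Hypothetical figures: last year's peak was 400 requests per second and
# load testing showed one server handles about 150 requests per second.
print(servers_needed(400, 150))  # 4
```

Raising `headroom` is the "err on the side of caution" dial: it costs more in infrastructure but leaves more room if the sale is busier than forecast.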
What do you think about the above? Is there anything on this list you have thought about implementing, or something you think should be on the list that isn’t? Leave a comment below.