High availability refers to systems that are durable and likely to operate continuously without failure for a long time. The term implies that parts of a system have been fully tested and, in many cases, that there are accommodations for failure in the form of redundant components.
High availability (HA) and disaster recovery (DR) are often thought of as synonymous. A highly available infrastructure component or IT system is described as “fault tolerant” or as having the ability to “fail over”. An example of high availability at the component level is adding redundant power supplies; at the datacenter level, dual UPS feeds (A/B power) add high(er) availability to power systems. To some, this implies the system is resilient enough to survive a disaster. Implementing high availability on its own, however, does not achieve disaster recovery. So what is the difference between high availability and disaster recovery?
The hallmark of a good data protection plan that guards against system failure is a sound backup and recovery strategy. Valuable data should never be stored without proper backups, replication, or the ability to recreate it. Every data center should plan for data loss or corruption in advance: data errors can break customer authentication, damage financial records, and, in turn, erode the business's credibility. The recommended strategy for maintaining data integrity is to take a full backup of the primary database, supplement it with regular incremental backups, and periodically verify the backups so that corruption is caught before it spreads.
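As a rough illustration of the verification step, the sketch below assumes a PostgreSQL database and the standard pg_dump command-line tool; the database name and backup directory are hypothetical. It takes a full dump, records a checksum alongside it, and re-checks that checksum so silent corruption of the backup file itself can be detected later.

```python
import hashlib
import subprocess
from datetime import datetime, timezone
from pathlib import Path

BACKUP_DIR = Path("/var/backups/appdb")   # hypothetical backup location
DB_NAME = "appdb"                         # hypothetical database name

def sha256_of(path: Path) -> str:
    """Compute a checksum so the backup can be verified after transfer."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def full_backup() -> Path:
    """Take a full logical backup with pg_dump and store its checksum."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dump = BACKUP_DIR / f"{DB_NAME}-full-{stamp}.dump"
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["pg_dump", "--format=custom", f"--file={dump}", DB_NAME],
        check=True,
    )
    # Re-checking this file later detects corruption of the backup itself.
    dump.with_suffix(".sha256").write_text(sha256_of(dump) + "\n")
    return dump

def verify_backup(dump: Path) -> bool:
    """Confirm the stored checksum still matches the dump on disk."""
    expected = dump.with_suffix(".sha256").read_text().strip()
    return sha256_of(dump) == expected

if __name__ == "__main__":
    dump = full_backup()
    print(f"backup verified: {verify_backup(dump)} -> {dump}")
```

In practice the incremental layer would be handled by the database's own mechanisms (such as WAL archiving in PostgreSQL) rather than repeated full dumps, but the verify-what-you-back-up discipline is the same.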
Even with the highest quality of software engineering, all application services fail at some point. High availability is about continuing to deliver application services regardless of such failures. Clustering can provide near-instant failover of application services in the event of a fault: an application service that is ‘cluster aware’ can call on resources from multiple servers, and it falls back to a secondary server if the primary goes offline.
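To make the fallback behavior concrete, here is a minimal client-side sketch; the node addresses and the /health path are illustrative assumptions, not part of any real cluster. It tries the primary node first and silently moves on to the secondary when the primary does not respond.

```python
import urllib.error
import urllib.request

# Hypothetical cluster nodes; a real cluster-aware client would get this
# list from service discovery rather than hard-coding it.
NODES = ["http://app-primary.example.com", "http://app-secondary.example.com"]

def fetch(path: str, timeout: float = 2.0) -> bytes:
    """Try each node in order and return the first successful response.

    This is the 'fall back to a secondary server' behavior of a
    cluster-aware client, reduced to its simplest form.
    """
    last_error = None
    for node in NODES:
        try:
            with urllib.request.urlopen(node + path, timeout=timeout) as resp:
                return resp.read()
        except (urllib.error.URLError, OSError) as exc:
            last_error = exc  # node is down or unreachable; try the next one
    raise ConnectionError(f"all {len(NODES)} nodes failed") from last_error

if __name__ == "__main__":
    print(fetch("/health"))
```

Production clusters usually hide this logic behind a virtual IP or a load balancer so clients never see the failover, but the ordering of attempts is the same idea.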
Geo-redundancy is the only line of defense against service failure in the face of catastrophic events, such as natural disasters, that take an entire site out. As with geo-replication, multiple servers are deployed at geographically distinct sites; the locations should be globally distributed, not concentrated in a single area. It is crucial to run an independent application stack in each location, so that if one location fails the others can continue running. Ideally, these locations should be completely independent of each other.
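One common pattern built on this is to health-check each regional stack and route around any region that has failed. The sketch below is illustrative only; the region names and URLs are assumptions. Because each region runs its own independent stack, the loss of one endpoint leaves the others untouched.

```python
import urllib.error
import urllib.request

# Hypothetical, globally distributed regions, each running its own
# independent application stack (own servers, own database, own config).
REGIONS = {
    "us-east":  "https://us-east.app.example.com",
    "eu-west":  "https://eu-west.app.example.com",
    "ap-south": "https://ap-south.app.example.com",
}

def healthy_regions(timeout: float = 2.0) -> list[str]:
    """Probe each region's health endpoint; a region that fails to answer
    is treated as down, while the remaining regions keep serving."""
    alive = []
    for name, base_url in REGIONS.items():
        try:
            with urllib.request.urlopen(base_url + "/health", timeout=timeout):
                alive.append(name)
        except (urllib.error.URLError, OSError):
            pass  # this region is unreachable; the others are unaffected
    return alive

if __name__ == "__main__":
    up = healthy_regions()
    print(f"serving from: {up[0] if up else 'no region available!'}")
```

In production this selection is normally done by GeoDNS or a global load balancer rather than in application code, but the principle of independent, health-checked stacks is the same.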
Load balancing is an effective way of increasing the availability of critical web-based applications. When a server failure is detected, traffic is automatically redistributed to the servers that are still running, so the failed instance is replaced seamlessly. Load balancing not only leads to high availability, it also facilitates incremental scalability. Network load balancing can follow either a ‘push’ model, in which the balancer selects a backend and forwards each request to it, or a ‘pull’ model, in which idle servers pull the next piece of work from a shared queue. Either way, it provides a higher level of fault tolerance within service applications.
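As a sketch of the push model (the backend addresses are assumptions for illustration), the snippet below distributes requests round-robin across the backends it currently believes are healthy, skipping any that have been marked down.

```python
import itertools

class RoundRobinBalancer:
    """Minimal push-model balancer: it chooses a backend for each request
    and skips backends that have been marked unhealthy."""

    def __init__(self, backends: list[str]) -> None:
        self.backends = backends
        self.healthy = set(backends)
        self._cycle = itertools.cycle(backends)

    def mark_down(self, backend: str) -> None:
        """Called when a health check or request against a backend fails;
        traffic is redistributed to the remaining servers."""
        self.healthy.discard(backend)

    def mark_up(self, backend: str) -> None:
        """Re-admit a backend once it passes health checks again."""
        self.healthy.add(backend)

    def next_backend(self) -> str:
        # One full pass over the ring is enough to find a healthy backend
        # if any exists.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")

if __name__ == "__main__":
    lb = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
    lb.mark_down("10.0.0.2:8080")  # simulate a failed server
    for _ in range(4):
        print(lb.next_backend())   # .2 is skipped; .1 and .3 share the load
```

This also shows the incremental-scalability point: adding capacity is just adding another address to the backend list, with no change visible to clients.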