Your office burned down over the weekend. We’re sorry to be the bearers of bad news, but it happened. What will your business do now? How fast can your IT infrastructure be back up and running? What data will it need to recover with the highest priority?
These are the questions that will inevitably arise when hammering out a Disaster Recovery plan. In the industry, the answers are captured by two measures: the Recovery Time Objective (RTO) and the Recovery Point Objective (RPO). But how do you determine the ideal RTO/RPO for your business?
In this article, we’re going to explain the concepts of RTO/RPO and how you can apply them to disaster-proofing your business’s IT infrastructure. That way, when you get the news that your systems and data are smoldering ashes (or, less melodramatically, that your server has crashed), your first question won’t be: “What do we do now?”
Recovery Time Objective
Your RTO refers to how long your business can afford to have its vital systems and data offline before it really starts affecting your bottom line. In even simpler terms, it means: how quickly do you need to recover?
When determining an acceptable RTO, focus on individual applications rather than entire servers. The stopwatch on your RTO starts the instant an important application becomes unusable, and doesn’t stop until every user who needs it can use it normally again.
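To make the stopwatch rule concrete, here is a minimal Python sketch. It assumes you have logged when an application became unusable and when each user who needs it could use it normally again; the function names, user groups and timestamps are hypothetical. Because the clock only stops once the last user is back, the sketch takes the latest restore time.

```python
from datetime import datetime, timedelta


def measured_downtime(outage_start: datetime,
                      restored_per_user: dict[str, datetime]) -> timedelta:
    """Time from the moment the application became unusable until the
    last user who needs it can use it normally again."""
    if not restored_per_user:
        raise ValueError("need a restore time for at least one user")
    last_restore = max(restored_per_user.values())
    if last_restore < outage_start:
        raise ValueError("restore time precedes the outage")
    return last_restore - outage_start


def within_rto(downtime: timedelta, rto: timedelta) -> bool:
    """True if the measured downtime stayed inside the objective."""
    return downtime <= rto


# Hypothetical outage of an order-entry application.
start = datetime(2024, 3, 4, 9, 0)
restored = {
    "sales": datetime(2024, 3, 4, 10, 30),
    "accounts": datetime(2024, 3, 4, 11, 15),  # last user group back online
}
downtime = measured_downtime(start, restored)
print(downtime)                                  # 2:15:00
print(within_rto(downtime, timedelta(hours=2)))  # False
```

Even though the sales team was working again after 90 minutes, the application missed a 2-hour RTO because the accounts team waited 2 hours 15 minutes.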
You’re on your way to understanding RTO/RPO.
Recovery Point Objective
The second part of the RTO/RPO considerations is, perhaps unsurprisingly, RPO. Where RTO is about recovery time, RPO is about the data held in your systems and applications. Determining your RPO means working out how much of that data, measured as a span of time, your company could afford to lose.
For example, if you could afford to lose 2 hours’ worth of data from your Exchange Server without seriously affecting your business, your RPO would be 2 hours.
While RTO will determine the speed necessary for your recovery, RPO determines how frequently you should be making backups and what kind of backups will be necessary.
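As a rough sketch of that relationship: if every backup is assumed to complete instantly and succeed, the worst-case data loss is the interval between two backups, so that interval must be no longer than the RPO. The hypothetical Python function below turns an RPO into the minimum number of evenly spaced backups per day.

```python
import math
from datetime import timedelta


def backups_per_day(rpo: timedelta) -> int:
    """Minimum number of evenly spaced backups per 24 hours so that the
    worst-case loss (the gap between two backups) never exceeds the RPO.
    Assumes every backup is instantaneous and always succeeds."""
    if rpo <= timedelta(0):
        raise ValueError("a zero RPO cannot be met by scheduled backups")
    return math.ceil(timedelta(days=1) / rpo)


print(backups_per_day(timedelta(hours=2)))   # 12 (the Exchange Server example)
print(backups_per_day(timedelta(hours=5)))   # 5, i.e. one every 4.8 hours
print(backups_per_day(timedelta(hours=24)))  # 1
```

Real backups take time and occasionally fail, so in practice you would leave some margin below the RPO.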
RPO is important for backups because it addresses the question of when and how often to back up. If you back up at 9pm, and at 10am the next day you lose a hard drive and all of the data on it, then up to 13 hours’ worth of data could be lost – a period known as a “backup gap”.
In practice, the lost data may only be the emails and Word documents that staff worked on between 8am and 10am, while people were in the office. Then again, it could be a full 13 hours’ worth of data from a database that an important application was updating all night… The key is to determine what data you would lose on each type of system if it went down, and how much of it (if any) you could afford to lose.
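The arithmetic behind this example can be sketched in Python. The sketch assumes that the work lost on a system is the overlap between the backup gap and the period when that system was actually changing data; the dates and activity windows are hypothetical.

```python
from datetime import datetime, timedelta


def backup_gap(last_backup: datetime, failure: datetime) -> timedelta:
    """Everything written between the last backup and the failure is at risk."""
    if failure < last_backup:
        raise ValueError("failure happened before the last backup")
    return failure - last_backup


def work_lost(last_backup: datetime, failure: datetime,
              active_from: datetime, active_to: datetime) -> timedelta:
    """Portion of the backup gap during which this system was actually
    changing data (the overlap of the two time windows)."""
    start = max(last_backup, active_from)
    end = min(failure, active_to)
    return max(end - start, timedelta(0))


last_backup = datetime(2024, 3, 4, 21, 0)  # 9pm backup
failure = datetime(2024, 3, 5, 10, 0)      # drive lost at 10am the next day

gap = backup_gap(last_backup, failure)
print(gap)  # 13:00:00

# File server: staff only start editing documents at 8am.
office = work_lost(last_backup, failure,
                   datetime(2024, 3, 5, 8, 0), datetime(2024, 3, 5, 18, 0))
print(office)  # 2:00:00

# Database that an application updates around the clock.
database = work_lost(last_backup, failure,
                     datetime(2024, 3, 4, 0, 0), datetime(2024, 3, 5, 23, 59))
print(database)  # 13:00:00
```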
Now that you understand what RTO/RPO means, let’s put this knowledge into practice.
Determining your downtime
When establishing an acceptable level of downtime (for the record, we’re talking about the RTO aspect of your RTO/RPO planning here), it can be tempting to simply ask your users. After all, they know how long they can do without their applications, right? Well, probably not.
Generally, when asking end-users you’re going to get a response that’s either unrealistically short (“If I can’t access my emails for 30 seconds, I’ll miss that sale and the world will end in fire.”), or naively long (“I don’t even know what that application does, I’m pretty sure I could go without it for a few months”). That’s why, as with most things in life, you need to look at the data.
The first step is to make a comprehensive list of every system and application your business uses as part of its operations. Then determine the role each one actually plays: every function it performs for your business, and which departments and users would be affected by its loss.
Next, determine the potential losses that could occur if each system or application went down, such as lost revenue or sales, salaries paid to idle workers, additional expenses due to lack of access, and damage to company reputation. Do this for every application individually, and remember that an outage at certain times of year will have heavier consequences than the same outage at others.
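One way to record this inventory is sketched below in Python. The field names, the example applications and every dollar figure are hypothetical. Reputation damage is left out because it rarely has a clean hourly price, and the seasonal effect is modelled as a simple multiplier for peak periods.

```python
from dataclasses import dataclass


@dataclass
class AppProfile:
    """One row of the system/application inventory (all figures hypothetical)."""
    name: str
    functions: list[str]
    affected_departments: list[str]
    lost_revenue_per_hour: float
    idle_salaries_per_hour: float
    extra_expenses_per_hour: float
    # Multiplier for busy periods, e.g. end of financial year or holiday sales.
    peak_multiplier: float = 1.0

    def hourly_loss(self, peak: bool = False) -> float:
        """Estimated cost of one hour of downtime, optionally in a peak period."""
        base = (self.lost_revenue_per_hour
                + self.idle_salaries_per_hour
                + self.extra_expenses_per_hour)
        return base * (self.peak_multiplier if peak else 1.0)


inventory = [
    AppProfile("web shop", ["online sales"], ["sales", "support"],
               4000, 300, 200, peak_multiplier=3.0),
    AppProfile("email", ["customer contact", "internal messaging"], ["all staff"],
               500, 1200, 0),
]

for app in inventory:
    print(app.name, app.hourly_loss(), app.hourly_loss(peak=True))
```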
Once you’ve worked out these details, you get to the fun part: determining exactly how long it would take for these losses to become unacceptable. That time depends on the specifics of your business, but here are some questions to ask yourself that will factor into your ideal RTO:
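Losses rarely grow evenly; they often climb once customers start to notice an outage. A minimal Python sketch, assuming you have estimated the loss for each successive hour of downtime and chosen a total loss the business can absorb (both hypothetical inputs here), finds how many whole hours you can tolerate before that limit is exceeded.

```python
from itertools import accumulate


def tolerable_hours(loss_by_hour: list[float], acceptable_loss: float) -> int:
    """Number of whole hours of downtime before cumulative loss exceeds
    the amount the business has decided it can absorb."""
    for hour, total in enumerate(accumulate(loss_by_hour), start=1):
        if total > acceptable_loss:
            return hour - 1
    # Limit never reached: the modelled period is too short to tell.
    return len(loss_by_hour)


# Hypothetical profile: losses climb sharply from the third hour.
web_shop_losses = [500, 500, 2000, 5000, 5000, 5000]
print(tolerable_hours(web_shop_losses, acceptable_loss=3500))  # 3
```

If the function returns the full length of the list, extend the estimate over a longer period before drawing conclusions.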
- Do you hold customers’ data on their behalf? If so, what service agreements and obligations do you have with them? This will impact how quickly you will need to recover that data.
- Do you have customers who need real-time access to your data? An example may be point-of-sale systems.
- What systems have dependencies? For example, if you lost a database then what applications would be affected and what are their corresponding uptime requirements?
- What systems would result in direct financial loss if they went down? E.g. a website selling goods.
- What systems would cause a production outage? E.g. factory-floor or quality control systems.
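The dependency question can be answered mechanically once you map which systems rely on which. The Python sketch below walks a hypothetical dependency map to find everything affected by a failed database, then takes the tightest recovery time among them: the database has to be back at least as fast as the most urgent application that needs it. The system names and hours are illustrative assumptions.

```python
from collections import deque

# Hypothetical map: system -> systems that depend on it directly.
DEPENDENTS = {
    "sql-db": ["crm", "web-shop"],
    "web-shop": ["order-tracking"],
    "crm": [],
    "order-tracking": [],
}
# Hypothetical required recovery times, in hours.
RTO_HOURS = {"sql-db": 8, "crm": 4, "web-shop": 1, "order-tracking": 6}


def affected_by(system: str, dependents: dict[str, list[str]]) -> set[str]:
    """Every system that stops working, directly or indirectly, if `system` fails."""
    affected: set[str] = set()
    queue = deque([system])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, []):
            if dep not in affected:
                affected.add(dep)
                queue.append(dep)
    return affected


def effective_rto(system: str, dependents: dict[str, list[str]],
                  rto_hours: dict[str, float]) -> float:
    """A system must come back at least as fast as the most urgent thing that needs it."""
    return min([rto_hours[system]]
               + [rto_hours[s] for s in affected_by(system, dependents)])


print(sorted(affected_by("sql-db", DEPENDENTS)))       # ['crm', 'order-tracking', 'web-shop']
print(effective_rto("sql-db", DEPENDENTS, RTO_HOURS))  # 1
```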
Once you’ve answered these questions and worked out the required recovery time for each application and system, you can determine your overall RTO in one of two ways. If one application would cause significantly greater loss to your business than the others, use the time required to recover it as your RTO. If all applications are equally valuable, average the recovery times for all of them and use that instead.
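The two rules above can be expressed as a small Python function. The article does not define “significantly greater”, so this sketch assumes a hypothetical threshold: the costliest application must lose at least twice as much per hour as the next costliest. The applications and figures are also illustrative.

```python
from statistics import mean


def overall_rto(apps: dict[str, tuple[float, float]], dominance: float = 2.0) -> float:
    """apps maps name -> (required recovery time in hours, estimated loss per hour).

    Rule 1: if the costliest application's hourly loss is at least `dominance`
    times the next costliest, use that application's recovery time.
    Rule 2: otherwise, average the recovery times of all applications.
    """
    if not apps:
        raise ValueError("no applications supplied")
    ranked = sorted(apps.values(), key=lambda app: app[1], reverse=True)
    if len(ranked) == 1 or ranked[0][1] >= dominance * ranked[1][1]:
        return ranked[0][0]
    return mean(time for time, _ in ranked)


print(overall_rto({"web shop": (1, 13500), "crm": (4, 800), "email": (6, 1700)}))  # 1
print(overall_rto({"crm": (4, 800), "email": (6, 900), "intranet": (8, 700)}))    # 6
```

In the first call the web shop dominates, so its 1-hour recovery time becomes the RTO; in the second no application stands out, so the recovery times are averaged.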
The final step is to perform a test recovery of each and every system and application. The time each recovery takes is your Recovery Time Actual (RTA). Your goal is then to make your RTA match your RTO.
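A minimal Python sketch of tracking test results follows. The `restore` callable stands in for your real test-restore procedure (a hypothetical placeholder), and the RTO and RTA figures are illustrative. The sketch flags applications whose RTA overshoots their RTO, plus any that have not been tested yet.

```python
import time
from typing import Callable


def measure_rta(restore: Callable[[], None]) -> float:
    """Run one test restore and return how long it took, in hours."""
    started = time.monotonic()
    restore()  # hypothetical: your real test-restore procedure goes here
    return (time.monotonic() - started) / 3600


def rta_gaps(rto_hours: dict[str, float], rta_hours: dict[str, float]) -> dict[str, float]:
    """Hours by which each tested application's RTA overshoots its RTO."""
    return {app: rta_hours[app] - rto_hours[app]
            for app in rto_hours
            if app in rta_hours and rta_hours[app] > rto_hours[app]}


def untested(rto_hours: dict[str, float], rta_hours: dict[str, float]) -> list[str]:
    """Applications that still need a test recovery."""
    return sorted(app for app in rto_hours if app not in rta_hours)


rto = {"web shop": 1.0, "crm": 4.0, "email": 6.0}
rta = {"web shop": 2.5, "crm": 3.0}
print(rta_gaps(rto, rta))  # {'web shop': 1.5}
print(untested(rto, rta))  # ['email']
```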
Deciding what to revive
The final phase of your RTO/RPO planning deals with determining an acceptable level of data loss. RPO is tied to the frequency with which your data is captured in a recoverable form – e.g. if you’re performing daily backups, then your most recent recovery point could be up to 24 hours old.
While it’s possible to achieve an RPO of zero data loss, these kinds of solutions are almost always costly. Most businesses are going to need to determine a realistic RPO that will cause minimal impact.
Remember those questions you asked yourself when figuring out an RTO? The same considerations will affect your RPO as well, with the focus on data rather than recovery time.
Once you know how much data you can afford to lose, you can plan a backup strategy to meet that requirement for each system, data store and application.
A daily backup gives you a copy of your data as it was at the end of the previous business day. If you have determined that restoring the previous day’s backup is acceptable no matter when during the day a loss occurs, then a daily backup will protect your data.
If you have determined that you cannot afford to lose more than a few hours’ data, then a daily backup will not be enough. You will need ongoing protection of your data throughout the day. For a data hard drive, this could mean a mirrored drive. For an SQL database, it may mean transaction log backups that run every 10 or 15 minutes.
If your requirement falls somewhere in between, say you can afford to lose half a day’s data, then you could consider backing up the data twice a day.
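The three tiers described here can be sketched as a simple lookup in Python. The function name and the exact cut-off points (24 hours and 12 hours) are assumptions drawn from the examples in this article, not a rule from any backup product; a real plan would weigh cost and system type in more detail.

```python
from datetime import timedelta


def suggest_backup_strategy(rpo: timedelta, is_sql_database: bool = False) -> str:
    """Map an RPO onto the three tiers in this article (a simplification)."""
    if rpo >= timedelta(hours=24):
        return "daily backup"
    if rpo >= timedelta(hours=12):
        return "backup twice a day"
    # Only a few hours of loss is tolerable: protect data continuously.
    if is_sql_database:
        return "transaction log backups every 10-15 minutes"
    return "ongoing protection, e.g. a mirrored drive"


print(suggest_backup_strategy(timedelta(days=1)))                         # daily backup
print(suggest_backup_strategy(timedelta(hours=12)))                       # backup twice a day
print(suggest_backup_strategy(timedelta(hours=2), is_sql_database=True))  # transaction log backups every 10-15 minutes
```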
Anything you’re not sure about when it comes to RTO/RPO?
Leave your question in the comments below, tweet it @BackupAssist or post to our Facebook wall.