The effect of CPU on cloud backup and recovery speeds in BackupAssist ER

Using the D2D2C capability of BackupAssist ER is a great way to get your full system backups into the cloud. If you’re seeking the best possible performance for backup or recovery, it’s worth noting that the speed of the CPU on the client machine can make a difference in high-performance network environments. Here are some pointers on how to get the most out of BackupAssist ER.

TL;DR summary

  • There are several factors that affect the speed of uploading or downloading your full cloud backup. Any one of these can be a bottleneck. These factors are:
  1. Speed and latency of network connection to your cloud storage provider
  2. Speed of the cloud storage provider
  3. The size and number of files you back up
  4. Number of threads of your CPU
  5. Speed of your backup device
  • In most cases, the speed of the network connection will be the limiting factor. However, if your speeds to cloud storage exceed around 200Mbps, then the power of your CPU can start to influence the time to upload or download a backup.
  • When doing a backup, the power of your CPU is unlikely to be something you can change. Further, the duration of the backup is also unlikely to make a tangible difference to business outcomes. So don’t worry too much about this.
  • When downloading your backup to do a full recovery, the time taken to do the download can make a difference to the time taken to do the recovery, and therefore the amount of downtime you experience. We make these recommendations when downloading your full cloud backup:
  1. If your network speed is 100Mbps or less, a more powerful CPU barely makes a difference.
  2. If network speed is between 100Mbps and 500Mbps, we recommend a 4 core / 8 thread CPU.
  3. If network speed is 500Mbps to 1Gbps, we recommend a 6 core / 12 thread CPU.
  4. At gigabit speeds, an 8 core / 16 thread CPU is recommended.
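Condensed into code, the download recommendations above look something like this (the function name and exact threshold boundaries are illustrative, not part of BackupAssist):

```python
def recommended_cpu(network_mbps: float) -> str:
    """Suggest a client CPU for downloading a full cloud backup,
    based on measured network speed in Mbps. Thresholds follow the
    guidelines above; the function itself is illustrative."""
    if network_mbps <= 100:
        return "any modern CPU (more threads barely help)"
    elif network_mbps < 500:
        return "4 core / 8 thread"
    elif network_mbps < 1000:
        return "6 core / 12 thread"
    else:
        return "8 core / 16 thread"
```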

Background – why CPU makes a difference

When uploading your backup data to the cloud, BackupAssist will perform real-time data processing to convert your files into deduplicated, compressed and encrypted chunks of data.
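As a rough sketch of that pipeline (fixed-size chunks and zlib compression are simplifying assumptions of ours; real deduplicating backup tools typically use variable, content-defined chunk boundaries, and the encryption step is elided here):

```python
import hashlib
import zlib

CHUNK_SIZE = 4 * 1024 * 1024  # fixed-size chunks: a simplification

def process(data: bytes, store: dict) -> list:
    """Split data into chunks, deduplicate by content hash, and
    compress new chunks (which would then be encrypted and uploaded).
    Returns the list of chunk IDs that reassemble the original data."""
    chunk_ids = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        chunk_id = hashlib.sha256(chunk).hexdigest()
        if chunk_id not in store:  # dedup: skip chunks already stored
            store[chunk_id] = zlib.compress(chunk)
        chunk_ids.append(chunk_id)
    return chunk_ids
```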

This process takes a low to moderate amount of CPU time. However, you may be surprised to learn that the faster data crunching is not the primary reason why having a more powerful CPU enhances performance.

The primary reason is actually the complex interactions with the cloud storage destination that happen before and after the data crunching is complete.

When transferring data to and from your cloud storage, BackupAssist can have 20 or more concurrent connections to your storage provider. Because cloud storage can be unpredictable in latency and throughput, there can be “random” patterns of activity. For example, in the case of downloading a backup:

  • “Bursty” activity: in the worst case, BackupAssist issues 20 concurrent GET requests to get 20 chunks of data. The cloud storage takes a long time to process them, and completes all of them at exactly the same time. Thus, the local computer’s CPU is idle during that long waiting time, and then a burst of activity happens as the 20 chunks have to be reassembled into data files.
  • “Consistent” activity: in the best case, BackupAssist issues 20 concurrent GET requests, and these requests return results at equal intervals in time. Thus, the local computer’s CPU is working at a consistent rate.

In reality, a typical “download my entire cloud backup” operation will have periods of time between these two extremes.
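A minimal sketch of this kind of concurrent chunk download, using a thread pool in place of real GET requests (the 20-connection figure comes from the text above; everything else here is illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_chunk(chunk_id: int) -> tuple:
    """Stand-in for a GET request that retrieves one chunk from the
    cloud storage provider. Here it just fabricates a tiny payload."""
    return chunk_id, bytes([chunk_id]) * 4

def download_backup(chunk_ids, max_connections: int = 20) -> bytes:
    """Fetch chunks with up to `max_connections` concurrent requests,
    then reassemble them in their original order."""
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        results = dict(pool.map(fetch_chunk, chunk_ids))
    return b"".join(results[cid] for cid in chunk_ids)
```

Note that however the fetches complete, reassembly must restore the original chunk order, which is where the bursts of local CPU work come from.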

The benefit of having a CPU with many cores and threads is that the bottlenecking in the worst-case scenario is less severe, and may be barely noticeable. Let’s over-simplify: if there are 20 chunks to process at once, but our CPU can only handle 2 concurrent threads, it will take 10 “work cycles” to process all 20. With an 8 core / 16 thread CPU, it would take just 1.25 “work cycles”.
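In code, that over-simplified model is just a division:

```python
def work_cycles(chunks: int, threads: int) -> float:
    """Cycles needed if the CPU can process `threads` chunks per
    cycle, per the over-simplified model in the text."""
    return chunks / threads
```

Here `work_cycles(20, 2)` gives 10.0 and `work_cycles(20, 16)` gives 1.25, matching the figures above.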

If there are other bottlenecks in the system, namely network connection speed, the swings between the two extremes are somewhat “masked” by the other bottlenecks, so the number of cores or threads will have less impact on performance. But in high performance network environments, the CPU may become the bottleneck.

We should stress that there are many factors that affect performance, but you can consider our explanation above to be a good first-order approximation.

Experiments and results

To measure the effect of CPU on backup and recovery times, we set up an environment where the CPU could become the limiting factor.

  1. Local client machine – identical motherboards with comparable RAM and SSD storage, fitted with three different CPUs:
    a. AMD Ryzen 5 3400G, 4 core / 8 thread CPU
    b. AMD Ryzen 5 2600, 6 core / 12 thread CPU
    c. AMD Ryzen 7 3800X, 8 core / 16 thread CPU

  2. Network connection – local area networks were used:
    a. 100 Mbps ethernet network via a 100Mbps switch, with no other computers connected
    b. Gigabit ethernet network with direct cable attaching the client and server computers

  3. Cloud storage server: a FreeNAS install running on physical hardware, with an Intel 6 core CPU and NVMe SSD storage.

We ran backups of the exact same data set in order to obtain the results.

For reference, the data set consisted of:

  • Windows 10 Operating System
  • 889,836 files totalling 521 GB
  • The data compressed down to 394 GB, and deduplicated further to 334 GB.

Table of results

Number of CPU threads                 8           12          16
Backup Time – Gigabit Ethernet        3:22:42     2:40:41     2:19:11
Recovery Time – Gigabit Ethernet      3:33:50     3:02:48     2:35:00
Backup Time – 100Mbps Ethernet        9:56:23     9:41:56     9:28:43
Recovery Time – 100Mbps Ethernet      11:08:56    10:59:12    10:49:16

Analysis

On the slower 100Mbps network, upgrading the CPU from 8 to 16 threads reduced backup times by around 28 minutes and recovery times by around 20 minutes. Expressed as percentages, the improvements were roughly 5% and 3% respectively.

On our fast Gigabit network, upgrading the CPU from 8 to 16 threads reduced backup times by around 1h 3m and recovery times by around 59m. That’s a huge 31% and 28% improvement respectively.
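These percentages can be reproduced from the table of results with a small helper (the function names here are ours):

```python
def seconds(hms: str) -> int:
    """Parse an 'H:MM:SS' duration string into total seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

def improvement_pct(slow: str, fast: str) -> float:
    """Percentage reduction in duration going from `slow` to `fast`."""
    before, after = seconds(slow), seconds(fast)
    return round(100 * (before - after) / before, 1)
```

For example, `improvement_pct("3:22:42", "2:19:11")` gives 31.3 for the Gigabit backup case.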

Conclusions

Based on the experimental results and knowledge of how our software works, we can offer these recommendations:

Scenario 1: backing up to or recovering from a public cloud via the Internet

It’s likely that the speed of your Internet connection, and the latency between your machine and the cloud service will be the limiting factors. You’ll see results similar to our 100 Mbps simulation.

We recommend a 4 core / 8 thread CPU machine for doing the backup and recovery. A more powerful CPU will be faster, but not significantly.

Scenario 2: backing up to or recovering from a private cloud via a LAN

Let’s assume you have a Gigabit or better network connection.

If your private cloud machine uses a hard drive, the latency of the hard drive will be the limiting factor. (We’ll write more about this scenario in another blog post.) Here, a 6 core / 12 thread CPU will give you good backup and recovery performance, and a more powerful CPU is unlikely to improve that.

If your private cloud machine uses SSDs, you’ll see a scenario close to our Gigabit ethernet simulation. We recommend an 8 core / 16 thread CPU to make sure that the local CPU is not the limiting factor.
