Notes on Rsync server scalability for hosting backup data

Hi everyone,

Many of you who run your own I.T. consulting practices have been interested in hosting your clients’ data. In fact, over the last month we’ve discovered that many of you have already tested our Rsync engine for synchronising files, SQL and Exchange databases offsite.

The question of scalability has recently been asked – that is, how many clients, how many simultaneous backups, and how much data can I host on a single machine?

That’s a very open-ended question – like asking how many websites you can host on a single webserver. So while the answer is “it depends…”, I’ll put out some basic pointers here to get everyone started, and look at publishing some case studies in a few months’ time once we have solid data from “out in the field”.

When it comes to Rsync hosting, scalability is a combination of disk speed and CPU speed – but a balanced combination of both will typically outperform a top-end CPU paired with an average disk. I also believe that multiple “good” servers will definitely outperform one “great” server – a bit like how Google uses lots of commodity hardware instead of one supercomputer.

There are also two stages to the backup – the in-file delta calculation stage, which is disk- and CPU-bound, and the transmission stage, which is largely network-bound.

[ In-file delta calculation stage ]
– Source machine calculates quick checksum – generally disk bound
– Data host calculates quick checksum – generally disk bound
– Source machine calculates detailed checksums – CPU and disk bound
– Data host calculates detailed checksums – CPU and disk bound
[ Transmission stage ]
– In-file deltas are transmitted – network and disk bound, with some load on CPU
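
To make those two stages concrete, here’s a minimal Python sketch of block-level delta detection. It’s illustrative only – the block size and the choice of Adler-32 for the quick checksum and MD5 for the detailed checksum are assumptions for the example, and a real rsync-style engine uses a rolling checksum to match blocks at arbitrary offsets rather than this simple fixed-block comparison:

```python
import hashlib
import zlib

BLOCK_SIZE = 64 * 1024  # illustrative block size, not the engine's real value

def block_signatures(path):
    """Scan a file and compute per-block checksum pairs.
    The quick checksum (Adler-32 here) is cheap and mostly disk-bound;
    the detailed checksum (MD5 here) adds the CPU cost."""
    sigs = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            sigs.append((zlib.adler32(block), hashlib.md5(block).digest()))
    return sigs

def changed_blocks(old_sigs, new_sigs):
    """Return indices of blocks whose checksums differ - only these
    blocks need to be sent during the transmission stage."""
    deltas = [i for i, (old, new) in enumerate(zip(old_sigs, new_sigs))
              if old != new]
    # blocks past the shorter file (growth or truncation) are deltas too
    deltas.extend(range(min(len(old_sigs), len(new_sigs)),
                        max(len(old_sigs), len(new_sigs))))
    return deltas
```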

In-file delta calculation time is a function of file size – calculating checksums on small files (say, under 10 MB) is extremely fast, but on huge files (say, 50 GB) it is slow. So if you know you’ve got clients with huge files, be prepared for poor scalability; if the files are generally small, scalability will be very good. We’ve found that on reasonable-spec desktop hardware with a quad-core processor, a checksum pass takes about 20% CPU (i.e. roughly one core) and runs at about 1 GB per minute.
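
As a back-of-envelope check, assuming the ~1 GB per minute figure above holds on your hardware, a single checksum pass over a 50 GB file ties up a core (and the disk) for the best part of an hour:

```python
def checksum_minutes(file_size_gb, rate_gb_per_min=1.0):
    """Rough single-pass checksum time, assuming the ~1 GB/minute
    rate measured above on quad-core desktop hardware."""
    return file_size_gb / rate_gb_per_min

print(checksum_minutes(0.01))  # 10 MB file: ~0.01 minutes, effectively instant
print(checksum_minutes(50))    # 50 GB file: ~50 minutes per pass
```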

From a “common sense” point of view, if your server has 4 cores (e.g. 2 CPUs with 2 cores each) it should handle 4 simultaneous in-file delta calculations easily, with disk speed being the limiting factor. However, if you try 10 simultaneous in-file delta calculations, performance will degrade.

So we recommend staggering your backup jobs. If one client’s backup starts at 9pm, start the next one at 9:10pm, and the third at 9:20pm.
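
If you’re scripting job creation, the offsets are trivial to generate. Here’s a throwaway helper (the names are ours for this example, not part of any product API) that spreads start times at a fixed interval:

```python
from datetime import datetime, timedelta

def staggered_starts(first_start, num_clients, stagger_minutes=10):
    """Spread client backup start times so the delta calculations
    don't all hit the CPU and disks at once."""
    return [first_start + timedelta(minutes=stagger_minutes * i)
            for i in range(num_clients)]

for i, t in enumerate(staggered_starts(datetime(2009, 6, 1, 21, 0), 6), 1):
    print(f"client {i}: {t:%I:%M %p}")  # 09:00 PM, 09:10 PM, ... 09:50 PM
```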

The thing to note is that many of you will be hosting data for clients in the same timezone, so the backup activity will most likely occur overnight – when your clients’ internet connections are otherwise idle.

This means your scalability will be reduced somewhat – for example, with an 8-hour backup window you’ll be able to host only about a third of the clients/data that you could with a 24-hour backup window (such as clients spread across different timezones).
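
Here’s a simplistic capacity model that shows the same proportionality. It assumes jobs are staggered at a fixed interval and that the last job must still finish inside the window – real capacity also depends on how much the overlapping jobs contend for CPU, disk and bandwidth, so treat the numbers as ceilings, not promises:

```python
def max_clients(window_hours, avg_backup_minutes, stagger_minutes=10):
    """Very rough ceiling on clients per server: how many staggered
    jobs fit in the window, with the last one finishing inside it."""
    usable_minutes = window_hours * 60 - avg_backup_minutes
    return max(0, int(usable_minutes // stagger_minutes) + 1)

# an 8-hour overnight window vs a 24-hour window, with 60-minute jobs:
print(max_clients(8, 60))   # 43 clients
print(max_clients(24, 60))  # 139 clients - roughly 3x the capacity
```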

Now, over time we’ll be working with our partners to obtain some empirical results. We’ll also be looking at what we can do with checksum caching, which should further improve scalability. But until then, here is a summary of our recommendations:

• Prefer multiple good machines over one super machine. That is, if you have a budget of $10,000, you’ll most likely get better performance from two $5,000 servers than from one $10,000 server. [Also note that the performance difference between a $5,000 and a $10,000 server might be 50%, not the 100% the price would suggest.]
• Stagger your backups. Don’t set them off at the same time – instead, stagger them by a reasonable interval, like 5 or 10 minutes.
• If you’ve got clients with large database files, expect lower scalability than with many smaller files.

Hope this helps!

Oh, and for a limited time, we’ll be working with our partners to analyse load on their backup servers and to help architect scalable backup solutions. If you’re interested in participating, please contact me in our Melbourne office.

Regards,

Linus
