How do Hard Links work with Single Instance Store (SIS)?
This is a question we get quite often because it is so different from traditional backups.
When you run either a File Protection or an rsync backup job with any schedule other than Mirror 1:1, you will be using SIS.
SIS uses Hard Links to allow a never-ending incremental backup.
This means that any file that has not changed is not backed up again; the new backup folder is simply linked to the copy already on the drive.
The confusion comes in because, in the past, an incremental backup would be smaller than the full backup, and you would need every backup made since the last full backup to do a restore. Because of Hard Links, each folder is linked back to the unchanged data, so every folder appears to be the size of a full backup. This allows you to restore all your data from the newest backup, even if older backups have been deleted.
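To make the idea concrete, here is a minimal Python sketch of a never-ending incremental backup built on hard links. It is not BackupAssist's implementation: the function names `make_snapshot` and `is_unchanged` are hypothetical, it assumes a filesystem that supports hard links (such as NTFS), and it decides whether a file has changed with a simple size-and-modification-time check. Each new folder looks like a full backup, but only changed files are actually written.

```python
import os
import shutil
import tempfile

def is_unchanged(src, prev):
    """Hypothetical quick check: same size and same whole-second modification time."""
    if not os.path.isfile(prev):
        return False
    s, p = os.stat(src), os.stat(prev)
    return s.st_size == p.st_size and int(s.st_mtime) == int(p.st_mtime)

def make_snapshot(source, dest, previous=None):
    """Build a backup folder that looks like a full backup.

    Unchanged files are hard linked to the `previous` folder; everything
    else is copied. Returns (copied, linked) lists of relative paths.
    """
    copied, linked = [], []
    for root, _dirs, files in os.walk(source):
        rel_dir = os.path.relpath(root, source)
        os.makedirs(os.path.normpath(os.path.join(dest, rel_dir)), exist_ok=True)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            src = os.path.join(source, rel)
            dst = os.path.join(dest, rel)
            prev = os.path.join(previous, rel) if previous else None
            if prev and is_unchanged(src, prev):
                os.link(prev, dst)      # new directory entry; no file data written
                linked.append(rel)
            else:
                shutil.copy2(src, dst)  # copy2 keeps mtime so the next quick check works
                copied.append(rel)
    return copied, linked

# Usage: two backups of a source folder that does not change in between.
base = tempfile.mkdtemp()
source = os.path.join(base, "source")
os.makedirs(source)
with open(os.path.join(source, "report.txt"), "w") as f:
    f.write("v1")

first = os.path.join(base, "2009-12-30")
second = os.path.join(base, "2010-01-01")
copied1, linked1 = make_snapshot(source, first)
copied2, linked2 = make_snapshot(source, second, first)
print(copied1, linked1)  # ['report.txt'] []
print(copied2, linked2)  # [] ['report.txt']
```

A size-and-time check is cheap but can miss an edit that leaves both unchanged; a real tool might compare checksums instead.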
Hard Links have a drawback: when you select all of your folders, your unchanged data is reported once for each folder, even though it was only backed up one time. A 1MB file will show as 3MB if you are looking at 3 folders. To see how much data has actually been written, you will need to check the properties of the hard drive. This also means that if you copy the backup directories to another location, the Hard Links will be broken and your data will be copied once for each folder you have.
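The following sketch, using hypothetical helpers `apparent_size` and `actual_size`, shows why selecting all folders over-reports: one file hard linked into three folders is counted three times by a plain size total, but only once when files are identified by device and inode number. It also shows that an ordinary copy (here `shutil.copytree`) does not preserve the links, so each folder's copy becomes separate data. Sizes are logical file sizes, not allocated disk clusters.

```python
import os
import shutil
import tempfile

def apparent_size(root):
    """What selecting every folder reports: each hard link counted separately."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            total += os.stat(os.path.join(dirpath, name)).st_size
    return total

def actual_size(root):
    """Data really stored: each (device, inode) pair counted once."""
    seen, total = set(), 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            st = os.stat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                total += st.st_size
    return total

# Demo: one 1 MB file hard linked into three backup folders.
base = tempfile.mkdtemp()
backups = os.path.join(base, "backups")
folders = [os.path.join(backups, f"backup{i}") for i in (1, 2, 3)]
for folder in folders:
    os.makedirs(folder)
first = os.path.join(folders[0], "data.bin")
with open(first, "wb") as f:
    f.write(b"\0" * 1_000_000)
for folder in folders[1:]:
    os.link(first, os.path.join(folder, "data.bin"))

print(apparent_size(backups))  # 3000000
print(actual_size(backups))    # 1000000

# Copying the folders elsewhere breaks the links: every copy is a separate file.
copy = os.path.join(base, "copy")
shutil.copytree(backups, copy)
print(actual_size(copy))       # 3000000
```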
Let’s take a look at what I am talking about. Here is a screenshot of a drive I formatted for this test. Other than the system data for this drive, it is empty; the used space is only 93.2MB.
I then backed up a folder using File Protection; a total of 12.6MB of data was backed up to the hard drive. Three items were created: a backup folder called 2009-12-30, because I ran the job on that day; a .catalogues folder that keeps information for the restore; and a file called 2009-12-30.ba that keeps information for the media usage report.
We are just going to look at the backup folder because that is where all the data is.
This is what you would expect: the folder 2009-12-30 has 84 files and 6 folders, and is 12.6MB. If we look at the drive, we also see that the used space increased from 93.2MB to 106MB.
This was our first backup, so we expected it to be a full backup. When you are rotating drives and come back to a drive that already has a backup on it, BackupAssist compares the existing data to the source and copies over only what has changed since the last backup to that particular drive.
I then ran a second backup without changing any data, to show you what happens. This time the folder that was created was 2010-01-01. No files were backed up because nothing had changed, as shown by this screenshot from the BackupAssist report.
Now, let’s look at the properties for the 2010-01-01 folder. They show that it is the same size as the first backup.
We can also look at the drive properties. Had this been a full backup instead of an incremental one, the used space would have increased, but the screenshot below shows that this did not happen: the used space is unchanged. The backup folder 2010-01-01 is just Hard Linked back to data that was already on the drive.
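One way to confirm that a backup folder is just Hard Linked back to an earlier one is to ask, for each file, whether both folder entries refer to the same file on disk. This minimal Python sketch uses `os.path.samefile` for that check; the helper `linked_files` and the folder contents are hypothetical stand-ins for the 2009-12-30 and 2010-01-01 folders.

```python
import os
import tempfile

def linked_files(newer, older):
    """Relative paths in `newer` that are the same on-disk file as in `older`."""
    shared = []
    for dirpath, _dirs, files in os.walk(newer):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), newer)
            candidate = os.path.join(older, rel)
            if os.path.exists(candidate) and os.path.samefile(os.path.join(newer, rel), candidate):
                shared.append(rel)
    return sorted(shared)

root = tempfile.mkdtemp()
first = os.path.join(root, "2009-12-30")
second = os.path.join(root, "2010-01-01")
os.makedirs(os.path.join(first, "docs"))
os.makedirs(os.path.join(second, "docs"))
for rel in ("a.txt", os.path.join("docs", "b.txt")):
    with open(os.path.join(first, rel), "w") as f:
        f.write("data")
    # Nothing changed, so the second backup links instead of copying.
    os.link(os.path.join(first, rel), os.path.join(second, rel))

print(linked_files(second, first))  # every file in 2010-01-01 is shared
```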
Let’s now take a look at what happens when data is added.
I added two files, totaling 27.9MB, to the source folder. Here is a screenshot of the report for that job. There are now 86 files, 84 of which did not need to be copied. The total size is now 40.66MB: 12.70MB did not need to be copied again, but 27.96MB did.
The properties of this folder, 2010-13-01, show that it contains not only the 2 files that were added but also the original 84 files from the first backup.
However, the used space on the drive only went up by about 27MB, not 40MB, because the other data was already written there and is just Hard Linked.
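A rough way to estimate how much new data a backup actually wrote is to add up only the files whose link count (`st_nlink`) is 1, meaning no other folder links to them. The sketch below is hypothetical and assumes nothing outside the backup set links to these files; the byte counts are scaled-down stand-ins for the 12.70MB and 27.96MB in the report, not real measurements.

```python
import os
import tempfile

def bytes_unique_to(folder):
    """Sum the sizes of files whose only hard link is in this folder.

    Assumption: no links to these files exist outside the backup set.
    """
    total = 0
    for dirpath, _dirs, files in os.walk(folder):
        for name in files:
            st = os.stat(os.path.join(dirpath, name))
            if st.st_nlink == 1:
                total += st.st_size
    return total

root = tempfile.mkdtemp()
older = os.path.join(root, "older")
newer = os.path.join(root, "newer")
os.makedirs(older)
os.makedirs(newer)
with open(os.path.join(older, "original.txt"), "wb") as f:
    f.write(b"x" * 1270)  # stands in for data already on the drive
os.link(os.path.join(older, "original.txt"), os.path.join(newer, "original.txt"))
with open(os.path.join(newer, "added.txt"), "wb") as f:
    f.write(b"y" * 2796)  # stands in for the newly added files

print(bytes_unique_to(newer))  # 2796: only the added data was written
```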
Now that we know how SIS stores the files, we need to understand how they are handled when these backup folders start being deleted.
First, let’s look at how unchanged data is handled. As long as at least one backup folder still links to the data, it will not be deleted. This is how you can restore data from the newest backup even if your original backup folder was deleted. The example below shows how unchanged data is linked for 3 backups.
Suppose we delete Backup 1, and let’s say it was the first backup done, so it was your “FULL” backup. The diagram below shows that the link from that folder is removed, but Backups 2 and 3 are still linked, so the data can still be restored.
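The behavior in the diagram can be reproduced with a short Python sketch, assuming a filesystem that supports hard links. Deleting the folder that holds the original “FULL” copy only removes one link: the data stays on disk, and the link count (`st_nlink`) drops from 3 to 2. Folder and file names are hypothetical.

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
payload = b"unchanged data"
for i in (1, 2, 3):
    os.makedirs(os.path.join(root, f"backup{i}"))

# Backup 1 (the "FULL" backup) writes the data once...
first = os.path.join(root, "backup1", "file.txt")
with open(first, "wb") as f:
    f.write(payload)
# ...and Backups 2 and 3 hard link to it.
for i in (2, 3):
    os.link(first, os.path.join(root, f"backup{i}", "file.txt"))

newest = os.path.join(root, "backup3", "file.txt")
print(os.stat(newest).st_nlink)  # 3: three folder entries, one copy of the data

shutil.rmtree(os.path.join(root, "backup1"))  # delete the FULL backup
print(os.stat(newest).st_nlink)  # 2: Backups 2 and 3 still link to the data
with open(newest, "rb") as f:
    print(f.read() == payload)   # True: still restorable from the newest backup
```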
Then why delete the folders if you do not delete the data?
The answer is that only data that is still linked is kept: older versions of files that have since changed, and that are no longer linked to any remaining folder, are deleted.
For example, let’s say you have a report that is updated for a meeting every Friday. It is unchanged in four of your daily backups, but on Friday it was modified.
You can see in this diagram that you now have two versions of the same report. As you delete the folders each day, the Hard Links in those folders are deleted as well, but for three days you will still be able to restore either version.
Once all of the folders that the old report was linked to are deleted, that file is deleted and the space it was using on your drive is freed up.
For more information on File Protection, you can review our user guide:
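The Friday report example can be simulated with a short Python sketch. In this hypothetical setup, the old report is written on Monday and hard linked on Tuesday through Thursday, and Friday's modified report is a new file. Deleting the oldest folder each day, the stored bytes (each file counted once by device and inode) stay the same until the last folder linking to the old report is removed. The day names and byte sizes are made up for illustration.

```python
import os
import shutil
import tempfile

def stored_bytes(root):
    """Logical bytes stored under root, counting each (device, inode) once."""
    seen, total = set(), 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            st = os.stat(os.path.join(dirpath, name))
            if (st.st_dev, st.st_ino) not in seen:
                seen.add((st.st_dev, st.st_ino))
                total += st.st_size
    return total

root = tempfile.mkdtemp()
days = ["mon", "tue", "wed", "thu", "fri"]
old_report = b"old" * 100   # 300 bytes: hypothetical Monday-Thursday version
new_report = b"new!" * 100  # 400 bytes: hypothetical Friday version

# Monday: the first backup writes the old report.
os.makedirs(os.path.join(root, "mon"))
with open(os.path.join(root, "mon", "report.doc"), "wb") as f:
    f.write(old_report)
# Tuesday-Thursday: the report is unchanged, so each backup hard links it.
for prev, day in zip(days, days[1:4]):
    os.makedirs(os.path.join(root, day))
    os.link(os.path.join(root, prev, "report.doc"), os.path.join(root, day, "report.doc"))
# Friday: the report was modified, so a new copy is written.
os.makedirs(os.path.join(root, "fri"))
with open(os.path.join(root, "fri", "report.doc"), "wb") as f:
    f.write(new_report)

history = [stored_bytes(root)]  # both versions stored
for day in days[:4]:            # delete the oldest folder, one per day
    shutil.rmtree(os.path.join(root, day))
    history.append(stored_bytes(root))
print(history)  # [700, 700, 700, 700, 400]
```

The first three deletions free nothing, matching the three days during which either version can still be restored; the fourth removes the last link to the old report and frees its space.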