I can’t speak to Synology’s implementation but perhaps a glimpse down the rabbit hole of the Linux file system will give some insight into backups, deduplication, and incremental synchronization techniques.
DISCLAIMER ALERT
What follows is not intended to be complete or accurate. It’s a cartoon sketch of decades of technological advancement.
A file is a collection of data plus a label that lets you identify it. A directory is just another file; what makes it special is that its contents are a list of other files, and some of the files in that list can themselves be directories.
At its core, a file is a collection of blocks of data, along with some metadata: ownership, permissions, timestamps, and so on. This structure is called an inode. Every file is an inode, and since every directory is a file, every directory is an inode too. The name itself isn’t stored in the inode; it’s a label, recorded in the parent directory, that we use to identify the inode.
We can give the same inode multiple labels. An inode can simultaneously be /home/user/myfile.txt and /var/log/something_else.lst. That’s called a hard link. The inode keeps track of the number of links with a usage counter. Every time a hard link is created, the counter increments; every time a link is destroyed, the counter decrements. When the counter reaches zero, the inode and all its data blocks become available for some other use.
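You can watch the usage counter in action from Python’s standard library. This is a minimal sketch on a scratch directory; the filenames are made up for illustration, and the counts assume an ordinary local filesystem:

```python
import os
import tempfile

tmp = tempfile.mkdtemp()
original = os.path.join(tmp, "myfile.txt")
with open(original, "w") as f:
    f.write("hello")

print(os.stat(original).st_nlink)   # 1 — one label points at the inode

alias = os.path.join(tmp, "same_data.txt")
os.link(original, alias)            # create a hard link: a second label
print(os.stat(original).st_nlink)   # 2 — two labels, one inode

# Both labels identify the same inode, so the inode numbers match.
assert os.stat(original).st_ino == os.stat(alias).st_ino

os.unlink(original)                 # destroy one label
print(os.stat(alias).st_nlink)      # 1 — the data is still reachable
```

Deleting `myfile.txt` didn’t delete anything except a label; the blocks survive until the last link is gone.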
So let’s talk about snapshot backups.
A snapshot is a copy of all the inodes on a file system at a specific point in time. The operating system can make this copy very quickly because it copies only the metadata, not the actual file contents. The process amounts to creating a directory structure of hard links to the existing inodes. The magic happens the next time any file is changed: instead of altering the existing inode, the changed file is stored as a new inode. The old inode sticks around as long as its usage count is above zero, which will be true as long as the snapshot directory exists. The good news is that this is really efficient. The bad news is that deleting a file may not actually recover any disk space — there may be some other link, like the one in the snapshot image, keeping the usage counter from reaching zero.
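A toy version of that scheme can be sketched in a few lines. This is roughly what tools like rsnapshot or `rsync --link-dest` do with hard links; real copy-on-write snapshots (Btrfs, ZFS) work at the block level instead. The directory names here are invented for the example:

```python
import os
import tempfile

root = tempfile.mkdtemp()
live = os.path.join(root, "live")
snap = os.path.join(root, "snap")
os.makedirs(live)

with open(os.path.join(live, "a.txt"), "w") as f:
    f.write("version 1")

# Take the "snapshot": recreate the tree, hard-linking every file.
os.makedirs(snap)
for name in os.listdir(live):
    os.link(os.path.join(live, name), os.path.join(snap, name))

# "Change" the file by writing a new one and renaming it over the old
# name — this allocates a fresh inode instead of altering the old one.
tmp = os.path.join(live, "a.txt.tmp")
with open(tmp, "w") as f:
    f.write("version 2")
os.replace(tmp, os.path.join(live, "a.txt"))

print(open(os.path.join(live, "a.txt")).read())  # version 2
print(open(os.path.join(snap, "a.txt")).read())  # version 1 — preserved
```

Note the write-then-rename step matters in this file-level sketch: opening the original path for writing would truncate the shared inode and change the snapshot copy too. Block-level copy-on-write systems don’t have that limitation.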
Now we can finally get into deduplication and replication at the file level. This actually happens at the block level for greater efficiency, but the concept holds.
Every time a file is saved, the system checks whether an identical copy already exists. The usual technique is to calculate a checksum from the file’s contents and use that value as one of the hard link names. If a link with that name already exists, the new file is simply linked to the existing inode. Otherwise, a new inode is created.
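Here is a minimal sketch of that checksum-as-filename trick, using a hidden content-addressed store directory. The `save` helper and the directory layout are assumptions for illustration, not any particular product’s implementation:

```python
import hashlib
import os
import tempfile

root = tempfile.mkdtemp()
store = os.path.join(root, "store")   # one entry per unique checksum
os.makedirs(store)

def save(path, data: bytes):
    digest = hashlib.sha256(data).hexdigest()
    entry = os.path.join(store, digest)
    if not os.path.exists(entry):
        with open(entry, "wb") as f:  # first copy: create a new inode
            f.write(data)
    if os.path.exists(path):
        os.unlink(path)
    os.link(entry, path)              # duplicate: just add another label

a = os.path.join(root, "a.bin")
b = os.path.join(root, "b.bin")
save(a, b"same payload")
save(b, b"same payload")

# Two names, one inode, one copy of the data on disk.
print(os.stat(a).st_ino == os.stat(b).st_ino)  # True
```

The second `save` stored nothing; it just incremented the usage counter on the inode the store already had.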
The last backup was done from a snapshot. Every file in that snapshot has a usage count of at least 2: one for the original file and one for the snapshot.
This backup starts with a new snapshot. Every file in the old snapshot now has a usage count of 3. All the newly altered files have a usage count of 2. An incremental backup searches for all files with a usage count of 2 and copies their inodes and data blocks to the remote location.
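The usage-count heuristic above can be sketched end to end. The helper names and directory layout are hypothetical; the point is that after the second snapshot, only files altered since the first snapshot sit at a count of exactly 2 (live copy plus new snapshot), while unchanged files sit at 3:

```python
import os
import tempfile

def snapshot(src, dst):
    os.makedirs(dst)
    for name in os.listdir(src):
        os.link(os.path.join(src, name), os.path.join(dst, name))

def incremental_candidates(snap_dir):
    # Files altered since the previous snapshot are linked only from the
    # live tree and this snapshot, so their usage count is exactly 2.
    return sorted(
        name for name in os.listdir(snap_dir)
        if os.stat(os.path.join(snap_dir, name)).st_nlink == 2
    )

root = tempfile.mkdtemp()
live = os.path.join(root, "live")
os.makedirs(live)
for name in ("stable.txt", "busy.txt"):
    with open(os.path.join(live, name), "w") as f:
        f.write("old")

snapshot(live, os.path.join(root, "snap1"))   # the last backup's snapshot

tmp = os.path.join(live, "busy.txt.tmp")      # alter one file via
with open(tmp, "w") as f:                     # write-then-rename, so it
    f.write("new")                            # gets a fresh inode
os.replace(tmp, os.path.join(live, "busy.txt"))

snap2 = os.path.join(root, "snap2")
snapshot(live, snap2)                         # this backup's snapshot

print(incremental_candidates(snap2))          # ['busy.txt']
```

Only `busy.txt` needs to be shipped to the remote location; `stable.txt` is already covered by the previous backup.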
Some files may have had their usage count reduced to 1: they are still in the old snapshot but no longer in the live file system or the latest snapshot. When the old snapshot is deleted, their counts drop to zero and their space is finally reclaimed.