Backup Format Overview
The backup format is based on the idea that backup data is always kept on storage as a data container, regardless of the backup type. This approach keeps backup plans completely independent from each other. Every backup plan is a separate configuration that delivers backup data to a specific location on backup storage. In other words, each backup plan's data is kept in its own directory on backup storage. This data structure eliminates any interference between the data of different backup plans.
Backup data is divided into blocks, and the data block, rather than files and folders, is the main operating entity. As data is uploaded to backup storage, blocks are combined into data parts whose size can vary. A data part is closed by whichever of two limits is reached first: time (a new data part is formed every 5 minutes) or size (1 GB). Uploading backup data in parts makes it possible to resume the upload after a backup interruption: only the unfinished data part is uploaded again. All parts uploaded before the connection breakdown are already on backup storage and do not need to be uploaded again.
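The sketch below illustrates this part-forming logic in simplified Python. It is only a sketch under assumed names: `read_blocks`, `upload_part`, and the exact constants are illustrative and do not reflect the product's actual implementation or API.

```python
import time

PART_SIZE_LIMIT = 1 * 1024 ** 3   # size limit: 1 GB
PART_TIME_LIMIT = 5 * 60          # time limit: a new part every 5 minutes

def upload_in_parts(read_blocks, upload_part):
    """Group backup blocks into data parts and upload them one by one.

    `read_blocks` yields raw block bytes; `upload_part` sends a finished
    part to backup storage. If an upload is interrupted, only the
    unfinished part has to be re-sent; earlier parts are already stored.
    """
    part, part_size = [], 0
    part_started = time.monotonic()

    for block in read_blocks():
        part.append(block)
        part_size += len(block)

        # A part is closed by whichever limit is reached first: size or time.
        if part_size >= PART_SIZE_LIMIT or time.monotonic() - part_started >= PART_TIME_LIMIT:
            upload_part(b"".join(part))
            part, part_size = [], 0
            part_started = time.monotonic()

    if part:                        # flush the last, possibly smaller part
        upload_part(b"".join(part))
```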
Supported platforms:
- Windows (as of Backup Agent for Windows 7.1 and later)
- Linux (as of Backup Agent for Linux 4.1 and later)
- macOS (as of Backup Agent for macOS 4.1 and later)
New backup format key features for Windows backup are:
- Synthetic Full Backup
- GFS (Grandfather-Father-Son) Retention policy
- Forever Forward Incremental
- Object Lock (Immutability)
- Client-Side Deduplication
- Consistency Checks
- Restore on Restore Points
- Optimized operations with storage, resulting in fewer requests, faster synchronization, and faster purge
- Continued data upload in case of network issues
- Object size up to 5 PB to any storage destination
- Optimized performance and storage usage for a large number of small files
New backup format features for macOS and Linux backup (BETA) are:
- Client-Side Deduplication
- Consistency Checks
- Restore on Restore Points
- Optimized operations with storage, resulting in fewer requests, faster synchronization, and faster purge
- Continued data upload in case of network issues
- Object size up to 256 TB to any storage destination
- Optimized performance and storage usage for a large number of small files
- Improved incremental backup performance
In the beta version, the following features of the new backup format are missing for macOS and Linux:
- Synthetic Full Backup
- GFS (Grandfather-Father-Son) Retention policy
- Forever Forward Incremental
- Object Lock (Immutability)
Currently, the new backup format is supported for the following backup types:
- File backup (now supported for macOS and Linux)
- Image-based backup
- VMware backup
- Hyper-V backup
Terms and Definitions
This section explains several new terms and entities that are used throughout the rest of this document.
Bunch
A bunch represents a backup plan on backup storage. A bunch is always unique within the cloud directory and the plan type. Since all of a plan's backup content is stored in a single directory, organizing data into bunches (a per-plan data structure) makes it easy to delete a plan's data from cloud storage and brings many other advantages.
Generation
A generation is a complete, self-contained dataset that is sufficient for restoration. In other words, a generation is the sequence of a full backup and the incremental backups that follow it for a specific backup plan. Every full backup starts a new generation.
Restore Point
A Restore Point is a partial data set for restore. A full-fledged restore point contains at least one file or directory. If a restore point does not contain any file or directory, it is considered empty but successful, and it can still contain blocks used by subsequent runs. A valid restore point guarantees a correct restore of backed-up data. By contrast, an invalid restore point does not contain a complete data set for restore, but it can still contain blocks that are used for restore from other restore points.
It is recommended to schedule a full backup at least once every 3 months
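To make the relationship between these terms easier to picture, the following sketch models them as plain Python data structures. The field names are assumptions chosen for clarity and do not describe the actual on-storage layout.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RestorePoint:
    """One backup run; valid only if it holds a complete data set for restore."""
    block_ids: List[str] = field(default_factory=list)   # hypothetical field
    is_valid: bool = True

@dataclass
class Generation:
    """A full backup plus the incremental backups that depend on it."""
    restore_points: List[RestorePoint] = field(default_factory=list)

@dataclass
class Bunch:
    """All data of one backup plan, kept in its own directory on storage."""
    plan_id: str                                          # hypothetical field
    generations: List[Generation] = field(default_factory=list)

    def start_new_generation(self) -> Generation:
        # Every full backup starts a new generation within the bunch.
        generation = Generation()
        self.generations.append(generation)
        return generation
```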
Client-Side Deduplication
Deduplication is an approach in which identical data parts are stored once and reused across multiple processes instead of being stored repeatedly.
This functionality is not supported for the legacy backup format
The new backup format uses client-side deduplication. This approach brings the following benefits:
- Client-side deduplication is much faster than server-side deduplication
- No dependency on the internet connection during deduplication
- Reduced internet traffic
- A server-side deduplication database grows constantly, which can significantly increase expenses; client-side deduplication uses local resources only
Regardless of the backup type, the first backup is always a full backup. Since backed-up data changes over time, subsequent backup jobs are usually incremental and depend on the full backup as well as on previous incremental backups.
The backup format is designed around full backup plan independence, so each backup plan has its own deduplication database. Moreover, each generation of a backup plan also has its own deduplication database.
Once a backup plan runs, the application reads backup data in batches that are multiples of the block size. Once a block is read, it is compared with the deduplication database records. If the block is not found, it is delivered to storage and assigned a block ID, which becomes a new deduplication database record. Block scanning continues, and if a block matches any of the deduplication database records, the block with that ID is excluded from the backup plan.
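A minimal sketch of this flow is shown below, assuming blocks are identified by a content hash and the deduplication database is a simple local set of known block IDs; the agent's real block ID scheme and database format may differ.

```python
import hashlib

def backup_blocks(read_blocks, upload_block, dedup_db):
    """Client-side deduplication sketch.

    `dedup_db` is a local set of known block IDs (kept per plan and per
    generation). Only blocks whose IDs are not yet recorded are uploaded;
    blocks that match an existing record are excluded from the backup.
    """
    for block in read_blocks():
        # Assumption: the block ID is a content hash of the block data.
        block_id = hashlib.sha256(block).hexdigest()

        if block_id in dedup_db:
            continue                      # known block: exclude from the backup

        upload_block(block_id, block)     # new block: deliver it to storage
        dedup_db.add(block_id)            # record the new block ID locally
```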
This approach significantly decreases the backup size, especially in virtual environments with a large number of identical blocks.
If a deduplication database is deleted or corrupted, a full backup is always executed
For the image-based backup type, the approach is slightly different. Instead of reading clusters, the Master File Table (MFT) is read, and then the mechanism checks which files have been modified. This dramatically decreases the amount of source data that has to be read.
Consistency Checks
While backing up data, users assume that it will be possible to restore it, but this is not always the case if the backup data is corrupted. Such corruption can have many causes, ranging from technical problems with the cloud provider's service to industrial sabotage.
The consistency check is a technique that helps avoid data loss. It looks for discrepancies and notifies the user if backup objects are missing from backup storage or if object sizes or modification dates do not match.
Once a consistency check is run, a request goes to the backup storage for a file list along with object metadata.
In all cases, the user is notified about backup damage. If damage is detected, MSP360 Backup runs a full backup automatically. Possible damage to previous generations is also monitored.
Once a consistency check has been executed, the user is aware of any mismatches and can plan further actions to resolve possible issues.
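The comparison itself can be pictured roughly as follows. The function names and metadata fields here are assumptions for illustration only and are not the product's storage API.

```python
def check_consistency(expected_objects, list_storage_objects):
    """Compare expected backup objects with the listing returned by storage.

    `expected_objects` maps object names to (size, modified) metadata recorded
    at backup time; `list_storage_objects` returns the same kind of mapping
    from backup storage. Missing objects and size or modification-date
    mismatches are reported as problems.
    """
    problems = []
    actual = list_storage_objects()       # file list with metadata from storage

    for name, (size, modified) in expected_objects.items():
        if name not in actual:
            problems.append(f"missing object: {name}")
            continue
        actual_size, actual_modified = actual[name]
        if actual_size != size:
            problems.append(f"size mismatch: {name}")
        elif actual_modified != modified:
            problems.append(f"modification date mismatch: {name}")

    return problems                       # an empty list means the check passed
```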
This functionality is not supported for the legacy backup format
Mandatory Consistency Check
In the new backup format, the mandatory consistency check is the check of the current generation. The current generation consistency check is mandatory for all plans and is executed before any backup plan starts.
Full Consistency Check
A full consistency check covers all backup plan generations except the current generation, which is the subject of the mandatory consistency check. After a successful full consistency check, the user can be sure that the backed-up data is ready to be restored.
Changed Block Tracking for Image-Based Backups
Changed Block Tracking is an algorithm that reduces the amount of data read from the backup source during incremental image-based backups.
The changed block tracking algorithm is supported for NTFS file systems only
This functionality is not supported for the legacy backup format
Once the first full backup is made, each MFT (Master File Table) block is marked. On subsequent incremental backup runs, the MFT is read again and the blocks are compared. If a block was modified, the changed block tracking algorithm determines which files were modified and locates the disk clusters that contain these files' data.
Once all blocks are compared, only the modified blocks are sent for reading.
As a result, the changed block tracking algorithm reduces the amount of data processed when reading a disk, which significantly reduces the backup time.
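A simplified sketch of the comparison step is shown below, assuming MFT blocks are compared by content hash between runs; the actual agent works directly with the NTFS MFT, so these helper names are illustrative only.

```python
import hashlib

def find_changed_mft_blocks(read_mft_blocks, previous_hashes):
    """Return the indexes of MFT blocks that changed since the previous run.

    `previous_hashes` maps a block index to the hash recorded during the
    previous backup. Only blocks whose hashes differ are returned, so only
    the files they describe need to be read from disk.
    """
    changed, current_hashes = [], {}

    for index, block in enumerate(read_mft_blocks()):
        digest = hashlib.sha256(block).hexdigest()
        current_hashes[index] = digest
        if previous_hashes.get(index) != digest:
            changed.append(index)         # new or modified MFT block

    return changed, current_hashes        # store the hashes for the next run
```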
Restore on Restore Points
The Restore Point approach enables the guaranteed restoration of backup data: if a restore point is valid, the backed-up data set it belongs to can be restored.