Backup Format Overview

The backup format is based on the idea that backup data on the storage is always kept as a data container, regardless of the backup type. This approach keeps backup plans completely independent from each other. Every backup plan is a separate configuration that delivers backup data to a specific location on backup storage. In other words, the data of each backup plan is kept in its own directory on backup storage. This structure eliminates any interference between the data of different backup plans.

Backup data is divided into blocks, and a data block, rather than files and folders, is the main operating entity. As data is uploaded to backup storage, blocks are combined into data parts of varying size. The size of a data part depends on two factors: upload speed (a new data part is formed every 5 minutes) and the size limit (1 GB), whichever is reached first. Uploading backup data in parts allows the upload to resume after an interruption: only the unfinished data part is uploaded again. All parts that were uploaded before the connection breakdown are already on backup storage, so there is no need to upload them again.
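
As an illustration, here is a minimal Python sketch of the part-forming rule described above. The names and the way blocks arrive are assumptions for the example; only the two limits (5 minutes, 1 GB) come from the description:

```python
import time

# Limits taken from the description above; purely illustrative constants.
PART_SIZE_LIMIT = 1 * 1024 ** 3   # 1 GB
PART_TIME_LIMIT = 5 * 60          # 5 minutes, in seconds

def split_into_parts(blocks):
    """Group a stream of data blocks into upload parts.

    A part is closed as soon as it reaches the size limit or has been
    accumulating for the time limit, whichever comes first.
    """
    part, part_size, started = [], 0, time.monotonic()
    for block in blocks:
        part.append(block)
        part_size += len(block)
        if part_size >= PART_SIZE_LIMIT or time.monotonic() - started >= PART_TIME_LIMIT:
            yield part            # finished part: safe to upload
            part, part_size, started = [], 0, time.monotonic()
    if part:
        yield part                # trailing, possibly smaller part
```

If the connection breaks in the middle of a part, only that unfinished part is formed and uploaded again; every part yielded earlier is already on the storage.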

The key features of the backup format are:

Currently, the following backup types are supported only in the legacy backup format:

Terms and Definitions

This section explains several new terms and entities that are used throughout the rest of the documentation.

Bunch

A bunch is the representation of a backup plan in the main database. A bunch is linked to a directory in the database, which in turn is linked to a destination. The destination can be modified. A bunch is always unique within a cloud folder and plan type. This approach makes data deletion on cloud storage straightforward, since all of a plan's backup content is stored in one directory.

Generation

A generation is a complete, self-contained dataset that is sufficient for restoration. In other words, a generation is a sequence consisting of a full backup and its incremental backups for a specific backup plan.

Restore Point

A restore point is a partial data set for restore. A full-fledged restore point contains at least one file or directory. If a restore point does not contain any file or directory, it is considered empty but successful, and may still contain blocks used by subsequent runs. A valid restore point guarantees a correct restore of backed-up data. Conversely, an invalid restore point does not contain a complete data set for restore, but may still contain blocks that are used for restore from other restore points.
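
To make the relationship between the three entities concrete, here is a minimal Python sketch of the hierarchy; the class and field names are hypothetical, chosen for the example rather than taken from the application's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class RestorePoint:
    # Valid only if it holds a complete data set for restore; an
    # invalid or empty one may still contribute blocks to others.
    block_ids: list[str] = field(default_factory=list)
    is_valid: bool = False

@dataclass
class Generation:
    # A full backup plus the incremental backups that depend on it:
    # a self-contained dataset sufficient for restoration.
    restore_points: list[RestorePoint] = field(default_factory=list)

@dataclass
class Bunch:
    # One backup plan's data, kept in its own directory on storage.
    directory: str
    plan_type: str
    generations: list[Generation] = field(default_factory=list)
```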

Client-Side Deduplication

Deduplication is a technique that reuses identical data blocks across various processes instead of storing them more than once.

This functionality is not supported for legacy backup format

The new backup format uses client-side deduplication. This approach brings the following benefits:

  • Client-side deduplication is much faster than server-side deduplication
  • No dependence on internet connection issues
  • Reduced internet traffic
  • A server-side deduplication database grows constantly, which can significantly increase expenses; client-side deduplication uses local capacities only

Regardless of the backup type, the first backup is always a full backup. Since backed-up data changes over time, subsequent backup jobs are usually incremental and depend on the full backup as well as on previous incremental backups.

The backup format is designed for full backup plan independence, so each backup plan has its own deduplication database. Moreover, each generation of a backup plan also has its own deduplication database.

Once a backup plan runs, the application reads backup data in batches that are multiples of the block size. Once a block is read, it is compared against the deduplication database records. If the block is not found, it is delivered to storage and assigned a block ID, which becomes a new deduplication database record. Block scanning continues, and if a block matches any of the deduplication database records, the block with that ID is excluded from the backup.
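
The following is a minimal Python sketch of that loop. It assumes, for illustration only, that the block ID is derived from a content hash and that the deduplication database behaves like a local set of known IDs; the real application may assign IDs and store records differently:

```python
import hashlib

def backup_blocks(blocks, dedup_db, storage):
    """Upload only blocks not yet recorded in the local dedup database.

    dedup_db: a set-like local index of known block IDs (kept per plan
    and per generation, as described above).
    storage: any object with an upload(block_id, data) method.
    """
    for data in blocks:
        # Hypothetical ID scheme: hash of the block content.
        block_id = hashlib.sha256(data).hexdigest()
        if block_id in dedup_db:
            continue                   # known block: excluded from upload
        storage.upload(block_id, data)
        dedup_db.add(block_id)         # new deduplication database record
```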

This approach significantly decreases backup size, especially in virtual environments with a large number of identical blocks.

If a deduplication database is deleted or corrupted, a full backup is always executed

For the image-based backup type, the approach is slightly different. Instead of reading clusters, the Master File Table (MFT) is read, and then the mechanism checks which files have been modified. This dramatically decreases the amount of source data that has to be read.

Consistency Checks

A user who backs up data expects to be able to restore it, but this is not always the case if backup data is corrupted. Corruption can have many causes, ranging from technical problems with the cloud provider's service to industrial sabotage.

A consistency check is a technique that helps avoid data loss. By detecting discrepancies, it notifies the user if backup objects are missing from backup storage or if object sizes or modification dates do not match.

Once a consistency check runs, a request goes to the backup storage: the file list is requested along with its metadata.
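
As a rough illustration of the comparison step, here is a minimal Python sketch; the function name and the shape of the metadata are assumptions, not the application's actual interface:

```python
def consistency_check(expected, storage_listing):
    """Compare expected backup objects against the storage listing.

    expected, storage_listing: dicts mapping object name to a
    (size, modification_date) tuple. Returns a list of discrepancies.
    """
    problems = []
    for name, meta in expected.items():
        if name not in storage_listing:
            problems.append(f"missing object: {name}")   # object lost on storage
        elif storage_listing[name] != meta:
            problems.append(f"metadata mismatch: {name}")  # size or date differs
    return problems
```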

In all cases, the user is notified about backup damage. If damage is detected, MSP360 Backup runs a full backup automatically. Possible damage to previous generations is also monitored.

Once a consistency check is executed, the user is aware of any mismatches and can plan further actions to resolve possible issues.

This functionality is not supported for legacy backup format

Mandatory Consistency Check

In the new backup format, the mandatory consistency check is a check of the current generation. The current generation consistency check is mandatory for all plans and is executed before any backup plan starts.

Full Consistency Check

A full consistency check covers all backup plan generations except the current generation, which is the subject of the mandatory consistency check. After a successful full consistency check, the user can be sure that the backed-up data is ready to be restored.
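
The split between the two checks can be summarized in a short Python sketch; the function is hypothetical and assumes generations are ordered oldest to newest, with at least one present:

```python
def generations_to_check(generations, full_check):
    """Select which generations a consistency check covers.

    The mandatory check covers only the current (latest) generation;
    a full check covers all the other generations.
    """
    *previous, current = generations   # assumes a non-empty, ordered list
    return previous if full_check else [current]
```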

Changed Block Tracking for Image-Based Backups

Changed Block Tracking is an algorithm that reduces the amount of data read from the backup source during incremental image-based backups.

The changed block tracking algorithm is supported on NTFS file systems only

This functionality is not supported for legacy backup format

Once the first full backup is made, each MFT (Master File Table) block is marked. On subsequent incremental backup runs, the MFT is read again and the blocks are compared. If a block was modified, the changed block tracking algorithm determines which files were modified and locates the disk clusters that contain those files' data.

Once all blocks are compared, only the modified blocks are sent for reading.

As a result, the changed block tracking algorithm reduces the amount of data processed when reading a disk, which significantly reduces backup time.
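
Here is a minimal Python sketch of the comparison step described above. Using hashes as the per-block "marks" is an assumption made for the example; how the application actually marks MFT blocks is not specified here:

```python
import hashlib

def changed_blocks(mft_blocks, previous_marks):
    """Return indexes of MFT blocks that changed since the last run.

    mft_blocks: raw MFT block contents read on this run.
    previous_marks: the marks (here, hashes) recorded on the last run;
    assumed to cover the same number of blocks for simplicity.
    Only files referenced by changed blocks need their clusters read.
    """
    current_marks = [hashlib.sha256(b).hexdigest() for b in mft_blocks]
    changed = [i for i, (old, new) in enumerate(zip(previous_marks, current_marks))
               if old != new]
    return changed, current_marks   # keep current_marks for the next run
```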

Restore on Restore Points

The restore point approach enables guaranteed restoration of backup data. This means the following: if a restore point is valid, the backup dataset it represents can be restored.