Backup Format Overview
The new backup format is based on the idea that backup data on the storage is always kept as a data container, regardless of the backup type. This approach keeps backup plans completely independent from each other. Every backup plan is a separate configuration that delivers backup data to a specific location on backup storage. In other words, the data of each backup plan is kept in its own directory on backup storage. This structure prevents backup plans from interfering with each other's data.
Backup data is divided into blocks, and the data block, rather than the file or folder, is the main operating entity. As data is uploaded to backup storage, blocks are combined into data parts, whose size can vary. The size of a data part depends on two factors: the upload speed (a new data part is formed every 5 minutes) and the size limit (1 GB). Uploading backup data in parts makes it possible to continue an upload after a backup interruption: only the unfinished data part is uploaded again. All parts that were successfully uploaded before the network or other connection issue are already on backup storage and are valid for data restore.
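The part-forming rule above can be sketched as follows. This is a minimal illustration and not the product's implementation: the names `DataPart`, `split_into_parts`, and `parts_to_upload` are hypothetical, 1 GB is assumed to mean 1024^3 bytes, and each block is assumed to come with a known upload time in seconds.

```python
from dataclasses import dataclass, field

PART_SIZE_LIMIT = 1024 ** 3   # size limit from the text (1 GB, assumed to be 1024^3 bytes)
PART_TIME_LIMIT = 5 * 60      # a new part is formed every 5 minutes (in seconds)


@dataclass
class DataPart:
    """Hypothetical container for the blocks that make up one data part."""
    block_sizes: list = field(default_factory=list)
    size: int = 0


def split_into_parts(blocks, size_limit=PART_SIZE_LIMIT, time_limit=PART_TIME_LIMIT):
    """Group blocks into data parts.

    blocks: iterable of (size_in_bytes, upload_seconds) tuples (assumed input shape).
    A part is closed when adding a block would exceed the size limit,
    or when the time spent on the current part reaches the time limit.
    """
    parts, current, elapsed = [], DataPart(), 0.0
    for size, seconds in blocks:
        # Close the current part first if this block would not fit.
        if current.block_sizes and current.size + size > size_limit:
            parts.append(current)
            current, elapsed = DataPart(), 0.0
        current.block_sizes.append(size)
        current.size += size
        elapsed += seconds
        # Close the part once the time limit has been reached.
        if elapsed >= time_limit:
            parts.append(current)
            current, elapsed = DataPart(), 0.0
    if current.block_sizes:
        parts.append(current)
    return parts


def parts_to_upload(parts, completed_count):
    """After an interruption, only parts that were not fully uploaded are sent again."""
    return parts[completed_count:]
```

For example, if two of three parts were already on storage when the connection dropped, `parts_to_upload(parts, 2)` returns only the last, unfinished part.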
New backup format key features are:
- Grandfather-Father-Son (GFS) retention policy
- Immutability
- Client-Side Deduplication
- Consistency Checks
- Synthetic Backup for file-level, image-based, and VMware backups
- Changed Block Tracking for Image-Based Backup
- Restore on Restore Points
- The number of requests to storage is reduced significantly
- Uploading by data parts enables continued upload in case of network issues
- Support for any characters (emoji, 0xFFFF, etc.) and extra-long filenames
- Filename encryption out of the box (one password for generation)
- Real full backup for file-level backups
- Fast synchronization (reduced number of objects in backup storage)
- Plan configuration is always included in a backup
- Backup logs are backed up along with backup data
- Object size is now limited to 256TB regardless of the storage provider limitations
- Fast purge (reduced number of objects on backup storage, deletion of the whole generation database)
- Encryption password hint
- Faster backup and restore for a large number of small files
- Lower costs for a large number of small files (does not apply to S3 Standard-IA with its 128 KB limit)
Currently, the new backup format is supported for the following backup types:
- File backup
- Image-based backup
- VMware backup
- Hyper-V backup
Terms and Definitions
This section explains several new terms and entities that are needed to work with the new backup format.
Backup Plan
A backup plan defines the configuration of the backup data sent to a backup destination. The configuration includes the following parameters:
- Backup Data
- Encryption
- Compression
- Retention Policy
- Backup Plan Run Schedule
Bunch
A bunch is the representation of a backup plan in the main database. A bunch is linked to a directory in the database, which in turn is linked to a destination. The destination can be modified. A bunch is always unique within the cloud folder and the plan type. Because all backup content of a plan is stored in one directory, deleting its data from cloud storage is straightforward.
Generation
A generation is a complete, self-contained data set sufficient for data restoration. In other words, a generation consists of a full backup and the chain of incremental backups that follow it for a specific backup plan.
Restore Point
A restore point is a partial data set for restore. A full-fledged restore point contains at least one file or directory. A restore point that does not contain any file or directory is considered empty but still successful, and it can contain blocks for subsequent runs. A valid restore point guarantees a correct restore of the backed-up data. Conversely, an invalid restore point does not contain a complete data set for restore, but it can still contain blocks that are used for restore from other restore points.
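The relationship between generations and restore points can be modeled roughly as shown below. This is only a sketch of the definitions in this section: the class and field names (`RestorePoint`, `Generation`, `kind`, `items`, `valid`) are hypothetical and do not reflect the actual database schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RestorePoint:
    """Hypothetical model of one backup run inside a generation."""
    kind: str                 # "full" or "incremental"
    items: List[str]          # files and directories captured by this run
    valid: bool = True        # a valid restore point guarantees a correct restore

    @property
    def empty(self) -> bool:
        # A restore point without any file or directory is considered empty.
        return not self.items


@dataclass
class Generation:
    """A full backup followed by a chain of incremental backups."""
    restore_points: List[RestorePoint] = field(default_factory=list)

    def add(self, restore_point: RestorePoint) -> None:
        # A generation is self-contained, so it has to start with a full backup.
        if not self.restore_points and restore_point.kind != "full":
            raise ValueError("a generation must start with a full backup")
        self.restore_points.append(restore_point)

    def restorable_indexes(self) -> List[int]:
        # Only valid restore points can be selected for a restore.
        # Invalid ones are kept because other restore points may use their blocks.
        return [i for i, rp in enumerate(self.restore_points) if rp.valid]
```

In this model an empty restore point stays restorable as long as it is valid, while an invalid one remains in the generation but is not offered for restore.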