Article ID: m0042
Last Modified: 27-Sep-2024

Client-Side Deduplication

Deduplication is a technique that optimizes data storage by eliminating duplicate copies of repeating data blocks, making storage and backup processes more efficient.

This functionality is not supported for the legacy backup format.

The new backup format leverages client-side deduplication, which offers several advantages:

  • Faster performance compared to server-side deduplication.
  • Reduced dependency on an internet connection, minimizing connection issues.
  • Decreased internet traffic, as duplicate data is not transmitted repeatedly.
  • Lower costs, since server-side deduplication databases grow over time and can lead to significant storage expenses, while client-side deduplication uses local resources.

How It Works

  • Regardless of the backup type, the first backup is always a full backup.
  • Subsequent backups are incremental, storing only the changes since the last full or incremental backup. These backups rely on the previous full and incremental backups, as the sketch below illustrates.
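
To make that chain dependency concrete, here is a minimal Python sketch (not the product's actual implementation) that resolves which backups must be read to restore from a given point in the chain:

    # Minimal sketch: an incremental backup can only be restored together
    # with the most recent full backup and every incremental in between.
    def restore_chain(backups, target):
        """Return the slice of `backups` needed to restore index `target`.

        `backups` is an ordered list such as ["full", "inc", "inc"];
        the first backup is always a full one.
        """
        start = max(i for i in range(target + 1) if backups[i] == "full")
        return backups[start:target + 1]

    chain = ["full", "inc", "inc", "inc"]
    print(restore_chain(chain, 3))  # ['full', 'inc', 'inc', 'inc']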

The new backup format treats each backup plan independently, meaning that every backup plan has its own deduplication database. Furthermore, each generation of a backup plan also has its own deduplication database.
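
The following hypothetical illustration shows this isolation; the names and layout are assumptions for the sketch, not the product's actual schema. Each (plan, generation) pair owns an independent database, so blocks are only deduplicated within that plan and generation:

    # Hypothetical per-plan, per-generation deduplication databases.
    dedup_dbs = {}  # (plan_id, generation) -> {block_digest: block_id}

    def get_dedup_db(plan_id, generation):
        # Each plan/generation pair gets its own independent database;
        # blocks are never deduplicated across plans or across generations.
        return dedup_dbs.setdefault((plan_id, generation), {})

    # Different plans never share deduplication state.
    assert get_dedup_db("plan-A", 1) is not get_dedup_db("plan-B", 1)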

Here’s how client-side deduplication works (a code sketch follows the list):

  1. When a backup plan runs, the software reads the backup data in blocks of a predefined size.
  2. Each block is compared to the records in the deduplication database.
    • If a block is not found in the database, it is sent to storage and assigned a block ID, which is then added to the deduplication database.
    • If a block matches an existing record in the deduplication database, it is excluded from the backup, preventing redundant data from being stored. This process can significantly reduce backup sizes, especially in virtual environments where many identical data blocks exist.
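
The sketch below puts these steps together in Python. It assumes blocks are matched by a SHA-256 digest and that storage exposes a put(block_id, data) method; both are illustrative assumptions, since the article does not specify how blocks are compared or uploaded.

    import hashlib

    BLOCK_SIZE = 1024 * 1024  # assumed value; the real block size is predefined by the format

    def deduplicated_backup(stream, dedup_db, storage, next_block_id):
        """Back up `stream` block by block, skipping blocks already known.

        `dedup_db` maps a block's SHA-256 digest to its block ID;
        `storage.put(block_id, data)` uploads one block. Both interfaces
        are assumptions made for this sketch.
        """
        manifest = []  # ordered block IDs that reconstruct the stream
        while True:
            block = stream.read(BLOCK_SIZE)
            if not block:
                break
            digest = hashlib.sha256(block).hexdigest()
            block_id = dedup_db.get(digest)
            if block_id is None:
                # Unseen block: send it to storage and record its new ID (step 2, first case).
                block_id = next_block_id
                next_block_id += 1
                storage.put(block_id, block)
                dedup_db[digest] = block_id
            # Known blocks are referenced by ID only and never re-sent (step 2, second case).
            manifest.append(block_id)
        return manifest, next_block_id

Because a duplicate block is recognized locally before anything is transmitted, it costs no upload bandwidth, which is where the traffic and cost savings described above come from.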

If the deduplication database is deleted or corrupted, a full backup is required and is forced automatically.

For image-based backups, the process is slightly different:

  • Instead of scanning data block by block, the software reads the Master File Table (MFT) to identify which files have been modified.
  • This approach minimizes the time needed to read source data, further increasing efficiency (see the sketch after this list).
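
As a rough, portable approximation of that idea (a real image-based backup parses NTFS Master File Table records directly rather than walking the directory tree), the sketch below selects files changed since the last backup using filesystem metadata alone, without reading file contents:

    import os

    def changed_files(root, last_backup_time):
        """List files modified after `last_backup_time` (a Unix timestamp).

        Stand-in for an MFT scan: only metadata is read here, so the
        contents of unchanged files cost no I/O at all.
        """
        changed = []
        for dirpath, _dirs, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                if os.stat(path).st_mtime > last_backup_time:
                    changed.append(path)
        return changed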

This system ensures faster and more cost-effective backups by utilizing local resources and reducing the amount of data that needs to be transmitted and stored.
