The Data Domain file system starts with the ddvar directory. In NFS it can be shared as /ddvar and in CIFS it can be shared as \ddvar. You can't rename or delete it because it holds the operating environment binaries and configuration.
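As a rough illustration (exact syntax varies by DD OS release, and the client list and subnet below are just placeholders), exporting /ddvar from the CLI looks something like this:

    nfs add /ddvar 192.168.1.0/24
    cifs share create ddvar path /ddvar clients "*"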
The data directory is mounted at /data/col1/. "Col1" is short for "collection #1," and there is only one collection at this point. Inside /data/col1 you will find directories known as "MTrees." The default MTree is "backup," and until recently all backup data was stored there. With later releases, however, you can create and administer additional MTrees under /data/col1.
Each MTree can be individually managed with its own policies and permissions. The maximum number of MTrees is 100; however, beyond 14 there is performance degradation in environments with many concurrent read/write streams per MTree. Best practice is to keep the number of MTrees to 14 or fewer and to aggregate operations into a single MTree where possible. You can mount MTrees like any other CIFS/NFS share, though mixing protocols on the same MTree is not recommended. VTL and DD Boost clients create their own MTrees as well.
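For example, creating and listing MTrees from the CLI looks roughly like this (the MTree name "finance" is just a placeholder):

    mtree create /data/col1/finance
    mtree list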
MTrees are configurable with quotas. Hard quotas stop any further data from being written, while soft quotas only generate warnings and alerts. Quotas can be used in charge-back scenarios, and administrators can be alerted when thresholds are reached to ease administration.
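As a hedged sketch (the MTree name and limit values are placeholders, and the exact quota command syntax differs between DD OS releases), setting and checking quotas on an MTree looks something like:

    quota capacity set mtree /data/col1/finance soft-limit 1 TiB hard-limit 2 TiB
    quota capacity show mtree /data/col1/finance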
Data Domain snapshots capture the data at a point in time (PIT). A DD snapshot copies the pointers from the production data, and new data is then written with new pointers, leaving the old blocks in place for the snapshot. Snapshots are kept in /backup/.snapshot. The maximum number of snapshots is 750, with warnings at 90% (675-749) and alerts at 750.
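A minimal sketch of working with snapshots from the CLI (the snapshot name, MTree path, and retention period are placeholders, and syntax may vary by release):

    snapshot create daily-2300 mtree /data/col1/backup retention 7days
    snapshot list mtree /data/col1/backup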
If the DD is a collection replication source, snapshots are replicated; if it is a directory replication source, they are not. Another benefit of snapshots is that no additional license is required.
Fastcopy is a feature often used to revert from a snapshot. It copies the full set of pointers, but where a snapshot is read-only, a fastcopy copy is read/write.
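For example, reverting data from a snapshot with fastcopy looks roughly like this (the snapshot name and directory are placeholders):

    filesys fastcopy source /data/col1/backup/.snapshot/daily-2300/vm01 destination /data/col1/backup/vm01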
Data Sanitization is essentially electronic shredding of data. It is a filesys clean operation that includes overwriting free space, metadata, and reference data. Sanitization deletes only the unique segments of a file; if blocks are shared with other files in the deduplicated data, those segments are left intact to maintain the integrity of the other files. This feature is often required by organizations subject to DoD/NIST compliance requirements. Sanitization can only be run from the command line, and it is licensed along with the Retention Lock license.
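From the CLI, a sanitization run is started and monitored with commands along these lines (from memory; names may differ slightly by release):

    system sanitize start
    system sanitize watch
    system sanitize status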
Filesystem cleaning is a process that, by default, runs every week on Tuesday at 6:00 AM. When data is deleted, it remains on disk until the retention lock period expires or until the cleaning process runs; space on the DD is not recovered until the filesystem is cleaned. Cleaning self-throttles to 50% CPU by default, and this is configurable by an administrator.
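As a sketch (the day, time, and throttle value below are just examples), the schedule and throttle can be checked and adjusted from the CLI with:

    filesys clean show schedule
    filesys clean set schedule Tue 0600
    filesys clean set throttle 50
    filesys clean start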
During cleaning, live data is copied forward to a free container, after which unclaimed data is deleted and, finally, the original container is deleted.
Monitoring filesystem usage depends largely on the rate of change within your data set. If your data changes often and your retention period is long, your data on the DD will grow rapidly. The size and number of data sets affect growth on the DD, as does "compressibility": data that is already compressed or deduplicated will not compress well, so your DD data set will grow faster. Modifying the retention period is one way to alleviate this problem. Usage can be graphed over 7 to 120 days to monitor and maintain utilization.
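To keep an eye on this from the CLI, something like the following shows space usage and compression over a recent window (the 7-day window is just an example):

    filesys show space
    filesys show compression last 7 days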