Data Domain is another component in EMC's backup and recovery portfolio. The Data Domain appliance is mainly disk storage with deduplication built in.
One differentiating factor is that DD deduplicates in-line, relying on CPU rather than disk, so that data is written to disk only once. EMC calls this SISL, or Stream-Informed Segment Layout. Compare this with a traditional dedupe product that uses post-process deduplication: backup data is written to disk first and deduplicated afterward. Post-process may seem more efficient, but it works on data already on disk, so spindle speed becomes the limiting factor. It also requires more disk, because data is written and then re-written after being processed.
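To make the in-line idea concrete, here is a minimal sketch in Python. It is not EMC's SISL implementation; the class name, the fixed 8-byte segment size, and the use of SHA-256 fingerprints are all assumptions chosen for illustration. The point is only that each segment is fingerprinted in memory as it arrives, and a segment already known to the store is never written a second time.

```python
import hashlib

# Assumption: tiny fixed-size segments, purely for illustration.
SEGMENT_SIZE = 8


class InlineDedupStore:
    """Toy in-line deduplicating store (hypothetical, not Data Domain code)."""

    def __init__(self):
        self.segments = {}       # fingerprint -> segment bytes (stands in for disk)
        self.files = {}          # file name -> ordered list of fingerprints
        self.bytes_ingested = 0  # everything the backup server sent
        self.bytes_written = 0   # what actually reached "disk"

    def write(self, name, data):
        recipe = []
        for i in range(0, len(data), SEGMENT_SIZE):
            segment = data[i:i + SEGMENT_SIZE]
            fp = hashlib.sha256(segment).hexdigest()
            self.bytes_ingested += len(segment)
            # The duplicate check happens before any write, so a
            # repeated segment costs CPU but no additional disk.
            if fp not in self.segments:
                self.segments[fp] = segment
                self.bytes_written += len(segment)
            recipe.append(fp)
        self.files[name] = recipe

    def read(self, name):
        # Rebuild the file from its list of fingerprints.
        return b"".join(self.segments[fp] for fp in self.files[name])


store = InlineDedupStore()
monday = b"AAAAAAAABBBBBBBBCCCCCCCC"
tuesday = b"AAAAAAAABBBBBBBBDDDDDDDD"  # only the last segment changed
store.write("monday.bak", monday)
store.write("tuesday.bak", tuesday)
print(store.bytes_ingested, store.bytes_written)
```

In this toy run the second backup shares two of its three segments with the first, so only the changed segment is written. A post-process design would have written all of the second backup to disk before discovering the overlap.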
Ingest speed is the rate at which data can be brought into a backup system.
Data Domain works with and is certified with most backup application vendors because it works over standard transports such as Ethernet or Fibre Channel. It simply presents a CIFS or NFS share to the backup server and is used like any other backup-to-disk target, except that DD deduplicates data inline. It can also emulate a tape library (appearing as a VTL, or Virtual Tape Library) so that it can be used with existing backup software without reconfiguring existing backup jobs. They simply run faster.
Another component of Data Domain is the DD Boost software, which is installed on the client and performs the deduplication there, before data crosses the network. This is advantageous when backing up remote data centers or branch offices where bandwidth is at a premium.
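The bandwidth saving from client-side deduplication can be sketched as follows. This is not the DD Boost protocol or API; the `Appliance` class, its `missing` and `store` methods, and the fixed 8-byte segments are hypothetical stand-ins. The client fingerprints its segments locally, asks the appliance which fingerprints it lacks, and sends only those segments.

```python
import hashlib


def fingerprint(segment):
    return hashlib.sha256(segment).hexdigest()


def split(data, size=8):
    # Assumption: fixed-size segments, purely for illustration.
    return [data[i:i + size] for i in range(0, len(data), size)]


class Appliance:
    """Hypothetical stand-in for the remote deduplicating target."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes

    def missing(self, fingerprints):
        # Small request: fingerprints only, no payload.
        return {fp for fp in fingerprints if fp not in self.segments}

    def store(self, new_segments):
        self.segments.update(new_segments)


def client_backup(data, appliance):
    """Back up data and return the payload bytes sent over the wire."""
    by_fp = {fingerprint(s): s for s in split(data)}
    needed = appliance.missing(by_fp.keys())
    payload = {fp: by_fp[fp] for fp in needed}
    appliance.store(payload)
    return sum(len(s) for s in payload.values())


appliance = Appliance()
first = client_backup(b"AAAAAAAABBBBBBBBCCCCCCCC", appliance)
second = client_backup(b"AAAAAAAABBBBBBBBDDDDDDDD", appliance)
print(first, second)
```

The first backup sends all three segments; the second sends only the one segment the appliance has not seen. The sketch ignores the size of the fingerprint exchange itself, which is small relative to the segment data it avoids sending.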
DD also uses what EMC calls the "Data Invulnerability Architecture." In addition to using RAID 6 to protect against disk failure, DD performs many checksums against the data to ensure it is sound. This matters for deduplicated data, where a single corrupt block can have a far-reaching effect, since that block may be used in several files. DD detects and repairs such data corruption itself.
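The general checksum-and-repair idea can be illustrated with a small sketch. This is not how the Data Invulnerability Architecture is implemented; the `CheckedStore` class is hypothetical, CRC-32 is an assumed checksum, and a simple mirror copy stands in for the redundancy that RAID 6 parity would provide. Each segment is stored with a checksum, verified on every read, and rewritten from the redundant copy if verification fails.

```python
import zlib


class CheckedStore:
    """Toy store that verifies a checksum on read (hypothetical)."""

    def __init__(self):
        self.primary = {}  # key -> (crc, bytearray of segment data)
        self.mirror = {}   # key -> redundant copy (stand-in for RAID parity)
        self.repairs = 0

    def put(self, key, data):
        crc = zlib.crc32(data)
        self.primary[key] = (crc, bytearray(data))
        self.mirror[key] = bytes(data)

    def get(self, key):
        crc, data = self.primary[key]
        if zlib.crc32(data) != crc:
            # Primary copy is corrupt: fall back to the redundant copy,
            # but only trust it if it matches the stored checksum.
            good = self.mirror[key]
            if zlib.crc32(good) != crc:
                raise IOError("segment %r is unrecoverable" % key)
            self.primary[key] = (crc, bytearray(good))
            self.repairs += 1
            data = self.primary[key][1]
        return bytes(data)


store = CheckedStore()
store.put("seg1", b"backup data")

# Simulate silent corruption of the primary copy.
_, raw = store.primary["seg1"]
raw[0] ^= 0xFF

recovered = store.get("seg1")
print(recovered, store.repairs)
```

Because one deduplicated segment may be referenced by many files, repairing it once in the store fixes every file that points to it.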