SISL stands for Stream-Informed Segment Layout. This is the target-based inline deduplication process that focuses on speed rather than 100% accuracy, although SISL does identify about 99% of duplicate data segments. There is minimal disk access because the process takes place in RAM, and a faster CPU increases SISL efficiency. You don't spend any time waiting on spindle speed.
The process is as follows:
- Streams data into RAM
- The stream is segmented into 4-12 KB chunks
- Fingerprints are created from the segments
- Segments are checked for uniqueness using two functions
  - Summary vector - an in-RAM summary of the segment fingerprints already stored on disk, used to quickly rule out segments that are definitely new
  - Segment locality - because data follows predictable patterns, DD keeps neighboring segments and their fingerprints together on disk, so a single read brings the fingerprints most likely to be needed next into the fingerprint list in RAM
- Store unique segments
- End-to-end Verification - after writing new segments to disk, DD OS reads the data back to ensure it can reassemble the file, then computes and verifies the checksum
- Fault Avoidance and Containment
  - New data never overwrites existing data. Regenerating data is fast because data is never missing due to an overwrite. Deleted data is removed during disk cleanup.
  - DD uses NVRAM to buffer all data not yet written to disk
- Fault Detection and Healing - DD OS uses the logging file system and RAID6 disks to verify the integrity of the data on every read. It computes and verifies the checksum each time data is accessed
- File System Recovery - because data is never overwritten, there are no block maps or reference counts required. DD OS merely needs to find the head of the log file and it can rebuild the filesystem. Once it knows where the log file is, it scans the log and rebuilds data using RAID6 where necessary
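
The deduplication steps at the top of the list can be sketched in Python. This is a minimal illustration, not DD OS code. It assumes the summary vector behaves like a Bloom-filter-style bit array, models segment locality as a cache that loads the fingerprints of neighboring segments after an index hit, and uses a Python dict in place of the on-disk fingerprint index. The names segment, SummaryVector, DedupStore, BOUNDARY_MASK and CONTAINER_SEGMENTS are hypothetical, as are the use of SHA-1 for fingerprints and the toy rolling hash that picks segment boundaries.

```python
import hashlib

MIN_SEG, MAX_SEG = 4 * 1024, 12 * 1024  # 4-12 KB segment bounds from the text
BOUNDARY_MASK = 0x0FFF       # hypothetical: cut when the low 12 hash bits are zero
CONTAINER_SEGMENTS = 64      # hypothetical number of neighbors loaded together


def segment(data):
    """Yield variable-length segments using a toy content-defined boundary test."""
    start, rolling = 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_SEG and rolling & BOUNDARY_MASK == 0) or length == MAX_SEG:
            yield data[start:i + 1]
            start, rolling = i + 1, 0
    if start < len(data):
        yield data[start:]  # final segment may be shorter than MIN_SEG


class SummaryVector:
    """Bit array answering 'definitely new' or 'possibly stored' (assumed design)."""

    def __init__(self, bits=1 << 16, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.array = bytearray(bits // 8)

    def _positions(self, fp):
        for i in range(self.hashes):
            digest = hashlib.sha256(bytes([i]) + fp).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, fp):
        for pos in self._positions(fp):
            self.array[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fp):
        return all(self.array[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(fp))


class DedupStore:
    def __init__(self):
        self.summary = SummaryVector()
        self.index = {}         # fingerprint -> position; stands in for the disk index
        self.segments = []      # unique segments, kept in stream order
        self.fingerprints = []  # fingerprint of each stored segment, same order
        self.cache = set()      # fingerprints currently held in RAM
        self.index_lookups = 0  # how often the "disk" index was consulted

    def _is_duplicate(self, fp):
        if fp in self.cache:
            return True                     # answered from RAM
        if not self.summary.might_contain(fp):
            return False                    # definitely new, no index lookup
        self.index_lookups += 1
        pos = self.index.get(fp)
        if pos is None:
            return False                    # summary vector false positive
        # Segment locality: pull the neighbors' fingerprints into the cache.
        first = pos - pos % CONTAINER_SEGMENTS
        self.cache.update(self.fingerprints[first:first + CONTAINER_SEGMENTS])
        return True

    def write(self, data):
        """Store only unique segments; return the fingerprint list for the stream."""
        recipe = []
        for seg in segment(data):
            fp = hashlib.sha1(seg).digest()
            recipe.append(fp)
            if not self._is_duplicate(fp):
                self.index[fp] = len(self.segments)
                self.segments.append(seg)
                self.fingerprints.append(fp)
                self.summary.add(fp)
        return recipe

    def read(self, recipe):
        """Reassemble a stream from its fingerprint list."""
        return b"".join(self.segments[self.index[fp]] for fp in recipe)
```

Writing the same stream to a DedupStore twice should leave the number of stored segments unchanged, and on the second pass most duplicate checks are answered from the cache instead of the index, which is the effect the summary vector and segment locality are meant to achieve.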
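
The integrity features in the second half of the list rest on an append-only log with checksums. The Python sketch below illustrates three of those ideas under simplifying assumptions: a read-back check right after each write, a checksum check on every read, and recovery by scanning the log. RAID 6 reconstruction and NVRAM buffering are not modeled. AppendOnlyLog and its methods are hypothetical names, and CRC-32 is used only for brevity, since the notes above do not say which checksum DD OS uses.

```python
import zlib


class AppendOnlyLog:
    """Toy log-structured store: records are appended, never overwritten."""

    def __init__(self):
        self.records = []  # each record is (key, payload, checksum)

    def _verified_payload(self, pos):
        key, payload, checksum = self.records[pos]
        if zlib.crc32(payload) != checksum:
            raise IOError(f"checksum mismatch in record {pos}")
        return payload

    def append(self, key, payload):
        """Append a new record, then read it back and verify it."""
        self.records.append((key, payload, zlib.crc32(payload)))
        pos = len(self.records) - 1
        self._verified_payload(pos)  # end-to-end verification after the write
        return pos

    def read(self, pos):
        """Verify the checksum every time data is accessed."""
        return self._verified_payload(pos)

    def rebuild_index(self):
        """Recover the key -> position map by scanning the log up to its head."""
        index = {}
        for pos, (key, _payload, _checksum) in enumerate(self.records):
            index[key] = pos  # the latest record for a key wins
        return index
```

Because an update is a new record and the old one is left in place, the index can always be rebuilt from the log alone, which is why no separate block map needs to survive a crash in this model.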