Avamar utility nodes run three main processes. The mcs process is the Management Console Server and is the main component of the Avamar system; it manages, maintains, and schedules backup jobs. The ems process, or Enterprise Management Server, presents the management console, and cron is the standard Linux scheduler. Replication is scheduled using cron rather than mcs.
Storage nodes run one main process, gsan, which takes in data, writes it to disk, and presents it for restore.
An Avamar server can have 27 simultaneous client connections per node, always reserving one for restore.
On the client being backed up, the avagent process listens for backup requests and communicates with the utility node and with avtar. Avtar is the backup process; it takes the command from the agent, runs the backup, and communicates directly with the storage node. There is also a small avscc process that runs on the client, producing a taskbar icon and giving Windows clients a small set of functions.
There are two types of client plug-ins for Avamar: file system plug-ins, which scan file systems and back them up, and database plug-ins, which handle databases directly.
Avamar utility nodes listen on several TCP ports. Port 8443 is the web interface, and port 7778 is where the admin console, both GUI and CLI, connects to the utility node. Avagent communicates on ports 28001/28002. The backup process, avtar, connects to the gsan service on the storage node over port 27000 for normal backups and over port 29000 for encrypted backups. Utility nodes also communicate outbound to the admin console on port 7778 for alerting. You can also run SQL queries against the MCS database (MCDB) on port 5555.
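If you want to sanity-check those listeners from another machine, a quick script along these lines will do it (the host name is a placeholder, and on a multi-node grid the gsan ports 27000/29000 are answered by the storage nodes rather than the utility node):

```python
import socket

# Hypothetical host name; replace with your own utility node.
UTILITY_NODE = "avamar-util.example.com"

# Ports described above: web UI, admin console, avagent, gsan, MCDB.
PORTS = {
    8443: "web interface",
    7778: "admin console (GUI/CLI) and alerting",
    28001: "avagent",
    28002: "avagent",
    27000: "gsan (normal backup)",
    29000: "gsan (encrypted backup)",
    5555: "MCDB (SQL queries)",
}

for port, role in PORTS.items():
    try:
        # A plain TCP connect is enough to show the listener is up.
        with socket.create_connection((UTILITY_NODE, port), timeout=3):
            print(f"{port:>5} open   - {role}")
    except OSError:
        print(f"{port:>5} closed - {role}")
```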
Deduplication on an Avamar system is a little different than on a Data Domain. Not only does it store unique data only once, it also sends that data across the network only once. When the backup request is accepted from the utility node, the avtar process scans the file cache for new or updated files and skips anything that has not been modified, saving time. Once it identifies new or changed data, it runs that data through a process called "sticky-byte factoring."
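The file cache format itself is internal to avtar, but the skip-unmodified idea looks roughly like this sketch, where the cache layout (path mapped to size and mtime) is just an assumption for illustration:

```python
import os

def changed_files(paths, file_cache):
    """Yield only files whose size or mtime differs from the cached record.

    `file_cache` maps path -> (size, mtime) from the previous backup; this
    layout is an assumption for the sketch, not avtar's real cache format.
    """
    for path in paths:
        st = os.stat(path)
        if file_cache.get(path) == (st.st_size, st.st_mtime):
            continue  # unchanged since the last backup: skip it entirely
        yield path
        file_cache[path] = (st.st_size, st.st_mtime)
```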
Sticky-byte factoring breaks the file into objects (or chunks) of data between 1 and 64 KB and always produces the same result on unchanged data. If data has changed, it quickly locates the changes and resynchronizes them with the unchanged data. The average object size is 24 KB.
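Avamar's sticky-byte algorithm is proprietary, but a generic content-defined chunker shows the property described above: chunk boundaries are derived from the data itself within a 1-64 KB window, so unchanged regions always chunk the same way and the chunker resynchronizes shortly after a change. The window size, rolling hash, and mask below are illustrative choices, not Avamar's.

```python
WINDOW = 48                   # sliding-window size for the rolling hash (illustrative)
MIN_CHUNK = 1 * 1024          # 1 KB floor, per the text
MAX_CHUNK = 64 * 1024         # 64 KB ceiling, per the text
MASK = (1 << 14) - 1          # cut when the low bits are zero; mask width sets the average size
BASE, MOD = 257, 1 << 32
POW = pow(BASE, WINDOW, MOD)  # factor for dropping the byte that leaves the window

def chunk(data: bytes):
    """Split data into variable-size chunks whose boundaries are chosen by a
    rolling hash of the last WINDOW bytes, so identical regions always produce
    identical chunks."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = (h * BASE + b) % MOD                    # new byte enters the window
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * POW) % MOD  # oldest byte leaves the window
        length = i - start + 1
        if length >= MAX_CHUNK or (length >= MIN_CHUNK and (h & MASK) == 0):
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```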
After factoring takes place, the data is compressed using standard compression, reducing its size by 30-50%, so chunks are typically 12-16 KB. (NOTE: Office 2007 files are actually compressed groups of files. Avamar 4.0 and later will uncompress Office files for deduplication, but this feature is not backward compatible with pre-4.0 clients, which will not restore these files correctly.) The compressed chunks are then run through a hashing process to produce "atomic hashes," which are compared against the local hash cache to see whether they have already been sent to the Avamar server. If a hash is not in the local cache, it is sent to Avamar, which checks its own hash cache to see whether it has already stored the chunk. If it has, it saves only the hash; if the data is new and unique, both the data and the hash are sent and stored on the Avamar server.
SHA-1 hashes, which are 20 bytes in size, are created for all files; the hashes are computed from the compressed data.
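Putting the compress-hash-lookup steps together, a rough sketch might look like this; the two in-memory sets stand in for the client-side hash cache and the server's index and are purely illustrative:

```python
import hashlib
import zlib

def process_chunk(raw_chunk: bytes, local_hash_cache: set, server_hashes: set):
    """Compress a chunk, hash the compressed bytes, and decide what to send.

    `local_hash_cache` stands in for the client-side hash cache and
    `server_hashes` for the server's index; both are assumptions here.
    Returns a tuple describing what crosses the network.
    """
    compressed = zlib.compress(raw_chunk)             # standard compression
    atomic_hash = hashlib.sha1(compressed).digest()   # 20-byte SHA-1 of the compressed data

    if atomic_hash in local_hash_cache:
        return ("nothing sent", atomic_hash)          # already sent to the server earlier
    local_hash_cache.add(atomic_hash)

    if atomic_hash in server_hashes:
        return ("hash only", atomic_hash)             # server already stores this chunk
    server_hashes.add(atomic_hash)
    return ("hash + data", atomic_hash, compressed)   # new, unique data
```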
Data is stored with the hash, and part of the hash value is used as an address to locate the data.
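A toy content-addressed store makes that idea concrete; the stripe count and the use of the first two bytes of the hash as an address are assumptions for the example, not Avamar internals:

```python
import hashlib

NUM_STRIPES = 64  # illustrative stripe count, not an Avamar constant

def stripe_for(atomic_hash: bytes) -> int:
    """Derive a storage address from part of the hash value itself."""
    return int.from_bytes(atomic_hash[:2], "big") % NUM_STRIPES

store = {i: {} for i in range(NUM_STRIPES)}  # stripe -> {hash: data}

def put(data: bytes) -> bytes:
    h = hashlib.sha1(data).digest()
    store[stripe_for(h)][h] = data   # data is stored together with its hash
    return h

def get(h: bytes) -> bytes:
    return store[stripe_for(h)][h]   # the hash alone is enough to locate the data
```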
Data storage follows a hierarchy of hashes and composite hashes. The data is run through variable-block deduplication to create "atomics," which are segments of data reduced in size using compression. The compressed objects are then hashed, and the hashes are grouped into composites. The composites are then hashed and grouped into composite-composites, which are hashed to create the root hash. The root hash identifies the backup job.
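The roll-up can be sketched as a simple hash tree; the group size and the fixed two intermediate levels here are illustrative, but the idea is that each level hashes the concatenated hashes of the level below it, so the root hash identifies the entire backup:

```python
import hashlib

def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def build_root(atomic_hashes, group_size=16):
    """Roll a list of atomic hashes up into a single root hash.

    The group size and the two fixed levels are assumptions for the sketch;
    the point is that changing any atomic changes every hash above it.
    """
    def roll_up(hashes):
        return [sha1(b"".join(hashes[i:i + group_size]))
                for i in range(0, len(hashes), group_size)]

    composites = roll_up(atomic_hashes)           # composites of atomic hashes
    composite_composites = roll_up(composites)    # composites of composites
    return sha1(b"".join(composite_composites))   # root hash identifies the backup job
```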
Data objects are stored in atomic stripes along with parity information. Composites and composite-composites are stored in a different stripe, and root hashes are stored in an "accounts" stripe. RAIN systems also store RAIN parity information in yet another separate stripe.