Wednesday, December 4, 2013

Data Domain Solutions Design Overview

To review some key features and benefits of Data Domain:

Data Invulnerability Architecture (DIA) protects data from hardware and software failure.  DD calls deduplication "compression", which removes redundant data.  It will also replicate data from one DD to another DD.

DD uses Variable Length Segment deduplication, giving maximum efficiency to the deduplication process.  It performs dedupe inline rather than post-process, which reduces disk utilization and the number of writes.  It uses less WAN bandwidth by sending only deduped data, and its Virtual Tape Library feature provides easy migration from legacy tape systems.

SISL, or Stream-informed Segment Layout is the technology used to deduplicate data.  See this post for more info on SISL.

Data Domain has 3 types of data streams.  They are write, read and replication.  Max number of streams is dictated by model.

Data Domain can replicate data at four levels.  Collection replication is a full system replication (like system mirroring).  Directory replication can be used to replicate to multiple targets or to replicate specific directories within the collection.  MTree replication uses snapshot technology and is faster than directory replication.  Pool replication handles VTL files and tapes.

A replication context is a pair - one source and one target.  Replication takes place over port 2051, using Data Domain's proprietary replication protocol.

Data Domain has an Extended Retention feature.  It is available only on the DD860 and DD990 models and is claimed to be the industry's first online, long-term archive solution.  Archiving gets the most efficiency if at least 2 years will be archived, and systems should be sized for 2-3 years of archiving.  When designing an extended retention system, make sure that the daily change rate is low (less than 10%) and that the long-term change rate is also low (less than 8% for best results).  A lower growth rate is also best for archiving, and we need to take the archive frequency (daily, weekly, monthly, yearly, etc) into account.

Extended Retention Licensing must be applied to unused shelves only, as partial shelf extended retention is not available.  Also, once defined as active or archive tier those shelves cannot be changed.  Each shelf in the system requires a capacity license, and there are different licenses for active and archive tiers.  There is also an Expanded Storage license required if more than the controller is required.  For the DD860 that is 64TB of raw storage, and the DD990 has 360TB of raw storage.

The Extended Retention replication type depends on the data to be protected.  When used as a source, the DD Extended Retention supports collection, MTree and DD Boost replication, and when used as a destination it will replicate those as well as directories.

DPA Design Considerations and Best Practices

There is no direct upgrade from DPA 5.x to DPA 6.  In order to upgrade to version 6.0, the DPA server must be running DPA 5.5.1 before migrating to 6.  There is a standalone tool that will migrate from 5->6, but this needs to be run on a separate server.  Do not install version 6 on a server running version 5.x due to port conflicts.  During migration, both the existing and new servers need to be online, and the 5.x server remains fully functional during migration.  Components that will not migrate are any analysis rules and old maintenance plans.  These must be recreated.

DPA should only be installed on a dedicated server due to memory utilization and page swapping.

Plan to include maintenance schedules to age off data and to back up regularly.  This backup can be exported or taken off to a flat file.

Be careful when implementing a proof-of-concept, as these systems are often over-extended and will likely end up in production.

There are command-line tools for exporting and importing the data.  Agents can be installed silently using SMS and an answer file.  Hosts can also be imported using a CSV file.

DPA Architecture

Data Protection Advisor is made of 3 components.  They are:
  1. Application Server, which is a JBoss Application that replaces the Controller, Reporter, Listener, Publisher and Analysis Engine
  2. The Datastore, which consolidates the Config, Datamine and Illuminator
  3. Agent - client software.  Version 6 supports agents from both version 5.5 or newer and version 6.  It sends data as XML
The REST API is the common way-in for all data.  The REST API talks to the agents and the datastore, and executes service commands which in turn work on the services.

The message bus manages its own persistence.  It uses a series of journal files, each of which fits in a single disk cylinder to maximize performance.

The datastore is a Postgres 9.1 database that communicates on port 9003 and supports up to 100 simultaneous connections.  Root user access is prohibited, and all access to the database is done using the "apollosuperuser" account.  Files are located in /opt/emc/dpa/services/datastore, and logs rotate automatically at 250 MB and/or daily.

The Application server and datastore can be split into two separate servers for performance.  The datastore cannot reside on a CIFS or NFS share due to latency.  An agent will be installed on both servers.  They can run on Windows 2003-2008R 64-bit, Solaris 10-11 SPARC (64-bit), Red Hat 5 or 6 64-bit, and SuSE 9-11 on both x86 and 64-bit platforms.  They each require 8 GB RAM, and in a combined solution for small environments 4 CPUs are required.  In a split environment, 2 CPUs per server are required.

Performance tuning is largely done automatically.  It's recommended to put everything on fast drives, such as Ultra-320 SCSI or SATA drives in a RAID1+0.  RAID5 can be used if there are more than 6 disks, and increasing RAM removes dependency on disk cache and improves performance.

Data can be collected remotely without an agent in some situations.  Using WMI, an agent can collect data on multiple Windows hosts that do not have an agent installed.  System performance data can not be collected on *NIX boxes without an agent installed.

DPA uses port 3741 for communication with agents, port 9002 for the HTTPS front-end and port 9003 for the Postgres database.
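
As a rough illustration, the ports above could be checked for firewall problems with a small Python sketch.  The port numbers come from the notes above; the dictionary keys and the function name are made up for this example and are not part of DPA.

```python
import socket

# Ports as listed in the notes above; the key names are illustrative only.
DPA_PORTS = {
    "agent": 3741,
    "https_frontend": 9002,
    "datastore": 9003,
}


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Calling port_is_open("dpa.example.com", DPA_PORTS["agent"]) from an agent host (the host name is a placeholder) would show whether something between the two machines is blocking the agent port.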

Data Protection Advisor Solutions Design - Licensing

Data Protection Advisor is a monitoring and reporting tool made by EMC.  The basic questions for design include:
  1. What are the licensing considerations and requirements?
  2. How big should the DPA server be?
  3. How many agents are required to deploy?
  4. Are there any firewall considerations?
  5. What type of server architecture should be utilized?  Split or single server?
When designing a DPA solution, it's beneficial to know the licensing requirements and options.  They are:
  • DPA Enterprise - required for all deployments, allows login
  • DPA for Backup - most common.  Allows monitoring of backup applications, databases (both on clients and the DPA), IP switches, FC switches, tape libraries, VTL, etc.  It is licensed by the number of clients except for Avamar, which is licensed by capacity for consistency.  Client licenses come in packs of 20.  Networker can also be licensed by capacity, but client and capacity licenses can not be mixed.
  • DPA for Virtualization - for VMware environments, monitors configuration, status, SLA and success of ESX/ESXi servers.  It will identify any VMs not being backed up, and is licensed by number of host machines
  • DPA for Symmetrix - licensed by total local and replicated capacity
  • DPA for VNX Block - licensed by number and model of VNX arrays in use
  • DPA for RecoverPoint - based on the amount of data or by the model
DPA is sized by using the tool available on support.emc.com.  Always be sure to use the latest version.

Data Deduplication Sizing

While there are several products that can be used to size an EMC deduplication system for Avamar or Data Domain, only the EMC Backup System Sizer (EBSS) is an official EMC product.  It runs on Adobe Air and therefore runs on Windows or Mac, and EMC recommends you always download the latest version.  It will come in a zip file which includes supplements and a client questionnaire.

In a hybrid Avamar/Data Domain system, we are storing meta data on the Avamar and managing the backup through Avamar while storing the data on the DD.  (Exchange is an exception to this rule, where the database is stored on the DD and the logs are stored on the Avamar system)  In this setup, we'll size each product separately starting with the DD.

Data Domain sizing in the EBSS is done by selecting Avamar as the backup provider and then sizing the same way as for Networker, because Avamar only does full Exchange backups.  Avamar back end licensing is still required, and we'll use the physical capacity from the tool and round down to the nearest TB.

Avamar sizing is then executed by selecting "size for both Exchange and non-Exchange" regardless of whether or not you have Exchange in your environment.  When choosing backup type, "Av-DD Meta data" is an option and should be selected.  When sizing the base system, use the Total Backup Environment Size from the DD sizing exercise above.

In non-Exchange environments, take the total Backup Environment Size and divide by 10,000 while leaving the dedupe values at 0 and set retention to the same length determined in the DD sizing.  If Exchange will be backed up, take the Total Environment Size and multiply by 1.33%, and enter that value into the full size field.  Again, leave the dedupe at 0 and retention the same as on the DD sizing.
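
The arithmetic in the paragraph above can be sketched in a few lines of Python.  This is only an illustration of the two rules as I wrote them down (divide by 10,000 without Exchange, multiply by 1.33% with Exchange); the function name is my own, and the result is in whatever unit the input uses.

```python
def avamar_metadata_full_size(total_env_size: float, exchange: bool) -> float:
    """Value to enter in the EBSS full size field for Avamar metadata sizing.

    total_env_size is the Total Backup Environment Size from the DD sizing.
    The result uses the same unit as the input.
    """
    if total_env_size < 0:
        raise ValueError("environment size cannot be negative")
    if exchange:
        # Exchange will be backed up: multiply by 1.33%.
        return total_env_size * 0.0133
    # Non-Exchange environment: divide by 10,000.
    return total_env_size / 10_000


# Made-up example: a 100 TB environment, with and without Exchange.
print(avamar_metadata_full_size(100, exchange=False))
print(avamar_metadata_full_size(100, exchange=True))
```

In both cases the dedupe values stay at 0 and the retention matches the DD sizing, so those are not inputs here.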

There are several factors that will affect the deduplication of data.  The main components are data type and retention period.  File systems, databases and email systems dedupe well while video streams, database logs and scientific data do not.  It is also best to have retention longer than 2 weeks or commonality is less likely to be found within the blocks.  Also, encrypted, compressed or multiplexed data does not deduplicate well, either.  These actions are typically taken on the user side of data.

When selecting clients for deduplication, the type and amount of data will affect deduplication rates again.  In this case, the more data you select for deduplication, and the more the data set grows, the better your dedupe ratio will be.  Data with a high change rate is typically bad for dedupe, as is rich media due to little commonality.  Clients with regular, frequent backup will have high commonality rates and are therefore good for dedupe.  Again, encrypted and compressed data work against deduplication.  Also to note, when performing source-based deduplication, the client needs to have sufficient hardware resources to perform the deduplication action.

Avamar and Data Domain show dedupe ratios differently.  Avamar reports dedupe in a percentage (50% reduction), while DD reports in "times" (2x reduction, etc).  To compare:

Avamar - Data Domain
50% = 2x
80% = 5x
90% = 10x
95% = 20x
96% = 25x
98% = 50x
99% = 100x
99.7% = 333x

Note that the Data Domain rate increases rapidly as the percentage increase slows near 100%.
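
The table follows from a simple relationship: a reduction of p percent leaves (100 - p) percent of the data, so the "times" figure is 100 / (100 - p).  A small Python sketch of the conversion in both directions (the function names are my own):

```python
def percent_to_ratio(percent_reduction: float) -> float:
    """Convert an Avamar-style percentage reduction to a DD-style 'x' ratio."""
    if not 0 <= percent_reduction < 100:
        raise ValueError("percent reduction must be at least 0 and below 100")
    return 100.0 / (100.0 - percent_reduction)


def ratio_to_percent(ratio: float) -> float:
    """Convert a DD-style 'x' ratio to an Avamar-style percentage reduction."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return 100.0 * (1.0 - 1.0 / ratio)


# Reproduce the comparison table above.
for pct in (50, 80, 90, 95, 96, 98, 99, 99.7):
    print(f"{pct}% = {percent_to_ratio(pct):.0f}x")
```

This also shows why the ratio climbs so quickly near 100%: the denominator (100 - p) is approaching zero.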

While we never quote marketing data, a general rule of thumb when asked what sort of deduplication a customer can expect is that the first full backup to a DD system can expect to see a 2-4x reduction.  Subsequent full backups will typically experience a 15-30x reduction for structured data such as databases, and 25-50x reduction for non-structured data.

Incremental and Differential backup can expect to see 3-7x reduction for structured data, and 5-10x reduction for non-structured.  The deduplication is going to depend largely on the frequency of the backup, with more regular backup resulting in the greatest amount of deduplication.  Near line, or TSO Incremental Forever backups can expect to see 3-7x reduction as a general rule.

Typical Avamar deduplication rates are similar, but you can expect to see a 70% reduction for non-structured data and 35% reduction in structured data across all backed-up clients in environments with 0.3% daily file change rates and 3-5% structured change rate.

When sizing a system for deduplication, be aware of data with high change rates, high growth rates and any challenging data forms.  Also be aware that there is a difference between raw data capacity and usable data capacity and to size the system for *usable* capacity, not raw.  As a rule of thumb, only size systems out to 80-90% utilization, never to 100%.  In environments where there are high change and growth rates, your best results will come about if you use the DDA:A and DDA:B tools.
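
To make the 80-90% rule of thumb concrete, here is a minimal Python sketch that works out how much usable capacity to ask for so that the projected data lands at a target utilization.  The function name and the default of 80% are my own choices for illustration.

```python
def required_usable_capacity(projected_data: float,
                             target_utilization: float = 0.8) -> float:
    """Usable capacity needed so projected_data fills only target_utilization.

    The result is in the same unit as projected_data (for example TB).
    """
    if not 0 < target_utilization < 1:
        raise ValueError("target utilization must be above 0 and below 1")
    return projected_data / target_utilization


# Made-up example: 80 TB of projected data sized to 80% and to 90%.
print(required_usable_capacity(80, 0.8))
print(required_usable_capacity(80, 0.9))
```

Remember that the number coming out of this is usable capacity, so the raw capacity of the system has to be higher still.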

When sizing systems, remember that low retention is not good for deduplication.  The best retention schedule will age data off in 3-6 months, and longer-term retention becomes unpredictable at best because there is no way to account for all possible events and growth.  It is acceptable to model a 2-3 year growth rate using the EBSS, but remember that this is an estimate based on current system deployment and doesn't take into account new systems or expanded function.

It is nerd-nature to want to run the biggest, most powerful system available whether it is required or not.  This will certainly work against closing the deal from a cost perspective.  EMC recommends focusing on the desired and reasonable backup window and meeting that need rather than over-sizing the system.  Also, make sure the customer's infrastructure will handle the increased load.  This is most commonly the limiting factor and a major issue with deployment.  Also remember that while a throughput level may be achieved, that level may not be able to be sustained for long periods of time to accommodate a backup window.  Shoot to the low side of network utilization when sizing systems and suggest upgrading where necessary, and don't forget that multiplexing is typically bad for deduplication.  Deduplication is not free.  There is never a 24/7 workload, and the backup system will need to take time for system maintenance and garbage collection.  This comes at a cost to the system performance.

When sizing a system for replication, remember that simply having bandwidth between sites does not guarantee that bandwidth is available.  We need to determine not only how much bandwidth a customer has, but also how that bandwidth is being used currently.
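
As a back-of-the-envelope illustration of why current usage matters, the sketch below estimates how long a replication job takes when only part of the link is actually free.  This is my own simplification, not an EMC formula: it assumes decimal units (1 GB = 8e9 bits), ignores protocol overhead, and the function and parameter names are made up.

```python
def replication_hours(data_gb: float, link_mbps: float,
                      available_fraction: float) -> float:
    """Rough hours needed to send data_gb over a partly used link.

    available_fraction is the share of the link that is really free,
    for example 0.5 if half the bandwidth is already in use.
    """
    if link_mbps <= 0:
        raise ValueError("link speed must be positive")
    if not 0 < available_fraction <= 1:
        raise ValueError("available fraction must be above 0 and at most 1")
    bits_to_send = data_gb * 8e9
    usable_bits_per_sec = link_mbps * 1e6 * available_fraction
    return bits_to_send / usable_bits_per_sec / 3600


# Made-up example: 100 GB of deduped data over a 100 Mbit/s link.
print(f"{replication_hours(100, 100, 1.0):.1f} hours with the whole link")
print(f"{replication_hours(100, 100, 0.5):.1f} hours with half the link")
```

Halving the available bandwidth doubles the replication time, which can be the difference between fitting in the replication window and missing it.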

Tuesday, December 3, 2013

EMC Assessment Tools

EMC provides a number of assessment tools for analysis of any environment.  The objectives are similar to any backup assessment, including review of the environment and existing practices, checking the health of the backup environment, and diagnosing any problems that may present themselves.  Backup sizing is also an objective, and possibly the most important.

Assessments consist of the collection and analysis of data, after which recommendations are made based on the data collected.

SSA, or the Sharepoint System Assessment,  pulls configuration and utilization info on all levels of a Sharepoint system, including data on the farm, the site collections, the sites and lists. Because it is a Sharepoint Console application, it must be run from a Sharepoint server under the user context of an appropriately-privileged user.

The ESA (Email System Assessment) pulls data from the email system and gives statistics on attachments, growth and PST file sizes.  The FSA (File System Assessment) tool also shows growth rates.  These can be run from the Workbench Utility and are best suited to environments that are medium to large, with Exchange data stores in excess of 300GB and more than 2TB of file data.  They are also good to use in systems where archiving is an objective.

The Deduplication Assessment comes in two varieties:  The DDA:A and the DDA:B.  The DDA:A is a modified Avamar client that gets installed on the customer system and creates a report without backing up any data.  This is good to show deduplication expectancy.  The DDA:B is a Java application that performs a similar function but does not require client installation.  Be sure to use a representative data set from the customer site when performing a dedupe assessment.

Quick Scripts are the most fundamental assessment tool and the quickest to deploy.  This is essentially a batch file that runs on the customer's existing backup environment and gathers data from existing backup software, giving detailed reports on the environment. 

The Assessment Workbench can be downloaded from EMC and is a platform that incorporates all these tools into a single tool.  There is training on its use on the EMC website, and data is uploaded to http://emc.mitrend.com for analysis.

Workbench Assessment workflow would typically follow a pattern of:
  1. Talk to the customer and make sure they are a qualified deal - this will take some work and time to complete, so you want to make sure you're not wasting your time
  2. Set the customer expectation with how long this assessment will take and what sort of impact there will be on the system (which is "none")
  3. Download the Workbench
  4. Give the software to the customer to deploy or deploy it yourself
  5. Install the Workbench at the customer site
  6. Have customer complete the "Client Qualifications" portion of the document
  7. Run the assessment
  8. For DDA, confirm the successful completion with an EMC SE
  9. Create the data package, which is a zip file
  10. Deliver the zip file to EMC by FTP or Email
  11. Go to the Report Generation Portal and view reports
  12. Review the reports internally before going to customer to make sure they make sense
  13. Review the report with the customer
  14. Propose and close on a backup system
DDA:A|B Workflow
  1. Map all drives to be included in the assessment
  2. Use scan option A and choose all drives to be scanned
  3. Schedule and set a password - the job will not run without a password
  4. Specify when the assessment should run; by default it runs at midnight for 7 days, and 3 scans are required
  5. Configure FSA to run afterward if required
  6. Done - review data
The EMC Backup Assessment Service is a detailed review of the health and utilization of a backup environment.

Performing the assessment can be done using the DPA (either an existing installation or a VM of DPA) and/or quick collection of data.  To perform an assessment, you will assess the configuration, performance and history, and reports are generated on the mitrend site.

iperf is a 3rd-party tool used mainly to produce information on the TCP settings of a network.  It will measure bandwidth and TCP Window size, which is the most important metric for network tuning.  It will also measure UDP packet loss and delay jitter.

TCP Window Size is the most critical setting when tuning a network, and it can be tuned according to data gathered using iperf.  If the window is too small, the sender will sit idle and the network will be underutilized.  Bandwidth Delay Product (BDP) is used to determine the appropriate window size and is calculated as bandwidth x latency, where latency is the Round Trip Time (RTT).  BDP is a starting point for window size tuning, and one must remember to check the OS for an upper limit.
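
The BDP calculation is easy to sketch in Python.  The function names and the optional OS limit parameter are my own for illustration; the bandwidth is taken in bits per second and divided by 8 to get a window size in bytes.

```python
from typing import Optional


def bdp_bytes(bandwidth_bits_per_sec: float, rtt_seconds: float) -> float:
    """Bandwidth Delay Product in bytes: bandwidth x round trip time."""
    return bandwidth_bits_per_sec * rtt_seconds / 8


def suggested_window_bytes(bandwidth_bits_per_sec: float,
                           rtt_seconds: float,
                           os_max_bytes: Optional[float] = None) -> float:
    """Starting point for the TCP window, capped at the OS limit if given."""
    window = bdp_bytes(bandwidth_bits_per_sec, rtt_seconds)
    if os_max_bytes is not None:
        window = min(window, os_max_bytes)
    return window


# Made-up example: a 100 Mbit/s link with a 50 ms round trip time.
print(suggested_window_bytes(100e6, 0.050))
```

For that example the BDP works out to about 625,000 bytes, far larger than a classic 64 KB window, which is why an untuned window leaves a long, fast link underutilized.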

EMCPing is a tool that is good to test for replication tuning.  It provides latency, latency jitter, packet loss and traceroute functionality.  While not a replacement for a WAN analysis, it is helpful in presales, baseline and troubleshooting.

EMC also provides tools to measure Total Cost of Ownership (TCO) and Return on Investment (ROI).  TCO provides two measurements, the CAPEX, or capital expense report that shows the initial investment, and the OPEX report, showing ongoing maintenance and operational costs.  ROI measures how much is returned on those investments and is measured using the cost savings divided by expenses to implement.
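
The ROI measure described above is just a division, sketched here in Python with made-up numbers.  The function name is my own, and this is not how the EMC tools compute their reports, only the ratio as described in the notes.

```python
def roi(cost_savings: float, implementation_cost: float) -> float:
    """Return on investment: cost savings divided by expenses to implement."""
    if implementation_cost <= 0:
        raise ValueError("implementation cost must be positive")
    return cost_savings / implementation_cost


# Made-up example: 150,000 saved against 100,000 spent to implement.
print(f"ROI: {roi(150_000, 100_000):.0%}")
```

A result above 100% means the savings exceed what was spent to implement the solution.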

Some tools for measuring TCO and generating reports regarding it:
  • Unified/VMAX Consolidation Tool shows the savings of replacing multiple arrays with VNX or VMAX systems
  • Tech Refresh for VNX/VMAX - savings if upgrading
  • Backup & Recovery, using EMC Alinean
  • EMC Data Protection - cost savings if using Recoverpoint, Replication Manager and DPA
  • Competitive Power Calculator - self explanatory
  • Tech Refresh for Centera

EMC E20-329 Begins

Today I begin study for the EMC E20-329 exam to finish off the EMCTA certification.  The first module is "Assessing the Environment" and had mostly review information regarding why we need to assess an environment and how to do so from a 10,000-foot view.  However, some notable points:

  • Success Criteria
    • Application capacity, or do we have enough storage space?
    • Criticality of applications, or are we backing up what needs to be backed up?
    • RPO/RTO - can we recover to an acceptable level?
Phases of Assessment:
  1. Interview customer
  2. Environmental Analysis, often using EMC tools and manual discovery
  3. Cross-check assumptions vs. reality
  4. Design the solution

Some EMC analysis tools are the Exchange and Sharepoint System Assessment tools, (ESA, SSA), the File System Assessment tool (FSA), Deduplication Assessment tool (DDA:B) and EMC Quickscripts which are run against the existing backup tools and software.

Another interesting quote was to "never quote marketing data" when designing a system, because marketing data is developed in a lab under ideal conditions and a customer environment is *never* ideal.  Rather, "focus on solution design and problem solving."

Hmmm, does that mean "our marketing data is accurate unless it's in the real world?"