Avoid full dataset replications - EMC ISILON STORAGE BEST PRACTICES FOR ELECTRONIC DESIGN AUTOM

Certain configuration changes cause a replication job to run a full baseline replication; that is, the job copies all the data in the source path regardless of whether the data has changed since the last replication. A full baseline replication typically takes much longer than an incremental synchronization. To optimize performance, avoid changing the source path or the file selection criteria of an existing policy unless the change is necessary. Changes to the following parameters trigger a full synchronization:

• Source path: root path and the include and exclude paths

• Source file selection criteria: type, time, and regular expressions
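The list above can be turned into a simple pre-change check. The following is a minimal sketch, not a OneFS API: the `PolicySelection` class, its field names, and the `/ifs/eda/...` paths are hypothetical and only model the two parameter groups named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicySelection:
    """Hypothetical snapshot of the settings listed above as baseline triggers."""
    root_path: str
    include_paths: frozenset = frozenset()
    exclude_paths: frozenset = frozenset()
    # File selection criteria (type, time, regular expressions) as opaque strings.
    file_criteria: tuple = ()


def baseline_triggers(old, new):
    """Return the names of changed settings that would force a full baseline."""
    fields = ("root_path", "include_paths", "exclude_paths", "file_criteria")
    return [name for name in fields if getattr(old, name) != getattr(new, name)]


# Hypothetical example: adding a second include path to an existing policy.
current = PolicySelection(
    root_path="/ifs/eda/projects",
    include_paths=frozenset({"/ifs/eda/projects/chipA"}),
)
proposed = PolicySelection(
    root_path="/ifs/eda/projects",
    include_paths=frozenset({"/ifs/eda/projects/chipA", "/ifs/eda/projects/chipB"}),
)
changed = baseline_triggers(current, proposed)
print(changed)
```

A non-empty result means the proposed edit touches one of the listed parameters, so the next run should be planned as a full baseline rather than an incremental synchronization.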

Select the right source replication dataset

By default, OneFS synchronizes to the target cluster all the files and directories under the root directory that you select. With SyncIQ policies, you can control dataset replication by selecting the directories to include or exclude or by creating file-filtering regular expressions. If you explicitly include directories in the policy, OneFS synchronizes only the files in the included directories.

Specifying file criteria, however, slows down a copy or synchronize job. Using includes or excludes for directory paths does not affect performance. As a result, a best practice is to use include or exclude directory paths instead of file criteria.
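To make directory-based selection concrete, the sketch below models how a root path combined with include and exclude directories narrows a dataset. It is an illustration only: the function names and paths are hypothetical, and the rule that an exclude overrides an include is an assumption of this sketch, not documented OneFS behavior.

```python
from pathlib import PurePosixPath


def is_under(path, directory):
    """True if path is the directory itself or lies anywhere beneath it."""
    p, d = PurePosixPath(path), PurePosixPath(directory)
    return p == d or d in p.parents


def is_selected(path, root, includes=(), excludes=()):
    """Decide whether a path falls inside the replicated dataset."""
    if not is_under(path, root):
        return False
    # Assumption of this sketch: an exclude wins over an include.
    if any(is_under(path, e) for e in excludes):
        return False
    # With explicit includes, only paths under an included directory qualify.
    if includes:
        return any(is_under(path, i) for i in includes)
    # No includes given: everything under the root is selected by default.
    return True


# Hypothetical layout: replicate chipA only, but skip its scratch area.
root = "/ifs/eda"
includes = ["/ifs/eda/chipA"]
excludes = ["/ifs/eda/chipA/scratch"]
print(is_selected("/ifs/eda/chipA/rtl/top.v", root, includes, excludes))
print(is_selected("/ifs/eda/chipA/scratch/tmp.log", root, includes, excludes))
```

The decision depends only on the path, not on file attributes or regular expressions, which reflects the directory-based approach recommended above.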

Performance tuning guidelines

SyncIQ uses a job engine to take advantage of aggregate CPU and networking resources. The engine divides a job into work items and assigns them to processes, or workers, that run on all the nodes in a cluster. Each process scans a part of the dataset for changes and transfers the changes to the target cluster. You can adjust the number of workers per node.

Although OneFS manages the cluster’s resources to maximize the performance of replication jobs, you can apply SyncIQ rules to control the performance of file operations and the network. The rules can help protect the performance of other workflows. For more information, see the “OneFS Administration Guide.”

Although no overarching formula dictates changes that can enhance performance, the following guidelines establish a methodology to tune SyncIQ jobs:

• Establish reference data for network performance by copying data from one cluster to another with common tools such as Secure Copy (scp) or NFS copy. The data establishes a baseline for a single-threaded data transfer over your network.

• After creating a policy but before running it for the first time, use the OneFS policy assessment option to see how long it takes to scan the source cluster’s dataset with the default settings.

• Increase the workers per node in cases where network utilization is low, such as over a WAN. Increasing the number of workers can help overcome network latency by having more workers generate I/O on the wire. If adding more workers per node does not improve network utilization, avoid adding more workers because of diminishing returns and worker scheduling overhead.

• Increase the workers per node for datasets with many small files to process more files in parallel. Adding more workers, however, consumes more CPU resources, which other cluster operations also need.

• Use file rate throttling to roughly control how much CPU and disk I/O SyncIQ consumes while jobs are running throughout the day.

• Use SmartConnect IP address pools to control which nodes participate in a replication job and to avoid contention with other workflows.

• Use network throttling to control how much network bandwidth SyncIQ can consume throughout the day.

• Use target-aware synchronization prudently. A target-aware synchronization consumes much more CPU than a regular baseline replication, but it can generate much less network traffic when the source and target datasets contain similar data.
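The guideline on workers per node (add workers while network utilization improves, then stop) can be sketched as a small selection routine. This is a hedged illustration, not a OneFS tool: the function name, the 10 percent gain threshold, and the sample throughput figures are made up for the example. Only the ceiling of eight workers per node comes from this document.

```python
MAX_WORKERS_PER_NODE = 8  # per-policy ceiling stated in this document


def pick_workers_per_node(measurements, min_gain=0.10):
    """Pick a workers-per-node setting from observed throughput.

    measurements maps a workers-per-node value to the throughput observed
    with it (any consistent unit). Walk upward through the tested values
    and stop at the first step whose relative gain falls below min_gain,
    which is this sketch's stand-in for "diminishing returns".
    """
    counts = sorted(w for w in measurements if 1 <= w <= MAX_WORKERS_PER_NODE)
    if not counts:
        raise ValueError("no measurements within the allowed worker range")
    best = counts[0]
    if measurements[best] <= 0:
        raise ValueError("throughput measurements must be positive")
    for w in counts[1:]:
        gain = (measurements[w] - measurements[best]) / measurements[best]
        if gain < min_gain:
            break
        best = w
    return best


# Made-up illustrative numbers (MB/s), not measurements from a real cluster.
observed = {3: 40.0, 4: 52.0, 5: 61.0, 6: 63.0, 8: 64.0}
chosen = pick_workers_per_node(observed)
print(chosen)
```

The threshold is a tuning choice; a stricter value stops earlier and leaves more CPU for other cluster operations.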

Limitations and restrictions

By default, a SyncIQ source cluster can run up to five jobs at a time. OneFS queues additional jobs until a job execution slot becomes available. You can cancel jobs that are in the queue. Keep in mind the following limitations and restrictions:

• The maximum number of workers per node per policy is eight and the default number of workers per node is three.

• The number of workers per job is the product of the number of workers per node and the number of nodes in the smallest cluster participating in a job (which defaults to all nodes unless a SmartConnect IP address pool restricts the number of nodes). For example, if the source cluster has six nodes, the target has four nodes, and the number of workers per node is three, the total worker count equals 12.

• The maximum number of workers per job is 40. At any given time, 200 workers could potentially be running on the cluster (5 jobs with 40 workers each).

• If a user sets a limit of 1 file per second, each worker gets a ration rounded up to the minimum allowed (1 file per second). If no limit is set, all workers are unlimited. If the limit is zero (stop), all workers get zero.

• On the target cluster, the number of workers per node, called sworkers, is limited to avoid overwhelming the target cluster when multiple source clusters replicate to it. By default, the limit is 100 workers; you can adjust this limit with the max-sworkers-per-node parameter. To adjust the load on the target cluster that incoming SyncIQ jobs generate, contact Isilon Technical Support.

• The target cluster must be running the same version of OneFS as the source cluster or a later version. You can therefore replicate from a source cluster running an earlier version of OneFS to a target cluster running a later one. To turn on SyncIQ automated failover and failback, both clusters must be running OneFS 7.0 or later.
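The worker arithmetic and file rate rationing described in the limits above can be checked with a short calculation. This is a minimal sketch under stated assumptions: the function names are hypothetical, the constants are the limits quoted in this section, and the even split of a file rate limit across workers is an assumption of the sketch (the text states only the rounding to a 1 file per second minimum, the unlimited case, and the zero case).

```python
MAX_WORKERS_PER_NODE = 8   # per policy
MAX_WORKERS_PER_JOB = 40
MAX_CONCURRENT_JOBS = 5    # default on a source cluster


def workers_per_job(source_nodes, target_nodes, workers_per_node=3):
    """Workers per node times the node count of the smallest cluster, capped at 40."""
    if not 1 <= workers_per_node <= MAX_WORKERS_PER_NODE:
        raise ValueError("workers_per_node must be between 1 and 8")
    smallest = min(source_nodes, target_nodes)
    return min(workers_per_node * smallest, MAX_WORKERS_PER_JOB)


def per_worker_file_rate(limit, workers):
    """Ration a files-per-second limit across the workers of a job.

    None means no limit is set (every worker is unlimited); 0 means stop.
    Assumption: the limit is split evenly, then raised to the 1 file per
    second minimum when the share would otherwise be smaller.
    """
    if limit is None:
        return None
    if limit == 0:
        return 0
    return max(1, limit // workers)


# The example from the text: six source nodes, four target nodes, three workers per node.
total = workers_per_job(source_nodes=6, target_nodes=4, workers_per_node=3)
print(total)                                       # 12
print(MAX_CONCURRENT_JOBS * MAX_WORKERS_PER_JOB)   # 200
print(per_worker_file_rate(1, total))              # 1
```

Because of the 1 file per second floor, a very low limit spread over many workers can admit more files per second in aggregate than the configured figure suggests, which is consistent with the guideline that file rate throttling gives only rough control.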

In document EMC ISILON STORAGE BEST PRACTICES FOR ELECTRONIC DESIGN AUTOMATION (Page 39-41)