Environment Optimization Guide
July 8, 2016 - Version 9.2
Table of Contents
1 Overview 7
1.1 Release updates 7
1.2 Training and support 7
1.3 Environment optimization checklist 7
2 Configuring Windows server 11
2.1 Microsoft Windows server service packs 11
2.2 Windows power plans and CPU performance 12
2.3 Windows visual effects 12
2.4 Windows processor scheduling 13
2.5 Windows NTFS fragmentation 14
2.6 Configuring virtual memory 14
2.7 Disabling TCP chimney offloading 15
2.8 Antivirus directory exclusion list 16
2.9 Management server 17
3 Configuring SQL Server 17
3.1 Optimize for ad hoc workloads option 17
3.2 Max degree of parallelism option 18
3.3 Cost threshold for parallelism 19
3.4 Optimize the tempdb database 20
3.5 SQL max server memory 21
3.6 Instant file initialization 22
3.6.1 Comparison of zero and instant initialization 23
3.7 File allocation unit size 23
3.8 Microsoft SQL server service packs 24
3.9 Autogrowth settings 25
3.10 SQL Server 2012 best practices analyzer 25
3.11 RAID levels and SQL server 26
3.11.1 Level 0 26
3.11.2 Level 1 27
3.11.3 Level 5 27
3.11.4 Level 10 (1+0) 27
4 Configuring the Analytics server 29
4.1 Memory requirements 29
4.1.1 Analytics indexing 29
4.1.2 Structured analytics 30
4.1.3 Enabled analytics indexes 30
4.2 Java heap size (JVM) 31
4.3 Page file size 31
4.4 Index directory requirements 32
4.5 CPU requirements 32 4.5.1 Analytics indexing 32 4.5.2 Structured analytics 32 4.6 Scaling 34 4.6.1 Tier 1 Example 34 4.6.2 Tier 2 Example 35 4.6.3 Tier 3 Examples 35
5 Setting up your workspace 35
5.1 Fixed-length vs. long text field types 35
5.2 Fixed-length text field considerations 36
5.3 Unicode support 36
5.5 Analytics 37
5.6 Views and searching 38
5.7 Tally/Sum/Average mass operation 38
5.8 Group by for Pivot 38
5.9 User notifications 39
5.9.1 Message of the Day (MotD) 39
5.9.2 User status 39
5.9.3 Default workspace tabs 39
6 Monitoring environment performance 39
6.1 Windows and SQL server log analysis 39
6.2 Resource utilization 40
6.2.1 Monitoring disk usage 40
6.3 Analytics performance considerations 42
6.4 Server requirements for index builds 42
6.5 Memory requirements for querying 42
6.6 Server requirements for structured analytics 43
6.6.1 Relativity Processing 43
6.7 Performance Dashboard 44
6.8 SQL server table index fragmentation 44
6.8.1 SQL table indexes overview 44
6.8.2 Index fragmentation 45
6.8.3 Effects of fragmentation 45
6.8.4 Identifying fragmentation 45
6.8.5 Removing fragmentation 45
7 Managing your Relativity environment 46
7.1 Workspace management and maintenance 46
7.1.2 Full-text index management 47
7.1.3 Audit record table 48
7.2 SQL backups 49
7.2.1 Full backups 49
7.2.2 Differential backups 50
7.2.3 Transaction log backups 50
7.3 SQL recovery models 51
7.4 Relativity data backups 51
7.5 Check database integrity task 51
7.6 SQL table index management 52
7.6.1 Updating statistics 52
7.7 Database log management 52
7.7.1 Size management 53
7.7.2 Virtual log file (VLF) management 54
7.8 Shrink database task 54
7.8.1 Best practices 54
7.9 Job email notification alerts 54
7.10 Relativity applications 55
8 Expanding your Relativity environment 55
8.1 Agents 55
8.2 Analytics servers 55
8.3 Dedicated web servers 56
8.3.1 Desktop Client import/exports 56
8.3.2 dtSearch performance 57
8.3.3 Physical memory on the SQL Server(s) 57
8.4 SQL Server failover clustering 58
8.6 Web load balancing 59
8.6.1 Microsoft network load balancing 59
8.6.2 Relativity user load balancing 60
8.7 Terminal Services 60
8.8 Bandwidth tester 60
8.9 User notifications 61
8.9.1 Message of the day (MotD) 61
8.9.2 User status 61
1 Overview
This guide outlines the best practices for maintaining and optimizing a Relativity environment. Follow these recommendations to ensure stability and optimal performance of all workspaces.
1.1 Release updates
Note the changes and additions to this guide for each service release:
n There is a one-month pre-release of each Relativity version, allowing you to perform internal testing of your integrations, processes, workflows, and extensions.
n Product updates are released once a month. You can view Relativity release notes on the 9.0 Docu-mentation site.
n Verify that you're using the latest version of this guide. kCura updates this guide as new information is obtained from field deployments.
1.2 Training and support
This guide is often provided to system admins who may not have any exposure to the Relativity interface. For more information, you can access the following resources:
n User tutorials - Our Relativity tutorials are designed to help users become familiar with Relativity quickly. Using interactive tutorials that cover specific Relativity features, you can experience hands-on learning at your own pace. To view these tutorials, visit theTutorial pageon kcura.com.
n Documentation - We constantly strive to ensure that all users of Relativity are educated on the full func-tionality of the platform. To help you and your team stay well informed, we post all documentation on this website. To access documentation from previous versions of Relativity, see thedocumentation
archives page.
n Customer portal - You can access theCustomer Portalfrom the kCura.com Support page. If you need a portal account, please [email protected].
1.3 Environment optimization checklist
Use the following checklist to help manage and plan for your Relativity environment.
Reference Task Done
Release updates above Review product updates once a month. Training and support above Review user tutorials.
Training and support above Review Relativity documentation. Training and support above Explore the Customer Portal. Configuring Windows server
on page 11
Configure the Windows server for optimum performance. Microsoft Windows server
service packs on page 11
Verify that the latest Microsoft server service pack is installed.
Reference Task Done
Windows power plans and CPU performance on page 12
Configure Windows power plans for high performance. Windows visual effects on
page 12
Configure Windows visual effects for high performance. Windows processor
schedul-ing on page 13
Configure Windows processor scheduling. Windows NTFS
frag-mentation on page 14
Install an automatic disk defragmentation tool. Configuring virtual memory
on page 14
Manually set the size of the paging file to 4095 MB or higher.
Antivirus directory exclusion list on page 16
Exclude the SQL server, agent server, web server, Analytics indices, dtSearch indices, and the file repository from the antivirus software.
Management server on page 17
Create a management server with SSMS and the RDC. Configuring SQL Server on
page 17
Configure SQL server for optimum performance. Optimize for ad hoc
work-loads option on page 17
Optimize SQL server for ad hoc workloads. Max degree of parallelism
option on page 18
Set max degrees of parallelism. Optimize the tempdb
data-base on page 20
Optimize the tempdb databases. SQL max server memory on
page 21
Set SQL server max memory. Instant file initialization on
page 22
Enable instant file initialization. File allocation unit size on
page 23
Set the file allocation unit size. Microsoft SQL server service
packs on page 24
Verify that the latest SQL server service pack is installed. Autogrowth settings on
page 25
Set auto growth settings for SQL server files. SQL Server 2012 best
prac-tices analyzer on page 25
Install and run SQL server 2008 R2 Best Practices Analyzer. RAID levels and SQL server
on page 26 Review RAID Levels and SQL Server storage options:
n Verify that the tempdb data files reside on the
fast-est disks.
n Verify that each SQL file resides on its own separate
Reference Task Done
Java heap size (JVM) on page 31
Set the Java heap size to one-third of total RAM. Configuring the Analytics
server on page 29
Configure the Analytics temporary directory. Index directory
require-ments on page 32
If possible, store the Analytics index locally. Setting up your workspace
on page 35
Set up you workspace following best practices. Fixed-length vs. long text
field types on page 35
Use fixed-length fields when possible. Fixed-length text field
con-siderations on page 36
Don't allow the size of a fixed-length field to exceed 8,060 characters, if possible.
Unicode support on page 36 Define the Unicode Enabled field property prior to import. Data imports on page 37 Don't run the Relativity Desktop Client on a Relativity
pro-duction server.
Analytics on page 37 Disable the integrated dtSearch index when creating a new Analytics index, if possible.
Views and searching on page 38
Follow best practices for views and searching. Tally/Sum/Average mass
operation on page 38
Create indexes on groups that you Tally on. Group by for Pivot on
page 38
Create a non-clustered index on fields that are being grouped on for Pivot.
User notifications on page 39 Learn how to use various user notification methods. Analytics performance
con-siderations on page 42
Review Analytics performance considerations. Monitoring environment
per-formance on page 39
Ensure you have the right amount of memory when cre-ating indexes.
Monitoring environment per-formance on page 39
Review native imaging performance considerations. Managing your Relativity
environment on page 46
Review maintenance plan recommendations. SQL backups on page 49 Create backups for:
n Relativity system databases n Relativity system database logs n System databases and logs
Full backups on page 49 Ensure Verify backup integrity is selected for full backups. SQL recovery models on
page 51
Reference Task Done
Relativity data backups on
page 51 Create backups for:
n dtSearch and Analytics index shares n Relativity web server install directories n Native and image file shares
Check database integrity task on page 51
Schedule a weekly check database integrity test. SQL table index
man-agement on page 52
Set up and schedule the IndexOptimize smart script. Updating statistics on
page 52
Update statistics to improve query performance. Database log management
on page 52
Review database log management recommendations. Shrink database task on
page 54
Ensure AUTO_SHRINK is not enabled. Job email notification alerts
on page 54
Set SQL Database Mail on all Relativity SQL Servers and add these notification alerts to all scheduled Relativity maintenance tasks.
Monitoring environment per-formance on page 39
Review environment monitoring guidelines. Windows and SQL server log
analysis on page 39 Set up these alerts for SQL and Windows server logs:
n Low disk space on the servers n Server becomes unresponsive n Website becomes unavailable
Resource utilization on page 40
Gather benchmarks for Relativity servers' resources. Monitoring disk usage on
page 40
Measure disk latency. Performance Dashboard on
page 44
Install Performance Dashboard. Workspace management
and maintenance on page 46
Create needed indexes on workspaces with 500,000+ records.
Analysis of long running queries on page 47
Analyze and try to optimize long-running queries. Full-text index management
on page 47
Manage full-text index. Audit record table on
page 48
If needed, manage the Audit Record Table. SQL server table index
frag-mentation on page 44
Understand, identify, and remove fragmentation. Expanding your Relativity
environment on page 55
Reference Task Done
Agents on page 55 Scale agents as needed. Desktop Client
import/-exports on page 56
Create a dedicated web server for the RDC and dtSearch dtSearch performance on
page 57
Create a dedicated web server for the dtSearchWebAPI Physical memory on the
SQL Server(s) on page 57
Increase RAM on the SQL server if needed. SQL Server failover clustering
on page 58
Create a SQL server failover cluster if needed. Distributed Relativity
SQL Servers on page 59
Create a distributed SQL server if needed. Web load balancing on
page 59
Use a web load balancer if needed. Terminal Services on page 60 Create terminal servers if needed.
Bandwidth tester on page 60 Test the capacity of a network connection with the Relativ-ity bandwidth tester.
2 Configuring Windows server
Use the following guidelines to configure your Windows Server for optimum performance with Relativity. Some of these configuration options are one-time settings, while others require intermittent updating as your hardware or database sizes change.
Note: These guidelines are applicable to all SQL Servers in the environment, including the Worker Manager SQL Server and the workers.
The recommendations are applicable to all Relativity roles.
2.1 Microsoft Windows server service packs
Ensure that you have the latest Microsoft Windows Server and .NET Service Pack installed on all Relativity servers.
Install any smaller security patches, Windows updates, etc. at your own discretion. kCura only tests major service packs, and not every Microsoft update that is released. Deploy any patches to your test instance of Relativity first and ensure that a rollback plan is in place if any issues are discovered during deployment.
Ensure the option to "Install updates automatically" on all Relativity servers has been disabled. Apply any required updates during a planned maintenance window.
2.2 Windows power plans and CPU performance
In Windows Server 2012 R2 and 2008 R2, you can access the Power Options screen by typing
powercfg.cpl in a Run prompt. Three default power plans are available: Balanced, Power saver, and
High performance. Balanced is set by default, which is what Microsoft recommends for general use. Unfortunately, the default Balanced power plan may not be the best choice for production server performance.
We recommend ensuring all Relativity SQL servers are configured to use the High performance windows power plan. This may also require changes to BIOS settings. You will need to decide what is best to accommodate all Relativity server roles.
Note: For more information, go tohttp://www.mssqltips.com/tip.asp?tip=2225.
2.3 Windows visual effects
Windows includes standard visual effects to make the user experience more enjoyable. These effects are not required and consume CPU resources. kCura recommends disabling these effects on all Relativity servers.
1. Access the Advanced tab in the System Properties dialog, and then select the Performance Options dia-log by clicking Settings in the Performance panel.
3. Select the Adjust for best performance option and click OK.
2.4 Windows processor scheduling
Application performance is related to processor scheduling caching options that you set for Windows Server. Processor scheduling determines the responsiveness of applications you are running interactively (as opposed to background applications that may be running on the system as services). kCura recommends that you select background services on all Relativity servers.
1. Access the Advanced tab in the System Properties dialog, and then display the Performance Options dialog by selecting Settings in the Performance panel.
2. In the Performance Options dialog, select the Advanced tab.
3. In the Processor Scheduling panel, you have the following options:Programs: Use this option to give the active application the best response time and the greatest share of available resources. Generally, you will want to use this option only on development servers. You should also select this option when you are using Windows Server 2008 R2 as your desktop operating system.
Background Services: Use this option to give background applications a better response time than the active application. Generally, you will want to use this option for production servers.
2.5 Windows NTFS fragmentation
Install an automatic disk defragmentation tool on all Relativity Web, SQL, Search, and File servers. There are a number of tools available to defragment your hard drives and optimize performance. Some of the latest products are pro-active and prevent much of the fragmentation from ever
occurring. Research the available tools to decide which is best for you. Unlike the available Windows task to defragment your physical disks, these tools work automatically in the background, using only idle resources to ensure nothing is negatively impacted.
Most current SAN and NAS devices include technologies to avoid or limit fragmentation on the block level. Windows sees the data logically from the software level, outside of the storage realm. If Windows detects a file in hundreds of pieces, SAN performance may be affected.
Please work with your storage vendor to see if they recommend installing a defragmenting tool. If the Relativity roles are not virtualized, there will likely be a mirrored array housing the OS and required Relativity components (unless you’re also booting from the storage device). If these disks are heavily fragmented, roles including the web servers (IIS) may experience poor performance that could impact Relativity review.
2.6 Configuring virtual memory
RAM is a limited resource, whereas virtual memory is, for most practical purposes, unlimited. There can be many processes, each one having its own 2 GB of private virtual address space. When the memory that is in use by all the existing processes exceeds the amount of available RAM, the operating system moves pages (4 KB pieces) of one or more virtual address spaces to the hard disk, freeing that RAM frame for other uses. In Windows systems, these "paged out" pages are stored in one or more files that are named pagefile.sys in the root of a partition. There can be one such file in each disk partition. The location and size of the page file is configured in Control Panel. To set these values, click System, select Advanced system settings, and then click Settings under Performance. By default, Windows Server puts the paging file on the boot partition where the operating system is installed and creates a default size of the paging file that is 1.5 times the physical RAM, up to a maximum of 4095 MB.
n For all Relativity servers, manually set the size of the paging file to 4095 MB. This is recommended as the OS volume will typically only have enough room for what is required and will not be able to support a page file size of 1.5 times the amount of physical RAM.
n For servers with a large amount of RAM installed (16GB+) and more than just the OS volume, create a second page file and place it onto a drive other than the one housing the OS. Set this second page file to a size of 1.5 times the amount of physical RAM - no greater than 50 GB. Microsoft has no specific recommendations about performance gains for page files larger than 50 GB. An example of this server type might include the Analytics server.
n SQL server shouldn’t use the page file for memory on correctly configured servers. There should be no need to create a second page file on the SQL server like other Relativity server roles that have a large
amount of memory and additional volumes that don’t house the OS. SQL server memory con-figurations are detailed in the SQL server setup section of this guide.
Manually setting the size of the paging file typically provides better performance than allowing the server to size it automatically or having no paging file at all. Best-practice tuning is to set the initial minimum and maximum size settings for the paging file to the same value. This ensures that no processing resources are lost to the dynamic resizing of the paging file, which can be intensive. This is especially true given that this resizing activity typically occurs when the memory resources on the system are already constrained. Setting the same minimum and maximum page file size values also ensure the paging area on a disk is one single, contiguous area, which improves disk seek time. Microsoft recommends isolating the paging file onto one or more dedicated physical drives that are configured as either RAID-0 (striping) or RAID-1 (mirroring) arrays, or on single disks without RAID. Note that redundancy is not normally required for the paging file. Do not configure the paging file on a RAID 5 array. Configuration of the paging file on a RAID 5 array is not recommended because paging file activity is write intensive and RAID 5 arrays are better suited for read performance than write performance.
2.7 Disabling TCP chimney offloading
For optimal processing usage, we strongly recommend that you disable TCP chimney offloading on all servers and associated workers. To do this, perform the following steps:
1. Check whether TCP chimney offloading is enabled by running the following command prompt on all servers and associated workers. Make sure to run this command prompt as an administrator:
netsh int tcp show global
2. If TCP chimney offloading is enabled, disable it by running the following command prompts on all serv-ers and associated workserv-ers. Make sure to run these command prompts as an administrator:
netsh int tcp set global chimney=disabled
netsh int tcp set global RSS= disabled
netsh int tcp set global NETDMA= disabled
You also need to disable TCP chimney offloading on the network interface (NIC) card for both the client and the server. To do this, perform the following steps. Make sure to do this on both servers and the associated workers.
1. Click Start, select Run, enter ncpa.cpl. in the text box, and click OK. 2. Right-click on the Network Adapter object and select Properties. 3. Click Configure... and then select the Advanced tab.
5. Repeat step 4 to disable all of the following items from the Property list: n TCP/IP Offload
n IPV4 Checksum Offload n Large Send Offload V2 (IPv4) n TCP Checksum Offload (IPv4)
6. Restart the server to allow these changes to take effect.
2.8 Antivirus directory exclusion list
All antivirus software installed on any of the Relativity servers should be configured to exclude several areas:
n SQL servers - All directory values are located in the EDDS database | Configuration table (excluding the Relativity Program Files and the BCPPath). Relativity is installed in the following location: C:\Program Files\kCura Corporation\Relativity. Please adjust the drive location if necessary.
o Directory where the Database files are located o Directory where the Log files are located o The location of the BCPPath on the SQL server o Directory where the Full Text Indexes are located n Agent servers
o Default: C:\Program Files\kCura Corporation\Relativity o C:\Windows\Temp
o C:\Users\Rel_SVC\AppData\Local\Temp o Account running the agent service = Rel_SVC Note: Keep in mind your environment may differ slightly.
n Web servers - Default: C:\Program Files\kCura Corporation\Relativity. Please keep in mind your envir-onment may differ slightly.
n Worker servers
o C:\Program Files\kCura Corporation
o C:\Windows\System32 (\Invariant.hook.dll) o C:\Windows\SysWOW64 (\Invariant.hook.dll) o Optional:
l C:\Users\Rel_SVC\AppData\Local\Temp l C:\Windows\Temp
Note: We recommend that you scan any raw data for malware before introducing it into your Relativity environment. If you perform a scan, you may be comfortable excluding the Temp directories on your worker servers from your antivirus scans as well, which will result in better performance and fewer interruptions due to live scans on the temp files that Relativity Processing (Invariant) creates in these directories. Any raw data that has not been scanned has potential to introduce harmful files into your environment.
n Analytics indexes - Locate a folder named Content Analyst or CAAT; this is the installation directory. The index directory, if different, should also be excluded from Anti-Virus.
n dtSearch indexes - The default location is stored in the EDDS.eddsdbo.Configuration table. n File repositories - Any file directory that Relativity uses as a file share.
2.9 Management server
Whenever possible, avoid logging into a production server using remote desktop. Instead, use a management or utility server. This server or virtual machine should have SQL Server Management Studio (SSMS) and the Relativity Desktop Client installed. Use this server to connect to the SQL instances to adjust maintenance plans and query tables, etc. If you have an external hard drive containing data that needs to be imported into Relativity, connect it to this server and launch the Desktop Client on it to perform the data imports or export.
When administrators log directly into a production server, navigate and open applications, memory is consumed and processes are wasting CPU cycles. There is also a Windows Server 2008 and R2 issue related to file caches having no cap. System admins should avoid dragging and dropping files via RDP from the console as this action is cached and may result in no free memory being available to the operating system. There are additional reasons to avoid RDPing directly into production servers on a regular basis.
3 Configuring SQL Server
Use the following guidelines to configure the SQL Server(s) for optimum performance with Relativity. Some of these configuration options are one-time settings, while others require intermittent updating as your hardware or database sizes change.
Note: These guidelines are applicable to all SQL Servers in the environment, including the Worker Manager SQL Server.
3.1 Optimize for ad hoc workloads option
On the SQL Server, the optimize for ad hoc workloads option is used to improve the efficiency of the plan cache for workloads that contain many single use ad hoc batches. When this option is set to one, the Database Engine stores a small compiled plan stub in the plan cache when a batch is compiled for the first time. (This step is performed instead of the full compiled plan.) It helps to relieve memory pressure by not allowing the plan cache to be filled with compiled plans that are not reused.
The compiled plan stub allows the Database Engine to recognize that this ad hoc batch has been previously compiled, but has only stored a compiled plan stub. When this batch is invoked (compiled or executed) again, the Database Engine compiles the batch, removes the compiled plan stub from the plan cache, and adds the full compiled plan to the plan cache.
Note: This information was obtained from the following Microsoft article:
http://msdn.microsoft.com/en-us/library/cc645587.aspx.
We recommend enabling the optimize for ad hoc workloads option on each SQL instance. Execute the following statement to enable this option:
SP_CONFIGURE 'show advanced options',1 RECONFIGURE
GO
SP_CONFIGURE 'optimize for ad hoc workloads',1 RECONFIGURE
GO
3.2 Max degree of parallelism option
SQL Server detects the best degree of parallelism, which is the number of processor cores employed to run a single statement for each parallel plan execution. You can use the max degree of
parallelism option to limit the number of processor cores to use in parallel plan execution. To enable the server to determine the maximum degree of parallelism, set this option to 0 (this is the default value). Setting maximum degree of parallelism to 0 allows SQL Server to use all the
available processors (up to 64 processors).
Note: This information was obtained from the following Microsoft article:
http://msdn.microsoft.com/en-us/library/ms181007.aspx.
We recommend the following MaxDOP settings:
n If the total number of logical processor cores is greater than or equal to 16, set the MaxDOP to a value of 8.
n If the total number of logical processor cores is less than 16, set the MaxDOP to a value equal to half the number of available cores.
Another way to determine the optimal setting is to understand the amount of simultaneous long running queries that will be executed at any given time. For instance, in an 16-core system that has 100 simultaneous users logged in executing queries, it might make sense to set MaxDop to a value of 4 instead of 8 to more efficiently support many simultaneous long running queries. Analysis of the SQL Server Dynamic Management Views can assist in making this determination.
This setting takes effect immediately, without restarting the MSSQLSERVER service. Execute the following statement to set this value.
sp_configure 'show advanced options', 1; GO
RECONFIGURE WITH OVERRIDE; GO
sp_configure 'max degree of parallelism', 8; GO
RECONFIGURE WITH OVERRIDE; GO
The Optimize for Ad hoc Workloads and Max Degree of Parallelism options can also be viewed and adjusted in the properties of the SQL Server instance within SQL Server Management Studio. To view and adjust these options:
1. Launch SQL MGMT Studio.
2. Right-click on the Relativity SQL instance. 3. Select Properties.
4. Select Advanced.
3.3 Cost threshold for parallelism
Use the cost of threshold for parallelism option to specify the threshold at which Microsoft
SQL Server creates and runs parallel plans for queries. SQL Server creates and runs a parallel plan for a query only when the estimated cost to run a serial plan for the same query is higher than the value set in cost threshold for parallelism. The cost refers to an estimated elapsed time required to run the serial plan on a specific hardware configuration. The default value is 5.
Note: This information was obtained from the following Microsoft article:
http://msdn.microsoft.com/en-us/library/ms188603%28v=sql.105%29.aspx.
We recommend increasing this option to a value of 50 on each SQL instance. Execute the following statement to set this option:
sp_configure 'show advanced options', 1; GO
reconfigure; GO
sp_configure 'cost threshold for parallelism', 50; GO
reconfigure; GO
3.4 Optimize the tempdb database
The tempdb is a database used for temporary user and internal objects created by SQL Server. For optimal system performance and stability, ensure the tempdb database has been configured correctly.
Use the following guidelines to configure the tempdb database:
n Create one temporary database file per each processor core. Multiple files will provide better I/O per-formance and less contention on the global allocation structures. For higher end systems with more than 16 cores, Microsoft suggests beginning with 8 database files, and adding more if there is still con-tention. There are scripts available to measure contention; most Relativity environments work well with 8 data files.
n Refer to the following article on Tempdb contention:
http://sqlblog.com/blogs/adam_mach-
anic/archive/2011/04/21/analyzing-tempdb-contention-a-month-of-activity-monitoring-part-21-of-30.aspx.
n Set the initial size of each file to 10 GB or higher (to limit any autogrowth) and either set autogrowth to 10 percent or disable it. To set file sizes, right-click on the tempdb system database, select Files, and make additions. Set all files to the same initial size. It is important that all data files are set to the same initial size and autogrowth settings. As the environment grows, in an optimized setup, the Tempdb sys-tem database should be larger than your biggest, active Relativity workspace eddsdbo.Document table.
n Store these files on a fast RAID array and not the OS partition. RAID 10 is ideal. Some partners have recently started to store these data files on PCIe-flash or SSDs. Please read the vendor specific instruc-tions for these newer technologies, as they might require free memory for drivers or free disk space for performance.
n Store the log file (.ldf) for this database on the same array as the other database log files. Set the initial size of the log file to 25% of the total of all tempdb data files. For example, if there are 8 tempdb data files set to an initial size of 10 GB each, set the initial size of the tempdb log file to 20 GB. Do not create additional log files – only one is necessary.
Additions can be made while the system is live but you will need to restart the SQL instance for changes to take effect. This change should be made on all Relativity SQL servers.
Note: For more information about the tempdb, see
http://msdn.microsoft.com/en-us/library/ms190768.aspx.
Do not install the default tempdb data and log files on the C (OS) partition. Moving the default tempdev and templog files to another partition is important, even though it may not be easy to change their location using SQL Server Management Studio (SSMS). In order to move these files to a different location, you will have to execute some SQL statements in SSMS.
For more information, seehttp://msdn.microsoft.com/en-us/library/ms345408.aspx.
3.5 SQL max server memory
Use max server memory to prevent the SQL Server buffer pool from using more than the specified amount of memory. The remaining memory is available to the operating system and other installed applications.
The suggested configuration involves leaving at least 10% of the total amount of RAM available to the OS and other applications. For example, set the SQL Server Max Memory to a value of 58GB if there is 64GB RAM available. This assumes that the server is dedicated to SQL and there are no other applications installed that will require a significant amount of memory.
To set the max server memory option, adjust the max server memory value below and execute in SSMS.
sp_configure 'show advanced options', 1; GO
RECONFIGURE; GO
sp_configure 'max server memory', 1024; GO
RECONFIGURE; GO
The Maximum server memory option can also be viewed and adjusted in the properties of the SQL Server instance within SQL Server Management Studio.
To view and adjust the Maximum server memory option: 1. Launch SQL MGMT Studio.
2. Right-click on the Relativity SQL instance. 3. Select Properties.
In the event of a failover in an active/active SQL cluster, reduce the Max Server Memory value for each instance to half so they are not competing for resources (which can severely slow down or crash the server). For example, with 32 GB installed on each node, set the max server memory on each instance to 14 GB (so together they total 28 GB). This will leave 4 GB of memory available for the OS and other installed applications on the active node.
Note: For more information on max server memory, see
http://msdn.microsoft.com/en-us/library/ms178067.aspx.
3.6 Instant file initialization
Data and log files are initialized to overwrite any existing data left on the disk from previously deleted files. Data and log files are first initialized by populating the files with zeros when you perform one of the following operations:
n Create a database
n Add log or data file to an existing database
n Increase the size of an existing file (including autogrow operations) n Restore a database or filegroup
File initialization causes these operations to take longer. However, when data is written to the files for the first time, the operating system will not have to populate the files with zeros.
In the SQL Server, data files can be initialized instantaneously. This process allows for fast execution of file operations listed above. Instant file initialization reclaims used disk space without populating that space with zeros. Instead, disk content is overwritten as new data is written to the files. Log files cannot be initialized instantaneously.
Note: This information was obtained from the following Microsoft article:
http://technet.microsoft.com/en-us/library/ms175935.aspx.
1. On the Start menu, click Run. In the Open box, enter secpol.msc. 2. Expand Local Policies.
3. Select the User Rights Assignment folder. The policies will be displayed in the details pane. 4. In the pane, double-click Perform Volume Maintenance Tasks.
5. In the Local Security Policy Setting dialog, click Add.
6. In the Select Users or Groups dialog, add your SQL Server service account (or a domain group). 7. Restart the SQL service or server.
3.6.1 Comparison of zero and instant initialization
The following sample performance tests compare zero initialization and instant initialization (source: http://www.sqlskills.com/BLOGS/KIMBERLY/post/Instant-Initialization-What-Why-and-How.aspx#ixzz0sMB4W7PI).
Performance test
Zero initialization Instant initialization
Hardware Dell Precision 670 Dual Proc (x64) with Dual Core, 4 GB Memory, RAID 1+0 array w/4-142 GB, 15000rpm disks.
Dell Precision 670 Dual Proc (x64) with Dual Core, 4 GB Memory, RAID 1+0 array w/4-142 GB, 15000rpm disks. CREATE DATABASE with 20 GB Data file 14:02 minutes 1.3 seconds ALTER DATABASE BY 10 GB 7:01 minutes 4 seconds RESTORE 30 GB DATABASE (EMPTY Backup) 21:07 minutes 5 seconds RESTORE 30 GB DATABASE (11GB Backup) 38:28 minutes 19:42 minutes
3.7 File allocation unit size
File Allocation unit size is also called the Cluster size. Windows NTFS formatted drives support sizes of 512, 1024, 2048, 4096, 8192, 16K, 32K and 64K. The allocation unit is the smallest amount of space that a file can consume. Windows will use a default allocation unit size of 4096 bytes if this is not set during the format. Unfortunately, this is usually not ideal for SQL server performance. In most scenarios, the Data and Tempdb drives should be formatted with a 64K Allocation Unit Size. Configure this early in the SQL server setup process as the allocation unit size is set during a drive format operation.
Run the following command for each drive to see the file allocation unit size reported in bytes per cluster. The Bytes Per Cluster value, which contains the file allocation unit size, is highlighted here. For example, to run the command for E: drive:
fsutil fsinfo ntfsinfo e:
To run the command for F: drive:
fsutil fsinfo ntfsinfo f:
An appropriate value for most installations is 65,536 bytes (that is, 64 KB) for partitions on which SQL Server data files reside. If necessary, consult your storage vendor for their recommendation or assistance with this configuration. This significantly impacts SQL Server performance.
Always use the storage vendor's recommended setting. If your storage vendor doesn't have a recommended setting, use 64 KB.
Note: This information was obtained from the following article:
http://msdn.microsoft.com/en-us/library/dd758814.aspx.
3.8 Microsoft SQL server service packs
Ensure that you have the latest Microsoft SQL Server Service Pack installed on all Relativity SQL Servers. Microsoft uses service packs to distribute hotfixes and software updates for the SQL Server. In addition, these service packs may include system administration tools, drivers, and additional components. These updates and components are conveniently bundled for easy downloading. Since service packs are cumulative, each new one contains any recent fixes as well as any previous updates already released. You do not have to install a previous service pack before you install the latest one.
3.8.1 Determining the version of SQL server
To determine which version of Microsoft SQL Server is running in your environment: 1. Connect to the SQL Server with SQL Server Management Studio.
2. Run either of the following Transact-SQL statements:
SELECT SERVERPROPERTY('productversion'), SERVERPROPERTY ('productlevel'), SERVERPROPERTY ('edition')
SELECT @@VERSION
You'll see the following information displayed:
RTM SP1 SP2 SP3
SQL Server 2012 11.00.2100.60 11.00.3000
SQL Server 2008 R2 10.50.1600.1 10.50.2500 10.50.4000
3.9 Autogrowth settings
SQL Server will automatically increase the size of database (.mdf) and transaction log (.ldf) files throughout the life of a workspace. It is important to set these auto growth variables appropriately. If a transaction log is set to auto grow by the default 10% on a very large file, it will take time for the grow operation to complete. Other transactions that require a write to the transaction log will have to wait until the grow operation finishes. Conversely, if these files are set to expand by too small increments, higher disk fragmentation will occur in addition to the server being continuously taxed by having to expand these files.
Relativity data files are by default set to autogrow by 10 percent and their logs by 512 MB as
specified in the Instant File Initialization setting. Do not set the transaction logs (.ldf) to auto grow by more than 1024 MB, because log files cannot be initialized instantaneously. Note that databases enabled for Transparent Data Encryption (TDE) cannot take advantage of Instant File Initialization. Therefore, you should set databases enabled for TDE to autogrow by 4096MB instead of the default 10%. This is because if a database is large and needs to autogrow by 10% without being able to take advantage of Instant File Initialization, it can result in application timeouts during each autogrow. If possible, try to anticipate how large a database will become and set the "Initial Size" of the MDF file within the properties of that database to avoid or limit automatic growth.
Note: When a new workspace is created in Relativity, the template workspace database is backed up and restored as the new workspace database. The autogrowth and initial size settings from the backed up database will carry over to the new workspace, which you may need to adjust if you are setting the initial size.
3.10 SQL Server 2012 best practices analyzer
The Microsoft SQL Server 2012 BPA is a diagnostic tool that performs the following functions: n Gathers information about a Server and a Microsoft SQL Server instance installed on that Server. n Determines if the configurations are set according to the recommended best practices.
n Reports on all configurations, indicating settings that differ from recommendations. n Indicates potential problems in the installed instance of SQL Server.
n Recommends solutions to potential problems.
To download the Microsoft Diagnostic tool, go to http://www.microsoft.com/en-us/download/details.aspx?id=29302.
We recommend installing this on each Relativity SQL Server. Perform a diagnostic check and review the report prior to introducing a SQL server to the environment. Perform another analysis
3.11 RAID levels and SQL server
There are many iSCSI and Fibre channel SAN, DAS and NAS solutions available. These devices often utilize RAID levels not discussed in this guide and include other unique technologies to provide the fastest I/O possible. Work with your storage unit provider to ensure the device is configured for optimal performance. Work with your storage unit provider or follow Microsoft best practices to determine where different SQL data should live.
The tempdb data files should reside on the fastest disks available, separate from that of all other user databases. The tempdb database must be configured correctly for SQL server performance. Many clients have recently started to store these data files on PCIe-flash or SSDs as the data is temporary and doesn’t require shared storage in SQL Server Failover Cluster setups. Regardless of the type of SAN or DAS and any caching mechanisms or auto-tiering, storing these on local PCIe-flash or SSD RAID will result in reduced latency and take some of the load off of the primary storage device(s).
Database log files should usually reside on their own disks. Isolating the database log files on their own spindles helps to ensure that nothing will interfere with the sequential writes for each database log file.
Solid state storage is an exception to the conventional wisdom. Hard drives perform very differently depending on whether you are doing random or sequential operations. Random operations will be slower because of the moving parts included in typical hard drives. This is why you may want to avoid placing the log and data files on the same disks. Solid state disks, however, typically perform the same whether random or sequential operations are being performed.
As mentioned above, many storage solutions include unique technologies to provide the fastest I/O possible that may go against the conventional wisdom. Many newer SAN devices are intelligent enough to automatically shift data across different types of available storage depending on activity levels of the stored data. These devices can help you maximize performance with minimal manual intervention, reducing IT related costs and getting as much possible out of the available storage. Regardless, there are many different storage options available and kCura encourages you to research and decide which solution is best for your organization.
Below are some high level suggestions for our smaller 25-50 user licensed instances that may not include a SAN and are looking for some guidance in configuring local SQL storage.
RAID (Redundant Array of Independent Disks) levels 0, 1, 5 and 1+0 are typically implemented with SQL Server. Each RAID level below outlines the SQL data pieces they typically support.
3.11.1 Level 0
This level is also known as disk striping because it uses a disk file system called a stripe set. Data is divided into blocks and spread in a fixed order among all disks in an array. RAID 0 improves read and write performance by spreading operations across multiple disks. Operations can then be performed independently and at the same time.
kCura would never suggest RAID 0 for anything as it can introduce system or application specific outages.
3.11.2 Level 1
Use for: OS, Logs (.ldf)This level is also known as disk mirroring because it uses a disk file system called a mirror set. Disk mirroring provides a redundant, identical copy of a selected disk. All data written to the primary disk is written to the mirror disk. RAID 1 provides fault tolerance and generally improves read performance but may degrade write performance.
3.11.3 Level 5
Use for: Databases (.mdf), Full Text Indexes (.ndf)
Also known as striping with parity, this level is the most popular strategy for new designs. RAID 5 is similar to RAID 4, because this level stripes the data in large blocks across the disks in an array. However, RAID 5 differs because it writes the parity across all the disks. Data redundancy is provided by the parity information.
The data and parity information are arranged on the disk array so that the two types of information are always on different disks. Striping with parity offers better performance than disk mirroring (RAID 1). However, when a stripe member is missing, read performance is decreased, for example, when a disk fails.
3.11.4 Level 10 (1+0)
Use for: Databases (.mdf), Full Text Indexes (.ndf), TempDB, Logs (.ldf)
This level is also known as mirroring with striping. RAID 10 uses a striped array of disks that are then mirrored to another identical set of striped disks. For example, a striped array can be created by using five disks. The striped array of disks is then mirrored using another set of five striped disks. RAID 10 provides the performance benefits of disk striping with the disk redundancy of mirroring. RAID 10 provides the highest read-and-write performance of any one of the other RAID levels but at the expense of using two times as many disks.
RAID levels higher than 10 (1 + 0) may offer additional fault tolerance or performance
enhancements. These levels generally are proprietary systems. For more information about these types of RAID systems, contact the hardware vendor.
The following table breaks down the various RAID levels (source: http://technet.microsoft.com/en-us/library/cc966414.aspx#_RAID).
RAID Levels RAID 0 RAID 1 RAID 5 RAID 10
Reliability Lowest - Lack of fault tolerance res-ults in data loss.
Very good - Even better with duplexing. Good -Can tol-erate single machine fault. Excellent
Storage Efficiency 100 percent 50 percent >50 per-cent, <100 per-cent (#drives-1/#drives) 50 percent
Random Read Excellent Fair - Worst of the RAID levels but better than a single drive.
Excellent Excellent
Random Write Excellent Fair - Worse than a single drive but better than some RAID levels. Fair - Gen-erally bet-ter with larger stripe sizes. Very good
Sequential Read Excellent Fair - Comparable to a single drive. Very good. Generally, better with smal-ler stripe sizes. Very good
Sequential Write Excellent Good - Better than other RAID levels.
Fair Very good
Cost Lowest Moderate -
Rel-atively high cost due to redundant drives; however, no expensive con-troller required.
Moderate High
Recommended use Good for non-crit-ical data or stag-nantly updated data that is backed up
reg-Good for data that requires high fault tolerance at relatively low hardware cost Very good for Read only data. Data requir-ing high per-formance for both read and write
RAID Levels RAID 0 RAID 1 RAID 5 RAID 10
ularly or any data requiring fast write per-formance at very low cost. Great for testing.
(redundancy using parity requires more expensive hard-ware). Best for log files. and excel-lent reli-ability while trading off storage effi-ciency and cost.
The following table provides a summary of an example twelve disk RAID configuration.
Disk Qty RAID Example RAID-12 disk setup
0 2 1 Windows
1 2 1 System Databases (tempdb)
2 2 1 Transaction Logs (ldf)
3 4 5 Relativity Databases (mdf), Full Text Catalogs (ndf)
4 2 1 Backups
4 Configuring the Analytics server
Use the following guidelines to configure your analytics server(s) for optimum performance with Relativity.
4.1 Memory requirements
4.1.1 Analytics indexing
Server memory is the most important component in building an analytics index. The more memory your server has, the larger the datasets that can be indexed without significant memory paging. Insufficient memory will slow down index build performance.
The following factors affect RAM consumption during indexing: n Number of documents in the training set
n Number of documents in the searchable set
n Number of unique words across all the documents in the data set being indexed n Total mean document size (as measured in unique words)
Use the following equation to estimate how much free RAM is needed to complete an index build: (Number of Training Documents) * 6000 = Amount of RAM needed in bytes
An easy way to remember this equation is that every 1 Million training documents in the index require 6 GB of free RAM. The equation is based upon the average document set in Relativity. If your dataset has more unique terms than an average dataset, more RAM will be required to build. We recommend accounting for slightly more RAM than the equation estimates.
4.1.2 Structured analytics
To run structured analytics, the analytics server can require substantial server resources. The structured analytics features are run by the Java process, as well as PostgreSQL. One of the most important components of ensuring a successful structured analytics operation is ensuring that Java has access to a sufficient amount of RAM. The following equation may be used to estimate how much RAM will be required for a given structured analytics set:
(Number of Documents) * 6000 = Amount of JVM needed in bytes
An easy way to remember this equation is that every 1 Million documents in the set require about 6 GB of RAM for the Java process. If your dataset is comprised of very long documents, it may require more JVM. If it is comprised of very small documents, then you may not need as much JVM. If Java does not have sufficient memory to complete a structured analytics operation, it is not often that you will receive an OOM error. More often Java will heap dump and garbage collect endlessly without ever successfully completing the operation. This equation is a good starting point so that these types of problems do not occur. See the sectionJava heap size (JVM) on the next pagefor information on how to configure JVM.
4.1.3 Enabled analytics indexes
An analytics index is stored on disk and is loaded into memory when the index has “Queries Enabled.” An index with Queries Enabled may be used for all analytics functionality such as clustering, categorization, etc. as well as querying. When you select “Enable Queries” on an analytics index, Relativity loads the vectors associated with all searchable documents and words in the conceptual space into RAM in an lsiapp.exe process. For indexes with millions of documents and words, this RAM requirement may be thousands of MB. The number of words per document can range widely, from about 0.80 to 10, depending on the type of data in the index. These ranges indicate the amount RAM needed for an index to be enabled:
#SearchableDocuments * 5,000 = High end of #bytes required # SearchableDocuments * 400 = Low end of #bytes required
Select “Disable Queries” on any analytics indexes that aren’t in use to free up RAM on the analytics server. The Configuration value MaxAnalyticsIndexIdleDays helps with this issue. This value is the number of days that an analytics index can remain inactive before the Case Manager agent disables queries on it. Inactivity on the analytics index is defined as not having any categorization, clustering, or any type of searches using the index. This feature ensures that indexes that are not being used are not using up RAM on the analytics server. If the index needs to be used again, simply navigate to the index in Relativity and click "Enable Queries." It will be available for searching again within seconds.
4.2 Java heap size (JVM)
Depending on the amount of RAM on your analytics server, as well as its role, you will need to modify the Java Heap Size setting. This setting controls how much RAM the Java process may consume. Java is used for index populations, as well as structured analytics operations, clustering, and categorization.
Here are some general guidelines:
n If the analytics server is used for both indexing and structured analytics, set this value to about 50% of the server's total RAM. You need to leave RAM available for the LSIApp.exe process, which is used for building conceptual indexes.
n If the analytics server is used solely for structured analytics, set this value to about 75% of the server's total RAM. Be sure to leave about a quarter of the RAM available for the underlying database pro-cesses.
n If the analytics server is used solely for indexing, set this value to about one-third of the server's total RAM. You need to leave RAM available for the LSIApp.exe process, which is used for building con-ceptual indexes.
Due to a limitation in the Java application, do not configure JVM with a value between 32 GB and 47 GB (inclusive). When JVM is set between 32 GB and 47 GB, the application only has access to 20-22 GB heap space. For example, if the server has 64 GB RAM, set JVM to either 31 GB or 48 GB. This allows the Java application to access all RAM allocated.
To modify the Java Heap Size setting, perform the following steps: 1. Navigate to <CAAT install drive>\CAAT\bin.
2. Edit the env.cmd file.
3. Locate the line similar to the following: set HEAP_OPTS=-Xms4096m -Xmx16383m.
4. The configuration starting with –Xmx refers to the maximum amount of RAM available to Java, in mega-bytes. Modify this value as needed. Both megabyte and gigabyte values are supported for these set-tings. The change won't take effect until you stop and start the Content Analyst CAAT Windows Service. Note: Never set the Java maximum (-Xmx) to be less than the Java minimum (-Xms). Don't modify the Java minimum setting unless instructed by kCura.
4.3 Page file size
We recommend the following settings regarding the page file size for the Analytics server: n Set the size of the paging file to 4095 MB or higher. This is because the OS array generally only has
enough room for what’s required and is not able to support a page file size of 1.5 times the amount of physical RAM.
n Set the initial minimum and maximum size of settings for the page to the same value to ensure no pro-cessing resources are lost to the dynamic resizing of the paging file.
n For servers with a large amount of RAM installed, set the page file to a size no greater than 50 GB. Microsoft has no specific recommendations about performance gains for page files larger than 50 GB.
4.4 Index directory requirements
The index directory stores both the analytics indexes and the structured analytics sets. Using default settings, the average amount of disk space for the analytics index or structured analytics set is equal to about 20% of the size of the MDF file of the workspace database. This metric indicates an
average amount of disk space usage, but actual indexes may vary considerably in size. The amount of space required depends on the size of the extracted text being indexed, as well as the number of documents, unique words, and settings used to build the index. An analytics server may not have multiple index locations; it may only reference one disk location for the server’s analytics indexes and structured analytics sets.
Due to the size requirements, it is recommended that the index directory not be kept on the C: drive. Upon installation or upgrade, the Relativity Analytics Server installer will prompt for the index directory location. If you would like to move the index directory location after upgrade, see Moving Analytics indexes and structured analytics sets in the Admin Guide.
CAAT® uses the database software PostgreSQL which requires guaranteed synchronous writes to the index directory. The connection from the analytics server to the index directory should be one that guarantees synchronous writes, such as Fibre Channel or iSCSI, rather than NFS or SMB/CIFS. We recommend storing the indexes and structured analytics sets on locally mounted storage rather than a remotely mounted file system.
4.5 CPU requirements
4.5.1 Analytics indexing
The analytics index creation process also depends on CPU and I/O resources at various stages of the build. Ensuring that your server has multiple processors and fast I/O also increases efficiency during the population build process. Adding more CPU cores to an analytics server ensures that index populations are as fast as possible—especially for large indexes. When you add additional CPU cores, you may also increase the number of Maximum Connectors used per index on the server. To modify this value, see Editing an Analytics server in the Admin Guide.
4.5.2 Structured analytics
The structured analytics features are run by the Java process as well as PostgreSQL. In order for these processes to operate most efficiently, allocate sufficient CPU cores to the analytics server. For optimal Textual Near Duplicate Identification performance, the analytics server needs at least 8 CPU cores. Textual Near Duplicate Identification performance improves as additional cores are added. The following charts illustrate the “Running Analytics Operations” step of a Textual Near
Duplicate Identification structured analytics set with the default Minimum Similarity Percentage of 90 in CAAT® 3.17:
Dataset 1: Enron data – 1M documents, 6.2 GB total text
Dataset 2: Wikipedia data - 1M records, 2.4 GB total text
The performance of a given dataset varies based on certain factors outside of the number of documents or the total text size. The following types of datasets require more time to analyze:
n Datasets with a very high number of similarly-sized documents n Datasets with a very low number of textually similar documents
Additionally, if you lower the Minimum Similarity Percentage from the default of 90, more time is required to analyze the dataset.
4.6 Scaling
It is often beneficial to add multiple analytics servers to the Relativity environment. This allows jobs to run concurrently without adversely affecting each other. It also allows servers to be dedicated to a feature set (structured or indexing) which makes RAM management easier. The following table shows some example environment configurations as well as the typical upper limitation that will be encountered. The upper limitation assumes no other concurrent activity on the server. The upper limit is intended to server as an estimate and is not a guarantee. Datasets vary widely, and some may require more server resources than usual.
4.6.1 Tier 1 Example
For an entry level environment with 100 or more users, usually one analytics server is enough. Here is an example of the server configuration you may have:
Server Name
Role RAM JVM CPU
Cores Upper limit -Structured Upper limit -Indexing
ANA-01 Both Structured and Indexing 32 GB 16 GB 8 3 MM documents 3 MM documents
4.6.2 Tier 2 Example
For a mid-level environment with over 300 users, you may need to add an additional server to allow for concurrent jobs. Splitting the server roles also allows the servers to work on more data due to the allocation of Java Heap. Here is an example environment configuration:
Server Name Role RAM JVM CPU Cores Upper limit - Structured Upper limit - Indexing
ANA-01 Structured 32 GB 24 GB 8 4 MM documents N/A
ANA-02 Indexing 32 GB 10 GB 4 N/A 4 MM documents
4.6.3 Tier 3 Examples
For a large scale environment, you will likely need to scale up the server and add additional servers. Splitting the server roles allows the servers to work on more data due to the allocation of Java Heap. Adding even more analytics servers allows more jobs to run concurrently. However, adding a large amount of RAM to one server will allow a very large job to complete successfully. The balance needs to be determined based on the client needs. The following are example Tier 3 environment configurations:
Example 1
Server Name Role RAM JVM CPU Cores Upper limit - Structured Upper limit - Indexing
ANA-01 Structured 64 GB 48 GB 16 8 MM documents N/A
ANA-02 Indexing 64 GB 22 GB 12 N/A 7 MM documents
Example 2
Server Name Role RAM JVM CPU Cores Upper limit - Structured Upper limit - Indexing
ANA-01 Structured 128 GB 96 GB 32 16 MM documents N/A
ANA-02 Indexing 128 GB 48 GB 24 N/A 14 MM documents
5 Setting up your workspace
Use the following best practices for workspace administrators to optimize your Relativity environment.
5.1 Fixed-length vs. long text field types
Use the appropriate field type and length for text fields within Relativity.
Fixed-length text - A text field with a limited character length.
n Maximum technical limitation of 4,999 characters n Examples: Author, Email From, Email Subject
Long Text - A text field larger than 4,999 characters.
n The technical limitation on a long text field is 2 GB.
n Relativity is typically configured to stream files smaller than 50 MB. The maximum file size can be set in the configuration table.
n Only the first 20,000 characters are displayed in Layouts and Views.
You can create indexes on particular columns to improve performance for queries against large databases. See the Admin Guide for more information.
An index cannot be created on a column of type Long Text, a Unicode-enabled fixed-length text field of more than 450 characters or a non-Unicode-enabled fixed-length text field of more than 900 characters. Consider these SQL limitations when deciding between the two field types.
5.2 Fixed-length text field considerations
If there are more than 8,060 characters in a single database row, SQL may experience performance issues. As a result, a best practice is to limit the length of fixed-length text fields.
You can get the total field length usage for any workspace by viewing the document objects properties in the Object Type tab for that workspace. This also applies to any Dynamic Objects.
When you create a fixed-length text field, configure the length of the field to the appropriate size. The length can be increased as necessary throughout the life of the workspace. If you have a field that will never exceed a certain length, set the length of the field to that value.
For example, a Document Extension field should have a small field length value. If there is a field that could eventually grow to a great length — a Unicode-enabled fixed-length text field of more than 450 characters or a non-Unicode-enabled fixed-length text field of more than 900 characters — then set it as a Long Text field initially rather than a Fixed Length Text field.
Note: For more information on limiting the total length of text fields in SQL, see
http://msdn.microsoft.com/en-us/library/ms186981.aspx.
After any major cleanup or length reduction of fixed-length text fields in a workspace, run the DBCC CleanTable SQL Script afterhours on the database to reclaim free space. For the steps on running this script, seehttp://msdn.microsoft.com/en-us/library/ms174418.aspx.
5.3 Unicode support
Define the Unicode Enabled property of a field prior to importing data. A database level operation to convert a field’s data from Unicode to non-Unicode (or vice versa) can take a long time, lock
tables, and potentially timeout if performed on a large data set in Relativity. Only the following field types are Unicode-compatible:
n Fixed Length Text n Long Text
n Single Choice List n Multi Choice List
Define the Extracted Text field as Unicode if there is a possibility that text in languages other than English will be loaded into the workspace.
5.4 Data imports
Use the Relativity Desktop Client to import data into Relativity. There are two methods of importing native/TIFF files:
1. Copying files from the original location in the selected load file field to the document repository. This copies the files from the original location in the selected load file field to the selected document repos-itory.
2. Importing files that already reside in a valid, Relativity-accessible location.
Note: Use this option when native files already have been copied to their final location, accessible by Relativity. This location should be separate from that of the Relativity document repository. The default option can be set on an environment-wide level. For more information, see the Relativity Configuration Table Guide.
For either method, the selected native file path field should point to the current location of the native files.
Many clients use the second option to reduce the total time it takes to load files into the Relativity repositories. You can manually copy files to where they should reside and then providing the
Relativity Desktop Client with pointers to their locations. This can save significant time when loading large amounts of natives and/or TIFFs.
Never run the Relativity Desktop Client on a Relativity production server (Web, Agents, SQL, or Search). Launch the tool on a different server or workstation to prevent resource contention.
5.5 Analytics
Use the following guidelines when setting up an Analytics index in your workspace.
n When populating a Relativity Analytics index, exclude non-conceptual files such as XLS, EXE, or RAR from the index. Make sure that the Training Set is limited to documents with less than 2 MB of ted Text, and make sure that the Searchable Set is limited to documents with less than 30 MB of Extrac-ted Text.
n Disable queries on indexes that are no longer being actively used. Queries are automatically disabled for indexes that haven't been accessed within a pre-set number of days.
Note: This value is configured using the MaxAnalyticsIndexIdleDays value in the EDDS.EDDSDBO.Configuration table.
n When queries are enabled on an Analytics index, the index is loaded into RAM. Unused indexes shouldn't consume resources on the CAAT server.
n Use the following guidelines when setting up a structured analytics set in your workspace.
n When creating a structured analytics set, exclude files with more than 30 MB of text, as these may slow down the performance of the set.
5.6 Views and searching
Follow these best practices for views and searching:
n Avoid Is Like and Is Not Like statements on un-indexed queries. Full table scans on large databases using Is Like statements will severely load the server. A full table scan occurs when a query reads every row in a table in order to locate the required information.
If possible, use a Contains condition instead. In order to use "Contains," you must add the field to the Full Text Index.
Note: Is Set and Is Not Set criteria will also trigger a full table scan. A full table scan operation can be very IO-intensive and take a while to complete on slower storage units.
n Prevent users from filtering and/or sorting on Long Text fields within the document list in large work-spaces. To do this, make sure the field's the Allow Sort/Tally property is set to No.
5.7 Tally/Sum/Average mass operation
In the Tally/Sum/Average mass operation, the Tally option is audited. For the set of requested documents, it lists each unique value found in that field along with the frequency of each occurrence. The process can run on Fixed Length Text, Choice, User, and Number fields.
Tallying on un-indexed columns in large workspaces can take a while to complete and slow down database/system performance. Be sure to place indexes on columns in SQL that will be regularly tallied in large workspaces.
You can disable the ability to Tally in a workspace by setting each field's Allow Sort/Tally property to
No.
5.8 Group by for Pivot
When performing Pivot operations in large workspaces, you can improve Pivot performance by placing a non-clustered index on the field being grouped on. Use SQL Server Management Studio to place the index.