only in accordance with the terms of the applicable agreement. No part of this guide may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording for any purpose other than the purchaser’s personal use without the written permission of Quest Software, Inc.
The information in this document is provided in connection with Quest products. No license, express or implied, by estoppel or otherwise, to any intellectual property right is granted by this document or in connection with the sale of Quest products. EXCEPT AS SET FORTH IN QUEST'S TERMS AND
CONDITIONS AS SPECIFIED IN THE LICENSE AGREEMENT FOR THIS PRODUCT, QUEST ASSUMES NO LIABILITY WHATSOEVER AND DISCLAIMS ANY EXPRESS, IMPLIED OR STATUTORY WARRANTY RELATING TO ITS PRODUCTS INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT. IN NO EVENT SHALL QUEST BE LIABLE FOR ANY DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE, SPECIAL OR INCIDENTAL DAMAGES (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF PROFITS, BUSINESS INTERRUPTION OR LOSS OF INFORMATION) ARISING OUT OF THE USE OR INABILITY TO USE THIS DOCUMENT, EVEN IF QUEST HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Quest makes no representations or warranties with respect to the accuracy or completeness of the contents of this document and reserves the right to make changes to specifications and product descriptions at any time without notice. Quest does not make any commitment to update the information contained in this document.
If you have any questions regarding your potential use of this material, contact:
Quest Software World Headquarters
LEGAL Dept
5 Polaris Way
Aliso Viejo, CA 92656
email: [email protected]
Refer to our Web site (www.quest.com) for regional and international office information.
Trademarks
Quest, Quest Software, the Quest Software logo, and Simplicity at Work are trademarks of Quest Software and its subsidiaries. See http://www.quest.com/legal/trademarks.aspx for a complete list of Quest Software’s trademarks. Other trademarks are property of their respective owners.
Quest One Identity Manager Data Governance Edition - Classification Module - User Guide
Updated - May 2013
Software Version - 6.1
Third Party Contributions
Quest One Identity Manager contains some third party components (listed below). Copies of their licenses may be found at http://www.quest.com/legal/third-party-licenses.aspx.
COMPONENT LICENSE OR ACKNOWLEDGEMENT
.Less 1.3.1 Apache License Version 2.0, January 2004 http://www.apache.org/licenses. Apache 2.0 License.
Apache Tomcat 7.0 Apache License Version 2.0, January 2004 http://www.apache.org/licenses. Apache 2.0 License.
asm 3 Copyright (c) 2000-2011 INRIA, France Telecom All rights reserved. Project License - INRIA, France Telecom.
Boost 1.34.1 Boost Software License - Version 1.0 - August 17th, 2003. Boost 1.0 License.
cherrypy 3.1.1 Copyright (c) 2002-2008, CherryPy Team ([email protected]) All rights reserved. BSD 4.4 License
commons-httpclient 4 Apache License Version 2.0, January 2004 http://www.apache.org/licenses. Apache 2.0 License.
CyberNeko 1.9 Apache License Version 2.0, January 2004 http://www.apache.org/licenses. Apache 2.0 License.
Dojo Toolkit 1.8.3 Copyright. All Rights Reserved. BSD Simple License.
dom4j 1.6.1 Copyright 2001-2005 (c) MetaStuff, Ltd. All Rights Reserved. This product includes software developed by dom4j (http://www.dom4j.org/). Dom4J 1.6.1 License.
Erlang 16 ERLANG PUBLIC LICENSE Version 1.1. Erlang Public License 1.1
Google Open Sans 1.0 Copyright (c) January 2004 (http://www.apache.org/licenses). Apache 2.0 License.
Java SE 6 javase-6 Nov 30, 2011
Java Mozilla HTML Parser 0 Mozilla Public License (MPL) 1.1
JCIFS 1.3.14 Copyright (C) 1991, 1999 Free Software Foundation, Inc. 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
GNU Lesser General Public License 2.1
jcrop 0.9.9 Copyright (c). MIT License.
jTDS SQL Server Driver 1.2 Copyright (c) 2007 Free Software Foundation, Inc.
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
GNU LGPL Version 3, 29 June 2007
JQuery 1.7.1 Copyright (c). MIT License.
JQuery UI 1.8.20 Copyright (c). MIT License.
log4j 1.2 Copyright (c) 2000 The Apache Software Foundation. All rights reserved. The Apache Software License, Version 1.1
Novell.Directory.LDAP 2.1.9.0 Copyright (c). MIT License.
Ontopia 5 Apache License Version 2.0, January 2004 http://www.apache.org/licenses/. Apache 2.0 License.
pyodbc 2.1.3 Copyright (c). MIT License
Python 2.5.2 Python 2.5 license 2.5
Copyright 2001-2006 Python Software Foundation All rights reserved.
Copyright 1995-2001 Corporation for National Research Initiatives. All rights reserved.
Copyright 1991-1995 Stichting Mathematisch Centrum Amsterdam, The Netherlands. All rights reserved.
PJL Compressing Filter 1 Apache License Version 2.0, January 2004 http://www.apache.org/licenses/. Apache 2.0 License.
RabbitMQ 2 Mozilla Public License (MPL) 1.1
RabbitMQ Client C# 2 Mozilla Public License (MPL) 1.1
RabbitMQ Client Java 2 Mozilla Public License (MPL) 1.1
SharpZipLib 0.85.4.369 SharpZipLib License
SQLAlchemy 0.5.0 Copyright (c). MIT License
spin.js 1.2.2 Copyright (c). MIT License.
tika-app 1 Apache License Version 2.0, January 2004 http://www.apache.org/licenses/. Apache 2.0 License.
UUID 3 Copyright (c). MIT License.
Windows Installer XML toolset (aka WIX) 3.6.3303.0 Microsoft Reciprocal License (MS-RL) License
Xalan Java 2.7.1 The Apache Software License, Version 1.1 Copyright (c) 2000 The Apache Software Foundation. All rights reserved. Apache 1.1 License.
ZLib.NET 1.0.3 Copyright (c) 2006, ComponentAce (http://www.componentace.com). All rights reserved.
INTRODUCTION
  ABOUT THIS GUIDE
  SYSTEM REQUIREMENTS
  REQUIRED PORTS
  PERFORMANCE CALCULATIONS
  ADJUSTING CPU THROTTLING LEVELS
DEPLOYING CLASSIFICATION IN IDENTITY MANAGER
  CLASSIFICATION OVERVIEW
  REQUIRED COMPONENTS
  COMPONENT WORKFLOW
  WORKFLOW DETAILS
  ACTIVATING CLASSIFICATION
  INSTALL THE CLASSIFICATION COMPONENTS
  IDENTIFY THE CLASSIFICATION SERVICE ACCOUNT
  DEPLOY THE CLASSIFICATION SERVER
  DEPLOY CLASSIFICATION WORKERS
  EXAMINING THE CLASSIFICATION DEPLOYMENT
  ENABLE AND DISABLE AUTOMATIC CLASSIFICATION ON SPECIFIC MANAGED HOSTS
  ASSIGN CLASSIFICATION APPLICATION ROLES
CONFIGURING CLASSIFICATION: TAXONOMIES, CATEGORIES, AND RULES
  AN OVERVIEW OF CLASSIFICATION CONFIGURATION
  STEPS REQUIRED TO IMPLEMENT CLASSIFICATION
  CREATING TAXONOMIES
  WORKING WITH TAXONOMIES
  WORKING WITH CATEGORIES
  SETTING UP MANUAL CATEGORIZATION
  IMPLEMENTING RULES FOR AUTOMATED CATEGORIZATION
  AN OVERVIEW OF RULES
  HOW RULES AFFECT CATEGORIZATION
  WRITING XML RULES
  MANAGING RULES IN THE CLASSIFICATION SYSTEM
  ASSOCIATING RULES TO CATEGORIES
  TESTING AND REVIEWING AUTOMATED CLASSIFICATION
  MAKING A CATEGORY AVAILABLE TO THE AUTOMATED SYSTEM
  CLASSIFYING RESOURCES
  WORKING WITH CLASSIFICATION TAXONOMIES
  WORKING WITH A TAXONOMY XML FILE
  MANAGING THE LIFE CYCLE OF TAXONOMIES AND CATEGORIES
  TAXONOMY DEPLOYMENT CONSIDERATIONS
  DEPLOYING A TAXONOMY
  MODIFYING A PRODUCTION TAXONOMY
  ADVANCED RULE APPLICATIONS
  THRESHOLD CONSIDERATIONS
  WORKING WITH EXTRACTORS
WORKING WITH CATEGORIZED RESOURCES
  WORKING WITH THE CATEGORIZATION OF YOUR RESOURCES
  WHAT CATEGORIES CAN YOU APPLY?
  WORKING WITH MANUALLY CATEGORIZED RESOURCES
  WORKING WITH AUTOMATICALLY CATEGORIZED RESOURCES
  CATEGORIZATION STATISTICS AND VIEWS
APPENDIX A: POWERSHELL CMDLETS
  ADDING THE POWERSHELL SNAP-INS
  DEPLOYING THE CLASSIFICATION SERVER AND THE CLASSIFICATION WORKER
  TROUBLESHOOTING DEPLOYMENT
  MANAGING TAXONOMIES
  TAXONOMY MANAGEMENT
  CATEGORY MANAGEMENT
  RULES MANAGEMENT
  CLASSIFICATION ANALYSIS
APPENDIX B: ORACLE CONFIGURATION
  USING AN ORACLE DATABASE FOR CLASSIFICATION
APPENDIX C: CLASSIFYING DATA WITH DATA GOVERNANCE TEMPLATES
  AVAILABLE TEMPLATES
  WORKING WITH THE SAMPLE TAXONOMY TEMPLATES
  SAMPLE TEXT EXTRACTORS DETAILS
ABOUT QUEST SOFTWARE
  ABOUT QUEST SOFTWARE
  CONTACTING QUEST SOFTWARE, INC.
  CONTACTING QUEST SUPPORT
1
Introduction
• About this Guide
• System Requirements
• Required Ports
About this Guide
This document has been prepared to assist you in becoming familiar with Quest One Identity Manager Data Governance Edition — Classification Module.
This document is for network administrators, consultants, analysts, IT professionals responsible for deploying Data Governance in their organization, and Web Portal users. It provides typical use cases and step-by-step instructions to help you understand how to use Data Governance to secure the unstructured data in your organization.
This guide is supplemented by the Data Governance Edition Deployment Guide, User Guide, and Quick Start Guide, which provide more detailed information about the Data Governance features and include instructions to help administrators perform day-to-day administrative activities.
System Requirements
Review this section to ensure that your system meets the minimum requirements for the Classification Module.
Database Server Requirements
• Microsoft SQL Server Standard Edition 2008 Service Pack 3
• Microsoft SQL Server Standard Edition 2008 R2 Service Pack 1
• Microsoft SQL Server 2012 Standard Edition, Service Pack 1 (Compatibility level for databases: SQL Server 2008 (100))
• Oracle database 11g r2 Enterprise Edition version 11.2 (patch level will vary with operating system platform)
• Microsoft Windows Operating Systems: Windows Server 2003 R2, Windows Server 2008, Windows Server 2008 (R2) (32 bit or non-Itanium 64 bit), or Windows Server 2012
• 32 GB RAM minimum
• In addition to Q1IM Database Server requirements, an additional 30GB per million resources
Classification Server Requirements
• 64-bit Windows Server OS (Windows Server 2003 (R2), Windows Server 2008, Windows Server 2008 (R2), Windows Server 2012)
• 500 MB of space required for installation, 200 MB space for logs, plus an additional 2 GB per 1 million resources for data processing
• 8 GB RAM
• Quad core CPU
Worker Server Requirements
• 64-bit Windows Server OS (Windows Server 2003 (R2), Windows Server 2008, Windows Server 2008 (R2), Windows Server 2012)
• 300 MB of space required for installation, plus an additional 300 MB for logs
• 8 GB RAM
• Quad Core CPU
• .NET 3.5.1
Classification Agent Host Requirements
• 4GB RAM (if hosting multiple agents, 16GB RAM)
• 100 MB free disk space for every million resources scanned
• 2 GHz or faster x86 or x64 bit processor (if hosting multiple agents, quad core CPU)
• .NET 3.5.1
• Classification-enabled local agents are not supported on Windows Server 2003 or Windows Server 2003 R2 operating systems.
• Agent hosts installed on Windows Server 2003 or Windows Server 2003 R2 operating systems are not supported if they are scanning a classification-enabled managed host.
Classification Account Requirements
• The Classification service account requires administrative privileges on the Classification Server, Worker Servers, and any agent hosts.
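The per-million-resource figures above lend themselves to a quick sizing estimate. The following is a minimal sketch only: the function and class names are hypothetical, and it assumes 1 GB = 1024 MB, which the guide does not state.

```python
from dataclasses import dataclass

MB_PER_GB = 1024  # assumption: binary units; the guide does not say which it uses


@dataclass
class DiskEstimate:
    database_gb: float                # additional space on the database server
    classification_server_gb: float   # install + logs + data processing
    agent_host_gb: float              # free disk space on the agent host


def estimate_disk(resources: int) -> DiskEstimate:
    """Estimate disk space in GB from the figures in System Requirements."""
    millions = resources / 1_000_000
    # Database server: an additional 30 GB per million resources.
    database_gb = 30 * millions
    # Classification Server: 500 MB install + 200 MB logs + 2 GB per million.
    server_gb = (500 + 200) / MB_PER_GB + 2 * millions
    # Agent host: 100 MB of free disk space per million resources scanned.
    agent_gb = 100 * millions / MB_PER_GB
    return DiskEstimate(database_gb, server_gb, agent_gb)


if __name__ == "__main__":
    est = estimate_disk(5_000_000)
    print(f"Database (additional): {est.database_gb:.1f} GB")
    print(f"Classification Server: {est.classification_server_gb:.1f} GB")
    print(f"Agent host:            {est.agent_host_gb:.1f} GB")
```

The estimate covers disk space only; RAM and CPU requirements are fixed minimums and do not scale with the resource count in the figures given here.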
Required Ports
Data Governance Server and Agent
PORT            DIRECTION   DESCRIPTION
8721            Incoming    DGE Server - HTTP
8722            Incoming    DGE Server - net.tcp
8723            Incoming    DGE Server - REST
18530 - 18630   Incoming    DGE Agent (Each agent on the same server uses a dynamic port within the range.)
18529           Incoming    Classification Agent - HTTP
Classification Server
PORT            DIRECTION   DESCRIPTION
8725            Incoming    Apache
8726            Incoming    Message Queue
8727            Incoming    Message Queue Management
8728            Incoming    Tomcat
Classification Worker
PORT            DIRECTION   DESCRIPTION
8730            Incoming    Worker WCF Host (HTTPS)
8729            Incoming    RuleEngine WCF Host (HTTPS)
Performance Calculations
The following performance counters can help you to understand how Classification is affecting your system’s performance.
Content Provider
COUNTER                            DESCRIPTION
# assets / sec                     Number of resource callback requests to the Data Governance agent for content per second.
KB of binary content in / sec      Rate of binary content flowing into the agent for processing into plain text.
KB of binary content out / sec
File Handler
COUNTER                            DESCRIPTION
# assets / sec                     Rate at which files are queued in the classification system to be processed by the rules engine.
Rule Engine
COUNTER                            DESCRIPTION
# assets / sec                     Number of resources being processed by the rule engine.
# matches                          Total number of rules that matched. Note: This does not mean that a resource was classified.
entity extractor bytes / second    Rate at which the plain text extracted from the resource is being processed.
KB plain text process / sec        Rate at which plain text content is examined for rule matches.
rules processed / sec              Rate at which rules are run against plain text content.
Rule Engine Extractors
COUNTER                            DESCRIPTION
Average processing time            Average time it takes for one text extractor to process one resource.
Rule Engine Rules
COUNTER                            DESCRIPTION
Average
Adjusting CPU Throttling Levels
Extracting text for categorization and classification can strain the agent computer’s CPU. To ensure that the classification process does not disrupt other services running on the computer, you can enable CPU throttling. The optimal value depends on which other services are running on the agent computer and how much CPU capacity you want to dedicate to the classification process. If the value is set to, for example, 75%, the agent will never cause the load on the computer to exceed that value. Setting the value too low limits the classification process, because the act of extracting content can trigger the throttling and cause the process to start and stop repeatedly.
To set the throttling value, create a DWORD value named "cpuUsageThreshold" under the registry key [HKEY_LOCAL_MACHINE\SOFTWARE\Quest Software\Broadway\Agent\Services\contentRequester].
To disable the throttling, set the value to 0.
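The cpuUsageThreshold setting described under Adjusting CPU Throttling Levels can be prepared as a registry (.reg) file and reviewed before it is imported on an agent computer. The sketch below only builds the file text; it assumes that the key path ends in contentRequester and that the value is a percentage between 0 and 100 (0 disabling throttling). The function name is hypothetical.

```python
# Registry key named in the guide; the final segment is assumed to read
# "contentRequester".
REG_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Quest Software\Broadway"
    r"\Agent\Services\contentRequester"
)


def build_reg_file(threshold_percent: int) -> str:
    """Return .reg file text that sets cpuUsageThreshold as a DWORD.

    Assumption: the value is a whole percentage; 0 disables throttling.
    """
    if not 0 <= threshold_percent <= 100:
        raise ValueError("threshold must be between 0 and 100")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{REG_KEY}]",
        # DWORD values in .reg files are written as eight hex digits.
        f'"cpuUsageThreshold"=dword:{threshold_percent:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file(75))
```

Generating the text rather than writing to the registry directly keeps the change reviewable; an administrator would still need to import the file on each agent computer.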
Classification Overview
Classification helps you and the security professionals in your organization understand the contents of your unstructured data, thereby ensuring that sensitive NTFS and SharePoint assets are properly secured.
More specifically, Quest One Identity Manager Data Governance Edition provides:
• The ability to categorize and classify data from Windows computers, Windows clusters, NetApp Attached Storage Devices, and SharePoint. Numerous file types can be scanned to provide information on the data in your organization, its content, and the categorization and classification that should be applied based on the automated system.
• Automatic and manual classification: Automatic classification evaluates your documents against a set of rules to automatically apply categories and ultimately classify your data. Manual categorization enables the appropriate business owner to control how the data is categorized and ultimately classified.
• Data security intelligence and control: Control data access through the automatic governance of data and policies based on classification. Classification also provides details and trends through statistics that identify the cost of data exposures. For example, you can see files located in a public folder that have been classified or categorized as Secret.
• Business data accountability: Assign data ownership based on classification policies and enable attestations and manual categorization by the business owner to ensure the classifications are valid.
• Classification enforcement: Specify ‘unbreakable’ rules that must be enforced and cannot be overridden.
• The ability to import Titus classification policies into the system.
• Classification auditing.
By understanding the contents of a document using categorization, organizations can better secure their NTFS and SharePoint assets. Through both the Manager and the Web Portal, Identity Manager enables this by:
• Using an automated categorization engine to process documents and tag them according to defined rules
• Allowing the extension and customization of the automated categorization system
• Having the owner of the asset attest to its proper categorization, providing accountability
• Allowing users to override the system to improve the accuracy of the categorization
• Creating policies that define access to resources with a particular category
• Identifying violations of these policies, and providing a workflow to resolve them
Identity Manager includes templates to help you to test and understand the classification process. The templates include sample taxonomies, categories, extractors, and rules that can be used for automatic classification.
• Data Governance Sample taxonomy
• Data Governance Payment Card Industry (PCI) taxonomy
• Titus Commercial taxonomy
Proper deployment of your classification system requires the coordination of the administrator responsible for managing the data that is scanned, the classification analyst responsible for managing the taxonomies in the system, the business owners responsible for verifying and managing the categorization of resources, and the security or compliance officer responsible for oversight.
For details on managing your taxonomies and working with classified data, see Configuring Classification: Taxonomies, Categories, and Rules on page 25 and Working with Categorized Resources on page 73.
Required Components
Categorizing and classifying data through Identity Manager Data Governance Edition requires the installation and configuration of the following components:
• Classification Server includes the services that manage the classification engine repository, the Gateway service, and the content service. When a Data Governance agent scans a managed host and recognizes a new resource to be classified, it pushes the data to the Classification Server, which queues a request for the Worker service to process the data.
• Classification Worker includes the rules engine and the file and SharePoint handlers. By default one of each is installed, but this can be configured and installed on any number of computers to manage scalability.
The rules engine processes data and looks for matches to the predefined rules. Based on the matches, the Worker service determines whether categories are applied to the resource.
• Secure Communication
For classification to be applied, Data Governance agents must be able to communicate securely with the Classification Server and Classification Worker. This is accomplished by installing the Classification Server and Classification Worker with an account that has the required credentials. For details, see Identify the Classification Service Account on page 14.
• Synchronization with the Identity Manager database
When data is classified, or is assigned a category that has been deemed to cause governance, the resource is updated and stored in the Identity Manager database.
Component Workflow
Agents discover resources during normal security scanning and notify the Classification Server. The Classification Server adds references to these resources to a queue, from which a Classification Worker later retrieves them for processing. The Classification Worker then retrieves the resource content from the agent and processes it to find any appropriate categorizations.
Workflow Details
The following diagram details the process:
1. During a security scan an agent identifies a file to be classified and notifies the Classification Service.
2. The Classification Service on the agent host computer forwards the request for classification to the Classification Server.
3. The Classification Server posts the resource to be classified onto a queue for processing.
4. One of the Classification Workers retrieves the resource to be classified from the queue and begins the classification process.
5. A request for the resource content is dispatched to the Classification Service on the agent host for the agent responsible for this resource.
6. The Classification Service proxies this request to the proper agent scanning the target host.
7. The agent retrieves the content and streams it back to the Classification Service.
8. The Classification Service returns the content to the Classification Worker for processing.
9. All standard Classification/Categorization processing occurs and the results are written to the Classification Database and the Data Governance Server is notified.
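The workflow above can be pictured with a small in-memory model. This is an illustrative sketch only, not product code: every class and function name is hypothetical, the Classification Service proxy on the agent host is folded into the agent, and rule matching is reduced to a keyword test.

```python
from collections import deque


class Agent:
    """Stands in for a Data Governance agent scanning a managed host."""

    def __init__(self, files):
        self.files = files  # path -> text content

    def scan(self):
        return list(self.files)

    def read(self, path):
        # Steps 5-8: content is fetched only when a worker asks for it.
        return self.files[path]


class ClassificationServer:
    """Holds the queue of resources waiting to be classified (step 3)."""

    def __init__(self):
        self.queue = deque()

    def notify(self, agent, path):
        # Only a reference to the resource is queued, not its content.
        self.queue.append((agent, path))


class ClassificationWorker:
    """Pulls work from the queue and applies simplified rules (steps 4, 9)."""

    def __init__(self, rules):
        self.rules = rules  # category name -> keyword (hypothetical rule form)

    def process_next(self, server):
        agent, path = server.queue.popleft()
        text = agent.read(path).lower()
        categories = sorted(c for c, kw in self.rules.items() if kw in text)
        return path, categories


def run(agent, server, worker):
    for path in agent.scan():          # steps 1-2: discovery and notification
        server.notify(agent, path)
    results = {}
    while server.queue:
        path, categories = worker.process_next(server)
        results[path] = categories
    return results


if __name__ == "__main__":
    agent = Agent({"a.txt": "Card number 4111", "b.txt": "hello"})
    print(run(agent, ClassificationServer(), ClassificationWorker({"PCI": "card number"})))
```

The point of the model is the split of responsibilities: the server queues references, and content moves only when a worker requests it from the agent that owns the resource.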
Activating Classification
For a fully functional Classification deployment, you need to perform the following tasks:
• Install the Classification.msi included with the Classification download on the Data Governance server to make it ready for a Classification deployment.
• Enable the Classification component in the Designer and recompile the database. The classification component is located in the Designer under TargetSystem\ADS\QAM.
• Once you have completed this process, a Classification node will be available in the Navigation view in the Manager from which you can manage your Classification deployment.
• Identify the Service Account that will be used for securing the classification services.
• Deploy a Classification server
• Deploy Classification workers
• Enable Classification on the required managed hosts
• Upgrade agents on any existing managed hosts where classification has been enabled
• Ensure that you have applied the correct application roles for classification analysts, business owners, compliance officers, and Data Governance administrators.
Install the Classification Components
The Classification package obtained through the download contains all the files required to add the Classification functionality to your Quest One Identity Manager Data Governance Edition deployment.
To install the Classification extension
• Run the DataGovernance_ServerComponentsInstaller_x64.msi to install the files on the Data Governance server to make it ready for a Classification deployment.
Once the installation is complete, you need to activate Classification in Quest One Identity Manager, deploy a Classification server and Classification Workers in your environment, and enable classification on individual managed hosts.
Identify the Classification Service Account
Network communication between the Data Governance Edition agents and server and the Classification components is all performed using REST services over HTTPS channels. By default the HTTPS channels are secured using a self-signed certificate, but customers can provide their own certificate.
Communication is further secured using a trusted subsystem security model. Before any Classification components can be deployed, one of the Data Governance Edition service accounts must be identified as the “Classification Identity”. When the Classification components are deployed, they are configured to run as this identity. All communication related to classification will be performed using this identity.
Note: The Classification Configuration Parameter is a node located under the Data Governance option.
To identify a service account as the Classification Identity
1. In the Data Governance Navigation view, select Service accounts.
2. In the Results list, double-click the required service account.
From the service account overview, you can view the domains associated with the selected service account.
3. From the Tasks view, select Change master data.
4. Select the Classification Identity check box, and click Save.
If the administrator changes the classification service account for any reason, all of the deployed services must be changed manually to use the new classification service account. To do this, go to every instance of a Classification server, Worker server, or Classification agent and ensure that its services log on using the new service account credentials.
To update the service account as the Classification Identity
1. Log on to the computer where the Classification Server is installed.
2. Open Services, locate the Quest QCS Apache and Quest QCS Tomcat x64 services, right-click and select Properties.
3. Select the Log On tab, select the account, enter the password, and click OK.
4. Log on to the computer where the Worker server is installed.
5. Open Services, locate the Quest QCS Worker and Quest QCS Rule Engine services, right-click and select Properties.
6. Select the Log On tab, select the account, enter the password, and click OK.
7. Log on to all managed hosts with classification enabled.
8. Open Services, locate the Quest One Identity Manager Data Governance Classification Agent Service, right-click and select Properties.
9. Select the Log On tab, select the account, enter the password, and click OK.
The classification identity must be a member of the local administrators group on all agent computers where classification is to be deployed.
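Because the account change has to be repeated on every computer that hosts a classification service, it can help to generate a checklist first. The sketch below uses the service names from the procedure above; the role labels, host names, and function name are hypothetical, and it only produces text. It does not change any service.

```python
# Service names as listed in the procedure; role labels are illustrative.
SERVICES_BY_ROLE = {
    "Classification Server": ["Quest QCS Apache", "Quest QCS Tomcat x64"],
    "Worker server": ["Quest QCS Worker", "Quest QCS Rule Engine"],
    "Managed host": [
        "Quest One Identity Manager Data Governance Classification Agent Service"
    ],
}


def build_checklist(hosts_by_role):
    """Return one checklist line per (host, service) pair to update."""
    items = []
    for role, hosts in hosts_by_role.items():
        for host in hosts:
            for service in SERVICES_BY_ROLE[role]:
                items.append(f"{host}: update Log On account for '{service}'")
    return items


if __name__ == "__main__":
    # Hypothetical host names for illustration.
    deployment = {
        "Classification Server": ["CLS01"],
        "Worker server": ["WRK01", "WRK02"],
        "Managed host": ["FILESRV01"],
    }
    for line in build_checklist(deployment):
        print(line)
```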
Deploy the Classification Server
The Classification Server can be installed on the same computer where the Data Governance Server is installed. However, for load balancing it is recommended to install the Data Governance Server and Classification Server on different computers and deploy Classification Workers as required.
To deploy the Classification Server
1. In the Navigation view, expand the Classification node, and select Configuration.
The Classification Server address and port information will be displayed.
2. Click Deploy to add the Classification server.
A check will be made to ensure that a service account has been identified as the classification service account.
3. Browse to the target server and click Next.
The database requirements depend upon whether you are using a SQL or Oracle database.
4. If you are using a SQL database, select the required authentication method and associated credentials, and click Next.
5. If you are using an Oracle database, enter the Service name, the Username and Password for the Content and Topic databases, and click Next.
The Classification server will now be deployed with the specified configuration.
To remove a Classification Server
1. In the Navigation view, expand the Classification node, and select the Configuration node.
From here you will see all the currently deployed services.
2. Click Undeploy.
3. Click Yes to confirm the removal.
The Classification server requires 500 MB of space for installation, 200 MB space for logs, plus an additional 2 GB per 1 million resources for data processing.
If you are using an Oracle database, you need to create the required tablespaces before installing the Classification server. You must also ensure that the Classification server has the ADO client for Oracle (32bit version of ODP.Net) installed. Supported versions include ODAC 11.2 Release 3 or higher. For details, see Using an Oracle Database for Classification on page 86.
To locate the Service name, run the following command on the Oracle DB server: lsnrctl status.
Deploy Classification Workers
To add a Classification Worker
1. In the Navigation view, expand the Classification node, and select the Configuration node.
From here you will see all the currently deployed servers.
2. Select Add Classification Worker from the Tasks view.
You can deploy one server per computer and the computer must be in a managed domain.
3. Select the computer where you want to add the server, and click Deploy.
4. Click Close.
To remove a Classification Worker
1. In the Navigation view, expand the Classification node, and select the Configuration node.
From here you will see all the currently deployed workers.
2. Select the required Classification Worker, and select Remove Classification Worker from the Tasks view.
3. Click Yes to remove the Classification Worker from the deployment.
When you remove the worker, the rules engine for classifying data will no longer be processed on this computer.
To update the Classification Worker to a newer version
1. In the Navigation view, expand the Classification node, and select the Configuration node.
From here you will see all the currently deployed workers. If there is a disconnect between the currently installed versions and the expected versions, you will need to perform an upgrade.
2. Select the required Classification Worker, and select Upgrade Classification Worker from the Tasks view.
3. Click Yes to confirm the upgrade.
4. Click Close once the upgrade is complete.
Examining the Classification Deployment
There are a number of cmdlets available to help you examine your environment and troubleshoot any issues with it.
Using the Classification Logs
The classification logs can help you pinpoint any issues with your deployment. You can retrieve logs for the following services: Classification, Content, FileHandler, Gateway, TextService, Readability, RulesEngine, SharePointHandler, Topic or ViewProvider. You can set a date range for the logs, or choose the number of logged events. Use the Get-QClassificationLogs cmdlet for this. If you do not specify any parameters, the last 200 events for each service are returned.
To view the classification logs
1. Run the Get-QClassificationLogs cmdlet with the following mandatory parameter:
a) ServerAddress
Provide the name of the computer hosting the Data Governance server, and the port. Enter in the form computername:port number. The default port is 8723.
2. If desired, you can use the following optional parameters:
a) ServiceType
Returns only the logs for the selected service.
b) Severity
Choose from Info, Warn, Error, Fatal or Debug. If no type is specified then all logs are returned.
c) Limit
Returns the specified number of logs starting from the earliest event.
d) Tail
Returns the specified number of events starting from the last event and working backwards.
e) Start time
The date of the first event to include in the log.
f) End time
The date of the last event to include in the log.
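The difference between Limit and Tail is easiest to see on a small list. The following sketch models only the selection semantics described above; it does not call the cmdlet. It assumes the events are ordered oldest first and that Tail returns the newest event first, which the guide does not specify. The function names are hypothetical.

```python
DEFAULT_EVENTS = 200  # per the guide: last 200 events when no parameters are given


def apply_limit(events, n):
    """Limit: the first n events, starting from the earliest."""
    return events[:n]


def apply_tail(events, n=DEFAULT_EVENTS):
    """Tail: n events starting from the last event and working backwards.

    Assumption: results are returned newest first.
    """
    return events[::-1][:n]


if __name__ == "__main__":
    events = [f"event-{i}" for i in range(1, 11)]  # oldest first
    print(apply_limit(events, 3))
    print(apply_tail(events, 3))
```

In practice Tail is usually the more useful of the two when troubleshooting a recent failure, since Limit starts from the oldest retained event.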
Review Deployed Classification Workers
You can view a list of computers hosting your worker servers to help you troubleshoot connection or other issues. You can also confirm that the proper version of the classification worker is installed. Use the Get-QWorkerServer cmdlet.
To view a list of classification workers
1. Run the Get-QWorkerServers cmdlet with the following mandatory parameter:
a) ServerAddress
Provide the name of the computer hosting the Data Governance server, and the port. Enter in the form computername:port number. The default port is 8723.
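Before running the cmdlets, it can be useful to confirm that the ServerAddress value is well formed and that the port accepts connections. The sketch below is a hedged helper, not part of the product: the function names and the host name are hypothetical, and falling back to port 8723 when no port is given is an assumption based on the stated default.

```python
import socket

DEFAULT_PORT = 8723  # default port stated in the guide


def parse_server_address(address: str):
    """Split 'computername:port' into (host, port).

    Assumption: if no port is supplied, the default port is used.
    """
    host, sep, port = address.partition(":")
    if not host:
        raise ValueError("computer name is required")
    return host, int(port) if sep else DEFAULT_PORT


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host, port = parse_server_address("dgeserver:8723")  # hypothetical host
    print(host, port, is_reachable(host, port, timeout=1.0))
```

A successful TCP connection shows only that something is listening on the port; it does not confirm that the Data Governance server is healthy or that the caller is authorized.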
Review the Classification Service
If your classification service is not functioning properly, you can use the Get-QServiceInfo cmdlet to troubleshoot the issue. You can use this cmdlet to:
• Ensure that the version of the classification server matches the version the Data Governance server expects. If the versions do not match, the system will not work properly.
• Get information about the account used by the classification service.
• Get information about the identity used to communicate with the DGE server. Use this to ensure the expected user is accessing the classification service.
• Determine whether the services required for classification are running (FileHandler and SharePoint handler).
To view information about the classification service
1. Run the Get-QServiceInfo cmdlet with the following mandatory parameter:
a) ServerAddress
Enable and Disable Automatic Classification on Specific Managed Hosts
Data Governance allows for both automatic and manual classification of resources.
Automatic classification refers to the process by which a resource is categorized according to defined rules. This is enabled on a per managed host basis to ensure that target computers containing potentially sensitive data are processed while maintaining a reasonable amount of network traffic.
Manual classification refers to assigning a resource to a given taxonomy and category. This is performed within the Web Portal by the business owner. Manual classification overrides automatic classification.
There may be a time lapse between when the business owner is able to manually classify data in the Web Portal and when the resources marked for automatic classification in the Manager are processed. If the business owner categorizes a resource prior to the processing, they will be able to eventually see the automatic processing results and make adjustments where required within the Web Portal.
For details on managing your classified resources, see Classifying Resources on page 56.
Automatic classification can be enabled when you add a managed host to the Data Governance deployment or at a later date.
To enable automatic classification on currently deployed managed hosts
1. In the Navigation view, select Managed hosts.
2. Select the required managed host in the Managed host tab, and select Change master data in the Tasks view.
3. Select the Classification tab and check Enable automatic classification.
Enabling Categorization on Folders (Security Index Roots)
Before data can be processed and classified, the folders that contain the data must be identified. This is accomplished through the security index root configuration.
A security index root is the root of an NTFS directory tree to be scanned by an agent, or a point in your SharePoint farm hierarchy below which everything is scanned.
To enable classification on folders and their contents
1. In the Navigation view, select Managed hosts.
2. Select a managed host in the Managed host tab, and select Change master data in the Tasks view.
3. Select the Security Index Roots tab.
4. Select Configure security index roots from the Tasks view.
For a local host, you need to add and save your changes before you can add a security index root. For a remote host, you need to add and save your changes, then add an agent and save your changes before you can add a security index root.
Managing the File Types to be Classified
To reduce the amount of network traffic and expedite the process of classifying only the data that is of interest to you, you can easily configure the scans to focus on specific types of data.
The following data types are enabled for classification by default.
File extensions included by default
FORMAT | APPLICATION (EXTENSION)
Archive | 7-Zip (7Z); GZIP (GZ); PKZIP (ZIP); RAR archive (RAR); WINZIP (ZIP)
CAD | Microsoft Visio (VSD,VSS,VTS)
Display | Adobe PDF (PDF)
Mail | Microsoft Outlook (MSG,OFT); Microsoft Outlook Offline Storage File (OST); Microsoft Outlook Personal Folder (PST)
Presentation | Microsoft PowerPoint (PPT,PPS,POT); Microsoft PowerPoint Windows (PPT,PPS,POT); Microsoft PowerPoint Windows XML (PPTX,PPTM,POTX,POTM,PPSX,PPSM); OASIS Open Document Format (SXD,SXI,ODG,ODP); OpenOffice Impress (SXI,SXP,ODP); StarOffice Impress (SXI,SXP,ODP)
Spreadsheet | Comma Separated Values (CSV); Microsoft Excel Charts (XLS); Microsoft Excel Macintosh (XLS); Microsoft Excel Windows (XLS,XLW,XLT,XLA); Microsoft Excel Windows XML (XLSX,XLTX,XLSM,XLTM,XLAM); OASIS Open Document Format (ODS,SXC,STC); OpenOffice Calc (SXC,ODS,OTS); StarOffice Calc (SXC,ODS)
Text and Markup | ANSI (TXT); ASCII (TXT); HTML (HTM); Microsoft Excel Windows XML (XML); Microsoft Word Windows XML (XML); Rich Text Format (RTF)
To manage the files to be considered for classification
1. In the Navigation view, expand the Classification node and select File Types.
You will see a list of file types currently being assessed during agent scans.
2. Use the arrows to add and remove the file types that you want scanned and click Save.
Upgrade Agents for Classification
You must restart an agent for the classification process to take effect during a scan of the associated managed hosts.
To restart an Agent
1. In the Navigation view, select the Agents View.
2. Select the required agents in the Agents view tab, and select Restart agent in the Tasks view.
3. Click Yes to confirm.
File extensions included by default (continued)
FORMAT | APPLICATION (EXTENSION)
Word Processing | Microsoft Word Macintosh (DOC); Microsoft Word Macintosh (DOT); Microsoft Word PC (DOC); Microsoft Word Windows (DOC); Microsoft Word Windows (DOT); Microsoft Word Windows XML (DOCM); Microsoft Word Windows XML (DOCX); Microsoft Word Windows XML (DOTX); Microsoft Word Windows XML (DOTM); WordPad (RTF)
Ensure that the enabled file type extensions have not been explicitly excluded from agent scans. For details, see Managing Exclusions in the Data Governance Edition User Guide.
When a Data Governance agent is restarted, it re-creates all information within its local index. The server index is updated when the full scan completes. Local managed hosts/agents will always scan on restart and rebuild the index; all other types of hosts require the Immediately scan on agent restart option to be enabled in the Master data form | Scanning Schedule tab. To determine whether data in the client is the most current from the agent, ensure that the data state of the managed host being examined is marked as “Data Available.”
Perform a Data Re-classification with PowerShell
Using PowerShell, you can cause an immediate re-classification of all NTFS and SharePoint data for all of the managed hosts within your environment, or of selected data only.
If you have changed existing rules within a taxonomy in a way that affects how data was previously classified, you can run this cmdlet to ensure that the classification reflects the change.
The ability to perform an immediate re-classification is important for manual classification of containers. In cases where classification should be inherited by child resources, the children will need to be re-processed to have this inherited classification applied.
Syntax: Request-QClassification <ServerAddress> [ManagedHostId] [Folder]
To force a re-classification of your data using PowerShell
1. Run the Request-QClassification cmdlet with the following parameters:
a) ServerAddress (Required parameter)
Provide the name of the computer hosting the Data Governance server, and the port. Enter in the form computername:port number. The default port is 8723.
b) ManagedHostId (Optional parameter)
Provide the ID of the required managed host. If not specified, all managed hosts enterprise-wide will be re-classified.
c) Folder (Optional parameter)
Specify the required folder to scan.
Examples:
To re-classify all recognized data on the managed hosts in your environment, enter only the server address:
Request-QClassification "server.address.com:8723"
To re-classify all recognized data on a specific managed host, specify the host ID but not a specific folder:
Request-QClassification "server.address.com:8723" "92c17163-a883-4037-a4f6-3735cfeae732"
To re-classify the contents of a specific folder, on a specific managed host, enter all the parameters:
Request-QClassification "server.address.com:8723" "92c17163-a883-4037-a4f6-3735cfeae732" "C:\ImportantDocuments"
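If you generate these commands from a script, the three forms above can be assembled mechanically. The following Python sketch is illustrative only: the function name and constant are hypothetical, it only builds the command text and does not call the cmdlet, and it assumes (from the positional syntax shown above) that a folder cannot be given without a managed host ID.

```python
from typing import Optional

DEFAULT_PORT = 8723  # default Data Governance server port stated in this guide


def request_classification_command(
    computer_name: str,
    port: int = DEFAULT_PORT,
    managed_host_id: Optional[str] = None,
    folder: Optional[str] = None,
) -> str:
    """Build the Request-QClassification command line as text (hypothetical helper)."""
    if folder is not None and managed_host_id is None:
        # Assumption: the arguments are positional, so Folder needs ManagedHostId first.
        raise ValueError("Folder requires ManagedHostId")
    # ServerAddress is always required, in the form computername:port.
    parts = ["Request-QClassification", f'"{computer_name}:{port}"']
    if managed_host_id is not None:
        parts.append(f'"{managed_host_id}"')
    if folder is not None:
        parts.append(f'"{folder}"')
    return " ".join(parts)


if __name__ == "__main__":
    print(request_classification_command("server.address.com"))
    print(request_classification_command(
        "server.address.com",
        managed_host_id="92c17163-a883-4037-a4f6-3735cfeae732",
        folder=r"C:\ImportantDocuments",
    ))
```

Omitting the optional arguments widens the scope of the re-classification, so a script should pass them explicitly whenever only one host or folder is intended.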
Assign Classification Application Roles
The following application roles are specifically for Classification functionality. They are to be used in conjunction with Quest One Identity Manager and Data Governance specific application roles. For details on applying application roles, see the Quest One Identity Manager Getting Started Guide and Data Governance Edition User Guide.
Administrators
Employees assigned this role are responsible for the care and maintenance of the Data Governance Edition deployment including the Classification services. This employee uses the administration tools (Manager/Identity Manager) to ensure the Business Owners, Classification Analyst, and Compliance Officers have access to all required information through the web portal.
They are primarily responsible for the deployment of the managed hosts, managed domains, service accounts, Classification Servers, and Classification Workers.
Members of this role can:
• Manage the Classification infrastructure and services using the Manager.
• Configure the file extensions that will be classified by the automated system.
• View taxonomy structures and category properties.
• Manage the classifications of any resource, regardless of ownership.
• Manage the automated classification and categorizations, including rules and category associations.
• Run what-if commands and categorization analysis features using PowerShell commands.
Classification Analyst
Employees assigned this role are responsible for implementing classification, taxonomies, and rules, and for managing the automated system as designed by the business. This employee uses the web portal to modify rules, troubleshoot categorizations, view classified resources across the entire deployment, and manage taxonomies.
Members of this role can:
• Configure file extensions that will be classified by the automated system using the web portal.
• View all taxonomy structures and category properties and settings being used by the system.
• Modify taxonomy structures and category properties.
• View all classifications of all resources in the system, regardless of ownership.
• Manage the classifications of any resource regardless of ownership.
• Manage the automated classification and categorization, including rules and category associations.
• Run what-if commands and categorization analysis features.
Compliance and Security Officer
Employees assigned this role are responsible for overseeing the Classification deployment and ensuring security requirements are met as defined by the organization. They are responsible for reviewing classified resources across the system regardless of ownership.
Members of this role can:
• View all taxonomy structures and category properties and settings through the web portal.
• View all classifications of all resources in the system, regardless of ownership.
Business Owner
Employees assigned this role are responsible, through the web portal, for managing and attesting to the classification of resources that they own.
Members of this role can:
• Manage the classifications of their owned resources.
• Read all classifications on their owned resources.
To assign application roles
1. In the Quest One Identity Manager Navigation view, select Employees.
2. In the Results list, select the required employee.
Chapter 3. Configuring Classification: Taxonomies, Categories, and Rules
• An Overview of Classification Configuration
• Steps Required to Implement Classification
• Creating Taxonomies
• Setting Up Manual Categorization
• Implementing Rules for Automated Categorization
• Classifying Resources
• When Do Categorization and Classification Occur?
• Importing and Exporting Taxonomies
• Working with a Taxonomy XML File
An Overview of Classification Configuration
Categorization is intended to provide information about your data that can help you better understand the state of your environment, and secure information based on an understanding of a resource’s content. The end result of classification is a relationship between a resource and a particular category. In order for categorization to have value in your organization, the category must tell you something specific about the resource, and you must have confidence that the system is applying these categories accurately.
By working with the components of the classification system, and using a combination of automatically and manually applied categories, you can refine the system. The following outlines the components of the system and other necessary concepts:
Components of the Classification System
COMPONENT | DESCRIPTION
Resource | The NTFS or SharePoint object that is being categorized.
Taxonomy | A hierarchical group of categories. For more information, see Working with Taxonomies on page 28.
Category | A well-defined division in the classification system. By associating rules with the category, it can be determined if a given resource belongs to that category. For more information, see How Rules Affect Categorization on page 39.
Rule | A rule sets the criteria for categorization according to that rule. More than one rule can be assigned to a category. For more information, see Implementing Rules for Automated Categorization on page 39.
Rule Engine | Processes a resource's extracted text and identifies all relevant entities (such as names, addresses and so on), running all rules to determine rule matches and, where appropriate, assigning a category to the resource.
Categorization | A relationship between a resource and a category. This relationship can be created manually, or as a result of passing the rules associated with the category.
Steps Required to Implement Classification
Proper deployment of your classification system requires the coordination of the administrator responsible for managing the data that is scanned or monitored, the classification analyst responsible for managing the taxonomies in the system, the business owners responsible for verifying and managing the categorization of resources, and the security or compliance officer responsible for oversight. You should also consider how you plan to make changes over time. See Managing the Life Cycle of Taxonomies and Categories on page 60.
ACTION | ROLE | FOR MORE INFORMATION
Activate classification in your deployment. | Data Governance Administrator | Activating Classification on page 14
Set up scanning and change watching for classification on your servers | Data Governance Administrator | Enable and Disable Automatic Classification on Specific Managed Hosts on page 19
Create taxonomies and add categories | Classification Analyst | Creating Taxonomies on page 27
Make categories available for manual categorization if desired | Classification Analyst | Setting Up Manual Categorization on page 38
Add rules and associate them to categories, and adjust the category threshold if needed. | Classification Analyst | Implementing Rules for Automated Categorization on page 39
Manage your classification taxonomy | Classification Analyst | Working with Classification Taxonomies on page 56
Test your rules and categories to ensure desired results | Classification Analyst | Testing and Reviewing Automated Classification on page 50
Make categories available for automated categorization | Classification Analyst | Making a Category Available to the Automated System on page 55
Build policies, attestations and reports to help secure resources | Compliance Officer, Security Officer | See the Identity Management User Guide
Refine the categorization of resources | Business Owner, Classification Analyst | Working with the Categorization of Your Resources on page 74
Manage the life cycle of your categories | Classification Analyst | Managing the Life Cycle of Taxonomies and Categories on page 60
Creating Taxonomies
Careful planning and coordination are required to get the most out of classification in Quest One Identity Manager Data Governance Edition. Ideally, one or more well-organized taxonomies will be deployed in your organization, and used to categorize resources of interest in your environment.
A taxonomy is a set of related categories, organized as a tree structure. The top node represents the taxonomy as a whole, and each branch is a category. Although taxonomies tend to be tall rather than deep, you can have subcategories nested as you need.
All categories in a taxonomy should be related in some way. Create a separate taxonomy for each related set of categories. This makes it easier for users to understand their resources’ categorization.
To view the taxonomies in your environment using the Web Portal
• Select Governed Data | Taxonomy Manager | Categorizations.
To return a list of taxonomies in your environment using PowerShell
• Run the Get-QTaxonomies cmdlet with the following mandatory parameter:
a) ServerAddress
Provide the name of the computer hosting the Data Governance server, and the port. Enter in the form computername:port number. The default port is 8723.
Working with Taxonomies
Using the Web Portal or Quest.Classification PowerShell snap-in, you can create and edit taxonomies. See Deploying a Taxonomy on page 61 before publishing any taxonomies in your production Data Governance environment.
You can work with taxonomies using the following methods:
• Web Portal, under the Governed Data node
• PowerShell snap-in (see Adding the PowerShell Snap-ins on page 80)
• Editing the template XML directly (see Working with a Taxonomy XML File on page 60)
Creating a Taxonomy
In Identity Manager, you can create your own taxonomies using either the Taxonomy Manager in the Web Portal, or the PowerShell cmdlets found in the Quest.Classification snap-in.
When you create a taxonomy, you are providing the base for the category tree, as well as creating a category that could be applied to resources. For example, if you are creating a PHI taxonomy, you will then add categories to it to create the desired taxonomy. However, if you want, you can assign rules to the top level node, PHI, for it to be used in automated categorization, or you can make it available for manual categorization. There are a number of parameters associated with a category. See Working with Categories on page 30 for more information. These parameters only affect the use of the top node of the taxonomy tree applied as a category, and do not apply to the taxonomy as a whole. For example, when you select Publish this category, it does not make the entire taxonomy available, only the top node.
To create a taxonomy using the Web Portal
1. Select Governed Data | Taxonomy Manager | Categorizations.
2. Click Create new taxonomy.
3. Provide a name for the taxonomy.
The name will appear anywhere the taxonomy is shown.
4. Enter an optional description.
The description appears in the list of taxonomies on the Manage Taxonomies page.
6. Click Save.
The Edit Taxonomy dialog box appears. You can either add categories now, or click OK to complete the creation of the taxonomy. For more details, see Creating a Category on page 32, Editing a Category on page 35 and Deleting a Category on page 38.
To create a taxonomy using PowerShell
1. Run the Add-QTaxonomy cmdlet with the following mandatory parameters:
a) ServerAddress
Provide the name of the computer hosting the Data Governance server, and the port. Enter in the form computername:port number. The default port is 8723.
b) Name
The name will appear anywhere the taxonomy is shown.
2. If desired, you can set any of the following optional parameters:
a) Description
The description appears in the list of taxonomies on the Manage Taxonomies page.
b) Category parameters: Risk, CausesGovernance, IsPublished, IsAutomaticClassificationEnabled, IsMutuallyExclusive, IsStrictlyOrdered.
By default, the risk is set to 0, and all other parameters are set to $false. The threshold is set to 1. For more information on setting the parameters on a category, see Working with Categories on page 30.
Editing a Taxonomy
You can change the name and description of a taxonomy. If you plan to apply the top node of a taxonomy as a category, you may want to change the category parameters. For more information, see Editing a Category on page 35.
To edit the name and description of a taxonomy using the Web Portal
1. Select Governed Data | Taxonomy Manager | Categorizations.
2. Locate the row containing the taxonomy, and click Edit.
3. Select the top node of the tree, and click Edit.
4. Modify the name and description.
5. Click Apply Changes.
You can change any of the category parameters as well. For details, see Working with Categories on page 30.
To edit the name and description of a taxonomy using PowerShell
1. If you do not know the required taxonomy ID, run the Get-QTaxonomies cmdlet, using the following mandatory parameter:
a) ServerAddress
Provide the name of the computer hosting the Data Governance server, and the port. Enter in the form computername:port number. The default port is 8723.
b) Locate your desired taxonomy, and note or copy the taxonomy ID.
2. Run the Set-QTaxonomy cmdlet, with the following parameters:
a) ServerAddress (mandatory)
Provide the name of the computer hosting the Data Governance server, and the port. Enter in the form computername:port number. The default port is 8723.
b) TaxonomyID (mandatory)
The ID of the taxonomy you want to change.
c) Name (optional)
The new name of the taxonomy.
d) Description (optional)
The updated description of the taxonomy.
You can change any of the category parameters as well. For details, see Working with Categories on page 30.
Deleting a Taxonomy
If a taxonomy has been in use, you should use extreme care when deleting it. When you delete a taxonomy:
• All categories in the taxonomy will be deleted.
• If resources were categorized using any category in the taxonomy, the association will be removed.
• Any policy that included a category from the taxonomy may no longer have the expected results.
• Attestations involving any category from the taxonomy may no longer work.
• Reports will no longer include data about any category in this taxonomy.
If you choose to delete a taxonomy, you should ensure that the proper administrators are notified so that policies, attestations and reports can be modified as needed. A safer approach may be to delete categories individually. For more information, see Deleting a Category on page 38.
To delete a taxonomy using the Web Portal
1. Select Governed Data | Taxonomy Manager | Categorizations.
2. Locate the row containing the taxonomy, and click Delete.
3. In the confirmation dialog box, select the I still want to delete this taxonomy check box.
4. Click Delete Taxonomy.
To delete a taxonomy using PowerShell
1. Make sure you know the ID of the taxonomy. For more information, see Finding a Taxonomy or Category ID using PowerShell on page 35.
2. Run the Remove-QTaxonomy cmdlet with the following mandatory parameters:
a) ServerAddress (mandatory)
Provide the name of the computer hosting the Data Governance server, and the port. Enter in the form computername:port number. The default port is 8723.
b) TaxonomyID (mandatory)
The ID of the taxonomy you want to delete.
3. Press Enter to confirm the deletion.
Working with Categories
The proper configuration of a category is integral to a properly working system. Categories should be created and refined in test mode, and published when they are ready to be used in your production environment. Deployments of categories should be properly managed. See Managing the Life Cycle of Taxonomies and Categories on page 60 for more information.
You can work with categories using the following methods:
• Web Portal, under the Governed Data node
• Editing the template XML directly (see Working with a Taxonomy XML File on page 60)
Each category has a number of settings, which have an impact on the category’s behavior. In the table below, the parameter in brackets is the PowerShell and XML equivalent of the setting in the Web Portal.
Category Parameters (PowerShell/XML equivalent in brackets)
SETTING | DESCRIPTION
Category Risk (Risk) | Indicates the relative risk of the category. This is then used to determine how a resource is classified. For more information, see Classifying Resources on page 56.
Publish this category (IsPublished) | Makes a category available for manual categorization. You must also enable this for automation to work. Publish a category only when you are ready for business owners to have access to it. A subcategory must have a published parent category. If you publish a subcategory, and the parent is unpublished, the action is ignored.
Allow this category to be used by the automated system (IsAutomaticClassificationEnabled) | You can make a category available to the automated system. Automated categorization is based on the rules associated with the category, so you should associate rules and test the category before automating it. Automation will not take place until the category is published as well.
Govern using this category (CausesGovernance) | When a category that causes governance is applied to a resource, that resource is placed under governance and can be managed using the Web Portal. Resources under governance can be subject to policies and attestations.
Mutually Exclusive (IsMutuallyExclusive) | The mutually exclusive setting applies to the children of a category. If a category has been defined as mutually exclusive, only a single subcategory can be applied to a resource. For example, consider a category in your taxonomy called PHI, which has three subcategories: Level 1, Level 2, Level 3. If PHI is set to mutually exclusive, you can only apply one of the subcategories. To create an entire taxonomy that is mutually exclusive, so that only one category can be assigned from the taxonomy, all parent categories must be set to mutually exclusive. When more than one category could be applied to a resource based on the associated rules and category threshold, the category with the highest combined rule score is applied.
Strictly Ordered (IsStrictlyOrdered) | Strictly ordered is a special kind of mutual exclusivity, in which the order of the subcategories has meaning. When more than one category could be applied to a resource based on the associated rules and category threshold, the category closest to the parent category will be applied. For example, if your categories are Level 1 and Level 2, in that order, and either category could be applied, Level 1 will be applied.
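The interaction between Publish this category and Allow this category to be used by the automated system can be summarized in a small model. The Python sketch below is not product code. The class and function names are hypothetical, and only the two flags are taken from the table. It shows that a publish request on a subcategory is ignored while the parent is unpublished, and that automation needs both flags.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Category:
    """Hypothetical stand-in for a category; only two of its settings are modeled."""
    name: str
    parent: Optional["Category"] = None
    is_published: bool = False
    is_automatic_classification_enabled: bool = False


def publish(category: Category) -> bool:
    """Publish a category; return False when the request is ignored."""
    if category.parent is not None and not category.parent.is_published:
        # A subcategory must have a published parent, so nothing changes.
        return False
    category.is_published = True
    return True


def available_for_manual(category: Category) -> bool:
    # Publishing alone makes a category available for manual categorization.
    return category.is_published


def available_for_automation(category: Category) -> bool:
    # Automation does not take place until the category is published as well.
    return category.is_published and category.is_automatic_classification_enabled


if __name__ == "__main__":
    phi = Category("PHI")
    level1 = Category("Level 1", parent=phi, is_automatic_classification_enabled=True)
    print(publish(level1))                   # parent is unpublished
    publish(phi)
    publish(level1)
    print(available_for_automation(level1))  # both flags are now set
```

In practice this means publishing from the top of a branch downward, and treating the automation flag as having no effect until publication.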
Creating a Category
The first step is to create the category, giving it a name and description. A category requires a parent, which can either be the top level taxonomy node or any category in the taxonomy. By default, new categories:
• are created in test mode, and are not available for manual or automated categorization.
• do not cause governance.
• have a threshold value of one.
• are not mutually exclusive.
Each category can only belong to a single taxonomy. If you have created a category in one taxonomy and want to move it to another, see Moving a Category on page 37.
To create a category using the Web Portal
1. Select Governed Data | Taxonomy Manager | Categorizations.
2. Locate the row containing the taxonomy, and click Edit.
-OR-
Create a taxonomy as outlined in Creating a Taxonomy on page 28, and click Save.
3. Select the parent category of the new category.
To create a first level category, select the top level taxonomy node, otherwise select the category under which you want the new category to appear.
4. Click Add.
5. Provide a name and optional description for the category.
6. Set the risk value.
7. If you are ready to allow business owners to use this category to manually categorize their resources, select the Publish this category check box.
You should not make a new category available for use by the automated system, as this requires the association of rules to the category. For more information, see Associating Rules to Categories on page 49.
8. If you want all resources categorized with this category to be governed, select the Govern using this category check box.
9. If you plan to allow only one subcategory of this category to be applied to a resource, select the Mutually Exclusive check box.
10. If you want the order of the categories to influence categorization, select the Strictly Ordered check box.
You should not change these values without fully understanding the implications for your classification system. For more information, see Working with Categories on page 30.
11. Click Save.
Your new category appears nested under its parent category.
Category Parameters (continued)
SETTING | DESCRIPTION
Threshold | The threshold value determines whether a category is applied. Combined with the weights given to a rule when you associate it, and the match strength of the rule, the threshold gives you control over what causes a resource to have a category applied. For a full explanation, see How Rules Affect Categorization on page 39. The default threshold is one. Note: The threshold can only be modified through PowerShell commands.
To create a category using PowerShell
1. Make sure you know the ID of the parent category. For more information, see Finding a Taxonomy or Category ID using PowerShell on page 35.
To create a first level category, provide the ID of the taxonomy root.
2. Run the Add-QCategory cmdlet with the following mandatory parameters:
a) ServerAddress
Provide the name of the computer hosting the Data Governance server, and the port. Enter in the form computername:port number. The default port is 8723.
b) ParentCategoryID
To create a first level category, use the ID of the parent taxonomy, otherwise use the ID of the category under which you want the new category to appear.
c) Name
3. Additionally, you can use the following optional parameters:
a) Description
b) Risk
c) CausesGovernance
By default, this is set to $false.
d) IsPublished
You should only set this to $true if you are ready for business owners to use this category for manual categorization.
You should not set IsAutomaticClassificationEnabled to $true, as automated categorization requires the association of rules to the category. For more information, see Associating Rules to Categories on page 49.
e) IsMutuallyExclusive
Set this to $true if you plan to allow only one subcategory of this category to be applied to a resource.
f) IsStrictlyOrdered
Set this to $true if you plan to allow only one subcategory of this category to be applied to a resource, and you want this based on the order of the categories.
g) Threshold
The default value is one. You should not change this unless you have already determined the rules you plan to associate and the weights for these rules. For more information, see How Rules Affect Categorization on page 39.
Note: The threshold can only be modified through PowerShell commands.
How Categories Work Together: Mutual Exclusivity, Strict Ordering and Inheritance
• If there are no categories set to mutually exclusive, all potential categories will be applied.
• If all potential categorizations share the same parent, and that parent is mutually exclusive, only one potential category will be applied. The category that will “win” is the one with the highest combined rule score. This means that if you are planning a mutually exclusive taxonomy or branch of a taxonomy, it is important that you ensure the relative weighting of all subcategories makes sense for your needs. For more information, see How Rules Affect Categorization on page 39 and Advanced Rule Applications on page 64.
• If some of the potential categorizations share the same parent and others do not, more than one category from the taxonomy may be applied, but the mutual exclusive settings of each branch will be respected. For example, the system may have to assign only one of the potential categories within a mutually exclusive branch, but be able to assign other categories within the taxonomy where mutual exclusivity has not been set.
• If strictly ordered has been applied to a parent category, only one potential subcategory can be applied. In this case, however, instead of using combined rule scores to determine which category is applied, the category closest to the parent on the tree will “win”.
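The selection rules in the list above can be sketched for the children of a single parent category. This Python model is an illustration under stated assumptions, not the rule engine itself. The names are hypothetical and the combined rule scores are taken as given inputs. A category is treated as "potential" when its score is at least its threshold, and "closest to the parent" is read as first in the order of the subcategories. The guide does not say how ties are broken, so the sketch keeps the first candidate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str               # subcategory name
    combined_score: float   # combined rule score, computed elsewhere
    threshold: float = 1.0  # category threshold (the default is one)


def select_subcategories(
    children: List[Candidate],
    mutually_exclusive: bool = False,
    strictly_ordered: bool = False,
) -> List[str]:
    """Return the subcategory names applied under one parent (illustrative only)."""
    # Assumption: a category is a potential match when its score meets its threshold.
    potential = [c for c in children if c.combined_score >= c.threshold]
    if not potential:
        return []
    if strictly_ordered:
        # Order matters: the first potential subcategory in tree order wins.
        return [potential[0].name]
    if mutually_exclusive:
        # The highest combined rule score wins; on a tie the first one is kept.
        best = max(potential, key=lambda c: c.combined_score)
        return [best.name]
    # No exclusivity: every potential category is applied.
    return [c.name for c in potential]


if __name__ == "__main__":
    levels = [Candidate("Level 1", 2.0), Candidate("Level 2", 5.0), Candidate("Level 3", 0.5)]
    print(select_subcategories(levels))
    print(select_subcategories(levels, mutually_exclusive=True))
    print(select_subcategories(levels, strictly_ordered=True))
```

With these sample scores the three calls select different results, which is why the relative rule weights matter for a mutually exclusive branch and the order of subcategories matters for a strictly ordered one.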
The following diagram shows the difference between mutually exclusive and strictly ordered:
Inheritance does not directly affect categorization, however it is important to understand how inherited categories are applied when evaluating why particular categorizations were made. Inheritance refers to categorizations that are inherited from a parent container. For example, a folder can be manually cate-gorized, and all documents in the folder inherit that categorization. The following can occur with inher-ited categories:
• If no other categorization exists on a resource, the inherited category remains, and is identified as such in all views
• If a resource is subsequently manually or automatically categorized, and only one potential category can be applied due to mutual exclusivity settings, the new category overrides the inherited one, and the inherited category is no longer associated with the resource
• If a resource is subsequently manually or automatically categorized, and both the new and inherited categories can exist without violating any mutual exclusivity settings, both are associated with the resource
• Inheritance only works on resources that are regularly scanned for classification. If you manually categorize a container that you own that is not scanned, child resources do not inherit the categorization unless they are scanned at some point.
• If categorizations are removed from a resource for any reason, and there is a potential inherited category, the inherited category is reapplied to the resource.
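The inheritance outcomes listed above can be summarized in a small Python sketch. It is a simplified model under stated assumptions, not the product's implementation: the effective_categories function, the conflicts helper, the EXCLUSIVE_BRANCH set, and the category names are hypothetical, and the resource is assumed to be scanned regularly.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical: two categories that share a mutually exclusive parent.
EXCLUSIVE_BRANCH = {"Public", "Confidential"}


def conflicts(a: str, b: str) -> bool:
    """True if two different categories sit in the same exclusive branch."""
    return a != b and a in EXCLUSIVE_BRANCH and b in EXCLUSIVE_BRANCH


def effective_categories(
    inherited: Optional[str],
    assigned: List[str],
    conflicts: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Return (category, source) pairs for a regularly scanned resource."""
    if not assigned:
        # No categorization of its own (or it was removed):
        # the inherited category remains or is reapplied.
        return [(inherited, "inherited")] if inherited else []
    result = [(c, "assigned") for c in assigned]
    # The inherited category is kept only when it can coexist with every
    # assigned category; otherwise the new categorization overrides it.
    if (inherited and inherited not in assigned
            and not any(conflicts(inherited, c) for c in assigned)):
        result.append((inherited, "inherited"))
    return result


only_inherited = effective_categories("Public", [], conflicts)
overridden = effective_categories("Public", ["Confidential"], conflicts)
coexisting = effective_categories("Public", ["Finance"], conflicts)
```

In this model, a folder category of Public stays on a document with no categorization of its own, is replaced when the document is categorized as Confidential (same mutually exclusive branch), and is kept alongside Finance (a category outside that branch).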
Finding a Taxonomy or Category ID using PowerShell
Many of the PowerShell cmdlets you can use to manipulate your taxonomies require that you know the ID of a category. To find a category ID, you must first know the parent taxonomy of the category.
To determine a taxonomy ID using PowerShell
• To determine the taxonomy ID, run the Get-QTaxonomies cmdlet and provide the ServerAddress.
Provide the name of the computer hosting the Data Governance server, and the port. Enter in the form computername:port number. The default port is 8723.
You can also use the Get-TaxonomyByName cmdlet to retrieve the taxonomy ID if you know the taxonomy name.
To determine a category ID using PowerShell
1. Determine the ID of the parent taxonomy.
The taxonomy ID has the same properties as a category. You can use a category ID and a taxonomy ID interchangeably in the cmdlets. For example, you can manage the properties of the taxonomy root when it is used as a category, or use the taxonomy root as a parent category when adding new categories to a taxonomy.
2. Run the Get-QTaxonomyTree cmdlet with the following parameters:
a) ServerAddress
Provide the name of the computer hosting the Data Governance server, and the port. Enter in the form computername:port number. The default port is 8723.
b) TaxonomyID
3. Locate the desired category and note or copy the ID.
Editing a Category
When you edit a category, some changes can have significant impact on the way resources are categorized. The following table outlines the impact of your changes on categorization:
Category Parameters (PowerShell/XML equivalent in brackets)
SETTING IMPLICATIONS
Risk (Risk)