
Page | 1 ScanR evaluation and reference guide version 1.0.0.3

ScanR™

Contents

INTRODUCTION

INSTALLATION

HARDWARE REQUIREMENTS

SOFTWARE PREREQUISITES

ACCOUNT REQUIREMENTS

WINDOWS SERVER INSTALLATION

WINDOWS CLIENT INSTALLATION

ADDING OR UPDATING A LICENSE

RUNNING A SCAN OF A FILE SHARE

CREATE A NEW CONFIGURATION

CHOOSE THE RULES FOR THE CONFIGURATION

UNDERSTANDING THE TYPES OF RULES AND HOW THEY WORK

MATCH A TERM RULE

MATCH A PATTERN RULE

MATCH A TERM AND PATTERN WITHIN A PROXIMITY

NATURAL LANGUAGE PROCESSING (ARTIFICIAL INTELLIGENCE) RULES

RUNNING A SCAN

EXAMINING THE RESULTS

EXPORTING THE RESULTS

RUNNING AN INCREMENTAL SCAN

RESETTING SCAN

ADDING A NEW RULE

UNDERSTANDING RULE WEIGHTS

ADDING RULE EXCLUSIONS

EXCLUDING A DOMAIN FROM THE EMAIL RULE

CHANGING THE PROXIMITY OF A RULE

SCANR OPTIONS

FILE SIZE THRESHOLD

FILE CHARACTERS THRESHOLD

OCR LANGUAGE

OCR IMAGE FILES

WORD BREAKERS

SCANNING SHAREPOINT CONTENT

PERFORMANCE OPTIMIZATION CONSIDERATIONS

ENVIRONMENT

CONFIGURATIONS

RULES

OPTIONS

REPORTING

SUBJECT ACCESS REQUEST (SAR) REPORTING

TO LOG AN INCOMING SAR

TO CREATE A SAR SEARCH

RUNNING A SAR REPORT

APPENDICES

APPENDIX A – SUPPORTED FILE TYPES

APPENDIX B – SUPPORTED DOCUMENT SYSTEMS

APPENDIX C – RULES

Introduction

Welcome to the evaluation and reference guide for ScanR. ScanR is an application that identifies sensitive information inside electronic documents. It connects to document repositories, extracts the text from each file and passes the text through a set of rules. Once a scan is completed, the results can be examined within the application, exported to a file, or queried directly by an analysis tool connected to the database containing the results.

This document shows you the features and usage of ScanR using a sample set of documents and provides reference information for future use.

For help or technical questions please contact [email protected]

Installation

ScanR is installed on a local machine in your environment. It does not require a connection to the internet and does not send any information outside of your organisation. ScanR stores configuration and reporting data in a local instance of SQL Compact Edition. The software can be installed on any modern operating system (including Windows 10) for evaluation purposes. For a production installation, Windows Server 2008 R2 or later is recommended, with the following minimum hardware requirements.

Hardware requirements

Deployment type and scale                                          RAM     Processor         Hard disk space
Up to one million documents stored on File Shares or SharePoint   8 GB    64-bit, 4 cores   80 GB for system drive
Over one million documents stored on File Shares or SharePoint    16 GB   64-bit, 4 cores   200 GB for system drive

Software prerequisites

The following software packages are required prior to the installation of ScanR.

Package               Download location                                                  Size
.NET 4.5.2 or later   https://www.microsoft.com/en-gb/download/details.aspx?id=42642   66.8 MB
SQL CE 4.0 x64        https://www.microsoft.com/en-gb/download/details.aspx?id=17876

Account requirements

For audit and security purposes it is recommended that the ScanR software uses a dedicated account.

Scenario              Requirement
Installation          Local administrator privileges are required to install the software.
File Share Scanning   Read access to all content that is required to be scanned.
SharePoint Scanning   Read access to all content that is required to be scanned.

The software connects to SharePoint using the client-side object model (CSOM) and does not require any software to be installed on the SharePoint farm or tenant. SharePoint Online (Office 365) and SharePoint Server 2010, 2013 and 2016 are supported.

Windows Server Installation

The software is provided as a single MSI file. Installation steps:

• Ensure the minimum hardware requirements are met

• Ensure the prerequisite software packages are installed

• Using an account with local administration rights, unzip and run the provided ScanR.msi file

A shortcut to the application will be added to the desktop.

Windows Client installation

The application is designed to run on a server but can be installed on a Windows client machine for evaluation purposes. There are two issues to be aware of when testing:

You need to manually run the application as the administrator

• Ensure the prerequisite software packages are installed

• Unzip and run the ScanR.msi file

• Open File Explorer and browse to ..\Program Files (x86)\TermSet\ScanR

• Right click on ScanR.exe and select Run as Administrator

You may not be able to see mapped drives without changing a setting

There is a known issue with some Windows clients where programs are not able to access mapped drives by default (https://support.microsoft.com/kn-in/help/3035277/mapped-drives-are-not-available-from-an-elevated-prompt-when-uac-is-co)

You can ensure mapped drives are visible in ScanR by taking the following steps:

• Run Registry Editor (regedit.exe) and locate the following key: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System

• Create a new DWORD entry with the name EnableLinkedConnections and value 1

Adding or updating a license

If you are evaluating the software, then a trial license will already be installed. To run the software in production you will need to add the license key issued to you. There are two types of ScanR license:

Type       Duration          Volume of content *                  Restrictions
Trial      7 days            Unlimited                            1 in 3 files will be randomly skipped from the scanning process
Standard   1, 2 or 3 years   1TB – 10000TB depending on licence   None

* The volume of content is the total size of all the content that has been scanned by the application. You can run unlimited scans of the same content source; only files that are new or modified since the last scan are added to the total.

You can verify your license type and usage by selecting File -> License from the top menu bar of the application. To add a production license, select File -> License from the top menu bar of the application and click Upgrade or Register then:

(1) Enter your Serial number

(2) Click Activate by Entering a code

(3) Click copy to clipboard

(4) E-mail the details to [email protected]

(5) You will receive your activation code

The activation code will bind your installation to the machine you installed it on. If you need to move the licence to a different machine e-mail [email protected]

When running a production license, you will be notified when over 80% of the duration of the license or the volume of content is used.

Running a scan of a file share

In this section of the guide we shall detail all aspects of scanning files stored on a file share.

Important! Please don’t skip the following exercises. We know how tempting it is to jump in and scan your own documents, but these steps will give you a full understanding of using ScanR.

Download the set of sample documents that we shall use for the rest of this guide. To download the documents:

(1) Download from http://bit.ly/2BNseZB

(2) Unzip the documents to a folder called SampleDocuments.

Create a new configuration

A configuration tells ScanR the location of the files you wish to scan. Here are the steps:

(1) On the application home page click the Scan icon

(2) Click on the + Add Config button.

(3) Define the configuration

• Give your configuration a name

• Leave the platform as Fileshare

• Click Browse and find the SampleDocuments folder where you unzipped the sample files

• Click Save

• Click Cancel

Note: If the files you wish to scan are located on a network drive, you will need to map a local drive before ScanR can browse to the location, or you can enter a UNC path.

Choose the rules for the configuration

Creating the configuration defines where the files to be scanned are located. We now need to add some rules to define what information we should look for within the text of the files.

Here are the steps to add a rule:

(1) Click on the edit icon (the pencil icon on the configurations screen) of the configuration you defined

(2) Click on Edit Rules

On the left-hand panel are all the available rules and on the right-hand side are the rules selected for the configuration. For a full description of each rule please see Appendix C.

(3) Add the following rules to the configuration by clicking the rule on the left panel and then clicking the >> Button:

• BLOODGROUPS

• BANKACCOUNTS

• SORTCODES

• CREDITCARDNUMBERS

• CREDITCARDCCV

• CREDITCARDEXPIRE

• EMAIL

• PERSON

• ORGANIZATION

(4) Click Save

(5) Click Update

Understanding the types of rules and how they work

ScanR has four types of rules to help you identify sensitive information within a document:

• Match a term

• Match a pattern

• Match a term and pattern within a proximity

• Natural language processing (artificial intelligence) rules

Match a term rule

Rules that look for a specific word or phrase are the simplest type of rule. The rule has a list of words or phrases to look for and if they are found within the document then the rule is considered true. When a rule is considered true, each of the matches will be added to the report.

The rule BLOODTYPE is an example of a rule that matches a list of terms. To examine the rule:

(1) Edit the configuration

(2) Click Edit Rules

Figure 3. A term matching rule

We can see the list of terms to be matched, and it is straightforward to add terms to the list or remove them.

Note that in the Patterns section the value -ANY- is specified. This indicates that the rule only looks for matches from the specified list of terms.

Note: Terms are not case sensitive. You can add and remove terms from the list as required, including multiple language terms and synonyms.
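
The term-matching behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ScanR's implementation: the match_terms helper, the term list and the sample text are hypothetical, and matching on whole words is an assumption.

```python
import re

def match_terms(text, terms):
    """Return the terms found in text, ignoring case (hypothetical helper)."""
    found = []
    for term in terms:
        # Assumption: a term must not be embedded inside a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found

# Hypothetical term list; the real list is shown on the rule screen.
blood_terms = ["blood group", "blood type", "O negative"]
sample = "Patient Blood Type: o negative"

matches = match_terms(sample, blood_terms)
rule_is_true = bool(matches)  # the rule is true when at least one term is found
```

In this sketch two of the three terms are found despite the difference in case, so the rule is true and both matches would be reported.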

Match a pattern rule

Rules that match a pattern within the text of a document are a highly flexible method for discovering sensitive information that is in a structured format (for example passport or driving license numbers). By describing the format of the matching content, the rule is true for any information that matches the pattern. When a rule is considered true, each of the matches will be added to the report.

Note: ScanR uses regular expressions to define patterns. Although they initially appear quite cryptic, there are lots of useful tools and tutorials to help. We recommend using http://www.ultrapico.com/expresso.htm, which is a free tool to build and test your own regular expressions.

The EMAIL rule is an example of a rule that matches a pattern. To examine the rule:

(1) Edit the configuration

(2) Click Edit Rules

(3) In the right-hand panel click on the EMAIL rule

Figure 4. A pattern matching rule

We can see this time there is a value defined in the pattern section (this is a regular expression that matches e-mail addresses). Note that in the Terms section the value -ANY- is specified. This indicates that the rule only looks for matches of the specified pattern.
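
As a rough illustration of pattern matching, the sketch below collects every match of a regular expression from a block of text. The expression is a simplified stand-in chosen for this example, not the one shipped with the EMAIL rule, and the match_pattern helper and sample text are hypothetical.

```python
import re

# Simplified stand-in pattern (assumption); the shipped EMAIL pattern differs.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@\w+(?:\.\w+)+")

def match_pattern(text, pattern):
    """Return every substring of text that matches the compiled pattern."""
    return [m.group(0) for m in pattern.finditer(text)]

sample = "Contact jane.doe@example.com or sales@example.co.uk for details."
matches = match_pattern(sample, EMAIL_PATTERN)
rule_is_true = bool(matches)  # true when the pattern matched at least once
```

Each entry in matches corresponds to one value that would be added to the report.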

Match a term and pattern within a proximity

You may have guessed already that this type of rule is a combination of the previous two: a rule that requires both a term and a pattern to be matched for the rule to be true. We also introduce another concept, proximity, into this type of rule. Proximity defines how near the term and the pattern have to be to each other. We will explore this in more detail a little later.

The rule BANKACCOUNT is an example of a rule that requires both a term and pattern to match for the rule to be true. To examine the rule:

(1) Edit the configuration

(2) Click Edit Rules

(3) In the right-hand panel click on the BANKACCOUNT rule

Figure 5. A term and pattern matching rule with proximity

We can see this time there is a value defined in the pattern section (this is a regular expression that matches all valid bank account numbers) and a list of terms that also need to be within the text of the documents.

In the example of the BANKACCOUNT rule, the pattern will match any nine-digit number (there is no further validation the pattern can do, as bank account numbers are simply nine-digit numbers). To reduce the number of false-positive matches we add terms to provide further clues that it is a bank account number.

By adding proximity (in this case 200 characters) we further reduce the number of false-positive results by specifying that the term and the pattern need to be within 200 characters of each other.
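
The combination of term, pattern and proximity can be sketched as follows. This is a minimal illustration and makes several assumptions: the proximity_matches helper and the term list are hypothetical, and the distance is measured as the number of characters between the end of one match and the start of the other, which may not be how ScanR measures it.

```python
import re

def proximity_matches(text, terms, pattern, proximity):
    """Return pattern matches that have any term within `proximity` characters."""
    term_spans = []
    for term in terms:
        for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            term_spans.append((m.start(), m.end()))
    hits = []
    for m in re.finditer(pattern, text):
        for t_start, t_end in term_spans:
            # Characters between the two spans; 0 when they touch or overlap.
            gap = max(m.start() - t_end, t_start - m.end(), 0)
            if gap <= proximity:
                hits.append(m.group(0))
                break
    return hits

terms = ["account number", "bank account"]   # hypothetical term list
nine_digits = r"\b\d{9}\b"                   # any nine-digit number

near = "Account number: 123456789"
far = "Account number " + "x" * 300 + " 123456789"

near_hits = proximity_matches(near, terms, nine_digits, 200)
far_hits = proximity_matches(far, terms, nine_digits, 200)
far_hits_wide = proximity_matches(far, terms, nine_digits, 500)
```

With a proximity of 200 only the first text produces a match; widening the proximity to 500 lets the second text match as well.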

Natural language processing (artificial intelligence) rules

ScanR includes three special rules that do not use patterns or terms to discover information. Instead, these rules use a form of artificial intelligence known as natural language processing.

Rule name      Description                                                       Example matches
PERSON         Matches the names of people                                       Bill Gates, Dr J. Smith
ORGANIZATION   Matches the names of companies and organizations                  Microsoft Limited, World Health Organisation
LOCATION       Matches location type data such as towns, cities and countries    Seattle, United Kingdom

Natural language processing uses a machine learning model that has been trained to recognize entities, in our case people, organizations and locations. Although the results will include some false positives, these three rules effectively discover many potentially sensitive types of information that would be very difficult to discover using other methods.

Running a scan

We shall now run the scan of the SampleDocuments folder using the configuration and rules that you selected.

(1) Simply click the run icon on the main configuration screen.

Examining the results

Once the scan has completed click the Exit button and then from the main configurations screen click on Report. You will see the results as follows:

Figure 6. The results grid

(1) Find a document where the score is not zero and double click on the row

(2) You will see all the matching rules for the document

Note: If you are using a trial version, 1 in 3 files are randomly skipped. You can reset the scan if you wish to see a different set of results.

The columns in the results are described below:

Column        Description
Status icon   A green tick indicates the file has been read successfully. Other icons indicate that a file has been skipped as it is unchanged since the last scan, or that the file encountered an error
Folder path   The full path of the folder
File name     The file name
Message       The details of the scanning process for each file
Score         The total score of the file. The score is the sum of the weights of each rule that was true for the document
Duration      The total time it took to process the file

You can access the results at any time by clicking on the Report icon from the main configurations screen.

Clicking on the column headers allows you to order the results by any value you wish.

Clicking on the View Diagnostics link will show details of any files that ScanR was unable to process.

Exporting the results

Clicking the export button from the results grid will export the results to a .CSV file. This can be loaded directly into Excel or a text editor for analysis. You can also export the results from the main configuration screen by clicking the Export icon.

Note: If the scan was against a large volume of data with a large number of rules then the result sets can be very large, and exporting may take a significant amount of time to complete
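
An exported .CSV file can also be processed with a short script. The sketch below keeps only the rows with a non-zero score. It is an illustration only: the header names are assumed from the column descriptions in this guide and may differ in a real export, the sample rows are made up, and the files_with_matches helper is hypothetical.

```python
import csv
import io

# Made-up export content for illustration; header names are assumptions.
sample_export = io.StringIO(
    "Folder path,File name,Message,Score,Duration\n"
    "C:\\SampleDocuments,a.docx,Read,20,0.4\n"
    "C:\\SampleDocuments,b.txt,Read,0,0.1\n"
)

def files_with_matches(csv_file, score_column="Score"):
    """Return the rows whose score is greater than zero."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if int(row[score_column]) > 0]

flagged = files_with_matches(sample_export)
```

For a real export, open the file with open(path, newline="") and pass the file object in place of sample_export.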

Running an incremental scan

When re-running a scan of files, ScanR will only re-scan new files or files that have been modified since the last scan. To test this functionality, follow these steps:

(1) Open one of the documents in the SampleDocuments folder and add some text then save and close the document

You will notice the scanning runs very quickly as only the modified file will be re-read and when completed only one file has been reported as read.

(3) Exit and click on the Report button
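
The idea behind an incremental scan can be sketched as below. This is an assumption about one possible approach (comparing each file's modification time with the value recorded at the previous scan), not a description of how ScanR detects changes; the files_to_scan helper is hypothetical.

```python
import os
import tempfile

def files_to_scan(folder, last_seen):
    """Return paths that are new or whose modification time has changed."""
    changed = []
    for root, _dirs, names in os.walk(folder):
        for name in names:
            path = os.path.join(root, name)
            mtime = os.path.getmtime(path)
            if last_seen.get(path) != mtime:
                changed.append(path)
                last_seen[path] = mtime  # remember for the next scan
    return sorted(changed)

with tempfile.TemporaryDirectory() as folder:
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(folder, name), "w") as f:
            f.write("sample")
    state = {}
    first = files_to_scan(folder, state)    # both files are new
    # Simulate an edit by moving a.txt's modification time forward.
    a_path = os.path.join(folder, "a.txt")
    os.utime(a_path, (os.path.getatime(a_path), os.path.getmtime(a_path) + 10))
    second = files_to_scan(folder, state)   # only a.txt needs re-reading
```

The first pass reads every file; the second pass reads only the file that changed, which mirrors the behaviour seen in the exercise.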

Resetting scan

Resetting a scan means that the previous scan is removed and all files are read again.

• Click Reset on the main configurations screen

• Once completed, click the Report icon to view the results

You will notice that all the files were re-read.

Important: Any time you update the rules in a configuration you will need to run a full scan so that the files are re-read taking the new changes into account. Because the previous scan is deleted, the reset scan will count against your licence usage.

Adding a new rule

ScanR allows you to add your own rules to your configurations. In this exercise we will define a rule to identify an employee number. In our example an employee number starts with either “A” or “B” followed by six digits. An example employee number would be A123456.

(1) Create a new document called EmployeeNumber in the SampleDocuments folder

(2) Add the following text to the document (you can paste from the below)

This is a sample document for our EMPLOYEENUMBER rule.

Employee number: A123456 should match

Employee number: C123456 should not match

(3) Save and close the document

(4) Edit the configuration

(5) Edit the rules

(6) Click New

(7) Set the rule name EMPLOYEENUMBER

(8) Set weight to 10

(9) For the pattern enter [AB]\d{6}

(10) Click the Add Term button

(11) Click Add Term and add the term Employee

(12) Leave all other settings and click Save

(13) Add the rule by highlighting EMPLOYEENUMBER and clicking the >> button

(14) Click Save

Note: Although this guide does not cover creating your own regular expressions, let’s explain the pattern you entered in step 9. Putting characters between two square brackets means match any one of these characters, so [AB] means either a capital A or B will match. \d means match a digit, and the {6} means match it six times. So, the whole pattern will match any text that starts with an A or B followed by six digits.
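
If you would like to try the pattern outside ScanR, the short Python check below runs it against the two lines from the sample document. Python's regular expression syntax is assumed to behave the same way as ScanR's for this simple pattern.

```python
import re

EMPLOYEE_NUMBER = re.compile(r"[AB]\d{6}")

lines = [
    "Employee number: A123456 should match",
    "Employee number: C123456 should not match",
]

# One list of matches per line of the sample document.
results = [EMPLOYEE_NUMBER.findall(line) for line in lines]
```

The first line yields A123456 and the second yields nothing, because C is not one of the characters inside the square brackets.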

To check your new rule has worked:

(1) Run the scan

(2) Examine the results (notice only the new document was scanned)

(3) Click on the EMPLOYEENUMBER row and you should see your matched value

Figure 7. Matching employee number from our custom rule

Understanding rule weights

Each rule has an associated weight which is an integer that allows you to create an overall score of the sensitive data contained within a document. When a rule is true against a document the rule weight is added to the overall score of the document. The weight is only added once regardless of how many times the rule matches within a document.

Tip: Although it is optional, if the total of the weights of all rules within a configuration equals 100 then it will be easy to report on the overall scores, as they will range from zero to one hundred.
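
The scoring described above amounts to a simple sum, sketched here in Python. The document_score helper, the weights and the match counts are hypothetical values chosen for illustration.

```python
def document_score(rule_match_counts, weights):
    """Sum the weight of every rule that matched at least once."""
    return sum(
        weights[rule]
        for rule, count in rule_match_counts.items()
        if count > 0  # a rule contributes once, however many times it matched
    )

weights = {"EMAIL": 10, "PERSON": 20, "EMPLOYEENUMBER": 10}  # hypothetical weights
counts = {"EMAIL": 3, "PERSON": 0, "EMPLOYEENUMBER": 1}      # matches in one document

score = document_score(counts, weights)
```

EMAIL matched three times but contributes its weight of 10 only once, so the document scores 20.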

Adding rule exclusions

An exclusion to a rule is where the conditions of a rule match but you wish to exclude a value from the results. In other words, they are the exceptions to a rule. There are two common uses for exclusions:

• You wish to exclude data that relates to your own organization (for example your postcode)

• The artificial intelligence has incorrectly identified a PERSON, LOCATION or ORGANIZATION and you wish to exclude it from the results

To test using a rule exclusion, do the following:

(1) Edit the configuration

(2) Edit the rules

(3) Edit the EMAIL rule

(4) Click Add Exclusion

(5) Enter [email protected]

(6) Click OK

(7) Click Update

(8) Reset the scan

If you now open the results and click on the results row for the EMAIL document, you will see that the [email protected] is no longer returned as a match despite its meeting the criteria of the rule.

Excluding a domain from the EMAIL rule

A common scenario is that an organisation would like to match all e-mail addresses in a document but exclude their own domain. You can achieve this by editing the EMAIL rule.

(1) Edit the configuration

(2) Edit the rules

(3) Edit the EMAIL rule

The pattern below is the regular expression that captures all email addresses:

\w+@(?!excluded\.com)(\w+\.\w+(\.\w+)?)

(4) Replace excluded\.com with the domain that you wish to be excluded from the scanning results, for example contoso\.com or yourdomain\.co\.uk
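
The effect of the (?! ... ) part of the pattern, known as a negative lookahead, can be checked with the Python sketch below. It uses the pattern from this section with contoso\.com as the excluded domain; the find_emails helper and the sample text are hypothetical, and Python's regular expression engine is assumed to treat this pattern the same way as ScanR's.

```python
import re

# Pattern from this section with contoso\.com as the excluded domain.
EMAIL_EXCLUDING_DOMAIN = re.compile(r"\w+@(?!contoso\.com)(\w+\.\w+(\.\w+)?)")

def find_emails(text):
    """Return the full text of every match of the pattern."""
    return [m.group(0) for m in EMAIL_EXCLUDING_DOMAIN.finditer(text)]

sample = "Internal: alice@contoso.com, external: bob@example.org"
emails = find_emails(sample)
```

Only the external address is returned, because the lookahead rejects any address whose domain begins with the excluded domain.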

Changing the proximity of a rule

In the sample document set there are two documents for testing proximity using the CREDITCARDCVV rule. If we edit the configuration rules and select the CREDITCARDCVV rule, we can see the following settings:

Figure 9. CREDITCARDCVV rule has a default proximity of 50

Notice that the proximity value is currently set to 50. This means that for the rule to be true, the pattern and term must be within 50 characters of each other. In this case it means that one of the terms “CCV”, “CVC” or “CVV” needs to be found within 50 characters of a three-digit number.

In our SampleDocuments folder we have two files that test the CREDITCARDCVV rule. The first file, _CCVPROXTESTPASS contains the following text.

Figure 10. The contents of the _CCVPROXPASS file

In this case, as the terms CCV, CVC and CVV are all within 50 characters of the pattern matches 123, 321 and 987, they should all match.

The second file, _CCVPROXTESTFAIL, contains the following text.

Figure 11. The contents of the _CCVPROXFAIL file

With the default proximity setting of 50 characters the rule will be false, as the term CVC is 400 characters away from the matching pattern 123.

Testing the default proximity rules

Step 1. Reset the scan and re-run the configuration

Step 2. Click the Report Icon and find the results for the two _CCVPROXTEST files

You will notice that the first document, _CCVPROXTESTFAIL, has a score of 0, meaning that no matches were found, whereas _CCVPROXTESTPASS has a score of 10, meaning the rule was true.

Step 3. Click on the second result to see the matching entries for the rule (note that each valid CVV number is separated by a # character).

Figure 12. The matching CVV values

Changing the proximity of a rule

In this exercise we will change the proximity of the CREDITCARDCVV rule so that the _CCVPROXTESTFAIL file will now score the rule as true.

Step 1. Edit the configuration rule for CREDITCARDCVV

Step 2. Change the proximity value to 500 using the slider

Step 3. Reset the scan and re-run the configuration

Step 4. Click the Report Icon and find the results for the two _CCVPROXTEST files

Step 5. Click on each filename and notice that both results now match the rule.

ScanR options

You can open the ScanR options screen from File -> Options on the top menu.

Figure 14. The ScanR options screen

File Size Threshold

Specifies the maximum size of files to be scanned. Any files over the specified limit will not be read and will be logged as “Skipped due to exceeding maximum file size”. The default maximum file size is 2MB. Please refer to Performance optimization considerations for more details on this option.

File Characters Threshold

Specifies the maximum number of characters to extract from the file. The default value is 500,000. Please refer to Performance optimization considerations for more details on this option.
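
The way the two thresholds work together can be pictured with the sketch below. It is illustrative only: the extract_for_scan helper and its messages are hypothetical, 2MB is assumed to mean 2 x 1024 x 1024 bytes, and the assumption that text beyond the character threshold is simply dropped is ours.

```python
MAX_FILE_SIZE_BYTES = 2 * 1024 * 1024   # default 2MB (byte interpretation assumed)
MAX_CHARACTERS = 500_000                # default character threshold

def extract_for_scan(size_bytes, text):
    """Return (text_to_scan, message); text_to_scan is None when skipped."""
    if size_bytes > MAX_FILE_SIZE_BYTES:
        # Hypothetical message; the application logs its own wording.
        return None, "skipped: over file size threshold"
    # Keep only the first MAX_CHARACTERS characters for the rules to process.
    return text[:MAX_CHARACTERS], "read"

skipped_text, skipped_msg = extract_for_scan(3 * 1024 * 1024, "large file")
kept_text, kept_msg = extract_for_scan(1024, "a" * 600_000)
```

The first file is over the size threshold and is never read; the second is read but only its first 500,000 characters are kept.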

OCR Language

OCR Image files

Setting this switch to ON will result in image files (BMP, TIF, TIFF, JPG) being passed through an optical character recognition method to attempt to extract text from the image. This feature is useful for extracting text from areas where scanned documents are stored in image formats. By default, this feature is not selected as there is a performance overhead in processing image files.

Word breakers

Word breakers are characters that define the start or end of words in a block of text. Although most words are separated by spaces, there are occasions where other characters can denote the start or end of a word. Both terms and patterns use the word breakers to maximize the accuracy of matches.

For example, consider the following text:

Credit card number:1234-1234-1234-1234.

Both the colon and full stop are defined as valid word breaking characters and so we would match the credit card pattern in the above example.
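
One way to picture word breakers is as a boundary condition wrapped around a pattern, as in the Python sketch below. The compile_with_breakers helper, the list of breaker characters and the simplified card pattern are all assumptions made for this example; ScanR's own handling may differ.

```python
import re

def compile_with_breakers(pattern, breakers):
    """Require a word breaker (or the edge of the text) on both sides of a match."""
    cls = "".join(re.escape(ch) for ch in breakers)
    # "Not preceded / followed by a non-breaker" also succeeds at the text edges.
    return re.compile(rf"(?<![^{cls}]){pattern}(?![^{cls}])")

CARD = r"\d{4}-\d{4}-\d{4}-\d{4}"                  # simplified stand-in pattern
card_re = compile_with_breakers(CARD, " :.,;")     # hypothetical breaker list

hit = card_re.search("Credit card number:1234-1234-1234-1234.")
miss = card_re.search("ref1234-1234-1234-1234x")
```

The first text matches because the number sits between a colon and a full stop; the second does not, because letters that are not word breakers touch both ends of the number.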

Scanning SharePoint content

Scanning documents stored in SharePoint online or a SharePoint server is very similar to file share scanning. The only differences are in the definition of the configuration.

(1) Click on New Configuration from the main screen

(2) In the platform select 365 or 2010, 2013 or 2016.

The following fields are available:

Field                         Description
Config Name                   The name for the configuration
URL                           Any valid URL for a site collection, site or subsite
Include subsites (checkbox)   The scan will iterate through all subsites and libraries starting from the top URL supplied.
                              When this option is selected you cannot specify a library name.
Username                      For SharePoint Online, a username in the format [email protected]
                              For SharePoint servers the username can be in the format [email protected] or domain\username
Password                      A valid user password

Performance optimization considerations

When scanning hundreds of thousands or several million files there are several considerations to improve performance and balance the speed and accuracy of the results. The overall speed of the scanning will depend on many elements including:

• The hardware and network

• The type and size of the documents

• The type and number of rules you run

Environment

• More memory and processors will speed up the overall scanning time.

• Faster disk I/O and LAN or Internet speeds will speed up the overall scanning time

Configurations

Breaking large document repositories into many configurations is best practice. For example, if you have a file share with one million documents, consider creating several configurations, perhaps by department or folder structure. This affords several significant advantages:

• Each configuration will complete more quickly and is less likely to fail due to external influences (network outages, power outages, maintenance etc.)

• You can run incremental scans more quickly and at different times and frequencies

• Each configuration can have different rules

• The results are faster and easier to work with

• The results can be securely shared with the correct teams

• BI \ Analysis tools will be more effective with subsets of data that can be aggregated

Rules

The more rules added to a configuration, the longer it will take to process each document. Some guidelines are:

• Only add rules for information you need to discover (adding all of the rules to see what we can find will result in slow processing times and a lot of noise in the results)

• Test the rules on a small set of documents until you are confident they are working as desired

• When creating your own rules make them as specific as possible and wherever possible include both terms and patterns

Options

Using the Maximum file size option to exclude very large files will reduce the chance of 1% of your files taking up 90% of the time it takes to complete the scan.

Reporting

The reporting section of the application allows you to produce reports with a full audit trail. To access the reporting from the scanning screens:

(1) Click the HOME PAGE button (on the bottom of the screen)

(2) Click the REPORTS Icon

Subject Access request (SAR) reporting

This section of the guide describes how ScanR can assist in responding to Subject Access requests (SARs). The ICO define a subject access request as follows:

This right, commonly referred to as subject access, is created by section 7 of the Data Protection Act. It is most often used by individuals who want to see a copy of the information an organisation holds about them. However, the right of access goes further than this, and an individual who makes a written request and pays a fee is entitled to be:

• told whether any personal data is being processed;

• given a description of the personal data, the reasons it is being processed, and whether it will be given to any other organisations or people;

• given a copy of the information comprising the data; and

• given details of the source of the data (where this is available).

Source: https://ico.org.uk/for-organisations/guide-to-data-protection/principle-6-rights/subject-access-request/

To log an incoming SAR

(1) Click on the Add SAR button

Figure 16. Add a new Subject Access request

(2) Give the SAR a unique name

(3) Record the date the SAR was received

(4) If you know the data usage and third party sharing details you can enter them now, or alternatively you can run the SAR search and review the data before completing these details

(5) Click Update

To create a SAR search

From the main SAR Dashboard:

(1) Click Edit

(2) Select the configurations (scanning sources) that you would like to search. Clicking the very top selection box will select all configurations

(3) Click Add Search Criteria

Figure 17. Create search criteria for a SAR

The Add Search Criteria screen allows you to define the parameters of the SAR. Typically, it would begin with the name (and perhaps alternate ways the name may be stored) being used for the search. Although the default is set to PERSON, any rule could be chosen for searching.

The search criteria can be combined with AND / OR statements to build more complex queries. In the example shown in figure 17, we are looking for documents that contain a PERSON named ‘GARETH MOON’ or a PERSON named ‘G MOON’ that also has a POSTCODE of ‘WR14 1NA’
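
The combined criteria in that example can be read as "(PERSON is GARETH MOON or PERSON is G MOON) and POSTCODE is WR14 1NA". The sketch below evaluates that reading against two made-up documents. The matches_sar helper, the document values, the grouping of the OR terms and the case-insensitive comparison are all assumptions made for illustration.

```python
def matches_sar(doc_values, any_of, all_of):
    """True when the document meets at least one OR criterion and every AND criterion."""
    def has(rule, value):
        # Assumption: values are compared without regard to case.
        return value.upper() in {v.upper() for v in doc_values.get(rule, [])}
    return any(has(r, v) for r, v in any_of) and all(has(r, v) for r, v in all_of)

any_of = [("PERSON", "GARETH MOON"), ("PERSON", "G MOON")]
all_of = [("POSTCODE", "WR14 1NA")]

# Made-up scan results: rule name -> values discovered in the document.
doc_a = {"PERSON": ["Gareth Moon"], "POSTCODE": ["WR14 1NA"]}
doc_b = {"PERSON": ["G Moon"], "POSTCODE": ["AB1 2CD"]}

doc_a_matches = matches_sar(doc_a, any_of, all_of)
doc_b_matches = matches_sar(doc_b, any_of, all_of)
```

Only the first document would appear in the SAR report, since the second contains a matching name but a different postcode.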

Running a SAR report

From the main SAR dashboard:

(1) Click the Run Icon

Figure 18. Create search criteria for a SAR

The SAR report will load a list of all documents that match the search criteria. Clicking on the file name allows you to view the source file. Clicking on the details shows you the data discovered during the scan.

Exporting the SAR report

Once you are ready to export the SAR report, you have two options:

• Export to Excel – this report is designed for internal use (for example to review the documents identified) and contains a list of the files and links to each

• Export to Word – this report is designed to be sent to the person who requested the information

(1) From the Run SAR screen, select the export format and click EXPORT.

If you wish to customise the report template you can edit the following document ...\Program files (x86)\TermSet\ScanR\SAR Template.docx

Appendices

Appendix A – supported file types

Extension Description

BMP Image file (OCR)

CSV Comma separated format

DOC \ DOCX Word Files

HTML \ HTM Web pages

JPG Image file (OCR)

MSG Outlook message file (including embedded attachments)

PDF PDF files (all versions)

PPT \ PPTX PowerPoint files

TXT Text files

TIFF \ TIF Image file (OCR)

XLS \ XLSX Excel files

XML XML files

Appendix B – supported document systems

System

File Shares

SharePoint server 2010

SharePoint server 2013

SharePoint server 2016

SharePoint Online (Office 365)

Appendix C – Rules

Rule Description

AUSTRAILIANMEDICAL Australia Medical Account Number

AUSTRAILIAPASSPORT Australia Passport Number

AUSTRIALTAX Australia Tax File Number

AUSTRIAVAT Austria VAT number

BANKACCOUNT Bank account numbers (nine digits)

BELGIUMVAT Belgium VAT number

BELGUIMNI Belgium National Identity Number

BLOOD GROUPS Blood groups

BRAZILIDCARDNEW Brazil National ID Card (RG) New Format

BRAZILIDCARDOLD Brazil National ID Card (RG) Old Format

BRAZILLEGALENTITY Brazil Legal Entity Number (CNPJ)

BUGARIAVAT Bulgaria VAT Number

CANADAPASSPORT Canada Passport Number

CANADASOCIALINS Canada Social Insurance Number

CORATIAPERSONALID Croatia Personal Identification (OIB) Number

CREDITCARD Credit Card numbers

CREDITCARDCVV CVV number for credit cards

CREDITEXPIREDATE Expiry date for credit cards

CROATIAID Croatia Identity Card Number

CROATIOVAT Croatian VAT Number

CYPRUSVAT Cyprus VAT Number

CZECHID Czech National Identity Card Number

DATEOFBIRTH Date of Birth

DEANUMBER Drug Enforcement Agency (DEA) Number

DENMARKVAT Denmark VAT Number

EMAIL Email Addresses

ETHNICITY Ethnic groups

FINLANDNATID Finland National ID

FINLANDPASSPORT Finland Passport Number

FINLANDVAT Finland VAT Number

FRANCEINSEE France Social Security Number (INSEE)

FRANCEVAT France VAT Number

FRENCHPASSPORT French Passport

GERMANDRIVINGLICIENCE German Driving licence number

GERMANID German Identity Card Number since November 2010

GERMANPASSPORT German Passport number

GERMANVAT German VAT Number

GREECENATIONALID Greece National ID Card (Old)

GREECENATIONALIDNEW Greece National ID Card (New)

GREECEVAT Greece VAT Number

HONGKONGID Hong Kong Identity Card (HKID) Number

IBAN International Banking Account Number (IBAN)

INDIAUNIQUE India Unique Identification (Aadhaar) Number

INDOID Indonesia Identity Card (KTP) Number

IPADDRESS IP Addresses

IRELANDPNEW Ireland Personal Public Service (PPS) Number (New)

IRELANDPSOLD Ireland Personal Public Service (PPS) Number (Old)

IRELANDVAT Ireland VAT number

ISRAELNATID Israel National ID

ISREALBANKACCOUNT Israel Bank Account Number

ITALYDRIVINGLICENCE Italy Driver's License Number

ITALYVAT Italy VAT number

JANPANDRIVING Japan Driving Licence Number

JAPANSIN Japan Social Insurance Number (SIN)

LOCATION Address and location information

LUXVAT Luxembourg VAT number

NETHBSN Netherlands Citizen's Service (BSN) Number

NETHVAT Netherlands VAT Number

NORWAYIDNUMBER Norwegian citizen ID number

ORGANIZATION Company and organisation names

PERSON Names of people

PHILIPID Philippines Unified Multi-Purpose ID Number

POLANDID Poland Identity Card

POLANDPASSPORT Poland Passport

PORTUGALCITZ Portugal Citizen Card Number

PORTUGALVAT Portugal VAT Number

SAFRICAID South Africa Identification Number

SAUDIID Saudi Arabia National ID

SEXUALORIENTATION Sexual Orientation descriptions

SINGANRIC Singapore National Registration Identity Card (NRIC) Number

SKORREARN South Korea Resident Registration Number

SORTCODE Banking sort codes in the format NN-NN-NN

SPAINSSN Spain Social Security Number (SSN)

SPAINVAT Spanish VAT Number

SWEDENID Sweden National ID

SWEDENPASS Sweden Passport Number

SWEDENVAT Sweden VAT number

TAWAINARC Taiwan Resident Certificate (ARC/TARC) Number

TAWAINNATID Taiwan National ID

UKCARREGISTRAIONPOST2001 UK Vehicle registrations from 2001 onwards

UKCELLPHONE UK Cell phone number

UKCHILDBENEFITREFERENCE UK Child Benefit number

UKDRIVINGLICENCE UK Driving licence number

UKELECROLL UK Electoral Roll Number

UKGOVSEC UK government security classifications

UKNATIONALINUSRANCENUMBER UK National Insurance number

UKNHS UK National Health Service Number

UKVAT UK VAT Number

UKVIN UK Vehicle Identification Number

USSSN US Social Security Number (SSN)

USTIN US Individual Taxpayer Identification Number (ITIN)

USVIN US Vehicle identification

UKSTREETADDRESS UK street addresses

USTELEPHONE US Telephone numbers

USSTATEANDZIP US Abbreviated State name and ZIP code
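
To illustrate how a format-based rule such as SORTCODE might be expressed, the following Python sketch matches the NN-NN-NN format described above. The pattern is an assumption made for illustration; this guide does not list the expressions ScanR actually uses, and the `find_sort_codes` function is hypothetical.

```python
import re

# Assumed pattern for NN-NN-NN: three pairs of digits separated by hyphens.
# \b (a word boundary) stops a match from starting or ending inside a
# longer run of digits.
SORT_CODE = re.compile(r"\b\d{2}-\d{2}-\d{2}\b")


def find_sort_codes(text):
    """Return every substring of text that looks like a sort code."""
    return SORT_CODE.findall(text)
```

Called on a sentence containing `12-34-56` and a nine-digit account number, the function returns only the sort code; a string such as `112-34-567` produces no match.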

Appendix D – Regular expression definitions

Symbol Meaning

c Match the literal character c once, unless it is one of the special characters.

^ Match the beginning of a line.

. Match any character that isn't a new line.

$ Match the end of a line.

| Logical OR between expressions.

() Group sub-expressions.

[] Define a character class.

* Match the preceding expression zero or more times.

+ Match the preceding expression one or more times.

? Match the preceding expression zero or one time.

{n} Match the preceding expression n times.

{n,} Match the preceding expression at least n times.

{n,m} Match the preceding expression at least n times and at most m times.

\d Match a digit.

\D Match a character that is not a digit.

\w Match an alphanumeric character, including the underscore.

\W Match a character that is not an alphanumeric character or the underscore.

\s Match a whitespace character (a space, \t, \n, \r, or \f).

\S Match a non-whitespace character.

\n New line.

\r Carriage return.

\f Form feed.
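
The symbols above can be tried out with a short Python sketch. It uses Python's `re` module, which is an assumption made for illustration: ScanR's own regular expression engine may differ in detail. The `is_match` helper and the sample patterns are hypothetical.

```python
import re


def is_match(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


checks = {
    # ? makes the preceding "u" optional, so both spellings match.
    "optional": is_match(r"colou?r", "color"),
    # ^ and $ anchor the match, so exactly three digits are required.
    "anchors": is_match(r"^\d{3}$", "1234"),
    # | chooses between alternatives inside a () group.
    "alternation": is_match(r"^(cat|dog)s?$", "dogs"),
    # [] defines a character class; {n,m} bounds the repetition.
    "class_and_range": is_match(r"^[A-Z]{1,2}\d{1,2}$", "WR14"),
    # \w+ is one or more word characters, \s a single whitespace character.
    "words": is_match(r"^\w+\s\w+$", "G MOON"),
    # . does not match a new line.
    "dot": is_match(r"^a.c$", "a\nc"),
}
```

In this sketch the `anchors` and `dot` checks are false and the other four are true.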
