• No results found

Infobright Community Edition-user Guide

N/A
N/A
Protected

Academic year: 2021

Share "Infobright Community Edition-user Guide"

Copied!
90
0
0

Loading.... (view fulltext now)

Full text

(1)

Infobright Community

Edition 4.0.6 GA

USER GUIDE

(2)

COPYRIGHT NOTICE

The materials provided herein are Copyright © 2005-2012 Infobright Inc. All rights reserved.

CONFIDENTIAL: The information contained in this document is the property of Infobright Inc. Except as specifically authorized in writing by Infobright, the holder of this document shall keep the information contained herein confidential and shall protect same in whole or in part from disclosure or dissemination to third parties.

If these materials were purchased as a digital download, Infobright hereby grants the purchaser permission to reproduce a single copy (print or download) of the materials without prior written permission.

If these materials were purchased in printed form, no part of these materials shall be reproduced or retransmitted by any means, electronic, mechanical, photocopying, recording, or otherwise without written permission from Infobright.

(3)

Contents

1. About Infobright ... 1

Infobright Overview ... 1

Infobright and MySQL ... 1

2. Setting up Infobright ... 3

Technical Requirements ... 3

Linux for Infobright ... 4

Installation ... 4

Installing Infobright... 4

Windows Installation ... 4

Linux Installation ... 5

Upgrade ... 8

Windows Upgrade Instructions ... 8

Updating Table Structures (Versions Prior to ICE 3.3.2 Only) ... 9

Linux RPM or DPKG Upgrade ... 9

Updating Table Structures (Versions Prior to ICE 3.3.2 Only) ... 10

Linux TAR Upgrade ... 11

Updating Table Structures (Versions Prior to ICE 3.3.2 Only) ... 12

Configuration ... 12

Configuring Infobright ... 12

Configuration Tips and Examples ... 14

3. Using Infobright ... 16

Starting and Stopping the Infobright Server ... 16

Windows ... 16

Linux ... 16

Working with the Infobright Server ... 16

Windows ... 17

Linux ... 17

Checking the Infobright Version ... 17

Infobright is the Default Storage Engine ... 18

About Log Files ... 19

About Errors ... 20

About SQL Command Syntax ... 20

(4)

CONTENTS I I

4. Managing Infobright Tables ... 22

About the Infobright Database Files ... 22

About Supported Data Types ... 22

Creating and Dropping Tables ... 24

About Column Options ... 25

NULL and NOT NULL ... 25

Lookup Columns ... 25

Unsupported Column Options ... 25

Unsupported Indices Options ... 26

Converting Oracle DDL to Infobright ... 26

Converting SQL Server to Infobright ... 26

Converting MySQL (MyISAM) to Infobright ... 26

Viewing Table Information ... 27

Viewing Compression Ratio Statistics ... 28

Viewing Table Level Compression Ratio Statistics ... 29

Viewing Column Compression Ratio Statistics ... 30

Comparison of Calculated Compression Ratio to Physical Size ... 30

5. Data Manipulation Statements ... 31

Unsupported Data Manipulation Commands (INSERT, UPDATE, DELETE) ... 31

6. Character Set Support ... 32

Supported Character Sets ... 32

Collations and Comparisons ... 32

Padding ... 33

7. Importing and Exporting Data in Infobright... 34

Multi-character Delimiter ... 34

About Transactions ... 34

Using AUTOCOMMIT, COMMIT and ROLLBACK Commands ... 34

About Transaction Behavior ... 35

Failure Handling ... 35

About Export Differences in Infobright ... 36

CHAR(n) Data Type Values ... 36

Escape Characters ... 36

(5)

Infobright Loader Reject File ... 36

Enabling the Reject File Functionality ... 37

Disabling the Reject File Functionality ... 38

Infobright Loader Line Terminators ... 38

Escape Character ... 38

End of Line (EOL) Sequence ... 38

Importing Data ... 39

Importing Data Using Remote Load ... 39

Exporting Data ... 40

Optional FIELDS Clause ... 40

Importing Files with Invalid Values ... 41

Importing Data Using Linux Pipes ... 41

About Import Errors ... 42

About Export Errors ... 42

Sample Script (Create Table, Import Data, Export Data) ... 43

Exporting and Importing Query Results ... 44

8. Running Queries in Infobright ... 45

About the Knowledge Grid ... 45

About Knowledge Nodes ... 45

Running Queries ... 46

Running Queries ... 46

Enabling Queries to be Redirected to the MySQL Engine ... 46

Enabling Queries to be Redirected to the MySQL Engine ... 46

Viewing Queries Redirected to the MySQL Engine ... 46

Terminating a Query ... 47

Creating VIEWs in Infobright ... 47

Create VIEW Syntax ... 47

Select Syntax Supported in Infobright ... 47

Select Syntax ... 47 Join Syntax ... 48 Union Syntax ... 48 Subqueries ... 48 Query Performance ... 49 Rough Queries... 50

(6)

CONTENTS I V

Subquery Support ... 56

Complex Expression ... 61

9. Infobright Backup and Recovery ... 65

Backup Procedure ... 65

Restore Procedure ... 65

A. Infobright Optimizer - Supported Functions and Operators ... 66

Comparison Functions and Operators ... 66

Logical Operators ... 67

Control Flow Functions ... 67

String Functions ... 67

String Comparison Functions ... 69

Numeric Functions ... 69

Date and Time Functions ... 71

Text Search and Other Functions ... 73

Group By Aggregate Functions ... 74

Group By Modifiers ... 74

B. Infobright Data Tools ... 75

Infobright Configuration Manager ... 75

Running the Infobright Configuration Manager ... 75

Charset Migration Tool ... 75

Running the Charset Migration Tool ... 75

Log Structure ... 76

Collations-conversion-file Structure ... 76

Infobright DomainExpert ... 77

About the Infobright DomainExpert ... 77

Decomposition Rules ... 77

Decomposition Rules Language ... 77

Predefined IPv4 Rule ... 79

Other Predefined Rules ... 79

Assigning Rules to Columns ... 79

Applying Rules to Data... 80

Modifying a Rule for an Existing Column ... 81

(7)

System Settings for Red Hat Enterprise Linux and CentOS ... 82

Disable SElinux ... 82

Swappiness ... 82

Disable Unused Processes ... 82

File System Settings ... 82

Ensure CacheFolder is on a Fast Local Disk ... 82

Larger Readahead ... 82

Use XFS File System for Data Directories ... 82

noatime ... 83

Deadline Elevator ... 83

Increase ulimit to Support Large Data Volume or Users ... 83

(8)

1.ABOUT INFOBRIGHT 1

1. About Infobright

Infobright Overview

Thank you for choosing to install Infobright Community Edition (ICE) 4.0.6 GA. Infobright is a column-oriented, high performance analytic engine designed for analytic applications and data marts that need fast query response across large data volumes. Infobright was designed specifically for large volume data analytics applications with up to 50TB of data.

Infobright uses a unique and patent-pending approach to compressing, storing, and processing data that allows it to be installed and run on commodity hardware with little or no DBA intervention. Infobright requires little tuning to support ad hoc or complex business analytic queries.

Infobright is a database engine utilizing the MySQL database environment. As such, Infobright is fully compatible with all MySQL-compliant Business Intelligence tools and utilizes the MySQL administrative interface to reduce the learning curve for system administrators.

Infobright Community Edition provides a versatile, highly-compressed database system optimized for analytic-type queries. The ratio of possible compression and the speed of data import and retrieval are optimized at the expense of some transactional features of the engine performance, like the frequent data updating.

Infobright executes complex or ad hoc queries across vast amounts of data with a low cost of ownership.

Infobright and MySQL

ICE 4.0.6 GA combines the Infobright storage engine with MySQL server implementation. Infobright consists of several layers. The upper layers are provided by the MySQL server implementation, and the lower layers are provided by Infobright.

Infobright includes its own computing engine along with the storage engine. The MySQL query engine can be used with Infobright; however, since the MySQL storage engine interface is row oriented, it can not take full advantage of the column orientation or the Knowledge Grid and hence query execution via this path is reduced. Queries will be directed to the Infobright optimizer whenever possible.

Infobright ships with the full MySQL binaries required, including the MyISAM storage engine. MyISAM is used to store catalog information (as with other storage engines) and you can use the MyISAM instance for other purposes but joining MyISAM and Infobright tables may result in reduced performance as the MySQL query engine will be used.

(9)

MySQL provides: Infobright provides:  Mature connectors, tools

and resources  Interconnectivity and

certification with BI Tools  Management services and

utilities

 Load function that compresses data

 Column-oriented storage engine

 Knowledge Grid metadata layer that contains information about the compressed data  Optimizer/executor that uses

the Knowledge Grid

Infobright and MySQL are integrated as shown below:

Since other storage engines, like InnoDB and Falcon, are not included in the Infobright distribution, they must be run as separate instances (executables). If you wish to combine other storage engines with Infobright, you will need to look at a database federation application (some BI tools provide this).

(10)

2.SETTING UP INFOBRIGHT 3

2. Setting up Infobright

Technical Requirements

Before installing Infobright, review the following technical requirements.

I

NFOBRIGHT

T

ECHNICAL

R

EQUIREMENTS Requirement Description

Platforms Windows XP (32-bit only)

Windows Server 2003 (64-bit only) Red Hat Enterprise Linux 5 Debian “Lenny”

CentOS 5.2

Ubuntu 8.04 (32-bit only) Fedora 9 (32-bit only) Processor Architecture Intel 64-bit

Intel 32-bit AMD 64-bit AMD 32-bit

For Personal Evaluation and/or Application Development

CPU Speed 32-bit: 1.6GHz minimum, 2.0GHz or faster dual or quad core recommended 64-bit: 1.8GHz minimum, 2.0GHz or faster dual or quad core recommended Memory 32-bit: 1GB minimum. 2GB or more recommended

64-bit: 2GB minimum, 4GB or more recommended For Multi-User Evaluation or Production Deployment

CPU Speed 64-bit: 2.0GHz minimum, 2.0GHz or faster dual or quad core recommended Memory 64-bit: 4GB minimum, 16GB or more recommended (and at least 2GB per core)

Important

32-bit platforms are for solution testing purposes only and are not recommended for performance or multi-user testing, or production deployments.

(11)

Linux for Infobright

Infobright has been optimized for various ‘flavours’ of Linux. While Infobright can be run ‘out of the box’ on any supported Linux platform, there are a number of tuning opportunities to improve performance.

See "Linux Tuning Settings" on page 82 for a list of tuning suggestions.

Installation

Installing Infobright

The Infobright installation packages are provided as an RPM, DEB, PKG, .exe, or tarball. For non-Windows platforms, the user installing Infobright must be the root user or a user with the necessary permissions to install files, create the user mysql and create the group mysql.

Windows Installation

Windows Installation Instructions

1. Download the install package (for example, infobright-version-win32.exe) to the Windows machine on which you are installing Infobright, and double click on the .exe file to launch the Install Wizard. Click Next to continue.

2. Click I Agree to accept the GPL license agreement.

3. By default ICE is installed in C:\Program Files\Infobright. To change the default location, either enter the folder name in the field or click Browse… to select the desired install location on your computer. Click Install to accept the install location.

4. Please wait while the Install Wizard completes the installation.

5. Choose if you want Infobright to start on completion of the installation. Click Finish to complete the installation.

6. The Install Wizard automatically creates ICE as a Windows Service, which allows the Infobright server to be started and stopped automatically when you boot or shutdown Windows. If you do not want ICE to start on boot, open the Services window from the Control Panel and change the Startup Type for Infobright from “Automatic” to “Manual”.

7. The Install Wizard automatically determines the optimum memory settings based on the physical memory of the system. You may change these settings by editing the file brighthouse.ini within the data directory.

Important

The memory settings assume that there are no other services on the machine consuming significant memory. If this is not the case, please lower the memory settings for Infobright.

(12)

2.SETTING UP INFOBRIGHT 5

See “Recommended Memory Configurations” in "Configuration Tips and Examples" on page 14.

Uninstalling on Windows

 To uninstall ICE, select “Infobright Uninstall” under the Infobright program group in the Windows Start Menu:

Start/All Programs/Infobright/Infobright Uninstall

Linux Installation

Linux RPM and DPKG Installation Instructions

To install Infobright on Linux using the rpm or dpkg package:

1. Download the installation package from www.infobright.org/Download/ICE/. 2. Obtain root user access and run:

rpm -i infobright_version_name.rpm [--prefix=path] or

dpkg -i infobright_version_name.deb

Important

Do not install in the root or home directories due to possible MySQL permission checking issues during install, start up, and/or load. If you use the rpm --prefix option, you should manually create a softlink to the Infobright install directory from /usr/local/infobright.

3. To change the default install options, after installation run: /usr/local/infobright/postconfig.sh

You can run this script at any time after installation to change the datadir, CacheFolder, socket, or port. The script must be run as root, and Infobright must not be running.

(13)

I

NFOBRIGHT

I

NSTALL

O

PTIONS Parameter Description

Datadir Path to the directory where tables will be created and stored. Use a high-performance storage such as a RAID.

Cachedir Path to the directory where temporary files will be created and stored. Should be located on a fast drive, possibly not the same as the data. Allow at least 100 GB of free space (depending on database size).

Note: The Cachedir option is disabled when the Datadir option is chosen. To change Cachedir, rerun the postconfig utility and do not choose Datadir.

Port Listening port for the Infobright server instance.

Socket Socket connection point for client connections. (The socket connection point will be created during the Infobright installation.)

4. The installation determines the optimum memory settings based on the physical memory of the system. You may change these settings by editing the file brighthouse.ini within the data directory. See “Recommended Memory Configurations” in "Configuration Tips and Examples" on page 14.

Important

The memory settings assume that there are no other services on the machine consuming significant memory. If this is not the case, please lower the memory settings for Infobright.

Uninstalling on Linux

 To uninstall Infobright, run: rpm -e infobright

or

dpkg -r infobright

Linux TAR Install

To install Infobright on Linux using the tarball package: 1. Obtain root user access.

2. Change to the parent location in which you want to install (e.g. /usr/local) : cd /usr/local

Important

(14)

2.SETTING UP INFOBRIGHT 7

3. Unpack the tarball, which will create the product directory (e.g. infobright-version-x86_64_ice and create a symbolic link ‘infobright’ to the product folder:

gunzip < /path/to/infobright-version-x86_64_ice.tar.gz | tar xvf - ln -s /usr/local/infobright-version-x86_64_ice nfobright

cd /usr/local/infobright

4. Run the install script with the “--help” flag to check for system configuration and provide examples of directory parameters:

./install-infobright.sh –help Parameters required:

--datadir=infobright data folder

[--datadir=/usr/local/infobright/data] --cachedir=infobright cache folder

[--cachedir=/usr/local/infobright/cache] --config=mysql conf file to be created [--config=/etc/my-ib.cnf]

--port=infobright server port [--port=5029] --socket=socket file to be used by this server [--socket=/tmp/mysql-ib.sock]

--user=user to be created if not exist [--user=mysql]

--group=user group to be created if not exist [--group=mysql]

I

NFOBRIGHT

C

OMMAND

-

LINE

P

ARAMETERS Parameter Description

Datadir Path to the directory where tables will be created and stored. Use a high-performance storage such as a RAID.

Cachedir Path to the directory where temporary files will be created and stored. Should be located on a fast drive, possibly not the same as the data. Allow at least 100 GB of free space (depending on database size).

Note

The Cachedir option is disabled when the Datadir option is chosen. To change Cachedir, rerun the postconfig utility and do not choose Datadir.

Port Listening port for the Infobright server instance.

Config MySQL configuration file. (The configuration file will be created with defaults during the Infobright installation.)

Socket Socket connection point for client connections. (The socket connection point will be created during the Infobright installation.)

(15)

I

NFOBRIGHT

C

OMMAND

-

LINE

P

ARAMETERS Parameter Description

Group System group for the above user. Group will be created if it does not exist. The default group is mysql.

Run the install script again, this time with directory parameters. If parameters are used that already exist, an error will occur (for example running the same script with parameters twice).

Example command:

./install-infobright.sh datadir=/usr/local/infobright/data

cachedir=/usr/local/infobright/cache port=5029 config=/etc/my-ib.cnf --socket=/tmp/mysql-ib.sock --user=mysql --group=mysql

5. Change the default memory configuration by editing the file brighthouse.ini within the data directory. See “Recommended Memory Configurations” in "Configuration Tips and Examples" on page 14.

Important

It is critical that you increase the memory settings for systems running more than 2GB of physical memory or performance will be severely impacted.

Upgrade

Windows Upgrade Instructions

Before upgrading, be sure to read the latest release notes for any special upgrade instructions. 1. Please follow the standard ICE Windows installation instructions. The Install Wizard

automatically detects a previous version of ICE and upgrades your ICE installation while preserving your data and configuration settings. The install procedure automatically runs the Configuration Manager.

2. Start the Infobright server from the Start Menu items.

3. Create or ensure that the directory c:\tmp exists (necessary for step 4). 4. Run the MySQL Upgrade utility from the Windows command line:

cd "C:\Program Files\Infobright\bin"

mysql_upgrade.exe defaultsfile="c:\Program Files\Infobright\myib.ini" -uroot --tmpdir=c:\tmp

Important

The MySQL Upgrade utility may display several errors regarding the use of locks with log tables and errors requiring table upgrades. The errors are all handled automatically by Infobright and/or the upgrade utility and can be ignored.

(16)

2.SETTING UP INFOBRIGHT 9

5. Stop and start the Infobright server from the Start Menu items.

6. If you are upgrading from Infobright 3.5 or earlier, run the Infobright upgrade tool from the Windows command line. This creates stored procedures used by the DomainExpert. cd "C:\Program Files\Infobright"

\Infobright-upgrade.bat -u root

7. If you are upgrading from a version prior to ICE 3.3.2, you must update your table structures. See the next section for details.

Updating Table Structures (Versions Prior to ICE 3.3.2 Only)

If you are upgrading from a version prior to ICE 3.3.2, you must update your table structures after upgrading ICEIEE. Do NOT follow these instructions if you are upgrading from ICE

3.3.2 or higher or you may experience data corruption. If you are unsure what version of ICE

you are using, please contact Professional Services. 1. Stop the Infobright server from the Start Menu items.

2. Run the Charset Migration Tool from the Windows command line: cd "C:\Program Files\Infobright\bin"

chmt.exe –datadir=\absolute\path\to\data\directory 3. Start the Infobright server from the Start Menu items.

Linux RPM or DPKG Upgrade

Before upgrading, be sure to read the latest release notes for any special upgrade instructions. To upgrade using the rpm or deb package, simply run the installation command. The

package will automatically identify that Infobright is already installed and switch to upgrade mode. Your configuration settings and data will not be changed during the upgrade.

Important

If the previous installation was done using the tarball package, you must upgrade using the tarball package (see "Linux TAR Upgrade" on page 11) or contact Infobright Support to move from a tar install to a package install.

(17)

To upgrade Infobright on Linux using the rpm or deb package: 1. As user root, run either:

rpm -U infobright-version-platform.rpm dpkg -i infobright-version-platform.deb 2. Start the Infobright server:

/etc/init.d/mysqld-ib start

3. Run the mysql upgrade tool to upgrade the data folder: cd /usr/local/infobright

./bin/mysql_upgrade --defaults-file=/etc/my-ib.cnf --user=root --tmpdir=/tmp

Important

The MySQL Upgrade utility may display several errors regarding the use of locks with log tables and errors requiring table upgrades. The errors are all handled automatically by Infobright and/or the upgrade utility and can be ignored.

4. Restart the Infobright server: /etc/init.d/mysqld-ib restart

5. Confirm the build version as IB_4.0.6_r16086_16275: /usr/local/infobright/bin/mysqld --version

6. If you are upgrading from Infobright 3.5 or earlier, run the Infobright upgrade tool. This creates stored procedures used by the DomainExpert.

cd /usr/local/infobright

./infobright_upgrade.sh -u <user> -p <password> i.e. ./infobright_upgrade.sh -u root

Usage: ./infobright_upgrade.sh [-u <user>] [-p <password>]

7. If you are upgrading from a version prior to ICE 3.3.2, you must update your table structures. See the next section for details.

Updating Table Structures (Versions Prior to ICE 3.3.2 Only)

If you are upgrading from a version prior to ICE 3.3.2, you must update your table structures after upgrading ICE. Do NOT follow these instructions if you are upgrading from ICE 3.3.2

or higher or you may experience data corruption. If you are unsure what version of ICE you

are using, please contact Professional Services.

1. Stop the Infobright server from the Start Menu items. 2. Run the Charset Migration Tool (as user mysql):

cd /usr/local/infobright

./bin/chmt –datadir=/absolute/path/to/data/directory 3. Start the Infobright server from the Start Menu items.

(18)

2.SETTING UP INFOBRIGHT 11

Linux TAR Upgrade

Before upgrading, be sure to read the latest release notes for any special upgrade instructions.

To upgrade Infobright on Linux using the tarball package:

1. Unpack the tarball into a temporary folder. Use the gunzip utility for unpacking: cd /path/to/temp/

gunzip < /path/to/infobright-version-x86_64.tar.gz | tar xvf - 2. Stop the Infobright server:

/etc/init.d/mysqld-ib stop

3. Run the install script with the “--upgrade” and “--config” flags and pass in the configuration files of the previously installed version:

./install-infobright.sh --upgrade --config=/etc/my-ib.cnf 4. Start the Infobright server and run the mysql_upgrade utility:

/etc/init.d/mysqld-ib start cd /usr/local/infobright

./bin/mysql_upgrade --defaults-file=/etc/my-ib.cnf --user=root --tmpdir=/tmp

Important

The MySQL Upgrade utility may display several errors regarding the use of locks with log tables and errors requiring table upgrades. The errors are all handled automatically by Infobright and/or the upgrade utility and can be ignored.

5. Re-start the Infobright server: /etc/init.d/mysqld-ib restart

6. Confirm the build version as IB_4.0.6_r16086_16275: /usr/local/infobright/bin/mysqld –-version

7. If you are upgrading from Infobright 3.5 or earlier, run the Infobright upgrade tool. This creates stored procedures used by the DomainExpert.

cd /usr/local/infobright

./infobright_upgrade.sh -u <user> -p <password> i.e. ./infobright_upgrade.sh -u root

Usage: ./infobright_upgrade.sh [-u <user>] [-p <password>]

8. If you are upgrading from a version prior to ICE 3.3.2, you must update your table structures. See the next section for details.

(19)

Updating Table Structures (Versions Prior to ICE 3.3.2 Only)

If you are upgrading from a version prior to ICE 3.3.2, you must update your table structures after upgrading ICE. Do NOT follow these instructions if you are upgrading from ICE 3.3.2

or higher or you may experience data corruption. If you are unsure what version of ICE you

are using, please contact Professional Services.

1. Stop the Infobright server from the Start Menu items. 2. Run the Charset Migration Tool (as user mysql):

cd /usr/local/infobright

./bin/chmt –datadir=/absolute/path/to/data/directory 3. Start the Infobright server from the Start Menu items.

Configuration

Configuring Infobright

The Infobright configuration file is called brighthouse.ini and is located in the data

subdirectory within your Infobright installation directory. The configuration file is a text file containing the Infobright configuration parameters. See the Infobright installation package for a sample brighthouse.ini file.

Important

It is critical that you specify increased memory settings for systems running more than 2GB of physical memory to ensure optimal performance.

Each parameter is shown on a separate line and uses the following form: ParameterName=ParameterValue

If a parameter is not present in the configuration file or if the configuration file does not exist, the default values are used. Blank lines and comments (lines starting with #) are ignored. Be sure to customize the following parameters to optimize performance. These parameters

(20)

2.SETTING UP INFOBRIGHT 13

I

NFOBRIGHT

T

UNING

P

ARAMETERS

Parameter Syntax Value Description ServerMainHeapSize=size Not less

than 320 Default: 600

Size of the main memory heap in the server process, in MB. The larger the heap size, the more effectively the server works. However, the sum of the heap sizes in the server and the loader should not exceed physical memory installed in the machine, otherwise performance decreases radically.

LoaderMainHeapSize=size Not less than 320 Default: 320

Size of the memory heap in the loader process, in MB. The sum of the heap sizes in the server and the loader should not exceed physical memory installed in the machine, otherwise performance decreases radically. CacheFolder=directory Directory

name Default: none

This is a mandatory parameter. Path to the directory where temporary files will be created and stored. This is set as one of the install script parameters.

AllowMySQLQueryPath=number 0, 1 Default: 0

Set to 1 to allow queries that are not supported in the Infobright Optimizer to be handled by the MySQL query engine. Queries that take the MySQL path will have reduced performance.

Note

The values are commented out (preceded by #) in the brighthouse.ini file which causes them to default to the application minimum allowed values of 600 and 320 for

ServerMainHeapSize and LoaderMainHeapSize respectively.

I

NFOBRIGHT

A

DDITIONAL

P

ARAMETERS

Parameter Syntax Value Description KNFolder=directory Directory name

Default:

BH_RSI_Repository

Directory where the Knowledge Grid is stored. If not specified, these files are located in a subdirectory of the data directory. Allow free space of at least 1% of database size (compressed).

(21)

I

NFOBRIGHT

A

DDITIONAL

P

ARAMETERS

Parameter Syntax Value Description ControlMessages=number 0, 1, 2, 3

Default: 0

Set to 2 to turn the control messages on with timestamps. This is usually needed by Infobright to support performance investigation.

1 removes timestamp and session number

information, and generally not used in favour of 2. 3 is new in Infobright 3.4 and adds information for resource management (total and free memory and CPU cores).

Configuration Tips and Examples

Important

You should configure memory settings to ensure optimal performance.

The following table shows sample memory configurations for different systems.

R

ECOMMENDED

M

EMORY

C

ONFIGURATIONS

System Memory Server Main Heap Size Loader Main Heap Size

64GB 48000 800 48GB 32000 800 32GB 24000 800 16GB 10000 800 8GB 4000 800 4GB 1300 400 2GB 600 320

In most cases, the loader does not benefit from larger memory settings. However, increasing the LoaderMainHeapSize can help when:

(22)

2.SETTING UP INFOBRIGHT 15

 the table has many columns (e.g., 1000 columns).

You can use more memory at import if you are planning to execute several concurrent load tasks to different data tables. However, disk access may become a bottleneck.

ServerMainHeapSize should be as large as possible but safely smaller than the amount of physical memory in the machine. If performance decreases because of memory swapping by the operating system, try to set lower heap sizes. We also recommend decreasing the heap size if many users are running queries in parallel.

Important

Infobright may use additional memory for heavy loads or queries. Also, other applications on your server will use memory for their processes. It is important that the total of

ServerMainHeapSize and LoaderMainHeapSize is less than the total available physical memory. If the system needs to swap memory, performance will be severely impacted.

(23)

3. Using Infobright

Starting and Stopping the Infobright Server

Windows

The Windows Install Wizard automatically creates Infobright as a Windows Service, which allows the Infobright server to be started and stopped automatically when you boot or shutdown Windows.

 To manually start the Infobright server, from the Windows Start Menu run: Start/All Programs/Infobright/Infobright Start

 To manually stop the Infobright server, from the Windows Start Menu run: Start/All Programs/Infobright/Infobright Stop

Linux

You can start and stop the Infobright server the same way you would start and stop the original MySQL server (mysqld). Before using the Infobright server, see Starting and Stopping MySQL Automatically in the MySQL 5.1 Reference Manual.

Important

It is recommended that you run Infobright using MySQL user credentials rather than root for security reasons.

 To start the Infobright server on Linux, run: /etc/init.d/mysqld-ib start

 To start/stop the Infobright server during system boot/shutdown use the mysqld-ib script in /etc/init.d/ for start and stop services. Use run level 2 3 4 5 to start the service, and run level 0 1 6 to stop.

 The following are sample commands to create services:

(Ubuntu) update-rc.d mysql-ib.server start 99 2 3 4 5 . stop 01 0 1 6 . (CentOS) chkconfig --add mysqld-ib

chkconfig --level 2345 mysqld-ib on chkconfig --level 016 mysqld-ib off

Working with the Infobright Server

You can use the tools provided with MySQL, such as the mysql client program, with the Infobright server. For more information, see Tutorial in the MySQL 5.1 Reference Manual. You can also use GUI tools, such as the MySQL Workbench provided by MySQL AB, to query Infobright databases in a more graphical manner.

You can use the mysql client program to perform the following actions. For more information, see Tutorial in the MySQL 5.1 Reference Manual.

(24)

3.USING INFOBRIGHT 17

Windows

 To connect to the Infobright command line interface, run:

Start/All Programs/Infobright/Infobright Command Line Client

 To enable remote connections to Infobright you need to grant connection permissions in Infobright. From within the mysql shell run the following grant privileges commands: mysql> grant all privileges on *.* to 'root'@'localhost' with grant option; Query OK, 0 rows affected (0.00 sec)

mysql> grant all privileges on *.* to 'root'@'%' with grant option; Query OK, 0 rows affected (0.00 sec)

Linux

 If you used the standard install locations, enter the following command to connect to Infobright:

/usr/bin/mysql-ib

If you used a different install location, modify the above command to point to your socket file.

 When the Infobright server is first installed, an administrator account with no password is created. To connect to the administrator account, use the following command:

mysql-ib

 To run a script when connecting to the administrator account, use the following command:

mysql-ib < input_script_name.txt For example:

mysql-ib < /tmp/testing/input.txt

 To run a script when connecting to the administrator account and direct all output to a text file, use the following command:

mysql-ib < input_script_name.txt > output_results.txt For example:

mysql-ib < /tmp/testing/input.txt > /tmp/testing/output.txt

 During the Infobright server shutdown process, the server will not shut down until all running commands are completed. To force the shutdown of the server:

Kill the mysqld process and all running bhloader processes.

 Infobright can be used with most Business Intelligence tools and any MySQL GUI client tool like Toad or Navicat. Simply point to the IP address and socket number for the Infobright server, and logon using any user credentials that have been set up.

Checking the Infobright Version

You can use the following methods to check the version of the Infobright system.  Enter the following command at the command prompt:

/usr/local/infobright/bin/mysqld –-version

(25)

 After connecting to the Infobright administrator account, enter the following command at the mysql command prompt:

mysql> show variables like "version_comment"; The Infobright version will be shown. For example: mysql> show variables like "version_comment";

+---+---+ | Variable_name | Value | +---+---+ | version_comment | build number (revision)=IB_version_r5IB_3.2_GA_5316 | +---+---+ 1 row in set (0.00 sec)

Infobright is the Default Storage Engine

 The Infobright storage engine (named “Brighthouse”) should always be used when working with Infobright data. This is the default setting created when using the installer.  To view all available engines, enter the following command:

mysql> show engines;

The following information is displayed at the command prompt. In this example, MyISAM is shown as the default storage engine. You can combine the usage of different storage engines but you should avoid joining across storage engines as this can result in sub-optimal

performance due to the use of the MySQL query engine. However it can be quite useful in some cases to store query results in Memory or MyISAM tables and do further manipulations of results.

mysql> show engines;

+---+---+---+---+---+---+

|Engine |Support |Comment |Transactions |XA |Savepoints |

+---+---+---+---+---+---+

|BRIGHTHOUSE|DEFAULT |Infobright storage

engine |YES |NO | NO | |MRG_MYISAM |YES |Collection of identical MyISAM

tables |NO |NO | NO | |CSV |YES |CSV storage

engine |NO |NO | NO | |MEMORY |YES |Hash based, stored in memory, useful for temporary tables |NO |NO | NO |

|MyISAM |YES |Default engine as of MySQL 3.23 with great performance |NO |NO | NO |

+---+---+---+---+---+---+

(26)

3.USING INFOBRIGHT 19

About Log Files

Infobright uses the MySQL server logs and also creates several new logs. For more

information about MySQL logs, see MySQL Server Logs in the MySQL 5.1 Reference Manual.

I

NFOBRIGHT

L

OG

F

ILES

Log Type Information Written to Log

Error log Errors starting, stopping and running the Infobright server (mysqld). To generate this log, add the following lines to my.cnf:

log-error=<filename> log-output=FILE General query

log Connection and statement information received from clients.

Infobright log Server start and stop information. Also contains missing configuration settings.

It is possible to turn on the display of diagnostic information. By default this information is redirected to Infobright’s console, unless an error log is specified (see table above). To turn on diagnostic messages you have to modify your brighthouse.ini configuration file (see

"Configuring Infobright" on page 12) and set parameter ControlMessages to 1 (log actions), 2 (to add a time stamp to each message), or 3 (to add memory and CPU resource information).

Note

In general, more detail in the log may have an impact on performance; it is recommended that you find and use the setting that strikes the best balance for you in terms of performance versus log details.

(27)

About Errors

Infobright reports the same errors as the standard MySQL server. For more information, see Appendix B. Errors, Error Codes, and Common Problems in the MySQL 5.1 Reference Manual.

There are a few additional errors specific to Infobright import and export commands. For more information, see "About Import Errors" on page 42 and "About Export Errors" on page 42.

About SQL Command Syntax

The syntax for Infobright SQL commands is exactly the same as the syntax for MySQL commands. For more information, see SQL Statement Syntax in the MySQL 5.1 Reference Manual.

There are special considerations when using the following commands with Infobright. All other SQL commands can be used with Infobright as they are with the standard MySQL.

U

SING

M

Y

SQL

C

OMMANDS WITH

I

NFOBRIGHT MySQL Command More Information

CREATE TABLE, DROP TABLE "Creating and Dropping Tables" on page 24 SHOW TABLE STATUS, SHOW

FULL COLUMNS "Viewing Table Information" on page 27 "Viewing Compression Ratio Statistics" on page 28 INSERT, UPDATE, DELETE "Unsupported Data Manipulation Commands (INSERT,

UPDATE, DELETE)" on page 31 Important

Do not use INSERT, UPDATE, and DELETE to manipulate Infobright data.

LOAD DATA INFILE "Infobright Import/Export Syntax" on page 36 SELECT "Running Queries in Infobright" on page 45 VIEW "Creating VIEWs in Infobright" on page 47

(28)

3.USING INFOBRIGHT 21

About SQL ISO Standards

As mentioned in the previous section, Infobright uses the same syntax as the standard MySQL commands. For information about the compliance of the MySQL language with ISO SQL standards, see MySQL Standards Compliance in the MySQL 5.1 Reference Manual. Infobright is approaching full ISO SQL compliance. However, certain sections of the ISO SQL standard are open to interpretation and each DBMS, including Infobright, may implement these sections slightly differently. Consequently, Infobright query results may differ from those of other databases.

For example, the SQL standard does not define a default collation for string comparisons, which affects the ordering of query results. Different databases will implement different collation approaches, thus displaying inconsistent results for such things as sorts.

(29)

4. Managing Infobright Tables

About the Infobright Database Files

Infobright tables are located in the data subdirectory in your Infobright installation directory. This is the same directory structure used for standard MySQL databases and tables. For more information, see Installation Layouts in the MySQL 5.1 Reference Manual.

Within the data subdirectory, Infobright databases are stored in separate subdirectories. Within each database subdirectory, data files for each Infobright table are stored in separate subdirectories.

Important

Do not manually copy a data table from one database to another by copying the database files—internal table numbering errors and Knowledge Grid inconsistencies may occur. To copy a table, use import and export commands (see "Importing and Exporting Data in Infobright" on page 34) or backup the entire database directory (see "Infobright Backup and Recovery" on page 65).

The Infobright server uses additional directories to store temporary data, and optimization information, such as Knowledge Nodes. The following shows the data directory, containing the Infobright databases:

[root@ib03 data]# pwd /usr/local/infobright/data [root@ib03 data]# ls BH_RSI_Repository Infobright.log Infobright.seq ib03.corp.infobright.com.err mysql test

About Supported Data Types

The following data types are supported in Infobright. Note that numeric data types ranges are 1 less than the MySQL minimums and maximums.

N

UMERIC

T

YPES

Data Type Minimum Maximum

TINYINT -127 127

(30)

4.MANAGING INFOBRIGHT TABLES 23

N

UMERIC

T

YPES

Data Type Minimum Maximum

SMALLINT -32767 32767

MEDIUMINT -8388608 8388607

INT (INTEGER) -2147483647 2147483647

BIGINT -9223372036854775806 9223372036854775806

FLOAT -3.402823466E+38 3.402823466E+38

DOUBLE (DOUBLE PRECISION) -1.7976931348623157E+308 1.7976931348623157E+308 DEC(M, D) (DECIMAL(M, D))

where 0 < M <= 18 and 0 <= D <= M

-(1E+M – 1) / (1E+D) (1E+M – 1) / (1E+D)

D

ATE AND

T

IME

T

YPES

Data Type Minimum Maximum Format

DATE 100-01-01 9999-12-31 YYYY-mm-dd

DATETIME 100-01-01 00:00:00 9999-12-31 23:59:59 YYYY-mm-dd HH:MM:SS TIMESTAMP 1970-01-01 00:00:00 2038-01-01 00:59:59 YYYY-mm-dd HH:MM:SS

TIME -838:59:59 838:59:59 HHH:MM:SS

YEAR (4-digit format only) 1901 2155 YYYY

S

TRING

T

YPES

Data Type Maximum Length CHAR(N) 255

(31)

S

TRING

T

YPES

Data Type Maximum Length VARCHAR(N) 65532 BINARY(N) 255 VARBINARY(N) 65532 TINYTEXT 255 TEXT(N) 65535

Creating and Dropping Tables

Use the standard MySQL commands to create and drop tables in Infobright, the same as you would with a MySQL table. For detailed syntax information, see CREATE TABLE Syntax and DROP TABLE Syntax in the MySQL 5.1 Reference Manual.

Important

Do not manually copy a data table from one database to another by copying the database files—internal table numbering errors and Knowledge Grid inconsistencies may occur. To copy a table from one database to another, export from the source database and then import into the target database (see "Importing and Exporting Data in Infobright" on page 34) or backup the entire database directory (see "Infobright Backup and Recovery" on page 65). You can rename the entire database by renaming the folder. However, you should not copy a database folder from one active instance to another, or within the same active instance.

 To create a table, enter the following command: mysql> create table <table_name> (<column(s)>);  To drop a table, enter the following command:

mysql> drop table table_name;

See "About Column Options" on page 25 for information on supported and unsupported options when creating columns.

Note

When creating a table, as a matter of practice one should always use the ENGINE= option to ensure that the correct database engine is used. Infobright is shipped with DEFAULT ENGINE = BRIGHTHOUSE, but this can be changed. The name of the engine can be specified explicitly at the end of create table statement:

(32)

4.MANAGING INFOBRIGHT TABLES 25

About Column Options

NULL and NOT NULL

Infobright supports NULL and NOT NULL specifications for columns.  NULL allows NULL values for the column.

 NOT NULL replaces the imported NULL values with default values such as 0 (zero) for numeric columns and an empty string (‘’) for string columns.

Lookup Columns

Infobright provides an additional modifier for string data type columns, called a lookup column. The lookup column utilizes an integer substitution for values. You can declare a lookup column on a CHAR or VARCHAR column to increase its compression and performance in queries. However, to use a lookup column, the CHAR or VARCHAR column must meet the following criteria:

 There is no fixed upper limit for unique values in the column (cardinality). The total size of a dictionary, being the total length of all distinct values, will be loaded into RAM (for example: 1 million distinct values that are each 100-character wide will permanently occupy 100 MB of RAM.) As a rough guideline, the ratio of total number of records to distinct values should be reasonably high (greater than 10).

 The column must contain a large number of duplicate values: the ratio of total number of records to distinct values should be greater than 10.

Typically, a lookup column is useful for fields like state, gender, category, and the like where the number of instances is very high, but the number of unique values is very low. To determine the ratio of records to distinct values, determine the number of distinct values using SELECT COUNT (DISTINCT <COLUMN>) FROM… Then compare this to the number of records using a SELECT COUNT(<COLUMN>) FROM…

Note

Using a lookup on a column where there are more than 10,000 distinct values will result in greatly reduced load speeds.

 To declare a column as a lookup column, add the comment 'lookup' on the column. Enter the following command:

mysql> create table … (…

<<column name>> <<column type>> … comment 'lookup' … …)

engine=brighthouse;

Unsupported Column Options

The following column options are ignored by Infobright:  default values

(33)

Unsupported Indices Options

Infobright uses Knowledge Grid technology instead of standard indices and does not

support explicit indices. The following elements of CREATE TABLE syntax related to indices are not allowed:

 keys  indices

 unique columns

 auto-increment columns

Converting Oracle DDL to Infobright

If you have an existing Oracle schema definition, use the following steps to make it work on Infobright:

 Convert MEDIUMTEXT to VARCHAR (N), where ‘N’ is only as large as necessary  Convert LONGTEXT to VARCHAR (N), where ‘N’ is only as large as necessary  Convert DOUBLE(A,B) to DECIMAL(A,B)

 INTEGER types may be converted to BIGINT  Convert VARCHAR2/CHAR2 to VARCHAR/CHAR

Converting SQL Server to Infobright

If you have an existing SQL Server schema definition, use the following steps to make it work on Infobright:

 Convert MEDIUMTEXT to VARCHAR (N), where ‘N’ is only as large as necessary  Convert LONGTEXT to VARCHAR (N), where ‘N’ is only as large as necessary  Convert DOUBLE(A,B) to DECIMAL(A,B)

 Convert MONEY to DECIMAL(18,4)  Convert SMALLMONEY to DECIMAL(6,4)  INTEGER types may be converted to BIGINT

 NCHAR/NVARCHAR should be converted to CHAR/VARCHAR  Convert NUMBER to INTEGER.

 Convert NUMBER(A,B) to DECIMAL(A,B)

Converting MySQL (MyISAM) to Infobright

If you have an existing MyISAM schema definition, use the following steps to ensure compliance with Infobright:

 Convert MEDIUMTEXT to VARCHAR (N), where ‘N’ is only as large as necessary  Convert LONGTEXT to VARCHAR (N), where ‘N’ is only as large as necessary  Convert DOUBLE(A,B) to DECIMAL(A,B)

(34)

4.MANAGING INFOBRIGHT TABLES 27

Viewing Table Information

You can use the standard MySQL commands to obtain information about a table.

 To view column information, enter the following command:

SHOW [FULL] COLUMNS FROM tbl_name [FROM db_name] [LIKE 'pattern']; For more information, see SHOW COLUMNS Syntax in the MySQL 5.1 Reference Manual.

 Utilization of the FULL option will provide an estimate of the compression for each column.

mysql> show full columns from dim_cars;

+---+---+---+---+---+---+---+---+---+

| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |

+---+---+---+---+---+---+---+---+---+

| make_id | decimal(10,0) | NULL | YES | |

NULL | | select,references | Size[MB]: 0.1; Ratio: 15.64; unique | | make_name | varchar(25) | latin1_swedish_ci | YES | |

NULL | | select,references | Size[MB]: 0.1; Ratio: 5.05 | | model_name | varchar(25) | latin1_swedish_ci | YES | |

NULL | | select,references | Size[MB]: 0.1; Ratio: 1.38 | | record_dt | datetime | NULL | YES | |

NULL | | select,references | Size[MB]: 0.1; Ratio: 3.86 | +---+---+---+---+---+---+---+---+---+

4 rows in set (0.01 sec)

 To view the CREATE TABLE statement used to create a given table, enter the following command:

SHOW CREATE TABLE tbl_name;

For more information, see SHOW CREATE TABLE Syntax in the MySQL 5.1 Reference Manual.

mysql> show create table dim_cars;

+---+---+ | Table | Create Table | +---+---+

| dim_cars | CREATE TABLE `dim_cars` ( `make_id` decimal(10,0) DEFAULT NULL, `make_name` varchar(25) DEFAULT NULL, `model_name` varchar(25) DEFAULT NULL, `record_dt` datetime DEFAULT NULL ) ENGINE=BRIGHTHOUSE DEFAULT

CHARSET=latin1 |

+---+---+

(35)

 To view table information, enter the following command: SHOW TABLE STATUS [FROM db_name] [LIKE 'pattern'];

For more information, see SHOW TABLE STATUS Syntax in the MySQL 5.1 Reference Manual.

mysql> show table status like 'dim_cars';

+---+---+---+---+---+---+---+---+---+---+---+-- | Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | +---+---+---+---+---+---+---+---+---+---+---+-- | dim_cars | BRIGHTHOUSE | 10 | Compressed | 400 | 11

| 4672 | 0 | 0 | 0 | NULL | +---+---+---+---+---+---+---+---+---+---+---+-- ---+---+---+---+---+---+---+

Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |

---+---+---+---+---+---+---+

2008-08-28 05:30:44 | 2008-04-23 14:17:13 | NULL | latin1_swedish_ci | NULL | | Overall compression ratio: 3.622 |

---+---+---+---+---+---+---+

1 row in set (0.01 sec)

Viewing Compression Ratio Statistics

Infobright provides specific statistics on table and column compression. The compression ratio is calculated in relation to the “natural size” of uncompressed data in the table or column. The ratio equal to n means that the compressed data, including statistics and technical description of a column, is n times smaller than its theoretical natural size.

The following natural sizes (in bytes) are defined for various data types. Note the following:  For all data types, if the column is not declared as NOT NULL, add one bit per value for

NULL indicators.

 These data sizes take into account the typical format of data display, for example “yyyy-mm-dd” for DATE or decimal point for DEC. The size also counts the bytes that store the actual text length (VARCHAR).

D

ATA

T

YPES AND

N

ATURAL

S

IZES

Data Type Natural Size (in bytes) CHAR(n), BINARY(n) n*(number of rows) BIGINT, INT, MEDIUMINT,

(36)

4.MANAGING INFOBRIGHT TABLES 29

D

ATA

T

YPES AND

N

ATURAL

S

IZES

Data Type Natural Size (in bytes)

YEAR 4*(number of rows)

DATE 10*(number of rows)

TIME 8*(number of rows)

TIMESTAMP / DATETIME 19*(number of rows)

DEC(x,y) (x+1)*(number of rows)

FLOAT 4*(number of rows)

REAL, DOUBLE 8*(number of rows)

VARCHAR(n), VARBINARY(n) (total number of bytes used—i.e., the total length of all strings, excluding terminating characters) + 2*(number of rows)

Viewing Table Level Compression Ratio Statistics

 To view the compression ratio at the table level, enter the following command: mysql> show table status [from db_name ] [like 'table_name'];

 The optional like clause can be used to filter the tables. Note that the table name must be provided in single quotes.

 The compression statistics are provided in the table comment. For example: mysql> show table status from test like 't1' \G

*********************** 1. Row ********************** Name: t1 Engine: BRIGHTHOUSE Version: 10 Row_format: Compressed Rows: 3430387 Avg_row_length: 0 Data_length: 0 Max_data_length: 0 Index_length: 0 Data_free: 0 Auto_increment: NULL Create_time: 2008-09-04 15:31:39 Check_time: NULL Update_time: 2008-09-04 15:35:30 Collation: ascii_bin Checksum: NULL Create_options:

(37)

Viewing Column Compression Ratio Statistics

 To view the compression ratio and the compressed size for a column, enter the following command:

mysql> show full columns from table_name …;

 A database name and a column filter can be specified in optional clauses. For more information, see SHOW COLUMNS Syntax in the MySQL 5.1 Reference Manual.  The compression statistics are provided in the column comment. In addition to the

compression information, the comment line may also contain a “unique” indicator, meaning that the column has all unique values (except nulls).

For example:

Comparison of Calculated Compression Ratio to Physical Size

The compression ratio calculated above will differ from the compression ratio calculated from physical sizes of files on disk. The compression ratio based on physical size will be slightly smaller, due to extra files that are generated containing statistics on the imported data, such as Knowledge Nodes. Knowledge Nodes are used to optimize query execution and are discussed further in "About the Knowledge Grid" on page 45.

(38)

5.DATA MANIPULATION STATEMENTS 31

5. Data Manipulation Statements

Unsupported Data Manipulation Commands (INSERT, UPDATE, DELETE)

INSERT , UPDATE, and DELETE commands are not supported in Infobright Community Edition and should not be used to manipulate data in Infobright tables. Using these commands may

result in errors.

When using GUI tools with Infobright Community Edition, such as MySQL Browser, use these tools in read-only mode only. Do not use these tools to insert, update, or delete data. This may result in errors and the hanging of the GUI application.

To insert data into Infobright tables, use the MySQL import command. For more information, see "Importing and Exporting Data in Infobright" on page 34.

(39)

6. Character Set Support

Supported Character Sets

Infobright storage supports all ANSI and UTF-8 character sets. This means that Infobright can store and retrieve data encoded in 8-bit and multi-byte character sets.

Important

Queries that evaluate against UTF-8 character data columns will execute with less

performance than an equivalent query against ASCII character data, due to ASCII support of Character Maps in the Knowledge Grid (see "Running Queries in Infobright" on page 45). UTF-8 specific Knowledge Grid extensions will be available in an upcoming release.

Collations and Comparisons

Infobright supports all custom UTF-8 collations supported by MySQL 5.1:  utf8_bin  utf8_czech_ci  utf8_danish_ci  utf8_esperanto_ci  utf8_estonian_ci  utf8_general_ci (default)  utf8_hungarian_ci  utf8_icelandic_ci  utf8_latvian_ci  utf8_lithuanian_ci  utf8_persian_ci  utf8_polish_ci  utf8_roman_ci  utf8_romanian_ci  utf8_slovak_ci  utf8_slovenian_ci  utf8_spanish2_ci  utf8_spanish_ci  utf8_swedish_ci  utf8_turkish_ci  utf8_unicode_ci*

*utf8_unicode_ci properly handles both French and German collation, so specific collation types for these languages are not necessary.

For more information, see Unicode Support in the MySQL 5.1 Reference Manual.

The SQL standard does not define a default collation; therefore, many DBMS engines have different default collations and produce different results. As a result, there are several differences between Infobright and other DBMS engines.

 For Infobright, character data types are case-sensitive. For example, the condition 'toronto'='Toronto' is not true in Infobright. Similarly, the condition LIKE 'Abc%' is not true for 'abcde'.

 The Infobright sorting order is “A…Z a…z” (for example 'Zeta' < 'alfa'), which is the same sorting order as used by Oracle. The Infobright sorting order is different than the

(40)

6.CHARACTER SET SUPPORT 33

default MySQL sorting order, which mixes lowercase and uppercase; the SQL Server order, which is “aAbB…zZ”; and the DB2 order, which is “AaBb…Zz”.

 The Infobright sorting order affects ORDER BY results, GROUP BY results (which is the order of groups and their definitions—for example, 'aaa' and 'AAA' define different groups) and DISTINCT results. WHERE conditions may also be affected if you are expecting a different sorting order than the one used by Infobright.

 To simulate Infobright collation in the MySQL engine, set latin1_bin collation while creating a table (for more information, see Table Character Set and Collation in the MySQL 5.1 Reference Manual). Enter the following command:

mysql> create table … collate ascii_bin;

Padding

Infobright treats padding differently than other DBMS engines. Infobright assumes literal comparisons of text fields, including all whitespace characters. Therefore, a string containing two spaces is different than a string containing one space or an empty (0 length) string, which is also different than the NULL value.

The Infobright padding definition is compatible with the SQL standard. However, most DBMS systems have defined less restricted, customizable rules regarding text comparison. For example, 'abc ' = 'abc' may be true in some databases but is not true in Infobright.

Note

In CHAR columns, trailing spaces are trimmed on LOAD, whereas in VARCHAR columns values are loaded with all spaces.

(41)

7. Importing and Exporting Data in Infobright

Multi-character Delimiter

As of version 4.0.6 GA, data delimited by more than one character can be loaded in Infobright. This means that you can delimit data for each column with a sequence of characters that are not otherwise encountered as valid data. For example, instead of \t, a delimiter such as \t# can be used.

About Transactions

Using AUTOCOMMIT, COMMIT and ROLLBACK Commands

By default, Infobright uses AUTOCOMMIT mode to finalize transactions, meaning that every transaction is either automatically committed or rolled back if an error occurs. However, you can and should choose to disable AUTOCOMMIT and use COMMIT and ROLLBACK commands instead.

A new transaction starts with the first LOAD command or DML statement entered in a new Infobright session. A new transaction also starts after each COMMIT or ROLLBACK command.

 To enable the use of COMMIT and ROLLBACK commands in Infobright, you must disable AUTOCOMMIT. Enter the following command:

mysql> set autocommit=0;

 You can disable AUTOCOMMIT by setting the parameter to 0 (zero) and enable AUTOCOMMIT by setting the parameter to 1. If AUTOCOMMIT is set to 1, then when a LOAD is completed, the transaction is automatically committed.

 To commit the current transaction, enter the following command: mysql> commit;

 If you have not yet committed a LOAD DATA INFILE transaction, you can rollback the transaction. This will restore the import tables to the state that existed before the current transaction. Enter the following command:

mysql> rollback;

Using COMMIT and ROLLBACK makes it possible to check the load within the same session before committing the data, as the loaded data is available (viewable) to the load session. For instance, you could check something about the data (number of records load) before

committing.

After importing data using the LOAD DATA INFILE command, the status of the import and the number of affected rows is shown. All uncommitted rows, including those from previous imports, are shown; therefore, the number of affected rows may be greater than the number of rows in the file you just imported.

(42)

7.IMPORTING AND EXPORTING DATA IN INFOBRIGHT 35

About Transaction Behavior

While a write operation is being performed on a table, the following occurs:

 Queries to the table are not executed until the current import is complete and the operation is committed.

 Until the current write operation is committed, all subsequent write commands to the table are queued. They will wait for the write lock to be released before proceeding in the order they were received.

While a read query is being executed on a table, the following occurs:  All subsequent queries run concurrently with the current query.

In general, Infobright uses table level locking where only one LOAD operation can execute at one time and after queries have completed.

Failure Handling

If AUTOCOMMIT is disabled and the Infobright server is terminated during an import session, the following occurs:

 Infobright does not store the rows that were loaded during the failed import operation.  The input file and the database files are not harmed. To load data from the input file,

repeat the LOAD operation.

If AUTOCOMMIT is disabled and the Infobright server is terminated after an import session is completed successfully but is not committed, the following occurs:

 The transaction is rolled back and the imported data is lost when the server restarts.  The input file and the database files are not harmed by the failed import operation (the

database is unaffected, as if the import session did not occur). To re-import the data, repeat the LOAD operation.

If the Infobright server is terminated during an export operation to a disk file, the following occurs:

 A non-empty file is saved on disk; however, the last row in the saved file is inconsistent.  The database files are not harmed by the failed export operation. To export the data,

repeat the export operation.

If Infobright tries to import data from a file created during a failed export session, the following occurs:

 No data is inserted because the input file consists of corrupted table rows. No new records are added to the database files, so no harm is done.

(43)

About Export Differences in Infobright

There are several important differences between exporting data from Infobright and exporting data from other DBMS engines.

CHAR(n) Data Type Values

In Infobright, when you export CHAR(n) data type values to a text file, the extra spaces are trimmed from the export.

Escape Characters

The Infobright Loader supports escape character definition and usage.

Exporting NULL Values

Infobright recognizes the following representations of NULL values when loading data from a text file:

NULL, \N, <field delimiter><field delimiter>

However, Infobright only exports NULL values in the following representation: <field delimiter><field delimiter>

Other DBMS systems may have different representations of the NULL value; for example, MySQL only recognizes the representation \N for a NULL value. This can create issues if you export data from Infobright and import the data into MySQL. Since MySQL will only look for \N and will not recognize the Infobright representation of the NULL value, MySQL will change the NULL value into the default values in numeric and string columns.

Infobright Import/Export Syntax

Infobright Loader Reject File

Control if and when a load is halted (based on the number of rejected rows) using the Infobright Loader reject file. You can also determine if rejected rows are discarded or placed in a rejected row file. The default is to place them in a reject file.

Note

The reject file may not exist before running load data infile.

(44)

7.IMPORTING AND EXPORTING DATA IN INFOBRIGHT 37

Use the following parameters to configure the reject file:

I

NFOBRIGHT

L

OADER

R

EJECT

F

ILE

O

PTIONS

Option Description

BH_REJECT_FILE_PATH Path to the file where rejected rows are stored. Rejected rows are placed into the reject file in the order they are rejected. The original format is preserved to allow the operator to correct and rerun the load for only the rejected rows.

Note: If BH_REJECT_FILE_PATH is set, BH_ABORT_ON_COUNT or BH_ABORT_ON_THRESHOLD must be set as well.

BH_ABORT_ON_COUNT Abort and rollback the load if the number of rejected rows exceeds this value. If this value is not set, the load will be rolled back to the first bad record if the load fails. A value of -1 means never abort; a value of 0 means abort on first rejected row. There is no upper limit on this value.

Note: BH_ABORT_ON_COUNT and BH_ABORT_ON_THRESHOLD are mutually exclusive.

BH_ABORT_ON_THRESHOLD Abort and rollback the load if the relative number of rejected rows to total processed rows exceeds this value (threshold test starts after one packrow row has been processed). Value must be in the range (0,1) - this is an open interval.

For example:

set @ BH_ABORT_ON_THRESHOLD=0.01 / 0.5 / 0.99 means that 1% / 50% / 99% of all processed lines corrupted will terminate the Infobright Loader and save the problematic rows in the reject file. Note: BH_ABORT_ON_COUNT and BH_ABORT_ON_THRESHOLD are mutually exclusive.

Enabling the Reject File Functionality

To enable the reject file functionality, you must specify BH_REJECT_FILE_PATH and one of the associated parameters (BH_ABORT_ON_COUNT or BH_ABORT_ON_THRESHOLD). For example, if you want to load data from the file DATAFILE.csv to table T but you expects that 10 rows in this file might be wrongly formatted, you would run the following

commands:

set @BH_REJECT_FILE_PATH = '/tmp/reject_file'; set @BH_ABORT_ON_COUNT = 10;

load data infile DATAFILE.csv into table T;

If less than 10 rows are rejected, a warning will be output, the load will succeed and all problematic rows will be output to the file /tmp/reject_file. If the Infobright Loader finds a tenth bad row, the load will terminate with an error and all bad rows found so far will be output to the file /tmp/reject_file.

References

Related documents

Figure 1 illustrates the proposed hybrid adaptive flight control. The control architecture com- prises: 1) a reference model that translates a rate command into a desired

Mackey brings the center a laparoscopic approach to liver and pancreas surgery not available at most area hospitals.. JOSHUA FORMAN, MD

–  Use Apex code to access external REST API’s –  Can be used to send data out or pull data in to/.. from an

An analysis of the economic contribution of the software industry examined the effect of software activity on the Lebanese economy by measuring it in terms of output and value

Although the probability of a beyond design base accident at nuclear power plant is very very low (~ 1 event in 1 million reactor year operations), BARC has developed an Indian

The purpose of this study was to evaluate the diagnostic utility of real-time elastography (RTE) in differentiat- ing between reactive and metastatic cervical lymph nodes (LN)

Overall,  the  picture  obtained  from  Table  2  suggests  that  the  primary  factors  which  distinguish  between  states  with  an  increasing  MFR  are  a 

We have analysed four sets of labour-market experiences. The …rst was the transition from compulsory school to various states: continuing their education, …nding a job,