As of version 4.0.6 GA, data delimited by more than one character can be loaded in Infobright. This means that you can delimit data for each column with a sequence of characters that are not otherwise encountered as valid data. For example, instead of \t, a delimiter such as \t# can be used.
About Transactions
Using AUTOCOMMIT, COMMIT and ROLLBACK Commands
By default, Infobright uses AUTOCOMMIT mode to finalize transactions, meaning that every transaction is either automatically committed or rolled back if an error occurs. However, you can and should choose to disable AUTOCOMMIT and use COMMIT and ROLLBACK commands instead.
A new transaction starts with the first LOAD command or DML statement entered in a new Infobright session. A new transaction also starts after each COMMIT or ROLLBACK command.
To enable the use of COMMIT and ROLLBACK commands in Infobright, you must disable AUTOCOMMIT. Enter the following command:
mysql> set autocommit=0;
You can disable AUTOCOMMIT by setting the parameter to 0 (zero) and enable AUTOCOMMIT by setting the parameter to 1. If AUTOCOMMIT is set to 1, then when a LOAD is completed, the transaction is automatically committed.
To commit the current transaction, enter the following command: mysql> commit;
If you have not yet committed a LOAD DATA INFILE transaction, you can rollback the transaction. This will restore the import tables to the state that existed before the current transaction. Enter the following command:
mysql> rollback;
Using COMMIT and ROLLBACK makes it possible to check the load within the same session before committing the data, as the loaded data is available (viewable) to the load session. For instance, you could check something about the data (number of records load) before
committing.
After importing data using the LOAD DATA INFILE command, the status of the import and the number of affected rows is shown. All uncommitted rows, including those from previous imports, are shown; therefore, the number of affected rows may be greater than the number of rows in the file you just imported.
7.IMPORTING AND EXPORTING DATA IN INFOBRIGHT 35
About Transaction Behavior
While a write operation is being performed on a table, the following occurs:
Queries to the table are not executed until the current import is complete and the operation is committed.
Until the current write operation is committed, all subsequent write commands to the table are queued. They will wait for the write lock to be released before proceeding in the order they were received.
While a read query is being executed on a table, the following occurs: All subsequent queries run concurrently with the current query.
In general, Infobright uses table level locking where only one LOAD operation can execute at one time and after queries have completed.
Failure Handling
If AUTOCOMMIT is disabled and the Infobright server is terminated during an import session, the following occurs:
Infobright does not store the rows that were loaded during the failed import operation. The input file and the database files are not harmed. To load data from the input file,
repeat the LOAD operation.
If AUTOCOMMIT is disabled and the Infobright server is terminated after an import session is completed successfully but is not committed, the following occurs:
The transaction is rolled back and the imported data is lost when the server restarts. The input file and the database files are not harmed by the failed import operation (the
database is unaffected, as if the import session did not occur). To re-import the data, repeat the LOAD operation.
If the Infobright server is terminated during an export operation to a disk file, the following occurs:
A non-empty file is saved on disk; however, the last row in the saved file is inconsistent. The database files are not harmed by the failed export operation. To export the data,
repeat the export operation.
If Infobright tries to import data from a file created during a failed export session, the following occurs:
No data is inserted because the input file consists of corrupted table rows. No new records are added to the database files, so no harm is done.
About Export Differences in Infobright
There are several important differences between exporting data from Infobright and exporting data from other DBMS engines.
CHAR(n) Data Type Values
In Infobright, when you export CHAR(n) data type values to a text file, the extra spaces are trimmed from the export.
Escape Characters
The Infobright Loader supports escape character definition and usage.
Exporting NULL Values
Infobright recognizes the following representations of NULL values when loading data from a text file:
NULL, \N, <field delimiter><field delimiter>
However, Infobright only exports NULL values in the following representation: <field delimiter><field delimiter>
Other DBMS systems may have different representations of the NULL value; for example, MySQL only recognizes the representation \N for a NULL value. This can create issues if you export data from Infobright and import the data into MySQL. Since MySQL will only look for \N and will not recognize the Infobright representation of the NULL value, MySQL will change the NULL value into the default values in numeric and string columns.
Infobright Import/Export Syntax
Infobright Loader Reject File
Control if and when a load is halted (based on the number of rejected rows) using the Infobright Loader reject file. You can also determine if rejected rows are discarded or placed in a rejected row file. The default is to place them in a reject file.
Note
The reject file may not exist before running load data infile.
7.IMPORTING AND EXPORTING DATA IN INFOBRIGHT 37
Use the following parameters to configure the reject file:
I
NFOBRIGHTL
OADERR
EJECTF
ILEO
PTIONSOption Description
BH_REJECT_FILE_PATH Path to the file where rejected rows are stored. Rejected rows are placed into the reject file in the order they are rejected. The original format is preserved to allow the operator to correct and rerun the load for only the rejected rows.
Note: If BH_REJECT_FILE_PATH is set, BH_ABORT_ON_COUNT or BH_ABORT_ON_THRESHOLD must be set as well.
BH_ABORT_ON_COUNT Abort and rollback the load if the number of rejected rows exceeds this value. If this value is not set, the load will be rolled back to the first bad record if the load fails. A value of -1 means never abort; a value of 0 means abort on first rejected row. There is no upper limit on this value.
Note: BH_ABORT_ON_COUNT and BH_ABORT_ON_THRESHOLD are mutually exclusive.
BH_ABORT_ON_THRESHOLD Abort and rollback the load if the relative number of rejected rows to total processed rows exceeds this value (threshold test starts after one packrow row has been processed). Value must be in the range (0,1) - this is an open interval.
For example:
set @ BH_ABORT_ON_THRESHOLD=0.01 / 0.5 / 0.99 means that 1% / 50% / 99% of all processed lines corrupted will terminate the Infobright Loader and save the problematic rows in the reject file. Note: BH_ABORT_ON_COUNT and BH_ABORT_ON_THRESHOLD are mutually exclusive.
Enabling the Reject File Functionality
To enable the reject file functionality, you must specify BH_REJECT_FILE_PATH and one of the associated parameters (BH_ABORT_ON_COUNT or BH_ABORT_ON_THRESHOLD). For example, if you want to load data from the file DATAFILE.csv to table T but you expects that 10 rows in this file might be wrongly formatted, you would run the following
commands:
set @BH_REJECT_FILE_PATH = '/tmp/reject_file'; set @BH_ABORT_ON_COUNT = 10;
load data infile DATAFILE.csv into table T;
If less than 10 rows are rejected, a warning will be output, the load will succeed and all problematic rows will be output to the file /tmp/reject_file. If the Infobright Loader finds a tenth bad row, the load will terminate with an error and all bad rows found so far will be output to the file /tmp/reject_file.
Disabling the Reject File Functionality
Disabling the reject file related parameters after the load is recommended to ensure the reject file functionality is not used by accident. For the same reason, setting any values for those parameters in the file my.ini/cnf is not recommended. To disable the reject file functionality, run the following commands:
set @BH_REJECT_FILE_PATH = NULL; set @BH_ABORT_ON_COUNT = NULL;
Infobright Loader Line Terminators
You can define the type of terminator for a load when using the "line terminated by" command.
Escape Character
The escape character cannot be part of the field terminator. For example, assume you try to execute a command such as:
LOAD DATA INFILE ... FIELDS TERMINATED BY "a\\a" ESCAPED BY '\\'; or
LOAD DATA INFILE ... FIELDS TERMINATED BY "#@\t" ESCAPED BY '@'; The following error message will appear:
Field terminator containing the escape character not supported.
If you try to execute a command such as:
LOAD DATA INFILE ... FIELDS ENCLOSED BY '"' ESCAPED BY '"'; or
LOAD DATA INFILE ... FIELDS ENCLOSED BY '#' ESCAPED BY '#'; the following error message will appear:
The same enclose and escape characters not supported.
End of Line (EOL) Sequence
When using MySQL's LOAD DATA INFILE command, you can specify the end of line (EOL) sequence in the file by adding a LINES TERMINATED BY 'X' clause. For example:
LOAD DATA INFILE 'DATAFILE.csv' INTO TABLE T ... LINES TERMINATED BY '\n'" (Lines terminated like in Linux)
LOAD DATA INFILE 'DATAFILE.csv' INTO TABLE T ... LINES TERMINATED BY '\r\n'" (Lines terminated like in Windows)
If the EOL sequence is not specified, the following error message will be output: Query contains syntax that is not supported and will be ignored.
7.IMPORTING AND EXPORTING DATA IN INFOBRIGHT 39
Importing Data
To import data into an Infobright table, use the following MySQL loading command:
LOAD DATA INFILE 'file_name' INTO TABLE tbl_name [FIELDS [TERMINATED BY 'char'] [ENCLOSED BY 'char'] [ESCAPED BY 'char'] ]; where:
file_name = path to the file to be loaded
tbl_name = name of the table where the data will be loaded
Importing Data Using Remote Load
Using the Infobright Loader, you can load data from a remote machine across the network using the LOAD DATA LOCAL INFILE syntax. This allows you to offload potentially heavy ETL processing to a separate server, keeping the Infobright server on a dedicated machine. This also allows you to save significant time when transferring large LOAD files over the network, which can typically limit load speed. For more information, see LOAD DATA INFILE Syntax in the MySQL 5.1 Reference Manual.
A few important notes:
Before importing data using the LOAD DATA LOCAL syntax, be sure you fully understand the security issues. See Security Issues with LOAD DATA LOCAL in the MySQL 5.1 Reference Manual for details.
You can disable all LOAD DATA LOCAL statements from the server side by starting mysqld with the --local-infile=0 option.
For the mysql command-line client, enable LOAD DATA LOCAL by specifying the --local- infile[=1] option, or disable it with the --local-infile=0 option. For mysqlimport, local data file loading is off by default; enable it with the --local or -L option. In any case, successful use of a local load operation requires that the server permits it. Some (but not all) Windows GUI tools may work with remote load, even with Linux
servers.
To import data into an Infobright table from a remote machine across the network, use the following MySQL loading command (for more information about command options, see the Data Loading Guide):
LOAD DATA [LOCAL] INFILE 'file_name' INTO TABLE tbl_name [FIELDS
[TERMINATED BY 'char'] [ENCLOSED BY 'char']
where:
file_name = path to the file to be loaded
tbl_name = name of the table where the data will be loaded
If LOCAL is specified, the file is read by the client program on the client host and sent to the server. The file can be given as a full path name to specify its exact location. If given as a relative path name, the name is interpreted relative to the directory in which the client program was started.
Note
Network speeds may limit the load speed. Exceptions and errors in the transfer are handled by the MySQL client, and will behave the same as the MySQL client.
Exporting Data
To export data from an Infobright table, use the following MySQL export command:
SELECT … INTO OUTFILE 'file_name' [FIELDS [TERMINATED BY 'string'] [ENCLOSED BY 'char'] [ESCAPED BY 'char']] FROM 'tbl_name'; where:
file_name = path to the file where data will be exported
tbl_name = name of the table from which the data will be retrieved
For more information on export syntax, see SELECT Syntax in the MySQL 5.1 Reference Manual.
Optional FIELDS Clause
Several optional clauses exist for the MySQL LOAD command. All of these clauses are ignored by Infobright, with the exception of the FIELDS clause. You can also use the FIELDS clause when exporting data.
You can use the optional FIELDS clause to specify how values are provided in the input file. To use the FIELDS clause, the data import format must be defined as variable-length text.
Within the FIELDS clause, you can use the following sub-clauses:
Use the TERMINATED BY sub clause to specify the character recognized as the separator (delimiter) between values. By default, a semicolon ; is assumed to separate values. Use the ENCLOSED BY sub clause to specify the character that begins and ends each string
representing a text value. By default, a double quotation mark " is assumed to enclose each value. If the text values in the input file do not use any enclosing characters, use the
7.IMPORTING AND EXPORTING DATA IN INFOBRIGHT 41
value NULL in the ENCLOSED BY sub clause. Note that this is the same as using the empty string '' option in standard MySQL.
Use the ESCAPED BY sub clause to support special characters that may be imbedded within text fields.
Importing Files with Invalid Values
Infobright may abort a load when invalid values are found. Certain invalid values, however, can be loaded in Infobright. The following rules are used with invalid data:
If a numeric, date or time value is invalid, the value is replaced by 0.
If a NULL value is imported into a column defined as NOT NULL (except for TIMESTAMP columns), it is replaced by 0 (for numerical, date and time columns) or by an empty string (for string columns).
Importing Data Using Linux Pipes
You can use Linux pipes when importing data in Infobright. The same dataformat parameter applies; see Setting Import and Export Parameters. You can also use the FIELDS clause when exporting data. For more information, see "Optional FIELDS Clause" on page 40.
To set up a Linux pipe, you need to run the mkfifo command from Linux, and ensure that the pipe is accessible to Infobright. In the following example the pipe is setup as
/pipe_test/thepipe.pipe. You can use the directory and name of your choice. mkfifo /pipe_test/thepipe.pipe
chmod 666 /pipe_test/thepipe.pipe
Once the pipe is set up, direct the data either by directing a file or a process to the pipe: cat /usr/tmp/jkvarload.txt > /pipe_test/thepipe.pipe &
Then execute a LOAD DATA INFILE statement using the pipe:
mysql> load data infile '/pipe_test/thepipe.pipe' into table pipe_test lines terminated by '\n';
When finished, remember to remove the pipe: rm thepipe.pipe
About Import Errors
There are several possible Infobright-related errors that could occur when using the LOAD DATA command on a Infobright table. These errors are described in the following table. Standard MySQL errors may also occur (for more information, see Appendix B. Errors, Error Codes, and Common Problems in the MySQL 5.1 Reference Manual).
I
NFOBRIGHTI
MPORTE
RRORSCode Message Description Action
1 Cannot open file or
pipe Cannot open a file or a pipe containing input data Ensure the file exists and the path is entered correctly 2 Wrong data or
column definition
Format of data does not comply with table definition
Ensure the data being imported is the correct data type and does not exceed the size specified
3 Syntax error Not used N/A
4 Cannot connect to
the database Not used N/A
5 Unknown error Unspecified error occurred Contact customer support 6 Wrong parameter Wrong value for one of the
loading parameters Make sure the correct parameter is used 7 Data conversion
error A value in data cannot be converted to a column type Ensure the data is the correct column type
About Export Errors
There are several possible Infobright-related errors that could occur when exporting data from a Infobright table. These errors are described in the following table. Standard MySQL errors may also occur (for more information, see Appendix B. Errors, Error Codes, and Common Problems in the MySQL 5.1 Reference Manual).
I
NFOBRIGHTE
XPORTE
RRORSCode Message Description Action
1 Cannot open file or
7.IMPORTING AND EXPORTING DATA IN INFOBRIGHT 43
I
NFOBRIGHTE
XPORTE
RRORSCode Message Description Action
2 Wrong data or
column definition Not used Ensure the data being exported is the correct data type and does not exceed the size specified
3 Syntax error Not used Check the export syntax 4 Cannot connect to
the database Not used Ensure database exists, the correct path is given and Infobright is started 5 Unknown error Unspecified error occurred Contact customer support
6 Wrong parameter Wrong value for one of the
export parameters Make sure the correct parameter is used 7 Data conversion
error Not used Ensure the data is the correct column type
Sample Script (Create Table, Import Data, Export Data)
The following sample script creates a table called customers, sets Infobright as the default engine, imports data from an existing text file and exports the data.
USE Northwind;
DROP TABLE IF EXISTS customers; CREATE TABLE customers (
CustomerID varchar(5), CompanyName varchar(40), ContactName varchar(30), ContactTitle varchar(30), Address varchar(60), City varchar(15) Region char(15) PostalCode char(10), Country char(15), Phone char(24), Fax varchar(24), CreditCard float(17,1), FederalTaxes decimal(4,2) ) ENGINE=BRIGHTHOUSE;
-- Import the text file. Set AUTOCOMMIT=0;
-- Export the data into TEXT format. SET @bh_dataformat = 'txt_variable';
SELECT * INTO OUTFILE "/tmp/output/customers.text" FIELDS TERMINATED BY ';' ENCLOSED BY 'NULL' LINES TERMINATED BY '\r\n' FROM customers;
Exporting and Importing Query Results
After exporting the results of a query to an output file, you may not be able to import the file back into the same definition of the accessed table. This is because the query may contain aggregates that will produce values beyond the boundaries of the original data types. In order to load the output file, you may need to create a new table with the appropriate data types for the values to be imported.
8.RUNNING QUERIES IN INFOBRIGHT 45