Alternative HBase clients
6.1.2 Script table schema from the UNIX shell
Way back when learning HBase, you started development on the TwitBase application. One of the first things you did with TwitBase was to create a users table using the HBase shell. As TwitBase grew, so did your schema. Tables for Twits and Followers soon emerged. All management code for those tables accumulated in the InitTables class. Java isn’t a convenient language for schema management because it’s verbose and requires building a custom application for each migration. Let’s reimagine that code as HBase shell commands.
The main body of code for creating a table in InitTables looks mostly the same for each table:
System.out.println("Creating Twits table...");
HTableDescriptor desc = new HTableDescriptor(TwitsDAO.TABLE_NAME);
HColumnDescriptor c = new HColumnDescriptor(TwitsDAO.INFO_FAM);
c.setMaxVersions(1);
desc.addFamily(c);
admin.createTable(desc);
System.out.println("Twits table created.");
You can achieve the same effect using the shell:
hbase(main):001:0> create 'twits', {NAME => 't', VERSIONS => 1}
0 row(s) in 1.0500 seconds
A brush with JRuby
If you’re familiar with the Ruby programming language, the create command may look conspicuously like a function invocation. That’s because it is. The HBase shell is implemented in JRuby. We’ll look more at this link to JRuby later in this chapter.
Five lines of Java reduced to a single shell command? Not bad. Now you can take that HBase shell command and wrap it in a UNIX shell script. Note that the line exec hbase shell may be slightly different for you if the hbase command isn’t on your path. You handle that scenario in the final script, shown in listing 6.1:
#!/bin/sh
exec $HBASE_HOME/bin/hbase shell <<EOF
create 'twits', {NAME => 't', VERSIONS => 1}
EOF
Adding the other tables to your script is easy:
exec $HBASE_HOME/bin/hbase shell <<EOF
create 'twits', {NAME => 't', VERSIONS => 1}
create 'users', {NAME => 'info'}
create 'follows', {NAME => 'f', VERSIONS => 1}
create 'followedBy', {NAME => 'f', VERSIONS => 1}
EOF
At this point, you’ve moved your table and column family names out of Java. Overriding them on the command line is now much easier:
#!/bin/sh
TWITS_TABLE=${TWITS_TABLE-'twits'}
TWITS_FAM=${TWITS_FAM-'t'}
exec $HBASE_HOME/bin/hbase shell <<EOF
create '$TWITS_TABLE', {NAME => '$TWITS_FAM', VERSIONS => 1}
create 'users', {NAME => 'info'}
create 'follows', {NAME => 'f', VERSIONS => 1}
create 'followedBy', {NAME => 'f', VERSIONS => 1}
EOF
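The override relies on the shell's ${VAR-default} parameter expansion. The following is a minimal sketch of that expansion in isolation, so you can try it without an HBase installation. The helper twits_table_name is hypothetical and exists only for this illustration.

```shell
#!/bin/sh
# Sketch only: shows how ${VAR-default} chooses a table name.
# twits_table_name is a hypothetical helper, not part of TwitBase.

# Print the name the schema script would use for the twits table.
twits_table_name() {
  printf '%s\n' "${TWITS_TABLE-twits}"
}

unset TWITS_TABLE
twits_table_name                        # unset: prints the default, twits

TWITS_TABLE=twits_test
twits_table_name                        # set: prints the override, twits_test

TWITS_TABLE=
twits_table_name                        # set but empty: prints an empty line
printf '%s\n' "${TWITS_TABLE:-twits}"   # the ":-" form prints twits instead
```

Because the scripts use the "-" form rather than ":-", an empty override such as TWITS_TABLE= is passed through as an empty name instead of being replaced by the default. If that isn't what you want, the ":-" form may be the safer choice.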
If you update your application code to read those same constants from a configuration file, you can move your schema definition completely out of the Java code. Now you can easily test different versions of TwitBase against different tables on the same HBase cluster. That flexibility will simplify the process of bringing TwitBase to production. The complete script is shown next.
Listing 6.1 UNIX shell replacement for InitTables.java
#!/bin/sh

# Find hbase command
HBASE_CLI="$HBASE_HOME/bin/hbase"
test -n "$HBASE_HOME" || {
  echo >&2 'HBASE_HOME not set. using hbase on $PATH'
  HBASE_CLI=$(which hbase)
}

# Determine table and column family names
TWITS_TABLE=${TWITS_TABLE-'twits'}
TWITS_FAM=${TWITS_FAM-'t'}
USERS_TABLE=${USERS_TABLE-'users'}
USERS_FAM=${USERS_FAM-'info'}
FOLLOWS_TABLE=${FOLLOWS_TABLE-'follows'}
FOLLOWS_FAM=${FOLLOWS_FAM-'f'}
FOLLOWEDBY_TABLE=${FOLLOWEDBY_TABLE-'followedBy'}
FOLLOWEDBY_FAM=${FOLLOWEDBY_FAM-'f'}

# Run shell commands
exec "$HBASE_CLI" shell <<EOF
create '$TWITS_TABLE',
  {NAME => '$TWITS_FAM', VERSIONS => 1}
create '$USERS_TABLE',
  {NAME => '$USERS_FAM'}
create '$FOLLOWS_TABLE',
  {NAME => '$FOLLOWS_FAM', VERSIONS => 1}
create '$FOLLOWEDBY_TABLE',
  {NAME => '$FOLLOWEDBY_FAM', VERSIONS => 1}
EOF
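Before running a script like this against a real cluster, it can help to preview the commands the here-document will produce. The following sketch swaps the hbase invocation for cat, so the generated commands go to standard output and nothing is executed. The function render_schema is hypothetical, and the FOLLOWEDBY_ variable names are an assumption chosen to match the other table variables.

```shell
#!/bin/sh
# Sketch only: print the HBase shell commands instead of executing them.
# render_schema is a hypothetical helper, not part of TwitBase.

# The body is wrapped in ( ... ) so it runs in a subshell and the
# defaults assigned here do not leak into the calling shell.
render_schema() (
  TWITS_TABLE=${TWITS_TABLE-'twits'}
  TWITS_FAM=${TWITS_FAM-'t'}
  USERS_TABLE=${USERS_TABLE-'users'}
  USERS_FAM=${USERS_FAM-'info'}
  FOLLOWS_TABLE=${FOLLOWS_TABLE-'follows'}
  FOLLOWS_FAM=${FOLLOWS_FAM-'f'}
  FOLLOWEDBY_TABLE=${FOLLOWEDBY_TABLE-'followedBy'}
  FOLLOWEDBY_FAM=${FOLLOWEDBY_FAM-'f'}
  # cat stands in for: exec "$HBASE_CLI" shell
  cat <<EOF
create '$TWITS_TABLE', {NAME => '$TWITS_FAM', VERSIONS => 1}
create '$USERS_TABLE', {NAME => '$USERS_FAM'}
create '$FOLLOWS_TABLE', {NAME => '$FOLLOWS_FAM', VERSIONS => 1}
create '$FOLLOWEDBY_TABLE', {NAME => '$FOLLOWEDBY_FAM', VERSIONS => 1}
EOF
)

# Preview the schema with one table redirected to a scratch name.
(TWITS_TABLE=twits_test; render_schema)
```

Once the printed commands look right, the same here-document can be fed to the hbase shell as in the listing.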
This was a primer on how you can use the HBase shell to create scripts that make it easy to do janitorial tasks on your HBase deployment. The HBase shell isn’t something you’ll use as your primary access method to HBase; it’s not meant to have an entire application built on top of it. It’s an application itself that has been built on top of JRuby, which we study next.
6.2 Programming the HBase shell using JRuby
The HBase shell provides a convenient interactive environment and is sufficient for many simple administrative tasks. But it can become tedious for more complex operations. As we mentioned in the previous section, the HBase shell is implemented in JRuby.1 Behind the scenes is a nice library exposing the HBase client to JRuby. You can access that library in your own scripts to create increasingly complex automation over HBase. In this example, you’ll build a tool for interacting with the TwitBase users table, similar to the UsersTool you wrote in Java. This will give you a feel for interacting with HBase from JRuby.
Programming HBase via this JRuby interface is one step above the shell in terms of sophistication. If you find yourself writing complex shell scripts, a JRuby application may be a preferable approach. If for whatever reason you need to use the C implementation of Ruby instead of JRuby, you’ll want to explore Thrift. We demonstrate using Thrift from Python later in this chapter; using it from Ruby is similar.
You can find the completed TwitBase.jrb script from this section in the TwitBase project source at https://github.com/hbaseinaction/twitbase/blob/master/bin/TwitBase.jrb.
6.2.1 Preparing the HBase shell
The easiest way to launch your own JRuby applications is through the existing HBase shell. If you haven’t already done so, locate the shell by following the instructions at the beginning of the previous section.
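As a reminder of what locating the command can look like, here is a minimal sketch that mirrors the lookup in listing 6.1. It is a variation, not the book's code: it uses the POSIX command -v builtin in place of which, and the function name find_hbase_cli is hypothetical.

```shell
#!/bin/sh
# Sketch only: locate the hbase command the way listing 6.1 does.
# find_hbase_cli is a hypothetical helper, not part of TwitBase.

# Prefer $HBASE_HOME/bin/hbase; otherwise fall back to hbase on the PATH.
# Returns non-zero when neither is available.
find_hbase_cli() {
  if [ -n "${HBASE_HOME-}" ]; then
    printf '%s\n' "$HBASE_HOME/bin/hbase"
  else
    command -v hbase
  fi
}

if HBASE_CLI=$(find_hbase_cli); then
  # Only report the command here; running it requires an HBase install.
  echo "would run: $HBASE_CLI shell ./TwitBase.jrb"
else
  echo >&2 'hbase not found; set HBASE_HOME or add hbase to your PATH'
fi
```

The sketch doesn't check that the file under HBASE_HOME actually exists or is executable, so treat the result as a candidate path.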
1 JRuby is the Ruby programming language implemented on top of the JVM. Learn more at http://jruby.org/.
Once you’ve found the hbase command, you can use that as the interpreter for your own scripts. This is particularly useful because it handles importing the necessary libraries and instantiates all the classes you’ll need. To get started, create a script to list the tables. Call it TwitBase.jrb:
def list_tables()
  @hbase.admin(@formatter).list.each do |t|
    puts t
  end
end

list_tables
exit
The variables @hbase and @formatter are two instances created for you by the shell. They’re part of that JRuby API you’re about to take advantage of. Now give the script a try:
$ $HBASE_HOME/bin/hbase shell ./TwitBase.jrb
followers
twits
users
With everything in place, let’s start working with TwitBase.