Getting started
HBase shell
The HBase shell exposes a wealth of features, though it’s primarily used for administrative purposes. Being implemented in JRuby, it has access to the entire Java client API. You can further explore the shell’s capabilities using the help command.
HBase client configuration
HBase client applications need only one piece of configuration to access HBase: the ZooKeeper quorum address. You can set this configuration manually like this:
myConf.set("hbase.zookeeper.quorum", "serverip");
Both ZooKeeper and the exact interaction between the client and the HBase cluster are covered in the next chapter, where we go into the details of HBase as a distributed store. For now, all you need to know is that the configuration parameters can be picked up either by the Java client from the hbase-site.xml file in its classpath or by you setting the configuration explicitly in the connection. When you leave the configuration completely unspecified, as you do in this sample code, the default configuration is read and localhost is used for the ZooKeeper quorum address. When working in local mode, as you are here, that’s exactly what you want.
2.1.4 Connection management
Creating a table instance is a relatively expensive operation, requiring a bit of network overhead. Rather than create a new table handle on demand, it’s better to use a connection pool. Connections are allocated from and returned to the pool. Using an HTablePool is more common in practice than instantiating HTables directly:
HTablePool pool = new HTablePool();
HTableInterface usersTable = pool.getTable("users");
... // work with the table
usersTable.close();
Closing the table when you’re finished with it allows the underlying connection resources to be returned to the pool.
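To make the borrow-and-return cycle concrete, here is a minimal sketch of the pooling idea using only the Java standard library. It is not how HTablePool is implemented; PoolSketch, Handle, and the created counter are hypothetical names invented for this illustration. It shows only that closing a handle returns it to the pool so the next request for the same table can reuse it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Stdlib-only model of a handle pool. This is an illustration of the
// pattern, NOT the HBase HTablePool implementation.
final class PoolSketch {
    // Hypothetical stand-in for an expensive-to-create table handle.
    static final class Handle implements AutoCloseable {
        final String tableName;
        private final PoolSketch owner;

        Handle(String tableName, PoolSketch owner) {
            this.tableName = tableName;
            this.owner = owner;
        }

        // close() hands the handle back to the pool instead of destroying it.
        @Override
        public void close() {
            owner.release(this);
        }
    }

    // Idle handles, grouped by table name.
    private final Map<String, Deque<Handle>> idle = new HashMap<>();

    // Counts how many handles were actually constructed.
    int created = 0;

    // Reuse an idle handle for the table if one exists; otherwise build one.
    synchronized Handle getTable(String tableName) {
        Deque<Handle> queue = idle.get(tableName);
        if (queue != null && !queue.isEmpty()) {
            return queue.pop();
        }
        created++;
        return new Handle(tableName, this);
    }

    private synchronized void release(Handle handle) {
        idle.computeIfAbsent(handle.tableName, k -> new ArrayDeque<>()).push(handle);
    }

    public static void main(String[] args) {
        PoolSketch pool = new PoolSketch();
        try (Handle users = pool.getTable("users")) {
            // ... work with the table
        }
        try (Handle users = pool.getTable("users")) {
            // the second borrow reuses the handle returned above
        }
        System.out.println("handles created: " + pool.created); // prints "handles created: 1"
    }
}
```

The try-with-resources form guarantees the handle goes back to the pool even when the work in between throws, which is the same reason to close the table in the HBase code above.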
What good is a table without data in it? No good at all. Let’s store some data.
2.2 Data manipulation
Every row in an HBase table has a unique identifier called its rowkey. Other coordinates are used to locate a piece of data in an HBase table, but the rowkey is primary. Just like a primary key in a table in a relational database, rowkey values are distinct across all rows in an HBase table. Every interaction with data in a table begins with the rowkey. Every user in TwitBase is unique, so the user’s name makes a convenient rowkey for the users table; that’s what you’ll use.
The HBase API is broken into operations called commands. There are five primitive commands for interacting with HBase: Get, Put, Delete, Scan, and Increment. The command used to store data is Put. To store data in a table, you’ll need to create a Put instance. Creating a Put instance from a rowkey looks like this:
Put p = new Put(Bytes.toBytes("Mark Twain"));
Why can’t you store the user’s name directly? All data in HBase is stored as raw data in the form of a byte array, and that includes the rowkeys. The Java client library provides a utility class, Bytes, for converting various Java data types to and from byte[] so you don’t have to worry about doing it yourself. Note that this Put instance has not been inserted into the table yet. You’re only building the object right now.
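If you’re curious what such a conversion involves, the following sketch does the same kind of work with only the Java standard library. It assumes strings are encoded as UTF-8 and longs as 8 big-endian bytes; RowKeyBytes and its method names are hypothetical and exist only for this illustration. In real client code you would call the Bytes utility rather than writing your own.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Stdlib-only illustration of converting Java values to and from byte[].
// Hypothetical helper; in HBase client code, use the Bytes utility instead.
final class RowKeyBytes {
    // Assumption: strings are encoded as UTF-8.
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    static String toText(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Assumption: longs are written as 8 bytes, most significant byte first.
    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    static long toLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    public static void main(String[] args) {
        byte[] rowkey = toBytes("Mark Twain");
        System.out.println(Arrays.toString(rowkey)); // the raw bytes a store would see
        System.out.println(toText(rowkey));          // round-trips back to the string
        System.out.println(toLong(toBytes(42L)));    // round-trips back to the long
    }
}
```

The point to take away is that the store sees only the bytes. Whether a given byte[] is a name, a number, or anything else is something your application has to remember.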
2.2.1 Storing data
Now that you’ve staged a command for adding data to HBase, you still need to provide data to store. You can start by storing basic information about Mark, such as his email address and password. What happens if another person comes along whose name is also Mark Twain? They’ll conflict, and you won’t be able to store data about them in TwitBase. Instead of using the person’s real name as the rowkey, let’s use a unique username and store their real name in a column. Putting (no pun intended!) it all together:
Put p = new Put(Bytes.toBytes("TheRealMT"));
// Into the cell "info:name" store "Mark Twain"
p.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Mark Twain"));
// Into the cell "info:email" store "[email protected]"
p.add(Bytes.toBytes("info"), Bytes.toBytes("email"), Bytes.toBytes("[email protected]"));
// Into the cell "info:password" store "Langhorne"
p.add(Bytes.toBytes("info"), Bytes.toBytes("password"), Bytes.toBytes("Langhorne"));
Remember, HBase uses coordinates to locate a piece of data within a table. The rowkey is the first coordinate, followed by the column family. When used as a data coordinate, the column family serves to group columns. The next coordinate is the column qualifier, often called simply column, or qual, once you’re versed in HBase vernacular. The column qualifiers in this example are name, email, and password. Because HBase is schema-less, you never need to predefine the column qualifiers or assign them types. They’re dynamic; all you need is a name that you give them at write time. These three coordinates define the location of a cell. The cell is where HBase stores data as a value. A cell is identified by its [rowkey, column family, column qualifier] coordinate within a table. The previous code stores three values in three cells within a single row. The cell storing Mark’s name has the coordinates [TheRealMT, info, name].
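One way to picture these coordinates is as nested maps: rowkey, then column family, then column qualifier, then value. The sketch below models only that lookup path with the Java standard library. It is a mental model, not how HBase stores data; the CellCoordinates class is hypothetical, and it uses String keys for readability where HBase uses byte arrays.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Stdlib-only mental model of [rowkey, column family, column qualifier] -> value.
// Hypothetical class; keys are Strings here purely for readability.
final class CellCoordinates {
    // rowkey -> column family -> column qualifier -> value
    private final Map<String, Map<String, Map<String, byte[]>>> rows = new TreeMap<>();

    // Qualifiers are never declared up front; they come into being on first write.
    void put(String rowkey, String family, String qualifier, byte[] value) {
        rows.computeIfAbsent(rowkey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(qualifier, value);
    }

    // Returns the cell's value, or null if no cell exists at those coordinates.
    byte[] get(String rowkey, String family, String qualifier) {
        Map<String, Map<String, byte[]>> families = rows.get(rowkey);
        if (families == null) {
            return null;
        }
        Map<String, byte[]> qualifiers = families.get(family);
        return qualifiers == null ? null : qualifiers.get(qualifier);
    }

    public static void main(String[] args) {
        CellCoordinates users = new CellCoordinates();
        users.put("TheRealMT", "info", "name", "Mark Twain".getBytes(StandardCharsets.UTF_8));
        users.put("TheRealMT", "info", "password", "Langhorne".getBytes(StandardCharsets.UTF_8));

        byte[] name = users.get("TheRealMT", "info", "name");
        System.out.println(new String(name, StandardCharsets.UTF_8)); // prints "Mark Twain"
    }
}
```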
The last step in writing data to HBase is sending the command to the table. That part is easy:
HTableInterface usersTable = pool.getTable("users");
Put p = new Put(Bytes.toBytes("TheRealMT"));
p.add(...);
usersTable.put(p);
usersTable.close();