
Interacting with the TwitBase users table

In document Hbase in Action (Page 175-178)

Alternative HBase clients

6.2.2 Interacting with the TwitBase users table

A great thing about writing code for the shell is that it’s easy to try out. Launch the shell, and explore the API. Scanning over the users table requires a handle to the table and a scanner. Start by acquiring your handle:

$ hbase shell
...
hbase(main):001:0> users_table = @hbase.table('users', @formatter)
=> #<Hbase::Table:0x57cae5b7 @table=...>>

From the table, create a scanner. Specify the scanner options using a regular hash. The scanner constructor looks for a few specific keys in that hash, including "STARTROW", "STOPROW", and "COLUMNS". Scan over all users, returning only their username, name, and email address:

hbase(main):002:0> scan = {"COLUMNS" => ['info:user', 'info:name', 'info:email']}
=> {"COLUMNS"=>["info:user", "info:name", "info:email"]}
hbase(main):003:0> users_table.scan(scan)
=> {"GrandpaD"=>
  {"info:email"=>"timestamp=1338961216314, [email protected]",
   "info:name"=>"timestamp=1338961216314, value=Fyodor Dostoyevsky",
   "info:user"=>"timestamp=1338961216314, value=GrandpaD"},
 "HMS_Surprise"=>
  {"info:email"=>"timestamp=1338961187869, [email protected]",
   "info:name"=>"timestamp=1338961187869, value=Patrick O'Brian",
   "info:user"=>"timestamp=1338961187869, value=HMS_Surprise"},
 "SirDoyle"=>
  {"info:email"=>"timestamp=1338961221470, [email protected]",
   "info:name"=>"timestamp=1338961221470, value=Sir Arthur Conan Doyle",
   "info:user"=>"timestamp=1338961221470, value=SirDoyle"},
 "TheRealMT"=>
  {"info:email"=>"timestamp=1338961231471, [email protected]",
   "info:name"=>"timestamp=1338961231471, value=Mark Twain",
   "info:user"=>"timestamp=1338961231471, value=TheRealMT"}}

Now you have everything you need to iterate over the keypairs produced by the scanner. It’s time to start building the script.
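As a rough illustration of that iteration, the sketch below walks a hash shaped like the scan output above and strips the "timestamp=..., value=" prefix from each cell. It runs in plain Ruby with no HBase connection: the two rows are copied from the output (email cells omitted), and the helper name cell_value is hypothetical, not part of the shell API.

```ruby
# Two rows copied from the scan output above (email cells omitted).
rows = {
  "GrandpaD" => {
    "info:name" => "timestamp=1338961216314, value=Fyodor Dostoyevsky",
    "info:user" => "timestamp=1338961216314, value=GrandpaD"
  },
  "TheRealMT" => {
    "info:name" => "timestamp=1338961231471, value=Mark Twain",
    "info:user" => "timestamp=1338961231471, value=TheRealMT"
  }
}

# Hypothetical helper: drop the "timestamp=..., value=" prefix from a cell string.
def cell_value(cell)
  cell.sub(/\Atimestamp=\d+, value=/, '')
end

users = {}
rows.each do |row, cols|
  users[row] = {}
  cols.each do |column, cell|
    qualifier = column.split(':', 2).last   # "info:name" becomes "name"
    users[row][qualifier] = cell_value(cell)
  end
end

users.each { |row, vals| puts "#{row}: #{vals['name']}" }
```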

In a slight departure from the rest of the API, the block version of scan() condenses each column into a single string of the format "column=..., timestamp=..., value=...". Parse out the data you’re interested in, and accumulate the results:

scan = {"COLUMNS" => ['info:user', 'info:name', 'info:email']}
results = {}
users_table.scan(scan) do |row,col|
  unless results[row]
    results[row] = {}
  end
  m = /^.*info:(.*), t.*value=(.*)$/.match(col)
  results[row][m[1]] = m[2] if m
end

The regular expression extracts just the qualifier and cell value from the scan result. It accumulates that data in the results hash. The last step is to format the results:
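To see the regular expression at work without a running cluster, the following sketch feeds it a few hand-written strings in the "column=..., timestamp=..., value=..." format. The sample strings are assumptions modeled on the scan output shown earlier, not captured shell output.

```ruby
# Hand-written samples in the block-scan string format (assumed, not captured output).
cells = [
  ["GrandpaD",  "column=info:name, timestamp=1338961216314, value=Fyodor Dostoyevsky"],
  ["GrandpaD",  "column=info:user, timestamp=1338961216314, value=GrandpaD"],
  ["TheRealMT", "column=info:name, timestamp=1338961231471, value=Mark Twain"]
]

results = {}
cells.each do |row, col|
  results[row] ||= {}
  # Group 1 captures the qualifier, group 2 captures the cell value.
  m = /^.*info:(.*), t.*value=(.*)$/.match(col)
  results[row][m[1]] = m[2] if m
end

results.each { |row, vals| puts "#{row} => #{vals.inspect}" }
```

Because both groups are greedy, a cell value that itself contains the text "value=" would be cut short; that is acceptable for a quick script but worth keeping in mind.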

results.each do |row,vals|
  puts "<User %s, %s, %s>" % [vals['user'], vals['name'], vals['email']]
end

Now you have everything you need to complete the example. Wrap it up in a main(), and ship it! The final TwitBase.jrb script is shown in the following listing.

Listing 6.2 TwitBase.jrb: programming the HBase shell

def list_users()
  # Connect to table
  users_table = @hbase.table('users', @formatter)
  # Scan columns of interest
  scan = {"COLUMNS" => ['info:user', 'info:name', 'info:email']}
  results = {}
  users_table.scan(scan) do |row,col|
    # Parse KeyValue results
    results[row] ||= {}
    m = /^.*info:(.*), t.*value=(.*)$/.match(col)
    results[row][m[1]] = m[2] if m
  end
  # Print user rows
  results.each do |row,vals|
    puts "<User %s, %s, %s>" % [vals['user'], vals['name'], vals['email']]
  end
end

def main(args)
  if args.length == 0 || args[0] == 'help'
    puts <<EOM
TwitBase.jrb action ...
  help - print this message and exit
  list - list all installed users.
EOM
    exit
  end
  if args[0] == 'list'
    list_users
  end
  exit
end

main(ARGV)

With your script in order, set it to executable and give it a try:

$ chmod a+x TwitBase.jrb
$ ./TwitBase.jrb list
<User GrandpaD, Fyodor Dostoyevsky, [email protected]>
<User HMS_Surprise, Patrick O'Brian, [email protected]>
<User SirDoyle, Sir Arthur Conan Doyle, [email protected]>
<User TheRealMT, Mark Twain, [email protected]>

That’s all there is to it. Programming the JRuby interface is an easy way to explore prototypes on top of HBase or automate common tasks. It’s all built on the same HBase Java client you’ve used in previous chapters. For the next sample application, we’ll move off the JVM entirely. HBase provides a REST interface, and we’ll demonstrate that interface using Curl on the command line.

6.3 HBase over REST

One of the factors that prevents people from experimenting with HBase is its close relationship with Java. There are a couple of alternatives for people who are willing to run HBase but want nothing to do with Java for their applications. Whether you’re exploring HBase or you want to put an HBase cluster directly in the hands of your application developers, the REST interface may be appropriate. For the uninitiated,[2] REST is a convention for interacting with objects over the web. HBase ships with a REST service that you can use to access HBase, no Java required.

[2] Just in case you’ve never encountered REST, here’s a nice introduction: Stefan Tilkov, “A Brief Introduction to REST,” InfoQ, www.infoq.com/articles/rest-introduction.

REST? Really?

You refuse Java and reject REST? You’re incorrigible! Never fear, HBase has a solution for you as well: Thrift. In practice, the REST service is rarely used for critical application paths. Instead, you’ll want to use the Thrift bindings. The next section covers exactly this: communicating with HBase from a Python application over Thrift.

The REST service runs as a separate process and communicates with HBase using the same client API we explored earlier. It can run on any machine configured to communicate with HBase. That means you can spin up a cluster of REST service machines to host your cluster. Well, almost. The Scanner API is stateful and requires resource allocation, which happens only on the machine that receives the request. That means a client using the scanner must always return to the same REST host while performing that scan. Figure 6.1 loosely illustrates the network topology of a REST gateway deployment.
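One way to picture the scanner constraint is a client-side router that spreads stateless requests across the gateways but pins each scanner to the host that allocated it. The sketch below is purely illustrative: the RestGatewayPool class, its method names, the scanner id, and the host names are all hypothetical and are not part of HBase or its REST service.

```ruby
# Hypothetical client-side router, shown only to illustrate scanner affinity.
class RestGatewayPool
  def initialize(hosts)
    @hosts = hosts
    @next = 0
    @scanner_hosts = {}   # scanner id => gateway that allocated it
  end

  # Stateless requests can go to any gateway, so rotate through them.
  def any_host
    host = @hosts[@next % @hosts.length]
    @next += 1
    host
  end

  # Pick a gateway for a new scanner and remember the choice.
  def open_scanner(scanner_id)
    @scanner_hosts[scanner_id] = any_host
  end

  # Every follow-up call for that scanner must return to the same gateway.
  def host_for_scanner(scanner_id)
    @scanner_hosts.fetch(scanner_id)
  end

  # Forget the scanner once the scan is finished.
  def close_scanner(scanner_id)
    @scanner_hosts.delete(scanner_id)
  end
end

pool = RestGatewayPool.new(%w[rest1.example.com rest2.example.com])
opened_on = pool.open_scanner("scan-a")
pool.any_host                                   # an unrelated request may land elsewhere
puts pool.host_for_scanner("scan-a") == opened_on
```

A load balancer placed in front of the gateways would need an equivalent form of session affinity for scanner requests.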

The REST service also supports a number of response formats, selected by the Accept request header. All endpoints support XML, JSON, and Protobufs. Many of the status and administrative endpoints also support plain text. The appropriate header values are text/plain, text/xml, application/json, application/x-protobuf, and application/octet-stream.
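As a minimal sketch of format selection, the snippet below builds (but does not send) an HTTP request that asks for JSON, using Ruby's standard Net::HTTP library. It assumes the service negotiates the response format from the Accept header, as HTTP content negotiation normally works. The /version path, the localhost:8080 address in the comment, and the FORMATS labels and rest_get helper are assumptions for illustration; only the MIME types come from the text above.

```ruby
require 'net/http'

# MIME types listed in the text, keyed by short hypothetical labels.
FORMATS = {
  text:     'text/plain',
  xml:      'text/xml',
  json:     'application/json',
  protobuf: 'application/x-protobuf',
  binary:   'application/octet-stream'
}

# Build a GET request that asks the service for the given response format.
def rest_get(path, format)
  req = Net::HTTP::Get.new(path)
  req['Accept'] = FORMATS.fetch(format)
  req
end

req = rest_get('/version', :json)   # path is an assumed status endpoint
puts req['Accept']

# To send it, assuming a REST service listening on localhost:8080:
#   res = Net::HTTP.start('localhost', 8080) { |http| http.request(req) }
#   puts res.body
```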
