Wu Yudong's Blog

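As a developer, the first goal is to get used to how HBase behaves: learn its logical data model, the different ways of accessing it, and the details of using its APIs. The second goal is to practice HBase schema design.

To work toward both goals, we will build a small application, a Twitter clone named TwitBase.

TwitBase stores three simple core data elements: users, twits, and relationships.

Creating a table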

wu@ubuntu:~/opt/hbase-0.92.1$ bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.92.1, r1298924, Fri Mar 9 16:58:34 UTC 2012

hbase(main):001:0> create 'users', 'info'
0 row(s) in 1.6030 seconds

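Here users is the table name and info is a column family. At least one column family must be specified when a table is created.

Inspecting the table schema

Use the list command to check that the table was created: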

hbase(main):002:0> list
TABLE
users
1 row(s) in 0.0350 seconds

Use the describe command to see all of the table's default parameters:

hbase(main):003:0> describe 'users'
DESCRIPTION                                                                ENABLED
 {NAME => 'users', FAMILIES => [{NAME => 'info', BLOOMFILTER => 'NONE',    true
 REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0650 seconds

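Establishing a connection

From here on we work with HBase through the Java client. Setting up the environment was covered in the earlier post "Setting Up an HBase Development Environment with Eclipse". The code to open a connection to the users table is:

HTableInterface usersTable = new HTable("users");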

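Connection management

Creating a table instance is an expensive operation. Rather than creating table handles directly, it is better to use a connection pool: connections are allocated from the pool and handed back to it afterwards. In practice, HTablePool is used more often than HTable.

HTablePool pool = new HTablePool();
HTableInterface usersTable = pool.getTable("users");
// table operations ...
usersTable.close();

When you close the table after finishing your work, the connection resources are returned to the pool.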

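Working with data

Every row in an HBase table has a unique identifier called the row key, much like a primary key in a relational database. No two rows share a row key, and every access to data in a table starts with the row key.

For the data-related HBase API commands, see the post "Basic HBase Operations".

All data in HBase is stored as raw data in the form of byte arrays. The Java client library provides a utility class, Bytes, for converting between byte arrays and other types.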
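To make the byte-array idea concrete, here is a minimal sketch that mimics the round trip using only the JDK, so it runs without the HBase client on the classpath. The class BytesSketch and its methods are hypothetical stand-ins, not the real Bytes class; the sketch assumes strings are encoded as UTF-8 and longs as 8 big-endian bytes, which is how Bytes treats them as far as I know.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for the conversions offered by HBase's Bytes class.
class BytesSketch {
    // String -> byte[] (assumption: UTF-8 encoding).
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // long -> byte[] (assumption: 8 bytes, big-endian, ByteBuffer's default order).
    static byte[] toBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // byte[] -> String, the inverse of toBytes(String).
    static String bytesToString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // byte[] -> long, the inverse of toBytes(long).
    static long bytesToLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    public static void main(String[] args) {
        byte[] rowKey = toBytes("TheRealMT");
        System.out.println(Arrays.toString(rowKey)); // raw bytes as stored
        System.out.println(bytesToString(rowKey));   // decoded back to text
        System.out.println(bytesToLong(toBytes(42L)));
    }
}
```

With the real client you would call Bytes.toBytes(...) and Bytes.toString(...) instead; the point is only that every row key, column name, and value crosses the API boundary as a byte[].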

Now add some data to the table:

Put p = new Put(Bytes.toBytes("TheRealMT"));
p.add(Bytes.toBytes("info"),
	Bytes.toBytes("name"),
	Bytes.toBytes("wuyudong"));
p.add(Bytes.toBytes("info"),
	Bytes.toBytes("email"),
	Bytes.toBytes("wuyudong@wu.org"));
p.add(Bytes.toBytes("info"),
	Bytes.toBytes("password"),
	Bytes.toBytes("wuyudong"));

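HBase locates data by coordinates. The row key is the first coordinate and the column family is the next; used as a coordinate, a column family stands for a group of columns. The coordinate after that is the column qualifier (often shortened to column or qualifier). In the code above the column qualifiers are name, email, and password.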
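One way to picture these coordinates is as a nested sorted map: row key, then column family, then column qualifier, then value. The sketch below is a simplified in-memory model written with plain JDK collections; it is not how HBase is implemented, it uses String where HBase uses byte[], it leaves out cell versions, and the class CoordinatesSketch is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified model of HBase coordinates: row key -> family -> qualifier -> value.
class CoordinatesSketch {
    private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    // Store a value at the given coordinates, creating intermediate maps as needed.
    void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(qualifier, value);
    }

    // Look up a value by walking the coordinates; returns null if any level is missing.
    String get(String rowKey, String family, String qualifier) {
        Map<String, Map<String, String>> families = rows.get(rowKey);
        if (families == null) {
            return null;
        }
        Map<String, String> columns = families.get(family);
        return columns == null ? null : columns.get(qualifier);
    }

    public static void main(String[] args) {
        CoordinatesSketch users = new CoordinatesSketch();
        users.put("TheRealMT", "info", "name", "wuyudong");
        users.put("TheRealMT", "info", "email", "wuyudong@wu.org");
        users.put("TheRealMT", "info", "password", "wuyudong");
        System.out.println(users.get("TheRealMT", "info", "email"));
    }
}
```

Each p.add(family, qualifier, value) call in the Put example fills in the last three levels of this map for the row key given to the Put constructor.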

The last step in writing data to HBase is to submit the command to the table:

HTablePool pool = new HTablePool();
HTableInterface userTable = pool.getTable("users");
Put p = new Put(Bytes.toBytes("TheRealMT"));
p.add(...);
userTable.put(p);
userTable.close();

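Modifying data

Modifying data works the same way as storing it: create a Put, supply the data at the right coordinates, and submit it to the table.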

Put p = new Put(Bytes.toBytes("TheRealMT"));
p.add(Bytes.toBytes("info"),
	Bytes.toBytes("password"),
	Bytes.toBytes("abx123"));
userTable.put(p);

Reading data

Create a Get command instance, tell it which cells you want, and submit it to the table:

Get g = new Get(Bytes.toBytes("TheRealMT"));
Result r = userTable.get(g);

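The table returns a Result instance containing the data. By default it holds every column of every column family in the row, which may be far more than you need. You can place restrictions on the Get instance to cut down the amount of data returned. To return only the password column, call addColumn(). The same works at the column-family level with addFamily(), which returns every column in the given family. The example below restricts the result to the password column:

Get g = new Get(Bytes.toBytes("TheRealMT"));
g.addColumn(Bytes.toBytes("info"),
	Bytes.toBytes("password"));
Result r = userTable.get(g);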

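To retrieve a specific value and convert it from bytes back to a string:

Get g = new Get(Bytes.toBytes("TheRealMT"));
g.addFamily(Bytes.toBytes("info"));
// Submit the Get before reading values out of the Result
Result r = userTable.get(g);
byte[] b = r.getValue(Bytes.toBytes("info"), Bytes.toBytes("email"));
String email = Bytes.toString(b);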

Deleting data

Deleting data from HBase works much like storing it. Create a Delete command instance based on a row key:

Delete d = new Delete(Bytes.toBytes("TheRealMT"));
usersTable.delete(d);

You can also specify more coordinates to delete only part of a row:

Delete d = new Delete(Bytes.toBytes("TheRealMT"));
d.deleteColumns(
	Bytes.toBytes("info"),
	Bytes.toBytes("email"));
usersTable.delete(d);

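deleteColumns() removes a cell from the row entirely, that is, every stored version of it. It is a different method from deleteColumn() (note the missing trailing s), which operates on the content of the cell and removes only its latest version.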
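The difference between the two methods comes down to cell versions (the users table above keeps up to 3, per VERSIONS => '3'). The following is a minimal sketch of a single cell modeled as a timestamp-to-value map, using plain JDK collections rather than the HBase client. It is a deliberately simplified model: real HBase records delete markers instead of removing entries immediately, and the class DeleteSketch and its method names are hypothetical.

```java
import java.util.Collections;
import java.util.TreeMap;

// Simplified model of one versioned cell, e.g. info:password for row TheRealMT.
class DeleteSketch {
    // timestamp -> value, sorted newest first.
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    // Each write adds a new version rather than overwriting the old one.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Analogous to deleteColumn(): drop only the newest version.
    void deleteLatestVersion() {
        versions.pollFirstEntry();
    }

    // Analogous to deleteColumns(): drop every version of the cell.
    void deleteAllVersions() {
        versions.clear();
    }

    // A read sees the newest remaining version, or null if none is left.
    String read() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        DeleteSketch password = new DeleteSketch();
        password.put(1L, "wuyudong"); // original value
        password.put(2L, "abx123");   // later update
        System.out.println(password.read());
        password.deleteLatestVersion();
        System.out.println(password.read()); // the older version shows through
        password.deleteAllVersions();
        System.out.println(password.read());
    }
}
```

In this model, removing only the latest version lets the previous value become visible again, which is usually not what you want when the intent is to erase a column; removing all versions is the safer default for that case.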

HBase Data Operations in Practice
