1. HBase Shell overview
The Apache HBase Shell is (J)Ruby's IRB with some HBase-specific commands added. Anything you can do in IRB, you should be able to do in the HBase Shell.
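Because the shell is just IRB, ordinary Ruby works at the prompt. The sketch below is plain Ruby with no HBase calls; the `row%03d` key format is a made-up example, not something from this cluster.

```ruby
# Plain Ruby, runnable in any IRB session (and so at the HBase shell prompt).
# Build a small batch of zero-padded row keys; the format is hypothetical.
row_keys = (1..3).map { |i| format('row%03d', i) }

# Print each generated key on its own line.
row_keys.each { |key| puts key }
```

Inside the HBase shell, keys generated this way could then be handed to a command such as put in a loop.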
0. First, create the user hbase_test for working with the HBase cluster
1. As root, add the hbase_test user on the local client
[root@10-90-50-77-jhdxyjd ~]# useradd hbase_test
2. Switch to the cluster superuser (hdfs here), create the /user/hbase_test directory, and change its ownership
[hdfs@10-90-50-77-jhdxyjd ~]# hdfs dfs -mkdir /user/hbase_test
[hdfs@10-90-50-77-jhdxyjd ~]# hdfs dfs -chown hbase_test:hbase_test /user/hbase_test
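3. Note that HBase authorization is not enabled here; it is covered in detail later.
1. Starting the HBase shell is simple: on a node where HBase is installed, run ./hbase shell.
For the official shell command reference, see the Apache HBase ™ Reference Guide.
2. Summary of all HBase shell commands by category
Running help in the HBase shell lists every command, grouped by category; the table below summarizes them. If you are unsure how to use a command, run help on it.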
| Category | Command | Description | Syntax |
| 1. General commands (basic HBase operations and basic cluster information) | status | Returns the status of the HBase cluster | hbase(main):053:0> status 1 active master, 1 backup masters, 7 servers, 0 dead, 2.1429 average load Took 0.0082 seconds |
| | processlist | Shows the task list on the RegionServers, at several levels of detail | hbase> processlist |
| | table_help | Shows how to work with tables | table_help explains the syntax of the table CRUD commands with examples, e.g.: Or, if you have already created the table, you can get a reference to it: hbase> t = get_table 't' You can do things like call 'put' on the table: |
| | version | Returns the HBase version | hbase(main):068:0> version 2.1.0-cdh6.1.0, rUnknown, Thu Dec 6 16:59:59 PST 2018 Took 0.0003 seconds |
| | whoami | Shows the current HBase user | hbase(main):069:0> whoami hbase_test (auth:SIMPLE) groups: hbase_test Took 0.0137 seconds |
| 2. Namespace commands | create_namespace | Creates a namespace, similar to a database | create_namespace 'myns_test' |
| | describe_namespace | Shows a namespace's details | hbase(main):070:0> describe_namespace 'myns_test' DESCRIPTION {NAME => 'myns_test', PROPERTY_NAME => 'PROPERTY_VALUE'} => 1 |
| | drop_namespace | Drops a namespace; its tables must be dropped first, otherwise an exception is raised | drop_namespace 'myns_test' |
| | alter_namespace | Modifies a namespace's properties | alter_namespace 'myns_test',{METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'} |
| | list_namespace | Lists the namespaces in HBase | hbase(main):071:0> list_namespace NAMESPACE default hbase myns_test |
| | list_namespace_tables | Lists all tables in a namespace | hbase(main):073:0> list_namespace_tables 'myns_test' TABLE t1 tb2 2 row(s) Took 0.0043 seconds => ["t1", "tb2"] |
| 3. Table DDL commands (table operations are the ones to master) | alter | Alters the table schema, its column families, and any table attribute; similar to alter table | alter 't1', METHOD => 'table_conf_unset', NAME => 'hbase.hregion.majorcompaction' |
| | create | Creates a table (see the detailed walkthrough below) | create 't1', 'f1', 'f2', 'f3' |
| | drop | Drops a table (the table must be disabled first) | drop 't1' |
| | drop_all | Drops all tables matching a regular expression | hbase> drop_all 't.*' |
| | list | Lists all tables in HBase; also accepts a regular expression to filter the listing | hbase> list hbase> list 'abc.*' hbase> list 'ns:abc.*' hbase> list 'ns:.*' |
| | exists | Checks whether a table exists; returns a boolean | hbase(main):076:0> exists 't1' Table t1 does exist Took 0.0279 seconds => true hbase(main):077:0> exists 'ns1:t3' Table ns1:t3 does not exist => false |
| | describe / desc | Both show the detailed table structure | desc 'myns:t1' |
| | disable, disable_all, enable, enable_all | Set a table to the disabled or enabled state. The _all variants apply the change to every table matching a regular expression | hbase> disable 't1' hbase> disable_all 't.*' |
| | is_disabled, is_enabled | Check whether a table is disabled or enabled; return a boolean | hbase> is_disabled 't1' hbase> is_disabled 'ns1:t1' |
| | get_table | Returns the table as an object that you can then call methods on | hbase> t1.help |
| | list_regions | Lists all regions of a given table as an array (the web UI shows the same information). Displays server name, region name, start key, end key, region size in MB, number of requests, and locality | hbase> list_regions 'table_name' hbase> list_regions 'table_name', 'server_name' hbase> list_regions 'table_name', {SERVER_NAME => 'server_name', LOCALITY_THRESHOLD => 0.8} hbase> list_regions 'table_name', {SERVER_NAME => 'server_name', LOCALITY_THRESHOLD => 0.8}, ['SERVER_NAME'] |
| | locate_region | Locates the region that holds a given key of a table | hbase> locate_region 'tableName', 'key0' |
| | show_filters | Lists all filters in the HBase cluster. Filters are used by get and scan to select data, much like a where clause in a relational database; covered later | hbase(main):096:0> show_filters DependentColumnFilter ............ |
| | alter_status | Reports the status of an alter command | hbase> alter_status 't1' hbase> alter_status 'ns1:t1' hbase(main):098:0> alter_status 't1' 1/1 regions updated. Done. Took 1.0209 seconds |
| | alter_async, clone_table_schema | alter_async alters the schema without waiting for all regions to receive the change. clone_table_schema creates a new table with the same schema as an existing one, similar to create table ... like | hbase> clone_table_schema 'table_name', 'new_table_name' hbase> clone_table_schema 'table_name', 'new_table_name', false Note: passing false means the split keys of the source table are not preserved. |
| 4. Table DML commands | count | Counts the rows in a table, similar to count in Hive or a database; the count can be restricted with a filter, columns, or a row range | hbase> count 'ns1:t1' hbase> count 't1' hbase> count 't1', INTERVAL => 100000 hbase> count 't1', CACHE => 1000 hbase> count 't1', INTERVAL => 10, CACHE => 1000 hbase> count 't1', FILTER => " (QualifierFilter (>=, 'binary:xyz')) AND (TimestampsFilter ( 123, 456))" hbase> count 't1', COLUMNS => ['c1', 'c2'], STARTROW => 'abc', STOPROW => 'xyz' |
| | put, get, append | put writes a cell value at the given table, row, and column. get reads the values of the given row. append appends to the existing value of a cell at the given table, row, and column | hbase(main):119:0> put 'tt1', 'rowkey1','f1:name','tom' hbase(main):120:0> get 'tt1','rowkey1' COLUMN CELL f1:name timestamp=1631876854951, value=tom 1 row(s) hbase(main):121:0> append 'tt1', 'rowkey1','f1:name','tom2' CURRENT VALUE = tomtom2 hbase(main):122:0> get 'tt1','rowkey1' COLUMN CELL f1:name timestamp=1631876898153, value=tomtom2 |
| | delete, deleteall | delete removes one cell value at the given table, row, and column. deleteall removes all cells of a given row, optionally restricted to a column or timestamp | |
| | scan (important, covered later) | Scans a table; the scan is shaped through a range of attributes. Covered in detail later | hbase> scan 'hbase:meta' hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'} hbase> scan 'ns1:t1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'} hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'} hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804000, 1303668904000]} hbase> scan 't1', {REVERSED => true} hbase> scan 't1', {ALL_METRICS => true} |
| | truncate, truncate_preserve | Empty a table while keeping its schema. Under the hood the table is disabled, dropped, and recreated; truncate_preserve also keeps the previous region boundaries | |
| | get_counter | Reads a counter value | # hit counts: daily, weekly, monthly create 'counters', 'daily', 'weekly', 'monthly' incr 'counters', '20110101', 'daily:hits', 1 incr 'counters', '20110101', 'daily:hits', 1 get_counter 'counters', '20110101', 'daily:hits' |
| | incr | Increments a counter cell. Note: incr works on a row key or column that does not exist yet. It fails when the existing cell value is not an 8-byte long, for example after the value has been overwritten with put | # Syntax: incr 'table', 'row key', 'family:qualifier', step hbase(main):171:0> incr 'ns1:t1','r3','cf1:name2',10 |
| | get_splits | Returns the split points of a table | hbase(main):176:0> create 'ns1:t3', 'f1', SPLITS => ['10', '20', '30', '40'] Created table ns1:t3 => Hbase::Table - ns1:t3 hbase(main):177:0> get_splits 'ns1:t3' Total number of splits = 5 10 20 30 40 |
| Cluster tool commands (tools); important, used routinely in operations, covered in detail later | assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, cleaner_chore_enabled, cleaner_chore_run, cleaner_chore_switch, clear_block_cache, clear_compaction_queues, clear_deadservers, close_region, compact, compact_rs, compaction_state, flush, is_in_maintenance_mode, list_deadservers, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, splitormerge_enabled, splitormerge_switch, stop_master, stop_regionserver, trace, unassign, wal_roll, zk_dump | | |
| Security and permission commands (security) | grant, list_security_capabilities, revoke, user_permission | Permission management, used by operations | |
| Procedure commands (procedures) | list_locks, list_procedures | | |
| visibility labels | add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility | | |
| rsgroup commands | add_rsgroup, balance_rsgroup, get_rsgroup, get_server_rsgroup, get_table_rsgroup, list_rsgroups, move_namespaces_rsgroup, move_servers_namespaces_rsgroup, move_servers_rsgroup, move_servers_tables_rsgroup, move_tables_rsgroup, remove_rsgroup, remove_servers_rsgroup | | |
| Space quota commands (quotas) | list_quota_snapshots, list_quota_table_sizes, list_quotas, list_snapshot_sizes, set_quota | Operations commands. In production, quotas are usually managed per tenant so that no single user consumes a large share of resources | |
| Configuration update commands | update_all_config, update_config | Operations commands | |
| snapshots | clone_snapshot, delete_all_snapshot, delete_snapshot, delete_table_snapshots, list_snapshots, list_table_snapshots, restore_snapshot, snapshot | | |
| replication | add_peer, append_peer_namespaces, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, get_peer_config, list_peer_configs, list_peers, list_replicated_tables, remove_peer, remove_peer_namespaces, remove_peer_tableCFs, set_peer_bandwidth, set_peer_exclude_namespaces, set_peer_exclude_tableCFs, set_peer_namespaces, set_peer_replicate_all, set_peer_serial, set_peer_tableCFs, show_peer_tableCFs, update_peer_config | | |
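The counter commands in the table (incr, get_counter) work on cells that hold a long rather than text. Here is a rough model of that behavior in plain Ruby. It assumes counters are stored as 8-byte big-endian signed longs, and the helper names `encode_counter`, `decode_counter`, and `incr_cell` are invented for this illustration; none of this is HBase's own code.

```ruby
# Assumption: a counter cell holds an 8-byte big-endian signed long.
def encode_counter(value)
  [value].pack('q>')
end

# Decode a counter cell, rejecting anything that is not exactly 8 bytes.
def decode_counter(bytes)
  unless bytes.bytesize == 8
    raise ArgumentError, "not a long, it is #{bytes.bytesize} bytes wide"
  end
  bytes.unpack1('q>')
end

# Hypothetical model of incr: a missing cell (nil) counts as 0.
def incr_cell(cell, step = 1)
  current = cell.nil? ? 0 : decode_counter(cell)
  encode_counter(current + step)
end

cell = incr_cell(nil)   # first hit on a cell that does not exist yet
cell = incr_cell(cell)  # second hit on the same cell
puts decode_counter(cell)

begin
  incr_cell('10')       # a 2-byte string, as a plain put of '10' would store
rescue ArgumentError => e
  puts "incr failed: #{e.message}"
end
```

Under these assumptions, incrementing a missing cell is fine, while incrementing a cell that put has overwritten with text is rejected because the stored bytes are no longer an 8-byte long.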
3. HBase Shell commands with worked examples
3.0 The HBase power tool: help
3.0.1 List all HBase shell commands, grouped by category
hbase(main):019:0> help
HBase Shell, version 2.1.0-cdh6.1.0, rUnknown, Thu Dec 6 16:59:59 PST 2018
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
All HBase commands are listed here, grouped by category.
COMMAND GROUPS:
Group name: general // general-purpose commands
Commands: processlist, status, table_help, version, whoami
Group name: ddl
Commands: alter, alter_async, alter_status, clone_table_schema, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, list_regions, locate_region, show_filters
Group name: namespace
Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
Group name: dml
Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve
Group name: tools
Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, cleaner_chore_enabled, cleaner_chore_run, cleaner_chore_switch, clear_block_cache, clear_compaction_queues, clear_deadservers, close_region, compact, compact_rs, compaction_state, flush, is_in_maintenance_mode, list_deadservers, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, splitormerge_enabled, splitormerge_switch, stop_master, stop_regionserver, trace, unassign, wal_roll, zk_dump
Group name: replication
Commands: add_peer, append_peer_namespaces, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, get_peer_config, list_peer_configs, list_peers, list_replicated_tables, remove_peer, remove_peer_namespaces, remove_peer_tableCFs, set_peer_bandwidth, set_peer_exclude_namespaces, set_peer_exclude_tableCFs, set_peer_namespaces, set_peer_replicate_all, set_peer_serial, set_peer_tableCFs, show_peer_tableCFs, update_peer_config
Group name: snapshots
Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, delete_table_snapshots, list_snapshots, list_table_snapshots, restore_snapshot, snapshot
Group name: configuration
Commands: update_all_config, update_config
Group name: quotas
Commands: list_quota_snapshots, list_quota_table_sizes, list_quotas, list_snapshot_sizes, set_quota
Group name: security
Commands: grant, list_security_capabilities, revoke, user_permission
Group name: procedures
Commands: list_locks, list_procedures
Group name: visibility labels
Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility
Group name: rsgroup
Commands: add_rsgroup, balance_rsgroup, get_rsgroup, get_server_rsgroup, get_table_rsgroup, list_rsgroups, move_namespaces_rsgroup, move_servers_namespaces_rsgroup, move_servers_rsgroup, move_servers_tables_rsgroup, move_tables_rsgroup, remove_rsgroup, remove_servers_rsgroup
SHELL USAGE:
Quote all names in HBase Shell such as table and column names. Commas delimit
command parameters. Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:
{'key1' => 'value1', 'key2' => 'value2', ...}
and are opened and closed with curley-braces. Key/values are delimited by the
'=>' character combination. Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
'Object.constants' to see a (messy) list of all constants in the environment.
If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:
hbase> get 't1', "key\x03\x3f\xcd"
hbase> get 't1', "key\003\023\011"
hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/book.html
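The help text's advice about binary keys comes down to Ruby string quoting: escapes such as \x03 or \003 are only interpreted inside double quotes. A quick plain-Ruby check of the byte values, using the same literals as the help output (the variable names are just for this sketch):

```ruby
# Double-quoted strings interpret hexadecimal and octal escapes.
hex_key   = "key\x03\x3f\xcd"
octal_key = "key\003\023\011"

# Single-quoted strings keep the backslash sequence as literal characters.
literal   = 'key\x03'

p hex_key.bytes    # the three letters of "key" followed by bytes 3, 63, 205
p octal_key.bytes  # the three letters of "key" followed by bytes 3, 19, 9
p literal.bytes    # seven bytes: k, e, y, backslash, x, 0, 3
```

So a key typed in single quotes would be stored with a literal backslash, which is almost never what is intended for binary data.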
3.0.2 View the detailed usage of a single command
hbase(main):020:0> help 'create'
Creates a table. Pass a table name, and a set of column family
specifications (at least one), and, optionally, table configuration.
Column specification can be a simple string (name), or a dictionary
(dictionaries are described below in main help output), necessarily
including NAME attribute.
Examples:
Create a table with namespace=ns1 and table qualifier=t1
hbase> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}
Create a table with namespace=default and table qualifier=t1
hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
hbase> # The above in shorthand would be the following:
hbase> create 't1', 'f1', 'f2', 'f3'
hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 1000000, MOB_COMPACT_PARTITION_POLICY => 'weekly'}
Table configuration options can be put at the end.
Examples:
hbase> create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']
hbase> create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
hbase> create 't1', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }
hbase> # Optionally pre-split the table into NUMREGIONS, using
hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)
hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', REGION_REPLICATION => 2, CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}}
hbase> create 't1', {NAME => 'f1', DFS_REPLICATION => 1}
You can also keep around a reference to the created table:
hbase> t1 = create 't1', 'f1'
Which gives you a reference to the table named 't1', on which you can then
call methods.
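The help output says a column family specification can be either a plain name or a dictionary that includes NAME. One way to picture the equivalence is the small plain-Ruby sketch below. `normalize_family` is a hypothetical helper written for this post, not part of the shell, and it uses the string key 'NAME' so that it runs outside HBase (the sketch assumes the shell's NAME constant stands for that string).

```ruby
# Hypothetical helper: turn every column family argument into a dictionary.
def normalize_family(spec)
  case spec
  when String
    { 'NAME' => spec }              # shorthand: just the family name
  when Hash
    raise ArgumentError, 'dictionary must include NAME' unless spec.key?('NAME')
    spec                            # already in dictionary form
  else
    raise ArgumentError, "unsupported spec: #{spec.inspect}"
  end
end

shorthand = %w[f1 f2 f3].map { |f| normalize_family(f) }
explicit  = [{ 'NAME' => 'f1' }, { 'NAME' => 'f2' }, { 'NAME' => 'f3' }]
            .map { |f| normalize_family(f) }

puts shorthand == explicit  # both forms describe the same three families
```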
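3.1 Namespace overview: create, alter, drop
A namespace is a logical grouping of tables, analogous to a database in a relational database system. This abstraction lays the groundwork for multi-tenancy features. Put simply, it is HBase's counterpart of a database: it isolates users and supports the quota and security management listed here.
- Quota management (HBASE-8410): restrict the amount of resources (i.e. regions, tables) a namespace can consume.
- Namespace security administration (HBASE-9206): provide another level of security administration for tenants.
- Region server groups (HBASE-6721): a namespace or table can be pinned to a subset of RegionServers, guaranteeing a coarse level of isolation.
Note: an HBase cluster comes with two special predefined namespaces.
- hbase: the system namespace, which holds HBase's internal tables
- default: tables created without an explicit namespace fall into this namespace automatically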
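To make the default rule concrete, here is a small plain-Ruby sketch of how a table name such as 'myns_test:t1' breaks into namespace and table qualifier. `split_table_name` is a hypothetical helper for illustration only; it is not how HBase parses names internally.

```ruby
# Hypothetical helper: split 'namespace:table' into its two parts,
# falling back to the 'default' namespace when no namespace is given.
def split_table_name(name)
  namespace, qualifier = name.split(':', 2)
  qualifier.nil? ? ['default', namespace] : [namespace, qualifier]
end

p split_table_name('myns_test:t1')  # namespace myns_test, table t1
p split_table_name('t1')            # no prefix, so the default namespace
p split_table_name('hbase:meta')    # a table in the system namespace
```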
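Key takeaway: in real production work, HBase is rarely operated through the HBase shell; the shell is mostly for learning, testing, and troubleshooting. HBase use boils down to writing data and then querying it. Writes go through the API, bulk load, and similar paths; reads go through API calls.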
3.1.1 Namespace operations
1. Create a namespace
hbase(main):008:0> create_namespace 'myns_test'
2. Create a table in a specific namespace
hbase(main):009:0> create 'myns_test:t1','cl1'
Created table myns_test:t1
=> Hbase::Table - myns_test:t1
3. Drop a namespace. All of its tables must be dropped first, otherwise the command fails, just as with a database.
hbase(main):010:0> drop_namespace 'myns_test'
ERROR: org.apache.hadoop.hbase.constraint.ConstraintException: Only empty namespaces can be removed. Namespace myns_test has 1 tables
at org.apache.hadoop.hbase.master.procedure.DeleteNamespaceProcedure.prepareDelete(DeleteNamespaceProcedure.java:217)
at org.apache.hadoop.hbase.master.procedure.DeleteNamespaceProcedure.executeFromState(DeleteNamespaceProcedure.java:78)
at org.apache.hadoop.hbase.master.procedure.DeleteNamespaceProcedure.executeFromState(DeleteNamespaceProcedure.java:45)
at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:189)
at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:957)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1835)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1595)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:80)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2140)
For usage try 'help "drop_namespace"'
Took 0.7591 seconds
5. Modify a namespace's properties
hbase(main):011:0> alter_namespace 'myns_test',{METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}
Took 0.5489 seconds
7. List all namespaces, including the predefined default and hbase namespaces. A table created without an explicit namespace goes into default.
hbase(main):014:0> create_namespace 'myns_test1'
Took 0.2504 seconds
hbase(main):015:0> list_namespace
NAMESPACE
default
hbase
myns_test
myns_test1
4 row(s)
hbase(main):016:0> drop_namespace 'myns_test1' # drop an empty namespace
Took 0.2248 seconds
hbase(main):017:0> list_namespace
NAMESPACE
default
hbase
myns_test
3 row(s)
Took 0.0113 seconds
8. View a namespace's details
hbase(main):018:0> describe_namespace 'myns_test'
DESCRIPTION
{NAME => 'myns_test', PROPERTY_NAME => 'PROPERTY_VALUE'}
Took 0.0058 seconds
=> 1
9. List all tables in a namespace
hbase(main):023:0> list_namespace_tables 'myns_test'
TABLE
t1
1 row(s)
Took 0.0237 seconds
=> ["t1"]
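Key takeaway: a working familiarity with namespaces is enough. They are seldom touched in day-to-day production work and are usually created in advance by the operations team; developers mostly work at the table level.
3.2 HBase table CRUD
Creating a table requires two values: a table name and at least one column family name. All other options are optional. They are constraints on the table (in practice, on its column families) and are added as production needs dictate, for example compression, TTL, and versions. Attributes can be set individually, and any attribute that is not specified takes its default value.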
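The "unspecified attributes take their defaults" rule behaves like a Ruby hash merge. The plain-Ruby sketch below works under that reading: the default values are a subset copied from the describe output shown in this post, and `effective_family` is a hypothetical helper, not an HBase API.

```ruby
# A subset of the column family defaults that describe prints in this post.
DEFAULT_FAMILY = {
  'VERSIONS' => '1', 'TTL' => 'FOREVER', 'COMPRESSION' => 'NONE',
  'BLOOMFILTER' => 'ROW', 'BLOCKCACHE' => 'true', 'BLOCKSIZE' => '65536'
}.freeze

# Hypothetical helper: overrides win, everything else keeps its default.
def effective_family(name, overrides = {})
  DEFAULT_FAMILY.merge(overrides).merge('NAME' => name)
end

plain = effective_family('cf')
tuned = effective_family('f1', { 'VERSIONS' => '5', 'TTL' => '2592000' })

puts plain['VERSIONS']     # default kept
puts tuned['VERSIONS']     # overridden
puts tuned['COMPRESSION']  # untouched attribute still has its default
```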
3.2.1 Listing tables and viewing table structure
1. List all tables in a namespace
hbase(main):023:0> list_namespace_tables 'myns_test'
TABLE
t1
1 row(s)
Took 0.0237 seconds
=> ["t1"]
2. List all tables across all namespaces.
hbase(main):024:0> list
TABLE
myns_test:t1
test
2 row(s)
Took 0.0041 seconds
=> ["myns_test:t1", "test"]
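As the command table notes, list also accepts a regular expression such as 'abc.*' or 'ns:.*'. The plain-Ruby sketch below mimics that filtering on the two table names returned above. It assumes the pattern has to match the whole table name, which is an assumption of this sketch rather than something confirmed here, and `list_tables` is a made-up helper.

```ruby
tables = ['myns_test:t1', 'test']

# Hypothetical stand-in for list: keep names that match the pattern in full.
# Assumption: the pattern is anchored to the entire table name.
def list_tables(tables, pattern = '.*')
  regex = Regexp.new("\\A(?:#{pattern})\\z")
  tables.select { |name| regex.match?(name) }
end

p list_tables(tables)                  # no pattern: every table
p list_tables(tables, 'myns_test:.*')  # only tables in namespace myns_test
p list_tables(tables, 't.*')           # only names that start with t
```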
3. View the table structure. describe shows the table's structure, default values, parameters, and so on, similar to show create table.
hbase(main):027:0> describe 't1'
Table t1 is ENABLED
t1
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS
=> 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', RE
PLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_
WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '6553
6'}
{NAME => 'f2', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS
=> 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', RE
PLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_
WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '6553
6'}
{NAME => 'f3', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS
=> 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', RE
PLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_
WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '6553
6'}
3 row(s)
Took 0.1012 seconds
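3.2.2 Creating tables in various forms (commonly used)
1. Basic create statement: it needs a table name plus a column family name. Below, table tb1 with column family cf is created in the default namespace, followed by table tb2 with three column families. Compare the two table structures.
1. Create table tb1 with one column family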
hbase(main):028:0> create 'tb1','cf'
Created table tb1
Took 0.7221 seconds
=> Hbase::Table - tb1
hbase(main):029:0> describe 'tb1'
Table tb1 is ENABLED
tb1
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS
=> 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', RE
PLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_
WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '6553
6'}
2. Create table tb2 with three column families
hbase(main):036:0> create 'tb2','cf1','cf2','cf3'
Created table tb2
hbase(main):038:0> describe 'tb2'
Table tb2 is ENABLED
tb2
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf1', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELL
S => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', R
EPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON
_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '655
36'}
{NAME => 'cf2', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELL
S => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', R
EPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON
_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '655
36'}
{NAME => 'cf3', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELL
S => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', R
EPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON
_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '655
36'}
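As shown above, after a simple create, describe reveals that table tb1 has the following default attributes.
- NAME => 'cf': the column family name.
- VERSIONS => '1': the number of versions. By default one version of the data is kept and extra versions are removed. Frequently tuned in production; it can be set higher.
- EVICT_BLOCKS_ON_CLOSE => 'false': whether to evict the cached blocks from the block cache on close.
- NEW_VERSION_BEHAVIOR => 'false': optional new version behavior, an HBase 2 feature related to deletes. Covered later.
- KEEP_DELETED_CELLS => 'FALSE': whether the column family keeps deleted cells. When true, deleted cells can still be retrieved. By default deleted cells are not kept (false). See the Apache HBase ™ Reference Guide.
- CACHE_DATA_ON_WRITE => 'false': whether to cache data blocks on write.
- DATA_BLOCK_ENCODING => 'NONE': the encoding used for data blocks. HBase offers several encodings, including PREFIX, DIFF, and FAST_DIFF; PREFIX_TREE, available in HBase 1.x, was removed in HBase 2.0.
- TTL => 'FOREVER': time to live. A column family can set a TTL length in seconds, and HBase automatically deletes rows once the expiration time is reached. This applies to all versions of a row, even the current one. The TTL time encoded in HBase for the row is specified in UTC. Very commonly used: production tables usually set a TTL, much like a data lifecycle in a data warehouse (one month, for example), otherwise the data keeps growing. See the Apache HBase ™ Reference Guide.
- MIN_VERSIONS => '0': the minimum number of versions to keep. It only takes effect when a TTL is set on the table.
- REPLICATION_SCOPE => '0': a column-family-level attribute whose value can be 0 or 1. 0 disables replication and 1 enables it; the default is 0. For HBase replication, see: https://clouderatemp.wpengine.com/blog/2012/07/hbase-replication-overview-2/ and Apache HBase Replication: Operational Overview - Cloudera Blog. Covered in detail later.
- BLOOMFILTER => 'ROW': the Bloom filter level; row level by default.
- CACHE_INDEX_ON_WRITE => 'false': whether to cache index blocks on write.
- IN_MEMORY => 'false': whether the column family gets higher priority in the block cache (more aggressive caching). The default is false. When set to true, HBase tries to keep the family's blocks in the cache, but this is not a guarantee that the whole family or table stays in memory, and the data is still persisted to disk as usual.
- CACHE_BLOOMS_ON_WRITE => 'false': whether to cache Bloom filter blocks on write.
- PREFETCH_BLOCKS_ON_OPEN => 'false': whether to prefetch blocks when a file is opened; false by default.
- COMPRESSION => 'NONE': whether the data is compressed and with which algorithm, such as snappy. It is configured per column family, so different families of one table can use different compression algorithms.
- BLOCKCACHE => 'true': whether the block cache is enabled; on by default. Covered later.
- BLOCKSIZE => '65536': the HFile data block size (64 KB by default). Usually left unchanged; when it is changed, that is normally done uniformly at the cluster level rather than for a single table or column family.
Key takeaway: as the three-family table tb2 shows, all of the attributes above belong to a column family, and every one of these settings works at the column family level. We can set them per family or rely on the defaults.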
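TTL is given in seconds and BLOCKSIZE in bytes, so a little arithmetic explains the numbers that appear in this post (TTL => 2592000 in the create examples, BLOCKSIZE => '65536' in the describe output). The plain-Ruby sketch below shows it; the `expired?` check is a simplified illustration of the TTL idea, assuming cell timestamps in milliseconds, not HBase's actual expiry code.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Convert a retention period in days to a TTL value in seconds.
def ttl_for_days(days)
  days * SECONDS_PER_DAY
end

# Simplified model: a cell is expired once its age exceeds the TTL.
# Assumption: timestamps are in milliseconds, TTL is in seconds.
def expired?(cell_ts_ms, now_ms, ttl_seconds)
  now_ms - cell_ts_ms > ttl_seconds * 1000
end

puts ttl_for_days(30)  # the one-month TTL used in the create examples
puts 64 * 1024         # 64 KB expressed in bytes, the default BLOCKSIZE

written = 1_631_876_854_951                      # timestamp from the get output
puts expired?(written, written + 1_000, 2_592_000)                     # one second old
puts expired?(written, written + 31 * SECONDS_PER_DAY * 1000, 2_592_000)  # 31 days old
```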
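2. Create a table with explicit column family attributes.
1. Create a table using the NAME attribute to give the column family names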
hbase(main):040:0> create 't4', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
hbase(main):040:0> create 't4', 'f1','f2','f3'
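Note that both forms produce the same table structure. Run one or the other: the second create fails if t4 already exists.
2. Other ways to create a table with explicit column family attributes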
hbase> create 't1', 'f1', 'f2', 'f3'
hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 1000000, MOB_COMPACT_PARTITION_POLICY => 'weekly'}
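There are many more column family attributes; the ones above are the defaults. Many of them can be specified when the table is created, and table-level options such as pre-splitting can be given as well. See the Apache HBase ™ Reference Guide on the official HBase site.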
hbase> create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']
hbase(main):047:0> describe 'tb12'
Table tb12 is ENABLED
tb12
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS
=> 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', RE
PLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_
WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '6553
6'}
1 row(s)
Took 0.0152 seconds
hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
hbase> create 't1', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }
hbase> # Optionally pre-split the table into NUMREGIONS, using
hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)
hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', REGION_REPLICATION => 2, CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}}
hbase> create 't1', {NAME => 'f1', DFS_REPLICATION => 1}
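The SPLITS examples are easier to read once the split points are turned into region ranges. The plain-Ruby sketch below does that for SPLITS => ['10', '20', '30', '40']. The helpers `region_ranges` and `region_for` are invented for illustration, and the sketch assumes row keys are compared byte by byte, with each region covering its start key up to but not including its end key.

```ruby
# Hypothetical helper: n split points give n + 1 regions.
# An empty string stands for the open start or end of the key space.
def region_ranges(split_points)
  boundaries = [''] + split_points + ['']
  boundaries.each_cons(2).to_a
end

# Hypothetical helper: index of the region whose range contains row_key.
# Each split point at or below the key moves it one region to the right.
def region_for(row_key, split_points)
  split_points.count { |split| split <= row_key }
end

splits = %w[10 20 30 40]
ranges = region_ranges(splits)

puts ranges.size                       # number of regions
p ranges[region_for('25', splits)]     # region that holds row key '25'
p ranges[region_for('05', splits)]     # first region, open start
p ranges[region_for('9', splits)]      # '9' sorts after '40' as a string
```

The last line is the usual pitfall with string keys: '9' lands in the final region because comparison is lexicographic, not numeric, which is one reason numeric row keys are often zero-padded.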