
HBase Table Operation Commands: A Deep Dive with Worked Examples_涤生手记

November 19, 2021, 10:43  Category: 《随便一记》



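1. HBase Shell Overview

      The Apache HBase Shell is (J)Ruby's IRB with some HBase-specific commands added. Anything you can do in IRB, you should also be able to do in the HBase Shell.

0. First, create an operating user for the HBase cluster: hbase_test

1. As root, add the hbase_test user on the local client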
[root@10-90-50-77-jhdxyjd ~]# useradd hbase_test

2. Switch to the cluster superuser (hdfs here), create the /user/hbase_test directory, and change its ownership
[hdfs@10-90-50-77-jhdxyjd ~]# hdfs dfs -mkdir /user/hbase_test
[hdfs@10-90-50-77-jhdxyjd ~]# hdfs dfs -chown hbase_test:hbase_test /user/hbase_test

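3. Note that authorization is not enabled on this HBase cluster; it is covered in detail later.

1. Starting the HBase shell is simple: on a node where HBase is installed, run ./hbase shell from the HBase bin directory.

     HBase Shell commands on the official site: Apache HBase ™ Reference Guide

2. All HBase shell commands, summarized by category

Inside the HBase shell, help lists every command, grouped by category. The key commands deserve the most attention; if you are unsure how to use one, run help on it.

Each entry below gives the category, the command name, a description, and the syntax.

1. General commands

(mainly used to check basic HBase operations and basic cluster information)

status: returns the status of the HBase cluster
hbase(main):053:0> status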
1 active master, 1 backup masters, 7 servers, 0 dead, 2.1429 average load
Took 0.0082 seconds             
processlist: shows the task list on the RegionServers; several levels of detail are available

hbase(main):054:0> processlist
0 tasks as of: 2021-09-17 17:33:33
No general tasks currently running.
Took 0.5326 seconds  

 hbase> processlist
  hbase> processlist 'all'
  hbase> processlist 'general'
  hbase> processlist 'handler'
  hbase> processlist 'rpc'
  hbase> processlist 'operation'
  hbase> processlist 'all','host187.example.com'
  hbase> processlist 'all','host187.example.com,16020'
  hbase> processlist 'all','host187.example.com,16020,1289493121758'

table_help: shows how to work with tables; it prints the syntax of the table CRUD commands together with examples

For example:
   hbase> t = create 't', 'cf'

Or, if you have already created the table, you can get a reference to it:

   hbase> t = get_table 't'

You can do things like call 'put' on the table:
 

version: returns the HBase version
hbase(main):068:0> version
2.1.0-cdh6.1.0, rUnknown, Thu Dec  6 16:59:59 PST 2018
Took 0.0003 seconds 
whoami: shows the current HBase user
hbase(main):069:0> whoami
hbase_test (auth:SIMPLE)
    groups: hbase_test
Took 0.0137 seconds
2. Namespace commands

create_namespace: creates a namespace, which is similar to a database
create_namespace 'myns_test'

describe_namespace: shows information about a namespace
hbase(main):070:0> describe_namespace 'myns_test'
DESCRIPTION                                                        
{NAME => 'myns_test', PROPERTY_NAME => 'PROPERTY_VALUE'}                                              
=> 1

drop_namespace: deletes a namespace; all tables in it must be dropped first, otherwise an exception is raised
drop_namespace 'myns_test'

alter_namespace: modifies the properties of a namespace
alter_namespace 'myns_test',{METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}

list_namespace: lists the namespaces in HBase
hbase(main):071:0> list_namespace
NAMESPACE                                                          
default                                                                     
hbase                                                                      
myns_test                                                                

list_namespace_tables: lists all tables in a namespace
hbase(main):073:0> list_namespace_tables 'myns_test'
TABLE                                                                     
t1                                                                             
tb2                                                                           
2 row(s)
Took 0.0043 seconds                                              
=> ["t1", "tb2"]

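3. Table DDL statements

Table operations are the part to master.

alter: modifies the table schema, the column families, and any table attribute; similar to ALTER TABLE
alter 't1', METHOD => 'table_conf_unset', NAME => 'hbase.hregion.majorcompaction'

create: creates a table
drop: deletes a table (the table must be disabled first)
drop_all: deletes all tables whose names match a regular expression
See the detailed walkthrough below.

create 't1', 'f1', 'f2', 'f3'

drop  't1'

hbase> drop_all 't.*'
hbase> drop_all 'ns:t.*'
hbase> drop_all 'ns:.*'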
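Before running a destructive command such as drop_all, it can help to check which names a pattern would select. The plain Ruby sketch below does that on a made-up table list. The helper matching_tables is hypothetical, and it assumes the pattern has to match the whole table name, namespace prefix included. HBase evaluates these patterns with Java regular expressions, so unusual patterns may behave differently from Ruby's.

```ruby
# Hypothetical helper: returns the names that a *_all style pattern would select,
# assuming the pattern must match the entire table name.
def matching_tables(table_names, pattern)
  anchored = Regexp.new("\\A(?:#{pattern})\\z")
  table_names.select { |name| anchored.match?(name) }
end

# Made-up table list, for illustration only.
tables = ['t1', 'tb2', 'test', 'ns:t1', 'ns:abc']

p matching_tables(tables, 't.*')     # tables in the default namespace starting with t
p matching_tables(tables, 'ns:t.*')  # tables in namespace ns starting with t
p matching_tables(tables, 'ns:.*')   # every table in namespace ns
```

The same reasoning applies to list, disable_all, and enable_all, which accept the same kind of pattern.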

list: lists all tables in HBase; a regular expression can be given to filter the result
  hbase> list
  hbase> list 'abc.*'
  hbase> list 'ns:abc.*'
  hbase> list 'ns:.*'
exists: checks whether a table exists; returns a boolean
hbase(main):076:0>  exists 't1'
Table t1 does exist 
Took 0.0279 seconds                                              
=> true
hbase(main):077:0>  exists 'ns1:t3'
Table ns1:t3 does not exist                                     
=> false
describe / desc: both show the detailed table schema
desc 'myns:t1'

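disable, disable_all, enable, enable_all: switch tables between the enabled and disabled states. The *_all variants apply the change in bulk to every table whose name matches a regular expression.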

hbase> disable 't1'
hbase> disable 'ns1:t1'

hbase> disable_all 't.*'
hbase> disable_all 'ns:t.*'
hbase> disable_all 'ns:.*'

is_disabled, is_enabled: check whether a table is disabled or enabled; return a boolean
  hbase> is_disabled 't1'
  hbase> is_disabled 'ns1:t1'

get_table: returns the table as an object, which you can then operate on


  hbase> t1 = get_table 't1'
  hbase> t1 = get_table 'ns1:t1'

  hbase> t1.help

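list_regions: lists all regions of a given table as an array. (The regions of a table can usually also be viewed in the web UI.)

The command shows the server name, region name, start key, end key, region size (in MB), request count, and locality.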

 hbase> list_regions 'table_name'
 hbase> list_regions 'table_name', 'server_name'
    hbase> list_regions 'table_name', {SERVER_NAME => 'server_name', LOCALITY_THRESHOLD => 0.8}
     hbase> list_regions 'table_name', {SERVER_NAME => 'server_name', LOCALITY_THRESHOLD => 0.8}, ['SERVER_NAME']
   

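locate_region: finds the region that holds a given row key of a table
hbase> locate_region 'tableName', 'key0'

show_filters: lists all filters available in the HBase cluster. Filters are used in get and scan commands as conditions for selecting data, playing a role similar to WHERE in a relational database; they are covered in detail later.

hbase(main):096:0> show_filters
DependentColumnFilter                       
KeyOnlyFilter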

............

alter_status: reports the status of an alter command
hbase> alter_status 't1'
hbase> alter_status 'ns1:t1'
hbase(main):098:0>  alter_status 't1'
1/1 regions updated.
Done.
Took 1.0209 seconds  

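alter_async, clone_table_schema

clone_table_schema: clones a table schema, similar to CREATE TABLE ... LIKE
hbase> clone_table_schema 'table_name', 'new_table_name'
hbase> clone_table_schema 'table_name', 'new_table_name', false
Note: passing false means the split points of the source table are not preserved.

4. Table DML statements

count: counts the rows in a table, similar to count in Hive or a relational database. Options such as INTERVAL, CACHE, FILTER, COLUMNS, STARTROW, and STOPROW control how the count is performed.
 hbase> count 'ns1:t1'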
 hbase> count 't1'
 hbase> count 't1', INTERVAL => 100000
 hbase> count 't1', CACHE => 1000
 hbase> count 't1', INTERVAL => 10, CACHE => 1000
 hbase> count 't1', FILTER => "
    (QualifierFilter (>=, 'binary:xyz')) AND (TimestampsFilter ( 123, 456))"
 hbase> count 't1', COLUMNS => ['c1', 'c2'], STARTROW => 'abc', STOPROW => 'xyz'

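put, get, append

put inserts a cell value at the given table, row, and column.

get retrieves the values of the given row.

append appends to the existing cell value at the given table, row, and column.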

hbase(main):119:0> put 'tt1', 'rowkey1','f1:name','tom'
hbase(main):120:0> get 'tt1','rowkey1'
COLUMN                         CELL                               
 f1:name                       timestamp=1631876854951, value=tom                 
1 row(s)                                                                   
hbase(main):121:0> append  'tt1', 'rowkey1','f1:name','tom2'
CURRENT VALUE = tomtom2
hbase(main):122:0> get 'tt1','rowkey1'
COLUMN                         CELL                              
 f1:name                       timestamp=1631876898153, value=tomtom2       
  

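delete, deleteall

delete removes a single cell value at the given table, row, and column.

deleteall removes all cells of the given row, optionally restricted to a column and a timestamp.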


  hbase> delete 'ns1:t1', 'r1', 'c1', ts1
  hbase> delete 't1', 'r1', 'c1', ts1
  hbase> delete 't1', 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}


  hbase> deleteall 'ns1:t1', 'r1'
  hbase> deleteall 't1', 'r1'
  hbase> deleteall 't1', 'r1', 'c1'
  hbase> deleteall 't1', 'r1', 'c1', ts1
  hbase> deleteall 't1', 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}


 

scan (important; covered in detail later)

Scans a table, with the scan rules set through various options.
  hbase> scan 'hbase:meta'
  hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}
  hbase> scan 'ns1:t1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804000, 1303668904000]}
  hbase> scan 't1', {REVERSED => true}
  hbase> scan 't1', {ALL_METRICS => true}
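The effect of STARTROW, STOPROW, and LIMIT is easier to see on a small in-memory list. The plain Ruby sketch below is a simplified model, not how HBase implements scans: scan_rows is a hypothetical helper, the row keys are made up, and it assumes the usual convention that the start row is inclusive and the stop row is exclusive.

```ruby
# Simplified model of a forward scan over row keys.
# Assumes STARTROW is inclusive and STOPROW is exclusive; keys compare as byte strings.
def scan_rows(row_keys, startrow: '', stoprow: nil, limit: nil)
  selected = row_keys.sort.select do |key|
    key >= startrow && (stoprow.nil? || key < stoprow)
  end
  limit ? selected.first(limit) : selected
end

# Made-up row keys, for illustration only.
keys = ['xyz1', 'abc', 'z', 'b1', 'xyz']

p scan_rows(keys, startrow: 'xyz', limit: 10)
p scan_rows(keys, startrow: 'abc', stoprow: 'xyz')
```

Rows always come back in sorted row key order, which is why choosing a good row key design matters for range scans.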

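truncate, truncate_preserve

Empty a table while keeping its schema; under the hood the table is disabled, dropped, and then recreated. truncate_preserve additionally keeps the existing region boundaries.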

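get_counter: reads the value of a counter
# Hit counts: daily, weekly, monthly
create 'counters', 'daily', 'weekly', 'monthly'
incr 'counters', '20110101', 'daily:hits', 1
incr 'counters', '20110101', 'daily:hits', 1
get_counter 'counters', '20110101', 'daily:hits'
 
incr: increments a counter. Note: incr works on a row key or column that does not exist yet (the counter is created). It fails if the cell already holds a value that is not an 8-byte long, for example a value written or overwritten with put.

# Syntax: incr 'table', 'row key', 'column family:qualifier', step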

hbase(main):171:0> incr 'ns1:t1','r3','cf1:name2',10
COUNTER VALUE = 20


hbase(main):172:0> incr 'ns1:t1','r3','cf1:name2',10
COUNTER VALUE = 30


hbase(main):173:0> get 'ns1:t1','r3'
COLUMN                         CELL                              
 cf1:name2                     timestamp=1631878212258, value=\x00\x00\x00\x00\x00\x00\x00\x1E             
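The value printed by get above is the counter's raw storage form: an 8-byte big-endian long. The plain Ruby sketch below decodes those bytes. The helpers decode_counter and encode_counter are hypothetical names introduced for illustration; inside the HBase shell one would more likely rely on get_counter.

```ruby
# Raw bytes exactly as printed by `get` for cf1:name2 in the transcript above.
raw = "\x00\x00\x00\x00\x00\x00\x00\x1E".b

# Counters are stored as 8-byte big-endian signed longs; 'q>' is that layout.
def decode_counter(bytes)
  raise ArgumentError, "counter cells are 8 bytes, got #{bytes.bytesize}" unless bytes.bytesize == 8
  bytes.unpack1('q>')
end

# Inverse operation: the bytes a counter with value n would be stored as.
def encode_counter(n)
  [n].pack('q>')
end

puts decode_counter(raw)  # prints 30, matching COUNTER VALUE = 30
```

This also suggests why incr rejects a cell written with put as text: a value such as '30' is two bytes, not eight.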

get_splits: returns the split points of a table
hbase(main):176:0> create 'ns1:t3', 'f1', SPLITS => ['10', '20', '30', '40']
Created table ns1:t3                                                                                 
=> Hbase::Table - ns1:t3
hbase(main):177:0> get_splits 'ns1:t3'
Total number of splits = 5
10
20
30
40
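Four split points divide the key space into five regions, which is consistent with the total of 5 reported above. The plain Ruby sketch below makes the ranges explicit; region_ranges and range_for are hypothetical helpers, and an empty string is used here to stand for an open start or end bound.

```ruby
# Turn a list of split points into [start_key, end_key) ranges.
# An empty string stands for "no bound" (start of the first region, end of the last).
def region_ranges(split_points)
  bounds = [''] + split_points + ['']
  bounds.each_cons(2).to_a
end

# Find the range that would hold a row key, comparing keys as byte strings.
def range_for(ranges, row_key)
  ranges.find do |start_key, end_key|
    row_key >= start_key && (end_key.empty? || row_key < end_key)
  end
end

ranges = region_ranges(['10', '20', '30', '40'])
puts ranges.size           # five ranges from four split points
p range_for(ranges, '25')  # falls between '20' and '30'
p range_for(ranges, '5')   # byte-wise '5' sorts after '40', so it lands in the last range
```

The last call shows a common surprise: keys are compared byte by byte rather than numerically, so unpadded numeric keys do not spread across regions the way one might expect.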
HBase cluster tool commands (tools): important and commonly used in operations; covered in detail later

assign,balance_switch,balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, cleaner_chore_enabled, cleaner_chore_run, cleaner_chore_switch, clear_block_cache, clear_compaction_queues, clear_deadservers,

close_region, compact,compact_rs,compaction_state, flush, is_in_maintenance_mode, list_deadservers,major_compact,merge_region,

move, normalize, normalizer_enabled, normalizer_switch, split, splitormerge_enabled, splitormerge_switch,stop_master,stop_regionserver,trace, unassign,wal_roll, zk_dump

Security and permission commands (security): used for permission management, mostly by operations

grant,list_security_capabilities,revoke,user_permission

Procedure commands (procedures)

list_locks,

list_procedures

visibility labels

add_labels,clear_auths,get_auths,list_labels, set_auths,set_visibility

rsgroup commands

add_rsgroup,balance_rsgroup,get_rsgroup,get_server_rsgroup, get_table_rsgroup,

list_rsgroups, move_namespaces_rsgroup, move_servers_namespaces_rsgroup, move_servers_rsgroup, move_servers_tables_rsgroup, move_tables_rsgroup, remove_rsgroup, remove_servers_rsgroup

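Space quota commands (quotas): in production, quotas are usually managed per tenant so that no single user consumes a large share of resources; these are operations commands

list_quota_snapshots, list_quota_table_sizes, list_quotas, list_snapshot_sizes, set_quota

Configuration update commands (configuration): operations commands

 update_all_config,update_config

snapshots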

clone_snapshot, delete_all_snapshot, delete_snapshot, delete_table_snapshots, list_snapshots,

list_table_snapshots, restore_snapshot, snapshot

replication

add_peer, append_peer_namespaces, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, get_peer_config,

list_peer_configs, list_peers, list_replicated_tables,

remove_peer, remove_peer_namespaces, remove_peer_tableCFs, set_peer_bandwidth, set_peer_exclude_namespaces, set_peer_exclude_tableCFs, set_peer_namespaces, set_peer_replicate_all, set_peer_serial, set_peer_tableCFs, show_peer_tableCFs,update_peer_config

3. HBase Shell Commands in Practice, with Examples

3.0 help, the HBase shell's most useful command

3.0.1 Listing all HBase shell commands, grouped by category

hbase(main):019:0> help
HBase Shell, version 2.1.0-cdh6.1.0, rUnknown, Thu Dec  6 16:59:59 PST 2018
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
(These are all the HBase commands, grouped by category.)
COMMAND GROUPS:  
  Group name: general  // general-purpose commands
  Commands: processlist, status, table_help, version, whoami

  Group name: ddl
  Commands: alter, alter_async, alter_status, clone_table_schema, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, list_regions, locate_region, show_filters

  Group name: namespace
  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

  Group name: dml
  Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve

  Group name: tools
  Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, cleaner_chore_enabled, cleaner_chore_run, cleaner_chore_switch, clear_block_cache, clear_compaction_queues, clear_deadservers, close_region, compact, compact_rs, compaction_state, flush, is_in_maintenance_mode, list_deadservers, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, splitormerge_enabled, splitormerge_switch, stop_master, stop_regionserver, trace, unassign, wal_roll, zk_dump

  Group name: replication
  Commands: add_peer, append_peer_namespaces, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, get_peer_config, list_peer_configs, list_peers, list_replicated_tables, remove_peer, remove_peer_namespaces, remove_peer_tableCFs, set_peer_bandwidth, set_peer_exclude_namespaces, set_peer_exclude_tableCFs, set_peer_namespaces, set_peer_replicate_all, set_peer_serial, set_peer_tableCFs, show_peer_tableCFs, update_peer_config

  Group name: snapshots
  Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, delete_table_snapshots, list_snapshots, list_table_snapshots, restore_snapshot, snapshot

  Group name: configuration
  Commands: update_all_config, update_config

  Group name: quotas
  Commands: list_quota_snapshots, list_quota_table_sizes, list_quotas, list_snapshot_sizes, set_quota

  Group name: security
  Commands: grant, list_security_capabilities, revoke, user_permission

  Group name: procedures
  Commands: list_locks, list_procedures

  Group name: visibility labels
  Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility

  Group name: rsgroup
  Commands: add_rsgroup, balance_rsgroup, get_rsgroup, get_server_rsgroup, get_table_rsgroup, list_rsgroups, move_namespaces_rsgroup, move_servers_namespaces_rsgroup, move_servers_rsgroup, move_servers_tables_rsgroup, move_tables_rsgroup, remove_rsgroup, remove_servers_rsgroup

SHELL USAGE:
Quote all names in HBase Shell such as table and column names.  Commas delimit
command parameters.  Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:

  {'key1' => 'value1', 'key2' => 'value2', ...}

and are opened and closed with curley-braces.  Key/values are delimited by the
'=>' character combination.  Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc.  Constants do not need to be quoted.  Type
'Object.constants' to see a (messy) list of all constants in the environment.

If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:

  hbase> get 't1', "key\x03\x3f\xcd"
  hbase> get 't1', "key\003\023\011"
  hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"

The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/book.html

3.0.2 Viewing detailed usage of a single command

hbase(main):020:0> help 'create'
Creates a table. Pass a table name, and a set of column family
specifications (at least one), and, optionally, table configuration.
Column specification can be a simple string (name), or a dictionary
(dictionaries are described below in main help output), necessarily
including NAME attribute.
Examples:

Create a table with namespace=ns1 and table qualifier=t1
  hbase> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}

Create a table with namespace=default and table qualifier=t1
  hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
  hbase> # The above in shorthand would be the following:
  hbase> create 't1', 'f1', 'f2', 'f3'
  hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
  hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
  hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 1000000, MOB_COMPACT_PARTITION_POLICY => 'weekly'}

Table configuration options can be put at the end.
Examples:

  hbase> create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']
  hbase> create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
  hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
  hbase> create 't1', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }
  hbase> # Optionally pre-split the table into NUMREGIONS, using
  hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)
  hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
  hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', REGION_REPLICATION => 2, CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}}
  hbase> create 't1', {NAME => 'f1', DFS_REPLICATION => 1}

You can also keep around a reference to the created table:

  hbase> t1 = create 't1', 'f1'

Which gives you a reference to the table named 't1', on which you can then
call methods.

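3.1 Namespaces: overview, creation, modification, and deletion

A namespace is a logical grouping of tables, analogous to a database in a relational database system. This abstraction lays the groundwork for upcoming multi-tenancy features. Put simply, a namespace is HBase's equivalent of a database: it isolates users and is the unit for quota and security management, as follows.

  • Quota management ( HBASE-8410 ) - restrict the amount of resources (i.e. regions, tables) a namespace can consume.

  • Namespace security administration ( HBASE-9206 ) - provide another level of security administration for tenants.

  • Region server groups ( HBASE-6721 ) - a namespace/table can be pinned onto a subset of RegionServers, guaranteeing a coarse level of isolation.

Note: when an HBase cluster is created, two special namespaces are predefined by default:

  • hbase - the system namespace, used to contain HBase internal tables

  • default - tables with no explicitly specified namespace automatically fall into this namespace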
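The rule that an unqualified table name falls into default can be expressed in a few lines of plain Ruby. This is only a sketch of the naming convention: parse_table_name is a hypothetical helper, not part of the HBase shell.

```ruby
# Hypothetical helper: split a shell-style table name into [namespace, qualifier].
# A name without a 'namespace:' prefix is assumed to belong to the default namespace.
def parse_table_name(name)
  namespace, separator, qualifier = name.partition(':')
  separator.empty? ? ['default', namespace] : [namespace, qualifier]
end

p parse_table_name('t1')            # unqualified name
p parse_table_name('myns_test:t1')  # namespace-qualified name
p parse_table_name('hbase:meta')    # a table in the system namespace
```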

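Key takeaway: in production, HBase is rarely operated through the HBase shell; the shell is mostly used for learning, testing, and troubleshooting. HBase usage boils down to writing data and then querying it: data is written through the API, bulk load, and similar mechanisms, and queried through API calls.

 3.1.1 Namespace operations

1. Create a namespace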
hbase(main):008:0> create_namespace 'myns_test'    
2. Create a table in the specified namespace
hbase(main):009:0> create 'myns_test:t1','cl1'
Created table myns_test:t1                                                                                               
=> Hbase::Table - myns_test:t1
3. Drop a namespace. All of its tables must be dropped first, otherwise an error is raised, just as with a database.
hbase(main):010:0> drop_namespace 'myns_test'

ERROR: org.apache.hadoop.hbase.constraint.ConstraintException: Only empty namespaces can be removed. Namespace myns_test has 1 tables
        at org.apache.hadoop.hbase.master.procedure.DeleteNamespaceProcedure.prepareDelete(DeleteNamespaceProcedure.java:217)
        at org.apache.hadoop.hbase.master.procedure.DeleteNamespaceProcedure.executeFromState(DeleteNamespaceProcedure.java:78)
        at org.apache.hadoop.hbase.master.procedure.DeleteNamespaceProcedure.executeFromState(DeleteNamespaceProcedure.java:45)
        at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:189)
        at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:957)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1835)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1595)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:80)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2140)

For usage try 'help "drop_namespace"'

Took 0.7591 seconds   
5. Modify the properties of a namespace
hbase(main):011:0> alter_namespace 'myns_test',{METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}
Took 0.5489 seconds                                                                                                  

7. List all namespaces, including the predefined default and hbase namespaces. A table created without a namespace goes into default.
hbase(main):014:0> create_namespace 'myns_test1'
Took 0.2504 seconds                                                                                                  
hbase(main):015:0> list_namespace
NAMESPACE                                                                                                            
default                                                                                                              
hbase                                                                                                                
myns_test                                                                                                            
myns_test1                                                                                                           
4 row(s)
hbase(main):016:0> drop_namespace 'myns_test1'  # drop an empty namespace
Took 0.2248 seconds                                                                                                  
hbase(main):017:0> list_namespace
NAMESPACE                                                                                                            
default                                                                                                              
hbase                                                                                                                
myns_test                                                                                                            
3 row(s)
Took 0.0113 seconds 

8. Show the details of a namespace
hbase(main):018:0> describe_namespace 'myns_test'
DESCRIPTION                                                                                                          
{NAME => 'myns_test', PROPERTY_NAME => 'PROPERTY_VALUE'}                                                             
Took 0.0058 seconds                                                                                                  
=> 1

9. List all tables in a namespace
hbase(main):023:0> list_namespace_tables 'myns_test'
TABLE                                                                                                                
t1                                                                                                                   
1 row(s)
Took 0.0237 seconds                                                                                                  
=> ["t1"]

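Key takeaway: a basic understanding of namespaces is enough. They are rarely touched in day-to-day production work and are usually created in advance by the operations team; developers mostly work at the table level.

3.2 HBase table CRUD

   Creating a table requires two values: the table name and at least one column family name. All other table configuration is optional. These settings are constraints on the table (in practice, on its column families), such as compression, TTL, and number of versions, and are added according to production requirements. Attributes can be specified individually, and any attribute left unspecified takes its default value.

3.2.1 Listing tables and viewing table schemas

1. List all tables in a namespace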
hbase(main):023:0> list_namespace_tables 'myns_test'
TABLE                                                                                                                
t1                                                                                                                   
1 row(s)
Took 0.0237 seconds                                                                                                  
=> ["t1"]
2. List all tables across all namespaces.
hbase(main):024:0> list
TABLE                                                                                                                
myns_test:t1                                                                                                         
test                                                                                                                 
2 row(s)
Took 0.0041 seconds                                                                                                  
=> ["myns_test:t1", "test"]

3. View the table schema. describe shows the table's structure, defaults, and parameters, similar to SHOW CREATE TABLE.
hbase(main):027:0> describe 't1'
Table t1 is ENABLED                                                                                                  
t1                                                                                                                   
COLUMN FAMILIES DESCRIPTION                                                                                          
{NAME => 'f1', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS
 => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', RE
PLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_
WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '6553
6'}                                                                                                                  
{NAME => 'f2', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS
 => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', RE
PLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_
WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '6553
6'}                                                                                                                  
{NAME => 'f3', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS
 => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', RE
PLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_
WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '6553
6'}                                                                                                                  
3 row(s)
Took 0.1012 seconds                

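3.2.2 Common ways to create a table

1. Basic create statement: a table name plus column family names. Below, table tb1 with column family cf is created in the default namespace, followed by table tb2 with three column families. Compare the schemas of the two tables.

1. Create table tb1 with one column family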
hbase(main):028:0>  create 'tb1','cf'
Created table tb1
Took 0.7221 seconds                                                                                                  
=> Hbase::Table - tb1
hbase(main):029:0> describe 'tb1'
Table tb1 is ENABLED                                                                                                 
tb1                                                                                                                  
COLUMN FAMILIES DESCRIPTION                                                                                          
{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS
 => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', RE
PLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_
WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '6553
6'}                                                                                                                  

2. Create table tb2 with three column families
hbase(main):036:0> create 'tb2','cf1','cf2','cf3'
Created table tb2
hbase(main):038:0> describe 'tb2'
Table tb2 is ENABLED                                                                                                 
tb2                                                                                                                  
COLUMN FAMILIES DESCRIPTION                                                                                          
{NAME => 'cf1', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELL
S => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', R
EPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON
_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '655
36'}                                                                                                                 
{NAME => 'cf2', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELL
S => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', R
EPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON
_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '655
36'}                                                                                                                 
{NAME => 'cf3', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELL
S => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', R
EPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON
_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '655
36'}                                                                                                                 

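As shown above, describe reveals that a table created with the basic statement, such as tb1, has the following default attributes.

  1. NAME => 'cf': the column family name.
  2. VERSIONS => '1': the number of versions. By default one version of each cell is kept and older ones are removed. Commonly tuned in production; it can be set higher.
  3. EVICT_BLOCKS_ON_CLOSE => 'false': whether to evict cached blocks from the block cache when the store file is closed.
  4. NEW_VERSION_BEHAVIOR => 'false': optional new version behavior, an HBase 2 feature related to deletes. Covered later.
  5. KEEP_DELETED_CELLS => 'FALSE': whether the column family keeps deleted cells. When true, deleted cells can still be retrieved. By default deleted cells are not kept (false). See: Apache HBase ™ Reference Guide
  6. CACHE_DATA_ON_WRITE => 'false': whether to cache data blocks on write.
  7. DATA_BLOCK_ENCODING => 'NONE': the encoding of data blocks. The commonly cited encodings are PREFIX, DIFF, FAST_DIFF, and PREFIX_TREE (PREFIX_TREE was removed in HBase 2.0).
  8. TTL => 'FOREVER': time to live. A column family can set a TTL length in seconds, and HBase automatically deletes rows once the expiration time is reached. This applies to all versions of a row, even the current one. The TTL time encoded in HBase for the row is specified in UTC. Very commonly used: production tables usually set a TTL, the equivalent of a data lifecycle in a data warehouse (for example one month); otherwise the data keeps growing. See: Apache HBase ™ Reference Guide
  9. MIN_VERSIONS => '0': the minimum number of versions to keep even after they expire; it only takes effect when a TTL is set.
  10. REPLICATION_SCOPE => '0': a column family level attribute whose value is 0 or 1. 0 disables replication and 1 enables it; the default is 0. For more on HBase replication, see these two articles: https://clouderatemp.wpengine.com/blog/2012/07/hbase-replication-overview-2/     Apache HBase Replication: Operational Overview - Cloudera Blog
  11. BLOOMFILTER => 'ROW': the Bloom filter type; row level by default.
  12. CACHE_INDEX_ON_WRITE => 'false': whether to cache index blocks on write.
  13. IN_MEMORY => 'false': whether blocks of this column family get higher priority in the block cache. HBase can give one column family more aggressive caching; the default is false. Setting it to true makes HBase try to keep the family's blocks in the cache, but this is a priority hint, not a guarantee that the whole family or table stays in memory, and data is still persisted to disk as usual.
  14. CACHE_BLOOMS_ON_WRITE => 'false': whether to cache Bloom filter blocks on write.
  15. PREFETCH_BLOCKS_ON_OPEN => 'false': whether to prefetch blocks when the store file is opened; false by default.
  16. COMPRESSION => 'NONE': whether the data is compressed and with which algorithm (snappy, for example). It is configured per column family, so different families of one table can use different algorithms.
  17. BLOCKCACHE => 'true': whether the block cache is enabled; on by default. Covered later.
  18. BLOCKSIZE => '65536': the HFile data block size (64 KB by default). Usually left unchanged; when it is changed, it is normally a uniform cluster-level setting rather than a per-table or per-family one.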
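The numeric attributes are easier to reason about with a little arithmetic. The plain Ruby sketch below converts a retention period in days into the TTL value in seconds and models, in a deliberately simplified way, how VERSIONS and TTL together decide which cells remain visible. Both helpers are hypothetical, the cell data is made up, and the model ignores MIN_VERSIONS, KEEP_DELETED_CELLS, and compaction timing.

```ruby
# TTL is given in seconds, so a retention period in days has to be converted.
def ttl_seconds(days)
  days * 24 * 60 * 60
end

# Simplified model: cells is an array of [timestamp_ms, value] pairs for one column.
# Cells older than the TTL are dropped, then only the newest `versions` are kept.
def visible_cells(cells, versions:, now_ms:, ttl: nil)
  live = ttl ? cells.select { |timestamp, _| now_ms - timestamp < ttl * 1000 } : cells
  live.sort_by { |timestamp, _| -timestamp }.first(versions)
end

puts ttl_seconds(30)  # 2592000, the TTL value used in the create examples
puts 64 * 1024        # 65536, the default BLOCKSIZE in bytes

# Made-up cell history, for illustration only.
history = [[1000, 'a'], [2000, 'b'], [3000, 'c']]
p visible_cells(history, versions: 1, now_ms: 3500)
p visible_cells(history, versions: 2, now_ms: 3500, ttl: 1)
```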

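 Key takeaway: as the three-column-family table tb2 above shows, the attributes listed are all per column family. They can be set for each column family individually or left at their defaults.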

2. Creating a table with explicit column family attributes.

1. Create a table, naming the column families with the NAME attribute
hbase(main):040:0>  create 't4', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
hbase(main):040:0>  create 't4', 'f1','f2','f3'
Note that both forms produce the same table schema.

2. Other ways to create a table with explicit column family attributes
hbase> create 't1', 'f1', 'f2', 'f3'
hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 1000000, MOB_COMPACT_PARTITION_POLICY => 'weekly'}

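Column families have many more attributes than the defaults shown above, and many of them can be specified at creation time, along with table options such as pre-splitting. See the official Apache HBase ™ Reference Guide for details.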

hbase> create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']
hbase(main):047:0> describe  'tb12'
Table tb12 is ENABLED                                                                                                
tb12                                                                                                                 
COLUMN FAMILIES DESCRIPTION                                                                                          
{NAME => 'f1', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS
 => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', RE
PLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_
WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '6553
6'}                                                                                                                  
1 row(s)
Took 0.0152 seconds           
hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
hbase> create 't1', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }
hbase> # Optionally pre-split the table into NUMREGIONS, using
hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)
hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', REGION_REPLICATION => 2, CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}}
hbase> create 't1', {NAME => 'f1', DFS_REPLICATION => 1}
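To see what pre-splitting with NUMREGIONS and 'HexStringSplit' amounts to, the plain Ruby sketch below divides an 8-digit hexadecimal key space into equal parts. It illustrates the idea only and is not HBase's implementation: hex_split_points is a hypothetical helper, and the default range of 00000000 to FFFFFFFF and the lowercase zero-padded output are assumptions.

```ruby
# Sketch: evenly spaced split points over a hexadecimal key space.
# num_regions regions need num_regions - 1 split points.
def hex_split_points(num_regions, first: 0x00000000, last: 0xFFFFFFFF)
  raise ArgumentError, 'need at least 2 regions' if num_regions < 2
  step = (last - first + 1) / num_regions
  (1...num_regions).map { |i| format('%08x', first + step * i) }
end

p hex_split_points(4)
points = hex_split_points(15)
puts points.size
puts points.first
puts points.last
```

This scheme only spreads load evenly when row keys really are uniformly distributed hexadecimal strings, for example hashed keys.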




Article link: http://m.zhangshiyu.com/post/31028.html
