Monday 14 September 2015

Loading data into HBase - bulk and non-bulk loading

Copy the input file to HDFS (despite the .tsv name, it is comma-separated) and check that it arrived:

hadoop fs -put test.tsv /tmp/
hadoop fs -ls /tmp/
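
For reference, here is a minimal sketch of an input file that fits the column mapping used in the ImportTsv commands in this post (HBASE_ROW_KEY,cf1:c1,cf1:c2). The row keys and values are made up for illustration; the only assumption that matters is that each line has three comma-separated fields. Create it locally before running the put above.

```shell
# Hypothetical sample rows: row key, value for cf1:c1, value for cf1:c2
printf '%s\n' 'row1,a1,b1' 'row2,a2,b2' 'row3,a3,b3' > test.tsv

# Print any line that does not have exactly three comma-separated fields
# (no output means the file matches the three-column mapping)
awk -F, 'NF != 3' test.tsv
```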

1. BULK LOADING

a) Prepare the StoreFiles (HFiles) with ImportTsv, writing them to /tmp/hbaseoutput instead of into the table

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns="HBASE_ROW_KEY,cf1:c1,cf1:c2" -Dimporttsv.separator="," -Dimporttsv.bulk.output="/tmp/hbaseoutput" t1 /tmp/test.tsv

b) Upload the data from the HFiles located at /tmp/hbaseoutput to the HBase table t1

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hbaseoutput t1
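
To check that the rows actually landed in t1, one option is to pipe a couple of statements into the HBase shell. This is only a sketch: the helper name run_hbase_shell is made up here, and the guard simply prints the statement when hbase is not on the PATH.

```shell
# Hypothetical helper: run one HBase shell statement non-interactively,
# or just print it if the hbase command is not available.
run_hbase_shell() {
  if command -v hbase >/dev/null 2>&1; then
    echo "$1" | hbase shell
  else
    echo "hbase not on PATH, skipping: $1"
  fi
}

run_hbase_shell "count 't1'"              # number of rows in the table
run_hbase_shell "scan 't1', {LIMIT => 5}" # first few rows with their cells
```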


2. NON-BULK LOADING

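Load the data from HDFS into HBase via Puts. Without -Dimporttsv.bulk.output, ImportTsv writes directly to table t1. This command relies on ImportTsv's default tab separator; for a comma-separated file, add -Dimporttsv.separator="," as in the bulk command above.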

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns="HBASE_ROW_KEY,cf1:c1,cf1:c2" t1 /tmp/test.tsv
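
The Put-based load writes into an existing table. Assuming t1 has not been created yet, a sketch of creating it with the single column family cf1 (the family named in the column mapping) could look like this; run it before the ImportTsv command above.

```shell
# HBase shell statement: table t1 with one column family, cf1
create_stmt="create 't1', 'cf1'"

if command -v hbase >/dev/null 2>&1; then
  echo "$create_stmt" | hbase shell
else
  # Assumption: hbase may not be installed where this snippet is tried out
  echo "hbase not found on PATH; statement to run in the HBase shell: $create_stmt"
fi
```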
