Can an HBase table be partitioned based on time?
I need to retrieve data based on a time range. Is there a way to partition an HBase table based on a time range? For example, I want the data from 9:00 to 9:05.
You can create a compound key of type <timestamp><id>, and the entries in HBase will then be ordered by timestamp. You can create a scanner that starts at the beginning of the range and ends at the end of the range.
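To make the idea concrete, here is a minimal sketch in plain Python that mimics a sorted key space with a list. It does not use the HBase client; the key layout (8-byte big-endian timestamp followed by the id), the example date, and the helper names `make_row_key`, `to_millis`, and `scan` are all assumptions made for illustration.

```python
import bisect
import struct
from datetime import datetime, timezone

def make_row_key(ts_millis: int, event_id: str) -> bytes:
    """Hypothetical key layout: 8-byte big-endian timestamp, then the id bytes."""
    return struct.pack(">Q", ts_millis) + event_id.encode("utf-8")

def to_millis(hour: int, minute: int) -> int:
    # Arbitrary example date; only the time of day matters here.
    dt = datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

def scan(sorted_keys, start_row: bytes, stop_row: bytes):
    """Mimic a scan over [start_row, stop_row): start inclusive, stop exclusive."""
    lo = bisect.bisect_left(sorted_keys, start_row)
    hi = bisect.bisect_left(sorted_keys, stop_row)
    return sorted_keys[lo:hi]

# Stand-in for a table: rows are kept sorted by their key bytes.
table = sorted([
    make_row_key(to_millis(8, 59), "a"),
    make_row_key(to_millis(9, 0), "b"),
    make_row_key(to_millis(9, 3), "c"),
    make_row_key(to_millis(9, 5), "d"),
])

# Scan from 9:00 (inclusive) to 9:05 (exclusive).
start_row = struct.pack(">Q", to_millis(9, 0))
stop_row = struct.pack(">Q", to_millis(9, 5))
rows = scan(table, start_row, stop_row)
ids = [key[8:].decode("utf-8") for key in rows]
```

Because the timestamp is fixed width and big-endian, byte order matches time order, so the range is one contiguous slice of the key space.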
One issue you may face is that, if you have a high insert rate, a single server becomes a hotspot for all new entries. One way around this is to invert the key and ensure that the first part is random: <sha1 of id><timestamp>. This has the advantage of distributing writes across the entire cluster, but the disadvantage of requiring a read of the entire table to retrieve a particular range.
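A rough sketch of the hashed-prefix layout, again in plain Python rather than against a real cluster. The 20-byte SHA-1 prefix plus 8-byte timestamp layout, the sample ids, and the helper names are assumptions for illustration only.

```python
import hashlib
import struct

def salted_row_key(event_id: str, ts_millis: int) -> bytes:
    """Hypothetical layout: 20-byte SHA-1 of the id, then an 8-byte big-endian timestamp."""
    digest = hashlib.sha1(event_id.encode("utf-8")).digest()
    return digest + struct.pack(">Q", ts_millis)

def timestamp_of(row_key: bytes) -> int:
    # The timestamp sits right after the 20-byte hash prefix.
    return struct.unpack(">Q", row_key[20:28])[0]

def range_query(all_keys, start_millis: int, stop_millis: int):
    """Time-range read over hashed keys: every key has to be inspected."""
    return [k for k in all_keys if start_millis <= timestamp_of(k) < stop_millis]

# Made-up events with increasing timestamps.
events = [("sensor-%d" % i, 1000 + i) for i in range(10)]
table = sorted(salted_row_key(eid, ts) for eid, ts in events)

# Keys are ordered by hash, so consecutive timestamps are scattered across
# the key space and the query below cannot use a start/stop row.
hits = range_query(table, 1003, 1006)
hit_timestamps = sorted(timestamp_of(k) for k in hits)
```

The write path benefits because consecutive inserts land at unrelated positions in the key space; the read path pays for it with a full pass over the keys.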
If you use the first method of <timestamp><id>, a map job may not be able to split the work into as many chunks as you might like. By default, a table's work is split by region. If your time slice is small enough, a single region will serve all of the data and you will not gain any parallelism in your query. You could potentially write a custom table split that parallelizes the query across more mappers than regions, but you would still be reading all of the data from one region, and that can have drawbacks for parallelism as well.
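As a sketch of the arithmetic behind such a custom split, the following cuts one time range into contiguous sub-ranges that could each be handed to a separate mapper. It only computes the start/stop row pairs; wiring them into an actual MapReduce input format is not shown, and the function name `split_time_range` and the 8-byte big-endian timestamp encoding are assumptions.

```python
import struct

def split_time_range(start_millis: int, stop_millis: int, num_splits: int):
    """Divide [start, stop) into up to num_splits contiguous (start_row, stop_row) pairs."""
    if num_splits < 1 or stop_millis <= start_millis:
        raise ValueError("need num_splits >= 1 and a non-empty range")
    span = stop_millis - start_millis
    # Never create more splits than there are distinct timestamps.
    num_splits = min(num_splits, span)
    bounds = [start_millis + (span * i) // num_splits for i in range(num_splits + 1)]
    return [
        (struct.pack(">Q", lo), struct.pack(">Q", hi))
        for lo, hi in zip(bounds, bounds[1:])
    ]

# A five-minute window (300,000 ms) cut into four sub-ranges.
splits = split_time_range(0, 300_000, 4)
```

Each pair is a half-open range, so the stop row of one split is the start row of the next and no row is read twice. All of the sub-ranges may still live in the same region, so the same server ends up serving every mapper.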
How you set up the table depends on your projected usage scenario, the read/write proportion, and how much performance you need for each.
If you append an id to the timestamp to ensure uniqueness, you can still have a scanner return all events for a given timestamp. HBase sorts keys lexicographically based on their byte representation. So, if your key is <timestamp>:<id>, you can set the scanner's start row at <timestamp> and its stop row at <timestamp+1> to get all events at that timestamp.
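A small sketch of the <timestamp> to <timestamp+1> trick, using plain Python byte strings in place of a table. The zero-padded 13-digit text encoding and the helper names are assumptions; the point is that the trick relies on timestamps being encoded at a fixed width, otherwise byte order and numeric order disagree.

```python
def row_key(ts: int, event_id: str) -> bytes:
    """Hypothetical layout: zero-padded 13-digit timestamp, ':', then the id."""
    return f"{ts:013d}:{event_id}".encode("ascii")

def bounds_for(ts: int):
    """Start row (inclusive) and stop row (exclusive) covering one timestamp."""
    return f"{ts:013d}".encode("ascii"), f"{ts + 1:013d}".encode("ascii")

# Stand-in for a table sorted by key bytes.
table = sorted([
    row_key(999, "a"),
    row_key(1000, "b"),
    row_key(1000, "c"),
    row_key(1001, "d"),
])

start_row, stop_row = bounds_for(1000)
matches = [k for k in table if start_row <= k < stop_row]

# Without padding, lexicographic order puts "10" before "9".
unpadded_sorted = sorted([b"9", b"10"])
```

Every key that begins with the timestamp sorts at or after the start row and before the stop row, so the id suffix never pushes a row out of the range.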