THUIRDB :A Large-Scale, Highly-
Efficient Index, Fast-Access
Key-Value Store
10billion records,1bit index per
record,1million/sec throughput in 1 machine
梁斌
清华大学计算机科学与技术系
http://www.thuir.org/thuirdb/
mgigabyte@gmail.com
Structure of Report
• Introduction
• Requirement and motivation
• Background review
• Problems
• Problems for binary-search
• Problems for B+-tree
• Problems for hash
• Problems for building database
• Rational
• Separating key and value by location
• Sorting then linear write
• Data lay-out
• Building the index bottom-up
• Highly-efficiency compressing
Requirement and motivation
• A special requirement for key-value store:
(1)Large scale, billions of record
(2)Random query, hundreds of random query for a task
(3)Static dataset, once built never changed, just read no update
(4) Low cost, sometimes should be built in a machine
• Pratical application situation:
(1)log-based analysis systems
(2)language-model based machine translation
Backgroud Review
• [Tobin 1986] issue an new data structure:T-
tree . T-tree inherits from both B-tree and
AVL-tree and is used by main-memory
databases, such as Datablitz, eXtremeDB,
MySQL Cluster.
• [Philip 2001]pk-Ttree and pkB-tree add index
compression.
• [1998 Rao Jun]Rao created CSS-tree, and
developed to CSB+ tree and CSS+tree
Background Review
Google introduced LMS-tree and multi-level skip
list based key-value store called LevelDB in 2011.
http://blog.nosqlfan.com/html/3041.html
From Alon Efrat’s
ppt
关于艾滋病ppt课件精益管理ppt下载地图下载ppt可编辑假如ppt教学课件下载triz基础知识ppt
of skip list
Problems for binary-search
• Importance of data is out of control
– 9 is the key and index key, very important
– 8 and 10 is only key, not important, but have been put
into L1 cache with 9 unfortunately
• Poor Data reference locality
Problems for B+-tree
(1)Each node stores child pointers and Each node 50% full.
(2)Data write is low-efficiency.
(3)Data write is not sequential
Problems for Hash
• High space overhead
• Hard to compress
• Can’t support ordered access
• Cant’s support range-query
Problems for building database
• Both B+-tree-based and Hash-based databases
are in trouble with memory-shortage, why?
• Why them need so much memory in hand?
Why not sync these dirty data to disk from
buffer?
Rational of THUIR-DB
• (1) separating key and value by location
• (2) sorting the key-value pairs, linear write them
• (3) build the index bottom-up
• (4) pointer eliminating
• (5)highly-efficiency compressing
Separation of index and data
• Insert:
location = store_s.put(value)
search_s.insert(key,location)
• query:
location = search_s.find(key)
value = store_s.get(location)
• Separating index and data makes the index smaller, provides
chances to apply different compression methods and some other
tricks, such as data reordering.
• In a word, Indirection provides flexibility
Variable-length key and value
• insert:
location = store_s.put(key,value);
inner_key = md5(key)
search_s.insert(inner_key,location);
• query:
inner_key = md5(selected_key)
locations = search_s.find(inner_key)
array(
)=store_s.get(locations)
for each item in arry
if(keyi = selected_key)
return valuei
• Certainty is the base of optimizing
linear inserting, building index bottom-up
advantage:
(1)important data constructed together(cache conscious)
(2)write data sequentially
(3)full-write each node
(4) set the stage for compression
Lay-out
Compression of Integer Sequence
• 1)if we known value-range in advance
a give integer sequence S,all integer in the range [0,L],then it
cost bits to save them all
• 2)if we know it is ordered .
use difference to make the smaller, such as
1,4,6,15,25,40, after differencing, we get 1,3,2,9,10,15,if all
integer in the new sequence is in the range [0,L’],then it cost
bits.
• 3)sometimes there are very large number, such as
1,4,6,15,25,10000,100001, use exception block to save them,
Pfordelta and New-Pfordelta solve the problem very well.
2* logS L
2* log 'S L
Adaptive-compression
• If we have the integer sequence like these.
1 1 2 3 15 16 16 17 19 20 21 22…
• Try to find a economical way to compress
1 1 2 3 [1bit gap] -> X bit /record
1 1 2 3 15 16 17.. [4bit gap]-> Y bit/record
If X
本文档为【数据存储和查找】,请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑,
图片更改请在作品中右键图片并更换,文字修改请直接点击文字进行修改,也可以新增和删除文档中的内容。
该文档来自用户分享,如有侵权行为请发邮件ishare@vip.sina.com联系网站客服,我们会及时删除。
[版权声明] 本站所有资料为用户分享产生,若发现您的权利被侵害,请联系客服邮件isharekefu@iask.cn,我们尽快处理。
本作品所展示的图片、画像、字体、音乐的版权可能需版权方额外授权,请谨慎使用。
网站提供的党政主题相关内容(国旗、国徽、党徽..)目的在于配合国家政策宣传,仅限个人学习分享使用,禁止用于任何广告和商用目的。