Size Balanced Tree
Chen Qifeng (Farmer John)
Zhongshan Memorial Middle School, Guangdong, China
Email:344368722@QQ.com
December 29, 2006
Abstract
This paper presents a unique strategy for maintaining balance in
dynamically changing Binary Search Trees that has optimal expected behavior at
worst. Size Balanced Tree is, as the name suggests, a Binary Search Tree (abbr.
BST) kept balanced by size. It is simple, efficient and versatile in every aspect. It
is very easy to implement and has a straightforward description and a
surprisingly simple proof of correctness and runtime. Its runtime matches that of
the fastest BST known so far. Furthermore, it works much faster than many
other famous BSTs due to the tendency of a perfect BST in practice. It supports
not only typical primary operations but also Select and Rank.
Key Words And Phrases
Size Balanced Tree
SBT
Maintain
This paper is dedicated to the memory of Heavens.
1 Introduction
Before presenting Size Balanced Trees it is necessary to explicate Binary Search
Trees and rotations on BSTs, Left-Rotate and Right-Rotate.
1.1 Binary Search Tree
Binary Search Tree is a significant kind of advanced data structures. It supports
many dynamic-set operations, including Search, Minimum, Maximum, Predecessor,
Successor, Insert and Delete. It can be used both as a dictionary and as a priority
queue.
A BST is an organized binary tree. Every node in a BST contains two children at
most. The keys for compare in a BST are always stored in such a way as to satisfy the
binary-search-tree property:
Let x be a node in a binary search tree. Then the key of x is not less than
that in left subtree and not larger than that in right subtree.
For every node t we use the fields of left[t] and right[t] to store two pointers to its
children. And we define key[t] to mean the value of the node t for compare. In
addition we add s[t], the size of subtree rooted at t, to keep the number of the nodes in
that tree. Particularly we call 0 the pointer to an empty tree and s[0]=0.
1.2 Rotations
In order to keep a BST balanced (not degenerated to be a chain) we usually
change the pointer structure through rotations to change the configuration, which is a
local operation in a search tree that preserves the binary-search-tree property.
Figure1.1: The operation Left-Rotate(x) transforms the configuration of the two
nodes on the right into the configuration on the left by changing a
constant number of pointers. The configuration on the left can be
transformed into the configuration on the right by the inverse
operation, Right-Rotate(y).
1.2.1 Pseudocode Of Right-Rotate
The Right-Rotate assumes that the left child exists.
Right-Rotate (t)
1 k←left[t]
2 left[t] ←right[k]
3 right[k] ←t
4 s[k] ←s[t]
5 s[t] ←s[left[t]]+s[right[t]]+1
6 t←k
1.2.2 Pseudocode Of Left-Rotate
The Left-Rotate assumes that the right child exists.
Left-Rotate (t)
1 k←right[t]
2 right[t] ←left[k]
3 left[k] ←t
4 s[k] ←s[t]
5 s[t] ←s[left[t]]+s[right[t]]+1
6 t←k
2 Size Balanced Tree
Size Balanced Tree (abbr. SBT) is a kind of Binary Search Trees kept balanced
by size. It supports many dynamic primary operations in the runtime of O(logn):
Insert(t,v) Inserts a node whose key is v into the SBT rooted at t.
Delete(t,v) Deletes a node whose key is v from the SBT rooted at t. In the case
that no such a node in the tree, deletes the node searched at last.
Find(t,v) Finds the node whose key is v and returns it.
Rank(t,v) Returns the rank of v in the tree rooted at t. In another word, it is one
plus the number of the keys which are less than v in that tree.
Select(t,k) Returns the node which is ranked at the kth position. Apparently it
includes operations of Get-max and Get-min because Get-min is
equivalent to Select(t,1) and Get-max is equivalent to Select(t,s[t])
Pred(t,v) Returns the node with maximum key which is less than v.
Succ(t,v) Returns the node with minimum key which is larger than v.
Commonly every node of a SBT contains key, left, right and extra but useful
updated field: its size, which has been defined in the former introduction. The kernel
of a SBT is divided into two restrictions on size:
For every node pointed by t in a SBT, we guarantee that
Property(a):
]]][[[]]],[[[]][[ tleftrightstleftleftstrights ≥
Property(b):
]]][[[]]],[[[]][[ trightleftstrightrightstlefts ≥
Figure 2.1:The nodes L and R are left and right children of the
node T. The Subtrees A and B, C and D are left and
right subtrees of the nodes L and R respectively.
Correspond to properties (a) and (b), ][][],[&][][],[ LsDsCsRsBsAs ≤≤
3 Maintain
Assume that we need to insert a node whose key is v into a BST. Generally we
use the following procedure to accomplish the mission.
Simple-Insert (t,v)
1 If t=0 then
2 t←NEW-NODE(v)
3 Else
4 s[t] ←s[t]+1
5 If vs[right[t]]
In this case that s[A]>s[R] correspond to Figure 2.1 after Insert(left[t],v), we can
execute the following instructions to repair the SBT:
(1) First perform Right-Rotate(T). This operation transforms Figure 2.1 into Figure
3.1;
Figure 3.1: All the nodes are defined the same as in Figure 2.1.
(2) And then sometimes it occurs that the tree is still not a SBT because s[C]>s[B]
or s[D]>s[B] by any possibility. So it is necessary to implement Maintain(T).
(3) In succession the configuration of the right subtree of the node L might be
changed. Because of the possibility of violating the properties it is needful to
run Maintain(L) again.
Case 2: s[right[left[t]]>s[right[t]]
In this case that s[B]>s[R] correspond to Figure 3.2 after Insert(left[t],v), the
maintaining is kind of complicated than that in case 1. We can carry out the following
operations for repair.
Figure 3.2: All the nodes are defined the same as in Figure 2.1 except
E, F and B. E and F are the subtrees of the node B.
(1) First perform Left-Rotate(L). After that Figure 3.2 was transformed to Figure
3.3 shown below;
Figure 3.3: All the nodes are defined the same as in Figure 3.2.
(2) And then perform Right-Rotate(T). This performance results in the
transformation from Figure 3.3 to following Figure 3.4.
Figure 3.4: All the nodes are defined the same as in Figure 3.2.
(3) After (1) and (2), the tree becomes to be very unpredictable. But luckily, in
Figure 3.4 subtrees A, E, F and R are still SBTs. So we can perform Maintain(L)
and Maintain(T) to repair the subtrees of the node B.
(4) After step (3), those subtrees are already SBTs. But properties (a) and (b) may
be still violated on the node B. Thus we need perform Maintain(B) again.
Case 3: s[right[right[t]]>s[left[t]]
This case is symmetrical with case 1.
Case 4: s[left[right[t]]>s[left[t]]
This case is symmetrical with case 2.
3.1 Pseudocode Of Standard Maintain
According to the previous analysis it is easy to implement a normal Maintain.
Maintain (t)
1 If s[left[left[t]]>s[right[t]] then
2 Right-Rotate(t)
3 Maintain(right[t])
4 Maintain(t)
5 Exit
6 If s[right[left[t]]>s[right[t]] then
7 Left-Rotate(left[t])
8 Right-Rotate(t)
9 Maintain(left[t])
10 Maintain(right[t])
11 Maintain(t)
12 Exit
13 If s[right[right[t]]>s[left[t]] then
14 Left-Rotate(t)
15 Maintain(left[t])
16 Maintain(t)
17 Exit
18 If s[left[right[t]]>s[left[t]] then
19 Right-Rotate(right[t])
20 Left-Rotate(t)
21 Maintain(left[t])
22 Maintain(right[t])
23 Maintain(t)
3.2 Pseudocode Of Faster And Simpler Maintain
The standard pseudocode is a little complicated and slow. Generally we can
ensure that property (a) or (b) has been satisfied. Therefore we only need to check
cases 1 and 2 or case 3 and 4 to speed up. In that case we can add a boolean variant,
flag, to avoid meaningless checking. If flag is false cases 1 and 2 will be checked,
otherwise cases 3 and 4 will be checked.
Maintain (t,flag)
1 If flag=false then
2 If s[left[left[t]]>s[right[t]] then
3 Right-Rotate(t)
4 Elseif s[right[left[t]]>s[right[t]] then
5 Left-Rotate(left[t])
6 Right-Rotate(t)
7 Else exit
8 Elseif s[right[right[t]]>s[left[t]] then
9 Left-Rotate(t)
10 Elseif s[left[right[t]]>s[left[t]] then
11 Right-Rotate(right[t])
12 Left-Rotate(t)
13 Else exit
14 Maintain(left[t],false)
15 Maintain(right[t],true)
16 Maintain(t,false)
17 Maintain(t,true)
Why Maintain(left[t],true) and Maintain(right[t],false) are expelled? What is the
runtime of Maintain? You can find the answers in part 6, analysis.
4 Insertion
The insertion on a SBT is very simple. Here’s the pseudocode of Insert on a SBT.
4.1 Pseudocode Of Insert
Insert (t,v)
1 If t=0 then
2 t←NEW-NODE(v)
3 Else
4 s[t] ←s[t]+1
5 If vkey[t])and(right[t]=0) then
3 Delete ←key[t]
4 If (left[t]=0)or(right[t]=0) then
5 t ←left[t]+right[t]
6 Else
7 key[t] ←Delete(left[t],v[t]+1)
8 Else
9 If v
=
=
⎪⎩
⎪⎨
⎧
+−+−
=
h
h
h
hfhf
hf
6.1.1 Proof
(1) It is easy to work out that 1]0[ =f and 2]1[ =f .
(2) First of all, )1(1]2[]1[][ >+−+−≥ hhfhfhf . For each h>1, let’s assume that
t points to a SBT whose height is h. And then this SBT contains a subtree at
the height of h-1. It doesn’t matter to suppose it as the left subtree. According
to the previous definition of ][hf , we have that ]1[ . And there
is an h-2-tall subtree of the left subtree. In another word there is a subtree
whose size is at least ]2[
]][[ −≥ hftlefts
−hf of the left subtree. Owing to the property (b)
we have that ]2[]][[ −≥trights hf . Therefore we draw the conclusion that
1]2[]1[][ +−+−≥ hfhfts .
On the other hand, )1(1]2[]1[][ >+−+−≤ hhfhfhf . We can construct a
SBT with exact node(s) and its height is h. We call this kind of SBT
tree[h].
][hf
)1(
)1(
)0(
subtrees twoas 2]- tree[hand 1]- tree[hcontaining BST a
nodes with twoBSTany
node one with BST a
][
>
=
=
⎪⎩
⎪⎨
⎧
=
h
h
h
htree
Hence that )1(1]2[]1[][ >+−+−= hhfhfhf is obtained by summing up
two upper aspects.
6.1.2 The Worst Height
In fact is exponential function. Its precise value can be calculated from the
recursion.
][hf
][1]2[1
5
1
5
][
0
333
iacciFibonhFibonaccihf
h
i
hhh ∑
=
+++
=−+=−=−−= αβα
where
2
51+=α and
2
51−=β
Some usual values of f[h]
H 13 15 17 19 21 23 25 27 29 31
F[h] 986 2583 6764 17710 46367 121392 317810 832039 2178308 5702886
Lemma
The worst height of an n-node SBT is the maximum h subjected to . nhf ≤][
Assume that maxh is the worst height of a SBT at the size of n. According to the
lemma above, we have
33.1log44.1maxh
3logmaxh
5.1
5
1
5
]maxh[
1.5n
2
)5.1(5
3maxh
3maxh
−≤
⇒−≤
⇒+≤
⇒≤−=
+
+
+
+
n
n
nf
α
α
α
Now it is clear that the height of a SBT is O(logn).
6.2 Analysis Of Maintain
We can easily prove that Maintain works quite efficiently using the previous
conclusions.
There is a very important value to estimate how well a BST is, the average of all
nodes’ depths. It is the quotient obtained by SD, the sum of the depths of all nodes,
dividing n. Generally the less it is, the better a BST is. Because of the constant n, SD
is expected to be as small as possible.
Now we need to concentrate on SD. Its significance is the ability to restrict the
times of Maintain. Recalling the conditions to perform rotations in Maintain it is
surprising that SD always decreases after rotations.
In case 1, for example, comparing Figure 2.1 with Figure 3.1, SD increases a
negative value, s[right[t]]-s[left[left[t]].
And for instance comparing Figure 3.2 to Figure 3.4, SD increases a value less
than -1, s[right[t]]-s[right[left[t]]]-1.
Due to the height of O(logn) SD is always kept O(nlogn). And SD just increases
O(logn) after an insertion on a SBT. Therefore
)log(
)(log)(log
nnOT
nOTnOnSD
=
⇒=−×=
where T is the number of Maintains running rotations.
The total number of Maintains is T plus the number of Maintains without rotations.
Since the latter is O(nlogn)+O(T), the amortized runtime for Maintain is
)1(
log
)()log()( O
nn
TOnnOTO =++
6.3 Analysis Of Each Operation
Now that the height of SBT is O(logn) and Maintain is O(1), the runtime for all
primary operations are O(logn).
6.4 Analysis Of Faster And Simpler Delete
We call the statement P(n*) that a BST with the faster and simpler Delete and n*
standard Inserts is at the height of O(logn*). We prove P(n*) is true for arbitrary
positive integer n* by mathematical induction.
6.4.1 Proof
Here I just give a rough proof.
Assume that a node t is checked by Maintain(t,false), we have
3
1][2]][[
]][[
2
1]][[
−≤
⇒≤−
tstlefts
trightstlefts
Therefore if all nodes on the path from a node to the root are checked by
Maintain the depth of this node is O(logn).
(1) For n*=1, P(n*) is true apparently;
(2) Assume that P(n*) is true for n*
本文档为【10.陈启峰《Size Balanced Tree》】,请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑,
图片更改请在作品中右键图片并更换,文字修改请直接点击文字进行修改,也可以新增和删除文档中的内容。
该文档来自用户分享,如有侵权行为请发邮件ishare@vip.sina.com联系网站客服,我们会及时删除。
[版权声明] 本站所有资料为用户分享产生,若发现您的权利被侵害,请联系客服邮件isharekefu@iask.cn,我们尽快处理。
本作品所展示的图片、画像、字体、音乐的版权可能需版权方额外授权,请谨慎使用。
网站提供的党政主题相关内容(国旗、国徽、党徽..)目的在于配合国家政策宣传,仅限个人学习分享使用,禁止用于任何广告和商用目的。