Kolmogorov Complexity
Lance Fortnow∗

∗ This article was prepared from notes of the author taken by Amy Gale in Kaikoura, January 2000.
1. Introduction
010101010101010101010101
100111011101011100100110
110100110010110100101100
Consider the three strings shown above. Although all are 24-bit binary strings and therefore equally likely to represent the result of 24 flips of a fair coin, there is a marked difference in the complexity of describing each of the three. The first string can be fully described by stating that it is a 24-bit string with a 1 in position n iff n is odd. The third has the property that position i is a 1 iff there is an odd number of 1’s in the binary expansion of i. The second string, however, appears to have no similarly simple description, and thus in order to describe the string we are required to recite its contents verbatim.
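To make the contrast concrete, here is a short Python sketch (ours, not from the original notes) that generates the first and third strings from their one-line rules; the second, by contrast, seems to admit no generator shorter than the string itself.

    def first_string(n=24):
        # a 1 in position i iff i is odd (positions counted from 0)
        return "".join("1" if i % 2 == 1 else "0" for i in range(n))

    def third_string(n=24):
        # position i (counted from 1) is 1 iff the binary expansion of i
        # contains an odd number of 1's
        return "".join("1" if bin(i).count("1") % 2 == 1 else "0"
                       for i in range(1, n + 1))

    print(first_string())  # 010101010101010101010101
    print(third_string())  # 110100110010110100101100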
What is a description? Fix Σ = {0, 1} and let f : Σ∗ → Σ∗. Then (relative to f), a description of a string σ is simply some τ with f(τ) = σ.
Care is needed in using this definition. Without further restrictions, paradoxes
are possible (consider the description “the smallest positive integer not describable
in fewer than fifty words”). We restrict f by requiring that it be computable,
although not necessarily total, but do not initially concern ourselves with the
time-complexity of f .
We are now able to define the Kolmogorov complexity C_f:

Definition 1.1. C_f(x) = min{|p| : f(p) = x} if x ∈ ran f, and C_f(x) = ∞ otherwise.
It would be better to have a fixed notion of description rather than having
to define f each time. Or, indeed, to have a notion of complexity that does not
vary according to which f we choose. To some extent this problem is unavoidable,
but we can achieve a sort of independence if we use a Universal Turing Machine (UTM). As is well known, there exists a Turing machine U such that for all partial computable f, there exists a program p such that for all y, U(p, y) = f(y).
We define a partial computable function g by letting g(0^{|p|}1py) = U(p, y). The following basic fact is not difficult to see, and removes the dependence on f. Thus it allows us to talk about the Kolmogorov complexity.

Claim 1.1. For all partial computable f there exists a constant c such that for all x, C_g(x) ≤ C_f(x) + c. (In fact, c = 2|p| + 1, where p is a program for f.)
Now define C(x) = C_g(x). As we have seen, this is within a constant of any other complexity. We can also extend Claim 1.1 to binary functions. Let g be a binary partial computable function. Then we define the conditional complexity C_g(x|y) = min{|p| : g(p, y) = x}. Again we can drop the g for a universal function. Note that by this definition, C(x|ε) = C(x), where ε is the empty string, since for a universal binary g, the function p ↦ g(p, ε) is a universal unary function.
The idea is that C(x) gives us a way to describe the randomness of x. Before we
turn to some applications of Kolmogorov complexity, here are some easy properties
of C(x).
1. C(x) ≤ |x| + c for all x. (We can see intuitively that there is a program I that just prints out its input, and that C(x) ≤ C_I(x) + c.)

2. C(xx) ≤ C(x) + c for all x. (We can take a program for x and put it in a loop, increasing the program size by only a constant.)

3. For any partial computable h, C(h(x)) ≤ C(x) + c_h, where c_h is the length of a description of h.

4. C(x|y) ≤ C(x) + c. (At worst, y is of no benefit in describing x.)
For the present article, we think of 〈x, y〉 as the concatenation of x and y. One very desirable property we would like is that

C(〈x, y〉) ≤ C(x) + C(y) + c.
However, there is a problem here: suppose we had programs p and q such that
g(p) = x and g(q) = y. The difficulty is that we want to encode p and q together
in such a way as to make it possible to extract them both intact at a later point.
If we simply encode them as the string pq, we will not know where p ends and q
begins. If we add additional elements, for example by encoding as 0^{|p|}1pq, then a lot of space is wasted. Trying |p|pq¹ is still problematic, since we don’t know where |p| ends. One way is to self-delimit the strings by encoding |p| by local doubling with an end-point marker. That is, if |p| = 1010 then |p| is coded by 1100110001, with 01 being the end marker. Of course such a trick could be repeated (by using |p′|p′pq, where p′ = |p|, and encoding |p′| by local doubling with an end-point marker), etc. This encoding is more or less best possible, and gives the following tight bounds on C(〈x, y〉). (In this article, all logs are log₂.)

¹ In this article, we identify a natural number such as |p| with its binary representation.
C(〈x, y〉) ≤ C(x) + C(y) + 2 log C(x) + c
C(〈x, y〉) ≤ C(x) + C(y) + log C(x) + 2 log log C(x) + c
C(〈x, y〉) ≤ C(x) + C(y) + log C(x) + log log C(x) + 2 log log log C(x) + c
...
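As a concrete illustration, here is a small Python sketch (ours, with illustrative names) of the self-delimiting encoding just described: |p| is sent with each bit doubled, 01 marks the end of the length field, and then p and q follow undelimited.

    def encode_pair(p: str, q: str) -> str:
        length_bits = bin(len(p))[2:]            # |p| in binary
        doubled = "".join(b + b for b in length_bits)
        return doubled + "01" + p + q            # 01 ends the length field

    def decode_pair(s: str) -> tuple[str, str]:
        i, length_bits = 0, ""
        while s[i] == s[i + 1]:                  # read doubled bits...
            length_bits += s[i]
            i += 2
        i += 2                                   # ...then skip the 01 marker
        n = int(length_bits, 2)
        return s[i:i + n], s[i + n:]             # recover p, then the rest: q

    p, q = "1010010", "11100"
    assert decode_pair(encode_pair(p, q)) == (p, q)

The encoded string has length |p| + |q| + 2 log |p| + O(1), which is exactly the overhead in the first bound above.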
This brings us to randomness. The idea is that a string is random if it cannot
be compressed. That is, if it has no short description. Using C(x) we can formalize
this idea via the following.
Theorem 1.2. For all n, there exists some x with |x| = n such that C(x) ≥ n. Such x are called (Kolmogorov) random.

Proof. Suppose not. Then for every x of length n, C(x) < n, so for each such x there exists some p_x with g(p_x) = x and |p_x| < n. Obviously, if x ≠ y then p_x ≠ p_y. But there are only 2^n − 1 programs of length less than n, and 2^n strings of length n. By the pigeonhole principle, if every string of length n had a program shorter than n, some program would have to produce two different strings. Clearly this is absurd, so at least one string of length n must have complexity at least n. □
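The pigeonhole count is easy to verify directly; a one-line Python check (ours) for n = 24:

    # programs of length < n number 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1,
    # one fewer than the 2^n strings of length n
    n = 24
    print(sum(2 ** i for i in range(n)), 2 ** n)  # 16777215 16777216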
2. Some Applications
Aside from the intrinsic interest in the notion of randomness itself, the concept of
incompressibility can be used to give alternative proofs of many classical theorems.
Here are some examples.
Theorem 2.1. There are infinitely many primes.
Proof. Suppose not. Then there are only k primes p_1, p_2, ..., p_k for some k ∈ ℕ. Thus we can take any number m ∈ ℕ and write it as a product of these k primes:

m = p_1^{e_1} ⋯ p_k^{e_k}.

Let m be Kolmogorov random and have length n. We can describe m by 〈e_1, ..., e_k〉. We claim that this gives a short description of m. First, e_i ≤ log m, so |e_i| ≤ log log m. Hence, |〈e_1, ..., e_k〉| ≤ 2k log log m. Therefore, as m ≤ 2^{n+1}, |〈e_1, ..., e_k〉| ≤ 2k log(n + 1), so C(m) ≤ 2k log(n + 1) + c. For large enough n, this contradicts C(m) ≥ n, which follows from the fact that m is random. □
Of course the dubious reader could well say that the proof above is more complex than the original one. However, the following result is quite close to the real “prime number theorem” and is definitely easier.

Let p_m be the m-th prime. It is reasonable to ask how big p_m is, and Kolmogorov complexity allows us to put a bound on this value.

Let p_i be the largest prime that divides m. Then we can describe m by specifying p_i and m/p_i; in fact all we need is i and m/p_i, because we can compute p_i given i.
For m random, we have the following:

C(m) ≤ C(〈i, m/p_i〉) ≤ 2 log |i| + |i| + |m/p_i|,

so

log m ≤ 2 log log i + log i + log m − log p_i.

Cancelling log m from both sides gives us

log p_i ≤ log i + 2 log log i,

that is,

p_i ≤ i (log i)².
The classical prime number theorem gives that the i-th prime is roughly i ln i, so the above is pretty close.
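As a sanity check, one can compare the i-th prime against the bound i (log i)²; here is a sketch assuming the sympy package, whose prime(i) returns the i-th prime. (The derivation suppresses additive constants, so the bound is only claimed for large i.)

    from math import log2
    from sympy import prime  # prime(i) is the i-th prime

    for i in [10, 100, 1000, 10000]:
        p_i = prime(i)
        bound = i * log2(i) ** 2
        print(i, p_i, round(bound), p_i <= bound)
    # e.g. i = 10000: p_i = 104729, well below the bound of about 1.8e6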
Interestingly, most strings are “close” to being random.
Theorem 2.2. For all k and n,

|{x ∈ Σ^n : C(x) ≥ |x| − k}| ≥ 2^n(1 − 2^{−k}).

Proof. The number of programs of length less than n − k is 2^{n−k} − 1 < 2^{n−k}, so at most 2^{n−k} strings of length n have complexity below n − k. This leaves at least 2^n − 2^{n−k} = 2^n(1 − 2^{−k}) strings of length n with C(x) ≥ |x| − k. □
Thus, for example, 99.9% of all strings x have C(x) ≥ |x| − 10. The intuition
is that we can easily generate a string that is close to random if we are provided
with a simple random process, for example a coin that can be tossed.
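C(x) itself is uncomputable, but a real compressor gives a computable upper bound on description length, and it already exhibits the theorem’s behavior. A small Python sketch (ours), running zlib on coin-flip data:

    import os, zlib

    trials, n = 1000, 1000  # 1000 random strings of 1000 bytes each
    shrunk = sum(len(zlib.compress(os.urandom(n), 9)) < n
                 for _ in range(trials))
    print(f"{shrunk}/{trials} random strings compressed at all")
    # typically prints 0/1000: on random data zlib's output is slightly
    # longer than its input, as the counting argument predicts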
Randomness has interesting interactions with classical computability theory.
Consider the non-random, co-computably enumerable set of strings

A = {x : C(x) ≥ |x|/2}.
Here is an easy proof that A is a so-called “immune” set: an infinite set with no infinite computably enumerable subset.
Theorem 2.3. If B is a computably enumerable subset of A then B is finite.
Proof. Suppose that B is computably enumerable and infinite, and B ⊆ A. Consider the function h, where h(n) is the first element to be enumerated into B whose length is n or greater. Then h is total (because otherwise B could not be infinite) and computable (by dovetailing). We have h(n) ∈ B ⊆ A, and by the definition of A,

C(h(n)) ≥ |h(n)|/2 ≥ n/2.

But

C(h(n)) ≤ C(n) + c ≤ log n + c,

which gives us a contradiction, because for any c, n/2 > log n + c for large enough n. □
This theorem has a nice application in logic. We consider what can be proved in a given proof system, for example Peano arithmetic. Suppose the system is sound and complete, so that it proves exactly the true statements of the form “x is random” (meaning C(x) ≥ |x|/2). We can define the set

B = {x : there is a proof that x is random}.

Then B ⊆ A by soundness, and B is computably enumerable (since we can enumerate B by searching the proof space). By the above theorem, B must therefore be finite. But by completeness B contains every element of the infinite set A, a contradiction. So any sound proof system proves “x is random” for at most finitely many x. This provides an easy view of Gödel’s Incompleteness Theorem: almost every random string cannot be proven random!
3. Runs in Random Strings
Suppose x is random, with C(x) = |x| = n. What is the longest run of zeroes in x? The first intuition would be that there are only very short runs of zeroes in a random string. We can show that the longest run has length of order log n — if there were many more consecutive zeroes, we could compress them. Suppose x is a random string of length n and x = u0^{2 log n}v for some u and v. Then we can fully describe x by u and v, noting that we can compute n from |u| + |v|. We have

C(x) ≤ |u| + |v| + log |u| + 2 log log |u| + c
     ≤ n − 2 log n + log n + 2 log log n + c.

Hence,

C(x) ≤ n − log n + 2 log log n + c,

a contradiction to x’s randomness for long enough x. This is called a “cut and paste” argument and is quite typical of arguments using Kolmogorov complexity.
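A quick simulation (ours, not in the original) of the log n claim: for coin-flip strings of length n = 2^16, the longest run of zeroes clusters near log n = 16.

    import random
    from math import log2

    def longest_zero_run(bits: str) -> int:
        # the maximal-length block of 0's between (or around) the 1's
        return max(len(run) for run in bits.split("1"))

    n = 2 ** 16
    runs = [longest_zero_run("".join(random.choice("01") for _ in range(n)))
            for _ in range(100)]
    print(log2(n), min(runs), max(runs))  # 16.0, with observed runs near 16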
More surprisingly, one can show that x must have relatively long runs of zeros.
To show this we need to develop some new machinery.
Recall that

C(x|y) = min{|p| : g(p, y) = x},

where g is a universal binary partial computable function.
There are two main things to remember:
1. (∀n)(∃x ∈ Σ^n)(C(x) ≥ n), and

2. (∀x)(C(x) ≤ |x| + c).
That is, for all n there is at least one random string of length n, but no string
of length n has complexity greater than n by more than a constant.
We now show a close relationship between the size of a set and the maximum
Kolmogorov complexity of a string in that set.
Theorem 3.1.
• Let A be finite. Then (∀y ∈ Σ∗)(∃x ∈ A)(C(x|y) ≥ log |A|).
• Let B ⊆ Σ∗ × Σ∗ be an infinite computably enumerable set such that the sets of the form B_y = {x : 〈x, y〉 ∈ B} are finite for all y. Then

(∀x, y : x ∈ B_y)(C(x|y) ≤ log |B_y| + c),

where c is independent of both x and y.

Proof. The first item follows from a simple counting argument similar to the proofs of Theorems 1.2 and 2.2. For the second item, consider the generator program for B. It will enumerate the elements of B_y in some order x_1, ..., x_{|B_y|}. We can describe x by the program for B, y, and the i such that x = x_i. □
Now suppose a random string x with n = |x| has no runs of (1/2) log n = log √n zeros. Break x into 2n/log n segments of length log √n. Each segment must be one of only √n − 1 possibilities, since 0^{log √n} cannot occur as a segment. The total number of possibilities for x is at most

(√n − 1)^{2n/log n} = (√n)^{2n/log n} (1 − 1/√n)^{2n/log n} ≈ 2^n e^{−2√n/log n}.

We can enumerate these strings easily, so by Theorem 3.1, C(x) ≤ n − Ω(√n/log n), contradicting the fact that x is random.
Theorem 3.1 also allows us to consider Kolmogorov complexity over objects besides strings. For example, let B_n be the set of permutations on {1, ..., n}, encoded into strings in some natural way. Then

(∃x ∈ B_n)(C(x|n) ≥ log |B_n|)

and

(∀x ∈ B_n)(C(x|n) ≤ log |B_n| + c),

where log |B_n| = log n!, which is n log n up to lower-order terms.
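A numerical check (ours) that log |B_n| = log n! indeed grows like n log n:

    from math import factorial, log2

    for n in [10, 100, 1000]:
        print(n, round(log2(factorial(n)), 1), round(n * log2(n), 1))
    # log2(n!) = n log2(n) - n log2(e) + O(log n), so the ratio tends to 1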
4. Symmetry of Information
One important theorem is the following. We know that
C(〈x, y〉) ≤ C(y|x) + C(x) +O(log n),
where n = max{|x|, |y|}. Surprisingly, this inequality is essentially tight.
Theorem 4.1.
C(y|x) + C(x) ≤ C(〈x, y〉) +O(log n),
where n = max{|x|, |y|}.
Proof. Define the sets

A = {〈u, v〉 : C(〈u, v〉) ≤ C(〈x, y〉)}

and

A_u = {v : 〈u, v〉 ∈ A}.

A is finite and computably enumerable given 〈x, y〉, and likewise A_u for all u. We take e ∈ ℕ such that 2^{e+1} > |A_x| ≥ 2^e. Then

C(y|x) ≤ log |A_x| + O(1) = e + O(1).

Now consider the set B = {u : |A_u| ≥ 2^e}. It is clear that x ∈ B. Now, what is |B|?

|B| ≤ |A|/2^e ≤ 2^{C(〈x,y〉)}/2^e.

This is independent of the pairing function used, provided the function is injective. Note that |⋃_u A_u| ≤ |A|. We now have

C(x) ≤ |e| + log(2^{C(〈x,y〉)}/2^e) + 2 log |e|
     ≤ C(〈x, y〉) − e + O(log n),

and thus

C(x) + C(y|x) ≤ C(〈x, y〉) + O(log n)

as required. □
Theorem 4.1 is often referred to as “Symmetry of Information”. We define the
information content of y in x as the difference in the sizes of the programs needed
to describe y given x as opposed to not given x, i.e., I(x : y) = C(y) − C(y|x).
The following Corollary of Theorem 4.1 shows that the amount of information of
y in x is roughly the same as the amount of information of x in y.
Corollary 4.2. I(x : y) = I(y : x)±O(log n), where n = max{|y|, |x|}.
Proof.

C(〈x, y〉) = C(y) + C(x|y) ± O(log n)
          = C(x) + C(y|x) ± O(log n),

and hence

C(x) − C(x|y) = C(y) − C(y|x) ± O(log n). □
5. Prefix-Free Complexity
A problem with Kolmogorov complexity is that we are not always able to determine
where one string stops and another begins. A solution to this is to use prefix-free
languages.
Definition 5.1. We say a language A ⊆ Σ∗ is prefix-free if, for all x, y in A, if x ≠ y then x is not a prefix of y.
We say a function f is prefix-free if dom f is prefix-free.
Now we consider Kolmogorov complexity with respect to prefix-free codes. If
this is to be analogous with our original characterization of Kolmogorov com-
plexity, the first question we must ask is whether there is a universal prefix-free
function, that is, one that provides descriptions that are within a constant of those
provided by any prefix-free function, as an analog of the universal Turing Machine.
Definition 5.2. A prefix-free machine is a Turing machine with an input tape,
some number of work tapes and an output tape. The input head can only read
from left to right. At each stage there are three possible actions for the machine:
1. read a bit from the input and move the head to the right,
2. halt and output, or
3. go into an infinite loop.
Prefix-free machines were considered by several authors, notably Chaitin and Martin-Löf. (See Li and Vitányi [3] for the history here.) We say that a machine M accepts a function f if for all x, y, if f(x) = y then the machine M reads exactly all bits of x, then outputs y and halts, while if f(x) is undefined then M does not halt on input x.
Theorem 5.1. Every prefix-free partial computable function can be accepted by a
prefix-free machine, and there is a universal prefix-free machine.
The proof of this theorem is not completely obvious, but not too difficult. The proof sketch runs as follows. Let f be partial computable, with dom f prefix-free. The corresponding prefix-free machine acts as follows. Read a bit of the input, and let z be the portion read so far. Before reading any more, simulate f simultaneously on all y such that z is a prefix of y, until f(y) halts for some such y, if ever. If y = z, output f(y) and halt; if y ≠ z, read the next bit of input and repeat.

This argument produces a prefix-free version of each partial computable function, and gives us our universal machine, which on input 0^{|p|}1px uses p as the program for some f and simulates the above prefix-free machine for f on input x, and a corresponding universal function h.
Clearly dom h is prefix-free, and there is a constant c such that

C_h(x) ≤ C_f(x) + c

for any prefix-free partial computable function f. The prefix-free complexity is then defined as before:

K(x) = C_h(x).
Notice that we get the following:
Theorem 5.2. K(〈x, y〉) ≤ K(x) +K(y) + c.
We don’t need the log n factor any more. This is because if h(p) = x and
h(q) = y then, by prefix-freeness, p is the only initial segment of the string pq that
will give a halting computation.
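To see prefix-freeness doing the work, here is a toy Python sketch (ours, with an illustrative code set): with a prefix-free code, the concatenation pq splits back uniquely, with no separator or length field.

    A = {"0", "10", "110", "1110", "1111"}  # a prefix-free set

    def split_first(s: str) -> tuple[str, str]:
        # since A is prefix-free, at most one prefix of s is a codeword
        for i in range(1, len(s) + 1):
            if s[:i] in A:
                return s[:i], s[i:]
        raise ValueError("no codeword prefix")

    p, q = "110", "10"
    assert split_first(p + q) == (p, q)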
Using the same counting argument as before, we see that
Theorem 5.3. (∀n)(∃x ∈ Σ^n)(K(x) ≥ n).
So we have gained something. On the principle that there are no free lunches
(except at this conference), we also lose something. Recall that we had
C(x) ≤ |x|+ c.
Now we no longer have this because we need a prefix-free way of describing x. We
get the following instead.
K(x) ≤ 2 log |x|+ |x|+ c,
or refining as before,
K(x) ≤ 2 log log |x|+ log |x|+ |x|+ c, etc.
We remark that a counting argument demonstrates that
(∀c)(∃x)(K(x) ≥ |x|+ log |x|+ c).
6. Kraft’s Inequality and the Universal Measure
The following theorem is the basis for assigning complexity to infinite sets.
Theorem 6.1 (Kraft’s Inequality). If A is prefix-free then

∑_{x∈A} 2^{−|x|} ≤ 1.

Proof. Let R_x denote the interval of real numbers whose dyadic expansion begins with 0.x.... Then |R_x| = 2^{−|x|}, where |R_x| denotes the length of the interval. Note that if x ≠ y with x, y ∈ A, then R_x ∩ R_y = ∅. The result follows. □
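A small check (ours) of both the inequality and the interval picture from the proof, on a sample prefix-free set:

    from fractions import Fraction

    A = ["0", "10", "110", "1110", "1111"]  # prefix-free

    def interval(x: str):
        # R_x: the reals whose binary expansion begins 0.x...
        lo = Fraction(int(x, 2), 2 ** len(x))
        return lo, lo + Fraction(1, 2 ** len(x))

    print(sum(Fraction(1, 2 ** len(x)) for x in A))  # 1: Kraft is tight here

    ivs = sorted(interval(x) for x in A)
    assert all(end <= start for (_, end), (start, _) in zip(ivs, ivs[1:]))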
This allows us to assign a measure: set µ(x) = 2^{−K(x)}. Note that µ : Σ∗ → [0, 1] and ∑_x µ(x) < 1. This measure assigns a weight to each string according to its Kolmogorov complexity. The shorter the string’s description, the heavier the weighting.
The measure µ(x) is semicomputable, i.e., there exists a computable function f(x, k), nondecreasing in k, such that

lim_{k→∞} f(x, k) = µ(x).

The measure µ(x) is universal for the semicomputable measures.
Fact 6.2. Let τ(x) be any semicomputable function such that ∑_{x∈Σ∗} τ(x) ≤ 1. There exists a constant c such that for all x, τ(x) ≤ cµ(x).
Take any algorithm and look at the average case under µ: it equals the worst case. Specifically, let T(x) be the running time of some algorithm. Let T_w(n) = max_{x∈Σ^n} T(x), and

T_ave(n) = (∑_{x∈Σ^n} µ(x)T(x)) / (∑_{x∈Σ^n} µ(x)).

Then we have the following.
Theorem 6.3. T_w(n) = O(T_ave(n)).
Proof. Let µ(n) = ∑_{x∈Σ^n} µ(x), and let µ′ be the distribution that puts weight µ(n) on the lexicographically first string x of length n that maximizes T(x). Then µ′ is semicomputable, so by the universality of µ(x) (Fact 6.2) there is a constant c with µ(n) = µ′(x) ≤ cµ(x) for that string x. Hence

T_ave(n) ≥ µ(x)T_w(n)/µ(n) ≥ T_w(n)/c. □
7. Time-Bounded Kolmogorov Complexity
One of the problems with Kolmogorov complexity is that the shortest description
of a string can run very slowly. In fact, it must often do so, lest the set of random
strings have an infinite computably enumerable subset. One very useful variant of
classical complexity is the time-bounded version. For a function t, we have:
C^t_f(x|y) = min{|p| : f(p, y) = x using time at most t(|x| + |y|)}.

Here and below, we use the convention that this quantity is ∞ if there is no such p.
There is a universality theorem in this setting. A function t is time-constructible if there exists a Turing machine that on input n outputs t(n) using at most t(n) time.
Fact 7.1. There exists a computable g such that for all computable f and time-constructible t there is a constant c such that

C^{t log t}_g(x|y) ≤ C^t_f(x|y) + c.

We define C^t(x|y) = C^t_g(x|y) and C^t(x) = C^t(x|ε).
We can also look at the programs that distinguish x rather than generate x:

CD^t_f(x|y) = min{|p| : f(p, y, x) = 1, f(p, y, z) = 0 for all z ≠ x, and f(p, y, z) uses time at most t(|y| + |z|) for all z}.

We define CD^t(x|y) and CD^t(x) as above.
Without the time bound, C(x|y) ≤ CD(x|y) + O(1) for all x and y: given the CD program p for x and y, search for the first z such that f(p, y, z) = 1; by definition z = x. However, for polynomial t we may not have enough time to perform this search.
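A sketch (ours) of that unbounded argument: any program that distinguishes x can be turned into one that generates x by brute-force search, which is exactly the step a polynomial time bound forbids.

    from itertools import count, product

    def generate(accepts) -> str:
        # accepts(z) plays the role of f(p, y, .) -- a hypothetical stand-in
        for n in count():
            for bits in product("01", repeat=n):
                z = "".join(bits)
                if accepts(z):
                    return z

    x = "10110"
    assert generate(lambda z: z == x) == x
    # the search may examine ~2^|x| strings: harmless for C vs CD,
    # fatal when everything must run in polynomial time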
So what is the relationship between C and CD for polynomial-time bounds?
(∀ poly p)(∃