
Kolmogorov Complexity


Lance Fortnow*

* This article was prepared from notes of the author taken by Amy Gale in Kaikoura, January 2000.

1. Introduction

010101010101010101010101
100111011101011100100110
110100110010110100101100

Consider the three strings shown above. Although all are 24-bit binary strings and therefore equally likely to represent the result of 24 flips of a fair coin, there is a marked difference in the complexity of describing each of the three. The first string can be fully described by stating that it is a 24-bit string with a 1 in position n iff n is odd. The third has the property that position i is a 1 iff there is an odd number of 1's in the binary expansion of i. But the second string appears not to have a similarly simple description, and thus in order to describe it we are required to recite its contents verbatim.

What is a description? Fix Σ = {0, 1} and let f : Σ* → Σ*. Then (relative to f), a description of a string σ is simply some τ with f(τ) = σ. Care is needed in using this definition: without further restrictions, paradoxes are possible (consider the description "the smallest positive integer not describable in fewer than fifty words"). We restrict f by requiring that it be computable, although not necessarily total, but we do not initially concern ourselves with the time complexity of f. We are now able to define the Kolmogorov complexity C_f:

Definition 1.1. C_f(x) = min{|p| : f(p) = x} if x ∈ ran f, and C_f(x) = ∞ otherwise.

It would be better to have a fixed notion of description rather than having to define f each time. Or, indeed, to have a notion of complexity that does not vary according to which f we choose. To some extent this problem is unavoidable, but we can achieve a sort of independence if we use a Universal Turing Machine (UTM). As is well known, there exists a Turing machine U such that for all partial computable f there exists a program p such that for all y, U(p, y) = f(y). We define a partial computable function g by letting g(0^{|p|} 1 p y) = U(p, y). The following basic fact is not difficult to see, and removes the dependence on f. Thus it allows us to talk about the Kolmogorov complexity.

Claim 1.1. For all partial computable f there exists a constant c such that for all x, C_g(x) ≤ C_f(x) + c. (In fact, c = 2|p| + 1, where p is a program for f.)

Now define C(x) = C_g(x). As we have seen, this is within a constant of other complexities.

We can also extend Claim 1.1 to binary functions. Let g be a binary partial computable function. Then we define the conditional complexity C_g(x|y) = min{|p| : g(p, y) = x}. Again we can drop the g for a universal function. Note that by this definition C(x|ε) = C(x), where ε is the empty string, since for a universal binary g, g(x, ε) is equivalent to a universal unary function.

The idea is that C(x) gives us a way to describe the randomness of x. Before we turn to some applications of Kolmogorov complexity, here are some easy properties of C(x) (a small illustration follows the list).

1. C(x) ≤ |x| + c for all x. (Intuitively, there is a program I that just prints out its input, and C(x) ≤ C_I(x) + c.)
2. C(xx) ≤ C(x) + c for all x. (We can take a program for x and put it in a loop, increasing the program size by only a constant.)
3. For any partial computable h, C(h(x)) ≤ C(x) + c_h, where c_h is the length of a description of h.
4. C(x|y) ≤ C(x) + c. (At worst, y is of no benefit in describing x.)
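Property 1 invites an experiment. C(x) itself is uncomputable, but any fixed compressor upper-bounds description length relative to that compressor's decoder, and this is already enough to separate strings like the three in the introduction. Below is a minimal sketch in Python (not from the article); it uses zlib as an arbitrary stand-in for a fixed description function, so the byte counts are upper bounds carrying the compressor's own constant overhead, not values of C.

```python
import os
import zlib

n = 4096  # longer analogues of the 24-bit strings, so compressor overhead is small

periodic = "01" * (n // 2)                                    # rule of the first string
coinflip = format(int.from_bytes(os.urandom(n // 8), "big"), f"0{n}b")
thue = "".join(str(bin(i).count("1") % 2) for i in range(n))  # rule of the third string

for name, s in [("periodic", periodic), ("thue-morse", thue), ("coin flips", coinflip)]:
    # The compressed size upper-bounds the length of a description of s
    # relative to the fixed decoder zlib.decompress.
    size = len(zlib.compress(s.encode(), 9))
    print(f"{name:10s}: {n} bits -> {size} compressed bytes")
```

On a typical run the periodic string compresses to a few dozen bytes, the Thue-Morse-style string to somewhat more, and the coin-flip string to not much less than n/8 bytes, mirroring the informal ranking of the three descriptions above.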
For the present article, we think of ⟨x, y⟩ as the concatenation of x and y. One very desirable property we would like is that C(⟨x, y⟩) ≤ C(x) + C(y) + c. However, there is a problem here: suppose we had programs p and q such that g(p) = x and g(q) = y. The difficulty is that we want to encode p and q together in such a way as to make it possible to extract them both intact at a later point. If we simply encode them as the string pq, we will not know where p ends and q begins. If we add additional elements, for example by encoding as 0^{|p|} 1 p q, then a lot of space is wasted. Trying |p| p q is still problematic, since we don't know where |p| ends. (In this article, we identify a natural number such as |p| with its binary representation.) One way out is to self-delimit the strings by encoding |p| by local doubling with an end-point marker. That is, if |p| = 1010 then |p| is coded by 1100110001, with 01 being the end marker. Of course such a trick can be repeated (by using |p′| p′ p q, where p′ = |p|, and encoding |p′| by local doubling with an end-point marker), etc. This encoding is more or less best possible, and gives the following tight bounds on C(⟨x, y⟩). (In this article, all logs are base 2.)

C(⟨x, y⟩) ≤ C(x) + C(y) + 2 log C(x) + c
C(⟨x, y⟩) ≤ C(x) + C(y) + log C(x) + 2 log log C(x) + c
C(⟨x, y⟩) ≤ C(x) + C(y) + log C(x) + log log C(x) + 2 log log log C(x) + c
...
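The local-doubling encoding just described is easy to implement. Here is a sketch in Python over strings of '0'/'1' characters; the helper names are ours, not the article's.

```python
def encode_pair(p: str, q: str) -> str:
    """Encode bit-strings p and q so both can be recovered later:
    double each bit of |p| written in binary, append the end
    marker '01', then p and q themselves."""
    length_bits = bin(len(p))[2:]                # |p| in binary
    doubled = "".join(b + b for b in length_bits)
    return doubled + "01" + p + q

def decode_pair(s: str) -> tuple[str, str]:
    i, length_bits = 0, ""
    while s[i] == s[i + 1]:                      # doubled pairs are '00' or '11'
        length_bits += s[i]
        i += 2
    assert s[i:i + 2] == "01"                    # the end marker
    i += 2
    n = int(length_bits, 2)
    return s[i:i + n], s[i + n:]

p, q = "1010101010", "111"
assert decode_pair(encode_pair(p, q)) == (p, q)
# Overhead is 2 log |p| + 2 bits, matching the first bound above.
```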
This brings us to randomness. The idea is that a string is random if it cannot be compressed, that is, if it has no short description. Using C(x) we can formalize this idea via the following.

Theorem 1.2. For all n, there exists some x with |x| = n such that C(x) ≥ n. Such x are called (Kolmogorov) random.

Proof. Suppose not. Then for every x of length n, C(x) < n, so for each such x there exists some p_x with g(p_x) = x and |p_x| < n. Obviously, if x ≠ y then p_x ≠ p_y. But there are 2^n − 1 programs of length less than n, and 2^n strings of length n. By the pigeonhole principle, if all strings of length n had a program shorter than n, some program would have to produce two different strings. Clearly this is absurd, so at least one string of length n has no program of length less than n. □

2. Some Applications

Aside from the intrinsic interest in the notion of randomness itself, the concept of incompressibility can be used to give alternative proofs of many classical theorems. Here are some examples.

Theorem 2.1. There are infinitely many primes.

Proof. Suppose not. Then there are k primes p_1, p_2, . . . , p_k for some k ∈ N, and we can write any number m ∈ N as a product of these k primes: m = p_1^{e_1} · · · p_k^{e_k}. Let m be Kolmogorov random and have length n. We can describe m by ⟨e_1, . . . , e_k⟩. We claim that this gives a short description of m. First, e_i ≤ log m, so |e_i| ≤ log log m, and hence |⟨e_1, . . . , e_k⟩| ≤ 2k log log m. Therefore, as m ≤ 2^{n+1}, we get |⟨e_1, . . . , e_k⟩| ≤ 2k log(n + 1), so C(m) ≤ 2k log(n + 1) + c. For large enough n this contradicts C(m) ≥ n, which follows from the fact that m is random. □

Of course the dubious reader could well say that the proof above is more complex than the original one. However, the following result is quite close to the real "prime number theorem" and is definitely easier. Let p_m be the mth prime. It is reasonable to ask how big p_m is, and Kolmogorov complexity allows us to put a bound on this value.

Let p_i be the largest prime that divides m. Then we can describe m by specifying p_i and m/p_i; in fact all we need is i and m/p_i, because we can compute p_i given i. For m random, we have the following:

C(m) ≤ C(⟨i, m/p_i⟩) ≤ 2 log |i| + |i| + |m/p_i|,

so

log m ≤ 2 log log i + log i + log m − log p_i.

Canceling gives

log p_i ≤ log i + 2 log log i,

and hence p_i ≤ i (log i)^2. The classical prime number theorem says that the ith prime is about i log i, so the above is pretty close.

Interestingly, most strings are "close" to being random.

Theorem 2.2. For all k and n, |{x ∈ Σ^n : C(x) ≥ |x| − k}| ≥ 2^n (1 − 2^{−k}).

Proof. The number of programs of length less than n − k is 2^{n−k} − 1 < 2^{n−k}, so fewer than 2^{n−k} strings of length n have complexity below n − k. This leaves over 2^n − 2^{n−k} = 2^n (1 − 2^{−k}) strings of length n with complexity n − k or greater. □

Thus, for example, 99.9% of all strings x have C(x) ≥ |x| − 10. The intuition is that we can easily generate a string that is close to random if we are provided with a simple random process, for example a coin that can be tossed.

Randomness has interesting interactions with classical computability theory. Consider the non-computable, co-computably enumerable set of strings A = {x : C(x) ≥ |x|/2}. Here is an easy proof that A is a so-called "immune" set.

Theorem 2.3. If B is a computably enumerable subset of A then B is finite.

Proof. Suppose that B is computably enumerable and infinite, with B ⊆ A. Consider the function h, where h(n) is the first element enumerated into B whose length is n or greater. Then h is total (because otherwise B could not be infinite) and computable (by dovetailing). We have h(n) ∈ B ⊆ A, and by the definition of A,

C(h(n)) ≥ |h(n)|/2 ≥ n/2.

But C(h(n)) ≤ C(n) + c ≤ log n + c, which gives a contradiction, because for any c, n/2 > log n + c for large enough n. □

This theorem has a nice application in logic. We consider what can be proved in a given proof system, for example Peano arithmetic. If we assume the system is sound and complete, then we can either prove or disprove every statement of the form "x is random". Define the set B = {x : there is a proof that x is random}. Then B ⊆ A by soundness, and B is computably enumerable (we can determine membership of x in B by searching the proof space). By the above theorem, B must therefore be finite, whereas completeness would make B infinite, since there are infinitely many random strings. This contradiction provides an easy view of Gödel's Incompleteness Theorem: almost every random string cannot be proven random!

3. Runs in Random Strings

Suppose x is random, with C(x) = |x| = n. What is the longest run of zeroes in x? The first intuition would be that a random string can have only very short runs of zeroes. We can show that the longest run has length of order log n: if there were many more consecutive zeroes, we could compress them. Suppose x is a random string of length n and x = u 0^{2 log n} v for some u and v. Then we can fully describe x by u and v, noting that we can compute n given |u| + |v|. We have

C(x) ≤ |u| + |v| + log |u| + 2 log log |u| + c ≤ n − 2 log n + log n + 2 log log n + c.

Hence C(x) ≤ n − log n + 2 log log n + c, contradicting x's randomness for long enough x. This is called a "cut and paste" argument and is quite typical of arguments using Kolmogorov complexity.

More surprisingly, one can show that x must have relatively long runs of zeroes. To show this we need to develop some new machinery. Recall that

C(x|y) = min{|p| : g(p, y) = x},

where g is a universal binary partial computable function. There are two main things to remember:

1. (∀n)(∃x ∈ Σ^n)(C(x) ≥ n), and
2. (∀x)(C(x) ≤ |x| + c).

That is, for all n there is at least one random string of length n, but no string of length n has complexity greater than n by more than a constant.
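The cut-and-paste bound above says a random string of length n has no run of 2 log n zeroes, and Theorem 2.2 says coin-flip strings are close to random with high probability, so the log n scale is easy to observe empirically. A quick sketch in Python (our own experiment, not from the article):

```python
import math
import os

def longest_zero_run(bits: str) -> int:
    # length of the longest block of consecutive '0's
    return max(len(block) for block in bits.split("1"))

for n in (2**10, 2**14, 2**18):
    bits = format(int.from_bytes(os.urandom(n // 8), "big"), f"0{n}b")
    print(f"n = {n:6d}: longest 0-run = {longest_zero_run(bits):2d}, "
          f"log n = {math.log2(n):.0f}")
```

Typical output shows the longest run within a couple of bits of log n, consistent with the upper bound above and with the lower bound proved next.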
We now show a close relationship between the size of a set and the maximum Kolmogorov complexity of a string in that set.

Theorem 3.1.
• Let A be finite. Then (∀y ∈ Σ*)(∃x ∈ A)(C(x|y) ≥ log |A|).
• Let B ⊆ Σ* × Σ* be an infinite computably enumerable set such that the sets of the form B_y = {x : ⟨x, y⟩ ∈ B} are finite for all y. Then (∀x, y with x ∈ B_y)(C(x|y) ≤ log |B_y| + c), where c is independent of both x and y.

Proof. The first item follows from a simple counting argument similar to the proofs of Theorems 1.2 and 2.2. For the second item, consider the generator program for B. It will enumerate the elements of B_y in some order x_1, . . . , x_{|B_y|}. We can describe x by the program for B, by y, and by the index i such that x = x_i. □

Now suppose a random string x with n = |x| has no run of (1/2) log n = log √n zeroes. Break x into 2n/log n segments of length log √n. Each segment must be one of only √n − 1 possibilities, since 0^{log √n} cannot be a segment. The total number of possibilities for x is thus at most

(√n − 1)^{2n/log n} = (√n)^{2n/log n} (1 − 1/√n)^{2n/log n} ≈ 2^n e^{−2√n/log n}.

We can enumerate these strings easily, so by Theorem 3.1, C(x) ≤ n − Ω(√n/log n), contradicting the fact that x is random.

Theorem 3.1 also allows us to consider Kolmogorov complexity over objects besides strings. For example, let B_n be the set of permutations of {1, . . . , n}, encoded into strings in some natural way. Then

(∃x ∈ B_n)(C(x|n) ≥ log |B_n| ≈ n log n) and (∀x ∈ B_n)(C(x|n) ≤ log |B_n| + c).

4. Symmetry of Information

One important theorem is the following. We know that C(⟨x, y⟩) ≤ C(y|x) + C(x) + O(log n), where n = max{|x|, |y|}. Surprisingly, this inequality is essentially tight.

Theorem 4.1. C(y|x) + C(x) ≤ C(⟨x, y⟩) + O(log n), where n = max{|x|, |y|}.

Proof. Define the sets

A = {⟨u, v⟩ : C(⟨u, v⟩) ≤ C(⟨x, y⟩)} and A_u = {v : ⟨u, v⟩ ∈ A}.

A is finite and computably enumerable given the value C(⟨x, y⟩) (an O(log n)-bit quantity), and likewise A_u for all u. Take e ∈ N such that 2^{e+1} > |A_x| ≥ 2^e. Then

C(y|x) ≤ log |A_x| + O(1) = e + O(1).

Now consider the set B = {u : |A_u| ≥ 2^e}. It is clear that x ∈ B. Now, what is |B|? Since |⋃_u A_u| ≤ |A|, we have

|B| ≤ |A| / 2^e ≤ 2^{C(⟨x,y⟩)} / 2^e.

This is independent of the pairing function used, provided the function is injective. We now have

C(x) ≤ |e| + log(2^{C(⟨x,y⟩)} / 2^e) + 2 log |e| ≤ C(⟨x, y⟩) − e + O(log n),

and thus C(x) + C(y|x) ≤ C(⟨x, y⟩) + O(log n), as required. □

Theorem 4.1 is often referred to as "Symmetry of Information". We define the information content of y in x as the difference in the sizes of the programs needed to describe y given x as opposed to not given x, i.e., I(x : y) = C(y) − C(y|x). The following corollary of Theorem 4.1 shows that the amount of information of y in x is roughly the same as the amount of information of x in y.

Corollary 4.2. I(x : y) = I(y : x) ± O(log n), where n = max{|x|, |y|}.

Proof. C(⟨x, y⟩) = C(y) + C(x|y) ± O(log n) = C(x) + C(y|x) ± O(log n), and hence C(x) − C(x|y) = C(y) − C(y|x) ± O(log n). □
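Returning to the permutation example at the end of Section 3: one "natural encoding" that achieves the log |B_n| + c upper bound given n is a permutation's index in lexicographic order, its Lehmer code. A sketch in Python (hypothetical helper names, not from the article):

```python
from math import factorial

def rank(perm: list[int]) -> int:
    """Index of perm (a permutation of 0..n-1) in lexicographic order."""
    n, r = len(perm), 0
    remaining = sorted(perm)
    for i, v in enumerate(perm):
        r += remaining.index(v) * factorial(n - 1 - i)
        remaining.remove(v)
    return r

def unrank(r: int, n: int) -> list[int]:
    """Inverse of rank: recover a permutation from its index, given n."""
    remaining, perm = list(range(n)), []
    for i in range(n):
        f = factorial(n - 1 - i)
        perm.append(remaining.pop(r // f))
        r %= f
    return perm

p = [2, 0, 3, 1]
assert unrank(rank(p), len(p)) == p
```

The index is below n!, so it fits in ⌈log n!⌉ ≈ n log n bits, and unrank recovers the permutation given n, exactly as the two displayed inequalities require.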
5. Prefix-Free Complexity

A problem with Kolmogorov complexity is that we are not always able to determine where one string stops and another begins. A solution is to use prefix-free languages.

Definition 5.1. We say a language A ⊆ Σ* is prefix-free if, for all x, y in A with x ≠ y, x is not a prefix of y. We say a function f is prefix-free if dom f is prefix-free.

Now we consider Kolmogorov complexity with respect to prefix-free codes. If this is to be analogous to our original characterization of Kolmogorov complexity, the first question we must ask is whether there is a universal prefix-free function, that is, one that provides descriptions within a constant of those provided by any prefix-free function, as an analog of the universal Turing machine.

Definition 5.2. A prefix-free machine is a Turing machine with an input tape, some number of work tapes, and an output tape. The input head can only read from left to right. At each stage there are three possible actions for the machine:

1. read a bit from the input and move the head to the right,
2. halt and output, or
3. go into an infinite loop.

Prefix-free machines were considered by several authors, notably Chaitin and Martin-Löf. (See Li and Vitányi [3] for the history here.) We say that a machine M accepts a function f if, whenever f(x) = y, the machine M reads exactly all the bits of x, outputs y, and halts, while if f(x) is undefined then M does not halt on input x.

Theorem 5.1. Every prefix-free partial computable function can be accepted by a prefix-free machine, and there is a universal prefix-free machine.

The proof of this theorem is not completely obvious, but not too difficult. The sketch runs as follows. Let f be partial computable, with dom f prefix-free. The corresponding prefix-free machine acts as follows. Having read an initial segment z of the input, and before reading any more, simulate f simultaneously on all y such that z is a prefix of y, until f(y) halts for some such y, if ever. If y = z, output f(y) and halt. Otherwise, if y ≠ z, prefix-freeness guarantees that f(z) is undefined, so it is safe to read the next bit of input and repeat.

This argument produces a prefix-free version of each partial computable function, and gives us our universal machine, which on input 0^{|p|} 1 p x uses p as the program for some f and simulates the above prefix-free machine for f on input x, along with a corresponding universal function h. Clearly dom h is prefix-free, and there is a constant c such that C_h(x) ≤ C_f(x) + c for any prefix-free partial computable function f. The prefix-free complexity is then defined as before: K(x) = C_h(x).

Notice that we get the following:

Theorem 5.2. K(⟨x, y⟩) ≤ K(x) + K(y) + c.

We don't need the log n factor any more. This is because if h(p) = x and h(q) = y then, by prefix-freeness, p is the only initial segment of the string pq that gives a halting computation.

Using the same counting argument as before, we see:

Theorem 5.3. (∀n)(∃x ∈ Σ^n)(K(x) ≥ n).

So we have gained something. On the principle that there are no free lunches (except at this conference), we also lose something. Recall that we had C(x) ≤ |x| + c. We no longer have this, because we need a prefix-free way of describing x. Instead we get

K(x) ≤ 2 log |x| + |x| + c,

or, refining as before,

K(x) ≤ 2 log log |x| + log |x| + |x| + c,

etc. We remark that a counting argument demonstrates that (∀c)(∃x)(K(x) ≥ |x| + log |x| + c).

6. Kraft's Inequality and the Universal Measure

The following theorem is the basis for assigning complexity to infinite sets.

Theorem 6.1 (Kraft's Inequality). If A is prefix-free then

∑_{x∈A} 2^{−|x|} ≤ 1.

Proof. Let R_x denote the interval of real numbers whose dyadic expansion begins with 0.x. Then |R_x| = 2^{−|x|}, where |R_x| denotes the length of the interval. Note that if x ≠ y with x, y ∈ A, then R_x ∩ R_y = ∅, and all the R_x lie within [0, 1). The result follows. □
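Kraft's inequality can be checked mechanically for any concrete prefix-free set, for instance the self-delimiting codewords of the form 0^{|p|} 1 p used in the universal-machine constructions above. A small sketch in Python (our own helper names; exact arithmetic via Fraction):

```python
from fractions import Fraction
from itertools import product

def is_prefix_free(codes: set[str]) -> bool:
    # no codeword is a proper prefix of another
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

def kraft_sum(codes: set[str]) -> Fraction:
    # the sum of 2^{-|x|} over the code, computed exactly
    return sum(Fraction(1, 2 ** len(x)) for x in codes)

# All codewords 0^{|p|} 1 p for programs p of length 1 through 5.
codes = {"0" * n + "1" + "".join(bits)
         for n in range(1, 6)
         for bits in product("01", repeat=n)}

assert is_prefix_free(codes)
print(kraft_sum(codes))  # 31/64, safely at most 1
```

Each length n contributes 2^n codewords of length 2n + 1, hence total weight 2^{−(n+1)}; summed over all n this tends to 1/2, comfortably within Kraft's bound.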
This allows us to assign a measure: set µ(x) = 2^{−K(x)}. Note that µ : Σ* → [0, 1] and ∑_x µ(x) < 1. This measure assigns a weight to each string according to its Kolmogorov complexity: the shorter the string's description, the heavier the weighting.

The measure µ(x) is semicomputable, i.e., there exists a computable function f(x, k), nondecreasing in k, such that lim_{k→∞} f(x, k) = µ(x). Moreover, µ(x) is universal for the semicomputable measures.

Fact 6.2. Let τ(x) be any semicomputable function such that ∑_{x∈Σ*} τ(x) ≤ 1. Then there exists a constant c such that for all x, τ(x) ≤ c µ(x).

Take any algorithm and look at its average-case running time under µ: up to a constant, it equals the worst case. Specifically, let T(x) be the running time of some algorithm, and let

T_w(n) = max_{x∈Σ^n} T(x) and T_ave(n) = (∑_{x∈Σ^n} µ(x) T(x)) / (∑_{x∈Σ^n} µ(x)).

Then we have the following.

Theorem 6.3. T_w(n) = O(T_ave(n)).

Proof. Let µ(n) = ∑_{x∈Σ^n} µ(x), and let µ′(x) be the distribution that puts weight µ(n) on the lexicographically first string of length n that maximizes T(x). Theorem 6.3 follows from the universality of µ(x): since µ′ is semicomputable, Fact 6.2 gives µ′(x) ≤ c µ(x), and applying this at the maximizing string of each length yields T_w(n) ≤ c T_ave(n). □

7. Time-Bounded Kolmogorov Complexity

One of the problems with Kolmogorov complexity is that the shortest description of a string can run very slowly. In fact it must often do so, lest the set of random strings have an infinite computably enumerable subset. One very useful variant of classical complexity is the time-bounded version. For a function t, we define

C^t_f(x|y) = min{|p| : f(p, y) = x using time at most t(|x| + |y|)}.

Here and below, we use the convention that this quantity is ∞ if there is no such p.

There is a universality theorem in this setting. A function t is time-constructible if there exists a Turing machine that on input n outputs t(n) using at most t(n) time.

Fact 7.1. There exists a computable g such that for all computable f and time-constructible t there is a constant c such that

C^{t log t}_g(x|y) ≤ C^t_f(x|y) + c.

We define C^t(x|y) = C^t_g(x|y) and C^t(x) = C^t(x|ε).

We can also look at programs that distinguish x rather than generate x:

CD^t_f(x|y) = min{|p| : f(p, y, x) = 1, f(p, y, z) = 0 for all z ≠ x, and f(p, y, z) uses time at most t(|y| + |z|) for all z}.

We define CD^t(x|y) and CD^t(x) as above. Without the time bound, C(x|y) ≤ CD(x|y) + O(1) for all x and y: given the CD program p for x and y, search for the first z such that f(p, y, z) = 1; by definition, z = x. However, for polynomial t we may not have enough time to perform this search. So what is the relationship between C and CD for polynomial-time bounds?

(∀ poly p)(∃