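Statistical Software and the R Language

Have you installed R?

A widely accepted definition of statistics:
Statistics is a set of concepts, principles, and methods for collecting data, analyzing data, and drawing conclusions from data.

This definition determines the fate of statistics:
Unlike mathematics and music, statistics cannot admire itself; if it does not serve practice, it has no reason to exist.
Statistics must serve every field.
Statistics must deal with data.
Therefore, statistics must be combined with computers.

Can one do "theoretical statistics" without working hands-on with data?
A few decades ago… one could.
Without an applied background:
no one wants the paper, and no one funds the grant...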

> x=rnorm(10)
> all(x>0); all(x!=0); any(x>0); (1:10)[x>0]
> x=sample(1:7,5,replace=TRUE); unique(x)

Matrix transpose and inverse

> x=matrix(runif(9),3,3); x
          [,1]      [,2]      [,3]
[1,] 0.6747652 0.9954731 0.7524502
[2,] 0.3090199 0.2390141 0.2472961
[3,] 0.5102675 0.9515505 0.6082803
> t(x)
          [,1]      [,2]      [,3]
[1,] 0.6747652 0.3090199 0.5102675
[2,] 0.9954731 0.2390141 0.9515505
[3,] 0.7524502 0.2472961 0.6082803
> solve(x)  # solve(a,b) solves the equation a %*% x = b
           [,1]       [,2]       [,3]
[1,] -12.313293  15.125819   9.082300
[2,]  -8.459725   3.627898   8.989864
[3,]  23.563034 -18.363808 -20.037986

Warning: what is 0 on a computer?

> x%*%solve(x)
              [,1]          [,2]          [,3]
[1,]  1.000000e+00 -9.454243e-17 -3.911801e-16
[2,]  5.494737e-16  1.000000e+00  3.248270e-16
[3,] -3.018419e-16  1.804980e-15  1.000000e+00

The off-diagonal entries are of order 1e-16 rather than exactly 0 because of floating-point rounding.
Use linear algebra to decide questions such as how many eigenvalues are nonzero. If v is the vector of eigenvalues, do not count the nonzero ones with something like sum(v!=0)!

Matrix & Array

> x=array(runif(20),c(4,5)); x
          [,1]      [,2]        [,3]      [,4]      [,5]
[1,] 0.5474306 0.2362356 0.687007107 0.4036998 0.5255839
[2,] 0.8234363 0.4922711 0.960554564 0.4704976 0.1327870
[3,] 0.1861151 0.8461655 0.390523424 0.2202575 0.4057607
[4,] 0.8117521 0.5375946 0.004505845 0.4821567 0.7644741
> is.matrix(x)
[1] TRUE
> x[1,2]
> x[1,]
> x[,2]
> dim(x)  # returns the dimensions (4,5)

Array

> x=array(runif(24),c(4,3,2))
> is.matrix(x)  # dim(x) gives the dimensions (4,3,2)
[1] FALSE
> x
, , 1

          [,1]      [,2]        [,3]
[1,] 0.3512615 0.7270611 0.009055522
[2,] 0.1444965 0.2527673 0.697977027
[3,] 0.6658176 0.6638542 0.773747542
[4,] 0.4258436 0.4168940 0.634235148

, , 2

          [,1]      [,2]      [,3]
[1,] 0.3664152 0.9633497 0.5628006
[2,] 0.3466645 0.5036830 0.1542986
[3,] 0.4552553 0.1289775 0.8423017
[4,] 0.1074899 0.3841463 0.7648297

Subsets of an array

> x=array(1:24,c(4,3,2))
> x[c(1,3),,]
, , 1

     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    3    7   11

, , 2

     [,1] [,2] [,3]
[1,]   13   17   21
[2,]   15   19   23

Matrix multiplication and row/column operations

> x=matrix(1:30,5,6); y=matrix(rnorm(20),4,5)
> y%*%x
           [,1]         [,2]       [,3]       [,4]        [,5]        [,6]
[1,]  -3.231808  -8.13791204 -13.044017 -17.950121  -22.856225  -27.762330
[2,] -14.072030 -39.33640851 -64.600787 -89.865165 -115.129543 -140.393921
[3,]  -1.750057  -0.02764783   1.694761   3.417170    5.139578    6.861987
[4,]   5.862412   9.78064218  13.698872  17.617103   21.535333   25.453563
> apply(x,1,mean)
[1] 13.5 14.5 15.5 16.5 17.5
> apply(x,2,sum)
[1]  15  40  65  90 115 140
> apply(x,2,prod)
[1]      120    30240   360360  1860480  6375600 17100720

Operations over the dimensions of an array

> x=array(1:24,c(4,3,2))
> apply(x,1,mean)
[1] 11 12 13 14
> apply(x,1:2,sum)
     [,1] [,2] [,3]
[1,]   14   22   30
[2,]   16   24   32
[3,]   18   26   34
[4,]   20   28   36
> apply(x,c(1,3),prod)
     [,1] [,2]
[1,]   45 4641
[2,]  120 5544
[3,]  231 6555
[4,]  384 7680

Operations between a matrix and a vector

> x=matrix(1:30,5,6)  # x is again the 5 x 6 matrix; the output below requires it
> sweep(x,1,1:5,"*")
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    6   11   16   21   26
[2,]    4   14   24   34   44   54
[3,]    9   24   39   54   69   84
[4,]   16   36   56   76   96  116
[5,]   25   50   75  100  125  150
> x*1:5  # same result: the vector is recycled down each column
> sweep(x,2,1:6,"+")
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    2    8   14   20   26   32
[2,]    3    9   15   21   27   33
[3,]    4   10   16   22   28   34
[4,]    5   11   17   23   29   35
[5,]    6   12   18   24   30   36

Operations between an array and a matrix/vector/array

> z=array(1:24,c(2,3,4))  # note the order in which the values are filled
> z
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18

, , 4

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24

> sweep(z,1,1:2,"-")
, , 1

     [,1] [,2] [,3]
[1,]    0    2    4
[2,]    0    2    4

, , 2

     [,1] [,2] [,3]
[1,]    6    8   10
[2,]    6    8   10

, , 3

     [,1] [,2] [,3]
[1,]   12   14   16
[2,]   12   14   16

, , 4

     [,1] [,2] [,3]
[1,]   18   20   22
[2,]   18   20   22

> sweep(z,c(1,2),matrix(1:6,2,3),"-")
, , 1

     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0

, , 2

     [,1] [,2] [,3]
[1,]    6    6    6
[2,]    6    6    6

, , 3

     [,1] [,2] [,3]
[1,]   12   12   12
[2,]   12   12   12

, , 4

     [,1] [,2] [,3]
[1,]   18   18   18
[2,]   18   18   18

Outer product (produces a matrix or an array)

> outer(1:2,rep(1,2))
     [,1] [,2]
[1,]    1    1
[2,]    2    2
> outer(1:2,matrix(rep(1,6),3,2))
, , 1

     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    2

, , 2

     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    2

List (set of objects)

A list can be a collection of any objects (including lists).

> z=list(1:3,Tom=c(1:2, a=list("R",letters[1:5]),w="hi!"))
> z[[1]]; z[[2]]; z$T; z$T$a2; z$T[[3]]; z$T$w  # z$T partially matches the name Tom
> attributes(z)  # attributes!
$names
[1] ""    "Tom"

> attributes(matrix(1:6,2,3))
$dim
[1] 2 3

Matrices, arrays, and their dimension names

> x=matrix(1:12,nrow=3,dimnames=list(c("I","II","III"),paste("X",1:4,sep="")))
> x
    X1 X2 X3 X4
I    1  4  7 10
II   2  5  8 11
III  3  6  9 12
> y=array(1:12,c(3,2,2),dimnames=list(c("I","II","III"),paste("X",1:2,sep=""),paste("Y",1:2,sep="")))
> y
, , Y1

    X1 X2
I    1  4
II   2  5
III  3  6

, , Y2

    X1 X2
I    7 10
II   8 11
III  9 12

data.frame

> x=matrix(1:6,2,3)
> x=as.data.frame(x); x
  V1 V2 V3
1  1  3  5
2  2  4  6
> x$V2
[1] 3 4
> attributes(x)
$names
[1] "V1" "V2" "V3"

$row.names
[1] "1" "2"

$class
[1] "data.frame"

> names(x)=c("TOYOTA","GM","HUNDA")
> row.names(x)=c("2001","2002")
> x
     TOYOTA GM HUNDA
2001      1  3     5
2002      2  4     6
> x$GM
[1] 3 4

> attach(x)
> GM
[1] 3 4
> detach(x)
> GM
Error: Object "GM" not found

Entering and editing data by hand

Type the values in directly:
x=c(1,2,7,8,…)
or
x=scan()
1 2 7 8 ….   (press Enter twice to finish)
fix(x)   (modify the data in an editor)

Categorical data

A survey asks people if they smoke or not. The data is
Yes, No, No, Yes, Yes

> x=c("Yes","No","No","Yes","Yes")
> table(x); x
> factor(x)

Barplot

Suppose a group of 25 people are surveyed as to their beer-drinking preference. The categories were (1) Domestic can, (2) Domestic bottle, (3) Microbrew and (4) Import. The raw data is
3 4 1 1 3 4 3 3 1 3 2 1 2 1 2 3 2 3 1 1 1 1 4 3 1

> beer = scan()
3 4 1 1 3 4 3 3 1 3 2 1 2 1 2 3 2 3 1 1 1 1 4 3 1
> barplot(beer)  # this isn't correct
> barplot(table(beer))  # Yes, call with summarized data
> barplot(table(beer)/length(beer))  # divide by n for proportion
> table(beer)/length(beer)

CEO salaries

Suppose CEO yearly compensations are sampled and the following are found (in millions). (This is before being indicted for cooking the books.)
12 .4 5 2 50 8 3 1 4 0.25

> sals = scan()  # read in with scan
12 .4 5 2 50 8 3 1 4 0.25
> mean(sals); var(sals); sd(sals); median(sals)
> fivenum(sals)  # min, lower hinge, median, upper hinge, max
> summary(sals)
> data=c(10, 17, 18, 25, 28, 28); summary(data); quantile(data,.25); quantile(data,c(.25,.75))

> sort(sals); fivenum(sals); summary(sals)
> mean(sals,trim=1/10); mean(sals,trim=2/10)
> IQR(sals)

MAD: median|Xi - median(X)| * 1.4826

> mad(sals)
> median(abs(sals - median(sals)))  # without the 1.4826 constant
> median(abs(sals - median(sals))) * 1.4826

Stem-and-leaf Charts

Suppose you have the box score of a basketball game and the following points per game for players on both teams
2 3 16 23 14 12 4 13 2 0 0 0 6 28 31 14 4 8 2 5

> scores = scan()
2 3 16 23 14 12 4 13 2 0 0 0 6 28 31 14 4 8 2 5
> apropos("stem")  # returns a character vector giving the names of all objects in the search list matching `what`; see also find("stem")
[1] "stem"        "system"      "system.file" "system.time"
> stem(scores); stem(scores,scale=2)

The salaries could be placed into broad categories of 0-1 million, 1-5 million and over 5 million. To do this using R one uses the cut() function and the table() function. Suppose the salaries are again
12 .4 5 2 50 8 3 1 4 .25
and we want to break that data into the intervals [0,1], (1,5], (5,50]

sals = c
