Introduction to the Statistical Software R

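Statistical software and the R language

Have you installed R yet?

A widely accepted definition of statistics

Statistics is a set of concepts, principles and methods for collecting data, analyzing data and drawing conclusions from data.

This definition determines the fate of statistics

Unlike mathematics and music, statistics cannot admire itself; if it does not serve practice, it has no reason to exist.
Statistics must serve every field.
Statistics must deal with data.
Therefore, statistics must be combined with computers.

Can one do "theoretical statistics" without getting one's hands on data?

A few decades ago, yes.

Without an applied background

Nobody wants the paper and nobody funds the grant. Nowadays some people will invent an applied background if they have to. Does purely theoretical statistics exist?

Statistics and computers

Modern life is inseparable from computers, and statistics was among their earliest users. The first computers were built solely for scientific computing, and statisticians were among the first users of mainframes. Today statistics remains one of the heaviest users of numerical computation.

Computers have long since moved beyond the single role of calculation and become part of everyday life. Using one no longer requires learning a programming language; pointing and clicking with the mouse is enough. Results have likewise moved from bare numeric output to many forms, including attractive tables and graphics.

Statistical software

The development of statistical software has turned statistics from an insiders' game for statisticians into a game for everyone. Enter the data, click a few times, choose some options, and you immediately get strikingly attractive results.

Can point-and-click statistical software replace statistics courses? Of course not. Organizing and identifying data, choosing methods and understanding the computer's output are not as simple and reliable as using a point-and-shoot camera.

Problems with statistical software

Legal and medical software carries plenty of warnings, reminding you now and then to consult an expert. That is the shrewdness of lawyers and doctors who mind their livelihood. Statistical software is less responsible: as long as the data format is right, the methods do not conflict and nothing is divided by zero, it will give you a result, without any warning. Perhaps statisticians lack business sense.

In addition, statistical software outputs too much. Even for the same method, different packages print different content, and sometimes the same quantity appears under different names. This gives users a headache; even statisticians cannot necessarily explain all the output. So take special care and know what you are doing. Do not be pleased with yourself after obtaining a pile of meaningless garbage.

Type a few lines of SAS and five numbers at random:

    data test;
    input x;
    cards;
    1
    2
    3
    17
    60
    run;
    proc univariate freq normal;
    run;

and you get the result below, more than 50 numbers in all (how many can you explain? how many do you need?).

    The SAS System          15:33 Friday, September 12, 2003   1

    Univariate Procedure
    Variable=X

              Moments                              Quantiles(Def=5)
    N                5   Sum Wgts         5        100% Max   60     99%   60
    Mean          16.6   Sum             83         75% Q3    17     95%   60
    Std Dev   25.12568   Variance     631.3         50% Med    3     90%   60
    Skewness  1.899804   Kurtosis  3.563057         25% Q1     2     10%    1
    USS           3903   CSS         2525.2          0% Min    1      5%    1
    CV        151.3595   Std Mean  11.23655                           1%    1
    T:Mean=0  1.477322   Pr>|T|      0.2136         Range     59
    Num ^= 0         5   Num > 0          5         Q3-Q1     15
    M(Sign)        2.5   Pr>=|M|     0.0625         Mode       1
    Sgn Rank       7.5   Pr>=|S|     0.0625
    W:Normal  0.726472   Pr<W  ...

Basic operations in R

Arithmetic operators: +, -, *, /, ^, %*% (matrix product), %% (mod), %/% (integer division), and so on.

Common mathematical functions: abs, sign, log, log2, log10, logb, expm1, log1p, sqrt, exp, sin, cos, tan, acos, asin, atan, cosh, sinh, tanh

Rounding and gamma-related functions: round, floor, ceiling, gamma, lgamma, digamma and trigamma.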
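The arithmetic operators and functions listed above can be tried directly at the R prompt. The following is a minimal sketch; the numbers and the variable names (a, b, A, v and so on) are illustrative assumptions, not taken from the slides.

```r
# Integer division and remainder: a == (a %/% b) * b + (a %% b)
a <- 17
b <- 5
q <- a %/% b          # integer quotient
r <- a %% b           # remainder
recon <- q * b + r    # rebuilds a from quotient and remainder

# %*% is the matrix product, while * works element by element
A <- matrix(1:4, 2, 2)     # filled column by column: rows are (1, 3) and (2, 4)
v <- c(1, 1)
Av <- A %*% v              # 2 x 1 matrix holding the row sums of A
elementwise <- A * A       # squares each entry of A

# log1p(x) computes log(1 + x) accurately when x is tiny
small <- 1e-10
lp <- log1p(small)
```

Comparing `A %*% v` with `A * A` shows why the two operators should not be confused: only the first follows the rules of linear algebra.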
More functions

    sum, prod, cumsum, cumprod
    max, min, cummax, cummin, pmax, pmin, range
    mean, length, var, duplicated, unique
    union, intersect, setdiff
    >, >=, <, <=, &, |, !

Data input and output: scan, read.table, dump, save, load, write, write.table

    letters, LETTERS
    list, matrix, array, cbind, rbind, merge
    sort, order, sort.list, rev, stack, unstack, reshape

Sequences and vectors

    > z=seq(-1,10,length=100)
    > z=seq(-1,10,len=100)      # same call; argument names may be abbreviated
    > z=seq(10,-1,-1)
    > z=10:-1                   # same as seq(10,-1,-1)
    > x=rep(3,1:3)              # fails: a vector of times needs an x of the same length
    > x=rep(3:5,1:3)
    > x
    [1] 3 4 4 5 5 5
    > x=rep(c(1,10),c(4,5))
    > w=c(1,3,x,z); w[3]

Distributions and random number generation

Normal distribution:

    > pnorm(1.2,2,1); dnorm(1.2,2,1); qnorm(.7,2,1); rnorm(10,0,1)   # rnorm(10,0,1) is the same as rnorm(10)

t distribution:

    > pt(1.2,1); dt(1.2,2); qt(.7,1); rt(10,1)

Also available are the exponential, F, chi-squared, Beta, binomial, Cauchy, Gamma, geometric, hypergeometric, lognormal, logistic, negative binomial, Poisson, uniform, Weibull and Wilcoxon distributions, among others. The arguments can be vectors!

Vector arithmetic

    > x=rep(0,10); z=1:3; x+z
    [1] 1 2 3 1 2 3 1 2 3 1
    Warning message:
    longer object length is not a multiple of shorter object length in: x + z
    > x*z
    [1] 0 0 0 0 0 0 0 0 0 0
    Warning message:
    longer object length is not a multiple of shorter object length in: x * z
    > rev(x)
    > z=c("no cat","has ","nine","tails")
    > z[1]=="no cat"
    [1] TRUE

Vector names and append

    > x=1:3; names(x)=LETTERS[1:3]
    > x
    A B C
    1 2 3
    > append(x,runif(3),after=2)
            A         B                                       C
    1.0000000 2.0000000 0.3107987 0.7505149 0.5752226 3.0000000

Vector assignment

    > z=1:5
    > z[7]=8; z
    [1]  1  2  3  4  5 NA  8
    > z=NULL
    > z[c(1,3,5)]=1:3; z
    [1]  1 NA  2 NA  3
    > rnorm(10)[c(2,5)]
    > z[-c(1,3)]                    # drop elements 1 and 3
    > z[(length(z)-4):length(z)]    # the last five elements

Ordering vectors by size

    > z=sample(1:100,10); z         # compare with sample(1:100,10,replace=TRUE)
    [1] 75 68 28 42 17 21 96 34 69 47
    > order(z)
    [1]  5  6  3  8  4 10  2  9  1  7
    > z[order(z)]
    [1] 17 21 28 34 42 47 68 69 75 96
    > sort(z)
    [1] 17 21 28 34 42 47 68 69 75 96
    > which(z==max(z))              # gives the index of the maximum

Matrix

    > x=matrix(runif(20),4,5)
    > x
              [,1]       [,2]       [,3]      [,4]       [,5]
    [1,] 0.7983678 0.04607601 0.04555323 0.8594483 0.73089500
    [2,] 0.6559851 0.79562222 0.02948270 0.1453364 0.79552838
    [3,] 0.6759171 0.56193147 0.48286653 0.2419931 0.56069988
    [4,] 0.1183701 0.80652627 0.49405167 0.6523137 0.08345406
    > x=matrix(1:20,4,5); x
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    5    9   13   17
    [2,]    2    6   10   14   18
    [3,]    3    7   11   15   19
    [4,]    4    8   12   16   20
    > x=matrix(1:20,4,5,byrow=T); x
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    2    3    4    5
    [2,]    6    7    8    9   10
    [3,]   11   12   13   14   15
    [4,]   16   17   18   19   20

Some simple functions (and the constant pi)

max, min, length, mean, median, fivenum, quantile, unique, sd, var, range, rep, diff, sort, order, sum, cumsum, prod, cumprod, rev, print, sample, seq, exp, pi

Rows and columns of a matrix (subsets)

    > nrow(x); ncol(x); dim(x)   # numbers of rows and columns
    > x=matrix(rnorm(24),4,6)
    > x[c(2,1),]                 # rows 2 and 1
    > x[,c(1,3)]                 # columns 1 and 3
    > x[2,1]                     # element [2,1]
    > x[x[,1]>0,1]               # elements of column 1 that are greater than 0
    > sum(x[,1]>0)               # number of elements of column 1 greater than 0
    > sum(x[,1]<=0)              # number of elements of column 1 not greater than 0
    > x[,-c(1,3)]                # x without columns 1 and 3
    > x[-2,-c(1,3)]              # x without row 2 and without columns 1 and 3

Subsets of matrices and vectors

    > x[x[,1]>0&x[,3]<=1,1]      # elements of column 1 that are > 0 and whose column-3 entry is <= 1 ("and")
    > x[x[,2]>0|x[,1]<.51,1]     # elements of column 1 that are < .51 or whose column-2 entry is > 0 ("or")
    > x[!x[,2]<.51,1]            # elements of column 1 whose column-2 entry is not less than .51 ("not")

Comparison and logical operators: >, <, ==, <=, >=, !=; &, |, !
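The subsetting rules above are easier to check on a fixed matrix than on random numbers. The sketch below uses a small matrix m whose entries are chosen purely for illustration (they do not come from the slides), so every result can be verified by eye.

```r
# A 4 x 3 matrix filled column by column:
#   column 1: -1, 2, 3, -4
#   column 2: 0.2, 0.9, -0.5, 0.7
#   column 3: 0.5, 2, 1, 0
m <- matrix(c(-1, 2, 3, -4,
              0.2, 0.9, -0.5, 0.7,
              0.5, 2, 1, 0), 4, 3)

# "and": column-1 entries that are > 0 and whose column-3 entry is <= 1
and_sel <- m[m[, 1] > 0 & m[, 3] <= 1, 1]

# "or": column-1 entries that are < 0.51 or whose column-2 entry is > 0
or_sel <- m[m[, 2] > 0 | m[, 1] < 0.51, 1]

# "not": column-1 entries whose column-2 entry is not less than 0.51
# (the parentheses are optional, since ! binds more loosely than <)
not_sel <- m[!(m[, 2] < 0.51), 1]

# A logical vector sums as 0/1, which counts the TRUE entries
n_pos <- sum(m[, 1] > 0)
```

Each condition is evaluated row by row and yields a logical vector of length 4; that vector then selects rows, and the trailing `1` picks column 1.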
all() and any() reduce a logical vector to a single value:

    > x=rnorm(10)
    > all(x>0); all(x!=0); any(x>0); (1:10)[x>0]
    > x=sample(1:7,5,replace=TRUE); unique(x)

Transpose and inverse of a matrix

    > x=matrix(runif(9),3,3); x
              [,1]      [,2]      [,3]
    [1,] 0.6747652 0.9954731 0.7524502
    [2,] 0.3090199 0.2390141 0.2472961
    [3,] 0.5102675 0.9515505 0.6082803
    > t(x)
              [,1]      [,2]      [,3]
    [1,] 0.6747652 0.3090199 0.5102675
    [2,] 0.9954731 0.2390141 0.9515505
    [3,] 0.7524502 0.2472961 0.6082803
    > solve(x)                    # solve(a,b) solves the equation ax=b
               [,1]       [,2]       [,3]
    [1,] -12.313293  15.125819   9.082300
    [2,]  -8.459725   3.627898   8.989864
    [3,]  23.563034 -18.363808 -20.037986

Warning: what is 0 on a computer?

    > x%*%solve(x)
                  [,1]          [,2]          [,3]
    [1,]  1.000000e+00 -9.454243e-17 -3.911801e-16
    [2,]  5.494737e-16  1.000000e+00  3.248270e-16
    [3,] -3.018419e-16  1.804980e-15  1.000000e+00

The off-diagonal entries should be exactly 0, but floating-point rounding leaves values of order 1e-16. Questions such as how many nonzero eigenvalues a matrix has must therefore be settled with linear algebra. If v is the vector of eigenvalues, do not count the nonzero ones with something like sum(v!=0)!

Matrix & Array

    > x=array(runif(20),c(4,5)); x
              [,1]      [,2]        [,3]      [,4]      [,5]
    [1,] 0.5474306 0.2362356 0.687007107 0.4036998 0.5255839
    [2,] 0.8234363 0.4922711 0.960554564 0.4704976 0.1327870
    [3,] 0.1861151 0.8461655 0.390523424 0.2202575 0.4057607
    [4,] 0.8117521 0.5375946 0.004505845 0.4821567 0.7644741
    > is.matrix(x)
    [1] TRUE
    > x[1,2]
    > x[1,]
    > x[,2]
    > dim(x)                      # gives the dimensions (4,5)

Array

    > x=array(runif(24),c(4,3,2))
    > is.matrix(x)                # dim(x) gives the dimensions (4,3,2)
    [1] FALSE
    > x
    , , 1

              [,1]      [,2]        [,3]
    [1,] 0.3512615 0.7270611 0.009055522
    [2,] 0.1444965 0.2527673 0.697977027
    [3,] 0.6658176 0.6638542 0.773747542
    [4,] 0.4258436 0.4168940 0.634235148

    , , 2

              [,1]      [,2]      [,3]
    [1,] 0.3664152 0.9633497 0.5628006
    [2,] 0.3466645 0.5036830 0.1542986
    [3,] 0.4552553 0.1289775 0.8423017
    [4,] 0.1074899 0.3841463 0.7648297

Subsets of an array

    > x=array(1:24,c(4,3,2))
    > x[c(1,3),,]
    , , 1

         [,1] [,2] [,3]
    [1,]    1    5    9
    [2,]    3    7   11

    , , 2

         [,1] [,2] [,3]
    [1,]   13   17   21
    [2,]   15   19   23

Matrix multiplication and row/column operations

    > x=matrix(1:30,5,6); y=matrix(rnorm(20),4,5)
    > y%*%x
               [,1]         [,2]       [,3]       [,4]        [,5]        [,6]
    [1,]  -3.231808  -8.13791204 -13.044017 -17.950121  -22.856225  -27.762330
    [2,] -14.072030 -39.33640851 -64.600787 -89.865165 -115.129543 -140.393921
    [3,]  -1.750057  -0.02764783   1.694761   3.417170    5.139578    6.861987
    [4,]   5.862412   9.78064218  13.698872  17.617103   21.535333   25.453563
    > apply(x,1,mean)
    [1] 13.5 14.5 15.5 16.5 17.5
    > apply(x,2,sum)
    [1]  15  40  65  90 115 140
    > apply(x,2,prod)
    [1]      120    30240   360360  1860480  6375600 17100720

Operations along the dimensions of an array

    > x=array(1:24,c(4,3,2))
    > apply(x,1,mean)
    [1] 11 12 13 14
    > apply(x,1:2,sum)
         [,1] [,2] [,3]
    [1,]   14   22   30
    [2,]   16   24   32
    [3,]   18   26   34
    [4,]   20   28   36
    > apply(x,c(1,3),prod)
         [,1] [,2]
    [1,]   45 4641
    [2,]  120 5544
    [3,]  231 6555
    [4,]  384 7680

Operations between a matrix and a vector

    > x=matrix(1:30,5,6)
    > sweep(x,1,1:5,"*")
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    1    6   11   16   21   26
    [2,]    4   14   24   34   44   54
    [3,]    9   24   39   54   69   84
    [4,]   16   36   56   76   96  116
    [5,]   25   50   75  100  125  150
    > x*1:5                       # same result: 1:5 is recycled down each column
    > sweep(x,2,1:6,"+")
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    2    8   14   20   26   32
    [2,]    3    9   15   21   27   33
    [3,]    4   10   16   22   28   34
    [4,]    5   11   17   23   29   35
    [5,]    6   12   18   24   30   36

Operations between an array and a matrix, vector or array

    > z=array(1:24,c(2,3,4))      # note the order in which the values are filled in
    > z
    , , 1

         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    2    4    6

    , , 2

         [,1] [,2] [,3]
    [1,]    7    9   11
    [2,]    8   10   12

    , , 3

         [,1] [,2] [,3]
    [1,]   13   15   17
    [2,]   14   16   18

    , , 4

         [,1] [,2] [,3]
    [1,]   19   21   23
    [2,]   20   22   24

    > sweep(z,1,1:2,"-")
    , , 1

         [,1] [,2] [,3]
    [1,]    0    2    4
    [2,]    0    2    4

    , , 2

         [,1] [,2] [,3]
    [1,]    6    8   10
    [2,]    6    8   10

    , , 3

         [,1] [,2] [,3]
    [1,]   12   14   16
    [2,]   12   14   16

    , , 4

         [,1] [,2] [,3]
    [1,]   18   20   22
    [2,]   18   20   22

    > sweep(z,c(1,2),matrix(1:6,2,3),"-")
    , , 1

         [,1] [,2] [,3]
    [1,]    0    0    0
    [2,]    0    0    0

    , , 2

         [,1] [,2] [,3]
    [1,]    6    6    6
    [2,]    6    6    6

    , , 3

         [,1] [,2] [,3]
    [1,]   12   12   12
    [2,]   12   12   12

    , , 4

         [,1] [,2] [,3]
    [1,]   18   18   18
    [2,]   18   18   18

Outer product (produces a matrix or an array)

    > outer(1:2,rep(1,2))
         [,1] [,2]
    [1,]    1    1
    [2,]    2    2
    > outer(1:2,matrix(rep(1,6),3,2))
    , , 1

         [,1] [,2] [,3]
    [1,]    1    1    1
    [2,]    2    2    2

    , , 2

         [,1] [,2] [,3]
    [1,]    1    1    1
    [2,]    2    2    2

List (set of objects)

A list can be a collection of any objects (including other lists).

    > z=list(1:3,Tom=c(1:2, a=list("R",letters[1:5]),w="hi!"))
    > z[[1]]; z[[2]]; z$T; z$T$a2; z$T[[3]]; z$T$w    # z$T matches Tom by partial matching
    > attributes(z)               # attributes!
    $names
    [1] ""    "Tom"

    > attributes(matrix(1:6,2,3))
    $dim
    [1] 2 3

Matrices, arrays and their dimension names

    > x=matrix(1:12,nrow=3,dimnames=list(c("I","II","III"),paste("X",1:4,sep="")))
    > x
        X1 X2 X3 X4
    I    1  4  7 10
    II   2  5  8 11
    III  3  6  9 12
    > y=array(1:12,c(3,2,2),dimnames=list(c("I","II","III"),paste("X",1:2,sep=""),paste("Y",1:2,sep="")))
    > y
    , , Y1

        X1 X2
    I    1  4
    II   2  5
    III  3  6

    , , Y2

        X1 X2
    I    7 10
    II   8 11
    III  9 12

data.frame

    > x=matrix(1:6,2,3)
    > x=as.data.frame(x); x
      V1 V2 V3
    1  1  3  5
    2  2  4  6
    > x$V2
    [1] 3 4
    > attributes(x)
    $names
    [1] "V1" "V2" "V3"

    $row.names
    [1] "1" "2"

    $class
    [1] "data.frame"

    > names(x)=c("TOYOTA","GM","HUNDA")
    > row.names(x)=c("2001","2002")
    > x
         TOYOTA GM HUNDA
    2001      1  3     5
    2002      2  4     6
    > x$GM
    [1] 3 4

    > attach(x)
    > GM
    [1] 3 4
    > detach(x)
    > GM
    Error: Object "GM" not found

Entering and editing data by hand

Type the values directly: x=c(1,2,7,8,…), or use x=scan() and then enter 1 2 7 8 …. (press Enter twice to finish).
fix(x) opens an editor for changing the data.

Categorical data

A survey asks people if they smoke or not. The data is Yes, No, No, Yes, Yes.

    > x=c("Yes","No","No","Yes","Yes")
    > table(x); x
    > factor(x)

Barplot

Suppose a group of 25 people are surveyed as to their beer-drinking preference. The categories were (1) Domestic can, (2) Domestic bottle, (3) Microbrew and (4) Import. The raw data is

    3 4 1 1 3 4 3 3 1 3 2 1 2 1 2 3 2 3 1 1 1 1 4 3 1

    > beer = scan()
    3 4 1 1 3 4 3 3 1 3 2 1 2 1 2 3 2 3 1 1 1 1 4 3 1
    > barplot(beer)                       # this isn't correct
    > barplot(table(beer))                # yes, call with summarized data
    > barplot(table(beer)/length(beer))   # divide by n for proportion
    > table(beer)/length(beer)

CEO salaries

Suppose CEO yearly compensations are sampled and the following are found (in millions). (This is before being indicted for cooking the books.)
    12 .4 5 2 50 8 3 1 4 0.25

    > sals = scan()                       # read in with scan
    12 .4 5 2 50 8 3 1 4 0.25
    > mean(sals); var(sals); sd(sals); median(sals)
    > fivenum(sals)                       # min, lower hinge, median, upper hinge, max
    > summary(sals)
    > data=c(10, 17, 18, 25, 28, 28); summary(data); quantile(data,.25); quantile(data,c(.25,.75))

    > sort(sals); fivenum(sals); summary(sals)
    > mean(sals,trim=1/10); mean(sals,trim=2/10)
    > IQR(sals)

MAD: 1.4826 × median|Xi - median(X)|

    > mad(sals)
    > median(abs(sals - median(sals)))            # without the 1.4826 scale factor
    > median(abs(sals - median(sals))) * 1.4826   # with the factor, as in mad(sals)

Stem-and-leaf charts

Suppose you have the box score of a basketball game and the following points per game for players on both teams:

    2 3 16 23 14 12 4 13 2 0 0 0 6 28 31 14 4 8 2 5

    > scores = scan()
    2 3 16 23 14 12 4 13 2 0 0 0 6 28 31 14 4 8 2 5
    > apropos("stem")
    [1] "stem"        "system"      "system.file" "system.time"
    > stem(scores); stem(scores,scale=2)

`apropos` returns a character vector giving the names of all objects in the search list matching `what`. See also find("stem").

The salaries could be placed into broad categories of 0-1 million, 1-5 million and over 5 million. To do this in R, use the cut() function and the table() function. Suppose the salaries are again

    12 .4 5 2 50 8 3 1 4 .25

and we want to break the data into the intervals [0, 1], (1, 5], (5, 50].

    sals = c
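The binning described above can be sketched as follows. This is one possible way to do it, not necessarily the original author's code: the choice of include.lowest = TRUE (so that the first interval is closed at 0) and the variable names cats and counts are assumptions made here for illustration.

```r
# CEO salaries in millions, as given in the text
sals <- c(12, 0.4, 5, 2, 50, 8, 3, 1, 4, 0.25)

# cut() assigns each value to an interval; by default intervals are
# open on the left and closed on the right, i.e. (a, b].
# include.lowest = TRUE closes the first interval at its left end: [0, 1]
cats <- cut(sals, breaks = c(0, 1, 5, 50), include.lowest = TRUE)

# table() then counts how many salaries fall into each interval
counts <- table(cats)
print(counts)
```

Because cut() returns a factor, the result can be passed straight to barplot(table(cats)) in the same way as the beer data earlier.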
