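Statistical Software and the R Language
Have you installed R yet?

A widely accepted definition of statistics
Statistics is a set of concepts, principles, and methods used to collect data, analyze data, and draw conclusions from data.

This definition determines the fate of statistics
Unlike mathematics and music, statistics cannot be admired for its own sake. If it does not serve practice, it has no reason to exist.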
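Statistics must serve every field.
Statistics must work with data.
Therefore, statistics must be combined with computers.

Can one do "theoretical statistics" without getting one's hands on data?
A few decades ago, one could.
Today, without an applied background:
papers find no takers,
and grants go to no one.
Some people now invent an applied background if they have to.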
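Does purely theoretical statistics exist?

Statistics and Computers
Modern life can no longer do without computers.
Statisticians were among their earliest users.
The first computers were built purely for scientific computation, and the first users of mainframes included statisticians.
Statistics remains one of the heaviest users of numerical computation.
Computers have long since moved beyond pure calculation and become part of everyday life.
Using a computer once required learning a programming language; now it takes only point-and-click. Output, too, has grown from bare numbers to polished tables and graphics.

Statistical Software
The growth of statistical software has turned statistics from an insiders' game for statisticians into a game for everyone.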
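Enter the data, click the mouse a few times, pick some options, and impressive-looking results appear at once.
Can point-and-click statistical software replace a statistics course?
Of course not.
Organizing and identifying the data, choosing a method, and interpreting the output are nowhere near as simple or reliable as using a point-and-shoot camera.

Problems with Statistical Software
Legal and medical software carries plenty of warnings and keeps reminding you to consult an expert.
That is the shrewdness of lawyers and doctors who look after their livelihoods.
Statistical software is less responsible. As long as the data format is valid, the methods do not contradict each other, and nothing is divided by zero, it will give you a result, with no warning at all.
Perhaps statisticians lack business sense.
In addition, statistical software prints too much output.
Even for the same method, different packages print different things, and sometimes the same quantity appears under different names.
This gives users a headache. Even statisticians cannot always explain every item in the output.
So take special care and know what you are doing. Do not congratulate yourself after producing a pile of meaningless garbage.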
data test;
input x;
cards;
1
2
3
17
60
;
run;
proc univariate freq normal;
run;

Typing in a few lines of SAS and five numbers produces the output below, more than 50 numbers in all. (How many can you explain? How many do you need?)

The SAS System 15:33 Friday, September 12, 2003 1
Univariate Procedure
Variable=X
Moments                                        Quantiles(Def=5)
N         5          Sum Wgts  5               100% Max  60     99%  60
Mean      16.6       Sum       83              75% Q3    17     95%  60
Std Dev   25.12568   Variance  631.3           50% Med   3      90%  60
Skewness  1.899804   Kurtosis  3.563057        25% Q1    2      10%  1
USS       3903       CSS       2525.2          0% Min    1      5%   1
CV        151.3595   Std Mean  11.23655                         1%   1
T:Mean=0  1.477322   Pr>|T|    0.2136          Range     59
Num ^= 0  5          Num > 0   5               Q3-Q1     15
M(Sign)   2.5        Pr>=|M|   0.0625          Mode      1
Sgn Rank  7.5        Pr>=|S|   0.0625
W:Normal 0.726472 Pr
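For comparison, a rough R counterpart of the SAS run above might look like the sketch below. It is an illustration only, not a reproduction of PROC UNIVARIATE: using t.test and shapiro.test as stand-ins is an assumption, and R's normality statistic need not match the W printed by SAS. Every call returns a result without complaint, even though there are only five very skewed observations.

```r
# The five numbers from the SAS example above
x <- c(1, 2, 3, 17, 60)

m  <- mean(x)             # sample mean
s  <- sd(x)               # sample standard deviation
tt <- t.test(x, mu = 0)   # one-sample t test of mean = 0
sw <- shapiro.test(x)     # Shapiro-Wilk normality test

m; s
tt$statistic; tt$p.value
sw$statistic; sw$p.value
```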
y->w
Basic arithmetic operators: +, -, *, /, ^, %*% (matrix product), %% (mod), %/% (integer division), and so on.
Common mathematical functions: abs, sign, log, log2, log10, logb, expm1, log1p, sqrt, exp, sin, cos, tan, acos, asin, atan, cosh, sinh, tanh

More Functions
round, floor, ceiling
gamma , lgamma, digamma and trigamma.
sum, prod, cumsum, cumprod
max, min, cummax, cummin, pmax, pmin, range
mean, length, var, duplicated, unique
union, intersect, setdiff
>, >=, <, <=, &, |, !

More Functions (continued)
Data input and output: scan, read.table, dump, save, load, write, write.table
letters, LETTERS
list, matrix, array, cbind, rbind, merge
sort, order, sort.list, rev, stack, unstack , reshape
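A few of the operators and functions listed above, tried on small inputs. This is a minimal sketch; the variable names are arbitrary.

```r
r <- 7 %% 3          # remainder of 7 divided by 3
q <- 7 %/% 3         # integer quotient of 7 divided by 3

A <- matrix(1:4, 2, 2)
P <- A %*% A         # matrix product (A * A would multiply elementwise)

cs <- cumsum(1:5)                    # running totals
pm <- pmax(c(1, 5, 3), c(4, 2, 6))   # elementwise maximum of two vectors
u  <- union(1:3, 2:5)                # set union
d  <- setdiff(1:5, 2:3)              # elements of 1:5 that are not in 2:3

r; q; P; cs; pm; u; d
```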

Sequences and Vectors
z=seq(-1,10,length=100)
z=seq(-1,10, len=100)
z=seq(10,-1,-1)
z=10:-1
x=rep(3,1:3) #fails: the times argument must have length 1 or the same length as the first argument
x=rep(3:5,1:3)
> x
[1] 3 4 4 5 5 5
x=rep(c(1,10),c(4,5))
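w=c(1,3,x,z);w[3]

Distributions and Random Numbers
Normal distribution:
pnorm(1.2,2,1);dnorm(1.2,2,1); qnorm(.7,2,1);rnorm(10,0,1) #same as rnorm(10)
t distribution: pt(1.2,1);dt(1.2,2);qt(.7,1);rt(10,1)
Also available: exponential, F, chi-square, Beta, binomial, Cauchy, Gamma, geometric, hypergeometric, log-normal, logistic, negative binomial, Poisson, uniform, Weibull, and Wilcoxon distributions, among others.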
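The function names follow one pattern: d for the density, p for the cumulative distribution function, q for the quantile function, and r for random draws. A small sketch, reusing the N(2, 1) parameters from the calls above (the seed value is an arbitrary choice):

```r
# p and q are inverses of each other
q70 <- qnorm(0.7, mean = 2, sd = 1)
p70 <- pnorm(q70, mean = 2, sd = 1)    # recovers 0.7

# arguments may be vectors: three probabilities in one call
pv <- pnorm(c(1, 2, 3), mean = 2, sd = 1)

# the same naming pattern for another distribution
pb <- pbinom(2, size = 5, prob = 0.5)  # P(X <= 2) for Binomial(5, 0.5)

set.seed(1)                            # arbitrary seed, for reproducibility
r <- rnorm(10)                         # 10 standard normal draws

q70; p70; pv; pb; r
```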
Arguments can be vectors!

Vector Arithmetic
x=rep(0,10);z=1:3;x+z
[1] 1 2 3 1 2 3 1 2 3 1
Warning message: longer object length is not a multiple of shorter object length in: x + z
x*z
[1] 0 0 0 0 0 0 0 0 0 0
Warning message: longer object length is not a multiple of shorter object length in: x * z
rev(x)
z=c("no cat","has ","nine","tails")
z[1]=="no cat"
[1] TRUE

Vector Names and append
x=1:3;names(x)=LETTERS[1:3]
x
A B C
1 2 3
append(x,runif(3),after=2)
A B C
1.0000000 2.0000000 0.3107987 0.7505149 0.5752226 3.0000000
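Names can also be used as indices, and elements inserted by append carry empty names. A small sketch with fixed values in place of runif(3), so the result is reproducible:

```r
x <- 1:3
names(x) <- LETTERS[1:3]

b  <- x["B"]                           # index by name
y  <- append(x, c(10, 20), after = 2)  # insert after the 2nd element
nm <- names(y)                         # inserted elements have empty names

b; y; nm
```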

Assigning to Vector Elements
z=1:5
z[7]=8;z
[1] 1 2 3 4 5 NA 8
z=NULL
z[c(1,3,5)]=1:3; z
[1] 1 NA 2 NA 3
rnorm(10)[c(2,5)]
z[-c(1,3)] #drop elements 1 and 3
z[(length(z)-4):length(z)] #the last five elements

Ordering Vector Elements
z=sample(1:100,10);z #compare with sample(1:100,10,rep=T)
[1] 75 68 28 42 17 21 96 34 69 47
order(z)
[1] 5 6 3 8 4 10 2 9 1 7
z[order(z)]
[1] 17 21 28 34 42 47 68 69 75 96
sort(z)
[1] 17 21 28 34 42 47 68 69 75 96
which(z==max(z)) #index of the maximum

Matrix
x=matrix(runif(20),4,5)
> x
[,1] [,2] [,3] [,4] [,5]
[1,] 0.7983678 0.04607601 0.04555323 0.8594483 0.73089500
[2,] 0.6559851 0.79562222 0.02948270 0.1453364 0.79552838
[3,] 0.6759171 0.56193147 0.48286653 0.2419931 0.56069988
[4,] 0.1183701 0.80652627 0.49405167 0.6523137 0.08345406
> x=matrix(1:20,4,5);x
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
> x=matrix(1:20,4,5,byrow=T);x
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 11 12 13 14 15
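[4,] 16 17 18 19 20

Some Simple Functions
max, min, length, mean, median, fivenum, quantile, unique, sd, var, range, rep, diff, sort, order, sum, cumsum, prod, cumprod, rev, print, sample, seq, exp, pi

Rows and Columns of a Matrix (Subsets)
nrow(x); ncol(x); dim(x) #numbers of rows and columns
x=matrix(rnorm(24),4,6)
x[c(2,1),] #rows 2 and 1
x[,c(1,3)] #columns 1 and 3
x[2,1] #element [2,1]
x[x[,1]>0,1] #elements of column 1 that are greater than 0
sum(x[,1]>0) #number of elements of column 1 greater than 0
sum(x[,1]<=0) #number of elements of column 1 not greater than 0
x[,-c(1,3)] #x without columns 1 and 3
x[-2,-c(1,3)] #x without row 2 and columns 1 and 3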
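One detail worth knowing when subsetting: selecting a single row or column returns a plain vector by default, and drop=FALSE keeps the result a matrix. A short sketch with a deterministic matrix (the values are chosen only for illustration):

```r
x <- matrix(1:12, 3, 4)

v <- x[, 2]                 # simplified to a plain vector
m <- x[, 2, drop = FALSE]   # still a 3 x 1 matrix

big <- x[x[, 1] > 1, ]      # rows whose first-column entry exceeds 1
n   <- sum(x[, 1] > 1)      # how many such rows there are

is.matrix(v); dim(m); big; n
```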
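
Subsets of Matrices and Vectors
x[x[,1]>0&x[,3]<=1,1] #elements of column 1 that are > 0 and whose column-3 entries are <= 1 ("and")
x[x[,2]>0|x[,1]<.51,1] #elements of column 1 that are < .51 or whose column-2 entries are > 0 ("or")
x[!x[,2]<.51,1] #elements of column 1 whose column-2 entries are not < .51 ("not")
Logical operators: >, <, ==, <=, >=, !=; &, |, !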
x=rnorm(10)
all(x>0);all(x!=0);any(x>0);(1:10)[x>0]
x=sample(1:7,5,rep=T);unique(x)

Matrix Transpose and Inverse
x=matrix(runif(9),3,3);x
[,1] [,2] [,3]
[1,] 0.6747652 0.9954731 0.7524502
[2,] 0.3090199 0.2390141 0.2472961
[3,] 0.5102675 0.9515505 0.6082803
t(x)
[,1] [,2] [,3]
[1,] 0.6747652 0.3090199 0.5102675
[2,] 0.9954731 0.2390141 0.9515505
[3,] 0.7524502 0.2472961 0.6082803
solve(x) # solve(a,b) solves the equation ax=b
[,1] [,2] [,3]
[1,] -12.313293 15.125819 9.082300
[2,] -8.459725 3.627898 8.989864
[3,] 23.563034 -18.363808 -20.037986

Warning: What Is 0 on a Computer?
x%*%solve(x)
[,1] [,2] [,3]
[1,] 1.000000e+00 -9.454243e-17 -3.911801e-16
[2,] 5.494737e-16 1.000000e+00 3.248270e-16
[3,] -3.018419e-16 1.804980e-15 1.000000e+00
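The off-diagonal entries above are rounding error, not exact zeros. One common way to handle this is to compare against a tolerance instead of testing for exact equality. In the sketch below, the example matrix and the relative cutoff of 1e-8 are assumptions chosen for illustration:

```r
# Floating-point arithmetic is not exact
eq_exact <- (0.1 + 0.2 == 0.3)                 # FALSE
eq_tol   <- isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE: equal up to a tolerance

# A symmetric 3 x 3 matrix of rank 2; its exact eigenvalues are 3, 1, 0
M <- matrix(c(1, 0, 0, 1, 1, 1), 2, 3)
S <- crossprod(M)                              # t(M) %*% M
v <- eigen(S, symmetric = TRUE)$values

# Count eigenvalues that are nonzero relative to the largest one
tol       <- 1e-8 * max(abs(v))                # assumed cutoff
n_nonzero <- sum(abs(v) > tol)

eq_exact; eq_tol; v; n_nonzero
```

The computed third eigenvalue may or may not print as exactly 0, which is why the count uses the tolerance and not v != 0.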
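Use linear algebra, not exact comparison, to answer questions such as how many eigenvalues are nonzero. If v is the vector of eigenvalues, do not count the nonzero ones with something like sum(v!=0)!

Matrix & Array
x=array(runif(20),c(4,5)); x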
[,1] [,2] [,3] [,4] [,5]
[1,] 0.5474306 0.2362356 0.687007107 0.4036998 0.5255839
[2,] 0.8234363 0.4922711 0.960554564 0.4704976 0.1327870
[3,] 0.1861151 0.8461655 0.390523424 0.2202575 0.4057607
[4,] 0.8117521 0.5375946 0.004505845 0.4821567 0.7644741
is.matrix(x)
[1] TRUE
x[1,2]
x[1,]
x[,2]
dim(x) #gives the dimensions (4,5)

Array
x=array(runif(24),c(4,3,2))
is.matrix(x) #dim(x) gives the dimensions (4,3,2)
[1] FALSE
x
, , 1
[,1] [,2] [,3]
[1,] 0.3512615 0.7270611 0.009055522
[2,] 0.1444965 0.2527673 0.697977027
[3,] 0.6658176 0.6638542 0.773747542
[4,] 0.4258436 0.4168940 0.634235148
, , 2
[,1] [,2] [,3]
[1,] 0.3664152 0.9633497 0.5628006
[2,] 0.3466645 0.5036830 0.1542986
[3,] 0.4552553 0.1289775 0.8423017
[4,] 0.1074899 0.3841463 0.7648297

Subsets of an Array
> x=array(1:24,c(4,3,2))
x[c(1,3),,]
, , 1
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 3 7 11
, , 2
[,1] [,2] [,3]
[1,] 13 17 21
[2,] 15 19 23

Matrix Multiplication and Row/Column Operations
x=matrix(1:30,5,6);y=matrix(rnorm(20),4,5)
y%*%x
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] -3.231808 -8.13791204 -13.044017 -17.950121 -22.856225 -27.762330
[2,] -14.072030 -39.33640851 -64.600787 -89.865165 -115.129543 -140.393921
[3,] -1.750057 -0.02764783 1.694761 3.417170 5.139578 6.861987
[4,] 5.862412 9.78064218 13.698872 17.617103 21.535333 25.453563
apply(x,1,mean)
[1] 13.5 14.5 15.5 16.5 17.5
apply(x,2,sum)
[1] 15 40 65 90 115 140
apply(x,2,prod)
[1] 120 30240 360360 1860480 6375600 17100720
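For sums and means, the dedicated functions rowMeans, rowSums, colMeans and colSums give the same answers as apply. A quick check on the same x, with an anonymous function added as one more illustration:

```r
x <- matrix(1:30, 5, 6)

a1 <- apply(x, 1, mean)   # row means via apply
a2 <- rowMeans(x)         # same result

s1 <- apply(x, 2, sum)    # column sums via apply
s2 <- colSums(x)          # same result

# apply also accepts an anonymous function: spread of each column
rng <- apply(x, 2, function(col) max(col) - min(col))

all.equal(a1, a2); all.equal(s1, s2); rng
```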

Operations over Array Dimensions
x=array(1:24,c(4,3,2))
apply(x,1,mean)
[1] 11 12 13 14
apply(x,1:2,sum)
[,1] [,2] [,3]
[1,] 14 22 30
[2,] 16 24 32
[3,] 18 26 34
[4,] 20 28 36
apply(x,c(1,3),prod)
[,1] [,2]
[1,] 45 4641
[2,] 120 5544
[3,] 231 6555
[4,] 384 7680

Operations between a Matrix and a Vector
x=matrix(1:30,5,6) #the 5x6 matrix used earlier
sweep(x,1,1:5,"*")
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 6 11 16 21 26
[2,] 4 14 24 34 44 54
[3,] 9 24 39 54 69 84
[4,] 16 36 56 76 96 116
[5,] 25 50 75 100 125 150
x*1:5
sweep(x,2,1:6,"+")
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 2 8 14 20 26 32
[2,] 3 9 15 21 27 33
[3,] 4 10 16 22 28 34
[4,] 5 11 17 23 29 35
[5,] 6 12 18 24 30 36
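A typical use of sweep is centering: subtract each column's mean from that column. The default function in sweep is "-", so it can be omitted. A sketch on the same 5x6 matrix:

```r
x <- matrix(1:30, 5, 6)

centered <- sweep(x, 2, colMeans(x))   # subtract column means
cm <- colMeans(centered)               # all zero after centering

# x * 1:5 recycles the vector down each column,
# which matches sweep over rows with "*"
same <- all(x * 1:5 == sweep(x, 1, 1:5, "*"))

centered; cm; same
```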

Operations between an Array and a Matrix, Vector, or Array
z=array(1:24,c(2,3,4)) #note the order in which the values are filled
> z
, , 1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
, , 2
[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12
, , 3
[,1] [,2] [,3]
[1,] 13 15 17
[2,] 14 16 18
, , 4
[,1] [,2] [,3]
[1,] 19 21 23
[2,] 20 22 24
sweep(z,1,1:2,"-")
, , 1
[,1] [,2] [,3]
[1,] 0 2 4
[2,] 0 2 4
, , 2
[,1] [,2] [,3]
[1,] 6 8 10
[2,] 6 8 10
, , 3
[,1] [,2] [,3]
[1,] 12 14 16
[2,] 12 14 16
, , 4
[,1] [,2] [,3]
[1,] 18 20 22
[2,] 18 20 22
sweep(z,c(1,2),matrix(1:6,2,3),"-")
, , 1
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 0
, , 2
[,1] [,2] [,3]
[1,] 6 6 6
[2,] 6 6 6
, , 3
[,1] [,2] [,3]
[1,] 12 12 12
[2,] 12 12 12
, , 4
[,1] [,2] [,3]
[1,] 18 18 18
[2,] 18 18 18

Outer Product (Produces a Matrix or Array)
outer(1:2,rep(1,2))
[,1] [,2]
[1,] 1 1
[2,] 2 2
outer(1:2,matrix(rep(1,6),3,2))
, , 1
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 2 2 2
, , 2
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 2 2 2
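outer accepts any vectorized function as its third argument, not only multiplication. Three small sketches (the inputs are arbitrary):

```r
tt  <- outer(1:3, 1:4)                            # multiplication table, 3 x 4
pw  <- outer(1:3, 0:2, "^")                       # pw[i, j] = i^(j - 1)
lab <- outer(c("a", "b"), 1:2, paste, sep = "")   # character labels

tt; pw; lab
```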

List (set of objects)
A list can be a collection of objects of any kind (including other lists).
z=list(1:3,Tom=c(1:2, a=list("R",letters[1:5]),w="hi!"))
z[[1]];z[[2]];z$T;z$T$a2;z$T[[3]];z$T$w
attributes(z) #attributes!
$names
[1] "" "Tom"
attributes(matrix(1:6,2,3))
$dim
[1] 2 3

Matrices, Arrays, and Their Dimension Names
x=matrix(1:12,nrow=3,dimnames=list(c("I","II","III"),paste("X",1:4,sep="")))
X1 X2 X3 X4
I 1 4 7 10
II 2 5 8 11
III 3 6 9 12
y=array(1:12,c(3,2,2),dimnames=list(c("I","II","III"),paste("X",1:2,sep=""),paste("Y",1:2,sep="")))
, , Y1
X1 X2
I 1 4
II 2 5
III 3 6
, , Y2
X1 X2
I 7 10
II 8 11
III 9 12

data.frame
x=matrix(1:6,2,3)
x=as.data.frame(x);x
V1 V2 V3
1 1 3 5
2 2 4 6
x$V2
[1] 3 4
attributes(x)
$names
[1] "V1" "V2" "V3"
$row.names
[1] "1" "2"
$class
[1] "data.frame"
names(x)=c("TOYOTA","GM","HUNDA")
row.names(x)=c("2001","2002")
x
TOYOTA GM HUNDA
2001 1 3 5
2002 2 4 6
x$GM
[1] 3 4
attach(x)
GM
[1] 3 4
detach(x)
GM
Error: Object "GM" not found
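attach puts the columns on the search path, where they can mask or be masked by other objects of the same name. with() evaluates an expression inside the data frame without changing the search path. The sketch below builds the same table directly with data.frame; that construction is an alternative to the as.data.frame route used above, not what the slides did.

```r
x <- data.frame(TOYOTA = c(1, 2), GM = c(3, 4), HUNDA = c(5, 6),
                row.names = c("2001", "2002"))

g   <- with(x, GM)            # columns are visible only inside with()
tot <- with(x, TOYOTA + GM)   # expressions may combine columns
r   <- x["2002", "GM"]        # index by row and column names

g; tot; r
```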

Entering and Editing Data by Hand
Type the values directly: x=c(1,2,7,8,…)
or
x=scan()
1 2 7 8 … (press Enter twice to finish)
fix(x) (edit the data in an editor)
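scan() with no arguments waits for keyboard input, so it cannot run unattended in a script. For a reproducible version, scan can read from a character string through its text argument. A minimal sketch (the input strings are made up for illustration):

```r
# Read whitespace-separated numbers from a string instead of the keyboard
x <- scan(text = "1 2 7 8", quiet = TRUE)

# Character data needs what = ""
s <- scan(text = "Yes No No Yes Yes", what = "", quiet = TRUE)

x; s
```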

Categorical data
A survey asks people if they smoke or not. The data is Yes, No, No, Yes, Yes
x=c("Yes","No","No","Yes","Yes")
table(x);x
factor(x)

Barplot
Suppose a group of 25 people are surveyed as to their beer-drinking preference. The categories were (1) Domestic can, (2) Domestic bottle, (3) Microbrew and (4) Import. The raw data is 3 4 1 1 3 4 3 3 1 3 2 1 2 1 2 3 2 3 1 1 1 1 4 3 1
beer = scan()
3 4 1 1 3 4 3 3 1 3 2 1 2 1 2 3 2 3 1 1 1 1 4 3 1
barplot(beer) # this isn't correct
barplot(table(beer)) # Yes, call with summarized data
barplot(table(beer)/length(beer)) # divide by n for proportion
table(beer)/length(beer)

CEO salaries
Suppose CEO yearly compensations are sampled and the following are found (in millions). (This is before being indicted for cooking the books.) 12 .4 5 2 50 8 3 1 4 0.25
sals = scan() # read in with scan
12 .4 5 2 50 8 3 1 4 0.25
mean(sals) ;var(sals) ; sd(sals) ;median(sals)
fivenum(sals) # min, lower hinge, Median, upper hinge, max
summary(sals)
data=c(10, 17, 18, 25, 28,28); summary(data); quantile(data,.25); quantile(data,c(.25,.75))
sort(sals); fivenum(sals); summary(sals)
mean(sals,trim=1/10); mean(sals,trim=2/10)
IQR(sals)
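The hinges from fivenum and the quartiles from quantile are computed differently and need not agree. The sketch below enters the salary data directly, so no scan() is needed, and also spells out what a 10% trimmed mean does:

```r
sals <- c(12, .4, 5, 2, 50, 8, 3, 1, 4, 0.25)

h <- fivenum(sals)                 # min, lower hinge, median, upper hinge, max
q <- quantile(sals, c(.25, .75))   # quartiles (default type 7)

# 10% trimming drops the smallest and largest 10% before averaging
tm     <- mean(sals, trim = 1/10)
s      <- sort(sals)
tm_man <- mean(s[2:9])             # by hand: drop 1 of the 10 values at each end

h; q; tm; tm_man
```

With these ten values the hinges are 1 and 8 while the quartiles are 1.25 and 7.25, so the two summaries give different spreads for the same data.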
MAD (median absolute deviation): median|Xi - median(X)| * 1.4826
mad(sals)
median(abs(sals - median(sals))) # without the normalizing constant
median(abs(sals - median(sals))) * 1.4826

Stem-and-leaf Charts
Suppose you have the box score of a basketball game and the following points per game for players on both teams: 2 3 16 23 14 12 4 13 2 0 0 0 6 28 31 14 4 8 2 5
scores = scan()
2 3 16 23 14 12 4 13 2 0 0 0 6 28 31 14 4 8 2 5
apropos("stem") # apropos returns a character vector giving the names of all objects in the search list matching its argument, e.g.
> apropos("stem")
[1] "stem" "system" "system.file" "system.time"
See also find("stem")
stem(scores);stem(scores,scale=2)

The salaries could be placed into broad categories of 0-1 million, 1-5 million and over 5 million. To do this using R one uses the cut() function and the table() function. Suppose the salaries are again 12 .4 5 2 50 8 3 1 4 .25 and we want to break that data into the intervals [0, 1], (1, 5], (5, 50]
sals = c