R Language Notes, Part 1

Source: 小侦探旅游网

List of plotting functions

Sequence function seq()

seq() generates a numeric sequence: seq(from, to, by)
e.g. seq(1, 10, 2)
Output: 1 3 5 7 9

Graphical parameter function par()

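e.g.
opar <- par(no.readonly = TRUE)   -- save a copy of all current modifiable graphical parameters
par(pin = c(3, 7), mai = c(1, 2, 3, 4))   -- change the plot size (width = 3, height = 7, in inches) and the margins (bottom = 1, left = 2, top = 3, right = 4, in inches; mar sets the same margins in lines of text)
par(lty = 3, pch = 17)   -- change the line type to 3 and the point symbol to 17
par(font.lab = 2)   -- change the font of the axis labels
plot(x, y)   -- draw with the modified parameters
par(opar)   -- restore the original parameters (the copy saved in the first statement)

Plotting function plot()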

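Parameters:
type: plot type ("p", "l", "b", "o", "s", "h", "n", etc.)
col: color of the main plot (lines and points)
lty: line type
pch: point symbol
lwd: line width
cex: scaling factor for plotted text and symbols
main: main title
sub: subtitle
xlab (ylab): label of the x (y) axis
xlim (ylim): range of the x (y) axis, given as c(min, max)
cex.*: scaling factor of a specific element
cex.axis: scaling of the axis tick labels
cex.lab: scaling of the axis labels
cex.main: scaling of the main title
cex.sub: scaling of the subtitle
font.*: font of a specific element
font.axis: font of the axis tick labels
font.lab: font of the axis labels
font.main: font of the main title
font.sub: font of the subtitle
e.g.
x <- 1:10
plot(x, 2 * x + 3, type = "b", col = "red", lty = 2, pch = 2, lwd = 2, main = "Test main title", sub = "test title", xlab = "level", ylab = "score", xlim = c(1, 12), ylim = c(1, 30), col.main = "pink", cex.main = 2, cex.sub = 1.5)

Line function lines()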

lines() adds new graphical elements to an existing plot.
e.g.
x <- c(1:10)
y <- x
z <- 10 / x
plot(x, y, type = "b", pch = 22, lty = 2, col = "red", ann = FALSE)
lines(x, z, type = "b", pch = 21, col = "blue", lty = 2)

Axis formatting function axis()

e.g. (continued)
axis(2, at = x, labels = x, col.axis = "red", las = 2)
axis(4, at = z, labels = round(z, digits = 2), col.axis = "blue", las = 2, cex.axis = 0.7, tck = -.01)

Text annotation functions text() / mtext()

text() adds text inside the plotting region.
text(location, "text to place", pos = n, ...)
mtext() adds text in the margins of the plot.
mtext("text to place", side = n, line = n, ...)
e.g. (continued)
mtext("y=1/x", side = 4, line = 3, cex.lab = 1, las = 2, col = "blue")
e.g.1
> attach(mtcars)
> plot(wt, mpg, main = "Mileage vs. Car Weight")
> text(wt, mpg, row.names(mtcars), cex = 0.3, pos = 4, col = "red")
> detach(mtcars)
e.g.2
> opar <- par(no.readonly = TRUE)
> par(cex = 1.5)
> plot(1:7, 1:7, type = "n")
> text(4, 4, family = "mono", "Example of mono-spaced text")
> text(5, 5, family = "serif", "Example of serif text")
> text(5, 7, family = "serif", "Example of serif text")
> text(3, 3, "Example of default text")
> par(opar)

Minor tick marks (Hmisc package)

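Package to load: Hmisc; function: minor.tick()
install.packages("Hmisc")
library(Hmisc)
minor.tick(nx = n, ny = n, tick.ratio = n) adds minor tick marks. nx and ny give the number of intervals into which the space between two major tick marks is divided on the x and y axes, and tick.ratio is the length of the minor tick marks relative to the major ones. The current major tick length can be read with par("tck").
e.g. minor.tick(nx = 2, ny = 3, tick.ratio = 0.5) adds one minor tick mark between every two major tick marks on the x axis and two minor tick marks between every two major tick marks on the y axis. The minor tick marks are half as long as the major ones.

Reference lines abline()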

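Adds reference lines to a plot.
abline(h = yvalues, v = xvalues)
e.g.
abline(h = c(1, 5, 7)) adds solid horizontal lines at y = 1, 5 and 7 (the default line type is solid, lty = 1).
abline(v = seq(1, 10, 2), lty = 2, col = "blue") adds dashed blue vertical lines at x = 1, 3, 5, 7 and 9.

Example 1-1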

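(dose, drugA and drugB are assumed to be defined already.)
> opar <- par(no.readonly = TRUE)
> plot(dose, drugA, lty = 1, pch = 15, type = "b", main = "Drug A vs. Drug B", xlab = "Dosage", ylab = "Response")
> lines(dose, drugB, lty = 2, pch = 17, type = "b")
> abline(h = c(30), lwd = 1.5, lty = 2, col = "grey")
> install.packages("Hmisc")
> library(Hmisc)
> minor.tick(nx = 3, ny = 3, tick.ratio = 0.5)
> legend("topleft", ...)   -- the rest of this call, including a title ending in "Type", is cut off in the source

Combining graphs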

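Graph combination parameter mfrow

par(mfrow = c(nrows, ncols)) creates a matrix of plots with any number of rows and columns, filled by row.

High-level plotting function hist()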

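Produces a histogram. hist() includes a default title (see Example 2-2); use main = "" to suppress it, or ann = FALSE to suppress all titles and labels.

Graph combination function layout()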

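layout(mat), where mat is a matrix that specifies where each of the combined figures goes. See Example 2-3.
The number of elements in mat determines how many equal parts the output device is divided into; for convenience, call each part a cell. Each element of mat corresponds to one cell according to its row and column index, and the value of the element says which figure that cell belongs to. For example:
layout(matrix(c(1, 2, 3, 0, 2, 3, 0, 0, 3), nrow = 3))
The matrix has 9 elements and is filled by column, so it looks like this:
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    2    2    0
[3,]    3    3    3
Passing this matrix to layout() divides the output device as follows: figure 1 occupies the single cell in the top left corner, the first two cells of the second row belong to figure 2, and figure 3 fills all three cells of the bottom row. Cells with the value 0 stay empty. (In the original illustration, figures 1, 2 and 3 are marked in yellow, green and red.)
Figures are drawn in order: the n-th plot goes into the region numbered n, so here the yellow region is filled first, then the green, then the red. By changing the matrix you can place the figures in whichever regions you need, not necessarily in the conventional top-to-bottom or left-to-right order.

Range parameter fig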

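fig = c(x1, x2, y1, y2), where x1 and x2 give the horizontal extent of the figure (from x1 to x2) and y1 and y2 give its vertical extent (from y1 to y2). It is set through par(), and each value is a fraction of the device between 0 and 1.

Example 2-1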

attach(mtcars)
opar <- par(no.readonly = TRUE)
par(mfrow = c(2, 2))   -- set up a 2 x 2 matrix whose cells are plots
plot(wt, mpg, main = "Scatterplot of wt vs. mpg")
plot(wt, disp, main = "Scatterplot of wt vs. disp")
hist(wt, main = "Histogram of wt")
boxplot(wt, main = "Boxplot of wt")
par(opar)
detach(mtcars)

Example 2-2

attach(mtcars)
opar <- par(no.readonly = TRUE)
hist(wt)
hist(mpg)
hist(disp)
par(opar)
detach(mtcars)

Example 2-3

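> attach(mtcars)
> layout(matrix(c(1, 1, 2, 3), 2, 2, byrow = TRUE), widths = c(3, 1), heights = c(1, 2))
-- matrix(c(1, 1, 2, 3), 2, 2, byrow = TRUE) is a 2 x 2 matrix filled by row with the elements (1, 1, 2, 3)
-- widths = c(3, 1): the widths of the first and second columns are in the ratio 3:1
-- heights = c(1, 2): the heights of the first and second rows are in the ratio 1:2
> hist(mpg)
> hist(wt)
> hist(disp)
> detach(mtcars)

Example 2-4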

> opar <- par(no.readonly = TRUE)
> par(fig = c(0, 0.8, 0, 0.8))
> plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "Miles per Gallon")
> par(fig = c(0, 0.8, 0.55, 1), new = TRUE)
> boxplot(mtcar$wt, horizontal = TRUE, axes = FALSE)
Error in boxplot(mtcar$wt, horizontal = TRUE, axes = FALSE): object 'mtcar' not found
> boxplot(mtcars$wt, horizontal = TRUE, axes = FALSE)
> par(fig = c(0.65, 1, 0, 0.8), new = TRUE)
> boxplot(mtcars$mpg, axes = FALSE)
> mtext("Enhanced Scatterplot", side = 3, outer = True, line = -3)
Error in mtext("Enhanced Scatterplot", side = 3, outer = True, line = -3): object 'True' not found
> mtext("Enhanced Scatterplot", side = 3, outer = TRUE, line = -3)
> par(opar)

Basic data management

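Function within()

within() works like with(), except that within() lets you modify the data frame. For an example, see "Recoding variables" below.

Recoding variables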

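Recoding is the process of updating existing values. The format is: variable[condition] <- new value

The assignment is made only where condition is TRUE.
e.g. create the data set leadership
> manager <- c(1, 2, 3, 4, 5)
> date <- c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09")
> country <- c("US", "US", "UK", "UK", "UK")
> gender <- c("M", "F", "F", "M", "F")
> age <- c(32, 45, 25, 39, 99)
> q1 <- c(5, 3, 3, 3, 2)
> q2 <- c(4, 5, 5, 3, 2)
> q3 <- c(5, 2, 5, 4, 1)
> q4 <- c(5, 5, 25, NA, 2)
> q5 <- c(5, 5, 2, NA, 1)
> leadership <- data.frame(manager, date, country, gender, age, q1, q2, q3, q4, stringAsFactors = FALSE)
> leadership
  manager     date country gender age q1 q2 q3 q4 stringAsFactors
1       1 10/24/08      US      M  32  5  4  5  5           FALSE
2       2 10/28/08      US      F  45  3  5  2  5           FALSE
3       3  10/1/08      UK      F  25  3  5  5 25           FALSE
4       4 10/12/08      UK      M  39  3  3  4 NA           FALSE
5       5   5/1/09      UK      F  99  2  2  1  2           FALSE
Note: stringAsFactors is a misspelling of the argument stringsAsFactors, so data.frame() treats it as one more column; that is why a column named stringAsFactors appears in the output. q5 is created but not passed to data.frame().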
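The rule stated above, that variable[condition] <- value assigns only where the condition is TRUE, can be checked on a small vector. This is a minimal sketch; the vector score and the labels "high" and "low" are hypothetical and not part of the leadership data.

```r
# Hypothetical scores; 99 plays the role of a "missing" code
score <- c(12, 48, 99, 73, 5)

# Only the element equal to 99 is replaced; the others are untouched
score[score == 99] <- NA

# Build a category vector the same way: start from NA, then fill by condition.
# Where the condition is NA (the missing score), nothing is assigned.
level <- rep(NA_character_, length(score))
level[score >= 50] <- "high"
level[score < 50] <- "low"

print(score)
print(level)
```

Starting the new variable as NA means that any element not covered by a condition stays visibly missing instead of silently taking a wrong category.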

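Suppose you want to recode the continuous variable age of the managers in leadership into the categorical variable agecat (Young, Middle Aged, Elder). First, the age value 99 must be recoded as missing:
leadership$age[leadership$age == 99] <- NA
leadership$agecat[leadership$age > 75] <- "Elder"
leadership$agecat[leadership$age >= 55 & leadership$age <= 75] <- "Middle Aged"
leadership$agecat[leadership$age < 55] <- "Young"
The code above can be written more compactly as:
leadership <- within(leadership, {
  agecat <- NA   -- create the variable agecat and initialize it to missing
  agecat[age > 75] <- "Elder"
  agecat[age >= 55 & age <= 75] <- "Middle Aged"
  agecat[age < 55] <- "Young"
})

Renaming variables

Function fix()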

fix(leadership)
This opens an interactive editor in which you can make the changes you need, for example changing manager in leadership to managerID.

rename() in the reshape package

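The reshape package contains the renaming function rename(). Its format is rename(dataframe, c(oldname = "newname", ...)). Continuing from the step above:
> install.packages("reshape")
> library(reshape)
> rename(leadership, c(managerID = "manager"))
This returns a copy of leadership in which managerID is named manager again; assign the result to keep the change.

names()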

Finally, the names() function.
names(leadership)
names(leadership)[2] <- "testDate"
leadership
or
names(leadership)[6:10] <- c("item1", "item2", "item3", "item4", "item5")
which renames q1-q5 to item1-item5 (assuming columns 6 to 10 are q1 to q5).

Missing values is.na()
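is.na() tests each element for a missing value and returns a logical vector of the same length. A minimal sketch, with an illustrative vector x that is not taken from the leadership data:

```r
x <- c(1, 2, NA, 3)

# TRUE where the element is missing, FALSE elsewhere
missing <- is.na(x)

# Logical values sum as 0/1, so this counts the missing elements
n_missing <- sum(missing)

# Keep only the elements that are not missing
complete <- x[!missing]

print(missing)
print(n_missing)
print(complete)
```

Comparing with == does not work for this purpose, because x == NA is itself NA for every element.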

Removing missing values: na.rm = TRUE

x <- c(1, 2, NA, 3)
y <- sum(x, na.rm = TRUE)
Result: y is 6 (1 + 2 + 3)

Date functions

Input format as.Date()

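as.Date(x, "input_format") -- note the capital D in Date
The default input format for dates is yyyy-mm-dd.
e.g.
> mtdates1 <- as.Date(c("2007-06-22", "2004-02-13"))
> mtdates1
[1] "2007-06-22" "2004-02-13"
> mtdates2 <- as.Date(c("06/22/2012", "02/23/2014"), "%m/%d/%Y")
> mtdates2
[1] "2012-06-22" "2014-02-23"
-- note the capital Y (four-digit year)

System date Sys.Date()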

Sys.Date() -- note the capital S and D; alternatively, call date() directly
e.g.
> Sys.Date()
[1] "2014-09-26"
> date()
[1] "Fri Sep 26 14:35:39 2014"

Output format format()

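format(x, format = "output_format")
e.g.
> today <- Sys.Date()
> format(today, "%m/%d/%Y")
[1] "09/26/2014"
> format(today, "%B %A %Y")
[1] "九月 星期五 2014"
(Month and weekday names follow the system locale; this output comes from a Chinese locale and reads "September Friday 2014".)

Date arithmetic difftime()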

e.g.1
> startdate <- as.Date("2014-1-2")
> enddate <- Sys.Date()
> daydiff <- enddate - startdate
> daydiff
Time difference of 267 days
Note: R stores dates internally as the number of days since January 1, 1970; dates before that are stored as negative numbers.
e.g.2
> difftime(enddate, startdate)
Time difference of 267 days
> difftime(enddate, startdate, units = "weeks")
Time difference of 38.14286 weeks
> difftime(enddate, startdate, units = "hours")
Time difference of 6408 hours
> difftime(enddate, startdate, units = "mins")
Time difference of 384480 mins
> difftime(enddate, startdate, units = "secs")
Time difference of 23068800 secs
> difftime(enddate, startdate, units = "auto")
Time difference of 267 days

Converting dates to character variables as.character()

> as.character(Sys.Date())
[1] "2014-09-26"

Table 3-2 Type conversion functions
Test               Convert
is.numeric()       as.numeric()
is.character()     as.character()
is.vector()        as.vector()
is.matrix()        as.matrix()
is.data.frame()    as.data.frame()
is.factor()        as.factor()
is.logical()       as.logical()
e.g.
> a <- c(1, 2, 3)
> a
[1] 1 2 3
> is.numeric(a)
[1] TRUE
> is.vector(a)
[1] TRUE
> a <- as.character(a)
> a
[1] "1" "2" "3"
> is.numeric(a)
[1] FALSE
> is.vector(a)
[1] TRUE
> is.character(a)
[1] TRUE

Sorting data

Function order()

e.g.
> leadership[order(leadership$age), ]
-- ascending order is the default
  managerID     date country gender age q1 q2 q3 q4 stringAsFactors
3         3  10/1/08      UK      F  25  3  5  5 25           FALSE
1         1 10/24/08      US      M  32  5  4  5  5           FALSE
4         4 10/12/08      UK      M  39  3  3  4 NA           FALSE
2         2 10/28/08      US      F  45  3  5  2  5           FALSE
5         5   5/1/09      UK      F  99  2  2  1  2           FALSE

> leadership[order(leadership$gender, leadership$age), ]
  managerID     date country gender age q1 q2 q3 q4 stringAsFactors
3         3  10/1/08      UK      F  25  3  5  5 25           FALSE
2         2 10/28/08      US      F  45  3  5  2  5           FALSE
5         5   5/1/09      UK      F  99  2  2  1  2           FALSE
1         1 10/24/08      US      M  32  5  4  5  5           FALSE
4         4 10/12/08      UK      M  39  3  3  4 NA           FALSE

> leadership[order(leadership$gender, -leadership$age), ]
-- put a minus sign in front of a variable to sort it in descending order
  managerID     date country gender age q1 q2 q3 q4 stringAsFactors
5         5   5/1/09      UK      F  99  2  2  1  2           FALSE
2         2 10/28/08      US      F  45  3  5  2  5           FALSE
3         3  10/1/08      UK      F  25  3  5  5 25           FALSE
4         4 10/12/08      UK      M  39  3  3  4 NA           FALSE
1         1 10/24/08      US      M  32  5  4  5  5           FALSE

Note:
> leadership[c(5, 2, 3, 4, 1), ]
gives the same result as the last example above.

Merging data sets

Adding columns merge()

merge(dataframeA, dataframeB, by = "ID")
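To see what merge() does with the key column, here is a minimal sketch with two small data frames. The contents of dataframeA and dataframeB are hypothetical; only the call pattern comes from the notes.

```r
# Hypothetical data frames that share the key column ID
dataframeA <- data.frame(ID = c(1, 2, 3), age = c(32, 45, 25))
dataframeB <- data.frame(ID = c(3, 1, 4), country = c("UK", "US", "UK"))

# Default behaviour: keep only the IDs present in both frames
merged <- merge(dataframeA, dataframeB, by = "ID")
print(merged)

# all = TRUE keeps every ID and fills the gaps with NA
all_rows <- merge(dataframeA, dataframeB, by = "ID", all = TRUE)
print(all_rows)
```

Rows are matched on ID regardless of their order in the two inputs, which is the main difference from simply binding columns side by side.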

Adding rows rbind()

rbind(dataframeA,dataframeB)
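A minimal sketch of rbind() on two data frames, using hypothetical contents. For data frames, rbind() matches columns by name, so the second frame is deliberately built with its columns in a different order.

```r
# Hypothetical data frames with the same variables in a different order
dataframeA <- data.frame(ID = c(1, 2), score = c(5, 3))
dataframeB <- data.frame(score = c(4, 2), ID = c(3, 4))

# Columns are matched by name; the result uses the column order of the first frame
stacked <- rbind(dataframeA, dataframeB)
print(stacked)
```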

Note: the two data frames passed to rbind() must have the same variables, but the variables do not have to be in the same order.

Subsetting data sets

Selecting (keeping) variables

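> leadership[, c(6:10)]
  q1 q2 q3 q4 stringAsFactors
1  5  4  5  5           FALSE
2  3  5  2  5           FALSE
3  3  5  5 25           FALSE
4  3  3  4 NA           FALSE
5  2  2  1  2           FALSE

> myvars <- c("q1", "q2", "q3", "q4")
> leadership[myvars]
  q1 q2 q3 q4
1  5  4  5  5
2  3  5  2  5
3  3  5  5 25
4  3  3  4 NA
5  2  2  1  2

> myvars <- paste("q", 1:4, sep = "")
-- the usage of paste() will be explained in another part
> leadership[myvars]
  q1 q2 q3 q4
1  5  4  5  5
2  3  5  2  5
3  3  5  5 25
4  3  3  4 NA
5  2  2  1  2

Excluding (dropping) variables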

First, see the following example:

> myvars <- names(leadership) %in% c("q3")
> newdate <- leadership[!myvars]
> newdate
  managerID     date country gender age q1 q2 q4 stringAsFactors
1         1 10/24/08      US      M  32  5  4  5           FALSE
2         2 10/28/08      US      F  45  3  5  5           FALSE
3         3  10/1/08      UK      F  25  3  5 25           FALSE
4         4 10/12/08      UK      M  39  3  3 NA           FALSE
5         5   5/1/09      UK      F  99  2  2  2           FALSE

How does this code work? First, print myvars:
> myvars
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
It is a logical vector containing only TRUE and FALSE. The process is as follows:
Step 1: names() produces a character vector containing all the variable names.
Step 2: names(leadership) %in% c("q3") returns a logical vector in which every element of names(leadership) that matches "q3" is TRUE and every other element is FALSE.
Step 3: the logical operator ! (not) reverses each value, turning TRUE into FALSE and FALSE into TRUE.
Step 4: leadership[!myvars] is therefore leadership[c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)], and the columns marked TRUE are kept.

Selecting observations

e.g.1 Choose the records that satisfy the condition "gender is M and age is greater than 30"
> newdata <- leadership[which(leadership$gender == "M" & leadership$age > 30), ]
> newdata
  managerID     date country gender age q1 q2 q3 q4 stringAsFactors
1         1 10/24/08      US      M  32  5  4  5  5           FALSE
4         4 10/12/08      UK      M  39  3  3  4 NA           FALSE

e.g.2 Choose the records that satisfy the condition "date between 2008-1-1 and 2008-10-1"
> leadership$date <- as.Date(leadership$date, "%m/%d/%y")
> startdate <- as.Date("2008-1-1")
> enddate <- as.Date("2008-10-1")
> leadership[which(leadership$date <= enddate & leadership$date >= startdate), ]
  managerID       date country gender age q1 q2 q3 q4 stringAsFactors
3         3 2008-10-01      UK      F  25  3  5  5 25           FALSE

Subset function subset()

The following example shows the usage of the function subset():
e.g.
> subset <- subset(leadership, age >= 35 | age < 24, select = c(q1, q2, q3, q4))
> subset
  q1 q2 q3 q4
2  3  5  2  5
4  3  3  4 NA
5  2  2  1  2

Random sampling sample()

> sample <- leadership[sample(1:nrow(leadership), 3, replace = FALSE), ]
> sample
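The same row-sampling idea as a self-contained sketch. The data frame df is a hypothetical stand-in for leadership, and set.seed() is added here only so that the draw can be repeated. The result is stored under the name picked, which avoids reusing the name of the sample() function.

```r
# Hypothetical stand-in for the leadership data
df <- data.frame(id = 1:5, age = c(32, 45, 25, 39, 99))

# Fix the seed so the same rows are drawn on every run
set.seed(1234)
idx <- sample(1:nrow(df), 3, replace = FALSE)   # 3 distinct row numbers
picked <- df[idx, ]
print(picked)

# Resetting the seed reproduces exactly the same draw
set.seed(1234)
idx_again <- sample(1:nrow(df), 3, replace = FALSE)
```

With replace = FALSE no row can be drawn twice, so the sample size cannot exceed nrow(df).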

Advanced data management

Numeric and character functions

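Function                         Description
abs(x)                           absolute value
sqrt(x)                          square root; sqrt(4) = 2
ceiling(x)                       smallest integer not less than x; ceiling(3.75) = 4
floor(x)                         largest integer not greater than x; floor(3.75) = 3, floor(-3.75) = -4
trunc(x)                         truncation toward 0; trunc(3.75) = 3, trunc(-3.75) = -3
round(x, digits = n)             rounding to n decimal places; round(3.475, digits = 2) = 3.48
signif(x, digits = n)            rounding to n significant digits; signif(3.475, digits = 2) = 3.5
cos(x), sin(x), tan(x)           cosine, sine, tangent
acos(x), asin(x), atan(x)        arc-cosine, arc-sine, arc-tangent
cosh(x), sinh(x), tanh(x)        hyperbolic cosine, sine, tangent
acosh(x), asinh(x), atanh(x)     inverse hyperbolic cosine, sine, tangent
log(x)                           natural logarithm
log10(x)                         common (base 10) logarithm
exp(x)                           exponential function

Statistical functions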

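Function                                 Description
mean(x)                                  mean
sd(x)                                    standard deviation
median(x)                                median
var(x)                                   variance
mad(x)                                   median absolute deviation
quantile(x, probs)                       quantiles
range(x)                                 range
sum(x)                                   sum
diff(x, lag = n)                         lagged differences; lag specifies how many positions back to difference, and the default is 1. With x <- c(1, 5, 23, 29), diff(x) is 4 18 6
min(x)                                   minimum
max(x)                                   maximum
scale(x, center = TRUE, scale = TRUE)    centers (center = TRUE) or standardizes (center = TRUE, scale = TRUE) the data object x column by column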
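Two entries of the table, diff() with a lag and scale(), are easier to follow with numbers. A minimal sketch: x is the vector from the table, while the matrix m is hypothetical.

```r
x <- c(1, 5, 23, 29)

d1 <- diff(x)            # differences between neighbours: 4 18 6
d2 <- diff(x, lag = 2)   # differences two positions apart: 22 24

# Hypothetical two-column matrix to standardize
m <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2)
z <- scale(m)            # center = TRUE and scale = TRUE are the defaults

# After standardizing, every column has mean 0 and standard deviation 1
col_means <- colMeans(z)
col_sds <- apply(z, 2, sd)

print(d1)
print(d2)
print(z)
```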

Probability functions

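Distribution                   Abbreviation
Beta                           beta
Binomial                       binom
Cauchy                         cauchy
Chi-squared (noncentral)       chisq
Exponential                    exp
F                              f
Gamma                          gamma
Geometric                      geom
Hypergeometric                 hyper
Lognormal                      lnorm
Logistic                       logis
Multinomial                    multinom
Negative binomial              nbinom
Normal                         norm
Poisson                        pois
Wilcoxon signed rank           signrank
t                              t
Uniform                        unif
Weibull                        weibull
Wilcoxon rank sum              wilcox

For the normal distribution functions, if mean and sd are not specified, the defaults are mean = 0 and sd = 1.

1. Setting the random number seed
Test:
runif(5)
runif(5)
set.seed(1234)
runif(5)
set.seed(1234)
runif(5)
The two calls made right after set.seed(1234) return the same five numbers.

2. Sampling from a multivariate normal distribution
Package MASS: mvrnorm(n, mean, sigma), where sigma is the variance-covariance (or correlation) matrix.
e.g.
library(MASS)
options(digits = 3)
set.seed(1234)
mean <- c(230.7, 146.7, 3.6)
sigma <- matrix(c(20000, 13293, -45.4, 7789, 2893, -44.8, 9090, 2332, -21.9), nrow = 3, ncol = 3)
mydata <- mvrnorm(500, mean, sigma)
mydata <- as.data.frame(mydata)
names(mydata) <- c("y", "x1", "x2")
dim(mydata)
head(mydata, n = 10)
Note: sigma must be a symmetric, positive-definite matrix. The values above are not symmetric and include a negative diagonal entry, so mvrnorm() stops with an error until they are replaced by a valid covariance matrix.

Inserted chapter: plot colors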

Color function colors()

Usage: colors(); the parentheses cannot be omitted. It returns 657 color names.
e.g.
install.packages("vcd")
library(vcd)
counts <- table(Arthritis$Improved, Arthritis$Treatment)
barplot(counts, col = colors()[111:113], legend = rownames(counts), main = "Improve & Treats")

Grayscale function grey() / gray()

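Usage: grey(gray level or vector of gray levels), with levels between 0 (black) and 1 (white)
e.g.
barplot(counts, col = grey(0:3/5), legend = rownames(counts))
or
barplot(counts, col = grey(seq(0, 1, 0.1)), legend = rownames(counts))

Color specification function rgb()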

Usage: rgb(red, green, blue), with each value between 0 and 1 by default
e.g.
barplot(counts, col = rgb(0, 0:3/5, 0), legend = rownames(counts))
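The color functions return plain character vectors, so their results can be inspected without drawing anything. A minimal sketch using the same arguments as the barplot() calls above; the variable names are illustrative.

```r
# Gray levels 0, 0.2, 0.4, 0.6 as hexadecimal color strings
shades <- grey(0:3 / 5)

# Red and blue fixed at 0, green rising from 0 to 0.6
greens <- rgb(0, 0:3 / 5, 0)

# Number of color names known to colors()
n_named <- length(colors())

print(shades)
print(greens)
print(n_named)
```

Any of these vectors can be passed as the col argument of a plotting function, one color per bar, line, or point group.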
