Tuesday, December 18, 2007

Data Exploration: Vietnamese Age Data

The first thing we want to do is read our data into R. The data file, VietnameseAgeData.xls, is in Excel format. To read .xls data into R, we need a function from the gdata library. Load the library and read in the data using the following commands.


> library(gdata)
> ages<-read.xls("VietnameseAgeData.xls")
> attach(ages)



The attach command attaches the dataset to the search path so that we only have to give the variable name rather than refer to the dataset each time. Next, we will examine the data.
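
Before doing so, here is a minimal sketch of what attaching changes. It uses a small made-up data frame called toy (a hypothetical stand-in, not the real age data) and compares access through the data frame with access by bare variable name.

```r
# Hypothetical toy data frame standing in for `ages`.
toy <- data.frame(age = c(68, 70, 31, 28))

# Without attach(), the column is reached through the data frame.
m1 <- mean(toy$age)

# After attach(), the bare column name is found on the search path.
attach(toy)
m2 <- mean(age)

# attach() places a copy on the search path, so a later change to
# toy$age is not visible through the bare name `age`.
toy$age[1] <- 0
first_attached <- age[1]   # still 68

# Remove the data frame from the search path when finished.
detach(toy)
```

Because the attached copy does not track later changes to the data frame, it is usually safest to attach only after the data are in their final form and to detach when done. With that in mind, back to the age data: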


> names(ages)
[1] "age"
> age[1:10]
[1] 68 70 31 28 22 7 57 27 23 0
> length(age)
[1] 28633
> summary(age)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.00   12.00   23.00   28.14   41.00   99.00



Next, we examine the data graphically using histograms. The four histograms in the notes can be replicated using the following commands, which differ only in the number of breaks requested:


> par(mfrow=c(2,2))
> hist(age, breaks=55)
> hist(age, breaks=40)
> hist(age, breaks=25)
> hist(age, breaks=10)
> par(mfrow=c(1,1))
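
One detail worth knowing about the commands above: when breaks is a single number, hist treats it as a suggestion and rounds the breakpoints to "pretty" values, so the number of bins drawn may not equal the number requested. The sketch below checks this on simulated ages (x is hypothetical stand-in data, not the real dataset) and shows how a vector of breakpoints fixes the bins exactly.

```r
# Hypothetical stand-in data: 1000 simulated ages between 0 and 99.
set.seed(1)
x <- sample(0:99, 1000, replace = TRUE)

# A single number is only a suggestion; plot = FALSE returns the
# histogram object without drawing it.
h <- hist(x, breaks = 55, plot = FALSE)
n_bins_suggested <- length(h$counts)   # may differ from 55

# A vector of breakpoints is used as given: 20 bins of width 5.
h5 <- hist(x, breaks = seq(0, 100, by = 5), plot = FALSE)
n_bins_exact <- length(h5$counts)
```

Supplying the breakpoints explicitly is the way to go when a specific bin width, such as five-year age groups, is wanted.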





To obtain the plots of the kernel density estimates, the following R commands were used. Each panel uses the Epanechnikov kernel with a different bandwidth (bw).

> par(mfrow=c(2,2))
> plot(density(age, kernel="epanechnikov", bw=.5), main="")
> plot(density(age, kernel="epanechnikov", bw=2.344), main="")
> plot(density(age, kernel="epanechnikov", bw=5), main="")
> plot(density(age, kernel="epanechnikov", bw=10), main="")
> par(mfrow=c(1,1))
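
To make the role of the bandwidth concrete, here is a rough sketch of an Epanechnikov kernel density estimate computed by hand. It assumes the convention used by density, where bw is the standard deviation of the kernel, so the kernel is nonzero within sqrt(5)*bw of each observation. The data x and the helper names epan and epan_kde are hypothetical, introduced only for illustration; density itself uses a faster binned approximation, so its values will agree only approximately.

```r
# Hypothetical stand-in data: 500 simulated ages between 0 and 99.
set.seed(1)
x <- sample(0:99, 500, replace = TRUE)

# Epanechnikov kernel rescaled to have standard deviation 1.
epan <- function(u) {
  ifelse(abs(u) < sqrt(5), 3 / (4 * sqrt(5)) * (1 - u^2 / 5), 0)
}

# Density estimate at each point of `at`: the average of the kernels
# centred on the observations, scaled by the bandwidth.
epan_kde <- function(x, at, bw) {
  sapply(at, function(a) mean(epan((a - x) / bw)) / bw)
}

step <- 0.1
grid <- seq(-30, 130, by = step)
f_small <- epan_kde(x, grid, bw = 0.5)   # narrow kernel, wiggly estimate
f_large <- epan_kde(x, grid, bw = 10)    # wide kernel, smooth estimate

# Each estimate should integrate to roughly 1 (Riemann sum over the grid).
area_small <- sum(f_small) * step
area_large <- sum(f_large) * step
```

Plotting f_small and f_large against grid with plot(grid, f_small, type="l") should show the same pattern as the panels above: a small bandwidth follows individual ages closely, while a large one smooths them away.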


