Saturday, March 29, 2014

Resampling, Statistical Learning within R

Leave one out Cross Validation (LOOCV)

This text contains:

  • Leave one out Cross Validation (LOOCV)
  • 10-fold CV
  • Bootstrap
require(ISLR)
require(boot)
# ?cv.glm
plot(mpg ~ horsepower, data = Auto)

plot of chunk unnamed-chunk-1

Leave one out Cross Validation (LOOCV)


glm.fit = glm(mpg ~ horsepower, data = Auto)
cv.glm(Auto, glm.fit)$delta
## [1] 24.23 24.23

## Lets write a simple function
loocv = function(fit) {
    h = lm.influence(fit)$h
    mean((residuals(fit)/(1 - h))^2)
}

## Now we try it out
loocv(glm.fit)
## [1] 24.23
cv.error = rep(0, 5)
degree = 1:5
for (d in degree) {
    glm.fit = glm(mpg ~ poly(horsepower, d), data = Auto)
    cv.error[d] = loocv(glm.fit)
}
plot(degree, cv.error, type = "b")

plot of chunk unnamed-chunk-2

10-fold CV

cv.error10 = rep(0, 5)
for (d in degree) {
    glm.fit = glm(mpg ~ poly(horsepower, d), data = Auto)
    cv.error10[d] = cv.glm(Auto, glm.fit, K = 10)$delta[1]
}
plot(degree, cv.error, type = "b")
lines(degree, cv.error10, type = "b", col = "red")

plot of chunk unnamed-chunk-3

Bootstrap

# Minimum risk investment
alpha = function(x, y) {
    vx = var(x)
    vy = var(y)
    cxy = cov(x, y)
    (vy - cxy)/(vx + vy - 2 * cxy)
}
alpha(Portfolio$X, Portfolio$Y)
## [1] 0.5758
## What is the standard error of alpha?
alpha.fn = function(data, index) {
    with(data[index, ], alpha(X, Y))
}

alpha.fn(Portfolio, 1:100)
## [1] 0.5758
set.seed(1)
alpha.fn(Portfolio, sample(1:100, 100, replace = TRUE))
## [1] 0.5964
boot.out = boot(Portfolio, alpha.fn, R = 1000)
boot.out
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = Portfolio, statistic = alpha.fn, R = 1000)
## 
## 
## Bootstrap Statistics :
##     original     bias    std. error
## t1*   0.5758 -7.315e-05     0.08862
plot(boot.out)

plot of chunk unnamed-chunk-4


Credit

Please note, this material is extracted from online Statistical Learning cource at Stanford University by Prof. T Hastie and Prof R. Tibshirani. It aims only for quick and future references in R and statistical learning. Please visit course page for more information and materials.


No comments: