Showing posts with label Leave one out Cross Validation (LOOCV). Show all posts
Showing posts with label Leave one out Cross Validation (LOOCV). Show all posts

Saturday, March 29, 2014

Resampling, Statistical Learning within R

Leave one out Cross Validation (LOOCV)

This text contains:

  • Leave one out Cross Validation (LOOCV)
  • 10-fold CV
  • Bootstrap
require(ISLR)
require(boot)
# ?cv.glm
plot(mpg ~ horsepower, data = Auto)

plot of chunk unnamed-chunk-1

Leave one out Cross Validation (LOOCV)


glm.fit = glm(mpg ~ horsepower, data = Auto)
cv.glm(Auto, glm.fit)$delta
## [1] 24.23 24.23

## Lets write a simple function
loocv = function(fit) {
    h = lm.influence(fit)$h
    mean((residuals(fit)/(1 - h))^2)
}

## Now we try it out
loocv(glm.fit)
## [1] 24.23
cv.error = rep(0, 5)
degree = 1:5
for (d in degree) {
    glm.fit = glm(mpg ~ poly(horsepower, d), data = Auto)
    cv.error[d] = loocv(glm.fit)
}
plot(degree, cv.error, type = "b")

plot of chunk unnamed-chunk-2

10-fold CV

cv.error10 = rep(0, 5)
for (d in degree) {
    glm.fit = glm(mpg ~ poly(horsepower, d), data = Auto)
    cv.error10[d] = cv.glm(Auto, glm.fit, K = 10)$delta[1]
}
plot(degree, cv.error, type = "b")
lines(degree, cv.error10, type = "b", col = "red")

plot of chunk unnamed-chunk-3

Bootstrap

# Minimum risk investment
alpha = function(x, y) {
    vx = var(x)
    vy = var(y)
    cxy = cov(x, y)
    (vy - cxy)/(vx + vy - 2 * cxy)
}
alpha(Portfolio$X, Portfolio$Y)
## [1] 0.5758
## What is the standard error of alpha?
alpha.fn = function(data, index) {
    with(data[index, ], alpha(X, Y))
}

alpha.fn(Portfolio, 1:100)
## [1] 0.5758
set.seed(1)
alpha.fn(Portfolio, sample(1:100, 100, replace = TRUE))
## [1] 0.5964
boot.out = boot(Portfolio, alpha.fn, R = 1000)
boot.out
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = Portfolio, statistic = alpha.fn, R = 1000)
## 
## 
## Bootstrap Statistics :
##     original     bias    std. error
## t1*   0.5758 -7.315e-05     0.08862
plot(boot.out)

plot of chunk unnamed-chunk-4


Credit

Please note, this material is extracted from online Statistical Learning cource at Stanford University by Prof. T Hastie and Prof R. Tibshirani. It aims only for quick and future references in R and statistical learning. Please visit course page for more information and materials.