Lecture 12: Assignment 1 Discussion
Data Classification (5)
Consider the following R dataset detailing the attributes for different vehicles including vehicle make, model, year, class, transmission, drive type, number of engine cylinders, total engine displacement, fuel type, and mileage (highway and city). Classify each variable in the dataset as one of the following: Discrete Quantitative, Continuous Quantitative, Qualitative, and Categorical.
# Loading the packages
library(dplyr)
library(fueleconomy)
# Loading the dataset
data <- fueleconomy::vehicles
# Dataset Structure
str(data)
tibble [33,442 × 12] (S3: tbl_df/tbl/data.frame)
$ id : num [1:33442] 13309 13310 13311 14038 14039 ...
$ make : chr [1:33442] "Acura" "Acura" "Acura" "Acura" ...
$ model: chr [1:33442] "2.2CL/3.0CL" "2.2CL/3.0CL" "2.2CL/3.0CL" "2.3CL/3.0CL" ...
$ year : num [1:33442] 1997 1997 1997 1998 1998 ...
$ class: chr [1:33442] "Subcompact Cars" "Subcompact Cars" "Subcompact Cars" "Subcompact Cars" ...
$ trans: chr [1:33442] "Automatic 4-spd" "Manual 5-spd" "Automatic 4-spd" "Automatic 4-spd" ...
$ drive: chr [1:33442] "Front-Wheel Drive" "Front-Wheel Drive" "Front-Wheel Drive" "Front-Wheel Drive" ...
$ cyl : num [1:33442] 4 4 6 4 4 6 4 4 6 5 ...
$ displ: num [1:33442] 2.2 2.2 3 2.3 2.3 3 2.3 2.3 3 2.5 ...
$ fuel : chr [1:33442] "Regular" "Regular" "Regular" "Regular" ...
$ hwy : num [1:33442] 26 28 26 27 29 26 27 29 26 23 ...
$ cty : num [1:33442] 20 22 18 19 21 17 20 21 17 18 ...
make: categorical variable
model: categorical variable
year: discrete quantitative variable
class: categorical variable
trans: categorical variable
drive: categorical variable
cyl: discrete quantitative variable
displ: continuous quantitative variable
fuel: categorical variable
hwy: discrete quantitative variable
cty: discrete quantitative variable
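A quick, informal check can support the discrete-versus-continuous call for the numeric columns: a column that only ever holds whole numbers is a candidate for a discrete variable, while fractional values point towards a continuous one. The sketch below applies this to the first ten values printed by str() above. The helper name is_whole is hypothetical, and the heuristic is only a hint, since a continuous quantity may simply have been recorded after rounding.

```r
# First ten values of three numeric columns, copied from the str() output above
cyl   <- c(4, 4, 6, 4, 4, 6, 4, 4, 6, 5)
displ <- c(2.2, 2.2, 3, 2.3, 2.3, 3, 2.3, 2.3, 3, 2.5)
hwy   <- c(26, 28, 26, 27, 29, 26, 27, 29, 26, 23)

# Hypothetical helper: TRUE when every non-missing value is a whole number
is_whole <- function(x) all(x == round(x), na.rm = TRUE)

# Named logical vector, one entry per column
whole <- sapply(list(cyl = cyl, displ = displ, hwy = hwy), is_whole)
print(whole)
```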
Data Summary (10)
a. Using the vehicles dataset filtered for Renault vehicles, summarise the measures of location (mean, median, mode), dispersion (range, inter-quartile range, standard deviation), and shape (skewness, kurtosis) for highway as well as city miles per gallon. (8)
# Renault data
data <- fueleconomy::vehicles %>% filter(make=="Renault")
data
id | make | model | year | class | trans | drive | cyl | displ | fuel | hwy | cty |
---|---|---|---|---|---|---|---|---|---|---|---|
<dbl> | <chr> | <chr> | <dbl> | <chr> | <chr> | <chr> | <dbl> | <dbl> | <chr> | <dbl> | <dbl> |
618 | Renault | 18i 4DR Wagon | 1985 | Small Station Wagons | Automatic 3-spd | Front-Wheel Drive | 4 | 2.2 | Regular | 22 | 18 |
619 | Renault | 18i 4DR Wagon | 1985 | Small Station Wagons | Manual 5-spd | Front-Wheel Drive | 4 | 2.2 | Regular | 28 | 20 |
2270 | Renault | 18i Sportwagon | 1986 | Small Station Wagons | Automatic 3-spd | Front-Wheel Drive | 4 | 2.2 | Regular | 22 | 18 |
2271 | Renault | 18i Sportwagon | 1986 | Small Station Wagons | Manual 5-spd | Front-Wheel Drive | 4 | 2.2 | Regular | 28 | 20 |
3301 | Renault | Alliance | 1987 | Compact Cars | Automatic 3-spd | Front-Wheel Drive | 4 | 1.4 | Regular | 28 | 23 |
3302 | Renault | Alliance | 1987 | Compact Cars | Manual 4-spd | Front-Wheel Drive | 4 | 1.4 | Regular | 37 | 29 |
3303 | Renault | Alliance | 1987 | Compact Cars | Manual 4-spd | Front-Wheel Drive | 4 | 1.4 | Regular | 34 | 28 |
3304 | Renault | Alliance | 1987 | Compact Cars | Manual 5-spd | Front-Wheel Drive | 4 | 1.4 | Regular | 36 | 28 |
3305 | Renault | Alliance | 1987 | Compact Cars | Automatic 3-spd | Front-Wheel Drive | 4 | 1.7 | Regular | 26 | 21 |
3306 | Renault | Alliance | 1987 | Compact Cars | Manual 5-spd | Front-Wheel Drive | 4 | 1.7 | Regular | 34 | 25 |
1907 | Renault | Alliance Convertible | 1986 | Subcompact Cars | Automatic 3-spd | Front-Wheel Drive | 4 | 1.7 | Regular | 22 | 19 |
1908 | Renault | Alliance Convertible | 1986 | Subcompact Cars | Manual 5-spd | Front-Wheel Drive | 4 | 1.7 | Regular | 29 | 23 |
3107 | Renault | Alliance Convertible | 1987 | Subcompact Cars | Automatic 3-spd | Front-Wheel Drive | 4 | 1.7 | Regular | 26 | 21 |
3108 | Renault | Alliance Convertible | 1987 | Subcompact Cars | Manual 5-spd | Front-Wheel Drive | 4 | 1.7 | Regular | 32 | 24 |
373 | Renault | Alliance/Encore | 1985 | Compact Cars | Automatic 3-spd | Front-Wheel Drive | 4 | 1.4 | Regular | 26 | 22 |
374 | Renault | Alliance/Encore | 1985 | Compact Cars | Manual 4-spd | Front-Wheel Drive | 4 | 1.4 | Regular | 36 | 29 |
375 | Renault | Alliance/Encore | 1985 | Compact Cars | Manual 4-spd | Front-Wheel Drive | 4 | 1.4 | Regular | 32 | 26 |
376 | Renault | Alliance/Encore | 1985 | Compact Cars | Manual 5-spd | Front-Wheel Drive | 4 | 1.4 | Regular | 35 | 27 |
377 | Renault | Alliance/Encore | 1985 | Compact Cars | Automatic 3-spd | Front-Wheel Drive | 4 | 1.7 | Regular | 24 | 19 |
378 | Renault | Alliance/Encore | 1985 | Compact Cars | Manual 5-spd | Front-Wheel Drive | 4 | 1.7 | Regular | 33 | 25 |
2050 | Renault | Alliance/Encore | 1986 | Compact Cars | Automatic 3-spd | Front-Wheel Drive | 4 | 1.4 | Regular | 27 | 23 |
2051 | Renault | Alliance/Encore | 1986 | Compact Cars | Manual 4-spd | Front-Wheel Drive | 4 | 1.4 | Regular | 37 | 30 |
2052 | Renault | Alliance/Encore | 1986 | Compact Cars | Manual 4-spd | Front-Wheel Drive | 4 | 1.4 | Regular | 35 | 27 |
2053 | Renault | Alliance/Encore | 1986 | Compact Cars | Manual 5-spd | Front-Wheel Drive | 4 | 1.4 | Regular | 35 | 28 |
2054 | Renault | Alliance/Encore | 1986 | Compact Cars | Automatic 3-spd | Front-Wheel Drive | 4 | 1.7 | Regular | 26 | 21 |
2055 | Renault | Alliance/Encore | 1986 | Compact Cars | Manual 5-spd | Front-Wheel Drive | 4 | 1.7 | Regular | 33 | 26 |
221 | Renault | Fuego | 1985 | Subcompact Cars | Manual 5-spd | Front-Wheel Drive | 4 | 1.6 | Premium | 29 | 20 |
222 | Renault | Fuego | 1985 | Subcompact Cars | Automatic 3-spd | Front-Wheel Drive | 4 | 2.2 | Regular | 22 | 18 |
223 | Renault | Fuego | 1985 | Subcompact Cars | Manual 5-spd | Front-Wheel Drive | 4 | 2.2 | Regular | 28 | 20 |
1909 | Renault | Fuego | 1986 | Subcompact Cars | Automatic 3-spd | Front-Wheel Drive | 4 | 2.2 | Regular | 22 | 18 |
1910 | Renault | Fuego | 1986 | Subcompact Cars | Manual 5-spd | Front-Wheel Drive | 4 | 2.2 | Regular | 28 | 20 |
3307 | Renault | GTA | 1987 | Compact Cars | Manual 5-spd | Front-Wheel Drive | 4 | 2.0 | Regular | 27 | 23 |
3109 | Renault | GTA Convertible | 1987 | Subcompact Cars | Manual 5-spd | Front-Wheel Drive | 4 | 2.0 | Regular | 26 | 21 |
# Highway MpG
## creating a probability mass table
v <- sort(unique(data$hwy))
f <- numeric(length(v))
for (r in 1:nrow(data)) {
z <- data$hwy[r]
i <- which(v == z)
f[i] <- f[i] + 1
}
df_hwy <- data.frame(x=v, f=f/sum(f))
# City MpG
## creating a probability mass table
v <- sort(unique(data$cty))
f <- numeric(length(v))
for (r in 1:nrow(data)) {
z <- data$cty[r]
i <- which(v == z)
f[i] <- f[i] + 1
}
df_cty <- data.frame(x=v, f=f/sum(f))
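The two loops above tally how often each distinct value occurs and then divide by the number of rows. As a cross-check, the same table can be built with base R's table() and prop.table(). This is a sketch rather than the approach the assignment requires: the hwy vector is rebuilt from the Renault rows printed above so that the block runs without the fueleconomy package, and the name pmf_table is hypothetical.

```r
# Highway mileage of the 33 Renault rows, rebuilt from the table above
hwy <- rep(c(22, 24, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37),
           times = c(5, 1, 5, 2, 5, 2, 2, 2, 2, 3, 2, 2))

# Hypothetical helper: distinct values (x) and their relative frequencies (f)
pmf_table <- function(x) {
  counts <- table(x)  # one count per distinct value, in increasing order of value
  data.frame(x = as.numeric(names(counts)),
             f = as.numeric(prop.table(counts)))
}

df_hwy <- pmf_table(hwy)
print(df_hwy)
```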
Mean
# Highway MpG
df <- df_hwy
z <- sum(df$f * df$x)
message("Mean Highway MpG: ", round(z, digits=3))
# City MpG
df <- df_cty
z <- sum(df$f * df$x)
message("Mean City MpG: ", round(z, digits=3))
Mean Highway MpG: 29.242
Mean City MpG: 23.03
Median
# Highway MpG
df <- df_hwy
z <- NA
F <- cumsum(df$f)
for (i in 2:nrow(df)) {
if (F[i-1] < 0.5 & F[i] > 0.5) {
z <- df$x[i]
break
} else if (F[i] == 0.5) {
z <- (df$x[i] + df$x[i+1]) / 2
break
}
}
message("Median Highway MpG: ", round(z, digits=3))
# City MpG
df <- df_cty
z <- NA
F <- cumsum(df$f)
for (i in 2:nrow(df)) {
if (F[i-1] < 0.5 & F[i] > 0.5) {
z <- df$x[i]
break
} else if (F[i] == 0.5) {
z <- (df$x[i] + df$x[i+1]) / 2
break
}
}
message("Median City MpG: ", round(z, digits=3))
Median Highway MpG: 28
Median City MpG: 23
Mode
# Highway MpG
df <- df_hwy
z <- df$x[which(df$f == max(df$f))]
## z holds every value tied for the highest frequency; only the first (smallest) is reported
message("Mode Highway MpG: ", z[1])
# City MpG
df <- df_cty
z <- df$x[which(df$f == max(df$f))]
message("Mode City MpG: ", z[1])
Mode Highway MpG: 22
Mode City MpG: 20
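The measures of location can be cross-checked against base R. Base R has mean() and median() but no built-in statistical mode, so the sketch below uses a hypothetical helper, all_modes, that returns every value tied for the highest count. In the Renault rows printed above, highway mileage has three such values (22, 26 and 28 each occur five times), so the reported mode of 22 is the smallest of the tied values. Both vectors are rebuilt from that table.

```r
# Renault mileage values, rebuilt from the counts in the table above
hwy <- rep(c(22, 24, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37),
           times = c(5, 1, 5, 2, 5, 2, 2, 2, 2, 3, 2, 2))
cty <- rep(c(18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30),
           times = c(4, 2, 5, 4, 1, 4, 1, 2, 2, 2, 3, 2, 1))

# Hypothetical helper: every value tied for the highest frequency
all_modes <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

message("Mean: ", round(mean(hwy), 3), " (highway), ", round(mean(cty), 3), " (city)")
message("Median: ", median(hwy), " (highway), ", median(cty), " (city)")
message("Modes: ", paste(all_modes(hwy), collapse = ", "), " (highway), ",
        paste(all_modes(cty), collapse = ", "), " (city)")
```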
Range
# Highway MpG
z = max(data$hwy) - min(data$hwy)
message("Range Highway MpG: ", z)
# City MpG
z = max(data$cty) - min(data$cty)
message("Range City MpG: ", z)
Range Highway MpG: 15
Range City MpG: 12
Inter-Quartile Range
# Highway MpG
df <- df_hwy
F <- cumsum(df$f)
## First Quartile
q1 <- NA
for (i in 2:nrow(df)) {
if (F[i-1] < 0.25 & F[i] > 0.25) {
q1 <- df$x[i]
break
} else if (F[i] == 0.25) {
q1 <- (df$x[i] + df$x[i+1]) / 2
break
}
}
## Third Quartile
q3 <- NA
for (i in 2:nrow(df)) {
if (F[i-1] < 0.75 & F[i] > 0.75) {
q3 <- df$x[i]
break
} else if (F[i] == 0.75) {
q3 <- (df$x[i] + df$x[i+1]) / 2
break
}
}
z = q3 - q1
message("IQR Highway MpG: ", z)
# City MpG
df <- df_cty
F <- cumsum(df$f)
## First Quartile
q1 <- NA
for (i in 2:nrow(df)) {
if (F[i-1] < 0.25 & F[i] > 0.25) {
q1 <- df$x[i]
break
} else if (F[i] == 0.25) {
q1 <- (df$x[i] + df$x[i+1]) / 2
break
}
}
## Third Quartile
q3 <- NA
for (i in 2:nrow(df)) {
if (F[i-1] < 0.75 & F[i] > 0.75) {
q3 <- df$x[i]
break
} else if (F[i] == 0.75) {
q3 <- (df$x[i] + df$x[i+1]) / 2
break
}
}
z = q3 - q1
message("IQR City MpG: ", z)
IQR Highway MpG: 8
IQR City MpG: 6
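The median and quartile loops all follow one pattern: walk the cumulative distribution until it crosses the target probability, and average two neighbouring values when it lands exactly on the target. A sketch of that pattern as a single function is given below. The name pmf_quantile and its tol argument are hypothetical. Unlike the loops above, it also considers the first row, and it compares with a tolerance instead of ==, since cumulative sums of fractions such as 1/33 are rarely exact in floating point.

```r
# Hypothetical helper: quantile of a discrete distribution given values x
# (sorted increasingly) and their probabilities f
pmf_quantile <- function(x, f, p, tol = 1e-9) {
  cdf <- cumsum(f)
  i <- which(cdf >= p - tol)[1]          # first value whose CDF reaches p
  if (abs(cdf[i] - p) <= tol && i < length(x)) {
    (x[i] + x[i + 1]) / 2                # CDF hits p exactly: average the neighbours
  } else {
    x[i]                                 # CDF jumps over p at x[i]
  }
}

# Highway probability mass table, rebuilt from the Renault rows above
x <- c(22, 24, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37)
f <- c(5, 1, 5, 2, 5, 2, 2, 2, 2, 3, 2, 2) / 33

med <- pmf_quantile(x, f, 0.50)
q1  <- pmf_quantile(x, f, 0.25)
q3  <- pmf_quantile(x, f, 0.75)
message("Median Highway MpG: ", med)
message("IQR Highway MpG: ", q3 - q1)
```

Base R's quantile() with type = 2 applies the same averaging rule at discontinuities, so it can serve as a further check on the raw values.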
Standard Deviation
# Highway MpG
df <- df_hwy
z <- sqrt(sum(df$f * ((df$x - sum(df$f * df$x))^2)))
message("Standard Deviation Highway MpG: ", round(z, digits=3))
# City MpG
df <- df_cty
z <- sqrt(sum(df$f * ((df$x - sum(df$f * df$x))^2)))
message("Standard Deviation City MpG: ", round(z, digits=3))
Standard Deviation Highway MpG: 4.799
Standard Deviation City MpG: 3.689
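The formula above weights squared deviations by relative frequency, which is the population form of the standard deviation (divide by n). Base R's sd() uses the sample form (divide by n - 1), so the two differ by a factor of sqrt((n - 1) / n). A short sketch of that relationship, with hwy rebuilt from the Renault rows printed above:

```r
# Highway mileage of the 33 Renault rows, rebuilt from the table above
hwy <- rep(c(22, 24, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37),
           times = c(5, 1, 5, 2, 5, 2, 2, 2, 2, 3, 2, 2))
n <- length(hwy)

sd_pop    <- sqrt(mean((hwy - mean(hwy))^2))  # divides by n, as in the formula above
sd_sample <- sd(hwy)                          # divides by n - 1

message("Population SD: ", round(sd_pop, 3))
message("Sample SD: ", round(sd_sample, 3))
message("Sample SD rescaled by sqrt((n - 1) / n): ",
        round(sd_sample * sqrt((n - 1) / n), 3))
```

The distinction matters later in the assignment: the t-statistics in the hypothesis tests are computed with sd(), i.e. the sample form.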
Skewness
# Highway MpG
df <- df_hwy
z <- sum(df$f * ((df$x - sum(df$f * df$x))^3)) / (sqrt(sum(df$f * ((df$x - sum(df$f * df$x))^2))))^3
message("Skewness Highway MpG: ", round(z, digits=3))
# City MpG
df <- df_cty
z <- sum(df$f * ((df$x - sum(df$f * df$x))^3)) / (sqrt(sum(df$f * ((df$x - sum(df$f * df$x))^2))))^3
message("Skewness City MpG: ", round(z, digits=3))
Skewness Highway MpG: 0.066
Skewness City MpG: 0.309
Kurtosis
# Highway MpG
df <- df_hwy
z <- sum(df$f * ((df$x - sum(df$f * df$x))^4)) / (sqrt(sum(df$f * ((df$x - sum(df$f * df$x))^2))))^4
message("Kurtosis Highway MpG: ", round(z, digits=3))
# City MpG
df <- df_cty
z <- sum(df$f * ((df$x - sum(df$f * df$x))^4)) / (sqrt(sum(df$f * ((df$x - sum(df$f * df$x))^2))))^4
message("Kurtosis City MpG: ", round(z, digits=3))
Kurtosis Highway MpG: 1.792
Kurtosis City MpG: 1.8
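Skewness and kurtosis as computed above are the third and fourth standardised central moments: the average of the deviations raised to the power k, divided by the standard deviation raised to the power k. A sketch with a hypothetical helper, std_moment, makes that shared structure explicit. For reference, a normal distribution has a kurtosis of 3, so the excess kurtosis (kurtosis minus 3) is negative here.

```r
# Highway mileage of the 33 Renault rows, rebuilt from the table above
hwy <- rep(c(22, 24, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37),
           times = c(5, 1, 5, 2, 5, 2, 2, 2, 2, 3, 2, 2))

# Hypothetical helper: k-th standardised central moment (population form)
std_moment <- function(x, k) {
  d <- x - mean(x)
  mean(d^k) / sqrt(mean(d^2))^k
}

skew   <- std_moment(hwy, 3)
kurt   <- std_moment(hwy, 4)
excess <- kurt - 3   # kurtosis relative to a normal distribution

message("Skewness Highway MpG: ", round(skew, 3))
message("Kurtosis Highway MpG: ", round(kurt, 3))
message("Excess kurtosis Highway MpG: ", round(excess, 3))
```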
b. Based on these statistics, draw inferences for highway and city mileage (2)
City mileage is generally lower than highway mileage, as indicated by the measures of location (mean, median, mode).
Highway mileage shows greater variability compared to city mileage, based on the measures of dispersion (range, IQR, standard deviation).
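City mileage is more right-skewed than highway mileage (skewness 0.309 vs. 0.066), while both distributions are flatter and lighter-tailed than a normal distribution (kurtosis of about 1.8 for each, against 3 for a normal distribution), as seen in the measures of shape.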
Probability Analysis (5)
a. Using the vehicles dataset filtered for Honda vehicles, verify the axioms of probability for vehicle classes and engine cylinders. (1)
# Honda data
data <- fueleconomy::vehicles %>% filter(make=="Honda")
# Honda vehicle classes
df_class <- data.frame(class=names(table(data$class)), freq=as.numeric(table(data$class)), prob=as.numeric(prop.table(table(data$class))))
df <- df_class
df
message("Axiom #1: ", all(df$prob >= 0))
message("Axiom #2: ", isTRUE(all.equal(sum(df$prob), 1)))
message("Axiom #3: ", round((df$freq[1] + df$freq[2]) / sum(df$freq), digits=3) == round(df$prob[1] + df$prob[2], digits=3))
class | freq | prob |
---|---|---|
<chr> | <dbl> | <dbl> |
Compact Cars | 142 | 0.180203046 |
Large Cars | 15 | 0.019035533 |
Midsize-Large Station Wagons | 3 | 0.003807107 |
Midsize Cars | 62 | 0.078680203 |
Midsize Station Wagons | 1 | 0.001269036 |
Minivan - 2WD | 25 | 0.031725888 |
Small Sport Utility Vehicle 2WD | 8 | 0.010152284 |
Small Sport Utility Vehicle 4WD | 5 | 0.006345178 |
Small Station Wagons | 73 | 0.092639594 |
Special Purpose Vehicle 2WD | 3 | 0.003807107 |
Special Purpose Vehicle 4WD | 7 | 0.008883249 |
Special Purpose Vehicles | 18 | 0.022842640 |
Sport Utility Vehicle - 2WD | 51 | 0.064720812 |
Sport Utility Vehicle - 4WD | 63 | 0.079949239 |
Standard Pickup Trucks 4WD | 9 | 0.011421320 |
Standard Sport Utility Vehicle 4WD | 1 | 0.001269036 |
Subcompact Cars | 209 | 0.265228426 |
Two Seaters | 93 | 0.118020305 |
Axiom #1: TRUE
Axiom #2: TRUE
Axiom #3: TRUE
# Honda engine cylinders
df_cyl <- data.frame(cyl=names(table(data$cyl)), freq=as.numeric(table(data$cyl)), prob=as.numeric(prop.table(table(data$cyl))))
df <- df_cyl
df
message("Axiom #1: ", all(df$prob >= 0))
message("Axiom #2: ", isTRUE(all.equal(sum(df$prob), 1)))
message("Axiom #3: ", round((df$freq[1] + df$freq[2]) / sum(df$freq), digits=3) == round(df$prob[1] + df$prob[2], digits=3))
cyl | freq | prob |
---|---|---|
<chr> | <dbl> | <dbl> |
3 | 14 | 0.0178117 |
4 | 629 | 0.8002545 |
6 | 143 | 0.1819338 |
Axiom #1: TRUE
Axiom #2: TRUE
Axiom #3: TRUE
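One caveat on checking the second axiom: probabilities obtained by division are floating-point numbers, so their sum can differ from 1 by a tiny rounding error, and an exact == comparison may then return FALSE. A tolerance-based comparison with all.equal() is the more robust check. The sketch below uses the Honda cylinder counts from the table above, plus a well-known arithmetic example.

```r
# Honda cylinder counts, copied from the table above
freq <- c(14, 629, 143)
prob <- freq / sum(freq)

exact  <- sum(prob) == 1                    # exact comparison: depends on rounding
robust <- isTRUE(all.equal(sum(prob), 1))   # comparison within a small tolerance

message("Exact check: ", exact)
message("Tolerance-based check: ", robust)

# A well-known case where the exact comparison fails
message("0.1 + 0.2 == 0.3: ", (0.1 + 0.2) == 0.3)
message("all.equal(0.1 + 0.2, 0.3): ", isTRUE(all.equal(0.1 + 0.2, 0.3)))
```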
b. Using the vehicles dataset filtered for Honda vehicles, use the conditional probability formula to evaluate the probability of a compact car having a 4-cylinder engine and, consequently, use Bayes' rule to evaluate the probability of a 4-cylinder engine vehicle being a compact car. (4)
# Honda vehicle classes and engine cylinders
df <- as.data.frame(table(data$class, data$cyl))
names(df) <- c("class", "cyl", "freq")
df$prob <- prop.table(df$freq)
df
class | cyl | freq | prob |
---|---|---|---|
<fct> | <fct> | <int> | <dbl> |
Compact Cars | 3 | 0 | 0.000000000 |
Large Cars | 3 | 0 | 0.000000000 |
Midsize-Large Station Wagons | 3 | 0 | 0.000000000 |
Midsize Cars | 3 | 0 | 0.000000000 |
Midsize Station Wagons | 3 | 0 | 0.000000000 |
Minivan - 2WD | 3 | 0 | 0.000000000 |
Small Sport Utility Vehicle 2WD | 3 | 0 | 0.000000000 |
Small Sport Utility Vehicle 4WD | 3 | 0 | 0.000000000 |
Small Station Wagons | 3 | 0 | 0.000000000 |
Special Purpose Vehicle 2WD | 3 | 0 | 0.000000000 |
Special Purpose Vehicle 4WD | 3 | 0 | 0.000000000 |
Special Purpose Vehicles | 3 | 0 | 0.000000000 |
Sport Utility Vehicle - 2WD | 3 | 0 | 0.000000000 |
Sport Utility Vehicle - 4WD | 3 | 0 | 0.000000000 |
Standard Pickup Trucks 4WD | 3 | 0 | 0.000000000 |
Standard Sport Utility Vehicle 4WD | 3 | 0 | 0.000000000 |
Subcompact Cars | 3 | 0 | 0.000000000 |
Two Seaters | 3 | 14 | 0.017811705 |
Compact Cars | 4 | 129 | 0.164122137 |
Large Cars | 4 | 10 | 0.012722646 |
Midsize-Large Station Wagons | 4 | 3 | 0.003816794 |
Midsize Cars | 4 | 38 | 0.048346056 |
Midsize Station Wagons | 4 | 1 | 0.001272265 |
Minivan - 2WD | 4 | 0 | 0.000000000 |
Small Sport Utility Vehicle 2WD | 4 | 4 | 0.005089059 |
Small Sport Utility Vehicle 4WD | 4 | 2 | 0.002544529 |
Small Station Wagons | 4 | 71 | 0.090330789 |
Special Purpose Vehicle 2WD | 4 | 1 | 0.001272265 |
Special Purpose Vehicle 4WD | 4 | 3 | 0.003816794 |
Special Purpose Vehicles | 4 | 4 | 0.005089059 |
Sport Utility Vehicle - 2WD | 4 | 33 | 0.041984733 |
Sport Utility Vehicle - 4WD | 4 | 42 | 0.053435115 |
Standard Pickup Trucks 4WD | 4 | 0 | 0.000000000 |
Standard Sport Utility Vehicle 4WD | 4 | 0 | 0.000000000 |
Subcompact Cars | 4 | 209 | 0.265903308 |
Two Seaters | 4 | 79 | 0.100508906 |
Compact Cars | 6 | 13 | 0.016539440 |
Large Cars | 6 | 5 | 0.006361323 |
Midsize-Large Station Wagons | 6 | 0 | 0.000000000 |
Midsize Cars | 6 | 24 | 0.030534351 |
Midsize Station Wagons | 6 | 0 | 0.000000000 |
Minivan - 2WD | 6 | 25 | 0.031806616 |
Small Sport Utility Vehicle 2WD | 6 | 4 | 0.005089059 |
Small Sport Utility Vehicle 4WD | 6 | 3 | 0.003816794 |
Small Station Wagons | 6 | 0 | 0.000000000 |
Special Purpose Vehicle 2WD | 6 | 2 | 0.002544529 |
Special Purpose Vehicle 4WD | 6 | 4 | 0.005089059 |
Special Purpose Vehicles | 6 | 14 | 0.017811705 |
Sport Utility Vehicle - 2WD | 6 | 18 | 0.022900763 |
Sport Utility Vehicle - 4WD | 6 | 21 | 0.026717557 |
Standard Pickup Trucks 4WD | 6 | 9 | 0.011450382 |
Standard Sport Utility Vehicle 4WD | 6 | 1 | 0.001272265 |
Subcompact Cars | 6 | 0 | 0.000000000 |
Two Seaters | 6 | 0 | 0.000000000 |
## Probabilities
## A: 4-cylinder engine, B: compact car
## All three probabilities are taken from the joint table so that they share one denominator
## (df_class counts 788 vehicles, but only 786 appear in the joint table)
P_A = sum(df$prob[which(df$cyl==4)])
P_B = sum(df$prob[which(df$class=="Compact Cars")])
P_AXB = df$prob[which(df$cyl==4 & df$class=="Compact Cars")]
P_AB = P_AXB / P_B
P_BA_cond = P_AXB / P_A
P_BA_bayes = P_AB * (P_B / P_A)
## conditional probability of a compact car having a 4-cylinder engine
message("Conditional Probability of a compact car having a 4-cylinder engine: ", round(P_AB, digits=3))
## conditional probability that a 4-cylinder engine vehicle is a compact car
message("Conditional Probability that a 4-cylinder engine vehicle is a compact car (using Conditional Probability): ", round(P_BA_cond, digits=3))
message("Conditional Probability that a 4-cylinder engine vehicle is a compact car (using Bayes' Theorem): ", round(P_BA_bayes, digits=3))
Conditional Probability of a compact car having a 4-cylinder engine: 0.908
Conditional Probability that a 4-cylinder engine vehicle is a compact car (using Conditional Probability): 0.205
Conditional Probability that a 4-cylinder engine vehicle is a compact car (using Bayes' Theorem): 0.205
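Bayes' rule follows from writing the joint probability in two ways, P(A and B) = P(A|B) P(B) = P(B|A) P(A), so the direct conditional and the Bayes route must give the same answer. The sketch below works from raw counts read off the joint class-by-cylinder table above, taking all three probabilities over the same 786 vehicles so that they share one denominator. The variable names are illustrative.

```r
# Counts read from the joint class-by-cylinder table above
n_total   <- 786   # Honda vehicles in the joint table
n_compact <- 142   # compact cars (129 with 4 cylinders + 13 with 6)
n_cyl4    <- 629   # 4-cylinder vehicles
n_both    <- 129   # compact cars with a 4-cylinder engine

P_A   <- n_cyl4 / n_total      # P(4-cylinder)
P_B   <- n_compact / n_total   # P(compact car)
P_AnB <- n_both / n_total      # P(4-cylinder and compact car)

P_A_given_B        <- P_AnB / P_B               # conditional probability formula
P_B_given_A_direct <- P_AnB / P_A               # conditional probability formula
P_B_given_A_bayes  <- P_A_given_B * P_B / P_A   # Bayes' rule

message("P(4-cylinder | compact car): ", round(P_A_given_B, digits = 3))
message("P(compact car | 4-cylinder), direct: ", round(P_B_given_A_direct, digits = 3))
message("P(compact car | 4-cylinder), Bayes: ", round(P_B_given_A_bayes, digits = 3))
```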
Data Sampling (8)
a. For the following randomly sampled data from the vehicles dataset, compute the bias and standard error of the estimator of mean highway mileage. (5)
library(ggplot2)
P <- fueleconomy::vehicles$hwy
m <- 50
n <- 1000
# population parameter
z <- mean(P, na.rm=TRUE)
Z <- vector("numeric", m)
for (i in 1:m) {
set.seed(i)
I <- order(runif(length(P)))[1:n]
S <- P[I]
# sample parameter
Z[i] <- mean(S, na.rm=TRUE)
}
data.frame(parameter=z, estimator=Z, error=Z - z)
message("Bias: ", round(mean(Z - z), digits=3))
message("Standard Error: ", round(sd(Z), digits=3))
parameter | estimator | error |
---|---|---|
<dbl> | <dbl> | <dbl> |
23.55128 | 23.501 | -0.050282818 |
23.55128 | 23.922 | 0.370717182 |
23.55128 | 23.499 | -0.052282818 |
23.55128 | 23.135 | -0.416282818 |
23.55128 | 23.788 | 0.236717182 |
23.55128 | 23.710 | 0.158717182 |
23.55128 | 23.433 | -0.118282818 |
23.55128 | 23.595 | 0.043717182 |
23.55128 | 23.543 | -0.008282818 |
23.55128 | 23.405 | -0.146282818 |
23.55128 | 23.866 | 0.314717182 |
23.55128 | 23.286 | -0.265282818 |
23.55128 | 23.692 | 0.140717182 |
23.55128 | 23.350 | -0.201282818 |
23.55128 | 23.468 | -0.083282818 |
23.55128 | 23.330 | -0.221282818 |
23.55128 | 23.934 | 0.382717182 |
23.55128 | 23.328 | -0.223282818 |
23.55128 | 23.700 | 0.148717182 |
23.55128 | 23.803 | 0.251717182 |
23.55128 | 23.893 | 0.341717182 |
23.55128 | 23.938 | 0.386717182 |
23.55128 | 23.386 | -0.165282818 |
23.55128 | 23.554 | 0.002717182 |
23.55128 | 23.427 | -0.124282818 |
23.55128 | 23.471 | -0.080282818 |
23.55128 | 23.583 | 0.031717182 |
23.55128 | 23.383 | -0.168282818 |
23.55128 | 23.624 | 0.072717182 |
23.55128 | 23.684 | 0.132717182 |
23.55128 | 23.689 | 0.137717182 |
23.55128 | 23.922 | 0.370717182 |
23.55128 | 23.663 | 0.111717182 |
23.55128 | 23.661 | 0.109717182 |
23.55128 | 23.342 | -0.209282818 |
23.55128 | 23.665 | 0.113717182 |
23.55128 | 23.503 | -0.048282818 |
23.55128 | 23.453 | -0.098282818 |
23.55128 | 23.446 | -0.105282818 |
23.55128 | 23.460 | -0.091282818 |
23.55128 | 24.029 | 0.477717182 |
23.55128 | 23.485 | -0.066282818 |
23.55128 | 23.470 | -0.081282818 |
23.55128 | 23.505 | -0.046282818 |
23.55128 | 23.351 | -0.200282818 |
23.55128 | 23.483 | -0.068282818 |
23.55128 | 23.233 | -0.318282818 |
23.55128 | 23.553 | 0.001717182 |
23.55128 | 23.642 | 0.090717182 |
23.55128 | 23.206 | -0.345282818 |
Bias: 0.009
Standard Error: 0.209
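The same experiment can be written more compactly with sample(), which draws without replacement by default, and the empirical standard error can be compared with the textbook value for the mean of a simple random sample drawn without replacement, S / sqrt(n) * sqrt(1 - n / N). The sketch below is an illustration only: it uses a synthetic normal population as a stand-in (an assumption, not the vehicles data) so that it runs without the fueleconomy package, and the seed and population parameters are arbitrary.

```r
set.seed(42)

# Synthetic stand-in population (an assumption, NOT the vehicles data)
P <- rnorm(30000, mean = 24, sd = 6)
N <- length(P)
m <- 50     # number of repeated samples
n <- 1000   # size of each sample

z <- mean(P)                              # population parameter
Z <- replicate(m, mean(sample(P, n)))     # m sample means, each without replacement

bias         <- mean(Z - z)
se_empirical <- sd(Z)
# Theoretical standard error with the finite population correction
se_theory    <- sd(P) / sqrt(n) * sqrt(1 - n / N)

message("Bias: ", round(bias, digits = 3))
message("Standard Error (empirical): ", round(se_empirical, digits = 3))
message("Standard Error (theoretical): ", round(se_theory, digits = 3))
```

With only 50 repetitions the empirical standard error is itself a noisy estimate, so it should be expected to land near the theoretical value rather than on it.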
b. Using the archery analogy discussed in class, draw a representative target board to comment upon the accuracy and precision of the estimator. (3)
The target board should represent high accuracy (the bias of 0.009 is close to zero) but low precision (the standard error of 0.209 is large relative to the bias).
Hypothesis Testing (12)
Test the following claims for Renault vehicles:
a. city mileage is greater than 23 mpg
b. highway mileage is greater than 29 mpg
c. highway mileage is not the same as the city mileage
Note: make appropriate assumptions, develop the null and alternative hypotheses, evaluate the test statistic, present the critical value and, consequently, make appropriate inferences.
# Load the dataset
data <- fueleconomy::vehicles %>% filter(make=="Renault")
# Test for city mileage being greater than 23 mpg (One Sample t-test)
message("Null Hypothesis: City mileage is less than or equal to 23 mpg")
message("Alternative Hypothesis: City mileage is greater than 23 mpg")
t = round((mean(data$cty) - 23) / (sd(data$cty) / sqrt(nrow(data))), digits=3)
v = qt(0.95, df=nrow(data)-1)
message("t-statistic: ", round(t, digits=3))
message("Critical value: ", round(v, digits=3))
message("Decision: ", ifelse(t > v, "Reject Null Hypothesis", "Do not reject Null Hypothesis"))
Null Hypothesis: City mileage is less than or equal to 23 mpg
Alternative Hypothesis: City mileage is greater than 23 mpg
t-statistic: 0.046
Critical value: 1.694
Decision: Do not reject Null Hypothesis
# Test for highway mileage being greater than 29 mpg (One Sample t-test)
message("Null Hypothesis: Highway mileage is less than or equal to 29 mpg")
message("Alternative Hypothesis: Highway mileage is greater than 29 mpg")
t = round((mean(data$hwy) - 29) / (sd(data$hwy) / sqrt(nrow(data))), digits=3)
v = qt(0.95, df=nrow(data)-1)
message("t-statistic: ", round(t, digits=3))
message("Critical value: ", round(v, digits=3))
message("Decision: ", ifelse(t > v, "Reject Null Hypothesis", "Do not reject Null Hypothesis"))
Null Hypothesis: Highway mileage is less than or equal to 29 mpg
Alternative Hypothesis: Highway mileage is greater than 29 mpg
t-statistic: 0.286
Critical value: 1.694
Decision: Do not reject Null Hypothesis
# Test for highway mileage not being same as the city mileage (Paired t-test)
message("Null Hypothesis: Highway mileage is equal to city mileage")
message("Alternative Hypothesis: Highway mileage is not equal to city mileage")
t = round((mean(data$hwy) - mean(data$cty)) / (sd(data$hwy - data$cty) / sqrt(nrow(data))), digits=3)
v = qt(0.975, df=nrow(data)-1)
message("t-statistic: ", round(t, digits=3))
message("Critical value: ", round(v, digits=3))
message("Decision: ", ifelse(abs(t) > v, "Reject Null Hypothesis", "Do not reject Null Hypothesis"))
Null Hypothesis: Highway mileage is equal to city mileage
Alternative Hypothesis: Highway mileage is not equal to city mileage
t-statistic: 19.841
Critical value: 2.037
Decision: Reject Null Hypothesis
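The hand-computed statistics can be cross-checked with base R's t.test(), which also reports a p-value. Comparing the p-value with the 5% significance level gives the same decision as comparing the t-statistic with the critical value. In this sketch the two vectors are typed in from the Renault rows printed earlier, in row order so that the pairing is preserved; the result names are illustrative.

```r
# Renault highway and city mileage, in the row order of the table printed earlier
hwy <- c(22, 28, 22, 28, 28, 37, 34, 36, 26, 34, 22, 29, 26, 32, 26, 36, 32,
         35, 24, 33, 27, 37, 35, 35, 26, 33, 29, 22, 28, 22, 28, 27, 26)
cty <- c(18, 20, 18, 20, 23, 29, 28, 28, 21, 25, 19, 23, 21, 24, 22, 29, 26,
         27, 19, 25, 23, 30, 27, 28, 21, 26, 20, 18, 20, 18, 20, 23, 21)

test_cty  <- t.test(cty, mu = 23, alternative = "greater")  # one-sample, one-sided
test_hwy  <- t.test(hwy, mu = 29, alternative = "greater")  # one-sample, one-sided
test_pair <- t.test(hwy, cty, paired = TRUE)                # paired, two-sided

for (name in c("test_cty", "test_hwy", "test_pair")) {
  res <- get(name)
  message(name, ": t = ", round(unname(res$statistic), digits = 3),
          ", p-value = ", signif(res$p.value, digits = 3),
          ", decision: ",
          ifelse(res$p.value < 0.05, "Reject Null Hypothesis",
                 "Do not reject Null Hypothesis"))
}
```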