Lecture 12: Assignment 1 Discussion


  1. Data Classification (5)

Consider the following R dataset detailing the attributes of different vehicles, including vehicle make, model, year, class, transmission, drive type, number of engine cylinders, total engine displacement, fuel type, and mileage (highway and city). Classify each variable in the dataset as one of the following: Discrete Quantitative, Continuous Quantitative, Qualitative, or Categorical.

# Loading the packages
library(dplyr)
library(fueleconomy)
# Loading the dataset
data <- fueleconomy::vehicles
# Dataset Structure
str(data)
tibble [33,442 × 12] (S3: tbl_df/tbl/data.frame)
 $ id   : num [1:33442] 13309 13310 13311 14038 14039 ...
 $ make : chr [1:33442] "Acura" "Acura" "Acura" "Acura" ...
 $ model: chr [1:33442] "2.2CL/3.0CL" "2.2CL/3.0CL" "2.2CL/3.0CL" "2.3CL/3.0CL" ...
 $ year : num [1:33442] 1997 1997 1997 1998 1998 ...
 $ class: chr [1:33442] "Subcompact Cars" "Subcompact Cars" "Subcompact Cars" "Subcompact Cars" ...
 $ trans: chr [1:33442] "Automatic 4-spd" "Manual 5-spd" "Automatic 4-spd" "Automatic 4-spd" ...
 $ drive: chr [1:33442] "Front-Wheel Drive" "Front-Wheel Drive" "Front-Wheel Drive" "Front-Wheel Drive" ...
 $ cyl  : num [1:33442] 4 4 6 4 4 6 4 4 6 5 ...
 $ displ: num [1:33442] 2.2 2.2 3 2.3 2.3 3 2.3 2.3 3 2.5 ...
 $ fuel : chr [1:33442] "Regular" "Regular" "Regular" "Regular" ...
 $ hwy  : num [1:33442] 26 28 26 27 29 26 27 29 26 23 ...
 $ cty  : num [1:33442] 20 22 18 19 21 17 20 21 17 18 ...
  • make: categorical variable

  • model: categorical variable

  • year: discrete quantitative variable

  • class: categorical variable

  • trans: categorical variable

  • drive: categorical variable

  • cyl: discrete quantitative variable

  • displ: continuous quantitative variable

  • fuel: categorical variable

  • hwy: discrete quantitative variable

  • cty: discrete quantitative variable
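
A rough way to sanity-check such a classification is to look at each column's storage type and at whether its numeric values are all whole numbers. The sketch below does this on a small hand-typed data frame; the `toy` data frame and the `describe_column` helper are illustrative names, not part of the assignment. The heuristic only suggests a label, since a conceptually continuous quantity such as mileage can still be recorded in whole numbers.

```r
# Toy data copied by hand from the first rows of str(data) above
toy <- data.frame(
  make  = c("Acura", "Acura", "Acura", "Acura"),
  year  = c(1997, 1997, 1997, 1998),
  cyl   = c(4, 4, 6, 4),
  displ = c(2.2, 2.2, 3, 2.3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: suggest a variable type from the stored values
describe_column <- function(x) {
  if (!is.numeric(x)) return("categorical")
  if (all(x == round(x), na.rm = TRUE)) return("quantitative, whole numbers only")
  "quantitative, fractional values present"
}

suggested <- sapply(toy, describe_column)
print(suggested)
```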


  2. Data Summary (10)

a. Using the vehicles dataset filtered for Renault vehicles, summarise the measures of location (mean, median, mode), dispersion (range, inter-quartile range, standard deviation), and shape (skewness, kurtosis) for highway as well as city miles per gallon. (8)

# Renault data
data <- fueleconomy::vehicles %>% filter(make=="Renault")
data
A tibble: 33 × 12
id     make     model                 year   class                 trans            drive              cyl    displ  fuel     hwy    cty
<dbl>  <chr>    <chr>                 <dbl>  <chr>                 <chr>            <chr>              <dbl>  <dbl>  <chr>    <dbl>  <dbl>
618    Renault  18i 4DR Wagon         1985   Small Station Wagons  Automatic 3-spd  Front-Wheel Drive  4      2.2    Regular  22     18
619    Renault  18i 4DR Wagon         1985   Small Station Wagons  Manual 5-spd     Front-Wheel Drive  4      2.2    Regular  28     20
2270   Renault  18i Sportwagon        1986   Small Station Wagons  Automatic 3-spd  Front-Wheel Drive  4      2.2    Regular  22     18
2271   Renault  18i Sportwagon        1986   Small Station Wagons  Manual 5-spd     Front-Wheel Drive  4      2.2    Regular  28     20
3301   Renault  Alliance              1987   Compact Cars          Automatic 3-spd  Front-Wheel Drive  4      1.4    Regular  28     23
3302   Renault  Alliance              1987   Compact Cars          Manual 4-spd     Front-Wheel Drive  4      1.4    Regular  37     29
3303   Renault  Alliance              1987   Compact Cars          Manual 4-spd     Front-Wheel Drive  4      1.4    Regular  34     28
3304   Renault  Alliance              1987   Compact Cars          Manual 5-spd     Front-Wheel Drive  4      1.4    Regular  36     28
3305   Renault  Alliance              1987   Compact Cars          Automatic 3-spd  Front-Wheel Drive  4      1.7    Regular  26     21
3306   Renault  Alliance              1987   Compact Cars          Manual 5-spd     Front-Wheel Drive  4      1.7    Regular  34     25
1907   Renault  Alliance Convertible  1986   Subcompact Cars       Automatic 3-spd  Front-Wheel Drive  4      1.7    Regular  22     19
1908   Renault  Alliance Convertible  1986   Subcompact Cars       Manual 5-spd     Front-Wheel Drive  4      1.7    Regular  29     23
3107   Renault  Alliance Convertible  1987   Subcompact Cars       Automatic 3-spd  Front-Wheel Drive  4      1.7    Regular  26     21
3108   Renault  Alliance Convertible  1987   Subcompact Cars       Manual 5-spd     Front-Wheel Drive  4      1.7    Regular  32     24
373    Renault  Alliance/Encore       1985   Compact Cars          Automatic 3-spd  Front-Wheel Drive  4      1.4    Regular  26     22
374    Renault  Alliance/Encore       1985   Compact Cars          Manual 4-spd     Front-Wheel Drive  4      1.4    Regular  36     29
375    Renault  Alliance/Encore       1985   Compact Cars          Manual 4-spd     Front-Wheel Drive  4      1.4    Regular  32     26
376    Renault  Alliance/Encore       1985   Compact Cars          Manual 5-spd     Front-Wheel Drive  4      1.4    Regular  35     27
377    Renault  Alliance/Encore       1985   Compact Cars          Automatic 3-spd  Front-Wheel Drive  4      1.7    Regular  24     19
378    Renault  Alliance/Encore       1985   Compact Cars          Manual 5-spd     Front-Wheel Drive  4      1.7    Regular  33     25
2050   Renault  Alliance/Encore       1986   Compact Cars          Automatic 3-spd  Front-Wheel Drive  4      1.4    Regular  27     23
2051   Renault  Alliance/Encore       1986   Compact Cars          Manual 4-spd     Front-Wheel Drive  4      1.4    Regular  37     30
2052   Renault  Alliance/Encore       1986   Compact Cars          Manual 4-spd     Front-Wheel Drive  4      1.4    Regular  35     27
2053   Renault  Alliance/Encore       1986   Compact Cars          Manual 5-spd     Front-Wheel Drive  4      1.4    Regular  35     28
2054   Renault  Alliance/Encore       1986   Compact Cars          Automatic 3-spd  Front-Wheel Drive  4      1.7    Regular  26     21
2055   Renault  Alliance/Encore       1986   Compact Cars          Manual 5-spd     Front-Wheel Drive  4      1.7    Regular  33     26
221    Renault  Fuego                 1985   Subcompact Cars       Manual 5-spd     Front-Wheel Drive  4      1.6    Premium  29     20
222    Renault  Fuego                 1985   Subcompact Cars       Automatic 3-spd  Front-Wheel Drive  4      2.2    Regular  22     18
223    Renault  Fuego                 1985   Subcompact Cars       Manual 5-spd     Front-Wheel Drive  4      2.2    Regular  28     20
1909   Renault  Fuego                 1986   Subcompact Cars       Automatic 3-spd  Front-Wheel Drive  4      2.2    Regular  22     18
1910   Renault  Fuego                 1986   Subcompact Cars       Manual 5-spd     Front-Wheel Drive  4      2.2    Regular  28     20
3307   Renault  GTA                   1987   Compact Cars          Manual 5-spd     Front-Wheel Drive  4      2.0    Regular  27     23
3109   Renault  GTA Convertible       1987   Subcompact Cars       Manual 5-spd     Front-Wheel Drive  4      2.0    Regular  26     21
# Highway MpG
## creating a probability mass table
v <- sort(unique(data$hwy))
f <- numeric(length(v))
for (r in 1:nrow(data)) {
  z <- data$hwy[r]
  i <- which(v == z)
  f[i] <- f[i] + 1
}
df_hwy <- data.frame(x=v, f=f/sum(f))

# City MpG
## creating a probability mass table
v <- sort(unique(data$cty))
f <- numeric(length(v))
for (r in 1:nrow(data)) {
  z <- data$cty[r]
  i <- which(v == z)
  f[i] <- f[i] + 1
}
df_cty <- data.frame(x=v, f=f/sum(f))
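
The counting loops above build an empirical probability mass function by hand. As a cross-check, the same table can be sketched with `table()` and `prop.table()`. The `hwy` vector below is typed in from the Renault table so that the block runs on its own; `pmf` is an illustrative name.

```r
# Highway mpg of the 33 Renault vehicles, copied from the table above
hwy <- c(22, 28, 22, 28, 28, 37, 34, 36, 26, 34, 22, 29, 26, 32, 26, 36, 32,
         35, 24, 33, 27, 37, 35, 35, 26, 33, 29, 22, 28, 22, 28, 27, 26)

counts <- table(hwy)   # frequency of each distinct value, sorted by value
pmf    <- data.frame(x = as.numeric(names(counts)),
                     f = as.numeric(prop.table(counts)))

print(pmf)
print(sum(pmf$x * pmf$f))   # mean recovered from the PMF
```
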
  • Mean

# Highway MpG
df <- df_hwy
z  <- sum(df$f * df$x)
message("Mean Highway MpG: ", round(z, digits=3))

# City MpG
df <- df_cty
z  <- sum(df$f * df$x)
message("Mean City MpG: ", round(z, digits=3))
Mean Highway MpG: 29.242

Mean City MpG: 23.03
  • Median

# Highway MpG
df <- df_hwy
z  <- NA
F  <- cumsum(df$f) 
for (i in 2:nrow(df)) {
    if (F[i-1] < 0.5 & F[i] > 0.5) {
        z <- df$x[i]
        break
    } else if (F[i] == 0.5) {
        z <- (df$x[i] + df$x[i+1]) / 2
        break
    }
}
message("Median Highway MpG: ", round(z, digits=3))

# City MpG
df <- df_cty
z  <- NA
F  <- cumsum(df$f) 
for (i in 2:nrow(df)) {
    if (F[i-1] < 0.5 & F[i] > 0.5) {
        z <- df$x[i]
        break
    } else if (F[i] == 0.5) {
        z <- (df$x[i] + df$x[i+1]) / 2
        break
    }
}
message("Median City MpG: ", round(z, digits=3))
Median Highway MpG: 28

Median City MpG: 23
  • Mode

# Highway MpG
df <- df_hwy
z  <- df$x[which(df$f == max(df$f))]
message("Mode Highway MpG: ", z[1])

# City MpG
df <- df_cty
z  <- df$x[which(df$f == max(df$f))]
message("Mode City MpG: ", z[1])
Mode Highway MpG: 22

Mode City MpG: 20
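
Note that the code above prints only `z[1]`, the first value that attains the maximum frequency. When several values tie, it can be worth listing all of them. A minimal sketch, again with the highway values typed in from the Renault table:

```r
# Highway mpg of the 33 Renault vehicles, copied from the table above
hwy <- c(22, 28, 22, 28, 28, 37, 34, 36, 26, 34, 22, 29, 26, 32, 26, 36, 32,
         35, 24, 33, 27, 37, 35, 35, 26, 33, 29, 22, 28, 22, 28, 27, 26)

counts <- table(hwy)
modes  <- as.numeric(names(counts)[counts == max(counts)])

print(modes)         # every value that attains the maximum frequency
print(max(counts))   # how often each of those values occurs
```

For these data the highway mileage has a three-way tie (22, 26 and 28 each occur five times), so 22 is one of three modes rather than a unique one.
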
  • Range

# Highway MpG
z = max(data$hwy) - min(data$hwy)
message("Range Highway MpG: ", z)

# City MpG
z = max(data$cty) - min(data$cty)
message("Range City MpG: ", z)
Range Highway MpG: 15

Range City MpG: 12
  • Inter-Quartile Range

# Highway MpG
df <- df_hwy
F  <- cumsum(df$f) 
## First Quartile
q1 <- NA
for (i in 2:nrow(df)) {
    if (F[i-1] < 0.25 & F[i] > 0.25) {
        q1 <- df$x[i]
        break
    } else if (F[i] == 0.25) {
        q1 <- (df$x[i] + df$x[i+1]) / 2
        break
    }
}
## Third Quartile
q3 <- NA
for (i in 2:nrow(df)) {
    if (F[i-1] < 0.75 & F[i] > 0.75) {
        q3 <- df$x[i]
        break
    } else if (F[i] == 0.75) {
        q3 <- (df$x[i] + df$x[i+1]) / 2
        break
    }
}
z = q3 - q1
message("IQR Highway MpG: ", z)

# City MpG
df <- df_cty
F  <- cumsum(df$f)
## First Quartile
q1 <- NA
for (i in 2:nrow(df)) {
    if (F[i-1] < 0.25 & F[i] > 0.25) {
        q1 <- df$x[i]
        break
    } else if (F[i] == 0.25) {
        q1 <- (df$x[i] + df$x[i+1]) / 2
        break
    }
}
## Third Quartile
q3 <- NA
for (i in 2:nrow(df)) {
    if (F[i-1] < 0.75 & F[i] > 0.75) {
        q3 <- df$x[i]
        break
    } else if (F[i] == 0.75) {
        q3 <- (df$x[i] + df$x[i+1]) / 2
        break
    }
}
z = q3 - q1
message("IQR City MpG: ", z)
IQR Highway MpG: 8

IQR City MpG: 6
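
The median and quartile loops above all follow one rule: walk the cumulative distribution until it reaches the target probability. A sketch of that rule as a single reusable function is given below; `pmf_quantile` and its `tol` argument are hypothetical names introduced here, and the city values are typed in from the Renault table.

```r
# Hypothetical helper: p-th quantile of a discrete distribution given as
# sorted values x with probabilities f, following the rule used above
pmf_quantile <- function(x, f, p, tol = 1e-9) {
  F <- cumsum(f)
  i <- which(F >= p - tol)[1]          # first value whose cumulative prob. reaches p
  if (abs(F[i] - p) < tol && i < length(x)) {
    return((x[i] + x[i + 1]) / 2)      # exactly at p: average with the next value
  }
  x[i]
}

# City mpg of the 33 Renault vehicles, copied from the table above
cty <- c(18, 20, 18, 20, 23, 29, 28, 28, 21, 25, 19, 23, 21, 24, 22, 29, 26,
         27, 19, 25, 23, 30, 27, 28, 21, 26, 20, 18, 20, 18, 20, 23, 21)
counts <- table(cty)
x <- as.numeric(names(counts))
f <- as.numeric(prop.table(counts))

q1 <- pmf_quantile(x, f, 0.25)
q2 <- pmf_quantile(x, f, 0.50)
q3 <- pmf_quantile(x, f, 0.75)
print(c(q1 = q1, median = q2, q3 = q3, IQR = q3 - q1))
```

R's built-in `quantile()` offers several interpolation rules, so it may return slightly different quartiles than this cumulative-probability rule.
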
  • Standard Deviation

# Highway MpG
df <- df_hwy
z  <- sqrt(sum(df$f * ((df$x - sum(df$f * df$x))^2)))
message("Standard Deviation Highway MpG: ", round(z, digits=3))
# City MpG
df <- df_cty
z  <- sqrt(sum(df$f * ((df$x - sum(df$f * df$x))^2)))
message("Standard Deviation City MpG: ", round(z, digits=3))
Standard Deviation Highway MpG: 4.799

Standard Deviation City MpG: 3.689
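
The formula above weights squared deviations by relative frequency, which amounts to dividing by n (a population-style standard deviation). R's `sd()` divides by n - 1 instead. The sketch below, with the highway values typed in from the Renault table, compares the two.

```r
# Highway mpg of the 33 Renault vehicles, copied from the table above
hwy <- c(22, 28, 22, 28, 28, 37, 34, 36, 26, 34, 22, 29, 26, 32, 26, 36, 32,
         35, 24, 33, 27, 37, 35, 35, 26, 33, 29, 22, 28, 22, 28, 27, 26)
n <- length(hwy)

sd_population <- sqrt(mean((hwy - mean(hwy))^2))   # divides by n
sd_sample     <- sd(hwy)                           # divides by n - 1

print(round(c(population = sd_population, sample = sd_sample), 3))
# The two differ only by the factor sqrt((n - 1) / n)
print(all.equal(sd_population, sd_sample * sqrt((n - 1) / n)))
```
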
  • Skewness

# Highway MpG
df <- df_hwy
z  <- sum(df$f * ((df$x - sum(df$f * df$x))^3)) / (sqrt(sum(df$f * ((df$x - sum(df$f * df$x))^2))))^3
message("Skewness Highway MpG: ", round(z, digits=3))
# City MpG
df <- df_cty
z  <- sum(df$f * ((df$x - sum(df$f * df$x))^3)) / (sqrt(sum(df$f * ((df$x - sum(df$f * df$x))^2))))^3
message("Skewness City MpG: ", round(z, digits=3))
Skewness Highway MpG: 0.066

Skewness City MpG: 0.309
  • Kurtosis

# Highway MpG
df <- df_hwy
z  <- sum(df$f * ((df$x - sum(df$f * df$x))^4)) / (sqrt(sum(df$f * ((df$x - sum(df$f * df$x))^2))))^4
message("Kurtosis Highway MpG: ", round(z, digits=3))
# City MpG
df <- df_cty
z  <- sum(df$f * ((df$x - sum(df$f * df$x))^4)) / (sqrt(sum(df$f * ((df$x - sum(df$f * df$x))^2))))^4
message("Kurtosis City MpG: ", round(z, digits=3))
Kurtosis Highway MpG: 1.792

Kurtosis City MpG: 1.8
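
For reference, the quantities computed in the code above can be written in terms of the probability mass table, with values $x_i$ and relative frequencies $f_i$. The notation is introduced here for illustration:

```latex
\mu = \sum_i f_i x_i, \qquad
\sigma = \sqrt{\sum_i f_i (x_i - \mu)^2}, \qquad
\text{skewness} = \frac{\sum_i f_i (x_i - \mu)^3}{\sigma^3}, \qquad
\text{kurtosis} = \frac{\sum_i f_i (x_i - \mu)^4}{\sigma^4}
```

With this definition (no subtraction of 3), a normal distribution has a kurtosis of 3.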

b. Based on these statistics, draw inferences for highway and city mileage (2)

  • City mileage is generally lower than highway mileage, as indicated by the measures of location (mean, median, mode).

  • Highway mileage shows greater variability compared to city mileage, based on the measures of dispersion (range, IQR, standard deviation).

  • City mileage is more right-skewed than highway mileage (skewness of 0.309 versus 0.066), while both distributions are flatter than a normal distribution (kurtosis of about 1.8, against 3 for a normal distribution), as seen in the measures of shape.


  3. Probability Analysis (5)

a. Using the vehicles dataset filtered for Honda vehicles, verify the axioms of probability for vehicle classes and engine cylinders. (1)

# Honda data
data <- fueleconomy::vehicles %>% filter(make=="Honda")
# Honda vehicle classes
df_class <- data.frame(class=names(table(data$class)), freq=as.numeric(table(data$class)), prob=as.numeric(prop.table(table(data$class))))
df       <- df_class
df
message("Axiom #1: ", all(df$prob >= 0))
message("Axiom #2: ", isTRUE(all.equal(sum(df$prob), 1)))  # tolerance-based check, safer than == for floating-point sums
message("Axiom #3: ", round((df$freq[1] + df$freq[2]) / sum(df$freq), digits=3) == round(df$prob[1] + df$prob[2], digits=3))
A data.frame: 18 × 3
class                               freq   prob
<chr>                               <dbl>  <dbl>
Compact Cars                        142    0.180203046
Large Cars                          15     0.019035533
Midsize-Large Station Wagons        3      0.003807107
Midsize Cars                        62     0.078680203
Midsize Station Wagons              1      0.001269036
Minivan - 2WD                       25     0.031725888
Small Sport Utility Vehicle 2WD     8      0.010152284
Small Sport Utility Vehicle 4WD     5      0.006345178
Small Station Wagons                73     0.092639594
Special Purpose Vehicle 2WD         3      0.003807107
Special Purpose Vehicle 4WD         7      0.008883249
Special Purpose Vehicles            18     0.022842640
Sport Utility Vehicle - 2WD         51     0.064720812
Sport Utility Vehicle - 4WD         63     0.079949239
Standard Pickup Trucks 4WD          9      0.011421320
Standard Sport Utility Vehicle 4WD  1      0.001269036
Subcompact Cars                     209    0.265228426
Two Seaters                         93     0.118020305
Axiom #1: TRUE

Axiom #2: TRUE

Axiom #3: TRUE
# Honda engine cylinders
df_cyl   <- data.frame(cyl=names(table(data$cyl)), freq=as.numeric(table(data$cyl)), prob=as.numeric(prop.table(table(data$cyl))))
df       <- df_cyl
df
message("Axiom #1: ", all(df$prob >= 0))
message("Axiom #2: ", isTRUE(all.equal(sum(df$prob), 1)))  # tolerance-based check, safer than == for floating-point sums
message("Axiom #3: ", round((df$freq[1] + df$freq[2]) / sum(df$freq), digits=3) == round(df$prob[1] + df$prob[2], digits=3))
A data.frame: 3 × 3
cyl    freq   prob
<chr>  <dbl>  <dbl>
3      14     0.0178117
4      629    0.8002545
6      143    0.1819338
Axiom #1: TRUE

Axiom #2: TRUE

Axiom #3: TRUE
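
Axiom #2 involves comparing a floating-point sum with 1. Exact comparison with `==` can fail because of binary rounding, so a tolerance-based comparison is the safer habit. A minimal sketch with illustrative values:

```r
exact    <- (0.1 + 0.2) == 0.3                   # FALSE: binary rounding error
tolerant <- isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE: equal within tolerance
print(c(exact = exact, tolerant = tolerant))

# Applied to a probability vector (illustrative values, not the Honda data)
prob <- c(0.1, 0.2, 0.3, 0.4)
print(isTRUE(all.equal(sum(prob), 1)))
```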

b. Using the vehicles dataset filtered for Honda vehicles, employ the conditional probability formula to evaluate the probability of a compact car having a 4-cylinder engine and, consequently, employ Bayes’ rule to evaluate the probability of a 4-cylinder engine vehicle being a compact car. (4)

# Honda vehicle classes and engine cylinders
df <- as.data.frame(table(data$class, data$cyl))
names(df) <- c("class", "cyl", "freq")
df$prob <- prop.table(df$freq)
df
A data.frame: 54 × 4
class                               cyl    freq   prob
<fct>                               <fct>  <int>  <dbl>
Compact Cars                        3      0      0.000000000
Large Cars                          3      0      0.000000000
Midsize-Large Station Wagons        3      0      0.000000000
Midsize Cars                        3      0      0.000000000
Midsize Station Wagons              3      0      0.000000000
Minivan - 2WD                       3      0      0.000000000
Small Sport Utility Vehicle 2WD     3      0      0.000000000
Small Sport Utility Vehicle 4WD     3      0      0.000000000
Small Station Wagons                3      0      0.000000000
Special Purpose Vehicle 2WD         3      0      0.000000000
Special Purpose Vehicle 4WD         3      0      0.000000000
Special Purpose Vehicles            3      0      0.000000000
Sport Utility Vehicle - 2WD         3      0      0.000000000
Sport Utility Vehicle - 4WD         3      0      0.000000000
Standard Pickup Trucks 4WD          3      0      0.000000000
Standard Sport Utility Vehicle 4WD  3      0      0.000000000
Subcompact Cars                     3      0      0.000000000
Two Seaters                         3      14     0.017811705
Compact Cars                        4      129    0.164122137
Large Cars                          4      10     0.012722646
Midsize-Large Station Wagons        4      3      0.003816794
Midsize Cars                        4      38     0.048346056
Midsize Station Wagons              4      1      0.001272265
Minivan - 2WD                       4      0      0.000000000
Small Sport Utility Vehicle 2WD     4      4      0.005089059
Small Sport Utility Vehicle 4WD     4      2      0.002544529
Small Station Wagons                4      71     0.090330789
Special Purpose Vehicle 2WD         4      1      0.001272265
Special Purpose Vehicle 4WD         4      3      0.003816794
Special Purpose Vehicles            4      4      0.005089059
Sport Utility Vehicle - 2WD         4      33     0.041984733
Sport Utility Vehicle - 4WD         4      42     0.053435115
Standard Pickup Trucks 4WD          4      0      0.000000000
Standard Sport Utility Vehicle 4WD  4      0      0.000000000
Subcompact Cars                     4      209    0.265903308
Two Seaters                         4      79     0.100508906
Compact Cars                        6      13     0.016539440
Large Cars                          6      5      0.006361323
Midsize-Large Station Wagons        6      0      0.000000000
Midsize Cars                        6      24     0.030534351
Midsize Station Wagons              6      0      0.000000000
Minivan - 2WD                       6      25     0.031806616
Small Sport Utility Vehicle 2WD     6      4      0.005089059
Small Sport Utility Vehicle 4WD     6      3      0.003816794
Small Station Wagons                6      0      0.000000000
Special Purpose Vehicle 2WD         6      2      0.002544529
Special Purpose Vehicle 4WD         6      4      0.005089059
Special Purpose Vehicles            6      14     0.017811705
Sport Utility Vehicle - 2WD         6      18     0.022900763
Sport Utility Vehicle - 4WD         6      21     0.026717557
Standard Pickup Trucks 4WD          6      9      0.011450382
Standard Sport Utility Vehicle 4WD  6      1      0.001272265
Subcompact Cars                     6      0      0.000000000
Two Seaters                         6      0      0.000000000
## Probabilities (A: 4-cylinder engine, B: compact car)
## All three are taken from the joint table so that they share one denominator:
## table() drops the two Honda rows with a missing cyl, so df_class (788 vehicles)
## and the joint table (786 vehicles) are not directly comparable.
P_A   <- sum(df$prob[df$cyl == 4])
P_B   <- sum(df$prob[df$class == "Compact Cars"])
P_AXB <- df$prob[which(df$cyl == 4 & df$class == "Compact Cars")]
P_AB       <- P_AXB / P_B          # P(A|B) by the conditional probability formula
P_BA_cond  <- P_AXB / P_A          # P(B|A) by the conditional probability formula
P_BA_bayes <- P_AB * (P_B / P_A)   # P(B|A) by Bayes' rule

## conditional probability of a compact car having a 4-cylinder engine
message("Conditional Probability of a compact car having a 4-cylinder engine: ", round(P_AB, digits=3))

## conditional probability that a 4-cylinder engine vehicle is a compact car
message("Conditional Probability that a 4-cylinder engine vehicle is a compact car (using Conditional Probability): ", round(P_BA_cond, digits=3))
message("Conditional Probability that a 4-cylinder engine vehicle is a compact car (using Bayes' Theorem): ", round(P_BA_bayes, digits=3))
Conditional Probability of a compact car having a 4-cylinder engine: 0.908

Conditional Probability that a 4-cylinder engine vehicle is a compact car (using Conditional Probability): 0.205

Conditional Probability that a 4-cylinder engine vehicle is a compact car (using Bayes' Theorem): 0.205
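
The same quantities can be worked out directly from the counts in the joint table, which makes the arithmetic behind the conditional probability formula and Bayes' rule easy to follow. The sketch below uses only counts read from the tables above; the variable names are illustrative.

```r
# Counts read from the Honda joint table above
n_total   <- 786        # vehicles with a recorded cylinder count
n_cyl4    <- 629        # A: 4-cylinder vehicles
n_compact <- 129 + 13   # B: compact cars (4- and 6-cylinder)
n_both    <- 129        # A and B: compact cars with a 4-cylinder engine

p_A     <- n_cyl4 / n_total
p_B     <- n_compact / n_total
p_joint <- n_both / n_total

p_A_given_B        <- p_joint / p_B              # conditional probability formula
p_B_given_A_direct <- p_joint / p_A              # conditional probability formula
p_B_given_A_bayes  <- p_A_given_B * p_B / p_A    # Bayes' rule

print(round(c(p_A_given_B, p_B_given_A_direct, p_B_given_A_bayes), 3))
```

Here every probability shares the denominator of 786, so P(A|B) reduces to 129/142 (about 0.908) and P(B|A) to 129/629 (about 0.205). The class marginals in `df_class` are based on 788 vehicles, so a value computed from them can differ in the third decimal.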

  4. Data Sampling (8)

a. For the following randomly sampled data from the vehicles dataset, compute bias and standard error for the estimator on highway mileage. (5)

library(ggplot2)

P <- fueleconomy::vehicles$hwy
m <- 50
n <- 1000

# population parameter  
z <- mean(P, na.rm=TRUE)
  
Z <- vector("numeric", m)
for (i in 1:m) {
  set.seed(i)
  I <- order(runif(length(P)))[1:n]
  S <- P[I]
  # sample parameter
  Z[i] <- mean(S, na.rm=TRUE)
}

data.frame(parameter=z, estimator=Z, error=Z - z)
message("Bias: ", round(mean(Z - z), digits=3))
message("Standard Error: ", round(sd(Z), digits=3))
A data.frame: 50 × 3
parameter  estimator  error
<dbl>      <dbl>      <dbl>
23.55128   23.501     -0.050282818
23.55128   23.922      0.370717182
23.55128   23.499     -0.052282818
23.55128   23.135     -0.416282818
23.55128   23.788      0.236717182
23.55128   23.710      0.158717182
23.55128   23.433     -0.118282818
23.55128   23.595      0.043717182
23.55128   23.543     -0.008282818
23.55128   23.405     -0.146282818
23.55128   23.866      0.314717182
23.55128   23.286     -0.265282818
23.55128   23.692      0.140717182
23.55128   23.350     -0.201282818
23.55128   23.468     -0.083282818
23.55128   23.330     -0.221282818
23.55128   23.934      0.382717182
23.55128   23.328     -0.223282818
23.55128   23.700      0.148717182
23.55128   23.803      0.251717182
23.55128   23.893      0.341717182
23.55128   23.938      0.386717182
23.55128   23.386     -0.165282818
23.55128   23.554      0.002717182
23.55128   23.427     -0.124282818
23.55128   23.471     -0.080282818
23.55128   23.583      0.031717182
23.55128   23.383     -0.168282818
23.55128   23.624      0.072717182
23.55128   23.684      0.132717182
23.55128   23.689      0.137717182
23.55128   23.922      0.370717182
23.55128   23.663      0.111717182
23.55128   23.661      0.109717182
23.55128   23.342     -0.209282818
23.55128   23.665      0.113717182
23.55128   23.503     -0.048282818
23.55128   23.453     -0.098282818
23.55128   23.446     -0.105282818
23.55128   23.460     -0.091282818
23.55128   24.029      0.477717182
23.55128   23.485     -0.066282818
23.55128   23.470     -0.081282818
23.55128   23.505     -0.046282818
23.55128   23.351     -0.200282818
23.55128   23.483     -0.068282818
23.55128   23.233     -0.318282818
23.55128   23.553      0.001717182
23.55128   23.642      0.090717182
23.55128   23.206     -0.345282818
Bias: 0.009

Standard Error: 0.209
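
The repeated-sampling idea above can also be sketched with `sample()`, which draws without replacement by default. To keep the block runnable on its own, it uses a synthetic stand-in population (an assumption, not the real `hwy` column), so its numbers will not match the output above. It also compares the simulated standard error with the textbook approximation that includes a finite population correction.

```r
set.seed(1)
# Synthetic stand-in population (assumption: NOT the real hwy column)
P <- round(rnorm(5000, mean = 24, sd = 6))
m <- 50     # number of repeated samples
n <- 1000   # size of each sample

z <- mean(P)                            # population parameter
Z <- replicate(m, mean(sample(P, n)))   # sample() draws without replacement

bias <- mean(Z) - z
se   <- sd(Z)
# Textbook approximation with finite population correction
se_theory <- sd(P) / sqrt(n) * sqrt(1 - n / length(P))

print(round(c(bias = bias, se = se, se_theory = se_theory), 3))
```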

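b. Using the archery analogy discussed in class, draw a representative target board to comment upon the accuracy and precision of the estimator. (3)

The target board should represent high accuracy but low precision: the bias (0.009) is close to zero, so the shots centre on the bullseye, while the standard error (0.209) means that individual shots scatter around it.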


  5. Hypothesis Testing (12)

Test the following claims for Renault vehicles:

a. city mileage is greater than 23 mpg

b. highway mileage is greater than 29 mpg

c. highway mileage is not the same as the city mileage

Note: make appropriate assumptions, develop the null and alternative hypotheses, evaluate the test statistic, present the threshold value and, consequently, make appropriate inferences.

# Load the dataset
data <- fueleconomy::vehicles %>% filter(make=="Renault")
# Test for city mileage being greater than 23 mpg (One Sample t-test)
message("Null Hypothesis: City mileage is less than or equal to 23 mpg")
message("Alternative Hypothesis: City mileage is greater than 23 mpg")
t = round((mean(data$cty) - 23) / (sd(data$cty) / sqrt(nrow(data))), digits=3)
v = qt(0.95, df=nrow(data)-1)  # one-sided critical value at the 5% significance level
message("t-statistic: ", round(t, digits=3))
message("Critical value: ", round(v, digits=3))
message("Decision: ", ifelse(t > v, "Reject Null Hypothesis", "Do not reject Null Hypothesis"))
Null Hypothesis: City mileage is less than or equal to 23 mpg

Alternative Hypothesis: City mileage is greater than 23 mpg

t-statistic: 0.046

Critical value: 1.694

Decision: Do not reject Null Hypothesis
# Test for highway mileage being greater than 29 mpg (One Sample t-test)
message("Null Hypothesis: Highway mileage is less than or equal to 29 mpg")
message("Alternative Hypothesis: Highway mileage is greater than 29 mpg")
t = round((mean(data$hwy) - 29) / (sd(data$hwy) / sqrt(nrow(data))), digits=3)
v = qt(0.95, df=nrow(data)-1)  # one-sided critical value at the 5% significance level
message("t-statistic: ", round(t, digits=3))
message("Critical value: ", round(v, digits=3))
message("Decision: ", ifelse(t > v, "Reject Null Hypothesis", "Do not reject Null Hypothesis"))
Null Hypothesis: Highway mileage is less than or equal to 29 mpg

Alternative Hypothesis: Highway mileage is greater than 29 mpg

t-statistic: 0.286

Critical value: 1.694

Decision: Do not reject Null Hypothesis
# Test for highway mileage not being same as the city mileage (Paired t-test)
message("Null Hypothesis: Highway mileage is equal to city mileage")
message("Alternative Hypothesis: Highway mileage is not equal to city mileage")
t = round((mean(data$hwy) - mean(data$cty)) / (sd(data$hwy - data$cty) / sqrt(nrow(data))), digits=3)
v = qt(0.975, df=nrow(data)-1)
message("t-statistic: ", round(t, digits=3))
message("Critical value: ", round(v, digits=3))
message("Decision: ", ifelse(abs(t) > v, "Reject Null Hypothesis", "Do not reject Null Hypothesis"))
Null Hypothesis: Highway mileage is equal to city mileage

Alternative Hypothesis: Highway mileage is not equal to city mileage

t-statistic: 19.841

Critical value: 2.037

Decision: Reject Null Hypothesis
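
As a cross-check on the hand-computed statistics, the three tests can be sketched with R's built-in `t.test()`. The `hwy` and `cty` vectors are typed in from the Renault table so that the block runs without the package. The tests assume, as above, approximately normal data and a 5% significance level.

```r
# Highway and city mpg of the 33 Renault vehicles, copied from the table above
hwy <- c(22, 28, 22, 28, 28, 37, 34, 36, 26, 34, 22, 29, 26, 32, 26, 36, 32,
         35, 24, 33, 27, 37, 35, 35, 26, 33, 29, 22, 28, 22, 28, 27, 26)
cty <- c(18, 20, 18, 20, 23, 29, 28, 28, 21, 25, 19, 23, 21, 24, 22, 29, 26,
         27, 19, 25, 23, 30, 27, 28, 21, 26, 20, 18, 20, 18, 20, 23, 21)

res_cty  <- t.test(cty, mu = 23, alternative = "greater")   # one-sided
res_hwy  <- t.test(hwy, mu = 29, alternative = "greater")   # one-sided
res_pair <- t.test(hwy, cty, paired = TRUE)                 # two-sided by default

# t-statistics, to compare with the values printed above
print(round(c(res_cty$statistic, res_hwy$statistic, res_pair$statistic), 3))
# p-values: reject the null hypothesis when the p-value is below 0.05
print(round(c(res_cty$p.value, res_hwy$p.value, res_pair$p.value), 4))
```

Comparing each p-value with 0.05 leads to the same decisions as comparing the t-statistic with the critical value.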