Lecture 12: Assignment 1 Discussion

Data Classification (5)

Consider the following R dataset detailing the attributes of different vehicles, including vehicle make, model, year, class, transmission, drive type, number of engine cylinders, total engine displacement, fuel type, and mileage (highway and city). Classify each variable in the dataset as one of the following: Discrete Quantitative, Continuous Quantitative, Qualitative, or Categorical.

# Loading the packages
library(dplyr)
library(fueleconomy)

# Loading the dataset
data <- fueleconomy::vehicles

# Dataset Structure
str(data)

tibble [33,442 × 12] (S3: tbl_df/tbl/data.frame)
 $ id   : num [1:33442] 13309 13310 13311 14038 14039 ...
 $ make : chr [1:33442] "Acura" "Acura" "Acura" "Acura" ...
 $ model: chr [1:33442] "2.2CL/3.0CL" "2.2CL/3.0CL" "2.2CL/3.0CL" "2.3CL/3.0CL" ...
 $ year : num [1:33442] 1997 1997 1997 1998 1998 ...
 $ class: chr [1:33442] "Subcompact Cars" "Subcompact Cars" "Subcompact Cars" "Subcompact Cars" ...
 $ trans: chr [1:33442] "Automatic 4-spd" "Manual 5-spd" "Automatic 4-spd" "Automatic 4-spd" ...
 $ drive: chr [1:33442] "Front-Wheel Drive" "Front-Wheel Drive" "Front-Wheel Drive" "Front-Wheel Drive" ...
 $ cyl  : num [1:33442] 4 4 6 4 4 6 4 4 6 5 ...
 $ displ: num [1:33442] 2.2 2.2 3 2.3 2.3 3 2.3 2.3 3 2.5 ...
 $ fuel : chr [1:33442] "Regular" "Regular" "Regular" "Regular" ...
 $ hwy  : num [1:33442] 26 28 26 27 29 26 27 29 26 23 ...
 $ cty  : num [1:33442] 20 22 18 19 21 17 20 21 17 18 ...

make: categorical variable
model: categorical variable
year: discrete quantitative variable
class: categorical variable
trans: categorical variable
drive: categorical variable
cyl: discrete quantitative variable
displ: continuous quantitative variable
fuel: categorical variable
hwy: discrete quantitative variable
cty: discrete quantitative variable

Data Summary (10)

a. Using the vehicles dataset filtered for Renault vehicles, summarise the measures of location (mean, median, mode), dispersion (range, inter-quartile range, standard deviation), and shape (skewness, kurtosis) for highway as well as city miles per gallon. (8)

# Renault data
data <- fueleconomy::vehicles %>% filter(make=="Renault")
data

A tibble: 33 × 12
id	make	model	year	class	trans	drive	cyl	displ	fuel	hwy	cty
<dbl>	<chr>	<chr>	<dbl>	<chr>	<chr>	<chr>	<dbl>	<dbl>	<chr>	<dbl>	<dbl>
618	Renault	18i 4DR Wagon	1985	Small Station Wagons	Automatic 3-spd	Front-Wheel Drive	4	2.2	Regular	22	18
619	Renault	18i 4DR Wagon	1985	Small Station Wagons	Manual 5-spd	Front-Wheel Drive	4	2.2	Regular	28	20
2270	Renault	18i Sportwagon	1986	Small Station Wagons	Automatic 3-spd	Front-Wheel Drive	4	2.2	Regular	22	18
2271	Renault	18i Sportwagon	1986	Small Station Wagons	Manual 5-spd	Front-Wheel Drive	4	2.2	Regular	28	20
3301	Renault	Alliance	1987	Compact Cars	Automatic 3-spd	Front-Wheel Drive	4	1.4	Regular	28	23
3302	Renault	Alliance	1987	Compact Cars	Manual 4-spd	Front-Wheel Drive	4	1.4	Regular	37	29
3303	Renault	Alliance	1987	Compact Cars	Manual 4-spd	Front-Wheel Drive	4	1.4	Regular	34	28
3304	Renault	Alliance	1987	Compact Cars	Manual 5-spd	Front-Wheel Drive	4	1.4	Regular	36	28
3305	Renault	Alliance	1987	Compact Cars	Automatic 3-spd	Front-Wheel Drive	4	1.7	Regular	26	21
3306	Renault	Alliance	1987	Compact Cars	Manual 5-spd	Front-Wheel Drive	4	1.7	Regular	34	25
1907	Renault	Alliance Convertible	1986	Subcompact Cars	Automatic 3-spd	Front-Wheel Drive	4	1.7	Regular	22	19
1908	Renault	Alliance Convertible	1986	Subcompact Cars	Manual 5-spd	Front-Wheel Drive	4	1.7	Regular	29	23
3107	Renault	Alliance Convertible	1987	Subcompact Cars	Automatic 3-spd	Front-Wheel Drive	4	1.7	Regular	26	21
3108	Renault	Alliance Convertible	1987	Subcompact Cars	Manual 5-spd	Front-Wheel Drive	4	1.7	Regular	32	24
373	Renault	Alliance/Encore	1985	Compact Cars	Automatic 3-spd	Front-Wheel Drive	4	1.4	Regular	26	22
374	Renault	Alliance/Encore	1985	Compact Cars	Manual 4-spd	Front-Wheel Drive	4	1.4	Regular	36	29
375	Renault	Alliance/Encore	1985	Compact Cars	Manual 4-spd	Front-Wheel Drive	4	1.4	Regular	32	26
376	Renault	Alliance/Encore	1985	Compact Cars	Manual 5-spd	Front-Wheel Drive	4	1.4	Regular	35	27
377	Renault	Alliance/Encore	1985	Compact Cars	Automatic 3-spd	Front-Wheel Drive	4	1.7	Regular	24	19
378	Renault	Alliance/Encore	1985	Compact Cars	Manual 5-spd	Front-Wheel Drive	4	1.7	Regular	33	25
2050	Renault	Alliance/Encore	1986	Compact Cars	Automatic 3-spd	Front-Wheel Drive	4	1.4	Regular	27	23
2051	Renault	Alliance/Encore	1986	Compact Cars	Manual 4-spd	Front-Wheel Drive	4	1.4	Regular	37	30
2052	Renault	Alliance/Encore	1986	Compact Cars	Manual 4-spd	Front-Wheel Drive	4	1.4	Regular	35	27
2053	Renault	Alliance/Encore	1986	Compact Cars	Manual 5-spd	Front-Wheel Drive	4	1.4	Regular	35	28
2054	Renault	Alliance/Encore	1986	Compact Cars	Automatic 3-spd	Front-Wheel Drive	4	1.7	Regular	26	21
2055	Renault	Alliance/Encore	1986	Compact Cars	Manual 5-spd	Front-Wheel Drive	4	1.7	Regular	33	26
221	Renault	Fuego	1985	Subcompact Cars	Manual 5-spd	Front-Wheel Drive	4	1.6	Premium	29	20
222	Renault	Fuego	1985	Subcompact Cars	Automatic 3-spd	Front-Wheel Drive	4	2.2	Regular	22	18
223	Renault	Fuego	1985	Subcompact Cars	Manual 5-spd	Front-Wheel Drive	4	2.2	Regular	28	20
1909	Renault	Fuego	1986	Subcompact Cars	Automatic 3-spd	Front-Wheel Drive	4	2.2	Regular	22	18
1910	Renault	Fuego	1986	Subcompact Cars	Manual 5-spd	Front-Wheel Drive	4	2.2	Regular	28	20
3307	Renault	GTA	1987	Compact Cars	Manual 5-spd	Front-Wheel Drive	4	2.0	Regular	27	23
3109	Renault	GTA Convertible	1987	Subcompact Cars	Manual 5-spd	Front-Wheel Drive	4	2.0	Regular	26	21

# Highway MpG
## creating a probability mass table
v <- sort(unique(data$hwy))
f <- numeric(length(v))
for (r in 1:nrow(data)) {
  z <- data$hwy[r]
  i <- which(v == z)
  f[i] <- f[i] + 1
}
df_hwy <- data.frame(x=v, f=f/sum(f))

# City MpG
## creating a probability mass table
v <- sort(unique(data$cty))
f <- numeric(length(v))
for (r in 1:nrow(data)) {
  z <- data$cty[r]
  i <- which(v == z)
  f[i] <- f[i] + 1
}
df_cty <- data.frame(x=v, f=f/sum(f))
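The counting loops above can also be written with base R's table() and prop.table(). The sketch below is an illustrative alternative, not part of the original solution. It hard-codes the 33 Renault highway values from the table above so that it runs without the fueleconomy package, and the name df_hwy_alt is made up for this example.

```r
# 33 Renault highway mpg values, copied row by row from the table above
hwy <- c(22, 28, 22, 28, 28, 37, 34, 36, 26, 34, 22,
         29, 26, 32, 26, 36, 32, 35, 24, 33, 27, 37,
         35, 35, 26, 33, 29, 22, 28, 22, 28, 27, 26)

# table() counts each distinct value; prop.table() turns counts into proportions
pmf <- prop.table(table(hwy))

# Same layout as df_hwy: distinct values in x, probabilities in f
df_hwy_alt <- data.frame(x = as.numeric(names(pmf)), f = as.numeric(pmf))
print(df_hwy_alt)
```

The same two calls applied to the city values would give the counterpart of df_cty.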

Mean

# Highway MpG
df <- df_hwy
z  <- sum(df$f * df$x)
message("Mean Highway MpG: ", round(z, digits=3))

# City MpG
df <- df_cty
z  <- sum(df$f * df$x)
message("Mean City MpG: ", round(z, digits=3))

Mean Highway MpG: 29.242

Mean City MpG: 23.03
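Writing x_i for the distinct mileage values and f_i for their relative frequencies (the x and f columns of the probability mass tables), the code above computes the mean as a probability-weighted sum. As a quick check, the 33 Renault highway values in the table add up to 965 and the 33 city values add up to 760:

```latex
\mu = \sum_i x_i f_i,
\qquad
\mu_{\text{hwy}} = \frac{965}{33} \approx 29.242,
\qquad
\mu_{\text{cty}} = \frac{760}{33} \approx 23.030
```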

Median

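# Highway MpG
df <- df_hwy
z  <- NA
F  <- cumsum(df$f)  # cumulative distribution over the sorted values
# note: starting at i = 2 assumes the smallest value alone carries less than half the probability
for (i in 2:nrow(df)) {
    if (F[i-1] < 0.5 && F[i] > 0.5) {
        # the cumulative probability crosses 0.5 at this value
        z <- df$x[i]
        break
    } else if (F[i] == 0.5) {
        # the cumulative probability hits 0.5 exactly: average this value and the next
        z <- (df$x[i] + df$x[i+1]) / 2
        break
    }
}
message("Median Highway MpG: ", round(z, digits=3))

# City MpG
df <- df_cty
z  <- NA
F  <- cumsum(df$f)  # cumulative distribution over the sorted values
for (i in 2:nrow(df)) {
    if (F[i-1] < 0.5 && F[i] > 0.5) {
        # the cumulative probability crosses 0.5 at this value
        z <- df$x[i]
        break
    } else if (F[i] == 0.5) {
        # the cumulative probability hits 0.5 exactly: average this value and the next
        z <- (df$x[i] + df$x[i+1]) / 2
        break
    }
}
message("Median City MpG: ", round(z, digits=3))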

Median Highway MpG: 28

Median City MpG: 23
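As a cross-check on the cumulative-probability loop, base R's median() can be applied directly to the raw values. The sketch below is only a sanity check under the assumption that the 33 Renault rows in the table above are the full filtered dataset; the vectors are copied from that table.

```r
# Renault highway and city mpg values, copied row by row from the table above
hwy <- c(22, 28, 22, 28, 28, 37, 34, 36, 26, 34, 22,
         29, 26, 32, 26, 36, 32, 35, 24, 33, 27, 37,
         35, 35, 26, 33, 29, 22, 28, 22, 28, 27, 26)
cty <- c(18, 20, 18, 20, 23, 29, 28, 28, 21, 25, 19,
         23, 21, 24, 22, 29, 26, 27, 19, 25, 23, 30,
         27, 28, 21, 26, 20, 18, 20, 18, 20, 23, 21)

# With an odd number of observations the median is the middle sorted value
med_hwy <- median(hwy)
med_cty <- median(cty)
message("Median Highway MpG (median()): ", med_hwy)
message("Median City MpG (median()): ", med_cty)
```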

Mode

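# Highway MpG
df <- df_hwy
z  <- df$x[which(df$f == max(df$f))]  # every value tied for the highest probability
message("Mode Highway MpG: ", z[1])   # only the first (smallest) of the tied values is reported

# City MpG
df <- df_cty
z  <- df$x[which(df$f == max(df$f))]  # every value tied for the highest probability
message("Mode City MpG: ", z[1])      # only the first (smallest) of the tied values is reported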

Mode Highway MpG: 22

Mode City MpG: 20
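The mode code above prints z[1], the first element of the vector of most frequent values. A distribution can have several modes, so it can be worth listing all of them. The helper all_modes() below is a hypothetical name introduced for this sketch; the data are the Renault values from the table above.

```r
# Renault highway and city mpg values, copied row by row from the table above
hwy <- c(22, 28, 22, 28, 28, 37, 34, 36, 26, 34, 22,
         29, 26, 32, 26, 36, 32, 35, 24, 33, 27, 37,
         35, 35, 26, 33, 29, 22, 28, 22, 28, 27, 26)
cty <- c(18, 20, 18, 20, 23, 29, 28, 28, 21, 25, 19,
         23, 21, 24, 22, 29, 26, 27, 19, 25, 23, 30,
         27, 28, 21, 26, 20, 18, 20, 18, 20, 23, 21)

# Return every value whose count equals the maximum count (illustrative helper)
all_modes <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

modes_hwy <- all_modes(hwy)
modes_cty <- all_modes(cty)
message("Highway modes: ", paste(modes_hwy, collapse = ", "))
message("City modes: ", paste(modes_cty, collapse = ", "))
```

For these data the highway values 22, 26 and 28 each occur five times, so the highway distribution is multimodal, while the city mode of 20 is unique.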

Range

# Highway MpG
z = max(data$hwy) - min(data$hwy)
message("Range Highway MpG: ", z)

# City MpG
z = max(data$cty) - min(data$cty)
message("Range City MpG: ", z)

Range Highway MpG: 15

Range City MpG: 12

Inter-Quartile Range

# Highway MpG
df <- df_hwy
F  <- cumsum(df$f) 
## First Quartile
q1 <- NA
for (i in 2:nrow(df)) {
    if (F[i-1] < 0.25 & F[i] > 0.25) {
        q1 <- df$x[i]
        break
    } else if (F[i] == 0.25) {
        q1 <- (df$x[i] + df$x[i+1]) / 2
        break
    }
}
## Third Quartile
q3 <- NA
for (i in 2:nrow(df)) {
    if (F[i-1] < 0.75 & F[i] > 0.75) {
        q3 <- df$x[i]
        break
    } else if (F[i] == 0.75) {
        q3 <- (df$x[i] + df$x[i+1]) / 2
        break
    }
}
z = q3 - q1
message("IQR Highway MpG: ", z)

# City MpG
df <- df_cty
F  <- cumsum(df$f)
## First Quartile
q1 <- NA
for (i in 2:nrow(df)) {
    if (F[i-1] < 0.25 & F[i] > 0.25) {
        q1 <- df$x[i]
        break
    } else if (F[i] == 0.25) {
        q1 <- (df$x[i] + df$x[i+1]) / 2
        break
    }
}
## Third Quartile
q3 <- NA
for (i in 2:nrow(df)) {
    if (F[i-1] < 0.75 & F[i] > 0.75) {
        q3 <- df$x[i]
        break
    } else if (F[i] == 0.75) {
        q3 <- (df$x[i] + df$x[i+1]) / 2
        break
    }
}
z = q3 - q1
message("IQR City MpG: ", z)

IQR Highway MpG: 8

IQR City MpG: 6
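The quartile loops above pick the value where the cumulative probability crosses 0.25 or 0.75 and average two neighbouring values when it lands exactly on the threshold. In base R this corresponds to quantile type 2, which is not the default (type 7). The sketch below is a cross-check on the Renault values from the table above, not a replacement for the manual calculation.

```r
# Renault highway and city mpg values, copied row by row from the table above
hwy <- c(22, 28, 22, 28, 28, 37, 34, 36, 26, 34, 22,
         29, 26, 32, 26, 36, 32, 35, 24, 33, 27, 37,
         35, 35, 26, 33, 29, 22, 28, 22, 28, 27, 26)
cty <- c(18, 20, 18, 20, 23, 29, 28, 28, 21, 25, 19,
         23, 21, 24, 22, 29, 26, 27, 19, 25, 23, 30,
         27, 28, 21, 26, 20, 18, 20, 18, 20, 23, 21)

# type = 2: inverse empirical CDF with averaging at discontinuities
q_hwy <- unname(quantile(hwy, probs = c(0.25, 0.75), type = 2))
q_cty <- unname(quantile(cty, probs = c(0.25, 0.75), type = 2))

iqr_hwy <- IQR(hwy, type = 2)
iqr_cty <- IQR(cty, type = 2)
message("Highway quartiles: ", q_hwy[1], ", ", q_hwy[2], " (IQR ", iqr_hwy, ")")
message("City quartiles: ", q_cty[1], ", ", q_cty[2], " (IQR ", iqr_cty, ")")
```

Other quantile types can give different quartiles on other datasets, so it helps to state which definition is being used.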

Standard Deviation

# Highway MpG
df <- df_hwy
z  <- sqrt(sum(df$f * ((df$x - sum(df$f * df$x))^2)))
message("Standard Deviation Highway MpG: ", round(z, digits=3))
# City MpG
df <- df_cty
z  <- sqrt(sum(df$f * ((df$x - sum(df$f * df$x))^2)))
message("Standard Deviation City MpG: ", round(z, digits=3))

Standard Deviation Highway MpG: 4.799

Standard Deviation City MpG: 3.689
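The formula above weights squared deviations by the probabilities f, which amounts to dividing by n (a population standard deviation). Base R's sd() divides by n - 1 instead, so the two differ by a factor of sqrt((n - 1) / n). The sketch below illustrates the relationship on the Renault highway values from the table above.

```r
# Renault highway mpg values, copied row by row from the table above
hwy <- c(22, 28, 22, 28, 28, 37, 34, 36, 26, 34, 22,
         29, 26, 32, 26, 36, 32, 35, 24, 33, 27, 37,
         35, 35, 26, 33, 29, 22, 28, 22, 28, 27, 26)
n <- length(hwy)

# Population standard deviation: mean squared deviation, then square root
pop_sd <- sqrt(mean((hwy - mean(hwy))^2))

# Sample standard deviation as computed by sd() (divides by n - 1)
samp_sd <- sd(hwy)

message("Population SD: ", round(pop_sd, digits = 3))
message("Sample SD: ", round(samp_sd, digits = 3))
message("Sample SD rescaled: ", round(samp_sd * sqrt((n - 1) / n), digits = 3))
```

The hypothesis tests later in this discussion call sd(), so they use the n - 1 version.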

Skewness

# Highway MpG
df <- df_hwy
z  <- sum(df$f * ((df$x - sum(df$f * df$x))^3)) / (sqrt(sum(df$f * ((df$x - sum(df$f * df$x))^2))))^3
message("Skewness Highway MpG: ", round(z, digits=3))
# City MpG
df <- df_cty
z  <- sum(df$f * ((df$x - sum(df$f * df$x))^3)) / (sqrt(sum(df$f * ((df$x - sum(df$f * df$x))^2))))^3
message("Skewness City MpG: ", round(z, digits=3))

Skewness Highway MpG: 0.066

Skewness City MpG: 0.309

Kurtosis

# Highway MpG
df <- df_hwy
z  <- sum(df$f * ((df$x - sum(df$f * df$x))^4)) / (sqrt(sum(df$f * ((df$x - sum(df$f * df$x))^2))))^4
message("Kurtosis Highway MpG: ", round(z, digits=3))
# City MpG
df <- df_cty
z  <- sum(df$f * ((df$x - sum(df$f * df$x))^4)) / (sqrt(sum(df$f * ((df$x - sum(df$f * df$x))^2))))^4
message("Kurtosis City MpG: ", round(z, digits=3))

Kurtosis Highway MpG: 1.792

Kurtosis City MpG: 1.8
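In the notation of the probability mass tables (values x_i with probabilities f_i, mean \mu and standard deviation \sigma), the skewness and kurtosis lines above compute the third and fourth standardised central moments. Note that this is the plain kurtosis, not the excess kurtosis, so the reference value for a normal distribution is 3 rather than 0.

```latex
\sigma = \sqrt{\sum_i f_i \,(x_i - \mu)^2},
\qquad
\text{skewness} = \frac{\sum_i f_i \,(x_i - \mu)^3}{\sigma^3},
\qquad
\text{kurtosis} = \frac{\sum_i f_i \,(x_i - \mu)^4}{\sigma^4}
```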

b. Based on these statistics, draw inferences for highway and city mileage (2)

City mileage is generally lower than highway mileage, as indicated by the measures of location (mean 23.03 vs 29.242, median 23 vs 28, mode 20 vs 22).
Highway mileage shows greater variability than city mileage, based on the measures of dispersion (range 15 vs 12, IQR 8 vs 6, standard deviation 4.799 vs 3.689).
City mileage is more right-skewed than highway mileage (skewness 0.309 vs 0.066), while the two kurtosis values are almost identical (1.8 vs 1.792). Both are well below 3, the value for a normal distribution, so both distributions have lighter tails than a normal distribution.

Probability Analysis (5)

a. Using the vehicles dataset filtered for Honda vehicles, verify the axioms of probability for vehicle classes and engine cylinders. (1)

# Honda data
data <- fueleconomy::vehicles %>% filter(make=="Honda")

# Honda vehicle classes
df_class <- data.frame(class=names(table(data$class)), freq=as.numeric(table(data$class)), prob=as.numeric(prop.table(table(data$class))))
df       <- df_class
df
message("Axiom #1: ", all(df$prob >= 0))
message("Axiom #2: ", isTRUE(all.equal(sum(df$prob), 1)))  # tolerance-based check avoids floating-point equality
message("Axiom #3: ", round((df$freq[1] + df$freq[2]) / sum(df$freq), digits=3) == round(df$prob[1] + df$prob[2], digits=3))

A data.frame: 18 × 3
class	freq	prob
<chr>	<dbl>	<dbl>
Compact Cars	142	0.180203046
Large Cars	15	0.019035533
Midsize-Large Station Wagons	3	0.003807107
Midsize Cars	62	0.078680203
Midsize Station Wagons	1	0.001269036
Minivan - 2WD	25	0.031725888
Small Sport Utility Vehicle 2WD	8	0.010152284
Small Sport Utility Vehicle 4WD	5	0.006345178
Small Station Wagons	73	0.092639594
Special Purpose Vehicle 2WD	3	0.003807107
Special Purpose Vehicle 4WD	7	0.008883249
Special Purpose Vehicles	18	0.022842640
Sport Utility Vehicle - 2WD	51	0.064720812
Sport Utility Vehicle - 4WD	63	0.079949239
Standard Pickup Trucks 4WD	9	0.011421320
Standard Sport Utility Vehicle 4WD	1	0.001269036
Subcompact Cars	209	0.265228426
Two Seaters	93	0.118020305

Axiom #1: TRUE

Axiom #2: TRUE

Axiom #3: TRUE

# Honda engine cylinders
df_cyl   <- data.frame(cyl=names(table(data$cyl)), freq=as.numeric(table(data$cyl)), prob=as.numeric(prop.table(table(data$cyl))))
df       <- df_cyl
df
message("Axiom #1: ", all(df$prob >= 0))
message("Axiom #2: ", isTRUE(all.equal(sum(df$prob), 1)))  # tolerance-based check avoids floating-point equality
message("Axiom #3: ", round((df$freq[1] + df$freq[2]) / sum(df$freq), digits=3) == round(df$prob[1] + df$prob[2], digits=3))

A data.frame: 3 × 3
cyl	freq	prob
<chr>	<dbl>	<dbl>
3	14	0.0178117
4	629	0.8002545
6	143	0.1819338

Axiom #1: TRUE

Axiom #2: TRUE

Axiom #3: TRUE
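For reference, the three checks above correspond to the following statements, written for a sample space \Omega and events E, E_1, E_2. The third check in the code tests additivity only for the first two categories of each table, which are disjoint by construction.

```latex
\text{Axiom 1: } P(E) \ge 0
\qquad
\text{Axiom 2: } P(\Omega) = 1
\qquad
\text{Axiom 3: } P(E_1 \cup E_2) = P(E_1) + P(E_2) \quad \text{if } E_1 \cap E_2 = \emptyset
```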

b. Using the vehicles dataset filtered for Honda vehicles, employ the conditional probability formula to evaluate the probability of a compact car having a 4-cylinder engine and, consequently, employ Bayes' rule to evaluate the probability of a 4-cylinder engine vehicle being a compact car. (4)

# Honda vehicle classes and engine cylinders
df <- as.data.frame(table(data$class, data$cyl))
names(df) <- c("class", "cyl", "freq")
df$prob <- prop.table(df$freq)
df

A data.frame: 54 × 4
class	cyl	freq	prob
<fct>	<fct>	<int>	<dbl>
Compact Cars	3	0	0.000000000
Large Cars	3	0	0.000000000
Midsize-Large Station Wagons	3	0	0.000000000
Midsize Cars	3	0	0.000000000
Midsize Station Wagons	3	0	0.000000000
Minivan - 2WD	3	0	0.000000000
Small Sport Utility Vehicle 2WD	3	0	0.000000000
Small Sport Utility Vehicle 4WD	3	0	0.000000000
Small Station Wagons	3	0	0.000000000
Special Purpose Vehicle 2WD	3	0	0.000000000
Special Purpose Vehicle 4WD	3	0	0.000000000
Special Purpose Vehicles	3	0	0.000000000
Sport Utility Vehicle - 2WD	3	0	0.000000000
Sport Utility Vehicle - 4WD	3	0	0.000000000
Standard Pickup Trucks 4WD	3	0	0.000000000
Standard Sport Utility Vehicle 4WD	3	0	0.000000000
Subcompact Cars	3	0	0.000000000
Two Seaters	3	14	0.017811705
Compact Cars	4	129	0.164122137
Large Cars	4	10	0.012722646
Midsize-Large Station Wagons	4	3	0.003816794
Midsize Cars	4	38	0.048346056
Midsize Station Wagons	4	1	0.001272265
Minivan - 2WD	4	0	0.000000000
Small Sport Utility Vehicle 2WD	4	4	0.005089059
Small Sport Utility Vehicle 4WD	4	2	0.002544529
Small Station Wagons	4	71	0.090330789
Special Purpose Vehicle 2WD	4	1	0.001272265
Special Purpose Vehicle 4WD	4	3	0.003816794
Special Purpose Vehicles	4	4	0.005089059
Sport Utility Vehicle - 2WD	4	33	0.041984733
Sport Utility Vehicle - 4WD	4	42	0.053435115
Standard Pickup Trucks 4WD	4	0	0.000000000
Standard Sport Utility Vehicle 4WD	4	0	0.000000000
Subcompact Cars	4	209	0.265903308
Two Seaters	4	79	0.100508906
Compact Cars	6	13	0.016539440
Large Cars	6	5	0.006361323
Midsize-Large Station Wagons	6	0	0.000000000
Midsize Cars	6	24	0.030534351
Midsize Station Wagons	6	0	0.000000000
Minivan - 2WD	6	25	0.031806616
Small Sport Utility Vehicle 2WD	6	4	0.005089059
Small Sport Utility Vehicle 4WD	6	3	0.003816794
Small Station Wagons	6	0	0.000000000
Special Purpose Vehicle 2WD	6	2	0.002544529
Special Purpose Vehicle 4WD	6	4	0.005089059
Special Purpose Vehicles	6	14	0.017811705
Sport Utility Vehicle - 2WD	6	18	0.022900763
Sport Utility Vehicle - 4WD	6	21	0.026717557
Standard Pickup Trucks 4WD	6	9	0.011450382
Standard Sport Utility Vehicle 4WD	6	1	0.001272265
Subcompact Cars	6	0	0.000000000
Two Seaters	6	0	0.000000000

## Probabilities
P_A   = df_cyl$prob[which(df_cyl$cyl==4)]                     # P(4-cylinder)
P_B   = df_class$prob[which(df_class$class=="Compact Cars")]  # P(compact car)
P_AXB = df$prob[which(df$cyl==4 & df$class=="Compact Cars")]  # P(4-cylinder and compact car)
P_AB  = P_AXB / P_B                # P(4-cylinder | compact car), conditional probability formula
P_BA_cond  = P_AXB / P_A           # P(compact car | 4-cylinder), conditional probability formula
P_BA_bayes = P_AB * (P_B / P_A)    # P(compact car | 4-cylinder), Bayes' rule

## conditional probability of a compact car having a 4-cylinder engine
message("Conditional Probability of a compact car having a 4-cylinder engine: ", round(P_AB, digits=3))

## conditional probability that a 4-cylinder engine vehicle is a compact car
message("Conditional Probability that a 4-cylinder engine vehicle is in a compact car (using Conditional Probability): ", round(P_BA_cond, digits=3))
message("Conditional Probability that a 4-cylinder engine vehicle is in a compact car (using Bayes' Theorem): ", round(P_BA_bayes, digits=3))

Conditional Probability of a compact car having a 4-cylinder engine: 0.911

Conditional Probability that a 4-cylinder engine vehicle is in a compact car (using Conditional Probability): 0.205

Conditional Probability that a 4-cylinder engine vehicle is in a compact car (using Bayes' Theorem): 0.205
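The same quantities can be recomputed directly from the counts in the tables above, which makes the denominators explicit. One caveat: the class table sums to 788 vehicles, whereas the cylinder table and the joint table sum to 786, presumably because table() drops two vehicles with a missing cylinder count. The sketch below uses the 786 vehicles with a known cylinder count throughout, so its first result differs slightly from the 0.911 printed above. The variable names are illustrative.

```r
# Counts read off the Honda tables above
n_total   <- 786  # vehicles with a known cylinder count (14 + 629 + 143)
n_compact <- 142  # compact cars (129 four-cylinder + 13 six-cylinder)
n_4cyl    <- 629  # 4-cylinder vehicles
n_both    <- 129  # compact cars with a 4-cylinder engine

p_compact <- n_compact / n_total   # P(B): compact car
p_4cyl    <- n_4cyl / n_total      # P(A): 4-cylinder engine
p_both    <- n_both / n_total      # P(A and B)

# Conditional probability formula: P(A | B) = P(A and B) / P(B)
p_4cyl_given_compact <- p_both / p_compact

# Bayes' rule: P(B | A) = P(A | B) * P(B) / P(A)
p_compact_given_4cyl <- p_4cyl_given_compact * p_compact / p_4cyl

message("P(4-cylinder | compact car): ", round(p_4cyl_given_compact, digits = 3))
message("P(compact car | 4-cylinder): ", round(p_compact_given_4cyl, digits = 3))
```

With a single consistent denominator, P(4-cylinder | compact car) reduces to 129/142 and P(compact car | 4-cylinder) to 129/629.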

Data Sampling (8)

a. For the following randomly sampled data from the vehicles dataset, compute bias and standard error for the estimator on highway mileage. (5)

library(ggplot2)

P <- fueleconomy::vehicles$hwy
m <- 50
n <- 1000

# population parameter  
z <- mean(P, na.rm=TRUE)
  
Z <- vector("numeric", m)
for (i in 1:m) {
  set.seed(i)
  I <- order(runif(length(P)))[1:n]
  S <- P[I]
  # sample parameter
  Z[i] <- mean(S, na.rm=TRUE)
}

data.frame(parameter=z, estimator=Z, error=Z - z)
message("Bias: ", round(mean(Z - z), digits=3))
message("Standard Error: ", round(sd(Z), digits=3))

A data.frame: 50 × 3
parameter	estimator	error
<dbl>	<dbl>	<dbl>
23.55128	23.501	-0.050282818
23.55128	23.922	0.370717182
23.55128	23.499	-0.052282818
23.55128	23.135	-0.416282818
23.55128	23.788	0.236717182
23.55128	23.710	0.158717182
23.55128	23.433	-0.118282818
23.55128	23.595	0.043717182
23.55128	23.543	-0.008282818
23.55128	23.405	-0.146282818
23.55128	23.866	0.314717182
23.55128	23.286	-0.265282818
23.55128	23.692	0.140717182
23.55128	23.350	-0.201282818
23.55128	23.468	-0.083282818
23.55128	23.330	-0.221282818
23.55128	23.934	0.382717182
23.55128	23.328	-0.223282818
23.55128	23.700	0.148717182
23.55128	23.803	0.251717182
23.55128	23.893	0.341717182
23.55128	23.938	0.386717182
23.55128	23.386	-0.165282818
23.55128	23.554	0.002717182
23.55128	23.427	-0.124282818
23.55128	23.471	-0.080282818
23.55128	23.583	0.031717182
23.55128	23.383	-0.168282818
23.55128	23.624	0.072717182
23.55128	23.684	0.132717182
23.55128	23.689	0.137717182
23.55128	23.922	0.370717182
23.55128	23.663	0.111717182
23.55128	23.661	0.109717182
23.55128	23.342	-0.209282818
23.55128	23.665	0.113717182
23.55128	23.503	-0.048282818
23.55128	23.453	-0.098282818
23.55128	23.446	-0.105282818
23.55128	23.460	-0.091282818
23.55128	24.029	0.477717182
23.55128	23.485	-0.066282818
23.55128	23.470	-0.081282818
23.55128	23.505	-0.046282818
23.55128	23.351	-0.200282818
23.55128	23.483	-0.068282818
23.55128	23.233	-0.318282818
23.55128	23.553	0.001717182
23.55128	23.642	0.090717182
23.55128	23.206	-0.345282818

Bias: 0.009

Standard Error: 0.209
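The resampling loop above draws n values without replacement by ordering uniform random numbers. Base R's sample() does the same job more directly. The sketch below is a self-contained illustration on a synthetic population (an assumption made so that it runs without the fueleconomy package); the population values, the seed and the variable names are made up, so its numbers will not match the output above.

```r
set.seed(42)

# Hypothetical stand-in for the highway mileage population
population <- round(rnorm(5000, mean = 24, sd = 6))

m <- 50    # number of repeated samples
n <- 1000  # size of each sample

# Population parameter and the m sample estimates of it
parameter <- mean(population)
estimates <- replicate(m, mean(sample(population, n)))

# Bias: average estimation error; standard error: spread of the estimates
bias <- mean(estimates - parameter)
se   <- sd(estimates)

message("Bias: ", round(bias, digits = 3))
message("Standard Error: ", round(se, digits = 3))
```

Bias measures how far the estimates are from the parameter on average (accuracy), while the standard error measures how much they vary from sample to sample (precision).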

b. Using the Archery analogy discussed in the class, draw a representative target board to comment upon the accuracy and precision of the estimator. (3)

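The target board should show high accuracy (the bias of 0.009 is close to zero) but low precision (the 50 estimates are scattered around the parameter, with a standard error of 0.209).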

Hypothesis Testing (12)

Test the following claims for Renault Vehicles

a. city mileage is greater than 23 mpg

b. highway mileage is greater than 29 mpg

c. highway mileage is not the same as the city mileage

Note, make appropriate assumptions, develop the null and alternate hypotheses, evaluate the test statistic, present the threshold value and consequently, make appropriate inferences.

# Load the dataset
data <- fueleconomy::vehicles %>% filter(make=="Renault")

# Test for city mileage being greater than 23 mpg (One Sample t-test)
message("Null Hypothesis: City mileage is less than or equal to 23 mpg")
message("Alternative Hypothesis: City mileage is greater than 23 mpg")
t = round((mean(data$cty) - 23) / (sd(data$cty) / sqrt(nrow(data))), digits=3)
v = qt(0.95, df=nrow(data)-1)  # one-sided critical value at the 5% significance level
message("t-statistic: ", round(t, digits=3))
message("Critical value: ", round(v, digits=3))
message("Decision: ", ifelse(t > v, "Reject Null Hypothesis", "Do not reject Null Hypothesis"))

Null Hypothesis: City mileage is less than or equal to 23 mpg

Alternative Hypothesis: City mileage is greater than 23 mpg

t-statistic: 0.046

Critical value: 1.694

Decision: Do not reject Null Hypothesis

# Test for highway mileage being greater than 29 mpg (One Sample t-test)
message("Null Hypothesis: Highway mileage is less than or equal to 29 mpg")
message("Alternative Hypothesis: Highway mileage is greater than 29 mpg")
t = round((mean(data$hwy) - 29) / (sd(data$hwy) / sqrt(nrow(data))), digits=3)
v = qt(0.95, df=nrow(data)-1)  # one-sided critical value at the 5% significance level
message("t-statistic: ", round(t, digits=3))
message("Critical value: ", round(v, digits=3))
message("Decision: ", ifelse(t > v, "Reject Null Hypothesis", "Do not reject Null Hypothesis"))

Null Hypothesis: Highway mileage is less than or equal to 29 mpg

Alternative Hypothesis: Highway mileage is greater than 29 mpg

t-statistic: 0.286

Critical value: 1.694

Decision: Do not reject Null Hypothesis
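The two one-sample statistics above can be cross-checked with base R's t.test(), which also reports a p-value. The sketch below is a sanity check rather than part of the original solution; it uses the Renault values copied from the table earlier in this discussion.

```r
# Renault highway and city mpg values, copied row by row from the Renault table
hwy <- c(22, 28, 22, 28, 28, 37, 34, 36, 26, 34, 22,
         29, 26, 32, 26, 36, 32, 35, 24, 33, 27, 37,
         35, 35, 26, 33, 29, 22, 28, 22, 28, 27, 26)
cty <- c(18, 20, 18, 20, 23, 29, 28, 28, 21, 25, 19,
         23, 21, 24, 22, 29, 26, 27, 19, 25, 23, 30,
         27, 28, 21, 26, 20, 18, 20, 18, 20, 23, 21)

# alternative = "greater" matches the one-sided alternative hypotheses
res_cty <- t.test(cty, mu = 23, alternative = "greater")
res_hwy <- t.test(hwy, mu = 29, alternative = "greater")

t_cty <- unname(res_cty$statistic)
t_hwy <- unname(res_hwy$statistic)
message("City t-statistic: ", round(t_cty, digits = 3))
message("Highway t-statistic: ", round(t_hwy, digits = 3))

# Rejecting when p < 0.05 is equivalent to comparing t with qt(0.95, df)
message("City p-value below 0.05: ", res_cty$p.value < 0.05)
message("Highway p-value below 0.05: ", res_hwy$p.value < 0.05)
```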

# Test for highway mileage not being same as the city mileage (Paired t-test)
message("Null Hypothesis: Highway mileage is equal to city mileage")
message("Alternative Hypothesis: Highway mileage is not equal to city mileage")
t = round((mean(data$hwy) - mean(data$cty)) / (sd(data$hwy - data$cty) / sqrt(nrow(data))), digits=3)
v = qt(0.975, df=nrow(data)-1)
message("t-statistic: ", round(t, digits=3))
message("Critical value: ", round(v, digits=3))
message("Decision: ", ifelse(abs(t) > v, "Reject Null Hypothesis", "Do not reject Null Hypothesis"))

Null Hypothesis: Highway mileage is equal to city mileage

Alternative Hypothesis: Highway mileage is not equal to city mileage

t-statistic: 19.841

Critical value: 2.037

Decision: Reject Null Hypothesis
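The paired statistic can likewise be cross-checked with t.test() and paired = TRUE, which tests whether the mean of the per-vehicle differences (highway minus city) is zero. As before, this is a sketch on the Renault values copied from the table earlier in this discussion.

```r
# Renault highway and city mpg values, copied row by row from the Renault table
hwy <- c(22, 28, 22, 28, 28, 37, 34, 36, 26, 34, 22,
         29, 26, 32, 26, 36, 32, 35, 24, 33, 27, 37,
         35, 35, 26, 33, 29, 22, 28, 22, 28, 27, 26)
cty <- c(18, 20, 18, 20, 23, 29, 28, 28, 21, 25, 19,
         23, 21, 24, 22, 29, 26, 27, 19, 25, 23, 30,
         27, 28, 21, 26, 20, 18, 20, 18, 20, 23, 21)

# Two-sided paired test (the default alternative is "two.sided")
res_paired <- t.test(hwy, cty, paired = TRUE)
t_paired   <- unname(res_paired$statistic)

# Equivalent manual calculation on the differences
d        <- hwy - cty
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))

message("Paired t-statistic (t.test): ", round(t_paired, digits = 3))
message("Paired t-statistic (manual): ", round(t_manual, digits = 3))
```

Pairing is appropriate here because each row gives a highway and a city figure for the same vehicle.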