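1. The researcher wants to know whether student-rated teacher-student conflict in grade 2 (CO_S_2) predicts math achievement in grade 3 (math_3).
Read the CSV file into R and name the data reg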
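# stringsAsFactors=TRUE keeps text columns such as ethnic as factors (the default before R 4.0)
reg <- read.csv("D:/104/ML_R/WOW_data.csv",header=TRUE,sep=",",stringsAsFactors=TRUE)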
Use lm() to run the regression and store the result as M_reg
M_reg <-lm( math_3 ~ CO_S_2 ,data = reg)
summary(M_reg)
##
## Call:
## lm(formula = math_3 ~ CO_S_2, data = reg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.1894 -5.6611 0.4521 6.9238 21.0371
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 509.793 2.135 238.801 <2e-16 ***
## CO_S_2 -2.830 1.163 -2.433 0.0159 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.13 on 191 degrees of freedom
## Multiple R-squared: 0.03006, Adjusted R-squared: 0.02498
## F-statistic: 5.919 on 1 and 191 DF, p-value: 0.0159
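As a quick check on how the coefficient table above fits together, the sketch below recomputes the t value and two-sided p-value for CO_S_2 from the printed estimate and standard error. It uses only the rounded numbers shown in the output, so the results agree only up to rounding, and the variable names are illustrative.

```r
# Values copied (rounded) from the summary(M_reg) output above
estimate <- -2.830
std_error <- 1.163
df_resid <- 191

# t value = estimate / standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with the residual degrees of freedom
p_value <- 2 * pt(-abs(t_value), df = df_resid)

round(c(t = t_value, p = p_value), 4)
```

The negative slope means that each one-point increase in CO_S_2 is associated with a predicted drop of about 2.83 points in math_3.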
Load the packages and compute the standardized regression coefficients
library(lm.beta)
library(ggplot2)
lm.beta(M_reg)
##
## Call:
## lm(formula = math_3 ~ CO_S_2, data = reg)
##
## Standardized Coefficients::
## (Intercept) CO_S_2
## 0.0000000 -0.1733755
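The standardized coefficient can also be obtained by hand, which may help in reading the lm.beta() output. The sketch below uses simulated data (x and y are hypothetical stand-ins for CO_S_2 and math_3, not the WOW data) to show that the standardized slope is the raw slope rescaled by sd(x)/sd(y), and that in simple regression it equals Pearson's r.

```r
set.seed(1)
# Hypothetical data standing in for CO_S_2 (x) and math_3 (y)
x <- rnorm(200, mean = 1.5, sd = 0.6)
y <- 510 - 3 * x + rnorm(200, sd = 10)

fit <- lm(y ~ x)
b <- unname(coef(fit)["x"])

# Standardized slope: rescale b by the ratio of standard deviations
beta <- b * sd(x) / sd(y)

# In simple regression, beta equals Pearson's r and beta^2 equals R-squared
c(beta = beta, r = cor(x, y), beta_sq = beta^2, r_sq = summary(fit)$r.squared)
```

The same relation holds in the output above: (-0.1733755)^2 is about 0.03006, the Multiple R-squared of M_reg.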
Draw the scatterplot with the fitted regression line, first with base graphics and then with ggplot2
plot(reg$CO_S_2,reg$math_3)
abline(lm(reg$math_3 ~ reg$CO_S_2))
ggplot(reg, aes(x = CO_S_2, y = math_3)) + geom_point(size=3) + stat_smooth(method="lm")
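2. Continuing the analysis above, the researcher adds student-rated teacher-student warmth in grade 2 (WA_S_2) as a second predictor, giving a multiple regression.
Again use lm() to run the regression and store the result as Model_2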
Model_2<-lm(math_3 ~ CO_S_2 + WA_S_2, data=reg)
summary(Model_2)
##
## Call:
## lm(formula = math_3 ~ CO_S_2 + WA_S_2, data = reg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.2074 -5.4557 0.6687 6.9193 22.2977
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 514.4646 3.7506 137.168 <2e-16 ***
## CO_S_2 -2.9986 1.1647 -2.575 0.0108 *
## WA_S_2 -1.2530 0.8285 -1.512 0.1321
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.1 on 190 degrees of freedom
## Multiple R-squared: 0.0416, Adjusted R-squared: 0.03151
## F-statistic: 4.123 on 2 and 190 DF, p-value: 0.01766
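One way to see what WA_S_2 adds is the change in R-squared between the two models. The sketch below is a rough check that uses only the rounded values printed in the two summaries, so it agrees with the exact results only up to rounding; the variable names are illustrative.

```r
# R-squared values copied (rounded) from the two summary() outputs
r2_simple <- 0.03006   # math_3 ~ CO_S_2
r2_full <- 0.0416      # math_3 ~ CO_S_2 + WA_S_2
df_resid_full <- 190   # residual degrees of freedom of the larger model

# F test for adding one predictor to the model
f_change <- (r2_full - r2_simple) / ((1 - r2_full) / df_resid_full)
p_change <- pf(f_change, df1 = 1, df2 = df_resid_full, lower.tail = FALSE)

round(c(F_change = f_change, p = p_change), 3)
```

Up to rounding, F_change matches the squared t value of WA_S_2 ((-1.512)^2 is about 2.29) and the p-value matches 0.1321, so adding WA_S_2 does not significantly improve prediction at the .05 level. With the data loaded, anova(M_reg, Model_2) should give the same test.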
Standardized regression coefficients
lm.beta(Model_2)
##
## Call:
## lm(formula = math_3 ~ CO_S_2 + WA_S_2, data = reg)
##
## Standardized Coefficients::
## (Intercept) CO_S_2 WA_S_2
## 0.0000000 -0.1836963 -0.1079128
Because the model has more than one predictor, run a collinearity diagnostic (a VIF below 10 is commonly treated as acceptable)
library(car)
vif(Model_2)
## CO_S_2 WA_S_2
## 1.009231 1.009231
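To make the VIF less of a black box, the sketch below computes it by hand on simulated data (x1 and x2 are hypothetical stand-ins for CO_S_2 and WA_S_2). The VIF of a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on the other predictors.

```r
set.seed(2)
# Hypothetical predictors standing in for CO_S_2 and WA_S_2
x1 <- rnorm(200)
x2 <- 0.1 * x1 + rnorm(200)

# VIF for x1: regress it on the other predictor and use 1 / (1 - R^2)
r2_x1 <- summary(lm(x1 ~ x2))$r.squared
vif_x1 <- 1 / (1 - r2_x1)

# With exactly two predictors, that R^2 is the squared correlation between them
vif_from_cor <- 1 / (1 - cor(x1, x2)^2)

c(vif_x1 = vif_x1, vif_from_cor = vif_from_cor)
```

This is why both predictors show the same VIF in the output above. A VIF of 1.009231 corresponds to a correlation of roughly 0.096 in absolute value between CO_S_2 and WA_S_2, so collinearity is not a concern here.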
3. Dummy-variable analysis using the three ethnic groups (ethnic) and grade 3 math achievement (math_3)
Select the variables needed
dum<-reg[c(3,13)]
Convert ethnic to numeric codes
dum$ethnic_N<-as.numeric(dum$ethnic)
Load the package and use mutate() to build the dummy codes
library(dplyr)
# Store the dummy codes as numbers (1/0) rather than character strings
dum<-mutate(dum, E1 =ifelse(ethnic_N == 1, 1, 0))
dum<-mutate(dum, E2 =ifelse(ethnic_N == 2, 1, 0))
dum<-mutate(dum, E3 =ifelse(ethnic_N == 3, 1, 0))
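As an alternative to building each column by hand, base R can generate the dummy columns from a factor. The sketch below uses a small hypothetical data frame (toy is not the WOW data) and model.matrix(), which returns the design matrix that lm() builds internally.

```r
# Hypothetical toy data with the same three group labels as ethnic
toy <- data.frame(ethnic = factor(c("Human", "Orc", "Undead", "Orc", "Human")))

# An intercept column plus one 0/1 column per non-reference level
dummies <- model.matrix(~ ethnic, data = toy)
dummies
```

Only two dummy columns appear for three groups: the first level (Human) is absorbed into the intercept and serves as the reference group.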
Use contrasts() to inspect the current coding; the reference group is currently Human
contrasts(dum$ethnic)
## Orc Undead
## Human 0 0
## Orc 1 0
## Undead 0 1
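In fact, R automatically applies dummy coding to a categorical variable (a factor). The reference group is the first factor level, and levels are ordered alphabetically by default.
In this example, ethnic is a categorical variable with three levels, "Human", "Orc", and "Undead", so the default reference group is "Human".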
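The alphabetical default can be seen directly from the factor levels. The sketch below uses a short hypothetical vector of labels and shows both the default ordering and an explicit ordering set through the levels argument of factor().

```r
# Hypothetical labels given in non-alphabetical order
labels <- c("Undead", "Orc", "Human", "Orc")

# Default: levels are sorted, so "Human" becomes the first (reference) level
f_default <- factor(labels)
levels(f_default)

# Explicit levels: the first one listed becomes the reference level
f_custom <- factor(labels, levels = c("Undead", "Human", "Orc"))
levels(f_custom)
```

Setting the levels when the factor is created is one option; relevel() changes the reference level of an existing factor.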
Fit the model with Human as the reference group
dum$ethnic<-as.factor(dum$ethnic)
dum_lm <- lm(math_3 ~ ethnic , data=dum)
summary(dum_lm)
##
## Call:
## lm(formula = math_3 ~ ethnic, data = dum)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26.1905 -6.9062 0.8506 6.8095 21.0938
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 509.149 1.015 501.605 < 2e-16 ***
## ethnicOrc -9.959 1.779 -5.598 7.49e-08 ***
## ethnicUndead -6.243 1.559 -4.004 8.92e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.468 on 190 degrees of freedom
## Multiple R-squared: 0.1579, Adjusted R-squared: 0.1491
## F-statistic: 17.82 on 2 and 190 DF, p-value: 8.078e-08
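With a single categorical predictor and the default dummy coding, the intercept is the mean of the reference group and each coefficient is a difference from that mean. The sketch below recovers the three group means from the rounded coefficients printed above; the variable names are illustrative.

```r
# Coefficients copied (rounded) from summary(dum_lm) above; Human is the reference group
intercept <- 509.149
b_orc <- -9.959
b_undead <- -6.243

# Group mean = intercept + that group's coefficient (0 for the reference group)
group_means <- c(Human = intercept,
                 Orc = intercept + b_orc,
                 Undead = intercept + b_undead)
group_means
```

On this reading, Orc students score about 9.96 points and Undead students about 6.24 points below Human students on math_3, and both differences are significant.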
Change the reference group to Orc
dum$ethnic <- relevel(dum$ethnic, ref = 'Orc')
contrasts(dum$ethnic)
## Human Undead
## Orc 0 0
## Human 1 0
## Undead 0 1
dum_lm_2 <- lm(math_3 ~ ethnic , data=dum)
summary(dum_lm_2)
##
## Call:
## lm(formula = math_3 ~ ethnic, data = dum)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26.1905 -6.9062 0.8506 6.8095 21.0938
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 499.190 1.461 341.702 < 2e-16 ***
## ethnicHuman 9.959 1.779 5.598 7.49e-08 ***
## ethnicUndead 3.716 1.880 1.976 0.0496 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.468 on 190 degrees of freedom
## Multiple R-squared: 0.1579, Adjusted R-squared: 0.1491
## F-statistic: 17.82 on 2 and 190 DF, p-value: 8.078e-08
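The two summaries share the same residuals, R-squared, and F-statistic because changing the reference group only re-expresses the coefficients. The sketch below illustrates this on simulated data (g and y are hypothetical, not the WOW data).

```r
set.seed(3)
# Hypothetical data: three groups with different means
g <- factor(rep(c("Human", "Orc", "Undead"), times = c(40, 20, 30)))
group_mean <- c(Human = 509, Orc = 499, Undead = 503)
y <- unname(group_mean[as.character(g)]) + rnorm(length(g), sd = 9)

fit_human <- lm(y ~ g)                      # reference group: Human
fit_orc <- lm(y ~ relevel(g, ref = "Orc"))  # reference group: Orc

# Same fitted values and R-squared; only the coefficients are re-expressed
all.equal(unname(fitted(fit_human)), unname(fitted(fit_orc)))
c(summary(fit_human)$r.squared, summary(fit_orc)$r.squared)

# The new intercept is the old intercept plus the old Orc coefficient
orc_mean <- unname(coef(fit_human)[1] + coef(fit_human)["gOrc"])
c(new_intercept = unname(coef(fit_orc)[1]), orc_mean = orc_mean)
```

The printed output follows the same pattern: the new intercept 499.190 equals 509.149 - 9.959, and the Undead coefficient 3.716 equals -6.243 - (-9.959).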
date: "January 23, 2016, first edition"
author: "邱浩恩"