Directions: Complete the following exercises using the code discussed during computer lab.
Save your work in an R script as well as a Word document containing the necessary output and comments.
Be sure to use notes in the script to justify any computations. If you have any questions, do not hesitate to ask.
1 Simple Linear Regression
- Load the data set pressure from the datasets package in R. Perform a Simple Linear Regression on the two variables.
- Provide the regression equation, coefficients table, and anova table. Summarize your findings. What is the relationship between the t statistic for temperature and the F statistic in the ANOVA table?
- Refer to the previous exercise. Check the assumptions on the regression model and report your results. Be sure to include the scatterplot with regression equation, normal QQ plot, and residual plot. Explain what you see.
- Refer to exercise 1. Experiment with different transformations of the data to improve the model. What is the best transformation?
2 Multiple Linear Regression
- Load the swiss data set from the ‘datasets’ package in R. Find the correlation matrix and print the pairwise scatterplots. What variables seem to be related?
- Run a Multiple Regression on Fertility using all of the other variables as predictors. Print the model and coefficients table. Explain the meaning of the significant coefficients.
- Check the assumptions using the diagnostic tests mentioned in this section. Discuss your findings.
- Run a stepwise selection method to reduce the dimension of the model using the backward direction. Print the new model and new coefficients table. Check the assumptions and discuss any changes.
- Use Mallows' Cp to determine the best model. Does your choice match the model in the previous exercise?
3 Principal Component Analysis
- Load the longley data set from the R datasets package. This data set was used to predict a country's GNP based on several variables. Find the correlation matrix of the explanatory variables.
- Refer to the previous exercise. Perform a principal component analysis on the explanatory variables using the correlation matrix. Use a scree plot to determine the optimal number of components and report them. Try to explain the meaning behind each component.
- Refer to the previous exercise. What proportion of variation does each component explain? What is the total cumulative variance explained by the optimal number of components?
Day 2 Lab Activities – Solutions
Simple Linear Regression
1. > pressure.lm <- lm(pressure ~ temperature, data = pressure)
> summary(pressure.lm)
Call:
lm(formula = pressure ~ temperature, data = pressure)
Residuals:
Min 1Q Median 3Q Max
-158.08 -117.06 -32.84 72.30 409.43
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -147.8989 66.5529 -2.222 0.040124 *
temperature 1.5124 0.3158 4.788 0.000171 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 150.8 on 17 degrees of freedom
Multiple R-squared: 0.5742, Adjusted R-squared: 0.5492
F-statistic: 22.93 on 1 and 17 DF, p-value: 0.000171
> anova(pressure.lm)
Analysis of Variance Table
Response: pressure
Df Sum Sq Mean Sq F value Pr(>F)
temperature 1 521530 521530 22.93 0.000171 ***
Residuals 17 386665 22745
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The temperature coefficient is positive (1.5124), so the relationship between temperature and pressure is direct: pressure rises as temperature rises. Since the p-value (0.000171) is less than 0.05, temperature is significant in the model. In simple linear regression the t statistic for the slope and the F statistic in the ANOVA table are related by t^2 = F; here 4.788^2 ≈ 22.92, which matches F = 22.93 up to rounding.
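The t^2 = F relationship can be checked directly from the fitted objects. The following is a minimal sketch; the names t_val and f_val are introduced here for illustration only.

```r
# Refit the simple linear regression from exercise 1
pressure.lm <- lm(pressure ~ temperature, data = pressure)

# t statistic for the slope, taken from the coefficients table
t_val <- summary(pressure.lm)$coefficients["temperature", "t value"]

# F statistic for temperature, taken from the ANOVA table
f_val <- anova(pressure.lm)["temperature", "F value"]

# With a single predictor, the squared t statistic equals the F statistic
all.equal(t_val^2, f_val)
```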
2. Linearity: The scatterplot shows a clear violation of the linearity assumption. The data appears to be exponentially increasing. The standardized residual plot reinforces this observation.
Equal Variance: The lack of a linear relationship makes it difficult to determine the equality of variance in observations.
Normality: The Normal Quantile plot shows a lack of linearity at the tails of the data set. A Shapiro-Wilk test verifies that the residuals do not follow a normal distribution.
Shapiro-Wilk normality test
data: rstandard(pressure.lm)
W = 0.8832, **p-value = 0.02438**
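The plots and test described above could be produced along these lines. This is a sketch using base R graphics; the choice of standardized residuals follows the Shapiro-Wilk output shown, and the object name sw is an assumption made for illustration.

```r
pressure.lm <- lm(pressure ~ temperature, data = pressure)

# Scatterplot with the fitted regression line (linearity check)
plot(pressure ~ temperature, data = pressure,
     main = "Pressure vs. temperature")
abline(pressure.lm)

# Standardized residuals against fitted values (linearity / equal variance)
plot(fitted(pressure.lm), rstandard(pressure.lm),
     xlab = "Fitted values", ylab = "Standardized residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the standardized residuals (normality check)
qqnorm(rstandard(pressure.lm))
qqline(rstandard(pressure.lm))

# Shapiro-Wilk test on the standardized residuals
sw <- shapiro.test(rstandard(pressure.lm))
sw
```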
3. Using a Box-Cox transformation, the optimal transformation is either (y^λ - 1)/λ with λ = 0.01 or, because λ is so close to 0, simply log(y).
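One way to estimate the Box-Cox parameter is the boxcox function from the MASS package. The sketch below assumes MASS is installed (it ships with standard R distributions); the search grid from -1 to 1 and the names bc, lambda_hat, and pressure.log.lm are choices made here, not taken from the lab.

```r
library(MASS)

pressure.lm <- lm(pressure ~ temperature, data = pressure)

# Profile log-likelihood over a grid of lambda values (grid is an assumption)
bc <- boxcox(pressure.lm, lambda = seq(-1, 1, by = 0.01), plotit = FALSE)

# Lambda with the highest profile log-likelihood on the grid
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat

# A lambda near 0 suggests the log transformation as a simple alternative
pressure.log.lm <- lm(log(pressure) ~ temperature, data = pressure)
summary(pressure.log.lm)
```

The diagnostics from the previous exercise should be repeated on the transformed model before it is accepted.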
Multiple Linear Regression
1.
Fertility Agriculture Examination Education Catholic Infant.Mortality
Fertility 1.000 0.353 -0.646 -0.664 0.464 0.417
Agriculture 0.353 1.000 -0.687 -0.640 0.401 -0.061
Examination -0.646 -0.687 1.000 0.698 -0.573 -0.114
Education -0.664 -0.640 0.698 1.000 -0.154 -0.099
Catholic 0.464 0.401 -0.573 -0.154 1.000 0.175
Infant.Mortality 0.417 -0.061 -0.114 -0.099 0.175 1.000
*Related Variables:*
Fertility, Agriculture
Fertility, Examination
Fertility, Infant Mortality
Agriculture, Examination
Agriculture, Education
Examination, Education
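The correlation matrix and pairwise scatterplots can be reproduced as sketched below. Ranking the pairs by absolute correlation is an addition made here to make the strongest relationships easier to spot; the names cm, idx, and ranked are illustrative.

```r
# Correlation matrix of all variables in swiss, rounded as in the table above
cm <- round(cor(swiss), 3)
cm

# Pairwise scatterplots
pairs(swiss)

# List each pair of variables once, ordered by absolute correlation
idx <- which(upper.tri(cm), arr.ind = TRUE)
ranked <- data.frame(
  var1 = rownames(cm)[idx[, 1]],
  var2 = colnames(cm)[idx[, 2]],
  r    = cm[idx],
  stringsAsFactors = FALSE
)
ranked <- ranked[order(-abs(ranked$r)), ]
ranked
```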
2.
Call:
lm(formula = Fertility ~ Agriculture + Examination + Education +
Catholic + Infant.Mortality, data = swiss)
Residuals:
Min 1Q Median 3Q Max
-15.2743 -5.2617 0.5032 4.1198 15.3213
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 66.91518 10.70604 6.250 1.91e-07 ***
Agriculture -0.17211 0.07030 -2.448 0.01873 *
Examination -0.25801 0.25388 -1.016 0.31546
Education -0.87094 0.18303 -4.758 2.43e-05 ***
Catholic 0.10412 0.03526 2.953 0.00519 **
Infant.Mortality 1.07705 0.38172 2.822 0.00734 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.165 on 41 degrees of freedom
Multiple R-squared: 0.7067, Adjusted R-squared: 0.671
F-statistic: 19.76 on 5 and 41 DF, p-value: 5.594e-10
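All of the predictors are significant at the 0.05 level except Examination (p = 0.315). For example, holding the other predictors fixed, a one-unit increase in Education is associated with an estimated 0.87 decrease in Fertility, while a one-unit increase in Infant.Mortality is associated with an estimated 1.08 increase.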
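The output above corresponds to a fit like the following sketch. The object name swiss.lm matches the name used in the stepwise step later in these solutions; coefs and sig are illustrative names, and the 0.05 cutoff is the conventional level assumed here.

```r
# Regress Fertility on all other variables in swiss
swiss.lm <- lm(Fertility ~ ., data = swiss)
summary(swiss.lm)

# Keep only the rows of the coefficients table with p-value below 0.05
coefs <- summary(swiss.lm)$coefficients
sig <- coefs[coefs[, "Pr(>|t|)"] < 0.05, ]
round(sig, 4)
```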
3. Residual Plot: There is a random pattern in the residual plot which causes no concern with the model fit.
Normal Q-Q Plot: The data follows the diagonal line quite nicely, indicating that the residuals probably satisfy the normality assumption.
Scale – Location: The data is randomly scattered which indicates that the homoscedasticity assumption is probably met.
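The three diagnostic plots discussed above are available through the plot method for lm objects. A minimal sketch, assuming the full model is refit as swiss.lm:

```r
swiss.lm <- lm(Fertility ~ ., data = swiss)

# which = 1: residuals vs. fitted, 2: normal Q-Q, 3: scale-location
op <- par(mfrow = c(1, 3))
plot(swiss.lm, which = 1:3)
par(op)  # restore the previous graphics settings
```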
4.
> swiss.step.b <- step(swiss.lm, direction = 'backward')
Start: AIC=190.69
Fertility ~ Agriculture + Examination + Education + Catholic +
Infant.Mortality
Df Sum of Sq RSS AIC
- Examination 1 53.03 2158.1 189.86
<none> 2105.0 190.69
- Agriculture 1 307.72 2412.8 195.10
- Infant.Mortality 1 408.75 2513.8 197.03
- Catholic 1 447.71 2552.8 197.75
- Education 1 1162.56 3267.6 209.36
Step: AIC=189.86
Fertility ~ Agriculture + Education + Catholic + Infant.Mortality
Df Sum of Sq RSS AIC
<none> 2158.1 189.86
- Agriculture 1 264.18 2422.2 193.29
- Infant.Mortality 1 409.81 2567.9 196.03
- Catholic 1 956.57 3114.6 205.10
- Education 1 2249.97 4408.0 221.43
Call:
lm(formula = Fertility ~ Agriculture + Education + Catholic +
Infant.Mortality, data = swiss)
Residuals:
Min 1Q Median 3Q Max
-14.6765 -6.0522 0.7514 3.1664 16.1422
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 62.10131 9.60489 6.466 8.49e-08 ***
Agriculture -0.15462 0.06819 -2.267 0.02857 *
Education -0.98026 0.14814 -6.617 5.14e-08 ***
Catholic 0.12467 0.02889 4.315 9.50e-05 ***
Infant.Mortality 1.07844 0.38187 2.824 0.00722 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.168 on 42 degrees of freedom
Multiple R-squared: 0.6993, Adjusted R-squared: 0.6707
F-statistic: 24.42 on 4 and 42 DF, p-value: 1.717e-10
The new model does not include the examination variable. Now all of the predictors are significant.
Residual Plot: There is a random pattern in the residual plot which causes no concern with the model fit.
Normal Q-Q Plot: The data no longer seems to follow a precise normal distribution. This assumption may now be violated.
Scale – Location: The data is randomly scattered which indicates that the homoscedasticity assumption is probably met.
5. The two models with the best Mallows' Cp values are the model with all 5 variables and the model with the 4 variables Agriculture, Education, Catholic, and Infant.Mortality. Because a simpler model is preferred when the fit is comparable, the best choice is the four-variable model. This is exactly the model that backward selection identified.
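Mallows' Cp can be computed in base R for every subset of predictors. The sketch below is not necessarily how the lab computed it; it uses the definition Cp = RSS_p / s^2 - n + 2p, where s^2 is the residual mean square of the full model and p counts the intercept. The names predictors, subsets, and cp_table are illustrative.

```r
predictors <- setdiff(names(swiss), "Fertility")
n <- nrow(swiss)

# Residual mean square of the full model, used as the variance estimate
swiss.lm <- lm(Fertility ~ ., data = swiss)
s2 <- summary(swiss.lm)$sigma^2

# Every non-empty subset of the predictors
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit each subset model and compute its Cp
cp_table <- do.call(rbind, lapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response = "Fertility"), data = swiss)
  p <- length(vars) + 1                      # parameters incl. intercept
  rss <- sum(residuals(fit)^2)
  data.frame(model = paste(vars, collapse = " + "),
             p = p,
             Cp = rss / s2 - n + 2 * p,
             stringsAsFactors = FALSE)
}))

# Smallest Cp first
cp_table <- cp_table[order(cp_table$Cp), ]
head(cp_table, 3)
```

By construction the full model always has Cp equal to its number of parameters, so it is the reduced models whose Cp values carry the information.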
Principal Component Analysis
1.
Unemployed Armed.Forces Population Year Employed
Unemployed 1.000 -0.177 0.687 0.668 0.502
Armed.Forces -0.177 1.000 0.364 0.417 0.457
Population 0.687 0.364 1.000 0.994 0.960
Year 0.668 0.417 0.994 1.000 0.971
Employed 0.502 0.457 0.960 0.971 1.000
2.
Comp.1 Comp.2
Unemployed 0.3633 0.5988
Armed.Forces 0.2269 -0.7911
Population 0.5261 0.0435
Year 0.5291 -0.0024
Employed 0.5097 -0.1171
The first component is a standardized measure of GNP and the second component is difficult to interpret.
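The loadings above are consistent with a call to princomp on the five variables shown in the correlation matrix. The sketch below assumes that variable selection; the names longley.x, longley.pca, and load2 are illustrative, and the signs of a component's loadings may come out flipped, which does not change the interpretation.

```r
# The five explanatory variables from the correlation matrix above
longley.x <- longley[, c("Unemployed", "Armed.Forces", "Population",
                         "Year", "Employed")]

# PCA on the correlation matrix (cor = TRUE standardizes the variables)
longley.pca <- princomp(longley.x, cor = TRUE)

# Scree plot to choose the number of components
screeplot(longley.pca, type = "lines", main = "Scree plot")

# Loadings of the first two components
load2 <- unclass(loadings(longley.pca))[, 1:2]
round(load2, 4)
```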
3.
Component 1: **71.23%** Variance explained
Component 2: **23.67%** Variance explained
Cumulative Variance: **94.89%**
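These percentages can be derived from the component standard deviations. A sketch under the same assumption about which five variables enter the analysis; eig, prop, and cum_prop are illustrative names.

```r
longley.x <- longley[, c("Unemployed", "Armed.Forces", "Population",
                         "Year", "Employed")]
longley.pca <- princomp(longley.x, cor = TRUE)

# Squared standard deviations are the eigenvalues of the correlation matrix
eig <- longley.pca$sdev^2

# Proportion and cumulative proportion of variance, in percent
prop <- 100 * eig / sum(eig)
cum_prop <- cumsum(prop)
round(rbind(prop, cum_prop), 2)
```

The same figures appear in the "Proportion of Variance" and "Cumulative Proportion" rows of summary(longley.pca).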
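About the Analyst
LE PHUONG
We sincerely thank LE PHUONG for her contribution to this article. She completed a master's degree in Computer Science and Technology at Shandong University and focuses on data analysis, data visualization, and data collection. She is proficient in Python, SQL, C/C++, HTML, CSS, VSCode, Linux, and Jupyter Notebook.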