Day 2 Lab Activities – MAT 500: Linear Regression and PCA (R Exercises and Solutions)

Directions: Complete the following exercises using the code discussed during computer lab. 

Save your work in an R script as well as a Word document containing the necessary output and comments. 

Written by LE PHUONG

Be sure to use notes in the script to justify any computations. If you have any questions, do not hesitate to ask.


1 Simple Linear Regression

  1. Load the data set pressure from the datasets package in R. Perform a simple linear regression on the two variables. Provide the regression equation, coefficients table, and ANOVA table. Summarize your findings. What is the relationship between the t statistic for temperature and the F statistic in the ANOVA table?
  2. Refer to the previous exercise. Check the assumptions on the regression model and report your results. Be sure to include the scatterplot with regression equation, normal QQ plot, and residual plot. Explain what you see.
  3. Refer to exercise 1. Experiment with different transformations of the data to improve the model. What is the best transformation?

2 Multiple Linear Regression

  1. Load the swiss data set from the ‘datasets’ package in R. Find the correlation matrix and print the pairwise scatterplots. What variables seem to be related?
  2. Run a Multiple Regression on Fertility using all of the other variables as predictors. Print the model and coefficients table. Explain the meaning of the significant coefficients.
  3. Check the assumptions using the diagnostic tests mentioned in this section. Discuss your findings.
  4. Run a stepwise selection method to reduce the dimension of the model using the backward direction. Print the new model and new coefficients table. Check the assumptions and discuss any changes.
  5. Use Mallows' Cp to determine the best model. Does your choice match the model in the previous exercise?


3 Principal Component Analysis

  1. Load the longley data set from the R datasets package. This data set was used to predict a country's GNP based on several variables. Find the correlation matrix of the explanatory variables.
  2. Refer to the previous exercise. Perform a principal component analysis on the explanatory variables using the correlation matrix. Use a scree plot to determine the optimal number of components and report them. Try to explain the meaning behind each component.
  3. Refer to the previous exercise. What proportion of variation does each component explain? What is the total cumulative variance explained by the optimal number of components?



Day 2 Lab Activities – Solutions

Simple Linear Regression

1.  > pressure.lm <- lm(pressure ~ temperature, data = pressure)

> summary(pressure.lm)

Call:
lm(formula = pressure ~ temperature, data = pressure)

Residuals:
    Min      1Q  Median      3Q     Max
-158.08 -117.06  -32.84   72.30  409.43

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -147.8989    66.5529  -2.222 0.040124 *
temperature    1.5124     0.3158   4.788 0.000171 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 150.8 on 17 degrees of freedom
Multiple R-squared:  0.5742,    Adjusted R-squared:  0.5492
F-statistic: 22.93 on 1 and 17 DF,  p-value: 0.000171

> anova(pressure.lm)
Analysis of Variance Table

Response: pressure
            Df Sum Sq Mean Sq F value   Pr(>F)
temperature  1 521530  521530   22.93 0.000171 ***
Residuals   17 386665   22745
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The fitted regression equation is pressure = -147.8989 + 1.5124 × temperature.  The temperature coefficient is positive, so the relationship between temperature and pressure is a direct one.  Since the p-value (0.000171) is less than 0.05, temperature is significant in the model.  The t statistic for temperature and the F statistic in the ANOVA table are related by t^2 = F: here 4.788^2 ≈ 22.93.
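The t^2 = F relationship can be checked numerically from the fitted model. The following is a minimal sketch that refits `pressure.lm` as above; the variable names `t_temp` and `f_stat` are illustrative and not part of the original lab.

```r
# Refit the simple linear regression from exercise 1
data(pressure)
pressure.lm <- lm(pressure ~ temperature, data = pressure)

# t statistic for the slope, read from the coefficients table
t_temp <- summary(pressure.lm)$coefficients["temperature", "t value"]

# F statistic for temperature, read from the ANOVA table
f_stat <- anova(pressure.lm)["temperature", "F value"]

# With a single predictor, the squared t statistic equals the F statistic
c(t_squared = t_temp^2, F = f_stat)
all.equal(t_temp^2, f_stat)
```

The identity holds only because the model has one predictor, so the overall F test and the slope t test assess the same hypothesis.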

2.  Linearity:   The scatterplot shows a clear violation of the linearity assumption.  The data appear to increase exponentially.  The standardized residual plot reinforces this observation.

        Equal Variance:   The lack of a linear relationship makes it difficult to judge whether the observations have equal variance.

        Normality:   The Normal Quantile plot shows a lack of linearity at the tails of the data set.  A Shapiro-Wilk test verifies that the residuals do not follow a normal distribution.

   Shapiro-Wilk normality test

data:  rstandard(pressure.lm)

W = 0.8832, **p-value = 0.02438**
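The diagnostics described above can be reproduced along the following lines. This is a sketch using base R graphics; the plot titles and the names `std_res` and `sw` are assumptions made for illustration, and the original lab may have produced the plots differently.

```r
data(pressure)
pressure.lm <- lm(pressure ~ temperature, data = pressure)

# Linearity: scatterplot with the fitted regression line
plot(pressure ~ temperature, data = pressure,
     main = "Pressure vs. temperature")
abline(pressure.lm, col = "red")

# Equal variance: standardized residuals against fitted values
std_res <- rstandard(pressure.lm)
plot(fitted(pressure.lm), std_res,
     xlab = "Fitted values", ylab = "Standardized residuals")
abline(h = 0, lty = 2)

# Normality: normal Q-Q plot of the standardized residuals
qqnorm(std_res)
qqline(std_res)

# Formal normality check, matching the test reported above
sw <- shapiro.test(std_res)
sw
```

A p-value below 0.05 from `shapiro.test()` leads to rejecting normality of the standardized residuals, which is the conclusion drawn above.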

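3.  Using a Box-Cox transformation, the optimal transformation parameter is λ = 0.01.  Because this value is essentially 0, the Box-Cox transform (y^λ - 1)/λ is practically indistinguishable from the log transformation log(y) of pressure.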
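One way to carry out the Box-Cox search is with `boxcox()` from the MASS package. This is a sketch only: the λ grid (`seq(-1, 1, by = 0.01)`) and the names `bc`, `lambda_hat`, and `log.lm` are assumptions, and the value of λ found depends on the grid used.

```r
library(MASS)  # provides boxcox(); MASS ships with standard R installations

data(pressure)
pressure.lm <- lm(pressure ~ temperature, data = pressure)

# Profile log-likelihood over a grid of lambda values (the grid is an assumption)
bc <- boxcox(pressure.lm, lambda = seq(-1, 1, by = 0.01), plotit = FALSE)

# Lambda that maximizes the profile log-likelihood on this grid
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat

# A lambda close to 0 points to the log transformation; refit and compare R^2
log.lm <- lm(log(pressure) ~ temperature, data = pressure)
c(original = summary(pressure.lm)$r.squared,
  log      = summary(log.lm)$r.squared)
```

After transforming, the assumption checks from the previous exercise should be repeated on `log.lm` before settling on the transformation.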

Multiple Linear Regression

1.

                 Fertility Agriculture Examination Education Catholic Infant.Mortality
Fertility            1.000       0.353      -0.646    -0.664    0.464            0.417
Agriculture          0.353       1.000      -0.687    -0.640    0.401           -0.061
Examination         -0.646      -0.687       1.000     0.698   -0.573           -0.114
Education           -0.664      -0.640       0.698     1.000   -0.154           -0.099
Catholic             0.464       0.401      -0.573    -0.154    1.000            0.175
Infant.Mortality     0.417      -0.061      -0.114    -0.099    0.175            1.000

 

*Related Variables:*

Fertility, Agriculture
Fertility, Examination
Fertility, Infant Mortality
Agriculture, Examination
Agriculture, Education
Examination, Education
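The correlation matrix and pairwise scatterplots can be produced as sketched below. The cutoff of 0.6 used to flag strongly correlated pairs is an arbitrary assumption for illustration, not a threshold taken from the lab, and the names `swiss_cor` and `strong` are hypothetical.

```r
data(swiss)

# Correlation matrix of all six variables, rounded as in the table above
swiss_cor <- round(cor(swiss), 3)
swiss_cor

# Pairwise scatterplots
pairs(swiss, main = "Pairwise scatterplots of the swiss data")

# Flag pairs whose absolute correlation reaches an (assumed) cutoff of 0.6;
# upper.tri() keeps each pair once and drops the diagonal
strong <- which(abs(swiss_cor) >= 0.6 & upper.tri(swiss_cor), arr.ind = TRUE)
data.frame(var1 = rownames(swiss_cor)[strong[, "row"]],
           var2 = colnames(swiss_cor)[strong[, "col"]],
           r    = swiss_cor[strong])
```

Lowering the cutoff adds weaker pairs, so the set of "related" variables depends on how strong a correlation is considered meaningful.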


2.

> summary(swiss.lm)

Call:
lm(formula = Fertility ~ Agriculture + Examination + Education +
    Catholic + Infant.Mortality, data = swiss)

Residuals:
     Min       1Q   Median       3Q      Max
-15.2743  -5.2617   0.5032   4.1198  15.3213

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      66.91518   10.70604   6.250 1.91e-07 ***
Agriculture      -0.17211    0.07030  -2.448  0.01873 *
Examination      -0.25801    0.25388  -1.016  0.31546
Education        -0.87094    0.18303  -4.758 2.43e-05 ***
Catholic          0.10412    0.03526   2.953  0.00519 **
Infant.Mortality  1.07705    0.38172   2.822  0.00734 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.165 on 41 degrees of freedom
Multiple R-squared:  0.7067,    Adjusted R-squared:  0.671
F-statistic: 19.76 on 5 and 41 DF,  p-value: 5.594e-10

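All of the predictors are significant at the 0.05 level except Examination (p = 0.315).  Holding the other predictors fixed, a one-unit increase in Education is associated with a decrease of about 0.87 in Fertility, and a one-unit increase in Agriculture with a decrease of about 0.17.  One-unit increases in Catholic and Infant.Mortality are associated with increases of about 0.10 and 1.08 in Fertility, respectively.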
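The model and the list of significant predictors can be obtained as sketched below. The formula `Fertility ~ .` is shorthand for the explicit five-predictor formula shown in the output above; the names `coefs` and `sig` are illustrative.

```r
data(swiss)

# Regress Fertility on all other variables in the data set
swiss.lm <- lm(Fertility ~ ., data = swiss)

# Coefficients table: estimate, standard error, t value, p-value
coefs <- summary(swiss.lm)$coefficients
round(coefs, 5)

# Predictors (excluding the intercept) with p-values below 0.05
sig <- setdiff(rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.05], "(Intercept)")
sig

# Their estimated effects on Fertility, holding the other predictors fixed
coefs[sig, "Estimate"]
```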

3.  Residual Plot:   The residuals show no systematic pattern, which raises no concern about the model fit.

        Normal Q-Q Plot:   The points follow the diagonal line quite closely, indicating that the residuals probably satisfy the normality assumption.

        Scale – Location:   The points are randomly scattered, which indicates that the homoscedasticity assumption is probably met.
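The three diagnostic plots discussed above correspond to the first three panels of `plot()` applied to an `lm` object. This is a sketch; the Shapiro-Wilk test at the end is an optional supplement not reported in the solution, and `sw_swiss` is a hypothetical name.

```r
data(swiss)
swiss.lm <- lm(Fertility ~ ., data = swiss)

# Residuals vs Fitted, Normal Q-Q, and Scale-Location side by side
op <- par(mfrow = c(1, 3))
plot(swiss.lm, which = 1:3)
par(op)  # restore the previous graphics settings

# Optional numeric check of normality on the standardized residuals
sw_swiss <- shapiro.test(rstandard(swiss.lm))
sw_swiss
```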


4.

> swiss.step.b <- step(swiss.lm, direction = 'backward')
Start:  AIC=190.69
Fertility ~ Agriculture + Examination + Education + Catholic +
    Infant.Mortality

                   Df Sum of Sq    RSS    AIC
- Examination       1     53.03 2158.1 189.86
<none>                          2105.0 190.69
- Agriculture       1    307.72 2412.8 195.10
- Infant.Mortality  1    408.75 2513.8 197.03
- Catholic          1    447.71 2552.8 197.75
- Education         1   1162.56 3267.6 209.36

Step:  AIC=189.86
Fertility ~ Agriculture + Education + Catholic + Infant.Mortality

                   Df Sum of Sq    RSS    AIC
<none>                          2158.1 189.86
- Agriculture       1    264.18 2422.2 193.29
- Infant.Mortality  1    409.81 2567.9 196.03
- Catholic          1    956.57 3114.6 205.10
- Education         1   2249.97 4408.0 221.43

> summary(swiss.step.b)

Call:
lm(formula = Fertility ~ Agriculture + Education + Catholic +
    Infant.Mortality, data = swiss)

Residuals:
     Min       1Q   Median       3Q      Max
-14.6765  -6.0522   0.7514   3.1664  16.1422

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      62.10131    9.60489   6.466 8.49e-08 ***
Agriculture      -0.15462    0.06819  -2.267  0.02857 *
Education        -0.98026    0.14814  -6.617 5.14e-08 ***
Catholic          0.12467    0.02889   4.315 9.50e-05 ***
Infant.Mortality  1.07844    0.38187   2.824  0.00722 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.168 on 42 degrees of freedom
Multiple R-squared:  0.6993,    Adjusted R-squared:  0.6707
F-statistic: 24.42 on 4 and 42 DF,  p-value: 1.717e-10


The new model does not include the Examination variable.  All of the remaining predictors are now significant at the 0.05 level.
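The backward selection and the follow-up checks can be run as sketched below. Setting `trace = 0` suppresses the step-by-step printout shown above; that choice, and the name `kept`, are only for illustration.

```r
data(swiss)
swiss.lm <- lm(Fertility ~ ., data = swiss)

# Backward elimination by AIC, starting from the full model
swiss.step.b <- step(swiss.lm, direction = "backward", trace = 0)

# Predictors retained in the reduced model
kept <- attr(terms(swiss.step.b), "term.labels")
kept

# New coefficients table
summary(swiss.step.b)$coefficients

# Re-check the assumptions on the reduced model
op <- par(mfrow = c(1, 3))
plot(swiss.step.b, which = 1:3)
par(op)
```

Note that `step()` selects by AIC rather than by p-values, so a variable is dropped only when removing it lowers the AIC.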

Residual Plot:   The residuals again show no systematic pattern, which raises no concern about the model fit.

        Normal Q-Q Plot:   The points no longer follow the diagonal line as closely, so the normality assumption may now be violated.

        Scale – Location:   The points are randomly scattered, which indicates that the homoscedasticity assumption is probably met.

5.  The two models with the best Mallows' Cp values are the model with all 5 variables and the model with the 4 variables Agriculture, Education, Catholic, and Infant.Mortality.  Since simpler models are preferred in statistics, the best choice is the model with four explanatory variables.  This is exactly the model that backward selection identified.
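Mallows' Cp can be computed for every subset of predictors using base R alone. The sketch below uses the standard definition Cp = RSS_p / s^2 - n + 2p, where s^2 is the residual mean square of the full model and p counts the coefficients including the intercept. The original lab may have used a package such as leaps instead; the names `subsets` and `cp_table` are hypothetical.

```r
data(swiss)
predictors <- setdiff(names(swiss), "Fertility")
n <- nrow(swiss)

# Residual mean square of the full model estimates the error variance
full <- lm(Fertility ~ ., data = swiss)
s2_full <- sum(resid(full)^2) / df.residual(full)

# All non-empty subsets of the five predictors (31 candidate models)
subsets <- unlist(lapply(seq_along(predictors),
                         function(k) combn(predictors, k, simplify = FALSE)),
                  recursive = FALSE)

# Mallows' Cp for each candidate model
cp_table <- do.call(rbind, lapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response = "Fertility"), data = swiss)
  p <- length(vars) + 1  # number of coefficients, including the intercept
  data.frame(model = paste(vars, collapse = " + "),
             p     = p,
             Cp    = sum(resid(fit)^2) / s2_full - n + 2 * p)
}))

# Smallest Cp first
cp_table <- cp_table[order(cp_table$Cp), ]
head(cp_table, 3)
```

By construction the full model always has Cp equal to its number of coefficients, so the comparison of interest is whether a smaller model achieves a Cp close to (or below) its own p.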


Principal Component Analysis

1.

             Unemployed Armed.Forces Population  Year Employed
Unemployed        1.000       -0.177      0.687 0.668    0.502
Armed.Forces     -0.177        1.000      0.364 0.417    0.457
Population        0.687        0.364      1.000 0.994    0.960
Year              0.668        0.417      0.994 1.000    0.971
Employed          0.502        0.457      0.960 0.971    1.000
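The correlation matrix above can be reproduced by selecting the five explanatory variables from `longley`. In this sketch the vector name `explanatory` and the object `longley_cor` are illustrative; the choice of columns follows the table above.

```r
data(longley)

# Explanatory variables used in the solution (GNP and GNP.deflator are left out)
explanatory <- c("Unemployed", "Armed.Forces", "Population", "Year", "Employed")

# Correlation matrix, rounded to three decimals as in the table
longley_cor <- round(cor(longley[, explanatory]), 3)
longley_cor
```

The very high correlations among Population, Year, and Employed are what make a principal component analysis attractive for these data.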


2.

             Comp.1  Comp.2
Unemployed   0.3633  0.5988
Armed.Forces 0.2269 -0.7911
Population   0.5261  0.0435
Year         0.5291 -0.0024
Employed     0.5097 -0.1171

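Two components are retained.  The first component loads positively on all five variables and can be read as a standardized measure of GNP.  The second component is more difficult to interpret, but it is dominated by a contrast between Unemployed (0.5988) and Armed.Forces (-0.7911).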
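A principal component analysis on the correlation matrix can be run with `princomp()` and `cor = TRUE`, as sketched below. The object names `longley.pca` and `pc_loadings` are assumptions; the signs of the loadings are arbitrary and may come out flipped relative to the table above.

```r
data(longley)
explanatory <- c("Unemployed", "Armed.Forces", "Population", "Year", "Employed")

# PCA based on the correlation matrix (variables are standardized)
longley.pca <- princomp(longley[, explanatory], cor = TRUE)

# Scree plot to choose the number of components
screeplot(longley.pca, type = "lines", main = "Scree plot: longley PCA")

# Loadings of the first two components
pc_loadings <- round(unclass(loadings(longley.pca))[, 1:2], 4)
pc_loadings
```

Using `cor = TRUE` matters here because the variables are measured on very different scales; without it, the variables with the largest variances would dominate the components.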


3.  Component 1:   **71.23%** of variance explained

        Component 2:   **23.67%** of variance explained

        Cumulative Variance:   **94.89%**
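The proportions above follow from the component standard deviations: each component's share is its variance divided by the total variance. A minimal sketch, refitting the PCA so the block runs on its own (the name `var_explained` is illustrative):

```r
data(longley)
explanatory <- c("Unemployed", "Armed.Forces", "Population", "Year", "Employed")
longley.pca <- princomp(longley[, explanatory], cor = TRUE)

# Proportion of variance explained by each component
var_explained <- longley.pca$sdev^2 / sum(longley.pca$sdev^2)
round(100 * var_explained, 2)

# Cumulative proportion of variance explained
round(100 * cumsum(var_explained), 2)
```

The same figures appear in the "Proportion of Variance" and "Cumulative Proportion" rows printed by `summary(longley.pca)`.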

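About the Analyst

We sincerely thank LE PHUONG for her contribution to this article. She completed a master's degree in Computer Science and Technology at Shandong University, focusing on data analysis, data visualization, and data collection. She is proficient in Python, SQL, C/C++, HTML, CSS, VSCode, Linux, and Jupyter Notebook.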

 