
Answer 1

Textbook Question

Chapter 8, Problem 5SE

In a simulation of 30 mobile computer networks, the average speed, pause time, and number of neighbors were measured. A “neighbor” is a computer within the transmission range of another. The data are presented in the following table.

[Data table: Speed, Pause, and Neighbors for the 30 simulated networks; image not reproduced.]

a. Fit the model with Neighbors as the dependent variable, and independent variables Speed, Pause, Speed·Pause, Speed², and Pause².
b. Construct a reduced model by dropping any variables whose P-values are large, and test the plausibility of the model with an F test.
c. Plot the residuals versus the fitted values for the reduced model. Are there any indications that the model is inappropriate? If so, what are they?
d. Someone suggests that a model containing Pause and Pause² as the only independent variables is adequate. Do you agree? Why or why not?
e. Using a best subsets software package, find the two models with the highest R² value for each model size from one to five variables. Compute C_p and adjusted R² for each model.
f. Which model is selected by minimum C_p? By adjusted R²? Are they the same?

a.

Expert Solution

To determine

Construct a multiple linear regression model with neighbors as the dependent variable and speed, pause, speed×pause, speed², and pause² as the independent variables for the given data.

Answer to Problem 5SE

A multiple linear regression model for the given data is:

$$\hat{y} = 10.840 - 0.0739x_1 - 0.1274x_2 + 0.001110x_1^2 - 0.000243x_1x_2 + 0.001674x_2^2$$

Explanation of Solution

Calculation:

The data represent the number of neighbors, average speed, and pause time for a simulation of 30 mobile computer networks.

Multiple linear regression model:

A multiple linear regression model is given as $y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \varepsilon_i$, where $y_i$ is the response variable and $x_{1i}, x_{2i}, \dots, x_{ki}$ are the $k$ predictor variables. The quantity $\beta_0$ is the intercept, and $\beta_1, \dots, \beta_k$ are the slopes corresponding to $x_{1i}, x_{2i}, \dots, x_{ki}$ respectively. $\hat{\beta}_0$ is the intercept estimated from the sample data.

Let $x_1$ and $x_2$ be speed and pause. The response variable is $y$ = neighbors.

Regression:

Software procedure:

The step-by-step procedure to obtain the regression in MINITAB is:

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the predictor terms X1, X2, X1*X2, X1*X1 and X2*X2.
Click OK.
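
The same kind of fit can be sketched outside MINITAB. The following is a minimal illustration, not the textbook's method: it builds the design row (1, x1, x2, x1², x1·x2, x2²) for each observation and solves the least-squares normal equations with plain Gaussian elimination. The function names are hypothetical, and because the data table is not reproduced here, the example uses synthetic, noise-free data on a small grid rather than the textbook data.

```python
def design_row(speed, pause):
    """Terms of the full quadratic model: 1, x1, x2, x1^2, x1*x2, x2^2."""
    return [1.0, speed, pause, speed * speed, speed * pause, pause * pause]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_quadratic_model(speeds, pauses, neighbors):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    rows = [design_row(s, p) for s, p in zip(speeds, pauses)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, neighbors)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic data (an assumption for illustration, NOT the textbook data):
# 30 grid points generated exactly from a known quadratic surface.
true_beta = [2.0, 1.0, -0.5, 0.25, 0.1, -0.05]
grid = [(float(s), float(p)) for s in range(1, 6) for p in range(1, 7)]
speeds = [s for s, _ in grid]
pauses = [p for _, p in grid]
neighbors = [sum(b * t for b, t in zip(true_beta, design_row(s, p)))
             for s, p in grid]

beta_hat = fit_quadratic_model(speeds, pauses, neighbors)
print([round(b, 4) for b in beta_hat])
```

Because the synthetic responses contain no noise, the recovered coefficients should agree with `true_beta` up to rounding error. With real data, a statistics package is preferable, since it also reports the standard errors and P-values used in part (b).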

Output obtained from MINITAB is given below:

[MINITAB regression output for the full model (Statistics for Engineers and Scientists, Chapter 8, Problem 5SE); image not reproduced.]

The ‘Coefficient’ column of the MINITAB regression output gives the estimated intercept and slopes for the terms listed in the ‘Term’ column.

Hence, the fitted multiple linear regression model for the given data is:

$$\hat{y} = 10.840 - 0.0739x_1 - 0.1274x_2 + 0.001110x_1^2 - 0.000243x_1x_2 + 0.001674x_2^2$$

b.

Expert Solution

To determine

Construct a reduced model by dropping the variables with large P-values.

Check whether the reduced model is plausible.

Answer to Problem 5SE

The reduced multiple linear regression model for the given data is:

$$\hat{y} = 10.967 - 0.0799x_1 - 0.1325x_2 + 0.001110x_1^2 + 0.001674x_2^2$$

Yes, the reduced model is plausible: the F test gives no evidence that the dropped term is needed.

Explanation of Solution

Calculation:

From part (a), the ‘P’ column of the MINITAB regression output gives the P-values for the terms listed in the ‘Term’ column.

The largest P-value is 0.390, corresponding to the predictor $x_1x_2$. None of the remaining P-values is large.

The regression is therefore refitted after dropping the predictor $x_1x_2$.

Regression:

Software procedure:

The step-by-step procedure to obtain the regression in MINITAB is:

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the predictor terms X1, X2, X1*X1 and X2*X2.
Click OK.

Output obtained from MINITAB is given below:

[MINITAB regression output for the reduced model; image not reproduced.]

The ‘Coefficient’ column of the MINITAB regression output gives the estimated intercept and slopes for the terms listed in the ‘Term’ column.

Hence, the fitted reduced model is:

$$\hat{y} = 10.967 - 0.0799x_1 - 0.1325x_2 + 0.001110x_1^2 + 0.001674x_2^2$$

For comparison, the full model from part (a) is:

$$\hat{y} = 10.840 - 0.0739x_1 - 0.1274x_2 + 0.001110x_1^2 - 0.000243x_1x_2 + 0.001674x_2^2$$

The test hypotheses are given below:

Null hypothesis:

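$H_0: \beta_4 = 0$, where $\beta_4$ is the coefficient of $x_1x_2$ in the full model.

That is, the dropped predictor $x_1x_2$ does not contribute significantly to predicting $y$.

Alternative hypothesis:

$H_1: \beta_4 \neq 0$

That is, the dropped predictor $x_1x_2$ contributes significantly to predicting $y$.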

Test statistic:

$$F = \frac{(SSE_{\text{Reduced}} - SSE_{\text{Full}})/(p - k)}{SSE_{\text{Full}}/[n - (p + 1)]}$$

Where,

$SSE_{\text{Full}}$ represents the error sum of squares obtained from the full model.

$SSE_{\text{Reduced}}$ represents the error sum of squares obtained from the reduced model.

$n$ represents the total number of observations.

$p$ represents the number of predictors in the full model.

$k$ represents the number of predictors in the reduced model.

From the MINITAB outputs, the error sum of squares is $SSE_{\text{Full}} = 2.6462$ for the full model and $SSE_{\text{Reduced}} = 2.7307$ for the reduced model.

The total number of observations is $n = 30$.

The number of predictors in the full model is $p = 5$, and the number of predictors in the reduced model is $k = 4$.

Degrees of freedom of the F statistic for the reduced model:

The numerator of the F statistic is the quantity $SSE_{\text{Reduced}} - SSE_{\text{Full}}$ divided by its degrees of freedom, $p - k$. The denominator is the error sum of squares of the full model divided by the error degrees of freedom, $n - (p + 1)$.

Thus, the degrees of freedom for the F statistic are $p - k$ and $n - (p + 1)$.

Hence, the numerator degrees of freedom is $p - k = 5 - 4 = 1$ and the denominator degrees of freedom is $n - (p + 1) = 30 - 6 = 24$.

Test statistic under null hypothesis:

Under the null hypothesis, the test statistic is obtained as follows:

$$F = \frac{(2.7307 - 2.6462)/(5 - 4)}{2.6462/[30 - (5 + 1)]} = \frac{0.0845}{0.11026} = 0.76638$$

Thus, the test statistic is $F = 0.76638$.
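
As a cross-check on the arithmetic, the statistic can be computed directly from the two error sums of squares. This is a minimal sketch; the function name `partial_f` is hypothetical, and the inputs are the values used in the calculation above (with 7.840 being the reduced-model error sum of squares used in part (d)).

```python
def partial_f(sse_reduced, sse_full, n, p, k):
    """F statistic comparing a reduced model (k predictors) with a full model (p predictors)."""
    numerator = (sse_reduced - sse_full) / (p - k)   # extra SSE per dropped term
    denominator = sse_full / (n - (p + 1))           # error mean square of the full model
    return numerator / denominator


# Part (b): drop x1*x2 from the five-predictor model.
f_b = partial_f(sse_reduced=2.7307, sse_full=2.6462, n=30, p=5, k=4)

# Part (d): keep only x2 and x2^2.
f_d = partial_f(sse_reduced=7.840, sse_full=2.6462, n=30, p=5, k=2)

print(round(f_b, 5), round(f_d, 3))
```

The same function covers both comparisons because only the reduced-model error sum of squares and the number of retained predictors change.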

Since the level of significance is not specified, the conventional level α = 0.05 is used.

P-value:

Software procedure:

Choose Graph > Probability Distribution Plot, choose View Probability, and click OK.
From Distribution, choose F, enter 1 in numerator df and 24 in denominator df.
Click the Shaded Area tab.
Choose X-Value and Right Tail for the region of the curve to shade.
Enter the X-value as 0.76638.
Click OK.

Output obtained from MINITAB is given below:

[MINITAB probability distribution plot: F distribution with 1 and 24 degrees of freedom, right tail beyond 0.76638 shaded; image not reproduced.]

From the output, the P-value is 0.39.
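
The right-tail area can also be approximated without MINITAB. The sketch below is an assumption-laden illustration, not the procedure used in the solution: it relies on the identity P(F > f) = I_x(d2/2, d1/2) with x = d2/(d2 + d1·f), where I_x is the regularized incomplete beta function, and evaluates the integral with composite Simpson's rule. The function name `f_upper_tail` is hypothetical, and the code assumes at least 2 denominator degrees of freedom and f > 0.

```python
import math


def f_upper_tail(f_value, df_num, df_den, steps=20000):
    """Approximate P(F > f_value) for an F(df_num, df_den) distribution.

    Assumes df_den >= 2, f_value > 0, and an even number of Simpson steps.
    """
    a = df_den / 2.0
    b = df_num / 2.0
    x = df_den / (df_den + df_num * f_value)  # upper limit of the beta integral

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3.0

    beta_ab = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return integral / beta_ab


# Test statistic and degrees of freedom from part (b).
p_value = f_upper_tail(0.76638, df_num=1, df_den=24)
print(round(p_value, 2))
```

A dedicated statistics library would normally be used instead; the numerical integration here only makes the calculation behind the shaded area explicit.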

Decision criteria based on P-value approach:

If P-value ≤ α, reject the null hypothesis $H_0$.

If P-value > α, fail to reject the null hypothesis $H_0$.

Conclusion:

The P-value is 0.39 and α is 0.05.

Here, the P-value is greater than α.

That is, 0.39 (= P) > 0.05 (= α).

By the decision rule, the null hypothesis is not rejected.

Hence, there is not sufficient evidence to conclude that the dropped predictor $x_1x_2$ is significant for predicting the response variable $y$.

Thus, the reduced model is plausible; the full model offers no significant improvement over it.

c.

Expert Solution

To determine

Plot the residuals versus the fitted values for the reduced model.

Check whether the model is appropriate.

Answer to Problem 5SE

Residual plot:

[Residual plot: residuals versus fitted values for the reduced model; image not reproduced.]

Yes, the model seems to be appropriate.

Explanation of Solution

Calculation:

Residual plot:

Software procedure:

The step-by-step procedure to obtain the residual plot in MINITAB is:

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the predictor terms X1, X2, X1*X1 and X2*X2.
In Graphs, under Residuals for Plots, select Regular.
Under Residual Plots, select the Residuals versus fits box.
Click OK.

Conditions for the appropriateness of a regression model, judged from the residual plot:

The residuals plotted against the fitted values should fall roughly in a horizontal band centered on and symmetric about the horizontal axis. That is, the plot should not show any bend.
The plot of residuals should not contain any outliers.
The residuals should be scattered randomly around 0 with constant variability. That is, the spread should be consistent across the plot.

Interpretation:

The residual plot gives some suggestion of a bend or pattern, which would violate the straight-line condition, and of a change in the spread of the residuals from one part of the plot to another.

However, it is difficult to confirm a violation of the assumptions from this plot alone.

Thus, there is no clear indication that the model is inappropriate, and the model seems to be appropriate.

d.

Expert Solution

To determine

Check whether the model with $x_2$ and $x_2^2$ as the only independent variables is adequate.

Answer to Problem 5SE

No, the model with $x_2$ and $x_2^2$ as the only independent variables is not adequate.

Explanation of Solution

Calculation:

Regression:

Software procedure:

The step-by-step procedure to obtain the regression in MINITAB is:

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the predictor terms X2 and X2*X2.
Click OK.

Output obtained from MINITAB is given below:

[MINITAB regression output for the model with Pause and Pause² only; image not reproduced.]

The ‘Coefficient’ column of the MINITAB regression output gives the estimated intercept and slopes for the terms listed in the ‘Term’ column.

Hence, the fitted model with $x_2$ and $x_2^2$ as the only predictors, which is the reduced model here, is:

$$\hat{y} = 9.960 - 0.1325x_2 + 0.001674x_2^2$$

The full model from part (a) is:

$$\hat{y} = 10.840 - 0.0739x_1 - 0.1274x_2 + 0.001110x_1^2 - 0.000243x_1x_2 + 0.001674x_2^2$$

The test hypotheses are given below:

Null hypothesis:

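$H_0: \beta_1 = \beta_3 = \beta_4 = 0$, where $\beta_1$, $\beta_3$, and $\beta_4$ are the coefficients of $x_1$, $x_1^2$, and $x_1x_2$ in the full model.

That is, none of the dropped predictors contributes significantly to predicting $y$.

Alternative hypothesis:

$H_1$: At least one of $\beta_1, \beta_3, \beta_4$ is not 0.

That is, at least one of the dropped predictors contributes significantly to predicting $y$.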

Test statistic:

$$F = \frac{(SSE_{\text{Reduced}} - SSE_{\text{Full}})/(p - k)}{SSE_{\text{Full}}/[n - (p + 1)]}$$

Where,

$SSE_{\text{Full}}$ represents the error sum of squares obtained from the full model.

$SSE_{\text{Reduced}}$ represents the error sum of squares obtained from the reduced model.

$n$ represents the total number of observations.

$p$ represents the number of predictors in the full model.

$k$ represents the number of predictors in the reduced model.

From the MINITAB outputs, the error sum of squares is $SSE_{\text{Full}} = 2.6462$ for the full model and $SSE_{\text{Reduced}} = 7.840$ for the reduced model.

The total number of observations is $n = 30$.

The number of predictors in the full model is $p = 5$, and the number of predictors in the reduced model is $k = 2$.

Degrees of freedom of the F statistic for the reduced model:

The F statistic is the ratio of two mean squares. The numerator is the quantity $SSE_{\text{Reduced}} - SSE_{\text{Full}}$ divided by its degrees of freedom, $p - k$. The denominator is the error sum of squares of the full model divided by the error degrees of freedom, $n - (p + 1)$.

Thus, the degrees of freedom for the F statistic are $p - k$ and $n - (p + 1)$.

Hence, the numerator degrees of freedom is $p - k = 5 - 2 = 3$ and the denominator degrees of freedom is $n - (p + 1) = 30 - 6 = 24$.

Test statistic under null hypothesis:

Under the null hypothesis, the test statistic is obtained as follows:

$$F = \frac{(7.840 - 2.6462)/(5 - 2)}{2.6462/[30 - (5 + 1)]} = \frac{1.73127}{0.11026} = 15.702$$

Thus, the test statistic is $F = 15.702$.

Since the level of significance is not specified, the conventional level α = 0.05 is used.

P-value:

Software procedure:

Choose Graph > Probability Distribution Plot, choose View Probability, and click OK.
From Distribution, choose F, enter 3 in numerator df and 24 in denominator df.
Click the Shaded Area tab.
Choose X-Value and Right Tail for the region of the curve to shade.
Enter the X-value as 15.702.
Click OK.

Output obtained from MINITAB is given below:

[MINITAB probability distribution plot: F distribution with 3 and 24 degrees of freedom, right tail beyond 15.702 shaded; image not reproduced.]

From the output, the P-value is $7.3 \times 10^{-6}$.

Decision criteria based on P-value approach:

If P-value ≤ α, reject the null hypothesis $H_0$.

If P-value > α, fail to reject the null hypothesis $H_0$.

Conclusion:

The P-value is $7.3 \times 10^{-6}$ and α is 0.05.

Here, the P-value is less than α.

That is, $7.3 \times 10^{-6}$ (= P) < 0.05 (= α).

By the decision rule, the null hypothesis is rejected.

Hence, there is sufficient evidence to conclude that at least one of the dropped predictors ($x_1$, $x_1^2$, $x_1x_2$) is significant for predicting $y$.

Thus, the model with $x_2$ and $x_2^2$ as the only independent variables is not adequate.

e.

Expert Solution

To determine

Find the two models with the highest $R^2$ value for each model size from one to five variables.

Obtain the values of Mallows’ $C_p$ and adjusted $R^2$ for each model.

Answer to Problem 5SE

The two best subset models are:

First, the model with predictors $X_1, X_2, X_1^2, X_2^2$, and second, the model with predictors $X_1, X_2, X_2^2$.

The values of Mallows’ $C_p$ and adjusted $R^2$ for the two best models of each size are as follows:

Number of predictors	Predictor variables	Mallows’ $C_p$	Adjusted $R^2$ (%)
1	$X_2$	92.5	60.1
1	$X_1X_2$	97.0	58.6
2	$X_2, X_2^2$	47.1	75.2
2	$X_1, X_2$	53.3	73.0
3	$X_1, X_2, X_2^2$	7.9	89.2
3	$X_2, X_1X_2, X_2^2$	15.5	86.4
4	$X_1, X_2, X_1^2, X_2^2$	4.8	90.7
4	$X_1, X_2, X_1X_2, X_2^2$	9.2	89.0
5	$X_1, X_2, X_1X_2, X_1^2, X_2^2$	6.0	90.6

Explanation of Solution

Calculation:

Coefficient of multiple determination $R^2$:

The coefficient of multiple determination, $R^2$, is given by:

$R^2 = 1 - \frac{SSE}{SST}$, where $SST$ and $SSE$ are the total sum of squares and the error sum of squares respectively.

Among subsets of the same size, the subset with the larger $R^2$ is considered the better subset for prediction.

Regression:

Software procedure:

The step-by-step procedure to obtain the best subsets regression in MINITAB is:

Choose Stat > Regression > Regression > Best Subsets.
In Response, enter the numeric column containing the response data Y.
In Model, enter the predictor terms X1, X2, X1*X2, X1*X1 and X2*X2.
Click OK.
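
To make the size of the search concrete, the sketch below enumerates the candidate models a best subsets procedure chooses among, assuming it considers every nonempty subset of the five terms. How MINITAB searches internally is not stated here, and the term labels are illustrative strings only.

```python
from itertools import combinations

# Candidate terms from part (a); labels are plain strings for illustration.
TERMS = ["X1", "X2", "X1*X2", "X1^2", "X2^2"]


def candidate_subsets(terms):
    """Return every nonempty subset of `terms`, grouped by subset size."""
    return {size: list(combinations(terms, size))
            for size in range(1, len(terms) + 1)}


subsets = candidate_subsets(TERMS)
for size, models in subsets.items():
    print(size, len(models))

total_models = sum(len(models) for models in subsets.values())
```

With five terms there are 2^5 - 1 = 31 candidate models in total. For each size, the procedure reports the models with the highest R², which is why only one or two rows per size appear in the summary table.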

Output obtained from MINITAB is given below:

[MINITAB best subsets regression output; image not reproduced.]

For the one-predictor case, the highest value of $R^2$ is 61.5%, corresponding to $X_2$.

For the two-predictor case, the highest value of $R^2$ is 76.9%, corresponding to $X_2, X_2^2$.

For the three-predictor case, the highest value of $R^2$ is 90.3%, corresponding to $X_1, X_2, X_2^2$.

For the four-predictor case, the highest value of $R^2$ is 92.0%, corresponding to $X_1, X_2, X_1^2, X_2^2$.

For the five-predictor case there is only one model, the full model. The value 90.6 listed for it is its adjusted $R^2$; its $R^2$ cannot be lower than the four-predictor value of 92.0%, because adding a predictor never decreases $R^2$.

Among the proper subsets, $R^2$ is highest for the predictors $X_1, X_2, X_1^2, X_2^2$.

Thus, depending upon the factors affecting the analysis, it would be most preferable to use the regression equation corresponding to the predictors $X_1, X_2, X_1^2, X_2^2$.

The three-predictor model with $X_1, X_2, X_2^2$ has $R^2$ = 90.3%, which is not much lower, and it uses one fewer predictor.

Therefore, the model with predictors $X_1, X_2, X_2^2$ is the second best model.

Thus, the two best models are:

First, the model with predictors $X_1, X_2, X_1^2, X_2^2$, and second, the model with predictors $X_1, X_2, X_2^2$.

From the accompanying MINITAB output, the values of Mallows’ $C_p$ and adjusted $R^2$ for the two best models of each size are as follows:

Number of predictors	Predictor variables	Mallows’ $C_p$	Adjusted $R^2$ (%)
1	$X_2$	92.5	60.1
1	$X_1X_2$	97.0	58.6
2	$X_2, X_2^2$	47.1	75.2
2	$X_1, X_2$	53.3	73.0
3	$X_1, X_2, X_2^2$	7.9	89.2
3	$X_2, X_1X_2, X_2^2$	15.5	86.4
4	$X_1, X_2, X_1^2, X_2^2$	4.8	90.7
4	$X_1, X_2, X_1X_2, X_2^2$	9.2	89.0
5	$X_1, X_2, X_1X_2, X_1^2, X_2^2$	6.0	90.6

f.

Expert Solution

To determine

Select the variables for the model using the Mallows’ $C_p$ criterion and the adjusted $R^2$ criterion.

Check whether both criteria select the same model.

Answer to Problem 5SE

The variables for the model selected by the Mallows’ $C_p$ criterion are $X_1, X_2, X_1^2, X_2^2$.

The variables for the model selected by the adjusted $R^2$ criterion are $X_1, X_2, X_1^2, X_2^2$.

Yes, both criteria select the same model.

Explanation of Solution

Mallows’ Cp:

An important utility of the Mallows’ Cp criterion is to compare between regression equations of subsets having different sizes, all taken from the same all-subsets regression.

Mallows’ Cp criterion is given as:

$C_p = \frac{SSE_{\text{subset}}}{MSE_{\text{all}}} - (n - 2p)$, where $SSE_{\text{subset}}$ denotes the error sum of squares of the current model, $MSE_{\text{all}}$ denotes the error mean square for the model with all potential predictors, $n$ is the sample size, and $p = k + 1$, with $k$ being the number of predictors in the current model.

The subset with the lowest value of $C_p$, or with the value of $C_p$ closest to $p$, is chosen to predict the response variable.
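
The criterion can be checked numerically for the three models whose error sums of squares appear in parts (b) and (d). This is a minimal sketch with a hypothetical function name; the error mean square of the full model is taken as 2.6462/24, the denominator used in the F tests.

```python
def mallows_cp(sse_subset, mse_all, n, k):
    """Mallows' C_p for a subset model with k predictors (p = k + 1 parameters)."""
    p = k + 1
    return sse_subset / mse_all - (n - 2 * p)


n = 30
mse_all = 2.6462 / (n - 6)  # error mean square of the five-predictor model

cp_four = mallows_cp(2.7307, mse_all, n, k=4)   # X1, X2, X1^2, X2^2
cp_pause = mallows_cp(7.840, mse_all, n, k=2)   # X2, X2^2
cp_full = mallows_cp(2.6462, mse_all, n, k=5)   # all five terms

print(round(cp_four, 1), round(cp_pause, 1), round(cp_full, 1))
```

For the full model the first term equals n - (p + 1), so C_p reduces to p = 6 by construction; only the subset models carry information under this criterion.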

From part (e), the values of Mallows’ $C_p$ and adjusted $R^2$ for the two best models of each size are as follows:

Number of predictors	Predictor variables	Mallows’ $C_p$	Adjusted $R^2$ (%)
1	$X_2$	92.5	60.1
1	$X_1X_2$	97.0	58.6
2	$X_2, X_2^2$	47.1	75.2
2	$X_1, X_2$	53.3	73.0
3	$X_1, X_2, X_2^2$	7.9	89.2
3	$X_2, X_1X_2, X_2^2$	15.5	86.4
4	$X_1, X_2, X_1^2, X_2^2$	4.8	90.7
4	$X_1, X_2, X_1X_2, X_2^2$	9.2	89.0
5	$X_1, X_2, X_1X_2, X_1^2, X_2^2$	6.0	90.6

For the one-predictor case, the lowest value of $C_p$ is 92.5, corresponding to $X_2$.

For the two-predictor case, the lowest value of $C_p$ is 47.1, corresponding to $X_2, X_2^2$.

For the three-predictor case, the lowest value of $C_p$ is 7.9, corresponding to $X_1, X_2, X_2^2$.

For the four-predictor case, the lowest value of $C_p$ is 4.8, corresponding to $X_1, X_2, X_1^2, X_2^2$.

For the five-predictor case, the value of $C_p$ is 6.0.

The value of $C_p$ is lowest for the predictors $X_1, X_2, X_1^2, X_2^2$, and the subset with the lowest value of $C_p$ is considered the best subset for prediction.

Thus, depending upon the factors affecting the analysis, it would be most preferable to use the regression equation corresponding to the predictors $X_1, X_2, X_1^2, X_2^2$.

Hence, the variables for the model selected by the Mallows’ $C_p$ criterion are $X_1, X_2, X_1^2, X_2^2$.

Adjusted $R^2$ or $R_a^2$:

The adjusted coefficient of multiple determination, $R_a^2$, is used to find the best subset of the predictors for predicting the response variable. The best subset may be a smaller subset of all the predictors and need not be a larger one, as long as it predicts the response variable accurately. The subset with the largest $R_a^2$ is considered the best subset for prediction.

The adjusted coefficient of multiple determination, $R_a^2$, is given by:

$$R_a^2 = 1 - \frac{SSE/[n - (k + 1)]}{SST/(n - 1)}$$
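
Since $SSE/SST = 1 - R^2$, the same quantity can be written as $R_a^2 = 1 - (1 - R^2)(n - 1)/[n - (k + 1)]$, which allows the adjusted values to be checked from the $R^2$ values reported in part (e). The sketch below uses that rearrangement; the function name is hypothetical and the inputs are the rounded $R^2$ percentages from part (e), so small rounding differences are possible.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 from R^2, sample size n, and number of predictors k."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - (k + 1))


n = 30
# (number of predictors, best R^2 of that size from part (e), as a proportion)
best_by_size = [(1, 0.615), (2, 0.769), (3, 0.903), (4, 0.920)]

adjusted = {k: adjusted_r2(r2, n, k) for k, r2 in best_by_size}
for k, value in adjusted.items():
    print(k, round(100 * value, 1))
```

The penalty factor (n - 1)/[n - (k + 1)] grows with k, so an added predictor raises the adjusted value only when it reduces the error sum of squares by enough to offset the lost degree of freedom.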

For the one-predictor case, the highest value of $R_a^2$ is 60.1%, corresponding to $X_2$.

For the two-predictor case, the highest value of $R_a^2$ is 75.2%, corresponding to $X_2, X_2^2$.

For the three-predictor case, the highest value of $R_a^2$ is 89.2%, corresponding to $X_1, X_2, X_2^2$.

For the four-predictor case, the highest value of $R_a^2$ is 90.7%, corresponding to $X_1, X_2, X_1^2, X_2^2$.

For the five-predictor case, the value of $R_a^2$ is 90.6%.

The value of adjusted $R^2$ is highest for the predictors $X_1, X_2, X_1^2, X_2^2$, and the subset with the highest value of adjusted $R^2$ is considered the best subset for prediction.

Thus, provided other factors do not affect the analysis, it would be most preferable to use the regression equation corresponding to the predictors $X_1, X_2, X_1^2, X_2^2$.

Hence, the variables for the model selected by the adjusted $R^2$ criterion are $X_1, X_2, X_1^2, X_2^2$.

Both Mallows’ $C_p$ and adjusted $R^2$ suggest that the best model contains the predictor variables $X_1, X_2, X_1^2, X_2^2$.

Want to see more full solutions like this?

Subscribe now to access step-by-step solutions to millions of textbook problems written by subject matter experts!

Students have asked these similar questions

The r code for side by side boxplot of vitamind v newage and vitamin d v country. Scatterplot code for relationship between vitamin d level and age.

Make a linear model for the following data (1,13) (5,20) (9,27) (13,34)

The entirety of the data set will be in the two pictures

Answer 2

Textbook Question

Chapter 8, Problem 5SE

In a simulation of 30 mobile computer networks, the average speed, pause time, and number of neighbor were measured. A “neighbor” is a computer within the transmission range of another. The data are presented in the following table.

Chapter 8, Problem 5SE, In a simulation of 30 mobile computer networks, the average speed, pause time, and number of

a. Fit the model with Neighbors as the dependent variable, and independent variables Speed, Pause, Speed,·Pause, Speed², and Pause².
b. Construct a reduced model by dropping any variables whose P-values are large, and test the plausibility of the model with an F test.
c. Plot the residuals versus the fitted values for the reduced model. Are there any indications that the model is inappropriate? If so, what are they?
d. Someone suggests that a model containing Pause and Pause² as the only dependent variables is adequate. Do you agree? Why or why not?
e. Using a best subsets software package, find the two models with the highest R² value for each model size from one to five variables. Compute C_p and adjusted R² for each model.
f. Which model is selected by minimum C_p? By adjusted R²? Are they the same?

a.

Expert Solution

To determine

Construct a multiple linear regression model with neighbor as the dependent variable, speed, pause, speed×pause, speed2 and pause2 as the independent variables for the given data.

Answer to Problem 5SE

A multiple linear regression model for the given data is:

y^= 10.840−0.0739x1−0.1274x2+0.001110x12−0.000243x1x2+0.001674x22_.

Explanation of Solution

Calculation:

The data represents the values of the variables number of neighbors, average speed and pause time for a simulation of 30 mobile network computers.

Multiple linear regression model:

A multiple linear regression model is given as yi=β0+β1x1i+...+βkxki+εi where yi is the response variable, and x1i,x2i,...,xki are the k predictor variables. The quantities β0,β1,...,βk are the slopes corresponding to x1i,x2i,...,xki respectively.β^0 is the estimated intercept of the line, from the sample data.

Let x1,x2 be speed and pause. The response variable is y=neighbors.

Regression:

Software procedure:

Step by step procedure to obtain regression using MINITAB software is given as,

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric column containing the predictor variables X1, X2, X1*X2, X1*X1 and X2*X2.
Click OK.

Output obtained from MINITAB is given below:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 1

The ‘Coefficient’ column of the regression analysis MINITAB output gives the slopes corresponding to the respective variables stored in the column ‘Term’.

A careful inspection of the output shows that the fitted model is:

y^= 10.840−0.0739x1−0.1274x2+0.001110x12−0.000243x1x2+0.001674x22.

Hence, the multiple linear regression model for the given data is:

y^= 10.840−0.0739x1−0.1274x2+0.001110x12−0.000243x1x2+0.001674x22_.

b.

Expert Solution

To determine

Construct a reduced model by dropping the variables with large P- values.

Check whether the reduced model is plausible or not.

Answer to Problem 5SE

A multiple linear regression model for the given data is:

y^=10.967−0.0799x1−0.1325x2+0.001110x12+0.001674x22_.

Yes, there is enough evidence to conclude that the reduced model is plausible.

Explanation of Solution

Calculation:

From part (a), it can be seen that the ‘P’ column of the regression analysis MINITAB output gives the slopes corresponding to the respective variables stored in the column ‘Term’.

By observing the P- values of the MINITAB output, it is clear that the largest P-value is 0.390 corresponding to the predictor variable x1x2. Remaining all P- values are reasonable.

Now, the new regression has to be fitted after dropping the predictor variable x1x2.

Regression:

Software procedure:

Step by step procedure to obtain regression using MINITAB software is given as,

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric column containing the predictor variables X1, X2, X1*X1 and X2*X2.
Click OK.

Output obtained from MINITAB is given below:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 2

The ‘Coefficient’ column of the regression analysis MINITAB output gives the slopes corresponding to the respective variables stored in the column ‘Term’.

A careful inspection of the output shows that the fitted model is:

y^=10.967−0.0799x1−0.1325x2+0.001110x12+0.001674x22.

Hence, the multiple linear regression model for the given data is:

y^=10.967−0.0799x1−0.1325x2+0.001110x12+0.001674x22_.

The full model is,

y^= 10.840−0.0739x1−0.1274x2+0.001110x12−0.000243x1x2+0.001674x22

The reduced model is,

y^=10.967−0.0799x1−0.1325x2+0.001110x12+0.001674x22

The test hypotheses are given below:

Null hypothesis:

H0:β4=0

That is, the dropped predictor of the full model is not significant to predict y.

Alternative hypothesis:

H1:β4≠0

That is, the dropped predictor of the full model is significant to predict y.

Test statistic:

F=(SSEReduced−SSEFull)p−kSSEFull[n−(p+1)]

Where,

SSEFull represents the sum of squares due to error obtained from the full model.

SSEReduced represents the sum of squares due to error obtained from the reduced model.

n represents the total number of observations.

p represents the number of predictors on the full model.

k represents the number of predictors on the reduced model.

From the obtained MINITAB outputs, the value of error sum of squares for full model is SSEFull=2.2642 and the value of error sum of squares for the reduced model is SSEReduced=2.7307.

The total number of observations is n=30.

Number of predictors on the full model is p=5 and the number of predictors on the reduced model is k=4.

Degrees of freedom of F-statistic for reduced model:

In a reduced multiple linear regression analysis, the F-statistic is f=(SSEReduced−SSEFull)p−kSSEFull[n−(p+1)].

In the ratio, the numerator is obtained by dividing the quantity SSEReduced−SSEFull by its degrees of freedom, p−k. The denominator is obtained by dividing the error sum of squares of full model by the error degrees of freedom, n−(p+1).

Thus, the degrees of freedom for the F-statistic in a reduced multiple regression analysis are p−k and n−(p+1).

Hence, the numerator degrees of freedom is p−k=5−4=1 and the denominator degrees of freedom is n−(p+1)=30−6=24.

Test statistic under null hypothesis:

Under the null hypothesis, the test statistic is obtained as follows:

f=(SSEReduced−SSEFull)p−kSSEFull[n−(p+1)]=(2.7307−2.6462)5−42.6462[30−(5+1)]=0.08450.11026=0.76638

Thus, the test statistic is F=0.76638_

Since, the level of significance is not specified. The prior level of significance α=0.05 can be used.

P-value:

Software procedure:

Choose Graph > Probability Distribution Plot choose View Probability > OK.
From Distribution, choose F, enter 1 in numerator df and 24 in denominator df.
Click the Shaded Area tab.
Choose X-Value and Right Tail for the region of the curve to shade.
Enter the X-value as 0.76638.
Click OK.

Output obtained from MINITAB is given below:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 3

From the output, the P- value is 0.39.

Thus, the P- value is 0.39.

Decision criteria based on P-value approach:

If P-value≤α, then reject the null hypothesis H0.

If P-value>α, then fail to reject the null hypothesis H0.

Conclusion:

The P-value is 0.39 and α value is 0.05.

Here, P-value is greater than the α value.

That is 0.39(=P)>0.05(=α).

By the rejection rule, fail to reject the null hypothesis.

Hence, there is sufficient evidence to conclude that the dropped predictor variable is not significant to predict the response variable y.

Thus, the reduced model is useful than the full model to predict the response variable y.

c.

Expert Solution

To determine

Plot the residuals versus fitted line plot for the reduced model.

Check whether the model is appropriate.

Answer to Problem 5SE

Residual plot:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 4

Yes, the model seems to be appropriate.

Explanation of Solution

Calculation:

Residual plot:

Software procedure:

Step by step procedure to obtain regression using MINITAB software is given as,

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric column containing the predictor variables X1, X2, X1*X1 and X2*X2.
In Graphs, Under Residuals for plots, select Regular.
Under Residual plots select box Residuals versus fits.
Click OK.

Conditions for the appropriateness of regression model using the residual plot:

The plot of the residuals vs. fitted values should fall roughly in a horizontal band contended and symmetric about x-axis. That is, the residuals of the data should not represent any bend.
The plot of residuals should not contain any outliers.
The residuals have to be scattered randomly around “0” with constant variability among for all the residuals. That is, the spread should be consistent.

Interpretation:

In residual plot there is high bend or pattern, which can violate the straight line condition and there is change in the spread of the residuals from one part to another part of the plot.

However, it is difficult to determine about the violation of the assumptions without the data.

Thus, the model seems to be appropriate.

d.

Expert Solution

To determine

Check whether the model with only two dependent variables x2,x22 is adequate.

Answer to Problem 5SE

No, the model with only the two predictor variables $x_2$ and $x_2^2$ is not adequate.

Explanation of Solution

Calculation:

Regression:

Software procedure:

The step-by-step procedure to obtain the regression using MINITAB is as follows:

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric columns containing the predictor variables X2 and X2*X2.
Click OK.

Output obtained from MINITAB is given below:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 5

The ‘Coefficient’ column of the MINITAB regression output gives the estimated coefficients for the variables listed in the ‘Term’ column.

A careful inspection of the output shows that the fitted model is:

$\hat{y} = 9.960 - 0.1325x_2 + 0.001674x_2^2$.

The full model is,

$\hat{y} = 10.840 - 0.0739x_1 - 0.1274x_2 + 0.001110x_1^2 - 0.000243x_1x_2 + 0.001674x_2^2$

The reduced model is,

$\hat{y} = 9.960 - 0.1325x_2 + 0.001674x_2^2$

The test hypotheses are given below:

Null hypothesis:

$H_0: \beta_1 = \beta_3 = \beta_4 = 0$

That is, the dropped predictors of the full model ($x_1$, $x_1^2$ and $x_1x_2$) are not significant for predicting y.

Alternative hypothesis:

$H_1$: At least one of $\beta_1, \beta_3, \beta_4$ is not equal to 0.

That is, at least one of the dropped predictors of the full model is significant for predicting y.

Test statistic:

$F = \dfrac{(SSE_{Reduced} - SSE_{Full})/(p-k)}{SSE_{Full}/[n-(p+1)]}$

Where,

$SSE_{Full}$ represents the error sum of squares obtained from the full model.

$SSE_{Reduced}$ represents the error sum of squares obtained from the reduced model.

n represents the total number of observations.

p represents the number of predictors in the full model.

k represents the number of predictors in the reduced model.

From the obtained MINITAB outputs, the value of the error sum of squares for the full model is $SSE_{Full} = 2.6462$ and the value of the error sum of squares for the reduced model is $SSE_{Reduced} = 7.840$.

The total number of observations is n = 30.

The number of predictors in the full model is p = 5 and the number of predictors in the reduced model is k = 2.

Degrees of freedom of the F-statistic for the reduced model:

In comparing a reduced model with the full model, the F-statistic is $F = \dfrac{(SSE_{Reduced} - SSE_{Full})/(p-k)}{SSE_{Full}/[n-(p+1)]}$.

In the ratio, the numerator is obtained by dividing the quantity $SSE_{Reduced} - SSE_{Full}$ by its degrees of freedom, $p-k$. The denominator is obtained by dividing the error sum of squares of the full model by the error degrees of freedom, $n-(p+1)$.

Thus, the degrees of freedom for the F-statistic are $p-k$ and $n-(p+1)$.

Hence, the numerator degrees of freedom are $p-k = 5-2 = 3$ and the denominator degrees of freedom are $n-(p+1) = 30-6 = 24$.

Test statistic under null hypothesis:

Under the null hypothesis, the test statistic is obtained as follows:

$F = \dfrac{(SSE_{Reduced} - SSE_{Full})/(p-k)}{SSE_{Full}/[n-(p+1)]} = \dfrac{(7.840 - 2.6462)/(5-2)}{2.6462/[30-(5+1)]} = \dfrac{1.73127}{0.11026} = 15.702$

Thus, the test statistic is F = 15.702.
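
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch for checking the calculation; the function name `partial_f` is an illustrative choice, and the inputs are the error sums of squares used in the computation above.

```python
def partial_f(sse_reduced, sse_full, n, p, k):
    """F statistic for comparing a reduced model (k predictors)
    with the full model (p predictors) fitted to n observations."""
    numerator = (sse_reduced - sse_full) / (p - k)   # mean square of the dropped terms
    denominator = sse_full / (n - (p + 1))           # error mean square of the full model
    return numerator / denominator

f_stat = partial_f(sse_reduced=7.840, sse_full=2.6462, n=30, p=5, k=2)
print(round(f_stat, 3))  # 15.702
```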

Since the level of significance is not specified, the conventional level of significance α = 0.05 is used.

P-value:

Software procedure:

Choose Graph > Probability Distribution Plot, choose View Probability, and click OK.
From Distribution, choose F, enter 3 in numerator df and 24 in denominator df.
Click the Shaded Area tab.
Choose X-Value and Right Tail for the region of the curve to shade.
Enter the X-value as 15.702.
Click OK.

Output obtained from MINITAB is given below:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 6

From the output, the P-value is $7.3 \times 10^{-6}$.
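
The right-tail probability can also be checked without MINITAB. The sketch below is an illustration under a stated assumption: it relies on a closed form of the F distribution that holds only when the denominator degrees of freedom are even (24 here). In that case the tail probability reduces to one minus a finite sum. The function name `f_right_tail` is hypothetical, not a library routine.

```python
def f_right_tail(f, df1, df2):
    """P(F > f) for an F(df1, df2) random variable.

    Assumption: df2 is even, so that the regularized incomplete beta
    function reduces to a finite sum."""
    if df2 % 2 != 0:
        raise ValueError("this closed form needs even denominator df")
    a = df2 // 2                # integer shape parameter
    b = df1 / 2.0
    x = df2 / (df2 + df1 * f)
    term = 1.0                  # j = 0 term of the series
    total = 0.0
    for j in range(a):
        total += term
        term *= (b + j) / (j + 1) * x   # ratio of consecutive terms
    return 1.0 - (1.0 - x) ** b * total

p_value = f_right_tail(15.702, 3, 24)
print(p_value)
```

The result agrees with the MINITAB value of about $7.3 \times 10^{-6}$ to the precision reported.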

Decision criteria based on P-value approach:

If P-value≤α, then reject the null hypothesis H0.

If P-value>α, then fail to reject the null hypothesis H0.

Conclusion:

The P-value is $7.3 \times 10^{-6}$ and the α value is 0.05.

Here, the P-value is less than the α value.

That is, $P = 7.3 \times 10^{-6} < 0.05 = \alpha$.

By the rejection rule, reject the null hypothesis.

Hence, there is sufficient evidence to conclude that at least one of the dropped predictors of the full model is significant for predicting y.

Thus, the model with only the two predictor variables $x_2$ and $x_2^2$ is not adequate.

e.

Expert Solution

To determine

Find the two models with the highest $R^2$ value for each model size from one to five variables.

Obtain the values of Mallows’ $C_p$ and adjusted $R^2$ for each model.

Answer to Problem 5SE

The two models with the highest $R^2$ are:

First model with $X_1, X_2, X_1^2, X_2^2$ predictors and the second model with $X_1, X_2, X_2^2$ predictors.

The values of Mallows’ $C_p$ and adjusted $R^2$ for the various subsets are as follows:

Predictor variables	Mallows’ $C_p$	Adjusted $R^2$ (%)
$X_2$	92.5	60.1
$X_1X_2$	97.0	58.6
$X_2, X_2^2$	47.1	75.2
$X_1, X_2$	53.3	73.0
$X_1, X_2, X_2^2$	7.9	89.2
$X_2, X_1X_2, X_2^2$	15.5	86.4
$X_1, X_2, X_1^2, X_2^2$	4.8	90.7
$X_1, X_2, X_1X_2, X_2^2$	9.2	89.0
$X_1, X_2, X_1X_2, X_1^2, X_2^2$	6.0	90.6

Explanation of Solution

Calculation:

Coefficient of multiple determination $R^2$:

The coefficient of multiple determination, $R^2$, is given by:

$R^2 = 1 - \dfrac{SSE}{SST}$, where SST and SSE are the total sum of squares and the error sum of squares, respectively.

Among subsets of the same size, the subset with the larger $R^2$ is considered to be the better subset for prediction.

Regression:

Software procedure:

The step-by-step procedure to obtain the best subsets regression using MINITAB is as follows:

Choose Stat > Regression > Regression > Best Subsets.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric columns containing the predictor variables X1, X2, X1*X2, X1*X1 and X2*X2.
Click OK.

Output obtained from MINITAB is given below:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 7

For the one predictor case, the highest value of $R^2$ is 61.5%, corresponding to $X_2$.

For the two predictor case, the highest value of $R^2$ is 76.9%, corresponding to $X_2, X_2^2$.

For the three predictor case, the highest value of $R^2$ is 90.3%, corresponding to $X_1, X_2, X_2^2$.

For the four predictor case, the highest value of $R^2$ is 92.0%, corresponding to $X_1, X_2, X_1^2, X_2^2$.

For the five predictor case, there is only one model, the full model. The value 90.6% listed for it is its adjusted $R^2$; because $R^2$ cannot decrease when a predictor is added, its $R^2$ is at least 92.0%.

Among the models with fewer than five predictors, the value of $R^2$ is the highest for the predictors $X_1, X_2, X_1^2, X_2^2$.

Thus, depending upon the factors affecting the analysis, it would be most preferable to use the regression equation corresponding to the predictors $X_1, X_2, X_1^2, X_2^2$.

The three predictor model with $X_1, X_2, X_2^2$ has an $R^2$ of 90.3%, which is only 1.7 percentage points lower, so little is lost by dropping $X_1^2$.

Therefore, the model with $X_1, X_2, X_2^2$ predictors is taken as the second best model.

Thus, the two best models are:

First model with $X_1, X_2, X_1^2, X_2^2$ predictors and the second model with $X_1, X_2, X_2^2$ predictors.

From the accompanying MINITAB output, the values of Mallows’ $C_p$ and adjusted $R^2$ for the various subsets are as follows:

Predictor variables	Mallows’ $C_p$	Adjusted $R^2$ (%)
$X_2$	92.5	60.1
$X_1X_2$	97.0	58.6
$X_2, X_2^2$	47.1	75.2
$X_1, X_2$	53.3	73.0
$X_1, X_2, X_2^2$	7.9	89.2
$X_2, X_1X_2, X_2^2$	15.5	86.4
$X_1, X_2, X_1^2, X_2^2$	4.8	90.7
$X_1, X_2, X_1X_2, X_2^2$	9.2	89.0
$X_1, X_2, X_1X_2, X_1^2, X_2^2$	6.0	90.6

f.

Expert Solution

To determine

Select the variables for the model, using the Mallows’ $C_p$ criterion and the adjusted $R^2$ criterion.

Check whether both the models are the same.

Answer to Problem 5SE

The variables for the model using the Mallows’ $C_p$ criterion are $X_1, X_2, X_1^2, X_2^2$.

The variables for the model using the adjusted $R^2$ criterion are $X_1, X_2, X_1^2, X_2^2$.

Yes, both the models are the same.

Explanation of Solution

Mallows’ Cp:

An important use of the Mallows’ $C_p$ criterion is to compare regression equations for subsets of different sizes, all taken from the same all-subsets regression.

Mallows’ $C_p$ criterion is given as:

$C_p = \dfrac{SSE_{subset}}{MSE_{all}} - (n - 2p)$, where $SSE_{subset}$ denotes the error sum of squares of the current model, $MSE_{all}$ denotes the error mean square for the model with all potential predictors, n is the sample size, and $p = k+1$, with k being the number of predictors in the current model.

The subset with the lowest value of $C_p$, or with the value of $C_p$ closest to p, is chosen to predict the response variable.
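
As a check on the formula, the following Python sketch recomputes $C_p$ for three of the subsets from the error sums of squares quoted in the F-test calculations for parts (b) and (d): 2.6462 for the full model, 2.7307 for the four-predictor model and 7.840 for the model with Pause and Pause². The function name `mallows_cp` is an illustrative choice.

```python
def mallows_cp(sse_subset, mse_all, n, k):
    """Mallows' C_p for a subset with k predictors (p = k + 1 coefficients)."""
    p = k + 1
    return sse_subset / mse_all - (n - 2 * p)

n = 30
mse_all = 2.6462 / (n - 6)   # full-model SSE over its error degrees of freedom

cp_four = mallows_cp(2.7307, mse_all, n, k=4)   # X1, X2, X1^2, X2^2
cp_two = mallows_cp(7.840, mse_all, n, k=2)     # X2, X2^2
cp_full = mallows_cp(2.6462, mse_all, n, k=5)   # all five predictors
print(round(cp_four, 1), round(cp_two, 1), round(cp_full, 1))  # 4.8 47.1 6.0
```

These values match the corresponding entries in the best subsets table. For the full model, $C_p$ always equals p, here 6.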

From part (e), the values of Mallows’ $C_p$ and adjusted $R^2$ for the various subsets are as follows:

Predictor variables	Mallows’ $C_p$	Adjusted $R^2$ (%)
$X_2$	92.5	60.1
$X_1X_2$	97.0	58.6
$X_2, X_2^2$	47.1	75.2
$X_1, X_2$	53.3	73.0
$X_1, X_2, X_2^2$	7.9	89.2
$X_2, X_1X_2, X_2^2$	15.5	86.4
$X_1, X_2, X_1^2, X_2^2$	4.8	90.7
$X_1, X_2, X_1X_2, X_2^2$	9.2	89.0
$X_1, X_2, X_1X_2, X_1^2, X_2^2$	6.0	90.6

For the one predictor case, the lowest value of $C_p$ is 92.5, corresponding to $X_2$.

For the two predictor case, the lowest value of $C_p$ is 47.1, corresponding to $X_2, X_2^2$.

For the three predictor case, the lowest value of $C_p$ is 7.9, corresponding to $X_1, X_2, X_2^2$.

For the four predictor case, the lowest value of $C_p$ is 4.8, corresponding to $X_1, X_2, X_1^2, X_2^2$.

For the five predictor case, the value of $C_p$ is 6.0.

The value of $C_p$ is the lowest for the predictors $X_1, X_2, X_1^2, X_2^2$, and the subset with the lowest value of $C_p$ is considered to be the best subset for prediction.

Thus, depending upon the factors affecting the analysis, it would be most preferable to use the regression equation corresponding to the predictors $X_1, X_2, X_1^2, X_2^2$.

Hence, the variables for the model using the Mallows’ $C_p$ criterion are $X_1, X_2, X_1^2, X_2^2$.

Adjusted $R^2$ or $R_a^2$:

An important use of the adjusted coefficient of multiple determination, $R_a^2$, is to find the best subset of the predictors for predicting the response variable. The best subset may be a smaller subset of all the predictors and need not necessarily be a larger subset, as long as it predicts the response variable accurately. The subset with the larger $R_a^2$ is considered to be the better subset for prediction.

The adjusted coefficient of multiple determination, $R_a^2$, is given by:

$R_a^2 = 1 - \dfrac{SSE/[n-(k+1)]}{SST/(n-1)}$.
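
Because $SSE/SST = 1 - R^2$, the same quantity can be written as $R_a^2 = 1 - (1 - R^2)\dfrac{n-1}{n-(k+1)}$. The short Python sketch below uses this form to recover the adjusted values from the best $R^2$ values reported in part (e) for one to four predictors. The function name `adjusted_r2` is an illustrative choice.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 from R^2 (as a fraction), n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - (k + 1))

# Best R^2 for each model size, from part (e), written as fractions.
best_r2 = {1: 0.615, 2: 0.769, 3: 0.903, 4: 0.920}
adjusted = {k: round(100 * adjusted_r2(r2, 30, k), 1) for k, r2 in best_r2.items()}
print(adjusted)  # {1: 60.1, 2: 75.2, 3: 89.2, 4: 90.7}
```

The results agree with the adjusted $R^2$ column of the best subsets table.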

For the one predictor case, the highest value of $R_a^2$ is 60.1%, corresponding to $X_2$.

For the two predictor case, the highest value of $R_a^2$ is 75.2%, corresponding to $X_2, X_2^2$.

For the three predictor case, the highest value of $R_a^2$ is 89.2%, corresponding to $X_1, X_2, X_2^2$.

For the four predictor case, the highest value of $R_a^2$ is 90.7%, corresponding to $X_1, X_2, X_1^2, X_2^2$.

For the five predictor case, the value of $R_a^2$ is 90.6%.

The value of adjusted $R^2$ is the highest for the predictors $X_1, X_2, X_1^2, X_2^2$, and the subset with the highest value of adjusted $R^2$ is considered to be the best subset for prediction.

Thus, provided other factors do not affect the analysis, it would be most preferable to use the regression equation corresponding to the predictors $X_1, X_2, X_1^2, X_2^2$.

Hence, the variables for the model using the adjusted $R^2$ criterion are $X_1, X_2, X_1^2, X_2^2$.

Both Mallows’ $C_p$ and adjusted $R^2$ suggest that the best model contains the predictor variables $X_1, X_2, X_1^2, X_2^2$.
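
The two selection rules can be applied mechanically to the table from part (e). The Python sketch below is an illustration only; the dictionary keys are plain-text labels chosen here for the subsets, and the numbers are copied from the table.

```python
# (Mallows' C_p, adjusted R^2 in percent) for each subset, from the table in part (e).
subsets = {
    "X2": (92.5, 60.1),
    "X1X2": (97.0, 58.6),
    "X2,X2^2": (47.1, 75.2),
    "X1,X2": (53.3, 73.0),
    "X1,X2,X2^2": (7.9, 89.2),
    "X2,X1X2,X2^2": (15.5, 86.4),
    "X1,X2,X1^2,X2^2": (4.8, 90.7),
    "X1,X2,X1X2,X2^2": (9.2, 89.0),
    "X1,X2,X1X2,X1^2,X2^2": (6.0, 90.6),
}

best_by_cp = min(subsets, key=lambda name: subsets[name][0])       # smallest C_p
best_by_adj_r2 = max(subsets, key=lambda name: subsets[name][1])   # largest adjusted R^2
print(best_by_cp)
print(best_by_adj_r2)
print(best_by_cp == best_by_adj_r2)
```

Both rules pick the subset labelled `X1,X2,X1^2,X2^2`, in agreement with the conclusion above.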

Answer 4

Textbook Question

Chapter 8, Problem 5SE

In a simulation of 30 mobile computer networks, the average speed, pause time, and number of neighbor were measured. A “neighbor” is a computer within the transmission range of another. The data are presented in the following table.

Chapter 8, Problem 5SE, In a simulation of 30 mobile computer networks, the average speed, pause time, and number of

a. Fit the model with Neighbors as the dependent variable, and independent variables Speed, Pause, Speed,·Pause, Speed², and Pause².
b. Construct a reduced model by dropping any variables whose P-values are large, and test the plausibility of the model with an F test.
c. Plot the residuals versus the fitted values for the reduced model. Are there any indications that the model is inappropriate? If so, what are they?
d. Someone suggests that a model containing Pause and Pause² as the only independent variables is adequate. Do you agree? Why or why not?
e. Using a best subsets software package, find the two models with the highest R² value for each model size from one to five variables. Compute C_p and adjusted R² for each model.
f. Which model is selected by minimum C_p? By adjusted R²? Are they the same?

a.

Expert Solution

To determine

Construct a multiple linear regression model for the given data with neighbors as the dependent variable and speed, pause, speed×pause, speed² and pause² as the independent variables.

Answer to Problem 5SE

The multiple linear regression model fitted to the given data is:

$\hat{y} = 10.840 - 0.0739x_1 - 0.1274x_2 + 0.001110x_1^2 - 0.000243x_1x_2 + 0.001674x_2^2$.

Explanation of Solution

Calculation:

The data give the number of neighbors, the average speed and the pause time for a simulation of 30 mobile computer networks.

Multiple linear regression model:

A multiple linear regression model is given as $y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \varepsilon_i$, where $y_i$ is the response variable and $x_{1i}, x_{2i}, \dots, x_{ki}$ are the k predictor variables. The quantity $\beta_0$ is the intercept, and $\beta_1, \dots, \beta_k$ are the coefficients (slopes) corresponding to $x_{1i}, x_{2i}, \dots, x_{ki}$, respectively. $\hat{\beta}_0$ is the intercept estimated from the sample data.

Let $x_1$ be speed and $x_2$ be pause. The response variable is y = neighbors.

Regression:

Software procedure:

The step-by-step procedure to fit the regression using MINITAB is as follows:

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric columns containing the predictor variables X1 and X2, together with the terms X1*X2, X1*X1 and X2*X2.
Click OK.

Output obtained from MINITAB is given below:

[MINITAB output image not reproduced: Statistics for Engineers and Scientists, Chapter 8, Problem 5SE, additional homework tip 1]

The ‘Coefficient’ column of the MINITAB regression output gives the estimated coefficients of the variables listed in the ‘Term’ column.

Reading the coefficients from the output, the fitted model is:

$\hat{y} = 10.840 - 0.0739x_1 - 0.1274x_2 + 0.001110x_1^2 - 0.000243x_1x_2 + 0.001674x_2^2$.

Hence, the multiple linear regression model for the given data is:

$\hat{y} = 10.840 - 0.0739x_1 - 0.1274x_2 + 0.001110x_1^2 - 0.000243x_1x_2 + 0.001674x_2^2$.
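As a small illustration of how the fitted full model is used, the following Python sketch evaluates the equation above for a given speed and pause time. The function name and the input values are hypothetical and chosen only for illustration; the inputs are not taken from the data table.

```python
def predict_neighbors(speed, pause):
    """Evaluate the fitted full model at speed x1 and pause time x2.

    The coefficients are copied from the fitted equation in part (a).
    """
    return (10.840
            - 0.0739 * speed
            - 0.1274 * pause
            + 0.001110 * speed ** 2
            - 0.000243 * speed * pause
            + 0.001674 * pause ** 2)


# Hypothetical inputs, used only to show the call.
print(round(predict_neighbors(0, 0), 3))
print(round(predict_neighbors(10, 20), 3))
```

Setting both inputs to zero returns the intercept, which is a quick check that the coefficients were entered correctly.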

b.

Expert Solution

To determine

Construct a reduced model by dropping the variables with large P-values.

Check whether the reduced model is plausible.

Answer to Problem 5SE

The reduced multiple linear regression model for the given data is:

$\hat{y} = 10.967 - 0.0799x_1 - 0.1325x_2 + 0.001110x_1^2 + 0.001674x_2^2$.

Yes, the reduced model is plausible.

Explanation of Solution

Calculation:

From part (a), the ‘P’ column of the MINITAB regression output gives the P-values for the coefficients of the variables listed in the ‘Term’ column.

The largest P-value is 0.390, corresponding to the predictor variable $x_1x_2$. All the remaining P-values are reasonably small.

Therefore, a new regression is fitted after dropping the predictor variable $x_1x_2$.

Regression:

Software procedure:

The step-by-step procedure to fit the reduced regression using MINITAB is as follows:

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric columns containing the predictor variables X1 and X2, together with the terms X1*X1 and X2*X2.
Click OK.

Output obtained from MINITAB is given below:

[MINITAB output image not reproduced: Statistics for Engineers and Scientists, Chapter 8, Problem 5SE, additional homework tip 2]

The ‘Coefficient’ column of the MINITAB regression output gives the estimated coefficients of the variables listed in the ‘Term’ column.

Reading the coefficients from the output, the fitted model is:

$\hat{y} = 10.967 - 0.0799x_1 - 0.1325x_2 + 0.001110x_1^2 + 0.001674x_2^2$.

Hence, the reduced multiple linear regression model for the given data is:

$\hat{y} = 10.967 - 0.0799x_1 - 0.1325x_2 + 0.001110x_1^2 + 0.001674x_2^2$.

The full model is,

$\hat{y} = 10.840 - 0.0739x_1 - 0.1274x_2 + 0.001110x_1^2 - 0.000243x_1x_2 + 0.001674x_2^2$

The reduced model is,

$\hat{y} = 10.967 - 0.0799x_1 - 0.1325x_2 + 0.001110x_1^2 + 0.001674x_2^2$

The test hypotheses are given below:

Null hypothesis:

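$H_0: \beta_4 = 0$, where $\beta_4$ is the coefficient of $x_1x_2$ in the full model.

That is, the dropped predictor does not contribute to predicting y once the other predictors are in the model.

Alternative hypothesis:

$H_1: \beta_4 \neq 0$

That is, the dropped predictor contributes to predicting y.

Test statistic:

$F = \dfrac{(SSE_{\text{Reduced}} - SSE_{\text{Full}})/(p-k)}{SSE_{\text{Full}}/[n-(p+1)]}$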

Where,

$SSE_{\text{Full}}$ is the error sum of squares obtained from the full model.

$SSE_{\text{Reduced}}$ is the error sum of squares obtained from the reduced model.

n is the total number of observations.

p is the number of predictors in the full model.

k is the number of predictors in the reduced model.

From the MINITAB outputs, the error sum of squares for the full model is $SSE_{\text{Full}} = 2.6462$ and the error sum of squares for the reduced model is $SSE_{\text{Reduced}} = 2.7307$.

The total number of observations is n = 30.

The number of predictors in the full model is p = 5 and the number of predictors in the reduced model is k = 4.

Degrees of freedom of the F statistic for comparing the reduced model with the full model:

When a reduced model is compared with the full model, the F statistic is $f = \dfrac{(SSE_{\text{Reduced}} - SSE_{\text{Full}})/(p-k)}{SSE_{\text{Full}}/[n-(p+1)]}$.

In this ratio, the numerator is obtained by dividing the quantity $SSE_{\text{Reduced}} - SSE_{\text{Full}}$ by its degrees of freedom, p−k. The denominator is obtained by dividing the error sum of squares of the full model by its error degrees of freedom, n−(p+1).

Thus, the degrees of freedom for the F statistic are p−k and n−(p+1).

Hence, the numerator degrees of freedom are p−k = 5−4 = 1 and the denominator degrees of freedom are n−(p+1) = 30−6 = 24.

Test statistic under null hypothesis:

Under the null hypothesis, the test statistic is obtained as follows:

$f = \dfrac{(SSE_{\text{Reduced}} - SSE_{\text{Full}})/(p-k)}{SSE_{\text{Full}}/[n-(p+1)]} = \dfrac{(2.7307 - 2.6462)/(5-4)}{2.6462/[30-(5+1)]} = \dfrac{0.0845}{0.11026} = 0.76638$

Thus, the test statistic is F = 0.76638.
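The arithmetic of this partial F statistic can be reproduced with a few lines of Python. This is a minimal sketch; the function name partial_f is an assumption introduced here, and the inputs are the error sums of squares used in the calculation above (2.7307 and 2.6462) with n = 30, p = 5 and k = 4.

```python
def partial_f(sse_reduced, sse_full, n, p, k):
    """F statistic for comparing a reduced model (k predictors)
    with a full model (p predictors) fitted to n observations."""
    numerator = (sse_reduced - sse_full) / (p - k)    # extra SSE per dropped term
    denominator = sse_full / (n - (p + 1))            # error mean square, full model
    return numerator / denominator


f_b = partial_f(2.7307, 2.6462, n=30, p=5, k=4)
print(round(f_b, 4))
```

The same function applies to any nested pair of models; only the two error sums of squares and the predictor counts change.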

Since the level of significance is not specified, the conventional level of significance α = 0.05 is used.

P-value:

Software procedure:

Choose Graph > Probability Distribution Plot, choose View Probability, and click OK.
From Distribution, choose F, enter 1 in numerator df and 24 in denominator df.
Click the Shaded Area tab.
Choose X-Value and Right Tail for the region of the curve to shade.
Enter the X-value as 0.76638.
Click OK.

Output obtained from MINITAB is given below:

[MINITAB output image not reproduced: Statistics for Engineers and Scientists, Chapter 8, Problem 5SE, additional homework tip 3]

From the output, the P-value is 0.39.

Thus, the P-value is 0.39.
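The right-tail area that MINITAB shades can also be approximated without statistical software. The sketch below is not MINITAB's method: it is an assumed numerical approach that uses the identity P(F > x) = I_z(d2/2, d1/2) with z = d2/(d2 + d1·x), where I_z is the regularized incomplete beta function, and evaluates the integral with Simpson's rule. The function name f_right_tail is hypothetical.

```python
import math


def f_right_tail(x, d1, d2, steps=20000):
    """Approximate P(F > x) for an F(d1, d2) random variable.

    Integrates t**(a-1) * (1-t)**(b-1) over [0, z] with Simpson's rule,
    where a = d2/2, b = d1/2 and z = d2 / (d2 + d1*x).
    Assumes x > 0 and d2 >= 2, so the integrand is finite on [0, z].
    steps must be even.
    """
    a, b = d2 / 2.0, d1 / 2.0
    z = d2 / (d2 + d1 * x)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    h = z / steps
    total = integrand(0.0) + integrand(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return (total * h / 3.0) / math.exp(log_beta)


p_value = f_right_tail(0.76638, 1, 24)
print(round(p_value, 2))
```

Only the standard library is used, so the check can be run anywhere Python is available.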

Decision criteria based on P-value approach:

If P-value≤α, then reject the null hypothesis H0.

If P-value>α, then fail to reject the null hypothesis H0.

Conclusion:

The P-value is 0.39 and the α value is 0.05.

Here, the P-value is greater than α.

That is, P = 0.39 > α = 0.05.

By the decision rule, fail to reject the null hypothesis.

Hence, there is not sufficient evidence to conclude that the dropped predictor $x_1x_2$ contributes to predicting the response variable y.

Thus, the reduced model is plausible.

c.

Expert Solution

To determine

Plot the residuals versus the fitted values for the reduced model.

Check whether the model is appropriate.

Answer to Problem 5SE

Residual plot:

[Residual plot image not reproduced: Statistics for Engineers and Scientists, Chapter 8, Problem 5SE, additional homework tip 4]

There is no clear indication that the model is inappropriate.

Explanation of Solution

Calculation:

Residual plot:

Software procedure:

The step-by-step procedure to obtain the residual plot using MINITAB is as follows:

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric columns containing the predictor variables X1 and X2, together with the terms X1*X1 and X2*X2.
In Graphs, under Residuals for plots, select Regular.
Under Residual plots, select the box Residuals versus fits.
Click OK.

Conditions for the appropriateness of regression model using the residual plot:

The residuals plotted against the fitted values should fall roughly in a horizontal band that is centered on and symmetric about the line residual = 0. That is, the plot should not show any bend.
The residual plot should not contain any outliers.
The residuals should be scattered randomly around 0 with roughly constant variability. That is, the spread should be consistent across the fitted values.

Interpretation:

The residual plot gives some suggestion of a bend or pattern, which would violate the linearity condition, and of a change in the spread of the residuals from one part of the plot to another.

However, these indications are not strong, and it is difficult to establish a violation of the assumptions from this plot alone.

Thus, the model seems to be appropriate.

d.

Expert Solution

To determine

Check whether the model with only the two independent variables $x_2$ and $x_2^2$ is adequate.

Answer to Problem 5SE

No, the model with only the two independent variables $x_2$ and $x_2^2$ is not adequate.

Explanation of Solution

Calculation:

Regression:

Software procedure:

The step-by-step procedure to fit the regression using MINITAB is as follows:

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric column containing the predictor variable X2, together with the term X2*X2.
Click OK.

Output obtained from MINITAB is given below:

[MINITAB output image not reproduced: Statistics for Engineers and Scientists, Chapter 8, Problem 5SE, additional homework tip 5]

The ‘Coefficient’ column of the MINITAB regression output gives the estimated coefficients of the variables listed in the ‘Term’ column.

Reading the coefficients from the output, the fitted model is:

$\hat{y} = 9.960 - 0.1325x_2 + 0.001674x_2^2$.

Hence, the fitted model containing only the pause terms is:

$\hat{y} = 9.960 - 0.1325x_2 + 0.001674x_2^2$.

The full model is,

$\hat{y} = 10.840 - 0.0739x_1 - 0.1274x_2 + 0.001110x_1^2 - 0.000243x_1x_2 + 0.001674x_2^2$

The reduced model is,

$\hat{y} = 9.960 - 0.1325x_2 + 0.001674x_2^2$

The test hypotheses are given below:

Null hypothesis:

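$H_0: \beta_1 = \beta_3 = \beta_4 = 0$, where $\beta_1$, $\beta_3$ and $\beta_4$ are the coefficients of $x_1$, $x_1^2$ and $x_1x_2$ in the full model.

That is, none of the dropped predictors contributes to predicting y.

Alternative hypothesis:

$H_1$: at least one of $\beta_1$, $\beta_3$, $\beta_4$ is not equal to 0.

That is, at least one of the dropped predictors contributes to predicting y.

Test statistic:

$F = \dfrac{(SSE_{\text{Reduced}} - SSE_{\text{Full}})/(p-k)}{SSE_{\text{Full}}/[n-(p+1)]}$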

Where,

$SSE_{\text{Full}}$ is the error sum of squares obtained from the full model.

$SSE_{\text{Reduced}}$ is the error sum of squares obtained from the reduced model.

n is the total number of observations.

p is the number of predictors in the full model.

k is the number of predictors in the reduced model.

From the MINITAB outputs, the error sum of squares for the full model is $SSE_{\text{Full}} = 2.6462$ and the error sum of squares for the reduced model is $SSE_{\text{Reduced}} = 7.840$.

The total number of observations is n = 30.

The number of predictors in the full model is p = 5 and the number of predictors in the reduced model is k = 2.

Degrees of freedom of the F statistic for comparing the reduced model with the full model:

When a reduced model is compared with the full model, the F statistic is $F = \dfrac{(SSE_{\text{Reduced}} - SSE_{\text{Full}})/(p-k)}{SSE_{\text{Full}}/[n-(p+1)]}$.

In this ratio, the numerator is obtained by dividing the quantity $SSE_{\text{Reduced}} - SSE_{\text{Full}}$ by its degrees of freedom, p−k. The denominator is obtained by dividing the error sum of squares of the full model by its error degrees of freedom, n−(p+1).

Thus, the degrees of freedom for the F statistic are p−k and n−(p+1).

Hence, the numerator degrees of freedom are p−k = 5−2 = 3 and the denominator degrees of freedom are n−(p+1) = 30−6 = 24.

Test statistic under null hypothesis:

Under the null hypothesis, the test statistic is obtained as follows:

$F = \dfrac{(SSE_{\text{Reduced}} - SSE_{\text{Full}})/(p-k)}{SSE_{\text{Full}}/[n-(p+1)]} = \dfrac{(7.840 - 2.6462)/(5-2)}{2.6462/[30-(5+1)]} = \dfrac{1.73127}{0.11026} = 15.702$

Thus, the test statistic is F = 15.702.

Since the level of significance is not specified, the conventional level of significance α = 0.05 is used.

P-value:

Software procedure:

Choose Graph > Probability Distribution Plot, choose View Probability, and click OK.
From Distribution, choose F, enter 3 in numerator df and 24 in denominator df.
Click the Shaded Area tab.
Choose X-Value and Right Tail for the region of the curve to shade.
Enter the X-value as 15.702.
Click OK.

Output obtained from MINITAB is given below:

[MINITAB output image not reproduced: Statistics for Engineers and Scientists, Chapter 8, Problem 5SE, additional homework tip 6]

From the output, the P-value is $7.3 \times 10^{-6}$.

Thus, the P-value is $7.3 \times 10^{-6}$.

Decision criteria based on P-value approach:

If P-value≤α, then reject the null hypothesis H0.

If P-value>α, then fail to reject the null hypothesis H0.

Conclusion:

The P-value is $7.3 \times 10^{-6}$ and the α value is 0.05.

Here, the P-value is less than α.

That is, $P = 7.3 \times 10^{-6} < \alpha = 0.05$.

By the decision rule, reject the null hypothesis.

Hence, there is sufficient evidence to conclude that at least one of the dropped predictors of the full model contributes to predicting y.

Thus, the model with only the two independent variables $x_2$ and $x_2^2$ is not adequate.

e.

Expert Solution

To determine

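Find the two models with the highest $R^2$ value for each model size.

Obtain the values of Mallows’ $C_p$ and adjusted $R^2$ for each model.

Answer to Problem 5SE

The two models with the highest $R^2$ for each model size (only one model contains all five predictors) are listed in the table below, together with their Mallows’ $C_p$ and adjusted $R^2$ values. Judged by $C_p$ and adjusted $R^2$, the best of these is the model with predictors $X_1, X_2, X_1^2, X_2^2$, and the best model with fewer predictors is the one with $X_1, X_2, X_2^2$.

Predictor variables	Mallows’ $C_p$	Adjusted $R^2$ (%)
$X_2$	92.5	60.1
$X_1X_2$	97.0	58.6
$X_2, X_2^2$	47.1	75.2
$X_1, X_2$	53.3	73.0
$X_1, X_2, X_2^2$	7.9	89.2
$X_2, X_1X_2, X_2^2$	15.5	86.4
$X_1, X_2, X_1^2, X_2^2$	4.8	90.7
$X_1, X_2, X_1X_2, X_2^2$	9.2	89.0
$X_1, X_2, X_1X_2, X_1^2, X_2^2$	6.0	90.6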

Explanation of Solution

Calculation:

Coefficient of multiple determination $R^2$:

The coefficient of multiple determination, $R^2$, is given by:

$R^2 = 1 - \dfrac{SSE}{SST}$, where SST and SSE are the total sum of squares and the error sum of squares, respectively.

Among subsets of the same size, the subset with the largest $R^2$ is considered the best subset for prediction.

Regression:

Software procedure:

The step-by-step procedure to run a best subsets regression using MINITAB is as follows:

Choose Stat > Regression > Regression > Best subsets.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric columns containing the predictor variables X1 and X2, together with the terms X1*X2, X1*X1 and X2*X2.
Click OK.
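To make clear what a best subsets search has to cover, the following Python sketch enumerates every candidate subset of the five predictor terms, grouped by model size. It only lists the candidates; it does not fit any model, because the data table is not reproduced here. The term labels are plain-text stand-ins chosen for this illustration.

```python
from itertools import combinations

# Plain-text labels for the five candidate terms (illustrative names).
predictors = ["X1", "X2", "X1*X2", "X1^2", "X2^2"]

# Group all non-empty subsets by the number of predictors they contain.
subsets_by_size = {
    size: list(combinations(predictors, size))
    for size in range(1, len(predictors) + 1)
}

for size, subsets in subsets_by_size.items():
    print(size, len(subsets))
```

A best subsets routine fits each of these candidate models and reports the top few of every size, which is why the output lists two models for sizes one to four and a single model of size five.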

Output obtained from MINITAB is given below:

[MINITAB output image not reproduced: Statistics for Engineers and Scientists, Chapter 8, Problem 5SE, additional homework tip 7]

For the one-predictor case, the highest value of $R^2$ is 61.5, corresponding to $X_2$.

For the two-predictor case, the highest value of $R^2$ is 76.9, corresponding to $X_2, X_2^2$.

For the three-predictor case, the highest value of $R^2$ is 90.3, corresponding to $X_1, X_2, X_2^2$.

For the four-predictor case, the highest value of $R^2$ is 92.0, corresponding to $X_1, X_2, X_1^2, X_2^2$.

For the five-predictor case there is only one model, the full model. Its adjusted $R^2$ is 90.6; its $R^2$ cannot be less than 92.0, because adding a predictor never decreases $R^2$.

For this reason, $R^2$ by itself can only rank models of the same size. To compare models of different sizes, Mallows’ $C_p$ and adjusted $R^2$ are used.

By both of these measures, the model with the predictors $X_1, X_2, X_1^2, X_2^2$ ranks first ($C_p$ = 4.8, adjusted $R^2$ = 90.7).

Among the models with fewer predictors, the model with the predictors $X_1, X_2, X_2^2$ is the best ($C_p$ = 7.9, adjusted $R^2$ = 89.2).

Thus, the two models of most interest are:

First, the model with the predictors $X_1, X_2, X_1^2, X_2^2$, and second, the model with the predictors $X_1, X_2, X_2^2$.

From the MINITAB output, the values of Mallows’ $C_p$ and adjusted $R^2$ for the two best subsets of each size are as follows:

Predictor variables	Mallows’ $C_p$	Adjusted $R^2$ (%)
$X_2$	92.5	60.1
$X_1X_2$	97.0	58.6
$X_2, X_2^2$	47.1	75.2
$X_1, X_2$	53.3	73.0
$X_1, X_2, X_2^2$	7.9	89.2
$X_2, X_1X_2, X_2^2$	15.5	86.4
$X_1, X_2, X_1^2, X_2^2$	4.8	90.7
$X_1, X_2, X_1X_2, X_2^2$	9.2	89.0
$X_1, X_2, X_1X_2, X_1^2, X_2^2$	6.0	90.6

f.

Expert Solution

To determine

Select the variables for the model using the Mallows’ $C_p$ criterion and the adjusted $R^2$ criterion.

Check whether the two criteria select the same model.

Answer to Problem 5SE

The variables for the model selected by the Mallows’ $C_p$ criterion are $X_1, X_2, X_1^2, X_2^2$.

The variables for the model selected by the adjusted $R^2$ criterion are $X_1, X_2, X_1^2, X_2^2$.

Yes, both criteria select the same model.

Explanation of Solution

Mallows’ Cp:

Mallows’ $C_p$ is useful for comparing regression equations based on subsets of different sizes, all taken from the same all-subsets regression.

Mallows’ $C_p$ criterion is given as:

$C_p = \dfrac{SSE_{\text{subset}}}{MSE_{\text{all}}} - (n - 2p)$, where $SSE_{\text{subset}}$ denotes the error sum of squares of the current model, $MSE_{\text{all}}$ denotes the error mean square for the model with all potential predictors, n is the sample size and p = k+1, with k being the number of predictors in the current model.

The subset with the lowest value of $C_p$, or with the value of $C_p$ closest to p, is chosen to predict the response variable.
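As a check on the formula, the sketch below recomputes $C_p$ for three of the models from error sums of squares that appear in the F-test calculations of parts (b) and (d): 2.6462 for the full model, 2.7307 for the four-predictor model and 7.840 for the model with pause terms only. The function name mallows_cp is an assumption introduced for this illustration.

```python
def mallows_cp(sse_subset, mse_all, n, k):
    """Mallows' Cp for a subset model with k predictors (p = k + 1)."""
    p = k + 1
    return sse_subset / mse_all - (n - 2 * p)


n = 30
# Error mean square of the full model: SSE / [n - (5 + 1)].
mse_all = 2.6462 / (n - (5 + 1))

cp_four = mallows_cp(2.7307, mse_all, n, k=4)   # X1, X2, X1^2, X2^2
cp_pause = mallows_cp(7.840, mse_all, n, k=2)   # X2, X2^2
cp_full = mallows_cp(2.6462, mse_all, n, k=5)   # all five terms

print(round(cp_four, 1), round(cp_pause, 1), round(cp_full, 1))
```

For the full model the first term equals n − (p+1), so $C_p$ always equals p there; this is why the five-predictor row of the table shows 6.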

From part (e), the values of Mallows’ $C_p$ and adjusted $R^2$ for the two best subsets of each size are as follows:

Predictor variables	Mallows’ $C_p$	Adjusted $R^2$ (%)
$X_2$	92.5	60.1
$X_1X_2$	97.0	58.6
$X_2, X_2^2$	47.1	75.2
$X_1, X_2$	53.3	73.0
$X_1, X_2, X_2^2$	7.9	89.2
$X_2, X_1X_2, X_2^2$	15.5	86.4
$X_1, X_2, X_1^2, X_2^2$	4.8	90.7
$X_1, X_2, X_1X_2, X_2^2$	9.2	89.0
$X_1, X_2, X_1X_2, X_1^2, X_2^2$	6.0	90.6

For the one-predictor case, the lowest value of $C_p$ is 92.5, corresponding to $X_2$.

For the two-predictor case, the lowest value of $C_p$ is 47.1, corresponding to $X_2, X_2^2$.

For the three-predictor case, the lowest value of $C_p$ is 7.9, corresponding to $X_1, X_2, X_2^2$.

For the four-predictor case, the lowest value of $C_p$ is 4.8, corresponding to $X_1, X_2, X_1^2, X_2^2$.

For the five-predictor case, the value of $C_p$ is 6.0.

The value of $C_p$ is lowest for the predictors $X_1, X_2, X_1^2, X_2^2$, so this subset is considered the best subset for prediction.

Thus, provided no other considerations affect the analysis, it is preferable to use the regression equation with the predictors $X_1, X_2, X_1^2, X_2^2$.

Hence, the variables for the model selected by the Mallows’ $C_p$ criterion are $X_1, X_2, X_1^2, X_2^2$.

Adjusted R2 or Ra2:

The adjusted coefficient of multiple determination, $R_a^2$, is useful for finding the best subset of predictors for the response variable. The best subset may be a small subset of the predictors and need not be a large one, as long as it predicts the response variable accurately. The subset with the largest $R_a^2$ is considered the best subset for prediction.

The adjusted coefficient of multiple determination, $R_a^2$, is given by:

$R_a^2 = 1 - \dfrac{SSE/[n-(k+1)]}{SST/(n-1)}$.
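Because SSE/SST = 1 − $R^2$, the formula can be rewritten as $R_a^2 = 1 - (1 - R^2)(n-1)/[n-(k+1)]$. The Python sketch below uses this equivalent form to recover the adjusted values from the $R^2$ values reported in part (e) for the best model of each size from one to four predictors (61.5, 76.9, 90.3 and 92.0), with n = 30. The function name adjusted_r2 is an assumption introduced here.

```python
def adjusted_r2(r2_percent, n, k):
    """Adjusted R^2 (in percent) from R^2 (in percent),
    sample size n and number of predictors k."""
    r2 = r2_percent / 100.0
    return 100.0 * (1.0 - (1.0 - r2) * (n - 1) / (n - (k + 1)))


# R^2 values (percent) reported in part (e), keyed by number of predictors.
reported_r2 = {1: 61.5, 2: 76.9, 3: 90.3, 4: 92.0}

adjusted = {k: adjusted_r2(r2, 30, k) for k, r2 in reported_r2.items()}
for k, value in adjusted.items():
    print(k, round(value, 1))
```

The penalty factor (n−1)/[n−(k+1)] grows with k, which is why adjusted $R^2$ can fall when a predictor that adds little is included.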

For the one-predictor case, the highest value of $R_a^2$ is 60.1, corresponding to $X_2$.

For the two-predictor case, the highest value of $R_a^2$ is 75.2, corresponding to $X_2, X_2^2$.

For the three-predictor case, the highest value of $R_a^2$ is 89.2, corresponding to $X_1, X_2, X_2^2$.

For the four-predictor case, the highest value of $R_a^2$ is 90.7, corresponding to $X_1, X_2, X_1^2, X_2^2$.

For the five-predictor case, the value of $R_a^2$ is 90.6.

The value of $R_a^2$ is highest for the predictors $X_1, X_2, X_1^2, X_2^2$, so this subset is considered the best subset for prediction.

Thus, provided no other considerations affect the analysis, it is preferable to use the regression equation with the predictors $X_1, X_2, X_1^2, X_2^2$.

Hence, the variables for the model selected by the adjusted $R^2$ criterion are $X_1, X_2, X_1^2, X_2^2$.

Both Mallows’ $C_p$ and adjusted $R^2$ suggest that the best model contains the predictor variables $X_1, X_2, X_1^2, X_2^2$.

a.

Expert Solution

To determine

Construct a multiple linear regression model with neighbor as the dependent variable, speed, pause, speed×pause, speed2 and pause2 as the independent variables for the given data.

Answer to Problem 5SE

A multiple linear regression model for the given data is:

y^= 10.840−0.0739x1−0.1274x2+0.001110x12−0.000243x1x2+0.001674x22_.

Explanation of Solution

Calculation:

The data represents the values of the variables number of neighbors, average speed and pause time for a simulation of 30 mobile network computers.

Multiple linear regression model:

A multiple linear regression model is given as yi=β0+β1x1i+...+βkxki+εi where yi is the response variable, and x1i,x2i,...,xki are the k predictor variables. The quantities β0,β1,...,βk are the slopes corresponding to x1i,x2i,...,xki respectively.β^0 is the estimated intercept of the line, from the sample data.

Let x1,x2 be speed and pause. The response variable is y=neighbors.

Regression:

Software procedure:

Step by step procedure to obtain regression using MINITAB software is given as,

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric column containing the predictor variables X1, X2, X1*X2, X1*X1 and X2*X2.
Click OK.

Output obtained from MINITAB is given below:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 1

The ‘Coefficient’ column of the regression analysis MINITAB output gives the slopes corresponding to the respective variables stored in the column ‘Term’.

A careful inspection of the output shows that the fitted model is:

y^= 10.840−0.0739x1−0.1274x2+0.001110x12−0.000243x1x2+0.001674x22.

Hence, the multiple linear regression model for the given data is:

y^= 10.840−0.0739x1−0.1274x2+0.001110x12−0.000243x1x2+0.001674x22_.

b.

Expert Solution

To determine

Construct a reduced model by dropping the variables with large P- values.

Check whether the reduced model is plausible or not.

Answer to Problem 5SE

A multiple linear regression model for the given data is:

y^=10.967−0.0799x1−0.1325x2+0.001110x12+0.001674x22_.

Yes, there is enough evidence to conclude that the reduced model is plausible.

Explanation of Solution

Calculation:

From part (a), it can be seen that the ‘P’ column of the regression analysis MINITAB output gives the slopes corresponding to the respective variables stored in the column ‘Term’.

By observing the P- values of the MINITAB output, it is clear that the largest P-value is 0.390 corresponding to the predictor variable x1x2. Remaining all P- values are reasonable.

Now, the new regression has to be fitted after dropping the predictor variable x1x2.

Regression:

Software procedure:

Step by step procedure to obtain regression using MINITAB software is given as,

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric column containing the predictor variables X1, X2, X1*X1 and X2*X2.
Click OK.

Output obtained from MINITAB is given below:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 2

The ‘Coefficient’ column of the regression analysis MINITAB output gives the slopes corresponding to the respective variables stored in the column ‘Term’.

A careful inspection of the output shows that the fitted model is:

y^=10.967−0.0799x1−0.1325x2+0.001110x12+0.001674x22.

Hence, the multiple linear regression model for the given data is:

y^=10.967−0.0799x1−0.1325x2+0.001110x12+0.001674x22_.

The full model is,

y^= 10.840−0.0739x1−0.1274x2+0.001110x12−0.000243x1x2+0.001674x22

The reduced model is,

y^=10.967−0.0799x1−0.1325x2+0.001110x12+0.001674x22

The test hypotheses are given below:

Null hypothesis:

H0:β4=0

That is, the dropped predictor of the full model is not significant to predict y.

Alternative hypothesis:

H1:β4≠0

That is, the dropped predictor of the full model is significant to predict y.

Test statistic:

F=(SSEReduced−SSEFull)p−kSSEFull[n−(p+1)]

Where,

SSEFull represents the sum of squares due to error obtained from the full model.

SSEReduced represents the sum of squares due to error obtained from the reduced model.

n represents the total number of observations.

p represents the number of predictors on the full model.

k represents the number of predictors on the reduced model.

From the obtained MINITAB outputs, the value of error sum of squares for full model is SSEFull=2.2642 and the value of error sum of squares for the reduced model is SSEReduced=2.7307.

The total number of observations is n=30.

Number of predictors on the full model is p=5 and the number of predictors on the reduced model is k=4.

Degrees of freedom of F-statistic for reduced model:

In a reduced multiple linear regression analysis, the F-statistic is f=(SSEReduced−SSEFull)p−kSSEFull[n−(p+1)].

In the ratio, the numerator is obtained by dividing the quantity SSEReduced−SSEFull by its degrees of freedom, p−k. The denominator is obtained by dividing the error sum of squares of full model by the error degrees of freedom, n−(p+1).

Thus, the degrees of freedom for the F-statistic in a reduced multiple regression analysis are p−k and n−(p+1).

Hence, the numerator degrees of freedom is p−k=5−4=1 and the denominator degrees of freedom is n−(p+1)=30−6=24.

Test statistic under null hypothesis:

Under the null hypothesis, the test statistic is obtained as follows:

f=(SSEReduced−SSEFull)p−kSSEFull[n−(p+1)]=(2.7307−2.6462)5−42.6462[30−(5+1)]=0.08450.11026=0.76638

Thus, the test statistic is F=0.76638_

Since, the level of significance is not specified. The prior level of significance α=0.05 can be used.

P-value:

Software procedure:

Choose Graph > Probability Distribution Plot choose View Probability > OK.
From Distribution, choose F, enter 1 in numerator df and 24 in denominator df.
Click the Shaded Area tab.
Choose X-Value and Right Tail for the region of the curve to shade.
Enter the X-value as 0.76638.
Click OK.

Output obtained from MINITAB is given below:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 3

From the output, the P- value is 0.39.

Thus, the P- value is 0.39.

Decision criteria based on P-value approach:

If P-value≤α, then reject the null hypothesis H0.

If P-value>α, then fail to reject the null hypothesis H0.

Conclusion:

The P-value is 0.39 and α value is 0.05.

Here, P-value is greater than the α value.

That is 0.39(=P)>0.05(=α).

By the rejection rule, fail to reject the null hypothesis.

Hence, there is sufficient evidence to conclude that the dropped predictor variable is not significant to predict the response variable y.

Thus, the reduced model is useful than the full model to predict the response variable y.

c.

Expert Solution

To determine

Plot the residuals versus fitted line plot for the reduced model.

Check whether the model is appropriate.

Answer to Problem 5SE

Residual plot:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 4

Yes, the model seems to be appropriate.

Explanation of Solution

Calculation:

Residual plot:

Software procedure:

Step by step procedure to obtain regression using MINITAB software is given as,

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric column containing the predictor variables X1, X2, X1*X1 and X2*X2.
In Graphs, Under Residuals for plots, select Regular.
Under Residual plots select box Residuals versus fits.
Click OK.

Conditions for the appropriateness of regression model using the residual plot:

The plot of the residuals vs. fitted values should fall roughly in a horizontal band contended and symmetric about x-axis. That is, the residuals of the data should not represent any bend.
The plot of residuals should not contain any outliers.
The residuals have to be scattered randomly around “0” with constant variability among for all the residuals. That is, the spread should be consistent.

Interpretation:

In residual plot there is high bend or pattern, which can violate the straight line condition and there is change in the spread of the residuals from one part to another part of the plot.

However, it is difficult to determine about the violation of the assumptions without the data.

Thus, the model seems to be appropriate.

d.

Expert Solution

To determine

Check whether the model with only two dependent variables x2,x22 is adequate.

Answer to Problem 5SE

No, the model with only two dependent variables x2,x22 is not adequate.

Explanation of Solution

Calculation:

Regression:

Software procedure:

Step by step procedure to obtain regression using MINITAB software is given as,

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric column containing the predictor variables X2 and X2*X2.
Click OK.

Output obtained from MINITAB is given below:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 5

The ‘Coefficient’ column of the regression analysis MINITAB output gives the slopes corresponding to the respective variables stored in the column ‘Term’.

A careful inspection of the output shows that the fitted model is:

y^=9.960−0.1325x2+0.001674x22.

Hence, the multiple linear regression model for the given data is:

y^=9.960−0.1325x2+0.001674x22_.

The full model is,

y^= 10.840−0.0739x1−0.1274x2+0.001110x12−0.000243x1x2+0.001674x22

The reduced model is,

y^=9.960−0.1325x2+0.001674x22_

The test hypotheses are given below:

Null hypothesis:

H0:β1=β3=β4

That is, the dropped predictors of the full model are not significant to predict y.

Alternative hypothesis:

H1: At least one β′s≠0

That is, at least one of the dropped predictors of the full model are significant to predict y.

Test statistic:

F=(SSEReduced−SSEFull)p−kSSEFull[n−(p+1)]

Where,

SSEFull represents the sum of squares due to error obtained from the full model.

SSEReduced represents the sum of squares due to error obtained from the reduced model.

n represents the total number of observations.

p represents the number of predictors on the full model.

k represents the number of predictors on the reduced model.

From the obtained MINITAB outputs, the value of error sum of squares for full model is SSEFull=2.2642 and the value of error sum of squares for the reduced model is SSEReduced=2.7307.

The total number of observations is n=30.

Number of predictors on the full model is p=5 and the number of predictors on the reduced model is k=2.

Degrees of freedom of F-statistic for reduced model:

In a reduced multiple linear regression analysis, the F-statistic is F=M(SSEReduced−SSEFull)MSSEFull.

In the ratio, the numerator is obtained by dividing the quantity SSEReduced−SSEFull by its degrees of freedom, p−k. The denominator is obtained by dividing the error sum of squares of full model by the error degrees of freedom, n−(p+1).

Thus, the degrees of freedom for the F-statistic in a reduced multiple regression analysis are p−k and n−(p+1).

Hence, the numerator degrees of freedom is p−k=5−2=3 and the denominator degrees of freedom is n−(p+1)=30−6=24.

Test statistic under null hypothesis:

Under the null hypothesis, the test statistic is obtained as follows:

F=(SSEReduced−SSEFull)p−kSSEFull[n−(p+1)]=(7.840−2.6462)5−22.6462[30−(5+1)]=1.731270.11026=15.702

Thus, the test statistic is F=15.702_.

Since, the level of significance is not specified. The prior level of significance α=0.05 can be used.

P-value:

Software procedure:

Choose Graph > Probability Distribution Plot choose View Probability > OK.
From Distribution, choose F, enter 3 in numerator df and 24 in denominator df.
Click the Shaded Area tab.
Choose X-Value and Right Tail for the region of the curve to shade.
Enter the X-value as 15.702.
Click OK.

Output obtained from MINITAB is given below:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 6

From the output, the P-value is 7.3 × 10⁻⁶.
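The F statistic and its right-tail P-value can also be checked outside MINITAB. The following is a minimal Python sketch, not the MINITAB procedure described above. The function names `partial_f_test` and `f_right_tail` are illustrative, and the tail probability is approximated by Simpson's rule applied to the regularized incomplete beta function, assuming F > 0 and a denominator df greater than 2.

```python
import math


def partial_f_test(sse_reduced, sse_full, n, p, k):
    """Return (F, numerator df, denominator df) for a reduced-vs-full model test."""
    df_num = p - k
    df_den = n - (p + 1)
    f_stat = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
    return f_stat, df_num, df_den


def f_right_tail(f_value, df_num, df_den, steps=20000):
    """Approximate P(F > f_value) for an F(df_num, df_den) random variable.

    Uses P(F > f) = I_z(df_den / 2, df_num / 2) with
    z = df_den / (df_den + df_num * f), where I_z is the regularized
    incomplete beta function, integrated numerically with Simpson's rule.
    Assumes f_value > 0 and df_den > 2; steps must be even.
    """
    a, b = df_den / 2.0, df_num / 2.0
    z = df_den / (df_den + df_num * f_value)
    # Logarithm of the complete beta function B(a, b).
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    h = z / steps
    total = integrand(0.0) + integrand(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return (total * h / 3.0) / math.exp(log_beta)


# Values quoted in the calculation above: SSE of the (x2, x2^2) model and of the full model.
f_stat, df_num, df_den = partial_f_test(7.840, 2.6462, n=30, p=5, k=2)
p_value = f_right_tail(f_stat, df_num, df_den)
print(f"F = {f_stat:.3f} on ({df_num}, {df_den}) df, P-value = {p_value:.2e}")
```

The P-value obtained this way should agree with the MINITAB figure to the precision quoted; a statistics library would normally be used instead of hand-written integration.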

Decision criteria based on P-value approach:

If P-value≤α, then reject the null hypothesis H0.

If P-value>α, then fail to reject the null hypothesis H0.

Conclusion:

The P-value is 7.3 × 10⁻⁶ and α is 0.05.

Here, the P-value is less than α.

That is, P = 7.3 × 10⁻⁶ < 0.05 = α.

By the rejection rule, reject the null hypothesis.

Hence, there is sufficient evidence to conclude that at least one of the predictors dropped from the full model is significant in predicting y.

Thus, the model with only the two predictor variables x2 and x2² is not adequate.

e.

Expert Solution

To determine

Find the two models with the highest R² value for each model size.

Obtain the values of Mallows’ Cp and adjusted R² for each model.

Answer to Problem 5SE

The two models with the highest R² are:

The first model, with the predictors X1, X2, X1², X2², and the second model, with the predictors X1, X2, X2².

The values of Mallows’ Cp and adjusted R² for the various subsets are as follows:

Predictor variables	Mallows’ Cp	Adjusted R² (%)
X2	92.5	60.1
X1X2	97.0	58.6
X2, X2²	47.1	75.2
X1, X2	53.3	73.0
X1, X2, X2²	7.9	89.2
X2, X1X2, X2²	15.5	86.4
X1, X2, X1², X2²	4.8	90.7
X1, X2, X1X2, X2²	9.2	89.0
X1, X2, X1X2, X1², X2²	6.0	90.6

Explanation of Solution

Calculation:

Coefficient of multiple determination R2:

The coefficient of multiple determination, R2, is given by:

$R^2 = 1 - \dfrac{\text{SSE}}{\text{SST}}$, where SST and SSE are the total sum of squares and the error sum of squares, respectively.

The subset with larger R2 is considered to be best subset for prediction.

Regression:

Software procedure:

Step by step procedure to obtain regression using MINITAB software is given as,

Choose Stat > Regression > Regression> Best subsets.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric column containing the predictor variables X1, X2, X1*X2, X1*X1 and X2*X2.
Click OK.

Output obtained from MINITAB is given below:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 7

For the one predictor case, the highest value of R2 is 61.5, corresponding to X2.

For the two predictor case, the highest value of R2 is 76.9, corresponding to X2,X22.

For the three predictor case, the highest value of R2 is 90.3, corresponding to X1,X2,X22.

For the four predictor case, the highest value of R2 is 92.0, corresponding to X1,X2,X12,X22.

For the five-predictor case (the full model), the reported value is 90.6. This matches the adjusted R² of the full model in the table below; the unadjusted R² of the full model cannot be smaller than that of any of its subsets.

Among the subsets, the value of R² is highest for the predictors X1, X2, X1², X2² (92.0), and the subset with the highest R² is considered the best subset for prediction.

Thus, it would be preferable to use the regression equation corresponding to the predictors X1, X2, X1², X2².

The value of 90.6 for the five-predictor case differs little from the 90.3 obtained for the model with the predictors X1, X2, X2².

Therefore, the model with the predictors X1, X2, X2² is taken as the second-best model.

Thus, the two best models are:

The first model, with the predictors X1, X2, X1², X2², and the second model, with the predictors X1, X2, X2².

From the accompanying MINITAB output, the values of Mallows’ Cp and adjusted R2 for the various subsets are as follows:

Predictor variables	Mallows’ Cp	Adjusted R² (%)
X2	92.5	60.1
X1X2	97.0	58.6
X2, X2²	47.1	75.2
X1, X2	53.3	73.0
X1, X2, X2²	7.9	89.2
X2, X1X2, X2²	15.5	86.4
X1, X2, X1², X2²	4.8	90.7
X1, X2, X1X2, X2²	9.2	89.0
X1, X2, X1X2, X1², X2²	6.0	90.6

f.

Expert Solution

To determine

Select the variables for the model, using the Mallows’ Cp criterion and adjusted-R2 criterion.

Check whether both the models are same.

Answer to Problem 5SE

The variables for the model using the Mallows’ Cp criterion are X1, X2, X1², and X2².

The variables for the model using the adjusted-R² criterion are X1, X2, X1², and X2².

Yes, both criteria select the same model.

Explanation of Solution

Mallows’ Cp:

An important utility of the Mallows’ Cp criterion is to compare between regression equations of subsets having different sizes, all taken from the same all-subsets regression.

Mallows’ Cp criterion is given as:

$C_p = \dfrac{\text{SSE}_{\text{subset}}}{\text{MSE}_{\text{all}}} - (n - 2p)$, where SSEsubset denotes the error sum of squares of the current model, MSEall denotes the error mean square for the model containing all potential predictors, n is the sample size, and p = k + 1, with k being the number of predictors in the current model.

The subset with the lowest value of Cp, or with the value of Cp closest to p, is chosen to predict the response variable.
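As a rough check of the formula, the sketch below recomputes Cp from the error sums of squares used in the F-test calculations of parts (b) and (d). The function name `mallows_cp` is illustrative; the SSE values (2.6462 for the full model, 2.7307 for X1, X2, X1², X2², and 7.840 for X2, X2²) are the ones that appear in those calculations.

```python
def mallows_cp(sse_subset, mse_all, n, k):
    """Mallows' Cp for a subset model with k predictors (p = k + 1 parameters)."""
    p = k + 1
    return sse_subset / mse_all - (n - 2 * p)


N = 30
SSE_FULL = 2.6462                 # full model, 5 predictors
MSE_ALL = SSE_FULL / (N - 6)      # error mean square of the full model

cp_full = mallows_cp(SSE_FULL, MSE_ALL, N, k=5)
cp_four = mallows_cp(2.7307, MSE_ALL, N, k=4)   # X1, X2, X1^2, X2^2
cp_two = mallows_cp(7.840, MSE_ALL, N, k=2)     # X2, X2^2

print(f"Cp (full model)         = {cp_full:.1f}")
print(f"Cp (X1, X2, X1^2, X2^2) = {cp_four:.1f}")
print(f"Cp (X2, X2^2)           = {cp_two:.1f}")
```

For the full model, Cp always equals p, which gives a quick sanity check on the inputs.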

From part (e), the values of Mallows’ Cp and adjusted R2 for the various subsets are as follows:

Predictor variables	Mallows’ Cp	Adjusted R² (%)
X2	92.5	60.1
X1X2	97.0	58.6
X2, X2²	47.1	75.2
X1, X2	53.3	73.0
X1, X2, X2²	7.9	89.2
X2, X1X2, X2²	15.5	86.4
X1, X2, X1², X2²	4.8	90.7
X1, X2, X1X2, X2²	9.2	89.0
X1, X2, X1X2, X1², X2²	6.0	90.6

For the one predictor case, the lowest value of Cp is 92.5, corresponding to X2.

For the two predictor case, the lowest value of Cp is 47.1, corresponding to X2,X22.

For the three predictor case, the lowest value of Cp is 7.9, corresponding to X1,X2,X22.

For the four predictor case, the lowest value of Cp is 4.8, corresponding to X1,X2,X12,X22.

For the five predictor case, the value of Cp is 6.0.

The value of Cp is lowest for the predictors X1, X2, X1², X2², and the subset with the lowest value of Cp is considered the best subset for prediction.

Thus, provided other factors do not affect the analysis, it would be preferable to use the regression equation corresponding to the predictors X1, X2, X1², X2².

Hence, the variables for the model using the Mallows’ Cp criterion are X1,X2,X12,X22.

Adjusted R2 or Ra2:

An important utility of the adjusted coefficient of multiple determination or Ra2 is to find the best subset of the predictors, that can predict the response variable. The best subset may be a smaller subset of all the predictors and need not necessarily be a larger subset, as long as it predicts the response variable accurately. The subset with larger Ra2 is considered to be best subset for prediction.

The adjusted coefficient of multiple determination, Ra2, is given by:

$R_a^2 = 1 - \dfrac{\text{SSE}/[n-(k+1)]}{\text{SST}/(n-1)}$.
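Because SSE/SST = 1 − R², the adjusted value can be obtained from R² directly as 1 − (1 − R²)(n − 1)/[n − (k + 1)]. The short Python sketch below applies this to the best R² values quoted in part (e) for one to four predictors; the function name `adjusted_r2` is illustrative and the R² inputs are the rounded percentages from that part.

```python
def adjusted_r2(r2_percent, n, k):
    """Adjusted R^2 (percent) from R^2 (percent), sample size n and k predictors."""
    unexplained = 1.0 - r2_percent / 100.0   # equals SSE / SST
    return 100.0 * (1.0 - unexplained * (n - 1) / (n - (k + 1)))


N = 30
# Highest R^2 (percent) for each model size, as quoted in part (e).
best_r2_by_size = {1: 61.5, 2: 76.9, 3: 90.3, 4: 92.0}

adjusted = {k: adjusted_r2(r2, N, k) for k, r2 in best_r2_by_size.items()}
for k, value in adjusted.items():
    print(f"k = {k}: adjusted R^2 = {value:.1f}")
```

Small differences from the tabulated values can arise because the inputs are already rounded to one decimal place.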

For the one predictor case, the highest value of Ra2 is 60.1, corresponding to X2.

For the two predictor case, the highest value of Ra2 is 75.2, corresponding to X2,X22.

For the three predictor case, the highest value of Ra2 is 89.2, corresponding to X1,X2,X22.

For the four predictor case, the highest value of Ra2 is 90.7, corresponding to X1,X2,X12,X22.

For the five predictor case, the value of Ra2 is 90.6.

The value of adjusted R² is highest for the predictors X1, X2, X1², X2², and the subset with the highest value of adjusted R² is considered the best subset for prediction.

Thus, provided other factors do not affect the analysis, it would be preferable to use the regression equation corresponding to the predictors X1, X2, X1², X2².

Hence, the variables for the model using the adjusted-R² criterion are X1, X2, X1², X2².

Both Mallows’ Cp and adjusted R² suggest that the best model contains the predictor variables X1, X2, X1², X2².

Answer 8

a.

Expert Solution

To determine

Construct a multiple linear regression model with neighbors as the dependent variable and speed, pause, speed×pause, speed², and pause² as the independent variables for the given data.

Answer to Problem 5SE

A multiple linear regression model for the given data is:

$\hat{y} = 10.840 - 0.0739x_1 - 0.1274x_2 + 0.001110x_1^2 - 0.000243x_1x_2 + 0.001674x_2^2$.

Explanation of Solution

Calculation:

The data represents the values of the variables number of neighbors, average speed and pause time for a simulation of 30 mobile network computers.

Multiple linear regression model:

A multiple linear regression model is given as $y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \varepsilon_i$, where $y_i$ is the response variable and $x_{1i}, x_{2i}, \dots, x_{ki}$ are the k predictor variables. The quantities $\beta_1, \dots, \beta_k$ are the slopes corresponding to $x_{1i}, \dots, x_{ki}$, respectively, and $\beta_0$ is the intercept. The estimated intercept $\hat{\beta}_0$ is obtained from the sample data.

Let x1 and x2 be speed and pause, respectively. The response variable is y = neighbors.

Regression:

Software procedure:

Step by step procedure to obtain regression using MINITAB software is given as,

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric column containing the predictor variables X1, X2, X1*X2, X1*X1 and X2*X2.
Click OK.

Output obtained from MINITAB is given below:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 1

The ‘Coefficient’ column of the regression analysis MINITAB output gives the estimated coefficients corresponding to the respective variables stored in the column ‘Term’.

Hence, from the output, the fitted multiple linear regression model for the given data is:

$\hat{y} = 10.840 - 0.0739x_1 - 0.1274x_2 + 0.001110x_1^2 - 0.000243x_1x_2 + 0.001674x_2^2$.
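To show how the fitted equation is used, the sketch below evaluates it at a single point. The function name `predict_neighbors` and the input values (speed 20, pause 30) are hypothetical and chosen only for illustration; they are not taken from the data table.

```python
def predict_neighbors(speed, pause):
    """Evaluate the fitted full model from part (a) at one (speed, pause) point."""
    return (10.840
            - 0.0739 * speed
            - 0.1274 * pause
            + 0.001110 * speed ** 2
            - 0.000243 * speed * pause
            + 0.001674 * pause ** 2)


# Hypothetical inputs, for illustration only.
example_speed, example_pause = 20.0, 30.0
predicted = predict_neighbors(example_speed, example_pause)
print(f"Predicted number of neighbors: {predicted:.4f}")
```

A prediction like this is only meaningful when the inputs lie within the range of speeds and pause times covered by the simulation data.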

Answer 15

b.

Expert Solution

To determine

Construct a reduced model by dropping the variables with large P- values.

Check whether the reduced model is plausible or not.

Answer to Problem 5SE

The reduced model for the given data is:

$\hat{y} = 10.967 - 0.0799x_1 - 0.1325x_2 + 0.001110x_1^2 + 0.001674x_2^2$.

Yes, the reduced model is plausible.

Explanation of Solution

Calculation:

From part (a), the ‘P’ column of the regression analysis MINITAB output gives the P-values corresponding to the respective variables stored in the column ‘Term’.

The largest P-value in the MINITAB output is 0.390, corresponding to the predictor variable x1x2. All the remaining P-values are small.

Now, a new regression is fitted after dropping the predictor variable x1x2.

Regression:

Software procedure:

Step by step procedure to obtain regression using MINITAB software is given as,

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric column containing the predictor variables X1, X2, X1*X1 and X2*X2.
Click OK.

Output obtained from MINITAB is given below:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 2

The ‘Coefficient’ column of the regression analysis MINITAB output gives the estimated coefficients corresponding to the respective variables stored in the column ‘Term’.

From the output, the fitted reduced model is:

$\hat{y} = 10.967 - 0.0799x_1 - 0.1325x_2 + 0.001110x_1^2 + 0.001674x_2^2$

The full model from part (a) is:

$\hat{y} = 10.840 - 0.0739x_1 - 0.1274x_2 + 0.001110x_1^2 - 0.000243x_1x_2 + 0.001674x_2^2$

The test hypotheses are given below:

Null hypothesis:

H0:β4=0

That is, the dropped predictor of the full model is not significant to predict y.

Alternative hypothesis:

H1:β4≠0

That is, the dropped predictor of the full model is significant to predict y.

Test statistic:

$F = \dfrac{(\text{SSE}_{\text{Reduced}} - \text{SSE}_{\text{Full}})/(p-k)}{\text{SSE}_{\text{Full}}/[n-(p+1)]}$

Where,

SSEFull represents the sum of squares due to error obtained from the full model.

SSEReduced represents the sum of squares due to error obtained from the reduced model.

n represents the total number of observations.

p represents the number of predictors in the full model.

k represents the number of predictors in the reduced model.

From the MINITAB outputs, the error sum of squares for the full model is SSEFull = 2.6462 and the error sum of squares for the reduced model is SSEReduced = 2.7307.

The total number of observations is n = 30.

The number of predictors in the full model is p = 5 and the number of predictors in the reduced model is k = 4.

Degrees of freedom of F-statistic for reduced model:

In a reduced multiple linear regression analysis, the F-statistic is $F = \dfrac{(\text{SSE}_{\text{Reduced}} - \text{SSE}_{\text{Full}})/(p-k)}{\text{SSE}_{\text{Full}}/[n-(p+1)]}$.

In the ratio, the numerator is obtained by dividing the quantity SSEReduced−SSEFull by its degrees of freedom, p−k. The denominator is obtained by dividing the error sum of squares of full model by the error degrees of freedom, n−(p+1).

Thus, the degrees of freedom for the F-statistic in a reduced multiple regression analysis are p−k and n−(p+1).

Hence, the numerator degrees of freedom is p−k=5−4=1 and the denominator degrees of freedom is n−(p+1)=30−6=24.

Test statistic under null hypothesis:

Under the null hypothesis, the test statistic is obtained as follows:

$F = \dfrac{(\text{SSE}_{\text{Reduced}} - \text{SSE}_{\text{Full}})/(p-k)}{\text{SSE}_{\text{Full}}/[n-(p+1)]} = \dfrac{(2.7307 - 2.6462)/(5-4)}{2.6462/[30-(5+1)]} = \dfrac{0.0845}{0.11026} = 0.76638$

Thus, the test statistic is F = 0.76638.

Since the level of significance is not specified, the conventional level α = 0.05 is used.

P-value:

Software procedure:

Choose Graph > Probability Distribution Plot choose View Probability > OK.
From Distribution, choose F, enter 1 in numerator df and 24 in denominator df.
Click the Shaded Area tab.
Choose X-Value and Right Tail for the region of the curve to shade.
Enter the X-value as 0.76638.
Click OK.

Output obtained from MINITAB is given below:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 3

From the output, the P-value is 0.39.

Decision criteria based on P-value approach:

If P-value≤α, then reject the null hypothesis H0.

If P-value>α, then fail to reject the null hypothesis H0.

Conclusion:

The P-value is 0.39 and α is 0.05.

Here, the P-value is greater than α.

That is, P = 0.39 > 0.05 = α.

By the rejection rule, fail to reject the null hypothesis.

Hence, there is not sufficient evidence to conclude that the dropped predictor variable is significant in predicting the response variable y.

Thus, the reduced model is plausible for predicting the response variable y.
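The same decision can be reached with a critical-value comparison instead of a P-value. This is an alternative to the MINITAB P-value approach used above, sketched in Python. The critical value 4.26 is the upper 5% point of the F distribution with 1 and 24 degrees of freedom from a standard F table; it is an assumption brought in for this check and does not appear in the MINITAB output.

```python
# Error sums of squares used in the calculation above.
SSE_FULL = 2.6462      # full model, p = 5 predictors
SSE_REDUCED = 2.7307   # reduced model without x1*x2, k = 4 predictors
N, P, K = 30, 5, 4

# Upper 5% point of F(1, 24), taken from a standard F table (assumption).
F_CRIT_1_24 = 4.26

df_num = P - K
df_den = N - (P + 1)
f_stat = ((SSE_REDUCED - SSE_FULL) / df_num) / (SSE_FULL / df_den)

# Reject H0 only if the statistic exceeds the critical value.
reject_h0 = f_stat > F_CRIT_1_24
print(f"F = {f_stat:.5f} on ({df_num}, {df_den}) df; reject H0: {reject_h0}")
```

Since the statistic is far below the critical value, the comparison leads to the same conclusion as the P-value approach: the interaction term can be dropped.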

Answer 22

c.

Expert Solution

To determine

Plot the residuals versus the fitted values for the reduced model.

Check whether the model is appropriate.

Answer to Problem 5SE

Residual plot:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 4

Yes, the model seems to be appropriate.

Explanation of Solution

Calculation:

Residual plot:

Software procedure:

Step by step procedure to obtain the residual plot using MINITAB software is given as,

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric column containing the predictor variables X1, X2, X1*X1 and X2*X2.
In Graphs, Under Residuals for plots, select Regular.
Under Residual plots select box Residuals versus fits.
Click OK.

Conditions for the appropriateness of regression model using the residual plot:

The residuals plotted against the fitted values should fall roughly in a horizontal band that is centered on and symmetric about the horizontal axis. That is, the residuals should not show any bend.
The plot of residuals should not contain any outliers.
The residuals should be scattered randomly around 0 with constant variability. That is, the spread should be consistent.

Interpretation:

The residual plot shows some bend or pattern, which may violate the straight-line condition, and some change in the spread of the residuals from one part of the plot to another.

However, it is difficult to be sure that the assumptions are violated without more data.

Thus, the model seems to be appropriate.

Answer 29

d.

Expert Solution

To determine

Check whether the model with only the two predictor variables x2 and x2² is adequate.

Answer to Problem 5SE

No, the model with only the two predictor variables x2 and x2² is not adequate.

Explanation of Solution

Calculation:

Regression:

Software procedure:

Step by step procedure to obtain regression using MINITAB software is given as,

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric column containing the predictor variables X2 and X2*X2.
Click OK.

Output obtained from MINITAB is given below:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 5

The ‘Coefficient’ column of the regression analysis MINITAB output gives the estimated coefficients corresponding to the respective variables stored in the column ‘Term’.

From the output, the fitted reduced model is:

$\hat{y} = 9.960 - 0.1325x_2 + 0.001674x_2^2$

The full model from part (a) is:

$\hat{y} = 10.840 - 0.0739x_1 - 0.1274x_2 + 0.001110x_1^2 - 0.000243x_1x_2 + 0.001674x_2^2$

The test hypotheses are given below:

Null hypothesis:

H0: β1 = β3 = β4 = 0

That is, none of the predictors dropped from the full model is significant in predicting y.

Alternative hypothesis:

H1: At least one of β1, β3, β4 is not equal to 0

That is, at least one of the predictors dropped from the full model is significant in predicting y.

Test statistic:

$F = \dfrac{(\text{SSE}_{\text{Reduced}} - \text{SSE}_{\text{Full}})/(p-k)}{\text{SSE}_{\text{Full}}/[n-(p+1)]}$

Where,

SSEFull represents the sum of squares due to error obtained from the full model.

SSEReduced represents the sum of squares due to error obtained from the reduced model.

n represents the total number of observations.

p represents the number of predictors in the full model.

k represents the number of predictors in the reduced model.

From the MINITAB outputs, the error sum of squares for the full model is SSEFull = 2.6462 and the error sum of squares for the reduced model is SSEReduced = 7.840.

The total number of observations is n = 30.

The number of predictors in the full model is p = 5 and the number of predictors in the reduced model is k = 2.

Degrees of freedom of F-statistic for reduced model:

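In a reduced multiple linear regression analysis, the F-statistic is the ratio of two mean squares, $F = \dfrac{(\text{SSE}_{\text{Reduced}} - \text{SSE}_{\text{Full}})/(p-k)}{\text{SSE}_{\text{Full}}/[n-(p+1)]}$.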

In the ratio, the numerator is obtained by dividing the quantity SSEReduced−SSEFull by its degrees of freedom, p−k. The denominator is obtained by dividing the error sum of squares of full model by the error degrees of freedom, n−(p+1).

Thus, the degrees of freedom for the F-statistic in a reduced multiple regression analysis are p−k and n−(p+1).

Hence, the numerator degrees of freedom is p−k=5−2=3 and the denominator degrees of freedom is n−(p+1)=30−6=24.

Test statistic under null hypothesis:

Under the null hypothesis, the test statistic is obtained as follows:

$F = \dfrac{(\text{SSE}_{\text{Reduced}} - \text{SSE}_{\text{Full}})/(p-k)}{\text{SSE}_{\text{Full}}/[n-(p+1)]} = \dfrac{(7.840 - 2.6462)/(5-2)}{2.6462/[30-(5+1)]} = \dfrac{1.73127}{0.11026} = 15.702$

Thus, the test statistic is F = 15.702.

Since the level of significance is not specified, the conventional level α = 0.05 is used.

P-value:

Software procedure:

Choose Graph > Probability Distribution Plot choose View Probability > OK.
From Distribution, choose F, enter 3 in numerator df and 24 in denominator df.
Click the Shaded Area tab.
Choose X-Value and Right Tail for the region of the curve to shade.
Enter the X-value as 15.702.
Click OK.

Output obtained from MINITAB is given below:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 6

From the output, the P-value is 7.3 × 10⁻⁶.

Decision criteria based on P-value approach:

If P-value≤α, then reject the null hypothesis H0.

If P-value>α, then fail to reject the null hypothesis H0.

Conclusion:

The P-value is 7.3 × 10⁻⁶ and α is 0.05.

Here, the P-value is less than α.

That is, P = 7.3 × 10⁻⁶ < 0.05 = α.

By the rejection rule, reject the null hypothesis.

Hence, there is sufficient evidence to conclude that at least one of the predictors dropped from the full model is significant in predicting y.

Thus, the model with only the two predictor variables x2 and x2² is not adequate.

Answer 30

d.

Expert Solution

Answer 31

d.

Expert Solution

Answer 32

Expert Solution

Answer 33

To determine

Check whether the model with only two dependent variables x2,x22 is adequate.

Answer 34

Answer to Problem 5SE

No, the model with only two dependent variables x2,x22 is not adequate.

Answer 35

Explanation of Solution

Calculation:

Regression:

Software procedure:

Step by step procedure to obtain regression using MINITAB software is given as,

Choose Stat > Regression > General Regression.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric column containing the predictor variables X2 and X2*X2.
Click OK.

Output obtained from MINITAB is given below:

Statistics for Engineers and Scientists, Chapter 8, Problem 5SE , additional homework tip 5

The ‘Coefficient’ column of the regression analysis MINITAB output gives the slopes corresponding to the respective variables stored in the column ‘Term’.

A careful inspection of the output shows that the fitted model is:

y^=9.960−0.1325x2+0.001674x22.

Hence, the multiple linear regression model for the given data is:

y^=9.960−0.1325x2+0.001674x22_.

The full model is,

y^= 10.840−0.0739x1−0.1274x2+0.001110x12−0.000243x1x2+0.001674x22

The reduced model is,

y^=9.960−0.1325x2+0.001674x22_

The test hypotheses are given below:

Null hypothesis:

H0:β1=β3=β4

That is, the dropped predictors of the full model are not significant to predict y.

Alternative hypothesis:

H1: At least one β′s≠0

That is, at least one of the dropped predictors of the full model are significant to predict y.

Test statistic:

F=(SSEReduced−SSEFull)p−kSSEFull[n−(p+1)]

Where,

SSEFull represents the sum of squares due to error obtained from the full model.

SSEReduced represents the sum of squares due to error obtained from the reduced model.

n represents the total number of observations.

p represents the number of predictors on the full model.

k represents the number of predictors on the reduced model.

From the obtained MINITAB outputs, the value of error sum of squares for full model is SSEFull=2.2642 and the value of error sum of squares for the reduced model is SSEReduced=2.7307.

The total number of observations is n=30.

Number of predictors on the full model is p=5 and the number of predictors on the reduced model is k=2.

Degrees of freedom of the F-statistic:

When a reduced model is tested against the full model, the F-statistic is a ratio of two mean squares, $F = \dfrac{(SSE_{Reduced} - SSE_{Full})/(p-k)}{SSE_{Full}/[n-(p+1)]}$.

In the ratio, the numerator is obtained by dividing the quantity $SSE_{Reduced} - SSE_{Full}$ by its degrees of freedom, $p-k$. The denominator is obtained by dividing the error sum of squares of the full model by its error degrees of freedom, $n-(p+1)$.

Thus, the degrees of freedom for the F-statistic are $p-k$ and $n-(p+1)$.

Hence, the numerator degrees of freedom are $p-k = 5-2 = 3$ and the denominator degrees of freedom are $n-(p+1) = 30-6 = 24$.

Test statistic under null hypothesis:

Under the null hypothesis, the test statistic is obtained as follows:

$F = \dfrac{(SSE_{Reduced} - SSE_{Full})/(p-k)}{SSE_{Full}/[n-(p+1)]} = \dfrac{(7.840 - 2.6462)/(5-2)}{2.6462/[30-(5+1)]} = \dfrac{1.73127}{0.11026} = 15.702$

Thus, the test statistic is F = 15.702.
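As a quick arithmetic check, the test statistic can be recomputed from the quantities used in the calculation above. The following is a minimal Python sketch; the function name `partial_f` is illustrative and is not part of any MINITAB output.

```python
def partial_f(sse_reduced, sse_full, n, p, k):
    """F-statistic for comparing a reduced model with a full model.

    n: number of observations
    p: number of predictors in the full model
    k: number of predictors in the reduced model
    """
    numerator = (sse_reduced - sse_full) / (p - k)  # mean square for the dropped terms
    denominator = sse_full / (n - (p + 1))          # error mean square of the full model
    return numerator / denominator


# Values taken from the calculation above
f_stat = partial_f(sse_reduced=7.840, sse_full=2.6462, n=30, p=5, k=2)
print(round(f_stat, 3))
```

The printed value should agree with the hand calculation to three decimal places.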

Since the level of significance is not specified, the conventional level of significance α = 0.05 is used.

P-value:

Software procedure:

Choose Graph > Probability Distribution Plot, choose View Probability, and click OK.
From Distribution, choose F, enter 3 in numerator df and 24 in denominator df.
Click the Shaded Area tab.
Choose X-Value and Right Tail for the region of the curve to shade.
Enter the X-value as 15.702.
Click OK.

Output obtained from MINITAB is given below:

[Image: MINITAB probability distribution plot of the F distribution with 3 and 24 degrees of freedom, right tail shaded beyond 15.702]

From the output, the P-value is $7.3 \times 10^{-6}$.
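The right-tail area can also be approximated without MINITAB. The sketch below is only an illustration: it relies on the standard identity $P(F > x) = I_z(d_2/2, d_1/2)$ with $z = d_2/(d_2 + d_1 x)$, where $I_z$ is the regularized incomplete beta function, and evaluates the integral numerically with Simpson's rule. The function name `f_right_tail` and the step count are assumptions made for this example.

```python
import math


def f_right_tail(x, d1, d2, steps=2000):
    """Approximate P(F > x) for an F distribution with d1 and d2 degrees of freedom.

    Uses P(F > x) = I_z(d2/2, d1/2) with z = d2 / (d2 + d1 * x) and composite
    Simpson's rule (steps must be even). Assumes d2 > 2 so that the integrand
    is finite at t = 0.
    """
    a, b = d2 / 2.0, d1 / 2.0
    z = d2 / (d2 + d1 * x)

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    h = z / steps
    total = integrand(0.0) + integrand(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3.0

    # Complete beta function B(a, b), computed through log-gamma
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return integral / math.exp(log_beta)


p_value = f_right_tail(15.702, 3, 24)
print(f"{p_value:.1e}")
```

The result should be of the same order as the P-value read from the MINITAB plot, far below α = 0.05.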

Decision criteria based on P-value approach:

If P-value≤α, then reject the null hypothesis H0.

If P-value>α, then fail to reject the null hypothesis H0.

Conclusion:

The P-value is $7.3 \times 10^{-6}$ and the α value is 0.05.

Here, the P-value is less than the α value.

That is, $P = 7.3 \times 10^{-6} < 0.05 = \alpha$.

By the rejection rule, reject the null hypothesis.

Hence, there is sufficient evidence to conclude that at least one of the dropped predictors of the full model is significant for predicting y.

Thus, the model with only the two independent variables $x_2$ and $x_2^2$ is not adequate.

Answer 36

e.

Expert Solution

To determine

Find the two models with the highest $R^2$ value for each model size.

Obtain the values of Mallows' $C_p$ and adjusted $R^2$ for each model.

Answer to Problem 5SE

The two models with the highest $R^2$ are:

The first model, with predictors $X_1, X_2, X_1^2, X_2^2$, and the second model, with predictors $X_1, X_2, X_2^2$.

The values of Mallows' $C_p$ and adjusted $R^2$ for the various subsets are as follows:

Predictor variables	Mallows' $C_p$	Adjusted $R^2$ (%)
$X_2$	92.5	60.1
$X_1X_2$	97.0	58.6
$X_2, X_2^2$	47.1	75.2
$X_1, X_2$	53.3	73.0
$X_1, X_2, X_2^2$	7.9	89.2
$X_2, X_1X_2, X_2^2$	15.5	86.4
$X_1, X_2, X_1^2, X_2^2$	4.8	90.7
$X_1, X_2, X_1X_2, X_2^2$	9.2	89.0
$X_1, X_2, X_1X_2, X_1^2, X_2^2$	6.0	90.6

Explanation of Solution

Calculation:

Coefficient of multiple determination $R^2$:

The coefficient of multiple determination, $R^2$, is given by:

$R^2 = 1 - \dfrac{SSE}{SST}$, where SST and SSE are the total sum of squares and the error sum of squares, respectively.

Among subsets of the same size, the subset with the larger $R^2$ is considered the better subset for prediction.

Regression:

Software procedure:

The step-by-step procedure to obtain the best subsets regression using MINITAB is as follows:

Choose Stat > Regression > Regression > Best subsets.
In Response, enter the numeric column containing the response data Y.
In Model, enter the numeric column containing the predictor variables X1, X2, X1*X2, X1*X1 and X2*X2.
Click OK.
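To give a sense of what a best subsets search compares, the sketch below simply enumerates the candidate models that can be formed from the five predictor terms entered in the procedure above. It does not reproduce MINITAB's internal algorithm, and the names `terms` and `subsets_by_size` are illustrative.

```python
from itertools import combinations

# Candidate predictor terms, labeled as in the procedure above
terms = ["X1", "X2", "X1*X2", "X1*X1", "X2*X2"]

# All candidate models of each size, from one to five predictors
subsets_by_size = {
    size: list(combinations(terms, size)) for size in range(1, len(terms) + 1)
}

for size, subsets in subsets_by_size.items():
    print(size, len(subsets))

total_models = sum(len(subsets) for subsets in subsets_by_size.values())
print("total:", total_models)
```

For each size, the software reports only the few subsets with the highest $R^2$, which is why the output lists two models per size and a single five-predictor model.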

Output obtained from MINITAB is given below:

[Image: MINITAB best subsets regression output]

For the one predictor case, the highest value of $R^2$ is 61.5, corresponding to $X_2$.

For the two predictor case, the highest value of $R^2$ is 76.9, corresponding to $X_2, X_2^2$.

For the three predictor case, the highest value of $R^2$ is 90.3, corresponding to $X_1, X_2, X_2^2$.

For the four predictor case, the highest value of $R^2$ is 92.0, corresponding to $X_1, X_2, X_1^2, X_2^2$.

For the five predictor case, only one model is possible. The value 90.6 listed for it in the table is its adjusted $R^2$; its $R^2$ cannot be smaller than 92.0, because the full model contains every four predictor model.

Among the models with fewer than five predictors, the value of $R^2$ is the highest for predictors $X_1, X_2, X_1^2, X_2^2$.

Thus, it would be most preferable to use the regression equation corresponding to the predictors $X_1, X_2, X_1^2, X_2^2$.

The model with predictors $X_1, X_2, X_2^2$ has $R^2 = 90.3$ with one fewer predictor, which is not much lower than 92.0.

Therefore, the model with predictors $X_1, X_2, X_2^2$ is the second best model.

Thus, the two best models are:

The first model, with predictors $X_1, X_2, X_1^2, X_2^2$, and the second model, with predictors $X_1, X_2, X_2^2$.

From the accompanying MINITAB output, the values of Mallows' $C_p$ and adjusted $R^2$ for the various subsets are as follows:

Predictor variables	Mallows' $C_p$	Adjusted $R^2$ (%)
$X_2$	92.5	60.1
$X_1X_2$	97.0	58.6
$X_2, X_2^2$	47.1	75.2
$X_1, X_2$	53.3	73.0
$X_1, X_2, X_2^2$	7.9	89.2
$X_2, X_1X_2, X_2^2$	15.5	86.4
$X_1, X_2, X_1^2, X_2^2$	4.8	90.7
$X_1, X_2, X_1X_2, X_2^2$	9.2	89.0
$X_1, X_2, X_1X_2, X_1^2, X_2^2$	6.0	90.6

Answer 43

f.

Expert Solution

To determine

Select the variables for the model, using the Mallows' $C_p$ criterion and the adjusted $R^2$ criterion.

Check whether both the models are the same.

Answer to Problem 5SE

The variables for the model using the Mallows' $C_p$ criterion are $X_1, X_2, X_1^2$, and $X_2^2$.

The variables for the model using the adjusted $R^2$ criterion are $X_1, X_2, X_1^2$, and $X_2^2$.

Yes, both the models are the same.

Explanation of Solution

Mallows’ Cp:

An important utility of the Mallows' $C_p$ criterion is to compare regression equations of subsets having different sizes, all taken from the same all-subsets regression.

Mallows' $C_p$ criterion is given as:

$C_p = \dfrac{SSE_{subset}}{MSE_{all}} - (n - 2p)$, where $SSE_{subset}$ denotes the error sum of squares of the current model, $MSE_{all}$ denotes the error mean square for the set of all potential predictors, n is the sample size, and $p = k+1$, with k being the number of predictors in the subset.

The subset with the lowest value of $C_p$, or with the value of $C_p$ closest to p, is chosen to predict the response variable.
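The formula can be checked against two entries of the table using the error sums of squares from the part (d) calculation (7.840 for the model with $X_2$ and $X_2^2$, and 2.6462 for the full model). This is a minimal Python sketch; the function name `mallows_cp` is illustrative.

```python
def mallows_cp(sse_subset, mse_all, n, k):
    """Mallows' Cp for a subset with k predictors (p = k + 1 parameters)."""
    p = k + 1
    return sse_subset / mse_all - (n - 2 * p)


n = 30
sse_full = 2.6462              # error sum of squares, all five predictors (part d)
sse_pause_only = 7.840         # error sum of squares, predictors X2 and X2^2 (part d)
mse_all = sse_full / (n - 6)   # error mean square of the full model

cp_pause_only = mallows_cp(sse_pause_only, mse_all, n, k=2)
cp_full = mallows_cp(sse_full, mse_all, n, k=5)
print(round(cp_pause_only, 1), round(cp_full, 1))
```

Both results match the tabulated values, 47.1 and 6.0. For the full model, $C_p$ always equals $p = k + 1$, so the comparison is informative only for the smaller subsets.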

From part (e), the values of Mallows' $C_p$ and adjusted $R^2$ for the various subsets are as follows:

Predictor variables	Mallows' $C_p$	Adjusted $R^2$ (%)
$X_2$	92.5	60.1
$X_1X_2$	97.0	58.6
$X_2, X_2^2$	47.1	75.2
$X_1, X_2$	53.3	73.0
$X_1, X_2, X_2^2$	7.9	89.2
$X_2, X_1X_2, X_2^2$	15.5	86.4
$X_1, X_2, X_1^2, X_2^2$	4.8	90.7
$X_1, X_2, X_1X_2, X_2^2$	9.2	89.0
$X_1, X_2, X_1X_2, X_1^2, X_2^2$	6.0	90.6

For the one predictor case, the lowest value of $C_p$ is 92.5, corresponding to $X_2$.

For the two predictor case, the lowest value of $C_p$ is 47.1, corresponding to $X_2, X_2^2$.

For the three predictor case, the lowest value of $C_p$ is 7.9, corresponding to $X_1, X_2, X_2^2$.

For the four predictor case, the lowest value of $C_p$ is 4.8, corresponding to $X_1, X_2, X_1^2, X_2^2$.

For the five predictor case, the value of $C_p$ is 6.0.

The value of $C_p$ is the lowest for predictors $X_1, X_2, X_1^2, X_2^2$, and the subset with the lowest value of $C_p$ is considered the best subset for prediction. This value, 4.8, is also close to $p = k + 1 = 5$ for that subset.

Thus, it would be most preferable to use the regression equation corresponding to the predictors $X_1, X_2, X_1^2, X_2^2$.

Hence, the variables for the model using the Mallows' $C_p$ criterion are $X_1, X_2, X_1^2, X_2^2$.

Adjusted $R^2$ or $R_a^2$:

An important utility of the adjusted coefficient of multiple determination, $R_a^2$, is to find the best subset of the predictors that can predict the response variable. The best subset may be a smaller subset of all the predictors and need not necessarily be a larger subset, as long as it predicts the response variable accurately. The subset with the larger $R_a^2$ is considered the best subset for prediction.

The adjusted coefficient of multiple determination, $R_a^2$, is given by:

$R_a^2 = 1 - \dfrac{SSE/[n-(k+1)]}{SST/(n-1)}$.
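Because $SSE/SST = 1 - R^2$, the formula can be rewritten as $R_a^2 = 1 - (1 - R^2)\dfrac{n-1}{n-(k+1)}$, which allows a consistency check against the $R^2$ values reported in part (e) for model sizes one to four. The sketch below is illustrative only; the function name `adjusted_r2` is an assumption of this example.

```python
def adjusted_r2(r2_percent, n, k):
    """Adjusted R^2 (percent) from R^2 (percent), n observations, k predictors."""
    r2 = r2_percent / 100.0
    return 100.0 * (1.0 - (1.0 - r2) * (n - 1) / (n - (k + 1)))


n = 30
# Highest R^2 (percent) reported in part (e) for model sizes one to four
best_r2_by_size = {1: 61.5, 2: 76.9, 3: 90.3, 4: 92.0}

adjusted = {k: round(adjusted_r2(r2, n, k), 1) for k, r2 in best_r2_by_size.items()}
print(adjusted)
```

The computed values agree with the adjusted $R^2$ column of the table for the corresponding models (60.1, 75.2, 89.2, and 90.7).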

For the one predictor case, the highest value of $R_a^2$ is 60.1, corresponding to $X_2$.

For the two predictor case, the highest value of $R_a^2$ is 75.2, corresponding to $X_2, X_2^2$.

For the three predictor case, the highest value of $R_a^2$ is 89.2, corresponding to $X_1, X_2, X_2^2$.

For the four predictor case, the highest value of $R_a^2$ is 90.7, corresponding to $X_1, X_2, X_1^2, X_2^2$.

For the five predictor case, the value of $R_a^2$ is 90.6.

The value of adjusted $R^2$ is the highest for predictors $X_1, X_2, X_1^2, X_2^2$, and the subset with the highest value of adjusted $R^2$ is considered the best subset for prediction.

Thus, provided other factors do not affect the analysis, it would be most preferable to use the regression equation corresponding to the predictors $X_1, X_2, X_1^2, X_2^2$.

Hence, the variables for the model using the adjusted $R^2$ criterion are $X_1, X_2, X_1^2, X_2^2$.

Both Mallows' $C_p$ and adjusted $R^2$ suggest that the best model contains the predictor variables $X_1, X_2, X_1^2, X_2^2$.
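The two selections can be reproduced mechanically from the table in part (e). The sketch below copies the tabulated values into a list and picks the model with the minimum $C_p$ and the model with the maximum adjusted $R^2$; the variable names are illustrative.

```python
# (predictors, Mallows' Cp, adjusted R^2 in percent), copied from the table in part (e)
models = [
    ("X2", 92.5, 60.1),
    ("X1X2", 97.0, 58.6),
    ("X2, X2^2", 47.1, 75.2),
    ("X1, X2", 53.3, 73.0),
    ("X1, X2, X2^2", 7.9, 89.2),
    ("X2, X1X2, X2^2", 15.5, 86.4),
    ("X1, X2, X1^2, X2^2", 4.8, 90.7),
    ("X1, X2, X1X2, X2^2", 9.2, 89.0),
    ("X1, X2, X1X2, X1^2, X2^2", 6.0, 90.6),
]

best_by_cp = min(models, key=lambda m: m[1])[0]       # minimum Cp
best_by_adj_r2 = max(models, key=lambda m: m[2])[0]   # maximum adjusted R^2

print(best_by_cp)
print(best_by_adj_r2)
print(best_by_cp == best_by_adj_r2)
```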
