Module 1 Assignment
College of Professional Studies, Northeastern University
ALY6015, 21626
Harpreet Sharma
January 15th, 2023
Table of Contents
Introduction
Analysis
Figure 1: Histogram of Sale Price
Figure 2: Descriptive Statistics of Sale Price
Figure 3: Correlation Matrix Plot of Subset Data
Figure 4: Scatterplot for the Variable with the Highest correlation
Figure 5: Scatterplot for the Variable with the lowest correlation
Figure 6: Scatterplot for the variable with correlation closest to 0.5
Figure 7: Diagnostic Plot
Figure 8: All Subset Regression Plot
Conclusion/Interpretations
References
Appendices
Introduction
This report analyzes and models the Ames housing dataset, which comprises 2,930 records and 82 variables, with the primary goal of predicting sale prices. The central question is which property characteristics determine sale price. To answer it, the analysis follows a structured approach: importing and preparing the dataset, performing Exploratory Data Analysis (EDA), and applying predictive modeling techniques.
Analysis
The analysis began with an exploration of the dataset using visualizations (see Appendix A) and descriptive statistics (see Appendix B) to uncover patterns, examine distributions, and identify correlations, particularly with sale price. Numeric variables were then extracted from the original dataset (see Appendix C), and missing values in each were imputed with that variable's mean (see Appendix D).
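The imputation code in Appendix D is not reproduced here. As a minimal sketch of column-mean imputation, the Python below assumes records are stored as dictionaries with `None` marking a missing value; the function name, column names, and values are hypothetical toy data, not the Ames dataset.

```python
from statistics import mean
from typing import Optional


def impute_with_column_means(
    rows: list[dict[str, Optional[float]]],
) -> list[dict[str, float]]:
    """Replace each missing (None) value with the mean of the observed values in its column.

    Assumes every row has the same keys (a rectangular table).
    """
    columns = list(rows[0].keys()) if rows else []
    col_means = {}
    for col in columns:
        observed = [r[col] for r in rows if r[col] is not None]
        # A column with no observed values cannot be mean-imputed; fall back to NaN.
        col_means[col] = mean(observed) if observed else float("nan")
    return [
        {col: (r[col] if r[col] is not None else col_means[col]) for col in columns}
        for r in rows
    ]


# Hypothetical toy records, not the real Ames data.
records = [
    {"lot_frontage": 80.0, "garage_area": 528.0},
    {"lot_frontage": None, "garage_area": 730.0},
    {"lot_frontage": 74.0, "garage_area": None},
]
imputed = impute_with_column_means(records)
print(imputed[1]["lot_frontage"])  # 77.0
print(imputed[2]["garage_area"])   # 629.0
```

Mean imputation keeps every row but shrinks the variance of the imputed columns, which is worth keeping in mind when reading the correlations that follow.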
Figure 1
Histogram of Sale Price
Figure 1 shows the distribution of sale prices. The histogram has a pronounced positive (right) skew: most properties sold in the lower part of the price range, with a long tail extending toward the most expensive homes.
Figure 2
Descriptive Statistics of Sale Price
Figure 2 summarizes the sale prices of the 2,930 properties. The mean sale price is $180,796.10 and the median is $160,000; a mean above the median is consistent with the right skew seen in Figure 1. The standard deviation of $79,886.69 indicates substantial variability around the mean, and sale prices range from $12,789 to $755,000, reflecting a wide spread of property values.
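The statistics in Figure 2, together with a numeric measure of the skew visible in Figure 1, can be sketched in plain Python. This is an illustration on hypothetical toy prices, not the report's own code; the skewness shown is the adjusted Fisher-Pearson coefficient, one common definition among several, and `statistics.stdev` (like R's `sd()`) uses the n - 1 denominator.

```python
import math
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Summary statistics of the kind reported in Figure 2, plus sample skewness.

    Requires at least three values (the skewness adjustment divides by n - 2).
    """
    n = len(values)
    m = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, n - 1 denominator
    # Adjusted Fisher-Pearson skewness (G1): positive means a right (positive) skew.
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    g1 = m3 / m2 ** 1.5
    skewness = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    return {
        "n": n,
        "mean": m,
        "median": statistics.median(values),
        "sd": sd,
        "min": min(values),
        "max": max(values),
        "skewness": skewness,
    }


# Hypothetical toy prices (not the Ames data): mostly moderate, one very expensive sale.
prices = [120_000, 135_000, 150_000, 160_000, 175_000, 190_000, 450_000]
stats = describe(prices)
print(stats["mean"] > stats["median"], stats["skewness"] > 0)  # True True
```

A single large value pulls the mean above the median and makes the skewness positive, which is the same pattern the report describes for the full dataset.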
A correlation matrix was then computed for the numeric subset of the data and visualized (see Appendix E).
Figure 3
Correlation Matrix Plot of Subset Data
The correlation matrix plot in Figure 3 shows the strength and direction of the linear relationships between the numeric variables, with darker colors indicating stronger correlations. The legend on the right gives the scale: 1 indicates a perfect positive correlation, -1 a perfect negative correlation, and 0 no linear relationship between the variables (Correlation Analysis Different Types of Plots in R | R-Bloggers, 2021).
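Each cell of a correlation matrix like Figure 3 holds Pearson's r for one pair of variables: their covariance divided by the product of their standard deviations. The plain-Python sketch below computes such a matrix; the report's own code is in Appendix E and is not reproduced here, and the column names and values are hypothetical toy data.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(data: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    """Pairwise Pearson correlations for every pair of numeric columns."""
    names = list(data)
    return {a: {b: pearson(data[a], data[b]) for b in names} for a in names}


# Hypothetical toy columns, not the Ames data.
data = {
    "living_area": [1000, 1500, 1800, 2400, 3000],
    "sale_price": [110_000, 150_000, 175_000, 230_000, 290_000],
    "noise": [3, 1, 4, 1, 5],
}
corr = correlation_matrix(data)
for name, row in corr.items():
    # Values near 1 or -1 are strong linear relationships; near 0, none.
    print(name, {k: round(v, 2) for k, v in row.items()})
```

The matrix is symmetric with ones on the diagonal, which is why correlation plots often show only one triangle.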
Next, scatterplots were generated for the variables with the highest and lowest correlations with sale price, along with the variable whose correlation is closest to 0.5 (see Appendix F).
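Choosing these three variables amounts to a max, a min, and a nearest-to-0.5 search over one row of the correlation matrix. The sketch below uses a hypothetical helper name and made-up correlation values (not this report's results). It treats "lowest" as the most negative value; if the weakest relationship is meant instead, compare `abs(value)`.

```python
def pick_scatterplot_variables(
    corr_with_target: dict[str, float], target: str = "sale_price"
) -> tuple[str, str, str]:
    """Return (highest, lowest, closest-to-0.5) variables by correlation with the target."""
    # Drop the target itself: its correlation with itself is always 1.
    candidates = {k: v for k, v in corr_with_target.items() if k != target}
    highest = max(candidates, key=candidates.get)
    lowest = min(candidates, key=candidates.get)
    closest_to_half = min(candidates, key=lambda k: abs(candidates[k] - 0.5))
    return highest, lowest, closest_to_half


# Made-up correlations with sale price, for illustration only.
corr_with_price = {
    "sale_price": 1.0,
    "living_area": 0.70,
    "lot_area": 0.25,
    "year_built": 0.55,
    "porch_area": -0.10,
}
print(pick_scatterplot_variables(corr_with_price))
# ('living_area', 'porch_area', 'year_built')
```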
Figure 4
Scatterplot for the Variable with the Highest correlation
The scatterplot for the variable most highly correlated with sale price (above-ground living area) shows a positive linear relationship: as above-ground living area increases, sale price tends to increase.
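The upward trend in such a scatterplot can be summarized by the slope of an ordinary least-squares line, whose sign always matches the sign of the Pearson correlation. A minimal sketch, assuming hypothetical toy values for area (square feet) and price (dollars) rather than the Ames data:

```python
def least_squares_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx  # positive slope <=> positive correlation
    return my - slope * mx, slope


# Hypothetical toy data (square feet, dollars), not the Ames data.
area = [1000, 1500, 2000, 2500]
price = [120_000, 160_000, 200_000, 240_000]
intercept, slope = least_squares_line(area, price)
print(intercept, slope)  # 40000.0 80.0
```

Here each additional square foot is associated with $80 more in price; on real data the slope would be estimated with noise and read alongside the regression diagnostics.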