Module 1 Assignment
College of Professional Studies, Northeastern University
ALY6015, 21626
Harpreet Sharma
January 15th, 2023
Table of Contents

Introduction
Analysis
  Figure 1: Histogram of Sale Price
  Figure 2: Descriptive Statistics of Sale Price
  Figure 3: Correlation Matrix Plot of Subset Data
  Figure 4: Scatterplot for the Variable with the Highest Correlation
  Figure 5: Scatterplot for the Variable with the Lowest Correlation
  Figure 6: Scatterplot for the Variable with Correlation Closest to 0.5
  Figure 7: Diagnostic Plot
  Figure 8: All Subset Regression Plot
Conclusion/Interpretations
References
Appendices
Introduction

This report details an exploration and modeling of the Ames housing dataset, which comprises 2,930 records and 82 variables, with the primary goal of predicting sale prices. The central question is what determines housing prices. To answer it, the analysis follows a methodical strategy: importing and preparing the dataset, performing thorough Exploratory Data Analysis (EDA), and applying predictive modeling tools.

Analysis

The analysis began with an exploration of the dataset using visualizations (see Appendix A) and descriptive statistics (see Appendix B). This process involved uncovering patterns, understanding distributions, and identifying correlations, particularly with respect to sale price. Numeric variables were then extracted from the original dataset (see Appendix C), and missing values were imputed with the mean of each respective variable (see Appendix D).

Figure 1
Histogram of Sale Price
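The mean-imputation step described in the Analysis section can be sketched as follows. The report's own code is in Appendix D and is not reproduced here, so this is only an illustrative Python sketch: the function name `impute_with_column_means`, the column names, and the toy records are all hypothetical and are not taken from the Ames dataset.

```python
from statistics import mean


def impute_with_column_means(rows):
    """Replace None in each numeric column with the mean of that column's observed values."""
    columns = list(rows[0].keys())
    # Mean of the non-missing values, computed separately for each column.
    col_means = {
        c: mean(r[c] for r in rows if r[c] is not None)
        for c in columns
    }
    # Build new records so the input list is left unchanged.
    return [
        {c: (r[c] if r[c] is not None else col_means[c]) for c in columns}
        for r in rows
    ]


# Hypothetical toy records (assumption: missing values are encoded as None).
toy = [
    {"lot_frontage": 60.0, "garage_area": 400.0},
    {"lot_frontage": None, "garage_area": 500.0},
    {"lot_frontage": 80.0, "garage_area": None},
]
imputed = impute_with_column_means(toy)
print(imputed[1]["lot_frontage"], imputed[2]["garage_area"])
```

Note that a column with no observed values at all would make `mean` raise an error in this sketch, and that mean imputation leaves each column's mean unchanged while shrinking its variance, which is worth keeping in mind when reading the later correlation results.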
Figure 1 shows a histogram of sale prices. The distribution has a noticeable positive skew, that is, a longer tail toward higher prices.

Figure 2
Descriptive Statistics of Sale Price

Figure 2 details descriptive statistics for the sale price of 2,930 properties: a mean sale price of approximately $180,796 and a median of $160,000. The standard deviation of $79,886.69 indicates notable variability around the mean. Sale prices range from $12,789 to $755,000, reflecting diverse property values.

Correlation analysis was then performed on the numeric subset to compute the correlation matrix, and a visual representation of the matrix was generated (see Appendix E).
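The kind of summary reported in Figure 2 can be outlined with a short Python sketch. The helper `describe` and the toy prices are hypothetical (they are not the Ames sale prices), and comparing the mean with the median is used here only as a rough hint of right skew, not as a formal skewness measure.

```python
from statistics import mean, median, stdev


def describe(values):
    """Return count, mean, median, sample standard deviation, minimum and maximum."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "max": max(values),
    }


# Hypothetical toy prices with one expensive outlier in the right tail.
toy_prices = [100_000, 120_000, 130_000, 150_000, 400_000]
summary = describe(toy_prices)

# A mean noticeably above the median is a common informal sign of right skew.
right_skew_hint = summary["mean"] > summary["median"]
print(summary, right_skew_hint)
```

The report's own figures show the same pattern: the mean (about $180,796) exceeds the median ($160,000), which is consistent with the right skew seen in the histogram.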
Figure 3
Correlation Matrix Plot of Subset Data

The correlation matrix plot in Figure 3 visually represents the strength and direction of the linear relationships between the numeric variables. Darker colors indicate stronger correlations. The legend on the right side of the plot gives the scale: 1 indicates a perfect positive correlation, -1 a perfect negative correlation, and 0 no linear relationship between the variables (Correlation Analysis Different Types of Plots in R | R-Bloggers, 2021).

Scatterplots were then generated for the variables with the highest and lowest correlation with Sale Price, along with the variable whose correlation is closest to 0.5 (refer to Appendix F).
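One way to choose the three scatterplot variables is to compute each predictor's Pearson correlation with the target and then pick the maximum, the minimum, and the value nearest 0.5. The sketch below assumes that "lowest" means the smallest signed correlation (the report does not say whether it means the most negative value or the value closest to zero). The function names and the toy columns are hypothetical and are not Ames variables.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pick_predictors(data, target):
    """Return all correlations with `target`, plus the highest, lowest and closest-to-0.5 names."""
    cors = {
        name: pearson(col, data[target])
        for name, col in data.items()
        if name != target
    }
    highest = max(cors, key=cors.get)
    lowest = min(cors, key=cors.get)  # assumption: smallest signed value
    closest_half = min(cors, key=lambda k: abs(cors[k] - 0.5))
    return cors, highest, lowest, closest_half


# Hypothetical toy columns; "price" plays the role of Sale Price.
toy = {
    "price": [1, 2, 3, 4, 5],
    "a": [2, 4, 6, 8, 10],  # rises with price
    "b": [5, 4, 3, 2, 1],   # falls as price rises
    "c": [1, 3, 2, 1, 3],   # weakly related to price
}
cors, highest, lowest, closest_half = pick_predictors(toy, "price")
print(highest, lowest, closest_half)
```

If "lowest" is instead read as the weakest relationship, replacing the `min` key with `lambda k: abs(cors[k])` would select the correlation closest to zero.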
Figure 4
Scatterplot for the Variable with the Highest Correlation

The scatterplot for the variable with the highest correlation with Sale Price (above-ground living area) reveals a positive linear relationship: as the above-ground living area increases, sale price tends to increase.