Gayatri-Rout/House-Price-Prediction-Data-Analysis

House-Price-Prediction-Data-Analysis

Introduction

This report presents an exploratory data analysis (EDA) of a dataset related to house prices. The dataset was obtained from Kaggle.

Data Overview

The dataset consists of 1460 records and 81 features. These features include both numerical and categorical variables, such as lot frontage, house size, neighborhood, and more. Before proceeding with the analysis, missing values were handled, outliers were identified, and feature engineering techniques were applied to preprocess the data.

Missing Values

  • The dataset contains missing values in several features, with the share of missing entries ranging from less than 1% to almost 100%.

  • The graphs showed a relationship between missingness and the sale price, so these features carry information useful for prediction and were kept rather than dropped.

  • Missing values were imputed according to feature type: numerical features were filled with the median value, and missing entries in categorical features were replaced with a new label.
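
The steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the repository's code: the toy data, the column names (`LotFrontage`, `Alley`, `SalePrice`), the helper `impute`, and the replacement label `"Missing"` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Kaggle data; names and values are illustrative.
df = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0, 60.0, np.nan],
    "Alley": [np.nan, "Grvl", np.nan, np.nan, "Pave"],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})

# Percentage of missing values per feature, keeping only features with gaps.
missing_pct = df.isnull().mean().mul(100)
missing_pct = missing_pct[missing_pct > 0]

# Median sale price for rows where a feature is missing (True) vs present (False).
price_by_missing = df.groupby(df["Alley"].isnull())["SalePrice"].median()


def impute(frame: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric gaps with the column median, categorical gaps with a new label."""
    out = frame.copy()
    cols_with_gaps = [c for c in out.columns if out[c].isnull().any()]
    for col in cols_with_gaps:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna("Missing")  # hypothetical label
    return out


imputed = impute(df)
```

Using a separate label for missing categorical values keeps the fact that a value was absent visible to a later model, which matters when missingness itself relates to the sale price.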

Numerical Variables

Discrete Variables

  • 17 discrete variables were identified, including features like the overall quality of the house, the number of rooms, and the number of bathrooms.
  • The relationship between discrete variables and the sale price was visualized using bar plots.
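
The report does not say how discrete variables were separated from continuous ones. One common heuristic, shown here purely as an assumption, is to treat a numeric column as discrete when it has only a few distinct values. The cutoff `MAX_DISCRETE_LEVELS`, the toy data, and the column names are hypothetical.

```python
import pandas as pd

# Toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "OverallQual": [7, 6, 7, 5, 8, 6],
    "FullBath": [2, 2, 2, 1, 2, 1],
    "GrLivArea": [1710, 1262, 1786, 1717, 2198, 1362],
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000],
})

MAX_DISCRETE_LEVELS = 5  # hypothetical cutoff chosen for this small example

# Numeric predictors, excluding the target.
numeric_cols = df.select_dtypes(include="number").columns.drop("SalePrice")

# Few distinct values -> discrete; everything else -> continuous.
discrete = [c for c in numeric_cols if df[c].nunique() < MAX_DISCRETE_LEVELS]
continuous = [c for c in numeric_cols if c not in discrete]

# The quantity a bar plot would display: median sale price per level.
median_by_level = df.groupby("OverallQual")["SalePrice"].median()
# median_by_level.plot.bar() would draw it if matplotlib is installed.
```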

Continuous Variables

  • 16 continuous variables were identified, such as lot area, living area, and garage area.
  • The distribution of each continuous variable was visualized using histograms, which showed that the data are skewed.
    • A logarithmic transformation was applied to bring the skewed variables closer to a normal distribution.
    • Outliers in continuous variables were identified using box plots to understand their impact on the data distribution.
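
As a rough sketch of the two steps above, the snippet below log-transforms a skewed variable and flags outliers with the 1.5 × IQR rule that box-plot whiskers conventionally use. The sample values, the `LotArea` name, and the helper `iqr_outliers` are assumptions for illustration; the report does not state which rule was applied numerically.

```python
import numpy as np
import pandas as pd

# Illustrative right-skewed sample with one very large value.
lot_area = pd.Series(
    [8450, 9600, 11250, 9550, 14260, 14115, 10084, 215245], name="LotArea"
)

# The natural log requires strictly positive values.
log_lot_area = np.log(lot_area)

# Sample skewness before and after the transform.
skew_before = lot_area.skew()
skew_after = log_lot_area.skew()


def iqr_outliers(values: pd.Series) -> pd.Series:
    """Boolean mask of points beyond 1.5 * IQR from the quartiles."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)


outlier_mask = iqr_outliers(log_lot_area)
outliers = lot_area[outlier_mask]
```

Variables that contain zeros would need `np.log1p` or a similar adjustment, since the plain logarithm is undefined at zero.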

Temporal Variables

  • Year-related variables, such as the year the house was built or remodeled, were analyzed to understand their relationship with the sale price over time.
  • The graph shows that the sale price decreases over time.
  • Scatter plots were used to compare the difference between each year-related variable and the year the house was sold, revealing potential trends or patterns.
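
The year differences described above might be computed as in the sketch below. The column names `YearBuilt`, `YearRemodAdd`, and `YrSold`, the `_age` suffix, and the sample rows are assumptions, since the report does not name the columns it used.

```python
import pandas as pd

# Toy rows; column names and values are assumed for illustration.
df = pd.DataFrame({
    "YearBuilt": [2003, 1976, 2001, 1915],
    "YearRemodAdd": [2003, 1976, 2002, 1970],
    "YrSold": [2008, 2007, 2008, 2006],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# Median sale price per sale year: the series behind a price-over-time plot.
median_by_year = df.groupby("YrSold")["SalePrice"].median()

# Years elapsed between each year-related variable and the sale.
year_cols = ["YearBuilt", "YearRemodAdd"]
age = pd.DataFrame({f"{col}_age": df["YrSold"] - df[col] for col in year_cols})

# Each age column paired with SalePrice gives the points of one scatter plot.
scatter_data = age.join(df["SalePrice"])
```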

Categorical Variables

  • The dataset contains numerous categorical variables representing features like zoning, street type, and building type.
  • The relationship between categorical features and the sale price was visualized using bar plots.

Feature Engineering

  • Rare categories in categorical features were identified and grouped into a single category to simplify the analysis.
  • Categorical features were converted into numerical representations using mean encoding to prepare the data for modeling.
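
A minimal sketch of both steps follows, assuming a frequency threshold for what counts as rare. The threshold value, the label `"Rare"`, the helper names `group_rare` and `mean_encode`, and the sample data are all hypothetical; the report does not give the values it used.

```python
import pandas as pd

# Toy data: one category ("Veenker") appears in only 10% of rows.
df = pd.DataFrame({
    "Neighborhood": ["NAmes"] * 5 + ["CollgCr"] * 4 + ["Veenker"],
    "SalePrice": [140000, 150000, 145000, 155000, 160000,
                  200000, 210000, 220000, 230000, 300000],
})


def group_rare(series: pd.Series, min_frequency: float) -> pd.Series:
    """Replace categories whose relative frequency is below min_frequency."""
    freq = series.value_counts(normalize=True)
    rare = freq[freq < min_frequency].index
    return series.where(~series.isin(rare), "Rare")  # hypothetical label


def mean_encode(series: pd.Series, target: pd.Series) -> pd.Series:
    """Map each category to the mean of the target within that category."""
    means = target.groupby(series).mean()
    return series.map(means)


grouped = group_rare(df["Neighborhood"], min_frequency=0.2)  # assumed threshold
encoded = mean_encode(grouped, df["SalePrice"])
```

Because mean encoding uses the target, the category means should be computed on training data only and then applied to validation or test data, otherwise the encoding leaks target information.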

Feature Scaling

  • Finally, numerical features were scaled to a common range using MinMaxScaler, so that features with large magnitudes do not dominate the analysis and modeling process.
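
The scaling step could look like the sketch below, which uses scikit-learn's `MinMaxScaler` with its default range of 0 to 1. Leaving the target `SalePrice` unscaled is an assumption of this example, as are the column names and sample values.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy data; column names and values are illustrative.
df = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 14260],
    "GrLivArea": [1710, 1262, 1786, 2198],
    "SalePrice": [208500, 181500, 223500, 250000],
})

# Assumption: scale the predictors only and leave the target as it is.
feature_cols = [c for c in df.columns if c != "SalePrice"]

scaler = MinMaxScaler()  # default feature_range is (0, 1)
scaled = pd.DataFrame(
    scaler.fit_transform(df[feature_cols]),
    columns=feature_cols,
    index=df.index,
)

# Reattach the untouched target column.
result = pd.concat([scaled, df["SalePrice"]], axis=1)
```

In a modeling workflow the scaler would be fitted on the training split and reused with `scaler.transform` on new data, so that the same minimum and maximum apply everywhere.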
