Projects

House Prices

House Prices is another Kaggle project. The goal is to build a model that predicts the sale prices of residential homes in Ames, Iowa. The main challenge in this project is handling a dataset with a large number of features.

Colab links

EDA Colab: Click Here
Model Testing Colab: Click Here

About the data:

For this project, I didn't split the training data into separate train and test sets because Kaggle already provides a test dataset. However, that test dataset does not include the sale prices, so I have to submit my predictions to Kaggle to find out how the model performs.
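A minimal sketch of preparing a submission, assuming the competition's two-column format (Id, SalePrice). The ids and predicted prices below are made-up placeholders, not real model output, and the helper name build_submission is hypothetical.

```python
import pandas as pd

def build_submission(ids, predictions):
    """Return a two-column frame in the assumed Kaggle format: Id, SalePrice."""
    return pd.DataFrame({"Id": list(ids), "SalePrice": list(predictions)})

# Placeholder ids and predictions, for illustration only.
submission = build_submission([1461, 1462, 1463], [150000.0, 182500.0, 121000.0])

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = submission.to_csv(index=False)
```

In practice the same call with a file name, for example submission.to_csv("submission.csv", index=False), would write the file to upload.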

EDA:

Even though there are 79 variables, the dataset is still small enough to explore feature by feature. I explored the data by feature category:

Numerical features
- Area features
- Non-area features
Categorical features
- Nominal features
- Ordinal features
Date features
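One way to sort columns into these groups can be sketched as follows. The name-based rules ("Area" or a trailing "SF" for area features, "Year"/"Yr" or "MoSold" for dates) and the hand-supplied list of ordinal columns are assumptions made for this illustration, not necessarily how the notebooks do it. The toy frame reuses a few column names from the dataset with invented values.

```python
import pandas as pd

def group_columns(df, ordinal_cols=()):
    """Sort column names into the five exploration groups using simple heuristics."""
    groups = {"area": [], "non_area": [], "nominal": [], "ordinal": [], "date": []}
    for col in df.columns:
        if "Year" in col or "Yr" in col or col == "MoSold":
            groups["date"].append(col)          # assumed naming rule for dates
        elif pd.api.types.is_numeric_dtype(df[col]):
            is_area = "Area" in col or col.endswith("SF")
            groups["area" if is_area else "non_area"].append(col)
        elif col in ordinal_cols:
            groups["ordinal"].append(col)       # ordinal columns are listed by hand
        else:
            groups["nominal"].append(col)
    return groups

# Toy frame with invented values.
toy = pd.DataFrame({
    "LotArea": [8450, 9600],
    "TotalBsmtSF": [856, 1262],
    "BedroomAbvGr": [3, 3],
    "ExterQual": ["Gd", "TA"],
    "Neighborhood": ["CollgCr", "Veenker"],
    "YrSold": [2008, 2007],
})
groups = group_columns(toy, ordinal_cols=["ExterQual"])
```

Ordinal columns need a manual list because their quality codes are stored as plain strings and look like nominal columns to pandas.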

What I learned through this exploration is:

Some features are correlated with each other (e.g., ground living area is correlated with first-floor area, second-floor area, and lot area).
An area of zero means the house doesn't have that feature. For example, a pool area of zero means the house does not have a pool.
The dataset contains heavily imbalanced categorical columns, where almost every house falls into the same category. These carry little information, so we can safely discard them.
Some scatter plots may not show the full picture, because many other variables can affect the result.
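The first three findings can be checked with a few lines of pandas. This is a rough sketch on a toy frame with invented values, and the 0.95 cutoff for calling a column "imbalanced" is an assumed threshold, not a value taken from the notebooks.

```python
import pandas as pd

def add_has_feature_flags(df, area_cols):
    """Add a 0/1 'Has<col>' column per area column (zero area = feature absent)."""
    out = df.copy()
    for col in area_cols:
        out["Has" + col] = (out[col] > 0).astype(int)
    return out

def imbalanced_categoricals(df, threshold=0.95):
    """Return non-numeric columns whose most common value covers >= threshold of rows."""
    flagged = []
    for col in df.select_dtypes(exclude="number").columns:
        top_share = df[col].value_counts(normalize=True, dropna=False).iloc[0]
        if top_share >= threshold:
            flagged.append(col)
    return flagged

# Toy frame with invented values.
toy = pd.DataFrame({
    "GrLivArea": [1500, 2000, 1200, 1800],
    "1stFlrSF": [800, 1000, 1200, 900],
    "PoolArea": [0, 0, 512, 0],
    "Street": ["Pave", "Pave", "Pave", "Pave"],
    "Neighborhood": ["NAmes", "CollgCr", "OldTown", "NAmes"],
})

toy = add_has_feature_flags(toy, ["PoolArea"])
flagged = imbalanced_categoricals(toy)          # columns dominated by one category
corr = toy[["GrLivArea", "1stFlrSF"]].corr()    # pairwise Pearson correlation
```

On the real data, the same correlation call over all area columns gives a matrix that can be drawn as a heatmap to spot the correlated groups.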

About the models

Linear models performed slightly better than nonlinear models.
After testing many models and tuning their hyperparameters, I found that Ridge was the best model for this task.
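Tuning Ridge usually comes down to choosing the regularization strength alpha by cross-validation. The sketch below shows one common way to do that with scikit-learn. It runs on synthetic data from make_regression rather than the Ames dataset, and the alpha grid, the scaling step, and the 5-fold setup are assumptions for illustration, not the settings used in the notebooks.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the prepared feature matrix and target.
X, y = make_regression(n_samples=200, n_features=30, noise=10.0, random_state=0)

# Scale features first so the L2 penalty treats every coefficient comparably.
pipeline = make_pipeline(StandardScaler(), Ridge())

# Assumed grid of regularization strengths to try.
param_grid = {"ridge__alpha": [0.1, 1.0, 10.0, 100.0]}

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, hence the sign
)
search.fit(X, y)

best_alpha = search.best_params_["ridge__alpha"]
cv_rmse = -search.best_score_  # flip the sign back to a positive RMSE
```

Because there is no local test split, the cross-validated score is the main offline estimate of performance before submitting to Kaggle.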

Note: You can find a more detailed explanation inside the notebooks.

Linear Regression

Regularization

Dimensionality Reduction