
Machine Learning Decisions

Regression

Used for predicting a continuous numeric variable, such as housing prices.

- Random Forest Regressor
- Linear Regression

Classification

Used for predicting the category that an instance belongs to.

- Naive Bayes
- Support Vector Machine

Screen Shot 2018-12-14 at 3.52.19 PM.png

Random Forest

Above is the Random Forest using Mean Squared Error (MSE) as the criterion.

Below is the Random Forest using Mean Absolute Error (MAE) as the criterion.

The feature importances are shown in the images as well, indicating which features carry the most weight in the trained model.
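To make the difference between the two criteria concrete, here is a minimal sketch in plain Python. It is not the code behind the screenshots; the prices are made up, and `mse` and `mae` are hypothetical helper names. It relies on one standard fact: the single value that minimizes squared error over a group is the mean, while the single value that minimizes absolute error is the median. That is one plausible reason the two criteria can weight features differently.

```python
from statistics import mean, median


def mse(values, prediction):
    """Mean squared error of one constant prediction over a group of values."""
    return mean((v - prediction) ** 2 for v in values)


def mae(values, prediction):
    """Mean absolute error of one constant prediction over a group of values."""
    return mean(abs(v - prediction) for v in values)


# Hypothetical sale prices (in thousands) that end up in the same tree leaf.
# The last one is an outlier.
leaf_prices = [200, 210, 220, 230, 900]

mean_prediction = mean(leaf_prices)      # pulled upward by the outlier
median_prediction = median(leaf_prices)  # stays with the bulk of the prices

print("mean:", mean_prediction, "median:", median_prediction)
print("MSE at mean:", mse(leaf_prices, mean_prediction))
print("MSE at median:", mse(leaf_prices, median_prediction))
print("MAE at mean:", mae(leaf_prices, mean_prediction))
print("MAE at median:", mae(leaf_prices, median_prediction))
```

MSE penalizes the outlier much more heavily than MAE does, so a split that isolates expensive houses can look more valuable under MSE than under MAE.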

Screen Shot 2018-12-14 at 3.55.01 PM.png

Linear Regression

Screen Shot 2018-12-14 at 4.01.22 PM.png

Support Vector Machine

Top Half of Image

Complement Naive Bayes

Bottom Half of Image

Screen Shot 2018-12-14 at 4.06.46 PM.png

Conclusion

The Random Forest Regressor was the best model for predicting housing prices. The two criteria (MSE and MAE) altered the feature importances but did not have a large effect on the overall accuracy of the model. Linear Regression performed poorly, most likely because the relationship between the features and the price is not linear.
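As a rough illustration of why a straight-line fit can struggle when the relationship is not linear, the sketch below fits ordinary least squares with one feature to two made-up data sets and compares R². The data and the helper names `fit_line` and `r_squared` are assumptions for illustration only; the curved case is deliberately extreme and is not the housing data.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def r_squared(ys, predictions):
    """Share of the variance in ys that the predictions explain."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = list(range(-5, 6))
linear_ys = [3 * x + 2 for x in xs]  # made-up linear relationship
curved_ys = [x ** 2 for x in xs]     # made-up curved relationship

for name, ys in (("linear", linear_ys), ("curved", curved_ys)):
    slope, intercept = fit_line(xs, ys)
    predictions = [slope * x + intercept for x in xs]
    print(name, "R^2 =", round(r_squared(ys, predictions), 3))
```

The line recovers the linear data exactly but explains none of the variance in the symmetric curved data. A tree-based model such as a Random Forest does not assume a straight-line relationship.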


Classification required mapping housing prices to categories. However, once mapped, the categories were extremely unbalanced, which caused both classification models to perform poorly.
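The page does not say how prices were mapped to categories, so the following is only a sketch of one common approach: fixed price bins. The cut points, labels, and prices are all hypothetical, as is the helper name `price_to_category`. It shows how quickly binning can produce unbalanced classes.

```python
from collections import Counter


def price_to_category(price, edges=(100_000, 200_000, 300_000)):
    """Map a price to a bin label. The edges are hypothetical cut points."""
    labels = ("low", "medium", "high", "very high")
    for edge, label in zip(edges, labels):
        if price < edge:
            return label
    return labels[-1]


# Made-up prices clustered in one range, with a few expensive homes.
prices = [120_000, 135_000, 150_000, 160_000, 175_000,
          180_000, 190_000, 195_000, 250_000, 420_000]

counts = Counter(price_to_category(p) for p in prices)
majority_share = max(counts.values()) / len(prices)

print(dict(counts))
print("share of the largest class:", majority_share)
```

With a split like this, a classifier that always predicts the largest class would already look fairly accurate while learning nothing about the rarer categories, which is why accuracy alone can be misleading on unbalanced data.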


This data set was best suited for regression.
