Machine Learning Decisions
Regression
Used for predicting a continuous numeric variable, such as housing prices.
- Random Forest Regressor
- Linear Regression
Classification
Used for predicting the category that an instance belongs to.
- Naive Bayes
- Support Vector Machine
Random Forest
Above is the Random Forest using Mean Squared Error (MSE) as the split criterion.
Below is the Random Forest using Mean Absolute Error (MAE) as the split criterion.
The images also show the feature importances, which indicate which features carry the most weight in the trained model.
Linear Regression
Support Vector Machine
Top Half of Image
Complement Naive Bayes
Bottom Half of Image
Conclusion
The Random Forest Regressor was the best model for predicting housing prices. The two criteria (MSE and MAE) altered the feature importances but did not have a large effect on the overall accuracy of the model. Linear Regression performed poorly, most likely because the relationship between the features and the price is not linear.

Classification required mapping housing prices to categories. However, once mapped, the categories were extremely imbalanced, which caused both classification models to perform poorly.
This data set was best suited for regression.
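The mapping of prices to categories, and the imbalance it can produce, might look something like the sketch below. The prices, bin edges, and category labels are hypothetical values chosen only for illustration; the bins actually used for this data set are not stated here.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical prices and bin edges, for illustration only.
prices = [95_000, 120_000, 135_000, 150_000, 160_000,
          175_000, 180_000, 210_000, 450_000, 900_000]
bin_edges = [300_000, 600_000]    # category boundaries
labels = ["low", "mid", "high"]   # one more label than there are edges


def price_to_category(price):
    """Map a numeric price to the label of the bin it falls into."""
    return labels[bisect_right(bin_edges, price)]


categories = [price_to_category(p) for p in prices]
counts = Counter(categories)

# Share of instances in the largest category: a quick imbalance check.
majority_share = max(counts.values()) / len(categories)

print(counts)
print(f"majority share: {majority_share:.0%}")
```

In this toy example, a classifier that always predicts the largest category would already be right 80% of the time, which is why accuracy alone can be misleading when the categories are imbalanced.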