Feature Engineering
with scikit-learn
The dataset presented in the Airbnb in Barcelona project also included pricing data. While the data mining, cleaning, and feature extraction are already described in the method section of that project, the features have not yet been used for predictive modeling.
The price of every night for every Airbnb listing in Barcelona over a whole year is an ideal testing ground for building a machine learning model and experimenting with feature engineering techniques.
Explored Concepts
Custom transformers, pipelines, grid searches for parameter tuning, cross-validation, learning curves for overfitting detection, error analysis, decision trees, random forests, support vector machines, linear regression, ridge regression, extra trees regression, gradient boosting, ensemble learning (bagging), feature selection with the hashing trick and a variance threshold, feature engineering and creation, chi-squared tests, multilayer perceptron...
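To make a few of these concepts concrete, the following is a minimal sketch of how a custom transformer, a pipeline, and a cross-validated grid search fit together in scikit-learn. It is not the project's actual code: the `RatioFeature` transformer, the synthetic data, and the alpha grid are assumptions chosen purely for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class RatioFeature(BaseEstimator, TransformerMixin):
    """Hypothetical custom transformer: appends the ratio of two columns."""

    def __init__(self, numerator=0, denominator=1):
        self.numerator = numerator
        self.denominator = denominator

    def fit(self, X, y=None):
        # Stateless transformer: nothing is learned from the data.
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Small constant guards against division by zero.
        ratio = X[:, self.numerator] / (X[:, self.denominator] + 1e-9)
        return np.column_stack([X, ratio])


# Synthetic stand-in data (assumption): not the Airbnb features.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 10.0, size=(200, 3))
y = 20.0 * X[:, 0] / X[:, 1] + rng.normal(0.0, 1.0, size=200)

# Feature creation, scaling, and the model are chained into one estimator,
# so every cross-validation fold refits all steps on its own training split.
pipeline = Pipeline([
    ("ratio", RatioFeature(numerator=0, denominator=1)),
    ("scale", StandardScaler()),
    ("model", Ridge()),
])

# Grid search over the ridge penalty with 5-fold cross-validation.
search = GridSearchCV(
    pipeline,
    param_grid={"model__alpha": [0.1, 1.0, 10.0]},
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)

best_alpha = search.best_params_["model__alpha"]
cv_mae = -search.best_score_  # scorer is negated MAE, so flip the sign
```

Keeping the transformer inside the pipeline means the engineered feature is tuned and validated together with the model, which avoids leaking information from the validation folds into preprocessing.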
Result
Through feature engineering and careful model combination, the test error was reduced by 53%. This was mostly possible by addressing the outlier problem in the target variable and concentrating on ensemble methods, while steadily increasing the model's bias and decreasing its variance.
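The summary above does not state how the target outliers were handled. As one possible illustration, the sketch below assumes a log transform of the target combined with a bagged tree ensemble (a random forest). The synthetic heavy-tailed prices, the choice of `log1p`, and all hyperparameters are assumptions, not the project's method.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic, right-skewed "prices" (assumption): a few very large values
# stand in for the outlier problem described in the text.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = np.exp(3.0 + 0.5 * X[:, 0] + rng.normal(0.0, 0.5, size=500))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The regressor is trained on log1p(y); predictions are mapped back with
# expm1, so extreme prices have less leverage during fitting.
model = TransformedTargetRegressor(
    regressor=RandomForestRegressor(
        n_estimators=50,      # bagged trees average out individual variance
        min_samples_leaf=5,   # larger leaves: more bias, less variance
        random_state=0,
    ),
    func=np.log1p,
    inverse_func=np.expm1,
)
model.fit(X_train, y_train)

preds = model.predict(X_test)
test_mae = mean_absolute_error(y_test, preds)
```

Raising `min_samples_leaf` or averaging over more trees are typical ways to trade a little extra bias for lower variance in such an ensemble.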