For every classifier, there are a lot of parameters to be optimized in Scikit-Learn to get an optimal accuracy for the classifier in the cross-validattion dataset. You could always try and fail and see which parameters work for best but it is time consuming(computation, storage and reporting) and cumbersome in general if you want to handle the parameter search by yourself. Consider these parameters an extra burden when you try to choose an optimal classifier out of multiple classifiers.
Scikit-learn provides a grid search with cross validation so that when you do the parameter search in GridSearchCV
, it automatically finds the best parameters tested on a cross-validation so you do not have to search the parameters in different folds.
One thing to note is that it may take a lot of time to do on a regular machine if you have a lot of parameters to try as the number of parameters will be a combinatorial of those parameters. So you may want to use a powerful and multicore machine to do the grid search in the parameter space.
%matplotlib inline
import matplotlib as mlp
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn
from sklearn import cross_validation
from sklearn import datasets
from sklearn import ensemble
from sklearn import grid_search
from sklearn import metrics
import time
plt.style.use('fivethirtyeight')
california_housing = datasets.california_housing.fetch_california_housing()
california_housing_data = california_housing['data']
california_housing_labels = california_housing['target']# 'target' variables
california_housing_feature_names = california_housing['feature_names']
X_train, X_test, y_train, y_test = cross_validation.train_test_split(california_housing_data,
california_housing_labels,
test_size=0.2,
random_state=0)
In order to create a grid search object, it is necessary to pass the estimator, parameters to optimize for that estimator, optionally you could pass n_job
in order to parallellize the parameter search. Grid search in parameter space is embarrasingly parallel by nature as different parameter combinations are independent from each other. You could also pass different scoring functions if you want to use another scoring function to optimize the best parameters for your classifier.
param_grid = {
'learning_rate': [0.1, 0.05, 0.01],
'max_depth': [4, 6],
'min_samples_leaf': [3, 9, 15],
'n_estimators': [1000, 2000, 3000],
}
est = ensemble.GradientBoostingRegressor()
start_time = time.time()
gs_cv = grid_search.GridSearchCV(est, param_grid, n_jobs=4).fit(X_train, y_train)
end_time = time.time()
print('It took {} seconds'.format(end_time - start_time))
# best hyperparameter setting
gs_cv.best_params_
It took 2920.44363379 seconds
{'learning_rate': 0.05, 'max_depth': 6, 'min_samples_leaf': 15, 'n_estimators': 2000}
If you want to get scores for each different parameter combination, grid_scores_
attribute of GridSearch object provides a nice way to provide all of the scores for each parameter combination.
gs_cv.grid_scores_
[mean: 0.83473, std: 0.00265, params: {'n_estimators': 1000, 'learning_rate': 0.1, 'max_depth': 4, 'min_samples_leaf': 3}, mean: 0.83579, std: 0.00259, params: {'n_estimators': 2000, 'learning_rate': 0.1, 'max_depth': 4, 'min_samples_leaf': 3}, mean: 0.83533, std: 0.00238, params: {'n_estimators': 3000, 'learning_rate': 0.1, 'max_depth': 4, 'min_samples_leaf': 3}, mean: 0.83513, std: 0.00268, params: {'n_estimators': 1000, 'learning_rate': 0.1, 'max_depth': 4, 'min_samples_leaf': 9}, mean: 0.83493, std: 0.00242, params: {'n_estimators': 2000, 'learning_rate': 0.1, 'max_depth': 4, 'min_samples_leaf': 9}, mean: 0.83368, std: 0.00249, params: {'n_estimators': 3000, 'learning_rate': 0.1, 'max_depth': 4, 'min_samples_leaf': 9}, mean: 0.83669, std: 0.00174, params: {'n_estimators': 1000, 'learning_rate': 0.1, 'max_depth': 4, 'min_samples_leaf': 15}, mean: 0.83715, std: 0.00267, params: {'n_estimators': 2000, 'learning_rate': 0.1, 'max_depth': 4, 'min_samples_leaf': 15}, mean: 0.83590, std: 0.00264, params: {'n_estimators': 3000, 'learning_rate': 0.1, 'max_depth': 4, 'min_samples_leaf': 15}, mean: 0.83683, std: 0.00123, params: {'n_estimators': 1000, 'learning_rate': 0.1, 'max_depth': 6, 'min_samples_leaf': 3}, mean: 0.83672, std: 0.00126, params: {'n_estimators': 2000, 'learning_rate': 0.1, 'max_depth': 6, 'min_samples_leaf': 3}, mean: 0.83642, std: 0.00138, params: {'n_estimators': 3000, 'learning_rate': 0.1, 'max_depth': 6, 'min_samples_leaf': 3}, mean: 0.83686, std: 0.00217, params: {'n_estimators': 1000, 'learning_rate': 0.1, 'max_depth': 6, 'min_samples_leaf': 9}, mean: 0.83616, std: 0.00187, params: {'n_estimators': 2000, 'learning_rate': 0.1, 'max_depth': 6, 'min_samples_leaf': 9}, mean: 0.83541, std: 0.00188, params: {'n_estimators': 3000, 'learning_rate': 0.1, 'max_depth': 6, 'min_samples_leaf': 9}, mean: 0.83755, std: 0.00155, params: {'n_estimators': 1000, 'learning_rate': 0.1, 'max_depth': 6, 'min_samples_leaf': 15}, mean: 0.83602, std: 0.00125, params: {'n_estimators': 2000, 'learning_rate': 0.1, 'max_depth': 6, 'min_samples_leaf': 15}, mean: 0.83504, std: 0.00146, params: {'n_estimators': 3000, 'learning_rate': 0.1, 'max_depth': 6, 'min_samples_leaf': 15}, mean: 0.83159, std: 0.00199, params: {'n_estimators': 1000, 'learning_rate': 0.05, 'max_depth': 4, 'min_samples_leaf': 3}, mean: 0.83444, std: 0.00241, params: {'n_estimators': 2000, 'learning_rate': 0.05, 'max_depth': 4, 'min_samples_leaf': 3}, mean: 0.83544, std: 0.00223, params: {'n_estimators': 3000, 'learning_rate': 0.05, 'max_depth': 4, 'min_samples_leaf': 3}, mean: 0.83204, std: 0.00182, params: {'n_estimators': 1000, 'learning_rate': 0.05, 'max_depth': 4, 'min_samples_leaf': 9}, mean: 0.83496, std: 0.00233, params: {'n_estimators': 2000, 'learning_rate': 0.05, 'max_depth': 4, 'min_samples_leaf': 9}, mean: 0.83548, std: 0.00208, params: {'n_estimators': 3000, 'learning_rate': 0.05, 'max_depth': 4, 'min_samples_leaf': 9}, mean: 0.83380, std: 0.00185, params: {'n_estimators': 1000, 'learning_rate': 0.05, 'max_depth': 4, 'min_samples_leaf': 15}, mean: 0.83745, std: 0.00270, params: {'n_estimators': 2000, 'learning_rate': 0.05, 'max_depth': 4, 'min_samples_leaf': 15}, mean: 0.83832, std: 0.00334, params: {'n_estimators': 3000, 'learning_rate': 0.05, 'max_depth': 4, 'min_samples_leaf': 15}, mean: 0.83698, std: 0.00283, params: {'n_estimators': 1000, 'learning_rate': 0.05, 'max_depth': 6, 'min_samples_leaf': 3}, mean: 0.83833, std: 0.00332, params: {'n_estimators': 2000, 'learning_rate': 0.05, 'max_depth': 6, 'min_samples_leaf': 3}, mean: 0.83799, std: 0.00338, params: {'n_estimators': 3000, 'learning_rate': 0.05, 'max_depth': 6, 'min_samples_leaf': 3}, mean: 0.83716, std: 0.00246, params: {'n_estimators': 1000, 'learning_rate': 0.05, 'max_depth': 6, 'min_samples_leaf': 9}, mean: 0.83759, std: 0.00199, params: {'n_estimators': 2000, 'learning_rate': 0.05, 'max_depth': 6, 'min_samples_leaf': 9}, mean: 0.83729, std: 0.00182, params: {'n_estimators': 3000, 'learning_rate': 0.05, 'max_depth': 6, 'min_samples_leaf': 9}, mean: 0.83903, std: 0.00159, params: {'n_estimators': 1000, 'learning_rate': 0.05, 'max_depth': 6, 'min_samples_leaf': 15}, mean: 0.83928, std: 0.00207, params: {'n_estimators': 2000, 'learning_rate': 0.05, 'max_depth': 6, 'min_samples_leaf': 15}, mean: 0.83862, std: 0.00207, params: {'n_estimators': 3000, 'learning_rate': 0.05, 'max_depth': 6, 'min_samples_leaf': 15}, mean: 0.80416, std: 0.00012, params: {'n_estimators': 1000, 'learning_rate': 0.01, 'max_depth': 4, 'min_samples_leaf': 3}, mean: 0.82043, std: 0.00121, params: {'n_estimators': 2000, 'learning_rate': 0.01, 'max_depth': 4, 'min_samples_leaf': 3}, mean: 0.82661, std: 0.00154, params: {'n_estimators': 3000, 'learning_rate': 0.01, 'max_depth': 4, 'min_samples_leaf': 3}, mean: 0.80566, std: 0.00118, params: {'n_estimators': 1000, 'learning_rate': 0.01, 'max_depth': 4, 'min_samples_leaf': 9}, mean: 0.82172, std: 0.00089, params: {'n_estimators': 2000, 'learning_rate': 0.01, 'max_depth': 4, 'min_samples_leaf': 9}, mean: 0.82785, std: 0.00182, params: {'n_estimators': 3000, 'learning_rate': 0.01, 'max_depth': 4, 'min_samples_leaf': 9}, mean: 0.80542, std: 0.00182, params: {'n_estimators': 1000, 'learning_rate': 0.01, 'max_depth': 4, 'min_samples_leaf': 15}, mean: 0.82148, std: 0.00052, params: {'n_estimators': 2000, 'learning_rate': 0.01, 'max_depth': 4, 'min_samples_leaf': 15}, mean: 0.82741, std: 0.00095, params: {'n_estimators': 3000, 'learning_rate': 0.01, 'max_depth': 4, 'min_samples_leaf': 15}, mean: 0.82472, std: 0.00189, params: {'n_estimators': 1000, 'learning_rate': 0.01, 'max_depth': 6, 'min_samples_leaf': 3}, mean: 0.83270, std: 0.00156, params: {'n_estimators': 2000, 'learning_rate': 0.01, 'max_depth': 6, 'min_samples_leaf': 3}, mean: 0.83506, std: 0.00197, params: {'n_estimators': 3000, 'learning_rate': 0.01, 'max_depth': 6, 'min_samples_leaf': 3}, mean: 0.82550, std: 0.00247, params: {'n_estimators': 1000, 'learning_rate': 0.01, 'max_depth': 6, 'min_samples_leaf': 9}, mean: 0.83370, std: 0.00171, params: {'n_estimators': 2000, 'learning_rate': 0.01, 'max_depth': 6, 'min_samples_leaf': 9}, mean: 0.83645, std: 0.00171, params: {'n_estimators': 3000, 'learning_rate': 0.01, 'max_depth': 6, 'min_samples_leaf': 9}, mean: 0.82603, std: 0.00174, params: {'n_estimators': 1000, 'learning_rate': 0.01, 'max_depth': 6, 'min_samples_leaf': 15}, mean: 0.83392, std: 0.00143, params: {'n_estimators': 2000, 'learning_rate': 0.01, 'max_depth': 6, 'min_samples_leaf': 15}, mean: 0.83648, std: 0.00186, params: {'n_estimators': 3000, 'learning_rate': 0.01, 'max_depth': 6, 'min_samples_leaf': 15}]
We could look at various metrics as well, passing scoring
optional parameter to GridSearchCV
.
mean_squared_error
score by providing scoring
parameter to GridSearchCV
.