XGBoost tuning - Toy模板网


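What I want to make clear: don't expect parameter tuning, or switching to a slightly better model, to produce a dramatic jump in the final result. Large gains come from feature engineering and model ensembling.
Article source: https://www.toymoban.com/news/detail-432404.html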

Basics

import xgboost as xgb
model2 = xgb.XGBRegressor(max_depth=6, learning_rate=0.05, n_estimators=100, min_child_weight=1, random_state=42)

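max_depth: maximum depth of each (binary) tree, default 6. Larger values make the model more prone to overfitting; smaller values make it more prone to underfitting.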

learning_rate: the learning rate (eta), i.e. the factor that shrinks each new tree's contribution.

n_estimators: the number of base learners (boosted trees).

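min_child_weight: default 1. The minimum sum of instance weights (hessian) required in a child node. Larger values make the model more prone to underfitting, smaller values to overfitting (a larger value keeps the model from learning highly specific local samples).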

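gamma (alias min_split_loss): default 0, and 0 is also the value we commonly use. A node is split only if the split lowers the loss function, and gamma sets the minimum loss reduction a split must achieve. The larger gamma is, the more conservative the algorithm: the loss has to drop by more before a node can be split, so trees split less readily. Range: [0, ∞).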

subsample [default=1]: the row sampling ratio. If set to 0.5, XGBoost randomly samples half of the training instances to grow each tree.

colsample_bytree [default=1]: the fraction of columns (features) sampled when constructing each tree.

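alpha [default=0, alias: reg_alpha]: L1 regularization on the leaf weights (similar to the regularization in lasso regression). It is mainly useful for very high-dimensional data, where it can make training faster.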
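How gamma and min_child_weight (together with lambda, i.e. reg_lambda, the L2 term with default 1, which the list above does not cover) decide whether a node is split can be illustrated with the split-gain formula from the XGBoost paper (Chen and Guestrin, 2016). The sketch below is a simplified illustration, not XGBoost's internal code: G and H are the sums of first- and second-order gradients of the loss over the samples falling into each child, and the function names are hypothetical.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction of a split, per the XGBoost paper's split-gain formula."""
    def score(g, h):
        # structure score of one node: G^2 / (H + lambda)
        return g * g / (h + reg_lambda)

    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - gamma


def split_allowed(g_left, h_left, g_right, h_right,
                  reg_lambda=1.0, gamma=0.0, min_child_weight=1.0):
    """Simplified rule: both children need enough hessian weight, and the
    gain left after subtracting gamma must be positive."""
    if h_left < min_child_weight or h_right < min_child_weight:
        return False
    return split_gain(g_left, h_left, g_right, h_right, reg_lambda, gamma) > 0


# Hypothetical gradient (G) and hessian (H) sums for one candidate split
print(round(split_gain(-4.0, 5.0, 3.0, 4.0), 4))                 # 2.1833
print(split_allowed(-4.0, 5.0, 3.0, 4.0))                        # True
print(split_allowed(-4.0, 5.0, 3.0, 4.0, gamma=3.0))             # False: gain does not exceed gamma
print(split_allowed(-4.0, 5.0, 3.0, 4.0, min_child_weight=5.0))  # False: right child H = 4 < 5
```

Raising either gamma or min_child_weight blocks the same split, which is why both push the model toward underfitting.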

Tuning max_depth and min_child_weight

Using grid search

GridSearchCV:

estimator: the model to tune (a classifier or a regressor).

param_grid: a dict mapping each parameter name to the list of values to try.

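cv: the cross-validation setting. The default is None, which meant 3-fold cross-validation in older scikit-learn releases and 5-fold since version 0.22. Pass an integer to set the number of folds, or a CV splitter or iterable that yields train/test splits.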

n_jobs: the number of jobs to run in parallel, e.g. 4.

verbose: how much progress output to print; 10 prints detailed progress.

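In Jupyter, no progress was printed with n_jobs=4 for reasons I could not pin down; setting n_jobs=1 makes the progress appear. (A likely cause: with several jobs the messages are printed by worker processes, whose output goes to the terminal running the Jupyter server rather than to the notebook.)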
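GridSearchCV tries every combination in param_grid and fits each one once per cross-validation fold; with the default refit=True it then fits the best combination once more on the whole training set. A pure-Python sketch of that count (count_fits is a hypothetical helper, not part of scikit-learn), using the same values as the first search below:

```python
from itertools import product


def count_fits(param_grid, cv, refit=True):
    """Return (number of candidates, number of model fits) for an exhaustive grid search."""
    # Every combination of the listed values is one candidate
    candidates = list(product(*param_grid.values()))
    # Each candidate is fit once per fold; refit adds one final fit on all the data
    return len(candidates), len(candidates) * cv + (1 if refit else 0)


grid = {
    'max_depth': list(range(3, 10, 2)),        # [3, 5, 7, 9]
    'min_child_weight': list(range(1, 6, 2)),  # [1, 3, 5]
}
n_candidates, n_fits = count_fits(grid, cv=5)
print(n_candidates, n_fits)  # 12 61
```

Adding a value to any parameter or raising cv grows the run time proportionally, which is one reason to start with a coarse grid and narrow it step by step.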

from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

param_test1 = {
    'max_depth': list(range(3, 10, 2)),       # 3, 5, 7, 9
    'min_child_weight': list(range(1, 6, 2))  # 1, 3, 5
}
gsearch1 = GridSearchCV(
    estimator=XGBClassifier(learning_rate=0.1, n_estimators=20, max_depth=5, min_child_weight=1,
                            gamma=0, subsample=0.8, colsample_bytree=0.8,
                            objective='binary:logistic', n_jobs=4, scale_pos_weight=1, random_state=27),
    param_grid=param_test1,
    # scoring='roc_auc',
    n_jobs=1,     # 1 so that the progress output shows up in Jupyter
    verbose=10,   # print detailed progress
    cv=5)         # 5-fold cross-validation

gsearch1.fit(train[predictors], train[target])
gsearch1.best_params_

Output: {'max_depth': 3, 'min_child_weight': 5}

So the next search covers max_depth 2, 3, 4 and min_child_weight 4, 5, 6.
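Python's range() excludes its stop value, which is easy to trip over when narrowing a grid around a previous best: range(2, 4, 1) yields only 2 and 3, not 2, 3, 4. A minimal sketch of a helper (neighborhood is a hypothetical name) that builds the values one step either side of the best value explicitly:

```python
def neighborhood(best, step=1, lower=None):
    """Values one step either side of `best`, optionally dropping those below `lower`."""
    values = [best - step, best, best + step]
    if lower is not None:
        values = [v for v in values if v >= lower]
    return values


print(list(range(2, 4, 1)))       # [2, 3]: the stop value 4 is excluded
print(neighborhood(3))            # [2, 3, 4]
print(neighborhood(5))            # [4, 5, 6]
print(neighborhood(1, lower=1))   # [1, 2], e.g. to keep max_depth >= 1
```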

Refining the search

param_test2 = {
    'max_depth': list(range(2, 5)),        # 2, 3, 4 (range excludes the stop value)
    'min_child_weight': list(range(4, 7))  # 4, 5, 6
}
gsearch2 = GridSearchCV(
    estimator=xgb.XGBRegressor(max_depth=6, learning_rate=0.05, n_estimators=100, random_state=42),
    param_grid=param_test2,
    n_jobs=1,
    verbose=10,
    cv=5)

gsearch2.fit(df6.drop(['label', 'cust_wid'], axis=1), df6['label'])
gsearch2.best_params_

The best combination is still {'max_depth': 3, 'min_child_weight': 5}.

Tuning gamma

Search gamma from 0 up to (but not including) 0.5, in steps of 0.1.

param_test3 = {
    'gamma': [i / 10.0 for i in range(0, 5)]  # 0.0, 0.1, 0.2, 0.3, 0.4
}
gsearch3 = GridSearchCV(
    estimator=xgb.XGBRegressor(max_depth=3, learning_rate=0.05, min_child_weight=5, n_estimators=100, random_state=42),
    param_grid=param_test3,
    n_jobs=1,
    verbose=10,
    cv=5)

gsearch3.fit(df6.drop(['label', 'cust_wid'], axis=1), df6['label'])
gsearch3.best_params_

Best gamma: 0.4
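One thing worth checking after each step: the grid [i/10.0 for i in range(0,5)] stops at 0.4, so the best gamma sits on the edge of the searched range and larger values were never tried (the same happens later with subsample 0.75 and reg_alpha 100). A minimal sketch of such a check (best_on_edge is a hypothetical helper) that flags when a grid should be extended:

```python
def best_on_edge(best_value, grid):
    """True if the best value is the smallest or largest value that was searched."""
    return best_value in (min(grid), max(grid))


gamma_grid = [i / 10.0 for i in range(0, 5)]         # [0.0, 0.1, 0.2, 0.3, 0.4]
ratio_grid = [i / 100.0 for i in range(75, 100, 5)]   # [0.75, 0.8, 0.85, 0.9, 0.95]
print(best_on_edge(0.4, gamma_grid))    # True: values above 0.4 were never tried
print(best_on_edge(0.85, ratio_grid))   # False: the optimum is inside the grid
```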

Tuning subsample and colsample_bytree

param_test4 = {
    'subsample': [i / 100.0 for i in range(75, 100, 5)],        # 0.75, 0.8, ..., 0.95
    'colsample_bytree': [i / 100.0 for i in range(75, 100, 5)]
}
gsearch4 = GridSearchCV(
    estimator=xgb.XGBRegressor(max_depth=3, learning_rate=0.05, min_child_weight=5, n_estimators=100, random_state=42),
    param_grid=param_test4,
    n_jobs=1,
    verbose=10,
    cv=5)

gsearch4.fit(df6.drop(['label', 'cust_wid'], axis=1), df6['label'])
gsearch4.best_params_

Best: {'colsample_bytree': 0.85, 'subsample': 0.75}
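For intuition about what the two ratios tuned here control, a simplified sketch of per-tree sampling. It assumes each row is kept independently with probability subsample (similar in spirit to XGBoost's default uniform sampling) and that colsample_bytree keeps a fixed share of the feature columns, drawn without replacement, for each tree. It is an illustration, not XGBoost's internal code, and sample_for_tree is a hypothetical name.

```python
import random


def sample_for_tree(n_rows, n_cols, subsample, colsample_bytree, rng):
    """Pick the row indices and feature columns one tree would be trained on."""
    # Row subsampling: each row is kept independently with probability `subsample`
    rows = [i for i in range(n_rows) if rng.random() < subsample]
    # Column subsampling: a fixed share of the features (at least one), without replacement
    n_keep = max(1, round(colsample_bytree * n_cols))
    cols = sorted(rng.sample(range(n_cols), n_keep))
    return rows, cols


rng = random.Random(42)
rows, cols = sample_for_tree(1000, 20, subsample=0.75, colsample_bytree=0.85, rng=rng)
print(len(rows), len(cols))  # roughly 750 rows, exactly 17 of the 20 features
```

Each tree sees a different random subset, which decorrelates the trees; smaller ratios add more randomness and regularization.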

Tuning the regularization parameter (reg_alpha)

param_test5 = {
    'reg_alpha': [1e-5, 1e-2, 0.1, 1, 100]
}
gsearch5 = GridSearchCV(
    estimator=xgb.XGBRegressor(max_depth=3, learning_rate=0.05, min_child_weight=5, n_estimators=100, random_state=42,
                               colsample_bytree=0.85, subsample=0.75),
    param_grid=param_test5,
    n_jobs=1,
    verbose=10,
    cv=5)

gsearch5.fit(df6.drop(['label', 'cust_wid'], axis=1), df6['label'])
gsearch5.best_params_

Output: {'reg_alpha': 100}
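Why a large reg_alpha such as 100 can change the model noticeably: with an L1 term alpha and an L2 term lambda, the best weight w of a leaf whose gradient sum is G and hessian sum is H minimizes G*w + (H + lambda)*w²/2 + alpha*|w|. Solving this gives a soft-thresholded value, so any leaf with |G| <= alpha gets weight 0 and is effectively switched off. A sketch of that closed form (leaf_weight is a hypothetical name; this is the textbook derivation, not XGBoost's source):

```python
def leaf_weight(grad_sum, hess_sum, reg_lambda=1.0, reg_alpha=0.0):
    """argmin over w of G*w + 0.5*(H + lambda)*w**2 + alpha*|w|."""
    # L1: soft-threshold the gradient sum by alpha (shrinks toward zero, may hit zero)
    shrunk = max(abs(grad_sum) - reg_alpha, 0.0)
    sign = 1.0 if grad_sum > 0 else -1.0
    # L2: divide by the hessian sum plus lambda
    return -sign * shrunk / (hess_sum + reg_lambda)


print(leaf_weight(-30.0, 9.0))                    # 3.0
print(leaf_weight(-30.0, 9.0, reg_alpha=10.0))    # 2.0
print(leaf_weight(-30.0, 9.0, reg_alpha=100.0))   # 0.0: |G| = 30 <= alpha
```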

Tuning the learning rate

param_test6 = {
    'learning_rate': [0.005, 0.01, 0.05, 0.1, 0.5, 1]
}
gsearch6 = GridSearchCV(
    estimator=xgb.XGBRegressor(max_depth=3, learning_rate=0.05, min_child_weight=5, n_estimators=100, random_state=42,
                               colsample_bytree=0.85, subsample=0.75),
    param_grid=param_test6,
    n_jobs=1,
    verbose=10,
    cv=5)

gsearch6.fit(df6.drop(['label', 'cust_wid'], axis=1), df6['label'])
gsearch6.best_params_

Best learning rate: {'learning_rate': 0.05}
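Note that the learning rate was searched with n_estimators held at 100, and the two interact: each new tree's output is multiplied by learning_rate before it is added to the ensemble. A toy sketch, assuming squared loss and an idealized "tree" that predicts the current residual exactly (real trees do not), shows the remaining residual shrinking by a factor of (1 - learning_rate) per round, so with learning_rate=0.05 and 100 trees about 0.95**100 ≈ 0.6% of it is left. remaining_residual is a hypothetical name.

```python
def remaining_residual(learning_rate, n_estimators, target=1.0):
    """Residual left after boosting with idealized trees (squared loss)."""
    prediction = 0.0
    for _ in range(n_estimators):
        residual = target - prediction
        tree_output = residual                     # idealized tree: fits the residual exactly
        prediction += learning_rate * tree_output  # shrink the tree's contribution
    return target - prediction


for lr, n in [(0.5, 100), (0.05, 100), (0.05, 20), (0.005, 100)]:
    print(lr, n, round(remaining_residual(lr, n), 4))
```

With 0.005 about 60% of the residual is still left after 100 trees, while 0.05 leaves under 1%; a smaller learning rate generally needs a larger n_estimators, so the two are best tuned together.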

The best model: results
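Each search step tuned only one or two parameters, and the later searches did not carry every earlier result forward (gamma=0.4, for instance, is absent from the subsample, reg_alpha and learning-rate searches), so the final model needs all the best values collected in one place. A minimal sketch in plain Python (the variable names are hypothetical) that merges the best_params_ reported by each step, in order, on top of the base settings used in the searches; later steps override earlier ones:

```python
# Settings shared by the searches above
base_params = {'learning_rate': 0.05, 'n_estimators': 100, 'random_state': 42}

# best_params_ reported by each search step, in the order they were run
step_results = [
    {'max_depth': 3, 'min_child_weight': 5},
    {'gamma': 0.4},
    {'colsample_bytree': 0.85, 'subsample': 0.75},
    {'reg_alpha': 100},
    {'learning_rate': 0.05},
]

final_params = dict(base_params)
for best in step_results:
    final_params.update(best)  # carry every tuned value forward

print(final_params)
```

The merged dict can then be unpacked into the estimator, e.g. xgb.XGBRegressor(**final_params), and fit on df6 before predicting on df7.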

In the end the score improved by just 0.001. Hahaha.

My ranking rose 7 places, from 103rd to 96th, which brings me back to the point I started with:

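What I want to make clear: don't expect parameter tuning, or switching to a slightly better model, to produce a dramatic jump in the final result. Large gains come from feature engineering and model ensembling.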

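import numpy as np
# assumes model2 has been rebuilt with the tuned parameters above and fit on df6
pred_y = model2.predict(df7.drop(['label', 'cust_wid'], axis=1))
y_pred = pred_y.astype(int)      # cast the regression output to integer labels
np.save("xgboost_best", y_pred)  # writes xgboost_best.npy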
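A note on the last step: model2 is a regressor, so its predictions are floats, and astype(int) truncates toward zero rather than rounding to the nearest integer (0.97 becomes 0). If the labels are integers, rounding first may be what is intended. A small NumPy comparison on made-up prediction values:

```python
import numpy as np

pred_y = np.array([0.97, 0.49, 1.51, 2.02, -0.6])  # made-up regressor outputs
truncated = pred_y.astype(int)                     # drops the fractional part (toward zero)
rounded = np.rint(pred_y).astype(int)              # rounds to the nearest integer (ties to even)
print(truncated.tolist())  # [0, 0, 1, 2, 0]
print(rounded.tolist())    # [1, 0, 2, 2, -1]
```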
