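About the author
I previously led projects at a retail company, where I was responsible for its data-driven transformation and data-based management. I then worked for China Mobile, heading the product recommendation modeling team in the customer service department. I now work in the big data center of a financial investment company, where I am responsible for risk-control data infrastructure and risk-control modeling. Outside of work, I also enjoy entering competitions on DC, DF, Tianchi, and Kaggle. I have some experience in machine learning and would like to share what I have seen, heard, and thought, so that we can all improve together.
Background
This section supplements the previous one, so let's keep it short and get started.
Tool: Python
Data: the sklearn Boston dataset
Regression comparison
Load packages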
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston # dataset (load_boston was removed in scikit-learn 1.2, so an older version is required)
import warnings
warnings.filterwarnings('ignore')
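Load the data
boston = load_boston()
data = boston['data'] # independent variables (features)
tag = boston['target'] # dependent variable (target)
train = pd.DataFrame(data,columns = boston['feature_names'])
train.head() # show the first 5 rows
Split the dataset
# Split into training and test sets at a 70/30 ratio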
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(train,tag,test_size=0.3)
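Because the split above does not set a seed, the rows that land in the test set change from run to run, and so do the scores reported later. A minimal sketch of a reproducible split follows. The random arrays are a stand-in with the same shape as the Boston features (an assumption for illustration, not the original data), and the seed value 42 is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in shaped like the Boston features (506 rows, 13 columns); illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))
y = rng.normal(size=506)

# random_state fixes the shuffle, so the same rows land in train/test on every run
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
X_train2, X_test2, y_train2, y_test2 = train_test_split(X, y, test_size=0.3, random_state=42)

print(X_train.shape, X_test.shape)
print("identical test sets:", np.array_equal(X_test, X_test2))
```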
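Lasso regression
Train the model
from sklearn.linear_model import Lasso,LassoCV
# Note: alpha=0 switches off the L1 penalty, so this fit is effectively ordinary least squares
lasso_model = Lasso(alpha=0)
lasso_model.fit(X_train,y_train)
Intercept
lasso_model.intercept_
34.96418799738776
Coefficients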
lasso_model.coef_
array([-1.19919178e-01, 5.72987550e-02, -5.72944321e-04, 3.10733953e+00,
-1.54141456e+01, 3.34103546e+00, 6.82197295e-03, -1.60447015e+00,
2.87849366e-01, -1.12510494e-02, -8.29993838e-01, 1.23529187e-02,
-5.54979548e-01])
R2 score
lasso_model.score(X_test,y_test)
0.7581409138375204
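For reference, the score method of scikit-learn regressors returns the coefficient of determination, R2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the true values from their mean. The sketch below recomputes it by hand. It uses synthetic data from make_regression as a stand-in (an assumption, not the Boston data) and a simple slice for the train/test split.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Synthetic regression problem; stands in for the Boston data (assumption)
X, y = make_regression(n_samples=200, n_features=5, noise=15.0, random_state=1)
model = LinearRegression().fit(X[:140], y[:140])
X_test, y_test = X[140:], y[140:]
pred = model.predict(X_test)

ss_res = np.sum((y_test - pred) ** 2)            # residual sum of squares
ss_tot = np.sum((y_test - y_test.mean()) ** 2)   # total sum of squares around the mean
r2_manual = 1.0 - ss_res / ss_tot

print("manual R2:", r2_manual)
print("model.score:", model.score(X_test, y_test))
```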
Prediction
predict_lasso = lasso_model.predict(X_test)
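With alpha=0 the L1 penalty is inactive, so the fit above does not show Lasso's shrinkage. One possible alternative, sketched here, is to let LassoCV (imported above but not used) choose alpha by cross-validation. The data is synthetic, generated with make_regression as a stand-in for the Boston set, and the 5-fold setting and seeds are assumptions made for the sketch.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the Boston set (assumption): 13 features, 5 of them informative
X, y = make_regression(n_samples=506, n_features=13, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# LassoCV fits a grid of alpha values and keeps the one with the best cross-validated error
lasso_cv = LassoCV(cv=5, random_state=0)
lasso_cv.fit(X_train, y_train)

print("chosen alpha:", lasso_cv.alpha_)
print("coefficients set exactly to zero:", int(np.sum(lasso_cv.coef_ == 0)))
print("test R2:", lasso_cv.score(X_test, y_test))
```

A positive alpha is what allows Lasso to drive some coefficients exactly to zero, which is the main practical difference from ridge and plain linear regression.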
Ridge regression
Train the model
from sklearn.linear_model import Ridge,RidgeCV
Ridge_model = RidgeCV([0.1, 1.0, 10.0])
Ridge_model.fit(X_train,y_train)
Intercept
Ridge_model.intercept_
36.10541844785573
Coefficients
Ridge_model.coef_
array([-1.20871429e-01, 5.71935816e-02, 4.56938008e-03, 3.23235488e+00,
-1.70457702e+01, 3.34022188e+00, 7.95820534e-03, -1.63737973e+00,
2.88983931e-01, -1.09687243e-02, -8.50817455e-01, 1.22566062e-02,
-5.51318511e-01])
R2 score
Ridge_model.score(X_test,y_test)
0.7578494354278809
Prediction
predict_Ridge = Ridge_model.predict(X_test)
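RidgeCV picks one value from the candidate list [0.1, 1.0, 10.0] by cross-validation, and the selected value is available afterwards as the alpha_ attribute. A small sketch of inspecting it, again on synthetic data from make_regression (an assumption, not the Boston data):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Synthetic stand-in for the Boston data (assumption)
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)

alphas = [0.1, 1.0, 10.0]           # same candidate penalties as in the article
ridge_cv = RidgeCV(alphas=alphas)   # default cv=None uses efficient leave-one-out cross-validation
ridge_cv.fit(X, y)

print("alpha selected by cross-validation:", ridge_cv.alpha_)
```

Checking alpha_ shows how much regularization was actually applied. If the smallest candidate wins, the ridge fit will be close to plain linear regression.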
Linear regression
Train the model
from sklearn.linear_model import LinearRegression
reg_model = LinearRegression()
reg_model.fit(X_train,y_train)
Intercept
reg_model.intercept_
37.31595301702461
Coefficients
reg_model.coef_
array([-1.21569065e-01, 5.71104425e-02, 1.05894959e-02, 3.24822330e+00,
-1.85287339e+01, 3.31408142e+00, 9.40444349e-03, -1.66164006e+00,
2.90832366e-01, -1.07740018e-02, -8.71234048e-01, 1.21634944e-02,
-5.49911287e-01])
R2 score
reg_model.score(X_test,y_test)
0.7575118129530991
Prediction
predict_reg = reg_model.predict(X_test)
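Judging by the R2 scores, Lasso regression (0.7581) and ridge regression (0.7578) did slightly better than linear regression (0.7575) in this run, but the gap is negligible. Sampling 20 points shows that the fitted values lie close to the true values, and the fits of the three models differ very little.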
Linear regression results:
Lasso regression results:
Ridge regression results:
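The article does not include the plotting code behind the 20-point comparison, so the following is only a sketch of how such a comparison could be drawn. The data is synthetic (make_regression), the Lasso alpha of 0.1 is arbitrary, and the random choice of 20 test points is an assumption about how the points were sampled.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, RidgeCV
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Boston data (assumption)
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Linear regression": LinearRegression(),
    "Lasso regression": Lasso(alpha=0.1),  # alpha chosen arbitrarily for the sketch
    "Ridge regression": RidgeCV(alphas=[0.1, 1.0, 10.0]),
}

# Pick 20 distinct test points at random to compare predictions with true values
rng = np.random.default_rng(0)
idx = rng.choice(len(y_test), size=20, replace=False)

fig, axes = plt.subplots(1, 3, figsize=(15, 4), sharey=True)
for ax, (name, model) in zip(axes, models.items()):
    pred = model.fit(X_train, y_train).predict(X_test)
    ax.plot(y_test[idx], "o-", label="true value")
    ax.plot(pred[idx], "x--", label="prediction")
    ax.set_title(name)
    ax.legend()
fig.tight_layout()
```

Calling fig.savefig with a file name writes the three panels to disk. In an interactive session, drop the matplotlib.use line and call plt.show() instead.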