Title
Predicting housing prices and analyzing real estate market in the Chicago suburbs using Machine Learning
Authors
Abstract
The pricing of housing properties is determined by a variety of factors. However, post-pandemic markets in the Chicago suburban area have experienced volatility that has greatly affected house prices. In this study, the Naperville/Bolingbrook real estate market was analyzed in order to predict property prices from housing attributes with machine learning models, and to evaluate the effectiveness of such models in a volatile market. Sales data from 2018 through summer 2022 were collected from Redfin, a real estate website. Analyzing sales over this period also allows us to assess the state of the housing market and identify price trends. Five models were used: linear regression, support vector regression, decision tree regression, random forest regression, and XGBoost regression. The models were compared on mean absolute error (MAE), root mean squared error (RMSE), and R-squared. The XGBoost model performed best at predicting house prices despite the additional volatility introduced by post-pandemic conditions. After modeling, Shapley values (SHAP) were used to evaluate how much each variable contributed to the models' predictions.
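The three evaluation metrics named in the abstract have standard definitions. The following is a minimal Python sketch of them, assuming plain lists of observed and predicted sale prices; the function name `evaluate` and the sample prices are hypothetical illustrations, not the study's code or data.

```python
import math


def evaluate(y_true, y_pred):
    """Return (MAE, RMSE, R-squared) for paired observed/predicted values.

    Hypothetical helper for illustration; not taken from the study.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average magnitude of the residuals.
    mae = sum(abs(e) for e in errors) / n

    # Root mean squared error: penalizes large residuals more heavily.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R-squared: 1 - (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


if __name__ == "__main__":
    # Hypothetical sale prices in dollars, not from the Redfin dataset.
    observed = [300000, 400000, 500000]
    predicted = [310000, 390000, 500000]
    mae, rmse, r2 = evaluate(observed, predicted)
    print(f"MAE={mae:.2f} RMSE={rmse:.2f} R2={r2:.2f}")  # MAE=6666.67 RMSE=8164.97 R2=0.99
```

When comparing several models on the same test set, lower MAE and RMSE and an R-squared closer to 1 indicate better predictions. RMSE is always at least as large as MAE, and a wide gap between the two signals a few large errors, which is worth checking in a volatile market.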