

House price prediction with gradient boosted trees under different loss functions
Journal of Property Research, Pub Date: 2022-05-24, DOI: 10.1080/09599916.2022.2070525
Anders Hjort$^{1,2}$, Johan Pensar$^{1}$, Ida Scheel$^{1}$, Dag Einar Sommervoll$^{2,3}$


ABSTRACT

Many banks and credit institutions are required to assess the value of dwellings in their mortgage portfolio. This valuation often relies on an Automated Valuation Model (AVM). Moreover, these institutions often report the model's accuracy by two numbers: the fractions of predictions within $\pm 20\%$ and $\pm 10\%$ of the true values. Until recently, AVMs tended to be hedonic regression models, but lately machine learning approaches like random forest and gradient boosted trees have been increasingly applied. Both the traditional approaches and the machine learning approaches rely on minimising mean squared prediction error, and not on maximising the number of predictions in the $\pm 20\%$ and $\pm 10\%$ range. We investigate whether introducing a loss function closer to the AVM's actual loss measure improves performance in machine learning approaches, specifically for a gradient boosted tree approach. This loss function yields an improvement from $89.4\%$ to $90.0\%$ of predictions within $\pm 20\%$ of the true value on a data set of $N = 126\,719$ transactions from the Norwegian housing market between 2013 and 2015, with the biggest improvements in performance coming from the lower price segments. We also find that a weighted average of models with different loss functions improves performance further, yielding $90.4\%$ of the observations within $\pm 20\%$ of the true value.
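The accuracy measure described in the abstract can be illustrated with a minimal Python sketch. It assumes that "within $\pm 20\%$" means a relative error $|\hat{y} - y| / y \le 0.20$, which the abstract does not spell out. The function names `fraction_within` and `blend`, the equal blend weight, and all numbers are hypothetical and chosen only for illustration; they are not the authors' code or data.

```python
from typing import List, Sequence


def fraction_within(y_true: Sequence[float], y_pred: Sequence[float], tol: float) -> float:
    """Share of predictions with relative error |pred - true| / true at most tol."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Compare |pred - true| with tol * true (prices are assumed to be positive).
    hits = sum(abs(p - t) <= tol * t for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def blend(pred_a: Sequence[float], pred_b: Sequence[float], weight: float) -> List[float]:
    """Weighted average of two models' predictions; weight is the share given to model A."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(pred_a, pred_b)]


# Made-up sale prices and predictions from two hypothetical models.
prices = [100.0, 200.0, 300.0, 400.0]
model_a = [85.0, 250.0, 310.0, 330.0]
model_b = [125.0, 210.0, 250.0, 450.0]

print(fraction_within(prices, model_a, 0.20))                       # 0.75
print(fraction_within(prices, model_a, 0.10))                       # 0.25
print(fraction_within(prices, model_b, 0.20))                       # 0.75
print(fraction_within(prices, blend(model_a, model_b, 0.5), 0.20))  # 1.0
```

In this toy example each model misses on a different observation, so the equal-weight blend places every prediction inside the $\pm 20\%$ band. This only shows the mechanism by which averaging models can help; it says nothing about the size of the effect reported in the paper.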


Updated: 2022-05-24
