Paper Title
Estimation of discrete choice models with hybrid stochastic adaptive batch size algorithms
Paper Authors
Paper Abstract
The emergence of Big Data has enabled new research perspectives in the discrete choice community. While the techniques to estimate Machine Learning models on massive amounts of data are well established, these have not yet been fully explored for the estimation of statistical Discrete Choice Models based on the random utility framework. In this article, we provide new ways of dealing with large datasets in the context of Discrete Choice Models. We achieve this by proposing new, efficient stochastic optimization algorithms and extensively testing them alongside existing approaches. We develop these algorithms based on three main contributions: the use of a stochastic Hessian, the modification of the batch size, and a change of optimization algorithm depending on the batch size. A comprehensive experimental comparison of fifteen optimization algorithms is conducted across ten benchmark Discrete Choice Model cases. The results indicate that the HAMABS algorithm, a hybrid adaptive batch size stochastic method, is the best-performing algorithm across the optimization benchmarks. This algorithm speeds up the optimization time by a factor of 23 on the largest model compared to existing algorithms used in practice. The integration of the new algorithms in Discrete Choice Models estimation software will significantly reduce the time required for model estimation and therefore enable researchers and practitioners to explore new approaches for the specification of choice models.
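To make the three contributions named above concrete, here is a minimal sketch of a hybrid adaptive batch size scheme on a binary logit likelihood: a stochastic Hessian estimated on the current batch, a batch size that grows over iterations, and a switch from a cheap first-order step to a Newton-type step once the batch is large. This is an illustrative assumption of how such a hybrid could look, not the authors' HAMABS algorithm; all function names, thresholds, and hyperparameters below are hypothetical.

```python
import numpy as np

def nll_grad_hess(beta, X, y):
    """Gradient and stochastic Hessian of a binary logit NLL on a (mini)batch.

    Hypothetical helper for illustration; not from the paper.
    """
    p = 1.0 / (1.0 + np.exp(-X @ beta))        # choice probabilities
    grad = X.T @ (p - y) / len(y)              # stochastic gradient
    W = p * (1.0 - p)                          # logit curvature weights
    hess = (X * W[:, None]).T @ X / len(y)     # stochastic Hessian
    return grad, hess

def hybrid_adaptive_fit(X, y, iters=60, batch0=64, growth=1.5,
                        switch_frac=0.5, lr=0.5, seed=0):
    """Sketch of a hybrid adaptive batch size optimizer (assumed design)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta = np.zeros(d)
    batch = float(batch0)
    for _ in range(iters):
        idx = rng.choice(n, size=min(int(batch), n), replace=False)
        grad, hess = nll_grad_hess(beta, X[idx], y[idx])
        if batch >= switch_frac * n:
            # Large batch: the stochastic Hessian is reliable enough
            # to take a (regularized) Newton step.
            beta -= np.linalg.solve(hess + 1e-6 * np.eye(d), grad)
        else:
            # Small batch: cheap first-order stochastic step.
            beta -= lr * grad
        batch *= growth                        # adaptively grow the batch
    return beta
```

The switch point (`switch_frac`) captures the idea that second-order information is only worth its cost once the batch is large enough for the Hessian estimate to be stable; the growth factor trades early cheap noisy steps against later accurate ones.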