Paper title
Distribution Regression for Sequential Data
Paper authors
Paper abstract
Distribution regression refers to the supervised learning problem where labels are only available for groups of inputs instead of individual inputs. In this paper, we develop a rigorous mathematical framework for distribution regression where inputs are complex data streams. Leveraging properties of the expected signature and a recent signature kernel trick for sequential data from stochastic analysis, we introduce two new learning techniques, one feature-based and the other kernel-based. Each is suited to a different data regime in terms of the number of data streams and the dimensionality of the individual streams. We provide theoretical results on the universality of both approaches and demonstrate empirically their robustness to irregularly sampled multivariate time-series, achieving state-of-the-art performance on both synthetic and real-world examples from thermodynamics, mathematical finance and agricultural science.
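To make the feature-based idea concrete, here is a minimal sketch of distribution regression with expected-signature features. It is an illustration under stated assumptions, not the authors' implementation: the signature is truncated at level 2 and computed directly for piecewise-linear paths, the expected signature is approximated by an empirical mean over the streams in each group, and a plain ridge regression stands in for the final regressor. The function names (`signature_level2`, `expected_signature`, `fit_ridge`, `predict`) and the toy data are hypothetical.

```python
import numpy as np


def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path of shape (T, d)."""
    path = np.asarray(path, dtype=float)
    dx = np.diff(path, axis=0)            # increments of each segment, shape (T-1, d)
    level1 = dx.sum(axis=0)               # total displacement
    start = path[:-1] - path[0]           # displacement accumulated before each segment
    # Iterated integral: cross terms between past displacement and the current
    # increment, plus the within-segment contribution dx (x) dx / 2.
    level2 = start.T @ dx + 0.5 * (dx.T @ dx)
    return np.concatenate([level1, level2.ravel()])


def expected_signature(streams):
    """Empirical mean of truncated signatures over a group of streams.

    Streams may have different lengths, so irregular sampling is allowed.
    """
    return np.mean([signature_level2(p) for p in streams], axis=0)


def fit_ridge(features, labels, reg=1e-3):
    """Closed-form ridge regression with an intercept column (assumed regressor)."""
    X = np.column_stack([np.ones(len(features)), features])
    return np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ labels)


def predict(weights, features):
    X = np.column_stack([np.ones(len(features)), features])
    return X @ weights


# Toy data (assumption): each group holds 20 two-dimensional random walks whose
# step size is the group's label; only the group-level label is observed.
rng = np.random.default_rng(0)
labels = rng.uniform(0.5, 2.0, size=40)
groups = [
    [
        np.cumsum(sigma * rng.normal(size=(int(rng.integers(10, 30)), 2)), axis=0)
        for _ in range(20)
    ]
    for sigma in labels
]

features = np.array([expected_signature(g) for g in groups])
weights = fit_ridge(features, labels)
predictions = predict(weights, features)
```

Each group of streams is reduced to a single fixed-length feature vector, so the number of streams per group and their sampling grids can vary freely. The kernel-based technique mentioned in the abstract is not covered by this sketch.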