Paper Title
Incorporating Physical Knowledge into Machine Learning for Planetary Space Physics
Paper Authors
Abstract
Recent increases in the volume of data collected by planetary and space physics missions have enabled the application of novel data science techniques. The Cassini mission, for example, collected over 600 gigabytes of scientific data from 2004 to 2017, a surge of data on the Saturn system. Machine learning can help scientists work with data at this larger scale. Unlike many applications of machine learning, a primary use in planetary space physics is to infer the behavior of the system itself. This raises three concerns: first, the performance of the machine learning model; second, the need for interpretable applications that can answer scientific questions; and third, how the characteristics of spacecraft data change these applications. In contrast to these concerns, uses of black-box or uninterpretable machine learning methods tend toward evaluations of performance only, either ignoring the underlying physical process or, less often, providing misleading explanations for it. We build on a previous effort that applied a semi-supervised, physics-based classification of plasma instabilities in Saturn's magnetosphere. We then compare this previous effort against other machine learning classifiers given varying access to data volume and to physical information. We show that incorporating knowledge of the characteristics of orbiting spacecraft data improves both the performance and the interpretability of machine learning methods, which is essential for deriving scientific meaning. Building on these findings, we present a framework for incorporating physics knowledge into machine learning problems that target semi-supervised classification of space physics data in planetary environments. These findings present a path forward for incorporating physical knowledge into space physics and planetary mission data analyses for scientific discovery.