Paper Title
PROMISSING: Pruning Missing Values in Neural Networks
Paper Authors
Paper Abstract
While data are the primary fuel for machine learning models, they often suffer from missing values, especially when collected in real-world scenarios. However, many off-the-shelf machine learning models, including artificial neural network models, are unable to handle these missing values directly. Therefore, extra data preprocessing and curation steps, such as data imputation, are inevitable before the learning and prediction processes. In this study, we propose a simple, intuitive, yet effective method for PRuning MISSING values (PROMISSING) during the learning and inference steps in neural networks. With this method, there is no need to remove or impute the missing values; instead, missing values are treated as a new source of information (representing what we do not know). Our experiments on simulated data, several classification and regression benchmarks, and a multi-modal clinical dataset show that PROMISSING yields prediction performance similar to that of various imputation techniques. In addition, our experiments show that models trained with PROMISSING become less decisive in their predictions when facing incomplete samples with many unknowns. This finding will hopefully advance machine learning models from pure predicting machines to more realistic thinkers that can also say "I do not know" when facing incomplete sources of information.
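The abstract does not spell out the pruning mechanism, but the core idea (drop the connections of unknown inputs rather than impute them) can be illustrated with a minimal sketch. The following is one plausible interpretation, not the paper's actual implementation: missing features are encoded as `NaN`, and the first dense layer's weights tied to those features are pruned (zeroed) during the forward pass, so unknowns contribute nothing to the activation. The function name `pruned_dense_forward` and the NaN encoding are assumptions for illustration.

```python
import numpy as np

def pruned_dense_forward(x, W, b):
    """Dense-layer forward pass that prunes weights of missing inputs.

    x : (n_in,) input vector; missing features are NaN
    W : (n_out, n_in) weight matrix
    b : (n_out,) bias vector

    Instead of imputing, the columns of W connected to missing
    features are masked out, so unknowns neither add a value nor
    pull in a learned weight.
    """
    mask = ~np.isnan(x)              # True where the feature is observed
    x_obs = np.where(mask, x, 0.0)   # zero-fill so NaNs do not propagate
    W_pruned = W * mask              # prune weights of missing features
    return x_obs @ W_pruned.T + b

# Example: the second feature is unknown, so only features 0 and 2 count.
x = np.array([1.0, np.nan, 2.0])
W = np.ones((2, 3))
b = np.zeros(2)
y = pruned_dense_forward(x, W, b)   # -> array([3., 3.])
```

A design choice this sketch glosses over: pruning shrinks the layer's pre-activation magnitude as more features go missing, which is consistent with the paper's observation that predictions become less decisive on samples with many unknowns.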