Boosting the Discriminant Power of Naive Bayes

Paper title

Boosting the Discriminant Power of Naive Bayes

Authors

Wang, Shihe; Ren, Jianfeng; Lian, Xiaoyu; Bai, Ruibin; Jiang, Xudong

Abstract

Naive Bayes is widely used in many applications because of its simplicity and its ability to handle both numerical and categorical data. However, it does not model correlations between features, which limits its performance. In addition, noise and outliers in real-world datasets greatly degrade classification performance. In this paper, we propose a feature augmentation method that employs a stack auto-encoder to reduce the noise in the data and boost the discriminant power of naive Bayes. The proposed stack auto-encoder consists of two auto-encoders that serve different purposes. The first encoder shrinks the initial features into a compact representation in order to remove noise and redundant information. The second encoder boosts the discriminant power of the features by expanding them into a higher-dimensional space, so that samples of different classes can be better separated. Integrating the proposed feature augmentation method with the regularized naive Bayes greatly enhances the discriminant power of the model. The proposed method is evaluated on a set of machine-learning benchmark datasets. The experimental results show that the proposed method significantly and consistently outperforms state-of-the-art naive Bayes classifiers.
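The shrink-then-expand pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes two separately trained one-hidden-layer auto-encoders (tanh encoder, linear decoder, plain gradient descent), synthetic two-class data, and an ordinary Gaussian naive Bayes in place of the regularized naive Bayes the paper uses. The names `train_encoder` and `SimpleGaussianNB`, the layer sizes (8 to 3 to 12), and all hyperparameters are hypothetical choices made for illustration only.

```python
import numpy as np

def train_encoder(X, hidden_dim, epochs=300, lr=0.01, seed=0):
    """Fit a one-hidden-layer auto-encoder (tanh encoder, linear decoder)
    by gradient descent on reconstruction error; return the encoder only."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden_dim))
    b1 = np.zeros(hidden_dim)
    W2 = rng.normal(0.0, 0.1, (hidden_dim, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # encode
        err = (H @ W2 + b2 - X) / n        # loss gradient w.r.t. the reconstruction
        dH = (err @ W2.T) * (1.0 - H**2)   # backpropagate through tanh
        W2 -= lr * (H.T @ err)
        b2 -= lr * err.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return lambda Z: np.tanh(Z @ W1 + b1)

class SimpleGaussianNB:
    """Plain Gaussian naive Bayes (a stand-in, not the regularized variant)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small constant keeps variances strictly positive.
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-6
        self.log_prior_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        return self

    def predict(self, X):
        diff = X[:, None, :] - self.mean_[None, :, :]
        # Per-class log-likelihood, summed over features (independence assumption).
        ll = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff**2 / self.var_[None]).sum(axis=-1)
        return self.classes_[np.argmax(ll + self.log_prior_, axis=1)]

# Synthetic two-class data (assumption: 8 numerical features, 100 samples per class).
rng = np.random.default_rng(0)
n_per_class, d = 100, 8
X = np.vstack([rng.normal(-2.0, 1.0, (n_per_class, d)),
               rng.normal(2.0, 1.0, (n_per_class, d))])
y = np.repeat([0, 1], n_per_class)
X = (X - X.mean(axis=0)) / X.std(axis=0)

shrink = train_encoder(X, hidden_dim=3)            # first encoder: 8 -> 3, compact code
Z1 = shrink(X)
expand = train_encoder(Z1, hidden_dim=12, seed=1)  # second encoder: 3 -> 12, expanded features
Z2 = expand(Z1)

clf = SimpleGaussianNB().fit(Z2, y)
pred = clf.predict(Z2)
print("training accuracy:", (pred == y).mean())
```

Only the encoder half of each auto-encoder is kept after training, since the classifier consumes the learned features rather than the reconstructions. How the paper trains the two auto-encoders and how it handles categorical features is not stated in the abstract, so those details are left out of the sketch.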
