Paper Title


The Max-Cut Decision Tree: Improving on the Accuracy and Running Time of Decision Trees

Paper Authors

Jonathan Bodine, Dorit S. Hochbaum

Paper Abstract


Decision trees are a widely used method for classification, both by themselves and as the building blocks of multiple different ensemble learning methods. The Max-Cut decision tree introduces novel modifications to a standard baseline model of classification decision tree construction, namely CART Gini. One modification is an alternative splitting metric, maximum cut, based on maximizing the distance between all pairs of observations that belong to separate classes and fall on opposite sides of the threshold value. The other modification is to select the decision feature from a linear combination of the input features constructed using Principal Component Analysis (PCA) locally at each node. Our experiments show that this node-based localized PCA, combined with the novel splitting modification, can dramatically improve classification while also significantly decreasing computational time compared to the baseline decision tree. Moreover, our results are most significant when evaluated on data sets with higher dimensionality or more classes; on the data set CIFAR-100, for example, the modifications yield a 49% improvement in accuracy while reducing CPU time by 94%. These modifications dramatically advance the capabilities of decision trees for difficult classification tasks.
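To make the two modifications concrete, the following is a minimal illustrative sketch (not the authors' implementation) of a single node split: the data at the node is projected onto its first principal component, and a threshold is chosen to maximize a naive max-cut score, i.e., the summed distance between all pairs of differently labeled observations that land on opposite sides of the threshold. The function names `max_cut_score` and `best_split_on_pc1` are hypothetical, and the O(n^2) pair loop is for clarity only; an efficient implementation would use a faster evaluation per threshold.

```python
import numpy as np

def max_cut_score(z, y, t):
    """Naive max-cut score of threshold t on projected values z:
    the sum of |z_i - z_j| over all pairs (i, j) whose labels differ
    and which lie on opposite sides of t. O(n^2), for illustration."""
    left = z <= t
    score = 0.0
    for i in range(len(z)):
        for j in range(len(z)):
            if left[i] and not left[j] and y[i] != y[j]:
                score += abs(z[i] - z[j])
    return score

def best_split_on_pc1(X, y):
    """Node-local PCA followed by a max-cut threshold search:
    project the node's data onto its first principal component,
    then return (direction, threshold) with the highest score."""
    Xc = X - X.mean(axis=0)
    # First right singular vector of the centered data is PC1.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    w = Vt[0]
    z = Xc @ w
    zs = np.unique(z)
    # Candidate thresholds: midpoints between consecutive values.
    thresholds = (zs[:-1] + zs[1:]) / 2.0
    best_t = max(thresholds, key=lambda t: max_cut_score(z, y, t))
    return w, best_t
```

On two well-separated clusters the chosen threshold falls in the gap between them, so the single split already separates the classes; in a full tree this procedure would be applied recursively at each node.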
