A Hybrid Tsallis-Polarization Impurity Measure for Decision Trees: Theoretical Foundations and Empirical Evaluation

Source: stat.ML updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Advanced, extended

Summary

The Integrated Tsallis Combination (ITC) is a hybrid impurity measure for decision trees that combines normalized Tsallis entropy with an exponential polarization component, aiming to balance theoretical soundness with computational efficiency. ITC has rigorous mathematical properties: proven concavity under a specific parameter condition (β ≤ 2), proper boundary conditions, and O(K) computational complexity. In an empirical evaluation across seven benchmark datasets, however, the simpler parametric Tsallis measure with α = 0.5 achieved the highest average accuracy (91.17%), while ITC variants were competitive but not superior (88.38–89.16%). A Friedman test found no statistically significant global differences among the 23 impurity measures evaluated. The study concludes that ITC's primary value lies in its theoretical grounding and flexible parameterization, which suit applications that prioritize interpretability and reliability, such as regulated industries, even though simpler hybrids like Shannon–Polarization sometimes outperform it empirically. An open-source implementation is provided to support further research and reproducibility.
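To make the entropy half of the measure concrete, the following is a minimal Python sketch of normalized Tsallis entropy used as a node impurity. It uses the standard textbook definition of Tsallis entropy, divided by its value at the uniform distribution so that the result lies in [0, 1]. It is not the paper's ITC formula: the exponential polarization component and the role of β are not specified in this summary, so they are left out. The function names (`tsallis_entropy`, `normalized_tsallis`, `class_proportions`) are illustrative and do not come from the paper's open-source implementation.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def tsallis_entropy(p: Sequence[float], alpha: float) -> float:
    """Standard Tsallis entropy: (1 - sum_i p_i**alpha) / (alpha - 1).

    Assumes alpha > 0. Falls back to Shannon entropy (in nats) at alpha = 1,
    which is the limit of the Tsallis formula.
    """
    if abs(alpha - 1.0) < 1e-12:
        return -sum(x * math.log(x) for x in p if x > 0)
    # Zero-probability classes contribute nothing for alpha > 0.
    return (1.0 - sum(x ** alpha for x in p if x > 0)) / (alpha - 1.0)


def normalized_tsallis(p: Sequence[float], alpha: float) -> float:
    """Tsallis entropy divided by its maximum (the uniform distribution).

    Runs in a single pass over the class proportions.
    """
    k = len(p)
    if k < 2:
        return 0.0  # a single-class problem has no impurity
    uniform = [1.0 / k] * k
    return tsallis_entropy(p, alpha) / tsallis_entropy(uniform, alpha)


def class_proportions(labels: Sequence[Hashable], classes: Sequence[Hashable]) -> List[float]:
    """Empirical class proportions of a node, in the order given by `classes`."""
    counts = Counter(labels)
    n = len(labels)
    return [counts[c] / n for c in classes]


# Usage: impurity of a node with 9 samples of class "a" and 1 of class "b",
# using alpha = 0.5 (the best-performing setting reported in the summary).
node_labels = ["a"] * 9 + ["b"]
impurity = normalized_tsallis(class_proportions(node_labels, ["a", "b"]), alpha=0.5)
```

Under this normalization a pure node scores 0 and a perfectly balanced node scores 1, which are the kind of boundary conditions the summary refers to.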

Key takeaway

The Integrated Tsallis Combination (ITC), a new hybrid impurity measure for decision trees, combines normalized Tsallis entropy with an exponential polarization component and offers theoretical guarantees, including concavity for $\beta \le 2$ and $O(K)$ computational cost. In a 23-measure benchmark, the simple parametric Tsallis measure with $\alpha=0.5$ achieved the highest average accuracy (91.17%), ITC variants reached a competitive 88.38–89.16%, and a Friedman test found no statistically significant global difference among the measures. ITC's proven mathematical properties make it a reasonable choice for applications that prioritize interpretability and reliability, such as regulated industries, where theoretical soundness matters more than marginal empirical gains.
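The Friedman test mentioned in the summary compares several methods across several datasets by ranking the methods within each dataset. The sketch below computes the classical Friedman chi-square statistic in plain Python, without a tie correction. The function names and the toy score table are hypothetical and are not taken from the paper; the paper's own analysis may differ in its details.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ascending ranks (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def friedman_statistic(scores: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square statistic, without tie correction.

    scores[d][m] is the score of measure m on dataset d. The statistic does
    not depend on whether ranks run ascending or descending.
    """
    n = len(scores)      # number of datasets (blocks)
    k = len(scores[0])   # number of measures (treatments)
    rank_sums = [0.0] * k
    for row in scores:
        for m, r in enumerate(average_ranks(row)):
            rank_sums[m] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Toy table (made-up numbers, NOT the paper's results): 3 datasets x 3 measures.
toy_scores = [
    [0.90, 0.88, 0.85],
    [0.80, 0.83, 0.79],
    [0.95, 0.93, 0.96],
]
stat = friedman_statistic(toy_scores)
```

The statistic is compared against a chi-square distribution with k − 1 degrees of freedom, which would be 22 for the 23 measures evaluated here. With only seven datasets that approximation is rough, so a result near the threshold deserves caution.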

Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.