MOELIGA: a multi-objective evolutionary approach for feature selection with local improvement
Summary
MOELIGA is a novel multi-objective genetic algorithm designed for feature selection in machine learning. It incorporates an evolutionary local improvement strategy, where subordinate populations refine feature subsets, and employs a crowding-based fitness sharing mechanism for diversity. The algorithm optimizes three objectives: Unweighted Average Recall (UAR) for classification performance, a sigmoid-transformed cardinality ratio for feature subset size, and a geometry-based metric for classifier independence. Experimental results across 14 diverse datasets, including synthetic, real-world, handwritten digit, and cancer-type classification tasks, demonstrate MOELIGA's ability to identify smaller feature subsets while maintaining superior or comparable classification performance against 11 state-of-the-art methods. This approach effectively balances accuracy and dimensionality in complex, high-dimensional scenarios.
Key takeaway
For research scientists developing machine learning models in high-dimensional domains, MOELIGA offers a robust feature selection approach. You should consider integrating multi-objective evolutionary algorithms with local improvement strategies to simultaneously optimize classification performance, model compactness, and classifier independence, potentially yielding more generalizable and interpretable models with fewer features.
Key insights
MOELIGA refines feature selection by combining a multi-objective genetic algorithm with evolutionary local improvement and crowding-based fitness sharing for diversity.
Principles
- Multi-objective optimization balances competing goals.
- Local search refines globally identified solutions.
- Diversity mechanisms prevent premature convergence.
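The first principle is usually made concrete through Pareto dominance: one candidate beats another only if it is no worse on every objective and strictly better on at least one. The sketch below is a generic non-dominated filter, not MOELIGA's own selection procedure. The objective tuples are hypothetical values, and both objectives are assumed to be minimized.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere.

    Assumes every objective is minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Return the indices of points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical candidates: (1 - UAR, feature-size penalty), both minimized.
candidates = [(0.10, 0.80), (0.12, 0.30), (0.20, 0.25), (0.15, 0.60), (0.12, 0.50)]
front = pareto_front(candidates)
print(front)
```

The last two candidates are dominated by the second one, so only the first three remain as trade-off solutions.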
Method
MOELIGA is a multi-objective genetic algorithm in which subordinate populations perform local improvement on candidate feature subsets. It optimizes three objectives (Unweighted Average Recall, a sigmoid-transformed feature count, and a geometric classifier-independence metric) and uses crowding-based fitness sharing to maintain diversity.
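To illustrate the diversity component, here is a minimal sketch of classical fitness sharing applied to binary feature masks. It is not the crowding-based variant used by MOELIGA, whose exact formulation is not given in this summary. The names `sigma_share` and `alpha`, the Hamming distance, and the example population are all assumptions chosen for illustration, and fitness is assumed to be maximized.

```python
from typing import List


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions where two feature masks differ."""
    return sum(x != y for x, y in zip(a, b))


def shared_fitness(
    population: List[List[int]],
    raw_fitness: List[float],
    sigma_share: float = 2.0,  # assumed niche radius in Hamming distance
    alpha: float = 1.0,        # assumed shape of the sharing function
) -> List[float]:
    """Divide each raw fitness by its niche count (higher fitness is better)."""
    shared = []
    for i, individual in enumerate(population):
        niche_count = 0.0
        for other in population:
            d = hamming(individual, other)
            if d < sigma_share:
                # Neighbours inside the radius contribute; self contributes 1.
                niche_count += 1.0 - (d / sigma_share) ** alpha
        shared.append(raw_fitness[i] / niche_count)
    return shared


# Hypothetical population of 4-feature masks with identical raw fitness.
population = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]]
shared = shared_fitness(population, [0.9, 0.9, 0.9, 0.9])
print(shared)
```

With equal raw fitness, the isolated mask keeps its full score while the crowded masks are penalized, which is the pressure that discourages the population from collapsing onto one feature subset.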
In practice
- Use Unweighted Average Recall for imbalanced datasets.
- Apply sigmoid transformation to emphasize smaller feature subsets.
- Employ nearest-neighbor distances for classifier-independent feature evaluation.
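The first two recommendations can be sketched in a few lines. Unweighted Average Recall is the mean of the per-class recalls, so every class counts equally regardless of its size. The sigmoid form below is an assumption: this summary does not give the paper's exact transformation, and the `steepness` and `midpoint` parameters are hypothetical.

```python
import math
from collections import defaultdict
from typing import Hashable, Sequence


def unweighted_average_recall(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> float:
    """Mean of per-class recall, so minority classes weigh as much as majority ones."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def size_penalty(
    n_selected: int,
    n_total: int,
    steepness: float = 10.0,  # hypothetical parameter
    midpoint: float = 0.5,    # hypothetical parameter
) -> float:
    """Assumed sigmoid of the cardinality ratio; lower means a smaller subset."""
    ratio = n_selected / n_total
    return 1.0 / (1.0 + math.exp(-steepness * (ratio - midpoint)))


# Imbalanced toy labels: 8 samples of class 0, 2 samples of class 1.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]  # one minority sample is misclassified
uar = unweighted_average_recall(y_true, y_pred)
print(uar)                  # accuracy here would be 0.9
print(size_penalty(1, 10))  # small subset, low penalty
print(size_penalty(9, 10))  # large subset, high penalty
```

On this toy example, plain accuracy would report 0.9 even though half of the minority class is missed, whereas the recall average exposes the gap.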
Topics
- Feature Selection
- Multi-objective Optimization
- Genetic Algorithms
- Evolutionary Local Improvement
- Classifier Independence
Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.NE updates on arXiv.org.