Abstract

Large-scale feature selection problems usually face two challenges: 1) real labels are insufficient to guide the algorithm in selecting features, and 2) the large-scale search space hinders the search for a high-quality solution. To address these challenges, this paper proposes a novel self-supervised data-driven particle swarm optimization algorithm for large-scale feature selection, with three contributions. First, a novel algorithmic framework named self-supervised data-driven feature selection is proposed, which can perform feature selection without real labels. Second, a discrete region encoding-based search strategy is proposed, which helps the algorithm find better solutions in a large-scale search space. Third, based on the above framework and strategy, a self-supervised data-driven particle swarm optimization algorithm is proposed to solve the large-scale feature selection problem. Experimental results on datasets with large-scale features show that the proposed algorithm performs comparably to mainstream supervised algorithms and has higher feature selection efficiency than state-of-the-art unsupervised algorithms. © 2023, Editorial Department of CAAI Transactions on Intelligent Systems.

Full text