Abstract

In this paper we propose a novel feature-ranking approach for feature selection. The method is particularly useful in fields that handle high-dimensional datasets, such as machine learning, pattern recognition, and signal processing. It also applies to small and medium-sized datasets, where it identifies the significant features or attributes of a particular domain using only the information contained in the dataset, and it therefore preserves the meaning of the existing features. With the proposed method, redundant attributes can be removed efficiently without sacrificing classification performance. After outlier data elements are eliminated from the dataset, the features are ranked to identify the predominant ones. The discernibility matrix of rough set theory (RST) is used to discover the data dependencies among features, and the features are ranked according to these dependencies, measured as each feature's discrimination frequency. A method based on the Centre of Gravity (CoG) line is suggested for determining this discrimination frequency with reduced computational effort. To evaluate the algorithm, we applied it to a test dataset of 3000 offline handwritten samples of 10 Tamil characters. The experimental results show that the new method is efficient and effective for dimensionality reduction.
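
As a rough illustration of ranking features by discrimination frequency, the sketch below counts, for each feature, how many pairs of objects from different classes it tells apart, which is how often that feature would appear in an RST discernibility matrix. It is a minimal brute-force sketch under stated assumptions, not the algorithm proposed here: it assumes discrete feature values, skips outlier elimination, and does not attempt the CoG-line shortcut. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def discernibility_frequencies(rows, labels):
    """Count, per feature, the pairs of differently labelled objects it separates.

    Each such pair corresponds to one non-empty entry of the discernibility
    matrix; a feature belongs to that entry when the two objects differ on it.
    """
    n_features = len(rows[0])
    counts = [0] * n_features
    for i, j in combinations(range(len(rows)), 2):
        if labels[i] == labels[j]:
            continue  # same class: the matrix entry is empty
        for k in range(n_features):
            if rows[i][k] != rows[j][k]:
                counts[k] += 1
    return counts


def rank_features(rows, labels):
    """Return feature indices ordered from most to least discriminating."""
    counts = discernibility_frequencies(rows, labels)
    return sorted(range(len(counts)), key=lambda k: counts[k], reverse=True)


# Hypothetical toy decision table: 4 objects, 3 discrete features, 2 classes.
rows = [
    (0, 1, 0),
    (0, 0, 0),
    (1, 1, 0),
    (1, 0, 0),
]
labels = ["a", "a", "b", "b"]

frequencies = discernibility_frequencies(rows, labels)
ranking = rank_features(rows, labels)
```

On this toy table the first feature separates every cross-class pair and the third separates none, so the third would be the first candidate for removal. The pairwise loop is quadratic in the number of objects, which is the kind of cost a cheaper frequency estimate would aim to reduce.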

Full text