In classification problems, supervised learning algorithms are used to predict the class (or output variable) of an instance by observing the values of its features (or input variables). A supervised learning algorithm trains a prediction model over a dataset in which the feature and class values of past observations are provided, learning the relationship between the features and the classes. The resulting prediction model can then be used to classify a new instance based on its features.
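As a minimal illustration of this workflow (a sketch assuming a scikit-learn environment; the dataset and classifier are arbitrary choices for illustration, not those used in this study), a model is trained on past observations and then applied to new instances:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Past observations: feature values X and class labels y.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Train a prediction model on the observed feature/class pairs.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Use the trained model to predict the class of new, unseen instances.
print("predicted classes:", model.predict(X_test[:5]))
print("test accuracy:", round(model.score(X_test, y_test), 3))
```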
The classification performance of a learning algorithm depends on its ability to detect the relationship between the input and output variables accurately. However, the presence of features that are irrelevant to the class, or redundancy among the features, may have a negative impact on the classification performance of the learning algorithm (Kohavi and John, 1997). Yu and Liu (2004) categorize features based on their relevance to the output as strongly relevant, weakly relevant, and irrelevant. A feature is strongly relevant if its presence affects classification performance regardless of the other features used, weakly relevant if its effect on classification performance depends on the other features used, and irrelevant if it does not affect classification performance at all. They argue that the optimal subset of features in terms of classification performance includes all strongly relevant features together with the weakly relevant, non-redundant ones. Selecting such a subset of strongly relevant and weakly relevant, non-redundant features to be used in the prediction model of the learning algorithm (or classifier), instead of using all available features, is called the feature selection problem.
Feature selection aims to improve the classification performance by eliminating irrelevant and redundant features. The decrease in the number of features used in the prediction model is also useful for reducing storage requirements, improving time efficiency, and simplifying the prediction model itself (Guyon and Elisseeff, 2003). Therefore, feature selection methods are used in many areas, such as handwritten digit recognition, facial recognition, medical diagnosis, and gene marker identification.
Even though a reduction in the number of input variables seems to be a natural outcome of a feature selection procedure that aims to maximize classification performance, it is also possible to treat minimizing the cardinality of the subset as a separate objective. That is, one may be willing to reduce the number of variables beyond the number in the subset that gives the best classification performance in order to enjoy the benefits of lower cardinality. In that case, the problem becomes a multi-objective problem. Depending on the scope of the problem, other objectives can also be defined. For example, in a medical diagnosis application, minimizing the screening costs of the medical tests that provide the feature values, or minimizing the health-related risks those tests pose to the patient, could be set as objectives.
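To make the bi-objective view concrete, the following sketch (a minimal illustration, assuming a scikit-learn environment, a hypothetical boolean-mask encoding of subsets, and a k-nearest-neighbors classifier as a stand-in wrapper model) evaluates a candidate feature subset on two objectives to be minimized: cross-validated classification error and subset cardinality.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def evaluate_subset(mask, X, y, cv=5):
    """Return the two objective values (both minimized) for a candidate
    subset encoded as a boolean mask over the features:
    (1) cross-validated classification error, (2) subset cardinality."""
    if not mask.any():                      # an empty subset cannot classify
        return 1.0, 0
    clf = KNeighborsClassifier(n_neighbors=5)
    accuracy = cross_val_score(clf, X[:, mask], y, cv=cv).mean()
    return 1.0 - accuracy, int(mask.sum())

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
mask = rng.random(X.shape[1]) < 0.3         # a random candidate subset
error, cardinality = evaluate_subset(mask, X, y)
print(f"error = {error:.3f}, cardinality = {cardinality}")
```

A multi-objective search would compare such (error, cardinality) vectors by dominance rather than collapsing them into a single score.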
The algorithms developed for solving the feature selection problem can be examined along two dimensions. First, since it is not straightforward to measure the impact of using a feature on classification performance, different subset evaluation strategies have been developed, namely the filter and wrapper approaches (Kohavi and John, 1997). Second, since the number of possible subsets grows exponentially with the number of available features, the feature selection problem is combinatorial in nature. Therefore, many optimization techniques have been used to solve it, such as sequential backward selection, branch and bound, best-first search, and genetic algorithms (Kohavi and John, 1997).
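The two evaluation strategies can be contrasted with a short sketch (under the same scikit-learn assumptions as above; the mutual-information filter and the k-nearest-neighbors wrapper are illustrative choices, not the methods proposed in this study): a filter scores features independently of any classifier, whereas a wrapper scores a subset by the cross-validated performance of the classifier that will actually be used.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Filter approach: score each feature independently of any classifier,
# here with a mutual-information estimate, and keep the top-ranked ones.
mi_scores = mutual_info_classif(X, y, random_state=0)
filter_subset = np.argsort(mi_scores)[::-1][:5]

# Wrapper approach: score a candidate subset by the cross-validated
# accuracy of the classifier itself.
clf = KNeighborsClassifier(n_neighbors=5)
wrapper_score = cross_val_score(clf, X[:, filter_subset], y, cv=5).mean()

print("filter-ranked features:", filter_subset)
print("wrapper accuracy on that subset:", round(wrapper_score, 3))
```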
In this study, we have implemented several variations of a preference-based evolutionary algorithm, iTDEA-fs, for the feature selection problem. As the results revealed special characteristics of the problem, we developed a new preference-based evolutionary algorithm, iWREA-fs, that is compatible with those characteristics.
In addition to the traditional objectives defined for the feature selection problem in the literature, we set generic objectives that can be useful in different contexts of the problem. We formulated the problem with four representative objectives; however, the algorithms can also be applied when more than four objectives are present.
Feature selection is used in many classification applications, and the decision maker (DM) of the problem may be a different agency or customer depending on the application area. For example, in health care, an association of medical doctors, a governmental agency, or the patients themselves can act as the DM, whose concern is to select a set of tests that provides an accurate diagnosis while being cost-efficient and/or minimizing the health-related risks involved in the tests. It may also be possible to select several meaningful subsets and then involve the patient in the final decision of which subset to use.
The results show that, in iWREA-fs, the interactions with the DM increase the convergence speed while yielding solutions that appeal to the DM. We believe that employing the DM's preferences is beneficial for solving the feature selection problem: it brings flexibility, since the algorithm can be implemented without dataset-specific parameters, and it ensures that the solution the DM considers best is known when the algorithm terminates. To the best of our knowledge, this is the first study that uses a preference-based approach and considers additional objectives alongside the traditional ones for the feature selection problem.
Being parameter-free and computationally efficient, iWREA-fs can be tested on other combinatorial problems as a preference-based evolutionary algorithm in future work. We also intend to compare the performance of the algorithms with multi-objective evolutionary algorithms commonly used for the feature selection problem.
References
Kohavi, R., & John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97(1-2), 273-324.
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157-1182.
Yu, L., & Liu, H. (2004). Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research, 5, 1205-1224.