3. Research objectives
In this work, we develop an efficient hybrid intelligent classification model for breast cancer diagnosis. The proposed model has two main advantages: first, it takes full account of the dimensionality of the input features and uses an effective feature selection method to select the optimal feature subset; second, it aims to maximize the classification accuracy for breast cancer while minimizing the misclassification cost. The main objectives of our proposed classification model are as follows:
1. Investigate the performance of IGSAGAW for feature selection. For this objective, we compare our proposed method with GAW; furthermore, to demonstrate the significance of feature selection, we also carry out comparative experiments using all features, without applying any feature selection approach.
2. Examine the performance of the CSSVM method. For this objective, we carry out comparative experiments with the same feature selection approaches, based on the BP, 3-NN and CSSVM classifiers, and evaluate their performance in terms of ACC, AMC, G-mean and running time.
4. Backgrounds and preliminaries
This section presents some preliminaries of our proposed method.
4.1. Information gain method
In this paper, we introduce an IG-directed SAGAW method for feature selection. The IG value of an attribute represents its relevance to the category (Lai, Yeh, & Chang, 2016; Martín-Valdivia, Díaz-Galiano, Montejo-Raez, & Ureña-López, 2008; Yang, Liu, Zhu, Liu, & Zhang, 2012); that is, a higher IG value means that the attribute contributes more information. For the classification system, we assume that the target dataset has N = {1, 2, …, n} instances with k classes. Let P(C_i, N) represent the proportion of C_i to N, where C_i represents the set of instances that belong to the ith class. The entropy of the dataset can be calculated by:

$$\mathrm{Entropy}(N) = -\sum_{i=1}^{k} P(C_i, N)\,\log P(C_i, N)$$

If an attribute γ has distinct values C = {c_1, c_2, …, c_k}, and letting N_i ⊆ N denote the instances for which γ = c_i, then the entropy of the dataset from attribute γ is given by:

$$\mathrm{Entropy}_{\gamma}(N) = \sum_{i=1}^{k} \frac{|N_i|}{|N|}\,\mathrm{Entropy}(N_i)$$

Finally, the value of IG of attribute γ can be derived by:

$$\mathrm{IG}(\gamma) = \mathrm{Entropy}(N) - \mathrm{Entropy}_{\gamma}(N)$$
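As an illustration of the quantities defined in this subsection, the following Python sketch computes the entropy of a label set and the IG of a single attribute. It is a minimal example under stated assumptions rather than the implementation used in this work: the function names `entropy` and `information_gain` are hypothetical, the logarithm is taken to base 2, and the attribute is assumed to be discrete (a continuous feature would need to be discretized first).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (base 2, an assumption) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """IG of a discrete attribute: dataset entropy minus the weighted
    entropy of the subsets induced by each distinct attribute value."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)  # subset N_i for value c_i
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data (hypothetical): "b" = benign, "m" = malignant.
labels = ["b", "b", "m", "m"]
separating = ["low", "low", "high", "high"]   # splits the classes perfectly
uninformative = ["x", "y", "x", "y"]          # independent of the class
ig_separating = information_gain(separating, labels)
ig_uninformative = information_gain(uninformative, labels)
```

On this toy data the perfectly separating attribute receives the full dataset entropy as its IG, while the uninformative attribute receives an IG of zero, which matches the reading that a higher IG value indicates a more relevant attribute.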
GA is a well-known global search method that has received much attention from feature selection researchers (Dong, Li, Ding, & Sun, 2018; Ghosh, Parui, & Majumder, 2015; Hsu, 2004; Jadhav et al., 2018). GA can produce promising solutions for feature selection over a high-dimensional space owing to its robustness to the underlying search space size and multivariate distributions. The basic process of GA is as follows (Dong et al., 2018): (1) Initialization: randomly generate N individuals as the initial population and encode them; (2) Individual evaluation: calculate the fitness of each individual according to the evaluation criteria; (3) Population evolution: apply the selection, crossover and mutation operations to produce the next generation; (4) Termination test: judge whether the individual with the maximum fitness is the optimal solution; if so, terminate the calculation, otherwise return to (2).
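The four GA steps can be sketched in Python as follows. This is an illustrative outline only, not the GAW or IGSAGAW procedure of this work. The text does not specify the operators, so the binary feature-mask encoding, tournament selection, single-point crossover, bit-flip mutation, elitism, and the `target` and `generations` stopping rules are all assumptions. The function name `genetic_algorithm` and the `fitness` callback are hypothetical.

```python
import random


def genetic_algorithm(fitness, n_features, pop_size=20, generations=50,
                      crossover_rate=0.8, mutation_rate=0.05,
                      target=None, seed=0):
    """Return the best binary feature mask found (1 = feature selected)."""
    rng = random.Random(seed)
    # (1) Initialization: random binary masks encode the individuals.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # (2) Individual evaluation.
        scores = [fitness(ind) for ind in population]
        gen_best = population[scores.index(max(scores))]
        if fitness(gen_best) > fitness(best):
            best = gen_best
        # (4) Termination test (assumed rule: stop once a target fitness is met).
        if target is not None and fitness(best) >= target:
            break

        # (3) Population evolution: selection, crossover, mutation.
        def select():
            # Binary tournament selection (an assumption).
            a, b = rng.sample(range(pop_size), 2)
            return population[a] if scores[a] >= scores[b] else population[b]

        next_population = [best[:]]  # elitism: carry the best mask forward
        while len(next_population) < pop_size:
            p1, p2 = select(), select()
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation applied gene by gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_population.append(child)
        population = next_population
    return max(population + [best], key=fitness)


# Toy usage: the fitness simply counts selected features (hypothetical).
mask = genetic_algorithm(sum, n_features=8, target=8)
```

In a feature selection setting, `fitness` would typically score a mask by training a classifier on the selected features; the toy fitness above only serves to exercise the loop.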
SA is a heuristic global optimization method that introduces the Metropolis acceptance criterion to judge whether or not to accept a new solution (Javidrad et al., 2018; Liang, Suganthan, Chan, & Huang, 2006). The basic idea of SA is to start from an initial solution and then iterate with the Metropolis Monte Carlo procedure. In each iteration, SA first generates a new solution and then judges whether it meets the Metropolis criterion; if it does, the new solution is accepted, otherwise it is discarded. The acceptance probability P of a candidate solution x_{i+1} from the current solution x_i is stated as:
where f is the objective function, f = f(x … temperature is sequentially lowered to reduce