Instance Selection for Imbalanced Data

Sarah Vluymans

Out of equilibrium: dealing with imbalanced information

When we visit our general practitioner, we expect to be given a correct diagnosis. The doctor will base it on previous patients who showed similar symptoms. If, however, we are struck by a rare disease, of which only a small number of positive cases are known compared with a large number of negative ones, detecting it correctly can be very difficult. Recent research shows that carefully reducing the available data can greatly improve the recognition of such rare phenomena. Better conclusions by using less information? That may sound contradictory, but its value has been demonstrated in several domains.

Classification
Just as the doctor tries to determine the correct diagnosis for a patient, there are computer programs developed specifically to assign a phenomenon to a class. This process is called classification. The program does so on the basis of available data whose class is already known with certainty. For example, whether or not a person suffers from a particular disease can be decided by comparing their symptoms with both earlier cases of that disease and with healthy people.
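
As a minimal sketch of this idea, the snippet below classifies a new patient by copying the diagnosis of the most similar known patient (a simple nearest-neighbour rule in Python; the symptom values and records are invented purely for illustration):

    # Minimal nearest-neighbour classification sketch (illustrative data only).
    # Each known patient is a pair: (symptom measurements, diagnosis).
    known_patients = [
        ((38.9, 1, 0), "sick"),     # fever, cough, no rash
        ((36.8, 0, 0), "healthy"),
        ((37.1, 0, 1), "healthy"),
        ((39.2, 1, 1), "sick"),
    ]

    def distance(a, b):
        # Euclidean distance between two symptom vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def classify(new_symptoms):
        # Assign the class of the most similar known patient (1-NN).
        nearest = min(known_patients, key=lambda p: distance(p[0], new_symptoms))
        return nearest[1]

    print(classify((38.7, 1, 0)))   # -> "sick"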

An important goal of these programs is, of course, to choose the correct class in as many cases as possible. Both the strength of the program and the available data on which it relies are essential here.

Out of balance
A specific problem that can occur in these applications is that the available data is imbalanced: the records are unevenly distributed over the different classes. In the example above, this shows up as a relatively large number of healthy people compared with a small proportion of sick ones. For a rare disease, for instance, one may have the records of 1000 patients, of whom only 50 exhibit the disease while the other 950 are healthy.

This uneven distribution can seriously hamper a computer program's classification procedure. The program tries to form an accurate picture of the classes and to classify new individuals on that basis. When it has relatively little information about one of the classes, this process does not run optimally. Experimental research shows that several leading programs indeed perform below par when confronted with imbalanced data. In particular, new elements are all too easily assigned to the majority class. This would lead to a possibly unacceptable number of sick patients being wrongly considered healthy.
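
A back-of-the-envelope calculation with the hypothetical 50/950 split from above shows why this bias can go unnoticed when only overall accuracy is reported:

    # With 950 healthy and 50 sick patients, a program that labels everyone
    # "healthy" still looks impressive if we only count overall accuracy.
    healthy, sick = 950, 50
    correct = healthy                 # every healthy patient is labelled correctly
    accuracy = correct / (healthy + sick)
    sick_detected = 0 / sick          # not a single sick patient is found

    print(f"accuracy: {accuracy:.0%}")                      # 95%
    print(f"sick patients detected: {sick_detected:.0%}")   # 0%

A program that never detects a single sick patient still reaches 95% accuracy, which is exactly why imbalanced problems are judged with measures that look at each class separately.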

Reduction
Recent research shows that a targeted reduction of the data can greatly improve classification. It yields better detection of positive cases without sacrificing good detection of negative cases. During the reduction, elements are removed from the dataset. Of the 1000 available patients, for example, it can be decided to use only 400.

The reduction is carried out by a separate program that, in itself, takes no account of the classification procedure. The goal is to determine a representative group from the available data, which in turn can lead to an easier and more accurate modelling of the classes. Both positive and negative cases can be removed in this procedure, but the reduction will inherently be stronger in the majority class. An immediate result is a better balance in the available information.
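
One of the simplest conceivable reduction strategies, random undersampling of the majority class, already illustrates this mechanism. The selection methods studied in the thesis are more refined, so the sketch below only shows how 1000 records can shrink to 400 while every positive case is kept:

    import random

    # Toy dataset: 950 "healthy" records and 50 "sick" records (features omitted).
    data = [("healthy", i) for i in range(950)] + [("sick", i) for i in range(50)]

    def reduce_majority(records, majority_label, keep):
        # Keep every minority example, but only a random sample of the majority.
        majority = [r for r in records if r[0] == majority_label]
        minority = [r for r in records if r[0] != majority_label]
        return random.sample(majority, keep) + minority

    reduced = reduce_majority(data, "healthy", keep=350)
    print(len(reduced))   # 400 records remain: 350 healthy plus all 50 sick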

When removing data, one can take into account how representative the records are. It is, for example, less useful to retain data on atypical patients, sick or healthy, because they can easily give a distorted picture of the disease and consequently make a correct diagnosis harder. Besides improving the classification process, it is also worth noting that reducing the data lowers the required storage space, an additional advantage on the computing side.
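
The idea of discarding atypical records can be sketched with an editing rule in the spirit of classical edited nearest-neighbour methods: a record is removed when most of its nearest neighbours carry a different class label. The rule and the toy data below are illustrative assumptions, not the exact criterion used in the thesis:

    def edit_atypical(records, k=3):
        # Remove records whose label disagrees with the majority of their k
        # nearest neighbours; such atypical cases distort the picture of a class.
        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

        kept = []
        for i, (features, label) in enumerate(records):
            others = [r for j, r in enumerate(records) if j != i]
            others.sort(key=lambda r: dist(r[0], features))
            neighbour_labels = [lab for _, lab in others[:k]]
            if neighbour_labels.count(label) * 2 > k:   # keep only typical cases
                kept.append((features, label))
        return kept

    # The lone "sick" point lying deep inside the healthy region is removed.
    toy = [((0.0,), "healthy"), ((0.1,), "healthy"), ((0.2,), "healthy"),
           ((0.15,), "sick"), ((5.0,), "sick"), ((5.1,), "sick"), ((5.2,), "sick")]
    print(len(edit_atypical(toy)))   # 6 of the 7 records are kept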

Applications
Besides the medical example above, imbalance in data also occurs in various other domains. In fraud detection, for instance, there are proportionally far fewer positive than negative cases. In this situation too, the classification procedure will suffer, which can be remedied by first reducing the available data.

Because the reduction is performed by a stand-alone program, independent of the classification method used, it can be combined with many different classification procedures. This offers the user a comfortable degree of freedom, so that a suitable combination can be selected for each application. An unfavourable balance thus no longer needs to stand in the way of good classification.
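
As an illustration of that freedom (assuming scikit-learn is available; the classifiers and toy data below are arbitrary choices, not those of the thesis), the same reduced dataset can be handed to two very different classification programs:

    import random
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Toy imbalanced data: 950 negatives around 0, 50 positives around 3.
    X = [[random.gauss(0, 1)] for _ in range(950)] + [[random.gauss(3, 1)] for _ in range(50)]
    y = [0] * 950 + [1] * 50

    # First reduce the majority class (random undersampling, as sketched earlier).
    kept = random.sample(range(950), 350)
    X_red = [X[i] for i in kept] + X[950:]
    y_red = [0] * 350 + [1] * 50

    # The same reduced set can then be combined with different classifiers.
    for clf in (KNeighborsClassifier(n_neighbors=3), DecisionTreeClassifier()):
        clf.fit(X_red, y_red)
        print(type(clf).__name__, clf.predict([[2.8]]))   # most likely the minority class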


Universiteit Gent, 2014