Using Transfer Learning Effectively: A Characterization of Negative Transfer in Data and Ways to Avoid it

Mikael Engels

Transfer Learning within Machine Learning: applying and using self-learning algorithms efficiently

Computer systems learn differently from humans. A human learns from earlier experiences and can combine those experiences to carry out new tasks. For example, once you have learned to play the recorder, learning the clarinet is easier than it would be if you had to start with the clarinet directly. A computer system does not work this way.


Algorithms in a computer system learn from scratch: the knowledge gained from, for example, recognizing dogs in images can be used by the system for that purpose only. Transfer Learning is the field within Machine Learning that tries to solve this. In practice, however, this is much harder than is often assumed, and it can lead to inefficiency, a 'negative transfer'. When Transfer Learning does work efficiently, it can give computer algorithms a very large boost in intelligence and can be applied in many digital fields, for example to detect brain tumours at an early stage or to make self-driving cars safer.

This thesis is about optimizing Transfer Learning and about how to prevent Negative Transfer from arising. It comprises a literature review and a number of experiments within text mining, a field in which text written by humans is analysed by the computer system. The research reaches three conclusions: datasets that share a similar vocabulary have a greater chance of positive transfer, the form of transfer turns out to be relevant, and the algorithm used influences how effective the transfer is.
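As an illustration of what "a similar vocabulary" could mean in practice, the following is a minimal sketch that scores the word overlap between a source and a target text collection using Jaccard similarity. The choice of Jaccard similarity, the simple tokenizer, the function names and the example reviews are all assumptions made for this sketch; the summary does not state which similarity measure the thesis itself uses.

```python
import re


def vocabulary(texts):
    """Collect the set of lowercase word tokens that occur in the texts."""
    words = set()
    for text in texts:
        # Assumed tokenizer: runs of letters and apostrophes.
        words.update(re.findall(r"[a-z']+", text.lower()))
    return words


def jaccard_overlap(source_texts, target_texts):
    """Jaccard similarity of the two vocabularies: |A & B| / |A | B|."""
    source_vocab = vocabulary(source_texts)
    target_vocab = vocabulary(target_texts)
    union = source_vocab | target_vocab
    if not union:
        return 0.0  # both collections are empty
    return len(source_vocab & target_vocab) / len(union)


# Hypothetical example data: reviews from two different domains.
source_reviews = ["The food was great", "Terrible service"]
target_reviews = ["Great food", "The battery was terrible"]

score = jaccard_overlap(source_reviews, target_reviews)
print(f"Vocabulary overlap: {score:.2f}")
```

A score close to 1 means the two collections use nearly the same words, and a score close to 0 means they share almost none. Under the thesis's first conclusion, a low score would be a warning sign for negative transfer, although any threshold would have to be determined experimentally.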




University or college
Business Process Management & IT
Open Universiteit
Dr. Stefano Bromuri