CatBoost and Logistic Regression as Machine Learning Approaches in Matchmaking and Perceived Availability
DOI: https://doi.org/10.56124/encriptar.v7i14.009
Keywords: Matchmaking, ensemble, speed dating
Abstract
This paper revisits the analysis of the “Speed Dating” dataset used in “Gender Differences in Mate Selection: Evidence from a Speed Dating Experiment,” published in 2006 by Raymond Fisman, Sheena Iyengar, Emir Kamenica, and Itamar Simonson in The Quarterly Journal of Economics, the oldest professional journal of economics in the English language. The analysis draws on the theory of "perceived availability," which suggests that people tend to find others more attractive when those others seem attainable or interested in them. Logistic regression and the CatBoost ensemble method were used to uncover patterns that appear to influence participants' decisions about a potential future relationship with a person of the opposite sex after a four-minute speed date. The findings indicate that, in general, individuals prioritize the following in a potential partner, from most to least important: attractiveness, perceived compatibility, shared interests, sense of humor, ambition, satisfaction with acquaintances (indicative of sociability), TV interests, sincerity, and partner's age. The models reached an accuracy of over 80% with logistic regression and 88% with the CatBoost ensemble method. The models were developed in Orange Data Mining 3.37.
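As an illustration of how a logistic regression can rank partner attributes by influence, the following is a minimal sketch and not the authors' workflow, which was built in Orange Data Mining. It assumes synthetic data rather than the Speed Dating dataset; the feature names, the "true" weights, and the helper functions are all hypothetical and chosen only for illustration. It fits the model by plain gradient descent and orders features by the magnitude of their coefficients.

```python
import math
import random

# Hypothetical feature names and "true" weights, used only to generate
# synthetic data. They are NOT taken from the Speed Dating dataset.
FEATURES = ["attractiveness", "shared_interests", "age_gap"]
TRUE_WEIGHTS = [2.0, 1.0, 0.0]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_synthetic_dates(n, true_weights, seed=0):
    # Each row is one simulated date: standardized ratings plus a yes/no decision.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in true_weights]
        p_yes = sigmoid(sum(w * x for w, x in zip(true_weights, row)))
        X.append(row)
        y.append(1 if rng.random() < p_yes else 0)
    return X, y


def fit_logistic_regression(X, y, lr=0.5, epochs=300):
    # Batch gradient descent on the mean log-loss.
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = sigmoid(bias + sum(w * x for w, x in zip(weights, row))) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


X, y = make_synthetic_dates(800, TRUE_WEIGHTS)
weights, bias = fit_logistic_regression(X, y)

# Rank features from most to least influential by absolute coefficient.
ranked = [name for _, name in
          sorted(zip((abs(w) for w in weights), FEATURES), reverse=True)]

# Training accuracy at a 0.5 decision threshold.
predictions = [1 if sigmoid(bias + sum(w * x for w, x in zip(weights, row))) >= 0.5 else 0
               for row in X]
accuracy = sum(int(p == t) for p, t in zip(predictions, y)) / len(y)

print("Ranking:", ranked)
print("Training accuracy:", round(accuracy, 3))
```

Because the synthetic features are standardized, coefficient magnitudes are directly comparable. With real ratings on different scales, the inputs would need scaling first. A gradient-boosting model such as CatBoost would report feature importance differently, for example through SHAP values.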
References
Association for the Advancement of Artificial Intelligence (Ed.). (2018). Proceedings of the Twelfth International AAAI Conference on Web and Social Media: ICWSM: 25-28 June 2018, Stanford, California, USA. International AAAI Conference on Web and Social Media, Palo Alto, California. AAAI Press.
Brannan, D., & Mohr, C. D. (2018). Love, friendship, and social support. Noba textbook series: Psychology. Champaign, IL: DEF publishers.
Buss, D. M., & Schmitt, D. P. (1993). Sexual Strategies Theory: An evolutionary perspective on human mating. Psychological Review, 100(2), 204-232. https://doi.org/10.1037/0033-295X.100.2.204
Fernández, A., Garcia, S., Herrera, F., & Chawla, N. V. (2018). SMOTE for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary. Journal of Artificial Intelligence Research, 61, 863-905.
Fisman, R., Iyengar, S. S., Kamenica, E., & Simonson, I. (2006). Gender Differences in Mate Selection: Evidence From a Speed Dating Experiment. The Quarterly Journal of Economics, 121(2), 673-697. https://doi.org/10.1162/qjec.2006.121.2.673
Hayashi, T., Mawalim, C. O., Ishii, R., Morikawa, A., Fukayama, A., Nakamura, T., & Okada, S. (2023). A Ranking Model for Evaluation of Conversation Partners Based on Rapport Levels. IEEE Access, 11, 73024-73035. https://doi.org/10.1109/ACCESS.2023.3287984
Joshi, A., Saggar, P., Jain, R., Sharma, M., Gupta, D., & Khanna, A. (2021). CatBoost—An Ensemble Machine Learning Model for Prediction and Classification of Student Academic Performance. Advances in Data Science and Adaptive Analysis, 13(03n04), Article 03n04. https://doi.org/10.1142/S2424922X21410023
Kleinerman, A., Rosenfeld, A., Ricci, F., & Kraus, S. (2018). Optimally balancing receiver and recommended users’ importance in reciprocal recommender systems. Proceedings of the 12th ACM Conference on Recommender Systems, 131-139. https://doi.org/10.1145/3240323.3240349
Lundberg, S. (2018). SHAP. API Reference. https://tinyurl.com/yhcdt2w8
Lundberg, S., & Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. https://doi.org/10.48550/ARXIV.1705.07874
McFarland, D. A., Broska, D., Prabhakaran, V., & Jurafsky, D. (2024). Coming into relations: How communication reveals and persuades relational decisions. Social Networks, 79, 57-75. https://doi.org/10.1016/j.socnet.2024.05.003
Mukhopadhyay, S. (2018). Advanced Data Analytics Using Python. Apress. https://doi.org/10.1007/978-1-4842-3450-1
Pincay Ponce, J. I. (2023). Análisis de datos educativos aplicado en el estudio de la incidencia de factores socioeconómicos en el rendimiento escolar [Doctor of Computer Science thesis, Universidad Nacional de La Plata]. https://doi.org/10.35537/10915/156471
Pincay Ponce, J. I., De Giusti, A. E., Sánchez Andrade, D. A., & Figueroa Suárez, J. A. (2024). CatBoost: Aprendizaje automático de conjunto para la analítica de los factores socioeconómicos que inciden en el rendimiento escolar. Revista Iberoamericana de Tecnología en Educación y Educación en Tecnología, 38, e3. https://doi.org/10.24215/18509959.38.e3
Pincay-Ponce, J., Sánchez-Andrade, D., Caicedo-Ávila, I., & Macías-Valencia, D. (2020, November 27). Clasificación de pacientes según su posibilidad de adquirir Diabetes Mellitus empleando algoritmos de Machine Learning. IV Congreso Internacional Tecnologías de la Información y Computación (CITIC 2020), Calceta, Ecuador. https://tinyurl.com/yve333v7
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., & Gulin, A. (2019). CatBoost: Unbiased boosting with categorical features (arXiv:1706.09516). arXiv. http://arxiv.org/abs/1706.09516
Regan, P. C. (1998). Minimum Mate Selection Standards as a Function of Perceived Mate Value, Relationship Context, and Gender. Journal of Psychology & Human Sexuality, 10(1), 53-73. https://doi.org/10.1300/J056v10n01_04
Sharabi, L. L., & Dorrance-Hall, E. (2024). The online dating effect: Where a couple meets predicts the quality of their marriage. Computers in Human Behavior, 150, 107973. https://doi.org/10.1016/j.chb.2023.107973
Van den Broeck, G., Lykov, A., Schleich, M., & Suciu, D. (2022). On the Tractability of SHAP Explanations. Journal of Artificial Intelligence Research, 74, 851-886. https://doi.org/10.1613/jair.1.13283
Weigard, A., & Spencer, R. J. (2023). Benefits and challenges of using logistic regression to assess neuropsychological performance validity: Evidence from a simulation study. The Clinical Neuropsychologist, 37(1), 34-59. https://doi.org/10.1080/13854046.2021.2023650
Ye, Y., Ni, K., Jing, F., Zhou, Y., Tang, W., & Zhang, Q. (2024). Model-Informed Targeted Network Interventions on Social Networks Among Men Who Have Sex With Men in Zhuhai, China. IEEE Transactions on Computational Social Systems, 11(1), 238-246. https://doi.org/10.1109/TCSS.2022.3216756
Zheng, X., Zhao, G., Zhu, L., Zhu, J., & Qian, X. (2022). What You Like, What I Am: Online Dating Recommendation via Matching Individual Preferences with Features. IEEE Transactions on Knowledge and Data Engineering, 1-1. https://doi.org/10.1109/TKDE.2022.3148485
License
Copyright (c) 2024 Scientific Journal of Informatics ENCRYPT - ISSN: 2737-6389.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.