CatBoost and Logistic Regression as Machine Learning Approaches in Matchmaking and Perceived Availability

Authors

  • Jorge Iván Pincay-Ponce, Universidad Laica Eloy Alfaro de Manabí (ULEAM)
  • María Roxana Martínez, Universidad Abierta Interamericana (UAI)
  • Wilian Richart Delgado-Muentes, Universidad Laica Eloy Alfaro de Manabí (ULEAM)
  • Juan Alberto Figueroa-Suárez, Universidad Laica Eloy Alfaro de Manabí (ULEAM)

DOI:

https://doi.org/10.56124/encriptar.v7i14.009

Keywords:

Matchmaking, ensemble, speed dating

Abstract

This paper aims to redesign the analysis of the “Speed Dating” dataset from the study “Gender Differences in Mate Selection: Evidence from a Speed Dating Experiment,” published in 2006 by Raymond Fisman, Sheena Iyengar, Emir Kamenica, and Itamar Simonson in The Quarterly Journal of Economics, the oldest professional journal of economics in the English language. The analysis builds on the theory of “perceived availability,” which holds that people tend to find more attractive those who seem more attainable or more interested in them. Logistic regression and the CatBoost ensemble method were used to uncover patterns that appear to influence participants' decisions about a possible future relationship with a person of the opposite sex after a four-minute speed date. The findings indicate that, in general, individuals prioritize the following in a potential partner, from most to least important: attractiveness, perceived compatibility, shared interests, sense of humor, ambition, satisfaction with acquaintances (indicative of sociability), TV interests, sincerity, and partner's age. The models reached an accuracy of over 80% with logistic regression and 88% with the CatBoost ensemble method. The models were developed in Orange Data Mining 3.37.
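
The logistic regression side of the approach can be illustrated with a minimal sketch. It is not the authors' workflow, which was built in Orange Data Mining, and it does not use the Speed Dating dataset: the two features (attractiveness and shared interests), the coefficients that generate the synthetic decisions, and every function name below are assumptions made only for illustration. CatBoost is not shown. The sketch fits a logistic model by batch gradient descent and reports accuracy on held-out synthetic data.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_dates(n, seed=0):
    """Generate hypothetical ratings and yes/no decisions (not real data)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        # Ratings on a 1-10 scale, centered at the scale midpoint (5.5).
        attractiveness = rng.uniform(1, 10) - 5.5
        shared_interests = rng.uniform(1, 10) - 5.5
        # Assumed ground truth, chosen only for illustration.
        p_yes = sigmoid(0.9 * attractiveness + 0.5 * shared_interests)
        rows.append([attractiveness, shared_interests])
        labels.append(1 if rng.random() < p_yes else 0)
    return rows, labels


def predict_proba(x, weights, bias):
    """Probability of a positive decision under the logistic model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fit_logistic(rows, labels, lr=0.1, epochs=300):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(x, weights, bias) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def accuracy(rows, labels, weights, bias):
    """Share of rows whose thresholded prediction matches the label."""
    hits = sum(
        (predict_proba(x, weights, bias) >= 0.5) == bool(y)
        for x, y in zip(rows, labels)
    )
    return hits / len(rows)


train_rows, train_labels = make_synthetic_dates(400, seed=0)
test_rows, test_labels = make_synthetic_dates(200, seed=1)
weights, bias = fit_logistic(train_rows, train_labels)
test_accuracy = accuracy(test_rows, test_labels, weights, bias)
print("weights:", weights, "bias:", bias, "held-out accuracy:", test_accuracy)
```

In a sketch like this, the sign and relative size of each fitted weight play the role that the ranked list of partner attributes plays in the abstract; the accuracy it prints says nothing about the figures reported in the paper.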

References

Association for the Advancement of Artificial Intelligence (Ed.). (2018). Proceedings of the Twelfth International AAAI Conference on Web and Social Media: ICWSM: 25-28 June 2018, Stanford, California, USA. International AAAI Conference on Web and Social Media, Palo Alto, California. AAAI Press.

Brannan, D., & Mohr, C. D. (2018). Love, friendship, and social support. Noba textbook series: Psychology. Champaign, IL: DEF publishers.

Buss, D. M., & Schmitt, D. P. (1993). Sexual Strategies Theory: An evolutionary perspective on human mating. Psychological Review, 100(2), 204-232. https://doi.org/10.1037/0033-295X.100.2.204

Fernández, A., Garcia, S., Herrera, F., & Chawla, N. V. (2018). SMOTE for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary. Journal of Artificial Intelligence Research, 61, 863-905.

Fisman, R., Iyengar, S. S., Kamenica, E., & Simonson, I. (2006). Gender Differences in Mate Selection: Evidence From a Speed Dating Experiment. The Quarterly Journal of Economics, 121(2), 673-697. https://doi.org/10.1162/qjec.2006.121.2.673

Hayashi, T., Mawalim, C. O., Ishii, R., Morikawa, A., Fukayama, A., Nakamura, T., & Okada, S. (2023). A Ranking Model for Evaluation of Conversation Partners Based on Rapport Levels. IEEE Access, 11, 73024-73035. https://doi.org/10.1109/ACCESS.2023.3287984

Joshi, A., Saggar, P., Jain, R., Sharma, M., Gupta, D., & Khanna, A. (2021). CatBoost—An Ensemble Machine Learning Model for Prediction and Classification of Student Academic Performance. Advances in Data Science and Adaptive Analysis, 13(03n04), Article 03n04. https://doi.org/10.1142/S2424922X21410023

Kleinerman, A., Rosenfeld, A., Ricci, F., & Kraus, S. (2018). Optimally balancing receiver and recommended users’ importance in reciprocal recommender systems. Proceedings of the 12th ACM Conference on Recommender Systems, 131-139. https://doi.org/10.1145/3240323.3240349

Lundberg, S. (2018). SHAP. API Reference. https://tinyurl.com/yhcdt2w8

Lundberg, S., & Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. https://doi.org/10.48550/ARXIV.1705.07874

McFarland, D. A., Broska, D., Prabhakaran, V., & Jurafsky, D. (2024). Coming into relations: How communication reveals and persuades relational decisions. Social Networks, 79, 57-75. https://doi.org/10.1016/j.socnet.2024.05.003

Mukhopadhyay, S. (2018). Advanced Data Analytics Using Python. Apress. https://doi.org/10.1007/978-1-4842-3450-1

Pincay Ponce, J. I. (2023). Análisis de datos educativos aplicado en el estudio de la incidencia de factores socioeconómicos en el rendimiento escolar [Doctoral thesis in Computer Science, Universidad Nacional de La Plata]. https://doi.org/10.35537/10915/156471

Pincay Ponce, J. I., De Giusti, A. E., Sánchez Andrade, D. A., & Figueroa Suárez, J. A. (2024). CatBoost: Aprendizaje automático de conjunto para la analítica de los factores socioeconómicos que inciden en el rendimiento escolar. Revista Iberoamericana de Tecnología en Educación y Educación en Tecnología, 38, e3. https://doi.org/10.24215/18509959.38.e3

Pincay-Ponce, J., Sánchez-Andrade, D., Caicedo-Ávila, I., & Macías-Valencia, D. (2020, November 27). Clasificación de pacientes según su posibilidad de adquirir Diabetes Mellitus empleando algoritmos de Machine Learning. IV Congreso Internacional Tecnologías de la Información y Computación (CITIC 2020), Calceta, Ecuador. https://tinyurl.com/yve333v7

Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., & Gulin, A. (2019). CatBoost: Unbiased boosting with categorical features (arXiv:1706.09516). arXiv. http://arxiv.org/abs/1706.09516

Regan, P. C. (1998). Minimum Mate Selection Standards as a Function of Perceived Mate Value, Relationship Context, and Gender. Journal of Psychology & Human Sexuality, 10(1), 53-73. https://doi.org/10.1300/J056v10n01_04

Sharabi, L. L., & Dorrance-Hall, E. (2024). The online dating effect: Where a couple meets predicts the quality of their marriage. Computers in Human Behavior, 150, 107973. https://doi.org/10.1016/j.chb.2023.107973

Van den Broeck, G., Lykov, A., Schleich, M., & Suciu, D. (2022). On the Tractability of SHAP Explanations. Journal of Artificial Intelligence Research, 74, 851-886. https://doi.org/10.1613/jair.1.13283

Weigard, A., & Spencer, R. J. (2023). Benefits and challenges of using logistic regression to assess neuropsychological performance validity: Evidence from a simulation study. The Clinical Neuropsychologist, 37(1), 34-59. https://doi.org/10.1080/13854046.2021.2023650

Ye, Y., Ni, K., Jing, F., Zhou, Y., Tang, W., & Zhang, Q. (2024). Model-Informed Targeted Network Interventions on Social Networks Among Men Who Have Sex With Men in Zhuhai, China. IEEE Transactions on Computational Social Systems, 11(1), 238-246. https://doi.org/10.1109/TCSS.2022.3216756

Zheng, X., Zhao, G., Zhu, L., Zhu, J., & Qian, X. (2022). What You Like, What I Am: Online Dating Recommendation via Matching Individual Preferences with Features. IEEE Transactions on Knowledge and Data Engineering, 1-1. https://doi.org/10.1109/TKDE.2022.3148485

Published

2024-11-20

How to Cite

Pincay-Ponce, J. I., Martínez, M. R., Delgado-Muentes, W. R., & Figueroa-Suárez, J. A. (2024). CatBoost and Logistic Regression as Machine Learning Approaches in Matchmaking and Perceived Availability. Scientific Journal of Informatics ENCRYPT - ISSN: 2737-6389, 7(14), 169–186. https://doi.org/10.56124/encriptar.v7i14.009