Semi-automatic annotation of emergency events reported on X based on machine learning and evolutionary computing

Authors

  • Jesús Zambrano-Zambrano Universidad Técnica de Manabí UTM
  • Joel Garcia-Arteaga Universidad Técnica de Manabí UTM
  • Jorge Parraga-Alava Universidad Técnica de Manabí UTM

DOI:

https://doi.org/10.56124/encriptar.v6i11.0001

Keywords:

hyper-parameter, genetic algorithm, machine learning, X, corpus

Abstract

Text corpora of citizen-reported incidents are scarce, and deciding whether each incident is an emergency is usually done manually by human annotators. Manual annotation yields acceptable results, but it is slow and expensive, and it is only feasible for data sets that are small or that do not require real-time processing. Because emergencies by their nature demand immediate action, relying on human annotators is far from ideal, especially when incidents are reported on social networks such as X. This article proposes an approach for the semi-automatic annotation of emergency events reported on X, based on machine learning and evolutionary computation. The CRISP-DM methodology was applied with six stages: problem understanding, data collection and understanding, modeling, validation, and deployment. The computational experiments show good performance when a linear support vector classifier (LSVC) is used as the model and its hyper-parameters are optimized with an evolutionary computation algorithm. In assigning the emergency tag to more than 170 thousand tweets, the approach obtained average F1-scores of 0.976 and 0.963 and Matthews correlation coefficients of 0.96 and 0.97 for binary and multiclass classification, respectively. We conclude that optimizing the hyper-parameters of machine learning classification models makes it possible to assign the emergency label efficiently to texts published on the social network X.
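The two evaluation metrics and the idea of evolutionary hyper-parameter search mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the genetic operators (truncation selection, arithmetic crossover, Gaussian mutation), and the `toy_fitness` stand-in are all assumptions made for illustration. In a real setting the fitness would presumably be a cross-validated F1-score or MCC of the classifier trained with the candidate hyper-parameter, for example the regularization strength C of a linear SVM searched in log10 space.

```python
import math
import random


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def evolve(fitness, low, high, pop_size=20, generations=30,
           mutation_sigma=0.3, seed=0):
    """Maximize `fitness` over one real-valued hyper-parameter in [low, high]."""
    rng = random.Random(seed)
    population = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]        # truncation selection
        children = [ranked[0]]                   # elitism: keep the best as-is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()
            child = w * a + (1 - w) * b              # arithmetic crossover
            child += rng.gauss(0, mutation_sigma)    # Gaussian mutation
            children.append(min(high, max(low, child)))  # clip to bounds
        population = children
    return max(population, key=fitness)


def toy_fitness(log_c):
    # Hypothetical stand-in for a validation score: peaks at log10(C) = 0.
    return 1.0 - (log_c ** 2) / 25.0


best_log_c = evolve(toy_fitness, low=-5.0, high=5.0)
best_c = 10 ** best_log_c
print(f"best C ~ {best_c:.3f}")
print(f"F1 = {f1_score(tp=8, fp=2, fn=2):.2f}, MCC = {mcc(tp=8, tn=8, fp=2, fn=2):.2f}")
```

Searching C on a logarithmic scale is a common choice because useful values often span several orders of magnitude; swapping `toy_fitness` for a function that trains and scores a classifier would turn the sketch into an actual tuning loop.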

Published

2023-12-31

How to Cite

Zambrano-Zambrano, J., Garcia-Arteaga, J., & Parraga-Alava, J. (2023). Semi-automatic annotation of emergency events reported on X based on machine learning and evolutionary computing. Scientific Journal of Informatics ENCRYPT - ISSN: 2737-6389, 6(11), 1–18. https://doi.org/10.56124/encriptar.v6i11.0001