Data Mining applied on Web Robots Detection: A Systematic Mapping

Title: Data Mining applied on Web Robots Detection: A Systematic Mapping

Authors: Ramon Abilio, Cristiano Garcia, and Victor Fernandes.

Abstract:
Browsing the Internet is part of the daily routine of the world's population. The number of web pages is increasing, and so is the amount of content (news, tutorials, images, videos) they publish. Search engines use web robots to index web content and offer better results to their users. However, web robots have also been used to exploit vulnerabilities in web pages. Monitoring and detecting web robot accesses is therefore important for keeping a web server as safe as possible. Data Mining methods have been applied to web server logs, used as the data source, in order to detect web robots. The main objective of this work was to gather evidence of the definition or use of web robot detection based on the analysis of server-side web logs with Data Mining methods. To that end, we conducted a systematic literature mapping of papers published between 2013 and 2020. We analyzed 34 studies, which allowed us to better understand the area of web robot detection by mapping what is being done, the data used to perform the detection, and the tools and algorithms used in the literature. From those studies, we extracted 33 machine learning algorithms, 64 features, and 13 tools. This study helps researchers find machine learning algorithms, features, and tools for detecting web robots by analyzing web server logs.

Keywords:
Web Usage Mining, Web Server Logs, Machine Learning Algorithms, Feature Extraction, Feature Selection.

Pages: 8

DOI: 10.21528/CBIC2021-60

Paper (PDF): CBIC_2021_paper_60.pdf

BibTeX file: CBIC_2021_60.bib