Machine Learning Based Sampling of X-Ray Images for a Computer-Aided Detection of Tuberculosis

Fernando Ferreira, Philipp Gaspar, Rodrigo Torres, Carlos Eduardo Covas, Lukas Müller de Oliveira, Micael Veríssimo de Araújo, José Manoel de Seixas, Mayara Bastos & Anete Trajman

Abstract: Developing Computer-Aided Detection (CAD) software relies on annotated X-ray datasets. The annotation task is time-consuming and requires extensive expertise. This work presents a sampling method that selects the most relevant images to be annotated for the development of a tuberculosis (TB) screening platform based on machine learning algorithms. Sampling optimizes the annotation process by reducing the number of images to be analyzed without compromising the diversity and significance of the images in the dataset. We developed an algorithm that selects the images to be annotated based on similarity and dissimilarity measurements between images. A public TB image dataset was used to conduct this research. The experiment consisted of a deep learning feature engineering step, followed by topological analysis based on Self-Organizing Maps (SOM) and K-Means. The effectiveness of the process is evaluated at each of its stages: classification, clustering, and the final sampling algorithm, which is based on similarity and dissimilarity features.
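
As a rough illustration of the idea of sampling by similarity and dissimilarity, the sketch below clusters feature vectors with a small K-Means and, for each cluster, keeps the image closest to the centroid (most typical) and the one farthest from it (most atypical). This is not the authors' algorithm: the abstract does not specify the selection rule, the SOM stage is omitted, and the function names (`farthest_first_init`, `kmeans`, `select_for_annotation`), the farthest-first initialization, and the toy 2-D vectors standing in for deep learning features are all assumptions made for this example.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption): start at points[0], then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, iters=50):
    """Plain Lloyd iterations; returns (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


def select_for_annotation(points, k):
    """Per cluster, keep the most typical and the most atypical sample."""
    labels, centroids = kmeans(points, k)
    chosen = set()
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        dist = lambda i: math.dist(points[i], centroids[c])
        chosen.add(min(idx, key=dist))  # similarity: nearest to the centroid
        chosen.add(max(idx, key=dist))  # dissimilarity: farthest from it
    return sorted(chosen)


# Toy stand-in for image feature vectors: two well-separated groups.
features = [
    (0.0, 0.0), (0.2, 0.1), (1.0, 0.9), (0.1, 0.3),
    (10.0, 10.0), (10.2, 9.9), (11.0, 11.2), (9.9, 10.1),
]
selected = select_for_annotation(features, k=2)
print(selected)  # indices of the images proposed for annotation
```

With this rule, at most two images per cluster are sent to annotators, so the annotation workload scales with the number of clusters rather than with the size of the dataset.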

Keywords: Deep Learning, CNN, SOM, Clustering, CAD.

DOI code: 10.21528/lnlm-vol20-no2-art7

PDF file: vol20-no2-art7.pdf

BibTeX file: vol20-no2-art7.bib