Fernando Ferreira, Philipp Gaspar, Rodrigo Torres, Carlos Eduardo Covas, Lukas Müller de Oliveira, Micael Veríssimo de Araújo, José Manoel de Seixas, Mayara Bastos & Anete Trajman
Abstract: Computer-Aided Detection (CAD) software depends on annotated X-ray datasets for its development. Annotating these images is time-consuming and requires extensive expertise. This work presents a sampling method that selects the most relevant images for annotation in the development of a tuberculosis (TB) screening platform based on machine learning algorithms. Sampling streamlines the annotation process by reducing the number of images to be analyzed without compromising the diversity and representativeness of the images in the dataset. We developed an algorithm that selects the images to be annotated based on similarity and dissimilarity measurements between images. A public TB image dataset was used in this research. The experiment consisted of a deep learning feature engineering step, followed by a topological analysis based on a Self-Organizing Map (SOM) and K-Means. The effectiveness of the process is evaluated at each of its stages: classification, clustering, and the final sampling algorithm, which is based on similarity and dissimilarity features.
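The abstract does not spell out the exact selection rule, so the following is only a minimal sketch of how such a pipeline could be wired together, not the authors' implementation. It makes several assumptions. Each image is represented by a precomputed feature vector, for example a CNN embedding; random synthetic vectors stand in for real features here. A small SOM is trained on these features, and K-Means groups the SOM prototypes. Within each cluster, the images closest to the centroid are taken as the "similar" (typical) examples, and the images farthest from it are taken as the "dissimilar" (atypical) examples. The names `train_som`, `kmeans` and `sample_for_annotation` and all parameter values are hypothetical.

```python
import numpy as np


def train_som(x, rows=4, cols=4, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular Self-Organizing Map; returns prototypes of shape (rows*cols, d)."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Initialise prototypes from randomly chosen samples.
    w = x[rng.choice(n, rows * cols, replace=n < rows * cols)].astype(float).copy()
    # Grid coordinates of each map unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2.0
    total, t = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = t / total
            lr = lr0 * (1.0 - frac)                           # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3              # shrinking neighbourhood radius
            bmu = np.argmin(((w - x[i]) ** 2).sum(axis=1))    # best-matching unit
            dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)     # squared grid distance to BMU
            h = np.exp(-dist2 / (2.0 * sigma ** 2))           # Gaussian neighbourhood
            w += lr * h[:, None] * (x[i] - w)
            t += 1
    return w


def kmeans(x, k, iters=100, seed=0):
    """Plain Lloyd's K-Means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), k, replace=False)].astype(float).copy()
    for _ in range(iters):
        labels = np.argmin(((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels


def sample_for_annotation(features, n_clusters=3, per_cluster=2, seed=0):
    """Hypothetical sampler: SOM -> K-Means on prototypes -> typical + atypical images per cluster."""
    protos = train_som(features, seed=seed)
    centers, proto_labels = kmeans(protos, n_clusters, seed=seed)
    # Each image inherits the cluster of its best-matching SOM unit.
    bmu = np.argmin(((features[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2), axis=1)
    img_labels = proto_labels[bmu]
    selected = []
    for j in range(n_clusters):
        idx = np.flatnonzero(img_labels == j)
        if idx.size == 0:
            continue
        d = np.linalg.norm(features[idx] - centers[j], axis=1)
        order = idx[np.argsort(d)]
        # Similarity: closest to the centroid; dissimilarity: farthest from it.
        for p in list(order[:per_cluster]) + list(order[::-1][:per_cluster]):
            p = int(p)
            if p not in selected:
                selected.append(p)
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic stand-in for CNN embeddings: three groups of 30 "images" in 8-D.
    feats = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 8)) for c in (0.0, 5.0, 10.0)])
    print(sample_for_annotation(feats, n_clusters=3, per_cluster=2))
```

In this sketch, running K-Means on the SOM prototypes instead of the raw features keeps the clustering step small. Selecting both the nearest and the farthest images in each cluster is one simple way to balance representativeness and diversity. Other rules, such as picking the images nearest to every SOM unit, would also fit the abstract's description.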
Keywords: Deep Learning, CNN, SOM, Clustering, CAD.
DOI code: 10.21528/lnlm-vol20-no2-art7