Small Image Training Sets: Exploring the Limits of Conventional and CNN-based Methods

Kovalev, V.

Please use this identifier to cite or link to this item: https://libeldoc.bsuir.by/handle/123456789/45826

Title:	Small Image Training Sets: Exploring the Limits of Conventional and CNN-based Methods
Authors:	Kovalev, V.
Keywords:	conference proceedings;Image Classification;Benchmarking;Convolutional Neural Networks;Histology images
Issue Date:	2021
Publisher:	UIIP NASB
Citation:	Kovalev, V. Small Image Training Sets: Exploring the Limits of Conventional and CNN-based Methods / Kovalev V. // Pattern Recognition and Information Processing (PRIP'2021) = Распознавание образов и обработка информации (2021) : Proceedings of the 15th International Conference, 21–24 Sept. 2021, Minsk, Belarus / United Institute of Informatics Problems of the National Academy of Sciences of Belarus. – Minsk, 2021. – P. 178–182.
Abstract:	This work addresses the problem of image classification when only small image datasets are available. Both traditional and CNN-based methods are examined and compared on a benchmark image dataset. The dataset consisted of 12000 routine hematoxylin-eosin stained histological images representing biopsy samples of normal tissue and of malignant tumors caused by breast cancer. For the traditional methods, well-known image analysis techniques were used to compute the image features: color co-occurrence matrices of images converted to an adaptive 32-color space, and a limited number of their principal components (PCA). The features were fed to SVM and Random Forests classifiers. The original image training set was gradually reduced from 8400 to 840 images in steps of 10%. In addition, very small sub-samples of 5% (420), 2.5% (210), 1.25% (105), and 1% (84) of the original image training set were also examined. For comparison, a classical CNN was employed that consisted of only 3 convolutional + MaxPooling layers with 16, 32, and 64 filters, respectively; the network was kept this small because small image training sets were specifically targeted in this study. The convolutional part was followed by a fully connected neural network with 512 intermediate nodes. As a result, it was found that traditional methods outperform the CNN-based image classification technique on training sets of fewer than 840 images.
URI:	https://libeldoc.bsuir.by/handle/123456789/45826
Appears in Collections:	Pattern Recognition and Information Processing (PRIP'2021) = Распознавание образов и обработка информации (2021)
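
The training set sizes quoted in the abstract can be reproduced with a little arithmetic. The sketch below is only an illustration and is not taken from the paper: the function name `subsample_sizes` and the constant `FULL_TRAINING_SET` are hypothetical, and it assumes that every percentage is taken relative to the 8400-image training set, which is what the quoted counts (420, 210, 105, 84) are consistent with. How the images were actually drawn for each sub-sample is not stated in the abstract and is not modeled here.

```python
from fractions import Fraction

# Largest training set size quoted in the abstract (assumed to be the 100% level).
FULL_TRAINING_SET = 8400


def subsample_sizes(full_size=FULL_TRAINING_SET):
    """Return (percentage, image count) pairs, largest training set first."""
    # Main series: 100% down to 10% of the training set in 10% steps.
    main = [Fraction(p, 100) for p in range(100, 0, -10)]
    # Very small sub-samples named in the abstract: 5%, 2.5%, 1.25% and 1%.
    tiny = [
        Fraction(5, 100),
        Fraction(25, 1000),
        Fraction(125, 10000),
        Fraction(1, 100),
    ]
    # Exact rational arithmetic avoids rounding surprises such as 8400 * 0.0125.
    return [(float(f * 100), int(f * full_size)) for f in main + tiny]


if __name__ == "__main__":
    for percent, count in subsample_sizes():
        print(f"{percent:6.2f}% of the training set: {count} images")
```

Under these assumptions the schedule has 14 levels: ten from 8400 down to 840 images, followed by the four very small sub-samples.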

Files in This Item:

File	Description	Size	Format
Kovalev_Small.pdf		1.26 MB	Adobe PDF
