In this study, deep learning models achieved very high accuracy in discriminating six histopathological classes of colon lesions without region-based annotation. Images captured with non-WSI cameras achieved classification accuracy exceeding 95%. Although each histological region is a complex mixture of the epithelial lesion, the surrounding normal epithelium, and mesenchymal components, Grad-CAM focused on the key areas of epithelial injury rather than on irrelevant surrounding areas.
Recently, several studies have predicted the histological type of lesions on colonoscopy images using AI17,18. Nevertheless, fewer studies have investigated the pathological diagnosis of histological slide images of colon polyps than of endoscopic images. Most AI studies of colon pathology have aimed to distinguish adenocarcinoma from non-adenocarcinoma10,19,20,21. Iizuka et al. used AI to classify colon pathology images into normal, adenoma (probably mostly TA), and adenocarcinoma22. When graded using a recurrent neural network, the AUCs for TA and adenocarcinoma were 0.964 and 0.975, respectively, slightly lower than our results. Our study differs from that of Iizuka et al. in several respects: they extracted lesion images by random sampling, whereas in our study a pathologist selected representative lesion images. Their study used Inception-v3 as the AI model, while ours used DenseNet-161 and EfficientNet-B7. The number of images used to train the models also differed, but the extent to which this affects model accuracy is unknown.
According to Jones et al., when an AI model was trained on 50 and 30 unambiguous benign and malignant colon images, the accuracy was 92.3% and 82.5%, respectively10. Malignant tumors of the colon (especially adenocarcinoma) are relatively easy to distinguish from other lesions by pathological signs, owing to their cytological atypia, destructive and invasive growth into surrounding tissue, and desmoplastic stroma. Therefore, we believe that even with only 30-50 images, adenocarcinoma can be learned by AI with 90% accuracy. In our study, adenocarcinoma was predicted with 100% accuracy even when the dataset was reduced to one quarter. SSA, TA, and TSA, which should be morphologically easy to recognize, showed over 90% accuracy even with the quarter dataset. The quarter dataset contained an average of 69 images per disease group, and accuracy was almost 90% after data augmentation. For lesions with subtle pathological changes (e.g., collagenous colitis or amyloid colitis), we cannot predict the amount of training data needed to achieve sufficient diagnostic accuracy23. However, we believe that if researchers collect typical cases for training, they could build a sufficiently accurate AI model with fewer cases.
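The specific augmentations applied to the quarter dataset are not described here; a common choice for histopathology tiles, which have no canonical orientation, is the eight-fold dihedral group (four 90-degree rotations, with and without a flip). A minimal sketch of this idea, with the tile size and function name as illustrative assumptions:

```python
import numpy as np

def augment_tile(tile, rng):
    """Apply a random flip/rotation from the dihedral group (8 orientations).

    tile: H x W x 3 uint8 array; histology tiles look valid in any orientation,
    so these transforms enlarge a small dataset without distorting morphology.
    """
    k = int(rng.integers(0, 4))              # number of 90-degree rotations
    out = np.rot90(tile, k, axes=(0, 1))
    if rng.integers(0, 2):
        out = out[:, ::-1, :]                # horizontal flip
    return np.ascontiguousarray(out)

rng = np.random.default_rng(0)
tile = np.zeros((224, 224, 3), dtype=np.uint8)   # placeholder tile
augmented = [augment_tile(tile, rng) for _ in range(8)]
```

In practice, color jitter is often added as well to mimic staining variability, but geometric transforms alone already multiply the effective dataset size eightfold.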
Grad-CAM is a method that allows researchers to easily identify the regions to which a CNN assigns weight during classification. If Grad-CAM shows weight assigned to the appropriate regions, the model can be considered useful; otherwise, a global reassessment of the model training process is necessary. In our pilot study, Grad-CAM revealed that the region contributing most to the determination of SSA was the empty space in the submucosal area of the endoscopic submucosal dissection specimen. We therefore added an image processing step that omits cell-free areas from the learning process, allowing the AI algorithm to focus more on the colonic epithelial layer. We evaluated the trained model using Grad-CAM images for all datasets. In most cases, our model weighted the epithelial components rather than the stroma. Given that most colon polyps are epithelial lesions, this indicates that widely used CNN models perform well when applied to the pathological classification of colon polyps. The classes of a CNN image classification model are mutually independent, but pathological lesions may occupy only a small portion of the entire image; in such cases, the image may be misclassified as NC rather than as a lesion. To address this problem, a class-dependent weight correction can easily be applied, although the appropriate correction values require further study. HP and SSA share the microscopic finding of serration protruding into the crypt lumen; however, unlike HP, SSA shows basal crypt dilation. If the muscularis mucosae is visible, it may affect CNN pattern formation; to learn SSA effectively, a new approach that accounts for image orientation during training is needed. In the case of adenocarcinoma, the solid-growth (undifferentiated carcinoma-like) and signet ring cell components received lower weights.
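The core of Grad-CAM is simple: the gradients of the target class score with respect to the last convolutional feature maps are global-average-pooled into per-channel weights, and the ReLU of the weighted sum of feature maps gives the heatmap. A minimal NumPy sketch of that computation (the framework-specific hook that captures the activations and gradients is omitted):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Compute a Grad-CAM heatmap from the last conv layer.

    feature_maps: (C, H, W) activations A_k
    gradients:    (C, H, W) gradients dY_c/dA_k for the target class score Y_c
    Returns an (H, W) heatmap normalised to [0, 1].
    """
    # alpha_k: global-average-pool the gradients over the spatial dimensions
    weights = gradients.mean(axis=(1, 2))                  # shape (C,)
    # weighted combination of feature maps, then ReLU
    cam = np.einsum("c,chw->hw", weights, feature_maps)
    cam = np.maximum(cam, 0.0)
    # normalise so the heatmap can be overlaid on the input tile
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam
```

The heatmap is then upsampled to the input resolution and overlaid on the tile; this is how the spurious weight on submucosal empty space was detected in the pilot study.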
These tumors are rare but have a worse prognosis; therefore, we believe it is important to reflect them in the diagnostic process. It would also be better to add further cases of these components and weight them as separate classes rather than within the adenocarcinoma class. In this study, we helped the CNN model focus on the epithelium by zero-filling large submucosal areas and luminal void spaces. However, to learn epithelial lesions more efficiently, a technique that also removes smaller void spaces and stroma is needed.
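The exact zero-filling procedure is not detailed above; one simple realization is to treat near-white pixels as void and set them to zero, so the network receives no signal from empty regions. A hedged sketch, in which the brightness threshold is an illustrative assumption:

```python
import numpy as np

def zero_fill_void(tile, threshold=230):
    """Zero-fill near-white regions (e.g. luminal or submucosal void space).

    tile: H x W x 3 uint8 RGB array.
    threshold: per-channel brightness above which a pixel counts as void
               (an assumed value; in practice it would be tuned per stain).
    """
    void = (tile > threshold).all(axis=-1)     # H x W boolean void mask
    out = tile.copy()
    out[void] = 0                              # black out void pixels
    return out
```

Morphological operations (e.g. closing the mask) would avoid zero-filling small bright specks inside tissue, which relates to the finer-grained removal of void space and stroma suggested above.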
In the datasets used to train the AI algorithm, multiple images were taken from multiple polyps of a single patient. In general, when a dataset is not large enough, using multiple samples from one source can skew the learning results. However, using multiple images (WSI segmentation) from one source (one patient) is common in pathology image research. Indeed, even a single lesion is morphologically and genetically heterogeneous. A malignant tumor is defined as a clonal expansion originating from one cell that has accumulated enough genetic mutations to allow uncontrolled division. However, adjacent tumor cells are not completely identical, because they exhibit intra-tumor genetic heterogeneity through clonal evolution, in which several genetic abnormalities arise independently during proliferation24. Colonic adenomatous polyps have also shown intra-tumor heterogeneity when examined by single-cell sequencing25,26. These studies have shown that even non-proliferative, healthy colonic epithelia are not identical at the cellular level27,28. Therefore, it is reasonable to regard all non-overlapping images as independent.
In this study, typical images selected by a pathologist were used as the dataset. For a pathological image to be “typical”, the lesion must be large enough to allow diagnosis. However, adenocarcinoma should be diagnosed even when it is small, comprising only a few cells or glands. Technically, a CNN-based AI algorithm can detect tiny adenocarcinomas given sufficient training data, but it is unclear how much data would be needed to detect single scattered adenocarcinoma cells. As the aim of our study was not to create an AI model capable of judging very small lesions, a limitation of this study is that such subtle changes could not be detected.
One of the main limitations of this study is the use of individual tiles instead of WSIs, owing to the lack of a slide scanner system at our institution; working on individual tiles rather than WSIs reduces the practical utility of automated deep learning models. Future research using WSIs obtained with an automated scanner would be much more useful. Another limitation is the lack of technical novelty in the CNN architectures, as we adopted publicly available, ready-to-use CNN structures. Nevertheless, this study can be considered an early demonstration that deep learning performs well on the pathology of colonoscopy specimens. Additionally, to validate the reliability of the model’s performance more clearly, external validation on an independent dataset will be required. Moreover, to show the effectiveness or necessity of deep learning models for this task, a comparative study with a much lighter radiomics-based method would also be needed. The lack of comparison with traditional machine learning methods is another limitation of the present study; we could not find an established technique that could be easily implemented for 3-channel images. Future comparative research would show the performance differences between methods.
In conclusion, this study showed that deep learning models can classify histopathology images into six diagnostic types commonly encountered in colonoscopy specimens with high accuracy using relatively little data. Deep learning models could also aid pathologists in diagnosing colonoscopy specimens. Future research is needed to build a more comprehensive deep learning model that diagnoses a wider range of colorectal lesions.