Cell Rep Methods. 2021 Nov 8;1(7):100107. doi: 10.1016/j.crmeth.2021.100107. eCollection 2021 Nov 22.
How well deep learning (DL) model performance generalizes is not well understood, and efforts to improve medical image segmentation by augmenting training data rely on anecdotal assumptions. We report statistical methods for the visual interpretation of DL models trained using ImageNet initialization with natural-world images (J II) and supervised learning with medical images (L MID) for binary segmentation of skin cancer, prostate and kidney tumors. An algorithm that computes Dice scores from the union and intersection of individual output masks was developed for synergistic segmentation by J II and L MID models. Stress testing with non-Gaussian distributions of labels and infrequent clinical images showed that scarcity of natural-world and domain-specific medical images can, counterintuitively, reduce the Type I and Type II errors of DL models. A toolbox of 30 J II and L MID models, code, and visual outputs for 59,967 images is shared to identify target and non-target medical image pixels and clinical labels, and thereby to explain the performance of DL models.
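To make the idea of scoring combined masks concrete, here is a minimal sketch in plain Python. It is not the authors' published algorithm. It assumes binary masks flattened to lists of 0/1 and uses the standard Dice definition, 2|A∩B| / (|A| + |B|). It also treats Type I and Type II errors as pixel-level false-positive and false-negative rates, which is an assumption of this sketch. The function names and the toy masks are hypothetical.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[int]  # flattened binary mask: 1 = target pixel, 0 = background


def dice(pred: Mask, truth: Mask) -> float:
    """Dice score 2|A∩B| / (|A| + |B|); taken as 1.0 when both masks are empty."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if total == 0 else 2.0 * overlap / total


def union(a: Mask, b: Mask) -> List[int]:
    """Pixel is target if either model marks it."""
    return [1 if (x or y) else 0 for x, y in zip(a, b)]


def intersection(a: Mask, b: Mask) -> List[int]:
    """Pixel is target only if both models mark it."""
    return [1 if (x and y) else 0 for x, y in zip(a, b)]


def error_rates(pred: Mask, truth: Mask) -> Tuple[float, float]:
    """Return (false-positive rate, false-negative rate) over pixels."""
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)
    negatives = sum(1 for t in truth if not t)
    positives = len(truth) - negatives
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


# Toy example (made-up masks, not data from the study).
truth = [0, 1, 1, 1, 0, 0]
model_a = [0, 1, 1, 0, 0, 0]  # misses one target pixel
model_b = [0, 0, 1, 1, 1, 0]  # misses one target pixel, adds one false positive

for name, mask in [
    ("model A", model_a),
    ("model B", model_b),
    ("union", union(model_a, model_b)),
    ("intersection", intersection(model_a, model_b)),
]:
    fpr, fnr = error_rates(mask, truth)
    print(f"{name:12s} dice={dice(mask, truth):.3f} FPR={fpr:.3f} FNR={fnr:.3f}")
```

In this toy case the union recovers every target pixel (no false negatives) at the cost of one false positive, while the intersection removes all false positives but misses more of the target. That trade-off is the kind a union/intersection comparison is meant to expose.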