Classification of Breast Cancer Histology Images Using MSMV-PFENet


Experimental setup

We trained and tested MSMV-PFENet on a platform equipped with an NVIDIA Tesla V100 GPU (32 GB memory) and a 24-core Intel Xeon Platinum 8168 processor (33 MB cache, 2.70 GHz). We trained the network with a batch size of 64 for 27,000 iterations, using stochastic gradient descent (SGD) with momentum set to 0.9 and weight decay to 1e-4. To speed up training, we set the initial learning rate to 0.01 for the first 30 batches, then divided the rate by 10 every 10 batches until it reached 1e-4. The BACH dataset33 used in our experiment contained 400 training images and 100 test images (Fig. 6, Table 2). The dimensions of the images are 2048 × 1536 pixels, and the pixel size is 0.42 μm × 0.42 μm. We split the training images into two sets at a ratio of 4:1 for training and validation.
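The optimizer and learning-rate schedule described above can be expressed compactly. The following is a minimal PyTorch sketch of our reading of that schedule; `model` is a placeholder standing in for MSMV-PFENet, and the step logic is an interpretation of the text, not the released training code.

```python
from torch import nn, optim

# Placeholder model; MSMV-PFENet would be substituted here.
model = nn.Linear(2048, 4)

# SGD with the hyperparameters stated in the text.
optimizer = optim.SGD(model.parameters(), lr=0.01,
                      momentum=0.9, weight_decay=1e-4)

def adjust_learning_rate(optimizer, step):
    """Start at 0.01; after the first 30 steps, divide by 10 every
    10 steps until the rate reaches the 1e-4 floor."""
    lr = 0.01
    if step >= 30:
        lr = max(0.01 * (0.1 ** ((step - 30) // 10 + 1)), 1e-4)
    for group in optimizer.param_groups:
        group["lr"] = lr
```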

Table 2 BACH dataset information33.

To avoid overfitting due to the small dataset size, we used transfer learning35 and data augmentation36. PFENet is derived from ResNet50 pre-trained on ImageNet: the average pooling layer and fully connected layer of ResNet50 were removed and replaced by an adaptive average pooling layer. We randomly initialized the parameters in FDNet and trained FDNet together with PFENet. PFENet encoded the image features into a vector, and FDNet took the vector for classification. We refined the entire model based on classification feedback so that FDNet could correctly discriminate between different features and images. The specific details of augmenting the datasets are presented in the Methods section. We also standardized the color of the images to avoid inconsistencies caused by different staining protocols39.
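As an illustration, this ResNet50 modification can be set up in a few lines with torchvision. The layer slicing below follows torchvision's ResNet definition, and the 2048-dimensional output is the standard ResNet50 feature width; the exact PFENet head beyond the adaptive pooling layer is an assumption.

```python
import torch
from torch import nn
from torchvision import models

# ImageNet-pre-trained ResNet50 backbone (torchvision >= 0.13 API).
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

encoder = nn.Sequential(
    *list(backbone.children())[:-2],  # keep conv stages, drop avgpool + fc
    nn.AdaptiveAvgPool2d((1, 1)),     # replacement adaptive pooling layer
    nn.Flatten(),                     # one 2048-d feature vector per patch
)

with torch.no_grad():
    feat = encoder(torch.randn(1, 3, 224, 224))
print(feat.shape)  # torch.Size([1, 2048])
```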

To demonstrate the classification capabilities of MSMV-PFENet, we evaluated the network at both the patch level and the image level, as in30,31, from several aspects. We defined the accuracy (Acc), precision (Pre), recall (Rec), and F1 score (F1) as

$$Acc=\frac{TP+TN}{TP+TN+FP+FN},$$

(1)

$$Pre=\frac{TP}{TP+FP},$$

(2)

$$Rec=\frac{TP}{TP+FN},$$

(3)

$$F1=\frac{2 \times Pre \times Rec}{Pre+Rec}.$$

(4)

Here, TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. These criteria have also been adopted elsewhere40,41.
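For reference, Eqs. (1)–(4) translate directly into code. The sketch below shows the binary form; for the four-class task considered here, these quantities would be computed per class and then averaged. The example counts are illustrative only.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 score per Eqs. (1)-(4)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * pre * rec / (pre + rec)
    return acc, pre, rec, f1

# Illustrative counts, not results from the paper:
print(classification_metrics(tp=93, tn=279, fp=7, fn=21))
```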

Comparison with other methods

To show the classification capability of MSMV-PFENet, we compared the accuracy achieved by our network with that of other methods30,39,42,43,44. MSMV-PFENet achieved an accuracy of 93.0% at the patch level (Table 3), higher than the values reported elsewhere. The comparison showed that a well-trained MSMV-PFENet could efficiently extract the most important features from the original images at local and global levels. Additionally, the BiLSTM and the majority voting mechanism in our network further improved accuracy by fully considering local and global characteristics. As a result, MSMV-PFENet achieved an accuracy of 94.8% at the image level.
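The image-level step aggregates patch-level predictions by majority voting. Below is a generic majority-voting sketch; the paper's exact tie-breaking rule is not specified here, so ties fall back to the label Counter returns first.

```python
from collections import Counter

def image_level_prediction(patch_labels):
    """Majority vote over the patch-level predictions of one image."""
    return Counter(patch_labels).most_common(1)[0][0]

# e.g., patches of one image classified individually:
print(image_level_prediction(
    ["invasive", "invasive", "in situ", "invasive", "benign"]))  # invasive
```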

Table 3 Reported classification accuracy at the patch level and at the image level using various networks.

Results on the extraction of key regions

To verify the influence of different methods for extracting image patches from the original images, we compared our cell nucleus density-guided method with random patch selection. With random selection, training MSMV-PFENet was difficult because not all lesions in the original image were sent to the classifier for analysis, resulting in relatively low accuracy (85.4%). In comparison, our method made the training process much easier and better exploited the classification power of MSMV-PFENet, building on the fact that cancer cells are often associated with increased nuclear size, irregular nuclear outlines, and a disturbed chromatin distribution. At the patch level, the accuracy reached 93.0%.
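To make the idea concrete, the following is a minimal sketch of density-guided patch selection. It assumes a precomputed 2-D nucleus density map (e.g., from stain-based nucleus segmentation) and picks the windows with the highest summed density; the window size, stride, and number of patches are illustrative values, not the paper's.

```python
import numpy as np

def density_guided_patches(density_map, patch=512, stride=256, top_k=12):
    """Return the (y, x) corners of the top_k densest patch windows."""
    h, w = density_map.shape
    scored = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            # Score each candidate window by its total nucleus density.
            scored.append((density_map[y:y + patch, x:x + patch].sum(), y, x))
    scored.sort(reverse=True)  # densest regions first
    return [(y, x) for _, y, x in scored[:top_k]]

# Example on a random map standing in for a real density estimate:
print(density_guided_patches(np.random.rand(1536, 2048), top_k=3))
```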

Results on different network combinations

To find an optimal design for feature encoding and feature discrimination in MSMV-PFENet, we conducted an extensive study comparing different architectures (Table 4). For feature encoding, we fixed a fully connected (FC) layer as the feature discriminator and tested FENet, PFENet, VGG1626, GoogLeNet45, and ResNet-10138 as feature encoders (Table 4, lines 1 to 6). PFENet provided the best patch-level and image-level accuracy among the five architectures because its three parallel CNNs could simultaneously encode local- and global-level features. Representing global features at low resolution also reduced the computational cost and made the classifier more efficient than analyzing a high-resolution image as a whole.
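The parallel-encoding idea can be illustrated as follows: three branches encode the same input at full, half, and quarter resolution, so fine local detail and low-resolution global context are captured simultaneously. The branch architecture below is a toy CNN standing in for the actual PFENet branches.

```python
import torch
from torch import nn

class ParallelEncoder(nn.Module):
    """Three parallel branches over progressively downscaled inputs."""
    def __init__(self, dim=64):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.branches = nn.ModuleList(branch() for _ in range(3))

    def forward(self, x):
        feats = []
        for i, b in enumerate(self.branches):
            # Branch i sees the input downscaled by a factor of 2**i.
            scaled = nn.functional.interpolate(
                x, scale_factor=1 / (2 ** i)) if i else x
            feats.append(b(scaled))
        return torch.cat(feats, dim=1)  # concatenated multi-scale vector

out = ParallelEncoder()(torch.randn(1, 3, 256, 256))
print(out.shape)  # torch.Size([1, 192])
```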

Table 4 Performance comparison of various combinations of network architectures.

After fixing the feature encoder, we tested several candidates for the discriminator, including the FC layer, a support vector machine (SVM), and a BiLSTM (Table 4, lines 6–10). The BiLSTM outperformed the other two candidates because it could effectively combine local and global information. However, training the BiLSTM was difficult: the deeper the BiLSTM network, the harder it was to converge to a qualified discriminator. Balancing accuracy against training feasibility, we ultimately chose a design with a two-layer BiLSTM for the following demonstrations.
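A two-layer BiLSTM discriminator over a sequence of patch feature vectors can be sketched as below. The feature dimension, hidden size, and the choice of classifying from the last time step are placeholder assumptions, not values taken from the paper.

```python
import torch
from torch import nn

class BiLSTMClassifier(nn.Module):
    """Two-layer bidirectional LSTM over per-patch feature vectors."""
    def __init__(self, feat_dim=2048, hidden=256, num_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, feats):       # feats: (batch, n_patches, feat_dim)
        out, _ = self.lstm(feats)
        return self.fc(out[:, -1])  # classify from the final time step

logits = BiLSTMClassifier()(torch.randn(2, 12, 2048))
print(logits.shape)  # torch.Size([2, 4])
```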

Results of the whole network

Figure 7a shows the confusion matrix calculated on the BACH test dataset. The results confirmed that MSMV-PFENet could accurately associate histological images with their clinical categories: normal, benign, in situ carcinoma, and invasive carcinoma. Precision, recall, and F1 score were also calculated as additional benchmarks, and these scores were always above 92.8% (Fig. 7b). We noticed that the misclassification rate was higher for normal tissue, possibly due to subtle inconsistencies in tissue morphology among populations with different genotypes.

Figure 7

Results of the entire network on the BACH test dataset. (a) Confusion matrix illustrating the classification performed by MSMV-PFENet. (b) Histogram of precision, recall and F1 score for MSMV-PFENet evaluation.

Moreover, we also evaluated our method on Yan's dataset30, which contains 3771 diverse high-resolution pathology images. Figure 8a shows the confusion matrix computed on its test set. Precision, recall, and F1 score were still maintained above 92% (Fig. 8b), and the accuracy reached 95.6%. These results demonstrate that MSMV-PFENet can also achieve good performance on a larger dataset.

Figure 8

Results of the entire network on Yan's dataset30. (a) Confusion matrix illustrating the classification performed by MSMV-PFENet. (b) Histogram of precision, recall, and F1 score for MSMV-PFENet evaluation.
