Moreover, in some cases, distant dependencies play a more important role in the final classification decision than close dependencies. As shown in Fig. 5d, patches 6, 8 and 9 all show key pathologic features of ductal carcinoma in situ, such as tumor cells in the duct arranged in solid nests with nuclear hyperchromatism, and a heterogeneous pathologic condition in which malignant epithelial cells are confined within the ducts of the breast without evidence of invasion. Jointly considering patch 6 with the spatially remote patches 8 or 9 supports the classification as ductal carcinoma in situ more accurately than jointly considering patch 6 with the spatially close patches 7 or 10.
Thus, by jointly considering the short-term and the long-term spatial correlations between patches, our proposed method not only mines the pathologic features of breast cancer in depth but also simulates the real-world scenario in which a pathologist analyzes pathological images.
5.2. Accuracy comparison with previous methods
The patch-wise and image-wise accuracy of our proposed method is shown in Table 3. We compared its average classification accuracy with that of most of the advanced methods. Some of the previous work used the Bioimaging2015 dataset with 249 training images, while other work used the ICIAR2018 dataset with 400 training images. Because 249 and 400 images are not considerably different, for a convenient and concise comparison we only compared the accuracy of the method in the case of the 400 training images.
Fig. 5. Visualization of two pathological images using t-SNE. Fig. 5a and b show the two-dimensional (2D) representation of 12 feature vectors extracted from 12 patches of a breast cancer pathological image using t-SNE. Each data point in Fig. 5a and b represents the feature vector extracted from the corresponding patch in Fig. 5c and d.
Table 3. Comparison of accuracy with previous methods.
Method    Patch-wise accuracy (%)    Image-wise accuracy (%)
For the 4-class pathological image classification, our method achieved an average patch-wise accuracy of 82.1% and an average image-wise accuracy of 91.3%. There are two main reasons for the good patch-wise performance. First, we use a pretrained model, which allows for better generalization on small pathological image datasets. Second, in contrast to previous work that used only a basic CNN architecture, we use the more advanced Inception-V3 from Google, which gives the model a better learning ability. We also analyzed the reasons for the good image-wise performance. Because we use a richer multilevel feature representation to represent patches, image-wise information fusion can be more complete. At the same time, we use a deep neural network as the integration method to preserve the short-term and long-term spatial correlations between patches.
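To make the two metrics concrete, the following minimal sketch computes both accuracies on toy data. It assumes that patch-wise accuracy is the fraction of correctly labeled patches and image-wise accuracy is the fraction of correctly labeled whole images; the labels, the `accuracy` helper, and the image-level predictions are hypothetical and are not taken from our experiments.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and equally long")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Toy example (hypothetical): two images, four patches each.
# Every patch inherits the label of the image it was cut from.
patch_true = ["benign"] * 4 + ["invasive"] * 4
patch_pred = ["benign", "benign", "normal", "benign",
              "invasive", "in_situ", "invasive", "invasive"]

# Image-level predictions are assumed to come from some fusion step
# (for example voting or a sequence model) applied to the patch outputs.
image_true = ["benign", "invasive"]
image_pred = ["benign", "invasive"]

patch_acc = accuracy(patch_pred, patch_true)
image_acc = accuracy(image_pred, image_true)
print(f"patch-wise: {patch_acc:.2%}, image-wise: {image_acc:.2%}")
```

In this toy case two of the eight patches are misclassified, yet both images are labeled correctly, which illustrates how image-wise accuracy can exceed patch-wise accuracy once patch information is fused.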
5.3. Accuracy comparison with different combinations of methods
We compared the average classification accuracy of different combinations of patch-wise and image-wise methods. To remain comparable with most previous work, we use a dataset of the same size as ICIAR2018, the dataset used by most current work. Therefore, rather than experimenting on our complete dataset, we randomly selected 400 images for the training set and tested on another 100 images.
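A random selection of this kind could be sketched as follows. This is only an illustration under stated assumptions: the function name `split_images`, the fixed seed, the image identifiers, and the pool size of 1000 are all hypothetical, and the sketch does not reproduce the exact sampling procedure used in our experiments.

```python
import random

def split_images(image_ids, n_train=400, n_test=100, seed=0):
    """Randomly pick disjoint training and test subsets of image ids."""
    ids = list(image_ids)
    if n_train + n_test > len(ids):
        raise ValueError("not enough images for the requested split")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    chosen = rng.sample(ids, n_train + n_test)  # sampling without replacement
    return chosen[:n_train], chosen[n_train:]

# Hypothetical pool of image ids; the pool size is an assumption.
all_ids = [f"img_{i:04d}" for i in range(1000)]
train_ids, test_ids = split_images(all_ids)
print(len(train_ids), len(test_ids))
```

Because the sample is drawn without replacement and then cut in two, no image can appear in both the training and the test subset.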
To select the CNN model suitable for pathological images, we fixed the image-wise phase with the method of majority voting and then used different patch-wise CNN models for training. We tried some of the most mainstream approaches. The three most representative are listed in Table 4. The results showed that the model proposed by Oxford University's Visual Geometry Group (VGG) performed only moderately, which may be because the VGG model was proposed very early and the follow-up work has made many improvements on its foundation.
Table 4. Comparison of accuracies with different combinations of methods.
Patch-wise method                                          Image-wise method     Accuracy (%)
VGG16                                                      Majority voting       79.2
ResNet-50                                                  Majority voting       81.6
Inception-V3                                               Majority voting       82.2
Inception-V3 + Fine-tuning                                 Majority voting       86.1
Inception-V3 + Fine-tuning                                 SVM                   86.8
Inception-V3 + Fine-tuning                                 Bidirectional LSTM    90.5
Inception-V3 + Fine-tuning + Richer multilevel features    LSTM                  91.3
Moreover, the models proposed later are relatively complex, which greatly improves their learning ability. In the experimental results, Google's Inception-V3 and ResNet-50 achieved almost the same results, but considering the computational efficiency and low parameter count of Inception-V3, it may be better suited to high-resolution pathological image classification tasks. Therefore, we chose Inception-V3 as our model for image feature extraction.
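The majority-voting baseline that was held fixed in the image-wise phase during this model selection can be sketched as below. This is a minimal illustration, not our implementation: the function name `majority_vote`, the class label strings, and the tie-breaking rule (the label encountered first wins) are assumptions.

```python
from collections import Counter

def majority_vote(patch_labels):
    """Return the most frequent patch label; ties go to the label seen first."""
    if not patch_labels:
        raise ValueError("need at least one patch prediction")
    # most_common orders equal counts by first occurrence (Python 3.7+).
    return Counter(patch_labels).most_common(1)[0][0]

# Hypothetical predictions for the 12 patches of one image.
patches = ["in_situ"] * 7 + ["invasive"] * 3 + ["benign"] * 2
image_label = majority_vote(patches)
print(image_label)
```

Majority voting treats the patches as an unordered set, so it ignores where each patch lies in the image. That is the property the SVM and LSTM rows of Table 4 are compared against.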