Deep Learning-based Differentiation of Idiopathic Granulomatous Mastitis from Malignant Non-mass Enhancement Using Breast Magnetic Resonance Imaging

Filiz Tasci; Esat Kaba; Mahmut Nedim Ekersular; Ahmet Alkan; Huseyin Er; Nur Hursoy

doi:10.4274/haseki.galenos.2026.58561

Abstract

Aim

Idiopathic granulomatous mastitis (IGM) is a benign, chronic inflammatory disease of the breast, and its imaging findings may overlap with those of malignant non-mass enhancement (NME). This study aimed to investigate the performance of deep learning and machine learning models in differentiating IGM from malignant NME based on dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI).

Methods

In this retrospective study conducted between January 2019 and March 2023, DCE-MRI findings of 30 patients with histopathologically confirmed IGM and of 33 patients with breast cancer presenting as NME were analyzed. The second dynamic phase of DCE-MRI (Dataset 1, 475 images) and the corresponding subtracted images (Dataset 2, 402 images) were used in this study. Datasets were sequentially split into 80% training and 20% testing sets to ensure a patient-level split. Image features were extracted using SqueezeNet and classified with a narrow neural network.

Results

The mean age was significantly lower in the IGM group than in the NME group (41.3±11.3 vs. 52.2±11.4 years, p<0.001). For Dataset 1, the area under the curve was 0.997 in cross-validation on the training set and 0.870 in testing; for Dataset 2, it was 0.998 in cross-validation and 0.807 in testing. Cross-validation accuracy was 0.984 (Dataset 1) and 0.978 (Dataset 2), whereas test accuracy was 0.811 (Dataset 1) and 0.704 (Dataset 2).

Conclusion

The findings of this study suggest that deep learning shows significant promise for non-invasive differentiation of IGM from malignant NME on DCE-MRI, particularly in cases that are clinically indistinguishable.

Keywords:

Granulomatous mastitis, breast neoplasms, magnetic resonance imaging, deep learning

Introduction

Idiopathic granulomatous mastitis (IGM), also known as granulomatous lobular mastitis, is a rare, benign, recurrent, and chronic inflammatory disease of the breast. It predominantly affects premenopausal women with a history of pregnancy and lactation (1, 2). The disease is characterized by the formation of non-necrotizing granulomas involving the breast lobules, often accompanied by microabscesses, without evidence of microbial infection. Despite being a benign condition, IGM represents a significant diagnostic challenge because it can clinically and radiologically mimic infectious mastitis, inflammatory breast carcinoma, and other granulomatous diseases (3, 4). Therefore, accurate diagnosis often requires a multidisciplinary approach involving radiologists, pathologists, and clinicians to avoid diagnostic delays and unnecessary interventions. A comprehensive evaluation typically includes imaging, clinical correlation, and histopathological examination, the latter remaining the gold standard for confirming IGM (5, 6).

Magnetic resonance imaging (MRI), particularly dynamic contrast-enhanced MRI (DCE-MRI), plays a pivotal role in determining the extent of inflammation, detecting associated complications such as abscesses or fistulas, and differentiating IGM from other breast pathologies by providing detailed soft-tissue characterization and vascular enhancement patterns (7). Dynamic contrast-enhanced MRI enables evaluation of lesion distribution, enhancement characteristics, and kinetic curves, which are critical for breast lesion assessment. However, DCE-MRI findings in IGM may overlap with those of invasive malignancies and present as non-mass enhancement (NME), closely mimicking malignant NME patterns. This overlap can significantly complicate differential diagnosis, leading to misinterpretations and unnecessary biopsies (7, 8).

We hypothesized that applying deep learning and machine learning models to breast MRI could effectively differentiate IGM from malignant NME, thereby improving diagnostic accuracy in challenging cases. Accordingly, this study aimed to evaluate the diagnostic performance of these models in differentiating IGM from malignant NME. Such an approach may reduce diagnostic uncertainty, support clinical decision-making, and potentially prevent unnecessary invasive procedures.

Materials and Methods

Compliance with Ethical Standards

The study was conducted in accordance with the ethical principles outlined in the Declaration of Helsinki, and approval was obtained from the Non-Interventional Clinical Research Ethics Committee of Recep Tayyip Erdogan University Faculty of Medicine (approval no: 2023/107, date: 30.03.2023). Written informed consent was obtained from all patients before undergoing breast DCE-MRI.

Study Design and Datasets

This retrospective study was conducted at a tertiary radiology clinic between January 2019 and March 2023. It included 30 female patients (mean age: 41.30 years, range: 25 to 69 years) whose ultrasonography and DCE-MRI findings suggested IGM and whose diagnosis was pathologically confirmed as IGM, and 33 female patients (mean age: 52.21 years, range: 34 to 82 years) whose DCE-MRI findings suggested malignant NME and whose diagnosis was pathologically confirmed as breast cancer. The exclusion criteria were poor image quality, suboptimal contrast timing, and unconfirmed pathological diagnoses. After applying these criteria, a total of 63 patients were included in the study. Two datasets were constructed from the MRI examinations of these patients. Dataset 1 comprised T1-weighted (T1W) contrast-enhanced images from the second phase of the patients’ dynamic breast MRI, labeled as IGM or NME. This dataset included 30 patients in the IGM class and 33 in the NME class, for a total of 475 images (IGM=237, NME=238). Dataset 2 comprised the subtracted images obtained from the same dynamic series of the same patients, with the same class labels, for a total of 402 images (IGM=194, NME=208). The two classes were nearly balanced in both datasets, which limits the potential influence of class imbalance on model training and assessment.

Magnetic Resonance Imaging Parameters

Magnetic resonance imaging was performed using a 3.0-T magnetic resonance device (GE Healthcare Discovery MR750, Waukesha, WI, USA) with a 16-channel dedicated breast coil. The patients were positioned prone, with the breasts placed within the breast coil. After a survey sequence, an axial T1W sequence and a fat-saturated T2-weighted (T2W) fast spin-echo sequence covering both breasts were acquired before contrast administration to avoid signal alteration due to the injected gadolinium. For DCE-MRI, the contrast agent gadobutrol (Gadovist, Bayer Schering Pharma, Berlin, Germany) was injected as a 0.1 mmol/kg bolus at a flow rate of 2 mL/s. After the injection, six phases of volume imaging for breast evaluation (VIBRANT-Flex) were acquired, with approximately 60-s intervals between phases and a total scanning time of 410 s (repetition time, 3.9 ms; echo time, shortest; flip angle, 12°; field of view, 360×360 mm; matrix, 320×320; and slice thickness, 1.4 mm).

Classification

Datasets 1 and 2 were each divided into two parts: the first 80% of the images in each dataset were reserved for classifier training and validation, and the remaining 20% were allocated to classifier testing. Rather than selecting images at random, the split was made sequentially from the beginning of each dataset, so that the classifier was tested on slices from patients it had not seen during training. To prevent data leakage from multiple slices of the same patient, the division was made at the patient level rather than the image level: all of a patient’s images were placed in either the training set or the test set, so that no patient contributed images to both.
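The splitting rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name sequential_patient_split, the (patient_id, image_id) record layout, and the toy data are assumptions. The sketch applies the 80% cut to the ordered list of patients, whereas the text reports the 80/20 proportion at the image level, so the resulting image counts may differ slightly from those in the study.

```python
from collections import OrderedDict


def sequential_patient_split(records, train_fraction=0.8):
    """Split (patient_id, image_id) records so that no patient spans both sets.

    Patients are kept in order of first appearance; the first
    `train_fraction` of patients go to training, the rest to testing.
    """
    by_patient = OrderedDict()
    for patient_id, image_id in records:
        by_patient.setdefault(patient_id, []).append(image_id)

    patient_ids = list(by_patient)
    n_train = int(len(patient_ids) * train_fraction)
    train_patients = patient_ids[:n_train]
    test_patients = patient_ids[n_train:]

    train = [(p, img) for p in train_patients for img in by_patient[p]]
    test = [(p, img) for p in test_patients for img in by_patient[p]]
    return train, test


# Hypothetical toy data: 10 patients with 3 slices each.
records = [(f"patient_{p:02d}", f"slice_{s}") for p in range(10) for s in range(3)]
train, test = sequential_patient_split(records)
train_ids = {p for p, _ in train}
test_ids = {p for p, _ in test}
```

Because whole patients are assigned to one side of the cut, the two sets of patient identifiers are disjoint by construction, which is the property that guards against slice-level leakage.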

Within the training set, five-fold cross-validation was performed to optimize model performance and reduce the risk of overfitting. The dataset was partitioned into five mutually exclusive subsets. The model was trained on four subsets and validated on the remaining subset; this process was repeated five times so that each subset served once as the validation set (9). The independent test set was not involved in the cross-validation process.
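As an illustration of the five-fold procedure, the sketch below partitions sample indices into five mutually exclusive validation folds. The function name k_fold_indices is an assumption, and the sketch neither shuffles nor groups by patient, since the text does not state how folds were formed. The sample count of 380 corresponds to the Dataset 1 training and validation images (190 IGM and 190 NME) reported in the Results.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs for k mutually exclusive folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# 380 images: the Dataset 1 training/validation portion.
folds = list(k_fold_indices(380, k=5))
```

Each index appears in exactly one validation fold, so every image is used once for validation and four times for training.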

A hybrid framework combining deep learning–based feature extraction and machine learning–based classification was adopted. For each image, 1,000 deep features were extracted using the pre-trained convolutional neural network SqueezeNet, which was originally trained on over one million images from the ImageNet dataset and consists of 68 layers. The network accepts input images of 227×227 pixels; features were extracted from the pool10 layer, yielding a 1×1×1000 feature vector (10).

The feature vectors extracted by the deep convolutional neural network (SqueezeNet) were classified with a narrow neural network. Architecturally, this network is a feedforward multilayer perceptron with a single hidden layer. The architectural features and parameters of the network were configured as follows:

Input layer: It directly accepts the 1×1×1000 feature vector extracted from the pool10 layer of SqueezeNet, which represents a single image of a patient.

Hidden layer: To keep the computational complexity of the model low and to limit overfitting to the high-level features extracted by the already-deep network (SqueezeNet), only 10 neurons were used in the hidden layer. The Rectified Linear Unit [ReLU, f(x)=max(0, x)], which mitigates the vanishing gradient problem, was used as the activation function for the neurons in this layer.

Output layer: Since a binary classification problem (IGM and NME) was addressed, the output layer consists of 2 neurons. In this layer, the Softmax activation function was used to calculate the probability distribution of the features belonging to each class. During model training, the cross-entropy loss function was minimized to reduce error.
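The three layers described above can be sketched as a forward pass in plain Python. This is a minimal illustration under stated assumptions, not the trained model: the weights are random, the input is a random stand-in for a SqueezeNet feature vector, the class order [IGM, NME] is arbitrary, and all function names are hypothetical.

```python
import math
import random


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer: `weights` has one row per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(features, params):
    hidden = relu(dense(features, params["w1"], params["b1"]))   # 1000 -> 10
    return softmax(dense(hidden, params["w2"], params["b2"]))    # 10 -> 2


def cross_entropy(probabilities, true_class):
    return -math.log(probabilities[true_class])


def init_params(n_in=1000, n_hidden=10, n_out=2, seed=0):
    # Random (untrained) weights, for illustration only.
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.gauss(0.0, 0.01) for _ in range(cols)] for _ in range(rows)]

    return {"w1": matrix(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "w2": matrix(n_out, n_hidden), "b2": [0.0] * n_out}


params = init_params()
rng = random.Random(1)
features = [rng.random() for _ in range(1000)]  # stand-in for a SqueezeNet feature vector
probs = forward(features, params)               # assumed order: [P(IGM), P(NME)]
loss = cross_entropy(probs, true_class=0)
```

Training would adjust the weights to minimize the cross-entropy loss over the training images; the sketch only shows how a single feature vector is mapped to two class probabilities.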

This narrow and shallow network structure offers superior performance in modeling non-linear relationships in high-dimensional data obtained from deep networks compared with traditional classifiers such as support vector machines and k-nearest neighbors, while eliminating the high hardware and time costs required by multi-layer deep networks (11). The overall workflow of the study is summarized in the flowchart presented in Figure 1. In addition, Figure 2 illustrates representative DCE-MRI findings from patients in the dataset.

Statistical Analysis

Statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS Inc., Chicago, IL) for Windows, version 23.0, and Python libraries, including NumPy, Pandas, Scikit-learn, and Matplotlib. The difference in age between the IGM and NME groups was evaluated using the Mann-Whitney U test. Age differences among NME pathological subtypes were analyzed using the Kruskal-Wallis test. Descriptive statistics were reported as mean ± standard deviation or median (minimum-maximum), as appropriate. Statistical significance was defined as a two-tailed p-value <0.05. The model’s performance metrics were calculated separately for the validation and test datasets. Sensitivity, specificity, accuracy, precision, and F1-score were calculated from the components of the confusion matrix. Ninety-five percent confidence intervals (95% CIs) for performance metrics were calculated using the Wilson score method. Receiver operating characteristic (ROC) curves and area under the curve (AUC) values were generated to assess discriminative performance.
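For readers unfamiliar with the Wilson score method, the sketch below computes the interval for a binomial proportion. The function name wilson_interval is an assumption, and z=1.96 is the usual normal quantile for an approximately 95% interval. The worked example uses the Dataset 1 test accuracy reported in this paper (0.811, that is, 77 of 95 images).

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    center = (p_hat + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Dataset 1 test set: 77 of 95 images correctly classified.
lower, upper = wilson_interval(77, 95)
```

Unlike the simple normal approximation, the Wilson interval is centered slightly toward 0.5 and always stays within the range 0 to 1, which makes it better behaved for small samples and proportions near the extremes.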

Results

The mean age of the 30 patients with IGM was 41.30±11.31 (age range: 25-69) years, and that of the 33 patients with NME was 52.21±11.43 (age range: 34-82) years (p<0.001) (Table 1). The distribution of the pathology diagnoses of the patients with breast cancer in the NME class was as follows: invasive lobular carcinoma (n=12, mean age: 56.42±12.56 years), invasive ductal carcinoma (n=9, mean age: 52.78±13.48 years), invasive micropapillary carcinoma (n=7, mean age: 50.14±5.55 years), and ductal carcinoma in situ (n=5, mean age: 44.0±7.07 years) (p=0.198) (Table 2).

Confusion matrices and ROC curves of the narrow neural network classifier were obtained to evaluate classification performance on Datasets 1 and 2, separately for the validation and test images. The confusion matrices for the images in Dataset 1 are shown in Figure 3. For the validation data (Figure 3a), the classifier correctly classified 186 of the 190 IGM images, with four misclassified as NME, and 188 of the 190 NME images, with two misclassified as IGM. For the test set (Figure 3b), 38 of the 47 IGM images were correctly classified as IGM, and nine of the 48 NME images were incorrectly classified as IGM; accordingly, nine IGM images were misclassified as NME and 39 NME images were correctly classified, consistent with the test accuracy of 0.811 (77 of 95 images).
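The relationship between the confusion matrix and the reported metrics can be illustrated with the Dataset 1 test counts, taking IGM as the positive class. In the sketch below, the function name binary_metrics is an assumption; the four cells follow from the counts given above (38 of 47 IGM images correct, nine of 48 NME images misclassified as IGM), which leave nine IGM images misclassified as NME and 39 NME images correct.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard metrics from the four cells of a binary confusion matrix."""
    sensitivity = tp / (tp + fn)          # recall for the positive class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Dataset 1 test set, positive class = IGM.
metrics = binary_metrics(tp=38, fn=9, fp=9, tn=39)
```

Here the accuracy evaluates to 77/95, which rounds to the reported value of 0.811.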

Figure 4 presents the classification values for the images included in Dataset 2. The confusion matrix for the validation images (Figure 4a) shows that the classifier correctly identified 152 of 155 IGM images (three misclassified as NME) and 162 of 166 NME images (four misclassified as IGM). The confusion matrix for the test set of Dataset 2 (Figure 4b) shows that 34 of the 42 NME images were correctly classified as NME, and eight were misclassified as IGM.

Figure 5 shows the ROC curves and the AUC values for the validation phase of Datasets 1 and 2, where the positive class is defined as IGM. On the ROC curve, the true positive rate is plotted on the y-axis, and the false positive rate is plotted on the x-axis to illustrate classifier performance. The AUC indicates the classifier’s ability to distinguish between classes. An AUC value approaching 1 indicates that the classifier is highly successful in differentiating between classes. For Datasets 1 and 2, the validation phase yielded AUC values of 0.997 and 0.998, respectively. These high AUC values demonstrate that the narrow neural network classifier successfully predicted class membership.
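The interpretation of the AUC as a ranking measure can be made concrete with a short sketch: the AUC equals the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative image, with ties counted as one half. The function name roc_auc and the toy scores are assumptions for illustration and are not data from this study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: predicted probability of the positive class (IGM).
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
```

In this toy example, eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9; a classifier that ranks every positive above every negative reaches 1, and one that scores at random is expected to be near 0.5.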

Figure 6 presents the ROC plots and AUC values obtained from the test images. For the classification of the images included in Datasets 1 and 2, which were allocated for testing, the AUC values were 0.870 and 0.807, respectively.

Additional performance metrics of the narrow neural network classifier during the validation phase for both datasets are given in Table 3. For this phase, the accuracy values of Datasets 1 and 2 were 0.984 and 0.978, respectively.

Table 4 shows the classification performance metrics for the two datasets derived from images allocated for testing. The accuracy values achieved during the test phase were 0.811 for Dataset 1 and 0.704 for Dataset 2.

Discussion

This study explored the role of a hybrid model combining deep learning and machine learning as a non-invasive decision-support tool that may assist radiologists in the differential diagnosis of IGM and malignant NME on breast MRI. Using contrast-enhanced and subtraction DCE-MRI images, the hybrid model achieved an AUC of 0.870 and an accuracy of 0.811 on the contrast-enhanced images, with lower performance on the subtraction images (AUC 0.807; accuracy 0.704). Overall, these findings support the potential of DCE-MRI-based hybrid deep-learning and machine-learning approaches for non-invasive differentiation between IGM and malignant NME. The model performed exceptionally well during cross-validation on the training set, but performance dropped on the independent test set. This gap may reflect overfitting, which is a common problem when deep learning models are trained on small datasets.

Breast DCE-MRI plays a central role in detecting lesions presenting as NME, where enhancement morphology and distribution patterns are critical for evaluation (12, 13). However, differentiating IGM from malignancy remains challenging, particularly when lesions present as NME, due to considerable imaging overlap (14-16). Soylu Boy (17) reported that although certain MRI features may help in differentiation, the specificity of NME remains limited, and histopathological confirmation is often required.

This diagnostic uncertainty has led to interest in developing artificial intelligence-based classification approaches that aim to improve diagnostic accuracy and reduce dependency on invasive procedures. For example, Zhou et al. (18) demonstrated that a deep learning model achieved diagnostic performance comparable to that of radiologists in differentiating inflammatory breast conditions from malignancy on ultrasound imaging, highlighting the potential of AI as a supportive decision-making tool in complex breast imaging scenarios.

Only a limited number of studies have directly addressed the classification of IGM and malignant NMEs using artificial intelligence applied to DCE-MRI. In one of these studies, Kayadibi et al. (19) investigated the differentiation of IGM and malignant NME using machine learning-based approaches. In this two-center study of 178 patients with NME on breast MRI (69 IGM and 109 breast cancer cases), the authors evaluated clinical models, radiomics models, and combined clinical-radiomics models. Compared with radiologists’ interpretation (AUC in training, 0.740; in testing, 0.737), the combined clinical-radiomics model achieved the highest diagnostic performance (AUC 0.979 in training and 0.942 in testing). The study demonstrated that integrating radiomics features with clinical parameters significantly improves the discrimination between IGM and malignant NME, highlighting the complementary role of machine learning in radiological assessment.

Unlike our study, Kayadibi et al. (19) incorporated multiple MRI sequences, including T2W imaging, apparent diffusion coefficient maps, and DCE-MRI. Although we focused on DCE-MRI sequences, to enhance variability and improve generalizability, we also included subtracted images, which are commonly used in routine clinical practice. Another key distinction of our study is the use of deep learning-based image analysis rather than radiomics-based feature extraction, which offers a more practical and automated framework for addressing the same clinical question.

Study Limitations

Our study has some limitations. First, this was a retrospective, single-center study with a limited number of patients; multiple slices were obtained from each patient to expand the dataset. The resulting risk of data leakage and overfitting was mitigated by a sequential, patient-level dataset-splitting strategy. Second, the NME group was heterogeneous with respect to pathological subtypes, and future studies could investigate each subtype in greater detail. Third, only the second dynamic phase and its subtracted images were analyzed, as these are commonly used in routine practice; studies incorporating a full MRI protocol (e.g., T2W imaging, diffusion-weighted imaging, or the full dynamic series) may provide more comprehensive insights. Fourth, this study did not include an external validation cohort; future multicenter studies using independent external datasets are necessary to validate the generalizability and robustness of the proposed model. Fifth, although the numbers of images in the IGM and NME groups were similar, they were not equal due to stringent exclusion criteria. Sixth, no direct comparison with experienced radiologists was made.

Despite these limitations, this study has several strengths, including the use of pathologically confirmed cases and the incorporation of both subtracted and non-subtracted DCE-MRI images for model development. In addition, a practical and reproducible deep learning approach was used for image feature extraction. Furthermore, the use of cross-validation and independent internal testing strengthens the methodological framework and supports the reliability of the reported performance metrics.

Conclusion

In the present study, differentiation between IGM and malignant NME on breast MRI was achieved using a hybrid deep learning-based framework. The findings indicate that deep learning–assisted analysis of DCE-MRI may provide additional support for clinical decision-making in the differential diagnosis of IGM and malignant NME. Such approaches have the potential to reduce diagnostic uncertainty and may contribute to avoiding unnecessary invasive procedures in clinically equivocal cases. Nevertheless, these models should be regarded as complementary decision-support tools rather than standalone diagnostic systems, and further validation through larger multicenter prospective studies is required before routine clinical implementation.

Ethics

Ethics Committee Approval: The approval was obtained from the Non-Interventional Clinical Research Ethics Committee of Recep Tayyip Erdogan University Faculty of Medicine (approval no: 2023/107, date: 30.03.2023).

Informed Consent: Written informed consent was obtained from all patients before undergoing breast DCE-MRI.

Authorship Contributions

Concept: F.T., E.K., Design: F.T., E.K., Data Collection or Processing: F.T., E.K., M.N.E., A.A., H.E., N.H., Analysis or Interpretation: F.T., E.K., M.N.E., A.A., H.E., N.H., Literature Search: E.K., Writing: F.T., E.K., M.N.E., A.A., H.E., N.H.

Conflict of Interest: No conflicts of interest were declared by the authors.

Financial Disclosure: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Use of AI for Writing Assistance: During the revision of this work, the author(s) utilized ChatGPT 5.2 (OpenAI) solely for language editing and grammar correction. All outputs were carefully reviewed and revised by the authors to ensure accuracy and consistency. The authors take full responsibility for the content of the publication.

References

Han Y, Shi L, Zhang Y. Efficacy and tolerability of methotrexate for idiopathic granulomatous mastitis: a systematic review and meta-analysis. Breast J. 2026;2026:6710172.

Vercoe J, Sedaghat N, Brennan ME. Intralesional steroid injections for management of granulomatous mastitis: a systematic review of treatment protocols and clinical outcomes. Breast J. 2025;2025:2592366.

Dilaveri C, Degnim A, Lee C, DeSimone D, Moldoveanu D, Ghosh K. Idiopathic granulomatous mastitis. Breast J. 2024;2024:6693720.

Yazdipour N, Motamedfar A, Gharibvand MM, Fazelinezhad Z, Farhadi E. Diagnostic accuracy of breast ultrasonography parameters in idiopathic granulomatous mastitis patients: a cross-sectional study. Maedica (Bucur). 2025;20:721-8.

Shanbhag NM, Ameri MA, Shanbhag SN, Anandan N, Balaraj K, Bin Sumaida A. Diagnostic challenges and insights into granulomatous mastitis: a systematic review. Cureus. 2024;16:e75733.

Karami MY, Amestejani M, Zangouri V, et al. The effectiveness of local steroid injection for the treatment of breast-limited idiopathic granulomatous mastitis: a randomized controlled clinical trial study. Pol Przegl Chir. 2025;97:35-43.

Lyu S, Wang B, Xie T, et al. Multiparametric MRI for differentiating idiopathic granulomatous mastitis from invasive breast cancer: improving radiologists’ diagnostic accuracy. Eur J Radiol. 2025;184:111958.

Yin L, Wei X, Zhang Q, et al. Multimodal ultrasound assessment of mass and non-mass enhancements by MRI: diagnostic accuracy in idiopathic granulomatous mastitis and breast cancer. Breast. 2024;78:103797.

Stone M. Cross-validation: a review. Series Statistics. 1978;9:127-39.

Sunnetci KM, Ulukaya S, Alkan A. Periodontal bone loss detection based on hybrid deep learning and machine learning models with a user-friendly application. Biomed Signal Process Control. 2022;77:103844.

MathWorks. Choose classifier options – neural network classifiers. MATLAB Documentation. Natick (MA): The MathWorks Inc; 2025. Available from: https://www.mathworks.com/help/stats/choose-classifier-options.html. Accessed: 2025 May 10.

Newell D, Nie K, Chen JH, et al. Selection of diagnostic features on breast MRI to differentiate between malignant and benign lesions using computer-aided diagnosis: differences in lesions presenting as mass and non-mass-like enhancement. Eur Radiol. 2010;20:771-81.

Zhao Q, Xie T, Fu C, et al. Differentiation between idiopathic granulomatous mastitis and invasive breast carcinoma, both presenting with non-mass enhancement without rim-enhanced masses: the value of whole-lesion histogram and texture analysis using apparent diffusion coefficient. Eur J Radiol. 2020;123:108782.

Qu N, Luo Y, Yu T. Differentiation between clinically noninflammatory granulomatous lobular mastitis and noncalcified ductal carcinoma in situ using dynamic contrast-enhanced magnetic resonance imaging. Breast Care. 2020;15:619-27.

Wang L, Wang D, Fei X, et al. A rim-enhanced mass with central cystic changes on MR imaging: how to distinguish breast cancer from inflammatory breast diseases? PLoS One. 2014;9:e90355.

Azzam MI, Alnaimat F, Al-Nazer MW, et al. Idiopathic granulomatous mastitis: clinical, histopathological, and radiological characteristics and management approaches. Rheumatol Int. 2023;43:1859-69.

Soylu Boy FN. MR imaging evaluation of the volume changes and the signs of deformation in the breasts with granulomatous mastitis. Bosphorus Med J. 2022;9:127-31.

Zhou Y, Feng BJ, Yue WW, et al. Differentiating non-lactating mastitis and malignant breast tumors by deep-learning based AI automatic classification system: a preliminary study. Front Oncol. 2022;12:997306.

Kayadibi Y, Saracoglu MS, Kurt SA, et al. Differentiation of malignancy and idiopathic granulomatous mastitis presenting as non-mass lesions on MRI: radiological, clinical, radiomics, and clinical-radiomics models. Acad Radiol. 2024;31:3511-23.