MMD-Net: a weakly supervised solution for quantification of nonalcoholic fatty liver biopsies

Ming Yu; Tao Jiang; Hongsheng Zhou; Dean Ta

doi:10.21037/qims-2025-981

Original Article

Ming Yu¹ , Tao Jiang¹ , Hongsheng Zhou², Dean Ta¹

¹Department of Biomedical Engineering, School of Information Science and Technology, Fudan University, Shanghai, China; ²Institute of Advanced Ultrasonic Technology of National Innovation Center Par Excellence, Shanghai, China

Contributions: (I) Conception and design: M Yu, T Jiang; (II) Administrative support: H Zhou, D Ta; (III) Provision of study materials or patients: M Yu; (IV) Collection and assembly of data: M Yu; (V) Data analysis and interpretation: M Yu, T Jiang; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Dean Ta, PhD. Department of Biomedical Engineering, School of Information Science and Technology, Fudan University, No. 220 Handan Rd., Shanghai 200438, China. Email: tda@fudan.edu.cn.

Background: Nonalcoholic fatty liver disease (NAFLD) affects 25% of the global population and is a leading cause of cirrhosis. Although liver biopsy remains the diagnostic gold standard, its clinical utility is limited by the labor-intensive Kleiner scoring system. Existing deep learning (DL) solutions face two critical barriers: pathologist-dependent annotations requiring equivalent time to manual assessment, and prohibitive costs for large-scale labeled datasets. The primary objective of this research was to establish a weakly supervised framework using multi-instance learning (MIL) for NAFLD assessment, aiming to reduce the annotation workload for pathologists while developing a clinically applicable diagnostic approach.

Methods: This study utilized a hepatocellular pathology dataset published by Heinemann et al. in July 2023 on the Open Science Framework (OSF) platform (accessible at osf.io/8e7hd). We established MMD-Net, a weakly-supervised framework integrating MIL with multi-task learning (MTL) to concurrently evaluate steatosis, inflammation, and ballooning. To quantitatively evaluate model performance, five common metrics were employed: accuracy, precision, recall, F1 score, and Cohen’s κ.

Results: The system achieved exceptional agreement with ground truth, demonstrating quadratic weighted Cohen’s κ coefficients of 0.932±0.004 (ballooning), 0.836±0.016 (inflammation), and 0.766±0.029 (steatosis), with mean κ=0.845±0.014.

Conclusions: This approach establishes a new paradigm for standardized NAFLD histopathological assessment while eliminating the need for pixel-level annotations. By doing so, it charts a promising path toward artificial intelligence (AI)-powered, standardized NAFLD assessment in clinical practice.

Keywords: Nonalcoholic fatty liver disease (NAFLD); NAFLD Activity Score (NAS); weakly-supervised; multi-instance learning (MIL); multi-task learning (MTL)

Submitted Apr 26, 2025. Accepted for publication Nov 13, 2025. Published online Dec 09, 2025.

Introduction

Nonalcoholic fatty liver disease (NAFLD), the most prevalent chronic liver disorder worldwide with a global prevalence of 25%, has emerged as a leading etiological factor for cirrhosis and hepatocellular carcinoma (HCC). Substantial epidemiological evidence links the rising NAFLD incidence to modifiable lifestyle factors including physical inactivity, excessive caloric intake, and imbalanced dietary patterns (1). The disease pathogenesis originates from hepatic steatosis caused by triglyceride accumulation within hepatocytes. Clinically significant progression to inflammatory liver injury, termed nonalcoholic steatohepatitis (NASH), was observed in 20% of NAFLD patients in 2015. With aging as an independent risk factor, this proportion is projected to rise to 27% by 2030 due to population aging (2). As the principal driver of hepatic fibrosis (3,4) and HCC (5), NASH constitutes a major cause of liver-related morbidity and mortality.

Accurate histological differentiation of ballooning degeneration, steatosis, and inflammation is clinically pivotal in NAFLD diagnosis, directly impacting disease staging, prognosis, and therapeutic decisions. The NAFLD spectrum spans benign nonalcoholic fatty liver (NAFL) to high-risk NASH, with distinct prognoses: isolated steatosis (NAFL) carries a ≤2% 10-year cirrhosis risk (cardiovascular mortality predominates) (6), whereas NASH (steatosis + inflammation + ballooning) shows 10–15% cirrhosis progression and elevated liver-related mortality (SMR 4.1) (7). Crucially, this triad—particularly ballooning—distinguishes indolent NAFL from progressive NASH, guiding biopsy/management choices.

Steatosis (>5% hepatocyte triglyceride accumulation) signifies metabolic dysregulation and requires lifestyle intervention despite slow progression. Inflammation (lobular/portal lymphocytic/neutrophilic infiltration) indicates disease activity, driving progression to NASH, hepatocyte injury, and fibrogenesis. Ballooning degeneration (hepatocyte swelling, cytoplasmic rarefaction, Mallory-Denk bodies) (8) directly marks hepatocyte injury/death and is the strongest independent fibrosis predictor; combined with inflammation, it initiates fibrosis—the paramount prognostic indicator (9).

Quantification via the NAFLD Activity Score (NAS; steatosis 0–3, inflammation 0–3, ballooning 0–2) enables risk stratification: NAS ≥5 suggests NASH requiring intervention; ballooning ≥1 independently predicts fibrosis progression; grade 2–3 inflammation quadruples annual fibrosis progression. Thus, ballooning acts as a hepatocyte injury/fibrosis alarm, inflammation accelerates cirrhosis, and steatosis reflects metabolic burden. Precise quantification of these features refines patient stratification, preventing low-risk overtreatment while ensuring timely high-risk intervention to improve outcomes.

Early detection and timely intervention are crucial for reversing disease progression. Thus, the accurate assessment of hepatic steatosis severity and implementation of early interventions to block pathological progression hold significant clinical importance. Liver biopsy with histopathological evaluation remains the gold standard for NAFLD/NASH diagnosis and staging. Among various scoring systems, the methodology developed by Kleiner, Brunt, and the NASH Clinical Research Network (CRN) Pathology Committee stands as the most validated framework (9). It requires separate semi-quantitative assessments of steatosis (0–3), lobular inflammation (0–3), hepatocellular ballooning (0–2), and fibrosis (0–4) (9). The NAS is calculated as the sum of the steatosis score, lobular inflammation score, and ballooning score, thus ranging from 0 to 8 (see Table 1); fibrosis is staged separately. Traditional manual histopathological scoring systems exhibit two critical limitations. First, the specialized training required for hepato-pathological assessment has created a global shortage of qualified hepato-pathologists, leaving capacity insufficient to meet escalating clinical demands (10). Second, substantial inter-observer and intra-observer variability undermines the reproducibility of histological evaluations, particularly for critical histological features such as ballooning degeneration and inflammatory activity. As demonstrated in the seminal 2005 study by Kleiner et al., inter-observer agreement for ballooning degeneration and inflammation yielded kappa values of 0.56 and 0.45, respectively. Even intra-observer consistency for ballooning assessment varied over time (κ=0.68), indicating subjective interpretation due to ambiguous morphological criteria (9).
Current scoring protocols require pathologists to assign discrete ordinal classifications based on morphological characteristics, spatial distribution patterns, and zonal localization of pathological features. However, this visual interpretation-based methodology inherently introduces diagnostic discrepancies due to subjective assessment biases. Artificial intelligence (AI)-powered quantitative image analysis holds promise for overcoming these limitations through standardized, data-driven classification of disease severity. These constraints collectively highlight the urgent need for developing automated systems capable of delivering standardized, precise, and efficient evaluation of NAFLD/NASH histopathology. Our study specifically focused on deep learning (DL)-based automated quantification of these critical histopathological features.

Table 1

Nonalcoholic fatty liver disease activity score (9)

Histological component	Score 0	Score 1	Score 2	Score 3
Steatosis (hepatocyte fat accumulation)	<5%	5–33%	>33–66%	>66%
Lobular inflammation (inflammatory foci per 200× field)	No foci	<2 foci	2–4 foci	>4 foci
Ballooning (hepatocyte ballooning degeneration)	None	Few balloon cells	Many cells/prominent ballooning	–
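The grading thresholds in Table 1 and the NAS summation can be sketched in a few lines of Python. This is only an illustrative reading of the table: the function names are hypothetical, and treating the boundary values 33% and 66% as belonging to the lower grade is an assumption based on the ">33–66%" and ">66%" column entries.

```python
def steatosis_grade(fat_percent: float) -> int:
    """Map the percentage of hepatocytes containing fat to grade 0-3 (Table 1)."""
    if fat_percent < 5:
        return 0
    if fat_percent <= 33:
        return 1
    if fat_percent <= 66:
        return 2
    return 3


def inflammation_grade(foci_per_200x_field: float) -> int:
    """Map inflammatory foci per 200x field to grade 0-3 (Table 1)."""
    if foci_per_200x_field <= 0:
        return 0
    if foci_per_200x_field < 2:
        return 1
    if foci_per_200x_field <= 4:
        return 2
    return 3


def nas(steatosis: int, inflammation: int, ballooning: int) -> int:
    """NAS = steatosis (0-3) + lobular inflammation (0-3) + ballooning (0-2)."""
    if not (0 <= steatosis <= 3 and 0 <= inflammation <= 3 and 0 <= ballooning <= 2):
        raise ValueError("component score out of range")
    return steatosis + inflammation + ballooning


# Example: 40% steatosis, 3 foci per field, few balloon cells (grade 1).
example_nas = nas(steatosis_grade(40), inflammation_grade(3), 1)
```

In this example the component grades are 2, 2, and 1, giving a NAS of 5, which the text above describes as suggestive of NASH.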

DL-based approaches have recently achieved important breakthroughs in biopsy image analysis. Heinemann et al. [2022] proposed an InceptionV3-based stepwise Kleiner scoring system for assessing the progression of NAFLD/NASH, which requires separate sub-models for ballooning, inflammation, fibrosis, and steatosis feature extraction, followed by artificial neural network (ANN) regression integration. The system quantifies these histopathological features by analyzing microscope images of liver biopsy samples. The features are aggregated into continuous scores using ANNs, offering finer granularity than discrete pathologist scores. Validation on a dataset of 467 samples demonstrated that the automated system achieves high consistency with pathologist scores (11).

Multi-instance learning (MIL) has emerged as an effective framework for processing weakly-labeled whole slide images (WSIs) with slide-level annotations. To address multiclass classification challenges in histopathological analysis, recent methodological advancements include ReMix (Yang et al., 2022) (12), a stochastic augmentation methodology that enhances sample variability through instance-level feature mixing. Concurrently, Lin et al. [2023] introduced interventional bag MIL (IBMIL) (13), an attention-based framework that improves discriminative feature learning via confounder-aware instance selection and cross-bag feature interaction.

Hashimoto et al. [2020] achieved breakthrough performance in lymphoma cancer subtype classification by integrating MIL for bag-level prediction, domain adversarial (DA) training with gradient reversal layers to mitigate staining variations, and multi-scale learning to capture tumor heterogeneity across spatial scales. Key innovations include: (I) a two-stage training strategy where single-scale DA-MIL networks [using pretrained convolutional neural networks (CNNs) such as VGG16] are first trained individually, followed by multi-scale feature fusion; (II) attention-based MIL to focus on diagnostically critical image patches. Validated on a 196-case malignant lymphoma dataset, the method outperforms conventional CNNs (achieving pathologist-level accuracy) and highlights tumor regions consistent with clinical observations. Its clinical relevance lies in mimicking pathologists’ multi-scale diagnostic workflows while enabling scalable analysis of unannotated whole-slide images (14).

Yan et al. [2022] proposed a Swin Transformer-based deep self-supervised framework integrated with residual modules to enhance model performance. To the best of our knowledge, this work pioneers the application of Swin Transformer in hepatic histopathological analysis. WSIs were processed through dual-scale patch cropping with systematic quantification of four diagnostic features (15).

Yin et al. [2024] developed a primal-dual graph architecture to explicitly model spatial interactions between vascular systems and fibrotic matrices in hepatic fibrosis histopathology. By constructing a vascular network-derived primal graph and a fibrosis region-induced dual graph, this framework systematically extracts topological features of fibrosis-related structures in WSIs. The specifically designed primal-dual graph convolutional module enables independent characterization of vascular morphological patterns and fibrotic distribution features, while establishing their pathological correlation model (16).

Junaid et al. [2025] proposed a two-stage DL framework for molecular subtype classification (basal-like vs. classical) of pancreatic ductal adenocarcinoma (PDAC) using routine hematoxylin and eosin (H&E)-stained histopathology slides. It first employs a CNN to localize tumor regions in WSIs, then evaluates four MIL architectures to integrate local morphological features (e.g., glandular patterns) for subtype prediction. Validated on both The Cancer Genome Atlas-pancreatic adenocarcinoma (TCGA-PAAD) data (97 slides) and an external biopsy cohort (44 patients, 110 slides), the approach demonstrates generalizability and provides interpretable decision patterns through Grad-CAM visualization, bridging histopathological features with molecular subtypes for clinical application (17).

Lu et al. [2024] developed CONtrastive learning from Captions for Histopathology (CONCH), a vision-language foundation model specifically developed for histopathological analysis. Trained on over 1.17 million image-text pairs aggregated from multiple sources of histopathology images and biomedical captions through task-agnostic pretraining, CONCH demonstrates exceptional multimodal capabilities across diverse benchmarks. The model achieves state-of-the-art performance in image classification, segmentation, caption generation, as well as cross-modal retrieval tasks including text-to-image and image-to-text search (18).

Shabanian et al. [2025] investigated the feasibility of leveraging clustering-constrained attention multiple instance learning (CLAM) (19), a weakly supervised learning method, for staging liver fibrosis on trichrome-stained WSIs in children and young adults. Through a retrospective analysis, 217 trichrome-stained WSIs from pediatric liver biopsies were collected and independently scored by two pediatric pathologists using both METAVIR and Ishak fibrosis staging systems. The cases were stratified into low- and high-fibrosis stages, and a binary classification model was subsequently developed using the CLAM pipeline to distinguish between these stages (20).

Ahmadvand et al. [2024] developed a two-stage DL strategy to classify molecular subtypes of PDAC using routine H&E-stained histopathology slides. Firstly, a CNN was trained to automatically localize tumor regions in WSI, effectively excluding interference from normal tissues. Subsequently, the researchers evaluated four DL architectures (Vanilla, IDaRS, DeepMIL, and VarMIL) on the identified tumor regions, integrating local image features through an MIL strategy to construct a classification model distinguishing between basal-like and classical subtypes. Trained on the TCGA-PAAD dataset (97 slides), the model demonstrated robust generalization capability when validated on an external cohort of 44 patients (110 biopsy slides) (21).

Addressing the limitations of existing methodologies—exemplified by Heinemann et al.’s [2022] InceptionV3-based stepwise Kleiner scoring system requiring separate sub-models for ballooning, inflammation, fibrosis, and steatosis feature extraction followed by ANN regression integration (11)—our approach resolves two critical operational constraints:

Redundant tile-level feature prediction and aggregation procedures on WSIs (typically 40,000×40,000 pixels), which incur significant time overhead;
Prohibitive annotation costs associated with tile-level labeling of gigapixel WSIs.

Building upon existing research, we were inspired to integrate multiple advanced technologies to address the labor-intensive and time-consuming challenges associated with manual interpretation and annotation in the quantitative assessment of hepatic histopathological images for NAFLD patients. Multi-task learning (MTL) enables joint feature learning through shared representations (22), and demonstrates proven efficiency gains in computer vision (23), natural language processing (NLP) (24), and speech recognition (25) through multi-objective optimization.

MIL, as a weakly-supervised paradigm, circumvents instance-level labeling requirements by operating on diagnostically-labeled bags of image tiles (26,27), and handles weakly-labeled WSI data through bag-level supervision. DA modules mitigate domain shifts caused by staining variations. Experimental validation confirms the framework’s superior efficacy.

This methodological advancement provides an efficient and reliable solution for automated NAS quantification, showing substantial potential to accelerate histopathology-based NAFLD/NASH research and clinical translation. The major contributions of this work include:

Development of a novel end-to-end network architecture dedicated to NAS quantification, which streamlines computational workflows while achieving competitive performance metrics;
Construction of an innovative instance bag dynamic sampling mechanism specifically designed to address class-imbalanced data distribution;
Streamlining of the annotation workflow in dataset construction through the weakly supervised nature of MMD-Net, which enables researchers to label only a subset of the hundreds to thousands of image patches generated from a single WSI.

We present this article in accordance with the TRIPOD+AI reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-981/rc).

Methods

We present MMD-Net, a weakly supervised framework integrating MIL with MTL for the concurrent assessment of three key histopathological features in NAFLD: ballooning degeneration, inflammation, and steatosis. The method requires only bag-level annotations for WSIs, eliminating the need for pixel- or instance-level labeling and significantly reducing annotation costs. Figure 1 presents the dataset preparation process and an overview of the workflow in this study.

Figure 1 Overview of dataset preparation and the overall research workflow. (A) Dataset preparation, addressing class imbalance through data augmentation and balanced sampling. (B) Training phase of the MMD-Net network. (C) Preparation of slide-level test datasets to validate the model’s potential for clinical application. (D) Prediction on both validation and test sets using the pre-trained model. (E) Evaluation of experimental results. WSI, whole slide image.

Figure 2 illustrates the histopathological data preprocessing workflow.

Figure 2 Schematic illustration of the histopathological data preprocessing. (A) Tiles are extracted from the gigapixel-resolution WSI. (B) The tiles are categorized into positive and negative groups based on pathologists’ annotations, with the negative group including tiles labeled as “Ignore”. (C) Because the negative group is substantially larger than the positive group, a two-phase bag construction process is implemented to balance positive and negative samples. In the first phase, tiles are selected from the positive group and complemented with randomly sampled tiles from the negative group to assemble positive bags. In the second phase, negative bags are sampled in proportion to the number of positive bags generated in the first phase. WSI, whole-slide image.
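The two-phase bag construction described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `build_bags`, the parameters `bag_size` and `pos_per_bag`, and the 1:1 matching of negative bags to positive bags are all hypothetical choices made for the example.

```python
import random


def build_bags(pos_tiles, neg_tiles, bag_size=8, pos_per_bag=2, seed=0):
    """Return a list of (bag, label) pairs; label 1 = positive bag, 0 = negative bag."""
    if pos_per_bag > bag_size:
        raise ValueError("pos_per_bag cannot exceed bag_size")
    rng = random.Random(seed)
    pos = list(pos_tiles)
    neg = list(neg_tiles)
    rng.shuffle(pos)

    bags = []
    # Phase 1: each positive bag takes a few positive tiles and is filled
    # up to bag_size with randomly sampled negative tiles.
    for start in range(0, len(pos), pos_per_bag):
        chosen = pos[start:start + pos_per_bag]
        filler = rng.sample(neg, bag_size - len(chosen))
        bag = chosen + filler
        rng.shuffle(bag)
        bags.append((bag, 1))

    # Phase 2: draw as many all-negative bags as there are positive bags
    # (assumed 1:1 matching) so that bag labels are balanced.
    n_positive_bags = len(bags)
    for _ in range(n_positive_bags):
        bags.append((rng.sample(neg, bag_size), 0))
    return bags


# Toy usage with string tile identifiers.
toy_pos = [f"p{i}" for i in range(5)]
toy_neg = [f"n{i}" for i in range(50)]
toy_bags = build_bags(toy_pos, toy_neg)
```

Sampling negatives with replacement across bags (each bag samples independently) keeps the sketch simple; a real pipeline might instead track which negative tiles have already been used.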

Problem formulation

We constructed a multi-task classification model on a liver histopathology dataset comprising N patients, with three core subtasks: ballooning grading (3-class), lobular inflammation assessment (4-class), and steatosis quantification (4-class). We employed an MIL framework to process high-resolution WSIs, where each WSI is partitioned into multiple tiles and organized into bags, using only bag-level weak labels for training. Detailed mathematical notation, dataset representation, and formal definitions of the MIL framework are provided in Appendix 1.

Class imbalance handling

The dataset exhibits severe class imbalance. Taking ballooning degeneration as an example (similar distributions were observed for inflammation and steatosis), there are only 796 (1.5%) positive patches versus 29,913 (55.6%) negative patches, with the remaining 23,094 (42.9%) “ignore”-labeled patches excluded due to quality issues. To address sampling bias, we implemented:

Generative data augmentation: apply Stable Diffusion v2.1 for semantics-preserving enhancement of positive patches (Figure 3);
Dynamic priority sampling: positive-first selection during bag construction.

Figure 3 The original patches and the augmented patches by Stable Diffusion v2.1. Cells were stained with two variants of Masson’s stain (Masson Trichrome and Masson Goldner). Scale bar =50 µm.

The detailed methodology is provided in Appendix 2.

Through data augmentation and class-balancing strategies under computational constraints, a refined dataset of 3,957 samples was established. As shown in Figure 4, the dataset demonstrated balanced inter-class distribution across three pathological features.

Figure 4 The balanced inter-class distribution across three pathological features.

Network

MMD-Net comprises three core components (Figure 5):

Feature extractor: a CNN that maps 299×299 pixel images to feature vectors;
Label predictor: an attention-based neural network with three independent parameter sets for different pathological features;
Domain predictor: a fully-connected network that maps features to domain label probabilities.

Figure 5 The topological structure of the innovative network architecture: (I) multi-instance bags generate deep features h_i through the feature extractor G_f, which are then fed into the category discriminator G_y and domain discriminator G_d via a parallel dual-channel architecture; (II) a dynamically weighted loss function system is constructed, where L_total serves as the primary optimization objective for class labels, and L_d acts as the domain invariance constraint; (III) a GRL is introduced to achieve adversarial feature alignment. GRL, gradient reversal layer.

The network employs an adversarial training strategy, achieving domain-invariant feature learning through a gradient reversal layer. Detailed network structure, parameter settings, and attention mechanism formulas are provided in Appendix 3.
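To make the attention-based label predictor more concrete, the sketch below implements a generic attention pooling step over the instance features of one bag in plain Python. The exact formulation used by MMD-Net is given in Appendix 3 and is not reproduced here; the form a_k = softmax(w · tanh(V h_k)) is an assumption (a common choice in attention-based MIL), and the names `attention_pool`, `V`, and `w` are hypothetical.

```python
import math


def attention_pool(instances, V, w):
    """Pool K instance feature vectors into one bag vector.

    instances: list of K feature vectors, each of length D
    V:         L x D projection matrix (list of L rows)
    w:         attention vector of length L
    Returns (bag_vector, attention_weights).
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))

    # Numerically stable softmax over the K instance scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights


# Toy bag with three 2-dimensional instance features.
toy_instances = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
toy_bag, toy_weights = attention_pool(toy_instances, V=[[1.0, 0.0]], w=[2.0])
```

Under this reading, the "three independent parameter sets" of the label predictor would correspond to one (V, w) pair per task (ballooning, inflammation, steatosis), all operating on the same shared instance features.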

Experimental setup

Dataset

This study utilized a publicly available hepatocellular pathology dataset originally published by Heinemann et al. (11) on the Open Science Framework (OSF) platform (accessible at osf.io/8e7hd). The dataset comprises a retrospective collection of 467 clinical liver biopsy specimens sourced from three independent institutions, incorporating diversity in digital slide scanners and staining protocols (H&E and Sirius red). The data distribution across the contributing centers is as follows:

Duke University Medical Center (Durham, NC, USA): 338 whole-slide digital pathology images;
Institute of Pathology, Hannover Medical School (Germany): 72 specimens;
Boehringer Ingelheim Biorepository (Biberach, Germany/Ridgefield, CT, USA): 57 specimens provided by Discovery Life Sciences (Huntsville, AL, USA).

The dataset includes two primary biopsy types:

Wedge biopsies: characterized by substantial tissue volume (typical dimension: 0.8 cm edge length) and containing multiple intact portal tracts;
Needle biopsies: typically 1–2 cm in length, presenting 6–10 representative portal triads.

All slides were stained using standardized protocols for either Masson-Goldner trichrome or Masson trichrome and were digitized using either a Leica Aperio AT2 whole-slide scanner (Leica Biosystems, Wetzlar, Germany) or a Carl Zeiss Axioscan Z1 digital pathology system (Carl Zeiss, Jena, Germany). Scanning parameters were standardized to bright-field imaging mode with a 20× objective resolution (0.5 µm/pixel).

Expert pathologists manually annotated four key histological features: steatosis, ballooning, inflammation, and fibrosis. To enhance scoring consistency, steatosis grades were further calibrated using a U-Net-based automated segmentation system to mitigate potential systematic bias.

From the initial 467 specimens, a subset of 282 cases with complete histopathological annotations was selected as the final dataset for analysis. The remaining 185 cases were excluded due to incomplete labels or image quality concerns. The curated dataset maintains cross-center staining protocol consistency, structural integrity of portal tracts, and full traceability of histopathological grading labels. Despite the multicenter design enhancing scanner and stain diversity, the dataset presents certain constraints, including quality heterogeneity (with staining artifacts proactively excluded), label incompleteness (specifically, missing inflammation/ballooning scores in some subsets), and a relative scarcity of high-grade cases. Furthermore, fibrosis-annotated slides were excluded from MIL bag construction due to incompatible magnification scales with the specific requirements for NAS scoring.

Evaluation metrics

To quantitatively evaluate model performance, five common metrics were employed: accuracy, precision, recall, F1 score, and Cohen’s κ. The formulas corresponding to each metric are as follows:

$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$ [1]

$\mathrm{Precision} = \frac{TP}{TP + FP}$ [2]

$\mathrm{Recall} = \frac{TP}{TP + FN}$ [3]

$\mathrm{F1\ score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$ [4]

where TP, FP, TN, and FN indicate true positive, false positive, true negative, and false negative, respectively.

$κ = \frac{P_{o} - P_{e}}{1 - P_{e}}$ [5]

where $P_{o}$ is the observed agreement and $P_{e}$ is the agreement expected by chance.
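As a worked illustration of these metrics, the following sketch computes them from raw counts and from a confusion matrix. It uses the standard definitions (precision = TP/(TP+FP), recall = TP/(TP+FN)). The optional quadratic weighting follows the usual definition w_ij = (i − j)²/(k − 1)², which is assumed here because the abstract reports quadratic weighted κ; the function names and the rows-as-reference convention are illustrative choices.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall, f1) from binary confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


def cohen_kappa(confusion, quadratic=False):
    """Cohen's kappa from a k x k confusion matrix (rows: reference, cols: prediction).

    With quadratic=False this equals (P_o - P_e) / (1 - P_e).
    With quadratic=True, disagreements are weighted by (i - j)^2 / (k - 1)^2.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        if quadratic:
            return (i - j) ** 2 / (k - 1) ** 2
        return 0.0 if i == j else 1.0

    observed = sum(weight(i, j) * confusion[i][j]
                   for i in range(k) for j in range(k)) / n
    expected = sum(weight(i, j) * row_tot[i] * col_tot[j]
                   for i in range(k) for j in range(k)) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when all ratings fall in one class")
    return 1.0 - observed / expected


acc, prec, rec, f1 = binary_metrics(tp=8, fp=2, tn=85, fn=5)
kappa_plain = cohen_kappa([[20, 5], [10, 15]])
```

For the 2×2 example, P_o = 0.7 and P_e = 0.5, so κ = 0.4. For ordinal grades such as steatosis 0–3, the quadratic variant penalizes a 0-versus-3 disagreement more heavily than a 0-versus-1 disagreement.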

Training strategy

We employ a multi-task loss combining weighted cross-entropy terms, with the optimization objective including bag-level classification loss and attention-weighted domain adaptation regularization. The training process utilizes dynamic domain regularization parameters and an early stopping strategy. The complete training algorithm pseudocode is provided in Appendix 4, and hyperparameter settings are detailed in Appendix 5.

All experiments in this study were conducted on a workstation equipped with an NVIDIA RTX 5080 GPU (16 GB VRAM; NVIDIA, Santa Clara, CA, USA) and an Intel® Core™ Ultra 9 285K (3.7 GHz; Intel, Santa Clara, CA, USA) CPU. The operating system was Windows 11 Home Chinese Edition (Microsoft, Redmond, WA, USA), with PyTorch 2.7.1 (Meta, Menlo Park, CA, USA) as the DL framework and Python 3.11.13 (Python Software Foundation, Wilmington, DE, USA) as the programming language. GPU acceleration was enabled via CUDA 12.8 (NVIDIA).

In the ablation study protocol, the dataset was stratified into training, validation, and test subsets through randomized partitioning (6:2:2 ratio, n=3,957 samples). For comparative analysis of backbone network efficacy, we implemented 5-fold cross-validation with randomized stratified sampling, ensuring each fold maintained equivalent class distribution (792±1 samples per fold).
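One simple way to obtain folds with near-identical class distributions is to deal the shuffled indices of each class round-robin across the folds. The sketch below is an illustrative stand-in rather than the splitting code used in the study; the function name and the toy label counts are assumptions, chosen only so that the total matches n=3,957.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k folds with near-equal per-class counts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # continue dealing where the previous class stopped
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for i, idx in enumerate(indices):
            folds[(i + offset) % k].append(idx)
        offset += len(indices)
    return folds


# Toy labels: three hypothetical classes summing to 3,957 samples.
toy_labels = [0] * 1300 + [1] * 1357 + [2] * 1300
toy_folds = stratified_kfold(toy_labels, k=5)
fold_sizes = sorted(len(f) for f in toy_folds)
```

With 3,957 samples and five folds, this yields fold sizes of 791 or 792. In each round, one fold serves as the held-out set and the remaining four are used for training.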

The model was trained for 100 epochs with early stopping patience set to 10. Dropout with a rate of 0.5 was applied during class prediction. The dynamic domain regularization parameter was calculated as:

$λ = \frac{2}{1 + \exp (- 10 r)} - 1$ [6]

where $r = \frac{m}{M} \times α$, with m the current epoch and M the total number of epochs. The hyperparameter α was optimized through grid search on the validation set.
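The schedule in Equation [6] can be evaluated directly. The sketch below reads r as the training progress m/M scaled by α, consistent with the r that appears in the equation. The function name and the default α=1.0 are illustrative assumptions; the tuned value of α is not stated in this section.

```python
import math


def domain_lambda(epoch, total_epochs, alpha=1.0):
    """Dynamic domain regularization weight: 2 / (1 + exp(-10 r)) - 1."""
    r = epoch / total_epochs * alpha
    return 2.0 / (1.0 + math.exp(-10.0 * r)) - 1.0


# Weight at a few checkpoints of a 100-epoch run.
schedule = [domain_lambda(m, 100) for m in (0, 10, 50, 100)]
```

The weight starts at 0, so the domain predictor has little influence on the feature extractor early in training, and it saturates toward 1 as training proceeds. Algebraically the expression equals tanh(5r).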

The datasets analyzed during the study are available in the OSF repository, https://osf.io/8e7hd/.

Results

Table 2 compares the experimental results of DA-MIL, MTL-MIL, and their combined variant (MTL-DA-MIL). The baseline DA-MIL employs a single-feature training protocol for its classification head (iteratively training individual feature channels) while maintaining identical hyperparameter configurations to MTL-DA-MIL. It should be noted that, because of the MTL approach employed in the proposed model, the three features (ballooning degeneration, inflammation, and steatosis) did not reach their optimum concurrently during training. A common observation was that ballooning degeneration tended to overfit while inflammation and steatosis remained underfit. Given that the primary objective of this work was the calculation of the NAS, the experimental focus was primarily directed towards optimizing the average performance across these three features. The comparative results indicate that the multi-task cooperative training approach leads to quantifiable enhancements in mean performance. Specifically, as shown in the table, integrating MTL yielded gains in the mean of all five metrics across three experimental runs compared to DA-MIL. Furthermore, the comparison between MTL-MIL and DA-MIL underscores that MTL contributes a more significant performance improvement than does DA. Notably, although MTL-DA-MIL (3 tasks), comprising the three primary tasks (ballooning, inflammation, and steatosis prediction), shows marginal improvements in mean accuracy, F1 score, and Cohen’s κ relative to MTL-MIL, mean precision decreased from 0.8315 to 0.8262 and mean recall decreased from 0.8107 to 0.8072. However, MTL-DA-MIL (4 tasks), which introduces an auxiliary task to determine the presence of ignored patches within a MIL bag alongside these primary tasks, performed worse than MTL-DA-MIL (3 tasks) in three metrics (accuracy, F1 score, and Cohen’s κ). We attribute this to the limited relevance of the fourth supplementary task (determining whether bags contain patches annotated as ‘ignored’) to the primary objectives.
This observation suggests that task selection within MTL should prioritize those demonstrating strong correlation with the core task. The confusion matrices in Figure 6 demonstrate the classification performance of MMD-Net (MTL-DA-MIL, 4 tasks, ConvNeXt backbone) on three critical histopathological features (ballooning, inflammation, and steatosis) within the test set.

Table 2

Comparison of the performance differences among DA-MIL, MTL-MIL, and MTL-DA-MIL architectures under a unified network framework (ConvNeXt backbone) across five critical metrics: classification accuracy, precision, recall, F1 score, and Cohen’s κ coefficient. Experimental results demonstrate that across three trials, the model integrating MTL exhibits measurable improvements over the baseline DA-MIL model in all five evaluation metrics

Method	Feature	Accuracy	Precision	Recall	F1 score	Cohen’s kappa
DA-MIL	Ballooning	0.9295	0.9308	0.9092	0.9272	0.9549
	Inflammation	0.7976	0.7964	0.7312	0.7918	0.7185
	Steatosis	0.7421	0.7625	0.6782	0.7406	0.6376
	Mean	0.8231	0.8299	0.7729	0.8199	0.7703
MTL-MIL	Ballooning	0.946	0.9404	0.9376	0.9456	0.9617
	Inflammation	0.8426	0.8316	0.7885	0.8383	0.7709
	Steatosis	0.7361	0.7226	0.706	0.7343	0.6196
	Mean	0.8416	0.8315	0.8107	0.8394	0.7841
MTL-DA-MIL (3 tasks)	Ballooning	0.9414	0.9418	0.9265	0.9404	0.9593
	Inflammation	0.8123	0.7792	0.7651	0.8097	0.7479
	Steatosis	0.7718	0.7577	0.7301	0.7706	0.6964
	Mean	0.8418	0.8262	0.8072	0.8402	0.8012
MTL-DA-MIL (4 tasks)	Ballooning	0.94	0.9396	0.94	0.9394	0.9579
	Inflammation	0.8201	0.8217	0.8201	0.8198	0.7805
	Steatosis	0.7631	0.7629	0.7631	0.7608	0.6316
	Mean	0.8411	0.8414	0.8411	0.84	0.79

DA, domain adversarial; MIL, multi-instance learning; MTL, multi-task learning.

Figure 6 The confusion matrices demonstrate the classification performance of MMD-Net on three critical histopathological features (ballooning, inflammation, and steatosis) within the test set. The class labels in the figure represent the scoring grades for ballooning (0–2), inflammation (0–3), and steatosis (0–3).

To identify the optimal neural architecture for NAS scoring in NAFLD, this study used 5-fold cross-validation to systematically compare several classical feature extraction networks, including VGG16 (28), ResNet50 (29), MobileNet V3 (30), ViT (31), ConvNeXt (32), and Swin Transformer (33). ResNet50 and MobileNet V3 were excluded from the final comparative analysis because of their suboptimal performance (35–39% lower classification accuracy and a 0.44–0.53 reduction in F1 score compared with the top-performing architectures). As shown in Table 3, the choice of backbone affects network performance. Swin Transformer achieved the highest accuracy among the compared backbones, whereas ConvNeXt, ViT, and VGG16 were moderately competitive. This performance disparity can be primarily attributed to two factors:

Long-range dependency requirements: the MIL framework contains diverse image tiles that necessitate effective long-range dependency modeling—a capability inherently strengthened by the self-attention mechanisms in Transformer-based architectures such as Swin Transformer.
Pathological texture characteristics: histopathological images exhibit densely packed textural patterns, making them particularly amenable to VGG’s architectural strength in capturing fine-grained local features through its stacked 3×3 convolutional operations.

Table 3

A systematic comparison of backbone architectures and their impact on network performance under 5-fold cross-validation. Swin Transformer achieves the best performance among the compared backbones, while ConvNeXt and VGG16 remain competitive

Backbone	Feature	Accuracy	Precision	Recall	F1 score
VGG16	Ballooning	0.913±0.009	0.918±0.008	0.919±0.01	0.913±0.009
	Inflammation	0.799±0.022	0.797±0.024	0.79±0.025	0.802±0.022
	Steatosis	0.721±0.017	0.723±0.014	0.721±0.018	0.721±0.017
	Mean	0.811±0.009	0.813±0.01	0.81±0.011	0.812±0.009
ViT_b_32	Ballooning	0.78±0.066	0.791±0.069	0.767±0.084	0.781±0.066
	Inflammation	0.684±0.072	0.687±0.068	0.673±0.069	0.683±0.073
	Steatosis	0.709±0.046	0.715±0.05	0.702±0.042	0.709±0.045
	Mean	0.757±0.009	0.762±0.007	0.754±0.01	0.757±0.01
ConvNeXt_tiny	Ballooning	0.924±0.015	0.929±0.011	0.93±0.016	0.924±0.014
	Inflammation	0.824±0.024	0.821±0.021	0.816±0.021	0.825±0.024
	Steatosis	0.754±0.035	0.755±0.031	0.755±0.036	0.754±0.034
	Mean	0.834±0.021	0.835±0.018	0.834±0.02	0.834±0.02
Swin transformer (TINY)	Ballooning	0.932±0.005	0.935±0.004	0.937±0.007	0.932±0.004
	Inflammation	0.835±0.018	0.829±0.019	0.831±0.017	0.836±0.016
	Steatosis	0.766±0.03	0.772±0.023	0.767±0.038	0.766±0.029
	Mean	0.844±0.015	0.845±0.012	0.845±0.018	0.845±0.014

Data are presented as mean ± standard deviation.
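The "mean ± standard deviation" entries summarize one value per cross-validation fold. A minimal sketch of that summary is given below. The use of the sample standard deviation (n − 1 denominator) is an assumption, since the section does not state which estimator was used, and the fold values in the example are invented.

```python
from statistics import mean, stdev


def summarize_folds(fold_scores, digits=3):
    """Format per-fold scores as 'mean±SD'.

    Uses the sample standard deviation (n - 1 denominator); this is an
    assumption, and statistics.pstdev would give the population version.
    """
    if len(fold_scores) < 2:
        raise ValueError("at least two folds are needed for a standard deviation")
    return f"{mean(fold_scores):.{digits}f}±{stdev(fold_scores):.{digits}f}"


if __name__ == "__main__":
    # Hypothetical accuracies from five folds, not values from the paper.
    print(summarize_folds([0.93, 0.92, 0.94, 0.93, 0.94]))
```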

To delineate region-specific attention patterns during model prediction, the class activation mapping (CAM) technique was implemented in this study. Figure 7 employs the Grad-CAM++ (34) visualization method to generate corresponding CAM images for each task. The lesion areas displayed in Figure 7A were annotated by a senior pathologist (with over 15 years of experience) from a tier-3 hospital. Blue arrows indicate ballooning degeneration, red arrows denote inflammation, and green arrows represent steatosis. Analysis of these images reveals that although the original dataset contained incomplete patch-level annotations (e.g., patches labeled as ‘inflammation’ lacked annotations for ‘ballooning’ or ‘steatosis’), the model successfully identified suspected lesion areas within the patches. For instance, in Figure 7B, ballooning degeneration was detected in patches 2, 4, 6, and 14, whereas patch 16 exhibited signs of inflammation. This illustrates a potential advantage of the algorithm over human pathologists: it is not subject to oversight or fatigue. Furthermore, Figure 7C indicates that the model directs markedly more attention towards inflammatory regions. However, the detection performance for steatosis, as shown in Figure 7D, was suboptimal. Notably, confusion with ballooning degeneration regions can be observed, including instances where macro-vesicular steatosis is misclassified as ballooning degeneration. This finding aligns with the experimental results showing relatively lower accuracy in steatosis detection, indicating potential for further model refinement.

Figure 7 CAM showing relevant regions for the decision of a CNN for a certain class. (A) The original input MIL bag annotated by a pathologist, where blue arrows denote ballooning, red arrows indicate lobular inflammation, and green arrows mark steatosis. (B-D) The CAM images for the sub-tasks of ballooning, inflammation, and steatosis respectively, visualizing model attention for each pathological feature calculated with Grad-CAM++. CAM, class activation maps; CNN, convolutional neural network; MIL, multi-instance learning. Cells were stained with two variants of Masson’s stain (Masson Trichrome and Masson Goldner). Scale bar =50 µm.

To clinically validate the performance of MMD-Net (MTL-DA-MIL, 4 tasks, Swin backbone), we conducted a slide-level evaluation following the procedure illustrated in Figure 8. After excluding patches annotated as “ignore”, all patches from the same case were grouped into multiple instance bags. These bags were then processed by the model, and the prediction results for each bag were recorded. The highest score for each target feature among all bags from the same slide was aggregated to determine the final slide-level score.
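The aggregation step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `aggregate_slide_scores`, the input layout (one `(slide_id, grades)` pair per bag), and the feature keys are assumptions, and bag construction and the exclusion of "ignore" patches are taken to have happened upstream.

```python
from collections import defaultdict

FEATURES = ("ballooning", "inflammation", "steatosis")


def aggregate_slide_scores(bag_predictions):
    """Take the highest predicted grade per feature over all bags of a slide.

    bag_predictions: iterable of (slide_id, {feature: predicted_grade}) pairs,
    one pair per multiple-instance bag (hypothetical input layout).
    """
    # Grade 0 is the lowest possible score, so it is a safe starting value.
    slide_scores = defaultdict(lambda: {feature: 0 for feature in FEATURES})
    for slide_id, grades in bag_predictions:
        for feature in FEATURES:
            current = slide_scores[slide_id][feature]
            slide_scores[slide_id][feature] = max(current, grades[feature])
    return dict(slide_scores)


if __name__ == "__main__":
    # Invented bag-level predictions for two slides.
    bags = [
        ("slide_a", {"ballooning": 0, "inflammation": 1, "steatosis": 2}),
        ("slide_a", {"ballooning": 2, "inflammation": 0, "steatosis": 1}),
        ("slide_b", {"ballooning": 1, "inflammation": 1, "steatosis": 0}),
    ]
    for slide_id, scores in aggregate_slide_scores(bags).items():
        print(slide_id, scores)
```

Because a single high-scoring bag determines the slide-level grade, this rule is sensitive to isolated false positives at the bag level.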

Figure 8 Workflow for WSI-level evaluation. WSI, whole slide image.

The test set was strictly curated to include only WSIs containing patches exhibiting all three key histological features: ballooning, inflammation, and steatosis. This selection criterion resulted in a final test cohort of 190 WSIs. The quantitative results of this slide-level evaluation are presented in Table 4.

Table 4

The quantitative results of the slide-level evaluation

Feature	Accuracy	Precision	Recall	F1 score
Ballooning	0.7326±0.0172	0.7633±0.0519	0.593±0.0178	0.6882±0.0192
Inflammation	0.7074±0.0858	0.7157±0.088	0.6787±0.0889	0.6982±0.0902
Steatosis	0.6874±0.0485	0.5923±0.0359	0.712±0.0393	0.7202±0.043

Data are presented as mean ± standard deviation.

Upon transitioning the model evaluation from the bag level to the comprehensive WSI level, we observed a moderate performance degradation. This decline may be attributed to the variability in sampling ratios across different slides. For instance, although some slides yielded over 2,400 patches, others contributed fewer than 30, creating a domain-level imbalance that likely undermined the effectiveness of the DA strategy employed in this study. Furthermore, this imbalance increases the difficulty for the model to accurately “focus” on a few critical pathological regions, such as scattered ballooning cells. Although the attention mechanism is theoretically capable of such localization, in practice, attention weights can be “diluted” by large volumes of normal tissue, resulting in reduced sensitivity to subtle lesions.
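The dilution effect can be made concrete with a small numerical sketch. It assumes softmax-normalized attention pooling, which this section does not spell out, and uses hypothetical attention scores. The pool sizes 30 and 2,400 echo the patch counts mentioned above but serve only as illustrative values.

```python
import math


def lesion_attention_weight(lesion_score, normal_score, n_normal):
    """Softmax attention weight of one lesion instance among n_normal normal ones.

    Assumes every normal instance receives the same raw attention score;
    both scores are hypothetical inputs chosen for illustration.
    """
    if n_normal < 0:
        raise ValueError("n_normal must be non-negative")
    # Subtract the larger score before exponentiating for numerical stability.
    top = max(lesion_score, normal_score)
    lesion_term = math.exp(lesion_score - top)
    normal_term = n_normal * math.exp(normal_score - top)
    return lesion_term / (lesion_term + normal_term)


if __name__ == "__main__":
    # The lesion scores 3 units higher than normal tissue in both cases.
    for n_normal in (30, 2400):
        weight = lesion_attention_weight(3.0, 0.0, n_normal)
        print(n_normal, round(weight, 4))
```

Even when the lesion instance keeps the same score margin over normal tissue, its share of the pooled representation shrinks as the number of normal instances grows, which is consistent with the reduced sensitivity to scattered ballooning cells noted above.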

Additionally, the MIL aggregation strategy adopted in this work is based on the “max-pooling” assumption, where the label of a bag is determined by its most abnormal instance. At the WSI level, this assumption may be an oversimplification. For example, a slide may contain multiple isolated, low-grade inflammatory foci that do not collectively constitute high-grade “diffuse” inflammation. The model, however, might overestimate the severity due to these focal lesions. This method of label aggregation inherently differs from the complex integrative process used by pathologists during global assessment, thereby introducing inevitable evaluation uncertainty.

It should be noted that, owing to computational resource constraints, all networks in this study were deployed in lightweight configurations. This limitation may have prevented more sophisticated architectures, such as Swin Transformer and ConvNeXt, from reaching their full potential.

Discussion

NAFLD, as the most prevalent chronic liver disease globally, affects approximately 25% of the world’s population and has emerged as a primary cause of cirrhosis and HCC. However, despite the crucial importance of early diagnosis and intervention in reversing disease progression, current diagnostic approaches, such as liver biopsy and the Kleiner scoring system (9), are hindered by limitations, including inefficiency and labor-intensiveness. In particular, the interpretation of traditional pathology slides is a time-consuming process, often requiring pathologists to spend 5–10 minutes meticulously analyzing each slide, which substantially increases their diagnostic workload.

In the quest for more efficient diagnostic pathways, the application of supervised DL algorithms, although showing some potential, faces dual challenges of cumbersome annotation processes (35) and high costs associated with obtaining high-quality labeled samples (36). To overcome this dilemma, our study designed MMD-Net, a weakly supervised scoring framework that integrates MIL and MTL strategies, enabling efficient assessment of three key histopathological features: steatosis, inflammation, and ballooning. Through a weakly supervised learning mechanism, MMD-Net may reduce the dependence on large-scale labeled data, thereby effectively controlling implementation costs.

In performance evaluations, MMD-Net achieved promising results. Its quadratic weighted Cohen’s κ coefficients for the three key features compared favorably with those of Heinemann et al.’s method (11), suggesting potential effectiveness in pathological image quantification. The framework may offer advantages not only in addressing annotation efficiency challenges but also in balancing computational efficiency with accuracy, potentially serving as a supplementary tool for pathological assessment. MMD-Net’s MTL performance also indicates possible utility for evaluating complex histopathological features.

A performance degradation was observed when evaluation shifted from the bag level to the WSI level, which we attribute to the substantial data complexity inherent in WSIs. This includes extensive non-diagnostic background regions and the increased difficulty for attention mechanisms to localize critical lesions within an ultra-large instance pool. Nevertheless, the model retained satisfactory discriminative ability at the WSI level, underscoring MMD-Net’s potential for real-world clinical application. Future work will focus on refining preprocessing strategies and attention mechanisms to improve robustness in complex whole-slide settings.

This study has several limitations. First, the dataset was limited to specific histopathological features, which may affect the model’s generalizability to all NAFLD/NASH manifestations. Second, although MMD-Net showed overall competence, its accuracy for certain features (e.g., steatosis) could be further optimized. Future work will expand datasets to include more diverse histopathological features and patient demographics, while exploring clinical applications such as real-time assessment and remote diagnosis. Model architecture refinements will also be pursued to enhance efficiency without compromising performance.

Conclusions

We present MMD-Net, a DL system for automated NAS quantification in partially annotated NAFLD histopathology images. The primary objective of this research was to establish a weakly supervised framework using MIL for NAFLD assessment, aiming to reduce the annotation workload for pathologists while developing a clinically applicable diagnostic approach. Although class imbalance in the original dataset necessitated distribution-balancing strategies that might theoretically introduce confounding factors, this study shows that integrating MIL with DA training and MTL can enhance model performance. MMD-Net provides end-to-end concurrent assessment of three diagnostic hallmarks (ballooning, inflammation, steatosis) under weak supervision, predicting their Kleiner scores simultaneously with promising performance (average Cohen’s κ: 0.845±0.014). This suggests that the approach could contribute to standardized NAFLD assessment through clinically applied AI-powered histopathology analysis.

Acknowledgments

The authors gratefully acknowledge the hepatocellular pathology dataset published by Heinemann et al. (July 2023). This publicly available dataset was provided by Duke University Medical Center (Durham, NC, USA), the Institute of Pathology at Hannover Medical School (Germany), and the Boehringer Ingelheim Biorepository (Biberach, Germany/Ridgefield, CT, USA), and was critical for this research.

Footnote

Reporting Checklist: The authors have completed the TRIPOD+AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-981/rc

Funding: None.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-981/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

Eslam M, Sanyal AJ, George J; International Consensus Panel. MAFLD: A Consensus-Driven Proposed Nomenclature for Metabolic Associated Fatty Liver Disease. Gastroenterology 2020;158:1999-2014.e1. [Crossref] [PubMed]
Estes C, Razavi H, Loomba R, Younossi Z, Sanyal AJ. Modeling the epidemic of nonalcoholic fatty liver disease demonstrates an exponential increase in burden of disease. Hepatology 2018;67:123-33. [Crossref] [PubMed]
Schuppan D, Afdhal NH. Liver cirrhosis. Lancet 2008;371:838-51. [Crossref] [PubMed]
Tsochatzis EA, Bosch J, Burroughs AK. Liver cirrhosis. Lancet 2014;383:1749-61. [Crossref] [PubMed]
Cholankeril G, Wong RJ, Hu M, Perumpail RB, Yoo ER, Puri P, Younossi ZM, Harrison SA, Ahmed A. Liver Transplantation for Nonalcoholic Steatohepatitis in the US: Temporal Trends and Outcomes. Dig Dis Sci 2017;62:2915-22. [Crossref] [PubMed]
Singh S, Allen AM, Wang Z, Prokop LJ, Murad MH, Loomba R. Fibrosis progression in nonalcoholic fatty liver vs nonalcoholic steatohepatitis: a systematic review and meta-analysis of paired-biopsy studies. Clin Gastroenterol Hepatol 2015;13:643-54.e1-9; quiz e39-40.
Angulo P, Kleiner DE, Dam-Larsen S, Adams LA, Bjornsson ES, Charatcharoenwitthaya P, Mills PR, Keach JC, Lafferty HD, Stahler A, Haflidadottir S, Bendtsen F. Liver Fibrosis, but No Other Histologic Features, Is Associated With Long-term Outcomes of Patients With Nonalcoholic Fatty Liver Disease. Gastroenterology 2015;149:389-97.e10. [Crossref] [PubMed]
Brunt EM, Kleiner DE, Wilson LA, Belt P, Neuschwander-Tetri BA; NASH Clinical Research Network (CRN). Nonalcoholic fatty liver disease (NAFLD) activity score and the histopathologic diagnosis in NAFLD: distinct clinicopathologic meanings. Hepatology 2011;53:810-20. [Crossref] [PubMed]
Kleiner DE, Brunt EM, Van Natta M, Behling C, Contos MJ, Cummings OW, Ferrell LD, Liu YC, Torbenson MS, Unalp-Arida A, Yeh M, McCullough AJ, Sanyal AJ; Nonalcoholic Steatohepatitis Clinical Research Network. Design and validation of a histological scoring system for nonalcoholic fatty liver disease. Hepatology 2005;41:1313-21. [Crossref] [PubMed]
Metter DM, Colgan TJ, Leung ST, Timmons CF, Park JY. Trends in the US and Canadian Pathologist Workforces From 2007 to 2017. JAMA Netw Open 2019;2:e194337. [Crossref] [PubMed]
Heinemann F, Gross P, Zeveleva S, Qian HS, Hill J, Höfer A, Jonigk D, Diehl AM, Abdelmalek M, Lenter MC, Pullen SS, Guarnieri P, Stierstorfer B. Deep learning-based quantification of NAFLD/NASH progression in human liver biopsies. Sci Rep 2022;12:19236. [Crossref] [PubMed]
Yang J, Chen H, Zhao Y, Yang F, Zhang Y, He L, Yao J. ReMix: A General and Efficient Framework for Multiple Instance Learning Based Whole Slide Image Classification. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2022: 25th International Conference, Singapore, September 18–22, 2022. Proceedings, Part II. Springer-Verlag, Berlin, Heidelberg; 2022:35-45.
Lin T, Yu Z, Hu H, Xu Y, Chen CW. Interventional bag multi-instance learning on whole-slide pathological images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2023:19830-9.
Hashimoto N, Fukushima D, Koga R, Takagi Y, Ko K, Kohno K, Nakaguro M, Nakamura S, Hontani H, Takeuchi I. Multi-scale Domain-adversarial Multiple-instance CNN for Cancer Subtype Classification with Unannotated Histopathological Images. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Seattle, WA, USA; 2020:3851-60.
Yan R, He Q, Liu Y, Gou J, Sun Q, Zhou G, He Y, Tian G. DEST: Deep Enhanced Swin Transformer Toward Better Scoring for NAFLD. In: Pattern Recognition and Computer Vision: 5th Chinese Conference, PRCV 2022, Shenzhen, China, November 4–7, 2022; Proceedings, Part II. Springer-Verlag, Berlin, Heidelberg; 2022:204-14.
Yin C, Liu S, Lyu F, Lu J, Darkner S, Wong VWS, Yuen PC. Xfibrosis: Explicit vessel-fiber modeling for fibrosis staging from liver pathology images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2024:11282-91.
Junaid HHS, Daneshfar F, Mohammad MA. Automatic colorectal cancer detection using machine learning and deep learning based on feature selection in histopathological images. Biomedical Signal Processing and Control 2025;107:107866.
Lu MY, Chen B, Williamson DFK, Chen RJ, Liang I, Ding T, Jaume G, Odintsov I, Le LP, Gerber G, Parwani AV, Zhang A, Mahmood F. A visual-language foundation model for computational pathology. Nat Med 2024;30:863-74. [Crossref] [PubMed]
Lu MY, Williamson DFK, Chen TY, Chen RJ, Barbieri M, Mahmood F. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat Biomed Eng 2021;5:555-70. [Crossref] [PubMed]
Shabanian M, Taylor Z, Woods C, Bernieh A, Dillman J, He L, Ranganathan S, Picarsic J, Somasundaram E. Liver fibrosis classification on trichrome histology slides using weakly supervised learning in children and young adults. J Pathol Inform 2025;16:100416. [Crossref] [PubMed]
Ahmadvand P, Farahani H, Farnell D, Darbandsari A, Topham J, Karasinska J, Nelson J, Naso J, Jones SJM, Renouf D, Schaeffer DF, Bashashati A. A Deep Learning Approach for the Identification of the Molecular Subtypes of Pancreatic Ductal Adenocarcinoma Based on Whole Slide Pathology Images. Am J Pathol 2024;194:2302-12. [Crossref] [PubMed]
Caruana R. Multitask learning. Machine learning 1997;28:41-75.
Kokkinos I. Ubernet: Training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017:6129-38.
Collobert R, Weston J. A unified architecture for natural language processing: Deep neural networks with multitask learning. In: Proceedings of the 25th International Conference on Machine Learning; 2008:160-7.
Huang JT, Li J, Yu D, Deng L, Gong Y. Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE; 2013:7304-8.
Gao Z, Mao A, Wu K, Li Y, Zhao L, Zhang X, Wu J, Yu L, Xing C, Gong T, Zheng Y, Meng D, Zhou M, Li C. Childhood Leukemia Classification via Information Bottleneck Enhanced Hierarchical Multi-Instance Learning. IEEE Trans Med Imaging 2023;42:2348-59. [Crossref] [PubMed]
Kamoona AM, Gostar AK, Bab-Hadiashar A, Hoseinnezhad R. Multiple instance-based video anomaly detection using deep temporal encoding–decoding. Expert Syst Appl 2023;214:119079.
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. Available online:
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016:770-8.
Howard A, Sandler M, Chu G, Chen LC, Chen B, Tan M, Wang W, Zhu Y, Pang R, Vasudevan V. Searching for MobileNetV3. In: Proceedings of the IEEE/CVF International Conference on Computer Vision; 2019:1314-24.
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S. An image is worth 16x16 words: Transformers for image recognition at scale. Available online:
Liu Z, Mao H, Wu CY, Feichtenhofer C, Darrell T, Xie S. 2022. A ConvNet for the 2020s. Available online: 10.48550/arXiv.2201.03545
Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B. Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision; 2021:10012-22.
Chattopadhay A, Sarkar A, Howlader P, Balasubramanian VN. Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE; 2018:839-47.
Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, van der Laak JAWM, van Ginneken B, Sánchez CI. A survey on deep learning in medical image analysis. Med Image Anal 2017;42:60-88. [Crossref] [PubMed]
Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K, Cui C, Corrado G, Thrun S, Dean J. A guide to deep learning in healthcare. Nat Med 2019;25:24-9. [Crossref] [PubMed]

Cite this article as: Yu M, Jiang T, Zhou H, Ta D. MMD-Net: a weakly supervised solution for quantification of nonalcoholic fatty liver biopsies. Quant Imaging Med Surg 2026;16(1):38. doi: 10.21037/qims-2025-981
