Evaluation of ultrasound-based predictive models for disease relapse in rheumatoid arthritis patients in clinical remission
Introduction
Rheumatoid arthritis (RA) is an autoimmune disease characterized by persistent synovial inflammation, manifesting as joint swelling, pain, and stiffness, and is one of the leading causes of disability (1). Across numerous studies, disease relapse has been observed in 40–75% of RA patients who discontinued treatment after achieving sustained remission (2). Relapse exacerbates joint damage, leads to chronic pain and functional impairment, reduces quality of life, and increases the risk of cardiovascular disease, osteoporosis, and infection, while complicating treatment and adding to the economic burden. Therefore, early and accurate identification of patients at high risk of relapse during clinical remission is a critical prognostic challenge in the effective management of RA.
Currently, although several clinical indicator-based prediction models for RA relapse risk exist, their predictive accuracy remains limited and they generally do not incorporate sensitive imaging information, making precise risk stratification challenging in clinical practice (3,4). Musculoskeletal ultrasound (MSUS), as a sensitive imaging tool, holds potential to address this limitation. Grayscale (GS) ultrasound can detect synovial effusion and hypertrophy, while power Doppler (PD) objectively reflects active synovial inflammation (5). Studies have confirmed that ultrasound assessment can provide effective predictive information for relapse risk across different stages of RA (6). The European Alliance of Associations for Rheumatology (EULAR) also recommends the use of joint ultrasound for disease activity assessment and outcome prediction (7,8).
However, a key challenge in this field is the substantial heterogeneity in ultrasound scanning protocols regarding joint sites and numbers across existing studies (9,10), which hinders the comparability and integration of results and obstructs the establishment of standardized assessment pathways. Although comprehensive ultrasound evaluation protocols (e.g., examining 36 joints) demonstrate high predictive value, their operational complexity and time-consuming nature limit widespread clinical application. A key unresolved question is whether a simplified and standardized ultrasound protocol can maintain predictive accuracy while improving clinical feasibility.
Against this background, we systematically analyzed the clinical and ultrasonographic characteristics of RA patients in clinical remission using a standardized ultrasound assessment protocol. By constructing and comparing comprehensive and simplified prediction models, we evaluated the predictive performance of a simplified protocol and identified the optimal joint combination, with the aim of providing direct evidence for a clinically practical, standardized ultrasound-based scheme for assessing RA relapse risk. We present this article in accordance with the TRIPOD reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1518/rc).
Methods
Study design and participants
This prospective cohort study utilized data from the clinical follow-up database of the Rheumatology and Immunology Department at The Second Hospital of Lanzhou University. The study was conducted at a single center, with a recruitment period from August 1, 2022 to August 31, 2024, and follow-up concluded on May 31, 2025.
Consecutive patients with RA in clinical remission were enrolled. All patients met the 1987 American College of Rheumatology (ACR) or 2010 ACR/EULAR classification criteria and satisfied at least one of the following remission criteria: Disease Activity Score in 28 Joints (DAS28) <2.6, simplified disease activity index (SDAI) ≤3.3, clinical disease activity index (CDAI) ≤2.8, ACR/EULAR Boolean remission criteria, absence of swelling or tenderness in 28 joints, or clinical assessment of remission by rheumatology specialists (11). All enrolled patients received standard antirheumatic therapy during remission.
The primary outcome was disease relapse, defined as an increase in DAS28 >0.6 from the lowest recorded value or DAS28 >2.6 (12). Outcome assessment was performed blindly by research assistants unaware of ultrasound results.
The final analysis included 332 of the 402 eligible patients; 70 were excluded because of loss to follow-up (n=45), poor-quality ultrasound images (n=15), or incomplete data (n=10). Patients were randomly allocated to a development set (n=222) and a validation set (n=110). During follow-up, 76 patients relapsed and 256 remained in sustained remission.
Sample size was estimated using the R package pmsampsize, with parameters set at C-statistic =0.85, eight predictors, shrinkage factor =0.9, and event rate =39%, indicating a minimum requirement of 197 participants. The study’s sample of 332 patients met this requirement.
The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committee of The Second Hospital of Lanzhou University (No. 2020A-326) and informed consent was obtained from all individual participants.
Data processing and predictor definitions
Predictor definitions and measurement: candidate predictors collected in this study included basic demographics (age, sex), disease characteristics (disease duration, clinical remission duration), clinical assessments (28-joint tender/swollen joint counts, radiographic bone erosion), disease activity scores (DAS28-CRP, DAS28-ESR), laboratory parameters (C-reactive protein, erythrocyte sedimentation rate, anti-cyclic citrullinated peptide [anti-CCP] antibody, rheumatoid factor), and comorbidities. All predictors were measured at enrollment, at the same visit as the ultrasound examination.
Missing data handling: this study employed complete-case analysis. During the screening phase, cases with missing key variables (10 cases) were excluded, ensuring complete data for the final 332 analyzed cases.
Ultrasound examination
Two experienced sonographers performed a baseline multi-joint and tendon ultrasound examination using a GE LOGIQ E20 ultrasound system with an L3-12-D linear transducer (12 MHz). For PD assessment of the synovium, a region of interest was selected and the system was set to the lowest pulse repetition frequency (1.0 kHz) and the lowest wall filter (72 Hz) to maximize sensitivity. Color gain was set below the level at which noise artifacts appeared; otherwise, the sonographers could adjust machine settings to optimize image quality.
Before the examination, the sonographers practiced standardized static image interpretation on an electronic learning platform, and they remained blinded to the patients’ clinical and laboratory data. The examination covered 36 joints and 36 tendons; a site was recorded as positive if synovitis was present on either side, and the higher score of the two sides was documented. The specific examination sites were:
- Joints: metacarpophalangeal (MCP) joints 1–5, proximal interphalangeal (PIP) joints 1–5, metatarsophalangeal (MTP) joints 1–5, wrist, knee, and ankle joints.
- Tendons: extensor compartments of the wrist (I–VI), namely the abductor pollicis longus/extensor pollicis brevis (APL/EPB), extensor carpi radialis longus/extensor carpi radialis brevis (ECRL/ECRB), extensor pollicis longus (EPL), extensor digitorum communis/extensor indicis proprius (EDC/EIP), extensor digiti minimi (EDM), and extensor carpi ulnaris (ECU) tendons; the digit flexor (DF) tendons of fingers 2–5; and the tibialis anterior (TA), extensor hallucis longus (EHL), extensor digitorum longus (EDL), tibialis posterior (TP), peroneus longus (PL), peroneus brevis (PB), flexor digitorum longus (FDL), flexor hallucis longus (FHL), flexor digitorum brevis (FDB), and flexor carpi radialis (FCR) tendons.
Synovitis (13-15) and tenosynovitis (16) were assessed systematically using a standardized method. SH and PD signals were evaluated with two separate systems: a binary classification for SH (present, grade ≥1; absent, grade 0) and a semi-quantitative score (grades 0–3) for PD. Scoring followed the EULAR-Outcome Measures in Rheumatology (OMERACT) integrated scoring system (17), with the detailed grading criteria provided in Table 1. Both synovitis and tenosynovitis were defined as the simultaneous presence of an SH score ≥1 and a PD score ≥1 (Table 1).
Table 1
| Synovitis/tenosynovitis grade | Hand, wrist and ankle joints: SH (GS) | Hand, wrist and ankle joints: PD | Knee: SH (GS) | Knee: PD | Wrist and ankle tendons: SH (GS) | Wrist and ankle tendons: PD |
|---|---|---|---|---|---|---|
| Grade 0 (normal) | No SH | No signal | Thickness <2 mm | No signal | No SH | No signal |
| Grade 1 (minimal) | Minimal SH up to the imaginary horizontal line connecting the two joint edges | Up to three single signals, or one confluent and two single signals, or two confluent signals | Thickness 2–5 mm | Up to three single signals, or one confluent and two single signals, or two confluent signals | Mild | Peritendinous focal signal |
| Grade 2 (moderate) | Moderate SH protruding over the joint line with a concave surface | Larger than grade 1, but <50% of the SH area covered by signals | Thickness 6–8 mm | Larger than grade 1, but <50% of the SH area covered by signals | Moderate | Peritendinous multifocal signal |
| Grade 3 (severe) | Severe SH protruding beyond the joint line with a convex surface | More than 50% of the SH area covered by signals | Thickness >8 mm | More than 50% of the SH area covered by signals | Severe | Peritendinous diffuse signal |
GS, grayscale; SH, synovial hypertrophy; PD, power Doppler.
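The site-level positivity rule in the Methods (synovitis or tenosynovitis requires SH ≥1 and PD ≥1; a site counts as positive if either side qualifies, and the higher grade of the two sides is recorded) can be written as a small function. This is a minimal sketch: the names `SiteGrades` and `score_site` are hypothetical, and we assume the SH ≥1 and PD ≥1 condition is checked on each side separately rather than on the two side-maxima, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class SiteGrades:
    """Hypothetical record of one side of one joint or tendon sheath."""
    sh: int  # synovial hypertrophy grade on grayscale, 0-3
    pd: int  # power Doppler grade, 0-3


def is_inflamed(g: SiteGrades) -> bool:
    # Synovitis/tenosynovitis: SH >= 1 AND PD >= 1 on the same side (assumption).
    return g.sh >= 1 and g.pd >= 1


def score_site(left: SiteGrades, right: SiteGrades) -> dict:
    """Collapse bilateral grades into one per-patient record for a site."""
    for g in (left, right):
        if not (0 <= g.sh <= 3 and 0 <= g.pd <= 3):
            raise ValueError("grades must lie in 0-3")
    return {
        "positive": is_inflamed(left) or is_inflamed(right),  # either side counts
        "sh_max": max(left.sh, right.sh),  # higher grade of the two sides
        "pd_max": max(left.pd, right.pd),
    }
```

Under this per-side assumption, `score_site(SiteGrades(2, 0), SiteGrades(0, 1))` is not positive even though both side-maxima are ≥1, because no single side shows both grey-scale and Doppler findings.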
Model development and validation
Data preprocessing and balancing
Continuous variables were standardized and categorical variables were one-hot encoded. To address class imbalance, four strategies were compared: no sampling, random under-sampling, Synthetic Minority Over-sampling Technique (SMOTE), and Adaptive Synthetic Sampling (ADASYN). Based on cross-validation performance, SMOTE was selected for model training (Table S1).
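SMOTE generates synthetic relapse cases by interpolating between a minority-class patient and one of its nearest minority-class neighbours. The pure-Python sketch below illustrates the idea only; it is not the implementation used in the study, which is not stated, and it assumes predictors are already standardized numeric vectors. Two general caveats apply. First, oversampling should be fitted inside the training folds of each cross-validation split and never applied to the validation set. Second, plain SMOTE interpolates one-hot columns into fractional values, which is why the SMOTE-NC variant exists for mixed numeric/categorical data.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Minimal SMOTE sketch.

    minority: list of feature vectors (lists of floats) from the minority class.
    Returns n_synthetic new vectors, each on the segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples (Euclidean)
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic
```

Because every synthetic point is a convex combination of two real minority samples, new cases stay within the region already occupied by observed relapse cases.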
Feature selection
A multi-stage feature selection strategy was employed:
- Preliminary screening: univariate analysis (P<0.05) combined with multivariate logistic regression was used to identify predictors independently associated with relapse.
- Feature optimization: zero-variance variables were removed, and variance inflation factors (VIFs) were calculated; variables with VIF >5 were eliminated to reduce multicollinearity.
- Final determination: the selected predictors were incorporated into machine learning model training.
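The feature-optimization step can be sketched in pure Python. The sketch relies on the standard identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix, i.e. 1/(1 − R²ⱼ). The iterative rule (drop the single highest-VIF variable, recompute, and repeat until all VIFs are ≤5) is our assumption, since the text states only the threshold; the function names are hypothetical.

```python
import math


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _invert(m):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def vif(data):
    """VIF of each column in data (dict: name -> list of numeric values)."""
    names = list(data)
    cols = [data[k] for k in names]
    corr = [[1.0 if i == j else _pearson(cols[i], cols[j]) for j in range(len(cols))]
            for i in range(len(cols))]
    inv = _invert(corr)
    return {k: inv[i][i] for i, k in enumerate(names)}


def select_features(data, threshold=5.0):
    """Drop zero-variance columns, then repeatedly drop the highest-VIF column
    until every remaining VIF is <= threshold."""
    kept = {k: v for k, v in data.items() if max(v) != min(v)}
    while len(kept) >= 2:
        scores = vif(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del kept[worst]
    return list(kept)
```

For two predictors with correlation r, this reduces to VIF = 1/(1 − r²); for example, r = 0.8 gives a VIF of about 2.78, well below the cut-off of 5.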
Model development and internal validation
Three prediction models of increasing complexity were constructed:
- Model I (clinical benchmark): clinical indicators only.
- Model II (simplified ultrasound): clinical indicators + 4 key joint ultrasound markers (wrist, MCP2, knee, TP tendon).
- Model III (comprehensive ultrasound): clinical indicators + all 36 joint ultrasound markers.
Four algorithms [logistic regression, random forest, Light Gradient Boosting Machine (LightGBM), and Extreme Gradient Boosting (XGBoost)] were trained in parallel and internally validated using 5-fold cross-validation repeated 10 times, with the area under the curve (AUC) as the primary metric. LightGBM performed best and was selected for final model construction after hyperparameter tuning by Bayesian optimization (Figure S1). Based on validation set performance, we focused on evaluating the non-inferiority of Model II relative to Model III and its incremental value over Model I.
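The internal-validation scheme (5-fold cross-validation repeated 10 times, scored by AUC) can be illustrated with two small helpers. This is a sketch under stated assumptions: folds are stratified by outcome, which the text does not specify, and AUC is computed through its rank interpretation (the probability that a randomly chosen relapse case receives a higher score than a randomly chosen non-relapse case, ties counting one half). Resampling and hyperparameter tuning would be refitted inside each training split; they are omitted here.

```python
import random


def repeated_stratified_kfold(labels, k=5, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs: k folds per repeat, with each outcome
    class dealt round-robin across folds so the relapse rate stays similar."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            for pos, i in enumerate(idx):
                folds[pos % k].append(i)
        for f in range(k):
            test = sorted(folds[f])
            test_set = set(test)
            train = [i for i in range(len(labels)) if i not in test_set]
            yield train, test


def auc(y_true, scores):
    """Rank-based AUC (equivalent to the Mann-Whitney U statistic / (n_pos * n_neg))."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With 10 repeats of 5 folds, each model is scored on 50 held-out splits, and the mean (and spread) of those 50 AUCs is what is typically compared across algorithms.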
Model validation and performance evaluation
- Validation set prediction: the optimal LightGBM model was used to directly calculate relapse probability for each patient.
- Performance evaluation: discrimination (AUC, sensitivity, specificity, precision, recall, F1-score), calibration (calibration curve, Brier score), and clinical utility (decision curve analysis) were assessed and visualized using receiver operating characteristic (ROC) curves, calibration curves, decision curves, and precision-recall curves.
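The threshold-based metrics listed above, together with the kappa and Matthews correlation coefficient (MCC) reported in Table 5, all follow from a 2×2 confusion matrix, while the Brier score uses predicted probabilities. The sketch below shows the standard formulas. The confusion-matrix counts in the usage line are hypothetical (the study does not report its confusion matrices); they were chosen because they are arithmetically consistent with the Model I row of Table 5 and so make a convenient check. Note that if the Brier score is computed from hard 0/1 predictions instead of probabilities, it reduces to the misclassification rate (1 − accuracy).

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Metrics from a 2x2 confusion matrix, with relapse as the positive class."""
    margins = (tp + fp, tp + fn, tn + fp, tn + fn)
    if min(margins) == 0:
        raise ValueError("every row and column of the confusion matrix must be non-empty")
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Chance agreement between predicted and observed labels, for Cohen's kappa
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),  # identical to recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of precision and recall
        "kappa": (accuracy - p_chance) / (1 - p_chance),
        "mcc": (tp * tn - fp * fn) / math.sqrt(math.prod(margins)),
    }


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Hypothetical counts for a validation set of 110 patients (25 relapses).
metrics = classification_metrics(tp=13, fp=11, fn=12, tn=74)
```

Rounded to three decimals, these hypothetical counts give accuracy 0.791, recall 0.52, precision 0.542, F1 0.531, and kappa and MCC of 0.396.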
Development and validation set comparison
The development and validation sets shared the same study center, inclusion criteria, outcome definitions, and predictor measurement methods. Both sets showed similar distributions of demographic characteristics, disease activity, and laboratory indicators, supporting the use of the validation set for model evaluation.
Feasibility assessment
Scanning times for the complete protocol (36 joints/tendons) and simplified protocol (4 joints/tendons) were recorded in 10 consecutive RA patients in remission. A paired design was employed, with scanning times described using median (interquartile range) and compared using Wilcoxon signed-rank test.
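For small paired samples such as these 10 patients, the Wilcoxon signed-rank test can be computed exactly by enumerating all 2ⁿ sign assignments of the ranked differences. The sketch below is a generic implementation, not the software used in the study; zero differences are dropped and tied absolute differences receive average ranks. One property worth keeping in mind: with n = 10 pairs, the smallest two-sided exact P value attainable is 2/2¹⁰ ≈ 0.002, reached when every difference has the same sign.

```python
import itertools


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W+, p). Enumerates 2**n sign patterns, so it is only
    practical for small n (about 20 pairs or fewer)."""
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign average ranks to tied |d|
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    centre = sum(ranks) / 2  # expected W+ under the null hypothesis
    observed = abs(w_plus - centre)
    extreme = sum(
        1
        for signs in itertools.product((0, 1), repeat=n)
        if abs(sum(r for r, s in zip(ranks, signs) if s) - centre) >= observed - 1e-9
    )
    return w_plus, extreme / 2 ** n


# Hypothetical scanning times (min) for 10 patients: full vs. simplified protocol
full = [30, 28, 31, 27, 29, 32, 26, 30, 29, 28]
simple = [10, 11, 9, 12, 10, 11, 10, 9, 11, 10]
w, p = wilcoxon_signed_rank_exact(full, simple)
```

Here every patient is faster with the simplified protocol, so W+ = 55 (the maximum for 10 pairs) and the exact P value is 2/1024.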
Statistical analysis methods
Continuous variables were described as mean ± standard deviation for normal distributions and median (interquartile range) for non-normal distributions. Categorical variables were expressed as frequency (percentage). Group comparisons used independent samples t-test for normally distributed continuous variables, Mann-Whitney U test for non-normal continuous variables, and Chi-squared or Fisher’s exact test for categorical variables. All tests were two-sided with P<0.05 considered statistically significant. Two blinded sonographers independently evaluated images from 20 RA patients. Inter-observer consistency was analyzed using κ statistics, with κ >0.8 indicating excellent agreement. All statistical analyses were performed using R software (v3.3.2) and Free Statistics software.
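Inter-observer agreement between the two sonographers can be quantified with Cohen's kappa, which corrects the observed agreement for agreement expected by chance from each rater's marginal frequencies. The sketch below implements unweighted kappa for any categorical ratings; the rating lists in the usage line are hypothetical. For ordinal PD grades (0–3), a weighted kappa is a common alternative; the text does not state which variant was used.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: sum over categories of the product of marginal proportions
    p_chance = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / n ** 2
    if p_chance == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical synovitis calls (1 = present) by two sonographers on 10 sites
kappa = cohens_kappa([1, 1, 0, 0, 1, 0, 1, 0, 0, 0], [1, 0, 0, 0, 1, 0, 1, 0, 1, 0])
```

In this example the raters agree on 8 of 10 sites (80%), but because 52% agreement is expected by chance, kappa is only 7/12 ≈ 0.58.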
Results
Study cohort, data partitioning and baseline characteristics
A total of 402 patients met the inclusion criteria, with 332 patients ultimately included in the study and randomly allocated to the development set (n=222) and validation set (n=110). The screening flowchart is shown in Figure 1.
Comparison of key predictors and outcome distributions between the development and validation sets showed no significant differences in any variable or in outcome incidence, except for slightly longer disease duration in the validation set (P=0.017), indicating balanced partitioning and suitability of the validation set for unbiased model evaluation (Table S2).
The baseline characteristics of the entire study cohort are presented in Table 2. Based on follow-up outcomes, patients were categorized into the relapse group (n=76) and sustained remission group (n=256). Intergroup comparisons revealed significant differences between the relapse and remission groups in terms of disease duration, duration of clinical remission, rates of high anti-CCP positivity, and positive hand/foot X-ray findings (erosions and/or joint space narrowing) (Table S3).
Table 2
| Variables | Remission group (n=256) | Relapse group (n=76) | P |
|---|---|---|---|
| Age (years) | 54.0±11.4 | 54.2±11.2 | 0.857 |
| Female | 108 (42.2) | 28 (36.8) | 0.405 |
| Disease duration (years) | 2.0 [0.7, 4.0] | 2.4 [0.9, 6.0] | 0.007 |
| Duration of clinical remission (months) | 8.7±4.6 | 6.4±4.9 | <0.001 |
| Anti-CCP | | | 0.020 |
| Negative | 119 (46.5) | 22 (28.9) | |
| Low positive | 52 (20.3) | 18 (23.7) | |
| High positive | 85 (33.2) | 36 (47.4) | |
| RF | | | 0.397 |
| Negative | 64 (25.0) | 14 (18.4) | |
| Low positive | 61 (23.8) | 17 (22.4) | |
| High positive | 131 (51.2) | 45 (59.2) | |
| ESR (mm/h) | 9.0 [4.0, 12.7] | 12.0 [8.9, 13.2] | <0.001 |
| CRP (mg/L) | 4.0 [2.0, 6.3] | 4.5 [3.0, 6.0] | 0.080 |
| TJC (0–28) | 0 [0, 0] | 0 [0, 0] | 0.693 |
| SJC (0–28) | 0 [0, 0] | 0 [0, 0] | 0.720 |
| Hand/foot X-ray+ | 74 (28.9) | 33 (43.4) | 0.017 |
| PhGA | 2.0 [1.0, 3.0] | 2.0 [1.0, 2.0] | 0.094 |
| PGA | 1.0 [0.0, 2.0] | 1.0 [0.8, 2.0] | 0.563 |
| CDAI | 3.0 [1.0, 5.0] | 3.0 [2.0, 4.0] | 0.884 |
| SDAI | 7.8 [5.3, 11.4] | 8.8 [6.1, 11.4] | 0.184 |
Data are presented as mean ± standard deviation, n (%) or median [interquartile range]. Anti-CCP, anti-cyclic citrullinated peptide antibody; CDAI, clinical disease activity index; CRP, C-reactive protein; DAS, disease activity score; ESR, erythrocyte sedimentation rate; PGA, patient global assessment; PhGA, physician global assessment; RF, rheumatoid factor; SDAI, simplified disease activity index; SJC, swollen joint count; TJC, tender joint count.
Ultrasound distribution characteristics of synovitis in joints and tendons
This study analyzed the baseline ultrasound findings of 332 RA patients to assess GS and PD scores across joints and the incidence and distribution of synovitis (Table S4). The distribution of synovitis during RA remission differed markedly among joints.
Synovitis
The wrist joint showed the highest incidence of synovitis (39.8%), followed by MCP2 (25.0%) and MCP3 (23.2%) joints.
The knee and ankle joints had relatively low incidences of synovitis at 13.6% and 11.4% respectively.
Tenosynovitis
The highest incidence of tenosynovitis was observed in the ECU tendon sheath (12.7%), followed by the EDC/EIP (12%).
The DF 2 and TP tendon sheaths showed incidences of 8.1% and 9.6% respectively.
Synovitis was more frequent in the wrist and finger joints, whereas the large joints and tendon sheaths showed relatively lower rates of inflammation.
Ultrasound characteristics of joints and tendons in relapse and remission groups
This study compared ultrasound-detected synovitis and tenosynovitis at all examined joints and tendon sheaths between the remission and relapse groups (Table 3, Figure 2).
Table 3
| Joints and tendons | US synovitis/tenosynovitis, remission group (n=256), n (%) | US synovitis/tenosynovitis, relapse group (n=76), n (%) | P |
|---|---|---|---|
| Wrist | 92 (35.9) | 40 (52.6) | 0.009 |
| MCP1 | 25 (9.8) | 13 (17.1) | 0.078 |
| MCP2 | 53 (20.7) | 30 (39.5) | <0.001 |
| MCP3 | 52 (20.3) | 25 (32.9) | 0.022 |
| MCP4 | 27 (10.5) | 13 (17.1) | 0.123 |
| MCP5 | 24 (9.4) | 15 (19.7) | 0.014 |
| PIP1 | 20 (7.8) | 9 (11.8) | 0.275 |
| PIP2 | 32 (12.5) | 15 (19.7) | 0.112 |
| PIP3 | 29 (11.3) | 17 (22.4) | 0.014 |
| PIP4 | 26 (10.2) | 5 (6.6) | 0.347 |
| PIP5 | 16 (6.2) | 8 (10.5) | 0.206 |
| APL/EPB | 12 (4.7) | 7 (9.2) | 0.159 |
| ECRL/ECRB | 8 (3.1) | 6 (7.9) | 0.098 |
| EPL | 8 (3.1) | 5 (6.6) | 0.184 |
| EDC/EIP | 28 (10.9) | 12 (15.8) | 0.254 |
| EDM | 6 (2.3) | 4 (5.3) | 0.245 |
| ECU | 23 (9.0) | 19 (25.0) | <0.001 |
| DF 2 | 17 (6.6) | 10 (13.2) | 0.068 |
| DF 3 | 11 (4.3) | 3 (3.9) | 0.098 |
| DF 4 | 8 (3.1) | 6 (7.9) | 0.098 |
| DF 5 | 14 (5.5) | 4 (5.3) | 0.098 |
| Knee | 29 (11.3) | 16 (21.1) | 0.030 |
| Ankle | 19 (7.4) | 19 (25.0) | <0.001 |
| MTP1 | 16 (6.2) | 8 (10.5) | 0.369 |
| MTP2 | 16 (6.2) | 7 (9.2) | 0.372 |
| MTP3 | 9 (3.5) | 5 (6.6) | 0.325 |
| MTP4 | 3 (1.2) | 2 (2.6) | 0.323 |
| MTP5 | 16 (6.2) | 11 (14.5) | 0.021 |
| TP | 15 (5.9) | 17 (22.4) | <0.001 |
| FDL | 5 (2.0) | 5 (6.6) | 0.053 |
| FHL | 4 (1.6) | 3 (3.9) | 0.199 |
| TA | 4 (1.6) | 4 (5.3) | 0.084 |
| EHL | 5 (2.0) | 4 (5.3) | 0.219 |
| EDL | 5 (2.0) | 4 (5.3) | 0.219 |
| PL | 7 (2.7) | 5 (6.6) | 0.155 |
| PB | 6 (2.3) | 5 (6.6) | 0.135 |
APL, abductor pollicis longus; DF, digit flexors; ECRB, extensor carpi radialis brevis; ECRL, extensor carpi radialis longus; EDC, extensor digitorum communis; EDM, extensor digiti minimi; EDL, extensor digitorum longus; EHL, extensor hallucis longus; EIP, extensor indicis proprius; EPL, extensor pollicis longus; ECU, extensor carpi ulnaris; EPB, extensor pollicis brevis; FDL, flexor digitorum longus; FHL, flexor hallucis longus; GS, grayscale; MCP, metacarpophalangeal; MTP, metatarsophalangeal; PB, peroneus brevis; PD, power Doppler; PIP, proximal interphalangeal; PL, peroneus longus; SH, synovial hypertrophy; TA, tibialis anterior; TP, tibialis posterior; US, ultrasound.
Synovitis
The incidence of wrist synovitis was significantly higher in the relapse group than in the remission group (52.6% vs. 35.9%, P=0.009). For MCP joints, the relapse group showed significantly higher incidences of synovitis at MCP2 (39.5% vs. 20.7%, P<0.001), MCP3 (32.9% vs. 20.3%, P=0.022), and MCP5 (19.7% vs. 9.4%, P=0.014). The incidence of synovitis at PIP3 was also significantly higher in the relapse group (22.4% vs. 11.3%, P=0.014). Additionally, the relapse group had significantly higher incidences of synovitis in the knee (21.1% vs. 11.3%, P=0.030), ankle (25.0% vs. 7.4%, P<0.001), and MTP5 joints (14.5% vs. 6.2%, P=0.021).
Tenosynovitis
The incidences of tenosynovitis at the ECU (25.0% vs. 9.0%, P<0.001) and TP (22.4% vs. 5.9%, P<0.001) were significantly higher in the relapse group. No significant differences in tenosynovitis at the FDL, FHL, TA, EHL, EDL, PL, or PB were observed between the two groups.
Analysis of independent predictors for RA relapse
Univariate logistic regression analysis revealed that disease duration, duration of clinical remission, high anti-CCP antibody positivity, and positive hand/foot X-ray findings were significantly associated with relapse. Among ultrasound parameters, synovitis in the wrist, MCP2, MCP3, MCP5, PIP3, knee, ankle, and MTP5 joints, as well as tenosynovitis in the ECU, TP, and FDL tendon sheaths, were significantly associated with increased relapse risk (Table S5).
Multivariate analysis further identified the following indicators as independent predictors of RA relapse: longer disease duration, shorter duration of clinical remission, high anti-CCP positivity, positive hand/foot X-ray findings, as well as ultrasound-detected synovitis in the wrist, MCP2, and knee, and tenosynovitis of the TP (Table 4).
Table 4
| Variables | Univariate P | Univariate OR | Univariate 95% CI | Multivariate P | Multivariate OR | Multivariate 95% CI |
|---|---|---|---|---|---|---|
| Disease duration | <0.001 | 1.17 | 1.04–1.31 | 0.020 | 1.18 | 1.02–1.37 |
| Duration of clinical remission | <0.001 | 0.91 | 0.86–0.96 | <0.001 | 0.89 | 0.83–0.95 |
| Anti-CCP | – | – | – | – | – | – |
| Negative | – | Reference | – | – | – | – |
| Low positive | 0.08 | 1.87 | 0.93–3.78 | – | – | – |
| High positive | 0.007 | 2.29 | 1.26–4.17 | <0.001 | 2.77 | 1.31–5.87 |
| Hand/foot X-ray+ | 0.018 | 1.89 | 1.11–3.2 | 0.005 | 2.62 | 1.32–5.22 |
| Wrist | <0.001 | 3.24 | 1.9–5.52 | 0.022 | 3.55 | 1.82–6.88 |
| MCP2 | 0.001 | 2.5 | 1.44–4.33 | 0.009 | 3.04 | 1.31–7.02 |
| MCP3 | 0.024 | 1.92 | 1.09–3.39 | – | – | – |
| MCP5 | 0.016 | 2.38 | 1.18–4.81 | – | – | – |
| PIP3 | 0.016 | 2.26 | 1.16–4.38 | – | – | – |
| ECU | <0.001 | 3.38 | 1.72–6.62 | – | – | – |
| Knee | 0.032 | 2.09 | 1.06–4.09 | 0.003 | 3.60 | 1.52–8.49 |
| Ankle | <0.001 | 4.16 | 2.07–8.36 | – | – | – |
| MTP5 | 0.025 | 2.54 | 1.12–5.74 | – | – | – |
| TP | <0.001 | 4.63 | 2.19–9.8 | 0.025 | 3.30 | 1.16–9.40 |
Anti-CCP, anti-cyclic citrullinated peptide antibody; CI, confidence interval; ECU, extensor carpi ulnaris; MCP, metacarpophalangeal; MTP, metatarsophalangeal; OR, odds ratio; PIP, proximal interphalangeal; RA, rheumatoid arthritis; TP, tibialis posterior.
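For a single binary predictor such as ultrasound synovitis at one site, the unadjusted logistic-regression OR equals the cross-product ratio of the 2×2 table, and its Wald standard error on the log scale is the square root of the sum of reciprocal cell counts. The sketch below computes that OR and 95% CI and, conversely, recovers an approximate two-sided P value from a reported OR and CI, which is a common consistency check for tables like Table 4. The counts in the usage line are hypothetical, and multivariable ORs can only come from the fitted model itself.

```python
import math

Z_975 = 1.959964  # standard normal 97.5th percentile


def odds_ratio_wald(a, b, c, d, z=Z_975):
    """OR and Wald CI from a 2x2 table.

    a: relapse with finding, b: relapse without finding,
    c: remission with finding, d: remission without finding."""
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: use a continuity correction or exact methods")
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def p_from_or_ci(odds_ratio, lower, upper, z=Z_975):
    """Approximate two-sided Wald P value implied by an OR and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    stat = abs(math.log(odds_ratio)) / se
    return math.erfc(stat / math.sqrt(2))


# Hypothetical counts, not taken from this study
or_, lo, hi = odds_ratio_wald(a=30, b=20, c=50, d=100)
```

These hypothetical counts give an OR of 3.0 (95% CI about 1.55–5.80), and the back-calculated P value is about 0.001, matching the Wald test on the same table.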
Model performance, presentation and validation
Model development and performance
The performance of the three prediction models on the validation set is shown in Table 5. Compared to Model I containing only clinical indicators (AUC =0.755), both Model II incorporating 4 key joint ultrasound indicators (AUC =0.865) and Model III incorporating all joint ultrasound indicators (AUC =0.903) demonstrated superior performance. The simplified ultrasound model (Model II) achieved a comparable AUC (0.865) and similar accuracy (0.827) to the comprehensive ultrasound model (Model III) (AUC =0.903, accuracy =0.845) (Figure 3).
Table 5
| Model | Accuracy | AUC | Recall | Precision | F1-score | Kappa | MCC | Log loss | Brier score |
|---|---|---|---|---|---|---|---|---|---|
| Model I | 0.791 | 0.755 | 0.52 | 0.542 | 0.531 | 0.396 | 0.396 | 7.536 | 0.209 |
| Model II | 0.827 | 0.865 | 0.56 | 0.714 | 0.622 | 0.527 | 0.531 | 5.570 | 0.173 |
| Model III | 0.845 | 0.903 | 0.72 | 0.643 | 0.679 | 0.578 | 0.579 | 5.570 | 0.154 |
Model I: clinical benchmark model. Model II: simplified model combining clinical indicators with 4 key joint ultrasound indicators. Model III: comprehensive model combining clinical indicators with all joint ultrasound indicators. AUC, area under the curve; MCC, Matthews correlation coefficient.
Incremental predictive value and clinical utility
The net reclassification index (NRI) for Model II compared with Model I was 0.36 [95% confidence interval (CI): 0.16–0.55; P<0.001], and the integrated discrimination improvement (IDI) was 0.17 (95% CI: 0.12–0.23; P<0.001). On average, Model II assigned higher predicted probabilities to patients who relapsed and lower probabilities to those who did not, widening the mean separation between the two groups by 0.17 (17 percentage points) (Table S6, Figure S2). The Brier scores of Model II (0.173) and Model III (0.154) were lower than that of Model I (0.209), suggesting more accurate predicted probabilities. Decision curve analysis showed that, across decision thresholds of 10–70%, models containing ultrasound indicators yielded higher net benefit than the clinical benchmark model and the “treat all” and “treat none” strategies (Figure S3).
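Both reclassification statistics compare the predicted risks of the old and new models for the same patients. The IDI is the change in the discrimination slope (mean predicted risk among patients who relapsed minus that among patients who did not), and the category-free NRI is the net proportion of relapse cases whose risk rises plus the net proportion of non-relapse cases whose risk falls. The text does not say whether a categorical or category-free NRI was used; the sketch below implements the category-free version as an assumption and omits confidence intervals (bootstrap resampling is a common choice). The risks in the usage line are hypothetical.

```python
def _mean_risk(y_true, probs, outcome):
    vals = [p for y, p in zip(y_true, probs) if y == outcome]
    return sum(vals) / len(vals)


def idi(y_true, p_old, p_new):
    """Integrated discrimination improvement (difference in discrimination slopes)."""
    slope_new = _mean_risk(y_true, p_new, 1) - _mean_risk(y_true, p_new, 0)
    slope_old = _mean_risk(y_true, p_old, 1) - _mean_risk(y_true, p_old, 0)
    return slope_new - slope_old


def continuous_nri(y_true, p_old, p_new):
    """Category-free NRI: (up - down) among events + (down - up) among non-events."""
    up_e = down_e = up_n = down_n = n_e = n_n = 0
    for y, old, new in zip(y_true, p_old, p_new):
        if y == 1:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


# Hypothetical predicted risks for 2 relapse and 2 non-relapse patients
y = [1, 1, 0, 0]
old = [0.5, 0.4, 0.3, 0.2]
new = [0.7, 0.3, 0.1, 0.2]
```

In this toy example, the discrimination slope rises from 0.20 to 0.35 (IDI = 0.15), while the NRI is 0.5: the relapse cases reclassify up and down equally (net 0), and half of the non-relapse cases move down.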
Although Model III had the best discrimination (AUC =0.90), Model II maintained strong performance (AUC =0.86), achieved higher precision (0.71 vs. 0.64) at the cost of lower recall (0.56 vs. 0.72), and reduced scanning time from 29.0 to 10.5 min (P<0.001). The simplified model therefore offered a favorable balance: a small loss in discrimination (an absolute AUC reduction of 0.038, about 4.2% in relative terms) in exchange for a 64% reduction in scanning time, underscoring its clinical utility.
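The relative figures can be recomputed from the AUCs in Table 5 and the reported scanning times. The relative AUC loss depends on whether unrounded or rounded AUC values are used:

```latex
\begin{aligned}
\text{Relative AUC loss} &= \frac{0.903 - 0.865}{0.903} = \frac{0.038}{0.903} \approx 4.2\%
  \qquad \left(\text{rounded AUCs: } \tfrac{0.90 - 0.86}{0.90} \approx 4.4\%\right) \\
\text{Reduction in scanning time} &= \frac{29.0 - 10.5}{29.0} = \frac{18.5}{29.0} \approx 63.8\% \approx 64\%
\end{aligned}
```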
Visualization, development and evaluation of the optimal model
Analysis of the SHapley Additive exPlanations (SHAP) summary plot revealed that, among the eight predictors, the duration of clinical remission was the most important protective factor (longer remission periods were associated with lower relapse risk). MCP2 synovitis, disease duration, and hand/foot radiographic bone erosion were the most important risk factors, while high anti-CCP positivity, wrist synovitis, knee synovitis, and TP tenosynovitis also served as positive predictors (Figure 4).
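SHAP values are Shapley values: each predictor's average marginal contribution to the model output over all possible subsets of the other predictors. The brute-force sketch below makes that definition concrete for a toy model. The logistic coefficients are hypothetical and unrelated to the study's LightGBM model, and "absent" predictors are replaced by a single reference patient, which is a simplification. Tree-specific algorithms such as TreeSHAP (used by the shap package's TreeExplainer) compute these values efficiently for tree ensembles without enumerating subsets.

```python
import itertools
import math


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take their
    baseline value. Cost grows as 2**n, so this is for toy models only."""
    n = len(x)

    def value(coalition):
        return f([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_log_odds(z):
    # Hypothetical coefficients for illustration only, NOT the study's model.
    remission_months, mcp2_synovitis, wrist_synovitis = z
    return -1.0 - 0.1 * remission_months + 0.8 * mcp2_synovitis + 0.5 * wrist_synovitis


patient = [4.0, 1, 1]     # short remission, MCP2 and wrist synovitis present
reference = [9.0, 0, 0]   # hypothetical reference patient
phi = shapley_values(toy_log_odds, patient, reference)
```

For an additive model like this one, each Shapley value reduces to coefficient × (value − reference): here 0.5, 0.8, and 0.5 on the log-odds scale. Their sum equals the difference between the patient's and the reference patient's predicted log-odds (the efficiency property that SHAP summary plots rely on).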
Comprehensive performance evaluation demonstrated that the simplified ultrasound model (Model II) exhibits excellent discriminative ability (AUC =0.86), good calibration, and significant clinical net benefit, making it a promising predictive tool for clinical application (Figure 5).
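Decision curve analysis compares models by net benefit at each threshold probability pₜ, the risk at which a clinician would act (for example, by continuing or intensifying therapy): net benefit = TP/n − (FP/n) × pₜ/(1 − pₜ). The "treat all" strategy corresponds to acting on every patient, and "treat none" always has a net benefit of 0. The sketch below implements these standard formulas over the 10–70% range reported in the text; the outcomes and risks in the usage line are hypothetical.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def decision_curve(y_true, probs, thresholds):
    """Rows of (threshold, model, treat-all, treat-none) net benefit."""
    return [(t, net_benefit(y_true, probs, t), net_benefit_treat_all(y_true, t), 0.0)
            for t in thresholds]


# Hypothetical outcomes and predicted relapse risks
y = [1, 1, 0, 0, 0]
risk = [0.8, 0.3, 0.6, 0.1, 0.2]
curve = decision_curve(y, risk, [i / 100 for i in range(10, 71, 5)])
```

At pₜ = 0.25, the toy model's net benefit is 2/5 − (1/5)(1/3) ≈ 0.33, compared with 0.20 for treat-all and 0 for treat-none, so using the model would be preferred at that threshold.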
Consistency testing
Ultrasound assessment of joints and tendon sheaths showed good inter-observer reliability. For the MCP, PIP, wrist, and knee joints, κ values for grayscale and Doppler scores and for the detection of synovitis and tenosynovitis all exceeded 0.7, indicating high agreement. Agreement for the ankles and some wrist regions was slightly lower but remained acceptable. Overall, ultrasound was highly reliable for detecting synovitis and tenosynovitis (Table S7).
Discussion
This prospective study analyzed clinical and ultrasound findings in the joints and tendon sheaths of patients with RA in clinical remission to identify risk factors for relapse and to assess the value of ultrasound-based models for predicting relapse. The main findings are as follows: (I) among clinical factors, longer disease duration, shorter clinical remission duration, high anti-CCP antibody positivity, and positive hand/foot X-ray findings were independent risk factors for RA relapse; (II) among ultrasound indicators, synovitis in the wrist, MCP2, and knee, and tenosynovitis of the TP were independent imaging predictors of relapse; (III) the simplified prediction model (Model II), combining key clinical and ultrasound indicators, showed excellent discrimination (AUC =0.865) in the held-out validation set, clearly outperforming the clinical-only model and approaching the performance of the comprehensive ultrasound reference model, which indicates strong potential for clinical translation.
This study confirms that patients with longer disease duration, shorter clinical remission, high anti-CCP positivity, and existing radiographic bone erosion face a higher risk of relapse. Together, these factors reflect persistent immune dysregulation and structural damage, which provide fertile ground for disease reactivation even during clinical remission (18-21). More importantly, this study highlights the central role of subclinical inflammation in driving relapse. MSUS can sensitively detect synovial and tenosynovial inflammation that is often missed by conventional physical examination (22-24). In this study, a substantial proportion of patients in remission still showed ultrasound-detected synovitis or tenosynovitis (GS ≥1 and PD ≥1), particularly at the wrist, MCP2, knee, and TP. These sites can be regarded as inflammatory “hot spots” during RA remission; their involvement indicates persistent subclinical disease activity and serves as an early warning signal of relapse.
In RA management, assessing only the small hand joints may underestimate the overall inflammatory burden (25). This study incorporated the knee, ankle, and foot joints as well as tendon sheaths into the evaluation and found that synovitis in the wrist, MCP2, MCP3, MCP5, PIP3, knee, and ankle joints, as well as tenosynovitis in the ECU and TP tendon sheaths, was associated with relapse. Notably, to our knowledge, this is the first systematic evaluation of the predictive value of ECU and TP tenosynovitis, extending ultrasound assessment beyond its traditional scope. Although large-joint involvement is not typical in RA (26), synovitis in the foot and ankle (including MTP5) and knee joints holds significant value for predicting relapse (27,28). The markedly higher prevalence of TP tenosynovitis in the relapse group (22.4% vs. 5.9%) supports its inclusion in standardized ultrasound scoring systems (29,30). These findings suggest that key tendon sheath assessments should be incorporated into routine RA monitoring to evaluate inflammatory load and predict relapse risk more accurately.
In model development, we systematically compared multiple machine learning algorithms and sampling strategies and established a high-performing prediction model based on LightGBM with SMOTE. Compared with the baseline model containing only clinical indicators (Model I, AUC =0.755), the simplified model incorporating 4 key joint ultrasound indicators (Model II) not only improved discrimination (AUC =0.865) but also showed clear clinical utility, with decision-curve net benefit exceeding that of the default strategies across a wide threshold range. Equally important, the simplified model reduced assessment time from 29 min for comprehensive scanning to 10.5 min with minimal loss in predictive accuracy (about 4% relative reduction in AUC). This favorable balance between predictive performance and clinical feasibility lays the foundation for its integration into routine follow-up pathways.
This study methodologically addresses with previous research on ultrasound prediction of RA relapse. Building on the STARTER study, which confirmed the predictive value of ultrasound-detected synovitis and tenosynovitis [odds ratio (OR) =2.09] but faced challenges in clinical implementation of comprehensive joint assessment (31), our research achieves a key methodological breakthrough: through machine learning-based feature selection, we have streamlined the assessment to 4 key joints while improving predictive performance to an AUC of 0.86. Compared to similar studies, Matsuo et al. achieved an AUC of 0.747 using 14 joints and 73 features (32), whereas our study, through optimized feature engineering and the LightGBM algorithm, achieves superior performance with an ultra-simplified 4-joint protocol, establishing a new, more efficient paradigm.
Our study applies a simplified protocol to a new setting: predicting relapse in RA patients in clinical remission. While maintaining good predictive performance, the protocol substantially reduces scanning time and showed a net clinical benefit in decision curve analysis. This represents a step toward moving simplified ultrasound from methodological validation to practical clinical application and may provide a feasible tool for the precision management of RA.
Limitations
This study has several limitations. First, this was a single-center study; although rigorous internal validation was performed, the model has not been validated in an independent external cohort, and its generalizability requires further confirmation. Second, although the sample size met statistical requirements, the events per variable (EPV = 9.5) was just below the commonly cited threshold of 10, so model stability needs to be verified in larger samples. Third, the 12-month follow-up period is relatively short and does not allow assessment of the model's long-term predictive value. Fourth, the study did not systematically incorporate radiographic progression scores, so the direct relationship between ultrasound-detected inflammation and the accumulation of structural damage could not be explored. These limitations indicate that the current findings should be considered preliminary and require further validation through multicenter studies with longer follow-up before clinical application.
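For reference, EPV is conventionally defined as the number of outcome events divided by the number of candidate predictor parameters considered during model development. The event and predictor counts behind the reported value of 9.5 are not restated in this section, so the expression below is the general definition rather than a reconstruction of the study's calculation:

```latex
\mathrm{EPV} = \frac{\text{number of relapse events}}{\text{number of candidate predictor parameters}}
```

Because the denominator counts every candidate parameter examined, not only those retained in the final 4-joint model, feature selection from a large pool of ultrasound sites lowers EPV even when the final model is small.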
Conclusions
This study developed and internally validated an RA relapse risk prediction model that integrates key clinical indicators with simplified ultrasound features. While maintaining good predictive performance, the model shows considerable potential for clinical translation owing to its efficient assessment protocol. Subject to external validation, we recommend incorporating this standardized evaluation protocol into the routine management of RA patients in remission as a practical tool for precise risk stratification and individualized treatment decision-making.
Acknowledgments
None.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1518/rc
Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1518/dss
Funding: This study has received funding from
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1518/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committee of The Second Hospital of Lanzhou University (No. 2020A-326) and informed consent was obtained from all individual participants.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Lin YJ, Anzaghe M, Schülke S. Update on the Pathomechanism, Diagnosis, and Treatment Options for Rheumatoid Arthritis. Cells 2020;9:880.
- Maneiro JR, Perez-Pampin E, Salgado E, Carmona L, Gomez-Reino JJ. Observational study of optimization of biologic therapies in rheumatoid arthritis: a single-centre experience. Rheumatol Int 2014;34:1059-63.
- Dénarié D, Constant E, Thomas T, Marotte H. Could biomarkers of bone, cartilage or synovium turnover be used for relapse prediction in rheumatoid arthritis patients? Mediators Inflamm 2014;2014:537324.
- Murata K, Ito H, Hashimoto M, Murakami K, Watanabe R, Tanaka M, Yamamoto W, Matsuda S. Fluctuation in anti-cyclic citrullinated protein antibody level predicts relapse from remission in rheumatoid arthritis: KURAMA cohort. Arthritis Res Ther 2020;22:268.
- Molina Collada J, López Gloria K, Castrejón I, Nieto-González JC, Rivera J, Montero F, González C, Álvaro-Gracia JM. Ultrasound in clinically suspect arthralgia: the role of power Doppler to predict rheumatoid arthritis development. Arthritis Res Ther 2021;23:299.
- Mouterde G, Lukas C, Filippi N, Marin G, Molinari N, Combe B, Morel J. Persistence of power Doppler ultrasonography-detected synovitis over 1 year of follow-up predicts poor prognosis in rheumatoid arthritis in clinical remission: the SONORE prospective longitudinal study. RMD Open 2024;10:e004269.
- Colebatch AN, Edwards CJ, Østergaard M, van der Heijde D, Balint PV, D’Agostino MA, et al. EULAR recommendations for the use of imaging of the joints in the clinical management of rheumatoid arthritis. Ann Rheum Dis 2013;72:804-14.
- Filippucci E, Cipolletta E, Mashadi Mirza R, Carotti M, Giovagnoni A, Salaffi F, Tardella M, Di Matteo A, Di Carlo M. Ultrasound imaging in rheumatoid arthritis. Radiol Med 2019;124:1087-100.
- Su J, Han X, Yang F, Song Y, Lei H, Wang X, Fan X, Li Y. Application of Automated Hand Ultrasound Scanning and a Simplified Three-Joint Scoring System for Assessment of Rheumatoid Arthritis Activity. Ultrasound Med Biol 2021;47:2860-8.
- Zabotti A, Finzel S, Baraliakos X, Aouad K, Ziade N, Iagnocco A. Imaging in the preclinical phases of rheumatoid arthritis. Clin Exp Rheumatol 2020;38:536-42.
- Bellis E, Scirè CA, Carrara G, Adinolfi A, Batticciotto A, Bortoluzzi A, et al. Ultrasound-detected tenosynovitis independently associates with patient-reported flare in patients with rheumatoid arthritis in clinical remission: results from the observational study STARTER of the Italian Society for Rheumatology. Rheumatology (Oxford) 2016;55:1826-36.
- van der Maas A, Lie E, Christensen R, Choy E, de Man YA, van Riel P, Woodworth T, den Broeder AA. Construct and criterion validity of several proposed DAS28-based rheumatoid arthritis flare criteria: an OMERACT cohort validation study. Ann Rheum Dis 2013;72:1800-5.
- Szkudlarek M, Court-Payen M, Jacobsen S, Klarlund M, Thomsen HS, Østergaard M. Interobserver agreement in ultrasonography of the finger and toe joints in rheumatoid arthritis. Arthritis Rheum 2003;48:955-62.
- Beitinger N, Ehrenstein B, Schreiner B, Fleck M, Grifka J, Lüring C, Hartung W. The value of colour Doppler sonography of the knee joint: a useful tool to discriminate inflammatory from non-inflammatory disease? Rheumatology (Oxford) 2013;52:1425-8.
- Vreju F, Ciurea M, Roşu A, Muşetescu A, Grecu D, Ciurea P. Power Doppler sonography, a non-invasive method of assessment of the synovial inflammation in patients with early rheumatoid arthritis. Rom J Morphol Embryol 2011;52:637-43.
- Naredo E, D’Agostino MA, Wakefield RJ, Möller I, Balint PV, Filippucci E, Iagnocco A, Karim Z, Terslev L, Bong DA, Garrido J, Martínez-Hernández D, Bruyn GA; OMERACT Ultrasound Task Force. Reliability of a consensus-based ultrasound score for tenosynovitis in rheumatoid arthritis. Ann Rheum Dis 2013;72:1328-34.
- Terslev L, Naredo E, Aegerter P, Wakefield RJ, Backhaus M, Balint P, Bruyn GAW, Iagnocco A, Jousse-Joulin S, Schmidt WA, Szkudlarek M, Conaghan PG, Filippucci E, D’Agostino MA. Scoring ultrasound synovitis in rheumatoid arthritis: a EULAR-OMERACT ultrasound taskforce-Part 2: reliability and application to multiple joints of a standardised consensus-based scoring system. RMD Open 2017;3:e000427.
- Trier NH, Houen G. Anti-citrullinated protein antibodies as biomarkers in rheumatoid arthritis. Expert Rev Mol Diagn 2023;23:895-911.
- Garcia-Montoya L, Kang J, Duquenne L, Di Matteo A, Nam JL, Harnden K, Chowdhury R, Mankia K, Emery P. Factors associated with resolution of ultrasound subclinical synovitis in anti-CCP-positive individuals with musculoskeletal symptoms: a UK prospective cohort study. Lancet Rheumatol 2024;6:e72-80.
- Gessl I, Hana CA, Deimel T, Durechova M, Hucke M, Konzett V, Popescu M, Studenic P, Supp G, Zauner M, Smolen JS, Aletaha D, Mandl P. Tenderness and radiographic progression in rheumatoid arthritis and psoriatic arthritis. Ann Rheum Dis 2023;82:344-50.
- Smolen JS, Landewé RBM, Bijlsma JWJ, Burmester GR, Dougados M, Kerschbaumer A, et al. EULAR recommendations for the management of rheumatoid arthritis with synthetic and biological disease-modifying antirheumatic drugs: 2019 update. Ann Rheum Dis 2020;79:685-99.
- Canhão H, Rodrigues AM, Gregório MJ, Dias SS, Melo Gomes JA, Santos MJ, Faustino A, Costa JA, Allaart C, Gvozdenović E, van der Heijde D, Machado P, Branco JC, Fonseca JE, Silva JA. Common Evaluations of Disease Activity in Rheumatoid Arthritis Reach Discordant Classifications across Different Populations. Front Med (Lausanne) 2018;5:40.
- Elangovan S, Tan YK. The Role of Musculoskeletal Ultrasound Imaging in Rheumatoid Arthritis. Ultrasound Med Biol 2020;46:1841-53.
- Harnden K, Di Matteo A, Mankia K. When and how should we use imaging in individuals at risk of rheumatoid arthritis? Front Med (Lausanne) 2022;9:1058510.
- Picchianti Diamanti A, Navarini L, Messina F, Markovic M, Arcarese L, Basta F, Meneguzzi G, Margiotta DPE, Laganà B, Afeltra A, D’Amelio R, Iagnocco A. Ultrasound detection of subclinical synovitis in rheumatoid arthritis patients in clinical remission: a new reduced-joint assessment in 3 target joints. Clin Exp Rheumatol 2018;36:984-9.
- Sidhu N, Wouters F, Niemantsverdriet E, van der Helm-van Mil AHM. MRI-detected synovitis of the small joints predicts rheumatoid arthritis development in large joint undifferentiated inflammatory arthritis. Rheumatology (Oxford) 2022;61:SI23-9.
- Wechalekar MD, Lester S, Hill CL, Lee A, Rischmueller M, Smith MD, Walker JG, Proudman SM. Active Foot Synovitis in Patients With Rheumatoid Arthritis: Unstable Remission Status, Radiographic Progression, and Worse Functional Outcomes in Patients With Foot Synovitis in Apparent Remission. Arthritis Care Res (Hoboken) 2016;68:1616-23.
- Simonsen MB, Hørslev-Petersen K, Cöster MC, Jensen C, Bremander A. Foot and Ankle Problems in Patients With Rheumatoid Arthritis in 2019: Still an Important Issue. ACR Open Rheumatol 2021;3:396-402.
- Elsaman AM, Mostafa ES, Radwan AR. Ankle Evaluation in Active Rheumatoid Arthritis by Ultrasound: A Cross-Sectional Study. Ultrasound Med Biol 2017;43:2806-13.
- Bruyn GA, Hanova P, Iagnocco A, d’Agostino MA, Möller I, Terslev L, Backhaus M, Balint PV, Filippucci E, Baudoin P, van Vugt R, Pineda C, Wakefield R, Garrido J, Pecha O, Naredo E. Ultrasound definition of tendon damage in patients with rheumatoid arthritis. Results of a OMERACT consensus-based ultrasound score focussing on the diagnostic reliability. Ann Rheum Dis 2014;73:1929-34.
- Filippou G, Sakellariou G, Scirè CA, Carrara G, Rumi F, Bellis E, et al. The predictive role of ultrasound-detected tenosynovitis and joint synovitis for flare in patients with rheumatoid arthritis in stable remission. Results of an Italian multicentre study of the Italian Society for Rheumatology Group for Ultrasound: the STARTER study. Ann Rheum Dis 2018;77:1283-9.
- Matsuo H, Kamada M, Imamura A, Shimizu M, Inagaki M, Tsuji Y, Hashimoto M, Tanaka M, Ito H, Fujii Y. Machine learning-based prediction of relapse in rheumatoid arthritis patients using data on ultrasound examination and blood test. Sci Rep 2022;12:7224.


