T2 mapping of healthy knee cartilage: multicenter multivendor reproducibility
Introduction
Quantitative magnetic resonance imaging (qMRI) techniques to assess changes in biochemical cartilage composition in osteoarthritis (OA) are emerging (1). By detecting cartilage degeneration before it is visible on radiography or conventional MRI, qMRI techniques enable early intervention and monitoring of disease progression in OA (2). T2 mapping, which provides a marker for collagen integrity without the need for intravenous contrast or specific MRI hardware (2-5), is the most widely used qMRI technique in knee OA research (5,6). Although cartilage T2 mapping has found widespread use in OA research (7), reproducibility studies on T2 mapping in a multicenter setting are scarce. Longitudinal reproducibility analyses of multicenter cartilage T2 mapping have been limited to studies using similar scanners and harmonized MRI acquisition protocols (5,8,9). However, differences in MRI hardware and T2 mapping sequences, which may be attributable to local requirements and restrictions regarding MRI acquisition, are often present when performing a multicenter trial. Complete standardization of MRI acquisition across different centers is, therefore, not always feasible, especially in large-scale multidisciplinary clinical trials. Little is known about the longitudinal reproducibility of cartilage T2 values acquired on MRI scanners from different vendors and with non-harmonized acquisition protocols. The aim of the present study was to evaluate the multicenter reproducibility of cartilage T2 mapping from a clinical and pragmatic perspective. We assessed the longitudinal T2 mapping reproducibility and the variation of T2 relaxation times among various MRI systems with different field strengths and acquisition protocols.
Methods
Study design
Five medical centers located in different geographical parts of The Netherlands participated in this prospective observational study. In these centers, a multicenter randomized controlled trial (RCT) is currently being conducted on the outcomes of conservative versus operative treatment of a traumatic meniscal tear (trial number NTR 4511). In that trial, T2 mapping is used as an outcome measure for deterioration of knee cartilage two years after a meniscal tear. Four traveling human subjects underwent MR imaging of the knee, including a T2 mapping sequence, at each of the five centers in one day (i.e., baseline measurements). To evaluate longitudinal reproducibility of T2 mapping, the exact same experiment was performed 6 months later (i.e., follow-up measurements). Subjects were scanned in the same order in each center, both at baseline and follow-up. Moreover, centers were visited in the same order and at the same time of day to address potential diurnal variation in T2 measurements. To assess the variation of T2 values across centers, cross-validation was performed in the human subjects as well as a phantom. Approval from the Institutional Review Board of our institution (MEC 2014-096) and written consent of all subjects were obtained.
Human subjects and phantom
For in vivo T2 measurements, the left knee of four healthy volunteers (median age 29 years, range 25–30 years, median BMI 21.5 kg/m², three females) was scanned. The subjects had no history of knee pathology and did not report any knee complaints or injuries before or during the 6 months between scans. During the baseline and follow-up measurement days, all subjects had the same physical activity level, without significant exercise or heavy loading. The subjects traveled by car; the same car was used during baseline and follow-up measurements. None of the subjects engaged in significant exercise or heavy loading of the knee in the two days preceding the measurement days. An in-house developed phantom was scanned once at each center to assess the variation of the T2 values. The phantom consisted of eight vials of 3 cm diameter, containing various concentrations of manganese chloride (0 to 80 mg/mL). These concentrations were selected to encompass T2 values within the range of human articular cartilage (1).
Data acquisition
MRI acquisition parameters are summarized per center in Table 1. MRI scanners manufactured by GE Healthcare (Milwaukee, WI, USA), Siemens (Erlangen, Germany) and Philips (Eindhoven, The Netherlands) were used for this study: three 3-Tesla scanners (GE, Siemens and Philips) and two 1.5-Tesla scanners (both Siemens). Dedicated knee coils were used in each center, either receive-only or combined transmit-receive. MRI protocols were optimized in each center according to locally available MRI hardware and software. All knees were scanned in the sagittal plane. For phantom measurements, the same T2 mapping protocol was used as for human subjects. For the purpose of cartilage segmentation in vivo, a sagittal high-resolution fast-spoiled gradient-echo (FSPGR) sequence with fat saturation was acquired for each subject at center 1 at baseline. None of the MRI systems or acquisition protocols underwent updates or adjustments during the study period.
Image processing
An in-house developed MATLAB (R2011a; The MathWorks, Natick, MA, USA) extension was used for post-processing analyses of all scans (10). Rigid registration in 3D provided motion compensation between echo times of the T2 mapping scans. All T2 mapping scans were registered to the high-resolution FSPGR scan acquired at baseline at center 1, to ensure that exactly matching regions of interest (ROIs) were measured. Full-thickness cartilage masks of the central portion of the medial and lateral tibiofemoral compartment were manually segmented on the subjects’ high-resolution FSPGR scans. Segmentation was performed by a researcher with a medical degree and four years of experience in musculoskeletal imaging (JV) on five slices at three-millimeter intervals. Subsequently, the segmented masks were divided into six cartilage ROIs, located in the medial and lateral weight-bearing and posterior femoral condyles and tibial plateaus (Figure 1), as scans will be analyzed in the same manner in the aforementioned RCT on the outcomes of traumatic meniscal tear treatment. The outer perimeters of the menisci demarcated the weight-bearing ROIs of the femur and tibia. The posterior ROIs contained the femoral cartilage behind the posterior border of the menisci. Within each ROI, mean T2 relaxation time was computed using a weighted averaging procedure (10). Besides T2 values per ROI, an average T2 value per subject was calculated to assess the variation of T2 relaxation times across centers. The automated registration of the follow-up T2 mapping scan to the high-resolution scan yielded visually inaccurate registration in two measurements (center 3, subject 3, and center 4, subject 4). For these measurements, cartilage was segmented directly on the T2 mapping images while ensuring that the regions matched those segmented on the high-resolution scan.
In phantom scans, a central circle of approximately 2 cm diameter was segmented directly on the T2 mapping images, on four consecutive slices of 3 mm thickness.
Statistical analyses
The longitudinal reproducibility of T2 measurements in each cartilage ROI and the ROIs combined was evaluated with intraclass correlation coefficients (ICCs) for absolute agreement of single measures, using a two-way random model. As there were not enough subjects to calculate an ICC per center, we pooled the T2 values of all subjects from all centers. To interpret ICC findings, we used the following scale: poor (ICC <0.5), moderate (ICC 0.5–0.7), good (ICC 0.7–0.9), or excellent (ICC >0.9) reproducibility (11). To assess the reproducibility per center, we calculated coefficients of variation (CVs, defined as the standard deviation (SD) normalized by the mean value of the measurements) of the baseline and follow-up T2 measurements for each subject. Since averaging the subjects’ CVs to obtain pooled CVs for each center and for each cartilage ROI is inadequate (12,13), we calculated the root-mean-square coefficient of variation (RMS-CV, expressed as a percentage) according to the method of Glüer et al. (12). RMS-CV is defined as the square root of the mean of the squared per-subject CVs, i.e., the square root of the sum of the squared CVs divided by the sample size. An RMS-CV value of zero represents perfect agreement between repeated measurements. A Bland-Altman plot was made per ROI to determine limits of agreement of T2 measurements, in order to gain insight into the extent and nature of the error (i.e., systematic or random error), and to identify possible outliers. The limits of agreement were defined as the mean difference in T2 values between baseline and follow-up measurements (i.e., the mean error) ±1.96 SD.
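The three reproducibility measures described above can be illustrated with a minimal Python sketch. It is not the analysis code used in this study (which relied on SPSS and GraphPad Prism); the function names are hypothetical, the T2 values are invented for illustration, and the ICC follows the standard two-way random, absolute-agreement, single-measures form, ICC(2,1), which is assumed here to correspond to the model named in the text.

```python
import math
from statistics import mean, stdev


def rms_cv(baseline, follow_up):
    """Root-mean-square CV (%) over subjects, each measured twice."""
    squared_cvs = []
    for b, f in zip(baseline, follow_up):
        cv = stdev([b, f]) / mean([b, f])  # per-subject SD / mean of the two scans
        squared_cvs.append(cv ** 2)
    return 100.0 * math.sqrt(sum(squared_cvs) / len(squared_cvs))


def bland_altman_limits(baseline, follow_up):
    """Mean difference and limits of agreement (mean difference +/- 1.96 SD)."""
    diffs = [f - b for b, f in zip(baseline, follow_up)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


def icc_2_1(rows):
    """ICC(2,1): two-way random effects, absolute agreement, single measures.

    rows holds one list per target (e.g., subject and center), with one
    value per measurement session (e.g., baseline, follow-up).
    """
    n, k = len(rows), len(rows[0])
    grand = mean(v for row in rows for v in row)
    row_means = [mean(row) for row in rows]
    col_means = [mean(row[j] for row in rows) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Invented T2 values (ms) for four subjects at one center; NOT study data.
baseline_t2 = [40.0, 42.0, 38.0, 44.0]
follow_up_t2 = [41.0, 41.0, 39.0, 43.0]

print("RMS-CV (%):", rms_cv(baseline_t2, follow_up_t2))
print("Bias and limits of agreement (ms):",
      bland_altman_limits(baseline_t2, follow_up_t2))
print("ICC(2,1):", icc_2_1([list(p) for p in zip(baseline_t2, follow_up_t2)]))
```

Because ICC(2,1) measures absolute agreement, a constant offset between sessions lowers it even when the ranking of subjects is preserved, which is why it suits a baseline versus follow-up comparison.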
To assess the variation of T2 relaxation times across centers, we compared the T2 relaxation times of the subjects (average T2 value per subject) of the baseline measurements and the phantom between centers. Variation in T2 values was analyzed using one-way ANOVA with Dunn’s Multiple Comparison Test. Data were tested for normality using Shapiro-Wilk tests. P values <0.05 were considered statistically significant. Statistical analyses were performed using SPSS version 24.0 (IBM Corp., Armonk, NY, USA, 2016) and GraphPad Prism version 8.0 (GraphPad Software, San Diego, CA, USA, 2018).
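As a rough illustration of the between-center comparison, the sketch below computes the one-way ANOVA F statistic for per-subject average T2 values grouped by center. It is a simplified stand-in for the SPSS and GraphPad Prism analyses: the function name is hypothetical, the values are invented, and neither the P value (which requires the F distribution) nor the post hoc multiple comparison test is computed.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(v for g in groups for v in g)
    # Between-group sum of squares: spread of the center means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: spread of subjects around their own center mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented average T2 values (ms) of four subjects at five centers; NOT study data.
t2_by_center = [
    [33.0, 35.0, 31.0, 36.0],
    [41.0, 43.0, 39.0, 44.0],
    [40.0, 42.0, 38.0, 43.0],
    [45.0, 47.0, 43.0, 48.0],
    [42.0, 44.0, 40.0, 45.0],
]
print(one_way_anova_f(t2_by_center))
```

A large F statistic indicates that the differences between center means are large relative to the spread between subjects within a center. A statistics package would additionally provide the P value and the pairwise comparisons between centers.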
Results
Longitudinal reproducibility of in vivo T2 measurements
The ICCs of the T2 measurements pooled across all centers ranged from 0.73 to 0.91 for the different ROIs, indicating good to excellent reproducibility (Table 2). When using the average T2 values per subject, we found excellent reproducibility with an ICC of 0.90. In the same table, the RMS-CVs of the longitudinal T2 measurements per center are presented for the different ROIs and the ROIs combined. The overall (average T2 value per subject) RMS-CV in each center ranged from 0.6% to 1.6%. The Bland-Altman plot revealed a mean difference of −0.11 milliseconds between baseline and follow-up T2 measurements (Figure 2). The lowest mean differences were observed in center 1 and center 5, indicating the highest reproducibility. A systematic error was not observed.
Two (out of 120) data points of the follow-up measurements were excluded from analysis. The lateral posterior femoral condyle of subject 1 in center 2 and the lateral tibial plateau of subject 4 in center 3 showed T2 values beyond plausible ranges (>150 milliseconds). The invalid T2 value of the first mentioned ROI was due to substantial excess blurring in the slice direction in that particular scan. Non-saturated fat signals, causing partial volume effects, were most likely responsible for the invalid value of the other excluded ROI.
Multicenter variation of in vivo and phantom T2 measurements
In Figure 3A, the average T2 values per subject are plotted for each center, showing discrepancies across centers. A statistically significant difference in T2 values was found between center 1 and center 4 (P<0.01). However, mutual differences in T2 values between subjects were consistent across all centers. Moreover, phantom T2 measurements showed a pattern of differences in T2 values across centers comparable to that seen in vivo, especially in vials with a lower concentration of manganese chloride (Figure 3B). Phantom stability was verified over a 6-month interval (ICC 0.90; 95% CI, 0.856–0.928).
Discussion
The reproducibility of qMRI techniques such as T2 mapping is a highly relevant issue for multicenter studies. In the present study, we evaluated the longitudinal reproducibility and variation of T2 measurements in different cartilage ROIs in a multicenter setting, using various MRI systems and acquisition protocols. ICCs for longitudinal T2 measurements ranged from 0.73 to 0.91 with RMS-CVs ranging from 0.6% to 1.6%, indicating good to excellent longitudinal reproducibility. Our results indicate that T2 mapping allows reliable evaluation of intra-subject changes in cartilage T2 values, provided that subjects are evaluated on the same scanner at each time point. These findings highlight the value of T2 mapping as a non-invasive biomarker to longitudinally assess changes in cartilage tissue composition in clinical trials, and, potentially, in future clinical practice.
Our findings are consistent with a previous single-center reproducibility study (9), using a 3 Tesla scanner, reporting RMS-CVs of 3.2% to 6.3% over a 2-month interval. A multicenter, single-vendor study by Li et al. (8) evaluated longitudinal reproducibility of cartilage T2 values of two traveling subjects acquired at two locations with similar types of MRI scanner and sequence parameters over a 10-month interval. In the latter study, an RMS-CV of 5.1% was reported, whereas ICCs were not described. Although using identical scanners and harmonized T2 mapping protocols would be optimal from an imaging perspective, mandating uniform MRI equipment is not always feasible when performing a multicenter trial. Differences in MRI hardware and T2 mapping sequences are often present across centers, and local requirements and restrictions (e.g., regarding acquisition time) in participating centers may prevail over optimal imaging strategies. Thus, assessing reproducibility in a multicenter multivendor setting is of key importance for future implementation of T2 mapping in OA research, such that differences in T2 values across centers can be taken into consideration. An overall assessment of reproducibility of cartilage T2 measurements was provided in a multicenter multivendor study by Mosher and colleagues (5). Longitudinal cartilage T2 measurements were evaluated by pooling 50 subjects, involving patients with OA and asymptomatic control subjects, from five centers using two different MRI vendors. A moderate to excellent reproducibility (ICC between 0.61 and 0.98) was reported over a 2-month interval, with RMS-CVs ranging from 5% to 9% in healthy volunteers. As none of the subjects in the latter study underwent MRI scanning on more than one scanner, the within-subject reproducibility across centers could not be assessed.
To our knowledge, the present work is the first study assessing the longitudinal reproducibility of cartilage T2 mapping in a multicenter multivendor setting, using traveling human subjects.
When evaluating longitudinal reproducibility of the five participating centers, longitudinal T2 measurements from center 1 and center 5 showed the lowest RMS-CVs and the lowest mean differences. A potential explanation for this finding could be the use of fast spin echo (FSE) pulse sequences in centers 1 and 5, whereas the remaining centers used spin echo (SE) sequences (14).
Many factors can potentially cause longitudinal variation in T2 measurements, apart from biological changes. These include environmental factors (e.g., MRI room temperature), upgrades in MRI hardware or software, changes in phantom composition, subject features (exercise, knee flexion), and diurnal variation in T2 measurements (8,9). In the present study, every effort was made to keep conditions constant: room temperatures were kept stable, and no hardware or software updates took place during the experiment. Great care was taken to minimize and standardize the physical activity level of the subjects prior to and during scanning days. Furthermore, centers were visited in the same order at baseline and follow-up, and in each center, measurements took place at the same time of day to address potential diurnal variation in T2 values.
We observed discrepancies in T2 values across centers, both in vivo and in the phantom. These findings are in line with previous studies on multicenter variation of cartilage T2 measurements (9). Several factors could potentially explain the inter-scanner differences in T2 values we found. First, scanners from three different MRI vendors were used in this study. A multivendor comparability study by Balamoody and colleagues (9) reported significant inter-scanner differences in cartilage T2 values of 12 healthy subjects across three centers with different MRI vendors (GE Healthcare, Siemens and Philips). As in our study, T2 values obtained with GE equipment were lower compared to Siemens and Philips T2 values. A relevant potential source of variation in T2 values between MRI vendors is the difference in the radiofrequency coils provided by each vendor (15,16), in particular the use of receive-only versus transmit-receive coils. Dardzinski et al. reported higher cartilage T2 values and lower RMS-CVs using a receive-only coil compared to a transmit-receive coil (15), similar to our findings. Second, magnetic field strength among centers varied in our study, potentially influencing T2 values (17,18). Finally, different T2 mapping techniques were used among centers. In center 1, a 3D FSE pulse sequence was used, whereas the remaining centers used 2D sequences. In a study by Matzat et al. (14), the influence of different T2 mapping sequence protocols in a single scanner was assessed. In the latter study, 2D FSE resulted in 28% (SD 19%) higher T2 values than 3D FSE. A possible explanation for this could be the stimulated echo effect in the second echo time and onwards. This might have led to artificially higher T2 values in centers 2, 3, 4 and 5, compared to the 3D sequence of center 1. Also, the application of fat saturation in T2 mapping sequences could have been a potential source of variation in T2 values across centers.
Center 2 and center 3 used a non-fat-suppressed sequence and generated relatively high T2 values. This is in line with a study by Ryu et al. (19), reporting that non-fat-suppressed T2 mapping results in higher T2 values and less reproducible T2 measurements compared to fat-suppressed T2 mapping. A systematic study investigating the causes of the observed differences in T2 values across centers, with the aim of providing protocols that result in comparable T2 values for different vendors and T2 mapping techniques, would be valuable, but this is beyond the scope of the current study. For now, we conclude that absolute T2 values across centers should not be assumed to be comparable and should therefore not be pooled. In multicenter clinical trials, researchers should focus on intra-subject T2 changes rather than absolute mean T2 values across subject groups.
The present study has limitations that must be noted. First, our sample size was small. We opted to perform T2 measurements at each of the five centers in one day, hence only a limited sample size was feasible. Consequently, this study was statistically underpowered to report ICCs for longitudinal reproducibility of each center individually. With a larger sample size it might have been possible to find reference T2 values of healthy cartilage for each scanner (brand and field strength), which was beyond the scope of the current study. Second, as our study was limited to healthy subjects, it is uncertain whether these findings are generalizable to OA subjects, and care should be taken when using this information in other contexts such as cartilage repair.
Conclusions
In this multicenter multivendor study, in vivo cartilage T2 mapping showed good to excellent longitudinal reproducibility. Our results suggest that T2 mapping can be used to longitudinally assess intra-subject changes in cartilage degeneration in multicenter studies, yet these findings must be interpreted with caution considering the size and nature (i.e., healthy subjects) of the study population. Given the variation in T2 values across centers, absolute T2 values obtained in various centers in multicenter multivendor clinical trials should not be pooled.
Acknowledgments
We gratefully thank Lisebette Burger and Annika Willems for volunteering during both scanning days. In addition, the authors would like to thank Chris Bijl (Department of Radiology, Máxima Medical Center, Eindhoven, The Netherlands), Scott Martin (Department of Radiology, Onze Lieve Vrouwe Gasthuis, Amsterdam, The Netherlands), and Stefan van der Linden (Department of Radiology, Sint Antonius Ziekenhuis, Utrecht, The Netherlands) for their assistance with MR scanning. EHGO and JAHT receive research support from GE Healthcare.
Funding: None.
Footnote
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at http://dx.doi.org/10.21037/qims-20-674). EHGO serves as an unpaid editorial board member of Quantitative Imaging in Medicine and Surgery. The authors have no other conflicts of interest to declare.
Ethical Statement: This study was approved by the Institutional Review Board of our institution (MEC 2014-096) and written consent of all subjects was obtained.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
1. Oei EH, van Tiel J, Robinson WH, Gold GE. Quantitative radiologic imaging techniques for articular cartilage composition: toward early diagnosis and development of disease-modifying therapeutics for osteoarthritis. Arthritis Care Res (Hoboken) 2014;66:1129-41.
2. Baum T, Joseph GB, Karampinos DC, Jungmann PM, Link TM, Bauer JS. Cartilage and meniscal T2 relaxation time as non-invasive biomarker for knee osteoarthritis and cartilage repair procedures. Osteoarthritis Cartilage 2013;21:1474-84.
3. Li X, Cheng J, Lin K, Saadat E, Bolbos RI, Jobke B, Ries MD, Horvai A, Link TM, Majumdar S. Quantitative MRI using T1rho and T2 in human osteoarthritic cartilage specimens: correlation with biochemical measurements and histology. Magn Reson Imaging 2011;29:324-34.
4. Kim T, Min BH, Yoon SH, Kim H, Park S, Lee HY, Kwack KS. An in vitro comparative study of T2 and T2* mappings of human articular cartilage at 3-Tesla MRI using histology as the standard of reference. Skeletal Radiol 2014;43:947-54.
5. Mosher TJ, Zhang Z, Reddy R, Boudhar S, Milestone BN, Morrison WB, Kwoh CK, Eckstein F, Witschey WR, Borthakur A. Knee articular cartilage damage in osteoarthritis: analysis of MR image biomarker reproducibility in ACRIN-PA 4001 multicenter trial. Radiology 2011;258:832-42.
6. Guermazi A, Roemer FW, Burstein D, Hayashi D. Why radiography should no longer be considered a surrogate outcome measure for longitudinal assessment of cartilage in knee osteoarthritis. Arthritis Res Ther 2011;13:247.
7. OsteoArthritis_Initiative. Webpage: Publications & Presentations. Available online: http://www.oai.ucsf.edu/
8. Li X, Pedoia V, Kumar D, Rivoire J, Wyatt C, Lansdown D, Amano K, Okazaki N, Savic D, Koff MF, Felmlee J, Williams SL, Majumdar S. Cartilage T1rho and T2 relaxation times: longitudinal reproducibility and variations using different coils, MR systems and sites. Osteoarthritis Cartilage 2015;23:2214-23.
9. Balamoody S, Williams TG, Wolstenholme C, Waterton JC, Bowes M, Hodgson R, Zhao S, Scott M, Taylor CJ, Hutchinson CE. Magnetic resonance transverse relaxation time T2 of knee cartilage in osteoarthritis at 3-T: a cross-sectional multicentre, multivendor reproducibility study. Skeletal Radiol 2013;42:511-20.
10. Bron EE, van Tiel J, Smit H, Poot DH, Niessen WJ, Krestin GP, Weinans H, Oei EH, Kotek G, Klein S. Image registration improves human knee cartilage T1 mapping with delayed gadolinium-enhanced MRI of cartilage (dGEMRIC). Eur Radiol 2013;23:246-52.
11. Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med 2016;15:155-63.
12. Glüer CC, Blake G, Lu Y, Blunt BA, Jergas M, Genant HK. Accurate assessment of precision errors: how to measure the reproducibility of bone densitometry techniques. Osteoporos Int 1995;5:262-70.
13. Kiebzak GM, Morgan SL, Peace F. Which to use to evaluate change in BMD at follow-up: RMS-SD or RMS-%CV? J Clin Densitom 2012;15:26-31.
14. Matzat SJ, McWalter EJ, Kogan F, Chen W, Gold GE. T2 Relaxation time quantitation differs between pulse sequences in articular cartilage. J Magn Reson Imaging 2015;42:105-13.
15. Dardzinski BJ, Schneider E. Radiofrequency (RF) coil impacts the value and reproducibility of cartilage spin-spin (T2) relaxation time measurements. Osteoarthritis Cartilage 2013;21:710-20.
16. Chang G, Wiggins GC, Xia D, Lattanzi R, Madelin G, Raya JG, Finnerty M, Fujita H, Recht MP, Regatte RR. Comparison of a 28-channel receive array coil and quadrature volume coil for morphologic imaging and T2 mapping of knee cartilage at 7T. J Magn Reson Imaging 2012;35:441-8.
17. Glaser C, Horng A, Mendlik T, Weckbach S, Hoffmann RT, Wagner S, Raya JG, Horger W, Reiser M. T2 relaxation time in patellar cartilage--global and regional reproducibility at 1.5 tesla and 3 tesla. Rofo 2007;179:146-52.
18. Welsch GH, Apprich S, Zbyn S, Mamisch TC, Mlynarik V, Scheffler K, Bieri O, Trattnig S. Biochemical (T2, T2* and magnetisation transfer ratio) MRI of knee cartilage: feasibility at ultra-high field (7T) compared with high field (3T) strength. Eur Radiol 2011;21:1136-43.
19. Ryu YJ, Hong SH, Kim H, Choi JY, Yoo HJ, Kang Y, Park SJ, Kang HS. Fat-suppressed T2 mapping of femoral cartilage in the porcine knee joint: A comparison with conventional T2 mapping. J Magn Reson Imaging 2017;45:1076-81.