An intelligent quantitative analysis software for videofluoroscopic swallowing study in patients with dysphagia
Original Article

Miao Wu1,2#, Fengmei Li3,4#, Chen Geng3, Surong Qian1,2, Yakang Dai3, Tong Wang5

1Department of Rehabilitation Medicine, the Affiliated Suzhou Hospital of Nanjing Medical University, Suzhou, China; 2Suzhou Municipal Hospital, Gusu School, Nanjing Medical University, Suzhou, China; 3Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, China; 4Division of Life Sciences and Medicine, School of Biomedical Engineering (Suzhou), University of Science and Technology of China, Hefei, China; 5Department of Rehabilitation Medicine, the First Affiliated Hospital of Nanjing Medical University, Nanjing, China

Contributions: (I) Conception and design: M Wu, F Li, C Geng; (II) Administrative support: Y Dai, T Wang; (III) Provision of study materials or patients: M Wu, S Qian, T Wang; (IV) Collection and assembly of data: M Wu, S Qian, F Li; (V) Data analysis and interpretation: F Li, C Geng; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Tong Wang, PhD. Department of Rehabilitation Medicine, the First Affiliated Hospital of Nanjing Medical University, No. 300, Guangzhou Road, Nanjing 210029, China. Email: wangtong60621@163.com; Yakang Dai, PhD; Chen Geng, PhD. Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, No. 88 Keling Road, Suzhou 215163, China. Email: daiyk@sibet.ac.cn; gengc@sibet.ac.cn.

Background: Videofluoroscopic swallowing study (VFSS) employs quantitative analysis methods, which are valued in dysphagia diagnosis for their objectivity and precision. Nonetheless, conventional methods are laborious and time-consuming. Despite advancements in automatic tracking methods, existing software still requires substantial manual intervention and offers a restricted set of quantitative metrics. This study aimed to develop an intelligent VFSS quantitative analysis tool that assists clinicians in dysphagia diagnosis by enabling automated tracking and providing a comprehensive set of three kinematic and seven temporal parameters.

Methods: This software utilizes a feature-based target tracking and contour extraction algorithm, which enables accurate and automated detection of hyoid bone displacement, upper esophageal sphincter (UES) opening amplitude and pharyngeal contraction ratio. This study analyzed 82 VFSS samples from Suzhou Municipal Hospital, comprising 40 from 18 dysphagia patients with varied etiologies and 42 from 14 healthy controls. Agreement between automated and manual tracking for the three kinematic parameters was evaluated using Pearson correlation coefficients and relative errors (%).

Results: In both the patient and control groups, automatic and manual measurements were strongly correlated across the three kinematic parameters (Pearson’s r ranging from 0.947 to 0.995, P<0.001). The relative errors of the two-dimensional measurements for hyoid displacement, UES opening amplitude, and pharyngeal contraction ratio were 5.30%±3.79%, 3.72%±1.93%, and 5.22%±3.25% in dysphagic patients and 4.62%±3.46%, 3.07%±2.02%, and 5.43%±3.69% in controls, respectively. Comparative analysis demonstrated significantly reduced values in dysphagia patients versus healthy controls across the three parameters, characterizing compromised swallowing biomechanics in the dysphagia population. While manual analysis typically requires about one hour per case, the proposed platform completes automatic quantitative analysis within 3–4 minutes per sample.

Conclusions: The developed software provides an efficient and user-friendly platform that streamlines dysphagia diagnosis through the automatic tracking and assessment of essential parameters, thereby enhancing the diagnostic accuracy and workflow efficiency.

Keywords: Dysphagia; quantitative analysis; videofluoroscopic swallowing study (VFSS); automatic tracking


Submitted Jan 17, 2025. Accepted for publication Oct 21, 2025. Published online Oct 21, 2025.

doi: 10.21037/qims-2025-134


Introduction

Dysphagia, a condition resulting from structural or functional impairments of the jaw, lips, tongue, and other associated organs, disrupts the normal transit of food from the oral cavity to the stomach. It is frequently observed in patients with esophageal diseases, stroke, Parkinson’s disease, and a variety of other conditions (1-3), with notable health disparities across sociodemographic groups. Epidemiological studies reveal significantly higher prevalence rates among older adults, with the incidence doubling in those over 80 compared to younger populations (4). These impairments significantly impact the quality of life and may lead to severe health complications, including pneumonia and malnutrition (5-7). In some cases, silent aspiration occurs without evident dysphagia symptoms, potentially delaying optimal treatment. Therefore, early assessment of dysphagia is crucial (8).

Currently, tools available for evaluating dysphagia include videofluoroscopic swallowing study (VFSS), fiberoptic endoscopic evaluation of swallowing (FEES), and oropharyngeal-esophageal scintigraphy (9). Among these, VFSS is regarded as the “gold standard” and the “ideal method” for dysphagia assessment (10). Analysis methods range from qualitative and semi-quantitative to fully quantitative approaches. Qualitative analysis, known for its simplicity and efficiency, is widely used in clinical practice. However, its accuracy is influenced by factors such as patient compliance and the experience level of the physicians (11). Quantitative analysis, involving frame-by-frame examination of fluoroscopic videos to measure temporal and kinematic parameters, allows for precise quantification. The application of digital quantitative methods has been shown to enhance the detection of swallowing impairments and improve inter-rater reliability compared to traditional qualitative assessments (12,13). It provides precise information that aids in identifying dysfunctions and monitoring disease progression. Despite its comprehensiveness and accuracy, manual quantitative assessment is time-consuming, often taking over an hour per case, and is susceptible to subjective biases.

Traditional manual quantitative analysis methods have several limitations, including susceptibility to human error and inefficiency. With the development of advanced imaging and deep learning technologies, many studies have been dedicated to the automated assessment of swallowing disorders, including the automatic identification of hyoid bone movement and upper esophageal sphincter (UES) opening. Hyoid bone movement is a key kinematic parameter for classifying dysphagia and evaluating treatment outcomes (14-21). Upward movement of the hyoid facilitates UES opening, enabling bolus passage into the esophagus, while improper movement may lead to aspiration (22). Previous studies have employed computer-assisted methods to quantify hyoid kinematics during swallowing. Perlman et al. (23) developed an early computer-based approach using calibrated video imaging to measure hyoid displacement, while Kellen et al. (24) subsequently advanced this methodology through automated tracking algorithms for more precise assessment of hyoid motion. Kim et al. (25) proposed an algorithm for tracking occluded targets and conducting automatic hyoid movement segmentation, and Lee et al. (26) developed software for diagnosing swallowing disorders. Deep learning advancements, including an SSD-based algorithm, a convolutional neural network (CNN)-based online learning method, and a U-Net model, have enabled automatic hyoid detection and tracking (18-20). In addition, UES opening duration is essential for normal swallowing; delayed opening or premature closure can cause pharyngeal residue (27,28). Khalifa et al. (29) used a convolutional recurrent neural network to estimate UES timings, while Bandini and Steele (30) confirmed that CNN-based methods accurately detect bolus passage past the mandible and UES closure frames, consistent with Lee et al. (31-33). Jeong et al. (34) utilized ResNet3D to measure temporal swallowing parameters.

However, commercially available automated dysphagia assessment platforms capable of integrating multi-parameter evaluation remain scarce; most clinicians still rely on manual quantitative assessments, which are extremely time-consuming and labor-intensive. In addition, most existing studies focus solely on tracking hyoid or UES parameters, providing relatively limited measurement indices that fail to comprehensively capture the complex physiological changes during the swallowing process. To address these challenges, an intelligent quantitative analysis software for VFSS in patients with dysphagia (IQSD) was designed for clinicians to objectively and rapidly assess swallowing function. This software helps clinicians automatically or semi-automatically measure seven temporal parameters and three kinematic parameters associated with the swallowing process. This study also conducted a manual-automatic comparison test using VFSS data from dysphagia patients and healthy controls to evaluate the software’s performance. The design and implementation of IQSD aimed to satisfy the following key requirements:

  • Objectivity: The software provides objective and reproducible quantitative assessments of swallowing by automatically or semi-automatically measuring three kinematic parameters and seven temporal parameters.
  • Comprehensiveness: The software provides the most comprehensive and precise quantitative metrics for VFSS analysis. These metrics are essential for establishing normative data and identifying impairments in individual patients.
  • Stability: The software ensures the reliability of analysis results, independent of the uniformity of training or the professional skill level of clinicians. This means that consistent and accurate outcomes can be achieved regardless of the user’s background or experience.
  • Efficiency: The system drastically reduces the analysis time per case to just 3–4 minutes, facilitating efficient data analysis and processing in clinical and research work.

We present this article in accordance with the TRIPOD+AI reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-134/rc).


Methods

Study population

This retrospective study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committee of Suzhou Municipal Hospital (No. KL901168), and the requirement of patient approval or written informed consent for reviewing medical records or images was waived. This study was retrospective and observational in nature, in accordance with relevant regulations and Ethics Committee guidance; clinical trial registration is not applicable to this study. In addition, there is no patient or public involvement during the design, conduct, reporting, interpretation or dissemination of the study. The data that support the findings of this study are available from the corresponding author upon reasonable request. The data are not publicly available due to privacy or ethical restrictions.

This study retrospectively analyzed 82 VFSS swallowing sequences collected at Suzhou Municipal Hospital between November 2021 and May 2024. A total of 116 swallowing sequences were initially considered, of which 34 were excluded based on the predefined criteria, resulting in the final inclusion of 40 swallowing sequences from 18 dysphagic patients and 42 sequences from 14 age-matched healthy controls. All participants met strict inclusion and exclusion criteria. Inclusion criteria required dysphagic patients to exhibit dysphagia symptoms and stable vital signs. Exclusion criteria were: (I) cases in which subjects experienced coughing or swallowing failure during the procedure; (II) cases of senile dementia in which subjects could not cooperate or retained food in the mouth without swallowing; (III) cases in which pharyngeal residue was present and food had to be swallowed in multiple portions due to swallowing difficulty; (IV) cases in which subjects moved vigorously or showed significant head tilting and nodding, preventing clear identification of the hyoid and cervical vertebrae; and (V) cases in which food prematurely entered the pharynx without triggering a swallowing reflex, leading to abnormal bolus flow.

In this study, the participants were evaluated using the VFSS technique. Under the supervision of experienced physicians, participants sequentially swallowed 5 and 10 mL of thick, paste-like, and thin liquids to assess swallowing ability across textures, followed by a 10 mL solid bolus (e.g., bread). All tests were conducted within a single session and were discontinued if participants exhibited severe difficulty, frequent choking, or inability to swallow. During image capture, the exposure range extended vertically from the nasal cavity roof to the C6 vertebra, and horizontally from the lips on the left side to the cervical vertebrae on the right side. This ensured clear visualization of the C1 to C6 vertebrae, hyoid bone, and cricopharyngeal muscle. Participants were positioned facing left to provide a clear view of the entire pharyngeal cavity. Additionally, to convert pixel units in video frames into millimeters, an 8-mm diameter circular scale was attached to the collar or the back of the head of each participant as a spatial calibration reference for quantitative analysis.

Software workflow

IQSD comprises the following six steps: video loading, scale recognition, key targets annotation, video frame correction, kinematic parameter measurement, and temporal parameter calculation. The interface design of IQSD system is shown in Figure 1, and the overall workflow of the software is presented in Figure 2.

Figure 1 The interface of IQSD system. IQSD, an intelligent quantitative analysis software for VFSS in patients with dysphagia; UES, upper esophageal sphincter; VFSS, videofluoroscopic swallowing studies.
Figure 2 The workflow in IQSD. IQSD, an intelligent quantitative analysis software for VFSS in patients with dysphagia; UES, upper esophageal sphincter; VFSS, videofluoroscopic swallowing studies.

Step 1: video loading

The system initiates the analysis process by importing the patient’s video data, demonstrating exceptional versatility through its comprehensive support for multiple video formats, including MP4 and AVI, thereby ensuring seamless integration with various data processing workflows. Beyond conventional playback capabilities, the software incorporates a frame-by-frame playback function, enabling meticulous temporal control and precise examination of critical time points throughout the video sequence.

Step 2: scale recognition

The software implements an automated scale recognition system utilizing the Hough Circle algorithm for initial detection. The Hough Circle Transform is a computer vision technique used to detect circular shapes in images. It works by analyzing edge pixels and mathematically identifying patterns that form circles (35). To ensure robust tracking across video sequences, the system integrates multiple advanced tracking methodologies which collectively enable precise and consistent scale measurement throughout all frames. The system calculates the scale dimension by computing the median value of detected diameters across the entire frame sequence, thereby enhancing measurement reliability by mitigating outlier effects. Recognizing potential limitations in automated detection due to video quality constraints, the software incorporates a complementary manual scale annotation function, providing users with flexible measurement options tailored to specific experimental conditions and data quality requirements.
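The median-based scale estimate described above can be sketched as follows. This is a minimal illustration rather than the IQSD implementation: the function name `mm_per_pixel` and the sample diameters are hypothetical, and the per-frame circle diameters are assumed to have been detected already (for example, by a Hough circle transform). Only the 8-mm marker diameter comes from the study protocol.

```python
from statistics import median

SCALE_DIAMETER_MM = 8.0  # physical diameter of the circular marker (from the protocol)


def mm_per_pixel(diameters_px, marker_mm=SCALE_DIAMETER_MM):
    """Return the mm-per-pixel factor from per-frame marker diameters (pixels).

    Frames without a detection are passed as None and ignored; the median
    keeps a few spurious detections from skewing the scale.
    """
    valid = [d for d in diameters_px if d is not None and d > 0]
    if not valid:
        # Hypothetical fallback: the text says manual annotation is available.
        raise ValueError("no valid marker detections; annotate the scale manually")
    return marker_mm / median(valid)


# Hypothetical detections: one missed frame (None) and one outlier (95.0 px)
detections = [40.0, 41.0, None, 40.0, 95.0, 39.0]
factor = mm_per_pixel(detections)
```

Because the median is used instead of the mean, the single 95-pixel outlier in this example does not change the resulting conversion factor.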

Step 3: key targets annotation

Four key targets need to be annotated by users at the first clear video frame: the antero-inferior corners of cervical vertebrae C2, C4, and C6, and the antero-superior corner of the hyoid bone. The video frame correction and corrected coordinate system construction are based on C2 and C4, with C4 serving as the origin of the coordinate system. The hyoid bone is used for measurement of hyoid bone displacement, while C6 is used to locate the position of UES. The software supports frame-by-frame tracking of the four key points throughout the entire process. The coordinates of these key points in each frame are then obtained, providing essential data for subsequent kinematic parameter calculations.

Step 4: video frame correction

Upon completion of the aforementioned steps, the software obtains the coordinates of four manually annotated key points in the initial frame, as well as the tracking coordinates of these points in all subsequent video frames through target tracking techniques. For both the initial frame and all subsequent frames, the software performs frame-by-frame image correction based on the coordinates of C2 and C4. The correction process involves the following steps: first, calculating the geometric center point of the video frame, which serves as the rotation pivot. Then, determining the angle between the line connecting the antero-inferior corners of C4 and C2 and the positive X-axis direction. This angle is used as the rotation parameter to generate a rotation matrix. Finally, applying the rotation matrix to perform image correction results in aligned images where the C2C4 vertebral line is vertically oriented.
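The rotation step can be illustrated with a short geometric sketch. It is not the software's code: the helper names are hypothetical, landmarks are rotated instead of whole images to keep the example dependency-free, and the Y coordinate is assumed to increase upward (in raw image coordinates, where Y increases downward, the sign of the angle would flip).

```python
import math


def correction_angle(c2, c4):
    """Angle (radians) that rotates the C4->C2 line onto the +Y axis."""
    theta = math.atan2(c2[1] - c4[1], c2[0] - c4[0])  # angle to the +X axis
    return math.pi / 2 - theta


def rotate_point(p, pivot, angle):
    """Rotate point p counter-clockwise about pivot by angle (radians)."""
    dx, dy = p[0] - pivot[0], p[1] - pivot[1]
    c, s = math.cos(angle), math.sin(angle)
    return (pivot[0] + c * dx - s * dy, pivot[1] + s * dx + c * dy)


# Hypothetical landmarks in a 640x480 frame; the pivot is the frame center
pivot = (320.0, 240.0)
c2, c4 = (330.0, 300.0), (300.0, 200.0)
angle = correction_angle(c2, c4)
c2_r = rotate_point(c2, pivot, angle)
c4_r = rotate_point(c4, pivot, angle)
```

After the rotation, the two vertebral corners share the same X coordinate, which is the alignment condition stated in the text.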

Step 5: kinematic parameter measurement

The software provides automated measurement of three essential kinematic parameters: hyoid displacement, UES opening amplitude, and pharyngeal constriction ratio. For hyoid displacement quantification, users initiate the process by selecting the hyoid position in the initial video frame, after which the software autonomously tracks the hyoid movement throughout the entire sequence without requiring further user intervention.

The UES opening amplitude is measured through an automated process: the software first identifies the UES region boundaries, then dynamically tracks and extracts the contours of the alimentary bolus during its passage through the UES, continuously measuring the luminal width in real time. The UES width is determined as the narrowest luminal dimension observed when the bolus traverses the region between the C4 and C6 vertebrae.

The pharyngeal constriction ratio is the ratio of the minimum area of the pharynx to the maximum area of the pharynx. First, the user identifies the video frame in which the pharyngeal area is at its maximum and uses the brush function in the software to outline this area. The software automatically calculates the area within the closed curve. Then, the user finds the frame in which the pharyngeal area is at its minimum (where the pharynx is completely contracted and the pharyngeal area equals the area of the high-density food bolus) and draws a bounding box around the high-density food bolus. The software subsequently performs image preprocessing and contour recognition on the selected region, automatically identifying and computing the area of the high-density bolus. Finally, the pharyngeal constriction ratio is calculated as the ratio of the minimum to the maximum pharyngeal area. Figure 3 illustrates the pharyngeal area measured by the software.

Figure 3 The software automatically calculates the pharyngeal constriction ratio through outlining the maximum area of the pharynx (A) and box selecting the minimum area of the pharynx (B).

Step 6: temporal parameter calculation

Of the seven temporal parameters, the software automatically computes the UES opening duration. The remaining six parameters require manual input: users specify the start and end time points by adjusting the video progress bar or using the frame-by-frame playback feature, and the software then calculates the time differences. The calculation formulas are as follows:

  • Oral transit time ($T_{OT}$): oral transit time is defined as the time difference between the end of the oral phase ($T_{OE}$) and the start of the oral phase ($T_{OS}$). The formula is $T_{OT} = T_{OE} - T_{OS}$.
  • Soft palate elevation time ($T_{SPE}$): soft palate elevation time is determined by subtracting the time when the soft palate starts to elevate ($T_{SPSE}$) from the time when the soft palate returns to its original position ($T_{SPRP}$). The formula is $T_{SPE} = T_{SPRP} - T_{SPSE}$.
  • Laryngeal closing duration ($T_{LC}$): laryngeal closing duration is calculated as the time difference between when the larynx re-opens ($T_{LO}$) and when it closes ($T_{LCs}$). The formula is $T_{LC} = T_{LO} - T_{LCs}$.
  • Pharyngeal transit time ($T_{PT}$): pharyngeal transit time is obtained by subtracting the time at the end of the oral phase ($T_{OE}$) from the time when the UES closes ($T_{UESC}$). The formula is $T_{PT} = T_{UESC} - T_{OE}$.
  • Swallowing onset time ($T_{SO}$): swallowing onset time is calculated as the time difference between when the hyoid bone starts to move ($T_{HSM}$) and the end of the oral phase ($T_{OE}$). The formula is $T_{SO} = T_{HSM} - T_{OE}$.
  • Hyoid movement duration ($T_{HM}$): hyoid movement duration is the time difference between when the hyoid bone returns to its original position ($T_{HR}$) and when it starts to elevate ($T_{HS}$). The formula is $T_{HM} = T_{HR} - T_{HS}$.
  • UES opening duration ($T_{UESO}$): UES opening duration is determined by subtracting the time when the UES starts to open ($T_{UESOS}$) from the time when the UES closes ($T_{UESC}$). This parameter is also automatically calculated by the software, and the formula is $T_{UESO} = T_{UESC} - T_{UESOS}$.
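The seven definitions above reduce to differences between pairs of event times, as the following sketch shows. The function names, the dictionary keys (which mirror the symbols in the list), the frame numbers, and the 30 frames-per-second rate are all hypothetical; the source does not state the video frame rate.

```python
def frames_to_seconds(event_frames, fps):
    """Convert event frame indices to times in seconds."""
    return {name: frame / fps for name, frame in event_frames.items()}


def temporal_parameters(t):
    """Compute the seven temporal parameters (seconds) from event times."""
    return {
        "TOT": t["TOE"] - t["TOS"],        # oral transit time
        "TSPE": t["TSPRP"] - t["TSPSE"],   # soft palate elevation time
        "TLC": t["TLO"] - t["TLCs"],       # laryngeal closing duration
        "TPT": t["TUESC"] - t["TOE"],      # pharyngeal transit time
        "TSO": t["THSM"] - t["TOE"],       # swallowing onset time
        "THM": t["THR"] - t["THS"],        # hyoid movement duration
        "TUESO": t["TUESC"] - t["TUESOS"], # UES opening duration
    }


# Hypothetical event frames marked by a user on a 30 fps recording
events = {
    "TOS": 0, "TOE": 15, "TSPSE": 12, "TSPRP": 42, "TLCs": 18, "TLO": 39,
    "THSM": 18, "THS": 18, "THR": 48, "TUESOS": 21, "TUESC": 36,
}
params = temporal_parameters(frames_to_seconds(events, fps=30.0))
```

Working from frame indices keeps the user-facing input simple (a frame picked in the frame-by-frame viewer), with the conversion to seconds applied once.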

Through systematic implementation of these six procedural steps, the software automatically generates a comprehensive analysis report, delivering precise quantitative data to support clinicians in evaluating swallowing function and facilitating evidence-based clinical decision-making.

Calculation methods

This software utilizes advanced technologies such as object detection, edge detection, and image processing to achieve automated measurement and analysis of three key kinematic parameters and seven temporal parameters during the swallowing process, and automatically generates detailed analysis reports. Below are the calculation methods for several important parameters of the software:

Hyoid displacement

The manual quantitative analysis of hyoid displacement was performed using the ImageJ program, consistent with the established method of Kim and McCullough (36). Two experienced clinicians, with 6 and 8 years of diagnostic experience, respectively, independently conducted and reviewed all manual measurements. Both were blinded to the clinical information and software-generated data pertaining to the samples. The clinicians manually identified and captured two key frames: one at the onset of hyoid elevation and another at the point of maximum hyoid displacement. Both frames were then calibrated in ImageJ such that the line connecting the antero-inferior corners of C2 and C4 was oriented vertically. Then, the coordinates of the antero-superior corner of the hyoid bone, (x1, y1) and (x2, y2), as well as those of the antero-inferior corner of C4, (C4x1, C4y1) and (C4x2, C4y2), were measured in the two calibrated images, using the lower-left corner of each image as the coordinate origin (see Figure 4). To measure hyoid motion relative to vertebral movement and eliminate the errors introduced by patient head or body motion, the maximum hyoid bone displacement was calculated using Eqs. [1-3] based on the aligned images.

Maximum anterior displacement: $H_A = (x_2 - x_1) - (C4_{x2} - C4_{x1})$ [1]

Maximum vertical displacement: $H_V = (y_2 - y_1) - (C4_{y2} - C4_{y1})$ [2]

Maximum hyoid displacement: $H_{max} = \sqrt{H_A^2 + H_V^2}$ [3]

Figure 4 The resting frame (A) and the maximum displacement frame (B) of hyoid bone. Both frames were calibrated such that the line connecting the antero-inferior corners of the C2 and C4 was oriented vertically. Origin: bottom-left corner of the image; X-axis: the horizontal line extending rightward from the origin; Y-axis: the vertical line extending upward from the origin.

Building upon the working principles of the described manual quantitative analysis, the automatic quantitative analysis workflow operates as follows: the software first lets the user identify the key anatomical targets (C2 and C4 vertebrae, hyoid bone) within the first clear video frame. The software then employs object-tracking algorithms, including Channel and Spatial Reliability Discriminative Correlation Filters (CSR-DCF) (37), MedianFlowTracker (38), and the Kalman filter (39). These algorithms learn the visual appearance of the moving targets and predict their future positions based on historical motion patterns, enabling continuous tracking throughout the video sequence.
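To make the idea of predicting a position from historical motion concrete, the sketch below implements a constant-velocity Kalman filter for a single image coordinate. It illustrates only the prediction principle: the class name, noise settings, and test trajectory are hypothetical, and the source does not describe how IQSD configures the filter or combines it with CSR-DCF and MedianFlow.

```python
class Kalman1D:
    """Constant-velocity Kalman filter for one image coordinate (pixels)."""

    def __init__(self, pos, q=0.01, r=1.0, p0=100.0):
        self.pos, self.vel = float(pos), 0.0
        # Entries of the 2x2 state covariance P = [[p00, p01], [p10, p11]]
        self.p00, self.p01, self.p10, self.p11 = p0, 0.0, 0.0, p0
        self.q, self.r = q, r  # process and measurement noise (assumed values)

    def predict(self, dt=1.0):
        """Project the state one step ahead and return the predicted position."""
        self.pos += self.vel * dt
        p00 = self.p00 + dt * (self.p10 + self.p01) + dt * dt * self.p11 + self.q
        p01 = self.p01 + dt * self.p11
        p10 = self.p10 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11
        return self.pos

    def update(self, z):
        """Correct the prediction with a measured position z."""
        s = self.p00 + self.r                  # innovation variance
        k0, k1 = self.p00 / s, self.p10 / s    # Kalman gain
        y = z - self.pos                       # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        p00, p01 = (1 - k0) * self.p00, (1 - k0) * self.p01
        p10, p11 = self.p10 - k1 * self.p00, self.p11 - k1 * self.p01
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11


# Hypothetical landmark coordinate moving 2 px per frame
kf = Kalman1D(pos=0.0)
for k in range(1, 30):
    kf.predict()
    kf.update(2.0 * k)
next_pos = kf.predict()  # position expected in the following frame
```

In a tracker, a prediction of this kind can bridge frames where the appearance-based detection is briefly unreliable, for example when the hyoid overlaps the mandible.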

Throughout the tracking process, the software performs automated real-time calibration of the video frames using the updated C2 and C4 positions and computes hyoid displacement relative to vertebral movement, transforming the original coordinates into a patient-specific anatomical coordinate system. This system defines the origin as the antero-inferior corner of C4, the Y-axis as the line connecting the antero-inferior corners of C2 and C4, and the X-axis as the line perpendicular to the Y-axis passing through the origin, consistent with previously established methods (25,40). Upon completion of tracking, a displacement-time curve is generated to visualize the motion trajectory. Smoothing is applied to the displacement curve to reduce noise and improve accuracy and stability. The anterior displacement, the vertical displacement, and the resultant hyoid displacement (calculated as the square root of the sum of the squared anterior and vertical displacements) are computed for each frame. The maximum resultant displacement across all frames is taken as the maximum hyoid displacement. Finally, all displacement measurements are converted from pixels to millimeters using the scale-based conversion factor.
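The coordinate transformation and maximum-displacement computation can be sketched as follows. This is an illustrative reading of the paragraph, not the software's code: the function names and sample coordinates are hypothetical, the first frame is assumed to be the resting frame, and a simple moving average stands in for the unspecified smoothing technique.

```python
import math


def to_anatomical(hyoid, c2, c4):
    """Express the hyoid in the C4-origin frame whose Y-axis runs from C4 to C2."""
    ux, uy = c2[0] - c4[0], c2[1] - c4[1]
    norm = math.hypot(ux, uy)
    ux, uy = ux / norm, uy / norm            # unit vector of the Y-axis
    dx, dy = hyoid[0] - c4[0], hyoid[1] - c4[1]
    y = dx * ux + dy * uy                    # component along the C4->C2 line
    x = dx * uy - dy * ux                    # component along the perpendicular
    return x, y


def moving_average(values, window=3):
    """Centered moving average; the window is truncated at the ends."""
    half = window // 2
    return [
        sum(values[max(0, i - half): i + half + 1])
        / len(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def max_hyoid_displacement(frames, mm_per_px, window=3):
    """frames: list of (hyoid, c2, c4) pixel coordinates, resting frame first."""
    coords = [to_anatomical(h, c2, c4) for h, c2, c4 in frames]
    x0, y0 = coords[0]
    disp = [math.hypot(x - x0, y - y0) for x, y in coords]
    return max(moving_average(disp, window)) * mm_per_px


# Hypothetical tracked landmarks; in the last frame the whole head shifts by (7, -3)
frames = [
    ((50.0, 120.0), (100.0, 200.0), (100.0, 100.0)),
    ((47.0, 124.0), (100.0, 200.0), (100.0, 100.0)),
    ((51.0, 125.0), (107.0, 197.0), (107.0, 97.0)),
]
h_max_mm = max_hyoid_displacement(frames, mm_per_px=0.2, window=1)
```

Because every frame is expressed relative to C4 and the C2-C4 line, the rigid head shift in the last frame does not contribute to the measured hyoid displacement.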

UES opening amplitude

The UES opening amplitude refers to the width of the narrowest part of the transition area from C4 to C6 when the bolus passes through and expands the UES region to its maximum extent (41). When the bolus flows through and fills the UES region, the bolus contour is automatically identified by the software, which then obtains the upper, lower, left, and right boundaries of the bolus region. The bolus width within the vertical height range of C4 to C6 is calculated, with the minimum width taken as the UES opening amplitude.
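A minimal sketch of the width measurement on a single frame is given below. It assumes that the bolus has already been segmented into a binary mask (1 = bolus pixel) and that the image rows of C4 and C6 are known; the function name, the tiny mask, and the decision to skip rows without bolus pixels are hypothetical choices, not details taken from the software.

```python
def ues_opening_amplitude(mask, row_c4, row_c6, mm_per_px):
    """Narrowest bolus width (mm) between the C4 and C6 rows of one frame.

    mask is a list of rows of 0/1 values; image rows increase downward,
    so row_c4 is smaller than row_c6.
    """
    widths = []
    for row in mask[row_c4:row_c6 + 1]:
        cols = [i for i, v in enumerate(row) if v]
        if cols:  # rows without bolus pixels are skipped (assumption)
            widths.append(cols[-1] - cols[0] + 1)
    if not widths:
        return None  # no bolus inside the UES region in this frame
    return min(widths) * mm_per_px


# Hypothetical 5x6 bolus mask with its narrowest point in the third row
mask = [
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
amplitude_mm = ues_opening_amplitude(mask, row_c4=0, row_c6=3, mm_per_px=0.5)
```

In practice the function would be applied to the frame in which the bolus fills and maximally expands the UES region, as described above.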

Pharyngeal contraction ratio

In the frame where the pharynx contracts to its minimum area, the user draws a bounding box to select the minimum area of the pharynx (the area of the high-density food bolus). The software then crops the selected area, removes image noise with low-pass filtering, and converts the RGB image to grayscale. The contour of the high-density food bolus is outlined, and the pixel area within the contour is calculated. This area is then converted to square millimeters using the scale reference. After the minimum and maximum areas of the pharyngeal cavity are obtained, the pharyngeal contraction ratio is determined by dividing the minimum area by the maximum area.
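The area and ratio computation can be sketched as follows, assuming both contours are available as lists of pixel vertices. The shoelace formula is used here as one standard way to obtain the area inside a closed contour; the function names and the rectangular contours are hypothetical, and the software's own contour-area routine is not specified in the source.

```python
def polygon_area(points):
    """Shoelace formula: area enclosed by a closed contour of (x, y) vertices."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the contour
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def pharyngeal_constriction_ratio(min_contour, max_contour, mm_per_px):
    """Ratio of minimum to maximum pharyngeal area (areas in square mm)."""
    area_min = polygon_area(min_contour) * mm_per_px ** 2
    area_max = polygon_area(max_contour) * mm_per_px ** 2
    return area_min / area_max


# Hypothetical contours: a 40x100 px maximum area and a 20x50 px bolus area
max_contour = [(0, 0), (40, 0), (40, 100), (0, 100)]
min_contour = [(10, 20), (30, 20), (30, 70), (10, 70)]
ratio = pharyngeal_constriction_ratio(min_contour, max_contour, mm_per_px=0.2)
```

Since both areas are scaled by the same squared conversion factor, the ratio itself is independent of the pixel-to-millimeter calibration; the conversion matters only when the individual areas are reported.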

UES opening duration

The UES is typically located within the vertical range between the C4 and C6 cervical vertebrae. The UES opening start time is determined as the time when the vertical coordinate of the leading edge of the alimentary bolus aligns with the upper end of C4. Conversely, the UES closing time is identified as the time when the trailing edge of the bolus coincides with the lower end of C6. The UES opening duration is then calculated as the time interval between the initial opening and the complete closure of the UES.
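The timing rule above can be sketched as follows. The function name, the per-frame edge positions, and the frame rate are hypothetical; the sketch assumes image rows increase downward, so the bolus edges reach C4 and C6 when their row values become at least as large as the landmark rows.

```python
def ues_opening_duration(leading_rows, trailing_rows, row_c4, row_c6, fps):
    """UES opening duration (s) from per-frame bolus edge rows.

    leading_rows / trailing_rows hold the image row of the bolus leading and
    trailing edge in each frame (None when no bolus is detected).
    """
    # Opening: first frame in which the leading edge reaches the upper end of C4
    open_f = next(
        (i for i, r in enumerate(leading_rows) if r is not None and r >= row_c4),
        None,
    )
    if open_f is None:
        return None
    # Closing: first later frame in which the trailing edge passes the lower end of C6
    close_f = next(
        (i for i in range(open_f, len(trailing_rows))
         if trailing_rows[i] is not None and trailing_rows[i] >= row_c6),
        None,
    )
    if close_f is None:
        return None
    return (close_f - open_f) / fps


# Hypothetical edge positions (pixels) over seven frames at 30 fps
leading = [50, 80, 110, 150, 200, 240, 260]
trailing = [10, 30, 60, 90, 130, 180, 230]
duration_s = ues_opening_duration(leading, trailing, row_c4=100, row_c6=220, fps=30.0)
```

Here the leading edge reaches C4 in the third frame and the trailing edge clears C6 in the seventh, giving a duration of four frame intervals.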

Evaluation of the implemented software

An automated tracking method was employed to measure the three key kinematic parameters: hyoid displacement, UES opening amplitude, and pharyngeal constriction ratio. These results were compared with those manually tracked by clinicians, which served as the ground truth. The linear correlation between the automated and manual measurements along the X-axis, along the Y-axis, and in two dimensions was assessed using Pearson correlation coefficients. Additionally, the performance of the automatic tracking method was assessed using the relative error. The calculation formula for the relative error is as follows:

$$\text{Relative error} = \frac{\left|\text{Measurement value in automatic tracking} - \text{Measurement value in manual tracking}\right|}{\text{Measurement value in manual tracking}} \times 100\%$$
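Both agreement measures are straightforward to compute, as the sketch below shows. The function names and the paired measurements are hypothetical illustrations, not study data, and the manual value is assumed to be positive, as it is for displacements, widths, and area ratios.

```python
import math


def relative_error(auto, manual):
    """Relative error (%) of an automatic value against the manual reference."""
    return abs(auto - manual) / manual * 100.0


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements (mm) for four swallows
auto_mm = [11.5, 9.8, 14.2, 12.9]
manual_mm = [12.0, 10.0, 14.0, 13.5]
errors = [relative_error(a, m) for a, m in zip(auto_mm, manual_mm)]
mean_error = sum(errors) / len(errors)
r = pearson_r(auto_mm, manual_mm)
```

The two measures are complementary: Pearson's r captures whether the methods rank and scale the swallows alike, while the relative error captures how far individual automatic values deviate from the manual reference.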

In addition, hyoid bone displacement and UES opening amplitude were plotted over time using the quantitative VFSS analysis method. This study also directly compared the hyoid movement trajectories obtained from manual annotation and automated tracking to assess their spatial-temporal correspondence. This validation framework, incorporating multiple kinematic metrics and expert-verified ground truth data, supports the reliability of the automated tracking system for clinical applications.

Statistical analysis

Statistical analyses were performed using SPSS software (version 23.0). To compare clinical characteristics between patients with dysphagia and healthy controls, continuous variables were analyzed using either Student’s t-test or the Mann-Whitney U test, depending on data distribution. Normally distributed data were analyzed with the t-test and are reported as mean ± standard deviation; non-normally distributed data were analyzed with the Mann-Whitney U test and are expressed as median (interquartile range). Categorical variables were compared using the chi-squared test. A two-sided P value <0.05 was considered statistically significant for all analyses, unless otherwise specified.


Results

To evaluate the performance of our quantitative analysis software, this study analyzed 82 videofluoroscopic swallowing samples, comprising 40 samples from 18 dysphagic patients (15 males, 3 females; mean age 67.67±15.39 years), and 42 samples from 14 healthy controls (11 males, 3 females; mean age 60.57±10.05 years). Etiology among dysphagic patients included ischemic stroke (61.11%), hemorrhagic stroke (33.33%), and cerebral tumor (5.56%). Functional Oral Intake Scale (FOIS) scores in the patient group varied widely: level 1 (11.11%), level 2 (22.22%), level 3 (22.22%), level 4 (5.56%), level 5 (22.22%), level 6 (5.56%), and level 7 (11.11%). In contrast, the majority of healthy controls achieved FOIS level 7 (64.29%), with the remaining at level 6 (35.71%). Statistical comparisons revealed significant differences between groups in primary diagnosis (P<0.001) and FOIS scores (P=0.001), while no significant differences were observed in age (P=0.146) or gender distribution (P=0.732) (shown in Table 1).

Table 1

Clinical characteristics of patients with dysphagia compared with healthy controls

Variables Patient (n=18) Healthy control (n=14) P value
Age, years 67.67±15.39 60.57±10.05 0.146
Male 15 (83.33) 11 (78.57) 0.732
Dysphagia etiology <0.001*
   Healthy 0 (0.00) 14 (100.0)
   Ischemic stroke 11 (61.11) 0 (0.00)
   Hemorrhagic stroke 6 (33.33) 0 (0.00)
   Cerebral tumor 1 (5.56) 0 (0.00)
FOIS 0.001*
   Level 1 2 (11.11) 0 (0.00)
   Level 2 4 (22.22) 0 (0.00)
   Level 3 4 (22.22) 0 (0.00)
   Level 4 1 (5.56) 0 (0.00)
   Level 5 4 (22.22) 0 (0.00)
   Level 6 1 (5.56) 5 (35.71)
   Level 7 2 (11.11) 9 (64.29)

*, P<0.05. Data are shown as mean ± standard deviation and n (%). FOIS, Functional Oral Intake Scale.

Our quantitative analysis software automatically measured three kinematic parameters, including hyoid displacement, UES opening amplitude, and pharyngeal constriction ratio. The results obtained from the automatic tracking method were compared with those from the manual tracking method. The Pearson correlation coefficients and relative errors were used to evaluate the degree of agreement between the automatic and manual tracking method for measuring three parameters, with the results presented in Table 2.

Table 2

Pearson correlation coefficients and relative errors (%) between automatic and manual tracking methods for three kinematic parameters

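| Measurement | Patient group (n=40): hyoid displacement, mm | Patient group (n=40): UES opening amplitude, mm | Patient group (n=40): pharyngeal constriction ratio | Healthy group (n=42): hyoid displacement, mm | Healthy group (n=42): UES opening amplitude, mm | Healthy group (n=42): pharyngeal constriction ratio |
| --- | --- | --- | --- | --- | --- | --- |
| X-axis, automatic (mean ± SD) | 7.94±2.69 | - | - | 9.10±2.69 | - | - |
| X-axis, manual (mean ± SD) | 8.19±2.71 | - | - | 9.25±2.74 | - | - |
| X-axis, relative errors (%) | 6.34±4.18 | - | - | 6.98±4.91 | - | - |
| X-axis, Pearson r | 0.974 | - | - | 0.959 | - | - |
| Y-axis, automatic (mean ± SD) | 7.90±3.90 | - | - | 9.09±6.31 | - | - |
| Y-axis, manual (mean ± SD) | 8.04±3.86 | - | - | 9.39±6.52 | - | - |
| Y-axis, relative errors (%) | 7.08±3.54 | - | - | 6.73±4.72 | - | - |
| Y-axis, Pearson r | 0.982 | - | - | 0.996 | - | - |
| 2D, automatic (mean ± SD) | 11.67±3.39 | 5.27±1.53 | 0.47±0.18 | 13.34±5.85 | 5.83±1.41 | 0.68±0.15 |
| 2D, manual (mean ± SD) | 11.97±3.20 | 5.27±1.49 | 0.49±0.18 | 13.72±5.94 | 5.90±1.36 | 0.68±0.14 |
| 2D, relative errors (%) | 5.30±3.79 | 3.72±1.93 | 5.22±3.25 | 4.62±3.46 | 3.07±2.02 | 5.43±3.69 |
| 2D, Pearson r | 0.972 | 0.990 | 0.988 | 0.995 | 0.973 | 0.947 |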

P values for all Pearson correlation coefficients were less than 0.001. SD, standard deviation; UES, upper esophageal sphincter.
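The two agreement metrics reported in Table 2 can be illustrated with a minimal sketch. It assumes that the relative error of each sample is the absolute difference between the automatic and manual values divided by the manual value, and that the "±" term is the sample standard deviation; neither definition is stated in this section, so both are assumptions. The function names and the sample values are hypothetical and are not study data.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def relative_errors(auto, manual):
    """Per-sample relative error (%), taking the manual value as reference.

    Assumed definition: |automatic - manual| / |manual| * 100.
    """
    return [abs(a - m) / abs(m) * 100.0 for a, m in zip(auto, manual)]


# Hypothetical 2D hyoid displacements (mm) for five swallows; not study data.
manual = [10.0, 12.5, 8.0, 15.0, 11.0]
auto = [9.6, 12.9, 7.7, 14.4, 11.3]

errs = relative_errors(auto, manual)
print(f"Relative error: {mean(errs):.2f}% ± {stdev(errs):.2f}%")
print(f"Pearson r: {pearson_r(auto, manual):.3f}")
```

The two metrics are complementary: a high Pearson r shows that the two methods rank and scale the samples alike, but it would not expose a constant offset, which the relative error does capture.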

The results for both dysphagia patients and healthy controls show strong correlations between the automatic and manual methods for all three parameters (Pearson’s r for the 2D measurements ranging from 0.947 to 0.995, P<0.001). In dysphagia patients, the relative error was 5.30%±3.79% for the two-dimensional range of hyoid displacement, 3.72%±1.93% for UES opening amplitude, and 5.22%±3.25% for pharyngeal constriction ratio. In healthy controls, the corresponding relative errors were 4.62%±3.46%, 3.07%±2.02%, and 5.43%±3.69%, further supporting the consistency of the measurements between the two methods. While both groups performed well, the healthy group exhibited marginally lower errors for hyoid displacement and UES opening amplitude, likely because the greater amplitude of hyoid bone displacement and UES opening in the healthy group is more easily captured by the algorithms.

In addition, our comparative analysis showed lower mean values in dysphagia patients than in healthy controls for all three parameters: hyoid displacement (11.67±3.39 vs. 13.34±5.85 mm), UES opening amplitude (5.27±1.53 vs. 5.83±1.41 mm), and pharyngeal constriction ratio (0.47±0.18 vs. 0.68±0.15). These reductions in patients (12.5%, 9.6%, and 30.9%, respectively, relative to the healthy group means) reflect clinically meaningful impairments in hyoid bone elevation, sphincter compliance, and pharyngeal constriction efficiency. Together, they quantify and characterize the specific manifestations and severity of impaired swallowing function in the dysphagia population.

In addition, this study compared the manually computed trajectory with the automatically measured trajectory for VFSS data. The comparison between the two methods for one example is presented in Figure 5. The movement trajectory of the hyoid bone can be divided mainly into an elevation period and a descent period. During the elevation period, the hyoid bone begins an upward and forward movement, ultimately attaining the maximum Euclidean norm of anterior and vertical displacement. Subsequently, in the descent period, the hyoid bone gradually returns to its original position. Notably, the two trajectories matched well, as evidenced by high Pearson correlation coefficients and low relative errors. This result indicates strong consistency between the automated VFSS analysis and the manual tracking method.

Figure 5 Comparison of hyoid bone trajectories between automatic tracking and manual tracking for one example.
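The trajectory description above (an elevation period ending at the maximum Euclidean norm of the anterior and vertical displacement, followed by a descent period) can be sketched as follows. This is an illustrative sketch only, not the software's implementation: it assumes the first frame is the resting position, that the per-axis values are the maximum absolute displacements along each axis, and that a known pixel-to-millimetre scale is available. The function name, the `mm_per_pixel` parameter, and the example coordinates are hypothetical.

```python
import math


def hyoid_displacement(trajectory, mm_per_pixel=1.0):
    """Summarize a tracked hyoid trajectory given as a list of (x, y) pixel positions.

    Assumptions (not taken from the paper): the first frame is the resting
    position, and per-axis displacement is the maximum absolute excursion.
    """
    x0, y0 = trajectory[0]
    dx = [(x - x0) * mm_per_pixel for x, _ in trajectory]
    dy = [(y - y0) * mm_per_pixel for _, y in trajectory]

    # Euclidean norm of the displacement from rest, frame by frame.
    norms = [math.hypot(a, b) for a, b in zip(dx, dy)]
    peak = max(range(len(norms)), key=norms.__getitem__)

    return {
        "max_x_mm": max(abs(v) for v in dx),
        "max_y_mm": max(abs(v) for v in dy),
        "max_2d_mm": norms[peak],
        "peak_frame": peak,
        # Elevation runs from rest to the peak; descent from the peak onward.
        "elevation": trajectory[: peak + 1],
        "descent": trajectory[peak:],
    }


# Hypothetical tracked positions (pixels) for one swallow; not study data.
track = [(0, 0), (1, 1), (3, 4), (2, 2), (0, 0)]
summary = hyoid_displacement(track, mm_per_pixel=0.5)
print(summary["peak_frame"], summary["max_2d_mm"])
```

Applying the same function to the manual and the automatic trajectory of a swallow yields paired values that can then be compared with correlation and relative-error metrics. The sketch does not correct for head or neck movement, which a practical pipeline would need to handle.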

An example of changes in hyoid displacement and UES opening amplitude is shown in Figure 6. The horizontal and vertical positions of the hyoid bone over time are displayed in Figure 6A. The two-dimensional movement trajectory curve of the hyoid bone throughout the whole swallowing process is presented in Figure 6B, while the two-dimensional movement trajectory of the hyoid bone in one swallowing loop is shown in Figure 6C. The curve of UES width variation over time is depicted in Figure 6D.

Figure 6 An example of automatically tracking the hyoid bone and UES. (A) Horizontal and vertical position of hyoid bone over time. (B) The two-dimensional movement trajectory curve of the hyoid bone throughout the whole swallowing process. (C) The two-dimensional movement trajectory of the hyoid bone in one swallowing loop. (D) The curve of UES width variation over time. UES, upper esophageal sphincter.

Discussion

This study developed an intelligent quantitative analysis method for VFSS data, aiming to provide clinicians with an efficient and accurate quantitative analysis software to address the numerous challenges faced by current manual quantitative analysis methods. The quantitative analysis software constructed in this study has multiple advantages, with its core goal focusing on achieving the following three key functions: First, the software significantly reduces the analytical workload for clinicians processing VFSS. Traditionally, manual analysis requires considerable time and effort to track key points frame by frame and manually calculate parameters. In contrast, this software rapidly processes video data through computer algorithms, enhancing work efficiency. Second, the software minimizes errors inherent in manual operations. Manual analysis depends on the subjective judgment and experience of operators, potentially leading to discrepancies. The software’s automated quantitative analysis reduces human interference, providing more objective results. Third, the software shortens the time required for case analysis. Rapid acquisition of accurate quantitative results is crucial for timely patient treatment. This software generates detailed analysis reports within minutes, assisting doctors in quickly formulating diagnostic strategies. Users can conveniently obtain final results through simple button and mouse operations.

In an in-depth analysis of swallowing videos of patients with dysphagia, the intelligent VFSS quantitative analysis method was compared with traditional manual tracking methods, yielding the following results. The trajectories tracked by both methods exhibited a high degree of overall consistency, as evidenced by high Pearson correlation coefficients and low relative errors. Specifically, the Pearson correlation coefficients for multiple key parameters (e.g., hyoid displacement, UES opening amplitude, and pharyngeal contraction ratio) were at high levels, indicating a strong linear correlation between the data measured by both methods. The relative errors were within a low range, further demonstrating the high reliability of the automated tracking method.

Compared with other existing research and commercially available software, the software developed in this study offers several advantages in assisting the diagnosis of dysphagia. First, our software performed well in hyoid displacement measurement. When compared against manual 2D measurements, it achieved a Pearson correlation coefficient of 0.972 and relative errors of 5.30%±3.79% in the patient group (n=40), and a Pearson correlation coefficient of 0.995 and relative errors of 4.62%±3.46% in the healthy group (n=42). These results improve on the values reported by Kim et al. (Pearson r=0.957; relative error =5.5%±4.9%) (25), are comparable to the values reported by Lee et al. [Pearson r (X-axis) =0.988±0.009; Pearson r (Y-axis) =0.991±0.006; relative error =4.2%±4.8%] (26), and are slightly better than the results from Hsiao et al. (relative errors =6.20%) (42) and Feng et al. [Pearson r (X-axis) =0.985±0.013; Pearson r (Y-axis) =0.919±0.034; relative errors =9.5%±6.1%] (43). This underscores the accuracy and robustness of our method across different subject populations. Second, unlike previous studies that focus only on hyoid displacement, this software enables automatic and intelligent measurement of multiple key parameters. It captures three kinematic parameters and seven temporal parameters, providing a more comprehensive and detailed profile of swallowing function. These rich and accurate metrics support in-depth research into the pathogenesis, pathophysiological processes, and treatment efficacy of dysphagia. Third, while current quantitative analysis software in clinical settings still depends heavily on manual operation, leading to low efficiency and subjective variability, our system automates the evaluation of dysphagia-related indices and generates standardized, detailed reports automatically. This functionality substantially reduces manual processing time, enhances workflow efficiency, and offers clinicians a powerful tool for streamlined VFSS data analysis.

Despite the convenience brought to clinical practice by the intelligent quantitative analysis method for VFSS data, certain limitations still persist. (I) For hyoid bone tracking, our current software still relies on manual annotation of key points on the initial clear video frame to establish a reference for subsequent tracking, which may limit efficiency in processing large datasets. (II) In videofluoroscopic sequences where the hyoid bone is partially occluded by mandibular structures or exhibits motion blur during rapid displacement, the tracking accuracy of our automated system may be temporarily compromised in affected frames. The absence of critical visual information can limit the algorithm’s ability to track movement trajectories. (III) The current software version excludes samples with pharyngeal residue or delayed swallowing reflex, as these conditions may confound kinematic measurements: pharyngeal residue could reduce bolus volume through the UES, potentially underestimating opening amplitude, while delayed initiation may shorten hyoid displacement and UES opening parameters. Future research should prioritize three key objectives: (I) developing automated target recognition capabilities to minimize manual intervention and improve system efficiency; (II) optimizing algorithmic performance to enhance robustness and tracking accuracy; and (III) addressing these complex cases involving pharyngeal residue or delayed swallowing reflex. This requires the implementation of an intelligent residue recognition and compensation algorithm combined with a timing compensation mechanism for delayed swallowing reflex to enhance clinical applicability across diverse dysphagia presentations.


Conclusions

The developed software marks a notable advancement in the field of dysphagia management, providing clinicians with a streamlined, intuitive, and highly efficient platform for evaluating swallowing function. By facilitating automatic or semi-automatic tracking of key kinematic parameters, the software simplifies the diagnostic process, alleviates manual workload, and reduces the likelihood of human error. This innovative solution not only elevates the precision and reliability of swallowing assessments but also optimizes workflow efficiency, enabling clinicians to dedicate more attention to patient care and personalized treatment strategies.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD+AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-134/rc

Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-134/dss

Funding: This study was supported by the National Science and Technology Major Project of the Ministry of Science and Technology of China (No. 2023ZD0503606); in part by National Natural Science Foundation of China (No. 62441114); in part by Jiangsu Key Technology Research Development Program (No. BE2022842); in part by Suzhou Science & Technology Project (No. SSD2023008); in part by Suzhou Key Laboratory Project (No. SZS2024007).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-134/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This retrospective study was approved by the Ethics Committee of Suzhou Municipal Hospital (No. KL901168). Individual consent for this retrospective analysis was waived.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Ertekin C, Aydogdu I, Yüceyar N, Kiylioglu N, Tarlaci S, Uludag B. Pathophysiological mechanisms of oropharyngeal dysphagia in amyotrophic lateral sclerosis. Brain 2000;123:125-40.
  2. Han TR, Paik NJ, Park JW. Quantifying swallowing function after stroke: A functional dysphagia scale based on videofluoroscopic studies. Arch Phys Med Rehabil 2001;82:677-82.
  3. Golabbakhsh M, Rajaei A, Derakhshan M, Sadri S, Taheri M, Adibi P. Automated acoustic analysis in detection of spontaneous swallows in Parkinson's disease. Dysphagia 2014;29:572-7.
  4. Liu Y, Jiang Y, Huang X, Pan J, Shen Y, Zhang Y. The prevalence of dysphagia among Chinese older adults: a meta-analysis. Chinese General Practice 2023;26:1496-502.
  5. Marik PE, Kaplan D. Aspiration pneumonia and dysphagia in the elderly. Chest 2003;124:328-36.
  6. Martino R, Foley N, Bhogal S, Diamant N, Speechley M, Teasell R. Dysphagia after stroke: incidence, diagnosis, and pulmonary complications. Stroke 2005;36:2756-63.
  7. Sura L, Madhavan A, Carnaby G, Crary MA. Dysphagia in the elderly: management and nutritional considerations. Clin Interv Aging 2012;7:287-98.
  8. Hinchey JA, Shephard T, Furie K, Smith D, Wang D, Tonn S; Stroke Practice Improvement Network Investigators. Formal dysphagia screening protocols prevent pneumonia. Stroke 2005;36:1972-6.
  9. Lee BH, Lee JC, Lee SM, Park Y, Ryu JS. Application of Automatic Kinematic Analysis Program for the Evaluation of Dysphagia in ALS patients. Sci Rep 2019;9:15644.
  10. Belafsky PC, Kuhn MA. The Videofluoroscopic Swallow Study Technique and Protocol. In: Belafsky PC, Kuhn MA. The Clinician's Guide to Swallowing Fluoroscopy. New York, NY: Springer; 2014:7-13.
  11. Cui Q, Wei B, He Y, Zhang Q, Jia W, Wang H, Xi J, Dai X. Findings of a videofluoroscopic swallowing study in patients with dysphagia. Front Neurol 2023;14:1213491.
  12. Dou Z, Lan Y, Yu F. Application of videofluoroscopy digital analysis in swallowing function assessment for brainstem stroke patients with dysphagia. Chinese Journal of Rehabilitation Medicine 2013;28:799-805.
  13. Kerrison G, Miles A, Allen J, Heron M. Impact of Quantitative Videofluoroscopic Swallowing Measures on Clinical Interpretation and Recommendations by Speech-Language Pathologists. Dysphagia 2023;38:1528-36.
  14. Paik NJ, Kim SJ, Lee HJ, Jeon JY, Lim JY, Han TR. Movement of the hyoid bone and the epiglottis during swallowing in patients with dysphagia from different etiologies. J Electromyogr Kinesiol 2008;18:329-35.
  15. Nam HS, Beom J, Oh BM, Han TR. Kinematic effects of hyolaryngeal electrical stimulation therapy on hyoid excursion and laryngeal elevation. Dysphagia 2013;28:548-56.
  16. Seo HG, Oh BM, Han TR. Longitudinal changes of the swallowing process in subacute stroke patients with aspiration. Dysphagia 2011;26:41-8.
  17. Molfenter SM, Steele CM. Kinematic and temporal factors associated with penetration-aspiration in swallowing liquids. Dysphagia 2014;29:269-76.
  18. Zhang Z, Coyle JL, Sejdić E. Automatic hyoid bone detection in fluoroscopic images using deep learning. Sci Rep 2018;8:12310.
  19. Lee D, Lee WH, Seo HG, Oh BM, Lee JC, Kim HC. Online Learning for the Hyoid Bone Tracking During Swallowing With Neck Movement Adjustment Using Semantic Segmentation. IEEE Access 2020;8:157451-61.
  20. Kim HI, Kim Y, Kim B, Shin DY, Lee SJ, Choi SI. Hyoid Bone Tracking in a Videofluoroscopic Swallowing Study Using a Deep-Learning-Based Segmentation Network. Diagnostics (Basel) 2021;11:1147.
  21. Singh S, Hamdy S. The upper oesophageal sphincter. Neurogastroenterol Motil 2005;17:3-12.
  22. Cook IJ, Dodds WJ, Dantas RO, Massey B, Kern MK, Lang IM, Brasseur JG, Hogan WJ. Opening mechanisms of the human upper esophageal sphincter. Am J Physiol 1989;257:G748-59.
  23. Perlman AL, VanDaele DJ, Otterbacher MS. Quantitative assessment of hyoid bone displacement from video images during swallowing. J Speech Hear Res 1995;38:579-85.
  24. Kellen PM, Becker DL, Reinhardt JM, Van Daele DJ. Computer-assisted assessment of hyoid bone motion from videofluoroscopic swallow studies. Dysphagia 2010;25:298-306.
  25. Kim WS, Zeng P, Shi JQ, Lee Y, Paik NJ. Semi-automatic tracking, smoothing and segmentation of hyoid bone motion from videofluoroscopic swallowing study. PLoS One 2017;12:e0188684.
  26. Lee JC, Nam KW, Jang DP, Paik NJ, Ryu JS, Kim IY. A Supporting Platform for Semi-Automatic Hyoid Bone Tracking and Parameter Extraction from Videofluoroscopic Images for the Diagnosis of Dysphagia Patients. Dysphagia 2017;32:315-26.
  27. Ahuja NK, Chan WW. Assessing Upper Esophageal Sphincter Function in Clinical Practice: a Primer. Curr Gastroenterol Rep 2016;18:7.
  28. Kahrilas PJ, Dodds WJ, Dent J, Logemann JA, Shaker R. Upper esophageal sphincter function during deglutition. Gastroenterology 1988;95:52-62.
  29. Khalifa Y, Donohue C, Coyle JL, Sejdic E. Upper Esophageal Sphincter Opening Segmentation With Convolutional Recurrent Neural Networks in High Resolution Cervical Auscultation. IEEE J Biomed Health Inform 2021;25:493-503.
  30. Bandini A, Steele CM. The effect of time on the automated detection of the pharyngeal phase in videofluoroscopic swallowing studies. Annu Int Conf IEEE Eng Med Biol Soc 2021;2021:3435-8.
  31. Lee JT, Park E, Jung TD. Automatic Detection of the Pharyngeal Phase in Raw Videos for the Videofluoroscopic Swallowing Study Using Efficient Data Collection and 3D Convolutional Networks. Sensors (Basel) 2019;19:3873.
  32. Lee JT, Park E, Hwang JM, Jung TD, Park D. Machine learning analysis to automatically measure response time of pharyngeal swallowing reflex in videofluoroscopic swallowing study. Sci Rep 2020;10:14735.
  33. Lee KS, Lee E, Choi B, Pyun SB. Automatic Pharyngeal Phase Recognition in Untrimmed Videofluoroscopic Swallowing Study Using Transfer Learning with Deep Convolutional Neural Networks. Diagnostics (Basel) 2021;11:300.
  34. Jeong SY, Kim JM, Park JE, Baek SJ, Yang SN. Application of deep learning technology for temporal analysis of videofluoroscopic swallowing studies. Sci Rep 2023;13:17522.
  35. Ballard DH. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognition 1981;13:111-22.
  36. Kim Y, McCullough GH. Maximum hyoid displacement in normal swallowing. Dysphagia 2008;23:274-9.
  37. Lukežic A, Vojír T, Zajc LC, Matas J, Kristan M. Discriminative Correlation Filter with Channel and Spatial Reliability. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017.
  38. Kalal Z, Mikolajczyk K, Matas J. Forward-backward error: Automatic detection of tracking failures. 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey; 2010:2756-9.
  39. Kalman RE. A New Approach To Linear Filtering and Prediction Problems. J Basic Eng 1960;82:35-45.
  40. Kim YH, Oh BM, Jung IY, Lee JC, Lee GJ, Han TR. Spatiotemporal characteristics of swallowing in Parkinson's disease. Laryngoscope 2015;125:389-95.
  41. Leonard RJ, Kendall KA, McKenzie S, Gonçalves MI, Walker A. Structural displacements in normal swallowing: a videofluoroscopic study. Dysphagia 2000;15:146-52.
  42. Hsiao MY, Weng CH, Wang YC, Cheng SH, Wei KC, Tung PY, Chen JY, Yeh CY, Wang TG. Deep Learning for Automatic Hyoid Tracking in Videofluoroscopic Swallow Studies. Dysphagia 2023;38:171-80.
  43. Feng S, Shea QT, Ng KY, Tang CN, Kwong E, Zheng Y. Automatic Hyoid Bone Tracking in Real-Time Ultrasound Swallowing Videos Using Deep Learning Based and Correlation Filter Based Trackers. Sensors (Basel) 2021;21:3712.
Cite this article as: Wu M, Li F, Geng C, Qian S, Dai Y, Wang T. An intelligent quantitative analysis software for videofluoroscopic swallowing study in patients with dysphagia. Quant Imaging Med Surg 2025;15(12):12346-12360. doi: 10.21037/qims-2025-134
