Automated method for quantitative analysis of iris fluorescein angiography based on machine learning
Original Article


Yixuan Zhu1#, Shuo Sun2#, Shaolei Han3#, Jing Chen2, David Western1, Longli Zhang2

1School of Engineering, University of the West of England, Bristol, UK; 2Tianjin Key Laboratory of Retinal Functions and Diseases, Tianjin Branch of National Clinical Research Center for Ocular Disease, Eye Institute and School of Optometry, Tianjin Medical University Eye Hospital, Tianjin, China; 3Hebei Eye Hospital, Hebei Provincial Key Laboratory of Ophthalmology, Xingtai, China

Contributions: (I) Conception and design: Y Zhu, L Zhang; (II) Administrative support: Y Zhu, D Western, L Zhang; (III) Provision of study materials or patients: L Zhang; (IV) Collection and assembly of data: Y Zhu, S Sun, S Han, J Chen, L Zhang; (V) Data analysis and interpretation: Y Zhu, S Sun, S Han, D Western, L Zhang; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Dr. David Western, PhD. School of Engineering, University of the West of England, Coldharbour Ln, Stoke Gifford, Bristol BS16 1QY, UK. Email: david.western@uwe.ac.uk; Dr. Longli Zhang, MD, PhD. Tianjin Medical University Eye Hospital, 251 Fukang Rd, Xiqing District, Tianjin 300392, China. Email: zhanglonglieye@126.com.

Background: Diabetic retinopathy (DR) is a leading cause of vision impairment, often progressing to neovascular glaucoma. Early detection of neovascularisation of the iris (NVI) is crucial for timely intervention. Traditional diagnostic methods, such as slit-lamp examination, have limitations in identifying early-stage NVI. This study presents a deep learning-based automated approach for analysing iris fluorescein angiography (IFA) images to detect and quantify peripupillary leakage, a key indicator of NVI.

Methods: A dataset of 2,449 IFA images was used to train a YOLOv8n-based segmentation model for precise pupil localisation. A leakage circularity detection algorithm was developed to quantify peripupillary fluorescein leakage. The algorithm’s performance was evaluated using an independent test set of 131 clinically standardized IFA images. Performance metrics included mean absolute error (MAE), mean absolute percentage error (MAPE), and intersection over union (IoU). Results were compared with manual annotations from two clinical experts.

Results: The proposed method demonstrated a significant reduction in MAE (20.81 degrees) and MAPE (21.64%) compared to Clinical Staff 1 (MAE: 34.23 degrees, MAPE: 58.38%) and Clinical Staff 2 (MAE: 43.17 degrees, MAPE: 75.71%). The algorithm achieved an IoU of 39.3%, slightly lower than Clinical Staff 1 (44.5%) and Clinical Staff 2 (41.7%), indicating broadly comparable leakage localisation with some spatial misalignment relative to the reference annotations. The inter-clinician agreement yielded an IoU of 54.8%, highlighting subjectivity in human assessments.

Conclusions: The deep learning-based approach provides superior consistency and accuracy in quantifying peripupillary fluorescein leakage compared to manual expert annotations. While human experts demonstrated slightly higher spatial precision, the algorithm significantly reduces variability and subjectivity in leakage quantification. This automated method has the potential to enhance early detection of NVI, improve clinical workflow efficiency, and assist ophthalmologists in diagnosing DR. Further optimization will focus on refining spatial segmentation accuracy.

Keywords: Deep learning; diabetic retinopathy (DR); iris fluorescein angiography (IFA); iris leakage detection; neovascular glaucoma


Submitted Feb 24, 2025. Accepted for publication Nov 03, 2025. Published online Dec 31, 2025.

doi: 10.21037/qims-2025-480


Introduction

Diabetic retinopathy (DR) is a severe ocular complication of diabetes and the leading cause of vision impairment in affected individuals. Neovascular glaucoma represents the terminal manifestation of DR, characterised by poor prognosis and a high rate of blindness (1). Early detection of iris leakage and neovascularisation, followed by timely treatment, is key to preventing further disease progression and improving treatment outcomes. Currently, the diagnosis of neovascularisation of the iris (NVI) primarily relies on slit-lamp examination. However, early NVI often presents as subtle, microscopic vessels that are notoriously difficult to detect with slit-lamp biomicroscopy, especially in patients with dark irides or faint vascular changes (2,3). Delayed recognition of NVI can allow silent progression to neovascular glaucoma (NVG), a devastating complication of DR that carries a very poor visual prognosis. Iris fluorescein angiography (IFA) is a diagnostic tool that reveals abnormalities in the iris and provides evidence for tracking the progression of DR, offering valuable insights for clinical treatment (4,5). IFA can detect NVI one month earlier than clinical detection by slit-lamp examination and is more sensitive to NVI (6), thus becoming the “gold standard” for the diagnosis of NVI (7).

In a previous study (7), we designed a method to measure the circumferential analysis of pupillary margin fluorescein leakage in IFA to assess the degree of iris leakage. This method allows for objective analysis of the severity of iris abnormalities caused by retinal vascular diseases. We conducted numerous experiments demonstrating the correlation between the circumferential range of pupillary margin fluorescein leakage and the severity of retinal abnormalities in proliferative diabetic retinopathy (PDR) patients.

However, there are several issues with this method:

  • Manual annotation carries a large margin of error. Because leakage appears as small, dense white spots, it is challenging for clinicians to mark its boundaries precisely. Dense leakage is often annotated with an overly broad region while smaller leak spots are omitted, forming a kind of informal compensation mechanism that can introduce significant error.
  • It is difficult to establish a unified standard. Different clinicians often annotate the same fluorescein angiography image very differently, not because they disagree about what constitutes leakage, but because they interpret differently how much of its extent should be marked. Leakage is diffusive, so its edges are always gradual, and clinical experts differ on how dim an edge may be before it counts as a boundary; even the same clinician may judge differently across images. In addition, the compensation mechanism described above, applied to save labour or because small leakages are unclear, varies greatly in degree among individuals. Finally, many clinicians overlook very small or very faint leakages, and when an image contains many small leakages the cumulative error can be considerable.

In light of this, we have developed an algorithm based on deep learning [a type of artificial intelligence (AI) technology] that can automatically calculate the circumferential extent of peripupillary leakages and visually mark the leakage locations on the image. Our algorithm can be divided into two main parts: firstly, we use an image segmentation model to accurately identify the pupil location in the IFA image; then, we apply a fixed algorithm to detect leakages around the pupil and calculate their circumferential extent.

Our contributions are:

  • Automatically calculate the extent of iris leakage from IFA images, potentially saving labour costs and speeding up screening.
  • Develop and validate a standardised approach for measuring the circumferential extent of fluorescein leakages, offering a more accurate and reproducible method that enhances consistency in clinical assessments.
  • Precisely locate the leakage positions and mark them on the image.

Related work

Iris angiography is a diagnostic tool that reflects iris abnormalities and provides insights into the progression of DR, offering a reference for clinical treatment. Since most early NVI occurs along the pupillary margin and grows along it, observing early iris abnormalities around the pupillary margin is a more precise method for detecting early NVI compared to measuring the area of abnormalities. Ding et al. report a semi-quantitative method for analysing the range of abnormalities in IFA for patients with DR with iris neovascularisation, describing the cumulative range of abnormalities in quadrants (8). However, this method is not accurate for measuring early NVI. Some scholars have used swept-source (SS) optical coherence tomography (OCT) angiography (OCTA) to observe neovascularisation in the iris, but this method cannot show whether the iris vessels and neovascularisation are leaking (9). IFA remains the preferred choice for evaluating vascular leakage and dynamic perfusion. Recent studies have highlighted the importance of treatment modalities like panretinal photocoagulation (PRP) and intravitreal anti-vascular endothelial growth factor (VEGF) injections in preventing progression to NVG in eyes with anterior segment neovascularisation (ASNV) (10). These treatments have been shown to improve visual outcomes, which could further guide the application of iris angiography in monitoring and evaluating treatment effects. In our previous research, we developed a method to analyse the circumferential extent of fluorescein leakage at the pupillary margin in iris angiography, aimed at quantifying the degree of iris leakage (11). This approach provides an objective means of evaluating the severity of iris abnormalities associated with retinal vascular diseases. 
Through extensive experimentation, we established a strong correlation between the circumferential extent of fluorescein leakage at the pupillary margin and the severity of retinal abnormalities in patients with PDR. This confirmed the positive relationship between the extent of leakage and the actual retinal abnormalities. The method requires manual annotation by clinicians to mark the abnormal regions in the iris angiography images, after which the software calculates the cumulative circumferential extent of these abnormalities. We present this article in accordance with the CLEAR reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-480/rc).


Methods

Dataset description

Patients with diabetes were recruited from our outpatient department between January 2017 and January 2023. The exclusion criteria included topical miotic therapy, previous laser or anti-VEGF therapy, prior ocular surgery, and other ocular disorders or systemic diseases such as retinal detachment, glaucoma, hypertension, and immune disorders. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. It was approved by the Tianjin Medical University Eye Hospital Ethics Committee (No. 2018KY-11 for the project “Application of Iris Fluorescein Angiography in Normal Individuals and Diabetic Patients” and No. 2016KY-24 for the project “Iris Fluorescein Angiography and Anti-VEGF Treatment Guidance for Diabetic Retinopathy”). Written informed consent was obtained from all participants.

This is a prospective cross-sectional study similar in design to a previous investigation that employed a conventional manual approach to this task (7). A total of 67 patients with PDR were included in this study. The mean age of the patients was 53.51±11.34 years, with 16.42% younger than 40 years, 16.42% between 40 and 49 years, 37.31% between 50 and 60 years, and 29.85% older than 60 years. There were 37 males (55.22%) and 30 females (44.78%). The mean duration of diabetes was 11.16±7.21 years, and the median duration of PDR was 2.00 months [interquartile range (IQR): 0.48–6.00 months]. The median best-corrected visual acuity (BCVA) at baseline was 2.00 logMAR (IQR: 1.15–4.90).

Regarding disease staging, 38 patients (56.72%) were classified as stage V and 29 (43.28%) as stage VI. The mean HbA1c level indicated that 29.85% of patients had HbA1c <7.0%, 26.87% had HbA1c between 7.0–8.0%, and 43.28% had HbA1c >8.0% (Table 1).

Table 1

Baseline characteristics of patients

Characteristics Value
Age (years) 53.51±11.34
   <40 11 (16.42)
   40–49 11 (16.42)
   50–60 25 (37.31)
   >60 20 (29.85)
Sex
   Male 37 (55.22)
   Female 30 (44.78)
Duration of diabetes (years) 11.16±7.21
Duration of PDR (months) 2.00 (0.48, 6.00)
BCVA (logMAR) 2.00 (1.15, 4.90)
Staging of DR
   Stage V 38 (56.72)
   Stage VI 29 (43.28)
HbA1c (%)
   <7.0 20 (29.85)
   7.0–8.0 18 (26.87)
   >8.0 29 (43.28)

Data are presented as n (%), mean ± SD or median (IQR). BCVA, best corrected visual acuity; DR, diabetic retinopathy; IQR, interquartile range; logMAR, logarithm of the minimum angle of resolution; PDR, proliferative diabetic retinopathy; SD, standard deviation.

Following methodologies established in prior studies (7), subjects underwent a series of comprehensive ocular examinations, including slit-lamp biomicroscopy, ultra-widefield fluorescein angiography (uwFFA), and ultra-widefield colour fundus photography using the Optos Panoramic 200 (Optos). However, the primary focus of this study’s analysis was the iris fluorescein angiography performed with the Spectralis HRA (Heidelberg Engineering, Germany), which followed similar protocols to those described in earlier research.

In a refined protocol aimed at enhancing the comparison between iris and fundus angiograms, both uwFFA and IFA were performed sequentially during the same session. Initially, a 5-mL dose of 20% fluorescein was administered intravenously without pupil dilation. The IFA was then conducted in the early phase for one minute, followed by uwFFA for 1–3 minutes, and concluded with a 15-minute late-phase recording of the IFA.

Based on this imaging protocol, two distinct datasets were established: one for training and validating the segmentation model, and another for evaluating the final circularity detection algorithm.

Training, validation, and test sets for segmentation

The original dataset consisted of 2,867 IFA images from 67 subjects. After excluding 434 low-quality or clinically unusable images (e.g., severely blurred or occluded pupils), a total of 2,433 images were retained for pupil segmentation. To prevent patient-level data leakage across subsets, the dataset was re-divided into training, validation, and test sets comprising 1,377, 410, and 412 images from 29, 9, and 12 patients, respectively. A dual-objective greedy strategy was employed during the splitting process to balance both the number of images and the number of subjects across subsets as closely as possible, while ensuring that each patient appears in only one subset.
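As a rough illustration of such a patient-level split (not the authors' exact procedure), the following sketch greedily assigns each patient to the subset whose image-count target has the largest remaining deficit; the function name is ours, and it simplifies the paper's dual objective (images and subjects) to a single image-count objective:

```python
def greedy_patient_split(image_counts, fractions):
    """Assign each patient to exactly one subset, greedily filling the
    subset currently furthest below its image-count target.
    Simplified sketch: the paper balances image AND subject counts."""
    total = sum(image_counts.values())
    targets = {s: f * total for s, f in fractions.items()}
    filled = {s: 0.0 for s in fractions}
    split = {}
    # Place patients with many images first so small ones fine-tune the balance.
    for pid, n in sorted(image_counts.items(), key=lambda kv: -kv[1]):
        subset = max(filled, key=lambda s: targets[s] - filled[s])
        split[pid] = subset
        filled[subset] += n
    return split
```

Because every patient receives exactly one subset label, no subject can leak across the training, validation, and test sets.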

The segmentation dataset maintained diverse image quality to simulate real-world clinical variability. We did not impose strict criteria on clarity or leakage visibility; the only requirement was that the pupil be recognisable by human observers. Images with blurred, partially occluded, or leakage-obscured pupil contours were still included, provided the pupil region could be inferred from surrounding anatomical context. Manual annotations were provided by a single ophthalmologist with more than 5 years of experience. All pupil boundaries were manually annotated using polygons with at least ten vertices to ensure high-precision masks. Each image was cropped to a square region centred on the lens and resized to 640×640 pixels.
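The crop-and-resize preprocessing can be sketched in pure NumPy; the function name and the nearest-neighbour resize are our simplifications, and the production pipeline may use a different interpolation:

```python
import numpy as np

def square_crop_resize(img, cx, cy, half_side, out_size=640):
    """Crop a square of side 2*half_side centred on (cx, cy), then
    nearest-neighbour resize the crop to out_size x out_size."""
    crop = img[cy - half_side:cy + half_side, cx - half_side:cx + half_side]
    # Map each output row/column back to the nearest source index.
    idx = (np.arange(out_size) * crop.shape[0] / out_size).astype(int)
    return crop[idx][:, idx]
```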

Independent test set for final circularity evaluation

To validate the full pipeline—including leakage circularity estimation—we used a completely independent test set of 131 images selected from a separate pool of 500 IFA images (from the same cohort but not overlapping in patients with the segmentation dataset). These images were chosen based on clinical quality standards and ensured subject-level representation. Since this test set does not share any patients with the segmentation dataset, its results remain unaffected by the segmentation split revision.


Pupil segmentation algorithm

In our algorithm, accurately locating and segmenting the pupil’s boundary is the primary task, crucial for detecting leakages around the pupil. Abnormalities inside or far from the pupil’s edge are excluded from our circumferential calculation. Precise calculation of leakage circumferential extent also necessitates knowing the pupil’s centre.

For pupillary segmentation, we employed the You Only Look Once (YOLO) model, specifically YOLOv8n. YOLO is a real-time detection and segmentation framework that, unlike traditional multi-stage methods, performs localisation and classification in a single pass (12). It is notable for its speed and competitive accuracy, and its global view of the image reduces false segmentations caused by minor, irrelevant disturbances. YOLO does have limitations, such as difficulty with small or highly overlapping objects, but for our application of pupil segmentation, its real-time performance and accuracy make it a suitable choice.

YOLOv8 was chosen as the most recent advancement in the YOLO series at the time of this work, incorporating modern design strategies and techniques (13). Its backbone and neck adopt the ELAN design principles of YOLOv7, yielding significant performance improvements, although some operations in the C2f module are less hardware-friendly. The head differs substantially from YOLOv5: it adopts a decoupled structure that separates the classification and segmentation heads, and it switches from an anchor-based to an anchor-free design. The model uses TaskAlignedAssigner for positive sample allocation and introduces Distribution Focal Loss in the loss function. Data augmentation strategies from YOLOX are also integrated.

These designs enhance the model’s performance in classification and segmentation. YOLOv8n showed a significant improvement in mAP on the COCO Val 2017 dataset compared to YOLOv5n, increasing from 28.0 to 37.3. The ‘n’ version was selected for its minimal parameters (3.2M), offering two advantages: (I) high-speed segmentation capability, with inference times ranging from 3.5 to 9 ms per image on a CPU with a batch size of 1; (II) the smaller model size facilitates training and potential overfitting prevention. Given our training set size of 1,412 images, even with pre-trained models, larger models would be challenging to train, making the lightweight YOLOv8n an optimal choice.

The YOLOv8n model employed in our study was initially pre-trained on the COCO dataset (14), which is crucial for leveraging the diverse and extensive range of annotated images available in COCO. This pre-training helps the model generalise better to our specific dataset, despite its relatively small size. Utilising a pre-trained model also reduces the training time and improves the reliability of the results by providing a robust starting point before fine-tuning on our specialised set of images. In our specific application, we utilised the YOLOv8n model to segment pupil contours. The training dataset’s labels were carefully annotated by clinical professionals, ensuring precise polygonal contours that accurately represent the pupil boundaries.

Leakage circularity detection algorithm

Firstly, it is essential to accurately determine the contour of the pupil, as leakages only appear around the pupil. Identifying the position of the pupil helps in precisely locating the leakages. The pupil segmentation algorithm provides the contour of the pupil in the fluorescein angiography image. This is represented as a set of 𝑘 points:

$P = \{(x_i, y_i)\}_{i=1}^{k}$

forming a convex polygon (Figures 1,2). Next, we locate the centre of the pupil, a critical element for calculating the circumferential measure around the pupil (Figure 1B). The centre C is taken as the centroid of this convex polygon, computed by averaging all points on the pupil contour:

$(C_x, C_y) = \left(\dfrac{1}{k}\sum_{i=1}^{k} x_i,\ \dfrac{1}{k}\sum_{i=1}^{k} y_i\right)$
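A minimal sketch of this centroid step, assuming the contour arrives as an array of (x, y) points (the function name is ours):

```python
import numpy as np

def pupil_centroid(contour):
    """(C_x, C_y): the mean of the k contour points (x_i, y_i)."""
    pts = np.asarray(contour, dtype=float)
    cx, cy = pts.mean(axis=0)
    return cx, cy
```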

Figure 1 Panels illustrate the process of pupil detection and analysis. Starting from the original image, the pupil contour (green) and centroid (red) are detected and overlaid on the image to accurately locate the shape and centre of the pupil (A,B). Subsequently, the pupil contour is expanded to generate an outer contour (dark blue), defining the peripheral region of the pupil (C). Next, a smaller inner contour (light blue) is created through further expansion (C). Finally, the area between the inner and outer contours is analysed, and leakage regions are marked with red shading (D).
Figure 2 Flowchart of the process of pupil detection and analysis. (A) Segment the pupil contour using a segmentation algorithm and locate its centre point. (B) Calculate the Outer Contour using the T1. (C) Determine the Inner Contour using T2. The region between the Outer and Inner Contours is defined as the leakage detection area (D). (D) Divide D into T3 sections, then evaluate each section for potential leakage by applying T4 and T5. (E) Calculate the circumferential area where leakage is detected (orange area).

Establishing the Outer Contour (Figures 1C,2B): following this, we establish the search contours—both outer and inner. Since the leakages only occur around the pupil, limiting the search area is vital to prevent detection from being affected by extraneous factors. For each point (xi,yi), on the contour, extend outward in the direction of the line from the centre C by T1 times the original distance from the centre to the pupil contour, where T1 is constrained between 0.1 and 0.4 to match the typical range where normal leakages occur. The new points are calculated as:

$(x_i', y_i') = \left(x_i + T_1(x_i - C_x),\ y_i + T_1(y_i - C_y)\right)$

where (Cx,Cy) are the coordinates of point C. T1 is designed to restrict the outward search boundary closely around the pupil.

Establishing the Inner Contour (Figures 1C,2C): in addition to the outer boundary, it is also necessary to establish an inner boundary, defined by the hyperparameter T2. The inner boundary matters because the pupil contour is irregular, and we want the model to be unaffected by such irregularities. Similarly, we obtain points on the inner contour by scaling each point on the pupil contour outward (or inward, depending on the sign of T2) by a factor of T2:

$(x_i', y_i') = \left(x_i + T_2(x_i - C_x),\ y_i + T_2(y_i - C_y)\right)$

where T2 provides stability to the algorithm, especially considering two scenarios: first, if the pupil is significantly brighter than its surroundings and our identified contour is smaller than the actual pupil, leading to a risk of misidentification of the pupil’s edge as a leakage. Second, there is a small probability that the leakage may develop inward towards the pupil rather than outward. In such cases, if we only detect leakage outside the pupil, we might miss some instances of leakage occurring inside the pupil. By setting T2 to a negative value, we can expand the search range to include the interior of the pupil, thus improving our detection coverage. Thus, the optimal inner contour might be slightly larger or smaller than the actual pupil contour. The value of T2 usually ranges from −0.1 to 0.1.
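Both contour constructions are the same radial scaling about the centre C, since p + t·(p − C) = C + (1 + t)·(p − C). A sketch (the helper name is ours), with t = T1 for the outer contour and t = T2 (possibly negative) for the inner one:

```python
import numpy as np

def scale_contour(contour, center, t):
    """Move every contour point radially by t times its distance from
    the centre: p' = p + t*(p - C) = C + (1 + t)*(p - C).
    t > 0 expands outward (T1); t < 0 shrinks inward (negative T2)."""
    pts = np.asarray(contour, dtype=float)
    c = np.asarray(center, dtype=float)
    return c + (1.0 + t) * (pts - c)
```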

Defining the leakage detection area (Figures 1D,2C): the leakage detection area D is determined to lie between the inner and outer contours.

Area division (Figure 2D): the 360-degree circularity is divided into T3 parts, referenced from the centre point C, and rays from C further divide area D into sub-areas Dj (j = 1, …, T3). T3 determines the scale for leakage detection. A larger T3 means a finer division around the circularity for more precise leakage localisation but may also increase misjudgment of the actual extent of smaller leakages. T3 is typically set to 180, 360, or 720.

Identifying pathological areas (Figures 1D,2D): for each detection area Dj, convert it to a grayscale image, then count the pixels with a grayscale value above T4. T4 determines which grayscale areas are considered anomalies; in fluorescein angiography images, this typically means areas significantly brighter than their surroundings. The value of T4 usually ranges from 80 to 220. If this pixel count exceeds T5, the area is considered pathological, denoted as:

$R_j = \begin{cases} 1, & \text{if the pixel count exceeds } T_5 \\ 0, & \text{otherwise} \end{cases}$

T5 determines the size threshold for detectable anomalies, avoiding misidentification of very small white spots caused by noise or other factors as leakages. The value of T5 usually ranges from 2 to 10.

Calculating circularity (Figures 1D,2E): the circularity of the leakage is calculated as:

$\text{Circularity} = \sum_{j=1}^{T_3} R_j \times \dfrac{360°}{T_3}$

This algorithm allows for the precise calculation of the leakage’s proportion on the pupil’s circularity, providing an important reference for further diagnosis and treatment.
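The sector flagging (T4, T5) and the circularity sum can be sketched together as follows; the function names are ours, and `sector_pixels` is assumed to hold the grayscale values of each of the T3 sectors:

```python
import numpy as np

def sector_flag(gray_values, t4, t5):
    """R_j = 1 if more than T5 pixels in sector D_j exceed grayscale T4."""
    return int(np.count_nonzero(np.asarray(gray_values) > t4) > t5)

def leakage_circularity(sector_pixels, t4=100, t5=3):
    """Sum of flagged sectors times the sector width 360/T3, in degrees."""
    t3 = len(sector_pixels)
    flags = [sector_flag(s, t4, t5) for s in sector_pixels]
    return sum(flags) * (360.0 / t3)
```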

The circumferential detection algorithm is tailored for IFA images, where abnormal areas around the pupil usually appear as light-grey, white spots distinctly different from their surroundings. To significantly differentiate these anomalies, we introduced several key threshold parameters: T1, T2, T3, T4, and T5. These parameters require manual adjustment based on actual medical images and application needs to ensure optimal detection effectiveness.

Experiments

To achieve accurate pupil segmentation, we first utilised the pupil segmentation dataset for training and testing the model. The training process spanned 100 epochs and did not incorporate an early stopping strategy. The best model version was selected based on the performance of the validation set. The learning rate was set to 0.01, with the Adam optimiser chosen. During training, a batch size of 16 was used, and the model was initialised with pre-trained weights from the YOLO model on the COCO dataset, aiming to accelerate convergence and enhance robustness.
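Under these settings, training with the Ultralytics API would look roughly like the following configuration sketch; the dataset YAML path is hypothetical, and exact argument defaults may differ across Ultralytics versions:

```python
from ultralytics import YOLO

# COCO-pretrained YOLOv8n segmentation weights as the starting point.
model = YOLO("yolov8n-seg.pt")

# Hyperparameters mirroring the text: 100 epochs, Adam, lr 0.01,
# batch size 16, 640x640 inputs; the best checkpoint is selected on
# the validation set.
model.train(
    data="pupil_ifa.yaml",   # hypothetical dataset config
    epochs=100,
    imgsz=640,
    batch=16,
    optimizer="Adam",
    lr0=0.01,
)
```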

After completing the pupil segmentation, we applied the Leakage Circularity Detection Algorithm to calculate the final circularity occupied by the leakage. Given the significant influence of several hyperparameters (T1 to T5) on the detection outcome, we employed a greedy search strategy for optimisation. In this approach, each parameter was tuned sequentially while keeping the others fixed, selecting the value that yielded the best performance at each step.

The search spaces for each parameter were as follows:

T1 ∈ {0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40}
T2 ∈ {−0.10, −0.05, 0.00, 0.05, 0.10}
T3 ∈ {360, 180, 90}
T4 ∈ {80, 100, 120, 140, 160, 180, 200, 220}
T5 ∈ {2, 3, 4, 5, 6}

The evaluation criterion combined quantitative comparison [intersection over union (IoU)] with manual annotations and qualitative assessment via visual inspection to ensure clinically meaningful and robust performance. Following the search, the selected values for T1 to T5 were 0.25, 0, 360, 100, and 3, respectively.
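The sequential tuning strategy can be sketched generically (the names and score convention are ours; in practice `evaluate` would wrap the IoU comparison against the manual annotations):

```python
def greedy_search(evaluate, search_spaces, init):
    """Tune one hyperparameter at a time, holding the others at their
    current best values, and keep the value with the highest score."""
    best = dict(init)
    for name, values in search_spaces.items():
        scored = {v: evaluate({**best, name: v}) for v in values}
        best[name] = max(scored, key=scored.get)
    return best
```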

Evaluation

To measure the performance of our algorithm, we compared its results with the labels to observe the performance differences between our algorithm and human clinicians. The labels were annotated by clinical staff, indicating the location of lesions and their final circumferential extent. Specifically, we calculated the mean absolute error (MAE), mean absolute percentage error (MAPE) (15), and IoU (16) for each group against the labels. These metrics allow us to assess whether our algorithm meets or exceeds the diagnostic performance of the clinical staff.

Additionally, we involved two other clinical staff members to annotate the test set separately, allowing us to compare the performance of the algorithm with human clinical practice. The key difference between these two groups and the labels is that the labels were very precisely annotated, with multiple experts discussing and consulting on disputed areas.

Moreover, we compared the results of the two groups of clinical staff to examine the consistency between different clinicians. Comparing the two clinicians’ predictions reveals the degree of consistency in their evaluations of the same set of IFA images.

Our primary objective was the quantification of the proportion of lesions or leakages occupying the circularity of the pupil’s edge. To evaluate the accuracy of this regression task, we utilised MAE and MAPE. MAE calculates the average absolute difference between predicted and actual values, offering an intuitive measure of error. MAPE calculates the average percentage error between predictions and actual values, providing a measure of relative error, and aiding in understanding the degree of deviation of predictions from actual values.

$\text{MAE} = \dfrac{1}{n}\sum_{i=1}^{n}\left|\text{Circularity}_i - \widehat{\text{Circularity}}_i\right|$

$\text{MAPE} = \dfrac{100\%}{n}\sum_{i=1}^{n}\left|\dfrac{\text{Circularity}_i - \widehat{\text{Circularity}}_i}{\text{Circularity}_i}\right|$

MAE and MAPE each have their advantages and limitations, especially under different lesion ranges where these metrics can perform quite differently. With smaller lesion ranges, MAE more accurately reflects the actual size of prediction errors since it directly computes the absolute difference between predictions and true values, unaffected by the lesion size. Here, even minor errors are captured by MAE. In contrast, MAPE, which calculates the percentage of relative error, can translate small absolute errors into significant percentage errors, possibly exaggerating the actual differences in model performance.

Conversely, MAPE becomes more appropriate with larger lesion ranges, where errors in marking lesions are likely more frequent and numerous, thus focusing more on the percentage of errors rather than their absolute size. MAPE reflects model performance in extensive lesion detection better, while MAE might not fully show the overall impact of errors, especially when lesion ranges vary greatly. However, it’s crucial to note that in cases with no or minimal lesions, MAPE may become uncalculable (due to a denominator near or equal to zero) or yield abnormally high results. Therefore, choosing and interpreting these metrics should be done cautiously, considering the specific application context and lesion range.
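Both metrics are straightforward to compute; per the caveat above, this sketch excludes cases whose true circularity is zero from the MAPE average (function names are ours):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error in the original units (degrees here)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    """Mean absolute percentage error; undefined where the truth is zero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mask = y_true != 0
    return float(100.0 * np.mean(np.abs((y_true[mask] - y_pred[mask])
                                        / y_true[mask])))
```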

Building on this, we also employ the Bland-Altman plot to further analyse the agreement between our algorithm’s predictions and the labels. In this context, the Bland-Altman plot will allow us to compare the absolute differences in the leakage circularity measurements against the mean of these measurements. By plotting the differences against the averages, we can visually assess if our algorithm tends to overestimate or underestimate the leakage circularity relative to the human experts.
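As an illustration of how the plot's summary statistics are derived (the function name is ours), the bias and 95% limits of agreement follow directly from the paired differences:

```python
import numpy as np

def bland_altman_stats(a, b):
    """Bias (mean difference) and 95% limits of agreement between two
    paired measurement series, as summarised on a Bland-Altman plot."""
    diff = np.asarray(a, float) - np.asarray(b, float)
    bias = float(diff.mean())
    sd = float(diff.std(ddof=1))  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```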

However, direct comparison of circularity has two limitations. For one, the substantial variance in manual annotations complicates their reliability as true labels. Furthermore, an interesting but misleading phenomenon is that algorithmic calculations of circumferential extent close to the actual extent might occur coincidentally. For instance, inaccurate identification of abnormal points could still result in a total circumferential extent matching the actual condition. Therefore, we also incorporated positional accuracy measures.

Therefore, we provide two evaluation methods that measure the differences in leakage localisation among the groups. Although our task somewhat resembles image segmentation, we do not need exact pixel-level leakage regions, focusing instead on accurate circularity occupation; this translates into a one-dimensional segmentation problem. To evaluate the algorithm’s accuracy in locating leakages, we used the IoU, which is widely used in image segmentation tasks and is defined as the ratio of the overlap between the predicted leakage region and the reference annotated region to their union:

IoU = |Prediction ∩ Reference| / |Prediction ∪ Reference|
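In the one-dimensional setting described above, this IoU can be evaluated by rasterising each predicted and annotated arc onto a per-degree mask of the pupil circumference. The sketch below is illustrative: the function name, the 1-degree resolution, and the convention of returning 1.0 when both sets are empty are our assumptions, not the paper’s implementation.

```python
import numpy as np

def angular_iou(pred_arcs, ref_arcs, resolution=360):
    """IoU between two sets of angular arcs on the pupil margin.
    Arcs are (start, end) tuples in degrees; each set is rasterised
    onto a 1-degree boolean mask. An arc wrapping past 360 degrees
    should be split into two arcs before calling."""
    def to_mask(arcs):
        mask = np.zeros(resolution, dtype=bool)
        for start, end in arcs:
            mask[int(start):int(end)] = True
        return mask

    p, r = to_mask(pred_arcs), to_mask(ref_arcs)
    union = np.logical_or(p, r).sum()
    if union == 0:
        return 1.0  # both empty: treat as perfect agreement
    return np.logical_and(p, r).sum() / union

# predicted arc 10-90 degrees vs. reference arc 30-120 degrees
iou = angular_iou([(10, 90)], [(30, 120)])  # 60 / 110
```

Rasterising onto a shared angular grid makes the intersection and union simple element-wise operations, at the cost of the chosen angular resolution.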


Results

Pupil segmentation performance

The segmentation model was evaluated independently on the test set of the segmentation dataset, with no patient overlap between the test set and the training or validation sets. The model achieved a Dice coefficient of 0.982 and an IoU of 0.967, indicating good delineation performance under real-world variations. Visual inspection of difficult cases confirmed that the predicted pupil contours were generally consistent with clinical expectations, even under low contrast or leakage interference.
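For reference, the two overlap metrics reported here can be computed from binary masks as in this generic sketch. The function name is ours and this is not the study’s evaluation code; it simply shows the standard definitions of Dice and IoU.

```python
import numpy as np

def dice_and_iou(pred, ref):
    """Dice coefficient and IoU for binary segmentation masks.
    Assumes at least one mask is non-empty."""
    pred, ref = np.asarray(pred, bool), np.asarray(ref, bool)
    inter = np.logical_and(pred, ref).sum()
    union = np.logical_or(pred, ref).sum()
    dice = 2.0 * inter / (pred.sum() + ref.sum())
    return dice, inter / union

# toy 1-D masks: prediction covers indices 0-3, reference covers 2-5
pred = np.zeros(10, bool); pred[0:4] = True
ref = np.zeros(10, bool); ref[2:6] = True
d, i = dice_and_iou(pred, ref)  # Dice 0.5, IoU 1/3
```

Note that Dice is always at least as large as IoU for the same pair of masks, which is why the reported 0.982 Dice corresponds to a 0.967 IoU.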

Leakage circularity estimation performance

As shown in Table 2, our method achieved a markedly lower MAPE (21.64%) and MAE (20.81 degrees) than both Clinical Staff 1 (MAPE: 58.38%, MAE: 34.23 degrees) and Clinical Staff 2 (MAPE: 75.71%, MAE: 43.17 degrees). As shown in Table 3, the differences in both MAPE and MAE between our method and each clinical staff member were statistically significant, with all t-tests yielding P values below 0.001. This indicates that our algorithm is substantially more accurate in predicting the proportion of the pupil margin occupied by leakage.

Table 2

Performance metrics for algorithm and clinical staff predictions

Method               IoU (%)   MAPE (%)   MAE (degrees)
Our method           39.3      21.6       20.8
Clinical Staff 1     44.5      58.4       34.2
Clinical Staff 2     41.7      75.7       43.2
Staff 1 vs. Staff 2  54.8      18.2       18.2

The table shows the performance metrics for our method and the predictions made by two clinical staff members, all compared to the labels. Metrics include IoU, MAPE, and MAE. The last row indicates the comparison metrics between Clinical Staff 1 and Clinical Staff 2, showing inter-rater disagreements. IoU, intersection over union; MAE, mean absolute error; MAPE, mean absolute percentage error.

Table 3

Independent t-test results comparing the algorithm with Clinical Staff 1 and Clinical Staff 2 for MAPE and MAE

Clinical comparator    MAPE (%)              MAE (degrees)
                       t value    P value    t value    P value
Clinical Staff 1       −5.59      <0.001     −7.58      <0.001
Clinical Staff 2       −3.59      <0.001     −6.11      <0.001

MAE, mean absolute error; MAPE, mean absolute percentage error.
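The statistics in Table 3 come from independent two-sample t-tests on per-image errors. The sketch below computes Welch’s unequal-variance t statistic on hypothetical error values; the specific variant, the function name, and all the numbers are our assumptions (in practice a library routine such as `scipy.stats.ttest_ind` would also supply the P value from the t distribution).

```python
import math

def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances);
    a minimal sketch, not the study's statistical code."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

# hypothetical per-image absolute errors (degrees)
alg_err = [18.0, 25.0, 20.0, 22.0, 19.0]
staff_err = [30.0, 40.0, 35.0, 33.0, 38.0]
t = welch_t(alg_err, staff_err)  # negative: algorithm errors are lower
```

A negative t value, as in Table 3, means the first group (the algorithm) has the smaller mean error.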

Our method achieved an IoU of 39.3%, which is slightly lower than those of Clinical Staff 1 (44.5%) and Clinical Staff 2 (41.7%). This suggests that our algorithm is somewhat weaker than the clinical staff in identifying the exact locations of the leakages.

When comparing the two clinical staff members, their results show an IoU of 54.8%, which is not particularly high. Additionally, their MAPE (18.22%) and MAE (18.24 degrees) are not especially low. This indicates that there is a considerable amount of disagreement between the two clinicians, highlighting the variability in their predictions.

In summary, our method demonstrates superior accuracy in terms of relative and absolute error when predicting the extent of leakages around the pupil’s edge compared to clinical staff. Although our predictions are slightly less precise in terms of the exact leakage locations, the overall performance of our algorithm is competitive. The results also reveal significant discrepancies between the predictions of the two clinical staff members, suggesting variability in their assessments.

Additionally, the Bland-Altman plot (Figure 3) reveals that our algorithm exhibits a certain systematic bias in predicting the leakage circularity. The plot shows that as the circularity increases, the prediction error correspondingly increases.

Figure 3 Bland-Altman plots illustrate the comparison of leakage circularity. (A) Comparison between the predicted leakage circularity by the algorithm and the reference annotations. (B) Comparison between the leakage circularity annotations by Clinician 1 and Clinician 2. The plots highlight the agreement between the predictions and annotations by showing the mean difference and limits of agreement.

Discussion

Comparison between our algorithm and human results

The evaluation of our algorithm, which integrates deep learning and mathematical approaches, against the predictions made by two clinical staff members reveals several key insights into the performance and reliability of our method.

Firstly, our method demonstrated significantly lower MAPE (21.64%) and MAE (20.81 degrees) compared to both Clinical Staff 1 (MAPE: 58.38%, MAE: 34.23 degrees) and Clinical Staff 2 (MAPE: 75.71%, MAE: 43.17 degrees). This substantial difference underscores the superior accuracy of our algorithm in predicting the proportion of leakages occupying the circularity of the pupil’s edge. The lower MAPE and MAE indicate that our method not only minimises the overall prediction errors but also ensures that the relative errors are significantly reduced compared to human predictions.

However, our algorithm’s performance in terms of IoU, while competitive, was slightly lower than that of the clinical staff. Our method achieved an IoU of 39.3%, whereas Clinical Staff 1 achieved an IoU of 44.5%, and Clinical Staff 2 achieved an IoU of 41.7%. These metrics suggest that while our algorithm is highly accurate in predicting the extent of leakages, it is slightly less precise than the clinical staff in identifying the exact locations of the leakages. The small gap in IoU implies that our algorithm generally locates the leakages correctly but misses some details. Given that we generate annotated images showing the predicted leakage positions, clinical staff can still evaluate and verify the algorithm’s accuracy during use, ensuring practical applicability. We will discuss the specific issues contributing to these performance differences and potential improvements in the subsequent sections.

The comparison between the two clinical staff members highlights notable variability in their predictions. The IoU between Clinical Staff 1 and Clinical Staff 2 was 54.8%, indicating only moderate agreement, and their inter-rater MAPE (18.22%) and MAE (18.24 degrees) confirm a considerable amount of disagreement. This variability points to the subjective nature of human interpretation in medical imaging, which can lead to inconsistencies in diagnosis. All the metrics indicate that the differences between the judgements of the two clinical staff members are not small, although they are smaller than the differences between each clinician’s predictions and the labels. Being deterministic, our algorithm does not exhibit this inter-rater variability and provides consistent, repeatable results.

Figure 3 illustrates the variability between human annotators, showing that they are not in perfect agreement. This can be seen from the systematic bias, where the red dashed line representing the average difference between the clinicians is non-zero. Such bias indicates that even trained clinicians, who might be expected to align closely in their assessments, demonstrate some inconsistency. For smaller circumferences (<90 degrees), the disagreements between the clinical staff members are relatively minor, yet clearly biased. This suggests that, while clinicians may generally agree on small leakage extents, their predictions are not always fully accurate and may follow a certain skew. For larger circumferences, the disagreement between clinical staff becomes more pronounced, with errors occurring in both positive and negative directions. This increase in variability implies that as the complexity of the leakage assessment grows, so does the difficulty for clinicians to accurately and consistently evaluate it.

In conclusion, while our hybrid algorithm shows superior performance in reducing relative and absolute errors in predicting leakage circularities, there remains a slight gap in spatial precision compared to human experts. The variability observed between different clinical staff members further underscores the potential of our algorithm to provide more consistent and reliable predictions. Future work will focus on enhancing the spatial accuracy of the algorithm to fully match or exceed human performance, as discussed in the following sections.

Advantages of the algorithm

In Figure 4, we present four sets of images that reveal the performance of our model in handling different types of leakages. The selection criterion was the intersection of 20 images from the test set with the lowest MAPE and the 20 with the highest IoU. The MAPE values for these four sets are 0.5%, 1.0%, 1.2%, and 1.3%, while the corresponding IoU values are 93.5%, 91.9%, 83.8%, and 81.8%, respectively. Overall, these images demonstrate the model’s ability to recognise and localise leakages, especially in cases where the model’s predictions closely align with manual annotations. Here is a specific analysis of these four sets of images:

  • Figure 4A,4B (radiating small leakages): this image presents a special case where the leakages are small and scattered, forming a radiating pattern. This scenario poses a challenge for the model, as it needs to accurately identify and locate each small leakage. However, due to the clear edges of these leakages, the model maintains high consistency with manual annotations, demonstrating its accuracy in handling such situations.
  • Figure 4C,4D (large continuous leakages): in this image, the leakage area is large and continuous with clear boundaries. This situation is relatively simpler for the model because large, continuous areas are easier to recognise. The model’s prediction closely matches the manual annotations, showcasing its strong capability in handling such types of leakages.
  • Figure 4E,4F (large blurry leakages): this image shows leakages that are blurry and have severe leakage due to late imaging. Despite the increased difficulty in recognition, the model still manages to capture the location and extent of the leakages to a certain degree, maintaining good consistency with manual annotations.
  • Figure 4G,4H (small concentrated leakages): in this image, the leakage area is small but concentrated with clear edges. This type of situation is also well-handled by the model, which can accurately identify and locate the leakages, providing results close to manual annotations.
Figure 4 Images with the best performance and corresponding manual annotations. The selection criterion was the intersection of 20 images from the test set with the lowest MAPE and the 20 with the highest IoU. The MAPE values for these four sets are 0.5%, 1.0%, 1.2%, and 1.3%, while the corresponding IoU values are 93.5%, 91.9%, 83.8%, and 81.8%, respectively. (A,B) Radiating small leakages. (C,D) Large continuous leakages. (E,F) Large blurry leakages. (G,H) Small concentrated leakages. For each pair of images, the left image shows the zoomed-in manual annotations and the right image shows the leakage locations detected by the model. IoU, intersection over union; MAPE, mean absolute percentage error.

In summary, these various types of leakage images demonstrate the adaptability and accuracy of our model. Although there is room for improvement in handling certain special cases, the model provides reliable and accurate leakage detection in a variety of clinical scenarios, proving its great potential in practical applications.

Potential clinical applications

Although the IoU of our algorithm was modestly lower than that of clinical staff, this limitation should be interpreted with caution. In clinical practice, treatment decisions such as the timing of PRP or anti-VEGF therapy are guided more by the quantified extent of iris leakage than by exact pixel-level localization. Importantly, our algorithm significantly reduced both MAE and MAPE, which directly reflect the accuracy of quantifying leakage extent. These improvements suggest that the tool is well suited to support clinical decision-making where precise quantification is critical. Nevertheless, we acknowledge that higher spatial accuracy would further enhance clinical confidence, and future work will focus on optimizing the algorithm to improve IoU while maintaining its superior quantitative performance.

Beyond performance metrics, our method also holds potential for deployment in real-world settings. First, as an adjunctive tool in DR screening programs, it can automatically flag patients with early NVI, particularly in subtle cases that are easily overlooked on slit-lamp examination. Second, in ophthalmic clinics, the algorithm may serve as a decision-support system by providing objective and reproducible quantification of leakage, thereby assisting clinicians in determining treatment strategies such as when to initiate PRP or anti-VEGF therapy. Third, in research contexts, automated and standardized measurements could facilitate longitudinal monitoring of treatment responses and enable consistent cross-cohort comparisons. Together, these applications underscore the translational potential of our approach to bridge technical innovation with clinical care.

Algorithm limitations

To gain a deeper understanding of the model’s weaknesses and shortcomings, we carefully analysed the 20 images from the test set with the largest MAPE and the 20 with the lowest IoU. There were three images common to both groups, as illustrated in Figure 5. The IoU values for these three images were 14.9%, 13.6%, and 13.1%, while their MAPE values were 87.5%, 81.0%, and 55.1%, respectively. In these figures, the left column displays the manually annotated images, while the right column shows the model output.

Figure 5 Images with the poorest performance and corresponding manual annotations. These images represent cases where the model struggled the most in recognising and localising leakages. The selection criterion was the intersection of the 20 images from the test set with the highest MAPE and the 20 with the lowest IoU. The MAPE values for these three sets are 87.5%, 81.0%, and 55.1%, while the corresponding IoU values are 14.9%, 13.6%, and 13.1%, respectively. (A,B) Case 1. (C,D) Case 2. (E,F) Case 3. For each pair of images, the left image shows the zoomed-in manual annotations, and the right image shows the leakage locations detected by the model. IoU, intersection over union; MAPE, mean absolute percentage error.

For the pairs of images in Figure 5A,5B, as well as Figure 5E,5F, the model made similar errors. It tended to miss small, dim, or blurry leakages, even though it correctly detected the major leakages. One solution involves standardizing the brightness of the area around the pupil to prevent issues arising from dim images. Additionally, resizing the images based on the pupil area to a certain extent can help avoid the problem of having a too-small pupil and surrounding leakages. These adjustments can potentially enhance the model’s ability to detect finer details and improve overall performance.
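The two preprocessing ideas proposed here, standardising the brightness around the pupil and rescaling based on pupil size, could be sketched as follows. Everything in this snippet (the function name, the box format, and the margin factor) is hypothetical and not part of the current pipeline.

```python
import numpy as np

def standardise_roi(img, pupil_box, margin=0.5):
    """Hypothetical sketch: crop a peripupillary window around the
    detected pupil box (x, y, w, h) and stretch its intensities to
    [0, 255], so that dim late-phase frames do not hide faint leakage."""
    x, y, w, h = pupil_box
    pad = int(margin * max(w, h))  # keep a peripupillary margin
    roi = img[max(0, y - pad):y + h + pad,
              max(0, x - pad):x + w + pad].astype(float)
    lo, hi = roi.min(), roi.max()
    if hi == lo:
        return np.zeros_like(roi, dtype=np.uint8)  # flat region: no contrast
    return ((roi - lo) / (hi - lo) * 255).astype(np.uint8)

# toy 10x10 grayscale image with a hypothetical pupil box
img = np.arange(100, dtype=np.uint8).reshape(10, 10)
roi = standardise_roi(img, (4, 4, 2, 2))
```

A subsequent resize of the cropped window to a fixed size (so the pupil always occupies a similar pixel area) would address the too-small-pupil problem mentioned above.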

For the pair of images shown in Figure 5C,5D, the model only detected a portion of the leakages extending inward towards the pupil (a rare leakage scenario). In the future, we plan to incorporate a decision algorithm into the model to determine whether the leakages also extend into the pupil. If such an extension is detected, we will decrease the threshold T2 to increase the model’s search range for leakages extending into the pupil. This adjustment aims to ensure that the model captures the entirety of such inward leakages, thereby improving its detection accuracy in these specific cases.

The hyperparameters of the algorithm were optimized during the training process and remained fixed during our evaluation. In clinical practice, users can manually adjust them if needed. Future work can explore the generalizability of the algorithm’s performance across different populations. Additionally, we plan to investigate the potential for improving performance in individual cases through automatic, semi-automatic, or fully manual hyperparameter adjustments using a custom visual interface. For instance, we may implement automatic hyperparameter tuning for test samples.


Conclusions

DR is a serious public health issue, and neovascular glaucoma is one of its most severe complications. To assess the iris leakage that precedes it more accurately and efficiently, we have developed a deep learning-based algorithm capable of automatically quantifying peripupillary leakage from IFA images. This not only significantly reduces the complexity and error of manual operations but also improves the efficiency and accuracy of assessments.

Our algorithm addresses a long-standing challenge in standardising the measurement of leakage circularity. By leveraging deep learning, we ensure that each measurement is performed according to consistent and objective criteria, minimising human bias and variability. Moreover, the algorithm is more than just a statistical tool; it precisely locates leakage positions and intuitively annotates them on the images, which is greatly beneficial for physicians in further analysis and treatment.

Importantly, while prior studies (5, 7, and 8) have indicated a strong relationship between pupillary margin leakage and the severity of PDR, the specific circularity biomarker proposed in our study has not yet been clinically validated as an independent predictor of DR severity or disease progression. We acknowledge this limitation and propose this biomarker as a promising, quantifiable indicator for future validation. Its potential for objective longitudinal monitoring warrants further clinical investigation.

Overall, the deep learning algorithm that we developed provides a powerful and reliable tool for automated IFA analysis, and it holds promise not only for improving diagnostic accuracy and workflow efficiency but also for advancing our understanding of disease progression in DR.


Acknowledgments

The authors would like to thank Tianjin Medical University Eye Hospital and the University of the West of England for providing institutional support.

During the preparation of this work, the authors used ChatGPT (OpenAI) to improve the readability and language of the manuscript. After using this tool, the authors carefully reviewed and edited the content as needed and took full responsibility for the content of the published article.


Footnote

Reporting Checklist: The authors have completed the CLEAR reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-480/rc

Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-480/dss

Funding: This study was supported by the Independent and Open Project of Tianjin Key Laboratory of Retinal Function and Diseases (No. 2020tjswmm004) and Tianjin Key Medical Discipline (Specialty) Construction Project (No. TJYXZDXK-037A).

Conflicts of Interest: The authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-480/coif). The authors have no conflicts of interest to disclose.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was approved by the Tianjin Medical University Eye Hospital Medical Ethics Committee (No. 2018KY-11 for the project “Application of Iris Fluorescein Angiography in Normal Individuals and Diabetic Patients”) and (No. 2016KY-24 for the project “Iris Fluorescein Angiography and Anti-VEGF Treatment Guidance for Diabetic Retinopathy”). This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. Written informed consent was obtained from all participants involved in the study.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Rodrigues GB, Abe RY, Zangalli C, Sodre SL, Donini FA, Costa DC, Leite A, Felix JP, Torigoe M, Diniz-Filho A, de Almeida HG. Neovascular glaucoma: a review. Int J Retina Vitreous 2016;2:26. [Crossref] [PubMed]
  2. Cui Y, Luo GW, Xie CF, Wen F, Huang SZ, Liu CJ, Guan TQ. Clinical value of iris fluorescein angiography in diagnosis of uveitis in Chinese with brown iris. Chin J Exp Ophthalmol 2012;30:625-8.
  3. Cui Y, Luo GW, Liu X, Wen F, Huang SZ, Lv L, Liu CJ, Guan TQ. Iris fluorescein angiography in diagnosing iris neovascularization in brown iris of Chinese. Chin J Pract Ophthalmol 2007;25:1094-7.
  4. Brancato R, Bandello F, Lattanzio R. Iris fluorescein angiography in clinical practice. Surv Ophthalmol 1997;42:41-70. [Crossref] [PubMed]
  5. Avery RL, Pearlman J, Pieramici DJ, Rabena MD, Castellarin AA, Nasir MA, Giust MJ, Wendel R, Patel A. Intravitreal bevacizumab (Avastin) in the treatment of proliferative diabetic retinopathy. Ophthalmology 2006;113:1695.e1-15. [Crossref] [PubMed]
  6. Zhao J, Huang H, Wang C, Yu M, Shi W, Mori K, Jiang Z, Liu J. Dual contrastive learning for synthesizing unpaired fundus fluorescein angiography from retinal fundus images. Quant Imaging Med Surg 2024;14:2193-212. [Crossref] [PubMed]
  7. Ke Y, Zhang H, Hei K, Shi Y, Li X, Zhang L. Application of iris fluorescein angiography in diabetic iridopathy and retinopathy in brown iris. Eur J Ophthalmol 2021; Epub ahead of print. [Crossref]
  8. Ding Y, Yan S, Wang L, Chen Y, Song T, Zhang L, Li Z, Yang Z, Tian B. Multimodal characteristics of iris neovascularization in patients with proliferative diabetic retinopathy. Chin J Ophthalmol Med 2021;11:8-13. (Electron Ed).
  9. Confalonieri F, Ngo HB, Petersen HH, Eide NA, Petrovski G. Iris racemose hemangioma assessment with swept-source optical coherence tomography angiography: A feasibility study and stand-alone comparison. J Clin Med 2022;11:6575. [Crossref] [PubMed]
  10. Sastry A, Ryu C, Jiang X, Ameri H. Visual Outcomes in Eyes With Neovascular Glaucoma and Anterior Segment Neovascularization Without Glaucoma. Am J Ophthalmol 2022;236:1-11. [Crossref] [PubMed]
  11. Zhao Q, Zhang H, Han JD, Zhang LL. Application of iris angiography combined with ultra-wide-field fundus fluorescein angiography in diabetic retinopathy. Zhonghua Yan Ke Za Zhi 2021;57:916-21. [Crossref] [PubMed]
  12. Terven J, Córdova-Esparza DM, Romero-González JA. A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach Learn Knowl Extr 2023;5:1680-716.
  13. Jocher G, Chaurasia A, Qiu J. Ultralytics YOLO. Available online: https://github.com/ultralytics/ultralytics
  14. Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, et al. Microsoft COCO: Common objects in context. In: Proc 13th Eur Conf Comput Vis (ECCV); 2014. p. 740-55.
  15. Willmott CJ, Matsuura K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim Res 2005;30:79-82.
  16. Jaccard P. The distribution of the flora in the alpine zone. New Phytol 1912;11:37-50.
Cite this article as: Zhu Y, Sun S, Han S, Chen J, Western D, Zhang L. Automated method for quantitative analysis of iris fluorescein angiography based on machine learning. Quant Imaging Med Surg 2026;16(1):68. doi: 10.21037/qims-2025-480
