Prediction of postoperative facial nerve function with preoperative magnetic resonance imaging in patients with vestibular schwannoma
Introduction
Vestibular schwannoma (VS) is the most common benign tumor of the internal auditory canal and the cerebellopontine angle region, accounting for approximately 90% of cerebellopontine angle tumors and 8% of intracranial tumors in adults, and is primarily treated with surgery (1). Safety and the preservation of facial nerve function are the main objectives of surgery, as patients with facial nerve dysfunction face a multitude of challenges, including aesthetic defects, psychological distress, sociological difficulties (2,3), and other issues such as abnormal taste sensation and dysfunctional lacrimation (4). With advanced microscopy, neuronavigation, intraoperative electrophysiological monitoring, and other technologies, complications from resection and mortality rates have significantly decreased (5). Therefore, the influence of human factors has gradually weakened with developments in surgical and monitoring technologies in recent decades. Currently, attempts are being made to predict postoperative facial nerve outcomes from objective examinations via artificial intelligence (AI).
With advancements in computer technology, machine learning (ML, an AI discipline) has become increasingly incorporated in biomedical fields such as clinical diagnosis, precision treatment, and tumor health monitoring (6). Some studies have investigated facial nerve function by using AI (7-9). Moreover, magnetic resonance imaging (MRI) is a necessary diagnostic method for detecting VS. Therefore, we used preoperative magnetic resonance (MR) images to predict postoperative facial nerve function via AI and compared the performance of the resulting learning models with those of neurosurgeons. We present this article in accordance with the TRIPOD + AI reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1-2501/rc).
Methods
Patient information and MR images
The information and MR images of 89 VS patients were obtained from the Department of Neurosurgery, Renmin Hospital of Wuhan University, from March 2018 to March 2025. We collected patient information including age, sex, postoperative facial nerve function, the neurosurgeon in charge, and the doctor responsible for patient monitoring. The MR images included T1, T2, and contrast-enhanced images on the cross-sectional, coronal, and sagittal planes. We chose the contrast-enhanced imaging plane showing the maximum tumor area, which tended to differ among the patients. We drew the tumor range manually using VGG Image Annotator (Oxford University, UK). The MR images were randomly divided into a training set and test set at a ratio of 8:2.
Surgical treatment
The chief surgeon for every operation was an expert in the Department of Neurosurgery. Intraoperative neurophysiological monitoring (IONM) was performed during the entire operation, including facial nerve and trigeminal nerve motor-evoked potential (MEP) monitoring; upper limb somatosensory-evoked potential (SSEP) monitoring; facial nerve and trigeminal nerve free-electromyography (free-EMG); facial nerve and trigeminal nerve trigger-electromyography (trigger-EMG); and brainstem auditory-evoked potential (BAEP) monitoring. IONM was performed by experienced monitoring physicians. On the premise of preserving facial nerve function as much as possible, the surgeon removed the tumor to the maximum possible extent.
Postoperative facial nerve function
We collected information on postoperative facial nerve function on the third day after surgery from the electronic medical records system. Facial nerve function was graded according to the House-Brackmann (HB) grading system (10). Because no patient had a grade worse than HB IV, we defined HB I–II as a good outcome and HB III–IV as a poor outcome.
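The dichotomization described above can be sketched as follows. This is a minimal illustration, not the authors' code: the grade counts are those reported in Table 1, while the function and variable names are hypothetical.

```python
from collections import Counter

# Patients per House-Brackmann grade, as reported in Table 1.
HB_COUNTS = {"I": 28, "II": 16, "III": 33, "IV": 12, "V": 0, "VI": 0}

GOOD_GRADES = {"I", "II"}    # good outcome as defined in the text
POOR_GRADES = {"III", "IV"}  # poor outcome as defined in the text


def dichotomize(grade: str) -> str:
    """Map an HB grade to the binary outcome label used in the study."""
    if grade in GOOD_GRADES:
        return "good"
    if grade in POOR_GRADES:
        return "poor"
    # Grades V and VI did not occur in the cohort, so no label is defined.
    raise ValueError(f"no outcome label defined for HB grade {grade!r}")


outcome_counts = Counter()
for grade, n_patients in HB_COUNTS.items():
    if n_patients > 0:
        outcome_counts[dichotomize(grade)] += n_patients

print(dict(outcome_counts))
```

With the Table 1 counts, the two outcome classes are nearly balanced at the patient level (44 good, 45 poor).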
Model training and testing
We constructed decision tree (DT), gradient boosting decision tree (GBDT), random forest (RF), K-nearest neighbors (KNN), Gaussian naïve Bayes (GNB), support vector machine (SVM), logistic regression (LR), single-channel input deep learning (Single DL), and dual-channel input deep learning (Dual DL) models to predict postoperative facial nerve function on the basis of preoperative MR images. We obtained 275 MR images from 89 VS patients (Figure 1). We then randomly assigned these images to the training and test sets at an 8:2 ratio and assessed model performance.
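A random 8:2 assignment of images can be sketched as below. This is an illustration under stated assumptions: the helper name `split_indices`, the fixed seed, and the rounding rule are hypothetical, and the source does not report the resulting set sizes.

```python
import random


def split_indices(n_items: int, train_fraction: float = 0.8, seed: int = 0):
    """Shuffle item indices and cut them into training and test lists."""
    indices = list(range(n_items))
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    rng.shuffle(indices)
    n_train = round(n_items * train_fraction)
    return indices[:n_train], indices[n_train:]


# 275 is the number of MR images reported in the text.
train_idx, test_idx = split_indices(275)
print(len(train_idx), len(test_idx))
```

Because several images can come from the same patient, shuffling the 89 patient identifiers instead of the 275 images is a common alternative that keeps each patient in a single set. The text describes an image-level assignment.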
Man-machine comparison
A total of 15 neurosurgeons from the Department of Neurosurgery, Renmin Hospital of Wuhan University, participated in the comparisons, including 3 expert surgeons (>20 years of neurosurgery experience), 5 senior surgeons (5–20 years of neurosurgery experience), and 7 novice surgeons (1–5 years of neurosurgery experience). These neurosurgeons independently reviewed the MR images and predicted the grades of postoperative facial nerve function (Figure 1). Afterward, we collected their answers and compared them with the answers generated by the AI models.
Ethics
The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethical Committee of the Renmin Hospital of Wuhan University (No. WDRY2025-K147) and the requirement for individual consent for this retrospective analysis was waived.
Statistical analysis
The performances of the DT, GBDT, RF, KNN, GNB, SVM, LR, Single DL, and Dual DL models, as well as those of the neurosurgeons, were evaluated in terms of accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). One-sample t-tests were used to compare each of these five metrics between the neurosurgeons as a whole and the DT model, and between each experience group (novices, seniors, and experts) and the DT model. Analysis of variance (ANOVA) was used to compare the metrics among novice, senior, and expert neurosurgeons; if heterogeneity of variance was present, the independent-samples Kruskal-Wallis test was used instead. A P value <0.05 was considered statistically significant.
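The one-sample t-test treats the DT model's metric as a fixed reference value and asks whether the mean of the readers' values differs from it. The sketch below computes only the t statistic and degrees of freedom with the standard library. The reader accuracies are hypothetical and are not data from the study; only the DT accuracy is taken from Table 3.

```python
import math
import statistics


def one_sample_t(values, reference):
    """Return the t statistic and degrees of freedom for H0: mean == reference."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    t_stat = (mean - reference) / (sd / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical accuracies (%) for five readers; NOT data from the study.
reader_accuracy = [55.0, 60.0, 50.0, 65.0, 58.0]
dt_accuracy = 80.39  # DT accuracy reported in Table 3

t_stat, dof = one_sample_t(reader_accuracy, dt_accuracy)
print(f"t = {t_stat:.2f}, df = {dof}")
```

The P value would then be read from a t distribution with `dof` degrees of freedom, for example with a statistics package such as SciPy (`scipy.stats.ttest_1samp`).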
Results
Clinical characteristics
Overall, 89 patients were included in this study, and there were 275 MR images obtained from these patients. The patient characteristics, including age, sex, and postoperative facial nerve function, are shown in Table 1. We analyzed the relationships between postoperative facial nerve function and basic information, including age, sex, the neurosurgeon in charge, and the doctor responsible for monitoring. None of these factors were found to be significantly associated with postoperative facial nerve function (Table 2).
Table 1
| Characteristic | MRI test (n=89) |
|---|---|
| Age, years | 54.06±13.07 |
| Sex | |
| Male | 39 (43.8) |
| Female | 50 (56.2) |
| Postoperative facial nerve function | |
| I | 28 (31.5) |
| II | 16 (18.0) |
| III | 33 (37.1) |
| IV | 12 (13.5) |
| V | 0 |
| VI | 0 |
Data are presented as number (%), or mean ± standard deviation. MRI, magnetic resonance imaging.
Table 2
| Characteristics | Postoperative facial nerve function (P value) |
|---|---|
| Age, years | 0.695 |
| Sex | 0.228 |
| Neurosurgeon | 0.183 |
| Monitoring doctors | 0.055 |
The performance of feature-extraction models and fitting prediction models
After feature extraction, we selected 8 image features and 5 quantitative features for model construction and feature screening. The image features were color principal component, K-means color clustering, histogram of oriented gradients (HOG), local entropy, texture, discrete cosine transform (DCT) coefficient, area, and near roundness; the quantitative features were 0–1 cross, roughness, contrast ratio, directionality, and flatness (Figure 2A).
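As an illustration of one of these features, the sketch below computes the Shannon entropy of the gray-level histogram of a small patch, which is one common definition of local entropy. The source does not state how its features were computed, so this definition, the function name, and the toy patches are all assumptions.

```python
import math
from collections import Counter


def patch_entropy(patch):
    """Shannon entropy (bits) of the gray-level histogram of a 2-D patch."""
    pixels = [value for row in patch for value in row]
    counts = Counter(pixels)
    total = len(pixels)
    # sum of p * log2(1/p) over the gray levels present in the patch
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Toy 4x4 patches with made-up gray levels (not taken from any MR image).
uniform_patch = [[7] * 4 for _ in range(4)]
mixed_patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]

print(patch_entropy(uniform_patch))
print(patch_entropy(mixed_patch))
```

A homogeneous patch has zero entropy, whereas a patch with four equally frequent gray levels has an entropy of 2 bits; sliding such a window across the tumor region would yield a local entropy map.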
The features were input to nine ML-based fitting prediction models, among which the DT model exhibited the best performance. The features selected by the DT model were the color principal component, HOG, texture, DCT coefficient, and flatness, with weights of 0.158, 0.225, 0.275, 0.126, and 0.215, respectively (Figure 2B).
The contributions of each feature
To explain the contributions of each feature, we performed SHapley Additive exPlanations (SHAP) analysis. The results demonstrated that texture, HOG, and flatness positively affected the performance of the prediction model and that DCT coefficient exerted negative effects. Furthermore, texture made the greatest contribution to the model, followed by HOG and flatness. DCT coefficients made a limited contribution, and color principal component did not make any contribution (Figure 3).
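SHAP attributions are based on Shapley values, which average a feature's marginal contribution over all subsets of the other features. The sketch below computes exact Shapley values for a toy additive payoff. The effect sizes are made up, and the code is neither the study's DT model nor its SHAP output.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values for a payoff function defined on feature subsets."""
    n = len(features)
    result = {}
    for f in features:
        others = [g for g in features if g != f]
        phi = 0.0
        for k in range(n):
            # weight of every subset of size k that excludes f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi += weight * (value_fn(s | {f}) - value_fn(s))
        result[f] = phi
    return result


# Toy additive payoff with made-up effects; NOT values from the study.
EFFECTS = {"texture": 0.3, "HOG": 0.2, "DCT": -0.1}


def toy_value(subset):
    return sum(EFFECTS[name] for name in subset)


phi = shapley_values(toy_value, list(EFFECTS))
print({name: round(v, 3) for name, v in phi.items()})
```

For an additive payoff, each Shapley value equals the feature's own effect, so a positive value pushes the prediction up and a negative value pushes it down, which is how the signs in a SHAP summary are read.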
Man-machine comparison
We invited neurosurgeons with different levels of experience to read the MR images and predict postoperative facial nerve function, and we then evaluated the performance of each model and of the neurosurgeons. Overall, 15 neurosurgeons were evaluated, including 3 expert, 5 senior, and 7 novice neurosurgeons. The DT model performed better than the neurosurgeons overall (Table 3 and Figure 4). No significant difference was observed between the DT model and the expert neurosurgeons in terms of sensitivity or NPV; however, the DT model outperformed the experts on the other metrics. Among the neurosurgeon groups, experts exhibited higher sensitivity, whereas the other indices did not differ significantly.
Table 3
| Model/neurosurgeon | Accuracy | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|
| DT | 80.39*** | 92.31*** | 76.32*** | 57.14*** | 96.67*** |
| GBDT | 78.43 | 76.92 | 78.95 | 55.56 | 90.91 |
| RF | 76.47 | 84.62 | 73.68 | 52.38 | 93.33 |
| KNN | 68.63 | 92.31 | 60.53 | 44.44 | 95.83 |
| GNB | 64.71 | 53.85 | 68.42 | 36.84 | 81.25 |
| SVM | 62.75 | 46.15 | 68.42 | 33.33 | 78.79 |
| LR | 54.90 | 92.31 | 42.11 | 35.30 | 94.12 |
| Single DL | 64.71 | 61.54 | 65.79 | 38.10 | 83.33 |
| Dual DL | 68.63 | 53.85 | 73.68 | 41.18 | 82.35 |
| Neurosurgeon (n=15) | | | | | |
| Novices (n=7) | 57.98 (41.60–74.37)‡ | 48.35 (32.60–64.11)‡‡‡ | 61.28 (35.72–86.83) | 34.64 (25.04–44.22)‡‡ | 75.66 (67.24–84.08)‡‡ |
| Seniors (n=5) | 57.25 (47.16–67.35)‡‡ | 66.15 (53.34–78.97)‡‡ | 54.21 (38.27–70.15)‡ | 33.89 (26.24–41.55)‡‡ | 82.57 (77.61–87.52)‡‡ |
| Experts (n=3) | 43.14 (21.91–64.37)‡ | 89.74 (78.71–100)†† | 27.19 (7.73–53.61)‡ | 29.90 (20.21–39.59)‡‡ | 87.55 (70.23–100) |
Data are presented as number (%) or median (interquartile range). The one-sample t-test was used to compare the accuracy, sensitivity, specificity, PPV, and NPV between neurosurgeon and DT. The ANOVA test was used to compare the accuracy, sensitivity, specificity, PPV, and NPV among novices, seniors and experts. If there was heterogeneity of variance, independent-samples Kruskal-Wallis test was used to compare the accuracy, sensitivity, specificity, PPV, and NPV among novices, seniors, and experts. The one-sample t-test was used to compare the accuracy, sensitivity, specificity, PPV, and NPV between novices, seniors, experts and DT. *, significant difference between DT and all neurosurgeons. ***, P<0.001. †, significant difference among novices, seniors, and experts. ††, P<0.01. ‡, significant difference between DT and neurosurgeons in different levels. ‡, P<0.05; ‡‡, P<0.01; ‡‡‡, P<0.001. ANOVA, analysis of variance; DL, deep learning; DT, decision tree; GBDT, gradient boosting decision tree; GNB, Gaussian naïve Bayes; KNN, K-nearest neighbors; LR, logistic regression; MRI, magnetic resonance imaging; NPV, negative predictive value; PPV, positive predictive value; RF, random forest; SVM, support vector machine.
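The five metrics in Table 3 all derive from a 2×2 confusion matrix. The sketch below shows the formulas. The counts are an assumption: they were back-calculated to reproduce the DT percentages in Table 3, and the source does not report the confusion matrix itself.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return the five metrics used in Table 3, as percentages."""
    return {
        "accuracy": 100 * (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
    }


# Assumed counts, back-calculated from the DT row of Table 3;
# they are NOT reported in the source.
dt = classification_metrics(tp=12, fp=9, tn=29, fn=1)
for name, value in dt.items():
    print(f"{name}: {value:.2f}")
```

With these counts, high sensitivity and NPV coexist with a modest PPV because the positive class is the smaller one (13 of 51 cases), so even a moderate number of false positives lowers the PPV noticeably.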
Discussion
With the development of AI, some studies have applied AI to the diagnosis and prediction of diseases. Moreover, some studies have attempted to predict the postoperative facial nerve function of patients with VS (7,8,11,12). There is considerable controversy regarding the factors influencing the outcome of facial nerve function.
The most debated factor is tumor size. One previous study revealed that 3 predictors (tumor diameter, tumor volume, and tumor surface area) were the most important prognostic factors for surgical outcomes (7). Studies have also demonstrated that the probability of facial nerve damage after surgery for tumors with a diameter >3 cm is 6 times greater than that for small-diameter tumors (13). However, other studies have demonstrated that some patients with large tumors exhibit good facial nerve outcomes (14), whereas in some patients with small tumors the function of the facial nerve is very poor or cannot be preserved; in some cases, even the anatomical structure cannot be preserved (15). In our study, when we used the area of the tumor to predict the function of the facial nerve, the area did not affect the outcomes. Therefore, we believe that the area of the VS does not affect postoperative facial nerve function.
The models in our study were built entirely from data extracted from MR images by software. In our results, texture made the greatest contribution to the model's prediction of postoperative facial nerve function. Texture analysis is a statistical method of quantifying gray-level intensity and can provide image information that the human eye cannot detect (16,17). On MRI, gray level is related to the density and water content of the tumor; we therefore speculate that the texture value reflects tumor density and water content and, in turn, the degree of adhesion to the facial nerve caused by tumor compression.
Langenhuizen et al. demonstrated the feasibility of ML models for predicting the long-term response of VS patients to stereotactic radiosurgery treatment by using radiomic tumor texture features (18). In our study, the texture of the tumor on MRI had the greatest effect on facial nerve outcomes after craniotomy. In other reports, additional imaging factors were related to postoperative short- and long-term facial nerve function in VS patients, such as the preoperative apparent diffusion coefficient and MR elastography (19-21). Our DT prediction model does not require examinations on special MRI machines; moreover, it can use contrast-enhanced MR images obtained at other hospitals. This is practical because a patient who has already undergone MRI does not need to repeat the examination.
IONM is necessary during VS surgery because the facial nerve has been compressed and deformed by the tumor and cannot be distinguished. In VS surgery, the monitoring of free-running EMG, facial MEPs, and direct nerve stimulation (DNS) supports the preservation of facial and vestibulocochlear function, which consequently preserves postoperative quality of life (22). The identification of the different IONM tests and the relationships between IONM and outcomes of the facial nerve still require further investigation, and monitoring by doctors is very important during VS surgery. Although some studies have investigated the relationships between the different characteristics of IONM and facial nerve function (23,24), it is still important for doctors to be responsible for IONM. In our study, there was no significant difference observed among monitoring physicians (Table 2).
With the assistance of IONM, neurosurgeons can distinguish facial nerves that have been moved and deformed by tumors during surgery. Although surgeons carefully perform these operations, a risk of damage to the facial nerve still exists. If postoperative facial nerve function can be predicted before surgery, different surgical plans can be developed to achieve better outcomes and meet patients’ expectations. In our study, the DT model could predict postoperative facial nerve function through preoperative MR images, and we believe that better performance can be achieved through more learning. Furthermore, this process can aid neurosurgeons in developing surgical plans and provide patients with more accurate expected outcomes.
Conclusions
VS is a benign tumor; however, damage to the facial nerve caused by the tumor or surgery cannot be ignored. Facial paralysis can cause a serious psychological burden to VS patients. Therefore, a thorough assessment of the patient’s facial nerve function must be performed before surgery, and the surgical plan may need to be adjusted during surgery to protect the facial nerve as much as possible. In our study, we attempted to predict postoperative facial nerve function with preoperative MR images by utilizing AI. The results revealed that the performance of the DT model was better than that of the neurosurgeons. The DT model could serve as a decision-support tool to help neurosurgeons predict postoperative facial nerve function.
Our study has several limitations. Firstly, the data were obtained from only one institution, and more data from other institutions are needed to verify our prediction model. Secondly, data concerning the performances of neurosurgeons and monitoring doctors are limited; thus, more data from neurosurgeons and monitoring physicians are needed to determine their performance levels. Thirdly, we used only the facial nerve grade on the third day after surgery. Although this grade does not represent final function, it reflects the quality of the surgery, which is the most important factor affecting facial nerve function.
Acknowledgments
We would like to thank Wu Yu for collecting part of the patient information.
Footnote
Reporting Checklist: The authors have completed the TRIPOD + AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1-2501/rc
Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1-2501/dss
Funding: This study was supported by
Conflicts of Interest: Both authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1-2501/coif). Both authors declared that this study was supported by the National Natural Science Foundation of China (No. 82473209). The authors have no other conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethical Committee of the Renmin Hospital of Wuhan University (No. WDRY2025-K147) and individual consent for this retrospective analysis was waived.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
1. Halliday J, Rutherford SA, McCabe MG, Evans DG. An update on the diagnosis and treatment of vestibular schwannoma. Expert Rev Neurother 2018;18:29-39.
2. Irving RM, Viani L, Hardy DG, Baguley DM, Moffat DA. Nervus intermedius function after vestibular schwannoma removal: clinical features and pathophysiological mechanisms. Laryngoscope 1995;105:809-13.
3. Kunert P, Smolarek B, Marchel A. Facial nerve damage following surgery for cerebellopontine angle tumours. Prevention and comprehensive treatment. Neurol Neurochir Pol 2011;45:480-8.
4. Chorobski J. The syndrome of crocodile tears. AMA Arch Neurol Psychiatry 1951;65:299-318.
5. Yaşargil MG. The internal acoustic meatus. J Neurosurg 2002;97:1014-5; discussion 1015-7.
6. Acs B, Rantalainen M, Hartman J. Artificial intelligence as the next step towards precision pathology. J Intern Med 2020;288:62-81.
7. Yu Y, Song G, Zhao Y, Liang J, Liu Q. Prediction of Vestibular Schwannoma Surgical Outcome Using Deep Neural Network. World Neurosurg 2023;176:e60-7.
8. Wang J. Prediction of postoperative recovery in patients with acoustic neuroma using machine learning and SMOTE-ENN techniques. Math Biosci Eng 2022;19:10407-23.
9. Przepiorka L, Kujawski S, Wójtowicz K, Maj E, Marchel A, Kunert P. Development and application of explainable artificial intelligence using machine learning classification for long-term facial nerve function after vestibular schwannoma surgery. J Neurooncol 2025;171:165-77.
10. Samii M, Matthies C. Management of 1000 vestibular schwannomas (acoustic neuromas): the facial nerve--preservation and restitution of function. Neurosurgery 1997;40:684-94; discussion 694-5.
11. Wang MY, Jia CG, Xu HQ, Xu CS, Li X, Wei W, Chen JC. Development and Validation of a Deep Learning Predictive Model Combining Clinical and Radiomic Features for Short-Term Postoperative Facial Nerve Function in Acoustic Neuroma Patients. Curr Med Sci 2023;43:336-43.
12. Song G, Li K, Wang Z, Liu W, Xue Q, Liang J, Zhou Y, Geng H, Liu D. A fully automatic radiomics pipeline for postoperative facial nerve function prediction of vestibular schwannoma. Neuroscience 2025;574:124-37.
13. Wiet RJ, Mamikoglu B, Odom L, Hoistad DL. Long-term results of the first 500 cases of acoustic neuroma surgery. Otolaryngol Head Neck Surg 2001;124:645-51.
14. Samii M, Gerganov VM, Samii A. Functional outcome after complete surgical removal of giant vestibular schwannomas. J Neurosurg 2010;112:860-7.
15. Seo JH, Jun BC, Jeon EJ, Chang KH. Predictive factors influencing facial nerve outcomes in surgery for small-sized vestibular schwannoma. Acta Otolaryngol 2013;133:722-7.
16. Kassner A, Thornhill RE. Texture analysis: a review of neurologic MR imaging applications. AJNR Am J Neuroradiol 2010;31:809-16.
17. Maani R, Yang YH, Kalra S. Voxel-based texture analysis of the brain. PLoS One 2015;10:e0117759.
18. Langenhuizen PPJH, Zinger S, Leenstra S, Kunst HPM, Mulder JJS, Hanssens PEJ, de With PHN, Verheul JB. Radiomics-Based Prediction of Long-Term Treatment Response of Vestibular Schwannomas Following Stereotactic Radiosurgery. Otol Neurotol 2020;41:e1321-7.
19. Freeman LM, Ung TH, Thompson JA, Ovard O, Olson M, Hirt L, Hosokawa P, Thaker A, Youssef AS. Refining the predictive value of preoperative apparent diffusion coefficient (ADC) by whole-tumor analysis for facial nerve outcomes in vestibular schwannomas. Acta Neurochir (Wien) 2024;166:168.
20. Duhon BH, Thompson K, Fisher M, Kaul VF, Nguyen HT, Harris MS, Varadarajan V, Adunka OF, Prevedello DM, Kolipaka A, Ren Y. Tumor biomechanical stiffness by magnetic resonance elastography predicts surgical outcomes and identifies biomarkers in vestibular schwannoma and meningioma. Sci Rep 2024;14:14561.
21. De Marco R, Morana G, Sgambetterra S, Penner F, Melcarne A, Garbossa D, Lanotte M, Albera R, Zenga F. Predicting the Consistency of Vestibular Schwannoma and Its Implication in the Retrosigmoid Approach: A Single-Center Analysis. Curr Oncol 2025;32:647.
22. Stankovic P, Wittlinger J, Georgiew R, Dominas N, Hoch S, Wilhelm T. Continuous intraoperative neuromonitoring (cIONM) in head and neck surgery-a review. HNO 2020;68:86-92.
23. Strauss C, Prell J, Rampp S, Romstöck J. Split facial nerve course in vestibular schwannomas. J Neurosurg 2006;105:698-705.
24. Rampp S, Illert J, Krempler K, Strauss C, Prell J. A-train clusters and the intermedius nerve in vestibular schwannoma patients. Clin Neurophysiol 2019;130:722-6.

