ACTIVE DEEP ENSEMBLE LEARNING FRAMEWORK FOR AUTOMATED SURGICAL VIDEO SEGMENTATION AND EFFICIENT ANNOTATION IN MINIMALLY INVASIVE PROCEDURES

ICTACT Journal on Image and Video Processing (Volume: 16, Issue: 3)

Abstract

Surgical video analysis has become an essential component of computer-assisted interventions and clinical documentation. The rapid growth of minimally invasive surgery has produced large volumes of surgical recordings that require detailed frame-level annotations for training intelligent systems. Manual annotation of surgical videos remains labor-intensive and time-consuming and often requires expert knowledge, making automated annotation a critical research direction in medical image analysis. Existing segmentation and annotation approaches struggle with complex surgical scenes, instrument occlusions, illumination variations, and tissue deformation. Conventional deep learning models rely on large labeled datasets, whereas surgical datasets are usually limited by the difficulty of manual labeling, which reduces the reliability and scalability of automated surgical video segmentation systems. To address these issues, this study proposes an Active Deep Ensemble Segmentation Network (ADES-Net) for automated surgical video segmentation and annotation. The framework integrates an ensemble of convolutional segmentation models with an active learning strategy that selectively identifies informative frames for annotation. The ensemble combines multiple deep segmentation networks that capture diverse spatial representations of surgical frames, while an uncertainty-driven active sampling mechanism prioritizes frames that require expert labeling, reducing redundant annotation effort. Feature representations extracted from each model contribute to robust segmentation predictions, and iterative learning cycles refine annotation quality. The experimental evaluation demonstrates that the proposed ADES-Net framework achieves superior segmentation performance across multiple metrics.
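The uncertainty-driven sampling idea described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it assumes each ensemble member outputs a foreground-probability map per frame, scores each unlabeled frame by the mean per-pixel predictive entropy of the ensemble-averaged output, and selects the highest-scoring frames for expert annotation. The function names and the toy data are hypothetical.

```python
import numpy as np

def frame_uncertainty(ensemble_probs):
    """Mean per-pixel binary predictive entropy of the ensemble average.

    ensemble_probs: array of shape (M, H, W) holding foreground
    probabilities from M ensemble members for one frame.
    """
    p = ensemble_probs.mean(axis=0)            # ensemble-averaged probability map
    eps = 1e-7                                 # avoid log(0)
    entropy = -(p * np.log(p + eps) + (1 - p) * np.log(1 - p + eps))
    return float(entropy.mean())

def select_frames(prob_maps_per_frame, budget):
    """Rank unlabeled frames by uncertainty, return the top-`budget` indices."""
    scores = np.array([frame_uncertainty(p) for p in prob_maps_per_frame])
    return list(np.argsort(scores)[::-1][:budget])

# Toy example: 4 frames, 3 ensemble members, 8x8 probability maps.
rng = np.random.default_rng(0)
confident = np.full((3, 8, 8), 0.95)           # members agree -> low entropy
uncertain = rng.uniform(0.4, 0.6, (3, 8, 8))   # near 0.5 -> high entropy
frames = [confident, uncertain, confident, uncertain]
picked = select_frames(frames, budget=2)       # selects the two uncertain frames
```

Ensemble disagreement and predictive entropy are interchangeable here; any score that is high where members conflict would serve the same frame-prioritization role.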
The model achieves a Dice similarity coefficient of 0.93, an IoU of 0.86, precision of 0.93, recall of 0.91, and an F1 score of 0.92 when trained with twenty-five annotated frames. These results indicate that the active ensemble mechanism effectively captures spatial and contextual features, reduces false positives, and improves boundary delineation. Compared with baseline methods such as U-Net, Attention U-Net, and DeepLabV3+, the proposed framework achieves improvements of 5–10% across all metrics, demonstrating enhanced segmentation reliability, efficiency, and robustness in automated surgical video annotation tasks.
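The metrics reported above follow their standard definitions from the confusion counts of a binary mask. As a reference, the sketch below computes Dice, IoU, precision, recall, and F1 on a pair of toy masks; it reproduces the definitions only, not the paper's reported numbers. Note that for binary segmentation the Dice coefficient and the F1 score coincide by construction.

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Standard overlap metrics for binary masks (pred, gt: boolean arrays)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # true positives
    fp = np.logical_and(pred, ~gt).sum()   # false positives
    fn = np.logical_and(~pred, gt).sum()   # false negatives
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return dict(dice=dice, iou=iou, precision=precision, recall=recall, f1=f1)

# Toy masks: the prediction overshoots the ground truth by one column.
gt = np.zeros((8, 8), dtype=bool); gt[2:6, 2:6] = True      # 16 true pixels
pred = np.zeros((8, 8), dtype=bool); pred[2:6, 2:7] = True  # 20 predicted pixels
m = segmentation_metrics(pred, gt)
# tp=16, fp=4, fn=0 -> IoU = 16/20 = 0.8, recall = 1.0, Dice = F1 = 32/36
```

A production implementation would guard the divisions against empty masks (tp + fp = 0 or tp + fn = 0); the toy case avoids that edge.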

Authors

Mariam Safar Mohammed Alshahrani¹, M.K. Jayanthi Kannan²
¹Digital Government Authority of KSA, Kingdom of Saudi Arabia; ²VIT Bhopal University, India

Keywords

Surgical Video Analysis, Deep Ensemble Learning, Active Learning, Automated Annotation, Medical Image Segmentation

Published By
ICTACT
Date of Publication
February 2026
Pages
3821 - 3829