Abstract
Surgical video analysis has become an essential component in
computer-assisted interventions and clinical documentation. The rapid
growth of minimally invasive surgery has produced large volumes of
surgical recordings that require detailed frame-level annotations for
training intelligent systems. Manual annotation of surgical videos
remains a labor-intensive and time-consuming process that often
requires expert knowledge. As a result, the development of automated
annotation systems has become a critical research direction in medical
image analysis. Existing segmentation and annotation approaches face limitations in handling complex surgical scenes, instrument occlusions, illumination variations, and tissue deformation. Conventional deep learning models often rely on large labeled datasets, whereas surgical datasets usually remain limited due to the difficulty of manual labeling. This challenge limits the reliability and scalability of automated surgical video segmentation systems.
To address these issues, this study proposes an Active Deep Ensemble Segmentation Network (ADES-Net) for automated surgical video segmentation and annotation. The framework integrates an ensemble of convolutional segmentation models with an active learning strategy that selectively identifies informative frames for annotation. The ensemble architecture combines multiple deep segmentation networks that capture diverse spatial representations from surgical frames. An uncertainty-driven active sampling mechanism prioritizes the frames that require expert labeling, which reduces redundant annotation effort. Feature representations extracted from each model contribute to robust segmentation predictions, while iterative learning cycles refine annotation quality.
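The abstract does not specify the acquisition criterion used by the active sampling step, so the sketch below illustrates one plausible realization under assumed details: each ensemble member outputs a per-pixel foreground probability map, the maps are averaged, the mean predictive entropy of the averaged map serves as the frame-level uncertainty score, and the highest-scoring frames are queued for expert annotation. All names (frame_uncertainty, select_frames_for_annotation, budget) are illustrative and not taken from the paper.

```python
import numpy as np

def frame_uncertainty(prob_maps: np.ndarray, eps: float = 1e-8) -> float:
    """Mean per-pixel predictive entropy of the ensemble-averaged foreground map.

    prob_maps: array of shape (n_models, H, W) holding each ensemble member's
    foreground probabilities for a single frame.
    """
    p = prob_maps.mean(axis=0)  # ensemble-averaged probability map
    entropy = -(p * np.log(p + eps) + (1 - p) * np.log(1 - p + eps))
    return float(entropy.mean())

def select_frames_for_annotation(ensemble_probs: np.ndarray, budget: int) -> np.ndarray:
    """Pick the `budget` most uncertain frames from an unlabeled pool.

    ensemble_probs: array of shape (n_frames, n_models, H, W).
    Returns indices of the frames to forward to the expert annotator.
    """
    scores = np.array([frame_uncertainty(frame) for frame in ensemble_probs])
    return np.argsort(scores)[::-1][:budget]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy pool: 100 unlabeled frames, 3 ensemble members, 64x64 probability maps.
    pool = rng.random((100, 3, 64, 64))
    chosen = select_frames_for_annotation(pool, budget=25)
    print("Frames queued for expert annotation:", chosen)
```

A disagreement-based score (e.g., per-pixel variance across ensemble members) would be an equally reasonable acquisition function; the paper's exact mechanism may differ from this sketch.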
The experimental evaluation demonstrates that the proposed ADES-Net framework achieves superior segmentation performance across multiple metrics. When trained with twenty-five annotated frames, the model achieves a Dice similarity coefficient of 0.93, an IoU of 0.86, a precision of 0.93, a recall of 0.91, and an F1 score of 0.92. These results
indicate that the active ensemble mechanism effectively captures
spatial and contextual features, reduces false positives, and improves
boundary delineation. Compared with baseline methods such as U-Net,
Attention U-Net, and DeepLabV3+, the proposed framework achieves
improvements of 5–10% across all metrics, demonstrating enhanced
segmentation reliability, efficiency, and robustness in automated
surgical video annotation tasks.
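For reference, the reported evaluation metrics can be computed from binary predicted and ground-truth masks using the standard formulation sketched below; this is not code from the paper, and the helper name segmentation_metrics is illustrative. Note that for binary masks the Dice coefficient and the F1 score coincide by definition.

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> dict:
    """Dice, IoU, precision, recall, and F1 for binary masks of equal shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return {
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "iou": tp / (tp + fp + fn + eps),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall + eps),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    gt = rng.random((64, 64)) > 0.5
    pred = gt.copy()
    pred[:4] = ~pred[:4]  # flip a few rows to simulate prediction error
    print(segmentation_metrics(pred, gt))
```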
Authors
Mariam Safar Mohammed Alshahrani1, M.K. Jayanthi Kannan2
Digital Government Authority of KSA, Kingdom of Saudi Arabia1, VIT Bhopal University, India2
Keywords
Surgical Video Analysis, Deep Ensemble Learning, Active Learning, Automated Annotation, Medical Image Segmentation