AN ENHANCED SWARM-BASED DECISION FRAMEWORK IN VISION TRANSFORMERS FOR LARGE-SCALE MULTIMEDIA STREAM PROCESSING ON CLOUD ENVIRONMENTS

ICTACT Journal on Image and Video Processing (Volume: 16, Issue: 1)

Abstract

The exponential growth of multimedia content in cloud environments has created a need for advanced, real-time processing techniques. Traditional deep learning models, while powerful, often face bottlenecks when handling high-dimensional streaming data. Conventional vision transformer architectures incur computational overhead and delayed decision-making when processing large-scale multimedia streams in distributed cloud systems, which degrades both latency and accuracy. This study proposes an improved swarm decision mechanism integrated into vision transformers (VT-SwarmNet) for efficient large-scale multimedia stream analysis. The approach combines swarm intelligence for dynamic token selection with transformer-based feature encoding. Data streams are pre-processed in the cloud using distributed computing, partitioned into manageable chunks, and processed in parallel. Swarm agents prioritize salient tokens, improving attention allocation and reducing redundant computation. Experiments on a large-scale multimedia dataset in a simulated cloud environment showed that VT-SwarmNet achieved 12.4% higher accuracy, 18.7% lower latency, and a 15.3% better F1-score than leading baseline methods. Integrating swarm-based decision-making reduced processing overhead while maintaining superior feature extraction.
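
The abstract does not state which swarm algorithm or saliency criterion VT-SwarmNet uses, so the following is only an illustrative sketch of swarm-driven token selection, not the authors' method. It assumes a standard particle swarm optimisation loop in which each agent holds one score per token and keeps the top-scoring `keep` tokens. The fitness function (mean token norm minus mean pairwise cosine similarity), the names `fitness` and `swarm_select_tokens`, and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np


def fitness(tokens, idx, redundancy_weight=0.5):
    """Hypothetical objective: reward salient (large-norm) tokens and
    penalise redundancy (mean pairwise cosine similarity) in the subset."""
    sel = tokens[idx]
    norms = np.linalg.norm(sel, axis=1)
    unit = sel / np.maximum(norms, 1e-12)[:, None]
    sim = unit @ unit.T
    k = len(idx)
    # Mean of the off-diagonal similarities; zero for a single token.
    redundancy = (sim.sum() - np.trace(sim)) / (k * (k - 1)) if k > 1 else 0.0
    return float(norms.mean() - redundancy_weight * redundancy)


def swarm_select_tokens(tokens, keep, n_agents=16, n_iters=30, seed=0):
    """Particle swarm search over per-token scores (assumed scheme).

    Each agent's position is a score vector of length n_tokens; the subset
    it proposes is the `keep` highest-scoring token indices.
    """
    rng = np.random.default_rng(seed)
    n = tokens.shape[0]

    def top_k(position):
        # Indices of the `keep` largest scores, sorted for a stable order.
        return np.sort(np.argpartition(-position, keep - 1)[:keep])

    def score(position):
        return fitness(tokens, top_k(position))

    pos = rng.random((n_agents, n))
    vel = np.zeros((n_agents, n))
    pbest = pos.copy()
    pbest_fit = np.array([score(p) for p in pos])
    g = int(np.argmax(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for _ in range(n_iters):
        r1 = rng.random((n_agents, n))
        r2 = rng.random((n_agents, n))
        # Inertia + pull towards personal best + pull towards global best.
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = pos + vel

        fit = np.array([score(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]

        g = int(np.argmax(pbest_fit))
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    return top_k(gbest), float(gbest_fit)


# Usage: 64 synthetic patch tokens of dimension 16, keep 16 of them.
tokens = np.random.default_rng(1).normal(size=(64, 16))
idx, best = swarm_select_tokens(tokens, keep=16)
print(len(idx), "tokens kept; fitness", round(best, 3))
```

In a transformer pipeline, only the tokens indexed by `idx` would be passed to the subsequent attention layers, which is where the reduction in redundant computation described above would come from under these assumptions.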

Authors

S. Vimala¹, D.K. Mohanty², Karthikeyan Thangavel³
¹Prathyusha Engineering College, India; ²Government B.Ed. Training College Kalinga, India; ³University of Technology and Applied Sciences, The Sultanate of Oman

Keywords

Vision Transformers, Swarm Intelligence, Multimedia Streaming, Cloud Computing, Deep Learning

Published By
ICTACT
Date of Publication
August 2025
Pages
3689 - 3695