AN IMPROVED TOPIC MODELLING FRAMEWORK FOR DISCOVERING DOMINANT PCOS SYMPTOM FROM REDDIT POSTS

ICTACT Journal on Data Science and Machine Learning (Volume: 7, Issue: 1)

Abstract

Social media now plays a vital role in healthcare applications. Polycystic Ovarian Syndrome (PCOS) is a disorder that affects women of reproductive age, typically between 15 and 35 years old. Its symptoms include hormonal issues, irregular periods, weight gain, ovarian follicles, infertility, excessive hair growth on the skin, hair loss, acne, pimples, dark scars on the skin and depression. The main aim of this work is to discover the dominant PCOS symptom from the current symptoms reported by Reddit users. The unstructured data collected from Reddit users are pre-processed, and PCOS symptoms are extracted using Bag of Words and TF-IDF. A novel, improved topic modelling approach called Symptom Segmentation and Grouping (SSG), applied to Probabilistic Latent Semantic Analysis, Latent Dirichlet Allocation and BERTopic, is designed to reduce the dimensionality of the features and to map the sub-symptoms described by social media users onto the head symptoms used by gynecologists. Finally, the maximum likelihood probabilities produced by these algorithms are used to identify the dominant head symptoms and sub-symptoms in less time than traditional algorithms. Period issues emerged as the dominant symptom, with the highest probability (0.706) of all symptoms.
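The pipeline summarised above (pre-processing, Bag of Words and TF-IDF feature extraction, mapping sub-symptoms to head symptoms, and picking the dominant symptom by maximum likelihood probability) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' SSG method and does not implement PLSA, LDA or BERTopic. The lexicon `SUB_TO_HEAD`, the stop-word list, the smoothed IDF formula and the sample posts are all hypothetical, chosen only for illustration. The head-symptom probability here is a plain maximum likelihood (relative frequency) estimate over posts, so the toy numbers do not reproduce the reported 0.706.

```python
import math
import re
from collections import Counter

# Hypothetical sub-symptom -> head-symptom lexicon (illustrative only;
# NOT the SSG mapping used in the paper).
SUB_TO_HEAD = {
    "irregular": "periods issues",
    "periods": "periods issues",
    "acne": "skin issues",
    "pimples": "skin issues",
    "hairloss": "hair issues",
    "hirsutism": "hair issues",
    "weight": "weight gain",
    "infertility": "infertility",
    "depression": "depression",
}

# Hypothetical minimal stop-word list.
STOPWORDS = {"a", "also", "and", "are", "for", "have", "i", "my", "the", "with"}


def preprocess(post):
    """Lowercase the post, keep alphabetic tokens and drop stop words."""
    tokens = re.findall(r"[a-z]+", post.lower())
    return [t for t in tokens if t not in STOPWORDS]


def bag_of_words(docs):
    """Bag of Words: raw term counts for each document."""
    return [Counter(doc) for doc in docs]


def tf_idf(bows):
    """TF-IDF with an assumed smoothed IDF: log((1 + n) / (1 + df)) + 1."""
    n = len(bows)
    # Iterating a Counter yields each distinct term once, giving document frequency.
    df = Counter(term for bow in bows for term in bow)
    weights = []
    for bow in bows:
        total = sum(bow.values())
        weights.append({
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in bow.items()
        })
    return weights


def head_symptom_probabilities(docs):
    """Maximum likelihood estimate of P(head symptom): each post counts a
    head symptom at most once, then counts are normalised to sum to 1."""
    counts = Counter()
    for doc in docs:
        counts.update({SUB_TO_HEAD[t] for t in doc if t in SUB_TO_HEAD})
    total = sum(counts.values())
    return {head: c / total for head, c in counts.items()} if total else {}


# Hypothetical example posts (not from the paper's Reddit dataset).
posts = [
    "My periods are irregular and I have acne",
    "Irregular periods for months, also weight gain",
    "Struggling with hairloss and depression",
]
docs = [preprocess(p) for p in posts]
weights = tf_idf(bag_of_words(docs))
probs = head_symptom_probabilities(docs)
dominant = max(probs, key=probs.get)
print(dominant, round(probs[dominant], 3))  # periods issues 0.333
```

Counting each head symptom once per post keeps a user who writes both "irregular" and "periods" from being counted twice for the same complaint; a topic model such as LDA would instead supply these probabilities from its learned topic distributions.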

Authors

Santhi Selvaraj, Selva Nidhyananthan Sundaradhas, Umakanth Nagendran
Mepco Schlenk Engineering College, India

Keywords

PCOS, Bag of Words, TF-IDF, Social Media, LDA, LSA, BERT

Published By
ICTACT
Published In
ICTACT Journal on Data Science and Machine Learning
(Volume: 7, Issue: 1)
Date of Publication
December 2025
Pages
930 - 937