Pattern Mining From Unlabeled News Article Dataset Using Semi-Supervised Learning
Abstract
Text classification is one of the prominent tasks in Natural Language Processing, and the amount of textual data is growing rapidly day by day. Because memory and computational power are limited, there is a pressing need to build knowledge models that can extract the information contained in this growing data. One example of such rapidly growing data is the news articles produced daily by a vast number of news publication platforms. In this work, we introduce an automated approach for detecting target events in textual news articles and extracting several types of information from them. The target events are violent incidents, and multiple event labels can be detected within a single article. To ensure reliable classification and event pattern analysis, we adopted a semi-supervised approach in which a predictive model trained on a small volume of data is used to amplify the dataset, eventually gathering enough data to train a deep network. The event categories detected are Murder, Rape, Kidnap, Clash, Suicide, and Teen Suicide. We experimented with multiple feature extraction techniques, namely N-gram, TF-IDF, Word2Vec, fastText, and BERT, of which the BERT-based classifier achieved the highest accuracy, 92% on the provided test set. Using the best-performing model, we conducted a trend and pattern analysis on five years of time-series data, which reveals notable insights into factors that are related to or affected by these violent events across various units of time.
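The dataset amplification step described in the abstract can be illustrated with a minimal self-training (pseudo-labeling) sketch. This is not the thesis implementation: the small Naive Bayes classifier stands in for the unspecified "small volume data predictive model", the task is simplified to one label per article, and the names `train_nb`, `predict_proba`, `self_train` and the `threshold` and `max_rounds` parameters are hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Whitespace tokenizer on lowercased text (an assumption, not the thesis preprocessing).
    return text.lower().split()


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model and return its count tables."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_proba(model, doc):
    """Return a dict mapping each class label to its posterior probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for tok in tokenize(doc):
            if tok in vocab:  # tokens never seen in training are ignored
                # Add-one (Laplace) smoothing over the vocabulary.
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {label: v / norm for label, v in exp_scores.items()}


def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Grow a labeled set by pseudo-labeling confident predictions.

    labeled:   list of (text, label) pairs (the small seed set)
    unlabeled: list of texts
    Returns (final model, amplified labeled set, texts left unlabeled).
    """
    docs = [text for text, _ in labeled]
    labels = [label for _, label in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train_nb(docs, labels)
        remaining, added = [], 0
        for doc in pool:
            probs = predict_proba(model, doc)
            label, confidence = max(probs.items(), key=lambda kv: kv[1])
            if confidence >= threshold:
                # Confident prediction: accept it as a pseudo-label.
                docs.append(doc)
                labels.append(label)
                added += 1
            else:
                remaining.append(doc)
        pool = remaining
        if added == 0 or not pool:
            break
    return train_nb(docs, labels), list(zip(docs, labels)), pool


if __name__ == "__main__":
    seed = [("man killed in shooting", "Murder"),
            ("child abducted for ransom", "Kidnap")]
    _, amplified, leftover = self_train(seed, ["police say man killed"], threshold=0.6)
    print(amplified, leftover)
```

In a pipeline like the one the abstract outlines, the amplified set returned by such a loop would then serve as training data for the larger deep model. The confidence threshold trades dataset size against pseudo-label noise.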
Collections
- M.Sc Thesis/Project [151]