This repository includes files of work for Suicidal Ideation detection based on social media dataset using semantic, contextual and graph neural network based hybrid approach

Jabeen211/Suicidal-Ideation-Detection-based-on-Social-Media-Dataset-using-Semantic-Contextual-and-Graph-Neura

Suicidal Ideation Detection based on Social Media Dataset using Semantic, Contextual and Graph Neural Network based Hybrid Approach

This project aims to develop a system that detects suicidal ideation (SI) in posts from Facebook, Twitter and Reddit using Natural Language Processing (NLP) and Deep Learning (DL) models, including Long Short-Term Memory (LSTM) and Graph Neural Network (GNN) models. We develop two pipelines. The first is LSI-based: Latent Semantic Indexing (LSI) topic modeling is performed on the data, and the LSI output is embedded with word2vec. The original data is also embedded with Bidirectional Encoder Representations from Transformers (BERT). The concatenated word2vec and BERT embeddings are used as input to an LSTM to detect SI. In the second pipeline, we combine lexical features with a graph neural network to build a psycholinguistic knowledge-guided model for SI detection. We employ LIWC to extract psycholinguistic features from the collected and pre-processed text data. The LIWC features are used to build a graph using k-nearest neighbours, and a graph neural network is then applied to that graph for SI detection. The system aims to identify individuals who may be at risk of suicide and to contribute to suicide prevention and to policy making on suicide prevention.

Dataset

We collect a total of 785 posts, of which 386, 321 and 78 are from Reddit, Facebook and Twitter, respectively. We crawl and scrape data from those platforms with search keywords such as "suicide", "suicidal", "self injury", "self harm" and many more related words. The collected data is annotated by one behavioural scientist as 'YES' for suicidal and 'NO' for non-suicidal posts. In total, 405 posts are annotated as 'YES' and 380 posts as 'NO'.

Data Pre-processing

We carefully clean the textual data before using it for the SI detection task, since the data can be noisy. We pre-process the data for both approaches. The pre-processing steps include removing irrelevant characters, stemming, lemmatization and stop-word removal. Nonsensical characters are not recognizable to the machine learning models and make the text noisy, so they must be removed to ease the classification task. Emojis, URLs, punctuation, extra white space, numerals and user references are removed from the text using regular expressions. We apply the Porter stemmer and the WordNet lemmatizer from nltk to perform stemming and lemmatization and improve text categorization accuracy. Frequently occurring words that contribute little or nothing to text classification are treated as stop words. We use the nltk stop-words corpus to eliminate them so that the models concentrate on the relevant information.
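The regular-expression cleaning and stop-word removal described above could be sketched roughly as follows. This is a minimal illustration and not the repository's code: the patterns, the function name `clean_text` and the tiny `STOP_WORDS` set are assumptions made for the example (the project itself uses the nltk stop-words corpus), and stemming and lemmatization are left out so that the sketch needs only the standard library.

```python
import re
from typing import List

# Illustrative subset only; the README uses the full nltk stop-words corpus.
STOP_WORDS = {"a", "an", "the", "i", "is", "am", "to", "and", "of", "in", "it", "my"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # web links
MENTION_RE = re.compile(r"@\w+")                 # user references
NON_LETTER_RE = re.compile(r"[^a-z\s]")          # emojis, punctuation, numerals
WHITESPACE_RE = re.compile(r"\s+")               # runs of white space


def clean_text(text: str) -> List[str]:
    """Lowercase a post, strip noise with regexes, and drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    text = WHITESPACE_RE.sub(" ", text).strip()
    return [token for token in text.split() if token not in STOP_WORDS]
```

For example, `clean_text("@user I feel 100% lost 😞 see https://example.com now!!")` keeps only the tokens `feel`, `lost`, `see` and `now`. URLs and mentions are removed before the generic non-letter pattern so that fragments such as `https` or `user` do not survive as tokens.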

Feature Extraction

In pipeline 1, we combine word semantics (LSI) with the ability to preserve information across long text (LSTM) to produce an integrated LSI-LSTM model for SI detection. We employ TF-IDF to convert the text data into vectors. Applying TF-IDF before LSI ensures that the term-document matrix gives weight to the important words. The TF-IDF vectors are passed through LSI for topic modeling. The output from LSI is embedded with word2vec. The original text data is embedded with BERT. In pipeline 2, we employ LIWC to extract psycholinguistic features from the collected and pre-processed text data.
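To make the TF-IDF step concrete, here is a small standard-library sketch that turns tokenized posts into a document-term matrix. It assumes one plain variant of the weighting, term frequency `count / len(doc)` multiplied by `log(N / df)`; the README does not say which variant or library the authors used, and the function name `tfidf` is hypothetical. LSI would then apply a truncated singular value decomposition to this matrix, which is not shown here.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf(docs: List[List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Return (vocabulary, matrix) with one TF-IDF row per tokenized document."""
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    vocab = sorted(df)
    idf = {term: math.log(n_docs / df[term]) for term in vocab}

    matrix = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        matrix.append([counts[term] / total * idf[term] for term in vocab])
    return vocab, matrix
```

With this unsmoothed variant, a term that occurs in every document gets weight zero, which is one way such a matrix comes to emphasize the more distinctive words.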

Deep Learning Models

LSI-LSTM Model:

The concatenated word2vec and BERT embeddings are fed into the LSTM model as input for SI detection. By incorporating BERT, we add contextual word embeddings from a pre-trained language model.

GNN Model:

The LIWC features are used to build a graph using k-nearest neighbours. A graph neural network is then applied to the graph for SI detection.
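As a rough illustration of the graph-construction step, the sketch below links each post to its k nearest neighbours in feature space. Several details are assumptions, since the README does not state them: Euclidean distance, undirected edges, the value of k, and the function name `knn_edges`. The feature vectors stand in for per-post LIWC scores.

```python
import math
from typing import List, Sequence, Tuple


def knn_edges(features: Sequence[Sequence[float]], k: int) -> List[Tuple[int, int]]:
    """Connect every node to its k nearest neighbours (Euclidean distance).

    Edges are stored as (smaller_index, larger_index), so the graph is undirected.
    """
    edges = set()
    for i, fi in enumerate(features):
        # Distance from node i to every other node, closest first.
        dists = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)
```

The resulting edge list, together with the feature vectors as node attributes, is the kind of input a graph neural network library typically expects. This brute-force search computes all pairwise distances, which is acceptable for a dataset of a few hundred posts.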
