Adverse Human Behaviors Corpus Creator
Adverse Human Behaviors is a term created to encompass all types of human behaviors that affect one or more individuals in physical, psychological, or emotional ways.
There are four main categories:
- Hate speech
- Depression and/or suicide attempts
- Eating disorders
- Illicit drug use
This application (AHBCC) serves as the orchestrator: its docker-compose.yml file connects the other two applications, GoXCrap and Binarizer, to the database managed by AHBCC.
The primary objective is to gather information from X (formerly Twitter) using GoXCrap. Each tweet is then manually evaluated in Binarizer to determine whether it discusses an Adverse Human Behavior. Finally, AHBCC is in charge of creating a balanced corpus from the retrieved and categorized tweets.
So that GoXCrap can save tweets to the database and Binarizer can later retrieve them, this application exposes a set of endpoints that encapsulate all database access in one place.
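To make "balanced corpus" concrete, here is a minimal sketch in Python of one common way to balance labeled data: undersampling every label down to the size of the rarest one. This README does not state how AHBCC actually balances the corpus, so the strategy, the function name `balance_by_label`, and the `(tweet_id, adverse_behavior)` row shape are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def balance_by_label(rows, seed=0):
    """Undersample so every label keeps as many rows as the rarest label.

    `rows` is an iterable of (tweet_id, adverse_behavior) pairs. This row
    shape is hypothetical; it only mirrors two columns of categorized_tweets.
    """
    groups = defaultdict(list)
    for tweet_id, adverse_behavior in rows:
        groups[adverse_behavior].append((tweet_id, adverse_behavior))
    if not groups:
        return []

    # The rarest label decides how many rows each label may keep.
    target = min(len(group) for group in groups.values())

    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced = []
    for label in sorted(groups):
        balanced.extend(rng.sample(groups[label], target))
    return balanced


# Two adverse tweets and four non-adverse ones.
rows = [(1, True), (2, False), (3, False), (4, False), (5, True), (6, False)]
corpus = balance_by_label(rows)
```

With the sample rows above, `corpus` keeps both adverse tweets and a random pair of the non-adverse ones, so each label contributes two rows.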
Tables: Entity Relationship Diagram
erDiagram
tweets ||--o| tweets_quotes : ""
tweets ||--|| search_criteria : ""
tweets {
INTEGER id PK
TEXT hash
TIMESTAMP posted_at
BOOLEAN is_a_reply
BOOLEAN has_text
BOOLEAN has_images
TEXT text_content
TEXT[] images
BOOLEAN has_quote
INTEGER quote_id FK
INTEGER search_criteria_id FK
}
tweets_quotes {
INTEGER id PK
BOOLEAN is_a_reply
BOOLEAN has_text
BOOLEAN has_images
TEXT text_content
TEXT[] images
}
search_criteria {
INTEGER id PK
TEXT name
TEXT[] all_of_these_words
TEXT this_exact_phrase
TEXT[] any_of_these_words
TEXT[] none_of_these_words
TEXT[] these_hashtags
TEXT language
DATE since_date
DATE until_date
}
users {
INTEGER id PK
TEXT name
}
categorized_tweets ||--|{ tweets : ""
categorized_tweets ||--|{ users : ""
categorized_tweets {
INTEGER id PK
INTEGER tweet_id FK
INTEGER user_id FK
BOOLEAN adverse_behavior
}
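The columns of `search_criteria` resemble the fields of X's advanced search form. As an illustration, the Python sketch below turns one such row into a search query string using X's publicly documented search operators (quoted phrases, `OR` groups, `-word`, `lang:`, `since:`, `until:`). Whether GoXCrap builds its queries this way is not stated here; the `SearchCriteria` class and `build_query` function are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class SearchCriteria:
    """Hypothetical in-memory mirror of one search_criteria row."""
    name: str
    all_of_these_words: List[str] = field(default_factory=list)
    this_exact_phrase: str = ""
    any_of_these_words: List[str] = field(default_factory=list)
    none_of_these_words: List[str] = field(default_factory=list)
    these_hashtags: List[str] = field(default_factory=list)
    language: str = ""
    since_date: Optional[date] = None
    until_date: Optional[date] = None


def build_query(criteria: SearchCriteria) -> str:
    """Join the non-empty criteria into a single search query string."""
    parts = list(criteria.all_of_these_words)
    if criteria.this_exact_phrase:
        parts.append('"' + criteria.this_exact_phrase + '"')
    if criteria.any_of_these_words:
        parts.append("(" + " OR ".join(criteria.any_of_these_words) + ")")
    parts.extend("-" + word for word in criteria.none_of_these_words)
    if criteria.these_hashtags:
        # Accept hashtags stored with or without the leading '#'.
        tags = ["#" + tag.lstrip("#") for tag in criteria.these_hashtags]
        parts.append("(" + " OR ".join(tags) + ")")
    if criteria.language:
        parts.append("lang:" + criteria.language)
    if criteria.since_date:
        parts.append("since:" + criteria.since_date.isoformat())
    if criteria.until_date:
        parts.append("until:" + criteria.until_date.isoformat())
    return " ".join(parts)


criteria = SearchCriteria(
    name="example",
    all_of_these_words=["feeling"],
    this_exact_phrase="so tired",
    any_of_these_words=["sad", "alone"],
    none_of_these_words=["movie"],
    these_hashtags=["mentalhealth"],
    language="en",
    since_date=date(2024, 1, 1),
    until_date=date(2024, 1, 31),
)
query = build_query(criteria)
# query == 'feeling "so tired" (sad OR alone) -movie (#mentalhealth) lang:en since:2024-01-01 until:2024-01-31'
```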
To connect to the database, we need to define a .env file in the root of the project. It should contain the following environment variables:
DB_NAME=<Database name>
DB_USER=<Database username>
DB_PASS=<Database password>
Replace each < ... > with the correct value. For example: DB_NAME=<Database name> --> DB_NAME=ahbcc.
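A quick way to catch a forgotten placeholder is to check the file before starting the services. The Python sketch below parses simple `KEY=value` lines and reports any of the three variables that is missing, empty, or still set to a `<...>` placeholder. It is a standalone illustration, not part of AHBCC; the helper names `parse_env` and `check_env` are hypothetical, and the parser deliberately ignores quoting and other .env features.

```python
REQUIRED_KEYS = ("DB_NAME", "DB_USER", "DB_PASS")


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_env(values):
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value:
            problems.append(f"{key} is missing or empty")
        elif value.startswith("<") and value.endswith(">"):
            problems.append(f"{key} still holds the placeholder {value}")
    return problems


# DB_USER was never replaced and DB_PASS was left out entirely.
sample = "DB_NAME=ahbcc\nDB_USER=<Database username>\n"
problems = check_env(parse_env(sample))
```

To check a real file, read it first, for example `check_env(parse_env(open(".env").read()))`.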