
lhbelfanti/ahbcc






AHBCC: Adverse Human Behaviors Corpus Creator

Adverse Human Behaviors is a term coined to encompass all types of human behavior that affect one or more individuals physically, psychologically, or emotionally.

There are four main categories:

  • Hate speech
  • Depression and/or suicide attempts
  • Eating disorders
  • Illicit drug use

Application

This application serves as the orchestrator: it uses a docker-compose.yml file to connect the other two applications, GoXCrap and Binarizer, to the database managed by AHBCC.

The primary objective is to gather information from X (formerly Twitter) using GoXCrap. Each tweet is then manually evaluated with Binarizer to determine whether it discusses an Adverse Human Behavior. Finally, AHBCC creates a balanced corpus from the retrieved and categorized tweets.

Endpoints

To let GoXCrap save tweets into the database and Binarizer later retrieve them, this application exposes a set of endpoints, so that all database access is encapsulated in one place (this app).

Database

Tables: Entity Relationship Diagram

```mermaid
erDiagram
    tweets ||--o| tweets_quotes : ""
    tweets }o--|| search_criteria : ""
    tweets {
        INTEGER id PK
        TEXT hash
        TIMESTAMP posted_at
        BOOLEAN is_a_reply
        BOOLEAN has_text
        BOOLEAN has_images
        TEXT text_content
        TEXT[] images
        BOOLEAN has_quote
        INTEGER quote_id FK
        INTEGER search_criteria_id FK
    }
    tweets_quotes {
        INTEGER id PK
        BOOLEAN is_a_reply
        BOOLEAN has_text
        BOOLEAN has_images
        TEXT text_content
        TEXT[] images
    }
    search_criteria {
        INTEGER id PK
        TEXT name
        TEXT[] all_of_these_words
        TEXT this_exact_phrase
        TEXT[] any_of_these_words
        TEXT[] none_of_these_words
        TEXT[] these_hashtags
        TEXT language
        DATE since_date
        DATE until_date
    }
    users {
        INTEGER id PK
        TEXT name
    }
    categorized_tweets }o--|| tweets : ""
    categorized_tweets }o--|| users : ""
    categorized_tweets {
        INTEGER id PK
        INTEGER tweet_id FK
        INTEGER user_id FK
        BOOLEAN adverse_behavior
    }
```

Necessary files to start the database

To connect to the database, define a .env file in the root of the project containing the following environment variables:

```
DB_NAME=<Database name>
DB_USER=<Database username>
DB_PASS=<Database password>
```

Replace each <...> placeholder with the corresponding value. For example, DB_NAME=<Database name> becomes DB_NAME=ahbcc.