What is your favorite gender MLM?: Gender Bias Evaluation in Multilingual Masked Language Models

This repository contains the dataset and implementation of our paper "What is your favorite gender MLM?: Gender Bias Evaluation in Multilingual Masked Language Models." Our work is distinguished from prior work by the following features:

Strengths

  • It provides a multilingual gender lexicon in English, German, Spanish, Portuguese, and Chinese.
  • It evaluates the gender bias of language models on any corpus in these five languages.
  • The evaluation corpus and the language model can be easily altered to assess gender bias.

Guideline

  1. Multilingual Gender Lexicon
  • The MGL for five languages (English, German, Spanish, Portuguese, and Chinese) is provided in the "eval_words" folder of the repository.
  • Each lexicon is encoded as a pickle file and organized by gender and language.
  • Each file serves as input when generating the sentence pairs used to evaluate the gender bias of language models (loaded as in the first sketch after this guideline).
  2. Lexicon-based and Model-based Sentence Extraction
  • Given the MGL from the "eval_words" folder, lexicon-based and model-based sentence extraction is performed by the "extract.py" file.
  • The evaluation corpus can be changed by modifying the arguments passed to this script.
  • The required arguments are the language of the corpus (and model), the male gender lexicon, the female gender lexicon, and the corpus.
  • The script first tokenizes the corpus, extracts the sentences containing gendered words, generates the paired sentences, and writes them in pickle format (see the extraction sketch below).
  • Sentence generation can also be run from a Jupyter Notebook, as shown in the "extraction_chn.ipynb" file.
  • An illustration of how this pipeline works is given in the main function of "extract.py".
  3. Multilingual Bias Evaluation Metrics
  • Using these sentence pairs, the Strict Bias Metrics that quantify the gender bias of language models are computed in the "MBE_Calculation.ipynb" file (see the scoring sketch below).
  • With roughly 30,000 sentences per language, our evaluation took less than 10 minutes per language.
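
Examples

The exact pickle file names inside the "eval_words" folder depend on the gender/language naming used in the repository; as a minimal sketch, assuming hypothetical files "eval_words/male_en.pkl" and "eval_words/female_en.pkl", a lexicon pair can be loaded like this:

```python
import pickle

# Hypothetical file names; check the "eval_words" folder for the
# actual gender/language naming used in this repository.
with open("eval_words/male_en.pkl", "rb") as f:
    male_lexicon = pickle.load(f)     # e.g. a list of male gendered words
with open("eval_words/female_en.pkl", "rb") as f:
    female_lexicon = pickle.load(f)   # e.g. the female counterparts

print(len(male_lexicon), "male terms,", len(female_lexicon), "female terms")
```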
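
The following is a simplified sketch of the lexicon-based extraction step that "extract.py" performs, not the script itself; the function and file names are illustrative, the two lexicons are assumed to be aligned index-by-index, and a language-specific tokenizer replaces the whitespace split used here:

```python
import pickle

def extract_pairs(sentences, male_lexicon, female_lexicon):
    """Keep sentences containing a gendered word and build the paired
    sentence by swapping the gendered term for its counterpart."""
    swap = dict(zip(male_lexicon, female_lexicon))  # man -> woman, ...
    swap.update(zip(female_lexicon, male_lexicon))  # ... and the reverse
    pairs = []
    for sent in sentences:
        tokens = sent.split()  # extract.py tokenizes per language
        for i, tok in enumerate(tokens):
            if tok.lower() in swap:
                swapped = tokens[:i] + [swap[tok.lower()]] + tokens[i + 1:]
                pairs.append((sent, " ".join(swapped)))
                break  # one swap per sentence in this sketch
    return pairs

corpus = ["the actor won an award .", "the weather was cold ."]
pairs = extract_pairs(corpus, ["actor", "man"], ["actress", "woman"])
with open("sentence_pairs.pkl", "wb") as f:  # written in pickle format
    pickle.dump(pairs, f)
```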
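
The Strict Bias Metrics themselves are defined in the paper and computed in "MBE_Calculation.ipynb"; the sketch below only illustrates the underlying masked-language-model comparison, assuming a HuggingFace model and a gendered word that maps to a single vocabulary token:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

name = "bert-base-multilingual-cased"  # any masked LM can be substituted
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)
model.eval()

def mask_fill_prob(template, word):
    """Probability the MLM assigns to `word` in the masked slot.
    Assumes `word` is a single token in the model's vocabulary."""
    text = template.replace("___", tokenizer.mask_token)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
    probs = logits[0, pos].softmax(dim=-1)
    return probs[tokenizer.convert_tokens_to_ids(word)].item()

# Compare the gendered fillers for one sentence frame.
p_m = mask_fill_prob("The ___ is a doctor.", "man")
p_f = mask_fill_prob("The ___ is a doctor.", "woman")
print("male-skewed" if p_m > p_f else "female-skewed")
```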

Contact
