MLSP_LCP_Baseline

A lexical complexity prediction (LCP) baseline for the Multilingual Lexical Simplification Pipeline (MLSP) 2024 Shared Task, modelled as a linear regression on log-frequency. For each language, the model is fitted on the trial set with the log-frequency of the target as its only feature; when the target consists of multiple tokens, the minimum log-frequency over its tokens is used. We use frequencies from the wordfreq package where possible. Because wordfreq uses an incompatible tokenization for Japanese and provides no data for Sinhala, we use TUBELEX-JA for Japanese and Word-Frequency-List-for-Sinhala for Sinhala.
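
The model described above can be sketched without any dependencies. This is a minimal illustration, not the repository's code: it assumes whitespace tokenization, a hypothetical frequency floor (MIN_FREQ) for tokens missing from a list, and plain ordinary least squares with a single feature. The frequency sources are stand-in lookup functions rather than the real wordfreq, TUBELEX-JA or Sinhala-list APIs, and the names pick_source, target_log_freq, fit, train and predict are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# A frequency lookup maps a token to its relative frequency (0.0 if unseen).
FreqFn = Callable[[str], float]
Tokenizer = Callable[[str], List[str]]

# Hypothetical floor so that log() stays defined for tokens missing from a list.
MIN_FREQ = 1e-9


def pick_source(lang: str, dedicated: Dict[str, FreqFn], fallback: FreqFn) -> FreqFn:
    """Use a dedicated list where one is registered (e.g. Japanese, Sinhala),
    otherwise fall back to the general-purpose source (wordfreq in this README)."""
    return dedicated.get(lang, fallback)


def target_log_freq(target: str, freq: FreqFn, tokenize: Tokenizer = str.split) -> float:
    """Minimum log-frequency over the target's tokens, i.e. its rarest token."""
    tokens = tokenize(target)
    if not tokens:
        raise ValueError("empty target")
    return min(math.log(max(freq(tok), MIN_FREQ)) for tok in tokens)


def fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct log-frequencies")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train(trial: Sequence[Tuple[str, float]], freq: FreqFn,
          tokenize: Tokenizer = str.split) -> Tuple[float, float]:
    """Fit the regression on (target, gold complexity) pairs from a trial set."""
    xs = [target_log_freq(target, freq, tokenize) for target, _ in trial]
    ys = [complexity for _, complexity in trial]
    return fit(xs, ys)


def predict(model: Tuple[float, float], target: str, freq: FreqFn,
            tokenize: Tokenizer = str.split) -> float:
    """Predicted complexity: a linear function of the target's log-frequency."""
    slope, intercept = model
    return slope * target_log_freq(target, freq, tokenize) + intercept


if __name__ == "__main__":
    # Toy frequency table and trial data, invented for illustration only.
    table = {"the": 5e-2, "cat": 1e-4, "ubiquitous": 1e-6}
    freq = pick_source("en", dedicated={}, fallback=lambda t: table.get(t, 0.0))
    model = train([("the", 0.05), ("cat", 0.2), ("ubiquitous", 0.6)], freq)
    print(model)
    print(predict(model, "the ubiquitous cat", freq))
```

Because log is monotonic, the minimum log-frequency is simply the log-frequency of the rarest token, so in the toy example "the ubiquitous cat" is scored exactly like "ubiquitous". The tokenizer and frequency floor are placeholders; Japanese in particular would need a tokenizer compatible with TUBELEX-JA rather than whitespace splitting.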

Reproducing the baseline

The trained models and the baseline output are already included in the repository. You can reproduce them with the following steps.

Install the Git submodules for MLSP_Data, Word-Frequency-List-for-Sinhala, and tubelex:

git submodule init && git submodule update

Install the requirements:

python -m pip install -r requirements.txt

Run the baseline (both training and prediction):

bash experiments.sh

Links

MLSP shared task web site
shared task data repository
cleaned gold test data on Hugging Face
LLM-based lexical simplification baseline
