@hplt-project

HPLT - High Performance Language Technologies

A space that combines petabytes of natural language data with large-scale model training

Pinned

  1. OpusCleaner Public

    OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.

    Python, 45 stars, 13 forks

  2. OpusTrainer Public

    Curriculum training

    Python, 15 stars, 5 forks

Repositories

Showing 10 of 19 repositories
  • OpusTrainer Public

    Curriculum training

    Python, 15 stars, MIT license, 5 forks, 18 issues, 0 pull requests, updated Sep 14, 2024
  • warc2text-runner Public

    Scripts for parallelized extraction of plain text from WARC archives, aiming at a common and reproducible extraction approach.

    HTML, 3 stars, 0 forks, 5 issues, 0 pull requests, updated Sep 8, 2024
  • OpusCleaner Public

    OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.

  • data-analytics-tool Public

    Data Analytics Tool

    JavaScript, 7 stars, 1 fork, 0 issues, 0 pull requests, updated Sep 7, 2024
  • monotextor-slurm Public

    Set of scripts to run a monotextor-like pipeline on Slurm HPC clusters

    Rust, 2 stars, GPL-3.0 license, 0 forks, 0 issues, 0 pull requests, updated Sep 5, 2024
  • OpusPocus Public

    Marian machine translation training pipeline for thousands of models

    Python, 2 stars, 0 forks, 22 issues (4 need help), 2 pull requests, updated Sep 2, 2024
  • cc-download Public
    Shell, 0 stars, 0 forks, 0 issues, 0 pull requests, updated Aug 7, 2024
  • ia-download Public

    Internet Archive downloader

    Jupyter Notebook, 2 stars, 0 forks, 1 issue, 0 pull requests, updated Aug 7, 2024
  • HPLT-WP4 Public

    Information and pipelines for WP4: language model training

    Python, 1 star, CC0-1.0 license, 1 fork, 0 issues, 0 pull requests, updated Jul 11, 2024
  • sacremoses Public

    Python port of Moses tokenizer, truecaser and normalizer

    Python, 486 stars, MIT license, 59 forks, 26 issues (2 need help), 5 pull requests, updated May 26, 2024
