Add .en model support #4

Open
Marble879 opened this issue Jan 4, 2024 · 0 comments
Labels
good first issue Good for newcomers

Comments

Owner

Marble879 commented Jan 4, 2024

User story

As a user, I want to be able to use the English-only .en models, so that I get better transcription performance on English audio.

Acceptance criteria

  • The system should be able to download .en models if they do not already exist.
  • The system should be able to use .en models that have already been downloaded.

Development information

model_handler.rs contains the code responsible for downloading models based on their name.

A model is downloaded as follows:

  1. Instantiate the model handler:
let m = model_handler::ModelHandler::new("tiny", "models/").await;
  2. The model handler then resolves the internal model name by looking up the user-supplied name in the MODEL_MAP constant:
const MODEL_MAP: phf::Map<&'static str, &'static str> = phf::phf_map! {
    "tiny" => "ggml-tiny",
    "base" => "ggml-base",
    "small" => "ggml-small",
    "medium" => "ggml-medium",
    "large" => "ggml-large",
};

impl ModelHandler {
    pub async fn new(model_name: &str, models_dir: &str) -> ModelHandler {
        let model_handler = ModelHandler {
            model_name: MODEL_MAP
                .get(&model_name.to_lowercase())
                .copied()
                .unwrap()
                .to_string(),
            models_dir: models_dir.to_string(),
        };
  3. The download function uses this name to download the model:
    async fn download_model(&self) -> Result<(), Box<dyn std::error::Error>> {
        if !self.is_model_existing() {
            self.setup_directory()?;
        }
        let base_url = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main";
        let response = reqwest::get(format!("{}/{}.bin", base_url, &self.model_name)).await?;
        let mut file =
            std::fs::File::create(format!("{}/{}.bin", &self.models_dir, &self.model_name))?;
        let mut content = std::io::Cursor::new(response.bytes().await?);
        std::io::copy(&mut content, &mut file)?;
        Ok(())
    }
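
To make it easier to see how the resolved name flows into the download, here is a minimal sketch that isolates the two format strings used in download_model. The helper name model_locations is hypothetical and does not exist in model_handler.rs; it only mirrors the string building shown above.

```rust
/// Hypothetical helper (not part of model_handler.rs): builds the remote URL
/// and the local file path from a resolved model name, using the same
/// format strings as `download_model`.
fn model_locations(base_url: &str, models_dir: &str, model_name: &str) -> (String, String) {
    // Remote file: <base_url>/<model_name>.bin
    let url = format!("{}/{}.bin", base_url, model_name);
    // Local file: <models_dir>/<model_name>.bin
    let path = format!("{}/{}.bin", models_dir, model_name);
    (url, path)
}
```

With the values from step 1 ("models/" as the directory and "ggml-tiny" as the resolved name), the local path comes out as "models//ggml-tiny.bin". Unix-like systems treat the repeated separator as a single one, so this is harmless there. The important point for this issue is that whatever value MODEL_MAP returns is used verbatim as the remote file name.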

Potential solution

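A possible solution is to add the .en variants to the MODEL_MAP constant in model_handler.rs. For example, if the user instantiates the ModelHandler with "tiny.en", a mapping should exist for: "tiny.en" => "ggml-tiny.en". Because download_model builds the URL as {base_url}/{model_name}.bin, the mapped value has to match the file name in the Hugging Face repository, which is ggml-tiny.en.bin for this model.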
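
A dependency-free sketch of the extended lookup could look like the following. It assumes the mapped values mirror the file names in the Hugging Face repository, which use a ".en" suffix (for example ggml-tiny.en.bin), and that .en variants exist for tiny, base, small, and medium but not for large. The function resolve_model_name is a hypothetical stand-in for MODEL_MAP, written as a plain match so it runs without the phf crate; in model_handler.rs the same key/value pairs would go into the phf_map! block instead.

```rust
/// Hypothetical stand-in for MODEL_MAP, extended with the English-only variants.
/// Returns None for unknown names instead of panicking on `unwrap()`.
fn resolve_model_name(model_name: &str) -> Option<&'static str> {
    // Lowercase first, as ModelHandler::new does before the lookup.
    match model_name.to_lowercase().as_str() {
        "tiny" => Some("ggml-tiny"),
        "tiny.en" => Some("ggml-tiny.en"),
        "base" => Some("ggml-base"),
        "base.en" => Some("ggml-base.en"),
        "small" => Some("ggml-small"),
        "small.en" => Some("ggml-small.en"),
        "medium" => Some("ggml-medium"),
        "medium.en" => Some("ggml-medium.en"),
        "large" => Some("ggml-large"),
        // Assumption: there is no English-only variant of the large model.
        _ => None,
    }
}
```

Returning an Option here is a design choice of the sketch, not something the issue asks for: it lets the caller report an unsupported name such as "large.en" instead of panicking in the constructor.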

Marble879 added the "good first issue" label Jan 4, 2024