Save snapshots to Azure Blob storage #401

Open
namtab-ma-i opened this issue Jun 5, 2024 · 3 comments

Comments

@namtab-ma-i

I use an in-memory DuckDB database, and my sources are all located in Azure Blob Storage. I configured my models to use the external materialization, writing to the same storage, and that works without issues. However, it seems that I can't use an external materialization for snapshots.

Is there any way to do this? Or is there a way to set the .duckdb path to be in the blob storage?

@namtab-ma-i
Author

namtab-ma-i commented Jun 5, 2024

External DuckDB code and error:

profiles.yml

```yaml
  outputs:
    dev:
      type: duckdb
      attach:
        - path: abfs://<my_container>/dev.duckdb
          alias: snapshot_db
      extensions:
        - azure
        - parquet
      filesystems:
        - fs: abfs
          anon: false
          account_name: "{{ env_var('ADLS_STORAGE_ACCOUNT') }}"
          account_key: "{{ env_var('ADLS_STORAGE_ACCOUNT_KEY') }}"
```

This results in:

Encountered an error:
Runtime Error
NotImplementedError: File mode not supported

At:
/IdeaProjects/ray-inference/.venv/lib/python3.9/site-packages/adlfs/spec.py(1957): __init__
/IdeaProjects/ray-inference/.venv/lib/python3.9/site-packages/adlfs/spec.py(1833): _open
/IdeaProjects/ray-inference/.venv/lib/python3.9/site-packages/fsspec/spec.py(1298): open

        

@jwills
Collaborator

jwills commented Jun 5, 2024

Yeah, snapshots require table mutations (i.e., UPDATE statements), which aren't supported for external materializations. Blob-store-based .duckdb files are supported, but alas, they are read-only. (The TL;DR here is that both of these features require the ability to do random writes in DuckDB, which is the one thing blob stores are not designed to do.)
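
To make the "table mutations" point concrete, here is a rough sketch of what a snapshot does to a table when a tracked row changes. It uses Python's built-in `sqlite3` purely as a stand-in for a local, writable database, and the table name `snap` and the `status` column are made up for illustration. The SQL that dbt actually generates is more involved; the point is only that the previous row version has to be updated in place.

```python
import sqlite3

# In-memory database used only as a stand-in for a writable local database.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE snap (id INTEGER, status TEXT, "
    "dbt_valid_from TEXT, dbt_valid_to TEXT)"
)
# Existing snapshot row: the current version has dbt_valid_to = NULL.
con.execute("INSERT INTO snap VALUES (1, 'pending', '2024-06-01', NULL)")

# The source row changed, so the snapshot has to do two things:
# 1. Close out the current version IN PLACE (this is the mutation).
con.execute(
    "UPDATE snap SET dbt_valid_to = '2024-06-05' "
    "WHERE id = 1 AND dbt_valid_to IS NULL"
)
# 2. Append the new version as the current row.
con.execute("INSERT INTO snap VALUES (1, 'shipped', '2024-06-05', NULL)")

rows = con.execute(
    "SELECT id, status, dbt_valid_from, dbt_valid_to "
    "FROM snap ORDER BY dbt_valid_from"
).fetchall()
for row in rows:
    print(row)
```

Step 2 alone could be expressed as writing a new file, but step 1 rewrites data that already exists, which is the part that has no equivalent in an external materialization.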

@jwills
Collaborator

jwills commented Jun 5, 2024

If you can copy the .duckdb file to a local filesystem, you can do all of these things to it and then write the resulting .duckdb file back out to blob storage once you're done. That's the best option I have for you at the moment.
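
A minimal sketch of that copy-down, run, copy-back workflow is below. It assumes an fsspec-style filesystem object (the `adlfs` package in the traceback above provides one for `abfs`) and uses only its `exists`, `get`, and `put` methods. The helper names `run_with_local_copy` and `dbt_snapshot`, and the `DUCKDB_PATH` environment variable, are hypothetical: the profile would need to read its `path` from `env_var('DUCKDB_PATH')` for this to work.

```python
import os
import subprocess
import tempfile
from typing import Callable


def run_with_local_copy(fs, remote_path: str, run: Callable[[str], None]) -> None:
    """Download remote_path, let `run` modify the local copy, then upload it.

    `fs` is any fsspec-style filesystem exposing exists/get/put.
    If `run` raises, the upload is skipped and the remote file is untouched.
    """
    with tempfile.TemporaryDirectory() as tmp:
        local_path = os.path.join(tmp, os.path.basename(remote_path))
        if fs.exists(remote_path):
            fs.get(remote_path, local_path)  # blob storage -> local disk
        run(local_path)                      # random writes happen locally
        fs.put(local_path, remote_path)      # whole-file upload back to blob


def dbt_snapshot(local_path: str) -> None:
    # Assumption: profiles.yml sets `path: "{{ env_var('DUCKDB_PATH') }}"`.
    env = dict(os.environ, DUCKDB_PATH=local_path)
    subprocess.run(["dbt", "snapshot"], check=True, env=env)


def main() -> None:
    import fsspec  # third-party; the abfs protocol needs adlfs installed

    fs = fsspec.filesystem(
        "abfs",
        account_name=os.environ["ADLS_STORAGE_ACCOUNT"],
        account_key=os.environ["ADLS_STORAGE_ACCOUNT_KEY"],
    )
    run_with_local_copy(fs, "abfs://<my_container>/dev.duckdb", dbt_snapshot)
```

Note that this does no locking, so two runs at the same time could overwrite each other's upload. It is only safe when a single job owns the file.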
