Supporting indexing of archives in the /files endpoint #484

Open
ml-evs opened this issue Sep 20, 2023 · 2 comments
Labels
status/has-concrete-suggestion This issue has one or more concrete suggestions spelled out that can be brought up for consensus. type/proposal Proposal for addition/removal of features. May need broad discussion to reach consensus.

ml-evs commented Sep 20, 2023

Currently, our file entry type has a 1-to-1 mapping with a file on disk. However, there are many cases where databases serve archive files that aggregate data on multiple structures. I would like to be able to index the contents of an archive file and list them as separate files entries, e.g.,

{"id": "archive_1_bs_1", "attributes": {"url": "http://example.com/archive.tar.gz", "name": "bs_1.bands", "relpath": "bandstructures/bs_1.bands"}}
{"id": "archive_1_bs_2", "attributes": {"url": "http://example.com/archive.tar.gz", "name": "bs_2.bands", "relpath": "bandstructures/bs_2.bands"}}

where a client can be smart enough to only download the archive once. Each file can then have a relationship with the structure that the data pertains to.
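
A minimal client-side sketch of the "download once" idea, assuming entries shaped like the two above: group the entries by url, fetch each archive a single time, and then read each member by its relpath. The helper names `group_by_archive` and `read_members` are hypothetical, the relpath attribute is the one proposed in this issue (not part of the current spec), and the download step is left out so that the example stays self-contained.

```python
import io
import tarfile
from collections import defaultdict


def group_by_archive(entries):
    """Map each archive URL to the (id, relpath) pairs of the entries it holds."""
    groups = defaultdict(list)
    for entry in entries:
        attrs = entry["attributes"]
        # "relpath" is the proposed (not yet standardised) attribute.
        groups[attrs["url"]].append((entry["id"], attrs["relpath"]))
    return dict(groups)


def read_members(archive_bytes, members):
    """Open an already-downloaded tar archive once and return {id: contents}."""
    contents = {}
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz or none).
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        for entry_id, relpath in members:
            fileobj = tar.extractfile(relpath)  # raises KeyError if missing
            if fileobj is not None:  # None for directories and similar members
                contents[entry_id] = fileobj.read()
    return contents
```

A client would call `group_by_archive` on the listed files entries, download each distinct URL once with whatever HTTP library it already uses, and pass the bytes to `read_members`.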

relpath or relative_path is not part of the current spec, but I think it could be useful and easy to add. The archiving mechanism/compression should be handled by our current fields (e.g., "media_type": "application/tar+gzip" for the archive above), but we will lose some info on the size/type of the file at relpath after extraction (although the description field seems to be the intended way to handle this anyway for files without defined MIME types).

If others agree this is useful then I am happy to concoct a PR.

@ml-evs ml-evs added type/proposal Proposal for addition/removal of features. May need broad discussion to reach consensus. status/has-concrete-suggestion This issue has one or more concrete suggestions spelled out that can be brought up for consensus. labels Sep 20, 2023
ml-evs commented Sep 20, 2023

cc @eimrek and @unkcpz

merkys commented Jun 9, 2024

Good idea. I wonder whether we can reuse JSON:API relationships to describe every archive member in the same way the archive itself would be described. We would certainly need to introduce a property for the relative path, as name is supposed to hold only the basename.
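
One possible reading of this suggestion, sketched in Python purely for illustration: each archive member is its own files entry that points back to the archive's files entry through a JSON:API relationship, with name holding only the basename and a separate property holding the relative path. The function name `member_entry`, the id scheme, the relpath property name, and the use of a "files" relationship key are all assumptions, not anything the spec currently defines.

```python
import posixpath


def member_entry(archive_id, relpath):
    """Build a hypothetical files entry for one member of an archive."""
    return {
        # Hypothetical id scheme: archive id plus the member's relative path.
        "id": f"{archive_id}/{relpath}",
        "type": "files",
        "attributes": {
            # name holds only the basename, as noted in the comment above.
            "name": posixpath.basename(relpath),
            # Proposed property for the path inside the archive.
            "relpath": relpath,
        },
        "relationships": {
            # Assumed relationship linking the member to its parent archive.
            "files": {"data": [{"type": "files", "id": archive_id}]},
        },
    }
```

With this layout the archive entry alone would carry url and media_type, and a client would follow the relationship from a member to find out which archive to download.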
