As a user, I want to skip upload of files that are already in the Registry #99

Open
jordanpadams opened this issue Apr 26, 2024 · 5 comments

@jordanpadams
Member

jordanpadams commented Apr 26, 2024

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

Node Operator

💪 Motivation

...so that I do not try to reload the data

📖 Additional Details

No response

Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

The easiest way to do this would be to search the registry by file path, by checksum, or both. We could do this with the LID/LIDVID, but I think that would add significant overhead.

Do we want to figure out some sort of auto-generated UUID for every file we upload to the cloud and add this as metadata? Maybe this is something we could then store in the Nucleus database and eventually in the registry. It could link throughout the whole system, agnostic of the LIDVID for the products themselves.
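To make the checksum and UUID ideas above concrete, here is a minimal sketch of what a per-file record could look like. The function names and the `file_uuid` field are hypothetical and only for illustration; nothing here reflects an existing DUM or Registry schema.

```python
import hashlib
import uuid
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_file(path):
    """Build a hypothetical metadata record for one file to be uploaded."""
    return {
        "path": str(Path(path)),
        "md5": md5_of_file(path),
        # Assumption: a random UUID, independent of any LID/LIDVID.
        "file_uuid": str(uuid.uuid4()),
    }
```

The checksum is stable across locations, so it can serve as a lookup key even when the path changes; the UUID would only be useful if it is persisted somewhere at upload time.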

@tloubrieu-jpl
Member

From the breakout meeting today:
The API needs to provide a simple end-point for DUM to retrieve files. The critical information to provide is:

file name/path
discipline node
md5sum

We are not sure yet what the best key for the API should be, either:

option 1: node + file path → returns md5sum, lidvid
option 2: md5sum → returns node, file path and lidvid

With option 1, a new end-point for the API could be:

/files/{node}/{file_path} which would return the {md5sum}

The issue is that the file_path is not always the same between the staging bucket and the location where the file is eventually archived.

@ramesh-maddegoda, @viviant100 could you investigate how the path on the archive bucket (ODR) is created from the path in the staging bucket?

@tloubrieu-jpl
Member

tloubrieu-jpl commented May 21, 2024

As discussed today, I will create a ticket to add an end-point to the API:
/files/{md5sum}, which would return 200 or 404.

We'll make that part of the Registry API.
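On the client side, a check against the proposed `/files/{md5sum}` end-point could look roughly like the sketch below. The base URL, the function name `is_registered`, and the `opener` parameter are assumptions for illustration; the end-point does not exist yet, and any authentication it may require is not shown.

```python
import urllib.error
import urllib.request


def is_registered(md5sum, base_url, opener=urllib.request.urlopen, timeout=10):
    """Return True if the API answers 200 for this checksum, False on 404.

    `base_url` is a placeholder for wherever the Registry API is served.
    `opener` is injectable so the logic can be exercised without a network.
    """
    url = f"{base_url.rstrip('/')}/files/{md5sum}"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; 404 is the expected "not in registry" answer.
        if err.code == 404:
            return False
        # Anything else (401, 500, ...) is left for the caller to handle.
        raise
```

Keeping 404 distinct from other failures matters here: a 404 means "go ahead and upload", while a 5xx or an authentication error says nothing about whether the file is already registered.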

@tloubrieu-jpl
Member

@collinss-jpl can you validate that the approach above works for you?

@collinss-jpl
Contributor

@tloubrieu-jpl Yes I think that would work. Does the Registry API use API Gateway though? Will the DUM client need to provide an authentication token with the request to the new endpoint?

@jordanpadams
Member Author

@collinss-jpl @tloubrieu-jpl just want to check on the status of this. Has this been implemented, and can it be tested at least locally?

For DUM, we should just make sure we throw a warning when the API is down, but then keep going through the processing so we aren't blocking the workflow when we have system downtime.
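The warn-and-continue behavior could be sketched as below. `should_skip_upload`, the `lookup` callable, and the logger name are hypothetical; `lookup` stands in for whatever function ends up querying the Registry API.

```python
import logging

logger = logging.getLogger("dum.registry_check")  # hypothetical logger name


def should_skip_upload(md5sum, lookup):
    """Return True only when the registry positively reports the file.

    `lookup` is any callable taking a checksum and returning a truthy value
    when the file is already registered. Network-level failures (OSError
    covers urllib's URLError, timeouts and connection errors) are logged as
    warnings and treated as "not registered" so the upload still proceeds.
    """
    try:
        return bool(lookup(md5sum))
    except OSError as err:
        logger.warning(
            "Registry lookup failed for %s (%s); uploading anyway", md5sum, err
        )
        return False
```

The trade-off is that during API downtime some files may be uploaded again, which seems to be the accepted cost of not blocking the workflow.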

Projects
Status: Release Backlog
Status: ToDo