[Catalog] Verify object-store compatible paths #8891

Open
snazy opened this issue Jun 21, 2024 · 0 comments
Labels
catalog Nessie Catalog / Iceberg REST

Comments

@snazy
Member

snazy commented Jun 21, 2024

Depending on the use case, object stores can throttle access to "hot spots": partitions that are identified by the leading characters of the object key (name/path).

One way around it is to introduce prefixes that distribute objects across multiple object store partitions (see Iceberg's impl for example).

Since the (default) Iceberg way is to construct the object key as a concatenation of storage-location + hash + (context +) file, the part that distributes the data comes after a "long-ish string" (namespaces + table name), which possibly eliminates the effect of the hash.
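To illustrate the layout described above, here is a minimal sketch that builds keys as storage-location + hash + file and measures how much of the key two objects share. The class `ObjectKeyLayout`, its methods, and the CRC32-based hash are hypothetical stand-ins for illustration only; they are not Iceberg's actual implementation or hash function, and the optional context part is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration, not Iceberg code.
final class ObjectKeyLayout {

  // Stand-in hash (assumption): CRC32 of the file name, rendered as 8 hex characters.
  static String hashOf(String fileName) {
    CRC32 crc = new CRC32();
    crc.update(fileName.getBytes(StandardCharsets.UTF_8));
    return String.format("%08x", crc.getValue());
  }

  // Key layout from the description: storage-location + hash + file.
  static String objectKey(String storageLocation, String fileName) {
    return storageLocation + "/" + hashOf(fileName) + "/" + fileName;
  }

  // Number of leading characters that two keys have in common.
  static int commonPrefixLength(String a, String b) {
    int max = Math.min(a.length(), b.length());
    int i = 0;
    while (i < max && a.charAt(i) == b.charAt(i)) {
      i++;
    }
    return i;
  }

  public static void main(String[] args) {
    String location = "s3://bucket/warehouse/ns1/ns2/table";
    String k1 = objectKey(location, "data-0001.parquet");
    String k2 = objectKey(location, "data-0002.parquet");
    System.out.println(k1);
    System.out.println(k2);
    // Both keys share at least the whole storage location before the hash starts.
    System.out.println("shared prefix length: " + commonPrefixLength(k1, k2));
  }
}
```

Every key of the table starts with the same storage location, so the hash only begins to differentiate keys after that shared prefix. If the object store partitions on the leading characters, the hash may come too late to help.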

To work around the latter, users set the write.data.path table property to something like s3://bucket/. While this solves the hot-spot issue, it introduces problems for file-based access checks.

We might want to update the file-based access checks in the S3-signer and related code to "ignore" the "randomizer part". Simply speaking: instead of doing a "simple" String.startsWith() check in o.p.catalog.service.rest.IcebergS3SignParams#verifyAndSign, we could leverage a regex. This idea is not fully thought through yet, though.
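A rough sketch of what the regex idea could look like, compared with a plain prefix check. Everything here is an assumption for illustration: the class `RandomizedPathCheck`, its method names, the split into a bucket root and a table context, and the shape of the randomizer segment (a single path component of 1 to 16 alphanumeric characters). It is not the actual code in `IcebergS3SignParams#verifyAndSign`, and the real hash layout produced by Iceberg may differ.

```java
import java.util.regex.Pattern;

// Hypothetical illustration, not Nessie code.
final class RandomizedPathCheck {

  // The "simple" check: the requested URI must start with the allowed location.
  static boolean prefixCheck(String allowedLocation, String requestedUri) {
    return requestedUri.startsWith(allowedLocation);
  }

  // Regex variant (assumption): bucket root, then one randomizer segment,
  // then the table context, then at least one more character for the file part.
  static Pattern randomizedPattern(String bucketRoot, String tableContext) {
    return Pattern.compile(
        Pattern.quote(bucketRoot)
            + "[0-9A-Za-z]{1,16}/"
            + Pattern.quote(tableContext)
            + ".+");
  }

  // Accept either the plain location or the randomized layout.
  static boolean isAllowed(String bucketRoot, String tableContext, String requestedUri) {
    return prefixCheck(bucketRoot + tableContext, requestedUri)
        || randomizedPattern(bucketRoot, tableContext).matcher(requestedUri).matches();
  }

  public static void main(String[] args) {
    String root = "s3://bucket/";
    String context = "ns/table/";
    String randomized = "s3://bucket/a1b2c3d4/ns/table/data/file.parquet";
    System.out.println(prefixCheck(root + context, randomized)); // false
    System.out.println(isAllowed(root, context, randomized));    // true
  }
}
```

Quoting the fixed parts with `Pattern.quote` keeps characters in bucket or table names from being interpreted as regex syntax. Anchoring the table context right after the randomizer segment means a URI for a different table under the same bucket is still rejected.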

@adutra adutra added the catalog Nessie Catalog / Iceberg REST label Jul 1, 2024