
number of files/file descriptors #2

Open
blp opened this issue Nov 21, 2023 · 7 comments
@blp
Member

blp commented Nov 21, 2023

At least a naive reading of Data Format suggests that, in this design, there will be one or more files per OrderedLayer/ColumnLayer. Since a Spine can contain an arbitrary number of these, we need to be careful not to proliferate them, unless we want to be in the business of not just buffer caching but fd caching as well. I believe that Linux has a hard limit of 65536 fds per process.
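The per-process fd limit is queryable at runtime. A minimal sketch (Unix-only, using Python's standard `resource` module; the exact numbers vary by system configuration):

```python
import resource

# Query this process's file-descriptor limits (RLIMIT_NOFILE).
# The soft limit is what open() actually enforces; the hard limit
# is the ceiling an unprivileged process may raise the soft limit to.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft fd limit: {soft}, hard fd limit: {hard}")

# A storage layer that keeps one fd open per layer file must keep
# the number of simultaneously open files well below the soft limit.
```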

@blp
Member Author

blp commented Nov 21, 2023

This might suggest that we should consider a container format that is essentially an embedded filesystem. But it's probably better if we don't have to.

@gz
Contributor

gz commented Nov 21, 2023

RocksDB has this problem too; FWIW, they have a setting for the maximum number of open files (and presumably resort to closing and reopening files once they go beyond this limit).

I'll add it to the doc.

@blp
Member Author

blp commented Nov 21, 2023

FWIW, older Microsoft Office files used the MS "compound file binary format", which contained an embedded FAT-like file system.

@mihaibudiu

Another problem with multiple files is that sequential access across multiple files != sequential access on disk. Perhaps for SSDs this does not matter as much.

@gz
Contributor

gz commented Nov 21, 2023

The assumption for writes is that the OS block allocator is good enough to allocate consecutive blocks (where possible) for writing the batches.

This is a nice thing about SPDK, which gives you more fine-grained control over this via its BlobFS library.

FWIW, RocksDB has a similar design, where the LSM tree potentially has to read from multiple files.

@gz
Contributor

gz commented Nov 22, 2023

I forgot to mention, one plus of the many-files approach is that once a file is written, it is immutable and no longer changes. I have heard twice (e.g., in a talk on Paimon and a talk from Rockset) that this is a nice property for distributed storage: you can move some of these files to cold storage (or send them around, etc.) whenever necessary, and you don't have to worry about them being modified concurrently by the pipeline.

@blp
Member Author

blp commented Nov 22, 2023

Since we're dealing with immutable data, we might find some value in the ability to refer to an object by its hash, e.g., naming a batch by a hash of its data. I don't yet have a clear idea of how valuable this is.
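As an illustration of what naming-by-hash could look like (the `batch_name` helper and the `batch-` prefix are hypothetical, not part of the design doc):

```python
import hashlib


def batch_name(data: bytes) -> str:
    """Name an immutable batch by the SHA-256 of its serialized contents.

    Because batch files never change after being written, the hash is a
    stable identity: two batches with identical bytes get the same name,
    and a corrupted file can be detected by rehashing it.
    """
    return "batch-" + hashlib.sha256(data).hexdigest()


name = batch_name(b"some serialized batch")
```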
