
number of files/file descriptors #2

Open
blp opened this issue Nov 21, 2023 · 7 comments
@blp
Member

blp commented Nov 21, 2023

At least a naive reading of Data Format suggests that, in this design, there will be one or more files per OrderedLayer/ColumnLayer. Since a Spine can contain an arbitrary number of these, we need to be careful not to proliferate them, unless we want to be in the business of not just buffer caching but fd caching as well. I believe that Linux has a hard limit of 65536 fds per process.
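The per-process fd limit is queryable at runtime. A minimal sketch (Unix-only, using Python's standard `resource` module; the exact numbers vary by system configuration):

```python
import resource

# Query this process's file-descriptor limits (RLIMIT_NOFILE).
# The soft limit is what open() actually enforces; the hard limit
# is the ceiling an unprivileged process may raise the soft limit to.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft fd limit: {soft}, hard fd limit: {hard}")

# A storage layer that keeps one fd open per layer file must keep
# the number of simultaneously open files well below the soft limit.
```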

@blp
Member Author

blp commented Nov 21, 2023

This might suggest that we should consider a container format that is essentially an embedded filesystem. But it's probably better if we don't have to.

@gz
Contributor

gz commented Nov 21, 2023

RocksDB has this problem too; FWIW, they have a setting for the maximum number of open files (and presumably resort to closing and reopening files once they go beyond this limit).

I'll add it to the doc.

@blp
Member Author

blp commented Nov 21, 2023

FWIW, older Microsoft Office files used the MS "compound file binary format", which contained an embedded FAT-like file system.

@mihaibudiu

Another problem with multiple files is that sequential access across multiple files != sequential access on disk. Perhaps for SSDs this does not matter as much.

@gz
Contributor

gz commented Nov 21, 2023

The assumption for writes is that the OS block allocator is good enough to allocate consecutive blocks (where possible) for writing the batches.

This is a nice thing about SPDK, which gives you more fine-grained control over this via its BlobFS library.

FWIW, RocksDB has a similar design, where the LSM tree potentially has to read from multiple files.

@gz
Contributor

gz commented Nov 22, 2023

I forgot to mention, one plus of the many-files approach is that once a file is written, it is immutable and no longer changes. I have heard twice (e.g., in a talk on Paimon and a talk from Rockset) that this is a nice property for distributed storage: you can move some of these files to cold storage (or send them around, etc.) whenever necessary, and you don't have to worry about them being modified concurrently by the pipeline.

@blp
Member Author

blp commented Nov 22, 2023

Since we're dealing with immutable data, we might find some value in the ability to refer to an object by its hash, e.g., naming a batch by a hash of its data. I don't yet have a clear idea of how valuable this is.
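As an illustration of what naming-by-hash could look like (the `batch_name` helper and the `batch-` prefix are hypothetical, not part of the design doc):

```python
import hashlib


def batch_name(data: bytes) -> str:
    """Name an immutable batch by the SHA-256 of its serialized contents.

    Because batch files never change after being written, the hash is a
    stable identity: two batches with identical bytes get the same name,
    and a corrupted file can be detected by rehashing it.
    """
    return "batch-" + hashlib.sha256(data).hexdigest()


name = batch_name(b"some serialized batch")
```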
