ZipROFS


ZipROFS is a FUSE file system that acts as a pass-through to another file system, except that it expands zip files like folders and allows direct, transparent access to their contents.
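As a rough illustration of the pass-through idea, the Python sketch below resolves a mount-relative path: it walks the components under the real root and, at the first component that is an actual zip file, treats the rest of the path as a member inside that archive. The helper names `resolve` and `read_file` are hypothetical; this is not ZipROFS's own code.

```python
import os
import zipfile


def resolve(root, path):
    """Hypothetical helper: map a mount-relative path to (real_path, member).

    member is None for plain files and folders on the underlying file system;
    otherwise real_path is a zip archive and member is the path inside it
    ("" for the archive's top level).
    """
    parts = [p for p in path.split("/") if p]
    real = root
    for i, part in enumerate(parts):
        real = os.path.join(real, part)
        # The first component that is a real zip file switches to archive mode.
        if os.path.isfile(real) and zipfile.is_zipfile(real):
            return real, "/".join(parts[i + 1:])
    return real, None


def read_file(root, path):
    """Read a file's bytes, looking inside zip archives transparently."""
    real, member = resolve(root, path)
    if member is None:
        with open(real, "rb") as f:
            return f.read()
    with zipfile.ZipFile(real) as zf:
        return zf.read(member)
```

For example, `read_file(root, "/test.zip/folder/subfolder/file.txt")` would return the bytes of that member, while `read_file(root, "/text.txt")` reads the plain file directly.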

We created a branch of ZipROFS to adapt it to the needs of mass spectrometry software. Our mass spectrometry records are stored in ZIP files:

File tree with zip files on NAS server:
 ├── brukertimstof
 │   └── 202302
 │       ├── 20230209_hsapiens_Sample_001.d.Zip
 │       ├── 20230209_hsapiens_Sample_002.d.Zip
 │       └── 20230209_hsapiens_Sample_003.d.Zip

...

With the original version of ZipROFS, these appear as folders ending in .d.Zip. However, the software requires folders ending in .d, like this:

Virtual file tree presented by ZipROFS:
 ├── brukertimstof
 │   └── 202302
 │       ├── 20230209_hsapiens_Sample_001.d
 │       │   ├── analysis.tdf
 │       │   └── analysis.tdf_bin
 │       ├── 20230209_hsapiens_Sample_002.d
 │       │   ├── analysis.tdf
 │       │   └── analysis.tdf_bin
 │       └── 20230209_hsapiens_Sample_003.d
 │           ├── analysis.tdf
 │           └── analysis.tdf_bin
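The renaming shown above can be expressed as a small pure function. This Python sketch assumes the suffix is matched case-insensitively (the listing uses `.d.Zip`) and that only names ending in `.d` plus a zip extension are renamed; `virtual_name` and `virtual_listing` are hypothetical names, and ZIPsFS's actual rules may differ.

```python
ZIP_SUFFIX = ".zip"  # compared case-insensitively, so ".Zip" matches too


def virtual_name(real_name):
    """Hypothetical: name shown in the mount for an entry of a real directory."""
    if real_name.lower().endswith(".d" + ZIP_SUFFIX):
        return real_name[: -len(ZIP_SUFFIX)]  # "X.d.Zip" -> "X.d"
    return real_name  # everything else keeps its name


def virtual_listing(real_names):
    """Map each virtual name to the real name backing it."""
    return {virtual_name(n): n for n in real_names}
```

Keeping the reverse map from `virtual_listing` lets a lookup of `..._001.d` find the archive `..._001.d.Zip` without guessing the spelling of the extension.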
 

A current problem is that computations run more slowly on ZipROFS than on conventional file systems.

The main cause lies in the closed-source shared library timsdata.dll. Reading proprietary mass spectrometry files with this library generates a huge number of file system requests, each of which has to cross the boundary between user space and the kernel. Performance also suffers because the files are not read sequentially.

To solve the performance problem, we

  • re-implement ZipROFS in C, as ZIPsFS.

  • intercept calls to the file API using the LD_PRELOAD technique, filter them, and cache directory listings: cache_readdir_stat
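The point of caching directory listings is that the same readdir and stat results are requested again and again. cache_readdir_stat does this in C through LD_PRELOAD; the Python sketch below only illustrates the memoization idea with `functools.lru_cache`, assuming the underlying tree does not change while it is being read. It is not how that library is implemented, and `cached_listdir`, `cached_stat` and `total_size` are hypothetical names.

```python
import functools
import os
import stat


@functools.lru_cache(maxsize=4096)
def cached_listdir(path):
    """Directory listing, computed once per path and then served from memory."""
    return tuple(sorted(os.listdir(path)))


@functools.lru_cache(maxsize=65536)
def cached_stat(path):
    """(mode, size) of a path, computed once per path."""
    st = os.stat(path)
    return st.st_mode, st.st_size


def total_size(root):
    """Sum of file sizes below root, using only the cached calls."""
    total = 0
    for name in cached_listdir(root):
        child = os.path.join(root, name)
        mode, size = cached_stat(child)
        total += total_size(child) if stat.S_ISDIR(mode) else size
    return total
```

A second traversal of the same tree is answered entirely from the two caches, so it issues no further listdir or stat calls to the file system.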

Dependencies

  • FUSE
  • fusepy

Limitations

  • Read only
  • Nested zip files are not expanded; they appear as regular files

Example usage

To mount, run ziprofs.py:

$ ./ziprofs.py ~/root ~/mount -o allowother,cachesize=2048

Example results:

$ tree root
root
├── folder
├── test.zip
└── text.txt

$ tree mount
mount
├── folder
├── test.zip
│   ├── folder
│   │   ├── emptyfile
│   │   └── subfolder
│   │       └── file.txt
│   ├── script.sh
│   └── text.txt
└── text.txt
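Zip archives do not necessarily contain explicit entries for folders such as test.zip/folder, so a listing like the one above has to derive folders from member paths. A Python sketch of that derivation, assuming members use "/" as separator; `list_zip_dir` is a hypothetical name:

```python
import zipfile


def list_zip_dir(zf, inner=""):
    """Immediate children of folder `inner` inside an open ZipFile.

    Folders are derived from member paths, so "folder" is listed even when the
    archive only contains "folder/emptyfile" and no "folder/" entry.
    """
    inner = inner.strip("/")
    prefix = inner + "/" if inner else ""
    children = set()
    for name in zf.namelist():
        if name.startswith(prefix) and name != prefix:
            rest = name[len(prefix):]
            children.add(rest.split("/", 1)[0])  # keep only the first component
    return sorted(children)
```

With the test.zip from the example, `list_zip_dir(zf)` would give `["folder", "script.sh", "text.txt"]` and `list_zip_dir(zf, "folder")` would give `["emptyfile", "subfolder"]`.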

You can later unmount it using:

$ fusermount -u ~/mount

Or:

$ umount ~/mount

Full help:

$ ./ziprofs.py -h
usage: ziprofs.py [-h] [-o options] [root] [mountpoint]

ZipROFS read only transparent zip filesystem.

positional arguments:
  root        filesystem root (default: None)
  mountpoint  filesystem mount point (default: None)

optional arguments:
  -h, --help  show this help message and exit
  -o options  comma separated list of options: foreground, debug, allowother, async, cachesize=N (default: {})
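The -o value is a comma-separated mix of bare flags and key=value pairs. A minimal Python sketch of turning such a string into a dictionary follows; `parse_options` is a hypothetical name, storing bare flags as True is an assumption, and ziprofs.py may parse the string differently.

```python
def parse_options(opts):
    """Hypothetical: turn 'allowother,cachesize=2048' into a dict."""
    result = {}
    for item in opts.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        key, sep, value = item.partition("=")
        result[key] = value if sep else True  # bare flags become True
    return result
```

Values stay strings here, so a caller would convert them, e.g. `int(opts.get("cachesize", 1000))`.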

The foreground and allowother options are passed directly to FUSE.

The debug option prints the details of every syscall to stdout.

By default, ZipROFS disables async reads to improve performance: FUSE may reorder async syscalls, which heavily impacts read speed. If async reads are preferable, pass the async option when mounting.

The cachesize option sets the size of the in-memory zipfile cache; it defaults to 1000.
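A fixed-size cache of open zip files is commonly a least-recently-used (LRU) map. The Python sketch below assumes that eviction policy, which the text above does not specify; the class name `ZipCache` is hypothetical and only the default capacity of 1000 comes from the option description.

```python
import collections
import zipfile


class ZipCache:
    """Keep up to `capacity` ZipFile objects open, evicting the least recently used."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._open = collections.OrderedDict()  # path -> zipfile.ZipFile

    def get(self, path):
        zf = self._open.get(path)
        if zf is not None:
            self._open.move_to_end(path)  # mark as most recently used
            return zf
        zf = zipfile.ZipFile(path)  # parses the central directory once
        self._open[path] = zf
        if len(self._open) > self.capacity:
            _, oldest = self._open.popitem(last=False)  # least recently used
            oldest.close()
        return zf

    def close(self):
        for zf in self._open.values():
            zf.close()
        self._open.clear()
```

Reusing an open ZipFile avoids re-reading the archive's central directory on every request, at the cost of one open file handle per cached archive.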