Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong structs dirent, dirent64 #2669

Open
safinaskar opened this issue Feb 4, 2022 · 16 comments
Open

Wrong structs dirent, dirent64 #2669

safinaskar opened this issue Feb 4, 2022 · 16 comments
Labels
C-bug Category: bug E-medium E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate.
Milestone

Comments

@safinaskar
Copy link

Your structs dirent, dirent64 are wrong at x86_64-unknown-linux-gnu. Their field d_name contains 256 bytes. But names can be bigger than 256 bytes in NTFS (in Linux). I was able to create NTFS partition in my Linux and then create file with name "ыыыы..." (250 times "ы", this is Cyrillic letter). Such name contains more than 256 bytes (if you have difficulties with reproducing, I can provide more details).

So, fix structs. I am not sure how exactly fix should be implemented. Maybe [c_char; 512], but in this case we need to double check what is true upper limit. Maybe we can simply put [c_char; 1] or [c_char; 0]. Of course, it would be perfect to make dirent unsized, but with machine word sized reference. Unfortunately, as well as I know, such unsized structs are not possible in stable Rust

@safinaskar safinaskar added the C-bug Category: bug label Feb 4, 2022
@devnexen
Copy link
Contributor

devnexen commented Feb 5, 2022

I m sorry to say but in fact dirent* struct are in fact correct, I invite you to look at the man pages/headers for confirmation. Note that libc is not a "smart wrapper" neither but reflects what the system libc are.
What seems more appropriate in your case is the getdents api usage.

@safinaskar
Copy link
Author

safinaskar commented Feb 5, 2022

@devnexen consider this code:

// This code for x86_64-unknown-linux-gnu
use std::ffi::CStr;
fn main() {
    unsafe {
        let dir = libc::opendir(b"/mnt/3\x00" as *const u8 as *const i8);
        assert_ne!(dir, std::ptr::null_mut());
        loop {
            let ent: *mut libc::dirent = libc::readdir(dir);
            if ent == std::ptr::null_mut() {
                break;
            }
            let name = CStr::from_ptr((*ent).d_name.as_ptr());
            println!("{}", name.to_bytes().len());
            println!("{:?}", name);
        }
    }
}

This code correctly handles file names bigger than 256 bytes. Does it invoke undefined behavior? I. e. is reading past formal struct size undefined behavior?

@safinaskar
Copy link
Author

getdents

getdents is not wrapped by libc crate, nor by glibc. So I will have to manually wrap it, this is error-prone

@devnexen
Copy link
Contributor

devnexen commented Feb 5, 2022

you can call it via syscall (the syscalls ids are present in libc) and map the linux_dirent* I think.

@safinaskar
Copy link
Author

I still think that we should research whether accessing dirent past its formal size is undefined behavior (possibly by reading https://rust-lang.github.io/unsafe-code-guidelines ). If yes, then we should fix or remove readdir function from libc, because it is impossible to use it correctly

@devnexen
Copy link
Contributor

devnexen commented Feb 5, 2022

or you can copy the result into a sized buffer. I disagree removing readdir honestly.

@safinaskar
Copy link
Author

or you can copy the result into a sized buffer

You mean read result from dirent to sized buffer? Okey, but is this reading undefined behavior?

@devnexen
Copy link
Contributor

devnexen commented Feb 5, 2022

Note that unsafe really means what it means, inside this block, rust safety and guarantees do not apply, it is the responsibility of code user to handle carefully possible side effects, use after free, stack overflow ... just like with language like C.

If anyone wants to chime in the discussion feel free :-)

@digama0
Copy link

digama0 commented Jun 4, 2023

@devnexen consider this code:

// This code for x86_64-unknown-linux-gnu
use std::ffi::CStr;
fn main() {
    unsafe {
        let dir = libc::opendir(b"/mnt/3\x00" as *const u8 as *const i8);
        assert_ne!(dir, std::ptr::null_mut());
        loop {
            let ent: *mut libc::dirent = libc::readdir(dir);
            if ent == std::ptr::null_mut() {
                break;
            }
            let name = CStr::from_ptr((*ent).d_name.as_ptr());
            println!("{}", name.to_bytes().len());
            println!("{:?}", name);
        }
    }
}

This code correctly handles file names bigger than 256 bytes. Does it invoke undefined behavior? I. e. is reading past formal struct size undefined behavior?

I don't have all the context to answer this question definitively, but assuming that readdir actually returned a contiguous allocation with a null-terminated d_name field of longer than 256 bytes, your code as written is suspicious and possibly UB (whether it is actually UB is rust-lang/unsafe-code-guidelines#134) due to creating a reference at (*ent).d_name.as_ptr(), but you can rewrite it to be definitely not UB by using addr_of!((*ent).d_name) instead.

@safinaskar
Copy link
Author

@digama0 , thanks a lot for answer. This is very big gotcha. This means lib::dirent should be fixed or at very least properly documented

@safinaskar
Copy link
Author

I just found that actual std implementation of ReadDir is even more careful. The authors reject addr_of!((*ent).d_name) as wrong and instead perform offset computations manually. See code starting from this line: https://github.com/rust-lang/rust/blob/42f28f9eb41adb7a197697e5e2d6535d00fd0f4a/library/std/src/sys/unix/fs.rs#L676

@digama0
Copy link

digama0 commented Jun 5, 2023

@RalfJung Could you comment on the correctness of the comment there? My take is that the reason addr_of!((*ptr).d_name) is said to be invalid is because the allocation may not actually be 256 bytes (plus the rest of the struct), and (at least for now) addr_of!((*ptr).d_name) asserts that the entire d_name: [c_char; 256] field is within an allocation. So an alternative way to fix the code would be to have a version of dirent with a d_name: [c_char; 0] VLA and using that to do the offset computation.

@RalfJung
Copy link
Member

RalfJung commented Jun 5, 2023

Yeah that comment sounds accurate for the current very conservative semantics we have for *ptr (where the pointer has to be fully aligned and dereferenceable). That comment is also a good argument for why we'd probably want a more relaxed semantics...

@tgross35 tgross35 added this to the 1.0 milestone Aug 29, 2024
@tgross35 tgross35 added the E-medium E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. label Aug 29, 2024
@tgross35
Copy link
Contributor

It sounds like we can improve things with a breaking change. If anyone has any proposals, feel free to put up a PR for 1.0.

@safinaskar
Copy link
Author

@tgross35 , obviously, perfect solution would be waiting for "thin cstr" and making thin cstr (not reference to thin cstr, but unsized thin cstr itself) last member of dirent

@tgross35
Copy link
Contributor

tgross35 commented Aug 29, 2024

Yeah, I do wish we had that available. Since we don't though, changing to [c_char; 0] is probably our best bet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: bug E-medium E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate.
Projects
None yet
Development

No branches or pull requests

5 participants