Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API for normalizing addresses #114

Merged
merged 25 commits into from
Apr 20, 2023

Conversation

danielocfb
Copy link
Collaborator

We want to support symbolization in scenarios where addresses are captured on one system and then symbolized elsewhere (#61). In a nutshell, we are going to perform address "normalization" on the "local" system (the one that captured the addresses), send them to the remote, and then symbolize these normalized addresses on the remote.

This pull request introduces, among other things, the logic for normalizing addresses. Basically, we use the corresponding proc maps entry, parse the contents, identify the ELF files to which those addresses map, and then perform normalization based on the ELF information.

Please see individual commits for descriptions. Note that included here are all but the last commit from #110, so there is some overlap. I will rebase #110 on top of what is done here (and address the remaining comments).

The entire proc maps logic is Linux specific and yet nothing except for
LinuxMapsEntry contains the Linux prefix. Remove it from that type as
well -- it is unnecessary to encode this "information" in the name.

Signed-off-by: Daniel Müller <[email protected]>
This change refactors the parsing logic for proc maps slightly, allowing
for passing in of a filter function that allows for early discarding of
entries that we have no interest in dealing with. Because we move the
generalized existing filter functionality into the maps module, we make
it easier reusable for future changes (which will require it). On top of
that such early filtering shrinks the size of allocations used.

Signed-off-by: Daniel Müller <[email protected]>
So far our proc maps parsing logic read in the entire file before
allowing evaluation of individual entries (barring preliminary
filtering introduced earlier).
With this change we convert the functionality over to using a proper
lazy iterator.

Signed-off-by: Daniel Müller <[email protected]>
The proc maps special file contains a virtual address range that is
exclusive of the end address. Currently it is anybody's guess what
_end_address represents.
Switch to using ops::Range for representing the virtual address range of
entries.

Signed-off-by: Daniel Müller <[email protected]>
This change makes sure not to leak implementation details through the
public API accidentally by restricting the visibility of more types to
pub(crate).

Signed-off-by: Daniel Müller <[email protected]>
This change introduces some builder pattern inspired logic to the Mmap
type. This will later allow us to configure memory mappings a bit more.
Right now it's just syntactic sugar.

Signed-off-by: Daniel Müller <[email protected]>
With upcoming changes we are going to need to instantiate ElfParser and
Archive objects from existing memory mappings, which are meant to be
shared with other program entities.
In preparation for that, make both types with with Rc<Mmap> instead of a
Mmap.

Signed-off-by: Daniel Müller <[email protected]>
We are going to add another C source file for the creation of a shared
object that we want to work with. Rename the existing test.c to
test-exe.c to make it clear that this one is for an executable.

Signed-off-by: Daniel Müller <[email protected]>
This change introduces a shared object that we can use for testing
purposes. Currently it only contains a single exported dummy function.
The binary is added to the existing zip archive, because we will later
use it for testing.

Signed-off-by: Daniel Müller <[email protected]>
The SymbolType type has a logical default state: that for "unknown"
symbols. Implement Default accordingly.

Signed-off-by: Daniel Müller <[email protected]>
We don't need shared mappings, because we are not doing any shared
memory work. Hence, create memory mappings privately for the time being.

Signed-off-by: Daniel Müller <[email protected]>
In Rust "getters" are typically not prefixed with "get". Rename some
internal function accordingly.

Signed-off-by: Daniel Müller <[email protected]>
We want to support symbolization in scenarios where addresses are
captured on one system and then symbolized elsewhere (libbpf#61). In a
nutshell, we are going to perform address "normalization" on the "local"
system (the one that captured the addresses), send them to the remote,
and then symbolize these normalized addresses on the remote.
This change introduces the logic for normalizing addresses. Basically,
we use the corresponding proc maps entry, parse the contents, identify
the ELF files to which those addresses map, and then perform
normalization based on the ELF information.
I introduced a new public module, normalize. We have to think some more
about how best to expose this functionality, but this is a reasonable
starting point.
The normalization does not yet use advanced caching of sorts: each ELF
entry handled is parsed as we found it. That is not particularly
effective but not wrong. We can optimize further subsequently without
adjusting the API.

Signed-off-by: Daniel Müller <[email protected]>
As part of the address normalization, users may needs the build ID of
the binary in which an address lies along with the binary's path. That's
useful in cases where a remote server is queried which indexes binaries
by build ID, such as with debuginfod.
With this change we make sure to attempt to read the build ID of the ELF
binaries that we found and populate the corresponding information
inside UserAddrMeta.

Signed-off-by: Daniel Müller <[email protected]>
This change adds two tests for the user space address normalization
functionality.

Signed-off-by: Daniel Müller <[email protected]>
@codecov
Copy link

codecov bot commented Apr 17, 2023

Codecov Report

Patch coverage: 71.66% and project coverage change: -2.87 ⚠️

Comparison is base (97c45f3) 73.22% compared to head (5764032) 70.35%.

❗ Current head 5764032 differs from pull request most recent head eaa8cb1. Consider uploading reports for the commit eaa8cb1 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #114      +/-   ##
==========================================
- Coverage   73.22%   70.35%   -2.87%     
==========================================
  Files          19       23       +4     
  Lines        3862     4372     +510     
==========================================
+ Hits         2828     3076     +248     
- Misses       1034     1296     +262     
Impacted Files Coverage Δ
src/c_api/symbolize.rs 33.33% <ø> (ø)
src/dwarf/parser.rs 65.27% <ø> (-0.97%) ⬇️
src/dwarf/resolver.rs 79.32% <ø> (ø)
src/elf/cache.rs 92.20% <ø> (ø)
src/elf/types.rs 0.00% <ø> (ø)
src/gsym/resolver.rs 74.35% <ø> (ø)
src/kernel.rs 45.58% <ø> (ø)
src/ksym.rs 52.14% <0.00%> (ø)
src/c_api/normalize.rs 46.35% <46.35%> (ø)
src/symbolize.rs 60.59% <60.59%> (ø)
... and 10 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

This change restructures the existing c_api module by introducing a new
sub-module, symbolize, and moving the existing functionality into it.

Signed-off-by: Daniel Müller <[email protected]>
@danielocfb danielocfb force-pushed the topic/address-normalization branch 2 times, most recently from 12c9a92 to 81f3ec2 Compare April 17, 2023 22:53
@danielocfb danielocfb force-pushed the topic/address-normalization branch 4 times, most recently from 93d629f to 49d57c7 Compare April 18, 2023 22:00
Copy link
Member

@anakryiko anakryiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first batch about normalization to give you a chance to look through that

src/normalize/normalize.rs Outdated Show resolved Hide resolved
src/normalize/normalize.rs Outdated Show resolved Hide resolved
src/normalize/normalize.rs Outdated Show resolved Hide resolved
})??;
}

let meta_idx = if let Some(meta_idx) = meta_lookup.get(&entry.path) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just an idea for optimization: most of the time addr will end up in the same Elf file and most probably even the same segment, so it seems beneficial to remember last path+meta+idx and do a quick check. It's probably very beneficial in practice.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Yes, we can optimize more. Will probably do that as a follow up, though.

meta_idx
};

let normalized_addr = normalize_elf_addr(addr, &entry)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should have a private meta cache with also virtaddr -> fileoffset shift, to avoid linear search of matching PT_LOAD segments in ELF. Then at the end we can distill that to user-visible metadata

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will work on optimizing this functionality as follow up.

src/normalize/normalize.rs Show resolved Hide resolved
This change adds C API bindings for the address normalization
functionality.

Signed-off-by: Daniel Müller <[email protected]>
Currently the C API definitions are intermingled with everything Rust in
the documentation. That's causing more confusion than necessary. Let's
export the module instead. Doing so has no bearing on C code itself.

Signed-off-by: Daniel Müller <[email protected]>
src/normalize/normalize.rs Outdated Show resolved Hide resolved
src/normalize/normalize.rs Outdated Show resolved Hide resolved
include/blazesym.h Outdated Show resolved Hide resolved
Remove a bunch of unnecessary allocations when looking up addresses for
symbols.

Signed-off-by: Daniel Müller <[email protected]>
This change overhauls the Pid type slightly, giving it a nicer
conversion from u32 and adjusts users to rely on it.

Signed-off-by: Daniel Müller <[email protected]>
@d-e-s-o d-e-s-o force-pushed the topic/address-normalization branch 2 times, most recently from ce0908e to 36b6ac4 Compare April 20, 2023 00:08
@d-e-s-o
Copy link
Collaborator

d-e-s-o commented Apr 20, 2023

Addressed review comments.

Copy link
Member

@anakryiko anakryiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

besides the build_id in Unknown (and few nits), lgtm

src/c_api/normalize.rs Outdated Show resolved Hide resolved
src/c_api/normalize.rs Show resolved Hide resolved
src/normalize/normalize.rs Outdated Show resolved Hide resolved
@danielocfb danielocfb merged commit 8bb013b into libbpf:master Apr 20, 2023
@danielocfb danielocfb deleted the topic/address-normalization branch April 20, 2023 22:59
The end-to-end benchmarks broke because of some API change a while back.
Fix them. Unfortunately, it's not easy to just build them in CI, because
at least currently, that is bound to the extraction of our large
vmlinux-5.17.12-100.fc34.x86_64 image, which we do not want to pull in
CI.

Signed-off-by: Daniel Müller <[email protected]>
Let's move the [features] section closer to the top of Cargo.toml, where
it is easier to spot. After all, it is more relevant to users than,
say, build dependencies.

Signed-off-by: Daniel Müller <[email protected]>
This change moves a bunch of the symbolization related functionality
into the newly introduced symbolize module. This separation will make it
easier to conditionally compile parts of the crate in the future.

Signed-off-by: Daniel Müller <[email protected]>
With this change we introduce the 'symbolize' feature, which guards all
the symbolization related functionality. It is enabled by default. When
disabled, mostly the normalization part remains, which can be useful in
remote symbolization scenarios, for capturing and normalizing addresses
on embedded devices, for example. When this feature is disabled, the
regex and lru dependencies are no longer necessary, meaning that all we
need is libc.

Signed-off-by: Daniel Müller <[email protected]>
Some of our users are very sensitive to the number of dependencies. For
that matter we have tried to keep them small. With this change we remove
the anyhow build dependency, replacing its usage by directly working
with io::Error objects instead.

Signed-off-by: Daniel Müller <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants