Support remote symbolization #61

Closed
danielocfb opened this issue Feb 27, 2023 · 1 comment · Fixed by #362
Labels
enhancement New feature or request

Comments

@danielocfb
Collaborator

danielocfb commented Feb 27, 2023

Symbolization is a potentially resource-intensive process, and it may not be feasible to perform it on the very system where addresses are recorded. Embedded devices with limited disk space and CPU capacity are one example: debug information can be large, so it is unlikely to be stored on the device itself, and the process of symbolization is likely to impact other running applications negatively, take an excessive amount of time, or both.

For that and other reasons, we'd like to support remote (or off-device) symbolization. The (preliminary) API proposal below fleshes out the idea somewhat.

The local side normalizes a list of addresses using the normalize_addresses function:

pub type Address = usize;

mod address_meta {
    use super::*;
    use std::path::PathBuf;

    /// A GNU build ID.
    type BuildId = String;


    /// Meta information about a Linux kernel address.
    #[derive(Clone, Debug)]
    pub struct Kernel {
        /// The kernel's release string (i.e., roughly what `uname -r` reports).
        ///
        /// This is a free-form string.
        pub release: String,
        /// The kernel binary's build ID, if available.
        pub build_id: Option<BuildId>,
        /// The struct is non-exhaustive and open to extension.
        #[doc(hidden)]
        pub _non_exhaustive: (),
    }


    /// Meta information about a Linux kernel module address.
    #[derive(Clone, Debug)]
    pub struct KernelModule {
        /// The name of the kernel module.
        pub name: String,
        /// The kernel module's version string.
        ///
        /// This is a free-form string. It may resemble bits of `modinfo`'s
        /// `vermagic` field.
        pub version: String,
        /// The kernel's release string (i.e., roughly what `uname -r` reports).
        ///
        /// This is a free-form string.
        pub kernel_release: String,
        /// The kernel module's build ID, if available.
        pub build_id: Option<BuildId>,
        /// The struct is non-exhaustive and open to extension.
        #[doc(hidden)]
        pub _non_exhaustive: (),
    }

    /// Meta information about a user space binary (executable or shared object).
    #[derive(Clone, Debug)]
    pub struct Binary {
        /// The canonical absolute path to the binary, including its name.
        pub path: PathBuf,
        /// The binary's build ID, if available.
        pub build_id: Option<BuildId>,
        /// The struct is non-exhaustive and open to extension.
        #[doc(hidden)]
        pub _non_exhaustive: (),
    }

    /// Meta information about an address that could not be attributed to a
    /// specific component.
    #[derive(Clone, Debug)]
    pub struct Unknown {
        /// The struct is non-exhaustive and open to extension.
        #[doc(hidden)]
        pub _non_exhaustive: (),
    }
}


/// Meta information for an address.
#[derive(Clone, Debug)]
#[non_exhaustive]
pub enum AddressMeta {
    Kernel(address_meta::Kernel),
    KernelModule(address_meta::KernelModule),
    Binary(address_meta::Binary),
    Unknown(address_meta::Unknown),
}


/// A type capturing normalized addresses along with captured meta data.
#[derive(Clone, Debug)]
pub struct NormalizedAddresses {
    /// Normalized addresses along with an index into `meta` for retrieval of
    /// the corresponding [`AddressMeta`] information.
    addresses: Vec<(Address, usize)>,
    /// Meta information about the normalized addresses.
    meta: Vec<AddressMeta>,
}


/// Normalize `addresses` belonging to either a process or the kernel.
///
/// If the provided addresses belong to a process, its PID should be provided in
/// `pid`. For kernel addresses, `pid` may be `None`.
///
/// Normalized addresses are reported in the exact same order in which the
/// non-normalized ones were provided.
pub fn normalize_addresses<A>(addresses: A, pid: Option<u32>) -> Result<NormalizedAddresses, Error>
where
    A: IntoIterator<Item = Address>,
{
    // ...
}
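
To illustrate how the `addresses`/`meta` pairing is meant to be read, here is a minimal, self-contained sketch. The types `Meta` and `Normalized` and the `push` and `iter` helpers are simplified stand-ins invented for illustration and are not part of the proposed API; only the layout, in which each normalized address carries an index into a deduplicated list of meta entries, follows the proposal above.

```rust
/// Simplified stand-in for `AddressMeta` (hypothetical; a path or "unknown").
#[derive(Clone, Debug, PartialEq)]
pub enum Meta {
    Binary(String),
    Unknown,
}

/// Simplified stand-in for `NormalizedAddresses`.
#[derive(Clone, Debug, Default)]
pub struct Normalized {
    /// Normalized address plus an index into `meta`.
    pub addresses: Vec<(usize, usize)>,
    /// Deduplicated meta information.
    pub meta: Vec<Meta>,
}

impl Normalized {
    /// Append an already normalized address, reusing an existing meta entry
    /// if an equal one is present.
    pub fn push(&mut self, addr: usize, meta: Meta) {
        let idx = match self.meta.iter().position(|m| *m == meta) {
            Some(idx) => idx,
            None => {
                self.meta.push(meta);
                self.meta.len() - 1
            }
        };
        self.addresses.push((addr, idx));
    }

    /// Iterate over addresses together with their meta information, in the
    /// order in which the addresses were added.
    pub fn iter(&self) -> impl Iterator<Item = (usize, &Meta)> + '_ {
        self.addresses
            .iter()
            .map(move |(addr, idx)| (*addr, &self.meta[*idx]))
    }
}
```

With this layout, many addresses from the same binary share a single meta entry, which keeps the payload sent to the remote small.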

The resulting normalized addresses together with information about their owners have to be conveyed to the remote for the actual symbolization to happen. The transfer of this information is outside of blazesym's purview and a responsibility of the user. For Rust users, we will provide serde derives for convenient serialization & deserialization.

On the remote system, blazesym's existing BlazeSymbolizer can be used to perform the symbolization using the newly added symbolize_normalized method:

/// A trait for resolving meta information for an address to a [`SymResolver`] to
/// use for the actual symbolization.
pub trait AddressMetaResolver {
    /// The type of [symbol resolver](SymResolver) returned by the
    /// `resolve_address_meta` method.
    type Resolver: SymResolver;

    /// Resolve the provided [`AddressMeta`] to a [symbol resolver](SymResolver) to use.
    fn resolve_address_meta(&self, address_meta: &AddressMeta) -> Result<Self::Resolver, Error>;
}

/// BlazeSymbolizer provides an interface to symbolize addresses with
/// a list of symbol sources.
pub struct BlazeSymbolizer {
    // ...
}

impl BlazeSymbolizer {
    // ...

    /// Symbolize a list of normalized addresses with associated meta
    /// information.
    ///
    /// Please refer to [`normalize_addresses`] for information on how to
    /// normalize addresses.
    ///
    /// The function returns one `Vec<SymbolizedResult>` for each address passed
    /// in, in the order they were passed in. Multiple `SymbolizedResult`
    /// candidates may be present in case an address is ambiguous owing to
    /// compiler optimizations.
    pub fn symbolize_normalized<R>(
        &self,
        addresses: &NormalizedAddresses,
        address_meta_resolver: R,
    ) -> Result<Vec<Vec<SymbolizedResult>>, Error>
    where
        R: AddressMetaResolver,
    {
        // ...
    }
}
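
As a rough idea of what an implementor of such a resolver trait could look like, the sketch below maps meta information to a debug file by build ID. Everything in it is an assumption made for illustration: `Meta` and `MetaResolver` are simplified stand-ins for `AddressMeta` and `AddressMetaResolver`, the resolver returns a file path instead of a `SymResolver`, and `BuildIdStore` is a hypothetical type.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Simplified stand-in for `AddressMeta`; only the parts needed here.
#[derive(Clone, Debug)]
pub enum Meta {
    Binary { path: PathBuf, build_id: Option<String> },
    Unknown,
}

/// Simplified stand-in for the proposed `AddressMetaResolver` trait. Instead
/// of a `SymResolver` it yields the path of the file to symbolize against.
pub trait MetaResolver {
    fn resolve(&self, meta: &Meta) -> Result<PathBuf, String>;
}

/// Hypothetical resolver backed by a build-ID-indexed store of debug files.
pub struct BuildIdStore {
    by_build_id: HashMap<String, PathBuf>,
}

impl BuildIdStore {
    pub fn new(entries: impl IntoIterator<Item = (String, PathBuf)>) -> Self {
        Self { by_build_id: entries.into_iter().collect() }
    }
}

impl MetaResolver for BuildIdStore {
    fn resolve(&self, meta: &Meta) -> Result<PathBuf, String> {
        match meta {
            // A build ID is present: look up the matching debug file.
            Meta::Binary { build_id: Some(id), .. } => self
                .by_build_id
                .get(id)
                .cloned()
                .ok_or_else(|| format!("no debug file for build ID {id}")),
            // Without a build ID this store has nothing to key on.
            Meta::Binary { path, build_id: None } => {
                Err(format!("{} has no build ID to look up", path.display()))
            }
            Meta::Unknown => Err("address does not belong to a known component".to_string()),
        }
    }
}
```

Keying on the build ID rather than the path means the remote does not need to mirror the device's file system layout.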

The API should also allow us to enable debuginfod support, by having an implementor of AddressMetaResolver that speaks the corresponding protocol and fetches debug information from a service using it.
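
For context, a debuginfod server exposes debug information over HTTP under a path derived from the build ID. The sketch below only constructs that URL; the function name `debuginfo_url`, the choice of raw bytes for the build ID (the proposal above uses a `String`), and the example server are assumptions, and the actual download is left out.

```rust
/// Build the URL under which a debuginfod server serves the debug
/// information for the given build ID.
///
/// `server` is the base URL of the server; `build_id` holds the raw build ID
/// bytes, which are hex-encoded for the request path.
pub fn debuginfo_url(server: &str, build_id: &[u8]) -> String {
    let hex: String = build_id.iter().map(|b| format!("{b:02x}")).collect();
    format!("{}/buildid/{}/debuginfo", server.trim_end_matches('/'), hex)
}
```

A resolver implementation could issue a GET request to this URL and hand the downloaded file to the symbolizer.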

@danielocfb danielocfb self-assigned this Feb 27, 2023
danielocfb pushed a commit to danielocfb/blazesym that referenced this issue Apr 17, 2023
We want to support symbolization in scenarios where addresses are
captured on one system and then symbolized elsewhere (libbpf#61). In a
nutshell, we are going to perform address "normalization" on the "local"
system (the one that captured the addresses), send them to the remote,
and then symbolize these normalized addresses on the remote.
This change introduces the logic for normalizing addresses. Basically,
we use the corresponding proc maps entry, parse the contents, identify
the ELF files to which those addresses map, and then perform
normalization based on the ELF information.
I introduced a new public module, normalize. We have to think some more
about how best to expose this functionality, but this is a reasonable
starting point.
The normalization does not yet use advanced caching of sorts: each ELF
entry handled is parsed as we found it. That is not particularly
effective but not wrong. We can optimize further subsequently without
adjusting the API.

Signed-off-by: Daniel Müller <[email protected]>
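
The first half of the normalization described in this commit message can be sketched as follows: parse a `/proc/<pid>/maps` line and translate an absolute address into an offset within the mapped file. This is a simplified illustration, not the code from the commit. The names `MapsEntry`, `parse_maps_line`, and `file_offset` are hypothetical, paths containing spaces are not handled, and the ELF-based step mentioned above is omitted.

```rust
/// One parsed line of `/proc/<pid>/maps` (only the fields needed here).
#[derive(Debug, PartialEq)]
pub struct MapsEntry {
    pub start: usize,
    pub end: usize,
    pub offset: usize,
    pub path: Option<String>,
}

/// Parse a line such as
/// `7f00a000-7f00c000 r-xp 00001000 08:01 1234 /usr/lib/libfoo.so`.
pub fn parse_maps_line(line: &str) -> Option<MapsEntry> {
    let mut fields = line.split_whitespace();
    let (start, end) = fields.next()?.split_once('-')?;
    let _perms = fields.next()?;
    let offset = fields.next()?;
    let _dev = fields.next()?;
    let _inode = fields.next()?;
    // Anonymous mappings have no path component.
    let path = fields.next().map(str::to_string);
    Some(MapsEntry {
        start: usize::from_str_radix(start, 16).ok()?,
        end: usize::from_str_radix(end, 16).ok()?,
        offset: usize::from_str_radix(offset, 16).ok()?,
        path,
    })
}

/// Translate `addr` into an offset within the mapped file, if `entry`
/// covers it (the end address is exclusive).
pub fn file_offset(addr: usize, entry: &MapsEntry) -> Option<usize> {
    if (entry.start..entry.end).contains(&addr) {
        Some(addr - entry.start + entry.offset)
    } else {
        None
    }
}
```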
@danielocfb
Collaborator Author

First set of changes enabling address normalization: #114

danielocfb pushed a commit that referenced this issue Apr 20, 2023
We want to support symbolization in scenarios where addresses are
captured on one system and then symbolized elsewhere (#61). In a
nutshell, we are going to perform address "normalization" on the "local"
system (the one that captured the addresses), send them to the remote,
and then symbolize these normalized addresses on the remote.
This change introduces the logic for normalizing addresses. Basically,
we use the corresponding proc maps entry, parse the contents, identify
the ELF files to which those addresses map, and then perform
normalization based on the ELF information.
I introduced a new public module, normalize. We have to think some more
about how best to expose this functionality, but this is a reasonable
starting point.
The normalization does not yet use advanced caching of sorts: each ELF
entry handled is parsed as we found it. That is not particularly
effective but not wrong. We can optimize further subsequently without
adjusting the API.

Signed-off-by: Daniel Müller <[email protected]>
@danielocfb danielocfb removed their assignment May 30, 2023
@danielocfb danielocfb added the enhancement New feature or request label Jun 7, 2023
danielocfb pushed a commit to danielocfb/blazesym that referenced this issue Oct 17, 2023
With the normalization API rework (mainly libbpf#359), we no longer intend to
add specific remote symbolization APIs. Rather, the existing
symbolization APIs were adjusted to be compatible for use with the
outputs of normalization. As such, let's mark the remote symbolization
TODO (and issue) as done.

Closes: libbpf#61

Signed-off-by: Daniel Müller <[email protected]>