Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V4 #81

Closed
wants to merge 411 commits into from
Closed

V4 #81

wants to merge 411 commits into from

Conversation

xacrimon
Copy link
Owner

@xacrimon xacrimon commented Apr 19, 2020

V4 is a major rewrite of DashMap which switches the map from a locked sharded internal design to a mostly lockfree table. This has improved performance significantly, simplified the API, removed the deadlock bugs and reduced the amount of gotchas.

Remaining tasks

  • Iterators

  • Serde support

  • API guideline trait implementations

  • Documentation

  • cas / compute if present

  • upsert

  • insert_if_not_exists

  • drain

  • dashset

  • custom allocator support

  • Benchmarks

  • Blog post explaining the switch and the new internals

@xacrimon xacrimon marked this pull request as draft April 19, 2020 00:46
@xacrimon xacrimon self-assigned this Apr 19, 2020
@xacrimon xacrimon added the enhancement New feature or request label Apr 19, 2020
@xacrimon
Copy link
Owner Author

Fixes #79. Fixes #74. Fixes #24.

@c-x-berger
Copy link

Is DashSet returning in v4, or should I just use a "bare" DashMap<Thing, ()>?

@xacrimon
Copy link
Owner Author

DashSet<T> will be returning. I just haven't gotten to it yet.

@walfie
Copy link

walfie commented May 24, 2020

Apologies if this is a known issue (it's a pre-release build, after all), but mentioning just in case it's not.

I've been using 4.0.0-rc3 and encountered some deadlock issues with the code below:


src/main.rs

use dashmap::DashMap;
use std::time::Duration;

#[tokio::main]
async fn main() {
    let size = 50;
    let delay = Duration::from_millis(500);

    let iter = (0..size).map(|i| (i, i * 2));
    let map = DashMap::<usize, usize>::from_iter(iter);

    let spawned = tokio::spawn(async move {
        for i in 0..size {
            tokio::time::delay_for(delay).await;
            assert_eq!(*map.get(&i).unwrap().value(), i * 2);
            println!("{}", i);
        }
    });

    spawned.await.unwrap();
}

Cargo.toml

[package]
name = "dashmap-example"
version = "0.1.0"
edition = "2018"

[dependencies]
dashmap = "4.0.0-rc3"
tokio = { version = "0.2.21", features = ["rt-threaded", "macros", "time"] }

On my machine (Mac with i5, 4 CPUs) it seems to deadlock about half the time within the first 5 iterations with size of 50 and delay of 100ms, and deadlocks more consistently for larger values of size and delay (at 500ms, it doesn't make it past the first iteration).

@xacrimon
Copy link
Owner Author

I'm going to take a look but this is really odd due to the fact that get doesn't take any locks. Thanks for reporting.

@xacrimon
Copy link
Owner Author

I can reproduce, going to try and find the issue. Seems to only happen in tokio though.

@xacrimon
Copy link
Owner Author

This seems to be tokio using some sort of API and parks a thread while it is registering itself with the GC. Not really sure how to solve it.

@xacrimon
Copy link
Owner Author

xacrimon commented May 24, 2020

@walfie Thanks a ton for the bug report. It's super useful to have people testing this in pre-release so that this doesn't happen later. I've diagnosed the issue and released 4.0.0-rc4 and I've confirmed your example works properly when using the new pre-release. Thanks to the absolutely fantastic Alice Ryhl on the Tokio Discord for helping me rule out Tokio.

@walfie
Copy link

walfie commented May 24, 2020

I wasn't expecting such a quick turnaround, thanks! I've confirmed that it also fixes the deadlock issue I was having in my real code (that the example was based on).

@xacrimon
Copy link
Owner Author

xacrimon commented May 24, 2020

For context this issue was not caught earlier because it would only occur if a resize happened before the first garbage collection cycle occurred. It was due to the fact that the GC would not yet have registered the runner thread with itself which locks a global mutex. The garbage collector would lock it during the collect cycle and then during execution of a deferred function containing code which would trigger the runner thread to try and register with GC, thus deadlocking.

@xacrimon
Copy link
Owner Author

xacrimon commented May 24, 2020

The fix was to simply force register the runner thread with the GC before starting the collect cycle.

@walfie
Copy link

walfie commented May 24, 2020

That makes sense -- it explains why I only saw the issue when I started initializing the map with more than a certain number of elements, and never saw it when the map started off empty.

walfie added a commit to walfie/petronel-graphql that referenced this pull request May 24, 2020
The map would deadlock if we initialized it with more than a certain
number of items on startup (on my machine, more than 15 or 16 items).
The latest RC fixes this issue:
xacrimon/dashmap#81 (comment)
@ebkalderon
Copy link

Thanks for all the awesome work so far, @xacrimon! Will the std-style entry API be returning in 4.0.0? Or will the upcoming insert_if_not_exists() be recommended as an alternative instead?

@xacrimon
Copy link
Owner Author

It and a Java like compute_if_present will be exposed instead.

@ebkalderon
Copy link

Awesome, thanks for clarifying! That should still work for my own use cases. Your quick response is much appreciated.

@Avarel
Copy link

Avarel commented Jun 20, 2020

Hey! I'm currently using DashMap 4.0.0-rc6 and it is an absolute pleasure. However, I'm still quite concerned about how this is memory safe. I assume that you are atomic reference counting with the ElementGuard, which are backed by ABox, but I just wanted to be sure. Could you give a quick and dirty run down of the new implementation?

@xacrimon
Copy link
Owner Author

xacrimon commented Jun 20, 2020

Hey, thanks for using a prerelease. DashMap V4 is currently quite cutting edge stuff, even in academics. Quite a few parts are involved in making it safe to use and the design isn't exactly simple. There are a few core parts though. First, the public table state is just a bunch of atomicptrs containing a pointer to elements and some tag data used to speed things up dramatically. This means operations that change how the table is viewed just comes down to atomic cas operations. Entries are indeed single layer reference counted but this opens a problem. What if the reference count is decremented to 0 and an element dropped inbetween another thread reading the pointer and increment the refcount itself? To solve this we use an epoch based memory reclaimer to synchronize and defer destructive operations until a safe point in time. The exact details on how this works can be found in Keir Fraser's paper "Practical lock-freedom".

@xacrimon
Copy link
Owner Author

Since probing is very expensive we use a few techniques to narrow down what keys we check. One of them is storing a partial hash in the pointers upper bits. If the partial hash values do not match, we don't have to incur a memory access and check the entry. There are some additional optimizations to make yet but those will come later.

@MichaelBurge
Copy link

MichaelBurge commented Jul 4, 2020

Hello, I was trying out the new DashMap to hopefully fix some lock issues in my program. But I was having trouble porting it due to lifetime compile errors.

This snippet compiles on "3.11.5" but fails on "4.0.0-rc6" with 3 issues:

use dashmap::DashMap;
use std::ops::Deref;

type X<'a> = &'a ();

#[derive(Default, Debug)]
struct Foo<'a>(DashMap<u8, X<'a>>);

fn bar<'a>(foo: &'a Foo<'a>) -> impl Deref<Target = X<'a>> + 'a { foo.0.get(&3).unwrap() }

fn foo() {
    let foo = Foo::default();
    foo.0.insert(3, &());
    println!("foo={:?}", &*bar(&foo));
}

Issues 1 and 2 are that Default and Debug don't autoderive on types that take lifetime parameters.

Issue 3 is that the function get used to return a guard with a lifetime matching the DashMap. The new get doesn't. That is, ElementGuard doesn't satisfy impl Deref<Target = X<'a>> + 'a, so bar cannot be implemented with the new get.

@HactarCE
Copy link

HactarCE commented Aug 26, 2020

I'm very much looking forward to V4, but I understand it might be a while before it's ready for use. In particular, my application needs insert_if_not_exists() functionality, which isn't ready yet. In the meantime, I'm fine with using an RwLock<std::collections::HashSet<Arc<T>>>, but I want to make the eventual upgrade process to DashMap V4 as simple as I can, so I have a few questions:

  1. Is the v4.0.0-rc6 design of 'static borrowing from the map still the plan? bucket::Guard in acrimon/experimental currently holds a non-'static reference to the map.
  2. If so, is it ok to store ElementGuard/Guards in the DashMap as long as I avoid cycles and remove all entries before dropping the map, or should I build my own refcount/GC mechanism for entries that point to other entries?
  3. If not, is it redundant to use a DashSet<Arc<T>>?

@xacrimon
Copy link
Owner Author

@HactarCE

  1. Keys and values in the final implementation will not need to be static. Elementguards will be bound by the lifetime of the map.
  2. Storing elementguards in the map itself is fine but keep in mind that this will prevent reclamation of memory.
  3. It's not redundant at all.

@shekohex
Copy link

shekohex commented Sep 6, 2020

Hi, I'm trying out v4.0.0-rc6 and found that Iter does not implement Send.
from docs.
and in my code, I used to iterate over the collection to do checks and other operations.. my code is async btw, and here is a simple example of my code and the error message:

// This an example from my game server where I need to iterate over all the players on a map and do an operation like removing the current character from there screen.
pub async fn clear(&self) -> Result<(), Error> {
	debug!("Clearing Screen..");
    let me = self.owner.character().await?;
    for character in self.characters.iter() {
    	let observer = character.owner();
        let observer_screen = observer.screen().await?;
        observer_screen.remove_character(me.id()).await?;
        self.remove_character(character.id()).await?;
	}
    self.characters.clear();
    Ok(())
}

and when I use this method in another async function, the compiler shows this error:

error: future cannot be sent between threads safely
   --> server\game\src\packets\msg_action.rs:143:34
    |
143 |       ) -> Result<(), Self::Error> {
    |  __________________________________^
144 | |         let ty = self.action_type.into();
145 | |         match ty {
146 | |             ActionType::SendLocation => {
...   |
287 | |         Ok(())
288 | |     }
    | |_____^ future returned by `__process` is not `Send`
    |
    = help: the trait `std::marker::Send` is not implemented for `(dyn std::iter::Iterator<Item = dashmap::ElementGuard<u32, world::character::Character>> + 'static)`
note: future is not `Send` as this value is used across an await
   --> server\game\src\systems\screen.rs:40:13
    |
35  |         for character in self.characters.iter() {
    |                          ----------------------
    |                          |                    |
    |                          |                    `self.characters.iter()` is later dropped here
    |                          has type `dashmap::Iter<u32, world::character::Character>` which is not `Send`
...
40  |             self.remove_character(id).await?;
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ await occurs here, with `self.characters.iter()` maybe used later
    = note: required for the cast to the object type `dyn std::future::Future<Output = std::result::Result<(), errors::Error>> + std::marker::Send`

is there a workaround this?

@xacrimon
Copy link
Owner Author

xacrimon commented Sep 7, 2020

Unfortunately there is no workaround at the moment. Lockfree code typically heavily relies on bookkeeping on what each thread is referencing and moving an iterator across threads would break it. I have an idea for implementing migration of such bookkeeping state but it's a while away.

@Farkal
Copy link

Farkal commented Sep 11, 2020

Hi i am also trying v4.0.0-rc6 (need the async support to use in actix_web).
I have several questions:

  • How can i update a value without having to clone the whole struct in the update method (my RAM is crying) i store heavy objects on the DashMap so i need to be able to update some values with a &mut for example
  • How can i ask for a write lock on some key ? My server handle several download requests and they often concern the same file so if two request are on the same file i need the first to download it to my local cache and the other to wait. I didn't find a way to do it 😕

@xacrimon
Copy link
Owner Author

Dashmap v4 doesn't lock internally for flexibility reasons. If you need &mut access to values wrap them in your lock of choice. Regarding keys you'd probably be best off storing some state in the valie, using enums for different states if needed.

@SergioBenitez
Copy link

SergioBenitez commented Oct 1, 2020

@xacrimon Where, perhaps a branch or a tag or something else, is the current source code for v4? I've identified a memory leak in the latest published version and I'd like to help track it down.

@xacrimon
Copy link
Owner Author

xacrimon commented Oct 1, 2020

v4 is currently in limbo and pending a major rework. I do not recommend using the rc versions anywhere important. I'll rework it when I have time.

@ckaran
Copy link

ckaran commented Dec 4, 2020

Is the task list at the top up to date?

@xacrimon
Copy link
Owner Author

xacrimon commented Dec 4, 2020

Not very. V4 is pending a major rewrite at the moment which I have not had time for.

@ckaran
Copy link

ckaran commented Dec 4, 2020

Which branch are you currently working off of?

@xacrimon
Copy link
Owner Author

This iteration and V4 featureset will currently be paused due to a lack of time and the need for a maintenance major release.

@xacrimon xacrimon closed this Dec 25, 2020
@ckaran
Copy link

ckaran commented Dec 27, 2020

OK, thanks for letting us know!

@xacrimon xacrimon deleted the acrimon/experimental branch May 15, 2021 00:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deadlock when working within async threads Combined read-write benchmarks