
Full update of weighted index by assigning weights #1194

Closed

Conversation

@SuperFluffy

I need to update my weighted indices inside a hot loop. Instead of reconstructing the entire index from scratch, this commit allows updating the inner cumulative weights in place using a slice of weights via `WeightedIndex::assign_weights`. `WeightedIndex::assign_weights_unchecked` is also provided for those cases where the user promises that all weights are valid and their sum exceeds zero.
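As a rough illustration of the idea, here is a minimal sketch using a simplified, hypothetical type (the real `WeightedIndex` is generic over `X: SampleUniform`, not hardcoded to `f64`):

// Simplified stand-in for `WeightedIndex`, hardcoded to f64 weights.
struct Index {
    cumulative_weights: Vec<f64>,
    total_weight: f64,
}

impl Index {
    /// Rewrite the stored cumulative weights in place from a slice of
    /// (non-cumulative) weights of the same length, without allocating.
    fn assign_weights(&mut self, weights: &[f64]) -> Result<(), &'static str> {
        if weights.len() != self.cumulative_weights.len() {
            return Err("length mismatch");
        }
        let mut total = 0.0;
        for (cw, &w) in self.cumulative_weights.iter_mut().zip(weights) {
            // NaN and negative weights both fail this comparison. Note that
            // returning here leaves the already-written prefix behind, which
            // is exactly the partial-update question discussed below.
            if !(w >= 0.0) {
                return Err("invalid weight");
            }
            total += w;
            *cw = total;
        }
        if !(total > 0.0) {
            return Err("all weights zero");
        }
        self.total_weight = total;
        Ok(())
    }
}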

Open questions

How to handle a partial update? If assignment fails during WeightedIndex::assign_weights, the index can be left in a partially updated, undefined state. The method's doc comment notes this but does not go further. I see the following ways to handle it:

  1. Leave things as they are. Users of assign_weights read the documentation and will keep this caveat in mind.
  2. Roll back the changes by using e.g. SubAssign. This will probably require keeping around the old cumulative weights, which implies an extra allocation in the function body, which defeats the point of this new feature.
  3. Add a field has_errored: bool to WeightedIndex, initialized to false. If an error is encountered during assignment, set it to true. If the WeightedIndex is used for sampling with has_errored == true, panic. I don't recall where I have seen this, but I believe a similar poisoning scheme is even used somewhere in the standard library. (A sketch of this pattern follows the list.)
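A minimal sketch of option 3's poison flag (hypothetical, not part of this PR; assume the simplified `Index` from the sketch above gains a `has_errored: bool` field):

impl Index {
    /// Like `assign_weights` above, but poisons the index on failure.
    fn assign_weights_checked(&mut self, weights: &[f64]) -> Result<(), &'static str> {
        // Poison pessimistically; un-poison only on success.
        self.has_errored = true;
        self.assign_weights(weights)?;
        self.has_errored = false;
        Ok(())
    }

    fn sample(&self) -> usize {
        // Refuse to sample from a partially updated index.
        assert!(!self.has_errored, "WeightedIndex poisoned by a failed update");
        0 // stand-in for the real binary search over the cumulative weights
    }
}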

@dhardy (Member) left a comment

Thanks for the PR!

There are a few comments below. Likely the new constructor should be updated slightly too.

Could we have some benchmarks please, comparing (1) replacing with a new instance, (2) assign_weights, and (3) assign_weights_unchecked?

@@ -130,6 +130,72 @@ impl<X: SampleUniform + PartialOrd> WeightedIndex<X> {
})
}

/// Updates all weights by recalculating the index, without changing the number of weights.
///
/// **NOTE:** if `weights` contains invalid elements (for example, `f64::NAN` in the case of
Member:

NaN will fail the `w >= &zero` check. What wouldn't be caught is +inf (or a sum of weights which overflows to +inf). Possibly we should add a check for this (`total_weight.is_finite()`).

That said, the cases which are caught here are identical to those of the `new` constructor. Possibly both need updating.
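The cases in question are easy to verify:

fn main() {
    let zero = 0.0f64;
    // NaN fails every ordered comparison, so the `w >= zero` check catches it.
    assert!(!(f64::NAN >= zero));
    // +inf passes the same check, so an infinite weight (or a sum that
    // overflows to +inf) is only caught by an explicit finiteness test.
    assert!(f64::INFINITY >= zero);
    assert!(!f64::INFINITY.is_finite());
}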

@SuperFluffy (Author), Oct 19, 2021:

Yeah, I noticed that as well. There is also `is_normal()`, but subnormal values are probably not of concern.

The bigger problem, however, is that at the moment `WeightedIndex` is valid for all `X: SampleUniform`, which includes integers, while `is_finite` only applies to floats.

Member:

Good point... I don't think we have any way of dealing with this.

The debug asserts in impl UniformSampler for UniformFloat<$ty> will catch this, but it doesn't seem ideal.

/// partially updated index is undefined. It is the user's responsibility to not sample from
/// the index upon encountering an error. The index may be used again after assigning a new set
/// of weights that do not result in an error.
pub fn assign_weights(&mut self, weights: &[X]) -> Result<(), WeightedError>
Member:

new takes an iterator while this method takes a slice, which is inconsistent. There's no real reason we can't use an iterator here (should be benchmarked but I suspect perf. will be very similar).

@vks do you think we should use an iterator for consistency?

But if we do, we have an additional choice: require ExactSizeIterator or just test we finish with the right length? I think I favour using ExactSizeIterator but I haven't thought a lot about it (it's also the more restricted choice: potentially we could switch away from it later if required).
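For reference, the `ExactSizeIterator` option could look roughly like this standalone sketch (hypothetical helper, simplified to `f64`): the exact length lets the method reject a mismatch up front and then zip over the existing buffer.

/// Hypothetical helper: overwrite `cumulative` in lockstep with an
/// exact-size iterator of weights, failing fast on a length mismatch.
fn assign_from_iter<I>(cumulative: &mut [f64], weights: I) -> Result<(), &'static str>
where
    I: IntoIterator,
    I::IntoIter: ExactSizeIterator<Item = f64>,
{
    let iter = weights.into_iter();
    if iter.len() != cumulative.len() {
        return Err("length mismatch");
    }
    let mut total = 0.0;
    for (cw, w) in cumulative.iter_mut().zip(iter) {
        total += w;
        *cw = total;
    }
    Ok(())
}

fn main() {
    let mut cumulative = vec![0.0; 3];
    assign_from_iter(&mut cumulative, vec![1.0, 2.0, 3.0]).unwrap();
    assert_eq!(cumulative, vec![1.0, 3.0, 6.0]);
}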

Author:

Yes, that's a good idea, as it's strictly more general. Since I am zipping internally anyway, this should not change anything with regard to slices and vectors.

Author:

Implemented the change, but leaving the convo open because of the question.

Collaborator:

I think using an iterator makes sense, unless the slice optimizes better.

@SuperFluffy (Author)

I have implemented the changes, including removing the `_unchecked` version, because it is IMO no longer necessary now that the validity check has moved outside the loop.

I could not act on the is_finite() suggestion, even though I agree with it, as it requires specialization or an extra trait. It would simply always evaluate to true in the case of integers, and delegate to {f32,f64}::is_finite in the case of floats. Should I do that?

Benchmarks show that assignment via an exact-size iterator gives a nice speedup over creation, roughly 3x to 5x:

test weighted_index_assignment       ... bench:          27 ns/iter (+/- 0)
test weighted_index_assignment_large ... bench:         395 ns/iter (+/- 10)
test weighted_index_creation         ... bench:          97 ns/iter (+/- 0)
test weighted_index_creation_large   ... bench:       2,079 ns/iter (+/- 18)
test weighted_index_modification     ... bench:          26 ns/iter (+/- 0)

@dhardy (Member) left a comment

Excellent.

I'll let @vks take a look before merging.

@dhardy (Member) commented on Oct 19, 2021

I could not act on the is_finite() suggestion, even though I agree with it, as it requires specialization or an extra trait. It would simply always evaluate to true in the case of integers, and delegate to {f32,f64}::is_finite in the case of floats. Should I do that?

A custom trait would be the right choice but overall I think it's better to leave this as it is: a public trait adds another complication to the API for little gain, while a private (sealed) trait restricts usage to std types.

Note that for integer types there's already a check in debug builds: overflow when the sum gets too large. Again, this is not ideal, but whether it is worth checking for overflow is questionable.
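For context, the sealed-trait pattern being weighed here looks roughly like this (hypothetical trait and method names):

mod private {
    pub trait Sealed {}
}

/// Hypothetical sealed trait: downstream code can use it in bounds but
/// cannot implement it, which is what limits it to the std types below.
pub trait FiniteWeight: private::Sealed {
    fn is_finite_weight(&self) -> bool;
}

macro_rules! impl_finite {
    (float: $($t:ty),*) => {$(
        impl private::Sealed for $t {}
        impl FiniteWeight for $t {
            fn is_finite_weight(&self) -> bool { self.is_finite() }
        }
    )*};
    (int: $($t:ty),*) => {$(
        impl private::Sealed for $t {}
        impl FiniteWeight for $t {
            // Integers are always finite; overflow is a separate concern.
            fn is_finite_weight(&self) -> bool { true }
        }
    )*};
}

impl_finite!(float: f32, f64);
impl_finite!(int: u8, u16, u32, u64, usize, i8, i16, i32, i64, isize);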

@SuperFluffy (Author) commented on Oct 19, 2021

I think we haven't addressed what to do with the index if assignment fails mid-update.

Right now it's going to be filled with garbage if it encounters NaN, so one probably shouldn't sample from it. :-D

@vks (Collaborator) commented on Oct 19, 2021

A partial update could also be handled by setting the length of the weights to zero; this should make all other calls panic.

@dhardy (Member) commented on Oct 19, 2021

If any of the weights is NaN or inf, then total_weight will be NaN or inf, and then the sampler, X::Sampler::new(zero, total_weight.clone()), will have NaN/inf range, resulting in an assertion in Sampler::new.

Not the most elegant handler but still sufficient in my opinion. Perhaps the docs should mention that NaN/inf weight will result in a panic.

@SuperFluffy (Author) commented on Oct 19, 2021

If any of the weights is NaN or inf, then total_weight will be NaN or inf, and then the sampler, X::Sampler::new(zero, total_weight.clone()), will have NaN/inf range, resulting in an assertion in Sampler::new.

Not the most elegant handler but still sufficient in my opinion. Perhaps the docs should mention that NaN/inf weight will result in a panic.

That's not actually the case. :-( We are returning early, so `X::Sampler::new` is never hit.

The sampler, as well as the total weight stored in the weighted index itself, is not updated until the very end of the function.

@dhardy (Member) commented on Oct 19, 2021

Ugh; you're right. We could just not return early (use a result var). That way on error the weights get updated, the sampler doesn't and the method panics.

A panic is still not ideal in this context, but it's what Sampler::new does. Most other distributions don't: see #581 / #770. I had hoped to be past making significant breaking changes like this now, but it seems worth considering.

In the meantime, perhaps the right thing to do here is to use a sealed trait (a "pub" trait in a private module) to enable the checks we need. In the future we may be able to drop the trait, which won't be a breaking change. Caveat: using the fixed bounds-detection in `new` is a breaking change (restricting compatible types), so that change should be left until later.

@dhardy (Member) commented on Oct 19, 2021

If #1195 is implemented we can avoid the need for an extra trait bound. For now I suggest getting this PR merged without depending on that, however.

@SuperFluffy (Author)

I need further clarification. The issues we are discussing are somewhat orthogonal, and I think the sealed-trait part warrants its own PR.

  1. @vks suggested setting the cumulative weights to zero length if a problem is encountered mid-update. This will cause the index to panic if it's used thereafter. I am leaning towards this solution.
  2. The sealed trait @dhardy is suggesting is for the `f64::is_finite` issue (please correct me if I am wrong). But that change not only touches the new assign_weights method, but also the general initialization of `WeightedIndex`. As such, it should be done in a separate PR.

@dhardy Did I understand 2. correctly, or did you want me to do a completely different trait?

@dhardy (Member) commented on Oct 21, 2021

@SuperFluffy you are concerned with the state of WeightedIndex after returning an error? I was merely going to document this, but sure, clearing the weights to cause a panic in this case is an extra precaution.

@SuperFluffy (Author) commented on Oct 21, 2021

Turns out that, the way `Distribution` is implemented for `WeightedIndex`, nothing actually happens if `cumulative_weights.len() == 0`: the binary search simply returns 0 in the error position.

I am now:

  1. explicitly clearing the cumulative weights before assignment happens, and clearing them again if an error is encountered during the loop.
  2. asserting in `Distribution::sample` that `cumulative_weights.len() > 0`.

`assign_weights` was renamed to `assign_new_weights`, and it now allows setting an arbitrary number of new weights, because they are pushed into the index anyway.

If you are happy with these changes I can squash the commits.
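The empty-slice behaviour mentioned above is easy to check:

fn main() {
    let empty: [u32; 0] = [];
    // binary_search on an empty slice returns Err(0) rather than panicking,
    // so a cleared index would silently "sample" index 0 without the extra
    // assertion in Distribution::sample.
    assert_eq!(empty.binary_search(&5), Err(0));
}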

@SuperFluffy (Author) commented on Oct 21, 2021

The newest changes were not good for perf:

test weighted_index_assignment       ... bench:          50 ns/iter (+/- 1)
test weighted_index_assignment_large ... bench:       2,048 ns/iter (+/- 26)
test weighted_index_creation         ... bench:          99 ns/iter (+/- 0)
test weighted_index_creation_large   ... bench:       2,069 ns/iter (+/- 9)
test weighted_index_modification     ... bench:          26 ns/iter (+/- 0)

@SuperFluffy (Author) commented on Oct 21, 2021

I'm again enforcing that the new weights have the same length, to do a lockstep zip between the weights iterator and the cumulative weights. This restores the original performance:

test weighted_index_assignment       ... bench:          26 ns/iter (+/- 0)
test weighted_index_assignment_large ... bench:         394 ns/iter (+/- 15)
test weighted_index_creation         ... bench:          99 ns/iter (+/- 8)
test weighted_index_creation_large   ... bench:       2,094 ns/iter (+/- 66)
test weighted_index_modification     ... bench:          26 ns/iter (+/- 0)

@dhardy (Member) left a comment

There are four options:

  1. Use std::panic::catch_unwind. This is "not recommended for a general try/catch mechanism" and comes with various warnings, but may be okay (see the sketch after this list).
  2. Change Sampler::new to return Result on error — but this is beyond the scope of this PR: Error handling of distributions::Uniform::new #1195.
  3. Use another trait bound to let us directly check the weight is finite. This fixes the specific case of f32/f64 but not necessarily user-extensions to the Uniform distribution.
  4. Simply state that if the method fails, results of sampling the distribution are undefined (within certain bounds).
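For illustration, option 1 would look something like the following, with a stand-in assertion for the panicking `Sampler::new` call (discouraged as a general error-handling mechanism, as noted above):

use std::panic;

fn main() {
    // Stand-in for a constructor that asserts on invalid input.
    let result = panic::catch_unwind(|| {
        let total_weight = f64::INFINITY;
        assert!(total_weight.is_finite(), "invalid range");
    });
    // The panic is converted into an Err, which could then be mapped
    // onto a WeightedError variant.
    assert!(result.is_err());
}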

return Err(WeightedError::AllWeightsZero);
};

self.weight_distribution = X::Sampler::new(zero, total_weight.clone());
Member:

There's still a problem: this panics if total_weight is +inf, and we don't catch panics.

Author:

Right, but here WeightedIndex::new suffers from the same issue. So if we want to address this, both assign_new_weights and new should be changed in a new PR, I think.

@SuperFluffy (Author)

There are four options:

1. Use `std::panic::catch_unwind`. This is "not recommended for a general try/catch mechanism" and comes with various warnings, but may be okay.

2. Change `Sampler::new` to return `Result` on error — but this is beyond the scope of this PR: [Error handling of distributions::Uniform::new #1195](https://github.com/rust-random/rand/issues/1195).

3. Use another trait bound to let us directly check the weight is finite. This fixes the specific case of `f32`/`f64` but not necessarily user-extensions to the `Uniform` distribution.

4. Simply state that if the method fails, results of sampling the distribution are undefined (within certain bounds).

Alright, I went with option 4, mentioning that the results of sampling the distribution are undefined.

Regarding your other comment about `total_weight` being +inf: I think this should be part of another PR, fixing it for both `new` and `assign_new_weights`.

@dhardy (Member) left a comment

Okay, I think I'm happy with this now, but I'll let @vks take another look before merging.

@SuperFluffy force-pushed the assign_weighted_index branch 2 times, most recently from ff9d38a to 79f928f on October 21, 2021
@vks added the B-API Breakage: API label on Oct 21, 2021
@SuperFluffy (Author) commented on Oct 21, 2021

@vks Addressed all your points and squashed.

@vks (Collaborator) left a comment

Thanks! The new code is unfortunately not compatible with Rust 1.36:

error[E0277]: `[f64; 4]` is not an iterator
   --> src/distributions/weighted_index.rs:475:29
    |
475 |             let mut distr = WeightedIndex::new([1.0f64, 2.0, 3.0, 0.0]).unwrap();
    |                             ^^^^^^^^^^^^^^^^^^ borrow the array with `&` or call `.iter()` on it to iterate over it
    |
    = help: the trait `core::iter::Iterator` is not implemented for `[f64; 4]`
    = note: arrays are not iterators, but slices like the following are: `&[1, 2, 3]`
    = note: required because of the requirements on the impl of `core::iter::IntoIterator` for `[f64; 4]`

error[E0277]: the trait bound `[f64; 3]: core::iter::ExactSizeIterator` is not satisfied
   --> src/distributions/weighted_index.rs:476:29
    |
476 |             let res = distr.assign_new_weights([1.0f64, 2.0, 3.0]);
    |                             ^^^^^^^^^^^^^^^^^^ the trait `core::iter::ExactSizeIterator` is not implemented for `[f64; 3]`

error[E0277]: `[f64; 4]` is not an iterator
   --> src/distributions/weighted_index.rs:480:29
    |
480 |             let mut distr = WeightedIndex::new([1.0f64, 2.0, 3.0, 0.0]).unwrap();
    |                             ^^^^^^^^^^^^^^^^^^ borrow the array with `&` or call `.iter()` on it to iterate over it
    |
    = help: the trait `core::iter::Iterator` is not implemented for `[f64; 4]`
    = note: arrays are not iterators, but slices like the following are: `&[1, 2, 3]`
    = note: required because of the requirements on the impl of `core::iter::IntoIterator` for `[f64; 4]`
note: required by `distributions::weighted_index::WeightedIndex::<X>::new`
   --> src/distributions/weighted_index.rs:97:5
    |
97  | /     pub fn new<I>(weights: I) -> Result<WeightedIndex<X>, WeightedError>
98  | |     where
99  | |         I: IntoIterator,
100 | |         I::Item: SampleBorrow<X>,
...   |
131 | |         })
132 | |     }
    | |_____^

error[E0599]: no associated item named `NAN` found for type `f64` in the current scope
   --> src/distributions/weighted_index.rs:481:67
    |
481 |             let res = distr.assign_new_weights([1.0f64, 2.0, f64::NAN, 0.0]);
    |                                                                   ^^^ associated item not found in `f64`
    |
    = help: items from traits can only be used if the trait is in scope
    = note: the following trait is implemented but not in scope, perhaps add a `use` for it:
            `use core::num::dec2flt::rawfp::RawFloat;`

error[E0277]: `[u32; 4]` is not an iterator
   --> src/distributions/weighted_index.rs:485:29
    |
485 |             let mut distr = WeightedIndex::new([1u32, 2, 3, 0]).unwrap();
    |                             ^^^^^^^^^^^^^^^^^^ borrow the array with `&` or call `.iter()` on it to iterate over it
    |
    = help: the trait `core::iter::Iterator` is not implemented for `[u32; 4]`
    = note: arrays are not iterators, but slices like the following are: `&[1, 2, 3]`
    = note: required because of the requirements on the impl of `core::iter::IntoIterator` for `[u32; 4]`

error[E0277]: the trait bound `[u32; 4]: core::iter::ExactSizeIterator` is not satisfied
   --> src/distributions/weighted_index.rs:486:29
    |
486 |             let res = distr.assign_new_weights([0u32, 0, 0, 0]);
    |                             ^^^^^^^^^^^^^^^^^^ the trait `core::iter::ExactSizeIterator` is not implemented for `[u32; 4]`

error: aborting due to 6 previous errors
  • f64::NAN can be replaced with core::f64::NAN.
  • Iterating over slices instead of arrays should fix the other errors.

@vks (Collaborator) commented on Oct 21, 2021

@dhardy Do you think we can start to merge breaking changes for rand 0.9?

@dhardy (Member) commented on Oct 21, 2021

@vks I guess that depends on whether there are any significant non-breaking changes in master or expected to be merged soon. I don't know but can check tomorrow. If not, then I think we can start merging.

BREAKING CHANGE: This commit adds a variant to `WeightedError`.
@SuperFluffy requested a review from vks on October 22, 2021
@SuperFluffy (Author)

@vks Replaced f64::NAN by ::core::f64::NAN. Also changed the arrays to slices with &[<values>][..]. I actually did this in benches/weighted.rs as well, since none of the benches (mine and the old ones) were compatible with 1.36. I guess it's because benches are checked with nightly, but not with a "1.36 nightly".

@vks (Collaborator) commented on Oct 22, 2021

Great, thanks!

For the benchmarks it's fine to use newer features, because they require nightly anyway. For the API it's more important to track the MSRV, because this may break crates depending on rand.

@kazcw (Collaborator) commented on Oct 22, 2021

There is a more general API that would solve this with less new code, and avoid rand having to choose a stance on a new failure case. Rather than providing optimized methods for specific use cases, why not give the user the tools to perform any operations of this sort efficiently?

The documentation already commits to WeightedIndex being implemented with a Vec<X>, so we could expose that with some easily-implemented functions:

WeightedIndex<X>::into_cumulative_weights(self) -> Vec<X>;
WeightedIndex<X>::from_cumulative_weights(weights: Vec<X>) -> Result<Self, WeightedError>;
WeightedIndex<X>::from_cumulative_weights_unchecked(weights: Vec<X>) -> Self;

The Vecs would have total_weight as the final element; this could be a conversion done by pushing/popping in the new methods, or (preferably, I think) the existing methods could be modified to store total_weight in that position. (total_weight doesn't actually need to be stored at all, but I think the API simplicity of keeping it in the Vec rather than accepting it as a separate parameter outweighs the overhead of a single unneeded element—it is logically the final element of the series.)
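Concretely, under this layout the weights `[1, 2, 3]` would be stored as the cumulative series `[1, 3, 6]`, with the final element doubling as the total weight:

fn main() {
    let weights = [1u32, 2, 3];
    let cumulative: Vec<u32> = weights
        .iter()
        .scan(0, |acc, &w| {
            *acc += w;
            Some(*acc)
        })
        .collect();
    assert_eq!(cumulative, vec![1, 3, 6]);
    // The last element is the total weight, so no separate field is needed.
    assert_eq!(cumulative.last(), Some(&6));
}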

For convenience, WeightedIndex<X> could also have a Default implementation with an empty weights Vec.

Then, a user like @SuperFluffy could achieve the optimized operation in question like this:

fn example(&mut self) -> Result<(), WeightedError> {
    // `mem::take` relies on the proposed `Default` impl for `WeightedIndex`.
    let mut weights = core::mem::take(&mut self.weighted_index).into_cumulative_weights();
    update_weights_somehow(&mut weights[..]);
    self.weighted_index = WeightedIndex::from_cumulative_weights(weights)?;
    Ok(())
}

The possibility of length mismatch is obviated here, and what becomes of the source value in the error case is explicitly the user's choice. This would also support at least one additional use case: if the user needs distributions of varying length at different times, they can reuse one Vec so that it only needs allocation when it reaches a new high-water mark.

Incidentally, an example of using the new API in place of update_weights would be exactly the same. update_weights could potentially be deprecated after this.

@dhardy (Member) commented on Oct 23, 2021

Interesting points @kazcw. There are two caveats:

  1. You're pushing more work onto the user: converting weights into cumulative weights (not hard I know)
  2. It sounds like this is an unchecked API — fine, but we should be clear about that.

Anyway, it does make me consider something else:

  1. There doesn't appear to be much reason to require that the number of weights match. We can just use reserve or collect iterators, right?
  2. The constructor is much the same code as this method. Can we save some code, e.g. by making the constructor create an empty instance and then use replace_weights?

@SuperFluffy (Author) commented on Oct 23, 2021

I actually had written a version that clears the original vector and pushes new elements into it instead of doing a lockstep zip + assignment.

This led to exactly the same performance as just calling `new`, rendering the point of this PR moot.

@kazcw (Collaborator) commented on Oct 23, 2021

The from/into approach asks the user to implement more, but it asks the user to be aware of API subtleties less. With assign_new_weights, rand must take a stance on:

  • what invariants are required (does the new distribution have to have the same number of elements?)
  • what happens if the invariants are violated (do we panic? try to rollback? leave self in an inconsistent state? poison self so that sample will panic?)
  • what are the performance characteristics (if the new distribution has smaller len(), does self still have the same memory footprint?)

rand must try to find a compromise that will be acceptable to as many use cases as possible (which, as the discussion on this issue shows, is not easy—are we there yet? I think a poisoned self would be better than inconsistent!), and the user needs to understand rand's decisions and their implications (likewise with update_weights).

Whereas given the from/into approach, these decisions are in the users hands; tradeoffs can be chosen appropriately to the use case, and the consequences should not come as a surprise. The meaning of from_cumulative_weights is clear without consulting documentation: if-and-only-if the input represents the cumulative weights of a valid distribution, it returns Ok(_).

As for the checked vs. unchecked question, the performance of from_cumulative_weights (the checked API) should be similar to assign_new_weights; each has to make a pass over the full input. from_cumulative_weights_unchecked would of course be O(1).

As far as I can see, the advantage of assign_new_weights is that it requires less code to use. If so, this comes down to a question of priorities: is it more important that code built on rand be terse, or that it be straightforward to write correctly and easy to review?

@dhardy (Member) commented on Oct 24, 2021

I actually had written a version that clears the original vector and pushes new elements into it instead of doing a lockstep zip + assignment.

Slightly weird, but pushing new elements is definitely slower. Using resize fixes this however:

self.cumulative_weights.resize(iter.len(), zero.clone());
for (w, c) in iter.zip(self.cumulative_weights.iter_mut()) {
    // ...

Of course, this technique can be used in new for a speed boost too, given ExactSizeIterator (if only specialisation were stable); probably also without it, by using the size hint plus a second loop to catch any remaining elements (but that's ugly, redundant code).


@kazcw: I think you're right that this isn't the optimal API. I will think further on it.

@SuperFluffy (Author)

Slightly weird, but pushing new elements is definitely slower. Using resize fixes this however:

self.cumulative_weights.resize(iter.len(), zero.clone());
for (w, c) in iter.zip(self.cumulative_weights.iter_mut()) {
    // ...

It improves it a lot compared to direct pushing, but there is still a significant performance penalty on my M1 ARM machine:

# Zip without resize
test weighted_index_assignment       ... bench:          18 ns/iter (+/- 0)
test weighted_index_assignment_large ... bench:         392 ns/iter (+/- 3)

# Zip after resize
test weighted_index_assignment       ... bench:          22 ns/iter (+/- 0)
test weighted_index_assignment_large ... bench:         484 ns/iter (+/- 2)

@SuperFluffy (Author) commented on Oct 25, 2021

I have been thinking about @kazcw's suggestion. The argument applies not just to cumulative weights, but to normal weights as well. We can easily do:

WeightedIndex<X>::into_cumulative_weights(self) -> Vec<X>;

// Take the provided weights as they are
WeightedIndex<X>::from_cumulative_weights(weights: Vec<X>) -> Result<Self, WeightedError>;
WeightedIndex<X>::from_cumulative_weights_unchecked(weights: Vec<X>) -> Self;

// Iterate over the weights accumulating a total, and assign that running total in each iteration
WeightedIndex<X>::from_weights(weights: Vec<X>) -> Result<Self, WeightedError>;
WeightedIndex<X>::from_weights_unchecked(weights: Vec<X>) -> Self;

The *cumulative_weights* versions allow direct insertion of the cumulative weights, while the non-cumulative versions allow reusing the provided weights in place. The constructor takes ownership of the vector, so it can trivially iter_mut over it and write the running total back into each slot.
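A sketch of that in-place accumulation, assuming ownership of the vector (hypothetical, simplified to `f64`):

/// Hypothetical: convert a vector of weights into cumulative weights
/// in place, reusing the existing allocation.
fn accumulate_in_place(mut weights: Vec<f64>) -> Result<Vec<f64>, &'static str> {
    let mut total = 0.0;
    for w in weights.iter_mut() {
        if !(*w >= 0.0) {
            return Err("invalid weight");
        }
        total += *w;
        *w = total;
    }
    if !(total > 0.0) {
        return Err("all weights zero");
    }
    Ok(weights)
}

fn main() {
    let cumulative = accumulate_in_place(vec![1.0, 2.0, 3.0]).unwrap();
    assert_eq!(cumulative, vec![1.0, 3.0, 6.0]);
}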

@kazcw (Collaborator) commented on Oct 25, 2021

@SuperFluffy from_weights makes a lot of sense—if we'd need a pass over the data to check its cumulativeness, we might as well offer to accumulate it instead. I'm not sure about from_weights_unchecked though—it would have negligible performance benefit for typical X (anything that isn't very expensive to compare). I think offering an unchecked version that isn't normally appreciably faster could mislead users. For the rare X that could use it (bignums?), there's always from_cumulative_weights_unchecked, although WeightedIndex is not going to be optimal for bignums anyway.

As another convenience in the same vein, we might consider to_weights to better support the update_weights use case.

So we'd have:

/// O(1), no allocations
WeightedIndex<X>::into_cumulative_weights(self) -> Vec<X>;
/// O(N), no allocations
WeightedIndex<X>::to_weights(self) -> Vec<X>;

/// O(N), no allocations
WeightedIndex<X>::from_weights(weights: Vec<X>) -> Result<Self, WeightedError>;
/// O(N), no allocations
WeightedIndex<X>::from_cumulative_weights(weights: Vec<X>) -> Result<Self, WeightedError>;
/// O(1), no allocations, if input is not cumulative then distribution will not yield meaningful samples
WeightedIndex<X>::from_cumulative_weights_unchecked(weights: Vec<X>) -> Self;

(Distinguishing to from into in accordance with the Rust API Guidelines: into_cumulative_weights deconstructs, decreasing level of abstraction, and is cheap; to_weights converts, maintaining level of abstraction, and is asymptotically more expensive.)

@dhardy (Member) commented on Oct 25, 2021

I had a little play with from_weights; it is a nice way to rewrite the constructor without significant performance impact. Using it to rewrite assign_new_weights does impact performance (35-40% on the simple bench used). Ultimately I'm not sure whether we care enough about performance in this specific case @SuperFluffy?

See: SuperFluffy#1

I didn't bother with from_cumulative_weights, though that may have some uses (unrelated to this PR).

@SuperFluffy (Author) commented on Oct 26, 2021

@dhardy I also pushed my version of from_weights. Interestingly, I am only seeing at most a 25% decrease in performance compared to assign_new_weights, but that includes an extra clone in the benchmark, which might account for a significant part of the regression (the right thing to do would be to move the benches to criterion, which can ensure that the clone is not measured).

I especially like that from_cumulative_weights_unchecked is extremely cheap.

Here are my results:

test weighted_index_assignment                    ... bench:          18 ns/iter (+/- 0)
test weighted_index_assignment_large              ... bench:         392 ns/iter (+/- 1)
test weighted_index_from_cumulative_weights       ... bench:          41 ns/iter (+/- 0)
test weighted_index_from_cumulative_weights_large ... bench:         156 ns/iter (+/- 0)
test weighted_index_from_weights                  ... bench:          46 ns/iter (+/- 0)
test weighted_index_from_weights_large            ... bench:         478 ns/iter (+/- 1)
test weighted_index_modification                  ... bench:          26 ns/iter (+/- 0)
test weighted_index_new                           ... bench:          59 ns/iter (+/- 0)
test weighted_index_new_large                     ... bench:       2,073 ns/iter (+/- 6)

NOTE: I should get rid of the total_weight field, as in @dhardy's version. This was just quick and dirty.

@dhardy (Member) commented on Oct 26, 2021

@SuperFluffy — about your benches — most of the results are extremely small (under 60ns). While the benchmark can fairly reliably time an operation at that level, I'm not convinced that the operation is representative. E.g. the "small" benchmark says that weighted_index_assignment is more than twice as fast as from_cumulative_weights, whereas the "large" version says basically the opposite. I would think the "large" variant is about as small as you'd want to go (an alternative would be to use multiple small distributions in the same loop).

Looking at the "large" variants, assignment and from_weights are 4-5 times faster than creation, which does sound like a useful speedup; however, assignment is only 18% faster than from_weights, which, given the general unreliability of results from micro-benchmarks, is not that significant. Put differently: if you can demonstrate that assignment is significantly better than from_weights in something close to a real problem, then I will recognise the value of having both, but from the above I will not — and, if you don't specifically have a need for assignment, I'd prefer to drop it to simplify the API.

API simplicity is an important factor, and I have a feeling that we are over-optimising here (without a specific target).

@SuperFluffy (Author)

@dhardy I agree with removing assignment, and I agree that results are not reliable. I very much prefer the from and into APIs.

If you want, I will close this PR and submit a fresh PR that contains these.

@dhardy (Member) commented on Oct 26, 2021

@SuperFluffy It's not very important whether you use a new PR. Notice that my PR reduced the line count significantly by making new a wrapper around from_weights and via simpler code there (enabled in part by pushing total_weight into the vec).

@dhardy (Member) commented on Dec 6, 2022

@SuperFluffy are you still able to work on this? It would be good to get it merged soon!

Was there anything else to resolve?

@dhardy (Member) commented on Feb 20, 2023

@SuperFluffy can I remind you of this? Both the above issues are now resolved in master.

@SuperFluffy (Author)

@dhardy Apologies for not having responded. I left the job where this was relevant, so it hasn't been a priority for me since.

I'm going to find some time to rebase and adjust this PR.

Thanks for the reminder.

@dhardy (Member) commented on Jan 29, 2024

Closing due to inactivity. We can re-open if someone wishes to work on this again.

I have a branch related to this here, but I don't recall the motivation (probably benchmarking): https://github.com/dhardy/rand/commits/assign_weighted_index/

@dhardy closed this on Jan 29, 2024
@dhardy added the X-stale (Outdated or abandoned work) label on Jul 10, 2024