-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement cache deletes #1029
Implement cache deletes #1029
Conversation
* Add ability to add programs to the runtime and reuse this function in the program initialization flow
@@ -257,7 +257,7 @@ pub fn state_keys(_attr: TokenStream, item: TokenStream) -> TokenStream { | |||
|
|||
// add default attributes | |||
item_enum.attrs.push(parse_quote! { | |||
#[derive(Clone, Copy, Debug, Eq, PartialEq, Hash)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a tab was off 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can take this change and land it in a separate PR if you want it in earlier/immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine, it's only catching the eye when writing another aligned stuff just below
#[derive(Debug, PartialEq)] | ||
pub enum CachedData { | ||
Data(Vec<u8>), | ||
Deleted, | ||
ToDelete, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, Deleted
means "we know that there is no value here", it could be a bit unclear since it may mean that some value was here before, while it just means that we checked in the actual storage and there is no value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, having a value at a key K
in the cache means "we know the state of the value here". If it's not here that means that it's unknown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔
Since we know that the empty Vec
is not something that we store, would it make more sense to simply have:
#[derive]
#[non_exhaustive] // should include this in case we decide to extend or change the behaviour here
pub enum CacheValue {
Data(Vec<u8>),
ToDelete,
}
Then you can match on the value as follows:
match cache_value {
Data(bytes) if !bytes.is_empty() => borsh::from_slice(bytes),
ToDetele => ...
}
match self.cache.borrow_mut().entry(key) { | ||
Entry::Occupied(entry) => match entry.get() { | ||
CachedData::Data(data) => from_slice::<V>(data) | ||
.map_err(|_| StateError::Deserialization) | ||
.map(Some), | ||
CachedData::ToDelete | CachedData::Deleted => Ok(None), | ||
}, | ||
Entry::Vacant(entry) => { | ||
if let Some(bytes) = Self::get_from_host(key)? { | ||
let value = from_slice::<V>(&bytes) | ||
.map_err(|_| StateError::Deserialization) | ||
.map(Some); | ||
entry.insert(CachedData::Data(bytes)); | ||
value | ||
} else { | ||
entry.insert(CachedData::Deleted); | ||
Ok(None) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to avoid the repetition here, in one branch the data is borrowed because it comes from the cache. On the other, it has been copied from host so it's owned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's working quite well though because we need an owned value to insert in the cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't you follow the same pattern as before but use a match
instead?
let val_bytes = match cache.get(&key) {
Some(CacheValue::Data(val)) => val,
Some(CacheValue::ToDelete | CacheValue::Deleted) => {
return Ok(None),
},
None => // same as the else from before
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, this actually compiles
pub fn get<V>(self, key: K) -> Result<Option<V>, Error>
where
V: BorshDeserialize,
{
#[link(wasm_import_module = "state")]
extern "C" {
#[link_name = "get_bytes"]
fn get_bytes(ptr: *const u8, len: usize) -> HostPtr;
}
let mut cache = self.cache.borrow_mut();
let cached_data = match cache.get(&key) {
Some(cached_data) => cached_data,
None => {
let args_bytes = borsh::to_vec(&key).map_err(|_| StateError::Serialization)?;
let ptr = unsafe { get_bytes(args_bytes.as_ptr(), args_bytes.len()) };
if ptr.is_null() {
cache.insert(key, CachedData::Deleted);
return Ok(None);
}
cache.entry(key).or_insert(CachedData::Data(ptr.into()))
}
};
match cached_data {
CachedData::Data(data) => from_slice::<V>(data)
.map_err(|_| StateError::Deserialization)
.map(Some),
CachedData::ToDelete | CachedData::Deleted => Ok(None),
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, sorry for taking so long with this review.
I want to check with @dboehm-avalabs to see if he also things it would make sense before you launch into it, but what I would like you to do is centralize the API first. What I mean by that is a single
#[link(wasm_import_module = "state")]
extern "C" {
#[link_name = "put_bytes"]
fn put_bytes(ptr: *const u8, len: usize);
}
That actually takes a something like Vec<(Vec<u8>, Vec<u8>)>
as the serialized data. The first value in the tuple is the key, the second is the value. If the value is empty, that should signify a delete. I'm not 100% sure what the best way is to represent this in Go, maybe [][2][]byte
or something along those lines?
You should be able to do that with only minor changes to the Rust (nothing to the actual cache).
Once we merge that code, we can come back and do the CachedData
enum.
Off the top of my head, I think the cache would probably be HashMap<K, Option<Vec<u8>>>
.
#[derive(Debug, PartialEq)] | ||
pub enum CachedData { | ||
Data(Vec<u8>), | ||
Deleted, | ||
ToDelete, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔
Since we know that the empty Vec
is not something that we store, would it make more sense to simply have:
#[derive]
#[non_exhaustive] // should include this in case we decide to extend or change the behaviour here
pub enum CacheValue {
Data(Vec<u8>),
ToDelete,
}
Then you can match on the value as follows:
match cache_value {
Data(bytes) if !bytes.is_empty() => borsh::from_slice(bytes),
ToDetele => ...
}
match self.cache.borrow_mut().entry(key) { | ||
Entry::Occupied(entry) => match entry.get() { | ||
CachedData::Data(data) => from_slice::<V>(data) | ||
.map_err(|_| StateError::Deserialization) | ||
.map(Some), | ||
CachedData::ToDelete | CachedData::Deleted => Ok(None), | ||
}, | ||
Entry::Vacant(entry) => { | ||
if let Some(bytes) = Self::get_from_host(key)? { | ||
let value = from_slice::<V>(&bytes) | ||
.map_err(|_| StateError::Deserialization) | ||
.map(Some); | ||
entry.insert(CachedData::Data(bytes)); | ||
value | ||
} else { | ||
entry.insert(CachedData::Deleted); | ||
Ok(None) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't you follow the same pattern as before but use a match
instead?
let val_bytes = match cache.get(&key) {
Some(CacheValue::Data(val)) => val,
Some(CacheValue::ToDelete | CacheValue::Deleted) => {
return Ok(None),
},
None => // same as the else from before
}
} | ||
} | ||
return nil | ||
})}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be a single API for delete
and delete_many
. It'll be easier to review the changes if you consolidate the API and fix the call signature on the Rust
side before you start using it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue that it would probably even be better to have a single put
operation such that put
-ing an empty Vec
/byte-slice is the same as delete.
I'm not 100% sure if that makes sense with how state should work though. I'm only assuming that storing an empty slice isn't valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #1057 in order to address #1029 (comment)
@@ -257,7 +257,7 @@ pub fn state_keys(_attr: TokenStream, item: TokenStream) -> TokenStream { | |||
|
|||
// add default attributes | |||
item_enum.attrs.push(parse_quote! { | |||
#[derive(Clone, Copy, Debug, Eq, PartialEq, Hash)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can take this change and land it in a separate PR if you want it in earlier/immediately.
match self.cache.borrow_mut().entry(key) { | ||
Entry::Occupied(entry) => match entry.get() { | ||
CachedData::Data(data) => from_slice::<V>(data) | ||
.map_err(|_| StateError::Deserialization) | ||
.map(Some), | ||
CachedData::ToDelete | CachedData::Deleted => Ok(None), | ||
}, | ||
Entry::Vacant(entry) => { | ||
if let Some(bytes) = Self::get_from_host(key)? { | ||
let value = from_slice::<V>(&bytes) | ||
.map_err(|_| StateError::Deserialization) | ||
.map(Some); | ||
entry.insert(CachedData::Data(bytes)); | ||
value | ||
} else { | ||
entry.insert(CachedData::Deleted); | ||
Ok(None) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, this actually compiles
pub fn get<V>(self, key: K) -> Result<Option<V>, Error>
where
V: BorshDeserialize,
{
#[link(wasm_import_module = "state")]
extern "C" {
#[link_name = "get_bytes"]
fn get_bytes(ptr: *const u8, len: usize) -> HostPtr;
}
let mut cache = self.cache.borrow_mut();
let cached_data = match cache.get(&key) {
Some(cached_data) => cached_data,
None => {
let args_bytes = borsh::to_vec(&key).map_err(|_| StateError::Serialization)?;
let ptr = unsafe { get_bytes(args_bytes.as_ptr(), args_bytes.len()) };
if ptr.is_null() {
cache.insert(key, CachedData::Deleted);
return Ok(None);
}
cache.entry(key).or_insert(CachedData::Data(ptr.into()))
}
};
match cached_data {
CachedData::Data(data) => from_slice::<V>(data)
.map_err(|_| StateError::Deserialization)
.map(Some),
CachedData::ToDelete | CachedData::Deleted => Ok(None),
}
}
} else { | ||
entry.insert(CachedData::Deleted); | ||
Ok(None) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as the comment above, I don't think you need all the repetition. There's a lot of surface area for bugs here.
/// Returns an [Error] if the key cannot be serialized | ||
/// or if the host fails to delete the key and the associated value | ||
pub fn delete<T: BorshDeserialize>(self, key: K) -> Result<Option<T>, Error> { | ||
fn get_from_host(key: K) -> Result<Option<Vec<u8>>, Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need this static method.
CachedData::Data(value) => puts.push(PutArgs { key, value }), | ||
CachedData::ToDelete => deletes.push(key), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this tells me that a single API here would be better. I can't remember where I said it (could have been in this review or elsewhere), but I think that putting an empty Vec<u8>
isn't valid. So you can actually eliminate all invalid state (minimizing the runtime checks) by making empty Vec
puts the equivalent of delete
.
@dboehm-avalabs, thoughts?
if !puts.is_empty() { | ||
let serialized_args = borsh::to_vec(&puts).expect("failed to serialize"); | ||
unsafe { put_many_bytes(serialized_args.as_ptr(), serialized_args.len()) }; | ||
} | ||
|
||
if !deletes.is_empty() { | ||
let serialized_args = borsh::to_vec(&deletes.as_slice()).expect("failed to serialize"); | ||
unsafe { delete_many_bytes(serialized_args.as_ptr(), serialized_args.len()) }; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, I just keep seeing good reason to centralize the API into a single call. If there's only one call, you can do a simple if cache.is_empty() { return }
for Drop
Closing in favor of #1059 |
Cache deletes in the program state so that we now avoid cache misses when it comes to data deletion.
Also added a cache insertion when calling
get
.The new cache now tells the caller wether the data is here
Data(Vec<u8>)
, has been deletedDeleted
orToDelete
if it's scheduled to be deleted when the cache is flushed.There is a caveat though which is that ZST Vecs are completely disabled by borsh-rs. That's because they want to avoid a potential DOS since huge vecs can easily be crafted using ZST (they are free memory-wise!) with
vec![(), !0]
.See near/borsh-rs#52
We need to make sure that we are not exposed to that, or that fuel is correctly metering to avoid abuses.
Closes #867
Closes #868