
Switch digest to cuckoo filters, to enable O(1) removal #413

Merged
merged 14 commits into httpwg:master
Mar 1, 2018

Conversation

yoavweiss
Contributor

Resolves #268

@yoavweiss
Contributor Author

cc @mnot @kazuho

@mnot
Member

mnot commented Nov 5, 2017

Hey Yoav,

Thanks; will take a look. Two immediate things:

  1. You're getting an error in the markdown; mapping values are not allowed in this context at line 80 column 25.

  2. I see you've added yourself as an author. That's generally the decision of the chair - @mcmanus in this case.

@yoavweiss
Contributor Author

You're getting an error in the markdown; mapping values are not allowed in this context at line 80 column 25.

Hopefully fixed. Is there a way to test it locally?

I see you've added yourself as an author. That's generally the decision of the chair - @mcmanus in this case.

Apologies for the noobness. Removed myself.

@mnot
Member

mnot commented Nov 6, 2017

See SUBMITTING.md for build info.

@kazuho
Contributor

kazuho commented Nov 7, 2017

@yoavweiss Thank you for working on the proposal.

  • Am I correct in assuming that changes other than the switch to Cuckoo filters and the introduction of SENDING_CACHE_DIGEST are unintentional? For example, I see the VALIDATORS flag of the CACHE_DIGEST frame being removed.
  • Do you have working code that implements Cuckoo filters? I am curious to see it working.
  • The concept of SENDING_CACHE_DIGEST makes sense to me. Maybe we might want to adjust the codepoints and the naming in relation to ACCEPT_CACHE_DIGEST.

@kazuho
Contributor

kazuho commented Nov 7, 2017

For example, I see the VALIDATORS flag of the CACHE_DIGEST frame being removed.

Oh, I now understand the intent of removing the flag.

The motive of the proposal is to build a digest without referring to every response object stored in the cache. That means it is not easy for the client to determine the freshness of the entries that are going to be included in the digest.

I am sympathetic to the idea, but I am afraid the approach may not work well with the current mechanism of HTTP/2 caching. My understanding is that browsers that exist today only consume a pushed response when they fail to find a freshly cached response in their cache. Otherwise, the pushed response never lands in the browser cache. Unless we change the behavior of browsers to respect a pushed response even when a freshly cached object already exists in the cache, there's a chance that servers would continually push responses that get ignored by the client (due to the existence of a freshly cached response in the browser cache with the same URL).

@yoavweiss Assuming that I correctly understand the motive of removing the distinction between a fresh digest and a stale digest, I would appreciate it if you could clarify your ideas on the problem.

@yoavweiss
Contributor Author

Thanks for reviewing, @kazuho! :)

My intent was to include all stored resources in the digest, regardless of them being stale or fresh. Entries are added to the digest when a resource is added to the cache and removed from the digest when a resource is removed.

The reason is that I think the distinction doesn't make much sense, and maintaining it adds a lot of complexity, basically forcing browsers to recreate the digest for every connection at O(N) cost.

Under this premise, what servers should do is the following (rough sketch after the list):

  • Push all the resources that are known not to be in the cache digest
  • Push 304 responses for resources that are in the cache digest, but are likely to be stale (short freshness lifetime, etc.)
  • Don't push resources that are in the cache digest and have a long-term freshness lifetime or are immutable.
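A rough sketch of that decision (not from the draft; `digest.contains()` and the `longLivedOrImmutable` flag are hypothetical names a server would derive from the received digest and its own response metadata):

```js
// Sketch of the server-side push decision under this premise.
function decidePush(digest, resource) {
  if (!digest.contains(resource.url, resource.etag)) {
    return 'push-full-response'; // not in the client's digest
  }
  if (resource.longLivedOrImmutable) {
    return 'skip';               // in the digest and still trustworthy
  }
  return 'push-304';             // in the digest, but likely stale: revalidate via push
}
```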

Does that make sense? I'm not sure I understand your reference to the push cache vs. the HTTP cache in your comment. In light of my explanation, is there still an issue there in your view?

@yoavweiss
Contributor Author

Do you have working code that implements Cuckoo filters? I am curious to see it working.

https://github.com/efficient/cuckoofilter is the reference implementation.

The concept of SENDING_CACHE_DIGEST makes sense to me. Maybe we might want to adjust the codepoints and the naming in relation to ACCEPT_CACHE_DIGEST.

Happy to change it. Do you have any specific changes in mind?

@kazuho
Contributor

kazuho commented Nov 7, 2017

@yoavweiss

My intent was to include all stored resources in the digest, regardless of them being stale or fresh. Entries are added to the digest when a resource is added to the cache and removed from the digest when a resource is removed.

The reason is that I think the distinction doesn't make much sense, and maintaining it adds a lot of complexity, basically forcing browsers to recreate the digest for every connection at O(N) cost.

Thank you for the explanation. I now understand the intent better.

I think that we need to consider two issues regarding the approach.

First is the fact that a browser cache may contain more stale responses than fresh ones. Below are the numbers of cached objects found in my Firefox cache (to be honest, the data is from 2016; I haven't been using Firefox in recent weeks and therefore cannot provide up-to-date numbers).

| host | fresh | stale | total |
| --- | ---: | ---: | ---: |
| *.facebook.com | 790 | 1,483 | 2,273 |
| *.google.com | 373 | 630 | 1,003 |

As you can see, large-scale websites tend to have more stale objects than fresh ones. In other words, including information about stale-cached objects roughly triples the size of the digest in this case. Since performance-sensitive resources (the ones we need to push) are likely to be stored fresh (they are the most likely to be marked as immutable, or near-immutable), transmitting only the digest of freshly cached responses makes sense.

Second is a configuration issue on the server side.

One strategy that can be employed by an H2 server (under the current draft) is to receive a digest of freshly cached resources only, compare the digest against the list of resources the browser should preload using only the URL, and push the missing resources to the client. It is possible for an H2 server to perform the comparison without actually fetching the resource (from origin or from cache), since only the URL is required for calculating the digest.

The proposal prevents such a strategy from being deployed, since it requires the ETag values to always be taken into consideration (should they be associated with the HTTP responses). In other words, servers would be required to load the response headers of the resources to determine whether they need to be pushed, which could be a huge performance degradation on some deployments.

Fortunately, servers could avoid the issue by not including ETags for resources that they may push. I think such a change in server-side configuration would be possible, but we need to make sure of that if we are to take the path (of removing the fresh vs. stale distinction).
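To make the URL-only strategy concrete, a minimal sketch assuming a hypothetical `digest.contains(url)` membership test: the server works from its preload list alone and never has to load each resource's headers to find an ETag.

```js
// Select the resources to push: those the client's digest does not cover.
function selectPushCandidates(digest, preloadUrls) {
  return preloadUrls.filter((url) => !digest.contains(url));
}
```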

I'm not sure I understand your reference to the push cache vs. the HTTP cache in your comment. In light of my explanation, is there still an issue there in your view?

Let me explain using an example.

Consider the following case:

  • client has https://example.com/style.css with ETag: 12345 and Expires: Nov 30 2017
  • on server-side, the resource has been updated to ETag: 67890

When receiving a new request from the client, the server cannot determine if the client has style.css in its cache. Therefore, style.css would be pushed.

The client, when observing link: </style.css>; rel=preload (or equivalent <link> tag), tries to load the resource. Since the fresh resource exists within the browser cache, that would be used. The pushed version is ignored and gets discarded (*).

This would be repeated every time until the cached object either becomes stale or gets removed from the cache.

My understanding is that the browser behavior (explained in *) is true for Firefox and also for Chrome. Am I wrong, or missing something?

Do you have working code that implements Cuckoo filters? I am curious to see it working.

https://github.com/efficient/cuckoofilter is the reference implementation.

Thank you for the link. I will try to use it.

OTOH, do you have some working code that can actually calculate the cache-digest value taking a list of URLs as an input (something like https://github.com/h2o/cache-digest.js)? I ask this because it would give us a better sense of what the actual size of the digest would be.

The concept of SENDING_CACHE_DIGEST makes sense to me. Maybe we might want to adjust the codepoints and the naming in relation to ACCEPT_CACHE_DIGEST.

Happy to change it. Do you have any specific changes in mind?

One way to proceed would be to split the discussion of SENDING_CACHE_DIGEST from Cuckoo filters into a separate issue or a PR. I do not have a strong opinion on the naming or the codepoints. What do you think? @mnot

@kazuho
Contributor

kazuho commented Nov 7, 2017

@yoavweiss Have you considered the approach of using a Cuckoo filter to generate a GCS?

I understand that you do not want to iterate through the browser cache when sending a cache digest. A per-host Cuckoo filter seems like a good solution to the issue.

OTOH, as I described in my previous comment, it seems that sending the hash directly has several issues.

That is why I am wondering if it would be viable to generate GCS from the per-host Cuckoo filter that would be maintained within the browser.

I can see three benefits in the approach, compared to sending the values of the Cuckoo filter directly:

  • the size of the digest will be smaller
  • we can keep the distinction between fresh vs. stale. Sending a digest of fresh resources only would result in even smaller digests. Retaining the distinction lowers the bar for deploying cache digests on the server side.
    • note: you can store the time when the cached object becomes stale in the data associated with the Cuckoo filter entry (assuming that you would have associated data to handle resize, as we discussed in Enabling O(1) removal from digest #268 (comment)). That information can be used when building the GCS to determine whether a particular object should go into a GCS of fresh resources or that of stale ones
  • less change to the browser push handling (no need to handle pushes of 304 or replace a freshly cached object when an object with the same URL is being pushed)

In case of *.facebook.com or *.google.com in the comment above, sending fresh-only digests using GCS would be about 1/3 the size of sending fresh & stale digests using Cuckoo filter.

The biggest cost of calculating a GCS from the Cuckoo filter would be the sort operation. But I think that cost could be negligible compared to the ECDH operation that we would be doing for every connection, considering that the number of entries we would need to sort would be small (e.g., up to 1,000 uint32_t entries), and that sorting algorithms faster than O(n log n) can be deployed (e.g., radix sort).

WDYT?

@yoavweiss
Contributor Author

Have you considered the approach of using a Cuckoo filter to generate a GCS?

So have a cuckoo filter digest and then put its fingerprints in a GCS? I have not considered that. Need to give it some thought...

At the same time, it's not clear to me how that would enable "stale" vs. "fresh" digests, or the handling of improperly cached resources (fresh resources that were replaced on the server).

@kazuho
Contributor

kazuho commented Nov 8, 2017

Have you considered the approach of using a Cuckoo filter to generate a GCS?

So have a cuckoo filter digest and then put its fingerprints in a GCS? I have not considered that. Need to give it some thought...

I would appreciate it if you could consider it. To me it seems worth giving some thought.

At the same time, it's not clear to me how that would enable "stale" vs. "fresh" digests, or the handling of improperly cached resources (fresh resources that were replaced on the server).

Under the approach proposed in this PR, the structure that stores the per-host digest would look like the following. `hashes` is required for resizing the filter (e.g., when doubling or halving `num_buckets`).

uintFF_t fingerprints[num_buckets]; // FF is the size of the fingerprint
uint32_t hashes[num_buckets];       // contains the 32-bit hash value of each entry in `fingerprints`

What I am suggesting is that you could change the structure to the following.

uintFF_t fingerprints[num_buckets]; // FF is the size of the fingerprint
struct {
  uint32_t hash;
  time_t becomes_stale_at;
} hashes_and_expire_times[num_buckets]; // per-entry hash and the moment it becomes stale

In addition to the hash value, each entry will contain the moment when the entry becomes stale. The moment can be calculated when the entry is added. For example, if the entry represents an HTTP response with `cache-control: max-age=V`, becomes_stale_at can be calculated as now + V. If the entry represents an immutable HTTP response, then becomes_stale_at should be set to a very large value (e.g., INT64_MAX, assuming that the underlying type of time_t is int64_t).
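A minimal sketch of that computation (in JavaScript, using milliseconds rather than time_t seconds; Expires and the other freshness rules are omitted for brevity, so this is only an illustration):

```js
// Compute the moment a cached response becomes stale, at insertion time.
function becomesStaleAt(responseHeaders, now = Date.now()) {
  const cc = responseHeaders['cache-control'] || '';
  if (/\bimmutable\b/.test(cc)) return Number.MAX_SAFE_INTEGER; // "very large value"
  const m = /max-age=(\d+)/.exec(cc);
  return m ? now + Number(m[1]) * 1000 : now; // now + V, or treat as already stale
}
```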

When building a GCS digest, you would do the following (rough sketch below):

  • step 1. prepare an empty list that would contain hashes of fresh responses
  • step 2. prepare an empty list that would contain hashes of stale responses
  • step 3. foreach entry in cuckoo_filter:
    • step 3-1. check if the entry is fresh or not, by checking the value of becomes_stale_at
    • step 3-2. if the entry is fresh, append hash of the entry to the list of the hashes of fresh responses
    • step 3-3. otherwise, append hash of the entry to the list of the hashes of stale responses
  • step 4. sort the list of hashes of the fresh responses, encode as GCS, and send
  • step 5. sort the list of hashes of the stale responses, encode as GCS, and send

You can skip the operations related to stale objects (i.e. step 2, 3-3, 5) if the server is unwilling to receive stale digests.
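A rough sketch of the steps above (not normative). Each filter entry is assumed to carry `{ hash, becomesStaleAt }` as suggested; `gcsEncode()` is a hypothetical Golomb-Rice encoder along the lines of cache-digest.js.

```js
// Split the cuckoo-filter entries into fresh/stale hash lists and encode them.
function buildGcsDigests(entries, now = Date.now()) {
  const fresh = [];                                         // step 1
  const stale = [];                                         // step 2
  for (const e of entries) {                                // step 3
    (e.becomesStaleAt > now ? fresh : stale).push(e.hash);  // steps 3-1..3-3
  }
  fresh.sort((a, b) => a - b);                              // step 4
  stale.sort((a, b) => a - b);                              // step 5
  return { fresh: gcsEncode(fresh), stale: gcsEncode(stale) };
}
```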

Whether the approach can be implemented depends on whether a client can determine the moment a response becomes stale. I anticipate that it is possible to determine that when you register the entry with the Cuckoo filter (which is when you receive the response from the server).

@sebdeckers

Do you have working code that implements Cuckoo filters? I am curious to see it working.

https://github.com/efficient/cuckoofilter is the reference implementation.

Thank you for the link. I will try to use it.

OTOH, do you have some working code that can actually calculate the cache-digest value taking a list of URLs as an input (something like https://github.com/h2o/cache-digest.js)? I ask this because it would give us a better sense of what the actual size of the digest would be.

@yoavweiss @kazuho I'm planning to attend the IETF 100 hackathon this weekend in Singapore. (First timer here. 🤗🔰) I'm happy to collaborate on a (Node.js?) implementation of this spec if either of you are around and interested. I'm fairly familiar with the current spec, having implemented it as a service worker and on the server.

@kazuho
Contributor

kazuho commented Nov 9, 2017

@sebdeckers

I'm planning to attend the IETF 100 hackathon this weekend in Singapore. (First timer here. 🤗🔰) I'm happy to collaborate on a (Node.js?) implementation of this spec if either of you are around and interested.

Wonderful! I'll be attending the hackathon on both days (i.e., Saturday and Sunday). I do not think I will have time to work on Cache Digests myself, but I would love to discuss your work on them with you (or help, if you need it).


@sebdeckers sebdeckers left a comment


Feedback based on WIP implementation of cuckoo filters for cache digest: https://gitlab.com/http2/cache-digest-koel

7. Let `h2` be the return value of {{hash}} with `fingerprint` and `N` as inputs, XORed with `h1`.
8. Let `h` be `h1`.
9. Let `position_start` be 40 + `h` * `f`.
10. Let `position_end` be `position_start` + `f` * `b`.


b is not defined

4. While `fingerprint-value` is 0 and `h` > `f`:
4.1. Let `fingerprint-value` be the `f` least significant bits of `hash-value`.
4.2. Let `hash-value` be the `h`-`f` most significant bits of `hash-value`.
4.3. `h` -= `f`


This code feels inconsistent with the writing style used throughout. Would suggest:

Subtract `f` from `h`.

`hash-value` can be computed using the following algorithm:

1. Let `hash-value` be the SHA-256 message digest {{RFC6234}} of `key`, expressed as an integer.
2. Return `hash-value` modulo N.

@sebdeckers sebdeckers Nov 12, 2017


This is difficult to do in JavaScript, where uint operations are typically still limited to 32 bits. The truncation in the previous proposal (step 4) is more compatible and, if I understand correctly, achieves the same objective. Can this be changed to something that does not require a 256-bit integer modulo?

Contributor


I wonder if the need for an integer modulo is due to an error in the specification.

The text in the PR states that N is a prime number smaller than 2**32. Could it be the case that N is something that should be defined as a power of two (2**n)?

If that is the case, the modulo operation can be implemented by using bitwise AND.
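An illustration of that point, assuming N were defined as a power of two (N = 2**n): only the low-order bits of the SHA-256 digest matter, so the modulo reduces to a bitwise AND on a 32-bit value and no 256-bit integer arithmetic is needed.

```js
const crypto = require('crypto');

// Return hash(key) mod N for a power-of-two N, using only the low 32 bits.
function hashKey(key, N) {
  const digest = crypto.createHash('sha256').update(key).digest();
  const low32 = digest.readUInt32BE(28); // least-significant 4 bytes of the digest
  return (low32 & (N - 1)) >>> 0;        // equals (full 256-bit digest) mod N when N = 2**n, n <= 32
}
```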

Contributor Author


OK, I'll truncate the hash before the modulo operation.

* `ETag`, an array of characters
* `validators`, a boolean
* `URL` a string corresponding to the Effective Request URI ({{RFC7230}}, Section 5.5) of a cached response {{RFC7234}}.
* `ETag` a string corresponding to the entity-tag {{RFC7232}} if a cached response {{RFC7234}} (if the ETag is available; otherwise, null).


From an implementor's perspective, it would help me to understand this if examples were provided.

Contributor Author


Examples for URL and ETag? Or something else?


ETag

Btw just noticed a typo on line 321: of a cached response

@@ -99,10 +102,9 @@ allows a stream to be cancelled by a client using a RST_STREAM frame in this sit
is still at least one round trip of potentially wasted capacity even then.

This specification defines a HTTP/2 frame type to allow clients to inform the server of their
-cache's contents using a Golomb-Rice Coded Set {{Rice}}. Servers can then use this to inform their
+cache's contents using a Cuckoo-fliter {{Cuckoo}} based digest. Servers can then use this to inform their


typo: filter


1. Let `f` be the number of bits per fingerprint, calculated as `P + 3`
2. Let `b` be the bucket size, defined as 4
3. Let `bytes` be `f`*`N`*`b`/8 rounded up to the nearest integer


Markdown escaping issue makes N italic and hides the * characters.

9. Let `position_start` be 40 + `h` * `f`.
10. Let `position_end` be `position_start` + `f` \* `b`.
11. While `position_start` < `position_end`:
7. Let `fingerprint-string` be the value of `fingerprint` in base 10, expressed as a string.


@yoavweiss Curious... May I ask why this change? I don't see any issues with it. Just don't understand what it means.

Contributor Author

@yoavweiss yoavweiss Nov 15, 2017


It defines a way to convert `fingerprint` into a string, so that we can apply {{hash}} to it.

@yoavweiss
Contributor Author

yoavweiss commented Nov 15, 2017

I've got an incomplete initial reference implementation at https://github.com/yoavweiss/cache-digests-cuckoo

It doesn't yet include removal and querying (that's what I'll be adding next), but I did run it on a list of ~3250 URLs (which I got out of my main profile's chrome://cache/) and it seems to be creating reasonably sized digests. One more advantage: the digests seem to be highly compressible when sparse.

Results so far:
Digest with 1021 entries (so room for ~4K URLs): 5621 bytes in memory, 5233 bytes gzipped (when filled with the 3250 URLs).
Digest with 2503 entries (so room for ~10K URLs): 13772 bytes in memory, 6879 bytes gzipped (same 3250 URLs).
Digest with 7919 entries (so room for ~31K URLs): 43560 bytes in memory, 9984 bytes gzipped (same 3250 URLs).

In practice, I think ~1000 entries is most probably enough, but it's good to know we can increase the digest size (to avoid having to recreate it) without a significant over-the-wire penalty.
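For reference, these in-memory sizes appear to be consistent with the draft's `f*N*b/8` size formula (with `f = P + 3 = 11` bits and `b = 4`) plus what looks like the 40-bit header:

```js
const f = 8 + 3, b = 4;                               // P = 8, 11-bit fingerprints, 4-slot buckets
const bytes = (N) => Math.ceil((f * N * b) / 8) + 5;  // +5 bytes for the 40-bit header
[bytes(1021), bytes(2503), bytes(7919)];              // [5621, 13772, 43560] -- matches the numbers above
```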

@yoavweiss
Contributor Author

yoavweiss commented Nov 16, 2017

OK, I now have a complete reference implementation and it seems to be working fine. It also exposed an issue with the initial algorithm, forcing table allocation to accommodate a power of 2 number of entries.

Latest results for 3250 URLs taken from my cache:

| Number of entries | Full capacity | Digest memory size (bytes) | Digest gzipped size (bytes) | Digest brotli size (bytes) |
| ---: | ---: | ---: | ---: | ---: |
| 1021 | 4084 | 5637 | 5248 | 5092 |
| 1109 | 4436 | 11269 | 6153 | 5675 |
| 2019 | 8076 | 11269 | 7031 | 6785 |
| 4027 | 16108 | 22533 | 8663 | 7586 |

One note: the 1021-entry table had 35 collisions, so it seems it's insufficient for that number of URLs, unless we're willing to absorb extra pushes for ~1% of the resources.

@kazuho
Contributor

kazuho commented Nov 16, 2017

@yoavweiss Interesting! It's good to know that we have numbers now.

What is the value of P (the false positive ratio) that you used?

@yoavweiss
Contributor Author

P=8 (so a 1/256 false-positive rate)

@yoavweiss
Contributor Author

yoavweiss commented Nov 17, 2017

Note that it may be possible to optimize these numbers further. One example is semi-sorting of the buckets, which the Cuckoo filter paper mentions and which I have not yet implemented. It adds some runtime complexity, but can reduce the fingerprint size per resource by a full bit, so it could have resulted in ~9% smaller digests in this case.

@sebdeckers

@yoavweiss Awesome! 🤩

Not being familiar with these data structures (despite reading the Wikipedia article 😅), why does the 1/256 probability (~4/1000) result in 35 collisions?

@yoavweiss
Contributor Author

The 35 collisions are on top of the false-positive rate, and represent resources that we failed to put into the table to begin with (due to both their buckets being full). That collision rate seems high compared to the results in the paper, so I need to dig further to see who's wrong...

@yoavweiss
Contributor Author

The collisions are now fixed. It was an algorithm issue, where the entry to be evicted was always the same one at the end of the bucket. I've changed that to be a random fingerprint from the bucket, which significantly improved things. The reference implementation is now collision-free almost up to the point where the digest is full.
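For reference, a minimal sketch (in JavaScript, not the reference implementation) of the insertion loop with the random-victim eviction described above. It assumes `numBuckets` is a power of two so the XOR-based alternate index round-trips, fingerprint `0` marks an empty slot, and `fpHash()` stands in for the draft's {{hash}} of the fingerprint.

```js
const BUCKET_SIZE = 4;   // b = 4, as in the draft
const MAX_KICKS = 500;   // eviction bound from the Cuckoo filter paper

function altIndex(i, fp, numBuckets, fpHash) {
  return (i ^ fpHash(fp)) & (numBuckets - 1); // involution when numBuckets is a power of two
}

function bucketPut(buckets, i, fp) {
  for (let s = 0; s < BUCKET_SIZE; s++) {
    if (buckets[i * BUCKET_SIZE + s] === 0) { buckets[i * BUCKET_SIZE + s] = fp; return true; }
  }
  return false; // bucket is full
}

function cuckooInsert(buckets, numBuckets, i1, fp, fpHash) {
  const i2 = altIndex(i1, fp, numBuckets, fpHash);
  if (bucketPut(buckets, i1, fp) || bucketPut(buckets, i2, fp)) return true;

  // Both candidate buckets are full: evict a *random* slot (not always the
  // last one) and relocate the victim to its alternate bucket.
  let i = Math.random() < 0.5 ? i1 : i2;
  for (let kick = 0; kick < MAX_KICKS; kick++) {
    const slot = Math.floor(Math.random() * BUCKET_SIZE);
    const victim = buckets[i * BUCKET_SIZE + slot];
    buckets[i * BUCKET_SIZE + slot] = fp;
    fp = victim;
    i = altIndex(i, fp, numBuckets, fpHash);
    if (bucketPut(buckets, i, fp)) return true;
  }
  return false; // table is effectively full
}
```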

@mnot mnot changed the title [Cache Digests] Switch digest to cuckoo filters, to enable O(1) removal Switch digest to cuckoo filters, to enable O(1) removal Dec 12, 2017
@kazuho kazuho mentioned this pull request Feb 28, 2018
@kazuho kazuho merged commit d02e5d1 into httpwg:master Mar 1, 2018