
add Cache Manager #39

Open
switchupcb opened this issue Nov 17, 2022 · 3 comments
@switchupcb
Owner

Problem

Users (developers) want an easy way to fetch information from the Discord Environment that the bot has access to.

The Disgo Cache Manager aims to solve this.

Caching

The difference between a "cache" and a "cache manager" is that a "cache manager" manages other "caches". The entire point of a cache is to minimize load on the application and network. However, the optimal way to do this depends on the user's (developer's) application and code, so implementing a standard method of caching that every bot adheres to is an anti-pattern. Other Go Discord API Wrappers are either based on a cache (i.e. Disgord) or implement a mandatory cache (i.e. DiscordGo's State). At minimum, this adds overhead to the program. In the worst case, it adds complexity for the end user. Let's analyze the following code.

Caching Overhead

The following code from Disgord showcases how a cache adds overhead.

// bypasses local cache
client.CurrentUser().Get(disgord.IgnoreCache)
client.Guild(guildID).GetMembers(disgord.IgnoreCache)

// always checks the local cache first
client.CurrentUser().Get()
client.Guild(guildID).GetMembers()

The problem here is not necessarily that the user will always have to specify the usage of a cache, but that the cache is always involved. It does not matter if the user creates a program that has no use for the cache: Providing an option to ignore the cache implies that requests are always cached. When this is the case, a large amount of memory is spent storing unnecessary entries (especially given the nature of Discord's Models). In the case of Disgord, it is stated that "the cache is immutable by default", such that "every incoming and outgoing data of the cache is deep copied". This adds even more overhead for applications which handle millions of requests.

Caching Complexity

The second issue with mandatory caching is the complexity that is added to the developer. In Disgord's case, you are unable to control your cache and unable to prevent data from being stored. In other cases, it can be even more problematic. Let's analyze the following code from DiscordGo.

// ChannelValue is a utility function for casting option value to channel object.
// s : Session object, if not nil, function additionally fetches all channel's data
func (o ApplicationCommandInteractionDataOption) ChannelValue(s *Session) *Channel {
    if o.Type != ApplicationCommandOptionChannel {
        panic("ChannelValue called on data option of type " + o.Type.String())
    }
    chanID := o.Value.(string)

    if s == nil {
        return &Channel{ID: chanID}
    }

    ch, err := s.State.Channel(chanID)
    if err != nil {
...

The function above takes a string ID value and turns it into a channel. Not too bad of an idea. However, this function is called after a user (developer) has received an event (with Application Command Options) from Discord. As a result, the user (developer) may not expect the remaining channel data to come from a cache, but rather from Discord itself.

When the object is in the cache (and a session parameter is provided), the program becomes incorrect: The state of the channel from the cache is not guaranteed to match the state of the Discord Channel. When the object is not in the cache, the program adds overhead by creating an additional blocking network call; in a function for "casting", no less. However, the latter behavior is at least documented.

In a similar manner to other API Wrappers, DiscordGo's cache (State) is structured in a way that does not allow the user (developer) to manage cached resources (due to unexported fields). As a result, the developer can only solve the problem of incorrectness by manually calling the network themselves, defeating the purpose of the "typecast" function.

Solution

The solution to this problem is to implement a separate Cache Manager module. This cache manager should make it easy for the user to set up caching, while also letting them operate the cache themselves. In this way, the user can ensure the correctness of their program while minimizing the overhead of the cache. In addition, external caching solutions (such as Redis or Memcached) can be used by making the cache an exported interface.

Implementation

Users (developers) would add the Cache Manager to their application. Then, the cache manager can be used in a manner similar to the rate limiter.

  1. Add a CacheManager interface which defines the necessary functions used throughout the cache manager.
  2. Create a struct that implements CacheManager. During development of the actual cache manager, this step (in addition to the following steps) may be completed prior to 1.
  3. Use sync.Map to create a cache struct that stores objects in a given manner, such that they can be retrieved in a given manner, without data races. No specifics are provided in this step because a solution that is flexible for the end user has yet to be designed. The solution must account for caching by the bot and by resource.
  4. Define functions that set up various caches (supporting 3) [i.e. Guild, Channel, etc.].
  5. Define functions that ease cache retrieval and updates.
  6. Create tests.
  7. Add documentation in contribution.
  8. Add examples in _examples/bot.
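Steps 1 to 3 above might be sketched as follows. This is a minimal illustration, not Disgo's actual API: the names CacheManager, Cache, and manager are assumptions for the purpose of the sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// Resource is a placeholder for any cached Discord object (User, Guild, etc.).
type Resource = any

// Cache is a thin wrapper around sync.Map (step 3): Snowflake ID -> Resource.
type Cache struct {
	data sync.Map
}

func (c *Cache) Store(id string, r Resource)     { c.data.Store(id, r) }
func (c *Cache) Load(id string) (Resource, bool) { return c.data.Load(id) }

// CacheManager (step 1) manages named caches. The names are user-defined
// partitions such as a Guild ID, Shard ID, or Bot ID.
type CacheManager interface {
	Cache(id string) *Cache // returns (creating if necessary) the cache for id
	Delete(id string)       // removes an entire cache partition
}

// manager is a minimal struct implementing CacheManager (step 2).
type manager struct {
	caches sync.Map // partition ID -> *Cache
}

func (m *manager) Cache(id string) *Cache {
	c, _ := m.caches.LoadOrStore(id, &Cache{})
	return c.(*Cache)
}

func (m *manager) Delete(id string) { m.caches.Delete(id) }

func main() {
	var cm CacheManager = &manager{}
	cm.Cache("guild-1").Store("12345678", "cached user")
	v, ok := cm.Cache("guild-1").Load("12345678")
	fmt.Println(v, ok) // cached user true
}
```

Because CacheManager is an interface, an implementation backed by Redis or Memcached could be swapped in without changing caller code.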
@switchupcb
Owner Author

Regarding 3, here is a proposed solution.

The type CacheManager stores a map of IDs (user-specified: Shard, Bot, etc.) to Cache objects.
The type Cache stores a map of IDs (resources defined by the application; constants) to the "cache" maps.
Each sync.Map cache stores a map of IDs (resources defined by Discord; snowflakes) to the actual resources.

This allows the user to specify how each cache will be partitioned; such that the same map isn't being used by the entire application. The user will have access to the address of each sync.Map cache such that it can be retrieved quickly when required. The only inflexible portion of this implementation is the need to cache by resource. This is done to reduce the need to type assert resources inside of a sync.Map before returning.
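The three-level structure described above can be sketched in Go. The type and helper names here (CacheManager.Get, Cache.GetUser, the "user" constant, and the stand-in User struct) are illustrative assumptions, not Disgo's actual API:

```go
package main

import (
	"fmt"
	"sync"
)

// User is a stand-in for disgo.User.
type User struct {
	ID       string
	Username string
}

// Cache maps application-defined resource constants (i.e. "user") to the
// sync.Map that holds that resource type (Snowflake -> resource).
type Cache struct {
	resources sync.Map
}

func (c *Cache) resource(name string) *sync.Map {
	m, _ := c.resources.LoadOrStore(name, &sync.Map{})
	return m.(*sync.Map)
}

// StoreUser and GetUser are pre-defined helpers that spare the caller from
// type asserting out of the sync.Map themselves.
func (c *Cache) StoreUser(u *User) { c.resource("user").Store(u.ID, u) }

func (c *Cache) GetUser(id string) (*User, bool) {
	v, ok := c.resource("user").Load(id)
	if !ok {
		return nil, false
	}
	return v.(*User), true
}

// CacheManager maps user-specified partition IDs (Guild ID, Shard ID, etc.)
// to Cache objects.
type CacheManager struct {
	caches sync.Map
}

func (cm *CacheManager) Get(id string) *Cache {
	c, _ := cm.caches.LoadOrStore(id, &Cache{})
	return c.(*Cache)
}

func main() {
	var cm CacheManager
	cm.Get("GuildID").StoreUser(&User{ID: "12345678", Username: "example"})
	u, ok := cm.Get("GuildID").GetUser("12345678")
	fmt.Println(u.Username, ok) // example true
}
```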

The alternative to caching by resource in this solution is eliminating the indirection caused by the type Cache and instead storing the resource ID with a pre-defined identifier (i.e. the g in g12345678 for a guild), then returning specific objects by using that identifier. This would allow you to store objects in bundles, rather than by object. See the implications of each solution below.

Example

A common use case among developers is the need to cache users in a guild. In the following example, a bot provides services (commands) for multiple guilds. Current caching solutions (in other API Wrappers) solve this problem by storing every object in a single Cache object, which stores objects per resource. As a result, the bot (application) will always check the same cache when a command is run, regardless of the command's origin. If one server's user count spikes (or its load increases), it will affect the others' access to the cache.

Proposed

The proposed solution solves this by allowing the user to create any number of cache objects. One could direct the Cache Manager to store a map of Guild IDs to Cache objects. Those Cache objects would map a resource identifier (i.e. "user") to a sync.Map containing users by Snowflake (ID). Given that the identifier would be known at compile time, the CacheManager would be able to provide functions that reduce the need for the user to type assert anything.

cache := CacheManager.Get("GuildID")     // gets Cache by Guild ID, Client ID, etc. returns *Cache
cacheUsers := cache.Get("user")          // gets User Cache by predefined application constant. returns *sync.Map
cachedUser := cacheUsers.Get("12345678") // gets User by Snowflake. returns `any`
user, ok := cachedUser.(*disgo.User)     // user from sync.Map requires type assertion. returns *User

// equivalent to the following.
cache := CacheManager.Get("GuildID") // gets Cache by Guild ID, Client ID, etc. returns *Cache
user := cache.GetUser("12345678")    // only possible because cache structure is pre-defined with resources. returns *User

This allows the user to partition a cache in any manner (sharding, etc), but limits each cache to access resources using the same map (without storing a cache manager in a cache manager).

Alternative

The alternative solution solves this in the same way the proposed solution does, but does so by eliminating the second indirection. This means the same map will be used to access every resource. Resources pertaining to each other will be stored in the same map.

cache := CacheManager.Get("GuildID")    // gets Cache by Guild ID, Client ID, etc. returns *Cache
cachedUser := cache.Get("user12345678") // gets User by predefined application constant + UserID. returns `any`
user, ok := cachedUser.(*disgo.User)    // user from sync.Map requires type assertion. returns *User

// equivalent to the following.
cache := CacheManager.Get("GuildID") // gets Cache by Guild ID, Client ID, etc. returns *Cache
user := cache.GetUser("12345678")    // only possible because constants are pre-defined. returns *User

In the alternative solution, the cache that contains the entry with ID user12345678 is received using CacheManager.Get(), whereas the actual resource is received from the cache.GetUser() function call. In a similar manner, cache.GetGuild() could be used to return a Guild cached by the user.
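The alternative layout might look like the following sketch, assuming a hypothetical prefixUser constant and a stand-in User struct; none of these names come from Disgo itself:

```go
package main

import (
	"fmt"
	"sync"
)

// User is a stand-in for disgo.User.
type User struct {
	ID       string
	Username string
}

// prefixUser is the pre-defined application constant for the User resource.
const prefixUser = "user"

// Cache stores every resource in ONE sync.Map, keyed by the resource's
// pre-defined constant plus its Snowflake (i.e. "user" + "12345678").
type Cache struct {
	data sync.Map
}

func (c *Cache) StoreUser(u *User) { c.data.Store(prefixUser+u.ID, u) }

// GetUser pays the string-concatenation cost noted in the comparison below.
func (c *Cache) GetUser(id string) (*User, bool) {
	v, ok := c.data.Load(prefixUser + id)
	if !ok {
		return nil, false
	}
	return v.(*User), true
}

func main() {
	var c Cache
	c.StoreUser(&User{ID: "12345678", Username: "example"})
	u, ok := c.GetUser("12345678")
	fmt.Println(u.Username, ok) // example true
}
```

Because all resource types share one map, a GetGuild helper would follow the same pattern with a different prefix (i.e. "g").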

Comparison

Upon closer inspection, there is not much difference between the proposed and alternative solutions. Both involve the use of pre-defined application constants and can return an object within two function calls. However, the proposed solution explicitly limits map access to the resource type (all users within a guild use the same map), while the alternative solution limits map access to the caching identifier (i.e. all users with tierPremium use the same map). In other words, both solutions use an identifier with the CacheManager to return a *Cache, and *Cache implements methods which allow the user to receive the expected object in an easy manner (using the resource identifier); at the cost of string concatenation for function implementations such as GetUser(UserID) (which requires concatenating "user" + ID before calling the map).

Requirements

Both solutions are flexible: The user can cache in any manner and in multiple ways (since the CacheManager can contain two entries that point to the same sync.Map address). Caches do not experience data races through the use of sync.Map: The cache manager itself is a sync.Map abstraction, while caches themselves are sync.Maps with functions. Each solution reduces complexity for the developer during development. Thus, the only difference between the solutions is map access. The decision as to how to implement a solution (proposed, alternative, or both) lies in how many levels of indirection are allowed (such that caches are/aren't stored in other caches).

@switchupcb
Owner Author

Following #39 (comment), the cache manager can be limited to three levels of indirection ("CacheManager(ID) to Cache", "Cache(HASH) to sync.Map", "sync.Map(ResourceID) to Resource") by allowing the user to specify the hashing mechanism for each resource.

cache := CacheManager.Get("GuildID") // returns *Cache or sync.Map (return types differ per solution)
cache.StoreUser("12345678", u)

// The following pseudocode refers to the code used INTERNALLY.
//
// when caching by resource (default)
// cachemanager.get returns *Cache full of *sync.Map
userCache := cache.Get(HASH)    // where HASH="user", returns *sync.Map full of users
userCache.Store("12345678", u)  // stores user by ID
cache.GetUser("12345678")       // returns user from the Cache's user sync.Map; all other user calls from this cache use the same map

// when caching per identifier
// cachemanager.get returns *Cache full of any
cache.Store("user12345678", u)  // stores user by constant + ID
cache.GetUser("12345678")       // returns user from the Cache's single sync.Map full of any; all resources within this cache share the same map

Thus, both implementations can be supported by having the user specify whether the cache should store objects directly (i.e. a field Cache.StorageMethod).
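A hypothetical Cache.StorageMethod field could switch between the two layouts along these lines; the field, constants, and helpers are all illustrative assumptions, not a committed design:

```go
package main

import (
	"fmt"
	"sync"
)

// StorageMethod selects how a Cache lays out its sync.Map(s).
type StorageMethod int

const (
	StoreByResource   StorageMethod = iota // proposed: one sync.Map per resource type
	StoreByIdentifier                      // alternative: one sync.Map keyed by constant + ID
)

type Cache struct {
	Method StorageMethod
	users  sync.Map // used when Method == StoreByResource
	all    sync.Map // used when Method == StoreByIdentifier
}

func (c *Cache) StoreUser(id string, u any) {
	if c.Method == StoreByResource {
		c.users.Store(id, u)
		return
	}
	c.all.Store("user"+id, u)
}

func (c *Cache) GetUser(id string) (any, bool) {
	if c.Method == StoreByResource {
		return c.users.Load(id)
	}
	return c.all.Load("user" + id)
}

func main() {
	// Callers see the same API regardless of the chosen storage method.
	for _, m := range []StorageMethod{StoreByResource, StoreByIdentifier} {
		c := &Cache{Method: m}
		c.StoreUser("12345678", "cached user")
		v, ok := c.GetUser("12345678")
		fmt.Println(v, ok) // cached user true (for both methods)
	}
}
```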

@switchupcb
Owner Author

An example applicable to 4 (cache setup) has been created: #35 (comment).
