Research: Client Run-Time Cache Management

This is a repost from GitHub. Thank you @pwv for creating such a stellar write-up :clap:

Original Author: PWV Consultants

Introduction

Polywrap currently caches each wrapper used for the entire lifetime of the client instance. This can cause scaling issues in the future as the cache bloats when many wrappers are queried and downloaded. To remedy this, a system will be needed to allow users to specify when the application should remove certain wrappers from the cache.

Background

This proposal views the introduced problem through two perspectives: at first glance, this seems to be a classic caching problem that, if approached as such, would allow maximum flexibility to end users; on the other hand, there is something to be said about the amount of static information that is known ahead of time, which may allow for certain optimizations to be made, even in such a highly dynamic environment. On this second view, we can instead consider usage of the client SDK as a kind of JIT-compiled program with dependencies as ‘variables’ having lifetimes, rather than dynamic libraries that are loaded on demand. The benefit to this perspective being that we can subsequently attempt to constrain the general problem of caching any data to a more narrow problem of garbage collecting shared variables in a concurrent system (which may provide us with better results, independently of any underlying caching perspective).

System Dynamism

All of the dependencies that a wrapper may use are known statically at “compile”-time, as they must be declared in the wrapper manifest. However, while the wrapper declares the set of all possible dependencies, the set of dependencies that will actually be used throughout the course of running the wrapper is some proper subset thereof — wrappers may have an “interface”-like dependency, where they expect to execute some functionality by contract, with the actual implementation not being determined until runtime — which is known only at runtime and depends upon the control flow execution of the wrapper itself.

Security Considerations

Understanding this problem as one related to caching provides us with insight into the potential attack vectors that would be familiar to malicious actors: timing attacks constitute the main class of potential attacks that can be executed against caches — by timing cacheable requests/operations, a wrapper may be able to interpret cache hits/misses and thus determine what other wrappers a client may be using, despite being sandboxed and unable to access them. What benefit an attacker might gain from such knowledge is not immediately clear, but we might imagine that it could serve some additional purpose to a clever actor, beyond the extant privacy concern.

Scope

  • Dependency caching solution generalized to any SDK/native languages (client language runtime: JS, Rust, etc.)
  • Applicable to multiple platforms and environments, including web and embedded (client system environment: browser, desktop, IoT)
  • NOT IN SCOPE: handling the caching of wrapper-internal elements (wrapper “heap” vs wrappers themselves)

Requirements

  • Must provide extreme developer flexibility of choice
    • Ex. Profiles: memory constrained (IoT), default (app, browser/desktop), performant (bounded memory space)
    • Users should be able to specify constraints as either “hard caps” or “soft caps”, depending on whether they can be violated (and with what severity)
  • Must be highly configurable, but come with sane defaults for those users who aren’t interested in GC-/cache-tuning
    • Users are welcome to manually specify certain cachable elements to be preloaded
  • Must provide general global options to describe the environment
    • Must provide appropriate options for highly constrained environments (e.g. allowing for limits on global memory/network usage)

Prior Art

Garbage Collection

Garbage collection, also sometimes known as automated memory management, has a long and storied history stretching back over sixty years. It is a well-researched and deeply explored field in both academia and industrial practice, and explaining it in any significant detail is both beyond the scope and ability of this document. For a good primer on garbage collection techniques and principles, consider [4]. For a more casual, practical description of a specific garbage collector implementation, consider [9].
In general, we may consider garbage collection to be the automatic process by which memory is allocated and reclaimed over the course of the lifetime of a program. This is often achieved in one of two popular ways (viz. tracing, reference counting), both having their own performance tradeoffs. In addition to this dichotomy, there is also the important distinction of when garbage is collected, and whether that occurs on the same thread as the main program, or concurrently, or in realtime.

Tracing Garbage Collectors

Tracing garbage collectors rely on the idea of a call graph, where certain elements of a program can be marked as “root” elements, against which all other elements are referenced. The garbage collector can then employ a strategy to traverse the graph and detect disconnected elements, removing any that weren’t found to pertain to actively ‘living’ root elements. This analogy serves our case of dependencies well, as dependencies are often nested and interconnected in meaningful ways, where we are able to assume that they aren’t needed right now if no one is using them.

Reference Counting Garbage Collectors

Similar to tracing garbage collectors, reference-counting garbage collectors concern themselves with keeping track of which elements can be disposed of. This is often done in a simple, greedy way: uses of an element increment a counter, and disuse appropriately decrements it; when the counter reaches zero, the element is immediately marked for garbage collection. A meaningful comparison of the two garbage collection techniques is beyond the scope of this document; however, it should be noted that popular programming languages which employ garbage collectors often allow the end users to select between possible algorithms, which sets a positive precedent (and is more ergonomic to developers who are familiar with the paradigm) for providing the same to users of a Polywrap SDK.

Cache Replacement Strategies

As with garbage collection, a full overview of cache replacement is outside the scope of this document. That being said, there exist many popular strategies with famous acronyms, (e.g. LRU, LFU, etc.), exhibiting different performance characteristics and each being suited better or worse for different use patterns. Amongst these, two are worth making points about directly, and an additional discussion will be had about the place of statistical learning in cache replacement policies.

ARC

Adaptive Replacement Cache (ARC) [8], is by-and-large considered to be one of the best general cache replacement policies. It combines the benefits of both the LRU (Least Recently Used) and LFU (Least Frequently Used) algorithms by maintaining two cache lists, one for each. ARC, in turn, can be improved upon by including further information, such as timing and interval sizes.

RRIP

An interesting approach worth noting is that of Re-reference Interval Prediction (RRIP) [3]. By attempting to predict the next time a dependency is used, a more intelligent decision can be reached regarding whether or not to evict it (dependencies which are not currently used but may be used in the near future would maintain preference over those that, despite also currently being unused are not likely to be used again soon). See the linked paper for further details.

Learning-based Cache Replacement

Modern advances in machine learning present us with an interesting opportunity to apply statistical reasoning to the question of what to evict from our cache next. For context, it is often considered that an “oracle algorithm” (Bélády’s algorithm) is the theoretical limit of optimal cache eviction; that is, always correctly choosing the cache element which will be used furthest into the future (and thus needing to know the future). We can attempt to approximate this perfect understanding using machine learning, for which appropriate frameworks are beginning to appear [14].
One possible approach that has been theorized is to adopt a regret-minimization perspective. Popularized in game and decision theories, regret minimization is a strategy which posits that a good way of narrowing down which decision you should make is to determine how much you will regret every possible decision, and then minimizing across those regret variables (that is, pruning the most regrettable options until an optimal choice, or set of choices, is elucidated). This perspective can be used to attempt to integrate user specifications with statistically learned responses to the dynamic environment expected of a wrapper cache. It could also potentially provide an interesting input methodology for users of the client SDKs, vis a vis hard caps and soft caps.

Other Related Situations

Dynamically Loaded Libraries

Many modern operating systems provide functionality for dynamically loading additional libraries at runtime. This allows for programs to begin without having to link to every library that they expect to use, potentially saving on time and space (or simply being required to when the information is not known at link time). Linux, in particular, employs a reference-counting strategy to decide when to deallocate dynamically loaded libraries [6].

Proposals

To preface the following two discussions, it should be noted that in either the garbage collection (GC) perspective, or the cache perspective, we should uphold our goal of providing end-users with maximum flexibility to make the correct, informed decision for their individual usage pattern; consequently, we must acknowledge that different platforms/SDKs may be able to provide different options to their users, and may have different reasonable defaults as a result. Furthermore, users may provide global constraints (such as maximum memory usage, or strict deadlines) which can act as heuristics to determine the best choices on-the-fly.

1. Garbage Collection

In modern programming languages with automated memory management, a shared terminology exists regarding how information is stored: the “heap”, and the “stack”, with the general conception here being that information of a known size and lifetime will be stored on the stack and immediately evicted once the containing “frame”, or query/function, is finished. However, not all data has a known size at “compile time”: that data, of dynamic size, is instead treated as a (boxed) pointer to some memory space that has been allocated on the heap. This data needs to be handled in a reasonable way that prevents the heap from leaking memory, while still providing live pointers to the parts of the program that need them. Different languages have different approaches to what to do with this heap data: Rust employs a borrow checker which enforces certain invariants and determines the lifetimes of heap data, removing it once it has “expired” and will not be accessed by anything else in the future; Nim, Java, and many others offer several GC algorithms for users to choose from.
Keeping with the traditional terminology, we can understand garbage collection in Polywrap clients as an effort in static analysis that ultimately allows us to plan stack frames appropriately, as well as handle the dynamic dependencies on the “heap” by employing an appropriate GC algorithm.

Implementation

While much of the execution of a query is deterministic (cf system dynamism), the exact extent to which stack frames can be planned effectively (and, consequently, the cache managed efficiently) is dependent upon the level of static introspection possible into the underlying dependencies: it is not enough to know which wrappers are used, we must also know in what order they will be used if we are to plan optimal cache evictions. That being said, the more static analysis that can be performed, the better the caching can be planned; but, even without any static introspection, certain GC work can still be done.

Optimal Cache Replacement

In a perfect world, our goal would be to have every wrapper ready-to-go just before it needs to be used and gone just after it won’t be needed any longer. To do that, we would need to know when each wrapper is needed and when it isn’t anymore. Manifests tell us which dependencies a wrapper might use, but not when or in what order — using manifests alone to inform us would let us take the conservative approach of simply ensuring that every possible dependency is available before a query is run, the efficiency of which is the ratio of the cardinality of the sets of the declared vs. actually-used dependencies. If, however, we know more than just what the manifest tells us, we can become more precise in our layouts and allow for executions that would seem impossible on the conservative approach due to the memory constraints.
Suppose that we are executing a wrapper with declared dependencies a, b, c, d, and e. Suppose further than we know from the manifest that a and b depend upon c, that c in turn depends on d, and that e depends only on d; and, from abstract interpretation we have determined that the wrapper first makes a call to b, then to a, then to e, and then finally exits. Given this information, we have the runtime execute the wrapper “program” as such:

  1. retrieve from cache or download d
  2. retrieve from cache or download c
  3. retrieve from cache or download b // we cannot execute out-of-order without knowing if there are side effects to the calls
  4. execute b
  5. evict b
  6. retrieve from cache or download a
  7. execute a
  8. evict a and c
  9. retrieve from cache or download e
  10. execute e
  11. evict d and e
  12. exit

This execution plan would ensure that dependencies are present when needed and removed when they will no longer be used. However, importantly, constraints placed upon the cache size, upon network usage, and other criteria, can change the order of these steps (e.g. having a large cache size and no constraint on startup time would allow us to move steps (6) and (9) up towards the top, and to move steps (5) and (8) down to join step (11)). In this simplistic example we have a fairly linear approach, where only a few steps can be moved around (e.g. steps (1) through (3) can be executed in any order, ostensibly), without much depth. In a more likely scenario, however, we might imagine this graph becoming increasingly large as the branching factor stays above 1 in the deeper dependencies, fanning out the graph of possible paths immensely. In those cases, further analysis, in particular of the dependencies, would be necessary. This leads into an important caveat that must be addressed: we cannot analyze what we cannot see.
Indeed, though it may seem trivial to point it out, it is worth mentioning that we cannot statically analyze a wrapper until we download it, which in and of itself requires memory/network usage. This problem of foresight significantly narrows our planning horizon to only include those wrappers which we currently have downloaded at the time of the next “decision”, and forces us to perform what are traditional offline, fully observable optimizations instead online, in a partially observable environment. On this understanding of the problem, the solution begins to look less and less like garbage collection and more akin to the kinds of “planning under uncertainty” that we see in traditional AI modelling and design.

Tracing & RC

While planning ahead using static analysis would be the preferred approach, the computational overhead may prove either undesirable or untenable (or both!) in certain situations. In those situations, it may be preferable to relax our efforts and engage in ad-hoc decision-making by employing more traditional techniques associated with garbage collections. To reason about this, during wrapper execution we might wish to either trace or perform reference counting on the dependencies as they are encountered. At every entry/exit of a (sub-)query, we increment/decrement a counter. When the counter reaches zero, we either immediately evict the offending wrapper, or else relegate it to a naïve cache replacement policy, depending on how conservative the user has set their configuration. Or, in the case of a tracing GC, we can trace the call graph at every entry/exit and again evict appropriately (or relegate, though some discussion of the need for that is had below).

Consequences

Static analysis, and even ad-hoc planning and garbage collection, incur some computation overhead at runtime. This overhead can prove negligible on the simple case, but can grow quickly to be intractable as programs and wrappers become more complex and deeply nested.

Dependencies

  • Garbage collection, at its core, requires some scheduling and computational overhead. As such, garbage collection would not be advisable/possible on systems that are not conducive to handling the multiple concerns (i.e. mutator & collector) of such a system in a reasonable fashion. Furthermore, modern garbage collection algorithms often rely on concurrency to operate efficiently and without imposing too severely upon the user experience; however, concurrency is not currently feasible on every platform (notably, in the browser), which can pose issues for implementation details on certain platforms.
  • Much of the value associated with static analysis depends upon gathering the most information about a potential execution at the lowest cost. To this end, introspectable source files and hierarchical dependency manifests would go a long way towards helping the cause of static analysis.

2. Cache Replacement Strategies

On the two perspectives in our framework of discussion, we can consider caching to be a degenerate case of garbage collection where nothing is known about the contents of the data being stored. This opacity requires us to treat all data equally and attempt to keep/evict it only on the basis of temporal/frequency patterns. As with the preface to the GC proposal, we should also note here that while the two aforementioned cache replacement policies are good defaults, it would make sense to offer as many other options as are available in the SDK native environment. In general this may mean that there may be some UX-fragmentation across different SDKs, as different policies are available in different environments, or may be of greater/lesser use depending on the memory accessibility thereof; however, this is something that ostensibly should be expected of cross-platform utilities, and is unlikely to severely affect developer ergonomics (although disclaimers are nevertheless important to placate belligerent users).

Implementation

Once a cache replacement policy is specified, the runtime can employ it to maintain a cache of dependencies. Whether any of those dependencies are preloaded, or if they are simply downloaded upon first request, can be determined by the user via configuration — the cache policy is meant to act with a certain naïveté, though one can imagine this to rest on a sliding scale of effort spent in planning, from none whatsoever in the simple cache policy, to complete forward planning in the aforementioned GC section.

Consequences

Cache replacement policies are a tried and tested methodology with known constraints and tradeoffs. The only comparable consequence in this analysis worth mentioning is that they may prove too naïve. If all you have is a hammer, everything begins to look like nails, which may be undesirable in certain environments or scenarios.

Dependencies

  • Some utility by which to enforce underlying memory handling (e.g. marking references as weak, calling an underlying GC in the SDK native language, manually cleaning up memory, etc.): it’s hard to enforce a cache if we can’t actually force something to be evicted when we say it was

3. Wizard

While not necessarily a separate concern, the complexity of the solution space discussed above may cause some concern to the average user (indeed, very few casual developers find themselves tuning GCs for performance, or analyzing flame graphs to determine their cache busts). Creating a wizard to help step users through the thought process required to grok the options available before them could prove very helpful. Whether it be a website or a CLI tool (or something else entirely), offering a way for the average user to gain insight into the various knobs and levers of the tuning apparatus can pay dividends in ensuring user viability and in lowering the barrier to entry for having performant programs.

GC

A short aside regarding garbage collection and the decision-making surrounding algorithm choices and their tunable parameters. In general, following the earlier dictum of providing maximum flexibility to the end-user, we should consider all of the dimensions along which garbage collection can make tradeoffs. Three clear dimensions are technique, timing, and deadlines. Providing these dimensions to the end-user will allow us to determine a GC implementation that is best suited to their needs (and, if they so choose, the end-user is welcome to override the SDK’s decision and force a particular GC algorithm, specifying its parameters as they best see fit for their particular usecase).
Along those three dimensions, we should consider separating GC algorithms as being either traced or reference-counted, sequential or concurrent, and conservative or liberal, respectively. Choices along these dimensions ostensibly have general defaults that can be assumed, which will inform the selection of a proper GC algorithm for a given situation. For example, if the user specifies that they are not concerned with memory usage, a generational garbage collector with a liberal eviction policy would keep weakly-referenced and phantom-referenced dependencies around for far longer, to the potential benefit of the user that may end up using those in the future.

Implementation

At its core, the wizard provides a questionnaire that steps users through the thought process required to fill out a cache config. Said cache config, shown below, would then be passed to the runtime. The questionnaire would ask the user about their global preferences, their usecase, and the kinds of data/operations that they would expect to see during the execution of their program. The answers to these questions would then inform the wizard in offering certain predetermined “profiles” with messages explaining the choices made (e.g. any conflicting choices, unreasonable expectations, etc.)

interface CacheConfig {
	network?: { // MB
    	    hard: number,
    	    soft?: number,
	},
	heap?: { // MB
    	    hard: number,
    	    soft?: number,
	},
	executionDelay?: { // ms
    	    hard: number,
    	    soft?: number,
	},
	gc?: {
    	    algorithm: string,
    	    options: Record,
	} | null,
	cachePolicy?: string,
	cacheWarmup: boolean,
}

const defaultCacheConfig: CacheConfig = {
	network: {
    	    hard: 10,
    soft: 8,
	},
	heap: {
    	    hard: 2048,
    	    soft: 1024,
	},
	executionDelay: {
    	    hard: 400,
    	    soft: 150,
	},
	gc: null,
	cachePolicy: 'ARC',
	cacheWarmup: false,
}

NB: this data model considers the problem to have a joint solution space between the GC approach and the cache replacement approach.

Consequences

Users who are unsure of exactly which GC or cache replacement algorithm they wish to employ, or who prefer to describe their situation in terms of their business constraints, would be well served by having access to a tool that can lead them through the necessary reasoning to determine the appropriate cache configuration for their usecase. This makes the system more accessible and more performant on the average case of user, as it allows those without the time or requisite knowledge to tune the runtime to meet their performance criteria.

Dependencies

  • CLI or website or other client-external tool that users can employ to help in their decision-making

Alternative Considered

None :slight_smile:

Unknowns

¿Por qué no los dos?

Is there any benefit to attempting to employ both garbage collection and advanced cache replacement strategies? Following the epistemic principle that second-order knowledge of the unknown is true, valuable knowledge, it would seem that there may be a point that we find ourselves in where we can say with some certainty what actions to take up to a point, and from that point (until a potential return to clarity) we may instead downshift to naïve caching policies. However, whether this would bring with it too much overhead, or even potentially incur performance penalties over simply ‘sticking to our guns’ on one of the approaches, is at this time unclear.

Intermediary servers

Is there a need for having some sort of intermediary compilation server, one with fewer constraints than the actual runtime? A server that has many popular dependencies cached could potentially receive the “program” desired by the runtime, and attempt to calculate an appropriate cache strategy / collection policy ahead-of-time, which it can then return to the runtime as a ‘plan’ of sorts to follow.

References