Frequency-Aware Cache Admission

Stop one-time queries from polluting the cache, and give the survivors a TTL that reflects how popular and how volatile they are.

The problem with admit-everything

A naive cache writes every result it computes. In a workload with skewed popularity (which describes most graph workloads; see the overlap discussion), transient queries occupy slots that hot keys could use. The cache becomes a sliding window of the most recent misses rather than a snapshot of what is genuinely useful.

TinyLFU solves this by putting an admission step in front of the cache: an item is only written if the count-min sketch says it has been seen often enough. This module implements the same idea with a Redis hash per key as the counter, plus a cost-aware fallback that admits expensive misses on first sight.
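
For orientation, the borrowed idea can be sketched standalone. The class below is illustrative only, not this module's code (which keeps its counters in Redis): a small count-min sketch whose admit method counts a sighting and reports whether the item has now been seen often enough.

import hashlib

# Illustration of the TinyLFU-style admission idea, NOT the module's Redis-based code:
# a tiny count-min sketch that admits an item once its estimated frequency reaches a threshold.
class CountMinAdmission:
    def __init__(self, width: int = 1024, depth: int = 4, threshold: int = 2):
        self.width, self.depth, self.threshold = width, depth, threshold
        self.rows = [[0] * width for _ in range(depth)]

    def _buckets(self, key: str):
        # One hash per row; the row index doubles as a salt so the row hashes differ.
        for row in range(self.depth):
            digest = hashlib.blake2b(key.encode(), salt=bytes([row]) * 8).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def admit(self, key: str) -> bool:
        # The count-min estimate is the minimum counter across rows; then count this sighting.
        estimate = min(self.rows[row][col] for row, col in self._buckets(key))
        for row, col in self._buckets(key):
            self.rows[row][col] += 1
        return estimate + 1 >= self.threshold

With threshold=2, the first sighting of a key returns False and the second returns True, which is the same behaviour the Redis counters give via FREQUENCY_THRESHOLD.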

Per-key state in Redis

For every cache key the module maintains a hash under the prefix freq:. Three fields live there: count (access count), last_seen (UNIX timestamp), and update_frequency (incremented externally when the data changes).
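
To make the layout concrete, a read-only peek at one of these hashes might look like the sketch below; the helper name, the example cache key, and the literal freq: prefix are illustrative (the real prefix comes from FREQUENCY_PREFIX).

import redis.asyncio as aioredis

async def dump_frequency_state(client: aioredis.Redis, cache_key: str) -> dict:
    """Read-only peek at one per-key hash; field names match the module's layout."""
    raw = await client.hgetall(f"freq:{cache_key}")
    return {
        "count": int(raw.get("count", 0)),                            # bumped by touch()
        "last_seen": float(raw.get("last_seen", 0.0)),                # UNIX timestamp of last access
        "update_frequency": float(raw.get("update_frequency", 1.0)),  # bumped by register_invalidation()
    }

# Usage, assuming a client created with decode_responses=True:
#   state = await dump_frequency_state(client, "person_lookup:4821")
#   -> {"count": 3, "last_seen": 1718031242.5, "update_frequency": 1.0}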

middleware/optimizations/frequency_aware.py
def _frequency_key(self, cache_key: str) -> str:
    return f"{self.settings.FREQUENCY_PREFIX}{cache_key}"

async def touch(self, cache_key: str) -> int:
    key = self._frequency_key(cache_key)
    # Bump the access count and stamp the access time; the new count is the frequency score.
    score = await self.redis.hincrby(key, "count", 1)
    await self.redis.hset(key, mapping={"last_seen": time.time()})
    return int(score)

Admission decision

On every miss the orchestrator calls should_admit. The decision is the disjunction of two simple rules:

admit ⇔ ( f(k) ≥ τ_f ) ∨ ( ℓ(q) ≥ τ_c )

where f(k) is the access count of key k, τ_f is FREQUENCY_THRESHOLD (default 2), ℓ(q) is the measured Neo4j compute latency of query q in milliseconds, and τ_c is COST_THRESHOLD_MS (default 25 ms).
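
Plugging in the defaults makes the rule easy to check by hand. The standalone sketch below hard-codes the two thresholds instead of reading them from settings:

# Minimal restatement of the admission predicate with the defaults stated above.
def should_admit_pure(frequency_score: int, compute_latency_ms: float,
                      freq_threshold: int = 2, cost_threshold_ms: float = 25.0) -> bool:
    return frequency_score >= freq_threshold or compute_latency_ms >= cost_threshold_ms

assert should_admit_pure(1, 40.0) is True    # first sight, but expensive: cost rule admits
assert should_admit_pure(1, 5.0) is False    # first sight, cheap: rejected
assert should_admit_pure(2, 5.0) is True     # second sight: frequency rule admits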

should_admit()
async def should_admit(self, cache_key, compute_latency_ms):
    # touch() increments the counter first, so the current miss is included in the score.
    frequency_score = await self.touch(cache_key)
    update_frequency = await self._update_frequency(cache_key)
    admit = (
        frequency_score >= self.settings.FREQUENCY_THRESHOLD
        or compute_latency_ms >= self.settings.COST_THRESHOLD_MS
    )
    if not admit:
        CACHE_ADMISSION_REJECTED.inc()
    metadata = {
        "frequency_score": frequency_score,
        "update_frequency": update_frequency,
        "compute_latency_ms": compute_latency_ms,
    }
    return admit, metadata

Why the cost rule exists

Without the cost rule, an expensive shortest-path query would have to miss twice before its result is admitted: the first miss only raises its count to 1, below the default threshold of 2. That waste is large because graph misses are heavy-tailed (see the discussion in the thesis Chapter 4: SSCA misses are several times more expensive than LDBC point lookups). Admitting any miss above 25 ms keeps the cache responsive on first contact with expensive work.

Dynamic TTL

Once a key is admitted, the module assigns its TTL using a bounded sub-linear formula:

TTL(k) = clamp( T_def · √f(k) / u(k),  T_min,  T_max )

The square-root rewards repeated access without growing TTL linearly with popularity (a key seen 100 times should not live ten times longer than one seen ten times). The divisor u(k) shortens lifetime for frequently-updated data. The clamp keeps the value within operational bounds (5 s minimum, 900 s maximum by default) even on degenerate counts.

ttl_for()
async def ttl_for(self, cache_key):
    freq_key = self._frequency_key(cache_key)
    raw = await self.redis.hgetall(freq_key)
    frequency_score = int(raw.get("count", 1))
    update_frequency = float(raw.get("update_frequency",
                              self.settings.UPDATE_FREQUENCY_DIVISOR))
    # Grow sub-linearly with popularity, shrink for frequently-updated data.
    computed = self.settings.DEFAULT_CACHE_TTL_SECONDS * math.sqrt(max(frequency_score, 1))
    computed *= 1 / max(update_frequency, 1)
    # Clamp into [MIN_CACHE_TTL_SECONDS, MAX_CACHE_TTL_SECONDS].
    ttl = max(self.settings.MIN_CACHE_TTL_SECONDS,
              min(int(computed), self.settings.MAX_CACHE_TTL_SECONDS))
    return ttl, {"frequency_score": frequency_score, "update_frequency": update_frequency}

A few example TTLs

With defaults T_def = 60 s, T_min = 5 s, T_max = 900 s:

f(k) (access count)    u(k) (update freq)    Raw TTL      Clamped TTL
1                      1                     60.0 s       60 s
4                      1                     120.0 s      120 s
16                     1                     240.0 s      240 s
100                    1                     600.0 s      600 s
10000                  1                     6000.0 s     900 s (cap)
16                     4                     60.0 s       60 s
1                      20                    3.0 s        5 s (floor)
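
The table can be reproduced in a few lines; this standalone helper hard-codes the defaults rather than reading them from the settings object:

import math

def ttl_for_pure(frequency_score: int, update_frequency: float,
                 default_ttl: int = 60, min_ttl: int = 5, max_ttl: int = 900) -> int:
    # Same formula as ttl_for(), with T_def=60 s, T_min=5 s, T_max=900 s baked in.
    raw = default_ttl * math.sqrt(max(frequency_score, 1)) / max(update_frequency, 1.0)
    return max(min_ttl, min(int(raw), max_ttl))

for f, u in [(1, 1), (4, 1), (16, 1), (100, 1), (10_000, 1), (16, 4), (1, 20)]:
    print(f, u, ttl_for_pure(f, u))   # 60, 120, 240, 600, 900, 60, 5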

Update-frequency feedback

The middleware does not by itself watch Neo4j for changes. Instead, it exposes register_invalidation for an external pipeline (a CDC consumer, a write-side hook, an admin API) to notify when a key has gone stale. Each call increments the per-key update_frequency counter, which immediately shortens future TTLs for that key.

register_invalidation()
async def register_invalidation(self, cache_key: str) -> None:
    freq_key = self._frequency_key(cache_key)
    await self.redis.hincrbyfloat(freq_key, "update_frequency", 1.0)
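
What the caller looks like depends on the integration. As one illustration, a write-side hook might notify the module after a Person update commits; the function and key-derivation names here are invented for this sketch and are not part of the middleware's API.

def cache_key_for(entity: str, entity_id: int) -> str:
    # Illustrative key scheme only; the real orchestrator derives keys from the query.
    return f"{entity}:{entity_id}"

async def on_person_updated(frequency_module, person_id: int) -> None:
    # Called from the write path (or a CDC consumer) after a Person node changes;
    # future TTLs for the affected key shrink because update_frequency grows.
    await frequency_module.register_invalidation(cache_key_for("person", person_id))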

Storing alongside the metadata

When the orchestrator decides to admit, it calls store instead of writing the cache entry directly. store persists the payload and saves the per-key metadata in the frequency hash so that the next access has it ready.

store()
async def store(self, cache_key, payload, ttl_seconds, metadata):
    await self.redis.set(cache_key, payload, ex=ttl_seconds)
    freq_key = self._frequency_key(cache_key)
    await self.redis.hset(
        freq_key,
        mapping={
            "metadata": json.dumps(metadata, default=str),
            "update_frequency": metadata.get(
                "update_frequency", self.settings.UPDATE_FREQUENCY_DIVISOR
            ),
        },
    )
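
Putting the pieces together, the miss path as described in this section roughly follows the shape below. This is a sketch rather than the orchestrator's actual code; compute and serialize are passed in as stand-ins for whatever runs the Cypher query and encodes its result.

import time

async def handle_miss(frequency, cache_key: str, compute, serialize):
    # Run the query against Neo4j and measure the compute latency the cost rule needs.
    started = time.perf_counter()
    result = await compute()
    latency_ms = (time.perf_counter() - started) * 1000.0

    # Ask the module whether this result earns a cache slot, and if so for how long.
    admit, metadata = await frequency.should_admit(cache_key, latency_ms)
    if admit:
        ttl, ttl_meta = await frequency.ttl_for(cache_key)
        metadata.update(ttl_meta)
        await frequency.store(cache_key, serialize(result), ttl, metadata)
    return result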

Hot-key counter

For Prometheus visibility, every hit on a key whose frequency has crossed the threshold increments HOT_KEY_HITS. This is a low-cardinality counter that gives the operator a feel for how much of the workload is hot vs cold.

record_hit()
async def record_hit(self, cache_key):
    raw = await self.redis.hgetall(self._frequency_key(cache_key))
    if int(raw.get("count", 0)) >= self.settings.FREQUENCY_THRESHOLD:
        HOT_KEY_HITS.inc()
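
Both counters referenced in this section, HOT_KEY_HITS here and CACHE_ADMISSION_REJECTED in should_admit, are ordinary prometheus_client counters. A minimal sketch of how they could be declared, with assumed metric names:

from prometheus_client import Counter

# Assumed metric names; the real definitions live elsewhere in the middleware.
CACHE_ADMISSION_REJECTED = Counter(
    "cache_admission_rejected_total",
    "Misses whose results were not admitted to the cache",
)
HOT_KEY_HITS = Counter(
    "cache_hot_key_hits_total",
    "Cache hits on keys at or above FREQUENCY_THRESHOLD",
)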

Workload sensitivity

This module is the most workload-dependent of the five. On the SSCA-inspired workload it gives a strong throughput improvement because misses are expensive, so the cost rule keeps admitting them. On LDBC SF1 with the Person/KNOWS subset, the workload contains many one-time entity lookups, so the threshold rule delays admission and the isolated configuration can sit slightly below baseline. The thesis Chapter 4 discusses this at length.