External BFS

Run unweighted shortest-path queries through NetworkX in process, instead of through Neo4j. Implemented for completeness, but excluded from the default benchmark — here is the honest reason.

▶ Why this is excluded from the default benchmark

This module regressed on the real dataset-backed runs. At SF1 the cost of fetching up to 20,000 edges from Neo4j into a Python NetworkX graph dominates the cost of the BFS itself, and Neo4j's native shortest-path implementation is already faster. The module remains in the codebase as a reference for future hybrid-execution work, but every benchmark configuration in run_benchmark.py sets EXTERNAL_BFS=False.

What the module does

If a Cypher query matches a regex for unweighted shortest paths and does not mention weight, cost, or distance, the module attempts to run it externally. It pulls a bounded set of edges from Neo4j into a networkx.DiGraph, runs NetworkX's built-in shortest-path algorithm, and caches the result under the dedicated bfs: prefix.

Candidate detection

The is-this-a-BFS-candidate check is a regex over the raw Cypher text. It matches any of three shapes: an explicit shortestPath or allShortestPaths call, a variable-length right-pointing relationship, or a variable-length left-pointing one. A second, negative regex rejects any query that mentions edge weights.

middleware/optimizations/external_bfs.py (Python)
SHORTEST_PATH_RE = re.compile(
    r"(shortestPath|allShortestPaths|-\[[^\]]*\*[^\]]*\]->|<-\[[^\]]*\*[^\]]*\]-)",
    re.IGNORECASE,
)
WEIGHT_HINT_RE = re.compile(r"weight|cost|distance", re.IGNORECASE)

def is_candidate(self, query: str) -> bool:
    return bool(SHORTEST_PATH_RE.search(query)) and not bool(WEIGHT_HINT_RE.search(query))
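As a self-contained illustration of how the two regexes interact (the sample queries are invented, and the check is shown as a free function rather than a method):

```python
import re

SHORTEST_PATH_RE = re.compile(
    r"(shortestPath|allShortestPaths|-\[[^\]]*\*[^\]]*\]->|<-\[[^\]]*\*[^\]]*\]-)",
    re.IGNORECASE,
)
WEIGHT_HINT_RE = re.compile(r"weight|cost|distance", re.IGNORECASE)

def is_candidate(query: str) -> bool:
    return bool(SHORTEST_PATH_RE.search(query)) and not bool(WEIGHT_HINT_RE.search(query))

# Explicit shortestPath call, no weight hint: accepted.
print(is_candidate("MATCH p = shortestPath((a)-[*]->(b)) RETURN p"))   # True
# Variable-length pattern, but the query touches a weight property: rejected.
print(is_candidate("MATCH (a)-[r:KNOWS*..4]->(b) RETURN r.weight"))    # False
# Fixed-length relationship, no shortest-path shape at all: rejected.
print(is_candidate("MATCH (a)-[:KNOWS]->(b) RETURN b"))                # False
```

Note that the weight check is deliberately coarse: any occurrence of weight, cost, or distance anywhere in the text disqualifies the query, even in an unrelated property name.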

Running the BFS

The flow is:

  1. Validate the request has a source_id in params.
  2. Pull all edges from Neo4j with a node-id projection, capped at BFS_MAX_EDGES (default 20,000).
  3. Refuse if the cap was hit, since the result would be incomplete.
  4. Build a NetworkX DiGraph by inserting every edge.
  5. Refuse if node count exceeds BFS_MAX_NODES (default 5,000).
  6. Run shortest_path if both source and target are provided, otherwise single_source_shortest_path for all reachable nodes.
run() — core path (Python)
async def run(self, query, params, cache_key):
    source = params.get("source_id")
    target = params.get("target_id")
    if source is None:
        return None, {"bfs_override": False, "bfs_reason": "missing_source"}
    start = time.perf_counter()

    edges = await self.neo4j_client.execute_cypher(
        """
        MATCH (a)-[r]->(b)
        WHERE a.id IS NOT NULL AND b.id IS NOT NULL
        RETURN a.id AS source, b.id AS target
        LIMIT $limit
        """,
        {"limit": self.settings.BFS_MAX_EDGES},
    )
    if len(edges) >= self.settings.BFS_MAX_EDGES:
        return None, {"bfs_override": False, "bfs_reason": "edge_limit_exceeded"}

    graph = nx.DiGraph()
    for edge in edges:
        graph.add_edge(edge["source"], edge["target"])

    if graph.number_of_nodes() > self.settings.BFS_MAX_NODES:
        return None, {"bfs_override": False, "bfs_reason": "node_limit_exceeded"}

    try:
        if target is not None:
            path = nx.shortest_path(graph, source=source, target=target)
            result = [{"path": path, "length": max(len(path) - 1, 0)}]
        else:
            paths = nx.single_source_shortest_path(graph, source)
            result = [{"target": node, "path": path, "length": len(path) - 1} for node, path in paths.items()]
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        result = []

The speedup metric

The module records a Prometheus histogram of the wall-clock latency and an estimated speedup against the assumed Neo4j cost. The estimate uses the per-request estimated_neo4j_ms param (the same one used by the jitter module's XFetch rule), with a floor of 1.0× so that a run where the BFS is slower than the assumed Neo4j cost is reported as no speedup rather than a sub-1× ratio.

run() — finalisation (Python)
elapsed = time.perf_counter() - start
BFS_EXTERNAL_LATENCY.observe(elapsed)
estimated_speedup = max(
    params.get("estimated_neo4j_ms", 100.0) / max(elapsed * 1000, 1),
    1.0,
)
BFS_SPEEDUP.observe(estimated_speedup)
return result, {
    "bfs_override": True,
    "bfs_reason": "executed",
    "bfs_cache_key": f"{self.settings.BFS_PREFIX}{cache_key}",
}
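As a worked example of the speedup formula (the numbers are hypothetical, not measured values from the benchmark):

```python
# Hypothetical values: the caller estimated Neo4j would take 100 ms,
# and the external BFS finished in 20 ms of wall-clock time.
estimated_neo4j_ms = 100.0
elapsed = 0.020  # seconds, as produced by time.perf_counter() deltas

speedup = max(estimated_neo4j_ms / max(elapsed * 1000, 1), 1.0)
print(speedup)  # 5.0

# If the BFS is slower than the estimate, the floor kicks in:
slow = max(estimated_neo4j_ms / max(0.250 * 1000, 1), 1.0)
print(slow)  # 1.0  (100 / 250 = 0.4 is clamped up to the floor)
```

The inner max(elapsed * 1000, 1) guards against division by a near-zero latency, so a sub-millisecond BFS cannot inflate the histogram with an arbitrarily large ratio.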

Why it regressed on the real dataset

Three things compound to make the module unprofitable at SF1:

  1. Edge fetch is the bottleneck. Pulling 20,000 edges through the Bolt driver and round-tripping them as Python dicts takes longer than Neo4j's native shortest-path query, which never leaves the database.
  2. Caching does not amortise enough. The benchmark workload issues many distinct shortest-path queries, so a freshly fetched edge set rarely gets reused before it expires.
  3. NetworkX is single-threaded Python. Even when the edge fetch is hot in cache, the BFS itself is slower than Neo4j's native JVM implementation for graphs of this size.

The module would be more interesting on a workload that issues many shortest-path queries against a small, stable subgraph: a small office hierarchy, a product catalog with bounded depth, or a frozen graph snapshot. None of those match LDBC SNB SF1.
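A sketch of that favourable case, assuming a small frozen subgraph that is fetched once and then queried many times (the graph and node ids here are invented for illustration):

```python
import networkx as nx

# Build the in-memory copy once, e.g. a small office hierarchy fetched
# from Neo4j a single time and kept for the whole session.
graph = nx.DiGraph()
graph.add_edges_from([
    ("ceo", "vp_eng"), ("ceo", "vp_sales"),
    ("vp_eng", "lead_a"), ("vp_eng", "lead_b"),
    ("lead_a", "dev_1"), ("lead_a", "dev_2"),
])

# Amortisation: one single-source pass answers every shortest-path query
# from "ceo", so repeated requests never touch the database again.
paths = nx.single_source_shortest_path(graph, "ceo")
print(paths["dev_1"])  # ['ceo', 'vp_eng', 'lead_a', 'dev_1']
print(len(paths))      # 7 reachable nodes, including the source itself
```

Here the fetch cost is paid once against many queries, which is exactly the ratio the benchmark workload never provides.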

How the exclusion is enforced

The benchmark runner explicitly disables the flag in every run, including baseline and all_enabled:

benchmarks/run_benchmark.py (Python)
DISABLED_BENCHMARK_FLAGS = {
    "EXTERNAL_BFS": False,
}

def build_run_matrix():
    # merge DISABLED_BENCHMARK_FLAGS last so the forced-off flags always win,
    # even in the all_enabled run
    runs = [("baseline", {**{flag: False for flag in flags}, **DISABLED_BENCHMARK_FLAGS})]
    ...
    runs.append(("all_enabled", {**{flag: True for flag in flags}, **DISABLED_BENCHMARK_FLAGS}))
    return runs

This means the published "all_enabled" numbers in this thesis are honest: nothing is hidden behind a faulty optimization. The module exists, has a unit test (tests/test_external_bfs.py), and is available for anyone who wants to revive it for a more amenable workload.

▶ Lessons

External execution of graph algorithms in front of a graph database is a tempting idea, but it pays off only when the algorithm is structurally different from what the database can do natively (think: an in-memory PageRank that runs once per session and is reused), or when the database round-trip dominates and the in-memory copy is already amortised. Pure unweighted BFS is neither.