Bayesian Leakage Analysis (part 2)

Bayesian Leakage Analysis, Part 2: Automating Attacks with Bayesian Networks

The first post in this series developed a framework for measuring leakage exploitability: a model, a metric (advantage), and bounds that decompose the sources of inferential difficulty. That machinery is useful analytically, but it has a practical bottleneck. Every new leakage profile requires a bespoke instantiation — its own likelihood structure, its own bound calculation, and usually its own hand-crafted attack. This is manageable for query equality but does not scale to the variety of leakage patterns that real encrypted-search schemes exhibit.

This post describes the step from analysis to automation. The idea is that once leakage models are written as explicit joint distributions over secrets, hidden variables, and observations — which the framework already requires — they can be encoded as Bayesian networks and attacked through a reusable inference engine. We call that engine Bayle.

The payoff is not just computational convenience. It is methodological: a shared inference pipeline makes it possible to run controlled experiments across different leakage profiles, different auxiliary regimes, and different adversary objectives using the same assumptions and the same code.

Why Generic Inference Breaks

Before building a specialized engine, it helps to see why off-the-shelf tools fail. The joint distribution (\mathbf{X} = (\mathbf{S}, \mathbf{H}, \mathbf{L})) from the first post can, in principle, be encoded as a Bayesian network and solved with standard variable elimination. The problem is scale.

Consider query equality again — the worked example from the first post. The hidden variable is a bijection (\pi) on an (m)-element alphabet, so the hidden space has (m!) elements. For even moderate alphabet sizes — (m = 7), say — this is 5,040 configurations. Variable elimination constructs intermediate factors whose domains can be products of these spaces, and runtime and memory grow accordingly. In our experiments, generic variable elimination for a query-equality model with (m = 7) and sequence length (n = 7) requires roughly 48 seconds and 98 GB of memory. That is a toy-sized instance.

The issue is structural: generic inference algorithms do not know that the hidden variable is a bijection, that the leakage function is deterministic, or that the secrets are drawn i.i.d. They treat the model as an opaque graphical model and pay for generality with intractable intermediate factors. Exploiting the model’s specific structure is what makes automation practical.

Bayesian Networks as the Right Representation

A Bayesian network is a directed acyclic graph where each node carries a conditional probability table (CPT), and the joint distribution factors as [ p(x_1, \ldots, x_n) = \prod_{i=1}^n p(x_i \mid \mathrm{Pa}(X_i)). ] For readers in cryptography who have not worked with graphical models, the important point is that this factorization is not just compact notation — it is what makes inference algorithms possible. Edges encode direct dependence under the chosen factorization, and missing edges encode conditional independence statements that algorithms exploit to avoid materializing the full joint table.

Leakage models have features that make them both well-suited and challenging for this representation. On the helpful side, many leakage CPTs are deterministic: given a hidden function and a secret value, the leakage output is fixed, not drawn from a distribution. Deterministic CPTs sharpen support constraints and make pruning far more effective than in noisy models. On the challenging side, the hidden domains can be enormous ((m!) for permutations), which means that the benefits of factorization can be swamped by the size of individual factors.

To get both the structural advantages of Bayesian networks and practical tractability, we need to restrict attention to a model class that is expressive enough for real leakage patterns while imposing enough regularity for specialized inference.

Hidden Function Networks

We call that class Hidden Function Networks (HFNs). An HFN has three layers: i.i.d. secrets (S_1, \ldots, S_n), hidden functions (H_1, \ldots, H_k) drawn from some prior, and observed leakage nodes where each observation is generated by applying all the hidden functions to the corresponding secret: [ L_i = (h_1(s_i), \ldots, h_k(s_i)). ]

Hidden Function Network architecture

This structure captures the i.i.d. variants of the leakage patterns that matter most in encrypted search. Query-equality leakage is an HFN with a single hidden bijection. Volume leakage is an HFN with a different function class. Conjunction patterns — where an adversary observes multiple leakage types simultaneously — are HFNs with multiple hidden functions. The class is not universal, but it covers the profiles that drive most current attack research.

The reason HFNs enable efficient inference is that they expose a factorization that generic algorithms miss. The secrets are conditionally independent given the hidden functions, and the hidden functions interact with the data only through a shared application pattern. This means that if we can solve the inference problem over hidden functions, we can recover secrets by inversion rather than by searching over the entire secret-sequence space.

Bayle’s Key Insight

This is the core reduction that Bayle exploits. When at least one hidden function is bijective, full-recovery MAP over secrets reduces to MAP over hidden-function assignments followed by inversion. Instead of optimizing over all possible secret sequences ((s_1, \ldots, s_n)), Bayle scores each candidate hidden-function tuple (\mathbf{h} = (h_1, \ldots, h_k)) by [ \widetilde{\pi}{\mathbf{h}} = p{\mathbf{H}}(\mathbf{h}) \prod_{i=1}^n \alpha_S!\left(\bigcap_{j=1}^k h_j^{-1}(\ell_{i,j})\right), ] and selects the argmax. Each factor in the product asks: given this candidate set of hidden functions, what does the auxiliary distribution say about the secret that produced the (i)-th leakage observation? The bijection condition guarantees that the inverse is well-defined.

This formulation also makes computational reuse explicit. When multiple observations produce the same leakage symbol, the corresponding factors are identical and can be computed once. When a candidate hidden function is inconsistent with an observation (the preimage intersection is empty), the entire candidate can be pruned without evaluating the remaining factors.

The practical difference is dramatic. For the same (m = 7, n = 7) query-equality instance that took generic variable elimination 48 seconds and 98 GB, Bayle runs in 28 milliseconds and uses 0.5 GB. That is a factor of roughly 1,700 in time and 200 in memory.

VE vs Bayle: runtime and memory on a toy instance

These numbers reflect a toy instance, but the scaling behavior is the real point. Generic variable elimination grows as (O(n \cdot (m!)^{n+1})) in the worst case for this model class, while Bayle’s cost is (O(k \cdot n \cdot m!)). The dependence on sequence length drops from exponential (in the width of intermediate factors) to linear.

Asymptotic scaling: the gap grows dramatically with query-space size

Two engineering choices matter for turning this into a reliable engine. The first is SIMD vectorization of the score updates, which maps naturally to the product-over-observations structure. The second is periodic normalization of candidate scores to prevent floating-point underflow in long products. Because normalization rescales all candidates by a common positive factor, it preserves the argmax ranking.

What the Experiments Show

With Bayle in place, it becomes possible to run systematic experiments across leakage profiles, query-space sizes, sequence lengths, and auxiliary-distribution assumptions. Several findings are worth highlighting.

The most consistent signal is that auxiliary-distribution quality dominates full exact recovery in the regimes we studied. An adversary with a well-matched auxiliary distribution does far better than one with a poorly-matched distribution, even when the leakage is identical. This is the empirical counterpart of the modal-entropy term from the first post: mismatch in the prior — particularly misalignment in the ranking of query frequencies — penalizes MAP recovery regardless of how much leakage is available.

Within the space of auxiliary mismatch, the experiments confirm that ranking mismatch is more damaging than parameter mismatch. An adversary who has the right relative ordering of query frequencies but the wrong Zipf exponent recovers substantially more than one who has a reasonable exponent but the wrong ordering. This aligns with what the bounds predict and adds empirical weight to the theoretical distinction.

A more surprising finding is that intuitively “more revealing” leakage does not always improve exact recovery. Adding a second leakage dimension (observing both query equality and volume, say) increases the information available to the adversary, but it also changes the structure of the inference problem — different hidden-function domains, different deterministic constraints, different effective ambiguity. In some regimes, the additional complexity does not translate into higher MAP advantage for the objectives and auxiliary distributions tested. This is a useful cautionary result for the design side: richer leakage is not monotonically worse in exact-recovery terms, at least not without controlling for objectives and priors.

The Methodological Point

The technical contributions of this work — the model, the bounds, the engine — are aimed at a methodological goal. Leakage analysis in encrypted search has historically proceeded one attack at a time: a new leakage profile is identified, a custom attack is designed, and recovery rates are reported under specific assumptions. This approach has produced important results, but it makes cumulative progress slow. Assumptions differ across papers, baselines differ, and it is hard to tell whether a new attack is genuinely more effective or simply evaluated in a more favorable regime.

The framework described in these two posts offers an alternative. Once leakage models are explicit probabilistic objects, once exploitability is measured relative to a baseline, and once attacks are generated by a reusable inference engine, evaluation becomes systematic. We can sweep over parameters, compare profiles, and isolate the effects of auxiliary quality, leakage structure, and adversary objectives in controlled experiments.

This does not make bespoke attacks obsolete — specialized attacks can still exploit model-specific structure that a general engine misses. But it does mean that leakage analysis can accumulate results in a more comparable and reproducible way. That is the real payoff of treating leakage exploitation as inference.




Enjoy Reading This Article?

Here are some more articles you might like to read next:

  • Escaping Flatland - STEM vs. Humanities Training
  • Contextualizing Cryptography
  • Television as World Model Installation