From Spec to Code: Closing the Last Mile in Protocol Security Analysis

Introduction

Since September 2025 I have been working with Dr. Dongfang Zhao on ProtocolAnalysisPipeline, an automation effort for protocol security. The idea is simple to state and hard to execute: systematize the extraction of Optional Security Features (OSF) from long specifications, organize them into an Optional Path Tree (OPT) that makes implementation choices explicit, map these choices into potential attack paths, and finally validate the hypotheses against real code to produce traceable risk reports and PoCs. In short, we want the path from “spec → security risk → practical exploit” to be an engineered, repeatable pipeline, not a one-off expert reading.
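To make "implementation choices explicit" a little more concrete, here is a minimal sketch of how a tree of optional choices could be enumerated into candidate paths. The OptNode class, its fields, and the sample features are hypothetical and chosen only for illustration; the pipeline's real OPT representation is not described in this post.

```python
from dataclasses import dataclass, field


@dataclass
class OptNode:
    """One decision point: an optional feature and the choice taken for it (hypothetical layout)."""
    feature: str
    choice: str
    children: list = field(default_factory=list)


def enumerate_paths(node, prefix=()):
    """Yield every root-to-leaf path as a tuple of (feature, choice) pairs."""
    step = prefix + ((node.feature, node.choice),)
    if not node.children:
        yield step
        return
    for child in node.children:
        yield from enumerate_paths(child, step)


# Illustrative tree only; the feature names are placeholders, not extracted OSFs.
example_root = OptNode("handshake", "start", [
    OptNode("HelloRetryRequest", "supported", [
        OptNode("suite check after HRR", "enforced"),
        OptNode("suite check after HRR", "omitted"),
    ]),
    OptNode("HelloRetryRequest", "not supported"),
])

example_paths = list(enumerate_paths(example_root))
```

Each enumerated path is a candidate for the later stages: a path that ends in an omitted check is the kind of branch one would want to map to an attack hypothesis.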

Where We Stalled

Once the pipeline touched real projects, the hardest part was not OSF extraction or building the OPT. It was the jump from semantics to code. LLMs could surface plausible risk narratives, but repository structures rarely mirrored specification prose. Names, layering, and control flow did not align cleanly, so we spent hours grepping for states, guards, and error paths, often without converging on evidence strong enough to justify a PoC. Progress slowed because our attention drifted toward finding code rather than validating semantics.

The Adjustment

I reframed the objective as “make the LLM speak in code-review instructions.” The pipeline already ended with structured JSONL containing quoted evidence, path identifiers, and threat hints. What we lacked was a bridge that turns this structure into directives a code-reading tool can execute. I built a new module that reads JSONL, uses a custom template to call an LLM via OpenRouter, and emits task-oriented prompts for Claude Code. These prompts avoid generic advice. They insist on concrete search anchors (subsystems, symbols, macros), enumerated state transitions to check, preconditions that must hold, and observable failure conditions that should trigger evidence capture.
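The bridge can be sketched roughly as follows. This is a minimal illustration rather than the module itself: the field names (path_id, evidence, threat_hint), the sample record, and the template wording are assumptions made for the example, and the LLM call via OpenRouter is left out so the sketch stays self-contained.

```python
import json
from string import Template

# Hypothetical template wording; the real custom template is not shown in this post.
PROMPT_TEMPLATE = Template(
    "Task for path $path_id\n"
    'Spec evidence: "$evidence"\n'
    "Threat hint: $threat_hint\n"
    "Report: search anchors (subsystems, symbols, macros), state transitions "
    "to check, preconditions that must hold, and observable failure conditions."
)

# Hypothetical record; the evidence text is a paraphrase, not a quotation.
SAMPLE_JSONL = (
    '{"path_id": "OPT-7", '
    '"evidence": "ServerHello suite must equal the HelloRetryRequest suite", '
    '"threat_hint": "suite mismatch accepted after HRR"}\n'
)


def load_records(jsonl_text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def build_prompt_request(record):
    """Fill the template from one record; raises KeyError if a field is missing."""
    return PROMPT_TEMPLATE.substitute(
        path_id=record["path_id"],
        evidence=record["evidence"],
        threat_hint=record["threat_hint"],
    )


if __name__ == "__main__":
    for rec in load_records(SAMPLE_JSONL):
        print(build_prompt_request(rec))
```

The string returned by build_prompt_request is what would be sent to the LLM, whose answer in turn becomes the task prompt handed to Claude Code.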

Making It Executable

To keep outputs actionable, the template carries repository mapping cues such as path prefixes, build flags, and commit-message keywords, and it forces the model to phrase checks as observable conditions instead of aspirational guidance. With these constraints, Claude Code can jump directly to candidate files and branches, surface contrasting snippets, and flag missing guards. Researchers then spend their time on semantic validation and PoC design rather than on path discovery.
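One way to enforce "observable conditions" is to require every check to name where to look, what must be true, and what to capture if it is not. The sketch below assumes that shape; the class names, field names, and sample values are hypothetical placeholders, not the template's actual contents.

```python
from dataclasses import dataclass


@dataclass
class RepoCues:
    """Repository mapping cues carried by the template (hypothetical layout)."""
    path_prefixes: list
    build_flags: list
    commit_keywords: list


@dataclass
class Check:
    """A single review check phrased as something that can be observed in code."""
    anchor: str              # file, symbol, or macro to start from
    condition: str           # what must hold at that location
    capture_on_failure: str  # evidence to record if it does not hold


def validate_check(check):
    """Return a list of problems; an empty list means the check is well formed."""
    problems = []
    for name in ("anchor", "condition", "capture_on_failure"):
        if not getattr(check, name).strip():
            problems.append(f"missing {name}")
    return problems


def render_cues(cues):
    """Render the cues as plain text lines for inclusion in a prompt."""
    return "\n".join([
        "Path prefixes: " + ", ".join(cues.path_prefixes),
        "Build flags: " + ", ".join(cues.build_flags),
        "Commit keywords: " + ", ".join(cues.commit_keywords),
    ])


# Placeholder values for illustration only.
example_cues = RepoCues(["src/"], ["EXAMPLE_BUILD_FLAG"], ["retry", "alert"])
```

A check that fails validation is sent back for regeneration instead of being passed along as vague advice.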

A Concrete Case: TLS 1.3 HRR in wolfSSL

TLS 1.3 requires a client to check that the cipher suite in the ServerHello that follows a HelloRetryRequest matches the suite the server selected in that HelloRetryRequest; on a mismatch the client must abort the handshake with an illegal_parameter alert. Feeding that evidence and intent into the new module produces a targeted prompt that guides Claude Code to wolfSSL’s handshake path and asks it to verify suite equivalence, alert behavior on exceptional paths, and state rollback. The tool responds with focused code regions in src/tls13.c and concrete branch analyses. The verification loop accelerates, and PoC design follows quickly.
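The property under review can be stated as a small executable model. The sketch below is not wolfSSL code and does not mirror its structure; the class and method names are hypothetical, and the exception merely stands in for sending an illegal_parameter alert. Only the two cipher suite code points are taken from the TLS 1.3 specification.

```python
TLS_AES_128_GCM_SHA256 = 0x1301
TLS_AES_256_GCM_SHA384 = 0x1302


class IllegalParameter(Exception):
    """Stands in for aborting the handshake with an illegal_parameter alert."""


class ClientHandshakeModel:
    """Toy client-side model that tracks the suite announced in HelloRetryRequest."""

    def __init__(self):
        self.hrr_suite = None       # set once a HelloRetryRequest has been seen
        self.negotiated_suite = None

    def on_hello_retry_request(self, suite):
        self.hrr_suite = suite

    def on_server_hello(self, suite):
        # The guard under review: compare before committing any state.
        if self.hrr_suite is not None and suite != self.hrr_suite:
            raise IllegalParameter("ServerHello suite differs from HelloRetryRequest suite")
        self.negotiated_suite = suite
        return suite
```

A model like this gives the reviewer a reference behavior: when reading the real handshake code, the questions are whether an equivalent comparison exists, whether it runs before state is committed, and which alert the failure branch sends.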

Impact

This adjustment shifted how we spend time. Instead of hunting for files, functions, and conditions, we now judge whether semantics hold, design reproducible experiments, and discuss fixes. Internally, case localization moved from hours to minutes, and a higher fraction of cases advanced to PoC construction. More importantly, the pipeline finally behaves like a loop: evidence → localization → verification → traceable report.

Looking Forward

The work is intentionally modest in scope. We did not “make the LLM smarter”; we narrowed its job and enriched its context so it could serve code review. Cross-implementation generalization still requires adapter logic, and fully automatic PoC generation needs environment and timing glue. Prioritization will benefit from a stable scoring model that blends exploitability, blast radius, and fix cost. But the “last mile” problem is no longer the roadblock. By translating research insight into executable engineering tasks, ProtocolAnalysisPipeline becomes a practical way to scale protocol security analysis from text to code—and back again with evidence.
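As a starting point for that scoring model, here is a minimal sketch of one possible blend. Everything in it is an assumption: the weights, the 0-to-1 scales, and the decision to rank cheap fixes higher are placeholders for discussion, not a model we have adopted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    name: str
    exploitability: float  # 0 (theoretical) to 1 (trivially exploitable)
    blast_radius: float    # 0 (isolated) to 1 (affects every deployment)
    fix_cost: float        # 0 (one-line guard) to 1 (redesign)


# Hypothetical weights; they sum to 1 so the score also stays within 0..1.
WEIGHTS = {"exploitability": 0.5, "blast_radius": 0.3, "fix_cost": 0.2}


def priority(finding, weights=WEIGHTS):
    """Weighted sum in which a lower fix cost raises the priority."""
    for value in (finding.exploitability, finding.blast_radius, finding.fix_cost):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all factors must be within 0..1")
    return (
        weights["exploitability"] * finding.exploitability
        + weights["blast_radius"] * finding.blast_radius
        + weights["fix_cost"] * (1.0 - finding.fix_cost)
    )


def rank(findings):
    """Return findings sorted from highest to lowest priority."""
    return sorted(findings, key=priority, reverse=True)
```

A linear blend is easy to explain and to recalibrate, which matters more at this stage than precision; whether fix cost should raise or lower priority is itself a policy choice worth settling first.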