CASE STUDIES

Production systems.

End-to-end AI architectures for cyber threat intelligence: schema-first pipelines and evidence-backed agentic investigation.

CASE STUDY 01

AI-First Cyber Threat Intelligence Pipeline

PER-ITEM PIPELINE

State machine

Each article walks an explicit sequence of states (fetched, titled, classified, sourced, enriched, vetted, stored), observable end-to-end.

ENTITY EXTRACTION

Schema-first

Actors, malware, CVEs, indicators, MITRE ATT&CK and victimology come out as typed JSON.

CLUSTERING & TEMPORAL LINKS

Story-aware

Daily clusters, cross-day similarity links, and multi-day super-clusters reconstruct evolving narratives.

GRAY-ZONE DECISIONS

LLM-adjudicated

An adjudicator agent breaks the tie between "duplicate" and "ongoing story" when deterministic similarity is ambiguous.

IMPORTANCE SCORING

7-signal

Source diversity, cluster size, CVE/actor/IOC density, temporal persistence and content category combine into a single 0–100 score, with no extra API calls.
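As an illustration of how seven local signals might fold into one 0–100 score, here is a minimal sketch. The weights, saturation caps and field names are assumptions chosen for the example, not the pipeline's actual values; the only properties taken from the description are the seven signals, the 0–100 range, and the absence of any API call.

```python
from dataclasses import dataclass

# Hypothetical weights; they sum to 100 so the result lands in 0-100.
WEIGHTS = {
    "source_diversity": 20,
    "cluster_size": 15,
    "cve": 15,
    "actor": 15,
    "ioc": 10,
    "persistence": 15,
    "category": 10,
}


@dataclass
class ClusterSignals:
    distinct_sources: int
    article_count: int
    cve_count: int
    actor_count: int
    ioc_count: int
    days_active: int
    category_weight: float  # 0.0-1.0, assumed to come from a static category table


def _saturate(value: float, cap: float) -> float:
    """Map a raw count onto 0.0-1.0, flattening out once it reaches `cap`."""
    return min(max(value, 0) / cap, 1.0)


def importance_score(s: ClusterSignals) -> int:
    """Combine the seven signals locally; the caps below are illustrative."""
    parts = {
        "source_diversity": _saturate(s.distinct_sources, 5),
        "cluster_size": _saturate(s.article_count, 10),
        "cve": _saturate(s.cve_count, 3),
        "actor": _saturate(s.actor_count, 2),
        "ioc": _saturate(s.ioc_count, 20),
        "persistence": _saturate(s.days_active, 7),
        "category": min(max(s.category_weight, 0.0), 1.0),
    }
    return round(sum(WEIGHTS[name] * value for name, value in parts.items()))
```

Saturating each count keeps one very large signal (for example, hundreds of IOCs) from dominating the ranking.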

PRODUCTION SURFACE

Operable

Cost tracking per call, structured run/step logs, manual reprocess path, and dashboards built around the same data model.

Problem

Cybersecurity teams operate on top of a constant stream of unstructured reporting: vendor blogs, advisories, news, intel feeds. Raw data is noisy, duplicated, and inconsistent. Generic summarization loses the entities and relationships that make intelligence actionable, and naïve deduplication either fragments stories into isolated articles or collapses unrelated events into a single noisy bucket. Producing structured, reliable, queryable intelligence at scale takes more than summarization or deduplication.

Approach

I designed an end-to-end system around two explicit phases. The first is a per-item state machine that walks each article through filtering, source resolution, schema-first enrichment and vetting. The second is a nightly post-processing chain that normalizes entities, rebuilds daily clusters, links them across time into multi-day stories, lets an LLM adjudicator settle gray-zone “duplicate vs ongoing” calls, scores everything by importance, and pushes the resulting intelligence into a central store and downstream sinks. Specialist responsibilities (title rewriting, relevance filtering, source resolution, structured extraction, secondary vetting) stay in separate agents so each remains small, testable, and replaceable.
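The per-item phase can be pictured as a small transition table. The sketch below is a simplified illustration, not the production code: the state names follow the sequence listed earlier (fetched, titled, classified, sourced, enriched, vetted, stored), while the `REJECTED` state, the point at which rejection happens, and the in-memory `history` list standing in for persisted transitions are all assumptions.

```python
from enum import Enum


class ItemState(str, Enum):
    FETCHED = "fetched"
    TITLED = "titled"
    CLASSIFIED = "classified"
    SOURCED = "sourced"
    ENRICHED = "enriched"
    VETTED = "vetted"
    STORED = "stored"
    REJECTED = "rejected"  # hypothetical terminal state for early relevance rejection


ORDER = [
    ItemState.FETCHED,
    ItemState.TITLED,
    ItemState.CLASSIFIED,
    ItemState.SOURCED,
    ItemState.ENRICHED,
    ItemState.VETTED,
    ItemState.STORED,
]

# Each state may only move to its successor; classification may also reject.
TRANSITIONS = {current: {nxt} for current, nxt in zip(ORDER, ORDER[1:])}
TRANSITIONS[ItemState.CLASSIFIED].add(ItemState.REJECTED)


class InvalidTransition(Exception):
    """Raised when a step tries to skip or rewind the sequence."""


def advance(history: list[ItemState], new_state: ItemState) -> list[ItemState]:
    """Return a new history with `new_state` appended, or raise if not allowed."""
    current = history[-1]
    if new_state not in TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current.value} -> {new_state.value}")
    return history + [new_state]
```

Because every accepted transition is appended rather than overwritten, the last entry tells a re-drive where to resume and the full list serves as the audit trail.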

Key Design Decisions

  • Per-item state machine with persisted state transitions makes the pipeline resumable, auditable, and easy to re-drive on a single URL.
  • Early relevance rejection happens before any expensive enrichment runs, keeping spend tied to articles that matter.
  • Schema-first enrichment emits typed entities (threat actors, malware, CVEs with CVSS, IOCs, MITRE ATT&CK, victimology) instead of free-form prose. Every downstream stage consumes structure.
  • Entity normalization applies curated alias mappings so the same actor, malware family, or product surfaces under one canonical name across sources.
  • Daily clustering + temporal links + super-clusters reconstruct multi-day campaigns instead of leaving analysts with isolated articles.
  • LLM adjudicator on gray-zone temporal links tie-breaks borderline “duplicate vs ongoing story” decisions that deterministic similarity cannot resolve, with verdict, confidence and rationale persisted alongside the link.
  • Local 7-signal importance scoring (source diversity, cluster size, CVE/actor/IOC presence, temporal persistence, content category) ranks stories without any extra API call.
  • Operational discipline: per-call cost tracking, structured run/step logs, soft-deletes, idempotent migrations and a single manual reprocess path that mirrors the nightly chain.
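To make the gray-zone decision concrete, the following sketch routes a temporal link by similarity band and only calls the adjudicator in the ambiguous middle. The band edges, the `"unrelated"` outcome for low similarity, and the `adjudicate` callable signature are assumptions for illustration; the persisted fields (verdict, confidence, rationale) follow the list above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical band edges; real thresholds would be tuned on labelled pairs.
DUPLICATE_AT = 0.90
UNRELATED_BELOW = 0.60

# Assumed adjudicator contract: (summary_a, summary_b) -> (verdict, confidence, rationale)
Adjudicator = Callable[[str, str], tuple[str, float, str]]


@dataclass
class LinkDecision:
    verdict: str                       # "duplicate", "ongoing_story", or "unrelated"
    decided_by: str                    # "deterministic" or "adjudicator"
    confidence: Optional[float] = None
    rationale: Optional[str] = None


def decide_link(
    similarity: float,
    summary_a: str,
    summary_b: str,
    adjudicate: Adjudicator,
) -> LinkDecision:
    """Settle clear cases deterministically; send only the gray zone to the LLM."""
    if similarity >= DUPLICATE_AT:
        return LinkDecision("duplicate", "deterministic")
    if similarity < UNRELATED_BELOW:
        return LinkDecision("unrelated", "deterministic")
    verdict, confidence, rationale = adjudicate(summary_a, summary_b)
    return LinkDecision(verdict, "adjudicator", confidence, rationale)
```

Storing the returned `LinkDecision` next to the link keeps the reasoning behind each borderline call reviewable later, and the band check keeps LLM spend limited to pairs that similarity alone cannot resolve.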

Outcome

The pipeline turns raw reporting into a queryable intelligence layer: ranked, deduplicated, and stitched into evolving stories that analysts can drill into by actor, malware, CVE, sector, or geography. IOC, CVE and morning-briefing exports feed downstream consumers.

Pipeline architecture

Python · FastAPI · Streamlit · PostgreSQL · SQLite · Redis / ARQ · Qdrant · OpenAI Agents SDK · MITRE ATT&CK · CVSS / NVD · Docker Compose

CASE STUDY 02

Agentic CTI Investigator

ONE PLAN, MANY STEPS

Reasoning router

A single planner agent emits a typed execution plan (intents, scope, formatter, and a step DAG), replacing brittle classify-then-dispatch.

STEP GRAPH EXECUTION

Parallel

Independent steps (SQL, indicator lookups, vulnerability enrichment, MCP tools) run concurrently; topological layering preserves dependencies.

SKILL PACKS

Declarative

YAML packs capture recurring investigation patterns (triggers, slots, tool strategy, evidence contract, output template, validation rules) as one source of truth for the planner, retrieval, formatter, and verifier.

VERIFICATION

Layered

Plan-faithfulness, skill-contract, tool-strategy and evidence cross-checks run on every answer before release; when no skill clears the router's confidence threshold, the system falls back to plain reasoning.

EVIDENCE-BACKED OUTPUT

Grounded

Every answer carries the rows and tools that produced it; a synthesizer merges CVE-bearing answers with authoritative vulnerability data before delivery.

CACHING

Plan-keyed

Cache signatures derive from the plan, so repeat investigations reuse work without leaking across scopes.
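One way to derive such a signature is to hash a canonical serialization of the plan. This is a minimal sketch under assumptions: the plan is taken to be a JSON-serializable dict, and the field names in the usage note (`intents`, `scope`, `tenant`, `window`, `steps`) are hypothetical.

```python
import hashlib
import json


def plan_signature(plan: dict) -> str:
    """Hash a canonical JSON form of the plan so key order does not matter."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the scope is part of the hashed plan, two plans that differ only in scope (say `{"tenant": "a", "window": "2024"}` versus `{"tenant": "b", "window": "2024"}`) produce different keys, while the same plan built with a different key order maps to the same cache entry.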

Problem

Analysts need an investigation system. Real CTI questions carry multiple intents at once (“the most exploited CVE this year” is both a ranking and a temporal scope), often need cross-referencing against external authoritative sources, and demand answers backed by traceable evidence. A single prompt, or a flat classify-then-dispatch table, cannot compose retrieval, specialist reasoning, and validation under those conditions.

Approach

I redesigned the orchestration around a reasoning router: one planner agent emits a structured execution plan with declared intents, entity scope, temporal scope, a DAG of steps, the formatter to use, and whether specialists are needed. A plan runner topologically sorts that DAG and runs independent steps in parallel. On top of the router sits a declarative skills layer: roughly 90% of recurring investigation patterns are captured as YAML packs (triggers, typed slots, tool strategy, evidence contract, output template, validation rules) that constrain every downstream stage from the same source of truth. Specialist subagents handle deeper timeline, correlation, comparison and trend reasoning when the plan calls for it; layered verification gates the final answer.
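The plan runner's core idea, topological layering with concurrency inside each layer, can be sketched with the standard library alone. The step names and the handler signature here are hypothetical; the actual runner also handles traces, formatters and specialists, which this sketch omits.

```python
import asyncio
from typing import Awaitable, Callable

Handler = Callable[[dict], Awaitable[object]]


def layer_steps(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into layers; every step's dependencies sit in earlier layers."""
    remaining = {step: set(needs) for step, needs in deps.items()}
    layers: list[list[str]] = []
    while remaining:
        ready = sorted(step for step, needs in remaining.items() if not needs)
        if not ready:
            raise ValueError("cycle or unknown dependency in step graph")
        layers.append(ready)
        for step in ready:
            del remaining[step]
        for needs in remaining.values():
            needs.difference_update(ready)
    return layers


async def run_plan(deps: dict[str, set[str]], handlers: dict[str, Handler]) -> dict:
    """Run each layer concurrently; later layers can read earlier results."""
    results: dict[str, object] = {}
    for layer in layer_steps(deps):
        outputs = await asyncio.gather(*(handlers[step](results) for step in layer))
        results.update(zip(layer, outputs))
    return results
```

For a plan such as `{"sql": set(), "ioc": set(), "merge": {"sql", "ioc"}}`, the two lookups run together in the first layer and `merge` runs once both have finished.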

Key Design Decisions

  • Plan, don’t classify: one planner agent emits a typed execution plan that decomposes multi-intent and time-scoped questions into structured steps.
  • Parallel step DAG: independent retrievals and lookups run concurrently; topological layering preserves dependencies, and the runner rolls per-step traces up into the answer.
  • Declarative skill packs: a YAML pack describes each recurring pattern (ranking aggregation, single-CVE deep-dive, threat-actor profile, daily briefing, IOC pivots) once, and the planner, SQL expert, formatter and verifier all read from it.
  • Skill router with confidence threshold: when no skill clears the threshold, the system falls back to plain reasoning rather than forcing a misclassification.
  • Live schema awareness: the planner introspects the data model at request time via tool calls, so reasoning stays aligned with the current schema instead of a static prompt snapshot.
  • Layered verification: plan-faithfulness, skill-contract coverage, tool-strategy enforcement, and deterministic evidence cross-checks combine before the system releases an answer; bounded revision loops re-run the offending steps when grounding slips.
  • Plan-keyed caching: the cache derives signatures from the plan, so repeated investigations reuse work without leaking across scopes.
  • Validation harness: a curated end-to-end suite gates every change to skills, prompts or verifier logic, asserting selected skill, answer substrings, and verifier status against a live container.
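The skill router's fallback rule from the list above reduces to a thresholded argmax. In this sketch the threshold value and the shape of the score map are assumptions; returning `None` stands for "no skill selected, use plain reasoning".

```python
from typing import Optional

SKILL_THRESHOLD = 0.7  # hypothetical cut-off, not the system's actual value


def select_skill(
    scores: dict[str, float],
    threshold: float = SKILL_THRESHOLD,
) -> Optional[str]:
    """Return the best-scoring skill, or None when nothing clears the threshold."""
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

Returning `None` instead of the nearest match is what prevents a weakly matching question from being forced through the wrong pack's evidence contract and output template.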

Outcome

The system behaves like a controlled investigation workflow: grounded answers with traceable evidence, structured analyst reports, and proactive briefings, produced under explicit plans, declarative contracts, and verifiable checks.

Investigation architecture

Python · FastAPI · OpenAI Agents SDK · Model Context Protocol · Declarative skill packs (YAML) · PostgreSQL · NVD / CISA KEV · Docker