# System architecture

> How the product is composed: the on-box vs off-box seam, components, data flow, segmentation, and trust model.

*Canonical HTML: https://www.conversationalfactory.com/docs/architecture*
*Markdown source: https://www.conversationalfactory.com/docs/architecture.md*

---

## Frame

Conversational Factory is **the product**. It is the witness, the historian, the MCP server, the operator interfaces, and the end-to-end packaging that ships to a customer site and lets them talk to a packing line.

It is built on the **Industrial Independence Architecture (IIA)**, which is a separate architectural specification governing how a sovereign-unit-per-zone industrial monitoring system is shaped. IIA is the principle. Conversational Factory is the conversational implementation of it.

## On-Box vs Off-Box

The product crosses a structural seam that the IIA spec calls out explicitly: the **box** (the sovereign zone appliance) hosts data, capture, and the i3X surface; the **conversational layer** (MCP server, NL→i3X, answer composer) runs **off-box** — on the operator's workstation, a broker box at broader scope, or a small adjunct host on the local network.

Why this seam exists:

1. **AI clients live where humans are.** Claude Desktop on a workstation, ChatGPT in a browser, a local LLM on the GPU host. The gateway has to be near the AI client; the AI client is never on the appliance.
2. **The box has SRP discipline.** Safety, Reliability, Performance govern the cell. LLM-adjacent code (model versions, prompt eval drift, token costs, vendor APIs, response variance) is IT-side complexity that doesn't fit inside a 4 GB / 2-core / 1 TB sovereign appliance with signed-config-only mutation.
3. **No HTTP at the boundary.** MCP's canonical Streamable HTTP transport would violate IIA SL3 if hosted on the box. Stdio MCP doesn't have this problem, but the larger arguments in (1) and (2) still apply.
4. **Independent scaling.** One zone box per cell, but one workstation may serve a whole site. The gateway scales with users; the witness scales with zones.
5. **Audit cleanliness.** Two ledgers — operator-side (gateway: "what was asked") and box-side (witness: "what was retrieved") — correlated by `request_id`. Different retention, different concerns.
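
As a rough illustration of point 5, the two ledgers can be joined on `request_id` after the fact. Only `request_id` as the correlation key comes from the design above; the record fields (`question`, `i3x_call`) and the in-memory lists are assumptions made for this sketch.

```python
from collections import defaultdict


def correlate(gateway_records, witness_records):
    """Attach box-side retrievals to the operator-side question that caused them."""
    retrieved = defaultdict(list)
    for rec in witness_records:
        retrieved[rec["request_id"]].append(rec["i3x_call"])
    return [
        {
            "request_id": rec["request_id"],
            "asked": rec["question"],
            # .get() avoids creating empty entries for unmatched requests
            "retrieved": retrieved.get(rec["request_id"], []),
        }
        for rec in gateway_records
    ]


# Hypothetical ledger contents, for illustration only.
gateway_ledger = [
    {"request_id": "r-1", "question": "why did line 3 lose throughput last shift?"},
    {"request_id": "r-2", "question": "which assets are in this zone?"},
]
witness_ledger = [
    {"request_id": "r-1", "i3x_call": "get_history"},
    {"request_id": "r-1", "i3x_call": "get_findings"},
    {"request_id": "r-2", "i3x_call": "list_assets_in_zone"},
]

trace = correlate(gateway_ledger, witness_ledger)
```

Because the join happens at read time, each ledger can keep its own retention policy; a question whose box-side records have already expired simply comes back with an empty `retrieved` list.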

Concretely:

```
ON-BOX (per IIA zone appliance)               OFF-BOX (workstation / broker / adjunct)
─────────────────────────────────             ────────────────────────────────────────
  services/witness/                              services/conversational-gateway/
    • Continuous capture                           • MCP server (stdio + HTTP)
    • Medallion lake (Iron→Bronze→Silver)          • NL → i3X translator
    • AssetDB (Postgres)                           • Answer composer + citations
    • RTDP mesh                                    • Operator-side audit chain
    • Operator console (HTML)                      • Read-only guardrails

  services/query-plane/                          AI CLIENT (Claude Desktop, browser,
    • i3X v1 over mTLS  ◄═══════════════════════   local LLM, MCP-aware tool)
    • Witness adapter
    • Capability flagging                        speaks → MCP / structured query

  services/dpi/  (planned)                       reaches the box via:
    • marlinspike-dpi                              • i3X over mTLS (preferred)
                                                   • Zenoh queryables (alt profile)

  Inter-box mesh, in-flight bus, attestation     Operator-side audit:
  observers, signed-config applier — all on-box.   gateway-audit.jsonl, syncable up.
```

The architectural seam is **i3X over mTLS**. Everything left of the seam (witness, query plane, DPI) runs on the appliance. Everything right of the seam (conversational gateway, AI client) runs off-box.
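
From the gateway's side, crossing the seam means opening a TLS connection in which both ends present certificates. The sketch below uses Python's standard `ssl` module to show what that could look like for a client; the function names and the file paths in the comment are hypothetical, and nothing here is taken from the actual gateway code.

```python
import ssl


def base_client_context() -> ssl.SSLContext:
    """TLS client context that refuses unauthenticated servers."""
    # PROTOCOL_TLS_CLIENT turns on CERT_REQUIRED and hostname checking by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def load_identity(ctx, ca_file, cert_file, key_file):
    """Trust only the given CA and present our own certificate (the 'mutual' half)."""
    ctx.load_verify_locations(cafile=ca_file)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


ctx = base_client_context()
# Hypothetical paths; a real deployment supplies its own PKI material:
# load_identity(ctx, "site-ca.pem", "gateway.pem", "gateway.key")
```

The box would hold the mirror-image configuration: a server context that requires a client certificate issued by the same CA.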

## The Layered Stack

```
                   CUSTOMER / OPERATOR / AI CLIENT
                              │
                              │  natural language, MCP tools
                              ▼
                   ┌──────────────────────────────┐
                   │  Conversational Gateway      │  OFF-BOX
                   │  • MCP server (stdio+HTTP)   │  Runs on workstation / broker /
                   │  • NL → i3X translator       │  adjunct host
                   │  • Answer composer           │
                   │  • Operator-side audit chain │
                   │  • Read-only guardrails      │
                   └──────────────┬───────────────┘
                                  │  i3X v1 over mTLS
                                  │  (the architectural seam)
                                  ▼
                   ┌──────────────────────────────┐
                   │  i3X Query Plane             │  ON-BOX
                   │  • Object types              │  In-tree today; folds into
                   │  • Relationships             │  witness-rust eventually
                   │  • Current values + history  │
                   │  • Subscription stream       │
                   │  • Multi-source router       │
                   └──────────────┬───────────────┘
                                  │
                                  ▼
        ┌─────────────────────────────────────────────────┐
        │             THE WITNESS (eriswitness)            │
        │  Continuous capture → DPI → medallion lake →     │
        │  asset DB → operator console → mesh / RTDP       │
        │                                                  │
        │  Iron (pcapng)  ─►  Bronze (typed events)        │
        │       │                  │                       │
        │       │                  ▼                       │
        │       │            Silver (conversations,        │
        │       │             asset edges, topology)       │
        │       │                  │                       │
        │       └──────► AssetDB (Postgres, materialized)  │
        │                                                  │
        │  + 26 reference PCAP fixtures, distributed       │
        │    asset replication, forensic hold tiers        │
        └────────────────────┬─────────────────────────────┘
                             │
                             │  packets
                             ▼
                   ┌──────────────────────┐
                   │   SPAN port / TAP /   │
                   │   ARP relay / OVS     │
                   │   (passive only)      │
                   └──────────────────────┘
                             │
                             ▼
                       ACS DOMAIN
                  PLCs, HMIs, drives, switches
```

## Components

### Witness (`services/witness/`)

Symlinked to `~/eriswitness/`. The Python implementation that already exists, runs in customer environments today, and supplies the substrate for everything above it.

**What it provides:**

- **Continuous capture** via tshark ring-buffer on every monitored interface (real NIC, SPAN port, OVS bridge from a MiniNet zone container). 50 MB segments × 10-file ring, zero-gap rotation.
- **34-protocol DPI** covering OT (Modbus, DNP3, IEC 104, IEC 61850 GOOSE/SV/MMS, S7comm, PROFINET, BACnet, EtherNet/IP, OPC UA, HART-IP, FINS, EtherCAT, MRP, PRP, BSAP, CIP, CODESYS, FOX, GE EGD, GE SRTP, IO-Link, KNXnet, MELSEC, OMRON FINS, PCCC, ROC) and IT (DNS, DHCP, HTTP, TLS, SNMP, SSH, FTP, NTP, MQTT, AMQP, CoAP, MDNS, NBNS, RADIUS, RDP, LDAP, Kerberos, SMB, others) protocols.
- **Frame-level integrity (Stovetop)**: runt/oversized frame detection, FCS validation, padding entropy analysis for covert channels, DNP3 CRC validation.
- **Stateful L2 monitoring (Bilgepump)**: ARP spoof detection, VLAN hopping, STP root hijacking, rogue DHCP, identity conflicts, MAC flapping.
- **ICMP threat detection (ICMPeeker)**: redirect detection, covert tunnel entropy analysis, suspicious type flagging.
- **Medallion lake** in DuckLake/Parquet: Iron pcapng → Bronze typed events (55 protocol STRUCT columns just for DPI conversations) → Silver conversations and asset edges → Gold dashboards.
- **Distributed AssetDB** in Postgres: per-collector asset tables, RTDP replication across the mesh, MAC-primary identity, OUI vendor lookup, DNS hostname enrichment, CVE correlation, finding tracking, intervention workflow.
- **Multi-tenant hierarchy**: org → site → zone → subzone, scoped scans, per-org Fernet encryption at rest, audit ledger, RBAC.
- **Forensic hold tiers**: per-asset, per-zone, global. Targeted PCAP retention with zstd + Fernet encryption.
- **Server-rendered HTML operator console** with 47 templates, no SPA, Three.js workspace viewer, Flask + Jinja2.
- **Inline lake pipeline**: the capture path writes Iron and computes Bronze + Silver inline, with no separate batch step.
- **i3X surface** with multi-source router (`_i3x_router.py`): AssetDB source for current values, DuckLake source for history, Historian source for OT signal time-series, Sparkplug source for MQTT/Sparkplug-B publishers.
- **Subscription stream** (`_i3x_subscriptions.py`) for SSE-based change notifications.
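
The padding-entropy check mentioned under Stovetop can be illustrated with Shannon entropy over a frame's pad bytes: constant padding scores zero, while padding that carries data scores noticeably higher. This is a generic sketch of the technique, not the witness's implementation; the 3.0 bits-per-byte threshold and the 18-byte pads are arbitrary assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; 0.0 for empty or constant input."""
    if not data:
        return 0.0
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


def padding_looks_suspicious(padding: bytes, threshold: float = 3.0) -> bool:
    """Flag padding whose byte distribution is too varied to be filler."""
    return shannon_entropy(padding) > threshold


zero_pad = bytes(18)           # 18 zero bytes, as a well-behaved NIC might emit
varied_pad = bytes(range(18))  # 18 distinct byte values, as smuggled data might look
```

A production detector would likely baseline per device rather than use one global threshold, since some stacks legitimately leak stale buffer contents into padding.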

### Conversational Gateway (`services/conversational-gateway/`)

In-tree. The genuinely new work for this product.

**Responsibilities:**

- **MCP server** exposing i3X verbs (`list_objects`, `get_objects`, `get_values`, `get_history`, `subscribe`, `list_assets_in_zone`, `get_topology`, `get_findings`, `get_baseline_deviations`) as MCP tools to AI clients.
- **NL → i3X translation.** A natural-language query like *"why did line 3 lose throughput last shift?"* expands into a sequence of i3X calls: identify line 3's elementId → fetch its components → range-query history for the relevant signals → fetch findings in the time window → compose.
- **Answer composer.** Aggregates VQT history, asset metadata, topology, findings, and baseline deviations into LLM-readable context. Returns grounded answers with citations into the audit chain.
- **Audit binding.** Every conversational query writes to the operator-side audit chain and is correlated by `request_id` with the witness audit ledger, so an operator can ask later *"why did the AI tell you that?"* and trace back through the i3X calls and the underlying lake reads.
- **Read-only guardrails.** Architecturally enforced. The gateway has no path to a device write, no path to AssetDB mutation, no path to baseline modification. Read-only is a property of the call surface, not a prompt directive.
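
One way to picture "read-only as a property of the call surface" is a dispatcher that binds only the read verbs listed above, so a write verb has no handler to reach even if the backend happens to have one. The class names, the fake backend, and the `write_setpoint` method are all hypothetical; this is a sketch of the idea, not the gateway's code.

```python
READ_ONLY_TOOLS = frozenset({
    "list_objects", "get_objects", "get_values", "get_history", "subscribe",
    "list_assets_in_zone", "get_topology", "get_findings", "get_baseline_deviations",
})


class UnknownToolError(LookupError):
    """Raised when a caller names a tool that was never registered."""


class ToolSurface:
    def __init__(self, backend):
        # Bind only allow-listed verbs; nothing else on the backend is reachable.
        self._handlers = {
            name: getattr(backend, name)
            for name in READ_ONLY_TOOLS
            if hasattr(backend, name)
        }

    def call(self, name, **kwargs):
        try:
            handler = self._handlers[name]
        except KeyError:
            raise UnknownToolError(name) from None
        return handler(**kwargs)


class FakeBackend:
    """Stand-in backend that deliberately includes a write method."""

    def get_values(self, element_id):
        return {"element_id": element_id, "value": 42}

    def write_setpoint(self, element_id, value):
        raise AssertionError("must never be reachable through ToolSurface")


surface = ToolSurface(FakeBackend())
```

The refusal does not depend on what the model was told in its prompt: `surface.call("write_setpoint", ...)` fails at lookup, before any backend code runs.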

### i3X Query Plane (`services/query-plane/`)

In-tree, currently the home of the canonical i3X v1 reference implementation in Rust. Consumes the witness's i3X surface and re-exposes it under the v1 contract.

**Responsibilities:**

- Object/relationship type catalog backed by `schemas/i3x/v1/`.
- Address-space resolution (FQDN → element).
- Multi-source dispatch (AssetDB, DuckLake history, polling historian, Sparkplug) — same router pattern as the Python witness.
- `/v1/info` capability flags (`query.history`, `subscribe.stream`, etc.) flip based on which sources are reachable.
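
The capability flipping described in the last bullet can be sketched as a subset test between what each flag needs and which sources currently answer. The flag names come from the text above; which source backs which flag is an assumption made for this example, as are the source identifiers.

```python
# Hypothetical mapping from capability flag to the sources it depends on.
REQUIRES = {
    "query.history": {"ducklake"},
    "subscribe.stream": {"assetdb"},
}


def capability_flags(reachable):
    """Return each flag as True only if every source it needs is reachable."""
    reachable = set(reachable)
    return {flag: needed <= reachable for flag, needed in REQUIRES.items()}


# With only AssetDB up, history is advertised as unavailable.
info = capability_flags({"assetdb"})
```

A client that reads the flags before planning a query can then skip history calls up front instead of discovering the outage through errors.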

## Data Flow

The product is read-only at every architectural seam.

1. **Wire → Iron.** The witness captures every observable frame on the monitored interfaces into Iron pcapng segments. Capture mode (Full / DPI-only / Cleartext) selects fidelity. Capture is passive — no IP stack transmit on the ACS-facing interface.
2. **Iron → Bronze.** Tshark dissection emits structured Bronze events: protocol transactions, asset observations, topology observations, parse anomalies, extracted artifacts. Bronze is ~27× smaller than Iron.
3. **Bronze → Silver.** The witness's silver pipeline correlates conversations, builds asset edges, computes traffic matrices, fingerprints devices, runs baselines. Silver is locally computed at every appliance from its own Bronze.
4. **Silver → AssetDB.** `data_lake.at(now)` materialization. The current truth in Postgres, used by the operator console and the i3X surface for fast point-in-time reads.
5. **AssetDB / Lake → i3X.** The witness's multi-source router and the in-tree query plane expose the data as i3X v1 — namespaces, object types, current values, history, subscription streams.
6. **i3X → Conversational Gateway.** The gateway translates natural-language queries into i3X calls, composes the responses, and serves the conversation through MCP or its own structured query API.
7. **Gateway → Audit chain.** Every query, every i3X call, every composed answer is bound to the audit ledger.
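
Steps 6 and 7 can be pictured together as a fixed call plan in which every i3X call is recorded as it is made. The sketch below follows the expansion described for the throughput question (resolve the line, fetch its components, range-query history, fetch findings). The verb names come from the gateway's tool list; their signatures, the canned return values, and the timestamps are assumptions for illustration.

```python
class RecordingClient:
    """Stand-in i3X client: returns canned data and records each call."""

    def __init__(self):
        self.calls = []

    def _record(self, verb, **params):
        self.calls.append({"verb": verb, **params})

    def list_objects(self, name):
        self._record("list_objects", name=name)
        return [{"elementId": "line-3"}]

    def get_objects(self, parent):
        self._record("get_objects", parent=parent)
        return [{"elementId": "line-3/filler"}, {"elementId": "line-3/capper"}]

    def get_history(self, element_id, start, end):
        self._record("get_history", element_id=element_id, start=start, end=end)
        return []

    def get_findings(self, start, end):
        self._record("get_findings", start=start, end=end)
        return []


def throughput_question(client, line_name, start, end):
    """Expand one question into an ordered sequence of read-only i3X calls."""
    line = client.list_objects(name=line_name)[0]["elementId"]
    components = client.get_objects(parent=line)
    history = {
        c["elementId"]: client.get_history(c["elementId"], start, end)
        for c in components
    }
    findings = client.get_findings(start, end)
    # The call log travels with the context so the answer can cite it.
    return {"line": line, "history": history, "findings": findings,
            "audit": list(client.calls)}


client = RecordingClient()
context = throughput_question(client, "line 3",
                              "2024-01-01T14:00Z", "2024-01-01T22:00Z")
```

Returning the call log alongside the retrieved data is what lets a composed answer point back at the exact reads that support it.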

## Segmentation Model

This is governed by IIA. Conversational Factory inherits the model:

- **One box per zone.** Each zone appliance is sovereign and complete for its scope. The witness, the lake, the asset DB, and the query plane run on every appliance; the conversational gateway runs off-box and reaches each appliance through i3X over mTLS.
- **Box-internal partitioning.** INBOUND (passive collectors, witness, lake, IDS), INTERNAL DMZ (transient message bus, audit chain head publisher), OUTBOUND (edge publisher, structured query API on mTLS, outbound tunnel agent). Default-deny conduits between zones, mTLS at every internal hop.
- **Hierarchy.** Cloud (optional) → Site → Zone → Subzone → Collector. Data flows upward (Bronze + mesh state always; targeted Iron under forensic hold). Commands flow downward through the mesh (Riptide).
- **Inter-zone visibility.** Profile-mediated: Sparkplug B at L1/L2, OPC UA pub/sub, mTLS structured query, Iceberg/Delta batch, depending on the level. The architecture is profile-agnostic; deployments select.
- **Mesh discovery.** ARP knock (Undertow L2) for same-segment peers; ICMP (Riptide L3) for cross-subnet command channel; Historian HTTPS/QUIC for bulk Iron transport. Each fails independently.
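
Default-deny between the three box-internal partitions amounts to an explicit allow-list of directed conduits, with everything else refused. The partition names below come from the list above; the two permitted conduits are an assumption for illustration, since the real set is defined by the IIA specification and the deployment's contract catalog.

```python
from enum import Enum


class Partition(Enum):
    INBOUND = "inbound"
    INTERNAL_DMZ = "internal-dmz"
    OUTBOUND = "outbound"


# Hypothetical allow-list: data moves inbound -> DMZ -> outbound, never back.
ALLOWED_CONDUITS = frozenset({
    (Partition.INBOUND, Partition.INTERNAL_DMZ),
    (Partition.INTERNAL_DMZ, Partition.OUTBOUND),
})


def conduit_allowed(src: Partition, dst: Partition) -> bool:
    """Default-deny: a hop is permitted only if it is explicitly listed."""
    return (src, dst) in ALLOWED_CONDUITS
```

Under this allow-list there is no direct INBOUND to OUTBOUND hop and no reverse path at all, so anything leaving the box has passed through the DMZ.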

## Trust and Security Model

- **SRP inside the cell, CIA outside.** Safety, Reliability, Performance govern the ACS data plane. Confidentiality, Integrity, Availability govern information at the boundary and above.
- **No HTTP at the boundary in either direction.** No HTTP listener on the ACS or IT NIC. No outbound HTTP/HTTPS — no registry pulls, no rule-feed updates, no telemetry, no CRL/OCSP. Updates arrive via signed bundles in OS updates or mTLS-tunneled deltas.
- **Configuration is a signed artifact.** No live mutation API. The management UI is a text generator; the parser is the trust boundary; the applier executes a gated internal call set, then exits. A configuration attestation observer cross-checks running state against the staged artifact and emits divergence events.
- **Read-only first.** No device writes anywhere in the platform. The conversational gateway has no path to a setpoint change. Architectural, not policy.
- **Contract catalog.** Every communication — internal and external — is governed by an explicit data contract. Contractlessness is a deployment defect.
- **Attestation observes prevention.** Network IDS doubles as contract-attestation observer; IO master cross-checks the physical substrate. Findings emit under `ot.attestation.*`.
- **Audit chain.** Append-only, externally verifiable, externally publishable. Every operator action, every conversational query, every baseline deviation, every forensic hold writes to it.
- **Per-org Fernet encryption at rest.** Reports, retained PCAPs, forensic extracts. Nothing hits disk unencrypted.
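
The audit chain's "append-only, externally verifiable" property is commonly achieved with a hash chain, where each entry commits to the hash of the one before it. The source does not say how the chain is built, so the sketch below substitutes a plain SHA-256 chain as an illustration; the entry layout and event names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash, payload):
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain, payload):
    """Append a payload, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"prev": prev_hash, "payload": payload,
             "hash": _digest(prev_hash, payload)}
    chain.append(entry)
    return entry


def verify(chain):
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append_entry(chain, {"event": "conversational_query", "request_id": "r-1"})
append_entry(chain, {"event": "forensic_hold", "scope": "zone"})
```

Publishing only the latest hash to an outside party is enough for that party to detect later rewriting of any earlier entry, which is what makes such a chain verifiable without sharing its contents.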
