Jun 8, 2026

Part 4: An Honest Comparison

Series: What production AI needs beyond an impressive model.

The same task can cost a deterministic pipeline one unit and a general-purpose agent tens of units, or more, if the agent completes it at all.

The previous essays explained how the system acquired its layers: why classification needs context, why LLMs belong in the ambiguous semantic remainder, and why extraction, normalization, stopping rules, PDF handling, state, and compliance need deterministic control.

This essay puts the two approaches side by side.

The figures below are an order-of-magnitude model, not a vendor quote or a universal benchmark. The agent is given favorable assumptions: batching, a capable model, a large context window, and access to browser tools. The comparison is therefore not between a careful pipeline and a deliberately naive agent.

The Cost of the Brain

Keyword scoring, URL-tree analysis, score inheritance, document-name signals, and iterative caches remove clear cases before an LLM is called.

Cost dimension	Deterministic pipeline	General-purpose agent
Model input	Ambiguous remainder after filtering	Broad raw candidate set plus context
First pass	Small and bounded	Grows with website size
Later passes	Reuses path and classification memory	Often pays for context again
Model responsibility	Semantic judgment	Judgment, exploration, and control
Relative cost	1× baseline	Often tens of times higher

The important difference is what determines model cost. In the pipeline, cost follows ambiguity. In the agent, cost tends to follow the amount of material encountered.

A large website may expose thousands of candidate links, but only a small subset should require semantic judgment. Sending everything to the model is not intelligence. It is failure to separate known behavior from unknown meaning.
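To make "cost follows ambiguity" concrete, here is a minimal Python sketch of the filtering step, assuming a simple keyword scorer over URLs. The weights, thresholds, and names are hypothetical. A real system would combine several signals, such as the URL tree, inherited scores, and document names. The point is only that confident cases are settled by rules, and that only the ambiguous list would ever be sent to a model.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical keyword weights; real weights would be tuned and regression-tested.
WEIGHTS = {
    "annual-report": 3.0, "sustainability": 3.0, "investor": 2.0, "esg": 2.0, "pdf": 1.0,
    "careers": -3.0, "cookie": -3.0, "login": -3.0, "privacy": -2.0,
}
ACCEPT_AT = 3.0   # hypothetical: confidently relevant at or above this score
REJECT_AT = -2.0  # hypothetical: confidently irrelevant at or below this score


def keyword_score(url: str) -> float:
    """Sum the weights of every keyword that occurs in the lowercased URL."""
    text = url.lower()
    return sum(w for k, w in WEIGHTS.items() if k in text)


@dataclass
class Triage:
    accepted: List[str] = field(default_factory=list)
    rejected: List[str] = field(default_factory=list)
    ambiguous: List[str] = field(default_factory=list)  # the only model-bound set


def triage(urls: List[str]) -> Triage:
    result = Triage()
    for url in urls:
        score = keyword_score(url)
        if score >= ACCEPT_AT:
            result.accepted.append(url)
        elif score <= REJECT_AT:
            result.rejected.append(url)
        else:
            result.ambiguous.append(url)
    return result


links = [
    "https://example.com/investors/annual-report-2025.pdf",
    "https://example.com/careers",
    "https://example.com/privacy-policy",
    "https://example.com/news/2025/update",
]
result = triage(links)
# accepted: the annual report; rejected: careers and privacy; ambiguous: the news page
```

Model cost in this arrangement scales with the length of `result.ambiguous`, not with the length of `links`.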

The Cost of the Body

Classification is only the brain. The body is orchestration: extraction, deduplication, query normalization, stopping, PDF scheduling, browser fallback, state management, and compliance checks.

Stage	Deterministic pipeline	General-purpose agent
Link extraction	Fixed parsers and normalization	Tool calls plus runtime judgment
Query parameters	Rules prevent duplication and loops	Missing rules can create unbounded exploration
Stopping	Explicit budgets and priorities	“More content exists” keeps the task alive
PDF handling	Specialized download and verification	Each failure creates another reasoning loop
Deferral	Pages first, documents later	Arrival order often becomes priority
Browser fallback	Test, record, and reuse	Repeated trial and error
Compliance	Deterministic pre-request checks	Constraints must be reinterpreted during each run

These actions consume compute in a pipeline, but they do not need fresh model judgment. Their behavior is stable and their marginal cost is predictable.
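A minimal sketch of two such rules, assuming URLs arrive as plain strings. The first is normalization that collapses tracking and ordering variants of the same address. The second is a frontier that enforces a request budget and serves pages before PDFs. The ignored parameter names and the budget are hypothetical, and which parameters are safe to drop varies by site.

```python
import heapq
from typing import List, Optional, Set, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed not to change page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "sort"}


def normalize(url: str) -> str:
    """Canonicalize a URL so trivially different variants collapse to one key."""
    parts = urlsplit(url)
    query = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k.lower() not in IGNORED_PARAMS
    )
    path = parts.path.rstrip("/") or "/"
    # Scheme and host are lowercased; the fragment is dropped entirely.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), ""))


class Frontier:
    """Deduplicating crawl queue with a hard request budget; pages before PDFs."""

    def __init__(self, budget: int) -> None:
        self.budget = budget
        self.seen: Set[str] = set()
        self.heap: List[Tuple[int, int, str]] = []
        self.counter = 0  # insertion order breaks ties deterministically

    def push(self, url: str) -> bool:
        canonical = normalize(url)
        if canonical in self.seen:
            return False  # duplicate or parameter loop: rejected by rule
        self.seen.add(canonical)
        tier = 1 if canonical.lower().endswith(".pdf") else 0  # defer documents
        heapq.heappush(self.heap, (tier, self.counter, canonical))
        self.counter += 1
        return True

    def pop(self) -> Optional[str]:
        if self.budget <= 0 or not self.heap:
            return None  # explicit stop: budget spent or queue empty
        self.budget -= 1
        return heapq.heappop(self.heap)[2]


frontier = Frontier(budget=3)
links = [
    "https://Example.com/report.pdf",
    "https://example.com/about/?utm_source=newsletter",
    "https://example.com/about",
    "https://example.com/news?sort=asc&page=2",
]
accepted = [frontier.push(link) for link in links]  # [True, True, False, True]
order = [frontier.pop() for _ in range(4)]
# order: about, news?page=2, report.pdf, then None
```

Neither rule asks a model anything. The third link is rejected as a duplicate, and the fourth pop returns `None` because the budget is spent.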

An agentic implementation repeatedly turns bodily motion into thought. A failure creates another observation, another context update, another tool call, and another decision. Cost spreads along the failure path.
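The browser-fallback row of the table points to the alternative: test once, record the outcome, and reuse it. A failure is then paid for once per domain rather than once per page. In this sketch the two fetchers are stand-ins for an HTTP client and a headless browser, and the rule "no anchor tags means render it" is a hypothetical heuristic.

```python
from typing import Callable, Dict
from urllib.parse import urlsplit

Fetcher = Callable[[str], str]


def has_links(html: str) -> bool:
    # Hypothetical test: a page with no anchor tags is assumed to need rendering.
    return "<a " in html.lower()


class FallbackMemo:
    """Test each domain once, record whether it needs a browser, then reuse that."""

    def __init__(self, http_fetch: Fetcher, browser_fetch: Fetcher) -> None:
        self.http_fetch = http_fetch
        self.browser_fetch = browser_fetch
        self.mode: Dict[str, str] = {}  # domain -> "http" or "browser"
        self.http_calls = 0
        self.browser_calls = 0

    def fetch(self, url: str) -> str:
        domain = urlsplit(url).netloc.lower()
        if self.mode.get(domain) == "browser":
            return self._render(url)  # reuse the recorded outcome, no new trial
        self.http_calls += 1
        html = self.http_fetch(url)
        if domain in self.mode or has_links(html):
            self.mode.setdefault(domain, "http")
            return html
        self.mode[domain] = "browser"  # first failure on this domain: record it
        return self._render(url)

    def _render(self, url: str) -> str:
        self.browser_calls += 1
        return self.browser_fetch(url)


def fake_http(url: str) -> str:
    # Stand-in for a plain HTTP client: the "spa" site ships an empty shell.
    if "spa.example" in url:
        return "<html><body>Loading...</body></html>"
    return '<a href="/report.pdf">Report</a>'


def fake_browser(url: str) -> str:
    # Stand-in for a headless browser that returns the rendered DOM.
    return '<a href="/report.pdf">Report</a>'


memo = FallbackMemo(fake_http, fake_browser)
for u in ["https://spa.example/a", "https://spa.example/b", "https://static.example/c"]:
    memo.fetch(u)
```

The second page on `spa.example` goes straight to the browser; there is no second plain fetch, no new observation, and no new decision.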

If the pipeline is normalized to 1×, an otherwise capable agent without the specialized engineering layers can plausibly land one or two orders of magnitude higher. The exact multiplier is less important than the mechanism: repeated context, browser interaction, recovery, and rediscovery.
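A toy token model shows why repeated context matters more than any single call. It assumes that the agent re-reads its instructions plus every earlier observation at each step, and that the pipeline judges each ambiguous item once. All numbers are illustrative inputs, not measurements, and real agents may truncate, summarize, or cache context.

```python
def pipeline_tokens(ambiguous_items: int, tokens_per_item: int) -> int:
    # Each ambiguous item is judged once, with a bounded prompt.
    return ambiguous_items * tokens_per_item


def agent_tokens(steps: int, system_tokens: int, tokens_per_observation: int) -> int:
    # At step i the agent re-reads its instructions plus all i observations so far
    # (no truncation, summarization, or prompt caching in this toy model).
    return sum(system_tokens + i * tokens_per_observation for i in range(1, steps + 1))


baseline = pipeline_tokens(ambiguous_items=200, tokens_per_item=500)          # 100,000
agent = agent_tokens(steps=100, system_tokens=2_000, tokens_per_observation=400)  # 2,220,000
ratio = agent / baseline                                                       # 22.2
doubled = agent_tokens(steps=200, system_tokens=2_000, tokens_per_observation=400)  # 8,440,000
```

With these made-up inputs the agent costs about 22× the baseline. Because each step re-reads everything before it, doubling the steps to 200 raises the agent's total roughly 3.8 times rather than 2 times.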

Incompleteness Is More Expensive Than Tokens

The visible bill is not the main risk.

A deterministic pipeline does not merely hope that a model notices a report hidden behind a generic content path. It can propagate evidence from the page that discovered the file, preserve the parent-child relationship, and replay the decision later.
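A minimal sketch of that evidence propagation, assuming a numeric score per link and a hypothetical rule that a child inherits half of its parent's total score. Each decision record keeps its parent, so the whole log can be written out and replayed to the same result. The URLs, factor, and threshold are illustrative.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Optional

INHERIT_FACTOR = 0.5  # hypothetical: a child inherits half of its parent's total score
ACCEPT_AT = 2.0       # hypothetical acceptance threshold


@dataclass(frozen=True)
class Decision:
    url: str
    parent: Optional[str]  # page on which the link was discovered
    own_score: float       # evidence from the URL or link text alone
    inherited_score: float
    accepted: bool


def decide(url: str, parent: Optional[str], own_score: float,
           decisions: Dict[str, Decision]) -> Decision:
    """Score a link using its own evidence plus evidence inherited from its parent."""
    parent_total = 0.0
    if parent is not None and parent in decisions:
        p = decisions[parent]
        parent_total = p.own_score + p.inherited_score
    inherited = INHERIT_FACTOR * parent_total
    d = Decision(url, parent, own_score, inherited, own_score + inherited >= ACCEPT_AT)
    decisions[url] = d  # the parent-child link is kept in the record itself
    return d


decisions: Dict[str, Decision] = {}
decide("https://example.com/investors/reports", None, 3.0, decisions)
pdf = decide("https://example.com/content/files/doc_8841.pdf",
             "https://example.com/investors/reports", 0.5, decisions)
# pdf.accepted is True: 0.5 own + 1.5 inherited reaches the 2.0 threshold.

# Replay: write the log as JSON lines, then rebuild every decision from it.
log = [json.dumps(asdict(d)) for d in decisions.values()]
replayed: Dict[str, Decision] = {}
for line in log:
    rec = json.loads(line)
    decide(rec["url"], rec["parent"], rec["own_score"], replayed)
replay_matches = replayed == decisions
```

Judged on its generic path alone, the PDF would score 0.5 and be rejected. The parent's evidence is what carries it over the threshold, and the log records why.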

A general-purpose agent can often return useful material. But “useful material” and “systematic coverage of everything discoverable” are different products.

In an investment workflow, absence is especially dangerous. A wrong answer can be challenged. A document that was never discovered does not announce that it is missing.

Nondeterminism Compounds

Agents can perform well on short tasks. Long workflows change the mathematics because reliability multiplies across dependent decisions.

Suppose each decision is correct 95% of the time, a generous assumption, and that errors occur independently. A workflow whose result depends on 100 such decisions has:

Probability of a completely correct run = 0.95^100 ≈ 0.6%

This does not mean every run collapses visibly. More often, most decisions are correct and a few drift silently. Because the output remains plausible, partial failure can be harder to detect than total failure.
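The arithmetic is easy to check. This sketch assumes independent errors when computing the all-correct probability. The expected number of wrong decisions needs no independence assumption, and it shows why the typical failed run is mostly right.

```python
def p_all_correct(p: float, n: int) -> float:
    # Assumes each decision fails independently with probability 1 - p.
    return p ** n


def expected_errors(p: float, n: int) -> float:
    # Linearity of expectation: n decisions, each wrong with probability 1 - p.
    return n * (1 - p)


for n in (10, 50, 100):
    print(f"n={n:>3}: all correct {p_all_correct(0.95, n):.1%}, "
          f"expected errors {expected_errors(0.95, n):.1f}")
# n= 10: all correct 59.9%, expected errors 0.5
# n= 50: all correct 7.7%, expected errors 2.5
# n=100: all correct 0.6%, expected errors 5.0
```

At 100 decisions, a perfect run is rare, but the average run still gets about 95 decisions right. That is exactly the kind of output that looks plausible while being incomplete.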

Dimension	Deterministic pipeline	General-purpose agent
Consistency	Same input follows the same path	Runs may drift
Reproducibility	State and logs can be replayed	Identical decisions are not guaranteed
Debugging	Trace a rule, state, or failure point	Interpret a generated trajectory
Completeness	Define, measure, and regression-test it	Often settles for “looks sufficient”
Compliance	Apply the same check every time	Depends on the model remembering the constraint

Parameter loops, silent document failures, and unbounded news exploration are not random anecdotes. They are structural consequences of missing state, budgets, and deterministic boundaries.

Do Not Confuse a Request with a Production Line

“Find the latest sustainability report for this company.”

That is an excellent agent task: the target is clear, the scope is small, and a person can verify the result.

“Systematically cover discoverable disclosures across a large company universe, with auditability and reproducibility.”

That is not a request. It is a production line.

The difference is not whether the model is intelligent. It is whether the task requires experience to be converted into behavior that executes every time.

The Useful Composition

The right architecture is not agent or pipeline. It is agent as interface, pipeline as engine.

User: Run disclosure collection for this company.
Agent: Interpret the intent and prepare validated parameters.
Pipeline: Execute classification, collection, retries, state, and compliance.
Agent: Summarize results and surface failures for human judgment.

The agent understands intent and explains results. The pipeline performs the behavior that should not be reinvented on every run.
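One way to hold that boundary is to let the pipeline accept only a typed, validated parameter record, never the agent's free text. The field names, document-type vocabulary, and request limit below are hypothetical. The pattern is that a malformed or out-of-scope request fails before any crawling starts.

```python
import json
from dataclasses import dataclass
from typing import Tuple

# Hypothetical vocabulary and limits; a real pipeline would define its own.
ALLOWED_DOC_TYPES = {"annual_report", "sustainability_report", "proxy_statement"}
MAX_REQUESTS_LIMIT = 5_000


@dataclass(frozen=True)
class RunParams:
    company_domain: str
    doc_types: Tuple[str, ...]
    max_requests: int


def validate(agent_output: str) -> RunParams:
    """Accept the agent's JSON only if it maps onto parameters the pipeline supports."""
    raw = json.loads(agent_output)
    if not isinstance(raw, dict):
        raise ValueError("expected a JSON object")
    domain = str(raw.get("company_domain", "")).strip().lower()
    if not domain or "/" in domain or " " in domain:
        raise ValueError(f"not a bare domain: {domain!r}")
    doc_types = tuple(sorted(set(raw.get("doc_types", []))))
    unknown = set(doc_types) - ALLOWED_DOC_TYPES
    if not doc_types or unknown:
        raise ValueError(f"unsupported document types: {sorted(unknown)}")
    max_requests = int(raw.get("max_requests", 500))
    if not 1 <= max_requests <= MAX_REQUESTS_LIMIT:
        raise ValueError(f"max_requests {max_requests} outside 1..{MAX_REQUESTS_LIMIT}")
    return RunParams(domain, doc_types, max_requests)


params = validate('{"company_domain": " Example.COM ", '
                  '"doc_types": ["sustainability_report"], "max_requests": 800}')
try:
    validate('{"company_domain": "example.com", "doc_types": ["everything"]}')
    rejected = False
except ValueError:
    rejected = True
```

The agent remains free to interpret loosely worded intent, while the engine only ever receives a `RunParams` value it already knows how to execute.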

“But Agents Can Write the Code”

They can, and this is one of their most valuable roles.

AI coding tools can accelerate implementation, testing, and refactoring of deterministic systems. But writing code quickly is not the same as knowing which code needs to exist.

The expensive discoveries emerge during real operation: links that appear only after rendering, nonstandard elements that contain downloads, parameters that form loops, and documents that open visually but never reach storage. Once the failure is understood and the boundary is described, AI can help implement the fix.
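"Documents that open visually but never reach storage" is a good example of a failure that, once understood, becomes a small deterministic check. This sketch assumes the downloaded bytes are available before they are written to storage. It tests only for the `%PDF-` header, a `%%EOF` marker near the end, and a hypothetical minimum size, so it is a cheap structural check rather than full PDF parsing.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredDocument:
    url: str
    sha256: str
    size: int


def verify_pdf(url: str, body: bytes, min_size: int = 1024) -> StoredDocument:
    """Accept bytes only if they look like a complete PDF; otherwise fail loudly."""
    if not body.startswith(b"%PDF-"):
        # Typical cause: an HTML viewer page or error page saved in place of the file.
        raise ValueError(f"{url}: missing %PDF- header")
    if b"%%EOF" not in body[-1024:]:
        # Typical cause: a truncated or interrupted download.
        raise ValueError(f"{url}: no %%EOF marker near the end")
    if len(body) < min_size:  # hypothetical floor for a real disclosure document
        raise ValueError(f"{url}: only {len(body)} bytes")
    return StoredDocument(url, hashlib.sha256(body).hexdigest(), len(body))


good = b"%PDF-1.7\n" + b"0" * 2000 + b"\n%%EOF\n"
doc = verify_pdf("https://example.com/report.pdf", good)  # doc.size == 2016
try:
    verify_pdf("https://example.com/viewer?id=42", b"<html><body>Viewer</body></html>")
    rejected = False
except ValueError:
    rejected = True
```

The hash in `StoredDocument` also gives later runs a stable identity for deduplication and audit.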

Code generation is becoming abundant. Identifying hidden requirements, choosing system boundaries, and deciding what must execute deterministically remain engineering judgment.

Where Each Approach Fits

Agents fit:

One-off tasks, where building a dedicated system would cost more than several model calls.
Exploration, before the shape of the problem is understood.
Code generation, to build and maintain deterministic systems.
Low-risk, high-variety work, where a person can review the result.

Deterministic pipelines fit:

High-throughput processing, where large volumes need consistent treatment.
High reliability, where small error probabilities compound.
Stateful multi-step workflows, where facts and decisions persist across stages.
Zero-tolerance constraints, including permissions, compliance, and budgets.
Frequent micro-decisions, such as batching, timeouts, retries, and rate limits.

This may not be the permanent frontier of model capability. It is the engineering boundary that reliable systems need to respect today.

Next: Part 5A — What the Research Says: The Data.