Part 4: An Honest Comparison

Series: What production AI needs beyond an impressive model.

Previous: Part 3B.
阅读中文版。

The same task can cost a deterministic pipeline one unit and a general-purpose agent tens of units, or more, if the agent completes it at all.

The previous essays explained how the system acquired its layers: why classification needs context, why LLMs belong in the ambiguous semantic remainder, and why extraction, normalization, stopping rules, PDF handling, state, and compliance need deterministic control.

This essay puts the two approaches side by side.

The figures below are an order-of-magnitude model, not a vendor quote or a universal benchmark. The agent is given favorable assumptions: batching, a capable model, a large context window, and access to browser tools. The comparison is therefore not between a careful pipeline and a deliberately naive agent.

The Cost of the Brain

Keyword scoring, URL-tree analysis, score inheritance, document-name signals, and iterative caches remove clear cases before an LLM is called.

Cost dimensionDeterministic pipelineGeneral-purpose agent
Model inputAmbiguous remainder after filteringBroad raw candidate set plus context
First passSmall and boundedGrows with website size
Later passesReuses path and classification memoryOften pays for context again
Model responsibilitySemantic judgmentJudgment, exploration, and control
Relative cost1× baselineOften tens of times higher

The important difference is what determines model cost. In the pipeline, cost follows ambiguity. In the agent, cost tends to follow the amount of material encountered.

A large website may expose thousands of candidate links, but only a small subset should require semantic judgment. Sending everything to the model is not intelligence. It is failure to separate known behavior from unknown meaning.

The Cost of the Body

Classification is only the brain. The body is orchestration: extraction, deduplication, query normalization, stopping, PDF scheduling, browser fallback, state management, and compliance checks.

StageDeterministic pipelineGeneral-purpose agent
Link extractionFixed parsers and normalizationTool calls plus runtime judgment
Query parametersRules prevent duplication and loopsMissing rules can create unbounded exploration
StoppingExplicit budgets and priorities“More content exists” keeps the task alive
PDF handlingSpecialized download and verificationEach failure creates another reasoning loop
DeferralPages first, documents laterArrival order often becomes priority
Browser fallbackTest, record, and reuseRepeated trial and error
ComplianceDeterministic pre-request checksConstraints must be reinterpreted during each run

These actions consume compute in a pipeline, but they do not need fresh model judgment. Their behavior is stable and their marginal cost is predictable.

An agentic implementation repeatedly turns bodily motion into thought. A failure creates another observation, another context update, another tool call, and another decision. Cost spreads along the failure path.

If the pipeline is normalized to , an otherwise capable agent without the specialized engineering layers can plausibly land one or two orders of magnitude higher. The exact multiplier is less important than the mechanism: repeated context, browser interaction, recovery, and rediscovery.

Incompleteness Is More Expensive Than Tokens

The visible bill is not the main risk.

A deterministic pipeline does not merely hope that a model notices a report hidden behind a generic content path. It can propagate evidence from the page that discovered the file, preserve the parent-child relationship, and replay the decision later.

A general-purpose agent can often return useful material. But “useful material” and “systematic coverage of everything discoverable” are different products.

In an investment workflow, absence is especially dangerous. A wrong answer can be challenged. A document that was never discovered does not announce that it is missing.

Nondeterminism Compounds

Agents can perform well on short tasks. Long workflows change the mathematics because reliability multiplies across dependent decisions.

Suppose each decision is correct 95% of the time, a generous assumption. A workflow with 100 dependent decisions has:

Probability of a completely correct run = 0.95^100 ≈ 0.6%

This does not mean every run collapses visibly. More often, most decisions are correct and a few drift silently. Because the output remains plausible, partial failure can be harder to detect than total failure.

DimensionDeterministic pipelineGeneral-purpose agent
ConsistencySame input follows the same pathRuns may drift
ReproducibilityState and logs can be replayedIdentical decisions are not guaranteed
DebuggingTrace a rule, state, or failure pointInterpret a generated trajectory
CompletenessDefine, measure, and regression-test itOften settles for “looks sufficient”
ComplianceApply the same check every timeDepends on the model remembering the constraint

Parameter loops, silent document failures, and unbounded news exploration are not random anecdotes. They are structural consequences of missing state, budgets, and deterministic boundaries.

Do Not Confuse a Request with a Production Line

“Find the latest sustainability report for this company.”

That is an excellent agent task: the target is clear, the scope is small, and a person can verify the result.

“Systematically cover discoverable disclosures across a large company universe, with auditability and reproducibility.”

That is not a request. It is a production line.

The difference is not whether the model is intelligent. It is whether the task requires experience to be converted into behavior that executes every time.

The Useful Composition

The right architecture is not agent or pipeline. It is agent as interface, pipeline as engine.

User: Run disclosure collection for this company.
Agent: Interpret the intent and prepare validated parameters.
Pipeline: Execute classification, collection, retries, state, and compliance.
Agent: Summarize results and surface failures for human judgment.

The agent understands intent and explains results. The pipeline performs the behavior that should not be reinvented on every run.

“But Agents Can Write the Code”

They can, and this is one of their most valuable roles.

AI coding tools can accelerate implementation, testing, and refactoring of deterministic systems. But writing code quickly is not the same as knowing which code needs to exist.

The expensive discoveries emerge during real operation: links that appear only after rendering, nonstandard elements that contain downloads, parameters that form loops, and documents that open visually but never reach storage. Once the failure is understood and the boundary is described, AI can help implement the fix.

Code generation is becoming abundant. Identifying hidden requirements, choosing system boundaries, and deciding what must execute deterministically remain engineering judgment.

Where Each Approach Fits

Agents fit:

  • One-off tasks, where building a dedicated system would cost more than several model calls.
  • Exploration, before the shape of the problem is understood.
  • Code generation, to build and maintain deterministic systems.
  • Low-risk, high-variety work, where a person can review the result.

Deterministic pipelines fit:

  • High-throughput processing, where large volumes need consistent treatment.
  • High reliability, where small error probabilities compound.
  • Stateful multi-step workflows, where facts and decisions persist across stages.
  • Zero-tolerance constraints, including permissions, compliance, and budgets.
  • Frequent micro-decisions, such as batching, timeouts, retries, and rate limits.

This may not be the permanent frontier of model capability. It is the engineering boundary that reliable systems need to respect today.

Next: Part 5A — What the Research Says: The Data.