Engine Bench · June 2026 · 8 models

Fingerprint Diagnostics


Can general-purpose Large Language Models (LLMs) replace specialized, purpose-built engines for device identification? To find out, we benchmarked Fingerbank's production fingerprinting engine against leading general-purpose LLMs on the same input telemetry. Each engine was scored on speed, cost, self-reported confidence, and manufacturer accuracy.


§ 01

Cost vs latency

Average cost against average latency per call. Up-and-right is the sweet spot — cheaper and faster.
[Figure: Efficiency frontier (log-log). Average cost × average latency.]
[Figure: Cost per call. Min-to-max range across scenarios, with average marked.]

§ 02

Confidence vs accuracy

Detection correctness by test category on the left; manufacturer match rate against Fingerbank’s OUI lookup on the right.
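As a rough illustration of the right-hand metric, the sketch below computes a per-engine manufacturer match rate against an OUI table. It is not Fingerbank's implementation: the `OUI_TABLE` contents, the `(engine, mac, reported_manufacturer)` record layout, and the case-insensitive exact-match comparison are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical OUI table: first three MAC octets -> registered manufacturer.
# The prefixes and names are made up for illustration only.
OUI_TABLE = {
    "00:1A:2B": "Acme Networks",
    "3C:5A:B4": "Example Devices",
}

def oui_prefix(mac: str) -> str:
    """Return the first three octets of a MAC, upper-cased and colon-separated."""
    hex_digits = "".join(ch for ch in mac if ch.isalnum()).upper()
    return ":".join(hex_digits[i:i + 2] for i in range(0, 6, 2))

def manufacturer_match_rate(results):
    """results: iterable of (engine, mac, reported_manufacturer) tuples.

    Returns {engine: share of cases where the reported manufacturer equals
    the registered one}. Cases whose prefix is not in the table are skipped.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for engine, mac, reported in results:
        registered = OUI_TABLE.get(oui_prefix(mac))
        if registered is None:
            continue  # nothing to compare against
        totals[engine] += 1
        # Assumption: a match is a case-insensitive exact string comparison.
        if reported.strip().casefold() == registered.casefold():
            hits[engine] += 1
    return {engine: hits[engine] / totals[engine] for engine in totals}
```

A real comparison would likely need looser matching (for example, treating "Acme" and "Acme Networks, Inc." as the same vendor); exact matching is used here only to keep the sketch short.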
[Figure: Device-name accuracy by test category. Accuracy over device category, scale 0% to 100%.]
[Figure: Manufacturer match rate. How often each engine matched the officially registered manufacturer.]

§ 03

Signal combinations

Co-occurrence heatmap showing how detection correctness varies with which signals appear together in the payload.
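One way such a heatmap could be tabulated is sketched below: for every pair of signals that appear together in a case, count how often the detection was correct. The `(signals, correct)` record layout and the signal names in the usage note are assumptions for the example, not the benchmark's actual schema.

```python
from collections import defaultdict
from itertools import combinations

def pair_accuracy(cases):
    """cases: iterable of (signals, correct) pairs.

    `signals` is a collection of signal names present in the payload and
    `correct` is True when the engine's detection was right.
    Returns {(signal_a, signal_b): accuracy}, with each pair sorted by name.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for signals, correct in cases:
        # Every unordered pair of co-occurring signals is one heatmap cell.
        for pair in combinations(sorted(set(signals)), 2):
            totals[pair] += 1
            hits[pair] += int(correct)
    return {pair: hits[pair] / totals[pair] for pair in totals}
```

For instance, with hypothetical signal names, `pair_accuracy([({"dhcp", "mac"}, True), ({"dhcp", "mac", "ua"}, False)])` yields 0.5 for the `("dhcp", "mac")` cell and 0.0 for the two cells involving `"ua"`.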
[Figure: Correctness heatmap, signal × signal. Accuracy over signal combinations, scale 0% to 100%.]

§ 04

Failure rate

How often each engine returned a device name containing “error” or “unknown” — i.e. failed to commit to an identification. Lower is better.
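The failure criterion above can be sketched in a few lines. The substring test follows the definition in the text; matching case-insensitively and returning 0.0 for an empty input are assumptions of this sketch.

```python
def is_failure(device_name: str) -> bool:
    """True when the name contains "error" or "unknown" (case-insensitive, assumed)."""
    lowered = device_name.casefold()
    return "error" in lowered or "unknown" in lowered

def failure_rate(device_names) -> float:
    """Share of returned device names that count as failures."""
    names = list(device_names)
    if not names:
        return 0.0  # assumption: no cases means no failures
    return sum(is_failure(name) for name in names) / len(names)
```

With the made-up names `["Apple iPhone", "Unknown device", "ERROR: timeout", "HP printer"]`, two of four count as failures, so the rate is 0.5. A plain substring test would also flag a legitimate device name that happens to contain either word.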
[Figure: Failure rate. Share of cases with no confident device name.]

§ 05

The verdicts, case by case

Cases are grouped by test ID. Each case lists its description, MAC, and category, along with each engine's stated rationale.