Which LLM Finds Obscure Knife‑Brand URLs Cheapest?

A Mini‑Benchmark of 8 Web‑Enabled Models comparing accuracy and cost for finding knife manufacturer websites.

Which LLM Finds Obscure Knife‑Brand URLs Cheapest?

A Mini‑Benchmark of 8 Web‑Enabled Models (May 31 2025)

Which web-enabled LLM can actually find the official website for a tiny, half-forgotten knife brand—and do it without costing a fortune?

I’m building new.knife.day, a database where collectors log every maker from Al Mar to Zladotek. To automate the “official URL” field I tested eight models—GPT-4o, Gemini, Llama 3 and friends—on ten truly obscure brands (Actilam, Aiorosu, etc.). The brief was simple: return { brand, official_url, confidence }. The scorecard: accuracy and dollars spent per correct hit.

Goal: Given a brand name, return { brand, official_url, confidence }.
Metric: accuracy per brand vs total cost.
Dataset: ten obscure knife brands.

1 · Experimental Setup

Item	Details
Brands	ABKT · 5ive Star Gear · 5.11 Tactical · Aclim8 · Acta Non Verba Knives · Actilam · AGA Campolin · Aiorosu Knives · AKC · Al Mar Knives
Models (via OpenRouter)	Perplexity sonar‑deep‑research · OpenAI gpt‑4o & gpt‑4o‑mini · Anthropic claude‑sonnet‑4 · Google gemini‑2.5‑pro & gemini‑2.0‑flash · Meta llama‑3.1‑70b · Alibaba qwen‑2.5‑72b
Prompt (one‑liner)	`System: "Return ONLY JSON with keys brand, official_url, confidence."`
Scoring	Exact official domain = ✅ · “No official site” (with justification) = ✅
Costs	OpenRouter prices on 31 May 2025 (Perplexity billed separately)
Code + logs	https://github.com/pvijeh/find-knife-brands

2 · Results

2.1 Leaderboard (sorted by cost per correct URL)

Rank	Model	Correct/10	Total USD	USD/Correct	Tokens
1	Gemini 2.0 Flash	7	0.001	0.0001	4 k
2	GPT‑4o‑Mini	9	0.19	0.02	24.7 k
2	Llama‑3.1‑70B	9	0.19	0.02	25.1 k
4	Qwen 2.5‑72B	8	0.19	0.02	27.9 k
5	GPT‑4o	9	0.26	0.03	25.6 k
6	Claude Sonnet‑4	9	0.32	0.04	31.8 k
7	Gemini 2.5 Pro¹	5	0.31	0.06	36.8 k
8	Perplexity Sonar	10	9.42	0.94	860 k

_{¹ Gemini 2.5 Pro produced HTML tables in five cases; my JSON parser rejected them.}

2.2 Interpretation

Perplexity is flawless but costs $9.42 for ten queries—mostly due to an 860 k‑token footprint.
GPT‑4o‑Mini & Llama‑3.1‑70B reach 90 % accuracy at ~$0.02 per hit—best bang‑for‑buck.
Gemini Flash lands 70 % at one‑tenth of a cent; with a manual QA pass it’s unbeatable on price.
Structured output matters. Gemini 2.5 Pro’s HTML responses were unusable—well‑formed JSON is part of model quality.
Edge cases: only Perplexity explicitly declared “no official site” for Aiorosu Knives; GPT‑4o‑Mini offered a reseller link (helpful, but scored wrong).

3 · Key Take‑Aways

Define “good enough.” 90 % accuracy + quick human review beats 100 % accuracy at 45× the price.
Validate on ingestion. Malformed JSON breaks pipelines—enforce a schema check.
Watch prices. Promo rates shift; query the price API before batching thousands of calls.

4 · What’s Next?

I’m wiring GPT‑4o‑Mini into Knife.Day so collectors see verified manufacturer links on every brand page. Re‑crawling ≈ 250 brands now costs under $5.

If you collect knives—or just enjoy oddball benchmarks—join the Knife.Day beta and tell me which dataset you’d automate next.

Disclaimer: model versions, accuracy, and prices are a snapshot from 31 May 2025. Future mileage—and billing—will vary.

New.Knife.Day

Which LLM Finds Obscure Knife‑Brand URLs Cheapest?

Which LLM Finds Obscure Knife‑Brand URLs Cheapest?

A Mini‑Benchmark of 8 Web‑Enabled Models (May 31 2025)

1 · Experimental Setup

2 · Results

2.1 Leaderboard (sorted by cost per correct URL)

2.2 Interpretation

3 · Key Take‑Aways

4 · What’s Next?

Resources

About

Categories

Which LLM Finds Obscure Knife‑Brand URLs Cheapest?

Which LLM Finds Obscure Knife‑Brand URLs Cheapest?

A Mini‑Benchmark of 8 Web‑Enabled Models (May 31 2025)

1 · Experimental Setup

2 · Results

2.1 Leaderboard (sorted by cost per correct URL)

2.2 Interpretation

3 · Key Take‑Aways

4 · What’s Next?

Resources

About

Categories

A Mini‑Benchmark of 8 Web‑Enabled Models (May 31 2025)