A Mini‑Benchmark of 8 Web‑Enabled Models: comparing accuracy and cost for finding knife manufacturer websites.
Every week a new LLM promises sharper reasoning, broader knowledge, or lower prices. Hype is easy; trustworthy data is harder—especially when your use‑case is as quirky as mine.
I run Knife.Day, a community where collectors catalogue every cutlery brand from Al Mar to Zladotek. To enrich those pages I need a scriptable way to fetch each maker’s official website, including tiny outfits like Actilam or Aiorosu Knives that barely register on Google. Perfect task for a quick LLM showdown:
Goal: Given a brand name, return { brand, official_url, confidence }.
Metric: brands answered correctly (out of ten) vs. total cost in USD.
Dataset: ten obscure knife brands.
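A minimal sketch of how a reply in that three-key shape could be checked before scoring. The function name `parse_reply` and the example URL are hypothetical, and the type of `confidence` is an assumption, since the post does not say how it is expressed.

```python
import json

# The three keys the goal asks every model to return.
REQUIRED_KEYS = {"brand", "official_url", "confidence"}


def parse_reply(raw: str) -> dict:
    """Parse a model reply and check that it carries the expected keys.

    Raises ValueError for anything that is not a JSON object containing
    at least brand, official_url, and confidence.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return data


# Placeholder values for illustration only, not a real benchmark answer.
reply = '{"brand": "Al Mar Knives", "official_url": "https://example.com", "confidence": 0.9}'
print(parse_reply(reply)["official_url"])
```

A strict check like this rejects any non-JSON reply outright, which is the simplest policy but also the least forgiving one.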
Item | Details |
---|---|
Brands | ABKT · 5ive Star Gear · 5.11 Tactical · Aclim8 · Acta Non Verba Knives · Actilam · AGA Campolin · Aiorosu Knives · AKC · Al Mar Knives |
Models (via OpenRouter) | Perplexity sonar‑deep‑research · OpenAI gpt‑4o & gpt‑4o‑mini · Anthropic claude‑sonnet‑4 · Google gemini‑2.5‑pro & gemini‑2.0‑flash · Meta llama‑3.1‑70b · Alibaba qwen‑2.5‑72b |
Prompt (one‑liner) | System: "Return ONLY JSON with keys brand, official_url, confidence." |
Scoring | Exact official domain = ✅ · “No official site” (with justification) = ✅ |
Costs | OpenRouter prices on 31 May 2025 (Perplexity billed separately) |
Code + logs | https://github.com/pvijeh/find-knife-brands |
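The "exact official domain" rule could be approximated along these lines. This is a sketch under assumptions, not the scoring code from the repo: the helper names are hypothetical, "exact" is taken to mean the same host after lowercasing and dropping a leading `www.`, and a `None` URL stands in for "no official site". Whether a "no site" answer is properly justified still needs a human check.

```python
from typing import Optional
from urllib.parse import urlparse


def normalize_domain(url: str) -> str:
    """Reduce a URL to its lowercase host without a leading 'www.'."""
    # urlparse only fills in the host when a scheme or '//' is present.
    if "//" not in url:
        url = "//" + url
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_correct(predicted: Optional[str], expected: Optional[str]) -> bool:
    """Compare a predicted URL with the known official one.

    Assumption: None on both sides means "no official site" was the
    right answer and the model said so.
    """
    if predicted is None or expected is None:
        return predicted is None and expected is None
    return normalize_domain(predicted) == normalize_domain(expected)


# Placeholder domains, not real benchmark data.
print(is_correct("http://www.example.com/about", "https://example.com"))
```

Comparing hosts rather than full URLs keeps a correct answer from failing over a trailing slash, a path, or an http/https difference.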
Rank (by USD/Correct) | Model | Correct/10 | Total USD | USD/Correct | Tokens |
---|---|---|---|---|---|
1 | Gemini 2.0 Flash | 7 | 0.001 | 0.0001 | 4 k |
2 | GPT‑4o‑Mini | 9 | 0.19 | 0.02 | 24.7 k |
2 | Llama‑3.1‑70B | 9 | 0.19 | 0.02 | 25.1 k |
4 | Qwen 2.5‑72B | 8 | 0.19 | 0.02 | 27.9 k |
5 | GPT‑4o | 9 | 0.26 | 0.03 | 25.6 k |
6 | Claude Sonnet‑4 | 9 | 0.32 | 0.04 | 31.8 k |
7 | Gemini 2.5 Pro¹ | 5 | 0.31 | 0.06 | 36.8 k |
8 | Perplexity Sonar | 10 | 9.42 | 0.94 | 860 k |
¹ Gemini 2.5 Pro produced HTML tables in five cases; my JSON parser rejected them.
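The USD/Correct column is simply total cost divided by correct answers. The sketch below recomputes it from the figures in the table and sorts by the unrounded value, which is why Qwen lands behind GPT‑4o‑Mini and Llama despite the same rounded 0.02. The variable and function names are illustrative.

```python
# (correct answers out of 10, total USD), copied from the results table.
results = {
    "Gemini 2.0 Flash": (7, 0.001),
    "GPT-4o-Mini": (9, 0.19),
    "Llama-3.1-70B": (9, 0.19),
    "Qwen 2.5-72B": (8, 0.19),
    "GPT-4o": (9, 0.26),
    "Claude Sonnet-4": (9, 0.32),
    "Gemini 2.5 Pro": (5, 0.31),
    "Perplexity Sonar": (10, 9.42),
}


def usd_per_correct(correct: int, total_usd: float) -> float:
    """Cost of one correct answer: total spend divided by correct count."""
    return total_usd / correct


# sorted() is stable, so models with identical cost keep their table order.
ranking = sorted(results, key=lambda model: usd_per_correct(*results[model]))

for model in ranking:
    correct, total = results[model]
    print(f"{model:<18} {correct:>2}/10  ${usd_per_correct(correct, total):.4f}")
```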
I’m wiring GPT‑4o‑Mini into Knife.Day so collectors see verified manufacturer links on every brand page. Re‑crawling ≈ 250 brands now costs under $5.
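One way to sanity-check the under-$5 figure is to assume cost scales linearly from the ten-brand run. That is an assumption: real token usage per brand will vary, so this is a rough estimate.

```python
# GPT-4o-Mini figures from the results table: $0.19 for 10 brands.
benchmark_cost_usd = 0.19
benchmark_brands = 10
catalogue_brands = 250  # approximate catalogue size quoted in the post

cost_per_brand = benchmark_cost_usd / benchmark_brands
projected_cost = cost_per_brand * catalogue_brands

print(f"about ${projected_cost:.2f} for {catalogue_brands} brands")
```

At roughly $0.019 per brand, the projection comes to about $4.75, consistent with the claim.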
If you collect knives—or just enjoy oddball benchmarks—join the Knife.Day beta and tell me which dataset you’d automate next.
Disclaimer: model versions, accuracy, and prices are a snapshot from 31 May 2025. Future mileage—and billing—will vary.
© 2025 New Knife Day. All rights reserved.