New.Knife.Day
HomeCategoriesSteel ComparisonsSteels

Which LLM Finds Obscure Knife‑Brand URLs Cheapest?

A Mini‑Benchmark of 8 Web‑Enabled Models comparing accuracy and cost for finding knife manufacturer websites.

Which LLM Finds Obscure Knife‑Brand URLs Cheapest?

A Mini‑Benchmark of 8 Web‑Enabled Models (May 31 2025)

Every week a new LLM promises sharper reasoning, broader knowledge, or lower prices. Hype is easy; trustworthy data is harder—especially when your use‑case is as quirky as mine.

I run Knife.Day, a community where collectors catalogue every cutlery brand from Al Mar to Zladotek. To enrich those pages I need a scriptable way to fetch each maker’s official website, including tiny outfits like Actilam or Aiorosu Knives that barely register on Google. Perfect task for a quick LLM showdown:

Goal: Given a brand name, return { brand, official_url, confidence }.
Metric: accuracy per brand vs total cost.
Dataset: ten obscure knife brands.


1 · Experimental Setup

ItemDetails
BrandsABKT · 5ive Star Gear · 5.11 Tactical · Aclim8 · Acta Non Verba Knives · Actilam · AGA Campolin · Aiorosu Knives · AKC · Al Mar Knives
Models (via OpenRouter)Perplexity sonar‑deep‑research · OpenAI gpt‑4o & gpt‑4o‑mini · Anthropic claude‑sonnet‑4 · Google gemini‑2.5‑pro & gemini‑2.0‑flash · Meta llama‑3.1‑70b · Alibaba qwen‑2.5‑72b
Prompt (one‑liner)System: "Return ONLY JSON with keys brand, official_url, confidence."
ScoringExact official domain = ✅ · “No official site” (with justification) = ✅
CostsOpenRouter prices on 31 May 2025 (Perplexity billed separately)
Code + logshttps://github.com/pvijeh/find-knife-brands

2 · Results

2.1 Leaderboard (sorted by cost per correct URL)

RankModelCorrect/10Total USDUSD/CorrectTokens
1Gemini 2.0 Flash70.0010.00014 k
2GPT‑4o‑Mini90.190.0224.7 k
2Llama‑3.1‑70B90.190.0225.1 k
4Qwen 2.5‑72B80.190.0227.9 k
5GPT‑4o90.260.0325.6 k
6Claude Sonnet‑490.320.0431.8 k
7Gemini 2.5 Pro¹50.310.0636.8 k
8Perplexity Sonar109.420.94860 k

¹ Gemini 2.5 Pro produced HTML tables in five cases; my JSON parser rejected them.


2.2 Interpretation

  • Perplexity is flawless but costs $9.42 for ten queries—mostly due to an 860 k‑token footprint.
  • GPT‑4o‑Mini & Llama‑3.1‑70B reach 90 % accuracy at ~$0.02 per hit—best bang‑for‑buck.
  • Gemini Flash lands 70 % at one‑tenth of a cent; with a manual QA pass it’s unbeatable on price.
  • Structured output matters. Gemini 2.5 Pro’s HTML responses were unusable—well‑formed JSON is part of model quality.
  • Edge cases: only Perplexity explicitly declared “no official site” for Aiorosu Knives; GPT‑4o‑Mini offered a reseller link (helpful, but scored wrong).

3 · Key Take‑Aways

  1. Define “good enough.” 90 % accuracy + quick human review beats 100 % accuracy at 45× the price.
  2. Validate on ingestion. Malformed JSON breaks pipelines—enforce a schema check.
  3. Watch prices. Promo rates shift; query the price API before batching thousands of calls.

4 · What’s Next?

I’m wiring GPT‑4o‑Mini into Knife.Day so collectors see verified manufacturer links on every brand page. Re‑crawling ≈ 250 brands now costs under $5.

If you collect knives—or just enjoy oddball benchmarks—join the Knife.Day beta and tell me which dataset you’d automate next.


Disclaimer: model versions, accuracy, and prices are a snapshot from 31 May 2025. Future mileage—and billing—will vary.

Resources
  • Knife Steel Comparisons
  • Knife Steels Database

New.Knife.Day

© 2025 New Knife Day. All rights reserved.