New.Knife.Day
HomeCategoriesBrandsSteel ComparisonsSteels

Which LLM Finds Obscure Knife‑Brand URLs Cheapest?

A Mini‑Benchmark of 8 Web‑Enabled Models comparing accuracy and cost for finding knife manufacturer websites.

Which LLM Finds Obscure Knife‑Brand URLs Cheapest?

A Mini‑Benchmark of 8 Web‑Enabled Models (May 31 2025)

Which web-enabled LLM can actually find the official website for a tiny, half-forgotten knife brand—and do it without costing a fortune?

I’m building new.knife.day, a database where collectors log every maker from Al Mar to Zladotek. To automate the “official URL” field I tested eight models—GPT-4o, Gemini, Llama 3 and friends—on ten truly obscure brands (Actilam, Aiorosu, etc.). The brief was simple: return { brand, official_url, confidence }. The scorecard: accuracy and dollars spent per correct hit.

Goal: Given a brand name, return { brand, official_url, confidence }.
Metric: accuracy per brand vs total cost.
Dataset: ten obscure knife brands.


1 · Experimental Setup

ItemDetails
BrandsABKT · 5ive Star Gear · 5.11 Tactical · Aclim8 · Acta Non Verba Knives · Actilam · AGA Campolin · Aiorosu Knives · AKC · Al Mar Knives
Models (via OpenRouter)Perplexity sonar‑deep‑research · OpenAI gpt‑4o & gpt‑4o‑mini · Anthropic claude‑sonnet‑4 · Google gemini‑2.5‑pro & gemini‑2.0‑flash · Meta llama‑3.1‑70b · Alibaba qwen‑2.5‑72b
Prompt (one‑liner)System: "Return ONLY JSON with keys brand, official_url, confidence."
ScoringExact official domain = ✅ · “No official site” (with justification) = ✅
CostsOpenRouter prices on 31 May 2025 (Perplexity billed separately)
Code + logshttps://github.com/pvijeh/find-knife-brands

2 · Results

2.1 Leaderboard (sorted by cost per correct URL)

RankModelCorrect/10Total USDUSD/CorrectTokens
1Gemini 2.0 Flash70.0010.00014 k
2GPT‑4o‑Mini90.190.0224.7 k
2Llama‑3.1‑70B90.190.0225.1 k
4Qwen 2.5‑72B80.190.0227.9 k
5GPT‑4o90.260.0325.6 k
6Claude Sonnet‑490.320.0431.8 k
7Gemini 2.5 Pro¹50.310.0636.8 k
8Perplexity Sonar109.420.94860 k

¹ Gemini 2.5 Pro produced HTML tables in five cases; my JSON parser rejected them.


2.2 Interpretation

  • Perplexity is flawless but costs $9.42 for ten queries—mostly due to an 860 k‑token footprint.
  • GPT‑4o‑Mini & Llama‑3.1‑70B reach 90 % accuracy at ~$0.02 per hit—best bang‑for‑buck.
  • Gemini Flash lands 70 % at one‑tenth of a cent; with a manual QA pass it’s unbeatable on price.
  • Structured output matters. Gemini 2.5 Pro’s HTML responses were unusable—well‑formed JSON is part of model quality.
  • Edge cases: only Perplexity explicitly declared “no official site” for Aiorosu Knives; GPT‑4o‑Mini offered a reseller link (helpful, but scored wrong).

3 · Key Take‑Aways

  1. Define “good enough.” 90 % accuracy + quick human review beats 100 % accuracy at 45× the price.
  2. Validate on ingestion. Malformed JSON breaks pipelines—enforce a schema check.
  3. Watch prices. Promo rates shift; query the price API before batching thousands of calls.

4 · What’s Next?

I’m wiring GPT‑4o‑Mini into Knife.Day so collectors see verified manufacturer links on every brand page. Re‑crawling ≈ 250 brands now costs under $5.

If you collect knives—or just enjoy oddball benchmarks—join the Knife.Day beta and tell me which dataset you’d automate next.


Disclaimer: model versions, accuracy, and prices are a snapshot from 31 May 2025. Future mileage—and billing—will vary.

Resources
  • Knife Steel Comparisons
  • Knife Steels Database
  • Brands
  • Categories

New.Knife.Day

© 2025 New Knife Day. All rights reserved.