Every AEO tracker I tried gave me a different number for the same brand: TypelessForm, a one-shot voice form-filling widget that's been live for five weeks. One scored us 44/100. Another said 28. A third refused to index us at all. A fourth sat behind a $400-per-month plan before it would run a single query. None of them would show me the actual ChatGPT response they scored against. So I walked away from all four and built my own: @webappski/aeo-tracker. It runs on your machine, calls the real AI APIs, saves every raw response to disk, and produces a report you can fact-check line by line. It is free, MIT-licensed, and costs roughly twenty cents in API fees per weekly run on the two-engine minimum setup. This is what it shows for TypelessForm today: the full score, the gaps, the competitors winning queries we lose, and exactly how we're going to move those numbers over the next week.
Why we didn't just use an existing tool
When I launched TypelessForm in March 2026, I did the obvious thing: I ran the brand through every AEO tracker I could find. Profound. Otterly. Peec.ai. HubSpot's free AEO Grader. The category is new, the tools are young, and I wanted a baseline number to improve against.
What I got back was four different results. HubSpot's grader awarded us 28 out of 100. One paid dashboard said 44. A third simply could not resolve the brand at all. A fourth was gated behind a $400-per-month plan before it would run the first query. And none of them would show me the underlying AI response the score was supposedly computed from. (Snapshot taken April 2026. Those vendors measure different things, don't publicly document their methodologies, and may score the brand differently by the time you read this.)
As an engineer, this was untenable. I needed an answer to two questions no vendor wanted to answer:
- Are you actually calling ChatGPT, or are you scraping Bing and inferring? The difference is enormous. ChatGPT's API uses its own grounding pool; Bing's SERP uses another. A "ChatGPT visibility score" derived from web scraping is not a ChatGPT visibility score.
- What counts as a mention? If my brand appears only in a cited URL but not in the spoken answer text, do you count that? Half-count it? Does "TypelessForm" in a source URL's slug equal "TypelessForm" pronounced as the recommended tool?
Nobody had documented answers. Every vendor had a proprietary scoring black box, a free trial that funnelled into a sales call, and methodology I could not reproduce.
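To make the second question concrete, here is a minimal Python sketch of one way to separate "named in the answer" from "present only in a cited URL". It is an illustration, not the tracker's code: the `EngineReply` shape, the `classify_mention` name and the three labels are all assumptions, and the matching is a plain case-insensitive substring test.

```python
from dataclasses import dataclass, field


@dataclass
class EngineReply:
    """One AI reply: the answer text plus the URLs it cited (hypothetical shape)."""
    answer_text: str
    cited_urls: list[str] = field(default_factory=list)


def classify_mention(brand: str, reply: EngineReply) -> str:
    """Return 'answer', 'citation_only', or 'absent' for a brand in one reply."""
    needle = brand.lower()
    if needle in reply.answer_text.lower():
        return "answer"  # the engine named the brand in its own words
    if any(needle in url.lower() for url in reply.cited_urls):
        return "citation_only"  # the brand shows up only inside a source URL
    return "absent"
```

Whichever way a tool decides to weight `citation_only`, the point is that the rule is written down and can be checked against the raw reply.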
So I wrote down the minimum I actually needed from a tracker:
- Hit the official APIs of ChatGPT, Gemini, Claude, and Perplexity — not scrape, not proxy.
- Save every raw response to disk so any number in the report can be audited back to the exact AI reply it came from.
- No third-party scoring layer — show mention counts and ranks, not a proprietary "Brand Presence Index".
- Open source — read the code, contest the logic, fork it.
- Cheap — usable on a pre-revenue project. Sub-dollar weekly runs.
Nothing on the market cleared that bar. So I built it.
What @webappski/aeo-tracker is
@webappski/aeo-tracker is a Node.js CLI published on npm under the MIT licence. It does one thing: it asks each major AI answer engine a handful of queries relevant to your category, records whether your brand shows up in each reply, extracts the competitor brands that did show up, and writes two deliverables — a Markdown report with inline SVG charts, and a fully interactive HTML dashboard.
Four commands, start to finish:
npm install -g @webappski/aeo-tracker
aeo-tracker init --auto
aeo-tracker run
aeo-tracker report --html
The first command installs the CLI. The second scrapes your own site (not AI scraping — just a fetch of your homepage), asks an LLM to suggest category-appropriate queries, validates them with a second model, and writes a config. The third runs the queries against every AI engine whose API key it finds in your shell environment. The fourth renders the report.
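The "every engine whose API key it finds" step can be pictured with a short Python sketch. This is not the CLI's source (which is Node.js). `OPENAI_API_KEY` and `GEMINI_API_KEY` are the variable names used in this post's setup commands; the Anthropic and Perplexity names below are assumptions, as is the `enabled_engines` helper.

```python
import os

# Environment variable per engine. The first two names appear in this post's
# setup commands; the last two are assumed for illustration.
ENGINE_KEYS = {
    "ChatGPT": "OPENAI_API_KEY",
    "Gemini": "GEMINI_API_KEY",
    "Claude": "ANTHROPIC_API_KEY",       # assumed name
    "Perplexity": "PERPLEXITY_API_KEY",  # assumed name
}


def enabled_engines(env=None):
    """Return the engines whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name, var in ENGINE_KEYS.items() if env.get(var, "").strip()]
```

With only the two keys from the minimum setup exported, a helper like this would return the ChatGPT and Gemini columns and silently skip the rest.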
A few design choices worth calling out, because they map directly to the frustrations I had with the paid tools:
Direct API calls, nothing in between. When aeo-tracker says "ChatGPT", it means a call to gpt-5-search-api. When it says "Gemini", it means gemini-2.5-pro. When it says "Claude", it means claude-sonnet-4-6. Perplexity is either the official sonar-pro or a manual paste mode for Perplexity Pro's browser UI. No web scraping, no browser automation, no proxied sessions.
Two-model LLM cross-check on competitor extraction. After each AI response comes back, two cheaper classification models (gpt-5.4-mini and gemini-2.5-flash) independently extract every brand name mentioned in the text. If both agree, the brand lands in the "verified" tier of the report. If only one finds it, it lands in "unverified" with a dashed badge. A brand that only one extractor hallucinated therefore never reaches the verified tier. I've not seen any competitor do this, and it matters: single-model extraction routinely invents brand names that never appeared in the source response.
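The tiering logic itself is small once the two extractor outputs exist. The following Python sketch shows the idea under stated assumptions: the extractors are treated as black boxes that each return a list of names, comparison is case-insensitive, and `cross_check` is a hypothetical helper rather than a function from the package.

```python
def cross_check(extracted_a, extracted_b):
    """Split brand names into verified (both extractors) and unverified (one)."""
    # Compare case-insensitively, but remember the first spelling seen.
    spelling = {}
    for name in list(extracted_a) + list(extracted_b):
        spelling.setdefault(name.strip().lower(), name.strip())
    keys_a = {n.strip().lower() for n in extracted_a}
    keys_b = {n.strip().lower() for n in extracted_b}
    verified = sorted(spelling[k] for k in keys_a & keys_b)    # set intersection
    unverified = sorted(spelling[k] for k in keys_a ^ keys_b)  # found by one only
    return verified, unverified
```

Called with `["Wispr Flow", "Voicy", "MadeUpBrand"]` and `["voicy", "Wispr Flow"]` (the third name is a placeholder for a hallucination), it puts Voicy and Wispr Flow in the verified tier and leaves MadeUpBrand unverified.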
Pre-flight query validation. Before any query hits the engines, a separate LLM pass checks whether each query is commercially ambiguous, too acronym-heavy, or outside the brand's actual category. Bad queries are rejected before you waste API spend on them.
Raw responses saved to disk. Every query-times-engine combination writes a JSON file under aeo-responses/YYYY-MM-DD/. If I ever want to contest a "not mentioned" verdict — or just re-read exactly what Gemini said — the file is right there.
This is the audit trail. Twelve JSON files for this run — three queries × four engines — each containing the full AI response, the grounding sources cited, token counts, latency, and the extractor's verdict. No number in the final report is unsourced.
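As a rough picture of what "one JSON file per query-engine cell" can look like, here is a Python sketch. Only the `aeo-responses/YYYY-MM-DD/` folder layout comes from this post; the file naming scheme, the `slugify` and `save_response` helpers, and the payload keys are assumptions for illustration.

```python
import json
import re
from datetime import date
from pathlib import Path


def slugify(text):
    """Lower-case and replace runs of non-alphanumerics with single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def save_response(root, run_date, engine, query, payload):
    """Write one query-engine reply to <root>/YYYY-MM-DD/<engine>--<query>.json."""
    folder = Path(root) / run_date.isoformat()
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{slugify(engine)}--{slugify(query)}.json"
    path.write_text(json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

A payload along the lines of `{"response": ..., "sources": [...], "tokens": ..., "latency_ms": ..., "verdict": ...}` would cover the fields listed above, and three queries across four engines would leave twelve such files in the day's folder.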
Zero runtime dependencies. The package does not pull in a single third-party npm library at install time. The whole CLI — including its report renderer and its SVG chart generator — is plain Node.js 18+. It's auditable in an afternoon.
What the tracker shows for TypelessForm today
Here is the headline tile from today's run, 23 April 2026:
33 out of 100. PRESENT. Four of the twelve query-engine cells named TypelessForm (four AI engines, three queries each). The colour of the score is orange, not green. This is a pre-revenue brand in week five of existence. As a rough guide to the scale: 0–15 at launch, 20–45 with some SEO investment, 60–85 for the category leaders. The tool is designed to chart progress from invisible to dominant across months, not to hand out grades today.
Breaking that apart by engine is where the real signal lives:
- Perplexity: 67%. TypelessForm appears on two of three queries — "top voice form automation tools 2026" and "best voice form filling software 2026". This is the strongest surface I have. Perplexity grounds in live web search, and the sources it cites back are blogs I can actually pitch.
- ChatGPT: 33%. One mention, on "best voice form filling software 2026". The other two queries return a list of competitors I do not want to be absent from: Voiceform, PolyAI, Thoughtly on the automation query; ConciCare, Anve Voice Forms, HealOS on healthcare intake.
- Gemini: 33%. Symmetric to ChatGPT but on a different query — Gemini lists us as #1 on "automation tools 2026" and ignores us on the other two. AnveVoice sits where we should be on the "best software" query.
- Claude: 0%. Complete invisibility. Zero hits across three queries. Retell AI, Synthflow, Plivo, Vapi get the automation-tools answer. nVoq takes the healthcare slot. Form2Agent AI, Wispr Flow, Dragon by Nuance split the "best software" query. Claude pulls heavily from developer ecosystems (dev.to, GitHub, Product Hunt), and we don't yet have the footprint there.
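The headline score and the per-engine percentages above are consistent with a plain "share of cells that named the brand" calculation. The Python sketch below reproduces them from the cells described in this post. Treat the formula as an assumption about how the numbers relate, not as the tracker's documented scoring; the function names are hypothetical.

```python
ENGINES = ["ChatGPT", "Gemini", "Claude", "Perplexity"]
QUERIES = [
    "top voice form automation tools 2026",
    "voice form filling for healthcare intake",
    "best voice form filling software 2026",
]

# (query, engine) cells where the brand was named, as described in this post.
MENTIONED = {
    (QUERIES[0], "Gemini"),
    (QUERIES[0], "Perplexity"),
    (QUERIES[2], "ChatGPT"),
    (QUERIES[2], "Perplexity"),
}


def overall_score(mentioned):
    """Share of all query-engine cells that named the brand, as a 0-100 integer."""
    return round(100 * len(mentioned) / (len(QUERIES) * len(ENGINES)))


def engine_score(mentioned, engine):
    """Share of one engine's queries that named the brand, as a 0-100 integer."""
    hits = sum(1 for _, e in mentioned if e == engine)
    return round(100 * hits / len(QUERIES))


print(overall_score(MENTIONED))  # 33
print({e: engine_score(MENTIONED, e) for e in ENGINES})
# {'ChatGPT': 33, 'Gemini': 33, 'Claude': 0, 'Perplexity': 67}
```

Four cells out of twelve rounds to 33, and two out of three rounds to 67, which is why one extra mention on any engine moves that engine's column by a full third.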
Where we rank, where we don't, what's there instead
The "Position in AI answers" view is the one I look at most. Each cell is a full query-by-engine verdict, and if we're not listed the cell shows who the engine named instead. Competitor intelligence, in one screen:
The three queries the auto-suggester chose:
- "top voice form automation tools 2026" — the commercial query. Gemini and Perplexity name us #1. Claude returns voice-agent platforms (Retell, Synthflow, Vapi) — those aren't the same category but they sit in the answer. ChatGPT lists Voiceform, PolyAI, Thoughtly.
- "voice form filling for healthcare intake" — the vertical query. All four engines return zero TypelessForm. This is our single biggest visible gap. Claude cites nVoq; ChatGPT cites ConciCare, Anve Voice Forms, HealOS's AI Intake Agent; Perplexity cites AnveVoice, Compuser.ai, VoicePay. There is a clear set of incumbent names here that we are not competing with — and the tracker tells us exactly which ones to displace.
- "best voice form filling software 2026" — the problem query. ChatGPT and Perplexity name us #1. Claude names Form2Agent AI, Wispr Flow, Dragon by Nuance. Gemini names Voicy, Wispr Flow, DictaFlow, Dragon Professional.
Who gets named instead of us
Aggregated across all cells, TypelessForm and AnveVoice are the two most-named brands in our category — 4 and 3 mentions respectively — followed by Wispr Flow on 2. That's a map of mindshare. If I want the Gemini "best software" cell, AnveVoice is who I'm displacing. If I want the ChatGPT healthcare cell, it's ConciCare and HealOS.
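The aggregation behind that ranking can be sketched in a few lines of Python. The cell contents below are transcribed from a subset of this post's per-cell descriptions, with the Gemini "best software" cell combining the names given for it in two places. Folding "Anve Voice Forms" into "AnveVoice" is an assumption, as are the `ALIASES` table and the `tally` helper.

```python
from collections import Counter

# Assumption: spelling variants are folded onto one canonical name.
ALIASES = {"Anve Voice Forms": "AnveVoice"}

# Brands named per cell, transcribed from a subset of this post's descriptions.
CELLS = {
    ("automation", "Gemini"): ["TypelessForm"],
    ("automation", "Perplexity"): ["TypelessForm"],
    ("healthcare", "ChatGPT"): ["ConciCare", "Anve Voice Forms", "HealOS"],
    ("healthcare", "Perplexity"): ["AnveVoice", "Compuser.ai", "VoicePay"],
    ("best software", "ChatGPT"): ["TypelessForm"],
    ("best software", "Perplexity"): ["TypelessForm"],
    ("best software", "Claude"): ["Form2Agent AI", "Wispr Flow", "Dragon by Nuance"],
    ("best software", "Gemini"): ["AnveVoice", "Voicy", "Wispr Flow", "DictaFlow"],
}


def tally(cells):
    """Count how many cells name each brand, once per cell, after alias folding."""
    counts = Counter()
    for brands in cells.values():
        counts.update({ALIASES.get(b, b) for b in brands})
    return counts


print(tally(CELLS).most_common(3))
# [('TypelessForm', 4), ('AnveVoice', 3), ('Wispr Flow', 2)]
```

Without the alias step AnveVoice would split into two entries, so how a tool normalises brand spellings directly changes who looks like the leading competitor.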
Below that, the tracker lists the canonical sources AI engines cited while answering: usevoicy.com twice, then raftlabs.com, plivo.com, dev.to, salesforce.com, vellum.ai, aloware.com, lindy.ai, revavenues.ai, getvoip.com. These are the fastest path to AEO leverage: a single mention on usevoicy.com would propagate across every engine that cites it. That's one email, not a quarter of content marketing.
What the tracker tells us to do this week
The last card in the report is the one I actually use on Monday morning. It's a set of LLM-generated, engine-specific actions — each tied to a concrete query, a concrete gap, and the engines where the action would move the score:
This week's action list, generated from the run data, no edits:
- IGNORE — the tracker flagged healthcare intake. All four engines return zero TypelessForm on that query, and the tool's logical next move is a dedicated healthcare landing page. We're not taking it. TypelessForm's privacy policy excludes healthcare as a vertical, so this is where automation stops and a human reads the output. Worth calling out: a tracker surfaces gaps — it does not know which gaps you actually want filled.
- COMPETE — publish a 2026 alternatives comparison page. Several engines answered the "automation tools 2026" query with AnveVoice, Form2Agent, Dragon by Nuance, Wispr Flow. A head-to-head page naming those tools gives the engines a citation target for that exact query shape.
- LOCK IN WIN — pitch usevoicy this week. usevoicy.com is the highest-cited canonical source in our category. One placement there propagates across multiple engines. This is the single highest-leverage action on the list.
- FIX GAP — launch an ecommerce checkout demo page. Ecommerce is the second vertical query the auto-suggester proposed; unlike healthcare this one fits the product: speaking a delivery address, card expiry and contact details in one sentence is exactly the use-case the widget is built for.
Four items from the tracker: three we're executing this week, one we're consciously skipping. Every remaining move traces back to a specific cell in the query-engine matrix and can ship without waiting on anyone.
Try it on your own brand
The whole point of open-sourcing this tool is that you should run it on your brand, not take my numbers on faith. The minimum viable setup costs two API keys and under a minute of terminal time:
npm install -g @webappski/aeo-tracker
export OPENAI_API_KEY="sk-proj-..."
export GEMINI_API_KEY="AIzaSy..."
aeo-tracker init --yes --brand=YOURBRAND --domain=YOURDOMAIN.COM --auto
aeo-tracker run
aeo-tracker report --html
That covers the ChatGPT and Gemini columns and costs roughly $0.20 per run. Add an Anthropic key for the Claude column (+~$0.30) or a Perplexity key for the Perplexity column (+~$0.05). Full four-engine coverage: ~$0.55 per weekly run. The free tier of every required API is enough for the first month.
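For budgeting, the arithmetic is easy to check. This Python sketch uses only the approximate per-run figures quoted above, held in cents to avoid floating-point drift. It ignores free tiers, and the helper names are hypothetical.

```python
# Per-run cost figures quoted above, in US cents (all approximate).
BASE_CENTS = 20                                 # ChatGPT + Gemini columns
ADD_ON_CENTS = {"Claude": 30, "Perplexity": 5}  # optional extra columns


def run_cost_cents(extra_engines=()):
    """Approximate cost of one run in cents for the base pair plus add-ons."""
    return BASE_CENTS + sum(ADD_ON_CENTS[name] for name in extra_engines)


def yearly_cost_dollars(extra_engines=(), runs_per_year=52):
    """Approximate yearly cost in dollars at one run per week."""
    return run_cost_cents(extra_engines) * runs_per_year / 100


print(run_cost_cents(["Claude", "Perplexity"]))       # 55
print(yearly_cost_dollars(["Claude", "Perplexity"]))  # 28.6
```

At these rates a full year of weekly four-engine runs comes to under thirty dollars in API fees.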
After the first run, the workflow is two commands once a week: aeo-tracker run && aeo-tracker report --html. The HTML report auto-opens in your browser.
The package is on npm as @webappski/aeo-tracker. Its README has a founder-friendly "first time in a terminal" walkthrough.
Why we open-sourced this instead of selling it
This is a fair question. Webappski is an AEO-optimisation agency — we sell exactly the kind of work this tracker points at, and a reasonable business move would have been to wrap it as a $49/month SaaS. We went the other way for one straightforward reason: the measurement should be commodity, the interpretation and execution should not.
A client who can independently run aeo-tracker and see their own raw numbers is a client who can check our work. That's the relationship we want. We'll show up on Monday with the report already open, explain which gaps are real and which are query-noise, and do the pitches and content that move the matrix — none of which is in the CLI, none of which is cheap, and all of which is why the agency exists.
If that sounds interesting: webappski.com/en/aeo-services. If it doesn't: the tool is yours anyway. No telemetry, no analytics, no traffic to our servers — your keys and your data stay on your machine.
The honest frame
For the record, here are the TypelessForm numbers that go with this tracker run, week five of the 60-day SaaS challenge:
| Metric | Day 1 | Today (Day 32) |
|---|---|---|
| AEO visibility score | — | 33/100 |
| AI engine cells mentioning us | 0 of 12 | 4 of 12 |
| Claude cells (training-data sensitive) | 0 of 3 | 0 of 3 |
| Perplexity cells (web-search sensitive) | — | 2 of 3 |
| Paying customers | 0 | 0 |
Zero revenue. Zero paying customers. A 33/100 score that a B2B-dashboard vendor would dress up as "emerging momentum" but which, measured honestly, means one visible gap per vertical and an entire AI engine we don't exist on yet.
The tool doesn't fix any of that. The tool just stops me from lying to myself about it. Next Monday, aeo-tracker run again, same three queries, same four engines, and the column that matters is the week-over-week diff — whether the action list actually moved the score.
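The week-over-week diff is a set comparison over the same query-engine matrix. Here is a minimal Python sketch, assuming each run is reduced to the set of cells that named the brand. `diff_runs` is a hypothetical helper, and the `next_week` data is invented purely to show the output shape.

```python
def diff_runs(previous, current):
    """Compare two runs, each a set of (query, engine) cells that named the brand."""
    return {
        "gained": sorted(current - previous),
        "lost": sorted(previous - current),
        "kept": sorted(previous & current),
    }


# This week's cells are the four described in this post; next week's are hypothetical.
this_week = {
    ("automation tools", "Gemini"),
    ("automation tools", "Perplexity"),
    ("best software", "ChatGPT"),
    ("best software", "Perplexity"),
}
next_week = (this_week - {("automation tools", "Gemini")}) | {("best software", "Claude")}

changes = diff_runs(this_week, next_week)
print(changes["gained"])  # [('best software', 'Claude')]
print(changes["lost"])    # [('automation tools', 'Gemini')]
```

In that invented case the score would stay at 4 of 12 even though two cells changed, which is why the cell-level diff says more than the headline number alone.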
Run it on your own brand. Tell me what surprised you.