What is @webappski/aeo-tracker?

An open-source (MIT) Node.js CLI that measures how often AI answer engines — ChatGPT, Gemini, Claude, Perplexity — mention your brand. It calls the official APIs directly, saves every raw response to disk, extracts competitor mentions via a two-model cross-check, and produces a Markdown + HTML report. Installable via `npm install -g @webappski/aeo-tracker`.

How is it different from Profound, Otterly, or Peec.ai?

Three differences. (1) Direct API calls — no web scraping, no proxied browser sessions, no third-party scoring layer. (2) Every raw AI response is saved to disk so any number in the report can be audited back to the underlying text. (3) It is free and open source (MIT) — read the code, contest the logic, fork it.

How much does it cost to run?

The tool is free. You pay only for the AI API calls you make with your own keys — roughly $0.20 per weekly run at the two-engine minimum (OpenAI + Gemini), and ~$0.55 per run for full four-engine coverage (adding Anthropic and Perplexity). The free tier of each provider is enough for the first month.

What does a score like 33/100 mean?

It means 4 out of 12 query-engine cells named the brand. Three queries times four engines equals twelve cells; the tracker counts how many returned a mention. 0–15 is typical for a pre-revenue brand at launch. 20–45 is typical for a 6-month-old brand with some SEO investment. 60–85 is the category-leader range.

Why did TypelessForm build a tracker when you sell a form product?

Webappski is the AEO agency behind TypelessForm. We measure AEO visibility weekly on our own brand as part of how we run the business. We open-sourced the tool because the measurement layer should be transparent — the value a client pays us for is the interpretation and the execution (third-party placements, comparison pages, authority building), not the raw measurement.

What are the "verified" and "unverified" tiers in the competitor extraction?

After each AI response comes back, two separate classification models — gpt-5.4-mini and gemini-2.5-flash — independently extract every brand name mentioned in the response. If both models found the same name, it is "verified". If only one model found it, it is "unverified" (dashed badge). Hallucinated brand mentions — names the LLM extractors invented but that never appear in the source text — are filtered out by this cross-check.

Free Open-Source AEO Tracker — Measure ChatGPT, Gemini, Claude, Perplexity

Every AEO tracker I tried gave me a different number for the same brand — TypelessForm, a one-shot voice form-filling widget that's been live for five weeks. One scored us 44/100. Another said 28. A third refused to index us at all. None of them would show me the actual ChatGPT response they scored against. So I walked away from all four and built my own: @webappski/aeo-tracker. It runs on your machine, calls the real AI APIs, saves every raw response to disk, and produces a report you can fact-check line by line. It is free, MIT-licensed, and costs roughly twenty cents per weekly run. This is what it shows for TypelessForm today — the full score, the gaps, the competitors winning queries we lose, and exactly how we're going to move those numbers over the next week.

Why we didn't just use an existing tool

When I launched TypelessForm in March 2026, I did the obvious thing: I ran the brand through every AEO tracker I could find. Profound. Otterly. Peec.ai. HubSpot's free AEO Grader. The category is new, the tools are young, and I wanted a baseline number to improve against.

What I got back was four different outcomes, not four comparable numbers. HubSpot's grader awarded us 28 out of 100. One paid dashboard said 44. A third simply could not resolve the brand at all. A fourth was gated behind a $400-per-month plan before it would run the first query. And none of them would show me the underlying AI response the score was supposedly computed from. (Snapshot taken April 2026. Those vendors measure different things, don't publicly document their methodologies, and may score the brand differently by the time you read this.)

As an engineer, this was untenable. I needed an answer to two questions no vendor wanted to answer:

Are you actually calling ChatGPT, or are you scraping Bing and inferring? The difference is enormous. ChatGPT's API uses its own grounding pool; Bing's SERP uses another. A "ChatGPT visibility score" derived from web scraping is not a ChatGPT visibility score.
What counts as a mention? If my brand appears only in a cited URL but not in the spoken answer text, do you count that? Half-count it? Does "TypelessForm" in a source URL's slug equal "TypelessForm" pronounced as the recommended tool?

Nobody had documented answers. Every vendor had a proprietary scoring black box, a free trial that funnelled into a sales call, and methodology I could not reproduce.

So I wrote down the minimum I actually needed from a tracker:

Hit the official APIs of ChatGPT, Gemini, Claude, and Perplexity — not scrape, not proxy.
Save every raw response to disk so any number in the report can be audited back to the exact AI reply it came from.
No third-party scoring layer — show mention counts and ranks, not a proprietary "Brand Presence Index".
Open source — read the code, contest the logic, fork it.
Cheap — usable on a pre-revenue project. Sub-dollar weekly runs.

Nothing on the market cleared that bar. So I built it.

What @webappski/aeo-tracker is

@webappski/aeo-tracker is a Node.js CLI published on npm under the MIT licence. It does one thing: it asks each major AI answer engine a handful of queries relevant to your category, records whether your brand shows up in each reply, extracts the competitor brands that did show up, and writes two deliverables — a Markdown report with inline SVG charts, and a fully interactive HTML dashboard.

Four commands, start to finish:

npm install -g @webappski/aeo-tracker
aeo-tracker init --auto
aeo-tracker run
aeo-tracker report --html

The first command installs the CLI. The second scrapes your own site (not AI scraping — just a fetch of your homepage), asks an LLM to suggest category-appropriate queries, validates them with a second model, and writes a config. The third runs the queries against every AI engine whose API key it finds in your shell environment. The fourth renders the report.

A few design choices worth calling out, because they map directly to the frustrations I had with the paid tools:

Direct API calls, nothing in between. When aeo-tracker says "ChatGPT", it means a call to gpt-5-search-api. When it says "Gemini", it means gemini-2.5-pro. When it says "Claude", it means claude-sonnet-4-6. Perplexity is either the official sonar-pro or a manual paste mode for Perplexity Pro's browser UI. No web scraping, no browser automation, no proxied sessions.

Two-model LLM cross-check on competitor extraction. After each AI response comes back, two cheaper classification models — gpt-5.4-mini and gemini-2.5-flash — independently extract every brand name mentioned in the text. If both agree, the brand lands in the "verified" tier of the report. If only one agreed, it lands in "unverified" with a dashed badge. Hallucinated brand mentions fall out automatically. I've not seen any competitor do this, and it matters — single-model extraction routinely invents brand names that never appeared in the source response.
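A minimal sketch of the tiering logic described above, assuming each extractor simply returns a list of brand names. The function names and the case-insensitive normalisation are illustrative assumptions, not the tracker's actual code, and "FormVoxa" is a made-up name standing in for a hallucinated brand.

```javascript
// Normalise a brand name so "PolyAI" and " polyai " compare equal.
// Assumption: the real tool may normalise differently, or not at all.
function normalise(name) {
  return name.trim().toLowerCase().replace(/\s+/g, " ");
}

// Split two extractors' outputs into "verified" (both found the name)
// and "unverified" (only one of them found it).
function crossCheck(fromModelA, fromModelB) {
  const a = new Map(fromModelA.map((n) => [normalise(n), n.trim()]));
  const b = new Map(fromModelB.map((n) => [normalise(n), n.trim()]));

  const verified = [];
  const unverified = [];

  for (const [key, label] of a) {
    (b.has(key) ? verified : unverified).push(label);
  }
  for (const [key, label] of b) {
    if (!a.has(key)) unverified.push(label);
  }
  return { verified, unverified };
}

// Illustrative input: three names from the ChatGPT automation answer,
// plus one invented name that only the second extractor reports.
const tiers = crossCheck(
  ["Voiceform", "PolyAI", "Thoughtly"],
  ["voiceform", "PolyAI", "FormVoxa"]
);
console.log(tiers);
```

In this sketch a name that only one extractor reports can never reach the verified tier, which is the property the cross-check relies on: two independent models are unlikely to invent the same brand.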

Pre-flight query validation. Before any query hits the engines, a separate LLM pass checks whether each query is commercially ambiguous, too acronym-heavy, or outside the brand's actual category. Bad queries are rejected before you waste API spend on them.

Raw responses saved to disk. Every query-times-engine combination writes a JSON file under aeo-responses/YYYY-MM-DD/. If I ever want to contest a "not mentioned" verdict — or just re-read exactly what Gemini said — the file is right there.

File tree view of aeo-responses/2026-04-23/ — one JSON file per query × provider combination, plus a _summary.json index

This is the audit trail. Twelve JSON files for this run — three queries × four engines — each containing the full AI response, the grounding sources cited, token counts, latency, and the extractor's verdict. No number in the final report is unsourced.
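To make the layout concrete, here is a rough sketch of how one file per query × provider could be named and what each record might carry. Only the aeo-responses/YYYY-MM-DD/ prefix and the list of recorded fields come from the description above; the slug scheme, the provider labels and the field names are assumptions for illustration.

```javascript
// Turn a query into a filesystem-safe slug. The naming scheme is hypothetical.
function slugify(query) {
  return query
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Relative path for one query x provider cell on a given day (YYYY-MM-DD).
function responsePath(day, query, provider) {
  return `aeo-responses/${day}/${slugify(query)}__${provider}.json`;
}

// Shape of one saved record. Field names are illustrative.
function buildRecord({ query, provider, text, sources, tokens, latencyMs, mentioned }) {
  return { query, provider, text, sources, tokens, latencyMs, mentioned };
}

const trackedQueries = [
  "top voice form automation tools 2026",
  "voice form filling for healthcare intake",
  "best voice form filling software 2026",
];
const providers = ["openai", "gemini", "anthropic", "perplexity"]; // hypothetical labels

// Three queries x four providers gives the twelve files of the run.
const paths = trackedQueries.flatMap((q) =>
  providers.map((p) => responsePath("2026-04-23", q, p))
);
console.log(paths.length);
console.log(paths[0]);
```

Keeping the date in the directory name rather than in the file name means two weekly runs can be compared by walking two sibling folders with identical file names.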

Zero runtime dependencies. The package does not pull in a single third-party npm library at install time. The whole CLI — including its report renderer and its SVG chart generator — is plain Node.js 18+. It's auditable in an afternoon.

What the tracker shows for TypelessForm today

Here is the headline tile from today's run, 23 April 2026:

AEO tracker overview card — TypelessForm visibility score 33 out of 100, marked PRESENT; cell coverage 12 with 4 Named; per-engine cards showing Claude 0%, ChatGPT 33%, Gemini 33%, Perplexity 67%

33 out of 100. PRESENT. Four out of twelve query-engine cells named TypelessForm: four AI engines, three queries each. The colour of the score is orange, not green. This is a pre-revenue brand in week five of existence. For reference, 0–15 is typical at launch, 20–45 with some SEO investment, and 60–85 for category leaders. The tool is designed to chart progress from invisible to dominant across months, not to hand out grades today.

Breaking that apart by engine is where the real signal lives:

Perplexity: 67%. TypelessForm appears on two of three queries — "top voice form automation tools 2026" and "best voice form filling software 2026". This is the strongest surface I have. Perplexity grounds in live web search, and the sources it cites back are blogs I can actually pitch.
ChatGPT: 33%. One mention, on "best voice form filling software 2026". The other two queries return a list of competitors I do not want to be absent from: Voiceform, PolyAI, Thoughtly on the automation query; ConciCare, Anve Voice Forms, HealOS on healthcare intake.
Gemini: 33%. Symmetric to ChatGPT but on a different query — Gemini lists us as #1 on "automation tools 2026" and ignores us on the other two. AnveVoice sits where we should be on the "best software" query.
Claude: 0%. Complete invisibility. Zero hits across three queries. Retell AI, Synthflow, Plivo, Vapi get the automation-tools answer. nVoq takes the healthcare slot. Form2Agent AI, Wispr Flow, Dragon by Nuance split the "best software" query. Claude pulls heavily from developer ecosystems (dev.to, GitHub, Product Hunt), and we don't yet have the footprint there.
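The arithmetic behind the headline score and the per-engine percentages can be sketched as below. The matrix values mirror the run described above; the data structure, the function names and the round-to-nearest rule are assumptions (rounding is simply what makes 4/12 read as 33 and 2/3 read as 67).

```javascript
// One row per query, one column per engine; true means the engine named the brand.
const engines = ["ChatGPT", "Gemini", "Claude", "Perplexity"];
const matrix = {
  "top voice form automation tools 2026": { ChatGPT: false, Gemini: true, Claude: false, Perplexity: true },
  "voice form filling for healthcare intake": { ChatGPT: false, Gemini: false, Claude: false, Perplexity: false },
  "best voice form filling software 2026": { ChatGPT: true, Gemini: false, Claude: false, Perplexity: true },
};

// Share of named cells as a whole-number percentage.
function percent(named, total) {
  return total === 0 ? 0 : Math.round((named / total) * 100);
}

// Overall score plus one percentage per engine column.
function summarise(cellsByQuery, engineNames) {
  const rows = Object.values(cellsByQuery);
  const perEngine = {};
  let named = 0;
  for (const engine of engineNames) {
    const hits = rows.filter((row) => row[engine]).length;
    perEngine[engine] = percent(hits, rows.length);
    named += hits;
  }
  const cells = rows.length * engineNames.length;
  return { named, cells, score: percent(named, cells), perEngine };
}

const summary = summarise(matrix, engines);
console.log(summary);
// named: 4, cells: 12, score: 33
// perEngine: ChatGPT 33, Gemini 33, Claude 0, Perplexity 67
```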

Where we rank, where we don't, what's there instead

The "Position in AI answers" view is the one I look at most. Each cell is a full query-by-engine verdict, and if we're not listed the cell shows who the engine named instead. Competitor intelligence, in one screen:

Position in AI answers matrix — three queries × four engines with #1 TypelessForm positions on Gemini and Perplexity for automation tools, ChatGPT and Perplexity for best software; competitors listed in every other cell

The three queries the auto-suggester chose:

"top voice form automation tools 2026" — the commercial query. Gemini and Perplexity name us #1. Claude returns voice-agent platforms (Retell, Synthflow, Vapi) — those aren't the same category but they sit in the answer. ChatGPT lists Voiceform, PolyAI, Thoughtly.
"voice form filling for healthcare intake" — the vertical query. All four engines return zero TypelessForm. This is our single biggest visible gap. Claude cites nVoq; ChatGPT cites ConciCare, Anve Voice Forms, HealOS's AI Intake Agent; Perplexity cites AnveVoice, Compuser.ai, VoicePay. There is a clear set of incumbent names here that we are not competing with — and the tracker tells us exactly which ones to displace.
"best voice form filling software 2026" — the problem query. ChatGPT and Perplexity name us #1. Claude names Form2Agent AI, Wispr Flow, Dragon by Nuance. Gemini names Voicy, Wispr Flow, DictaFlow, Dragon Professional.

Who gets named instead of us

Bar chart — typelessform.com 4 mentions, AnveVoice 3, Wispr Flow 2, nVoq Voice Assistant 1, Form2Agent AI 1, Dragon by Nuance 1, ConciCare 1, Anve Voice Forms 1, HealOS's AI Intake Agent 1. Below: canonical sources — usevoicy.com 2, raftlabs.com, plivo.com, dev.to, salesforce.com, vellum.ai, aloware.com, lindy.ai, revavenues.ai, getvoip.com

Aggregated across all cells, TypelessForm and AnveVoice are the two most-named brands in our category — 4 and 3 mentions respectively — followed by Wispr Flow on 2. That's a map of mindshare. If I want the Gemini "best software" cell, AnveVoice is who I'm displacing. If I want the ChatGPT healthcare cell, it's ConciCare and HealOS.

Below that, the tracker lists the canonical sources AI engines cited while answering: usevoicy.com twice, then raftlabs.com, plivo.com, dev.to, salesforce.com, vellum.ai, aloware.com, lindy.ai, revavenues.ai, getvoip.com. These are the fastest path to AEO leverage: a single mention on usevoicy.com would propagate across every engine that cites it. That's one email, not a quarter of content marketing.
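A rough sketch of how a canonical-source ranking like this could be derived from the cited URLs saved with each response. The function names are hypothetical, and the input below is illustrative: the domains appear in the list above, but the paths are invented.

```javascript
// Reduce a cited URL to a bare host name, dropping a leading "www.".
function hostOf(url) {
  return new URL(url).hostname.replace(/^www\./, "");
}

// Count citations per domain across all cells, most-cited first.
// Ties are broken alphabetically so the order is stable between runs.
function rankSources(citedUrls) {
  const counts = new Map();
  for (const url of citedUrls) {
    const host = hostOf(url);
    counts.set(host, (counts.get(host) || 0) + 1);
  }
  return [...counts.entries()].sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0])
  );
}

const ranked = rankSources([
  "https://usevoicy.com/blog/example-a",
  "https://www.usevoicy.com/blog/example-b",
  "https://plivo.com/example",
  "https://dev.to/example",
]);
console.log(ranked);
// [ [ 'usevoicy.com', 2 ], [ 'dev.to', 1 ], [ 'plivo.com', 1 ] ]
```

Collapsing to the host name is what lets two different articles on the same site count as one pitch target.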

What the tracker tells us to do this week

The last card in the report is the one I actually use on Monday morning. It's a set of LLM-generated, engine-specific actions — each tied to a concrete query, a concrete gap, and the engines where the action would move the score:

Recommended actions cards — FIX GAP: publish healthcare intake landing page (Claude, ChatGPT, Gemini, Perplexity); COMPETE: create 2026 alternatives comparison page; LOCK IN WIN: pitch usevoicy this week; FIX GAP: launch ecommerce checkout page

This week's action list, generated from the run data, no edits:

IGNORE: the tracker flagged healthcare intake. All four engines return zero TypelessForm on that query, and the tool's logical next move is a dedicated healthcare landing page. We're not taking it. TypelessForm's privacy policy excludes healthcare as a vertical, so this is where automation stops and a human reads the output. Worth calling out: a tracker surfaces gaps; it does not know which gaps you actually want filled.
COMPETE: publish a 2026 alternatives comparison page. Several engines answered the "best voice form filling software 2026" query with AnveVoice, Form2Agent, Dragon by Nuance, Wispr Flow. A head-to-head page naming those tools gives the engines a citation target for that exact query shape.
LOCK IN WIN: pitch usevoicy this week. usevoicy.com is the highest-cited canonical source in our category. One placement there propagates across multiple engines. This is the single highest-leverage action on the list.
FIX GAP: launch an ecommerce checkout demo page. Ecommerce is the second vertical query the auto-suggester proposed; unlike healthcare, this one fits the product: speaking a delivery address, card expiry and contact details in one sentence is exactly the use-case the widget is built for.

Four items from the tracker: three we're executing this week, one we're consciously skipping. Every remaining move traces back to a specific cell in the query-engine matrix and can ship without waiting on anyone.

Try it on your own brand

The whole point of open-sourcing this tool is that you should run it on your brand, not take my numbers on faith. The minimum viable setup costs two API keys and under a minute of terminal time:

npm install -g @webappski/aeo-tracker

export OPENAI_API_KEY="sk-proj-..."
export GEMINI_API_KEY="AIzaSy..."

aeo-tracker init --yes --brand=YOURBRAND --domain=YOURDOMAIN.COM --auto
aeo-tracker run
aeo-tracker report --html

That covers the ChatGPT and Gemini columns and costs roughly $0.20 per run. Add an Anthropic key for the Claude column (+~$0.30) or a Perplexity key for the Perplexity column (+~$0.05). Full four-engine coverage: ~$0.55 per weekly run. The free tier of every required API is enough for the first month.
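The cost arithmetic is easy to check by hand, but a small sketch shows how the covered columns and the per-run estimate follow from which keys are set. The figures are the rough ones quoted above, held in cents to avoid floating-point drift. OPENAI_API_KEY and GEMINI_API_KEY appear in the commands above; ANTHROPIC_API_KEY and PERPLEXITY_API_KEY are assumed names, so check the package README for the real ones.

```javascript
// Rough per-run cost in US cents, from the figures quoted in this article.
const BASE_CENTS = 20; // OpenAI + Gemini together (the two-engine minimum)
const ADD_ONS = [
  { column: "Claude", envVar: "ANTHROPIC_API_KEY", cents: 30 },     // env var name is an assumption
  { column: "Perplexity", envVar: "PERPLEXITY_API_KEY", cents: 5 }, // env var name is an assumption
];

// Given an environment object, list the covered columns and the estimated cost.
function estimateRun(env) {
  if (!env.OPENAI_API_KEY || !env.GEMINI_API_KEY) {
    throw new Error("This sketch assumes both OPENAI_API_KEY and GEMINI_API_KEY are set");
  }
  const columns = ["ChatGPT", "Gemini"];
  let cents = BASE_CENTS;
  for (const addOn of ADD_ONS) {
    if (env[addOn.envVar]) {
      columns.push(addOn.column);
      cents += addOn.cents;
    }
  }
  return { columns, cents, dollars: (cents / 100).toFixed(2) };
}

const minimal = estimateRun({ OPENAI_API_KEY: "sk-proj-...", GEMINI_API_KEY: "AIzaSy..." });
const full = estimateRun({
  OPENAI_API_KEY: "sk-proj-...",
  GEMINI_API_KEY: "AIzaSy...",
  ANTHROPIC_API_KEY: "placeholder",
  PERPLEXITY_API_KEY: "placeholder",
});
console.log(minimal.dollars, full.dollars);
// 0.20 0.55
```

Passing process.env instead of a literal object would give the estimate for your current shell.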

After the first run, the workflow is two commands once a week: aeo-tracker run && aeo-tracker report --html. The HTML report auto-opens in your browser.

The package is on npm as @webappski/aeo-tracker. Its README has a founder-friendly "first time in a terminal" walkthrough.

Why we open-sourced this instead of selling it

This is a fair question. Webappski is an AEO-optimisation agency — we sell exactly the kind of work this tracker points at, and a reasonable business move would have been to wrap it as a $49/month SaaS. We went the other way for one straightforward reason: the measurement should be commodity, the interpretation and execution should not.

A client who can independently run aeo-tracker and see their own raw numbers is a client who can check our work. That's the relationship we want. We'll show up on Monday with the report already open, explain which gaps are real and which are query-noise, and do the pitches and content that move the matrix — none of which is in the CLI, none of which is cheap, and all of which is why the agency exists.

If that sounds interesting: webappski.com/en/aeo-services. If it doesn't: the tool is yours anyway. No telemetry, no analytics, no traffic to our servers — your keys and your data stay on your machine.

The honest frame

For the record, here are the TypelessForm numbers that go with this tracker run, week five of the 60-day SaaS challenge:

Metric	Day 1	Today (Day 32)
AEO visibility score	—	33/100
AI engine cells mentioning us	0 of 12	4 of 12
Claude cells (training-data sensitive)	0 of 3	0 of 3
Perplexity cells (web-search sensitive)	—	2 of 3
Paying customers	0	0

Zero revenue. Zero paying customers. A 33/100 score that a B2B-dashboard vendor would dress up as "emerging momentum" but which, measured honestly, means one visible gap per vertical and an entire AI engine we don't exist on yet.

The tool doesn't fix any of that. The tool just stops me from lying to myself about it. Next Monday, aeo-tracker run again, same three queries, same four engines, and the column that matters is the week-over-week diff — whether the action list actually moved the score.
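A minimal sketch of that week-over-week diff, assuming each run is reduced to a map from cell to a named/not-named flag. The function names and the short query labels are hypothetical. The example compares the Day 1 baseline (0 of 12) with today's run (4 of 12) from the table above.

```javascript
// Compare two runs over the same cells and report what changed.
function diffRuns(previous, current) {
  const gained = [];
  const lost = [];
  for (const cell of Object.keys(current)) {
    const before = Boolean(previous[cell]);
    const after = Boolean(current[cell]);
    if (!before && after) gained.push(cell);
    if (before && !after) lost.push(cell);
  }
  return { gained, lost, net: gained.length - lost.length };
}

// Build a run from the list of cells where the brand was named.
function makeRun(namedCells, allCells) {
  const run = {};
  for (const cell of allCells) run[cell] = namedCells.includes(cell);
  return run;
}

// Twelve cells: three queries (short labels) x four engines.
const allCells = [];
for (const q of ["automation", "healthcare", "best-software"]) {
  for (const e of ["ChatGPT", "Gemini", "Claude", "Perplexity"]) {
    allCells.push(`${q} | ${e}`);
  }
}

const dayOne = makeRun([], allCells);
const today = makeRun(
  [
    "automation | Gemini",
    "automation | Perplexity",
    "best-software | ChatGPT",
    "best-software | Perplexity",
  ],
  allCells
);

const delta = diffRuns(dayOne, today);
console.log(delta);
```

Because the sketch only walks the cells of the current run, it relies on both runs using the same queries and engines, which is exactly the weekly routine described above.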

Run it on your own brand. Tell me what surprised you.
