How to track brand mentions in ChatGPT for free (no $499 tool)
Profound is $499 a month. Otterly is $29. The DIY prompt panel is $0 and teaches you which queries actually matter. Here is the free methodology.
The DIY prompt panel methodology costs $0, runs in about an hour a month, and gives you the same baseline the paid tools are selling. We use it ourselves for client audits at Signals before we recommend any paid tool, and we have run it alongside our 10,000+ campaign pipeline since 2017 to verify placements. This guide covers the free methodology, the exact spreadsheet layout, and the three questions that tell you when it is actually time to upgrade.
The DIY prompt panel in one sentence
Run a fixed list of 25-50 buyer-intent prompts against each AI engine from a clean, signed-out session once a month, and log whether your brand appears, in what position, and in what framing. That is the whole trick. The paid tools run the same methodology at scale across thousands of prompts and dozens of engines. For a single brand tracking 25-50 prompts across 4 engines once a month, the DIY version is identical in information value and costs nothing. The paid tools start earning their keep once you cross 200+ prompts or need daily monitoring, which we cover at the end.
Step 1: Build the prompt list
Start with 25 prompts. Under 25 misses too many patterns; over 50 makes the monthly run tedious and kills your own compliance with the schedule. 25 is the practical minimum and 50 is the ceiling before a tool becomes worth the money.
Spread your 25 across five clusters so you catch different buyer intents. Each cluster serves a different retrieval pattern, and missing any one of them hides a real visibility gap.
| Cluster | Prompt count | Example |
|---|---|---|
| Category best | 5 | "What is the best CRM for early-stage startups?" |
| Use-case best | 5 | "What CRM should a 3-person sales team use if they already use HubSpot forms?" |
| Alternatives | 5 | "What are the best alternatives to Salesforce for small SaaS companies?" |
| Integration / compatibility | 5 | "Which CRMs integrate with Slack and Zapier natively?" |
| Pricing / trust | 5 | "Is HubSpot worth the price for a small sales team?" |
The Pricing / Trust cluster is the one most brands skip and it is the single highest-signal cluster because it catches direct competitive displacement. If ChatGPT tells a buyer that a competitor is worth the price over you, that is a specific claim you can test, document, and counter. Our backlinks vs brand mentions thesis covers why that competitive displacement happens and what fixes it.
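If you keep the panel in a small script rather than only in your head, you can sanity-check it before each monthly run. A minimal sketch in Python; the cluster keys and example prompt strings below are illustrative placeholders (swap in your own market), not a prescribed schema:

```python
# The five intent clusters from the table above. Names are my own shorthand.
REQUIRED_CLUSTERS = {
    "category_best", "use_case_best", "alternatives",
    "integration", "pricing_trust",
}

def validate_panel(panel):
    """Check that a panel covers all five clusters with 25-50 prompts total."""
    missing = REQUIRED_CLUSTERS - set(panel)
    total = sum(len(prompts) for prompts in panel.values())
    ok = not missing and 25 <= total <= 50
    return ok, total

# One example prompt per cluster; fill each list out to five before running.
panel = {
    "category_best": ["What is the best CRM for early-stage startups?"],
    "use_case_best": ["What CRM should a 3-person sales team use?"],
    "alternatives": ["What are the best alternatives to Salesforce?"],
    "integration": ["Which CRMs integrate with Slack and Zapier natively?"],
    "pricing_trust": ["Is HubSpot worth the price for a small team?"],
}
ok, total = validate_panel(panel)
print(ok, total)  # False 5 -- all clusters present, but not enough prompts yet
```

The validator enforces exactly the floor and ceiling from Step 1: under 25 prompts misses patterns, over 50 breaks your monthly compliance.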
Step 2: Run the panel (the clean session protocol)
The run protocol:
Open ChatGPT, Perplexity, Gemini, and Google AI Mode in separate incognito windows. Do not sign in on any of them.
Run each of your 25 prompts against each engine. Log the full response into a spreadsheet. Capture the brands mentioned, their order, and a short note on context.
For ChatGPT specifically, run each prompt at least twice. The model's response varies across runs due to sampling randomness. Two runs catch most of the variance; three runs eliminate essentially all of it for a 25-prompt panel.
When the response cites sources (Perplexity always, ChatGPT in browsing mode, AI Mode sometimes), copy the cited URLs too. Those are the placement targets the engine is trusting.
The whole run takes about 45 minutes for a 25-prompt panel across four engines, assuming you copy-paste efficiently. Set a monthly calendar reminder. Do not skip months, because the pattern only shows up in the delta between runs.
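If you prefer a CSV log over a spreadsheet tab, one way to keep the copy-paste loop structured is to append each prompt/engine run as a row as you go. A minimal sketch; the file name and column names are my own convention, not part of the methodology:

```python
import csv
from datetime import date

FIELDS = ["run_date", "engine", "prompt", "run_number", "response", "cited_urls"]

def log_row(writer, engine, prompt, run_number, response, cited_urls):
    """Append one prompt/engine run to the panel log."""
    writer.writerow({
        "run_date": date.today().isoformat(),
        "engine": engine,
        "prompt": prompt,
        "run_number": run_number,           # ChatGPT gets run 1 and run 2
        "response": response,               # full pasted answer text
        "cited_urls": ";".join(cited_urls), # empty when no sources are shown
    })

with open("panel_log.csv", "a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    if f.tell() == 0:  # new file: write the header once
        writer.writeheader()
    log_row(writer, "chatgpt",
            "What is the best CRM for early-stage startups?",
            1, "Pasted response text...",
            ["https://example.com/crm-roundup"])
```

Logging the cited URLs in the same row matters because, as noted above, those URLs are the placement targets the engine is trusting.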
Step 3: Score the output (the 4-column spreadsheet)
Your tracking spreadsheet needs only four columns per engine per prompt. Fewer than that loses signal; more than that turns tracking into a second job. Keep it simple or you will not do it.
| Column | What to put in it |
|---|---|
| Mentioned? | Yes / No: did your brand appear at all? |
| Position | Order in the response when mentioned (1st, 2nd, 3rd, ...) |
| Sentiment | Positive / Neutral / Negative: the framing the model used |
| Top competitor | The single brand the engine recommended instead of, or alongside, you |
Share of voice is the key derived metric. Count how many of the 25 prompts mentioned your brand, divide by 25, multiply by 100. A category leader typically hits 35-50% panel share of voice; in a crowded market, 5-10% is a credible starting number. Track the number month over month; the direction matters more than the absolute value.
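The share-of-voice arithmetic in plain code, over the 4-column rows from the scoring table (the example rows are synthetic):

```python
def share_of_voice(rows):
    """Percentage of panel prompts where the brand appeared at all.

    Each row is one prompt's score for one engine:
    {"mentioned", "position", "sentiment", "top_competitor"}.
    """
    mentioned = sum(1 for r in rows if r["mentioned"])
    return round(100 * mentioned / len(rows), 1)

# A 25-prompt panel where the brand appeared 8 times -> 32.0% share of voice.
rows = (
    [{"mentioned": True, "position": 2,
      "sentiment": "Positive", "top_competitor": "HubSpot"}] * 8
    + [{"mentioned": False, "position": None,
        "sentiment": None, "top_competitor": "Salesforce"}] * 17
)
print(share_of_voice(rows))  # 32.0
```

A 32% number on a synthetic panel like this would sit near the bottom of the 35-50% category-leader band described above.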
Step 4: Diagnose the gaps (the 3 questions)
Once you have two months of data you can start seeing patterns. The three questions to ask of your own spreadsheet, in order:
Which clusters are you losing? If you show up in Category Best but not Use-Case Best, your brand is not specifically associated with your use cases in the model's training corpus. You need to earn mentions in use-case-framed content (Reddit threads, use-case-tagged listicles).
Which competitor is winning the prompts you lose? If the same competitor keeps appearing instead of you, they have solved the mention problem for your category and you have not. Read their Wikipedia page, their G2 profile, and their top Reddit threads. That is the footprint you need to match.
Which engines are worst for you? If you show up in Perplexity but not ChatGPT, you have a Reddit-heavy presence without the Wikipedia or Forbes footprint ChatGPT leans on. If you show up in ChatGPT but not Google AI Mode, you have the editorial layer but not the video or long-form UGC layer. Each gap points at a specific placement strategy.
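The three questions reduce to simple group-bys over the rows where you were not mentioned. A sketch, assuming each scored row also carries its cluster and engine labels (that extra labeling is my assumption, not part of the 4-column minimum):

```python
from collections import Counter

def diagnose(rows):
    """Answer the three gap questions from scored panel rows.

    Each row: {"cluster", "engine", "mentioned", "top_competitor"}.
    """
    losses = [r for r in rows if not r["mentioned"]]
    return {
        "losing_clusters": Counter(r["cluster"] for r in losses),
        "winning_competitors": Counter(r["top_competitor"] for r in losses),
        "worst_engines": Counter(r["engine"] for r in losses),
    }

# Synthetic rows: two use-case losses to the same competitor on ChatGPT.
rows = [
    {"cluster": "use_case_best", "engine": "chatgpt",
     "mentioned": False, "top_competitor": "Pipedrive"},
    {"cluster": "use_case_best", "engine": "chatgpt",
     "mentioned": False, "top_competitor": "Pipedrive"},
    {"cluster": "category_best", "engine": "perplexity",
     "mentioned": True, "top_competitor": "HubSpot"},
]
report = diagnose(rows)
print(report["winning_competitors"].most_common(1))  # [('Pipedrive', 2)]
```

With two months of data, run `diagnose` on each month and compare the counters; the deltas are the pattern the monthly cadence exists to surface.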
Our 50 domains source graph covers which sites feed each engine in detail, and the tier system tells you where to direct placement effort after the panel diagnoses a gap.
The DIY tool stack (all free)
You need three things and none of them cost money. Spend zero dollars until you have two months of data proving the methodology is working, then decide whether to scale.
A clean browser profile. Create a fresh Chrome or Firefox profile with no extensions, no sign-in, no history. Use only this profile for panel runs.
A spreadsheet. Google Sheets or Excel with one tab per engine and rows for each prompt. Use the 4-column scoring system.
A calendar reminder. Monthly, first weekday of the month, 1-hour block. The reminder is the single most important piece of the system because compliance is the bottleneck, not methodology.
For automation, you can optionally layer in Google Alerts on your brand name (free, and it catches unlinked mentions across the web that predict future AI citations) and the free tiers of Brand24 or Mention, which track roughly 1,000 mentions a month. The DIY spreadsheet remains the source of truth; these only add peripheral signal.
When to actually pay for a tool
Paid GEO tracking tools (Profound, AthenaHQ, Peec AI, Otterly, Semrush AI Toolkit, Ahrefs Brand Radar) become worth the money when three specific conditions apply together. Fewer than three and the DIY panel is still the better value.
You are tracking 200+ prompts across multiple verticals or product lines. DIY runs get impractical above 100 prompts because the monthly workload exceeds 4 hours.
You need weekly or daily updates instead of monthly. Retrieval sources change fast (the September 2025 Reddit collapse happened over 5 days) and if your content strategy needs to respond at that cadence, automated tracking earns its keep.
You have competitive benchmarking against 5+ named competitors where you need to track every mention, not just your own. Panel-level share of voice is enough for your own visibility; competitive programs need per-brand granularity.
At $29/month, Otterly is the most affordable entry point. At $499/month, Profound is the enterprise end, and in a recent 30-day independent test it actually lost 1% answer share while AthenaHQ, at $295/month, gained 45%, so price is not a reliable quality proxy. Run the DIY panel for two months first, decide what your pain point is, and shop for the tool that fits that specific gap.
Frequently asked questions
Why incognito? Can I just sign out of ChatGPT?
Incognito is safer because it eliminates browser-level tracking cookies, local storage, and cached state that can affect response personalization. Signing out is usually enough, but incognito guarantees a clean session. On Perplexity specifically, the difference matters because signed-in Perplexity biases heavily toward your past query history.
How many times should I run each prompt?
Two for most prompts, three for high-stakes ones. ChatGPT's response varies across runs due to sampling randomness. Two runs catch the 80th-percentile variance; three runs catch essentially all of it. Any more is diminishing returns unless you are doing statistical significance testing.
Which engine should I prioritize if I only have time for one?
Pick the engine your buyers actually use. For B2B SaaS that is usually ChatGPT in browsing mode plus Google AI Mode. For consumer research it is Perplexity and Google AI Overviews. Running a single-engine panel caps your addressable visibility at roughly 14% of the full cross-engine pool (only 11% of cited domains appear on multiple engines), so treat the one-engine panel as a starting point, not the finish line.
What if my brand never appears? Is the panel still worth it?
Yes, and it is more valuable at that stage, not less. The panel tells you which specific prompts your competitors own, which engines they own them on, and which context. That is the gap-analysis input you need to build a placement strategy from zero. Our complete ChatGPT mentions guide covers the 90-day plan to move from zero to measurable share of voice.
How long before the panel shows movement?
Retrieval-layer moves (Perplexity, SearchGPT in browsing mode, AI Overviews) show up in days to weeks after a new placement lands. Training-layer moves (ChatGPT default, Gemini) take a full corpus refresh cycle, which is typically months. Plan for two quarters before judging the strategy, and run the panel every month in between.
Should I track branded mentions (unlinked)?
Yes, separately. Google Alerts on your brand name catches most unlinked mentions for free. Branded mention volume correlates 0.664 with AI visibility per Ahrefs' 75K brand study, which means the unlinked mention count is almost the strongest leading indicator of future citation growth. When branded mentions climb, citations follow about two to three months later.
What about voice search and assistants like Siri or Alexa?
Treat them as extensions of the underlying LLM retrieval graph. Siri increasingly routes to ChatGPT; Alexa has its own ranking stack biased toward Amazon content. Both are still small enough in share of voice that DIY tracking is not worth the extra workload for most brands. Revisit in 12 months.
Sources: Ahrefs ChatGPT brand monitoring guide, Profound prompt volumes feature documentation, AthenaHQ vs Profound vs Peec.ai 30-day independent test results, Frase.io AI search monitoring prompt taxonomy guide, Semrush panel methodology documentation.