Behind the Scenes: How ReviewSense Turns Any LLM Into a Review Reply Engine
Shiv Srivastava
Product Architect & Founder · AdvikLabs
There is a question I get asked at almost every demo: "Which AI model does ReviewSense use?" The honest answer is: whichever one you want. ReviewSense does not run its own language model, does not fine-tune anything in the background, and does not make any AI call without your API key. You bring your credentials. We bring the prompt engineering, the review workflow, and the orchestration layer that turns a raw WooCommerce review into a polished, on-brand reply.
This post is a transparent look at how that architecture works — the design decisions behind it, what the plugin actually sends to an LLM, and why we think this approach is better for store owners than building a proprietary black-box model.
Why We Don't Run Our Own Model
When I started building ReviewSense, the obvious move seemed to be: train a model on review data, host it, and charge for inference. A lot of early AI startups did exactly that. And almost all of them hit the same wall — the cost of keeping a hosted model competitive with GPT-4o or Gemini 1.5 Pro is enormous, the latency is hard to beat, and the quality gap widens every time OpenAI or Google ships an update.
More importantly: store owners already pay for AI. They have ChatGPT Plus. They have a Google Cloud account. They have an Anthropic API key from a previous project. The last thing they need is another subscription to another company's model wrapped inside a SaaS product. So we made a different call — ReviewSense is a bring-your-own-key product. Your API key, your model, your usage bill. We are the intelligence layer on top.
"We are not in the model business. We are in the prompt engineering and workflow business. The best models in the world are already available — our job is to make them useful for a very specific task."
— Shiv Srivastava, Founder, AdvikLabs
Which Models Can You Use?
ReviewSense is built around four AI providers: two are supported today and two are planned. Each supported provider requires you to paste your own API key into the plugin settings. The key is stored encrypted in your WordPress database and is never sent to our servers.
- OpenAI — GPT-4o, GPT-4o-mini, GPT-4 Turbo. The free plan of ReviewSense is locked to GPT-4o-mini, which is fast and cheap while still producing solid reply quality.
- Google Gemini — Gemini 1.5 Pro, Gemini 1.5 Flash. Excellent for stores with a large volume of reviews that need fast, cost-effective generation.
- Anthropic Claude — Coming in a future release. Claude is in our integration pipeline and will be available once testing is complete.
- xAI Grok — Coming in a future release. Grok support is planned and will ship once we have validated quality and stability.
Note
v1.0.0 ships with OpenAI and Google Gemini support. Both require your own API key, stored encrypted in WordPress. Anthropic Claude and xAI Grok are in testing and will be added in a future release. You can switch providers at any time from ReviewSense → Settings → AI Provider.
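One way to picture the provider list is as a small registry that records which providers are selectable today. This is a sketch for illustration only: the type names, the `selectableModels` helper, and the `"paid"` plan label are assumptions, and the model strings are the display names used in this post rather than API identifiers. It also assumes that "locked to GPT-4o-mini" means the free plan exposes no other provider.

```typescript
type ProviderId = "openai" | "gemini" | "claude" | "grok";
type Plan = "free" | "paid"; // "paid" is a hypothetical stand-in for Starter and above

interface ProviderInfo {
  label: string;
  status: "available" | "planned";
  models: string[];
}

// Availability as described for v1.0.0 in this post.
const PROVIDERS: Record<ProviderId, ProviderInfo> = {
  openai: { label: "OpenAI", status: "available", models: ["GPT-4o", "GPT-4o-mini", "GPT-4 Turbo"] },
  gemini: { label: "Google Gemini", status: "available", models: ["Gemini 1.5 Pro", "Gemini 1.5 Flash"] },
  claude: { label: "Anthropic Claude", status: "planned", models: [] },
  grok: { label: "xAI Grok", status: "planned", models: [] },
};

// Returns the models a store could pick for a given provider and plan.
function selectableModels(provider: ProviderId, plan: Plan): string[] {
  const info = PROVIDERS[provider];
  if (info.status !== "available") return []; // planned providers are not selectable yet
  if (plan === "free") {
    // Assumption: the free plan is limited to a single OpenAI model.
    return provider === "openai" ? ["GPT-4o-mini"] : [];
  }
  return info.models;
}
```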
What ReviewSense Actually Sends to the LLM
This is the part of the architecture that most users never see — and it is where the real work happens. When you click Generate Reply on a review, ReviewSense does not just forward the review text to the API and hope for the best. It assembles a structured prompt from several sources and sends a carefully constructed request.
```typescript
// Simplified prompt assembly (actual implementation in PHP)

// Assumed input shapes, inferred from the fields the template reads.
interface Review {
  starRating: number;
  productName: string;
  text: string;
}

interface BrandVoice {
  businessName: string;
  toneDescription: string;
  preferredPhrases: string[];
  avoidPhrases: string[];
}

function buildReplyPrompt(review: Review, brandVoice: BrandVoice): string {
  return `
You are a customer support specialist for ${brandVoice.businessName}.

BRAND VOICE:
Tone: ${brandVoice.toneDescription}
Phrases to use: ${brandVoice.preferredPhrases.join(', ')}
Phrases to avoid: ${brandVoice.avoidPhrases.join(', ')}

REVIEW TO REPLY TO:
Star rating: ${review.starRating}/5
Product: ${review.productName}
Customer review: "${review.text}"

Write a reply that:
- Matches the tone and voice described above
- Addresses the specific points the customer raised
- Is warm and human-sounding, not corporate or generic
- Is 2–4 sentences long unless the review warrants more
- Does NOT use phrases from the avoid list

Reply only with the response text. No labels, no formatting.
`.trim();
}
```

In the request itself, this content is split across two messages. The system prompt includes your brand voice configuration: the tone description you wrote, the phrases you want used, and the phrases that should never appear. The user prompt contains the star rating, the product name, and the full review text. Star rating matters because a five-star review and a two-star review require completely different emotional registers, and sending that context explicitly produces far better results than leaving the model to infer it.
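The system/user split described above could be sketched as follows. This is an illustration rather than the plugin's code (which is PHP): `buildReplyMessages` and `ChatMessage` are hypothetical names, and the decision to put the output rules in the system message is an assumption.

```typescript
interface BrandVoice {
  businessName: string;
  toneDescription: string;
  preferredPhrases: string[];
  avoidPhrases: string[];
}

interface Review {
  starRating: number; // 1 to 5
  productName: string;
  text: string;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Brand voice goes in the system message; the review itself goes in the user message.
function buildReplyMessages(review: Review, brandVoice: BrandVoice): [ChatMessage, ChatMessage] {
  const system = [
    `You are a customer support specialist for ${brandVoice.businessName}.`,
    `Tone: ${brandVoice.toneDescription}`,
    `Phrases to use: ${brandVoice.preferredPhrases.join(", ")}`,
    `Phrases to avoid: ${brandVoice.avoidPhrases.join(", ")}`,
    "Reply only with the response text. No labels, no formatting.",
  ].join("\n");

  const user = [
    `Star rating: ${review.starRating}/5`,
    `Product: ${review.productName}`,
    `Customer review: "${review.text}"`,
  ].join("\n");

  return [
    { role: "system", content: system },
    { role: "user", content: user },
  ];
}
```

Keeping the brand voice in its own message means it can stay identical across every review, while only the user message changes per request.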
Sentiment Classification Without a Classifier
One of the design questions early on was whether we needed a separate sentiment classification step before generation — something like a dedicated model that reads the review and labels it as positive, frustrated, sarcastic, or urgent before passing those labels to the generation prompt.
We tested this and the answer turned out to be no, at least not as a separate model call. Modern LLMs are capable enough that you can ask them to both classify and respond in a single prompt, which replaces two sequential API calls with one and cuts latency roughly in half. ReviewSense leaves the classification implicit: the generation prompt includes the star rating and the raw review text, and the model reads the sentiment from those. A three-star review with the word "disappointed" in it generates a different reply than a three-star review that says "decent but expected more", and the model picks that up without needing a separate classification pass.
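If you did want the sentiment label back, for logging or routing say, the single-prompt approach could look like the sketch below. It is not how ReviewSense works today, since the plugin leaves sentiment implicit. The function names and the JSON response shape are assumptions; the four labels are the ones mentioned earlier in this section.

```typescript
const SENTIMENTS = ["positive", "frustrated", "sarcastic", "urgent"] as const;
type Sentiment = (typeof SENTIMENTS)[number];

interface ClassifiedReply {
  sentiment: Sentiment;
  reply: string;
}

// One instruction that asks for the label and the reply together.
function buildCombinedInstruction(starRating: number, reviewText: string): string {
  return [
    `Star rating: ${starRating}/5`,
    `Customer review: "${reviewText}"`,
    `First label the review as one of: ${SENTIMENTS.join(", ")}.`,
    "Then write the reply.",
    'Respond with JSON only: {"sentiment": "<label>", "reply": "<text>"}',
  ].join("\n");
}

// Validates the model's raw text; returns null for anything malformed.
function parseCombinedResponse(raw: string): ClassifiedReply | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { sentiment, reply } = data as Record<string, unknown>;
  const isKnownLabel = (SENTIMENTS as readonly unknown[]).indexOf(sentiment) !== -1;
  if (!isKnownLabel || typeof reply !== "string" || reply.trim() === "") return null;
  return { sentiment: sentiment as Sentiment, reply: reply.trim() };
}
```

Returning `null` on a malformed response gives the caller a clear point to retry or to hold the review for manual handling.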
Tip
If you get a generated reply that misses the emotional register of the review — too cheerful for a complaint, or too apologetic for a positive review — the fix is usually in your Brand Voice setup. Add a note like: "For 1-2 star reviews, always acknowledge the frustration before offering a solution." That single instruction changes the output dramatically.
Automation: When the LLM Runs Without You
On the Starter plan and above, you can enable Auto-Reply. When a new review comes in, ReviewSense calls your chosen LLM automatically using the same prompt structure described above. Based on your automation rules (for example, auto-publish replies to 4-star and 5-star reviews and hold everything else), the generated reply either goes live immediately or lands in your approval queue.
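The publish-or-hold decision can be pictured as a single threshold check. This is a minimal sketch, assuming the rule is expressed as a minimum star rating; `AutomationRules` and `routeGeneratedReply` are hypothetical names, not the plugin's settings schema.

```typescript
type Decision = "publish" | "hold";

interface AutomationRules {
  // Replies to reviews at or above this rating go live without approval.
  autoPublishMinStars: number;
}

function routeGeneratedReply(starRating: number, rules: AutomationRules): Decision {
  // Anything that is not a whole rating from 1 to 5 goes to the approval queue.
  const isValidRating = Math.floor(starRating) === starRating && starRating >= 1 && starRating <= 5;
  if (!isValidRating) return "hold";
  return starRating >= rules.autoPublishMinStars ? "publish" : "hold";
}
```

With `autoPublishMinStars` set to 4, replies to 4-star and 5-star reviews are published and everything else waits for approval, which matches the example rule above.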
The important thing to understand is that Auto-Reply is calling your API key, consuming your quota, and using the model you selected. If you are on a tight API budget, you can set automation to only trigger for reviews above a certain star rating, which dramatically reduces the number of API calls made per month. You are always in control of both the quality gate and the spend.
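To see how a trigger threshold affects spend, you can count how many reviews would actually reach the API. The sketch below assumes Auto-Reply fires for reviews at or above a minimum rating; the helper name and the review counts in the usage note are made up for illustration.

```typescript
interface StarBucket {
  stars: number; // 1 to 5
  count: number; // reviews received per month at this rating
}

// Each triggering review costs one generation call against your own API key.
function estimateMonthlyCalls(buckets: StarBucket[], minStarsToTrigger: number): number {
  return buckets
    .filter((bucket) => bucket.stars >= minStarsToTrigger)
    .reduce((total, bucket) => total + bucket.count, 0);
}
```

For a hypothetical store receiving 10, 15, 25, 60, and 90 reviews per month at one through five stars, triggering on every review means 200 calls, while triggering only at four stars and above means 150.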
Why Your API Key Stays in WordPress
This is a deliberate security decision. When ReviewSense makes an LLM call, it calls the OpenAI or Gemini API directly from your WordPress server, not via a ReviewSense proxy. Your API key goes only to the provider you selected and is never transmitted to our servers. We have no access to it. The plugin encrypts the key before saving it through WordPress's built-in options API, and it is only decrypted in memory at the moment of the API call.
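A direct call of this kind boils down to one HTTPS request from your server to the provider. The sketch below builds such a request for OpenAI's chat completions endpoint without sending it. The endpoint URL and the `Authorization: Bearer` header follow OpenAI's public API; the function name and the returned shape are assumptions, and the plugin itself does this in PHP.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface ProviderRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the request only. The key appears in one place: the header sent to the provider.
function buildOpenAiRequest(apiKey: string, model: string, messages: ChatMessage[]): ProviderRequest {
  return {
    url: "https://api.openai.com/v1/chat/completions", // provider host, not a ReviewSense proxy
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages }),
  };
}
```

Because the only destination is the provider's own host, there is no point in the flow where a third party sees the key.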
There is a practical upside to this architecture beyond security: latency. Direct API calls from your server to OpenAI are typically 200–600ms depending on model and prompt size. Running those calls through an intermediary proxy would add 50–200ms of overhead on every single request. For a feature you might use dozens of times a day, that adds up.
Model Quality Differences in Practice
After running ReviewSense through a private beta with a group of WooCommerce store owners, a few patterns in model preference emerged. These are not definitive benchmarks — just observations from real usage.
- GPT-4o-mini: Best for stores with high review volume and tight API budgets. Consistently solid output, fast response times, and the lowest cost per call. The default choice for most stores.
- GPT-4o: Noticeably better at nuanced complaints — the kind of review where the customer is frustrated but trying to be fair. Worth the extra cost for stores in premium or high-touch niches.
- Gemini 1.5 Flash: Fastest of the currently supported models. Excellent for stores that prioritise speed and want near-instant auto-publishing on high-volume review days.
- Gemini 1.5 Pro: More thorough reasoning than Flash. Good middle ground between GPT-4o quality and Flash speed, particularly for longer, more detailed reviews.
Note
Anthropic Claude and xAI Grok are on the roadmap and will be added in a future release once integration testing is complete.
What We Are Building Next
The next meaningful prompt engineering improvement in our roadmap is context injection — the ability to pull relevant FAQ entries and policy text into the prompt automatically based on what the review is about. If a customer complains about a return, the model would see your return policy inline. If they ask about shipping, it would see your shipping FAQ. This is retrieval-augmented generation applied to a very specific, practical problem, and early tests show a significant improvement in how specific and trustworthy the replies feel.
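To make the idea concrete, here is a deliberately naive sketch of context injection that uses keyword matching to pick policy snippets. It is not the planned implementation, and a real retrieval step would likely be smarter than substring matching. All names, the snippet structure, and the `STORE POLICY CONTEXT` label are assumptions.

```typescript
interface PolicySnippet {
  topic: string;
  keywords: string[];
  text: string;
}

// Ranks snippets by how many of their keywords appear in the review.
function selectRelevantSnippets(
  reviewText: string,
  snippets: PolicySnippet[],
  maxSnippets: number = 2
): PolicySnippet[] {
  const lowerReview = reviewText.toLowerCase();
  return snippets
    .map((snippet) => ({
      snippet,
      hits: snippet.keywords.filter((k) => lowerReview.indexOf(k.toLowerCase()) !== -1).length,
    }))
    .filter((scored) => scored.hits > 0)
    .sort((a, b) => b.hits - a.hits)
    .slice(0, maxSnippets)
    .map((scored) => scored.snippet);
}

// Appends the chosen snippets to an existing prompt; leaves it untouched if none matched.
function injectContext(basePrompt: string, snippets: PolicySnippet[]): string {
  if (snippets.length === 0) return basePrompt;
  const context = snippets.map((s) => `[${s.topic}] ${s.text}`).join("\n");
  return `${basePrompt}\n\nSTORE POLICY CONTEXT:\n${context}`;
}
```

A review that mentions a refund would pull in the returns snippet and skip the shipping one, so the model answers with the store's actual policy instead of a generic apology.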
We are also looking at per-model output comparison — showing you side-by-side what GPT-4o and Claude would each generate for the same review, so you can pick your preferred output before publishing. This is not about adding complexity. It is about giving you confidence that what goes live is the best version available to you.
"The LLMs are going to keep getting better on their own. Our job is to make sure ReviewSense gets smarter about how it uses them."
— Shiv Srivastava, Founder, AdvikLabs