AI Insights

Steps to Using a Built-In AI Theme Classifier to Detect Fraud

Learn how to use an AI thematic coding tool to detect fraudulent survey responses more efficiently.

Survey fraud is on the rise—and honestly, it’s evolving faster than most of us can keep up. Not to sound like an old lady yelling at bots to get off her lawn, but back in my day, it used to be easy to spot a fake open-end. Now? Not so much. These days, identifying fraudulent responses takes a lot more time and energy—and unfortunately, it’s only getting worse.

Just how bad is it? A year ago, I ran a pet food study and ended up cutting about 20% of responses for quality. Last month, I ran a similar study… and cut 50%. Half. That’s bananas.

I have always used a multi-step QA process to catch fraud, and open-ends are just one part of that process, but they’re often the first place I look when something feels off—a gut-check, if you will. Unfortunately, these days, responses sound reasonable enough to slide under the radar, especially when you’re dealing with hundreds (or thousands) of them. What used to be a quick scan now feels like hunting for deepfakes in a pile of earnest comments.

So I started wondering: could I use an AI-powered thematic coding tool to make the review process faster—and smarter?

I already had a Displayr license, so I figured I’d experiment with their built-in classification tool. It was originally built to help with coding qualitative responses into themes, but I wanted to see if I could give it a side hustle: sniffing out the fakes. If you want to skip ahead to the results and how to try it, you can watch my video on YouTube.

Test 1 – Let’s Start with Themes

First, I used Displayr’s native coding to flag responses that didn’t fall into any clear category. The logic? Real people tend to say similar things—like “it’s healthy” or “my dog loves the taste”—while bots or lazy survey-takers say… weird stuff. Or nothing meaningful at all.

I started with 10 themes, which is the Displayr default. That flagged a few oddballs—about 20. Meh.

So I narrowed it down:

  • 8 themes = 31 unclassified
  • 6 themes = a few more, but still not the volume I expected—especially knowing half this dataset was garbage.

Bottom line: not good enough.
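
If it helps to see the idea in code rather than clicks, here’s a minimal sketch of that filtering step, assuming you’ve exported the coded data to a CSV. The file name, column names, and the “Unclassified” label are all made up for illustration; use whatever your own export calls them.

```python
import pandas as pd

# Hypothetical export of the theme coding -- column names are assumptions.
coded = pd.read_csv("pet_food_themes.csv")  # e.g. respondent_id, response, theme

# Anything the classifier couldn't place in a theme goes on the review pile.
unclassified = coded[coded["theme"].isna() | coded["theme"].eq("Unclassified")]

print(f"{len(unclassified)} of {len(coded)} responses fell outside every theme")
print(unclassified[["respondent_id", "response"]].head(20))
```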

Test 2 – AI fraud detection prompt

Next, I tried a different approach. I asked ChatGPT to help me write a prompt for Displayr’s AI coding bot. The goal? Teach the bot how to sniff out suspicious responses using specific fraud signals. Yes, I used a bot to teach a bot. See, I’m not the old lady yelling at kids to get off my lawn; I’m creating a bot drum circle (ChatGPT probably could have come up with a better metaphor there).

Here was my first attempt:

Analyze each open-ended response for signs of fraud, often caused by bots or AI tools used to complete surveys quickly for incentives. Use the criteria below. Flag as:

  • Fraudulent – if 2+ indicators are present
  • Suspicious – if 1 indicator is present
  • Clean – if no indicators are present

Fraud Indicators:

  • Generic or Irrelevant – Vague or off-topic responses that don’t directly answer the question.
  • AI-Like Language – Overly formal phrasing, perfect grammar, or unnatural transitions suggesting AI-generated text.
  • High Semantic Similarity – Meaning is nearly identical to other responses, even if wording differs.
  • Fast + Long – 25+ word responses completed in under 10 seconds.
  • Effort Inconsistency – One detailed response paired with minimal or blank others.
  • Contradicts Closed-Ends – Says one thing in open text that conflicts with earlier answers.
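
The counting rule itself is simple enough to write out, which is roughly what I was asking the bot to do on top of judging each indicator. Here’s a hedged Python sketch of that 2+/1/0 logic, with the one indicator you can score straight from paradata (25+ words in under 10 seconds) filled in as an example. The rest would still need a human or an LLM to judge, and every name here is my own invention, not anything built into Displayr.

```python
from typing import Dict

# The six indicators from the prompt, each judged True/False per response
# (however you obtain those judgments: manually, from an LLM, or from paradata).
INDICATORS = [
    "generic_or_irrelevant",
    "ai_like_language",
    "high_semantic_similarity",
    "fast_plus_long",          # 25+ words completed in under 10 seconds
    "effort_inconsistency",
    "contradicts_closed_ends",
]

def classify(flags: Dict[str, bool]) -> str:
    """Apply the prompt's rule: 2+ indicators = Fraudulent, 1 = Suspicious, 0 = Clean."""
    hits = sum(flags.get(name, False) for name in INDICATORS)
    if hits >= 2:
        return "Fraudulent"
    if hits == 1:
        return "Suspicious"
    return "Clean"

def fast_plus_long(response: str, seconds_on_question: float) -> bool:
    """The one indicator scorable straight from paradata: 25+ words in under 10 seconds."""
    return len(response.split()) >= 25 and seconds_on_question < 10

# Example: a 30-word answer typed in 6 seconds, flagged for nothing else.
example_flags = {"fast_plus_long": fast_plus_long("word " * 30, 6)}
print(classify(example_flags))  # -> "Suspicious"
```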

Sounds good, right? Except it didn’t really work. Turns out Displayr’s bot isn’t great at doing three things at once (identifying indicators, counting them, and classifying accordingly). Instead of applying the 2+/1/0 rule, it basically classified each response into one of the six fraud indicators I’d listed, which wasn’t what I wanted. So I simplified.

I ran several versions of the prompt, iterating as I went. I even considered cheating and feeding in the known frauds from my hand-coding (I didn’t). Instead, I gave ChatGPT a few open-ends and asked it to help spot the red flags. Eventually, I landed on something that actually worked.

The Final Prompt: Cleaner, Sharper, More Results

Here’s the best approach I came up with, and I hope it saves you some time. I tested several versions of this prompt, varying the number of repeated phrases and the examples I gave; this one performed best.

By the way, I didn’t come up with the list of repeated phrases myself; that was ChatGPT responding to the 50+ open-ends I loaded into it.

I also limited the bot to assigning just one category per response. That cleaned up the analysis and actually made it much easier to review responses that fell into each category.
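
As a sketch of what that review step looks like once each response carries exactly one label (again assuming an exported CSV; the file, column, and category names below are illustrative, not Displayr’s):

```python
import pandas as pd

# Hypothetical export of the final classification -- one fraud category per response.
classified = pd.read_csv("fraud_categories.csv")  # e.g. respondent_id, response, category

# One label per response means a simple tally tells you where to spend review time.
print(classified["category"].value_counts())

# Then work through a single bucket at a time, e.g. the repeated-phrase group.
repeated = classified[classified["category"] == "Repeated phrase"]
for _, row in repeated.iterrows():
    print(row["respondent_id"], "-", row["response"])
```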

What Happened When I Compared to Hand Coding?

Out of 435 responses flagged as “natural and detailed,” only 23 were ones I had manually cut—and those were mostly dismissed based on other inconsistencies in the data. Here’s one:

“It stands out because it's practical, well-designed, and offers great value for the price. Plus, it just seems like something that would make life easier or more enjoyable!”

Sounds legit, right? Well, this respondent failed other QA checks, so out they went.

The easiest to dismiss? Strange grammar and broken English (code 3). I cut all of these.

The hardest? Repeated but legit-sounding phrases (code 1). Some of those were genuine; others weren’t. It’s murky.
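
For the comparison itself, a simple cross-tab of the AI’s category against the earlier keep/cut decision is enough to surface numbers like the 435-versus-23 split above. A sketch, assuming both columns live in one file (names are illustrative):

```python
import pandas as pd

# Assumed columns: ai_category (the bot's label) and hand_decision ("keep" or "cut").
qa = pd.read_csv("ai_vs_hand_coding.csv")

# Rows are AI categories, columns are the manual decisions -- e.g. how many
# "natural and detailed" responses had already been cut by hand.
print(pd.crosstab(qa["ai_category"], qa["hand_decision"]))
```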

So… Can Displayr’s AI Catch Fraud?

Kind of.

No, it’s not the magic button I hope it one day becomes. But this process makes it much faster to sort open-ended responses into problematic and not. By combining the AI theme classifier with a custom fraud-detection prompt, I got surprisingly close to what I’d expect from a careful manual review.

My Big Takeaway?

You can’t trust the bots to do all the thinking (yet), but if you train them well, they’ll spot the obvious junk, narrow your focus, and help you clean up your dataset a whole lot faster.

I would love to hear if others are using AI as fraud detectors. I decided to start with the tools I already had at my disposal, but if you want to recommend another tool, I’m happy to test it.