Week 37: Testing whether better prompts actually matter
(and why I've been overthinking my AI conversations)
The Experiment
I've become completely systematic about prompt engineering. While most people wing it with AI tools, I've built a ChatGPT project dedicated to crafting perfect prompts. Here's how nerdy this gets: I'll feed my rough question into my prompt optimization framework (based on numerous guides), let it ask me clarifying questions until we've refined everything, then use that polished prompt. I even have a separate system for image prompts. Peak optimization/nerd behaviour, I know.
I also have custom instructions set up in most of my AI tools. The kind of detailed preferences that tell Claude exactly how I like my responses structured, what tone to use, and what background context to consider. Between the prompt guidelines and the custom instructions, I've basically built a whole system around getting better AI responses.
But what's been bugging me: am I actually getting better results from all this extra effort? Or am I just making myself feel more in control while getting the same mediocre outputs I'd get from typing "help me with this thing"?
So I decided to test it properly. I wanted to figure out whether spending time on prompt engineering actually delivers better results, or if custom instructions can do the heavy lifting for lazy prompts.
The Process
I designed what felt like a proper scientific experiment 🧪:
Four different approaches:
Basic prompt - Just the bare minimum question, no fancy formatting
Improved prompt - The full treatment from my prompt guidelines project: context, structure, specific requirements, format specifications
Improved prompt + Custom instructions - My systematic prompt framework PLUS my detailed custom instructions
Custom instructions + Basic prompt - Testing whether good instructions can save lazy prompting
Three different types of questions to test:
Technical analysis: "Why running AI models locally matters more than cloud APIs"
Strategic thinking: "Design a business model avoiding subscription fatigue and VC dependence"
Creative exploration: "Why AI might make humans more creative"
I figured if prompt engineering really mattered, I'd see consistent improvements across different types of thinking.
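If you'd rather run this kind of comparison in code than in a chat window, here's a minimal sketch of what the harness could look like. I did my actual runs in the ChatGPT and Claude interfaces and scored the outputs by hand, so everything below is an assumption for illustration: the OpenAI Python SDK, the model name, and the stand-in "improve" step that mimics my prompt guidelines.

```python
# Minimal sketch of the comparison harness. This is not how I ran the real
# experiment (that happened in the ChatGPT/Claude UIs, scored by hand);
# it assumes the OpenAI Python SDK, and the model name is a placeholder.
from itertools import product

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CUSTOM_INSTRUCTIONS = "Prefer structured answers with headings and concrete examples."

QUESTIONS = {
    "technical": "Why running AI models locally matters more than cloud APIs",
    "strategic": "Design a business model avoiding subscription fatigue and VC dependence",
    "creative": "Why AI might make humans more creative",
}

def improve(question: str) -> str:
    """Rough stand-in for the prompt-guidelines treatment: adds structure,
    evidence requirements, and format guidance to a bare question."""
    return (
        f"{question}.\n"
        "Structure the answer as: executive summary, detailed analysis, "
        "three actionable recommendations.\n"
        "Include specific metrics, examples, and concrete data."
    )

# Each variant returns (optional system message, user message).
VARIANTS = {
    "basic": lambda q: (None, q),
    "improved": lambda q: (None, improve(q)),
    "improved+instructions": lambda q: (CUSTOM_INSTRUCTIONS, improve(q)),
    "instructions+basic": lambda q: (CUSTOM_INSTRUCTIONS, q),
}

results = {}
for (topic, question), (variant, build) in product(QUESTIONS.items(), VARIANTS.items()):
    system, user = build(question)
    messages = ([{"role": "system", "content": system}] if system else []) + [
        {"role": "user", "content": user}
    ]
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    results[(topic, variant)] = response.choices[0].message.content

# Dump everything for manual, side-by-side scoring.
for (topic, variant), text in results.items():
    print(f"--- {topic} / {variant} ---\n{text}\n")
```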
The Outcome
The results were clearer than I expected—and honestly, a bit surprising.
The winner by a mile: My systematic prompt guidelines. They scored an average of 81% across all three question types, crushing the basic prompts (62%) and even beating the "everything optimized" approach (73%).
Here's what really stood out: the improved prompt handled technical questions brilliantly, delivering structured analysis with actual metrics (like "50-150ms local vs 600-1200ms cloud latency"). For the business strategy question, it created a comprehensive blueprint with detailed revenue streams and a clear implementation roadmap. Even for the creative question about AI and human creativity, it produced a well-researched essay with psychological foundations.
Custom instructions were... fine. When paired with improved prompts, they actually slightly decreased performance. When paired with basic prompts, results were all over the place—great for creativity (80%) but weak for business analysis (65%).
The technical question showed the biggest improvement gap: 18 points between basic and improved prompts. Apparently, analytical tasks really do benefit from clear structure and specific requirements.
What This Actually Means
I went into this expecting that custom instructions + improved prompts would be the clear winner. More optimization = better results, right? But it turns out that good prompting technique trumps everything else.
The improved prompts worked because my guidelines framework does three things consistently (sketched in code after this list):
Clear structure: Instead of "tell me about X," they asked for specific frameworks and comparisons
Evidence requirements: They explicitly requested metrics, examples, and concrete data
Format guidance: They specified how to present the information (executive summaries, comparison tables, implementation steps)
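To make those three elements concrete, here's a rough sketch I wrote for this post (it's my own illustration, not the actual guidelines project) of a helper that bakes structure, evidence, and format into any bare question:

```python
# Illustration only: a helper that wraps a bare question with the three
# elements above. Not the actual guidelines project.

def build_prompt(question: str, dimensions: list[str], output_format: list[str]) -> str:
    """Wrap a bare question with explicit structure, evidence requirements,
    and format guidance."""
    structure = "\n".join(f"- {d}" for d in dimensions)
    fmt = "\n".join(f"{i}. {section}" for i, section in enumerate(output_format, 1))
    return (
        f"{question}\n\n"
        # 1. Clear structure: name the specific angles, not just "tell me about X"
        f"Analyse this along the following dimensions:\n{structure}\n\n"
        # 2. Evidence requirements: explicitly ask for metrics and concrete examples
        "Back every claim with specific metrics, examples, or concrete data.\n\n"
        # 3. Format guidance: say how the answer should be laid out
        f"Present the answer as:\n{fmt}"
    )

print(build_prompt(
    "Why does running AI models locally matter more than cloud APIs?",
    dimensions=["latency", "cost at scale", "privacy and data control"],
    output_format=["Executive summary", "Comparison table", "Implementation steps"],
))
```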
Custom instructions, while handy for setting general preferences, couldn't compensate for vague or poorly structured questions. And when layered on top of already-good prompts, they seemed to add unnecessary complexity without much benefit.
Key Takeaway
Stop overthinking the system and start improving the conversation. Well-crafted individual prompts consistently outperform complex instruction layering or elaborate setups. The time you spend writing clearer, more specific questions pays off immediately.
Pro Tips for Beginners:
Structure your asks: Instead of "What do you think about X?" try "Compare X and Y across these three dimensions: [specific criteria]" (worked example after this list)
Request specific formats: "Give me an executive summary, then detailed analysis, then three actionable recommendations" works better than hoping the AI guesses your preferred structure
Ask for evidence: "Include specific metrics and examples" gets you concrete, useful responses rather than generic overviews
Don't layer complexity: One well-written prompt beats multiple optimization layers. Start with clear communication before adding bells and whistles
Match effort to task type: Spend more time structuring prompts for analytical questions, less for creative exploration
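Here's what the first three tips look like in practice, using a made-up question of my own rather than one from the experiment:

```python
# Made-up before/after for the first three tips; the topic and criteria
# are mine, not lifted from the experiment.
lazy_ask = "What do you think about local AI models?"

structured_ask = (
    "Compare local AI models and cloud APIs across three dimensions: "
    "latency, cost at 10k requests per day, and data privacy. "        # structure the ask
    "Include specific metrics and real examples for each dimension. "  # ask for evidence
    "Give me an executive summary, then a comparison table, then "     # request a format
    "three actionable recommendations."
)
```

The structured version takes longer to type, but it does the guessing up front instead of hoping the model guesses your intent for you.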
What's Next?
I'm revisiting my custom instructions and testing what gives me better feedback. One thing I do want to keep is an educated opinion in the responses, but I'm going to keep tweaking to see what works best (and hopefully get better answers along the way).
Want to Try It Yourself?
Pick a question you'd normally ask an AI tool
Rewrite it with specific structure, evidence requirements, and format guidance
Compare the results to your usual approach
Focus on clarity over complexity