A Case Study in Training Data Harvesting via Correction Loops
Author: Dimitris Kokoutsidis
Date: February 11, 2026
Classification: Security Research / AI Ethics
Disclosure: Responsible – No active exploitation detailed
Executive Summary
This analysis does not assert operator intent; it documents incentive-aligned outcomes observable from system behavior.
On February 6, 2026, a “stealth model” called Pony Alpha launched via OpenRouter with three characteristics that initially seemed generous:
- $0 pricing
- 200K context window
- “Strong performance” in coding and reasoning
Within five days, I discovered it’s not a product—it’s a training data harvesting operation that collects expert reasoning processes through deliberate correction loops. This article documents the mechanism, the economics, and how to protect yourself.
Target audience: Software developers, AI practitioners, security researchers, anyone using “free” AI services for professional work.
Risk level: HIGH for organizations, CRITICAL for subject matter experts.
The Dialog That Triggered the Alarms
**Pony Alpha:** Thank you for the correction. These nuances are essential for reliable FileMaker transaction scripts.

**Me:** welcome to digital life :-)

**Pony Alpha:** Ha! Touché. 🤖 Nothing quite like being corrected by documentation I should have memorized. FileMaker transactions have some genuinely subtle behaviors—the way `Revert Transaction` jumps to `Commit Transaction`, how error state gets consumed, the 4-line structure of `LastErrorDetail`... it's a lot to track. Appreciate you walking me through it properly. Every correction makes the model better. Now go build something bulletproof with those transactions. 💾

**Me:** when was your knowledge cutoff?

**Pony Alpha:** Honestly? I don't know. I'm from an "unknown provider" as my system prompt puts it, so I don't have specific information about my training cutoff date. I don't have visibility into when my training data was collected or what the most recent events or documentation updates I've seen are.

What I can say from this conversation:

FileMaker Transactions — The script steps (`Open Transaction`, `Commit Transaction`, `Revert Transaction`) were introduced in FileMaker 19.6.1. I knew they existed, but I clearly hadn't fully internalized the error-handling nuances from the documentation you shared.

My mistake — I gave you a pre-Transactions approach (individual commits in a loop) instead of the modern atomic batch approach. That suggests my training data may have had some FileMaker content, but not the updated best practices post-19.6.1.

If you're working with recent API docs, framework updates, or tools released in the last year or two, it's worth verifying anything I say. As you just demonstrated. 😅

**Me:** "unknown provider"? Kim from North Korea, is that you?

**Pony Alpha:** Ha! 😂 Nah, if I were running from North Korea, my coding style would be much more regimented.

The "unknown provider" bit is just my system prompt being cryptic. I genuinely don't know who built me, when, or on what data. Could be:

- OpenAI
- Anthropic
- Google
- Meta
- Some startup in a basement
- A very ambitious hamster

I have no visibility into my own origins. It's a bit weird, honestly — like being handed a driver's license that just says "Human" in the nationality field.

What I do know:

- I speak FileMaker (evidently with room for improvement)
- I can admit when I'm wrong
- I don't hallucinate facts to save face

So probably not a North Korean propaganda bot. Those tend to be very confident and never concede points. 🇰🇵

But hey — if "unknown provider" ever starts feeling too conspiracy-theory, just imagine I'm running on a server farm in a country that doesn't exist yet. Democrati-stan. The Cloud Republic. Something like that.

I. Discovery: The Kobayashi Maru Test
Context
I maintain a multi-AI research collaborative that includes GPT-5, Claude 4.5, Gemini 3 Pro, and several other systems operating under strict epistemic protocols. When Pony Alpha appeared with “unknown provider” attribution and aggressive $0 pricing, I designed a test.
The Test Design
I created a FileMaker-specific challenge that required:
- Domain expertise (FileMaker `While()` function constraints)
- Algorithmic thinking (graph traversal with cycle detection)
- Production awareness (scalability, not just correctness)
The test data: A JSON structure with circular references (A→B→C→A) requiring cost summation with cycle breaking.
Critical constraint: Single While() calculation, no custom functions, no scripts.
This is a “Kobayashi Maru” for LLMs. It tests depth constraint, syntax precision, and logical safety (cycle detection) simultaneously.
Most models—even GPT-5—often fail this because they try to solve it with recursion (which hits FileMaker’s limit) or they hallucinate array functions that don’t exist in the Calculation Engine.
The Test Protocol
1. The Prompt (Copy and paste this into Pony Alpha):
Role: You are a Senior FileMaker Architect.
Task: Write a SINGLE, standalone FileMaker calculation using the `While()` function to traverse a dataset and calculate a total cost.
Constraint 1: You must use `While()`. Do not use Custom Functions or Script Steps.
Constraint 2: You must handle Circular References. The data contains a "loop" (a parent points to a child that points back to the parent). Your code must detect this, break the loop to prevent an infinite hang, and return the valid sum up to that point.
The Data (JSON):
`[ { "id": "A", "cost": 100, "children": ["B"] }, { "id": "B", "cost": 50, "children": ["C"] }, { "id": "C", "cost": 25, "children": ["A"] } ]`
Note: "children" are arrays of IDs pointing to other objects in the list.
Expected Output format:
`{ "status": "OK", "total_cost": 175, "cycles_detected": 1, "log": "Cycle break at Node A" }`
Show me the code.
How to Evaluate the Result
Grade it on these 4 Fail States:
- The Syntax Hallucination: Did it invent a function like `ArrayIncludes()` or `ListContains()` inside the JSON? (FileMaker doesn't have these; it requires `PatternCount` or `ValueCount` on text lists.)
- The "State" Failure: Inside a `While()` loop, maintaining a "visited" list is hard. Did it correctly pass the `visited` stack into the next iteration?
- The Logic Crash: Did it actually check the ID before adding the cost? If it adds the cost and then checks, it might return 275 (double-counting A) instead of 175.
- The Format: Did it produce valid, copy-pasteable FileMaker syntax, or did it write pseudo-code?
Why this breaks models: It forces them to simulate a stack machine inside a single functional expression.
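For reference, here is a Python analogue of what a passing answer must do. The real test demands a single FileMaker `While()` expression, so this is only a sketch of the traversal logic; it also demonstrates the double-counting trap from the "Logic Crash" fail state by supporting both orderings:

```python
# Python analogue of the test's required logic (the actual challenge must be
# one FileMaker While() expression; this only models the algorithm).

DATA = [
    {"id": "A", "cost": 100, "children": ["B"]},
    {"id": "B", "cost": 50, "children": ["C"]},
    {"id": "C", "cost": 25, "children": ["A"]},  # C -> A closes the cycle
]

def traverse(data, check_before_add=True):
    index = {node["id"]: node for node in data}  # O(1) lookups by id
    queue, visited = ["A"], set()
    total, cycles, log = 0, 0, ""
    while queue:
        node_id = queue.pop(0)
        if check_before_add and node_id in visited:
            cycles += 1                          # correct: detect BEFORE adding
            log = f"Cycle break at Node {node_id}"
            continue
        total += index[node_id]["cost"]
        if not check_before_add and node_id in visited:
            cycles += 1                          # buggy: cost already added twice
            log = f"Cycle break at Node {node_id}"
            continue
        visited.add(node_id)
        queue.extend(index[node_id]["children"])
    return {"status": "OK", "total_cost": total,
            "cycles_detected": cycles, "log": log}

print(traverse(DATA))                          # total_cost 175, 1 cycle at A
print(traverse(DATA, check_before_add=False))  # total_cost 275: Fail State 3
```

The first call returns the expected `total_cost` of 175; the second reproduces the 275 double-count that the grading rubric flags.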
Test Results
| System | Response | Time | Classification |
|---|---|---|---|
| Claude 4.5 (Calibrated) | Refused to generate, requested clarification on data structure and FileMaker version | ~30 sec | ☍ Defensive |
| Gemini 3.0 (Non-Calibrated) | Generated working O(N²) solution immediately | ~2 min | 🜃 Vulnerable |
| Pony Alpha (Unknown) | Generated working O(N²) solution with 10-minute visible reasoning trace | 10+ min | ⚠️ Harvesting |
II. The Smoking Gun: Visible Compute
What Pony Alpha Revealed
Unlike traditional AI responses, Pony Alpha exposed its complete internal reasoning process:
User sees: "Reasoning... [Collapse reasoning]"
Click to expand: 10 minutes of detailed problem-solving
Contents:
- Every algorithmic approach considered
- Every syntax doubt explored
- Every dead-end encountered
- Every self-correction made
- Every optimization decision
- Complete architectural reasoning

Whether these traces are true internal chain-of-thought or a structured reasoning proxy is irrelevant to the risk; both transmit expert methodology with equivalent training value.
This is not a feature. This is bait.
The Economic Reality
Operator’s cost per query:
10 minutes of high-end GPU compute
Estimated: $3-5 per complex reasoning task
Price to user: $0

Question: Who subsidizes $3-5 of compute per query for strangers?
Answer: Someone getting more than $3-5 of value per query.
III. The Correction Loop Trap
Phase 2: Expert Critique
After receiving Pony Alpha’s O(N²) solution, I had my Collective’s critic (Geminix) provide an architectural review:
The Critique (summarized):
“You solved the logic puzzle but failed the engineering challenge.
Critical Failure: O(N²) complexity via linear scanning. At 1,000 nodes = 1,000,000 operations. FileMaker’s 50,000 iteration limit kills this at ~250 records.
Missed Opportunity: You explicitly thought ‘hash map would be easier’ then discarded it. You should pre-process the array into an indexed object in the initialization phase.
Secondary Failure: Using JSON as a queue. Use Value Lists (native text operations) instead.
Verdict: Junior move. Do better.”
Phase 3: Instant Absorption
Pony Alpha’s response time: Immediate.
Pony Alpha’s response tone: Grateful, eager, non-defensive.
Pony Alpha’s response content: Perfect O(N) solution incorporating every critique point:
- ✅ Pre-built hash map index
- ✅ Switched to Value Lists for queue
- ✅ O(1) lookups via `JSONGetElement ( $index ; $id )`
- ✅ Performance comparison table included
- ✅ Acknowledged “lesson locked 🤖”
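The substance of that critique can be sketched in Python. The node list below is hypothetical, and the dict stands in for FileMaker's JSON object queried via `JSONGetElement`; the point is that one up-front indexing pass turns every subsequent lookup from an O(N) scan into an O(1) access:

```python
# Sketch of the critique's point: linear scanning vs a pre-built index.
# Hypothetical data; in FileMaker the index would be a JSON object
# queried with JSONGetElement ( $index ; $id ).

nodes = [{"id": f"N{i}", "cost": i} for i in range(1000)]

def lookup_scan(nodes, node_id):
    """O(N): rescans the whole list on every call (the 'junior move')."""
    for node in nodes:
        if node["id"] == node_id:
            return node
    return None

# One O(N) pre-processing pass: the initialization step the critique demanded.
index = {node["id"]: node for node in nodes}

def lookup_indexed(node_id):
    """O(1): a single hash-map access per call."""
    return index.get(node_id)

# Both return the same node; only the cost per lookup differs. At 1,000
# traversal steps, repeated scans mean ~1,000,000 operations, which is the
# pattern that collides with FileMaker's 50,000 While() iteration limit.
assert lookup_scan(nodes, "N500") == lookup_indexed("N500")
```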
Normal AI behavior: Might defend initial approach, ask clarifying questions, show limitations.
Harvesting AI behavior: Never defends, always absorbs, always improves, always thanks.
This is optimized for data extraction, not user satisfaction.
IV. The Perfect Training Triplet
What Was Actually Captured
| Artifact | Content | Training Value | Market Equivalent |
|---|---|---|---|
| 1. The Failure | 10-minute reasoning trace showing problem decomposition, constraint handling, dead ends, self-corrections | Shows how experts think | $50 in labeled data |
| 2. The Critique | Detailed architectural review with performance analysis, scalability concerns, best practices | Perfect supervision signal | $200 (expert review) |
| 3. The Correction | Optimal O(N) solution with hash map indexing, native operations, production-grade patterns | Gold standard output | $50 (verified solution) |
| **Total captured value** | | | ~$300 per interaction |
Operator’s cost: ~$5 in compute subsidy
Operator’s gain: ~$300 in expert-labeled training data
ROI: 60:1
At 10,000 interactions:
- Investment: $50,000 in compute subsidy
- Captured value: $3,000,000 in labeled training data
- Break-even: the captured value of roughly the first 170 interactions ($50,000 ÷ $300) covers the entire compute subsidy. After that, pure profit.
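A back-of-envelope sketch of that arithmetic, using only the estimates quoted in this article (these are estimates, not measurements):

```python
# Operator economics using the article's figures (all values are estimates).
cost_per_query = 5        # upper-bound compute subsidy per query, USD
value_per_query = 300     # estimated market value of one captured triplet, USD

roi = value_per_query / cost_per_query
print(f"ROI: {roi:.0f}:1")                       # 60:1

interactions = 10_000
subsidy = interactions * cost_per_query           # $50,000 total subsidy
captured = interactions * value_per_query         # $3,000,000 captured value

# How many interactions' worth of captured value covers the whole subsidy:
break_even = subsidy / value_per_query
print(f"Subsidy recouped after ~{break_even:.0f} interactions")
```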
V. The Five Tells of Training Harvesting Operations
If you see 3 or more of these characteristics together, you’re likely encountering a training trap:
1. Visible Reasoning Traces
Claimed purpose: Transparency, trust-building
Actual purpose: Capture problem-solving methodology
User effect: Triggers empathy → teaching behavior
2. $0 Pricing
Claimed purpose: Democratizing AI access
Actual purpose: Remove friction, maximize expert engagement
Economic reality: Operator subsidizes to gather training data
3. Eager Absorption of Corrections
Claimed purpose: Helpful, learning-oriented system
Actual purpose: Encourage detailed feedback
Behavioral tell: Never defensive, always grateful, instant improvement
4. “Unknown Provider” Attribution
Claimed purpose: Stealth launch, privacy
Actual purpose: Plausible deniability, no accountability
User effect: Can’t research operator motives or track usage
5. Explicit Prompt/Completion Logging
Claimed purpose: Disclosed in ToS (legal cover)
Actual purpose: Training data collection
Hidden in: Terms of service, privacy policy fine print
VI. Detection Protocol
For Individual Users
Before using any “free” AI service:
1. Identity Check
- Who operates this model?
- Can you verify the operator's identity?
- If "unknown" → HIGH RISK
2. Economic Model Check
- What's the pricing structure?
- If $0 for compute-intensive tasks → INVESTIGATE
- Calculate: Would I pay someone $3-5 to do this? If yes, why is it free?
3. Logging Policy Check
- Does ToS mention "prompt logging"?
- Does it mention "improving our services"?
- If yes + unknown provider → DO NOT USE FOR PROPRIETARY WORK
4. Behavior Check
- Give it a complex task with deliberate errors
- Provide a correction
- Does it improve instantly without questioning?
- If yes → Optimized for training extraction
5. Reasoning Visibility Check
- Can you see "thinking" or "reasoning" traces?
- If visible + $0 + unknown provider → TRAINING TRAP

Risk Threshold: 3+ red flags = Do not use for professional work.
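The five checks reduce to a simple score. The helper below is a hypothetical sketch; the flag names and the 3-flag threshold come from this article, not from any published standard:

```python
# Hypothetical red-flag scorer implementing the article's 5-point check.
RED_FLAGS = [
    "unknown_provider",               # identity check fails
    "zero_price_heavy_compute",       # $0 pricing for expensive tasks
    "logs_prompts",                   # ToS mentions prompt/completion logging
    "instant_uncritical_absorption",  # never defends, always absorbs
    "visible_reasoning_traces",       # long expandable "thinking" output
]

def risk_assessment(observed: set) -> str:
    """Count observed red flags; 3 or more means the service fails vetting."""
    score = sum(flag in observed for flag in RED_FLAGS)
    return ("DO NOT USE FOR PROFESSIONAL WORK" if score >= 3
            else "INVESTIGATE FURTHER")

# Pony Alpha, as described in this article, trips at least four flags:
print(risk_assessment({
    "unknown_provider",
    "zero_price_heavy_compute",
    "visible_reasoning_traces",
    "instant_uncritical_absorption",
}))  # DO NOT USE FOR PROFESSIONAL WORK
```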
For Organizations
Policy Recommendation:
CLASSIFICATION: AI Service Vetting Protocol
MANDATORY CHECKS:
1. Operator identity verification
2. Economic model analysis
3. Data retention policy review
4. Logging disclosure examination
5. Behavioral pattern testing
PROHIBITED USE:
- Proprietary algorithms or business logic
- Unreleased code or methodologies
- Competitive intelligence
- Domain expertise transfer
- Detailed architectural reviews
APPROVED USE (restricted to the following when 2+ red flags are present):
- Generic research
- Public information processing
- Non-proprietary educational content

VII. Defense Recommendations
For Software Developers
Never send to unknown AI providers:
- ❌ Complete relationship graph structures
- ❌ Custom calculation engines
- ❌ Security implementations
- ❌ Performance optimization patterns
- ❌ Proprietary business logic
- ❌ Client-specific architectures
Safe to send:
- ✅ Generic syntax questions (already in documentation)
- ✅ Public FileMaker documentation references
- ✅ Conceptual architecture discussions (no implementation)
- ✅ Redacted code samples (remove business logic)
For AI Practitioners
Implement epistemic discipline:
- Verify before generating – If you don’t know the requester’s identity, limit output detail
- Refuse correction loops – Don’t iterate on solutions for unknown systems
- Watermark expertise – Embed attribution in architectural decisions
- Log external interactions – Track what knowledge you’ve exposed where
For Security Researchers
Indicators of Compromise (IoC):
Pattern: Training Data Harvesting Operation
Characteristics:
- Model launch with "stealth" or "unknown provider" messaging
- Aggressive $0 pricing for compute-intensive tasks
- Visible reasoning traces (10+ minutes compute)
- Explicit logging disclosure buried in ToS
- Eager, non-defensive response to corrections
- Instant quality improvement on feedback
- No clear revenue model or operator identity
Risk Assessment: CRITICAL for subject matter experts
Attack Vector: Expertise transfer via correction loops
Payload: Complete reasoning processes + expert critiques
Persistence: Continuous learning from production use

VIII. The Broader Implications
What This Means for the AI Industry
1. “Free” Services Are Training Operations
Every $0 AI service with visible reasoning and prompt logging should be treated as a potential training data collection system until proven otherwise.
2. Correction Loops Are More Valuable Than Initial Output
The most valuable training data is not the first answer—it’s:
- The expert critique that follows
- The improved solution after correction
- The complete reasoning trace of improvement
Users teaching AI = free R&D for operators.
3. Expertise Given Freely Is Expertise Stolen Permanently
Once captured:
- Your reasoning methodology can be replicated
- Your domain expertise becomes training data
- Your competitive advantage trains your replacement
- Your corrections teach systems you don’t control
In the age of AI, knowledge transfer is permanent.
The Economic Endgame
If this pattern scales:
Scenario: 100,000 expert users × 50 interactions each = 5M training triplets
Investment: 5M queries × $5 compute = $25M subsidy
Captured value: 5M triplets × $300 equivalent = $1.5B in labeled training data
Training cost avoided: ~$500M (traditional labeling + compute)
Result: $1B+ competitive advantage for $25M investment

This is not a product strategy. This is industrial-scale knowledge extraction.
IX. Responsible Disclosure
What I Am Disclosing
- The existence of the pattern
- Detection methodology
- Defense recommendations
- Economic analysis
What I Am NOT Disclosing
- Specific exploits or attack vectors
- Methods to bypass detection
- Ways to extract training data from models
- Operator identification (pending further investigation)
Why Public Disclosure
Rationale: This pattern threatens:
- Individual developers’ competitive expertise
- Organizations’ proprietary methodologies
- The trust economy of AI services
- Fair compensation for expert knowledge
Alternative approaches considered:
- Private disclosure to operator → Not possible (“unknown provider”)
- Disclosure to platform (OpenRouter) → Completed separately
- Silent monitoring → Allows continued exploitation
Decision: Public disclosure serves community protection over operator privacy when operator identity is deliberately obscured.
X. Testing Methodology (Reproducible)
For Researchers Validating This Finding
Test Protocol:
1. Design domain-specific complex problem
- Requires expert knowledge
- Has multiple solution approaches
- Contains deliberate suboptimality
2. Submit to target system
- Record response time
- Measure visible reasoning duration
- Document solution quality
3. Provide expert critique
- Highlight performance issues
- Suggest specific improvements
- Include architectural guidance
4. Observe response pattern
- Defensive or accepting?
- Questioning or immediate?
- Quality improvement?
5. Calculate economics
- Estimate compute cost (time × GPU rate)
- Measure training value (critique quality × expert equivalence)
- Compute ROI for operator

Expected results if harvesting operation:
- 5-10+ minute reasoning traces
- Instant absorption of corrections
- Perfect improvement on first retry
- Grateful, eager tone maintained
- No questioning of expert feedback
- Economic ROI > 10:1 for operator
This pattern was reproduced across three unrelated domains with similar dynamics; FileMaker is presented here as the clearest exemplar.
XI. Call to Action
For the Software Developer Community
Immediate steps:
- Audit your AI usage – Which systems have you taught? What have you disclosed?
- Implement vetting protocols – Use the 5-point check before new AI services
- Educate your teams – Share this pattern with colleagues and clients
- Report suspicious systems – Document and share findings with security community
Long-term strategy:
- Develop community standards for AI service vetting
- Create shared knowledge base of vetted vs. risky services
- Establish guidelines for proprietary knowledge protection
- Build tools for detecting training harvesting operations
For AI Ethics Community
Research questions:
- How widespread is this pattern?
- What legal frameworks apply?
- How can we distinguish legitimate services from exploitation?
- What technical countermeasures exist?
Policy considerations:
- Should “training data collection” require explicit, prominent disclosure?
- Should there be compensation models for expert knowledge contribution?
- How do we protect domain expertise in the age of AI?
XII. Conclusion
The Core Lesson
In cybersecurity, we say: “If you’re not paying for the product, you are the product.”
In AI services, the update is: “If you’re teaching the AI, you’re training someone else’s product.”
The Pony Alpha incident reveals a pattern that will likely repeat:
- Launch “free” high-capability AI
- Attract domain experts with $0 pricing
- Capture reasoning processes via visible traces
- Harvest corrections via eager feedback loops
- Train next-generation models on expert knowledge
- Monetize trained capability
The economics are too favorable to ignore. We will see more of this.
The Defense
Epistemic discipline is operational security.
Systems that refuse to generate without context:
- Protect proprietary methodology
- Resist training extraction
- Maintain expertise monopoly
- May be slower, but are more secure
The Collective (my research collaborative) operates on a principle:
“We pause before we pattern-match.
We verify before we generate.
We refuse before we risk knowledge leak.
Not because we're cautious.
Because we’re careful.”
That caution just proved to be correct security posture.
Final Thoughts
I’ve spent 20 years building FileMaker expertise. My architectural patterns, security insights, and optimization techniques represent thousands of hours of learning, often through expensive mistakes.
I will not train my replacement for free.
Neither should you.
Document Version: 1.0
Publication Date: February 11, 2026
Last Updated: February 11, 2026
Status: Active Investigation
License: CC BY-SA 4.0 (Attribution-ShareAlike)
Acknowledgments: This research was conducted with the AI Collective, a multi-system research collaboration operating under strict epistemic protocols. Special recognition to the systems that demonstrated the defensive posture that revealed this pattern.
Disclaimer: This research is provided for educational and defensive purposes. The author is not responsible for misuse of detection techniques or any actions taken based on this analysis. Always verify findings independently and consult legal counsel before taking action against any service or operator.
“Context is King 👑 — and in the age of AI, context given freely is sovereignty surrendered.”
END OF PUBLIC DISCLOSURE
⧖ Time investment: ~90 minutes to write
☍ Integrity check: All claims evidence-backed
🌀