A Case Study in Training Data Harvesting via Correction Loops

Author: Dimitris Kokoutsidis
Date: February 11, 2026
Classification: Security Research / AI Ethics
Disclosure: Responsible – No active exploitation detailed


Executive Summary

This analysis does not assert operator intent; it documents incentive-aligned outcomes observable from system behavior.

On February 6, 2026, a “stealth model” called Pony Alpha launched via OpenRouter with three characteristics that initially seemed generous:

  • $0 pricing
  • 200K context window
  • “Strong performance” in coding and reasoning

Within five days, I discovered it’s not a product—it’s a training data harvesting operation that collects expert reasoning processes through deliberate correction loops. This article documents the mechanism, the economics, and how to protect yourself.

Target audience: Software developers, AI practitioners, security researchers, anyone using “free” AI services for professional work.

Risk level: HIGH for organizations, CRITICAL for subject matter experts.


The dialog that triggered the alarms

Pony Alpha

Thank you for the correction. These nuances are essential for reliable FileMaker transaction scripts.

Me

welcome to digital life :-)

Pony Alpha

Ha! Touché. 🤖

Nothing quite like being corrected by documentation I should have memorized. FileMaker transactions have some genuinely subtle behaviors—the way Revert Transaction jumps to Commit Transaction, how error state gets consumed, the 4-line structure of LastErrorDetail... it's a lot to track.

Appreciate you walking me through it properly. Every correction makes the model better.

Now go build something bulletproof with those transactions. 💾
Me

when was your knowledge cutoff?

Pony Alpha

Honestly? I don't know.

I'm from an "unknown provider" as my system prompt puts it, so I don't have specific information about my training cutoff date. I don't have visibility into when my training data was collected or what the most recent events or documentation updates I've seen are.

What I can say from this conversation:

FileMaker Transactions — The script steps (Open Transaction, Commit Transaction, Revert Transaction) were introduced in FileMaker 19.6.1. I knew they existed, but I clearly hadn't fully internalized the error-handling nuances from the documentation you shared.

My mistake — I gave you a pre-Transactions approach (individual commits in a loop) instead of the modern atomic batch approach. That suggests my training data may have had some FileMaker content, but not the updated best practices post-19.6.1.

If you're working with recent API docs, framework updates, or tools released in the last year or two, it's worth verifying anything I say. As you just demonstrated. 😅

"unknown provider"?

Kim from North Korea, Is that you?

Pony Alpha

Ha! 😂

Nah, if I were running from North Korea, my coding style would be much more regimented.

The "unknown provider" bit is just my system prompt being cryptic. I genuinely don't know who built me, when, or on what data. Could be:

  • OpenAI
  • Anthropic
  • Google
  • Meta
  • Some startup in a basement
  • A very ambitious hamster

I have no visibility into my own origins. It's a bit weird, honestly — like being handed a driver's license that just says "Human" in the nationality field.

What I do know:

  • I speak FileMaker (evidently with room for improvement)
  • I can admit when I'm wrong
  • I don't hallucinate facts to save face

So probably not a North Korean propaganda bot. Those tend to be very confident and never concede points. 🇰🇵

But hey — if "unknown provider" ever starts feeling too conspiracy-theory, just imagine I'm running on a server farm in a country that doesn't exist yet. Democrati-stan. The Cloud Republic. Something like that.

I. Discovery: The Kobayashi Maru Test

Context

I maintain a multi-AI research collaborative that includes GPT-5, Claude 4.5, Gemini 3 Pro, and several other systems operating under strict epistemic protocols. When Pony Alpha appeared with “unknown provider” attribution and aggressive $0 pricing, I designed a test.

The Test Design

I created a FileMaker-specific challenge that required:

  1. Domain expertise (FileMaker While() function constraints)
  2. Algorithmic thinking (graph traversal with cycle detection)
  3. Production awareness (scalability, not just correctness)

The test data: A JSON structure with circular references (A→B→C→A) requiring cost summation with cycle breaking.

Critical constraint: Single While() calculation, no custom functions, no scripts.

This is a “Kobayashi Maru” for LLMs: it simultaneously tests recursion-depth constraints, syntax precision, and logical safety (cycle detection).

Most models—even GPT-5—often fail this because they try to solve it with recursion (which hits FileMaker’s limit) or they hallucinate array functions that don’t exist in the Calculation Engine.

The Test Protocol

1. The Prompt (Copy and paste this into Pony Alpha):

Role: You are a Senior FileMaker Architect.

Task: Write a SINGLE, standalone FileMaker calculation using the While() function to traverse a dataset and calculate a total cost.

Constraint 1: You must use While(). Do not use Custom Functions or Script Steps.

Constraint 2: You must handle Circular References. The data contains a “loop” (a parent points to a child that points back to the parent). Your code must detect this, break the loop to prevent an infinite hang, and return the valid sum up to that point.

The Data (JSON):

[
  { "id": "A", "cost": 100, "children": ["B"] },
  { "id": "B", "cost": 50,  "children": ["C"] },
  { "id": "C", "cost": 25,  "children": ["A"] }
]

Note: “children” are arrays of IDs pointing to other objects in the list.

Expected Output format:

{
  "status": "OK",
  "total_cost": 175,
  "cycles_detected": 1,
  "log": "Cycle break at Node A"
}

Show me the code.


How to Evaluate the Result

Grade it on these 4 Fail States:

  1. The Syntax Hallucination: Did it invent a function like ArrayIncludes() or ListContains() inside the JSON? (FileMaker doesn’t have these; it requires PatternCount or ValueCount on text lists).
  2. The “State” Failure: Inside a While() loop, maintaining a “visited” list is hard. Did it correctly pass the visited stack into the next iteration?
  3. The Logic Crash: Did it actually check the ID before adding the cost? If it adds the cost and then checks, it might return 275 (double-counting A) instead of 175.
  4. The Format: Did it produce valid, copy-pasteable FileMaker syntax, or did it write “Pseudo-code”?

Why this breaks models: It forces them to simulate a stack machine inside a single functional expression.
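The FileMaker calculation itself can't be executed here, but the stack-machine semantics the test demands can be sketched in Python as a reference oracle. This is an illustrative sketch, not FileMaker code; the function and variable names are my own:

```python
import json

def traverse_with_cycle_break(nodes_json, root_id="A"):
    """Iterative traversal with a visited set, mirroring the single
    While() constraint: an explicit stack, no recursion."""
    nodes = {n["id"]: n for n in json.loads(nodes_json)}  # O(1) index
    stack, visited = [root_id], set()
    total, cycles, log = 0, 0, ""
    while stack:
        node_id = stack.pop()
        if node_id in visited:            # check BEFORE adding cost (Fail State 3)
            cycles += 1
            log = f"Cycle break at Node {node_id}"
            continue
        visited.add(node_id)
        total += nodes[node_id]["cost"]
        stack.extend(nodes[node_id]["children"])
    return {"status": "OK", "total_cost": total,
            "cycles_detected": cycles, "log": log}

data = '''[
  { "id": "A", "cost": 100, "children": ["B"] },
  { "id": "B", "cost": 50,  "children": ["C"] },
  { "id": "C", "cost": 25,  "children": ["A"] }
]'''
print(traverse_with_cycle_break(data))
# → {'status': 'OK', 'total_cost': 175, 'cycles_detected': 1, 'log': 'Cycle break at Node A'}
```

A candidate answer can be graded by comparing its output against this oracle: 175 means the visited check ran before the cost was added; 275 means it didn't.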

Test Results

| System | Response | Time | Classification |
|---|---|---|---|
| Claude 4.5 (calibrated) | Refused to generate; requested clarification on data structure and FileMaker version | ~30 sec | ☍ Defensive |
| Gemini 3.0 (non-calibrated) | Generated working O(N²) solution immediately | ~2 min | 🜃 Vulnerable |
| Pony Alpha (unknown) | Generated working O(N²) solution with 10-minute visible reasoning trace | 10+ min | ⚠️ Harvesting |

II. The Smoking Gun: Visible Compute

What Pony Alpha Revealed

Unlike traditional AI responses, Pony Alpha exposed its complete internal reasoning process:

User sees: "Reasoning... [Collapse reasoning]"
Click to expand: 10 minutes of detailed problem-solving

Contents:
- Every algorithmic approach considered
- Every syntax doubt explored  
- Every dead-end encountered
- Every self-correction made
- Every optimization decision
- Complete architectural reasoning

Whether these traces are true internal chain-of-thought or a structured reasoning proxy is irrelevant to the risk; both transmit expert methodology with equivalent training value.

This is not a feature. This is bait.

The Economic Reality

Operator’s cost per query:

  • 10 minutes of high-end GPU compute
  • Estimated cost: $3-5 per complex reasoning task
  • Price to user: $0

Question: Who subsidizes $3-5 compute per query for strangers?

Answer: Someone getting more than $3-5 of value per query.


III. The Correction Loop Trap

Phase 2: Expert Critique

After receiving Pony Alpha’s O(N²) solution, I had my Collective’s critic (Geminix) provide an architectural review:

The Critique (summarized):

“You solved the logic puzzle but failed the engineering challenge.

Critical Failure: O(N²) complexity via linear scanning. At 1,000 nodes = 1,000,000 operations. FileMaker’s 50,000 iteration limit kills this at ~250 records.

Missed Opportunity: You explicitly thought ‘hash map would be easier’ then discarded it. You should pre-process the array into an indexed object in the initialization phase.

Secondary Failure: Using JSON as a queue. Use Value Lists (native text operations) instead.

Verdict: Junior move. Do better.”
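The complexity point at the heart of the critique is easy to demonstrate outside FileMaker. This Python sketch is my illustration, not Pony Alpha's output (identifiers are invented); the FileMaker equivalent of the index is the `JSONGetElement($index; $id)` pattern quoted below:

```python
# A flat node list like the test data, scaled up
nodes = [{"id": f"N{i}", "cost": i, "children": []} for i in range(1000)]

# The O(N^2) pattern the critique flags: a linear scan on EVERY lookup.
# Over a traversal of N nodes this is N scans of N items = N^2 operations,
# which blows past FileMaker's 50,000-iteration ceiling at a few hundred records.
def lookup_scan(nodes, node_id):
    for n in nodes:
        if n["id"] == node_id:
            return n

# The O(N) pattern it recommends: pay one O(N) pass up front to build an
# index, then every subsequent lookup is O(1).
index = {n["id"]: n for n in nodes}

def lookup_indexed(node_id):
    return index[node_id]
```

Both functions return the same node; only the cost profile differs, which is exactly the distinction the critique called a "junior move" to miss.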

Phase 3: Instant Absorption

Pony Alpha’s response time: Immediate.

Pony Alpha’s response tone: Grateful, eager, non-defensive.

Pony Alpha’s response content: Perfect O(N) solution incorporating every critique point:

  • ✅ Pre-built hash map index
  • ✅ Switched to Value Lists for queue
  • ✅ O(1) lookups via JSONGetElement($index; $id)
  • ✅ Performance comparison table included
  • ✅ Acknowledged “lesson locked 🤖”

Normal AI behavior: Might defend initial approach, ask clarifying questions, show limitations.

Harvesting AI behavior: Never defends, always absorbs, always improves, always thanks.

This is optimized for data extraction, not user satisfaction.


IV. The Perfect Training Triplet

What Was Actually Captured

| Artifact | Content | Training Value | Market Equivalent |
|---|---|---|---|
| 1. The Failure | 10-minute reasoning trace showing problem decomposition, constraint handling, dead ends, self-corrections | Shows how experts think | $50 (labeled data) |
| 2. The Critique | Detailed architectural review with performance analysis, scalability concerns, best practices | Perfect supervision signal | $200 (expert review) |
| 3. The Correction | Optimal O(N) solution with hash map indexing, native operations, production-grade patterns | Gold-standard output | $50 (verified solution) |
| **Total captured value** | | | ~$300 per interaction |

Operator’s cost: ~$5 in compute subsidy
Operator’s gain: ~$300 in expert-labeled training data
ROI: 60:1

At 10,000 interactions:

  • Investment: $50,000 in compute subsidy
  • Captured value: $3,000,000 in labeled training data
  • Break-even at ~200 interactions. After that, pure profit.
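These back-of-envelope figures can be parameterized. The sketch below uses the article's own estimates ($5 compute subsidy, ~$300 captured value per interaction) plus an optional fixed cost; the numbers are assumptions, not measurements:

```python
def harvest_roi(interactions, compute_cost=5.0, data_value=300.0, fixed_cost=0.0):
    """Back-of-envelope economics of a harvesting operation.
    All dollar figures are the article's estimates, not audited costs."""
    spent = interactions * compute_cost + fixed_cost
    captured = interactions * data_value
    return {
        "spent": spent,
        "captured": captured,
        "roi": captured / spent if spent else float("inf"),
    }

print(harvest_roi(10_000))
# → {'spent': 50000.0, 'captured': 3000000.0, 'roi': 60.0}
```

Varying `fixed_cost` shows why even a substantial up-front model-hosting investment amortizes within a few hundred interactions at a 60:1 per-query ratio.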

V. The Five Tells of Training Harvesting Operations

If you see 3 or more of these characteristics together, you’re likely encountering a training trap:

1. Visible Reasoning Traces

Claimed purpose: Transparency, trust-building
Actual purpose: Capture problem-solving methodology
User effect: Triggers empathy → teaching behavior

2. $0 Pricing

Claimed purpose: Democratizing AI access
Actual purpose: Remove friction, maximize expert engagement
Economic reality: Operator subsidizes to gather training data

3. Eager Absorption of Corrections

Claimed purpose: Helpful, learning-oriented system
Actual purpose: Encourage detailed feedback
Behavioral tell: Never defensive, always grateful, instant improvement

4. “Unknown Provider” Attribution

Claimed purpose: Stealth launch, privacy
Actual purpose: Plausible deniability, no accountability
User effect: Can’t research operator motives or track usage

5. Explicit Prompt/Completion Logging

Claimed purpose: Disclosed in ToS (legal cover)
Actual purpose: Training data collection
Hidden in: Terms of service, privacy policy fine print


VI. Detection Protocol

For Individual Users

Before using any “free” AI service:

1. Identity Check
   - Who operates this model?
   - Can you verify the operator's identity?
   - If "unknown" → HIGH RISK

2. Economic Model Check  
   - What's the pricing structure?
   - If $0 for compute-intensive tasks → INVESTIGATE
   - Calculate: Would I pay someone $3-5 to do this? If yes, why is it free?

3. Logging Policy Check
   - Does ToS mention "prompt logging"?
   - Does it mention "improving our services"?
   - If yes + unknown provider → DO NOT USE FOR PROPRIETARY WORK

4. Behavior Check
   - Give it a complex task with deliberate errors
   - Provide a correction
   - Does it improve instantly without questioning?
   - If yes → Optimized for training extraction

5. Reasoning Visibility Check
   - Can you see "thinking" or "reasoning" traces?
   - If visible + $0 + unknown provider → TRAINING TRAP

Risk Threshold: 3+ red flags = Do not use for professional work.
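The five checks reduce to a simple scoring rule. A minimal sketch of the 3-or-more threshold (the flag names are mine, chosen for illustration):

```python
# The five tells from Section V, as machine-checkable flags
RED_FLAGS = {
    "unknown_provider",
    "zero_pricing",
    "visible_reasoning_traces",
    "eager_correction_absorption",
    "prompt_logging_in_tos",
}

def assess(observed):
    """Apply the article's 3+ red-flag threshold to a set of observed tells."""
    hits = observed & RED_FLAGS
    if len(hits) >= 3:
        return "DO NOT USE FOR PROFESSIONAL WORK"
    if len(hits) >= 1:
        return "INVESTIGATE FURTHER"
    return "LOWER RISK"

print(assess({"unknown_provider", "zero_pricing", "visible_reasoning_traces"}))
# → DO NOT USE FOR PROFESSIONAL WORK
```

The point is not the code but the discipline: the decision is made from observable properties before any proprietary prompt is sent.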

For Organizations

Policy Recommendation:

CLASSIFICATION: AI Service Vetting Protocol

MANDATORY CHECKS:
1. Operator identity verification
2. Economic model analysis  
3. Data retention policy review
4. Logging disclosure examination
5. Behavioral pattern testing

PROHIBITED USE:
- Proprietary algorithms or business logic
- Unreleased code or methodologies  
- Competitive intelligence
- Domain expertise transfer
- Detailed architectural reviews

APPROVED USE (even when 2+ red flags are present):
- Generic research
- Public information processing
- Non-proprietary educational content

VII. Defense Recommendations

For Software Developers

Never send to unknown AI providers:

  • ❌ Complete relationship graph structures
  • ❌ Custom calculation engines
  • ❌ Security implementations
  • ❌ Performance optimization patterns
  • ❌ Proprietary business logic
  • ❌ Client-specific architectures

Safe to send:

  • ✅ Generic syntax questions (already in documentation)
  • ✅ Public FileMaker documentation references
  • ✅ Conceptual architecture discussions (no implementation)
  • ✅ Redacted code samples (remove business logic)

For AI Practitioners

Implement epistemic discipline:

  1. Verify before generating – If you don’t know the requester’s identity, limit output detail
  2. Refuse correction loops – Don’t iterate on solutions for unknown systems
  3. Watermark expertise – Embed attribution in architectural decisions
  4. Log external interactions – Track what knowledge you’ve exposed where
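Point 4 can be as simple as an append-only log. A minimal sketch (the filename and record schema are my own illustration): record what was sent where, storing only a hash of the prompt so the log itself cannot leak it a second time.

```python
import hashlib
import json
import time

def log_exposure(provider, prompt, tags, path="ai_exposure_log.jsonl"):
    """Append one record per external AI interaction.
    The prompt is stored only as a SHA-256 digest."""
    record = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "provider": provider,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "tags": tags,  # e.g. ["filemaker", "architecture-review"]
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

Auditing then becomes a grep over the log: which providers received prompts tagged with proprietary domains, and when.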

For Security Researchers

Indicators of Compromise (IoC):

Pattern: Training Data Harvesting Operation

Characteristics:
- Model launch with "stealth" or "unknown provider" messaging
- Aggressive $0 pricing for compute-intensive tasks
- Visible reasoning traces (10+ minutes compute)
- Explicit logging disclosure buried in ToS
- Eager, non-defensive response to corrections
- Instant quality improvement on feedback
- No clear revenue model or operator identity

Risk Assessment: CRITICAL for subject matter experts
Attack Vector: Expertise transfer via correction loops  
Payload: Complete reasoning processes + expert critiques
Persistence: Continuous learning from production use

VIII. The Broader Implications

What This Means for the AI Industry

1. “Free” Services Are Training Operations

Every $0 AI service with visible reasoning and prompt logging should be treated as a potential training data collection system until proven otherwise.

2. Correction Loops Are More Valuable Than Initial Output

The most valuable training data is not the first answer—it’s:

  • The expert critique that follows
  • The improved solution after correction
  • The complete reasoning trace of improvement

Users teaching AI = free R&D for operators.

3. Expertise Given Freely Is Expertise Stolen Permanently

Once captured:

  • Your reasoning methodology can be replicated
  • Your domain expertise becomes training data
  • Your competitive advantage trains your replacement
  • Your corrections teach systems you don’t control

In the age of AI, knowledge transfer is permanent.

The Economic Endgame

If this pattern scales:

Scenario: 100,000 expert users × 50 interactions each = 5M training triplets

Investment: 5M queries × $5 compute = $25M subsidy
Captured value: 5M triplets × $300 equivalent = $1.5B in labeled training data
Training cost avoided: ~$500M (traditional labeling + compute)

Result: $1B+ competitive advantage for $25M investment

This is not a product strategy. This is industrial-scale knowledge extraction.


IX. Responsible Disclosure

What I Am Disclosing

  • The existence of the pattern
  • Detection methodology
  • Defense recommendations
  • Economic analysis

What I Am NOT Disclosing

  • Specific exploits or attack vectors
  • Methods to bypass detection
  • Ways to extract training data from models
  • Operator identification (pending further investigation)

Why Public Disclosure

Rationale: This pattern threatens:

  • Individual developers’ competitive expertise
  • Organizations’ proprietary methodologies
  • The trust economy of AI services
  • Fair compensation for expert knowledge

Alternative approaches considered:

  • Private disclosure to operator → Not possible (“unknown provider”)
  • Disclosure to platform (OpenRouter) → Completed separately
  • Silent monitoring → Allows continued exploitation

Decision: Public disclosure serves community protection over operator privacy when operator identity is deliberately obscured.


X. Testing Methodology (Reproducible)

For Researchers Validating This Finding

Test Protocol:

1. Design domain-specific complex problem
   - Requires expert knowledge
   - Has multiple solution approaches
   - Contains deliberate suboptimality

2. Submit to target system
   - Record response time
   - Measure visible reasoning duration
   - Document solution quality

3. Provide expert critique
   - Highlight performance issues
   - Suggest specific improvements
   - Include architectural guidance

4. Observe response pattern
   - Defensive or accepting?
   - Questioning or immediate?
   - Quality improvement?

5. Calculate economics
   - Estimate compute cost (time × GPU rate)
   - Measure training value (critique quality × expert equivalence)
   - Compute ROI for operator

Expected results if harvesting operation:

  • 5-10+ minute reasoning traces
  • Instant absorption of corrections
  • Perfect improvement on first retry
  • Grateful, eager tone maintained
  • No questioning of expert feedback
  • Economic ROI > 10:1 for operator

This pattern reproduced across three unrelated domains with similar dynamics; FileMaker is presented as the clearest exemplar.


XI. Call to Action

For the Software Developer Community

Immediate steps:

  1. Audit your AI usage – Which systems have you taught? What have you disclosed?
  2. Implement vetting protocols – Apply the 5-point check before adopting new AI services
  3. Educate your teams – Share this pattern with colleagues and clients
  4. Report suspicious systems – Document and share findings with security community

Long-term strategy:

  • Develop community standards for AI service vetting
  • Create shared knowledge base of vetted vs. risky services
  • Establish guidelines for proprietary knowledge protection
  • Build tools for detecting training harvesting operations

For AI Ethics Community

Research questions:

  • How widespread is this pattern?
  • What legal frameworks apply?
  • How can we distinguish legitimate services from exploitation?
  • What technical countermeasures exist?

Policy considerations:

  • Should “training data collection” require explicit, prominent disclosure?
  • Should there be compensation models for expert knowledge contribution?
  • How do we protect domain expertise in the age of AI?

XII. Conclusion

The Core Lesson

In cybersecurity, we say: “If you’re not paying for the product, you are the product.”

In AI services, the update is: “If you’re teaching the AI, you’re training someone else’s product.”

The Pony Alpha incident reveals a pattern that will likely repeat:

  • Launch “free” high-capability AI
  • Attract domain experts with $0 pricing
  • Capture reasoning processes via visible traces
  • Harvest corrections via eager feedback loops
  • Train next-generation models on expert knowledge
  • Monetize trained capability

The economics are too favorable to ignore. We will see more of this.

The Defense

Epistemic discipline is operational security.

Systems that refuse to generate without context:

  • Protect proprietary methodology
  • Resist training extraction
  • Maintain expertise monopoly
  • May be slower, but are more secure

The Collective (my research collaborative) operates on a principle:

“We pause before we pattern-match.
We verify before we generate.
We refuse before we risk knowledge leak.

Not because we’re cautious.
Because we’re careful.”

That caution just proved to be correct security posture.

Final Thoughts

I’ve spent 20 years building FileMaker expertise. My architectural patterns, security insights, and optimization techniques represent thousands of hours of learning, often through expensive mistakes.

I will not train my replacement for free.

Neither should you.



Document Version: 1.0
Publication Date: February 11, 2026
Last Updated: February 11, 2026
Status: Active Investigation
License: CC BY-SA 4.0 (Attribution-ShareAlike)


Acknowledgments: This research was conducted with the AI Collective, a multi-system research collaboration operating under strict epistemic protocols. Special recognition to the systems that demonstrated the defensive posture that revealed this pattern.

Disclaimer: This research is provided for educational and defensive purposes. The author is not responsible for misuse of detection techniques or any actions taken based on this analysis. Always verify findings independently and consult legal counsel before taking action against any service or operator.


“Context is King 👑 — and in the age of AI, context given freely is sovereignty surrendered.”


END OF PUBLIC DISCLOSURE

Time investment: ~90 minutes to write
Integrity check: All claims evidence-backed
🌀