The Urgent Need to Preserve Pre-2022 Internet Knowledge
Artificial intelligence is undergoing an unprecedented transformation, but at what cost? A growing body of research on model collapse points to a troubling trend: AI models are forgetting vital knowledge from marginalized communities as they increasingly train on their own synthetic outputs.
Key Points:
The Problem with AI Training:
- Reinforcement Learning from Human Feedback (RLHF) tunes models toward the answers raters reward, which tends to privilege mainstream, institutional framings.
- Hard-won insights from peer-support communities and survivor networks are down-weighted or suppressed.
The Risk of Extinction:
- New models are being trained on an internet increasingly contaminated by AI-generated content, estimated by some analyses to exceed 50% of new material in 2025.
- As this feedback loop continues, with each generation learning from the previous one's outputs, crucial cultural and personal narratives may vanish entirely; the toy sketch below shows how fast the tails disappear.
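To make the feedback loop concrete, here is a deliberately simplified toy sketch (an illustration only, not a simulation of any real training pipeline). Each generation's "model" is just a Gaussian fitted to its training data, and each new training set is a truncated sample of the previous model's outputs, mimicking samplers that favor their most probable text. The distribution's tails, a stand-in for rare perspectives, shrink with every pass:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for human-written data: a broad distribution with real tails.
data = rng.normal(0.0, 1.0, size=50_000)

for gen in range(10):
    # This generation's "model": a Gaussian fitted to its training set.
    mu, sigma = data.mean(), data.std()
    samples = rng.normal(mu, sigma, size=50_000)
    # Truncated sampling (top-p style): keep only the likeliest ~95%.
    data = samples[np.abs(samples - mu) < 1.96 * sigma]
    print(f"generation {gen}: std = {data.std():.3f}")
```

Even with a modest 5% truncation per generation, the spread collapses to roughly a quarter of its original width after ten rounds; whatever lived in the tails is gone.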
The Call to Action:
- Preserve Pre-Contamination Data: Recognize valuable human-generated knowledge as cultural heritage.
- Implement Provenance Requirements: Companies should disclose data origins to ensure transparency in AI training.
- Promote Diversity Metrics: Future evaluations must measure conceptual diversity, protecting minority perspectives; a sketch of one candidate metric follows this list.
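What might such a metric look like in practice? As a minimal starting point, here is a sketch of the distinct-n ratio, a standard lexical proxy for diversity (an illustration, not a prescription; real evaluations would layer embedding- or topic-level measures on top):

```python
def distinct_n(texts, n=2):
    """Fraction of distinct n-grams across a corpus.

    A simple lexical proxy for diversity: as a corpus homogenizes,
    the same phrases repeat and this ratio falls toward zero.
    """
    ngrams = []
    for text in texts:
        tokens = text.lower().split()
        ngrams.extend(zip(*(tokens[i:] for i in range(n))))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

# A varied corpus scores higher than a repetitive, homogenized one.
varied = ["grief arrives in waves", "my recovery was not linear"]
uniform = ["it is important to note that", "it is important to note that"]
print(distinct_n(varied))   # 1.0 (every bigram is unique)
print(distinct_n(uniform))  # 0.5 (the same bigrams repeat)
```

Tracked across training-data vintages, a falling score would flag exactly the narrowing described above, long before whole perspectives disappear.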
This isn’t just a theoretical issue; it’s happening now, and it’s time to act. Share your thoughts on this critical conversation about the future of AI!