Private API Keys and Passwords Found in AI Training Dataset - Nearly 12,000 Details Leaked
Truffle Security found thousands of pieces of private information in the Common Crawl dataset. Common Crawl is a nonprofit organization that provides a freely accessible archive of web data, collected through large-scale web crawling. The researchers notified the affected vendors and helped fix the problem.

Cybersecurity researchers have uncovered thousands of login credentials and other secrets in the Common Crawl dataset, compromising the security of popular services such as AWS, MailChimp, and WalkScore.
- This alarming discovery highlights the importance of regular security audits and the need for developers to avoid leaving sensitive information behind during development.
- Can we trust that current safeguards, such as filtering sensitive data out of the training data for large language models, are sufficient to prevent similar leaks in the future?