"A team of researchers primarily from Google’s DeepMind systematically convinced ChatGPT to reveal snippets of the data it was trained on...[this ]showed that there are large amounts of personally identifiable information (PII) in OpenAI’s large language models." www.404media.co/google-resea...
This might be a stupid take, but I'm not too surprised. IIRC, Wolfram's book on ChatGPT mentions that it has a neuron for every token in its vocabulary. My take on that is that every bit of training data is in there, albeit mangled up, as if it had been encoded with a bad encryption scheme.