So I think the fundamental disconnect here is the claim that you need intelligence *in order to* figure out which words are more likely to go next to other words.
I don't think I commune with them. I hate them actually.
The intelligence is in the person who figured out how to make it work. The LLM just consults a set of already-figured-out probabilities and spits out the answer with the highest score based on the previous word. What they call "training" is just statistical analysis. Counting words.
I think you are confusing the human-invented, human-coded idea of assigning statistical weight to phrases as units (and the human input about which phrases should be weighted as most statistically significant, to increase the probability of useful output) with the idea that the machine *came up with* that insight.
But LLMs don't figure out what words SHOULD go next to other words to convey meaning - they just measured which words DID go next to other words in their input, and how often, and they use those probabilities, devoid of any context except the words themselves.
There's a learning process, but it's just counting stuff.
They’re using the power of corpus linguistics. Once you have access to a big pile of text, you don’t need intelligence to go “hey, these words appear together a lot,” you just need a string search and statistical weighting of the resulting potential strings.
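To make the "counting" picture concrete, here's a toy sketch of the kind of co-occurrence counting described above: a bigram model that tallies how often each word follows each other word, then "predicts" by spitting out the highest-count follower. (This is only an illustration of the counting idea from the thread, not a description of any actual LLM's internals; all names here are made up for the example.)

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count how often each word immediately follows each other word."""
    counts = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequently counted follower of `word`, or None."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
# "cat" followed "the" twice in the corpus, "mat" only once,
# so "cat" wins the count.
print(predict_next(model, "the"))
```

No meaning is involved anywhere: the model only knows which strings sat next to which other strings, and how often.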