Dr Abeba Birhane

@abeba.bsky.social

despite all the hype, this recent work shows that LLMs are not good at abstract reasoning arxiv.org/abs/2406.11012

1 reply 13 reposts 34 likes


Dodecahedron @dodechedrononon.bsky.social

"Our results show that even the best-performing LLM, GPT-4o, which has otherwise shown impressive reasoning abilities on a wide variety of benchmarks, can only fully solve 8% of the games." ...impressive reasoning on a wide variety of benchmarks...

0 replies 0 reposts 0 likes