Dr Abeba Birhane

@abeba.bsky.social

despite all the hype, this recent work shows that LLMs are not good at abstract reasoning arxiv.org/abs/2406.11012

Jun 29, 2024 at 17:32 UTC

1 reply 13 reposts 34 likes

Dodecahedron @dodechedrononon.bsky.social
Jun 29, 2024 at 23:00 UTC

"Our results show that even the best-performing LLM, GPT-4o, which has otherwise shown impressive reasoning abilities on a wide variety of benchmarks, can only fully solve 8% of the games." ...impressive reasoning on a wide variety of benchmarks...

0 replies 0 reposts 0 likes