Meta gets caught gaming AI benchmarks with Llama 4

Over the weekend, Meta dropped two new Llama 4 models: a smaller model named Scout, and Maverick, a mid-size model that the company claims can beat GPT-4o and Gemini 2.0 Flash “across a broad range of widely reported benchmarks.” Maverick quickly secured the number-two spot on LMArena, the AI benchmark site where humans compare outputs…


Will Smith eating spaghetti and other weird AI benchmarks that took off in 2024

When a company releases a new AI video generator, it’s not long before someone uses it to make a video of actor Will Smith eating spaghetti. It’s become something of a meme as well as a benchmark: seeing whether a new video generator can realistically render Smith slurping down a bowl of noodles. Smith himself…
