Veo vs. Reality: Can AI Learn the World by Watching?
A discussion between Lex Fridman (American computer scientist and podcaster) and Demis Hassabis (CEO of Google DeepMind; 2024 Nobel Prize in Chemistry for protein structure prediction using AlphaFold):
________________________________________
They explore how modern AI systems are becoming surprisingly adept at modeling complex, non-linear phenomena like fluid dynamics, which have traditionally been computationally expensive to simulate with classical methods.
Hassabis highlights Google’s video generation model Veo, which can infer the physics of materials, liquids, and lighting simply by observing video data—essentially “reverse-engineering” how the world works.
Fridman marvels at this, arguing that Veo must have some level of real-world understanding, pushing back against the idea that AI models are merely advanced pattern matchers.
Hassabis clarifies that while it’s not philosophical or conscious understanding, the model does learn “intuitive physics”—akin to a human child’s grasp of how the world works, allowing it to make accurate predictions.
They reflect on how Veo’s capabilities challenge the embodied cognition theory, which claims that AI must physically interact with the world to understand it. Veo shows that passive observation alone can yield a powerful internal model of the world.
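The idea that passive observation alone can force a model to internalize dynamics is easier to see with a toy example. The sketch below is a hypothetical, minimal illustration (it is not Veo's architecture or training setup; TinyWorldModel and the synthetic frames are stand-ins): a small network is trained purely by next-frame prediction, the kind of observation-only objective through which a video model can pick up "intuitive physics" without ever interacting with the world.
```python
# Hypothetical sketch: learning a toy "world model" from passive video observation.
# The model is never given physics equations or labels; it only learns to predict
# frame t+1 from frame t. TinyWorldModel and the random frames are illustrative
# stand-ins, not Veo's actual architecture or data.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyWorldModel(nn.Module):
    """Predicts the next video frame from the current one."""
    def __init__(self, channels: int = 3, hidden: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Conv2d(hidden, channels, kernel_size=3, padding=1)

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(frame))

def train_step(model, optimizer, frame_t, frame_t_plus_1):
    """One passive-observation update: match the prediction to what actually
    happened next in the video. No labels, no interaction with the world."""
    optimizer.zero_grad()
    prediction = model(frame_t)
    loss = F.mse_loss(prediction, frame_t_plus_1)
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = TinyWorldModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Synthetic stand-in for real video: a batch of 4 consecutive frame pairs, 3x64x64.
    frame_t = torch.randn(4, 3, 64, 64)
    frame_t_plus_1 = torch.randn(4, 3, 64, 64)
    for step in range(3):
        loss = train_step(model, optimizer, frame_t, frame_t_plus_1)
        print(f"step {step}: prediction error {loss:.4f}")
```
At scale, minimizing this kind of prediction error over enormous amounts of real video is one plausible route by which a model ends up encoding how liquids pour, objects fall, and light reflects, which is the "reverse-engineering" of the world the discussion describes.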
Hassabis concludes that this is a major step toward Artificial General Intelligence (AGI), with Veo and similar systems paving the way for AI to develop comprehensive "world models"—potentially even enabling fully interactive, AI-generated environments in the near future.