Google’s Gemini 3 is living up to the hype and creating games in one shot

November 19, 2025 admin

Google’s Gemini 3 is finally here, and we’re impressed with the results, especially when it comes to building simple games.

Gemini 3 Pro is an impressive model, and early benchmarks confirm it.

For example, it tops the LMArena Leaderboard with a score of 1501 Elo. It also offers PhD-level reasoning, with top scores on Humanity’s Last Exam (37.5% without any tools) and GPQA Diamond (91.9%).

Real-world results also back these numbers.

Pietro Schirano, who created MagicPath, a vibe coding tool for designers, says we’re entering a new era with Gemini 3.

In his tests, Gemini 3 Pro successfully created a 3D LEGO editor in one shot. This suggests that a single prompt can be enough to create a simple game with Gemini 3, which is a big deal if you ask me.

I asked Gemini 3 Pro to create a 3D LEGO editor.

In one shot it nailed the UI, complex spatial logic, and all the functionality.

We’re entering a new era. pic.twitter.com/Y7OndCB8CK

— Pietro Schirano (@skirano) November 18, 2025

LLMs have traditionally struggled with building games, but Gemini 3 shows clear improvement on that front.

It’s also amazing at games.

It recreated the old iOS game called Ridiculous Fishing from just a text prompt, including sound effects and music. pic.twitter.com/XIowqGt4dc

— Pietro Schirano (@skirano) November 18, 2025

This aligns with Google’s claims that Gemini 3 Pro redefines multimodal reasoning with 81% on MMMU-Pro and 87.6% on Video-MMMU benchmarks.

“It also scores a state-of-the-art 72.1% on SimpleQA Verified, showing great progress on factual accuracy,” Google noted in a blog post.

“This means Gemini 3 Pro is highly capable of solving complex problems across a vast array of topics like science and mathematics with a high degree of reliability.”

Gemini 3 is impressive in my early tests, but adherence remains an issue

I’ve been using Claude Code for a year now, and it’s been a great help with my Flutter/Dart projects.

Gemini 3 is a better model than Claude Sonnet 4.5, but there are some areas where Claude shines.

One of those areas is adherence. So far, no model has come close to Claude Code at following instructions, and Gemini 3 is no exception.

Claude Code is also the better command-line tool, which gives it an edge over competitors.

For everything else, Gemini 3 is a better choice, especially if you’ve been using Gemini 2.5 Pro.

If you use LLMs, I’d recommend sticking with Sonnet 4.5 for regular tasks and using Gemini 3 Pro for complex queries.