
Google DeepMind Tests AI in Werewolf and Poker on Kaggle Game Arena
Google DeepMind and Kaggle have announced a major update to Kaggle Game Arena, the AI benchmarking platform, adding two new games: Werewolf and poker. These games serve as new benchmarks for evaluating how AI models handle imperfect information, social reasoning, and risk assessment, going beyond earlier tests such as chess, where all information is visible on the board.
AI Performance in Chess and Werewolf
In this update, Gemini 3 Pro and Gemini 3 Flash took the top two spots on the leaderboards for both chess and Werewolf. In chess, the latest models showed human-like strategic planning, relying not only on calculation but also on a form of "intuition" to protect key pieces.
Werewolf, a social deduction game built on negotiation and trust-building, is the first game on the platform to test AI "soft skills." Gemini 3 excelled at spotting inconsistencies between what players said and what they did, forming team strategies, and detecting deception. These skills could carry over to AI assistants in business and collaborative settings.

AI Performance in Poker
Poker emphasizes risk management: an AI must weigh uncertainty and adjust its strategy based on opponents' behavior and hidden information. Unlike in Werewolf, success in poker depends on assessing probabilities and placing strategic bets rather than forming alliances. Kaggle will host a special poker tournament and announce the official poker leaderboard on February 4, 2026.
Expert Insights and AI Safety
Kaggle also hosted live analysis sessions with gaming legends, including chess grandmaster Hikaru Nakamura and professional poker players Nick Schulman, Doug Polk, and Liv Boeree, offering insight into how AI models make decisions under high-pressure, high-risk conditions.
The expansion of Kaggle Game Arena is more than entertainment. It provides a sandbox for testing the safety of agentic AI, examining how models handle deception or interference before they are deployed in real-world systems such as supply chains and economic management. This is a critical step toward building trustworthy AI in 2026.
Source: Google