OpenAI on Monday published benchmark results showing that its newest model, GPT-5.4, surpasses median human performance across a standardized suite of desktop productivity tasks, marking what executives called the beginning of the agentic era for enterprise software.
The evaluation, conducted by independent researchers at Stanford's Human-Centered AI Institute, measured GPT-5.4 against 200 knowledge workers on tasks including complex spreadsheet modeling, multi-document synthesis, code refactoring, and cross-application workflow automation. The model outperformed the human median on 14 of 18 task categories.
"This is not a parlor trick," said OpenAI CEO Sam Altman at a product briefing in San Francisco. "We are measuring real work output in real desktop environments. These numbers represent genuine economic value that can be created with these systems today."
Google moved simultaneously, releasing Gemma 4, its open-weight enterprise model. The company claims that the 800-billion-parameter version performs comparably to GPT-5.4 on coding and reasoning benchmarks while offering on-premises deployment for organizations with data sovereignty requirements.
A Deloitte survey released concurrently showed that 61 percent of Fortune 500 CTOs now expect AI agents to handle at least 30 percent of current knowledge-worker task volume within 18 months, up from 29 percent of CTOs in January.