YC Partner Suggests AI Should Evolve Like Scientists by Writing Self-Improving Code

ME AI news: According to monitoring by Beating, Diana Hu, a partner at Y Combinator, argued on X that the next frontier lies not in simply scaling up parameters but in building thin software layers on top of foundation models that let AI write its own rules for solving problems, in the form of executable world models. The AI can continuously test, modify, and simplify this code based on execution results, without expensive fine-tuning of the base model itself.
This path of gradient-free code learning corroborates the heuristic learning paradigm proposed last month by Wang Jiayi, a core member of OpenAI’s post-training team. Traditional reinforcement learning requires thousands of trials to teach an AI a task, forcibly embedding experience into the black box of a neural network, a process that is resource-intensive and prone to forgetting. In contrast, Wang Jiayi’s experiment achieved a perfect score on the Atari game Breakout without adjusting any parameters of the large model, relying solely on the model writing Python code and fixing bugs to refine its rules. This demonstrates that knowledge can be stored entirely in human-readable, testable code rather than in opaque neural network weights.
According to YC co-founder Paul Graham, the cycle of writing code, validating it, and compressing it closely mirrors the daily work of scientists. Large models do not need to rebuild their “brains”; instead, they act like scientists: formulating hypotheses about a new environment as code, running experiments to validate them, and distilling the most concise rules that solve the problem. Finding the shortest program is also the ultimate standard by which ARC-AGI measures AI efficiency.
The most critical advantage is that gradient-free learning rides directly on the improving capabilities of the underlying large models: as these models grow smarter, the code and strategies generated by agents become exponentially stronger.
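The write, test, and simplify loop described above can be illustrated with a minimal Python sketch. This is not Wang Jiayi's experiment and does not use Atari Breakout: the toy paddle game, the `CANDIDATES` list (a hypothetical stand-in for programs a language model would propose), and the helper names `compile_rule`, `run_episode`, and `refine` are all assumptions made for illustration. The only point is that candidate rules are scored by executing them, buggy ones are discarded, and ties are broken in favor of the shorter program, with no model weights updated anywhere.

```python
# Toy sketch of gradient-free code learning (illustrative assumptions only).
# Rules are plain Python source; they are scored by running them.

# Hypothetical stand-in for programs a language model would propose.
CANDIDATES = [
    # 0: never moves
    "def act(ball_x, paddle_x):\n    return 0\n",
    # 1: buggy, refers to an undefined name and fails at run time
    "def act(ball_x, paddle_x):\n    return ball_x - paddle\n",
    # 2: verbose tracking rule
    (
        "def act(ball_x, paddle_x):\n"
        "    if ball_x > paddle_x:\n"
        "        return 1\n"
        "    elif ball_x < paddle_x:\n"
        "        return -1\n"
        "    else:\n"
        "        return 0\n"
    ),
    # 3: compact tracking rule with the same behavior as candidate 2
    "def act(ball_x, paddle_x):\n    return (ball_x > paddle_x) - (ball_x < paddle_x)\n",
]


def compile_rule(source):
    """Turn rule source code into a callable `act` function."""
    namespace = {}
    exec(source, namespace)
    return namespace["act"]


def run_episode(act, width=10, steps=20):
    """Deterministic toy game: +1 for each step the paddle sits under the ball."""
    ball_x, paddle_x, direction, score = 0, width // 2, 1, 0
    for _ in range(steps):
        # The ball bounces back and forth between the two walls.
        if not 0 <= ball_x + direction < width:
            direction = -direction
        ball_x += direction
        # The rule sees the new ball position and moves at most one cell.
        move = max(-1, min(1, act(ball_x, paddle_x)))
        paddle_x = max(0, min(width - 1, paddle_x + move))
        score += int(paddle_x == ball_x)
    return score


def refine(candidates):
    """Keep the best-scoring rule; on equal scores prefer the shorter source."""
    best_src, best_score = None, -1
    for src in candidates:
        try:
            score = run_episode(compile_rule(src))
        except Exception:
            continue  # a candidate that crashes is simply discarded
        better = score > best_score
        simpler = score == best_score and len(src) < len(best_src)
        if better or simpler:
            best_src, best_score = src, score
    return best_src, best_score


best_src, best_score = refine(CANDIDATES)
print(best_score)
print(best_src)
```

In this sketch the surviving "knowledge" is the text of `best_src`, which a person can read and rerun. A real system would replace the fixed candidate list with fresh proposals from a model conditioned on the previous failures.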
Building on Richard Sutton’s famous essay “The Bitter Lesson,” gradient-free code learning is charting an entirely new S-curve. As large models’ coding abilities surge, AI self-evolution through code is ushering in the next AI paradigm. (Source: MLion)