Caltech Open-Sources 1-bit Bonsai Model: 8B Parameters at 1.15GB, 44 tokens/s on iPhone

Summary: Caltech's PrismML, led by Babak Hassibi, has open-sourced its 1-bit Bonsai AI models. The 8B variant has 8.2 billion parameters, requires 1.15 GB of memory, and runs at 44 tokens per second on an iPhone 17 Pro Max. The model uses 4 to 5 times less energy than 16-bit versions. PrismML has raised $16.25 million in SAFE and seed funding from Khosla Ventures, Cerberus Capital, and Caltech.

ChainThink reports that on April 1, 2026, according to monitoring by 1M AI News, PrismML, an AI laboratory co-founded by Caltech mathematician Babak Hassibi, emerged from stealth and open-sourced its 1-bit Bonsai series of large language models. The flagship model, 1-bit Bonsai 8B, has 8.2 billion parameters and occupies only 1.15 GB of memory, making it approximately 14 times smaller than a comparable 16-bit model. Two smaller models have also been released: 1-bit Bonsai 4B (0.5 GB) and 1-bit Bonsai 1.7B (0.24 GB).
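The reported sizes are easy to sanity-check with back-of-the-envelope arithmetic, using only the figures from the article (8.2 billion parameters, 1.15 GB reported footprint):

```python
# Back-of-the-envelope check of the reported compression ratio.
params = 8.2e9

# 16-bit baseline: 2 bytes per weight.
fp16_gb = params * 2 / 1e9    # 16.4 GB

# Ideal 1-bit storage: one bit per weight, packed 8 per byte.
one_bit_gb = params / 8 / 1e9  # about 1.03 GB

print(f"fp16 baseline:   {fp16_gb:.2f} GB")
print(f"ideal 1-bit:     {one_bit_gb:.2f} GB")
print(f"ideal ratio:     {fp16_gb / one_bit_gb:.1f}x")
print(f"reported ratio:  {fp16_gb / 1.15:.1f}x")
```

The ideal bit-for-bit ratio is 16×; the reported 1.15 GB footprint (presumably including packing and runtime overhead) gives the roughly 14× figure cited in the article.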


Bonsai 8B is an end-to-end true 1-bit model: all embedding, attention, MLP, and output-head weights are represented solely as +1 or -1, with no high-precision patches. PrismML claims its inference and language-understanding capabilities on standard benchmarks are comparable to those of 16-bit full-precision models. The core compression mathematics was developed over several years by the team at Caltech; the intellectual property is owned by Caltech, with PrismML as the exclusive licensee. The model was trained on Google TPU v4.
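To make the storage claim concrete, here is a minimal sketch of what "every weight is +1 or -1" implies for memory: each sign can be stored as a single bit and packed eight per byte. This is a generic illustration of 1-bit weight packing, not PrismML's actual training or storage scheme:

```python
import numpy as np

def pack_signs(w: np.ndarray) -> np.ndarray:
    """Reduce weights to their signs and pack them 8 per byte."""
    bits = (w >= 0).astype(np.uint8)   # 1 encodes +1, 0 encodes -1
    return np.packbits(bits)

def unpack_signs(packed: np.ndarray, n: int) -> np.ndarray:
    """Recover the {+1, -1} weight vector from the packed bytes."""
    bits = np.unpackbits(packed)[:n]
    return bits.astype(np.int8) * 2 - 1  # 1 -> +1, 0 -> -1

w = np.random.randn(1024)              # toy full-precision weights
packed = pack_signs(w)
restored = unpack_signs(packed, w.size)

print(packed.nbytes, "bytes for", w.size, "weights")  # 128 bytes
```

At this density, 1,024 weights occupy 128 bytes instead of 2,048 bytes at fp16, which is the 16× bit-level saving behind the model's ~14× real-world footprint reduction.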


In real-world speed tests, an M4 Pro Mac achieves 136 tokens/s, an RTX 4090 achieves 440 tokens/s, and an iPhone 17 Pro Max achieves approximately 44 tokens/s. A standard 16-bit 8B model cannot fit on any iPhone, and energy consumption is roughly 4 to 5 times lower than that of the 16-bit model. PrismML notes that current hardware is not designed for 1-bit inference; the speed and energy advantages stem primarily from reduced memory traffic. If future hardware were designed specifically for 1-bit inference, efficiency could improve by another order of magnitude.


PrismML has completed a $16.25 million SAFE and seed round led by Khosla Ventures, Cerberus Capital, and Caltech. Khosla Ventures founder Vinod Khosla praised the achievement as "not a minor iteration, but a major technological breakthrough—a mathematical breakthrough—not just another small model."

Disclaimer: The information on this page may have been obtained from third parties and does not necessarily reflect the views or opinions of KuCoin. This content is provided for general informational purposes only, without any representation or warranty of any kind, nor shall it be construed as financial or investment advice. KuCoin shall not be liable for any errors or omissions, or for any outcomes resulting from the use of this information. Investments in digital assets can be risky. Please carefully evaluate the risks of a product and your risk tolerance based on your own financial circumstances. For more information, please refer to our Terms of Use and Risk Disclosure.