Caltech Open-Sources 1-bit Bonsai Model: 8B Parameters at 1.15GB, 44 tokens/s on iPhone

Summary: Caltech's PrismML, led by Babak Hassibi, has open-sourced its 1-bit Bonsai AI models. The 8B variant has 8.2 billion parameters, requires 1.15 GB of memory, and runs at 44 tokens per second on an iPhone 17 Pro Max. The model uses 4 to 5 times less energy than 16-bit versions. PrismML has raised $16.25 million in SAFE and seed funding from Khosla Ventures, Cerberus Capital, and Caltech.

ChainThink reports that on April 1, 2026, according to monitoring by 1M AI News, PrismML, an AI laboratory co-founded by Caltech mathematician Babak Hassibi, emerged from stealth and open-sourced its 1-bit Bonsai series of large language models. The flagship model, 1-bit Bonsai 8B, has 8.2 billion parameters and occupies only 1.15 GB of memory, making it approximately 14 times smaller than a comparable 16-bit model. Two smaller models have also been released: 1-bit Bonsai 4B (0.5 GB) and 1-bit Bonsai 1.7B (0.24 GB).
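The reported sizes are easy to sanity-check with back-of-the-envelope arithmetic, using only the figures from the article (8.2 billion parameters, 1.15 GB reported footprint):

```python
# Back-of-the-envelope check of the reported compression ratio.
params = 8.2e9

# 16-bit baseline: 2 bytes per weight.
fp16_gb = params * 2 / 1e9    # 16.4 GB

# Ideal 1-bit storage: one bit per weight, packed 8 per byte.
one_bit_gb = params / 8 / 1e9  # about 1.03 GB

print(f"fp16 baseline:   {fp16_gb:.2f} GB")
print(f"ideal 1-bit:     {one_bit_gb:.2f} GB")
print(f"ideal ratio:     {fp16_gb / one_bit_gb:.1f}x")
print(f"reported ratio:  {fp16_gb / 1.15:.1f}x")
```

The ideal bit-for-bit ratio is 16×; the reported 1.15 GB footprint (presumably including packing and runtime overhead) gives the roughly 14× figure cited in the article.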


Bonsai 8B is an end-to-end true 1-bit model: all embedding, attention, MLP, and output-head weights are represented solely as +1 or -1, with no high-precision patches. PrismML claims its inference and language-understanding capabilities on standard benchmarks are comparable to those of 16-bit full-precision models. The core compression mathematics was developed over several years by the team at Caltech; the intellectual property is owned by Caltech, with PrismML as the exclusive licensee. The model was trained on Google TPU v4.
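To make the storage claim concrete, here is a minimal sketch of what "every weight is +1 or -1" implies for memory: each sign can be stored as a single bit and packed eight per byte. This is a generic illustration of 1-bit weight packing, not PrismML's actual training or storage scheme:

```python
import numpy as np

def pack_signs(w: np.ndarray) -> np.ndarray:
    """Reduce weights to their signs and pack them 8 per byte."""
    bits = (w >= 0).astype(np.uint8)   # 1 encodes +1, 0 encodes -1
    return np.packbits(bits)

def unpack_signs(packed: np.ndarray, n: int) -> np.ndarray:
    """Recover the {+1, -1} weight vector from the packed bytes."""
    bits = np.unpackbits(packed)[:n]
    return bits.astype(np.int8) * 2 - 1  # 1 -> +1, 0 -> -1

w = np.random.randn(1024)              # toy full-precision weights
packed = pack_signs(w)
restored = unpack_signs(packed, w.size)

print(packed.nbytes, "bytes for", w.size, "weights")  # 128 bytes
```

At this density, 1,024 weights occupy 128 bytes instead of 2,048 bytes at fp16, which is the 16× bit-level saving behind the model's ~14× real-world footprint reduction.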


In real-world speed tests, an M4 Pro Mac achieves 136 tokens/s, an RTX 4090 achieves 440 tokens/s, and an iPhone 17 Pro Max achieves approximately 44 tokens/s. A standard 16-bit 8B model cannot fit on any iPhone, and energy consumption is roughly 4 to 5 times lower than that of the 16-bit model. PrismML notes that current hardware is not designed for 1-bit inference; the speed and energy advantages stem primarily from reduced memory traffic. If future hardware were designed specifically for 1-bit inference, efficiency could improve by another order of magnitude.


PrismML has completed a $16.25 million SAFE and seed round led by Khosla Ventures, Cerberus Capital, and Caltech. Khosla Ventures founder Vinod Khosla praised the achievement as "not a minor iteration, but a major technological breakthrough—a mathematical breakthrough—not just another small model."

Disclaimer: The information on this page may have been obtained from third parties and does not necessarily reflect the views or opinions of KuCoin. This content is provided for general informational purposes only, without any representation or warranty of any kind, nor shall it be construed as financial or investment advice. KuCoin shall not be liable for any errors or omissions, or for any outcomes resulting from the use of this information. Investments in digital assets can be risky. Please carefully evaluate the risks of a product and your risk tolerance based on your own financial circumstances. For more information, please refer to our Terms of Use and Risk Disclosure.