PrismML Launches 1.58-Bit Ternary Bonsai Models That Use One-Ninth the Memory of 16-Bit Models

According to ME News, on April 17 (UTC+8), monitoring by Beating showed that PrismML has released the Ternary Bonsai series of language models, which use 1.58-bit (ternary) weights to reduce GPU memory usage to one-ninth that of a 16-bit model while maintaining high performance. The series comes in three sizes, 8B, 4B, and 1.7B parameters, and is now open-sourced on Hugging Face with native support for Apple devices.

The term “1.58-bit model” refers to constraining each neural network weight to one of only three values: {-1, 0, +1}. The name reflects the information content of a three-valued weight, log2(3) ≈ 1.58 bits. Compared with earlier ultra-compressed 1-bit models (weights limited to {-1, +1}), the added “0” value lets the model switch off redundant connections entirely, which helps it retain sophisticated reasoning capabilities despite its extremely small size. The Ternary Bonsai 8B weight file is only 1.75 GB and achieves an average benchmark score of 75.5, which is 5 points higher than its own 1-bit version. It also significantly outperforms comparable dense models such as Qwen3 in “intelligence density” (performance per GB of GPU memory).

Energy efficiency and inference speed are the series' other core advantages. On the iPhone 17 Pro Max, the 8B version runs at 27 tokens per second, with energy efficiency improved by approximately 3 to 4 times. For developers seeking to deploy high-performance AI on edge devices such as smartphones and laptops, this means near-full-precision intelligence with minimal memory overhead.

Currently, the Ternary Bonsai models are natively supported on Apple devices via the MLX framework, and the model weights are distributed under the Apache 2.0 license. (Source: BlockBeats)
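The memory figures above can be sanity-checked with a little arithmetic. The Python sketch below is illustrative only: it assumes a round 8 billion parameters for the 8B model, and its `ternarize` helper uses an absmean rounding scheme chosen here as a common way to produce {-1, 0, +1} weights. The report does not say how PrismML actually quantizes or packs its weights, so the function names and the scheme are assumptions, not the project's method.

```python
import math

# Information content of one three-valued weight: log2(3) bits (about 1.585).
BITS_PER_TERNARY_WEIGHT = math.log2(3)


def weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Ideal storage for the weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def ternarize(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to {-1, 0, +1} plus one shared scale.

    Assumed scheme (absmean): divide by the mean absolute value,
    round to the nearest integer, then clip to [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


if __name__ == "__main__":
    params = 8e9  # assumed round figure for the "8B" model
    fp16_gb = weight_size_gb(params, 16)
    ternary_gb = weight_size_gb(params, BITS_PER_TERNARY_WEIGHT)
    print(f"16-bit weights:  {fp16_gb:.2f} GB")
    print(f"ternary weights: {ternary_gb:.2f} GB (ideal packing)")
    print(f"reported file:   1.75 GB, ratio {fp16_gb / 1.75:.1f}x")

    codes, scale = ternarize([0.9, -0.05, -1.2, 0.1])
    print(codes, scale)
```

Under these assumptions the ideal ternary size comes out slightly below the reported 1.75 GB. The gap would be consistent with per-group scale factors or some layers kept at higher precision, though the report does not break the file size down. Dividing 16 GB by the reported 1.75 GB gives roughly 9, which matches the one-ninth figure.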