Nous Research open-sources Lighthouse Attention, achieving a 17x speed boost on B200

Summary

On-chain news outlet MetaEra reported on May 16 (UTC+8) that Nous Research has open-sourced its Lighthouse Attention mechanism for long-context pre-training. The method computes roughly 17x faster on a single B200 GPU for 512K-length text and delivers 1.4–1.7x faster end-to-end training at a 98K context length. It uses a two-stage process that removes the need for low-level kernel code or additional training objectives. In tests, a 530M-parameter model trained on 50B tokens matched or outperformed baselines trained entirely with traditional attention while significantly reducing training time. Crypto news platforms are highlighting the efficiency gains for developers and researchers.

AIMPACT News, May 16 (UTC+8): According to BlockBeats monitoring, Nous Research has open-sourced Lighthouse Attention, a long-context pretraining mechanism. When processing 512K-length text on a single B200 GPU, the approach computes roughly 17x faster than standard attention, and it delivers 1.4x to 1.7x end-to-end training acceleration at a 98K context length.

Traditional attention must compute pairwise relationships between all tokens, so computational cost grows quadratically as text length increases (doubling the text length roughly quadruples the cost). Lighthouse Attention instead takes a two-stage approach: it first rapidly scans compressed, multi-level summaries of the text, scoring and selecting the key segments to form a much shorter sequence, and then feeds that sequence directly to the existing efficient FlashAttention operator. Because the selection logic is fully decoupled from the core kernel, developers do not need to write low-level code or introduce additional training objectives.

Earlier acceleration methods built on similar ideas often had a side effect: once models grew accustomed to skipping, they lost the ability to process text carefully word by word. To avoid this pitfall, the research team trained the model primarily in the accelerated mode and only briefly switched back to full attention computation at the very end of training for fine-tuning. In tests with a 530-million-parameter model trained on 50 billion tokens, the approach significantly reduced training time while the resulting model matched or even surpassed baselines trained entirely with traditional attention. (Source: BlockBeats)
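
As a rough illustration of the two-stage idea described above, here is a minimal PyTorch sketch: it mean-pools the keys into block summaries, scores those summaries against a pooled query, keeps the top-scoring blocks to form a shorter key/value sequence, and hands that sequence to PyTorch's scaled_dot_product_attention, which can dispatch to a FlashAttention kernel. The function name, block size, scoring rule, and the omission of causal masking are all simplifying assumptions for illustration; the open-sourced Lighthouse Attention implementation may differ substantially.

    import torch
    import torch.nn.functional as F

    def block_select_attention(q, k, v, block_size=128, top_k_blocks=32):
        # Hypothetical sketch only; not the Nous Research implementation.
        # q, k, v: (batch, heads, seq_len, head_dim); seq_len is assumed to be
        # a multiple of block_size to keep the example short.
        bsz, n_heads, seq_len, head_dim = q.shape
        n_blocks = seq_len // block_size

        # Stage 1: compress keys into per-block summaries, score them against a
        # pooled query, and keep the highest-scoring blocks.
        k_blocks = k.reshape(bsz, n_heads, n_blocks, block_size, head_dim)
        block_summary = k_blocks.mean(dim=3)                               # (b, h, n_blocks, d)
        q_summary = q.mean(dim=2, keepdim=True)                            # (b, h, 1, d)
        scores = (q_summary @ block_summary.transpose(-1, -2)).squeeze(2)  # (b, h, n_blocks)
        top = scores.topk(min(top_k_blocks, n_blocks), dim=-1).indices     # (b, h, k)

        # Gather the selected blocks into a shorter key/value sequence.
        idx = top.unsqueeze(-1) * block_size + torch.arange(block_size, device=q.device)
        idx = idx.flatten(2).unsqueeze(-1).expand(-1, -1, -1, head_dim)    # (b, h, k*block, d)
        k_sel = k.gather(2, idx)
        v_sel = v.gather(2, idx)

        # Stage 2: run the shortened sequence through an existing fused kernel
        # (scaled_dot_product_attention can use FlashAttention under the hood).
        return F.scaled_dot_product_attention(q, k_sel, v_sel)

    # Toy usage with made-up dimensions:
    q = k = v = torch.randn(1, 8, 8192, 64)
    out = block_select_attention(q, k, v)   # (1, 8, 8192, 64)

Keeping the whole selection step in ordinary tensor operations like this is what allows the scan-and-select logic to stay decoupled from the attention kernel itself.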
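
The brief switch back to full attention at the end of training amounts to a simple schedule. The sketch below uses made-up step counts and a hypothetical selective_attention flag purely to show the shape of such a schedule; the actual recipe and hyperparameters are not specified beyond what the report states.

    # Hypothetical schedule: train mostly in the accelerated (selective) mode,
    # then finish with a short full-attention phase. All numbers are assumptions.
    total_steps = 100_000
    full_attention_tail = 2_000   # assumed length of the final full-attention phase

    for step in range(total_steps):
        selective = step < total_steps - full_attention_tail
        # A real loop would run the forward/backward pass here, e.g.:
        # loss = model(batch, selective_attention=selective)
        # loss.backward(); optimizer.step(); optimizer.zero_grad()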
