LlamaIndex Launches LiteParse v2.0, Rewritten in Rust with Speed Improvements of Up to 100x

ME AI News: According to monitoring by Beating, LlamaIndex has rewritten its document parsing library, LiteParse, in Rust and released it as version 2.0. The rebuilt core parser processes small documents up to 100x faster and large documents nearly 3x faster. The rewrite aims to give AI agents and Retrieval-Augmented Generation (RAG) pipelines a local, ultra-fast spatial layout parsing foundation that does not depend on large language models.

LiteParse 2.0 keeps its local, model-independent design. It uses a heavily customized fork of PDFium for spatial layout analysis and the tesseract-rs library for local optical character recognition (OCR). The tool currently supports PDF files and Office documents, including DOCX, XLSX, and PPTX. The parser projects text into a two-dimensional spatial layout and outputs structured text that preserves positional and layout relationships, giving large models high-fidelity positional and contextual references at minimal power cost.

For ecosystem integration and distribution, LlamaIndex ships native packages for the major runtimes. Developers can add it to their workflows with pip install liteparse in Python, npm i @llamaindex/liteparse in JavaScript, or through Rust's Cargo registry. Because of its Rust foundation, the new version also compiles to WebAssembly, so it can run locally in browsers and on edge computing nodes. Note, however, that OCR is not built in under WebAssembly because of environmental constraints; developers must supply file scanning through external callbacks (for example, by invoking tesseract.js). (Source: BlockBeats)
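The article does not describe LiteParse's internals, so what follows is only a minimal Python sketch of the general idea of projecting positioned text onto a two-dimensional character grid. It is not LiteParse's algorithm or API. The `TextSpan` type, the `project_to_grid` function, and the fixed `char_width` and `line_height` values are hypothetical assumptions; a real parser would take span positions from the PDF engine (such as PDFium) or from OCR output.

```python
from dataclasses import dataclass


@dataclass
class TextSpan:
    """Hypothetical type: a run of text with its top-left position on the page."""
    text: str
    x: float  # horizontal offset from the left edge, in points
    y: float  # vertical offset from the top edge, in points


def project_to_grid(spans, char_width=6.0, line_height=12.0):
    """Render positioned spans as plain text that keeps their relative layout.

    Each span is snapped to a (row, column) cell of a monospace grid. The
    char_width and line_height defaults are illustrative assumptions. Spans
    that would overlap on the same row are pushed right so no text is lost.
    """
    rows = {}
    for span in spans:
        row = round(span.y / line_height)
        col = round(span.x / char_width)
        rows.setdefault(row, []).append((col, span.text))

    if not rows:
        return ""

    lines = []
    for row in range(min(rows), max(rows) + 1):
        line = ""
        # Place spans left to right; empty rows keep vertical gaps visible.
        for col, text in sorted(rows.get(row, [])):
            if line:
                # Keep at least one space after the previous span on this row.
                col = max(col, len(line) + 1)
            line = line.ljust(col) + text
        lines.append(line.rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    page = [
        TextSpan("Invoice", 0, 0),
        TextSpan("2024-01-31", 300, 0),
        TextSpan("Item", 0, 24),
        TextSpan("Qty", 180, 24),
        TextSpan("Widget", 0, 36),
        TextSpan("3", 180, 36),
    ]
    print(project_to_grid(page))
```

Snapping every span to a shared grid keeps columns aligned from one line to the next, which is one way to let a language model read tables and multi-column pages as plain text. A production parser would also have to handle rotated text, mixed font sizes, and reading order, none of which this sketch attempts.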