OpenAI has released LifeSciBench, a new evaluation benchmark designed to measure AI systems' capabilities in real-world scientific research scenarios. LifeSciBench comprises 750 expert-crafted tasks covering seven types of research workflows and seven biological domains, contributed by 173 researchers who hold PhDs and have experience in the biotechnology or pharmaceutical industries. The benchmark assesses complex scientific abilities (evidence integration, experimental design, data analysis, scientific reasoning, and scientific communication) rather than simple factual recall. Over 79% of the tasks require multi-step reasoning, averaging approximately four reasoning steps per question, and the benchmark includes 1,062 real-world scientific data attachments such as papers, figures, sequence data, and structural files.
OpenAI Launches LifeSciBench to Evaluate AI Systems in Real Scientific Research
TechFlow
OpenAI has launched LifeSciBench, a new benchmark for evaluating AI systems in real scientific research. It includes 750 expert-designed tasks spanning seven research workflow types and seven biological domains, with contributions from 173 PhD-level researchers. It focuses on complex skills such as experimental design and data analysis, and more than 79% of tasks require multi-step reasoning. The benchmark also includes 1,062 real-world scientific data attachments.