Please use this identifier to cite or link to this resource:
http://dx.doi.org/10.25673/122702
| Title: | A benchmark of expert-level academic questions to assess AI capabilities |
| Author(s): | Kalinin, Mikhail [and many others] |
| Date of publication: | 2026 |
| Type: | Article |
| Language: | English |
| Abstract: | Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve more than 90% accuracy on popular benchmarks such as Measuring Massive Multitask Language Understanding¹, limiting informed measurement of state-of-the-art LLM capabilities. Here, in response, we introduce Humanity’s Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be an expert-level closed-ended academic benchmark with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable but cannot be quickly answered by internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a marked gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai. |
| URI: | https://opendata.uni-halle.de//handle/1981185920/124647 http://dx.doi.org/10.25673/122702 |
| Open access: | Open-access publication |
| License: | (CC BY-NC-ND 4.0) Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International |
| Journal title: | Nature |
| Publisher: | Nature Publ. Group |
| Place of publication: | London [etc.] |
| Volume: | 649 |
| Original publication: | 10.1038/s41586-025-09962-4 |
| Start page: | 1139 |
| End page: | 1146 |
| Appears in collections: | Open Access Publikationen der MLU |
Files in this resource:
| File | Size | Format | |
|---|---|---|---|
| s41586-025-09962-4.pdf | 3.39 MB | Adobe PDF | View/Open |