Addressing the gaps in human intelligence assessment and GenAI benchmarking: a comparative analysis of evaluation frameworks and practical implementation with SparkBeyond

Brauner, Joshua (Academic Year 2024/2025) Addressing the gaps in human intelligence assessment and GenAI benchmarking: a comparative analysis of evaluation frameworks and practical implementation with SparkBeyond. Master's thesis in AI frontiers: large language models, Luiss Guido Carli, supervisor Simone Di Somma, 102 pp. [Master's Degree Thesis]

Abstract/Index

Fundamentals of artificial intelligence. The evaluation gap. Defining intelligence. Intelligence as a concept. Definition of artificial intelligence. The problem in emulating human intelligence. Measuring intelligence. Measuring human intelligence. Measuring artificial intelligence. Dynamic approaches for evaluation. A dynamic benchmarking evaluation framework for LLMs and the impact of complexity. Theoretical foundation behind complexity: the illusion of thinking. SparkBeyond case study: development of a strong dynamic benchmarking framework and agents testing. Learnings, what to expect next.

References

Bibliography: pp. 98-102.

Thesis Type: Master's Degree Thesis
Institution: Luiss Guido Carli
Degree Program: Master's Degree Programs > Master's Degree Program in Data Science e Management (LM-91)
Chair: AI frontiers: large language models
Thesis Supervisor: Di Somma, Simone
Thesis Co-Supervisor: Italiano, Giuseppe Francesco
Academic Year: 2024/2025
Session: Autumn
Deposited by: Alessandro Perfetti
Date Deposited: 24 Feb 2026 14:01
Last Modified: 24 Feb 2026 14:01
URI: https://tesi.luiss.it/id/eprint/44953
