Evaluating LLM Generated Task Codes
Published in HIPS@IPDPS, 2026
Recommended citation: Sanjana Yasna, Simon Garcia de Gonzalo, and Michael Robson. "Evaluating LLM Generated Task Codes." Workshop on High-level Parallel Programming Models and Supportive Environments. New Orleans, LA, May 25, 2026. http://mprobson.github.io/files/ParEval_AMT_HIPS_2026.pdf
We extend the ParEval benchmark to evaluate how well large language models generate code for HPX, a representative asynchronous many-task (AMT) runtime, testing both closed- and open-source models on existing parallel programming tasks plus new AMT-specific prompts. This work establishes a comprehensive baseline for LLM performance on under-resourced AMT languages and identifies areas for improvement in developing intelligent assistants for advanced parallel programming frameworks.
