Developed AI-assisted software evaluation workflows for validating and benchmarking code generated by large language models (LLMs) across backend, frontend, and distributed-systems environments. Built structured testing pipelines covering APIs, asynchronous workflows, debugging, cloud-native systems, and full-stack engineering tasks, improving evaluation reliability and increasing workflow automation.