Binance

AI Evaluation Specialist

Asia
Full-time

About this role

Job Overview

As an A.I. Agent Evaluation and Optimisation Specialist, you will play a critical role in ensuring both the performance and the continuous improvement of large language model (LLM)-driven autonomous agents. Your responsibilities range from designing and implementing robust evaluation frameworks to proactively identifying and executing optimisation strategies that enhance reliability, adaptability, and compliance across the agent lifecycle.

Responsibilities:

  • Design, Develop & Optimise Evaluation Plans: Create structured, risk-aware, and adaptive evaluation and optimisation plans. Align these with user goals, governance requirements, and system architectures. Translate objectives into measurable criteria, scenarios, and optimisation targets.
  • Test Suite Development & Performance Tuning: Develop and curate tests covering standard, edge, and emergent agent behaviours. Collaborate to generate synthetic data and incorporate domain expertise and use hands-on optimisation techniques to improve agent robustness.
  • Multi-Stage Evaluation & Optimisation: Execute controlled (offline) and real-world (online) evaluations, assessing not just outputs but also reasoning steps, tool usage, and workflow execution. Identify and resolve performance bottlenecks, drive fine-tuning, and recommend systemic improvements.
  • Analyse, Diagnose & Optimise: Conduct deep analysis of evaluation results to find performance gaps, failure modes, and optimisation opportunities at both the model and system level. Provide clear, actionable recommendations to directly improve agent efficiency, accuracy, and reliability.
  • Drive Continuous Improvement: Collaborate closely with development teams to translate evaluation and optimisation findings into runtime adaptations, code performance enhancements, architectural upgrades, prompt engineering changes, and targeted model retraining, including reinforcement learning from human feedback (RLHF).
  • Implement Feedback Loops: Establish feedback mechanisms that combine human and machine evaluator input for ongoing monitoring, anomaly detection, and dynamic agent behaviour adjustment, integrating optimisation insights into deployment pipelines.
  • Ensure Compliance and Safety: Maintain up-to-date governance documentation and safety cases, overseeing regulatory, ethical, and operational compliance throughout both evaluation and optimisation cycles.
  • Cross-Functional Collaboration: Work with A.I. researchers, engineers, and domain experts to align evaluation and optimisation strategies with product objectives and user needs.

Requirements:

  • Bachelor’s or Master’s degree in Computer Science, Artificial Intelligence, Data Science, or a related field.
  • Demonstrated hands-on A.I. agent development experience, with a track record of identifying and implementing agent performance improvements.
  • In-depth understanding of large language models (LLMs), their optimisation, and agent system architectures.
  • Experience in both A.I. evaluation methodologies (such as benchmarking and online/offline analysis) and direct agent optimisation, such as model fine-tuning or prompt design.
  • Familiarity with software engineering best practices (e.g. TDD, BDD) and deep exposure to A.I.-specific frameworks, observability tooling, and lifecycle analytics.
  • Proven ability to perform data-driven diagnostics and root cause analysis, with direct contributions to measurable improvement in A.I. agent performance.
  • Strong communication skills, especially for documenting evaluation plans, optimisation strategies, result rationales, and technical recommendations.
  • Effective teamwork and experience running cross-functional feedback processes that bridge evaluation, development, and operations.
  • Python programming skills, plus experience with major A.I./ML libraries and APIs, including hands-on development of LLM agents.