Senior AI Evaluation Scientist

Steampunk
McLean, VA

Overview

We are seeking an experienced Senior AI Evaluation Scientist to design and lead rigorous evaluation programs for predictive and generative AI systems across our enterprise and client engagements. This role is critical to ensuring that AI solutions are accurate, reliable, safe, and aligned with mission outcomes. The Senior AI Evaluation Scientist will develop evaluation frameworks, build automated testing pipelines, and act as a subject-matter expert on AI quality, risk, and performance measurement. This role blends deep technical expertise with analytical rigor, experimentation, and cross-functional collaboration.

Contributions

  • Lead the design and implementation of comprehensive evaluation frameworks for generative and predictive AI models, including accuracy, robustness, relevance, trustworthiness, fairness, hallucination rates, and safety.
  • Develop and maintain automated evaluation pipelines that continuously audit model outputs, monitor quality drift, and validate alignment with mission-specific constraints.
  • Create custom benchmark datasets, challenge sets, and adversarial evaluation strategies tailored to client domains and regulatory requirements.
  • Conduct in-depth error analysis, model behavior studies, and sensitivity assessments to inform iterative improvements in prompts, retrieval systems, models, and orchestration frameworks.
  • Partner with AI Product Engineers, LLMOps Engineers, and Data Scientists to drive model improvements through structured experimentation, A/B testing, and scientifically grounded evaluation cycles.
  • Advise teams on measurement methodologies, statistical significance, and best practices for Trustworthy AI evaluation in alignment with NIST AI RMF, MLSecOps, and agency governance requirements.
  • Document evaluation results, risks, and findings for technical and non-technical audiences, including engineering teams, leadership, and government clients.
  • Contribute to the development of standardized tools, reusable templates, and evaluation components to improve repeatability and quality across engagements.
  • Stay informed of advances in LLM assessment, safety science, red-teaming methodologies, and evaluation frameworks emerging from academia and industry.
  • Mentor junior evaluation staff and help grow Steampunk's AI measurement and evaluation capabilities.
  • You will contribute to the growth of our AI & Data Exploitation Practice!

Qualifications

  • Ability to hold a position of public trust with the U.S. government.
  • Bachelor's, Master's, or Ph.D. in Computer Science, Statistics, Machine Learning, Cognitive Science, Human-Computer Interaction, or a related field.
  • 8 years of experience evaluating machine learning, NLP, or generative AI systems, with strong familiarity with LLMs and retrieval-based architectures.
  • Deep understanding of evaluation metrics, statistical testing, dataset construction, experimental design, and model validation methodologies.
  • Hands-on experience with Python and libraries such as PyTorch, Hugging Face, LangChain, and scikit-learn, and with evaluation tooling (LLM-as-a-judge, rubric-based evaluators, or custom harnesses).
  • Demonstrated experience designing automated evaluation pipelines and integrating them into CI/CD or LLMOps workflows.
  • Strong understanding of AI governance, responsible AI principles, bias detection, fairness metrics, and risk identification.
  • Experience working with structured and unstructured datasets across multiple modalities (text, tabular, documents).
  • Familiarity with vector databases, RAG architectures, and multi-step LLM workflows.
  • Excellent analytical, written, and verbal communication skills, with the ability to translate evaluation insights into clear technical recommendations.
  • Proven ability to collaborate with cross-functional engineering and product teams while independently driving evaluation strategy.
  • Experience working in agile or iterative development environments and documenting scientific processes clearly.

About Steampunk

Steampunk relies on several factors to determine salary, including but not limited to geographic location, contractual requirements, education, knowledge, skills, competencies, and experience. The projected compensation range for this position is $140,000 to $190,000. The estimate displayed represents a typical annual salary range for this position. Annual salary is just one aspect of Steampunk's total compensation package for employees. Learn more about additional Steampunk benefits here.

Identity Statement

As part of the application process, you are expected to be on camera during interviews and assessments. We reserve the right to take your picture to verify your identity and prevent fraud.

Steampunk is a Change Agent in the Federal contracting industry, bringing new thinking to clients in the Homeland, Federal Civilian, Health, and DoD sectors. Through our Human-Centered delivery methodology, we are fundamentally changing the expectations our Federal clients have for true shared accountability in solving their toughest mission challenges. As an employee-owned company, we focus on investing in our employees to enable them to do the greatest work of their careers and rewarding them for outstanding contributions to our growth. If you want to learn more about our story, visit .

Required Experience:

Senior IC

Posted 2025-11-23
