Data Engineer - Training Pipelines & Inference
TLDR: Build the data backbone for the next era of AI-powered spatial biology.
Please include a cover letter with your application detailing your qualifications and experience for this position. Describe a deep learning project you have executed. Projects in computer vision for microscopy image analysis are especially relevant. Include a link to a code repository if possible. If you contributed to a joint project, please describe your specific contributions. Briefly discuss the project's results, limitations, and challenges you encountered. Finally, include a link to your GitHub profile, personal website, or similar, and/or any relevant projects, at the bottom of your cover letter.
About the Role:
AI@HHMI: HHMI is investing $500 million over the next 10 years to support AI-driven projects and to embed AI systems throughout every stage of the scientific process in labs across HHMI. The Foundational Microscopy Image Analysis (MIA) project sits at the heart of AI@HHMI. Our ambition is big: to create one of the world’s most comprehensive, multimodal 3D/4D microscopy datasets and use it to power a vision foundation model capable of accelerating discovery across the life sciences.
We're seeking a skilled Data Engineer to drive scientific innovation through robust data infrastructure, model training, and inference systems. You'll design, develop, and optimize scalable data pipelines and build multi-node GPU training and inference pipelines for foundational models. You'll also develop tools for ingesting, transforming, and integrating large, heterogeneous microscopy image datasets—including writing production-quality Python code to parse, validate, and transform microscopy data from published research papers, public databases, and internal repositories.
This role requires technical excellence in data engineering and the ability to understand biological research contexts to ensure data integrity and scientific validity. Your work will directly support computational research initiatives, including machine learning and AI applications.
You'll collaborate closely with multidisciplinary teams of computational and experimental scientists to define and implement best practices in data engineering, ensuring data quality, accessibility, and reproducibility. You'll maintain detailed documentation, potentially mentor junior engineers, and automate workflows to streamline the path from raw data to scientific insight.
What we provide:
A competitive compensation package, with comprehensive health and welfare benefits.
A supportive team environment that promotes collaboration and knowledge sharing.
The opportunity to engage with world-class researchers, software engineers and AI/ML experts, contribute to impactful science, and be part of a dynamic community committed to advancing humanity’s understanding of fundamental scientific questions.
Amenities that enhance work-life balance, such as on-site childcare, free gyms, available on-campus housing, social and dining spaces, and convenient shuttle bus service to Janelia from the Washington, D.C. metro area.
Opportunity to partner with frontier AI labs on scientific applications of AI.
What you’ll do:
Design and implement scalable, robust data, model training and inference pipelines for foundational microscopy datasets & vision foundation models. Deploy such pipelines on multi-node GPU environments and make data & trained models publicly available.
Stay up to date with scientific literature to understand data context and processing requirements.
Document data provenance and transformation steps comprehensively.
Apply statistical tools and programming languages (e.g., Python, R) to analyze large datasets, develop custom functions, and extract actionable insights through effective visualization.
Establish and maintain data standards, formats, workflows, and documentation to ensure data quality, accessibility, and reproducibility across projects.
Collaborate with interdisciplinary teams, potentially mentor junior engineers, and direct or assist in directing the work of others to meet project goals while advising stakeholders on data strategies and best practices.
What you bring:
Bachelor’s degree in Computer Science, Data Science, Statistics, Applied Mathematics, or a related field with 3+ years of experience applying and customizing data mining, model training & inference methods and techniques. An equivalent combination of education and relevant experience will be considered.
Experience with data formats such as Zarr, Parquet, and HDF5, and with efficient I/O (e.g., WebDataset).
Experience with volumetric 3D/4D microscopy data analysis tools.
Experience with high-performance computing environments (cloud-based and Slurm/LSF clusters) and model deployment platforms (e.g., Kubernetes, AWS SageMaker, Google Vertex AI, HF Inference).
Experience with distributed data processing, multi-node GPU processing, and ML development frameworks such as PyTorch and/or JAX.
Excellent technical documentation and communication skills.
Experience in building scalable data solutions, working with big data technologies, and ensuring data quality and accessibility.
Expertise in utilizing data visualization libraries and software (e.g., Matplotlib, R, Jupyter notebooks).
Detail-oriented, creative, and organized team player with strong communication skills and a collaborative mindset.
Able to effectively manage time, prioritize tasks, and clearly convey complex data concepts to technical and non-technical audiences.
Physical Requirements:
Remaining in a normal seated or standing position for extended periods of time; reaching and grasping by extending hand(s) or arm(s); dexterity to manipulate objects with fingers, for example using a keyboard; communication skills using the spoken word; ability to see and hear within normal parameters; ability to move about workspace. The position requires mobility, including the ability to move materials weighing up to several pounds (such as a laptop computer or tablet).
Persons with disabilities may be able to perform the essential duties of this position with reasonable accommodation. Requests for reasonable accommodation will be evaluated on an individual basis.
Please Note:
This job description sets forth the job’s principal duties, responsibilities, and requirements; it should not be construed as an exhaustive statement, however. Unless they begin with the word “may,” the Essential Duties and Responsibilities described above are “essential functions” of the job, as defined by the Americans with Disabilities Act.
Compensation Range
Data Engineer I: $86,181.60 (minimum) - $107,727.00 (midpoint) - $140,045.10 (maximum)
Data Engineer II: $98,039.20 (minimum) - $122,549.00 (midpoint) - $159,313.70 (maximum)
Data Engineer III: $112,629.60 (minimum) - $140,787.00 (midpoint) - $183,023.10 (maximum)
Pay Type: Salary
HHMI’s salary structure is developed based on relevant job market data. HHMI considers a candidate's education, previous experiences, knowledge, skills and abilities, as well as internal consistency when making job offers. Typically, a new hire for this position in this location is compensated between the minimum and the midpoint of the salary range.