Lead Data Engineer, Genome Informatics


Salary: $90,875.00 - $148,460.00 /year *

Employment Type: Full-Time


: Information Technology


The Regeneron Genetics Center (RGC) is a wholly-owned subsidiary of the Company whose goals are to apply large-scale human genetics to identify new drug targets and to guide the development of therapeutics programs and precision medicine. Building upon Regeneron's strengths in mouse genetics and genetics-driven drug discovery and development, the RGC specializes in ultra-high-throughput exome sequencing, large-scale informatics and data analysis encompassing genomics and electronic health records, and translating genetic discoveries into new biology and drug discovery opportunities. The RGC leverages multiple approaches, including large population-based studies, Mendelian genetics and family-based studies, founder population genetics, and large-scale disease-focused projects, and has developed a network of over 50 collaborations with research organizations around the world. Through some of the largest sequencing studies in the world, such as the DiscovEHR study in collaboration with Geisinger Health System and an initiative to sequence 500,000 participants with the UK Biobank, the RGC has built one of the largest human genetics databases, which includes sequence data from several hundred thousand participants and is growing rapidly. Our interests span all therapeutic areas, and the RGC is highly integrated into all facets of research and development at Regeneron. Program goals include target discovery, indication discovery, and patient-disease stratification. Objectives include advancing basic science around the world through public sharing of discoveries, providing clinically valuable insights to physicians and providers of collaborating health-care systems, improving patient outcomes, and identifying novel targets for drug development.

The RGC's Genome Informatics team leads the primary and secondary analysis of more than 500,000 samples a year, including production pipelines, cloud-compute infrastructure, and sequencing and variant quality control. Working closely with other RGC teams, our extensive genomics R&D portfolio supports multi-omics applications (RNA, long reads), unprecedented-scale variant calling, disease association studies, and loci-specific analyses that directly impact cutting-edge drug development.
The Senior Production Engineer will lead the innovation, design and development of the RGC's production genomics compute infrastructure. Under the direction of the GI-Prod Director, this role will apply the latest hardware, software and cloud technologies to innovative solutions in genomics data structuring, distributed compute workflows, and large-scale data manipulation and mining. This role requires extensive interaction with Sequencing, LIMS, IT and Compliance operations across Regeneron and with external partners (AWS, Databricks, DNAnexus).
  • Optimize and innovate genomics workflows and architecture for >500k sample-per-year production throughput
  • Interact with Regeneron IT and external technology partners to ensure 24/7 uptime and a rapid development cycle
  • Lead GI-Prod software development best practices (code repositories, SDLC, continuous integration testing)
  • Working with GI-Prod Leads, develop tools for GI-Prod and RGC users to facilitate at-scale, at-speed genomics (QC, variant calling, multi-omics)
  • Work closely with RGC Data Engineering to integrate production with distributed compute environment (Databricks, Spark)

Bachelor's degree in Computer Science, Software Engineering, or a similar field, with 4-8+ years of experience
Expert software engineer and developer (version control, SDLC)
Cloud Platform (AWS, GCP)
Standard bioinformatics tools (Samtools, BCFtools, VCFtools, BWA, GATK, Picard, BEDtools, PLINK)
Genomics data formats and analysis
Demonstrated experience in development and maintenance of production-quality software and compute infrastructure
Preferred: Distributed compute platforms
Preferred: Leadership of development team
Preferred: Compliance environments (e.g. FISMA, HIPAA)

This is an opportunity to join our select team that is already leading the way in the Pharmaceutical/Biotech industry. Apply today and learn more about Regeneron's unwavering commitment to combining good science & good business.

To all agencies: Please, no phone calls or emails to any employee of Regeneron about this opening. All resumes submitted by search firms/employment agencies to any employee at Regeneron via email, the internet, or in any other form and/or method will be deemed the sole property of Regeneron, unless such search firms/employment agencies were engaged by Regeneron for this position and a valid agreement with Regeneron is in place. In the event a candidate who was submitted outside of the Regeneron agency engagement process is hired, no fee or payment of any kind will be paid.

Regeneron is an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability status, protected veteran status, or any other characteristic protected by law.
Associated topics: data analyst, data analytic, data center, data management, data quality, data scientist, data warehousing, database, mongo database, mongo database administrator

* The salary listed in the header is an estimate based on salary data for similar jobs in the same area. Salary or compensation data found in the job description is accurate.
