
Senior Big Data/ML Engineer



Software Engineering, Data Science
India · Remote
Posted on Tuesday, April 18, 2023
At H1, we believe access to the best healthcare information is a basic human right. Our mission is to provide a platform that can optimally inform every doctor interaction globally. This promotes health equity and builds needed trust in healthcare systems. To accomplish this, our teams harness the power of data and AI technology to unlock groundbreaking medical insights and convert those insights into actions that result in optimal patient outcomes and accelerate an equitable and inclusive drug development lifecycle. Visit h1.co to learn more about us.
Data Engineering comprises teams responsible for collecting, curating, normalizing, and matching data from hundreds of disparate sources around the globe. Data sources include scientific publications, clinical trials, conference presentations, and claims, among others. In addition to developing the data pipelines needed to keep every piece of information updated in real time and to provide users with relevant insights, the teams are building automated, scalable, low-latency systems for recognizing and linking various types of entities, such as linking researchers and physicians to their scholarly research and clinical trials. As we rapidly expand the markets we serve and the breadth and depth of data we want to collect for our customers, the team must grow and scale to meet that demand.
As a Senior Engineer on the Data Engineering team, you will work alongside a multidisciplinary team of software engineers, machine learning engineers, product managers, front-end engineers, and designers. You will use and/or adapt various types of algorithms to solve challenging business problems in areas including entity recognition and resolution, natural language understanding, knowledge graphs, and information systems. You will also design novel experiments and build implementations that enable model integration into the production stack. Much needs to be built, and quickly, so you will need a good understanding of system design and the ability to build fast and iterate.
Roles & Responsibilities
- Build data-driven software products that handle large amounts of data of various types: structured and unstructured; numeric, text, and graph. These products will be responsible for ingesting, cleaning, transforming, and efficiently storing data.
- Build relevant data processing capabilities and optimization workflows to support large-scale learning from such multi-modal data.
- Implement analytical models (using techniques from NLP, machine learning, deep learning, etc.) that derive insights from this data.
- Implement and scale models and algorithms by incorporating recent advancements in relevant fields of study. This includes adapting, customizing or building upon state-of-the-art AI/ML solutions to support H1’s unique use cases.
- Build quick proofs of concept (POCs) to demonstrate utility and value, take ownership of projects, and drive them to production-ready solutions.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
You have strong hands-on technical skills, including conventional ETL and SQL, experience with multiple programming languages such as Python, Java, or Scala, as well as with streaming and other data processing techniques. You are a self-starter able to manage projects through all stages (requirements, design, coding, testing, implementation, and support).
- 5+ years of experience working with strong engineering teams and deploying products, preferably using big data technologies such as Spark or Hadoop (e.g., on AWS EMR).
- Strong coding skills in Python, Java, Scala, or another language of your choice in which you are proficient, along with stacks that support large-scale data processing and/or machine learning.
- Strong grasp of computer science fundamentals: data structures, algorithmic trade-offs, etc.
- Experience with big data / distributed computing ecosystems (e.g., Spark, Hadoop).
- Experience with streaming technologies (e.g., Kafka, Flink).
- Experience deploying to cloud platforms such as AWS, Google Cloud, or Azure.
- Understanding of file formats commonly used in distributed data processing, such as Apache Avro and Apache Parquet, and of common data transformation methods.
- Strong knowledge of machine learning concepts is desirable.
- Experience with ML and deep learning frameworks (e.g., TensorFlow), AutoML techniques (e.g., hyperparameter optimization), and large-scale distributed training and optimization approaches is a plus.
- Willingness to manage projects through all stages (requirements, design, coding, testing, implementation, and support).
Not meeting all the requirements but still feel like you’d be a great fit? Tell us how you can contribute to our team in a cover letter!
Benefits
- Competitive compensation package including stock options
- Full suite of health insurance options, in addition to generous paid time off
- Pre-planned company-wide wellness holidays
- Retirement options
- Health & charitable donation stipends
- Impactful Business Resource Groups
- Flexible work hours & the opportunity to work from anywhere
- The opportunity to work with leading biotech and life sciences companies in an innovative industry with a mission to improve healthcare around the globe