Software Engineer, Model Performance

Full Time
London, UK

At Google DeepMind, we value diversity of experience, knowledge, backgrounds and perspectives and harness these qualities to create extraordinary impact. We are committed to equal employment opportunity regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, pregnancy, or related condition (including breastfeeding) or any other basis as protected by applicable law. If you have a disability or additional need that requires accommodation, please do not hesitate to let us know.

Snapshot

We are looking for a software engineer passionate about improving the performance of cutting-edge ML models on hardware accelerators. You will be part of a team responsible for deploying models, such as large language models (LLMs), at scale for use throughout Alphabet. This involves working across the stack, from ML frameworks to compilers, with the aim of serving models at maximum efficiency.

About Us

Artificial Intelligence could be one of humanity’s most useful inventions. At Google DeepMind, we’re a team of scientists, engineers, machine learning experts and more, working together to advance the state of the art in artificial intelligence. We use our technologies for widespread public benefit and scientific discovery, and collaborate with others on critical challenges, ensuring safety and ethics are the highest priority.

The Role

The rising use of large language models (LLMs) demands efficient and performant serving solutions. As a member of the deployment team, you will help us improve the efficiency of our model serving stack and optimize the performance of the latest models on Google’s fleet of hardware accelerators throughout the entire LLM deployment lifecycle.

This involves:

  • Collaborating with research teams to ensure models are production-ready.
  • Optimizing the serving environment with other infrastructure teams.
  • Streamlining the release process for cutting-edge research.
  • Improving low-level performance of models on hardware.
  • Enhancing request handling and distribution across systems.

This role provides an opportunity to work on a wide range of problems and gain a broad understanding of ML models and hardware performance.

Key responsibilities:

Depending on your skills and interests, some of your responsibilities will be to:

  • Improve the efficiency of ML model serving on hardware accelerators
  • Profile models to identify performance bottlenecks and opportunities
  • Write low-level code targeting hardware accelerators
  • Implement model sharding techniques to efficiently partition models across accelerators
  • Identify the best hardware setup for deploying a diverse set of models
  • Work closely with ML compiler teams to improve efficiency
  • Design and implement optimizations for distributed serving systems, such as reducing network transfers and redundant computations

About You

To set you up for success as a Software Engineer at Google DeepMind, we look for the following skills and experience:

  • Interpersonal skills, such as the ability to discuss technical ideas effectively with colleagues
  • Excellent knowledge of either C++ or Python

In addition, we are looking for experience with at least two of the following:

  • Programming hardware accelerators (GPUs, TPUs, etc.) via ML frameworks (e.g. JAX, PyTorch) or low-level programming models (e.g. CUDA, OpenCL)
  • Profiling software to find performance bottlenecks
  • Leveraging compiler infrastructure to improve performance on hardware
  • Distributed ML systems optimization
  • Training and using large models (>10 billion parameters)
  • Interest in AI and basic knowledge of AI algorithms and models (e.g. Transformer)

Application deadline: 5pm BST, Friday 3rd May 2024