Efficient Systems for Foundation Models

Workshop at the International Conference on Machine Learning (ICML) 2023.

Banner: Code it, run it, crash it–restart it.

🔥 the gist

➡️ ES-FoMo 2023 was a blast! Thank you for joining us at ICML in Hawaii, and stay tuned for a 2024 edition.

  • what? A workshop to bring together interdisciplinary experts working on the emerging research questions and challenges associated with foundation model training and inference.
  • when & where?
    • Check out the recorded talks and panels on the official ICML platform (videos are available only to individuals with an ICML registration until the end of August, and will be released publicly afterwards);
    • Check out the accepted papers on OpenReview.
  • questions? Contact us at esfomo.workshop@gmail.com.

Banner: Awards and post-workshop happy hour sponsored by Together.


📆 the plan

All times HST, UTC-10. Find us in Ballroom A (floor L4) of the convention center.

| Time | Topic | Speaker |
|---|---|---|
| 8:55am | Opening remarks | |
| 9:00am | 🔥 Session I: Large-Scale Distributed Pretraining | |
| | Using Megatron to Train Large Language Models | Deepak Narayanan (Microsoft Research) |
| | Distributed Systems for Decentralized AI | Ce Zhang (ETH, Together) |
| | Training Large Language Models on Cerebras Wafer-Scale Clusters | Natalia Vassilieva (Cerebras) |
| 10:10am | Coffee break | |
| 10:25am | 🎤 Contributed Talk 1 | |
| | SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores | Yi Wu (Tsinghua University, Shanghai Qi Zhi Institute) |
| 10:40am | 🎤 Contributed Talk 2 | |
| | Finetuning Language Models with Just Forward Passes | Sadhika Malladi (Princeton University) |
| 10:55am | 🚀 Session II: Efficient Inference | |
| | The Case for 4-bit Inference | Tim Dettmers (University of Washington) |
| | Efficiently Scaling Transformer Inference | Aakanksha Chowdhery (Google DeepMind) |
| 11:55am | 🎤 Contributed Talk 3 | |
| | Memory-Efficient Selective Finetuning | Antoine Simoulin (Meta) |
| 12:10pm | Lunch break | |
| 1:00pm | 🧑‍🎓 Poster Session | |
| 2:15pm | 💬 Panel: Large Language Models Tooling Across Industry and Academia | Anna Goldie (Anthropic), Rishi Bommasani (Stanford University), Susan Zhang, Emily Webber (AWS), James Bradbury (Google DeepMind); moderated by Abhi Venigalla (MosaicML) and Dylan Patel (SemiAnalysis) |
| 3:15pm | Coffee break | |
| 3:30pm | 🎤 Contributed Talk 5 | |
| | Fast Causal Attention with Dynamic Sparsity | Daniele Paliotta (University of Geneva) |
| 3:45pm | ⚙️ Session III: Deep Optimization | |
| | PyTorch 2.x: Faster, More Pythonic, and as Dynamic as Ever | Natalia Gimelshein (OpenAI) |
| | High-Performance Kernel Programming with Triton | Philippe Tillet (OpenAI) |
| 4:45pm | 🏅 Awards | |
| 6:00pm | 🎉 Post-workshop happy hour sponsored by Together (RSVP on Partiful to get a ticket!) | |


🦾 the pitch

As models increase in size and training budget, they not only systematically improve in upstream quality, but also exhibit novel emergent capabilities. This increase in scale raises proportionate difficulties for practitioners: foundation model training and inference lie at a unique interdisciplinary crossroads, combining open problems in algorithms, system design, and software engineering.

Machine learning practitioners are key stakeholders here: on the one hand, researchers may contribute algorithmic insights and novel methods to improve the training and inference of large models; on the other hand, novel research findings may be best demonstrated at scale, which may require training models as efficiently as possible to make the best use of available resources.

The goal of this workshop is to bring together interdisciplinary experts working on the emerging research questions and challenges associated with foundation model training and inference. We welcome submissions around training and inference systems/algorithms for foundation models, focusing on scaling up or on reducing compute, time, memory, bandwidth, and energy requirements. Notably, we encourage submissions concerning the entire spectrum of foundation models: from BERT-sized Transformers to large models with 100B+ parameters. Topics include but are not limited to (see our 📝 call for papers for details):

  • Training and inference systems, either distributed at large scale or in resource-constrained scenarios;
  • Algorithms for improved training and inference efficiency;
  • Systems for foundation models, such as novel programming languages or compilers.


🧑‍🏫 the speakers


💬 the panelists (& moderators)


😎 the organizers


😍 the sponsors