Projects
We would like to thank the Oak Ridge Leadership Computing Facility (OLCF) for granting us access to their Summit and Frontier supercomputers for these projects as part of an INCITE compute grant; OLCF is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. As part of this collaborative INCITE effort, several project tracks are being pursued - see the details below.
For an overview, see the talks by our team at the 6th Neural Scaling Workshop in Dec 2023:
Irina Rish: Open-Source Foundation Models on Supercomputers: projects and models built by CERC-AAI Lab and INCITE 2023 Collab
Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, et al.: Continual Pretraining of Foundation Models (paper, blog, tweet, slides, video)
Kshitij Gupta, Daniel Kaplan: Robin Suite of Open-Source Multimodal Foundation Models (paper, blog, tweet, slides, video)
Alexis Roger: Multimodal Alignment: Towards ethical multimodal systems (paper, blog, tweet, slides, video)
Arjun Ashok, Andrew Williams: Time-Series Foundation Models (paper, blog, tweet, slides, video)
Nolano.ai team (Tejas Vaidhya, Ayush Kaushal, Irina Rish)
Irina Rish: Introducing Nolano.ai
Ayush Kaushal: Nolano: Compression and Fast Inference in Foundation Models (paper, blog, tweet, slides, video)
Ayush Kaushal: Nolano: Introducing Hi-NOLIN - the First Hindi-English LLM (tweet, blog, slides, video)
A talk by Irina Rish, Complex systems view of large-scale AI systems (video), at the Alignment Workshop (Dec 2023).
Project 1: Pretraining LLMs From Scratch
RedPajama-INCITE (RPI) Models (May 2023)
Team: Dan Fu (Stanford/Together), Ce Zheng (ETH Zurich/Together), Huu Nguyen (Ontocord), Quentin Anthony (EleutherAI, University of Montreal/Mila), Irina Rish (University of Montreal/Mila)
Together Blog: Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models
In April-May 2023, we used Summit to train and release two open-source LLMs, each trained on 1T tokens from the RedPajama dataset. These two models, RedPajama-INCITE-3B and RedPajama-INCITE-7B, were then fine-tuned externally into chat and instruction-tuned versions, respectively. They are each downloaded thousands of times per month and have been ported to (and are currently running on) a variety of environments, including the Apple iPhone. We are excited to report that RedPajama-INCITE-3B has been the best-performing 3-billion-parameter model since its release. After fine-tuning, the RedPajama-INCITE-3B model is even competitive with the 7B-parameter Llama model. Similarly, after fine-tuning, the RedPajama-INCITE-7B model was the best-performing 7B-parameter model at the time of its release.
Overall, the Summit supercomputer was critical for training and releasing these models. The compute time on Summit allowed us to validate that it is possible to collect a high-quality pretraining dataset using only public resources and to pretrain a competitive language model on it. Furthermore, this work has helped maintain an open ecosystem for large language models and contributed to AI democratization via open source, a continuing effort in which more and more open LLMs become available over time (recent examples include Falcon, MPT, and Llama-2).
INCITE 9B Models (Aug-Sept 2023)
Team: Kshitij Gupta, Benjamin Therien, Adam Ibrahim, Mats Leon Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothee Lesort
We trained three Pythia-like models, each with 9B parameters, on three different datasets: the Pile, SlimPajama, and a mix of both.
Project 2: Continual Pretraining of LLMs
Team: Kshitij Gupta, Benjamin Therien, Adam Ibrahim, Mats Leon Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothee Lesort
Presentations given by the team:
The 6th Neural Scaling Workshop: Continual Pretraining of Foundation Models
AI @ Scale: Continual Pretraining of Foundation Models
Paper: "Simple and Scalable Strategies to Continually Pre-train Large Language Models" (submitted).
Earlier version: Continual Pre-Training of Large Language Models: How to (re)warm your model, in Efficient Natural Language and Speech Processing (ENLSP-III) workshop at NeurIPS 2023.
Blog: Continual Learning of Foundation Models: CL-FoMo Suite of 9B and 420M LLMs
Project Summary: The main focus of this project is on developing strategies to continually pre-train models on new datasets, so as to avoid retraining them from scratch each time a new dataset arrives. We study the impact of warmup, learning-rate, and replay strategies on model performance when training on IID versus sequentially presented datasets.
Access to the Summit supercomputer was essential for testing the impact of these strategies at scale (410M and 9B parameter models, 600B tokens: 300B from the Pile and 300B from SlimPajama). From this research, we know that we can increase the learning rate of a pretrained model ("re-warming") and then decay it to improve performance on new datasets while minimizing forgetting on old ones. We can therefore re-warm the learning rate of old checkpoints and continue pretraining on new datasets, enabling transfer between upstream and downstream tasks to the point of outperforming models optimized directly for the downstream tasks. This work ultimately contributes to the democratization of AI by showing a way to drastically reduce the cost of keeping foundation models relevant as new datasets are collected.
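To make the "re-warm, then decay" recipe above more concrete, here is a minimal sketch of such a learning-rate schedule, assuming a linear re-warming phase followed by cosine decay; the hyperparameter values below are illustrative placeholders, not the ones used in our experiments.

```python
import math

def rewarmed_cosine_lr(step, total_steps, warmup_steps, max_lr, min_lr):
    """Learning-rate schedule for continuing pretraining on a new dataset:
    linearly re-warm the LR from min_lr up to max_lr, then cosine-decay
    it back down to min_lr over the remaining steps."""
    if step < warmup_steps:
        # Re-warming phase: ramp the LR of the resumed checkpoint back up.
        return min_lr + (max_lr - min_lr) * step / warmup_steps
    # Decay phase over the rest of the new dataset.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Example usage when resuming an old checkpoint on a new dataset
# (step counts and LR values are placeholders, not those from the paper).
schedule = [rewarmed_cosine_lr(s, total_steps=10_000, warmup_steps=300,
                               max_lr=3e-4, min_lr=3e-5) for s in range(10_000)]
```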
Project 3: Aligned Multimodal Models
Project 3.1: Developing Robin Suite of Vision-Language Models
Team (alphabetical order): Alexis Roger, Andrew R Williams, Daniel Kaplan, Edwin Fennell, George Adamopoulos, Kshitij Gupta, Prateek Humane, Quentin Anthony, Rishika Bhagwatkar, Sun Qi, Yuchen Lu, Irina Rish
Blog: Release of Robin v1.0 - a Suite of Multimodal Models
Project Summary: This project focuses on multimodal models, more precisely vision-language models, which take an image and text as input and produce text as output. We train and evaluate models built on a variety of large language models (e.g., Pythia, GPT, Mistral) and study their combination with various visual encoders, such as CLIP and SigLIP, within different architectures. We also have a keen interest in the alignment and ethics of these models.
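For readers new to this class of models, the sketch below illustrates the general vision-language pattern under simple assumptions (it is not the exact Robin architecture): patch features from a visual encoder are projected into the LLM's token-embedding space and consumed by the LLM alongside the text embeddings.

```python
import torch
import torch.nn as nn

class VisionToLLMProjector(nn.Module):
    """Minimal sketch of the 'bridge' between a visual encoder and an LLM:
    image patch features (e.g., from CLIP or SigLIP) are mapped into the
    LLM's token-embedding space and prepended to the text embeddings."""

    def __init__(self, vision_dim: int, llm_dim: int):
        super().__init__()
        # A small MLP projector; real systems may instead use a single
        # linear layer or cross-attention at this point.
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, image_features: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim)
        # text_embeds:    (batch, seq_len, llm_dim)
        image_tokens = self.proj(image_features)
        # The LLM then consumes the concatenated sequence and generates text.
        return torch.cat([image_tokens, text_embeds], dim=1)

# Example with dummy shapes (1024-dim vision features -> a 4096-dim LLM).
projector = VisionToLLMProjector(vision_dim=1024, llm_dim=4096)
fused = projector(torch.randn(2, 256, 1024), torch.randn(2, 32, 4096))
print(fused.shape)  # torch.Size([2, 288, 4096])
```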
Project 3.2: Aligning Vision-Language Models
Blog: TBA
Papers & presentations:
Alexis Roger, Esma Aimeur, Irina Rish, "Towards Ethical Multimodal Systems", in the First Workshop on AI meets Moral Philosophy and Moral Psychology (MP2) at NeurIPS 2023.
J.C. Layoun, A. Roger, I. Rish, "Aligning MAGMA by Few-Shot Learning and Finetuning", Montreal AI Symposium 2022; arXiv:2210.14161, 2022. Class project presentation in the Neural Scaling Laws course, Winter 2022: slides, video (part 2).
Project Summary: TBA
Project 4: Time-Series Foundation Models
Team: Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Arian Khorasani, George Adamopoulos, Rishika Bhagwatkar, Marin Biloš, Hena Ghonia, Nadhir Hassen, Anderson Schneider, Sahil Garg, Alexandre Drouin, Nicolas Chapados, Yuriy Nevmyvaka, Irina Rish
Project Summary: Over the past few years, foundation models have caused a paradigm shift in machine learning due to their unprecedented capabilities for zero-shot and few-shot generalization. However, despite their success in modalities such as natural language processing and computer vision, foundation models for time series forecasting have lagged behind (pun intended :)).
We introduce Lag-Llama: the first open-source foundation model for time series forecasting!
Paper: Lag-Llama: Towards Foundation Models for Time Series Forecasting, an extended version of our workshop paper at the R0-FoMo: Robustness of Few-shot and Zero-shot Learning in Large Foundation Models workshop at NeurIPS 2023.
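As the model's name suggests, Lag-Llama conditions on lagged values of the input series. The snippet below is a minimal, illustrative sketch of how lagged covariates can be constructed from a univariate series; the lag set shown is a placeholder chosen for the example, not necessarily the one used by Lag-Llama.

```python
import numpy as np

def lag_features(series: np.ndarray, lags: list[int]) -> np.ndarray:
    """Build lagged covariates for a univariate series: for each time step t,
    collect the values at t - lag for every lag in `lags`. Positions where a
    lag reaches before the start of the series are filled with NaN."""
    T = len(series)
    feats = np.full((T, len(lags)), np.nan)
    for j, lag in enumerate(lags):
        feats[lag:, j] = series[:T - lag]
    return feats

# Example: daily data with weekly, bi-weekly, and four-week lags (illustrative only).
y = np.arange(60, dtype=float)
X = lag_features(y, lags=[7, 14, 28])
print(X.shape)  # (60, 3)
```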