www.mosaicml.com

MPT-30B: Raising the bar for open-source foundation models

Introducing MPT-30B, a new, more powerful member of our Foundation Series of open-source models, trained with an 8k context length on NVIDIA H100 Tensor Core GPUs.

Feb 19, 2024

Research

Blazingly Fast LLM Evaluation for In-Context Learning

With MosaicML you can now evaluate LLMs on in-context learning tasks (LAMBADA, HellaSwag, PIQA, and more) hundreds of times faster than other evaluation harnesses. For 70B parameter models, LAMBADA takes only 100 seconds to evaluate on 64 A100 GPUs, and evaluation of a 1.2 trillion parameter model takes less than 12 minutes when using 256 NVIDIA A100 GPUs.

Dec 20, 2023

Research

BioMedLM: a Domain-Specific Large Language Model for Biomedical Text

The Stanford Center for Research on Foundation Models (CRFM) and MosaicML announce the release of BioMedLM, a purpose-built AI model trained to interpret biomedical language. Editorial update: this blog post was revised on 1/30/2023 to reflect the name change from PubMed GPT.

Sep 27, 2023

Research

Training Stable Diffusion from Scratch Costs <$160k

We wanted to know how much time (and money) it would take to train a Stable Diffusion model from scratch using our Streaming datasets, Composer, and the MosaicML platform. Our results: it would take 79,000 A100-hours over 13 days, for a total training cost of less than $160,000. Our tooling not only reduces time and cost by 2.5x but is also extensible and simple to use.
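
The quoted figures can be cross-checked with simple arithmetic. The sketch below is a back-of-envelope calculation, not something taken from the post: it assumes the GPUs ran around the clock for the full 13 days, so the implied GPU count is only an average, and the per-hour price is an upper bound derived from the "less than $160,000" figure. The function names are illustrative.

```python
# Back-of-envelope check of the Stable Diffusion training figures quoted above.
# Assumption: GPUs ran continuously for the whole wall-clock period.
A100_HOURS = 79_000         # total A100-hours quoted in the summary
WALL_CLOCK_DAYS = 13        # wall-clock training time quoted in the summary
COST_CEILING_USD = 160_000  # "less than $160,000"


def implied_gpu_count(gpu_hours: float, days: float) -> float:
    """Average number of GPUs running in parallel, assuming 24h/day utilization."""
    return gpu_hours / (days * 24)


def implied_max_hourly_rate(cost_usd: float, gpu_hours: float) -> float:
    """Upper bound on the price per GPU-hour implied by the total cost ceiling."""
    return cost_usd / gpu_hours


if __name__ == "__main__":
    gpus = implied_gpu_count(A100_HOURS, WALL_CLOCK_DAYS)
    rate = implied_max_hourly_rate(COST_CEILING_USD, A100_HOURS)
    print(f"~{gpus:.0f} GPUs in parallel, under ${rate:.2f} per A100-hour")
```

Under these assumptions, the numbers work out to roughly 250 A100s running in parallel at no more than about $2 per A100-hour.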

Sep 8, 2023

Research

Mosaic LLMs (Part 2): GPT-3 quality for <$500k

Training large language models (LLMs) costs less than you think. Using the MosaicML platform, we show how fast, cheap, and easy it is to train these models at scale (from 1B to 70B parameters). With new training recipes and infrastructure designed for large workloads, we enable you to train LLMs while maintaining total customizability over your model and dataset.

Sep 8, 2023

Research

Blazingly Fast Computer Vision Training with the Mosaic ResNet and Composer

Match benchmark accuracy on ImageNet (He et al., 2015) in 27 minutes, a 7x speedup (ResNet-50 on 8x A100s). Reach higher levels of accuracy up to 3.8x faster than the existing state of the art (Wightman et al., 2021). Try it out in Composer, our open-source library for efficient neural network training. It’s written in standard, easy-to-use PyTorch, so you can modify it to suit your needs and build on it!

Sep 5, 2023

Research

5x Faster Image Segmentation Training with MosaicML Recipes

Can’t stop, won’t stop. Earlier this year, we shared a new baseline for semantic segmentation (basically, classifying an image at the pixel level) using the DeepLabv3+ model architecture on the ADE20k dataset. Now, we’re introducing recipes for training semantic segmentation models that either reduce time-to-train by up to 5.4x or improve quality by up to +4.6 mIoU. If you want to train your segmentation models on the best ML training platform available, learn more at mosaicml.com/cloud.

Sep 5, 2023

Research

MosaicBERT: Pretraining BERT from Scratch for $20

With the MosaicBERT architecture + training recipe, you can now pretrain a competitive BERT-Base model from scratch on the MosaicML platform for $20. We’ve released the pretraining and finetuning code, as well as the pretrained weights.

Sep 5, 2023

Research

Training Stable Diffusion from Scratch for <$50k with MosaicML (Part 2)

We've replicated Stable Diffusion 2 for less than $50k, and we've open-sourced the training code so you can too! This is a 3x cost reduction from our last blog post and an 8x reduction from the original Stable Diffusion 2, making training large-scale diffusion models from scratch more accessible than ever before.

Sep 5, 2023

Research

How We Trained Stable Diffusion for Less than $50k (Part 3)

In our previous blog post, we showed how we used the MosaicML platform, Streaming datasets, and the Composer library to train a Stable Diffusion model from scratch for less than $50,000. Now, we do a deep dive into the technical details behind this speedup, demonstrating how we were able to replicate the Stable Diffusion 2 base model in just 6.8 days.

Sep 5, 2023

Research

Behind the Scenes: Setting a Baseline for Image Segmentation Speedups

We establish a new semantic segmentation baseline of 45.56 mIoU on the ADE20k segmentation benchmark in 3.5 hours on a system with 8x NVIDIA A100 GPUs.

Aug 8, 2023

Research

Announcing MPT-7B-8K: 8K Context Length for Document Understanding

Today, we are releasing MPT-7B-8K, a 7B parameter open-source LLM with 8k context length trained with the MosaicML platform. MPT-7B-8K was pretrained starting from the MPT-7B checkpoint in 3 days on 256 NVIDIA H100s with an additional 500B tokens of data.
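
As a rough sanity check, the figures above imply an average training throughput. This is a sketch under stated assumptions rather than a number reported in the post: it treats "3 days" as exact and assumes all 256 H100s were busy for the entire run, so the real per-GPU rate may differ. The function names are illustrative.

```python
# Back-of-envelope throughput implied by the MPT-7B-8K summary above.
# Assumption: all 256 GPUs trained continuously for exactly 3 days.
SECONDS_PER_DAY = 24 * 60 * 60


def tokens_per_second(tokens: float, days: float) -> float:
    """Cluster-wide training throughput in tokens per second."""
    return tokens / (days * SECONDS_PER_DAY)


def tokens_per_gpu_second(tokens: float, days: float, num_gpus: int) -> float:
    """Average per-GPU throughput, assuming work is split evenly across GPUs."""
    return tokens_per_second(tokens, days) / num_gpus


if __name__ == "__main__":
    cluster = tokens_per_second(500e9, 3)
    per_gpu = tokens_per_gpu_second(500e9, 3, 256)
    print(f"~{cluster / 1e6:.2f}M tokens/s cluster-wide, ~{per_gpu:,.0f} tokens/s per GPU")
```

Under these assumptions, that is roughly 1.9M tokens per second across the cluster, or about 7.5k tokens per second per H100.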

Jul 19, 2023

Research

Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs

Introducing MPT-7B, the first entry in our MosaicML Foundation Series. MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available for commercial use, and matches the quality of LLaMA-7B. MPT-7B was trained on the MosaicML platform in 9.5 days with zero human intervention at a cost of ~$200k.

Jun 14, 2023

Research

Mosaic ResNet Deep Dive

TL;DR: We recently released a set of recipes which can accelerate training of a ResNet-50 on ImageNet by up to 7x over standard baselines. In this report we take a deep dive into the technical details of our work and share the insights we gained about optimizing the efficiency of model training over a broad range of compute budgets.

May 30, 2023

Research

Composer + FFCV: Faster Together

Composer is pushing the envelope on speed and efficiency in model training. Integrating Composer with FFCV, a fast dataloading library from Aleks Madry’s lab at MIT, unlocks new speedup methods by eliminating the dataloader bottleneck often experienced when using CPU-intensive operations in the training loop. The FFCV dataloader is one of the ingredients of our Mosaic ResNet recipe, which demonstrates how algorithmic efficiency can dramatically speed up model training.

May 30, 2023

Connect With The Community

Let’s make ML better, one method at a time.

We want our community to be a safe and inclusive space for all current and future ML practitioners. Learn more in our Community Guidelines and Code of Conduct.

We have even more exciting things in the works. Get early access to our technology preview.
