MPT-30B: Raising the bar for open-source foundation models
Introducing MPT-30B, a new, more powerful member of our Foundation Series of open-source models, trained with an 8k context length on NVIDIA H100 Tensor Core GPUs.
Blazingly Fast LLM Evaluation for In-Context Learning
With MosaicML you can now evaluate LLMs on in-context learning tasks (LAMBADA, HellaSwag, PIQA, and more) hundreds of times faster than other evaluation harnesses. For a 70B-parameter model, LAMBADA takes only 100 seconds to evaluate on 64 A100 GPUs, and a 1.2-trillion-parameter model can be evaluated in less than 12 minutes on 256 NVIDIA A100 GPUs.
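To make this concrete, here is a minimal sketch of what an in-context learning evaluation looks like with Composer's ICL utilities. The dataset path, batch size, and model/tokenizer objects are placeholders, and exact signatures may vary across Composer versions; treat this as a sketch under those assumptions rather than the exact harness behind the numbers above.

```python
# A minimal sketch of ICL evaluation with Composer (check signatures against
# your Composer version). Assumes `model` is a ComposerModel (e.g. a
# HuggingFaceModel wrapping a causal LM with ICL metrics attached) and
# `tokenizer` is its HuggingFace tokenizer.
from composer import Trainer
from composer.core import Evaluator
from composer.datasets.in_context_learning_evaluation import get_icl_task_dataloader

# Build a dataloader for the LAMBADA language-modeling task.
lambada_dl = get_icl_task_dataloader(
    'language_modeling',             # ICL task type
    dataset_uri='lambada.jsonl',     # placeholder: local or remote task file
    tokenizer=tokenizer,
    batch_size=8,
    max_seq_len=2048,
    pad_tok_id=tokenizer.eos_token_id,
    num_fewshot=0,
    prompt_string='',
    example_delimiter='\n',
    continuation_delimiter='',
    destination_path='/tmp/lambada.jsonl',  # local cache path
)

evaluator = Evaluator(
    label='lambada',
    dataloader=lambada_dl,
    metric_names=['InContextLearningLMAccuracy'],
)

trainer = Trainer(model=model)
trainer.eval(eval_dataloader=evaluator)
```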
BioMedLM: a Domain-Specific Large Language Model for Biomedical Text
The Stanford Center for Research on Foundation Models (CRFM) and MosaicML announce the release of BioMedLM, a purpose-built AI model trained to interpret biomedical language. Editorial update: this blog post was revised on 1/30/2023 to reflect the name change from PubMed GPT.
Training Stable Diffusion from Scratch Costs <$160k
We wanted to know how much time (and money) it would take to train a Stable Diffusion model from scratch using our Streaming datasets, Composer, and the MosaicML platform. Our results: it would take us 79,000 A100-hours in 13 days, for a total training cost of less than $160,000. Our tooling not only reduces time and cost by 2.5x, but it is also extensible and simple to use.
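As a quick sanity check on that figure: at an assumed rate of about $2 per A100-hour (our assumption here, for illustration), 79,000 A100-hours × $2/hour ≈ $158,000, which lines up with the <$160,000 total.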
Mosaic LLMs (Part 2): GPT-3 quality for <$500k
Training large language models (LLMs) costs less than you think. Using the MosaicML platform, we show how fast, cheap, and easy it is to train these models at scale (1B to 70B parameters). With new training recipes and infrastructure designed for large workloads, we enable you to train LLMs while maintaining total customizability over your model and dataset.
Blazingly Fast Computer Vision Training with the Mosaic ResNet and Composer
Match benchmark accuracy on ImageNet (He et al., 2015) in 27 minutes, a 7x speedup (ResNet-50 on 8xA100s). Reach higher levels of accuracy up to 3.8x faster than existing state of the art (Wightman et al., 2021). Try it out in Composer, our open-source library for efficient neural network training. It’s written in standard, easy-to-use PyTorch, so modify it to suit your needs and build on it!
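As an illustration of the Composer workflow (not the full Mosaic ResNet recipe), here is a hedged sketch of training a torchvision ResNet-50 with a few of Composer's speedup algorithms; the synthetic data and one-epoch duration are placeholders so the sketch runs end to end.

```python
# A minimal sketch of training ResNet-50 with Composer speedup algorithms.
# Illustrative only, not the full Mosaic ResNet recipe; the synthetic data
# stands in for real ImageNet dataloaders.
import torch
import torchvision
from torch.utils.data import DataLoader, TensorDataset
from composer import Trainer
from composer.algorithms import BlurPool, ChannelsLast, LabelSmoothing, ProgressiveResizing
from composer.models import ComposerClassifier

model = ComposerClassifier(torchvision.models.resnet50())

# Synthetic stand-in data so the sketch runs end to end.
images = torch.randn(64, 3, 224, 224)
labels = torch.randint(0, 1000, (64,))
train_dl = DataLoader(TensorDataset(images, labels), batch_size=16)

trainer = Trainer(
    model=model,
    train_dataloader=train_dl,
    max_duration='1ep',                 # the real recipe trains much longer
    algorithms=[
        BlurPool(),                     # anti-aliased downsampling
        ChannelsLast(),                 # NHWC memory format for faster convs
        LabelSmoothing(smoothing=0.1),  # soften one-hot targets
        ProgressiveResizing(),          # smaller images early in training
    ],
)
trainer.fit()
```

Because the speedup methods are passed as a list of composable algorithms, recipes like the Mosaic ResNet can be reproduced or remixed by swapping entries in and out.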
5x Faster Image Segmentation Training with MosaicML Recipes
Can’t stop, won’t stop. Earlier this year, we shared a new baseline for semantic segmentation (basically, classifying an image at the pixel level) using the DeepLabv3+ architecture on the ADE20k dataset. Now, we’re introducing recipes for training semantic segmentation models that either reduce time-to-train by up to 5.4x or improve quality by up to +4.6 mIoU. If you want to train your segmentation models on the best ML training platform available, learn more at mosaicml.com/cloud.
MosaicBERT: Pretraining BERT from Scratch for $20
With the MosaicBERT architecture + training recipe, you can now pretrain a competitive BERT-Base model from scratch on the MosaicML platform for $20. We’ve released the pretraining and finetuning code, as well as the pretrained weights.
Training Stable Diffusion from Scratch for <$50k with MosaicML (Part 2)
We've replicated Stable Diffusion 2 for less than $50k, and we've open-sourced the training code so you can too! This is a 3x cost reduction from our last blog post and an 8x reduction from the original Stable Diffusion 2, making training large-scale diffusion models from scratch more accessible than ever before.
How We Trained Stable Diffusion for Less than $50k (Part 3)
In our previous blog post, we showed how we used the MosaicML platform, Streaming datasets, and the Composer library to train a Stable Diffusion model from scratch for less than $50,000. Now, we do a deep dive into the technical details behind this speedup, demonstrating how we were able to replicate the Stable Diffusion 2 base model in just 6.8 days.
Behind the Scenes: Setting a Baseline for Image Segmentation Speedups
We establish a new semantic segmentation baseline of 45.56 mIoU on the ADE20k benchmark, trained in 3.5 hours on a system with 8x NVIDIA A100 GPUs.
Announcing MPT-7B-8K: 8K Context Length for Document Understanding
Today, we are releasing MPT-7B-8K, a 7B-parameter open-source LLM with 8k context length, trained on the MosaicML platform. Starting from the MPT-7B checkpoint, MPT-7B-8K was pretrained on an additional 500B tokens of data in 3 days on 256 NVIDIA H100s.
Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs
Introducing MPT-7B, the first entry in our MosaicML Foundation Series. MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available for commercial use, and matches the quality of LLaMA-7B. MPT-7B was trained on the MosaicML platform in 9.5 days with zero human intervention at a cost of ~$200k.
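For reference, MPT-7B can be loaded through the Hugging Face transformers library; here is a minimal sketch (the trust_remote_code flag is required because MPT ships a custom model implementation in its repository, and the bfloat16 dtype assumes a GPU that supports it):

```python
# A minimal sketch of loading and sampling from MPT-7B via Hugging Face
# transformers. trust_remote_code=True is required because MPT uses a
# custom model class defined in its repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = 'mosaicml/mpt-7b'
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,  # assumes bfloat16-capable hardware
    trust_remote_code=True,
)

inputs = tokenizer('MosaicML is', return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```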
Mosaic ResNet Deep Dive
TL;DR: We recently released a set of recipes that accelerate training of a ResNet-50 on ImageNet by up to 7x over standard baselines. In this report, we take a deep dive into the technical details of our work and share the insights we gained about optimizing training efficiency across a broad range of compute budgets.
Composer + FFCV: Faster Together
Composer is pushing the envelope on speed and efficiency in model training. Integrating Composer with FFCV, a fast dataloading library from Aleks Madry’s lab at MIT, unlocks new speedup methods by eliminating the dataloader bottleneck that arises when CPU-intensive decoding and augmentation operations run in the training loop. The FFCV dataloader is one of the ingredients of our Mosaic ResNet recipe, which demonstrates how algorithmic efficiency can dramatically speed up model training.
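To illustrate, here is a hedged sketch of an FFCV dataloader for ImageNet-style classification based on FFCV's pipeline API; the .beton path is a placeholder, and decoder/transform names may differ across FFCV versions.

```python
# A minimal sketch of an FFCV dataloader (the path is a placeholder; names
# follow FFCV's documented API and may vary across versions).
import torch
from ffcv.loader import Loader, OrderOption
from ffcv.fields.decoders import IntDecoder, RandomResizedCropRGBImageDecoder
from ffcv.transforms import Squeeze, ToDevice, ToTensor, ToTorchImage

loader = Loader(
    'train.beton',             # placeholder: dataset converted to FFCV format
    batch_size=256,
    num_workers=8,
    order=OrderOption.RANDOM,  # shuffled epoch order
    pipelines={
        'image': [
            RandomResizedCropRGBImageDecoder((224, 224)),
            ToTensor(),
            ToDevice(torch.device('cuda:0')),
            ToTorchImage(),    # HWC uint8 -> CHW tensor layout
        ],
        'label': [IntDecoder(), ToTensor(), Squeeze()],
    },
)

for images, labels in loader:  # drop-in replacement for a PyTorch DataLoader
    pass
```

Because decoding and augmentation run in FFCV's compiled pipelines rather than in Python, the GPU training loop is far less likely to stall waiting on data.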