Llama2-70B-Chat is now available on MosaicML Inference

MosaicML is now part of Databricks

MosaicML Training is where the magic happens. Build models like MPT-30B, the latest addition to the MosaicML Foundation Series.

Our code, your data

Complete ownership of your AI. 
  • Train multi-billion-parameter models in days, not weeks
  • Maintain complete data privacy and full model ownership
  • Ensure compliance with regulatory requirements
  • Stream your data directly to and from your existing cloud storage

Our platform

Scale and reliability baked in.
  • Immediate access to thousands of A100 and H100 GPUs at competitive rates
  • Flexible pay-what-you-use consumption and contracting model
  • A highly efficient training stack that enables you to seamlessly scale and execute multi-node training
  • Automatic resumption from node failures and loss spikes. No need to babysit model training.

"Using the MosaicML platform, we were able to train and deploy our LLM with our own data within a week and achieve leading results."


Amjad Masad, CEO

"MosaicML was able to abstract away all the complexity of distributed model training."


Tony Francis, CEO

"MosaicML has helped us make the training of our large models so much faster."

Twelve Labs

Aiden Lee, Co-Founder & CTO


Train multi-billion-parameter models in hours, not days. Efficient scaling for large (>70B parameter) models.


Train 2x-7x faster, without changing your code. Our software automatically applies the latest optimizations.


No vendor lock-in. Automatic orchestration across thousands of GPUs. Escape data gravity with our StreamingDataset.

Complete Control

Train advanced LLMs and generative AI models with complete data privacy and full model ownership.

Effortless Scale

Train large language models (LLMs) at scale with a single command. Just point to your S3 bucket and we take care of the rest: launching, monitoring, and auto-recovery.

Automated Performance

Stay on the bleeding edge of efficiency. Our performance gurus continually add the latest optimizations to our platform.

Designed for Large Models

Organizations like Replit and Stanford's Center for Research on Foundation Models (CRFM) are training GPT models on specialized datasets with MosaicML. We built features to address the pain points of training LLMs and other generative models.

Training large models is expensive. Extracting performance requires tuning everything from the network interconnects to GPU parallelism strategies to software frameworks. With our optimized platform, you can skip the setup and get training right the first time.

⏯ AutoRecovery

Automatic resumption from node failures and loss spikes. No need to babysit LLM training. We monitor and restart from previous checkpoints.
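
To make the idea concrete, here is a minimal sketch of checkpoint-based recovery in plain Python. It is not the platform's implementation: the function name, the `step_fn` callback, the spike threshold, and the in-memory checkpoint are all hypothetical stand-ins chosen for illustration.

```python
import math


def train_with_auto_recovery(step_fn, num_steps, checkpoint_every=10,
                             spike_factor=3.0, max_restarts=5):
    """Run step_fn(step, state) -> (state, loss) for num_steps steps.

    On a failure (RuntimeError) or a loss spike, roll back to the most
    recent checkpoint and continue from there. All names are illustrative.
    """
    state, step = {}, 0
    checkpoint = (0, dict(state))  # (step, copy of state) kept in memory
    last_loss = None
    restarts = 0
    while step < num_steps:
        try:
            new_state, loss = step_fn(step, dict(state))
            spiked = last_loss is not None and loss > spike_factor * last_loss
            if math.isnan(loss) or spiked:
                raise RuntimeError(f"loss spike at step {step}: {loss}")
        except RuntimeError:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up instead of looping forever
            step, saved = checkpoint
            state = dict(saved)
            last_loss = None
            continue
        state, last_loss = new_state, loss
        step += 1
        if step % checkpoint_every == 0:
            checkpoint = (step, dict(state))
    return state, restarts
```

A real system would write checkpoints to durable storage and restart on fresh hardware; the sketch only shows the control flow of "detect, roll back, resume".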


Train any size model on any hardware, without tedious trial and error over settings. We dynamically adjust memory usage on the fly to prevent out-of-memory (OOM) errors.
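
One common way to get this behavior is to shrink the microbatch whenever a step runs out of memory. The sketch below illustrates that general technique under stated assumptions; it is not a description of how the platform does it. `run_microbatch` is a hypothetical callback, and Python's built-in `MemoryError` stands in for a GPU framework's out-of-memory exception.

```python
def run_batch_with_adaptive_microbatch(run_microbatch, batch, microbatch_size):
    """Process `batch` in chunks, halving the chunk size on out-of-memory.

    Returns the per-chunk results and the microbatch size that finally fit.
    """
    results = []
    start = 0
    while start < len(batch):
        chunk = batch[start:start + microbatch_size]
        try:
            results.append(run_microbatch(chunk))
        except MemoryError:
            if microbatch_size == 1:
                raise  # even a single sample does not fit
            microbatch_size //= 2  # retry the same position with a smaller chunk
            continue
        start += len(chunk)
    return results, microbatch_size
```

In a training loop, gradients from the smaller chunks would be accumulated so the effective batch size, and therefore the training result, stays the same.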

🚀 Efficient

40%+ utilization out of the box with our tuned parallelism settings across model and compute scales.
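
For a sense of scale, here is a back-of-the-envelope calculation, assuming "utilization" refers to model FLOPs utilization (MFU) and using the common approximation of about 6 FLOPs per parameter per training token. The model size and throughput in the example are hypothetical; 312 TFLOPS is the A100's published dense BF16 peak.

```python
def model_flops_utilization(num_params, tokens_per_second, num_gpus,
                            peak_flops_per_gpu):
    """Estimate MFU as achieved training FLOPs/s over peak hardware FLOPs/s."""
    achieved_flops = 6 * num_params * tokens_per_second  # forward + backward
    return achieved_flops / (num_gpus * peak_flops_per_gpu)


# Hypothetical run: a 7B-parameter model at 24,000 tokens/s on 8 A100s.
mfu = model_flops_utilization(7e9, 24_000, 8, 312e12)
print(f"MFU: {mfu:.1%}")
```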

🔀 Stream

Stream datasets from anywhere quickly and accurately. Resume from checkpoints instantly, with no hour-long wait for the dataloader to spin back up.

Rich Python SDK

Build custom workflows and tooling on top of the MosaicML platform with our comprehensive Python SDK. We support integrations with your favorite MLOps tools. Automatically package and submit local files with a few lines of code.