
Mastering Long Context AI through MiniMax-01
MiniMax-01 achieves up to 4M tokens with lightning attention and MoE, setting new standards for
MiniMax-01 achieves up to 4M tokens with lightning attention and MoE, setting new standards for
Constitutional Classifiers provide a robust framework to defend LLMs against universal jailbreaks, leveraging adaptive filtering
Author(s): Mohamed Azharudeen M, Balaji Dhamodharan
Attention-Based Distillation efficiently compresses large language models by aligning attention patterns between teacher and student.