Date: 2026-01-29T07:52:36-08:00

Summary

Yes, Kimi K2.5 is open source; it was released by Moonshot AI in January 2026 and is a 1-trillion-parameter mixture-of-experts multimodal model available on Hugging Face. However, running it locally requires hardware resources beyond what a single Mac Studio M3 Ultra can practically handle.


Key Findings

Open Source Status

  • Yes, fully open source - Released by Moonshot AI and available on Hugging Face
  • Code and model weights are publicly available
  • Licensed under open-source terms, enabling local deployment and customization

Model Parameters & Architecture

  • 1 trillion total parameters, with 32 billion activated per token
  • Mixture-of-Experts (MoE) architecture with 384 experts, 8 selected per token (see the sketch after this list)
  • 61 transformer layers with an attention hidden dimension of 7,168
  • Vision encoder: MoonViT with 400M parameters (supports images, video, PDFs)
  • Context window: 256K tokens
  • Native INT4 quantization support
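
Where the 32B activated figure comes from: per token, only 8 of the 384 experts fire, plus the always-active shared layers. A back-of-the-envelope sketch in Python (the expert/shared split below is an illustrative assumption, not a published number; it is chosen so the result lands near the official 32B count):

```python
# Back-of-the-envelope MoE arithmetic for Kimi K2.5's headline numbers.
# The expert/shared split is an illustrative assumption, not a
# published figure.

TOTAL_PARAMS = 1.0e12    # 1T total parameters
NUM_EXPERTS = 384        # routed experts per MoE layer
ACTIVE_EXPERTS = 8       # experts selected per token
SHARED_PARAMS = 12e9     # assumed always-active weights (attention, embeddings, router)

expert_params = TOTAL_PARAMS - SHARED_PARAMS
active_per_token = SHARED_PARAMS + expert_params * ACTIVE_EXPERTS / NUM_EXPERTS

print(f"Approx. active parameters per token: {active_per_token / 1e9:.1f}B")
# -> ~32.6B with these assumptions, consistent with the published 32B:
# only ACTIVE_EXPERTS / NUM_EXPERTS of the expert weights fire per token.
```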

System Requirements

Memory Requirements:

  • Full precision (FP16): ~2TB for the weights alone
  • Quantized (INT4): ~500GB minimum
  • Recommended minimum: 240GB unified memory for reasonable performance (this floor implies quantization more aggressive than plain INT4, or offloading part of the weights; see the worked estimate after this list)
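
Both headline figures fall out of parameter-count arithmetic. A quick sketch (weights only; KV cache, activations, and runtime overhead are ignored, so real deployments need headroom beyond these numbers):

```python
# Weight-memory estimate: parameters x bytes-per-parameter.
# Treat these as lower bounds for the weights alone.

TOTAL_PARAMS = 1.0e12  # 1T parameters

def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Raw weight footprint in GB for a given quantization level."""
    return params * bits_per_weight / 8 / 1e9

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: ~{weight_memory_gb(TOTAL_PARAMS, bits):,.0f} GB")

# FP16: ~2,000 GB (~2TB)   INT8: ~1,000 GB   INT4: ~500 GB
# This is why the ~500GB INT4 figure exceeds a single 256GB machine.
```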

Supported Inference Engines:

  • vLLM
  • SGLang
  • KTransformers
  • MLX (for Apple Silicon; see the sketch after this list)
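
On Apple Silicon the MLX path is the natural one. A minimal sketch using the standard mlx-lm API, assuming an MLX INT4 conversion is published on Hugging Face (the repo id below is a placeholder, not a confirmed name):

```python
# Minimal mlx-lm sketch for Apple Silicon (pip install mlx-lm).
# The repo id is a placeholder -- substitute the actual MLX INT4
# conversion of Kimi K2.5 from Hugging Face.

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Kimi-K2.5-4bit")  # hypothetical repo id

prompt = "Explain mixture-of-experts routing in two sentences."
text = generate(model, tokenizer, prompt=prompt, max_tokens=128)
print(text)
```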

Mac Studio M3 Ultra Performance Reality

Critical Limitation: A Mac Studio M3 Ultra with 256GB of unified memory is insufficient:

  • 256GB clears the 240GB recommended floor, but falls well short of the ~500GB needed to hold the full INT4 weights
  • Expected performance: ~21 tokens/second (very slow) even on a setup with enough memory (see the bandwidth estimate after this list)
  • Practical requirement: 2× Mac Studio M3 Ultra systems clustered together (512GB total)
  • Single 256GB M3 Ultra: cannot adequately run the full model due to the memory bottleneck
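
The ~21 tokens/second figure is roughly what a memory-bandwidth roofline predicts: each decoded token must stream every active weight from memory. A sketch assuming the M3 Ultra's advertised 819GB/s bandwidth and an illustrative efficiency factor (both the efficiency value and the accounting are assumptions, not measurements):

```python
# Roofline estimate for decode speed on a memory-bandwidth-bound MoE.
# Each token reads only the active parameters (32B) from unified memory.
# The efficiency factor is an illustrative assumption covering routing
# overhead, KV cache traffic, and non-ideal memory access.

BANDWIDTH_GBPS = 819      # M3 Ultra advertised memory bandwidth (GB/s)
ACTIVE_PARAMS = 32e9      # activated parameters per token
BYTES_PER_WEIGHT = 0.5    # INT4 quantization
EFFICIENCY = 0.4          # assumed fraction of peak bandwidth achieved

bytes_per_token = ACTIVE_PARAMS * BYTES_PER_WEIGHT
tokens_per_sec = BANDWIDTH_GBPS * 1e9 * EFFICIENCY / bytes_per_token

print(f"Estimated decode speed: ~{tokens_per_sec:.0f} tokens/s")
# -> ~20 tokens/s with these assumptions, in line with the ~21 figure.
```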

Practical Recommendation

For Mac Studio M3 Ultra users, the viable options are:

  1. Use the API - Access via Moonshot’s platform at $0.60/M input tokens; much more practical (see the client sketch after this list)
  2. Cluster multiple Macs - Requires 2+ Mac Studio systems linked by a high-bandwidth interconnect
  3. Use quantized versions - MLX-optimized INT4 quantizations are available, but still challenging on a single 256GB system
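
Moonshot’s platform exposes an OpenAI-compatible endpoint, so option 1 is a few lines with the standard openai client. The model identifier below is an assumption; check the platform documentation for the exact name:

```python
# Hedged sketch of option 1: calling Kimi via Moonshot's
# OpenAI-compatible API (pip install openai). The model name is an
# assumption -- verify the exact identifier in the platform docs.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["MOONSHOT_API_KEY"],
    base_url="https://api.moonshot.ai/v1",  # Moonshot's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="kimi-k2.5",  # assumed model id; check the docs
    messages=[{"role": "user", "content": "Summarize MoE inference trade-offs."}],
)
print(response.choices[0].message.content)
```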
