KTransformers

Welcome to KTransformers

KTransformers is a flexible, Python-centric framework for advanced LLM inference optimizations. Built with researchers and developers in mind, it lets you run large language models efficiently on consumer hardware.

Key Features

  • Heterogeneous Computing: Leverage CPU, GPU, and other accelerators together for optimal performance
  • MoE Offloading: Run massive Mixture-of-Experts models such as DeepSeek-R1-671B on a single GPU by keeping expert weights in CPU memory
  • Flexible Configuration: Fine-tune every aspect through YAML configuration files
  • Python-Centric: Easy to understand, modify, and extend
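
The YAML configuration mentioned above is rule-based: a rule matches modules in the model and swaps in an optimized implementation. The snippet below is only an illustrative sketch of that rule shape; the module pattern, class path, and device value are placeholders, not verified defaults:

```yaml
# Illustrative injection rule (placeholder pattern, class path, and device):
- match:
    name: "^model\\.layers\\..*\\.mlp\\.experts$"   # regex over module names
  replace:
    class: ktransformers.operators.experts.KTransformersExperts  # assumed class path
    kwargs:
      generate_device: "cpu"   # keep expert weights on CPU during generation
```

Each matched module is replaced at load time, so the same model checkpoint can be run under different placement strategies just by pointing at a different YAML file.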

Quick Start

Install KTransformers via pip:

pip install ktransformers

Basic usage:

from ktransformers import AutoModel

# Load the model; device_map="auto" lets the framework place layers
# across the available CPU and GPU devices.
model = AutoModel.from_pretrained(
    "deepseek-ai/DeepSeek-R1-671B",
    device_map="auto"
)

output = model.generate("Hello, world!")
print(output)
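
To build intuition for the MoE offloading idea above, here is a self-contained toy sketch in plain Python (no KTransformers APIs, and not its implementation): all expert "weights" stay in a host-side store, and for each token only the few experts picked by the router are actually used, so the bulk of the expert parameters never needs to occupy GPU memory.

```python
# Toy sketch of MoE expert offloading (illustrative only; all names and
# the "router" are made up for this example).

class OffloadedMoELayer:
    def __init__(self, num_experts, top_k):
        # Every expert's "weight" (here just a scale factor) lives on the host.
        self.host_experts = {i: float(i + 1) for i in range(num_experts)}
        self.top_k = top_k

    def route(self, token_value):
        # Toy router: rank experts by a deterministic score, ties by index.
        scores = {i: (token_value * (i + 1)) % 7 for i in self.host_experts}
        ranked = sorted(scores, key=lambda i: (-scores[i], i))
        return ranked[: self.top_k]

    def forward(self, token_value):
        # Only the selected experts are "fetched" and run; the rest of the
        # (potentially huge) expert set never leaves host memory.
        selected = self.route(token_value)
        return sum(self.host_experts[i] * token_value for i in selected)

layer = OffloadedMoELayer(num_experts=8, top_k=2)
result = layer.forward(3)
```

A real implementation additionally overlaps the host-to-device transfers with computation; the point here is only the routing-then-fetch structure.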

What's Next?