# Welcome to KTransformers
KTransformers is a flexible, Python-centric framework that brings advanced optimizations to LLM inference. Built with researchers and developers in mind, it lets you run large language models efficiently on consumer hardware.
## Key Features
- Heterogeneous Computing: Leverage CPU, GPU, and other accelerators together for optimal performance
- MoE Offloading: Run massive Mixture-of-Experts models like DeepSeek-R1-671B on a single GPU
- Flexible Configuration: Fine-tune every aspect through YAML configuration files
- Python-Centric: Easy to understand, modify, and extend
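To give a feel for how YAML-driven placement can work, here is a minimal sketch of a rule matcher: each rule maps a module-name pattern to a target device, and the first match wins. The rule patterns, device names, and `place` helper are illustrative assumptions, not KTransformers' actual configuration schema.

```python
import re

# Hypothetical placement rules in the spirit of a YAML config:
# the first pattern that matches a module name decides its device.
RULES = [
    (r"\.experts\.", "cpu"),     # assumption: offload MoE expert weights to CPU
    (r"\.self_attn\.", "cuda"),  # assumption: keep attention on the GPU
    (r".*", "cuda"),             # fallback: everything else on the GPU
]

def place(module_name: str, rules=RULES) -> str:
    """Return the target device for a module (illustrative sketch only)."""
    for pattern, device in rules:
        if re.search(pattern, module_name):
            return device
    return "cuda"
```

With these rules, `place("model.layers.3.mlp.experts.7.w1")` resolves to `"cpu"`, while `place("model.layers.3.self_attn.q_proj")` stays on `"cuda"`.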
## Quick Start
Install KTransformers via pip:
```bash
pip install ktransformers
```
Basic usage:
```python
from ktransformers import AutoModel

# Load the model with automatic device placement
model = AutoModel.from_pretrained(
    "deepseek-ai/DeepSeek-R1-671B",
    device_map="auto",
)

output = model.generate("Hello, world!")
```
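The reason MoE offloading is feasible at all is that a Mixture-of-Experts layer only activates a few experts per token, so most expert weights can live in slower (CPU) memory without touching every token. A toy sketch of top-k expert routing, in plain Python rather than KTransformers code:

```python
import heapq

def route_top_k(gate_scores, k=2):
    """Pick the k highest-scoring experts for one token (toy example)."""
    return heapq.nlargest(k, range(len(gate_scores)), key=gate_scores.__getitem__)

# 8 experts in the layer, but this token only touches 2 of them.
scores = [0.1, 0.7, 0.05, 0.9, 0.2, 0.3, 0.15, 0.4]
active = route_top_k(scores, k=2)  # → [3, 1]
```

Only the active experts need to run for this token, which is what lets a 671B-parameter MoE model fit its hot path on a single GPU while the remaining experts sit in host memory.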
## What's Next?
- Getting Started - Step-by-step introduction
- Installation - Detailed setup instructions
- Configuration - Learn how to optimize for your hardware
- API Reference - Complete API documentation