# Welcome to KTransformers
KTransformers is a flexible, Python-centric framework that brings advanced optimizations to LLM inference. Built with researchers and developers in mind, it lets you run large language models efficiently on consumer hardware.
## Key Features
- Heterogeneous Computing: Leverage CPU, GPU, and other accelerators together for optimal performance
- MoE Offloading: Run massive Mixture-of-Experts models like DeepSeek-R1-671B on a single GPU
- Flexible Configuration: Fine-tune every aspect through YAML configuration files
- Python-Centric: Easy to understand, modify, and extend
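To give a feel for how YAML-driven placement can work, here is a minimal sketch of a rule matcher: each rule maps a module-name pattern to a target device, and the first match wins. The rule patterns, device names, and `place` helper are illustrative assumptions, not KTransformers' actual configuration schema.

```python
import re

# Hypothetical placement rules in the spirit of a YAML config:
# the first pattern that matches a module name decides its device.
RULES = [
    (r"\.experts\.", "cpu"),     # assumption: offload MoE expert weights to CPU
    (r"\.self_attn\.", "cuda"),  # assumption: keep attention on the GPU
    (r".*", "cuda"),             # fallback: everything else on the GPU
]

def place(module_name: str, rules=RULES) -> str:
    """Return the target device for a module (illustrative sketch only)."""
    for pattern, device in rules:
        if re.search(pattern, module_name):
            return device
    return "cuda"
```

With these rules, `place("model.layers.3.mlp.experts.7.w1")` resolves to `"cpu"`, while `place("model.layers.3.self_attn.q_proj")` stays on `"cuda"`.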
## Quick Start
Install KTransformers via pip:
```bash
pip install ktransformers
```
Basic usage:
```python
from ktransformers import AutoModel

# Load the model with automatic device placement
model = AutoModel.from_pretrained(
    "deepseek-ai/DeepSeek-R1-671B",
    device_map="auto",
)

output = model.generate("Hello, world!")
```
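The reason MoE offloading is feasible at all is that a Mixture-of-Experts layer only activates a few experts per token, so most expert weights can live in slower (CPU) memory without touching every token. A toy sketch of top-k expert routing, in plain Python rather than KTransformers code:

```python
import heapq

def route_top_k(gate_scores, k=2):
    """Pick the k highest-scoring experts for one token (toy example)."""
    return heapq.nlargest(k, range(len(gate_scores)), key=gate_scores.__getitem__)

# 8 experts in the layer, but this token only touches 2 of them.
scores = [0.1, 0.7, 0.05, 0.9, 0.2, 0.3, 0.15, 0.4]
active = route_top_k(scores, k=2)  # → [3, 1]
```

Only the active experts need to run for this token, which is what lets a 671B-parameter MoE model fit its hot path on a single GPU while the remaining experts sit in host memory.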
## What's Next?
- Getting Started - Step-by-step introduction
- Installation - Detailed setup instructions
- Configuration - Learn how to optimize for your hardware
- API Reference - Complete API documentation