KTransformers - Flexible LLM Inference Framework

We're excited to launch the official KTransformers blog! Here's what we'll be sharing.

What to Expect

Product Updates: Stay informed about new releases, features, and improvements
Tutorials: Step-by-step guides to help you get the most out of KTransformers
Performance Tips: Best practices for optimizing inference on your hardware
Community Highlights: Showcasing amazing projects built with KTransformers

Getting Started

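If you're new to KTransformers, check out our documentation to get started. With our optimized MoE offloading, you can run DeepSeek-R1-671B on a single RTX 4090: the bulk of the expert weights stays in CPU memory rather than on the GPU, so plan for a large amount of system RAM alongside the card.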

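from ktransformers import AutoModel

# Load the model with KTransformers' optimized MoE offloading.
model = AutoModel.from_pretrained(
    "deepseek-ai/DeepSeek-R1-671B",
    device_map="auto",  # let the loader choose GPU/CPU placement
    ktransformers_config="./config.yaml",  # path to your KTransformers config file
)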
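
To see why offloading matters for a model this size, here is a rough back-of-the-envelope sketch of the weight memory involved. It is only an illustration: the 4-bit quantization level is an assumption rather than a documented KTransformers setting, the helper name `weight_gib` is made up for this example, and the estimate ignores the KV cache, activations, and quantization overhead.

```python
# Rough estimate of weight memory for a 671B-parameter model.
# Assumptions (not from KTransformers itself): ~4 bits per weight,
# and no accounting for KV cache, activations, or format overhead.


def weight_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


TOTAL_PARAMS = 671e9   # total parameter count, taken from the model name
GPU_VRAM_GIB = 24      # memory capacity of a single RTX 4090
BITS_PER_WEIGHT = 4    # assumed quantization level

total = weight_gib(TOTAL_PARAMS, BITS_PER_WEIGHT)
off_gpu = max(total - GPU_VRAM_GIB, 0.0)

print(f"Weights at {BITS_PER_WEIGHT}-bit: ~{total:.0f} GiB")
print(f"Fits entirely in {GPU_VRAM_GIB} GiB of VRAM: {total <= GPU_VRAM_GIB}")
print(f"Must live outside the GPU: at least ~{off_gpu:.0f} GiB")
```

Even under this optimistic assumption the weights are more than ten times larger than the GPU's memory, which is why most of them have to be held in system RAM.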

Join the Community

We'd love to hear from you! Join our community:

GitHub - Star us and contribute
Submit Benchmarks - Share your performance results

Stay tuned for more updates!