KTransformers

Getting Started

This guide will help you get up and running with KTransformers in just a few minutes.

Prerequisites

Before you begin, make sure you have:

  • Python 3.9 or higher
  • CUDA 11.8+ (for GPU acceleration)
  • At least 16GB RAM (256GB+ recommended for large models)
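You can sanity-check these prerequisites from Python before installing anything. This is just a sketch: the RAM check relies on the optional `psutil` package and the CUDA check on PyTorch, and both are skipped gracefully if the package is missing.

```python
# Quick environment check (sketch). psutil and torch are optional here;
# each check is skipped if its package is not installed.
import sys

# Python 3.9 or higher
assert sys.version_info >= (3, 9), "Python 3.9+ is required"
print(f"Python: {sys.version.split()[0]}")

# RAM (needs the optional psutil package)
try:
    import psutil
    total_gb = psutil.virtual_memory().total / (1024 ** 3)
    print(f"RAM: {total_gb:.1f} GB")
except ImportError:
    print("psutil not installed; skipping RAM check")

# CUDA availability (needs PyTorch)
try:
    import torch
    print(f"CUDA available: {torch.cuda.is_available()}")
except ImportError:
    print("PyTorch not installed; skipping CUDA check")
```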

Installation

pip install ktransformers
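After installing, you can confirm the package is importable without assuming anything about its version. A minimal check using only the standard library:

```python
# Sanity check (sketch): is the ktransformers package importable?
import importlib.util

spec = importlib.util.find_spec("ktransformers")
if spec is None:
    print("ktransformers is not installed")
else:
    print("ktransformers found at", spec.origin)
```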

Your First Model

Let's run a simple inference example:

from ktransformers import AutoModel

# Load a model
model = AutoModel.from_pretrained(
    "deepseek-ai/DeepSeek-R1-671B",
    device_map="auto",
    ktransformers_config="./config.yaml"
)

# Generate text
response = model.generate(
    "Explain quantum computing in simple terms",
    max_new_tokens=512
)

print(response)

Configuration

Create a config.yaml file to customize inference:

backend: torch
quantization: Q4_K_M
offload:
  enabled: true
  ratio: 0.8
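Since a malformed config can fail late (at model load time), it can help to parse and sanity-check the file first. The sketch below assumes the config is plain YAML and uses PyYAML to load it; the field names mirror the example above, and the range check on `ratio` assumes it is a fraction between 0 and 1 (the exact semantics are not specified here).

```python
# Sketch: parse the config and sanity-check it before passing its path
# to from_pretrained. Uses PyYAML (pip install pyyaml) as an assumption;
# any YAML loader would do. CONFIG inlines the example from above.
import yaml

CONFIG = """\
backend: torch
quantization: Q4_K_M
offload:
  enabled: true
  ratio: 0.8
"""

cfg = yaml.safe_load(CONFIG)

assert cfg["backend"] == "torch", "unexpected backend"
assert isinstance(cfg["offload"]["enabled"], bool)
# Assumed semantics: ratio is a fraction of the model to offload.
ratio = cfg["offload"]["ratio"]
assert 0.0 <= ratio <= 1.0, "offload ratio must be in [0, 1]"
print("config looks OK:", cfg)
```

Note that YAML parses `true` to a Python `bool` and `0.8` to a `float` automatically, so type checks like these catch quoting mistakes (e.g. `ratio: "0.8"`) early.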

Next Steps