LLM Compressor Docs
Qwen3
Quantization examples for the Qwen3-VL MoE vision-language model.
FP8 Example
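The example body did not survive extraction, so here is a minimal sketch of FP8 dynamic quantization using LLM Compressor's `oneshot` API with the `FP8_DYNAMIC` scheme (static per-channel FP8 weights, dynamic per-token FP8 activations, no calibration data required). The checkpoint ID, the `AutoModelForImageTextToText` loader, and the `ignore` patterns (keeping the LM head, vision tower, and MoE router gates unquantized) are assumptions for illustration, not the original example; check the module names of your specific Qwen3-VL MoE variant.

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Assumed checkpoint; substitute the Qwen3-VL MoE variant you want to compress.
MODEL_ID = "Qwen/Qwen3-VL-30B-A3B-Instruct"

model = AutoModelForImageTextToText.from_pretrained(MODEL_ID, torch_dtype="auto")
processor = AutoProcessor.from_pretrained(MODEL_ID)

# FP8_DYNAMIC needs no calibration dataset, so oneshot() can run data-free.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    # Assumed ignore list: keep the LM head, the vision tower, and the MoE
    # router gates in higher precision (patterns depend on module naming).
    ignore=["lm_head", "re:visual.*", "re:.*mlp.gate$"],
)

# Apply the quantization recipe to the model in place.
oneshot(model=model, recipe=recipe)

# Save in compressed-tensors format for deployment.
SAVE_DIR = MODEL_ID.split("/")[-1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR, save_compressed=True)
processor.save_pretrained(SAVE_DIR)
```

The saved checkpoint can then be served with vLLM, e.g. `vllm serve <SAVE_DIR>`.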