LLM Compressor Docs
Qwen3
Quantization examples for the Qwen3-VL MoE vision-language model.
FP8 Example
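The example body did not survive extraction, so here is a minimal sketch of FP8 dynamic quantization using LLM Compressor's `oneshot` API with the `FP8_DYNAMIC` scheme (static per-channel FP8 weights, dynamic per-token FP8 activations, no calibration data required). The checkpoint ID, the `AutoModelForImageTextToText` loader, and the `ignore` patterns (keeping the LM head, vision tower, and MoE router gates unquantized) are assumptions for illustration, not the original example; check the module names of your specific Qwen3-VL MoE variant.

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Assumed checkpoint; substitute the Qwen3-VL MoE variant you want to compress.
MODEL_ID = "Qwen/Qwen3-VL-30B-A3B-Instruct"

model = AutoModelForImageTextToText.from_pretrained(MODEL_ID, torch_dtype="auto")
processor = AutoProcessor.from_pretrained(MODEL_ID)

# FP8_DYNAMIC needs no calibration dataset, so oneshot() can run data-free.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    # Assumed ignore list: keep the LM head, the vision tower, and the MoE
    # router gates in higher precision (patterns depend on module naming).
    ignore=["lm_head", "re:visual.*", "re:.*mlp.gate$"],
)

# Apply the quantization recipe to the model in place.
oneshot(model=model, recipe=recipe)

# Save in compressed-tensors format for deployment.
SAVE_DIR = MODEL_ID.split("/")[-1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR, save_compressed=True)
processor.save_pretrained(SAVE_DIR)
```

The saved checkpoint can then be served with vLLM, e.g. `vllm serve <SAVE_DIR>`.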