Quant: disable second-stage compression!

#17
by zdxpan - opened
        new_layer = bnb.nn.Linear4bit(
            in_features,
            out_features,
            bias=has_bias,
            compute_dtype=bnb_4bit_compute_dtype,
            compress_statistics=False,  # controls double quantization of the quantization statistics
            quant_type=quant_type
        )
        # quantization happens here: load the original weights, then move the layer to the device
        new_layer.load_state_dict(child.state_dict())
        new_layer = new_layer.to(device)

See the comment on compress_statistics, which is set to False.
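To make the trade-off concrete, here is a rough sketch of how many bits per weight go to the quantization constants with and without second-stage compression. It assumes a block size of 64 for the weights and a block size of 256 for the second-stage 8-bit quantization of the constants, which are the values described in the QLoRA paper; the function name `absmax_overhead_bits` is made up for illustration and is not part of bitsandbytes.

```python
def absmax_overhead_bits(block_size=64, double_quant=False, second_block_size=256):
    """Bits per weight spent on storing quantization constants (absmax values).

    Assumed layout (taken from the QLoRA paper, not read from bitsandbytes):
    - without double quantization: one fp32 absmax per block of weights
    - with double quantization: one 8-bit absmax per block of weights, plus
      one fp32 constant per `second_block_size` of those 8-bit absmax values
    """
    if not double_quant:
        return 32 / block_size
    return 8 / block_size + 32 / (block_size * second_block_size)


plain = absmax_overhead_bits(double_quant=False)   # like compress_statistics=False
nested = absmax_overhead_bits(double_quant=True)   # like compress_statistics=True

print(f"overhead without double quantization: {plain:.3f} bits/weight")
print(f"overhead with double quantization:    {nested:.3f} bits/weight")
print(f"extra cost of disabling it:           {plain - nested:.3f} bits/weight")
```

Under these assumptions, turning off second-stage compression costs roughly 0.37 extra bits per weight, in exchange for skipping the second dequantization step.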

from transformers import BitsAndBytesConfig

double_quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
)
I tried it with diffusers and it works. How could I make it work with ComfyUI and Stable Diffusion "safetensors" files? Could you help me?
