
Meta-Llama-3.1-8B-Instruct-TurboMind-AWQ-4bit

Overview

This repository contains a 4-bit AWQ-quantized version of Meta-Llama-3.1-8B-Instruct, packaged for the LMDeploy TurboMindEngine. The 4-bit weight quantization reduces memory footprint and speeds up inference while aiming to preserve the accuracy of the base model.

Model Details

  • Model Name: Meta-Llama-3.1-8B-Instruct-TurboMind-AWQ-4bit
  • Base Model: meta-llama/Meta-Llama-3.1-8B-Instruct
  • Quantization: 4-bit AWQ
  • Engine: LMDeploy TurboMindEngine

Quantization Command

The 4-bit AWQ weights were generated with LMDeploy's auto_awq tool, using the following settings:

lmdeploy lite auto_awq \
  $HF_MODEL \
  --calib-dataset 'ptb' \
  --calib-samples 128 \
  --calib-seqlen 2048 \
  --w-bits 4 \
  --w-group-size 128 \
  --batch-size 10 \
  --search-scale True \
  --work-dir $WORK_DIR
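
Usage

To run the quantized weights with the TurboMind engine, LMDeploy's Python pipeline API can be used. The snippet below is a minimal sketch: it assumes lmdeploy is installed (pip install lmdeploy) and loads the model directly from this repository; values such as session_len and the generation parameters are illustrative, not recommendations from the model authors.

# Minimal sketch: load the 4-bit AWQ weights with the TurboMind backend
# and run a short completion. Parameter values are illustrative.
from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig

backend_config = TurbomindEngineConfig(
    model_format='awq',  # weights in this repo are AWQ-quantized
    session_len=8192,    # illustrative context length; adjust as needed
    tp=1,                # tensor-parallel degree (single GPU)
)

pipe = pipeline(
    'Aaron2599/Meta-Llama-3.1-8B-Instruct-TurboMind-AWQ-4bit',
    backend_config=backend_config,
)

gen_config = GenerationConfig(max_new_tokens=256, temperature=0.7)
responses = pipe(
    ['Explain AWQ quantization in one paragraph.'],
    gen_config=gen_config,
)
print(responses[0].text)

The same weights can also be served behind LMDeploy's OpenAI-compatible HTTP server (lmdeploy serve api_server); refer to the LMDeploy documentation for the exact serving flags.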
