Edge LLMs Challenge: Compression

Task Description

Develop compression methods for pre-trained LLMs to run on a memory-constrained device.
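Since training, quantization, and distillation are off the table, pruning is one of the few classic training-free compression routes. As a purely illustrative sketch (not the competition's prescribed method), here is global magnitude pruning in plain Python; a real submission would apply the same idea to phi-2's FP16 weight tensors:

```python
# Illustrative sketch of global magnitude pruning: zero out the
# fraction `sparsity` of weights with the smallest absolute value.
# A real implementation would operate on model weight tensors.

def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude
    `sparsity` fraction set to zero."""
    flat = sorted(abs(w) for w in weights)
    k = int(len(flat) * sparsity)
    if k == 0:
        return list(weights)
    threshold = flat[k - 1]
    # Keep a weight only if its magnitude exceeds the threshold.
    return [w if abs(w) > threshold else 0.0 for w in weights]

pruned = magnitude_prune([0.5, -0.1, 0.05, -2.0, 0.3], 0.4)
print(pruned)  # the two smallest-magnitude weights are zeroed
```

Zeroed weights reduce effective model size only when paired with a sparse storage format or structured removal of whole rows/heads; that engineering is left to the participant.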

Participation Requirements

  • The model must run on a device with 12 GB DRAM
  • The model must be submitted in FP16 or FP32 format
  • You may perform compression only; training is not allowed
  • You may not quantize the model
  • You may not distill the model
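A quick back-of-envelope check against the 12 GB DRAM budget: phi-2 has roughly 2.7 billion parameters (per its model card; treat the exact count as an assumption here), so raw weight storage alone is about 5.4 GB in FP16 and 10.8 GB in FP32, before activations and KV cache:

```python
# Back-of-envelope memory estimate for raw weight storage only
# (ignores activations, KV cache, and runtime overhead).

def model_size_gb(num_params, bytes_per_param):
    """Gigabytes needed to store `num_params` weights."""
    return num_params * bytes_per_param / 1e9

PHI2_PARAMS = 2.7e9          # approximate parameter count of phi-2
fp16_gb = model_size_gb(PHI2_PARAMS, 2)  # FP16 = 2 bytes/param
fp32_gb = model_size_gb(PHI2_PARAMS, 4)  # FP32 = 4 bytes/param
print(f"FP16: {fp16_gb:.1f} GB, FP32: {fp32_gb:.1f} GB")
```

This suggests an FP32 submission leaves very little headroom on a 12 GB device, which is part of what the compression is for.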

Base Model

microsoft/phi-2

Evaluation Process

You may run the following command to evaluate your model:

lm_eval --model hf \
        --model_args pretrained="<path_to_your_model>" \
        --tasks mmlu \
        --device cuda:0 \
        --batch_size 8
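If you want to run the same evaluation from a Python script (e.g. to sweep several compressed checkpoints within the time limit), the command above can be assembled and launched via `subprocess`. This is a convenience sketch, not part of the official evaluation; the model path is a placeholder:

```python
# Sketch: build the official lm_eval invocation programmatically
# so multiple compressed checkpoints can be evaluated in a loop.
import subprocess

def build_eval_cmd(model_path, task="mmlu", device="cuda:0", batch_size=8):
    """Return the lm_eval command as an argument list."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_path}",
        "--tasks", task,
        "--device", device,
        "--batch_size", str(batch_size),
    ]

cmd = build_eval_cmd("/path/to/compressed-phi-2")  # placeholder path
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually launch
```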

Hardware Constraints

  • One A100 40GB GPU

Time Constraints

  • 24 Hour Time Limit

Additional Resources

Starter code: https://github.com/TianjinYellow/EdgeDeviceLLMCompetition-Starting-Kit?tab=readme-ov-file#submission-requirements

Hugging Face Transformers