Edge LLMs Challenge: Compression
Task Description
Develop compression methods for pre-trained LLMs to run on a memory-constrained device.
Participation Requirements
- The model must run on a device with 12 GB DRAM
- The model must be submitted in FP16 or FP32 format
- You may perform compression only; no training is allowed
- You may not quantize the model
- You may not distill the model
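Within these rules, training-free weight sparsification is one permitted direction. As an illustrative sketch (not an endorsed baseline), magnitude pruning zeroes the entries with the smallest absolute values; the helper below is hypothetical and operates on a plain list-of-lists weight matrix:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude entries of a 2-D weight matrix.

    weights:  list of rows (lists of floats)
    sparsity: fraction of entries to zero, e.g. 0.5 zeroes half of them
    """
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)                      # number of entries to drop
    threshold = flat[k - 1] if k > 0 else float("-inf")
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]

# Example: prune half the entries of a 2x2 matrix
pruned = magnitude_prune([[0.1, -2.0], [0.3, 4.0]], sparsity=0.5)
# → [[0.0, -2.0], [0.0, 4.0]]
```

Note that this only masks weights to zero; on its own it saves memory only if the model is then stored in a sparse format.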
Base Model
microsoft/phi-2
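For budgeting purposes: phi-2 is commonly cited at about 2.7 billion parameters, so its raw FP16 weights alone occupy roughly 5.4 GB, under the 12 GB DRAM limit before activations and KV cache are counted. A quick back-of-the-envelope check (the parameter count here is an assumption, not read from the model config):

```python
def fp16_weight_gb(num_params, bytes_per_param=2):
    """Approximate weight memory in GB (1 GB = 1e9 bytes) for FP16 storage."""
    return num_params * bytes_per_param / 1e9

phi2_params = 2.7e9                    # assumed parameter count for microsoft/phi-2
print(fp16_weight_gb(phi2_params))     # → 5.4
```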
Evaluation Process
You may run the following command to evaluate your model:
lm_eval --model hf \
    --model_args pretrained="<path_to_your_model>" \
    --tasks mmlu \
    --device cuda:0 \
    --batch_size 8
Hardware Constraints
- One A100 40GB GPU
Time Constraints
- 24-hour time limit
Additional Resources
Starter code: https://github.com/TianjinYellow/EdgeDeviceLLMCompetition-Starting-Kit?tab=readme-ov-file#submission-requirements
Recommended Libraries
Huggingface Transformers
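With Transformers (and the PyTorch it sits on), one training-free route is structured pruning, which removes whole neurons so tensor shapes actually shrink. The snippet below is an illustrative sketch on a standalone torch.nn.Linear, not competition-ready code; applying it to phi-2's layers would additionally require shrinking the input dimension of each consuming layer to match:

```python
import torch
import torch.nn as nn

def prune_out_features(linear: nn.Linear, keep_ratio: float) -> nn.Linear:
    """Keep the output neurons with the largest L2 weight norms.

    Returns a new, smaller nn.Linear; the layer that consumes this
    output must be shrunk to match (omitted here for brevity).
    """
    n_keep = max(1, int(linear.out_features * keep_ratio))
    norms = linear.weight.norm(dim=1)                  # one L2 norm per output neuron
    keep = torch.topk(norms, n_keep).indices.sort().values
    pruned = nn.Linear(linear.in_features, n_keep, bias=linear.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(linear.weight[keep])
        if linear.bias is not None:
            pruned.bias.copy_(linear.bias[keep])
    return pruned

# Example: shrink a 16 -> 8 projection to 16 -> 4
layer = nn.Linear(16, 8)
small = prune_out_features(layer, keep_ratio=0.5)
```

Because the pruned layer is a plain nn.Linear, the resulting model can still be saved with save_pretrained and evaluated with the lm_eval command above.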