Edge LLMs Challenge: Compression
Task Description
Develop compression methods for pre-trained LLMs to run on a memory-constrained device.
Participation Requirements
- The model must run on a device with 12 GB DRAM
- The model must be submitted in FP16 or FP32 format
- You may perform compression only; no training is allowed
- You may not quantize the model
- You may not distill the model
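Within these rules, training-free weight sparsification is one permitted direction. As an illustrative sketch (not an endorsed baseline), magnitude pruning zeroes the entries with the smallest absolute values; the helper below is hypothetical and operates on a plain list-of-lists weight matrix:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude entries of a 2-D weight matrix.

    weights:  list of rows (lists of floats)
    sparsity: fraction of entries to zero, e.g. 0.5 zeroes half of them
    """
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)                      # number of entries to drop
    threshold = flat[k - 1] if k > 0 else float("-inf")
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]

# Example: prune half the entries of a 2x2 matrix
pruned = magnitude_prune([[0.1, -2.0], [0.3, 4.0]], sparsity=0.5)
# → [[0.0, -2.0], [0.0, 4.0]]
```

Note that this only masks weights to zero; on its own it saves memory only if the model is then stored in a sparse format.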
Base Model
microsoft/phi-2
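For budgeting purposes: phi-2 is commonly cited at about 2.7 billion parameters, so its raw FP16 weights alone occupy roughly 5.4 GB, under the 12 GB DRAM limit before activations and KV cache are counted. A quick back-of-the-envelope check (the parameter count here is an assumption, not read from the model config):

```python
def fp16_weight_gb(num_params, bytes_per_param=2):
    """Approximate weight memory in GB (1 GB = 1e9 bytes) for FP16 storage."""
    return num_params * bytes_per_param / 1e9

phi2_params = 2.7e9                    # assumed parameter count for microsoft/phi-2
print(fp16_weight_gb(phi2_params))     # → 5.4
```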
Evaluation Process
You may run the following command to evaluate your model:
lm_eval --model hf \
    --model_args pretrained="<path_to_your_model>" \
    --tasks mmlu \
    --device cuda:0 \
    --batch_size 8
Hardware Constraints
- One A100 40GB GPU
Time Constraints
- 24-hour time limit
Additional Resources
Starter code: https://github.com/TianjinYellow/EdgeDeviceLLMCompetition-Starting-Kit?tab=readme-ov-file#submission-requirements
Recommended Libraries
Huggingface Transformers
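With Transformers (and the PyTorch it sits on), one training-free route is structured pruning, which removes whole neurons so tensor shapes actually shrink. The snippet below is an illustrative sketch on a standalone torch.nn.Linear, not competition-ready code; applying it to phi-2's layers would additionally require shrinking the input dimension of each consuming layer to match:

```python
import torch
import torch.nn as nn

def prune_out_features(linear: nn.Linear, keep_ratio: float) -> nn.Linear:
    """Keep the output neurons with the largest L2 weight norms.

    Returns a new, smaller nn.Linear; the layer that consumes this
    output must be shrunk to match (omitted here for brevity).
    """
    n_keep = max(1, int(linear.out_features * keep_ratio))
    norms = linear.weight.norm(dim=1)                  # one L2 norm per output neuron
    keep = torch.topk(norms, n_keep).indices.sort().values
    pruned = nn.Linear(linear.in_features, n_keep, bias=linear.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(linear.weight[keep])
        if linear.bias is not None:
            pruned.bias.copy_(linear.bias[keep])
    return pruned

# Example: shrink a 16 -> 8 projection to 16 -> 4
layer = nn.Linear(16, 8)
small = prune_out_features(layer, keep_ratio=0.5)
```

Because the pruned layer is a plain nn.Linear, the resulting model can still be saved with save_pretrained and evaluated with the lm_eval command above.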