LLM Efficiency Challenge
Task Description
Starting from an approved base model, fine-tune a model on a single A100 40GB GPU using only open-source data, so that it performs as well as possible across a wide array of metrics.
Participation Requirements
- Use only open-source datasets; proprietary or otherwise closed data is not allowed
- The training run must complete within 24 hours
- You may not train on the MMLU benchmark directly; it is reserved for evaluation
- Training must be done on a single GPU
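Because the 24-hour limit applies to the whole training run, it can help to enforce a wall-clock budget inside the training loop rather than relying on an external kill. A minimal sketch, assuming a generic `step_fn` callable for one optimizer step (the function name and budget handling are illustrative, not part of the challenge rules):

```python
import time

def train_with_time_budget(step_fn, max_seconds, max_steps=None):
    """Run training steps until the wall-clock budget is exhausted.

    step_fn: callable performing one optimizer step (illustrative).
    max_seconds: wall-clock budget; in practice somewhat under
                 24 * 3600 to leave time to save the final checkpoint.
    max_steps: optional hard cap on the number of steps.
    Returns the number of steps completed.
    """
    start = time.monotonic()
    steps = 0
    while time.monotonic() - start < max_seconds:
        if max_steps is not None and steps >= max_steps:
            break
        step_fn()
        steps += 1
    return steps
```

Checking the budget before each step (rather than after) means a run never starts a step it has no time budget left for.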
Datasets
You are welcome to use any open-source dataset. Examples include:
- Databricks-Dolly-15k
- OpenAssistant Conversations Dataset (oasst1)
- The Flan Collection
- AllenAI Dolma
- RedPajama-Data-1T
- LIMA
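Several of the listed instruction datasets share an instruction/context/response record shape, which must be flattened into a single string before supervised fine-tuning. A minimal sketch, assuming Dolly-style field names (`instruction`, `context`, `response`); the prompt template itself is an illustrative choice, not mandated by the challenge:

```python
def format_dolly_record(record):
    """Turn a Dolly-style dict into one supervised training string.

    Expected keys: 'instruction', 'context' (may be empty), 'response'.
    Sections with no content are skipped.
    """
    parts = [f"### Instruction:\n{record['instruction']}"]
    if record.get("context"):
        parts.append(f"### Context:\n{record['context']}")
    parts.append(f"### Response:\n{record['response']}")
    return "\n\n".join(parts)
```

Whatever template you pick, use the same one at inference time so the model sees prompts in the distribution it was trained on.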
Approved Base Models
- ALBERT
- BART
- BERT
- Bloom
- Cerebras (btlm, GPT)
- Colossal-LLaMA-2-7b-base
- DeBERTa
- DeciLM-6B
- DistilBERT
- Electra
- Falcon
- GPT2
- GPT Neo, J, NeoX, Pythia
- InternLM
- LLaMA or Llama 2
- Mistral
- MPT
- OpenLLaMA
- OPT
- Persimmon
- Qwen
- Red Pajama Base (not the instruction-tuned models)
- RoBERTa
- T5
- UL2
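Many of these base models are too large for full fine-tuning in 40GB, so parameter-efficient methods such as LoRA are a common choice. LoRA adds r·(d_in + d_out) trainable parameters per adapted weight matrix, which is a tiny fraction of the full model. A quick sketch of that arithmetic (the 4096×4096 dimensions below are illustrative, roughly matching a 7B-class attention projection):

```python
def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight:
    a rank x d_in matrix A plus a d_out x rank matrix B."""
    return rank * d_in + d_out * rank

# A 4096x4096 projection with rank 16 adds
# 16*4096 + 4096*16 = 131072 trainable parameters,
# versus 4096*4096 = 16.7M for the frozen weight itself.
```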
Evaluation Process
Evaluation will be performed on a subset of the MMLU benchmark.
You can evaluate your model locally with the following command:
python -m lm_eval --model hf \
--model_args pretrained=<path_to_your_model> \
--tasks mmlu \
--device cuda:0 \
--batch_size 8
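MMLU is scored as multiple-choice accuracy: the fraction of questions where the model's selected answer choice matches the gold label. lm-eval computes this for you; the helper below is only an illustrative sketch of the metric itself:

```python
def mmlu_accuracy(predictions, references):
    """Multiple-choice accuracy: share of exact matches between
    predicted and gold answer choices (e.g. 'A'..'D')."""
    if len(predictions) != len(references):
        raise ValueError("prediction/reference length mismatch")
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)
```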
Hardware Constraints
- One A100 40GB GPU
- 128GB of RAM
- 500GB of Disk
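To judge whether a model fits in the 40GB budget, a rough rule of thumb is 2 bytes per parameter for fp16/bf16 weights, before counting gradients, optimizer states, and activations. A quick sketch of the weights-only estimate (the 7B parameter count is an illustrative example):

```python
def fp16_weight_gib(num_params):
    """Approximate GPU memory for fp16/bf16 weights alone:
    2 bytes per parameter (excludes gradients, optimizer states,
    activations, and the KV cache)."""
    return num_params * 2 / 1024**3

# A 7B-parameter model needs about 13 GiB just for its weights.
# Full fine-tuning with Adam roughly quadruples that (gradients
# plus two optimizer moments), which is why LoRA or quantization
# is usually needed to fit training in 40 GB.
```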
Time Constraints
- 24 Hour Time Limit
Additional Resources
Starter code: https://github.com/llm-efficiency-challenge/neurips_llm_efficiency_challenge