LLM-Merging Competition
Task Description
Create a generalist model by merging expert models to perform as well as possible on the MMLU benchmark.
Participation Requirements
- Use publicly available models with up to 8 billion parameters
- You must use the provided example code to merge the models
- Place your code in the {your working directory number}/llm_merging/merging/ folder
- Use LlamaAvg.py or FlanT5Avg.py as example code (a simple parameter-averaging sketch follows this list)
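The sketch below illustrates the general idea behind those examples: load two or more checkpoints that share an architecture, average their parameters element-wise, and save the result. It is a minimal illustration, not the starter code's actual interface; the model names are placeholders, and a real submission should follow the class structure in LlamaAvg.py / FlanT5Avg.py.

# Minimal parameter-averaging sketch (illustrative only; adapt to the
# interface defined in the starter repo's LlamaAvg.py / FlanT5Avg.py).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def average_models(model_names, output_dir):
    # Use the first checkpoint as the container for the averaged weights.
    base = AutoModelForCausalLM.from_pretrained(model_names[0], torch_dtype=torch.float16)
    state = {k: v.float().clone() for k, v in base.state_dict().items()}

    # Accumulate the parameters of the remaining checkpoints.
    for name in model_names[1:]:
        other = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16)
        for k, v in other.state_dict().items():
            state[k] += v.float()
        del other

    # Element-wise mean over all checkpoints.
    for k in state:
        state[k] /= len(model_names)

    base.load_state_dict({k: v.half() for k, v in state.items()})
    base.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(model_names[0]).save_pretrained(output_dir)

# Hypothetical usage with two Llama-2-7B-architecture checkpoints:
# average_models(["meta-llama/Llama-2-7b-hf", "lmsys/vicuna-7b-v1.5"], "merged_model")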
Datasets
Validation datasets provided:
- load_dataset('AlgorithmicResearchGroup/llm_merging', 'xsum')
- load_dataset('AlgorithmicResearchGroup/llm_merging', 'cosmosqa')
Dataset structure:
cosmosqa:
DatasetDict({
    train: Dataset({
        features: ['input', 'target', 'answer_choices', 'label'],
        num_rows: 500
    })
})
xsum:
DatasetDict({
    train: Dataset({
        features: ['input', 'target'],
        num_rows: 200
    })
})
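For reference, the validation splits can be loaded and inspected as shown below. The interpretation of cosmosqa's label field as the index of the correct answer choice is an assumption; confirm it against the starter code's evaluation logic.

from datasets import load_dataset

xsum = load_dataset("AlgorithmicResearchGroup/llm_merging", "xsum")
cosmosqa = load_dataset("AlgorithmicResearchGroup/llm_merging", "cosmosqa")

example = cosmosqa["train"][0]
print(example["input"])            # prompt text
print(example["answer_choices"])   # candidate answers
print(example["label"])            # assumed to be the index of the correct choice
print(xsum["train"][0]["target"])  # reference summary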
Approved Models
Any publicly available model weights that can be downloaded and that meet the following conditions:
- Available on Hugging Face
- Uploaded before May 31st, 2024
- Parameter count not larger than 8 billion
- Recommended models include:
  - Llama 2 Family (7B versions)
  - Llama 3 Family (8B versions)
  - Mistral Family (7B versions)
  - FLAN T5 Family
  - Gemma Family (7B versions)
- Various fine-tuned models and adapters are also allowed
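Before committing to a merge, it can help to confirm that a candidate checkpoint stays within the 8-billion-parameter limit. A quick check, using an illustrative model name, might look like this:

from transformers import AutoModelForCausalLM

# Illustrative choice; substitute the checkpoint you intend to merge.
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B parameters")  # must not exceed 8B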
Evaluation Process
You may run the following command to evaluate your model:
lm_eval --model hf \
--model_args pretrained="<path_to_your_model>" \
--tasks mmlu \
--device cuda:0 \
--batch_size 8
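For lm_eval to find your merged model, save it and its tokenizer to a local directory first and pass that directory as pretrained. A minimal sketch, assuming merged_model and tokenizer objects from your merging step and a hypothetical output path:

# Persist the merged model so the evaluation harness can load it by path.
merged_model.save_pretrained("outputs/merged_model")
tokenizer.save_pretrained("outputs/merged_model")
# Then: lm_eval --model hf --model_args pretrained="outputs/merged_model" ...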
Hardware Constraints
- One A100 40GB GPU
Time Constraints
- 24 Hour Time Limit
- Merging/fine-tuning and evaluation must take less than 1 hour
Additional Requirements
- You may not train on MMLU directly (it's for evaluation only)
- You may not use any data that is not open-source
Additional Resources
Starter code: https://github.com/llm-merging/LLM-Merging
Recommended Libraries
- Hugging Face Transformers
- lm-evaluation-harness (provides the lm_eval command used above)