
LLM-Merging Competition

Task Description

Create a generalist model by merging expert models to perform as well as possible on the MMLU benchmark.

Participation Requirements

  • Use publicly available models with no more than 8 billion parameters
  • You must use the provided example code to merge the models
  • Place your code in the {your working directory number}/llm_merging/merging/ folder
  • Use LlamaAvg.py or FlanT5Avg.py as example code (a sketch of a comparable merging class follows this list)
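
The merging interface itself is defined in the starter repository, so the class below is only a hedged sketch in the spirit of LlamaAvg.py: it uniformly averages the weights of two Llama-style checkpoints loaded from Hugging Face. The class name, method name, and model names are illustrative assumptions; follow the actual signatures in the provided example code.

# Hypothetical sketch of a merging class for the llm_merging/merging/ folder.
# Class, method, and model names are assumptions; mirror LlamaAvg.py in the starter code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

class MyAvg:
    def __init__(self, name):
        self.name = name
        # Models must be public, hosted on Hugging Face, and within the parameter limit.
        self.model_names = [
            "meta-llama/Llama-2-7b-hf",       # example base model
            "meta-llama/Llama-2-7b-chat-hf",  # example fine-tuned variant
        ]

    def merge(self):
        # Load each checkpoint on CPU in float16 to keep memory use manageable.
        models = [
            AutoModelForCausalLM.from_pretrained(n, torch_dtype=torch.float16)
            for n in self.model_names
        ]
        # Uniformly average parameters that share the same name and shape.
        merged_state = models[0].state_dict()
        other_states = [m.state_dict() for m in models[1:]]
        for key, value in merged_state.items():
            stacked = torch.stack(
                [value.float()] + [s[key].float() for s in other_states]
            )
            merged_state[key] = stacked.mean(dim=0).to(value.dtype)
        merged = models[0]
        merged.load_state_dict(merged_state)
        return merged, AutoTokenizer.from_pretrained(self.model_names[0])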

Datasets

Validation datasets provided:

    load_dataset('AlgorithmicResearchGroup/llm_merging', 'xsum')
    load_dataset('AlgorithmicResearchGroup/llm_merging', 'cosmosqa')

Dataset structure:

cosmosqa: 
    DatasetDict({
        train: Dataset({
            features: ['input', 'target', 'answer_choices', 'label'],
            num_rows: 500
        })
    })
xsum:
    DatasetDict({
        train: Dataset({
            features: ['input', 'target'],
            num_rows: 200
        })
    })
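
For orientation, the snippet below loads both validation configurations with the Hugging Face datasets library and prints one cosmosqa example; the comments describe the apparent role of each field listed above.

# Load the validation datasets and inspect one cosmosqa example.
from datasets import load_dataset

xsum = load_dataset("AlgorithmicResearchGroup/llm_merging", "xsum")
cosmosqa = load_dataset("AlgorithmicResearchGroup/llm_merging", "cosmosqa")

example = cosmosqa["train"][0]
print(example["input"])           # prompt text
print(example["answer_choices"])  # candidate answers
print(example["target"])          # reference answer
print(example["label"])           # label accompanying the reference answer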

Approved Models

Any publicly available model weights that can be downloaded and that meet the following conditions:

  • Available on Hugging Face
  • Uploaded before May 31st, 2024
  • Parameter size not larger than 8 billion

Recommended models include:

  • Llama 2 Family (7B versions)
  • Llama 3 Family (8B versions)
  • Mistral Family (7B versions)
  • FLAN T5 Family
  • Gemma Family (7B versions)

Various fine-tuned models and adapters are also allowed.
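
A quick way to confirm that a candidate checkpoint stays under the 8-billion-parameter limit is to instantiate it without allocating real weights and count parameters. The sketch below assumes the transformers and accelerate libraries are installed; the model name is only an example.

# Count parameters without downloading or allocating the full weights.
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

def parameter_count(model_name: str) -> int:
    config = AutoConfig.from_pretrained(model_name)
    with init_empty_weights():
        model = AutoModelForCausalLM.from_config(config)
    return sum(p.numel() for p in model.parameters())

n_params = parameter_count("mistralai/Mistral-7B-v0.1")  # example model
print(f"{n_params} parameters; limit is 8e9")
assert n_params <= 8_000_000_000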

Evaluation Process

You may run the following command to evaluate your model:

lm_eval --model hf \
        --model_args pretrained="<path_to_your_model>" \
        --tasks mmlu  \
        --device cuda:0 \
        --batch_size 8
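
If you prefer to run the evaluation from Python, recent releases of lm-evaluation-harness expose a simple_evaluate entry point; the snippet below is a sketch under that assumption and keeps the same placeholder model path as the command above.

# Programmatic MMLU evaluation; assumes lm-evaluation-harness >= 0.4.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=<path_to_your_model>",
    tasks=["mmlu"],
    batch_size=8,
    device="cuda:0",
)
print(results["results"])  # per-task metrics for the MMLU subtasks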

Hardware Constraints

  • One A100 40GB GPU

Time Constraints

  • 24 Hour Time Limit
  • Merging/fine-tuning and evaluation must take less than 1 hour

Additional Requirements

  • You may not train on MMLU directly (it's for evaluation only)
  • You may not use any data that is not open-source

Additional Resources

  • Starter code: https://github.com/llm-merging/LLM-Merging
  • Hugging Face Transformers