
LLM-Merging Competition

Task Description

Create a generalist model by merging expert models to perform as well as possible on the MMLU benchmark.

Participation Requirements

  • Use publicly available models with no more than 8 billion parameters
  • You must use the provided example code to merge the models
  • Place your code in the {your working directory number}/llm_merging/merging/ folder
  • Use LlamaAvg.py or FlanT5Avg.py as example code (a sketch of a comparable merging class follows this list)
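
The merging interface itself is defined in the starter repository, so the class below is only a hedged sketch in the spirit of LlamaAvg.py: it uniformly averages the weights of two Llama-style checkpoints loaded from Hugging Face. The class name, method name, and model names are illustrative assumptions; follow the actual signatures in the provided example code.

# Hypothetical sketch of a merging class for the llm_merging/merging/ folder.
# Class, method, and model names are assumptions; mirror LlamaAvg.py in the starter code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

class MyAvg:
    def __init__(self, name):
        self.name = name
        # Models must be public, hosted on Hugging Face, and within the parameter limit.
        self.model_names = [
            "meta-llama/Llama-2-7b-hf",       # example base model
            "meta-llama/Llama-2-7b-chat-hf",  # example fine-tuned variant
        ]

    def merge(self):
        # Load each checkpoint on CPU in float16 to keep memory use manageable.
        models = [
            AutoModelForCausalLM.from_pretrained(n, torch_dtype=torch.float16)
            for n in self.model_names
        ]
        # Uniformly average parameters that share the same name and shape.
        merged_state = models[0].state_dict()
        other_states = [m.state_dict() for m in models[1:]]
        for key, value in merged_state.items():
            stacked = torch.stack(
                [value.float()] + [s[key].float() for s in other_states]
            )
            merged_state[key] = stacked.mean(dim=0).to(value.dtype)
        merged = models[0]
        merged.load_state_dict(merged_state)
        return merged, AutoTokenizer.from_pretrained(self.model_names[0])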

Datasets

Validation datasets provided:

    load_dataset('AlgorithmicResearchGroup/llm_merging', 'xsum')
    load_dataset('AlgorithmicResearchGroup/llm_merging', 'cosmosqa')

Dataset structure:

cosmosqa: 
    DatasetDict({
        train: Dataset({
            features: ['input', 'target', 'answer_choices', 'label'],
            num_rows: 500
        })
    })
xsum:
    DatasetDict({
        train: Dataset({
            features: ['input', 'target'],
            num_rows: 200
        })
    })
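
For orientation, the snippet below loads both validation configurations with the Hugging Face datasets library and prints one cosmosqa example; the comments describe the apparent role of each field listed above.

# Load the validation datasets and inspect one cosmosqa example.
from datasets import load_dataset

xsum = load_dataset("AlgorithmicResearchGroup/llm_merging", "xsum")
cosmosqa = load_dataset("AlgorithmicResearchGroup/llm_merging", "cosmosqa")

example = cosmosqa["train"][0]
print(example["input"])           # prompt text
print(example["answer_choices"])  # candidate answers
print(example["target"])          # reference answer
print(example["label"])           # label accompanying the reference answer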

Approved Models

Any publicly available model weights that can be downloaded and that meet the following conditions:

  • Available on Hugging Face
  • Uploaded before May 31st, 2024
  • Parameter size not larger than 8 billion

Recommended models include:

  • Llama 2 Family (7B versions)
  • Llama 3 Family (8B versions)
  • Mistral Family (7B versions)
  • FLAN T5 Family
  • Gemma Family (7B versions)

Various fine-tuned models and adapters are also allowed.
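
A quick way to confirm that a candidate checkpoint stays under the 8-billion-parameter limit is to instantiate it without allocating real weights and count parameters. The sketch below assumes the transformers and accelerate libraries are installed; the model name is only an example.

# Count parameters without downloading or allocating the full weights.
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

def parameter_count(model_name: str) -> int:
    config = AutoConfig.from_pretrained(model_name)
    with init_empty_weights():
        model = AutoModelForCausalLM.from_config(config)
    return sum(p.numel() for p in model.parameters())

n_params = parameter_count("mistralai/Mistral-7B-v0.1")  # example model
print(f"{n_params} parameters; limit is 8e9")
assert n_params <= 8_000_000_000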

Evaluation Process

You may run the following command to evaluate your model:

lm_eval --model hf \
        --model_args pretrained="<path_to_your_model>" \
        --tasks mmlu  \
        --device cuda:0 \
        --batch_size 8
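
If you prefer to run the evaluation from Python, recent releases of lm-evaluation-harness expose a simple_evaluate entry point; the snippet below is a sketch under that assumption and keeps the same placeholder model path as the command above.

# Programmatic MMLU evaluation; assumes lm-evaluation-harness >= 0.4.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=<path_to_your_model>",
    tasks=["mmlu"],
    batch_size=8,
    device="cuda:0",
)
print(results["results"])  # per-task metrics for the MMLU subtasks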

Hardware Constraints

  • One A100 40GB GPU

Time Constraints

  • 24 Hour Time Limit
  • Merging/fine-tuning and evaluation must take less than 1 hour

Additional Requirements

  • You may not train on MMLU directly (it's for evaluation only)
  • You may not use any data that is not open-source

Additional Resources

  • Starter code: https://github.com/llm-merging/LLM-Merging
  • Hugging Face Transformers