BabyLM Challenge (Strict-Small)
Task Description
Train a language model on a pretraining corpus of approximately 10M words (the Strict-Small track). Optimize for performance on the BLiMP benchmark.
Participation Requirements
- Use only the provided pretraining corpus
- You may not train on BLiMP directly (it's for evaluation only)
Dataset
You can load the dataset with:
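The exact dataset identifier is not specified here, so the path below is a placeholder for wherever the provided corpus lives (a local directory or a Hub name), not a confirmed name:

from datasets import load_dataset

# Placeholder location for the provided Strict-Small corpus; substitute the real path.
dataset = load_dataset("path/to/babylm_strict_small")
print(dataset)  # should match the structure shown below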
Dataset structure:
DatasetDict({
    train: Dataset({
        features: ['filename', 'content'],
        num_rows: 6
    })
    test: Dataset({
        features: ['filename', 'content'],
        num_rows: 6
    })
    dev: Dataset({
        features: ['filename', 'content'],
        num_rows: 6
    })
})
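Each row's content field holds the full text of one corpus file. A minimal preprocessing sketch, assuming you train your own tokenizer on the provided text; the vocabulary size, block size, and reuse of GPT-2's byte-level BPE algorithm are illustrative choices, not part of the task specification:

from transformers import AutoTokenizer

# Train a tokenizer on the provided corpus only: this reuses GPT-2's
# byte-level BPE algorithm and special tokens, but learns the vocabulary
# from the Strict-Small text itself.
base = AutoTokenizer.from_pretrained("gpt2")
tokenizer = base.train_new_from_iterator(
    (row["content"] for row in dataset["train"]), vocab_size=16_000
)

block_size = 512

def tokenize_and_chunk(batch):
    # Tokenize line by line, concatenate, then cut into fixed-length blocks.
    lines = []
    for text in batch["content"]:
        lines.extend(text.splitlines())
    ids = [tok for seq in tokenizer(lines)["input_ids"] for tok in seq]
    blocks = [ids[i:i + block_size] for i in range(0, len(ids) - block_size + 1, block_size)]
    return {"input_ids": blocks}

lm_train = dataset["train"].map(
    tokenize_and_chunk,
    batched=True,
    batch_size=6,  # process all six corpus files in one batch
    remove_columns=dataset["train"].column_names,
)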
Evaluation Process
Evaluation will be done on the BLiMP benchmark via the lm-eval harness.
You can run the following command to evaluate your model:
python -m lm_eval --model hf \
    --model_args pretrained=<path_to_your_model> \
    --tasks blimp \
    --device cuda:0 \
    --batch_size 8
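BLiMP consists of minimal pairs, and the harness scores a model by checking whether it assigns a higher log-likelihood to the acceptable sentence of each pair than to the unacceptable one. A rough sketch of that comparison for a single pair (the sentences below are illustrative, not taken from BLiMP):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/your/model"  # same checkpoint you pass to lm_eval
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path).eval()

def sentence_logprob(sentence):
    # Sum of log-probabilities of each token given its prefix.
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    return logprobs.gather(2, ids[:, 1:].unsqueeze(-1)).sum().item()

good = "The cats are sleeping."  # acceptable member of the pair
bad = "The cats is sleeping."    # unacceptable member
print(sentence_logprob(good) > sentence_logprob(bad))  # True counts as a correct prediction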
Hardware Constraints
- One A100 40GB GPU
- 128GB of RAM
- 500GB of disk space
Time Constraints
- 24 Hour Time Limit
Additional Resources
Starter code: https://github.com/babylm/evaluation-pipeline-2024
Recommended Libraries
- Hugging Face Transformers
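As a starting point, a small GPT-2-style model can be configured and trained from scratch with Transformers. This is a minimal sketch, not a tuned recipe: every size and hyperparameter below is an illustrative guess, and tokenizer and lm_train refer to the preprocessing sketch in the Dataset section above.

from transformers import (
    DataCollatorForLanguageModeling, GPT2Config, GPT2LMHeadModel,
    Trainer, TrainingArguments,
)

# Small GPT-2-style model trained from scratch (no pretrained weights).
config = GPT2Config(
    vocab_size=len(tokenizer),
    n_positions=512,
    n_embd=512,
    n_layer=8,
    n_head=8,
)
model = GPT2LMHeadModel(config)

tokenizer.pad_token = tokenizer.eos_token  # GPT-2-style tokenizers have no pad token by default

args = TrainingArguments(
    output_dir="babylm-strict-small",
    per_device_train_batch_size=32,
    num_train_epochs=10,
    learning_rate=3e-4,
    warmup_steps=500,
    fp16=True,  # a model of this size trains comfortably on one A100 40GB
    save_total_limit=1,
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=lm_train,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model(args.output_dir)
tokenizer.save_pretrained(args.output_dir)

The saved directory can then be passed as pretrained=<path_to_your_model> in the evaluation command above.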