v0.4.0

Released by @github-actions on 11 Jul, 21:52

What's new

Added 🎉

  • Added a clipping fix to the Optimizer class to make it work with FSDP no_shard and DDP.
  • Added tests comparing gradient-norm differences between the torch optimizer with clipping and the OLMo optimizer with clipping, on both CPU and GPU.
  • Exposed the memmap dtype in the data config.
  • Added support for DDP training.
  • Added on-disk caching of the HF datasets used in downstream evals.
  • Added FLOPs logging.
  • Added configs for the OLMo tiny set of models.
  • Added configuration field optimizer.record_update_metrics, which defaults to False but, when set to True, triggers AdamW to collect the step size norm and absolute max for each parameter.
  • Added configuration field optimizer.selective_updates, which defaults to False but, when set to True, tells the optimizer to skip updating the parameter and state when the corresponding gradient is 0. (See the config sketch after this list.)
  • Added olmo_data, a package holding data files like tokenizers.
  • Added the ability to load tokenizers from olmo_data package data (see the sketch after this list).
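
The two new optimizer.* fields can be enabled alongside the usual optimizer settings. A minimal sketch, assuming OLMo's OptimizerConfig from olmo/config.py; every field here other than the two new ones is illustrative:

```python
from olmo.config import OptimizerConfig

# Both new fields default to False.
opt_cfg = OptimizerConfig(
    learning_rate=3e-4,           # illustrative value only
    record_update_metrics=True,   # AdamW collects step-size norm and absolute max per parameter
    selective_updates=True,       # skip parameter/state updates where the gradient is 0
)
```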
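
Tokenizer files bundled in olmo_data can be read with the standard library's importlib.resources. A minimal sketch, where the tokenizers/ subdirectory and the file name are assumptions for illustration rather than documented paths:

```python
from importlib import resources

from tokenizers import Tokenizer  # Hugging Face `tokenizers` library

# Assumed resource layout; substitute the actual file shipped in olmo_data.
resource = resources.files("olmo_data") / "tokenizers" / "allenai_gpt-neox-olmo-dolma-v1_5.json"
with resources.as_file(resource) as path:
    tokenizer = Tokenizer.from_file(str(path))
```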

Changed ⚠️

  • Added original legacy unsharding implementation back, as the default. The new shared memory implementation can be used by passing use_legacy_shared_mem_impl to unshard.py.
  • Refactored weight initialization. IMPORTANT: this does not maintain backwards compatibility with older configs; the jobs will still run, but may produce different outputs.
  • Changed the behavior of the Lion optimizer to only record the update cosine similarity when optimizer.record_update_metrics is True in order to be consistent with the API.
  • Added HF datasets into olmo_data, and changed downstream eval to load from the package.

Fixed ✅

  • Changed from ignored_index to ignore_index for cross_entropy_loss when flash-attn>=2.5.8 (see the version-guard sketch below).
  • Made hf_olmo support AutoModelForCausalLM and similar HF auto methods again (see the usage sketch below).
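
One way to handle the rename is to pick the keyword name from the installed flash-attn version. A minimal sketch; cross_entropy_loss stands in for the flash-attn loss entry point named above, and only the kwarg spelling is the point:

```python
from importlib.metadata import version

from packaging.version import parse

# flash-attn >= 2.5.8 renamed `ignored_index` to `ignore_index`.
kwarg_name = (
    "ignore_index"
    if parse(version("flash-attn")) >= parse("2.5.8")
    else "ignored_index"
)
ce_kwargs = {kwarg_name: -100}  # -100 is the conventional "ignore this label" value
# loss = cross_entropy_loss(logits, labels, **ce_kwargs)
```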
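
With the fix, importing hf_olmo registers the OLMo architecture with Hugging Face's auto classes, so the usual entry points work again. A minimal sketch using the published allenai/OLMo-1B checkpoint:

```python
import hf_olmo  # noqa: F401 -- the import registers OLMo with the HF auto classes
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1B")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1B")
```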

Commits

d423c11 Merge pull request #652 from allenai/shanea/update-to-torch2.3
b10ab4b Merge pull request #651 from allenai/shanea/lumi-torch2.3-2
a101b31 Merge pull request #646 from allenai/shanea/hf-datasets-from-package
429a752 Merge pull request #647 from allenai/shanea/fix-tokenizer-break
bc60b8a Add option to skip optim steps for 0 grad params (#636)
cbc7c25 Merge pull request #645 from allenai/shanea/tokenizer-package-data
1b2658b Add option to record step size metrics from AdamW (#605)
a3e2ea7 multiple epoch fix
a1f118a Merge pull request #628 from allenai/olmo-tiny
d7994c8 Fix Z-loss calculation (#634)
a5539f4 Merge pull request #631 from allenai/shanea/hf-olmo-auto-model
d72a262 Merge pull request #626 from allenai/shanea/inspect-train-data-improvements
2417b11 Make olmo-core checkpointer more robust on weka (#624)
ddc8847 Merge pull request #612 from allenai/ddp
41ed20a Merge pull request #623 from allenai/shanea/hf-save-to-disk-2
a33caa9 Merge pull request #604 from allenai/WandbDiff
e5d63a3 Merge pull request #619 from allenai/shanea/add-olmo-1.7-7b-checkpoints
e207df7 Officially add OLMo-core as a dependency (#615)
72159ae Merge pull request #614 from allenai/shanea/pass-include-instance-metadata
c2cedbc Merge pull request #607 from allenai/rewrite-init
578234d Merge pull request #611 from allenai/shanea/hf-get-tokenizer-from-config-2
de43ee8 Merge pull request #610 from allenai/shanea/hf-get-tokenizer-from-config
2639279 Merge pull request #594 from NeuralFabricAI/lx/expose-data-dtype
9e89408 Create sensible filenames
02a8a58 Merge pull request #603 from allenai/shanea/unshard-without-passing-type
ae84d47 Merge pull request #602 from allenai/no_shard_ddp_clip
40210bb Merge pull request #599 from allenai/train-olmo-large
55c1e2f Merge pull request #601 from allenai/no_shard_ddp_clip
5789cfe Merge pull request #593 from allenai/shanea/inspect-train-data-no-indices
eafd154 Merge pull request #579 from MLgdg/main
652c745 Merge pull request #590 from allenai/shanea/update-readme-to-olmo-1.7
8ec2809 Merge pull request #589 from allenai/shanea/update-main-readme-hf
6e714b8 Merge pull request #588 from allenai/shanea/hf-olmo-docs-auto-methods
65d5575 Merge pull request #587 from allenai/shanea/storage-cleaner-improvemnts
0bddfe0 Merge pull request #585 from allenai/shanea/add-hf-docs
e6430a0 Merge pull request #582 from allenai/shanea/hybrid-shard-as-no-shard
c29787a Merge pull request #569 from allenai/Muennighoff/fix-torchv
7a462c5 Merge pull request #580 from allenai/shanea/update-ignore-index-kwarg
4f917fb Merge pull request #575 from allenai/shanea/add-weka
5c721cc Fix GPU tests CI (#574)
467adcc Merge remote-tracking branch 'origin/train-olmo-large'
4b2d12e Merge pull request #565 from allenai/readme
ccc49fd Merge pull request #564 from allenai/shanea/add-new-hf-converter
b17abd0 Merge pull request #512 from liaoleo/main
295d309 Merge pull request #561 from allenai/shanea/delay-device-mesh-import
4e8746d Merge pull request #562 from allenai/shanea/re-add-easy-legacy-unshard-impl
f38de95 Merge pull request #558 from allenai/shanea/release-v0.3.0
829f1d6 Merge pull request #520 from allenai/add-ce-loss-metric