Releases: allenai/OLMo

v0.5.0

27 Aug 02:00

What's new

  • Fixed conversion to HuggingFace model for DDP-trained models.
  • Added support for remote source and destination for HuggingFace model conversion.

Added 🎉

  • Added support for document masking via flash-attn during training with --data.generate_doc_lengths.
  • Added config options for model.norm_after, model.scale_emb_init, and auxiliary_loss_multiplier (used with zloss).
  • Added scripts for running experiments on qk_norm, norm reordering, and zloss.
  • Added model.rope_theta configuration option.
  • Added model.embedding_layer_norm configuration option for adding a LN to the embeddings.
  • Added model.emb_init_std configuration option to override the standard deviation used to initialize the embeddings.
  • Added a downstream eval task for requests dumped from oe-eval tasks.
  • Added CosLinearEnvelope scheduler, which is a pointwise product of a cosine schedule and a linear decay (see the sketch following this list).
  • Added ability to save outputs of submodules for debugging purposes.
  • Versioned the Dolma FLAN data mix change in named_data_mix.py.
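
The CosLinearEnvelope entry above describes the schedule as a pointwise product of a cosine schedule and a linear decay. A minimal sketch of that shape, with illustrative argument names rather than the actual OLMo implementation:

    import math

    def cos_linear_envelope(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
        """Pointwise product of a cosine half-period and a linear decay, both going 1 -> 0."""
        frac = min(step / max(total_steps, 1), 1.0)
        cosine = 0.5 * (1.0 + math.cos(math.pi * frac))  # 1 -> 0
        linear = 1.0 - frac                               # 1 -> 0
        return lr_min + (lr_max - lr_min) * cosine * linear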

Changed ⚠️

  • Changed the default distributed training strategy from single-GPU to FSDP.
  • Fixed the behavior of effective_memmap_dtype to prevent unrecognized dtypes from being parsed as uint16.

Fixed ✅

  • Fixed restarting a training run in later epochs so that we no longer need to set the flag --epoch=INT.
  • Swapped in the correct FLAN data mix.
  • Fixed a bug where the attention norm, when applied before the attention block, was modifying the residual stream.
  • Fixed OLMo.from_checkpoint() so that it correctly loads olmo_core and torch_new style checkpoints (see the loading sketch after this list).
  • Fixed preserve_rng_state being incorrectly set to False when doing gradient checkpointing with dropout.
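
For the OLMo.from_checkpoint() fix above, loading looks roughly like the sketch below, assuming the model class is importable from the top-level olmo package; the checkpoint path is a placeholder and optional keyword arguments are omitted:

    from olmo import OLMo

    # With this fix, olmo_core- and torch_new-style checkpoints load correctly too.
    model = OLMo.from_checkpoint("path/to/checkpoint")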

Commits

cee1a5d Merge pull request #710 from allenai/version-dolma-flan-change
213a639 Merge pull request #711 from allenai/epwalsh/fix-unbound-qkv
4575d40 Fix Conversion Issues + add support for remote upload. (#694)
78d79a5 Merge pull request #709 from allenai/shanea/debugging-docs
9147889 Merge pull request #685 from allenai/ot-oe-eval-requests
6cdc4cc Merge pull request #698 from allenai/shanea/compare-model-state
e5217cf Merge pull request #705 from allenai/dave/checkpoint_style_naming
f4b386e Merge pull request #704 from allenai/shanea/fix-olmo-1.7-batch-size
1e71ce3 Merge pull request #547 from allenai/shanea/add-olmo-1.7-7b-to-readme
6c4d53f Merge pull request #702 from chrisc36/main
0bc7f6c Merge pull request #690 from allenai/shanea/trace-model-outputs-2
4332c32 Merge pull request #691 from allenai/dave/cosine_linear_envelope
6587ddb Merge pull request #674 from allenai/dave/flan_data_mix
7d63fe0 Merge pull request #671 from allenai/s3_unshard_to_hf
c322b9a Merge pull request #686 from allenai/fix-from-checkpoint
c482df7 Merge pull request #680 from allenai/shanea/fix-incorrect-attn-norm
3e30710 Merge pull request #629 from allenai/epwalsh/amberish
4e00460 Add support for document masking during training (#661)
b45002e make epoch logging less confusing
1b7d275 Fix restarts in later epochs (#670)
345edc6 Merge branch 'main' of https://github.com/allenai/LLM
66d2be7 Revert "Update Beaker image"
0757223 Merge pull request #649 from allenai/ModelLadder
90b3889 Merge pull request #660 from allenai/fix_convert_olmo_to_hf
dfb7212 Merge pull request #616 from allenai/chameleon
d627c94 Merge pull request #665 from allenai/ddp-ckpt-fix
ab63296 Improving memmap type parser (#663)
b55fb5f Merge pull request #662 from allenai/tiny-olmo-config-fix
56d1fe0 Merge pull request #657 from allenai/shanea/lumi-torch2.3-3
26c2d53 Merge pull request #648 from allenai/shanea/default-fsdp-strategy
65f1fff Merge pull request #656 from jeqcho/patch-1
20b82f8 Merge pull request #653 from allenai/shanea/olmo-v0.4.0

v0.4.0

11 Jul 21:52

What's new

Added 🎉

  • Added clipping fix to Optimizer class to make it work with FSDP no_shard and DDP.
  • Added tests comparing gradient-norm differences between the torch optimizer with clipping and the OLMo optimizer with clipping, on both CPU and GPU.
  • Exposed the memmap dtype in the data config.
  • Added support for DDP training.
  • Added on-disk caching of HF datasets used in downstream evals.
  • Added FLOPs logging.
  • Added configs for the OLMo tiny set of models.
  • Added configuration field optimizer.record_update_metrics, which defaults to False, but when set to True will trigger AdamW to collect the step size norm and absolute max for each parameter.
  • Added configuration field optimizer.selective_updates, which defaults to False, but when set to True tells the optimizer to skip updating the parameter and state when the corresponding gradient is 0 (see the sketch following this list).
  • Added olmo_data, a package holding data files like tokenizers.
  • Added ability to load tokenizers from olmo_data package data.
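
A conceptual sketch of the optimizer.selective_updates behavior described above: entries whose gradient is exactly zero keep both their parameter value and their optimizer state. This is illustrative pseudologic with made-up names, not OLMo's actual AdamW implementation (bias correction is omitted):

    import torch

    def masked_adamw_step(p, g, m, v, lr=1e-3, b1=0.9, b2=0.95, eps=1e-8):
        mask = g != 0  # only update where the gradient is nonzero
        m.copy_(torch.where(mask, b1 * m + (1 - b1) * g, m))
        v.copy_(torch.where(mask, b2 * v + (1 - b2) * g * g, v))
        update = lr * m / (v.sqrt() + eps)
        p.sub_(torch.where(mask, update, torch.zeros_like(update)))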

Changed ⚠️

  • Added original legacy unsharding implementation back, as the default. The new shared memory implementation can be used by passing use_legacy_shared_mem_impl to unshard.py.
  • Refactored weight initialization. IMPORTANT: this does not maintain backwards-compatibility with older configs; the jobs will still run, but may produce different outputs.
  • Changed the behavior of the Lion optimizer to only record the update cosine similarity when optimizer.record_update_metrics is True in order to be consistent with the API.
  • Added HF datasets into olmo_data, and changed downstream eval to load from the package.

Fixed ✅

  • Changed from ignored_index to ignore_index for cross_entropy_loss when flash-attn>=2.5.8.
  • Made hf_olmo support AutoModelForCausalLM and similar HF methods again (see the sketch following this list).
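
With the fix above, the standard HF auto classes work with OLMo checkpoints again. A minimal sketch; the model ID is a placeholder for any OLMo checkpoint in HF format:

    import hf_olmo  # noqa: F401 -- registers the OLMo architecture with transformers
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B")
    tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B")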

Commits

d423c11 Merge pull request #652 from allenai/shanea/update-to-torch2.3
b10ab4b Merge pull request #651 from allenai/shanea/lumi-torch2.3-2
a101b31 Merge pull request #646 from allenai/shanea/hf-datasets-from-package
429a752 Merge pull request #647 from allenai/shanea/fix-tokenizer-break
bc60b8a Add option to skip optim steps for 0 grad params (#636)
cbc7c25 Merge pull request #645 from allenai/shanea/tokenizer-package-data
1b2658b Add option to record step size metrics from AdamW (#605)
a3e2ea7 multiple epoch fix
a1f118a Merge pull request #628 from allenai/olmo-tiny
d7994c8 Fix Z-loss calculation (#634)
a5539f4 Merge pull request #631 from allenai/shanea/hf-olmo-auto-model
d72a262 Merge pull request #626 from allenai/shanea/inspect-train-data-improvements
2417b11 Make olmo-core checkpointer more robust on weka (#624)
ddc8847 Merge pull request #612 from allenai/ddp
41ed20a Merge pull request #623 from allenai/shanea/hf-save-to-disk-2
a33caa9 Merge pull request #604 from allenai/WandbDiff
e5d63a3 Merge pull request #619 from allenai/shanea/add-olmo-1.7-7b-checkpoints
e207df7 Officially add OLMo-core as a dependency (#615)
72159ae Merge pull request #614 from allenai/shanea/pass-include-instance-metadata
c2cedbc Merge pull request #607 from allenai/rewrite-init
578234d Merge pull request #611 from allenai/shanea/hf-get-tokenizer-from-config-2
de43ee8 Merge pull request #610 from allenai/shanea/hf-get-tokenizer-from-config
2639279 Merge pull request #594 from NeuralFabricAI/lx/expose-data-dtype
9e89408 Create sensible filenames
02a8a58 Merge pull request #603 from allenai/shanea/unshard-without-passing-type
ae84d47 Merge pull request #602 from allenai/no_shard_ddp_clip
40210bb Merge pull request #599 from allenai/train-olmo-large
55c1e2f Merge pull request #601 from allenai/no_shard_ddp_clip
5789cfe Merge pull request #593 from allenai/shanea/inspect-train-data-no-indices
eafd154 Merge pull request #579 from MLgdg/main
652c745 Merge pull request #590 from allenai/shanea/update-readme-to-olmo-1.7
8ec2809 Merge pull request #589 from allenai/shanea/update-main-readme-hf
6e714b8 Merge pull request #588 from allenai/shanea/hf-olmo-docs-auto-methods
65d5575 Merge pull request #587 from allenai/shanea/storage-cleaner-improvemnts
0bddfe0 Merge pull request #585 from allenai/shanea/add-hf-docs
e6430a0 Merge pull request #582 from allenai/shanea/hybrid-shard-as-no-shard
c29787a Merge pull request #569 from allenai/Muennighoff/fix-torchv
7a462c5 Merge pull request #580 from allenai/shanea/update-ignore-index-kwarg
4f917fb Merge pull request #575 from allenai/shanea/add-weka
5c721cc Fix GPU tests CI (#574)
467adcc Merge remote-tracking branch 'origin/train-olmo-large'
4b2d12e Merge pull request #565 from allenai/readme
ccc49fd Merge pull request #564 from allenai/shanea/add-new-hf-converter
b17abd0 Merge pull request #512 from liaoleo/main
295d309 Merge pull request #561 from allenai/shanea/delay-device-mesh-import
4e8746d Merge pull request #562 from allenai/shanea/re-add-easy-legacy-unshard-impl
f38de95 Merge pull request #558 from allenai/shanea/release-v0.3.0
829f1d6 Merge pull request #520 from allenai/add-ce-loss-metric

v0.3.0

25 Apr 19:23

What's new

Added 🎉

  • Added support for Grouped Query Attention (see the sketch following this list).
  • Added commonsense_qa and social_iqa downstream evaluation tasks.
  • Made it possible to read from http/https the same way we read from s3/r2.
  • Added MMLU multiple choice (A/B/C/D) 5-shot variant downstream tasks.
  • Tokenizer patch.
  • Added option to specify number of model replicas when using hybrid sharding.
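
The Grouped Query Attention support mentioned above lets a small number of key/value heads serve groups of query heads. A toy illustration of the head-grouping idea, with made-up shapes that are not OLMo's internals:

    import torch

    B, T, n_q_heads, n_kv_heads, d = 1, 4, 8, 2, 16  # 8 query heads share 2 KV heads
    q = torch.randn(B, n_q_heads, T, d)
    k = torch.randn(B, n_kv_heads, T, d)
    v = torch.randn(B, n_kv_heads, T, d)

    # Repeat each KV head so every query head in its group attends to the same K/V.
    k = k.repeat_interleave(n_q_heads // n_kv_heads, dim=1)
    v = v.repeat_interleave(n_q_heads // n_kv_heads, dim=1)
    out = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1) @ v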

Changed ⚠️

  • Renamed Olmo to OLMo everywhere in the codebase.
  • Disabled automatic garbage collection during training; instead, we run it manually at regular intervals to avoid ranks getting out of sync with their own gc.

Removed 👋

  • Removed AMDLayerNorm, since the original layer norm bug has been fixed and we don't need this workaround anymore.
  • Removed OLMoParallelBlock.

Fixed ✅

  • Stopped logging garbage on nodes that aren't rank 0.
  • Fixed a crash in the HF code when referring to a tokenizer in a local file.
  • Pointed official training scripts to publicly available URLs.
  • Corrected the resize_token_embeddings method in the OLMoForCausalLM class to properly update the token embeddings when resizing the vocabulary (see the sketch following this list).
  • Changed the tie_weights method to a no-op, as weight tying is handled in olmo/model.py.
  • Fixed the size calculation for qk layer norm.
  • Fixed a pipeline test failure that occurs due to a bug in transformers version 4.39.1.
  • Made hf_olmo compatible with transformers versions >=4.40.0.
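
The resize_token_embeddings fix above follows the usual transformers pattern; a minimal sketch in which the model ID and the added token are placeholders:

    import hf_olmo  # noqa: F401 -- registers OLMoForCausalLM with transformers
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B")
    tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B")
    tokenizer.add_tokens(["<new_token>"])
    model.resize_token_embeddings(len(tokenizer))  # embedding matrix now matches the new vocab size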

Commits

3b16e21 Merge pull request #556 from allenai/shanea/make-hf-olmo-support-new-transformers
ccf7bf0 Merge pull request #555 from allenai/shanea/wandb-cancel-failure-bypass
7be71cd use correct PG when collecting metrics with HYBRID shard (#551)
06786a7 Merge pull request #548 from allenai/shanea/fix-olmo-name-hf
4ed135e Merge pull request #540 from allenai/shanea/hybrid-sharding-num-groups-2
2eae988 Merge pull request #546 from allenai/shanea/add-olmo-1.7-7b-checkpoints
d2afcaa Add cfg option --scheduler.warmup_min_lr (#542)
9d40898 Merge pull request #537 from allenai/AkshitaB-tokenizer-patch
62c7954 Merge pull request #536 from allenai/shanea/storage-cleaner-wandb-path-from-checkpoint
657a55e Merge pull request #494 from allenai/shanea/storage-cleaner-move-entry
9a0a84a Merge pull request #527 from allenai/PublicTrainingData
0de5fdc Merge pull request #501 from djliden/dl/fix-embedding-resize
4792f94 Adds a new experimental sharded checkpointer from OLMo-core (#532)
1c12980 make garbage collection interval configurable (#533)
db2dee2 Merge pull request #503 from djliden/dl/hf-weight-tying
8fad649 Merge pull request #534 from allenai/shanea/fix-transformer-cache-position-regression
71f7014 Merge pull request #528 from allenai/add-mmlu-mc-5shot
8472d0b Merge pull request #521 from allenai/davidbrandfonbrener-patch-1
194012a Merge pull request #523 from allenai/davidbrandfonbrener-patch-2
8949bd8 Added deprecation for memmap (#517)
83cc8b1 Merge pull request #464 from allenai/olmo7-ablations
f8aef84 Merge pull request #509 from allenai/epwalsh/manual-gc
0ac82a9 Merge pull request #508 from allenai/RunDataloader
74de51d Merge pull request #414 from allenai/mitchish65-2
417af0e Merge pull request #504 from allenai/add-csqa-siqa
666da70 Patch other S3 methods with 404 detection fix
0b6e28c Fix checking HTTP status code for boto3 responses
0b835a8 Merge pull request #500 from allenai/shanea/expose-official-checkpoints
50da7a4 Add work-arounds for new-style checkpointing issues
6d42d7a Fix hang when training is canceled
7eb7f3d Merge pull request #455 from gahdritz/main
ed47c29 Merge pull request #453 from hxdtest/only_rank0_log_metrics
ad8198e Merge pull request #495 from allenai/add-basic-math
1511fed Merge pull request #487 from allenai/fix-mmlu-prompt-bug
c2840e4 Merge pull request #493 from allenai/shanea/storage-cleaner-move-improvements
658f7cc Merge pull request #466 from allenai/rename
eb5b2da Merge pull request #490 from allenai/RemoveAMDLN
752353b Merge pull request #488 from allenai/shanea/optimize-unsharding-2

v0.2.5

07 Mar 00:31

What's new

Fixed ✅

  • Fixed the default value of the --tokenizer argument to scripts/prepare_tulu_data.py to be an absolute path rather than a relative path, so the script can be run from other directories.
  • Added the option to directly pass input embeddings to OLMo and OLMoForCausalLM.
  • Added support for Python 3.8.
  • Added code to throw an error if output_attentions is set to True in a forward call to OLMoForCausalLM. This functionality hasn't been implemented yet.
  • Fixed running with data loading workers on LUMI.

Added 🎉

  • Added an output_hidden_states argument and associated functionality to OLMo and OLMoForCausalLM to return the model's intermediate hidden states (see the sketch following this list).
  • Added MMLU downstream evaluation tasks, with prompt variations.
  • Added support for PyTorch v2.2.
  • Added ability to show logs from all ranks.
  • Added option for QKV clipping.
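
A minimal sketch of the output_hidden_states argument added above, assuming the hf_olmo wrapper exposes OLMoForCausalLM; the checkpoint path and input IDs are placeholders:

    import torch
    from hf_olmo import OLMoForCausalLM

    model = OLMoForCausalLM.from_pretrained("path/to/hf_checkpoint")
    out = model(torch.tensor([[1, 2, 3]]), output_hidden_states=True)
    hidden_states = out.hidden_states  # intermediate activations from each layer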

Changed ⚠️

  • Refactored torch.load monkey patching for legacy checkpoint unsharding in anticipation of an unsharding implementation change.

Commits

c499632 Add option for QKV clipping (#489)
31d8528 Pull checkpoint patch from mitchish-gqa-2
03d7643 Merge pull request #486 from allenai/shanea/monkey-patch-ctx-manager
fd3a57b Merge pull request #483 from allenai/shanea/storage-cleaner-unshard-improvements
1d264e4 Merge pull request #481 from allenai/WorkersOnLumi
70ad30c Merge pull request #480 from allenai/Firehose
493c0b8 Add MMLU prompt variants (#484)
cb711e2 Add support for PyTorch v2.2 (#476)
67d24f5 Merge pull request #468 from allenai/mmlu-downstream
0c58bee Fix bug when clipping is disabled
922db6a Only run the profiler through a single cycle (#463)
37ca789 Merge pull request #462 from allenai/epwalsh/fsdp-wrap-patch
cc36709 Add attn bias arg to HF wrapper (#458)
7f7abbb Merge pull request #451 from sarahwie/main
9fd9130 Add support for Python 3.8 (#448)
d9c0993 Require Python>=3.9 for now
97296e6 Merge pull request #442 from allenai/shanea/add-input-embedding-arg
3be4c1e add link to W&B logs for 1B run
d7d4de4 Add link to OLMo-7B-Twin-2T W&B logs
cf12108 Update README.md (#429)
15af668 freeze official configs for reproductions (#421)
7739fe1 Add link to W&B logs for OLMo-7B
80db5e3 Fix default value of --tokenizer
6765317 Add link to paper in README badge

v0.2.4

02 Feb 18:40

What's new

Fixed ✅

  • Fixed an issue with the HuggingFace integration where we were inadvertently using a feature that was introduced in Python 3.10, causing an error for older Python versions.

Commits

8a3f2d8 Fix HF integration for Python < 3.10 (#426)
49c8647 Use temp branding GIF for logo (for now) (#419)

v0.2.3

31 Jan 18:36

What's new

Commits

98c115c Bump version to v0.2.3 for release
0e53b33 specify dependencies in pyproject.toml (#418)
18e5dad update PyPI release process
141cc94 Merge pull request #415 from allenai/readme-inf
2587240 Merge pull request #417 from allenai/Muennighoff/ckpt
a5a01a2 Merge pull request #416 from allenai/nol_rdme
98425a5 Merge pull request #413 from allenai/shanea/storage-cleaner-s3-upload-cleanup
3053bfa Update install instructions in README
f36ac42 Merge pull request #410 from allenai/epwalsh/fine-tune-with-label-masking
dcae8e8 Merge pull request #411 from allenai/epwalsh/lr-schedule-tokens
45ed078 Add more mcli configs
905359e fix bug with saving unsharded checkpoint
3e3df71 Merge pull request #409 from allenai/epwalsh/tulu-fine-tune
a2e1d13 Merge pull request #368 from allenai/mitchish-lumi
5a735dd Merge pull request #350 from allenai/mitchish
df19554 Merge pull request #388 from allenai/mitchish65
23eb949 Train a few steps after time limit reached (#362)
ac1aee1 Merge pull request #408 from allenai/NixLogz
6da42cf ensure we save checkpoint at end of loop
568a3d8 Merge pull request #406 from allenai/hf-olmo-loading
3c51402 Merge pull request #407 from allenai/shanea/storage-cleaner-avoid-redundant-copy
53217d2 Merge pull request #405 from allenai/shanea/storage-cleaner-fix-upload-path
5eb26aa Merge pull request #404 from allenai/shanea/storage-cleaner-minor-fixes
87ed747 backwards compat fix
1c13e5f Merge pull request #403 from allenai/shanea/storage-cleaner-fix-max-archive-size
685d11b Merge pull request #400 from allenai/shanea/storage-cleaner-wandb
5bdccc3 Merge pull request #402 from allenai/shanea/storage-cleaner-is-run-improvement
75d6738 Merge pull request #401 from allenai/shanea/storage-cleaner-is-file-no-key
0475f3a Make logo a little smaller
1184050 Add logo to README
e2d77c4 Ephemeral checkpoints (#397)
6f2abfb Merge pull request #399 from allenai/shane/storage-cleaner-fix-s3-upload
f8beb5b Merge pull request #398 from allenai/shanea/storage-cleaner-move-run
185d7e2 Move remaining top-level mkd docs into docs folder (#395)
5d03d38 Merge pull request #396 from allenai/shanea/storage-cleaner-delete-temp-files
fe49693 Merge pull request #382 from allenai/shanea/storage-cleaner-unsharding-legacy
1ede949 Merge pull request #381 from allenai/shanea/storage-cleaner-unsharding-2
9cc7154 update some links to new repo (#394)

v0.2.2

11 Dec 05:58

What's new

Commits

364e21e Merge pull request #393 from allenai/hf-olmo-auto-map

v0.2.1

11 Dec 00:11

What's new

Commits

ad3e676 missing readme
9fa23b4 Merge pull request #392 from allenai/hf-bug-fix

v0.2.0

10 Dec 06:43

What's new

Added 🎉

  • GPT-based model.
  • Tokenizer and data pre-processing pipeline.
  • Training script.
  • Triton-based FlashAttention.

Commits

e801af8 add release proc
e643f5e update pyproject
dbc8177 Bump version to v0.2.0 for release
e99dbe5 Merge pull request #391 from allenai/hf-olmo-new
a120ab2 Merge pull request #380 from allenai/shanea/storage-cleaner-download-upload
4e849e4 Merge pull request #390 from allenai/shanea/storage-cleaner-archive-fix-2
1dbc346 Merge pull request #378 from allenai/shanea/storage-cleaner-cached-path
22cefa2 Merge pull request #389 from allenai/shanea/add-r2-scheme
ac01778 fix
6c79c63 add option to only unshard model
d1c185b Merge pull request #387 from allenai/epwalsh/dist-init
e30d29f Merge pull request #364 from allenai/shanea/storage-cleaner
ff883e5 Merge pull request #385 from allenai/epwalsh/max-duration-tokens
e16e606 Merge pull request #383 from allenai/epwalsh/start-new-epoch

v0.1.1

27 Nov 01:04

What's new

Commits