{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":250510075,"defaultBranch":"main","name":"trl","ownerLogin":"huggingface","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2020-03-27T10:54:55.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/25720743?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1720445891.0","currentOid":""},"activityList":{"items":[
{"before":"e10792032be644a65dcbcf2ebe9ec947497d4d46","after":"314e8eb367cbfaf74c2e9717085346360e779508","ref":"refs/heads/main","pushedAt":"2024-07-08T13:41:36.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"kashif","name":"Kashif Rasul","path":"/kashif","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8100?s=80&v=4"},"commit":{"message":"fix broken url in `docs\\source\\index.mdx` (#1813)","shortMessageHtmlLink":"fix broken url in docs\\source\\index.mdx (#1813)"}},
{"before":"cbe9fd8328e9f678f1d9904fbfd5462af14c8e2e","after":null,"ref":"refs/heads/0.9.6-release","pushedAt":"2024-07-08T13:38:11.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"vwxyzjn","name":"Costa Huang","path":"/vwxyzjn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5555347?s=80&v=4"}},
{"before":"78045dedc8678af04f4e35ffe63f37be196a435b","after":"e10792032be644a65dcbcf2ebe9ec947497d4d46","ref":"refs/heads/main","pushedAt":"2024-07-08T13:38:10.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"vwxyzjn","name":"Costa Huang","path":"/vwxyzjn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5555347?s=80&v=4"},"commit":{"message":"0.9.6 release (#1816)","shortMessageHtmlLink":"0.9.6 release (#1816)"}},
{"before":null,"after":"cbe9fd8328e9f678f1d9904fbfd5462af14c8e2e","ref":"refs/heads/0.9.6-release","pushedAt":"2024-07-08T13:09:43.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"vwxyzjn","name":"Costa Huang","path":"/vwxyzjn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5555347?s=80&v=4"},"commit":{"message":"0.9.6 release","shortMessageHtmlLink":"0.9.6 release"}},
{"before":"747612f9d3063de56b6524e5feb0c9feab21d4c4","after":"78045dedc8678af04f4e35ffe63f37be196a435b","ref":"refs/heads/main","pushedAt":"2024-07-07T23:59:26.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"vwxyzjn","name":"Costa Huang","path":"/vwxyzjn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5555347?s=80&v=4"},"commit":{"message":"Fix `TRL_USE_RICH` environment variable handling (#1808)\n\n* Add `strtobool` custom implementation from `distutils`\r\n\r\n* Fix `TRL_USE_RICH` handling via `strtobool`\r\n\r\n* Run `make precommit`","shortMessageHtmlLink":"Fix TRL_USE_RICH environment variable handling (#1808)"}},
{"before":null,"after":"d88ee55dc2797398ff684c71671286f2973b719b","ref":"refs/heads/online-trainer-refactor","pushedAt":"2024-07-05T14:37:45.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"vwxyzjn","name":"Costa Huang","path":"/vwxyzjn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5555347?s=80&v=4"},"commit":{"message":"push changes","shortMessageHtmlLink":"push changes"}},
{"before":"9e3a35bd3d85ee506d180120f01bde2229b60265","after":"747612f9d3063de56b6524e5feb0c9feab21d4c4","ref":"refs/heads/main","pushedAt":"2024-07-05T14:28:59.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"kashif","name":"Kashif Rasul","path":"/kashif","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8100?s=80&v=4"},"commit":{"message":"Fix `torch_dtype` handling in `{DPO,SFT}Trainer` when provided via CLI (#1807)\n\n* Fix `torch_dtype` handling through CLI\r\n\r\nThe `torch_dtype` is not properly handled when provided via the TRL CLI\r\nsince it's provided initially as a string, but is then casted to\r\n`torch.dtype` before providing it to the `{DPO,SFT}Trainer`, which means\r\nthat those trainers should handle the scenario where `torch_dtype` is a\r\n`torch.dtype` too.\r\n\r\n* Add `torch_dtype` tests in `test_{dpo,sft}_trainer.py`\r\n\r\n* Forward contribution credits\r\n\r\n* Run `make precommit`\r\n\r\n---------\r\n\r\nCo-authored-by: Tash Srivastava ","shortMessageHtmlLink":"Fix torch_dtype handling in {DPO,SFT}Trainer when provided via CLI ("}},
{"before":"4402b36dcf79a0921a858c77375cfbb285d603c7","after":"9e3a35bd3d85ee506d180120f01bde2229b60265","ref":"refs/heads/main","pushedAt":"2024-07-05T11:29:48.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"kashif","name":"Kashif Rasul","path":"/kashif","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8100?s=80&v=4"},"commit":{"message":"Remove extra print in reward_trainer.py (#1799)\n\n`print_rich_table` is called twice and the first call doesn't restrict to `num_print_samples`. Remove the first, extra call","shortMessageHtmlLink":"Remove extra print in reward_trainer.py (#1799)"}},
{"before":"78f8228874d5cf9c0e68952533cb377202e1eb22","after":"4402b36dcf79a0921a858c77375cfbb285d603c7","ref":"refs/heads/main","pushedAt":"2024-07-04T12:29:25.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"kashif","name":"Kashif Rasul","path":"/kashif","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8100?s=80&v=4"},"commit":{"message":"clean examples (#1791)\n\nCo-authored-by: Quentin Gallouédec ","shortMessageHtmlLink":"clean examples (#1791)"}},
{"before":"b6af2edc93b275afcee22a3eb71f9a5702ff9fd8","after":"78f8228874d5cf9c0e68952533cb377202e1eb22","ref":"refs/heads/main","pushedAt":"2024-07-03T18:10:50.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"kashif","name":"Kashif Rasul","path":"/kashif","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8100?s=80&v=4"},"commit":{"message":"Bugfix: Preserve token fields when converting TrainingArguments to SFTConfig (#1794)\n\n* Preserve token fields when converting TrainingArguments to SFTConfig\r\n\r\nTrainingArguments.to_dict() redacts token fields, so we have to\r\nindividually copy them over when converting to SFTConfig to avoid\r\nbreaking push_to_hub functionality.\r\n\r\nAlso adds a test.\r\n\r\n* run precommit\r\n\r\n* one-line args_as_dict definition per suggestion from kashif\r\n\r\n* generalize token copying to match TrainingArguments behavior\r\n\r\n* unwrap |= on dict, to support python 3.8\r\n\r\n* use .update instead of |= or for-loop","shortMessageHtmlLink":"Bugfix: Preserve token fields when converting TrainingArguments to SF…"}},
{"before":"cd85b14fbbaf7e4d9b01ef8ec19655666af20047","after":"b6af2edc93b275afcee22a3eb71f9a5702ff9fd8","ref":"refs/heads/main","pushedAt":"2024-07-03T06:29:16.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"kashif","name":"Kashif Rasul","path":"/kashif","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8100?s=80&v=4"},"commit":{"message":"add model_init_kwargs to training_args (#1787)","shortMessageHtmlLink":"add model_init_kwargs to training_args (#1787)"}},
{"before":"a57544f47a2fbc4940b4d49dde32f54406398c91","after":"cd85b14fbbaf7e4d9b01ef8ec19655666af20047","ref":"refs/heads/main","pushedAt":"2024-06-29T13:35:48.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"kashif","name":"Kashif Rasul","path":"/kashif","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8100?s=80&v=4"},"commit":{"message":"Fixed typo in SFT trainer docs (#1788)\n\n'STFConfig' instead of 'SFTConfig' appears multiple times in the doc, causing error when running the code snippets.","shortMessageHtmlLink":"Fixed typo in SFT trainer docs (#1788)"}},
{"before":"b68ff96f0c74368961e194081e122959cd1f4d4d","after":"a57544f47a2fbc4940b4d49dde32f54406398c91","ref":"refs/heads/main","pushedAt":"2024-06-27T13:47:58.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"kashif","name":"Kashif Rasul","path":"/kashif","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8100?s=80&v=4"},"commit":{"message":"fix docs and examples (#1780)","shortMessageHtmlLink":"fix docs and examples (#1780)"}},
{"before":"c8c01cc05569f5ffea6726b2111f799a63e03aaa","after":"b68ff96f0c74368961e194081e122959cd1f4d4d","ref":"refs/heads/main","pushedAt":"2024-06-26T14:26:38.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"kashif","name":"Kashif Rasul","path":"/kashif","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8100?s=80&v=4"},"commit":{"message":"Visual DPO (#1647)\n\n* Remove extra whitespaces\r\n\r\n* idefics\r\n\r\n* vdpo\r\n\r\n* sft idefics\r\n\r\n* pad with test\r\n\r\n* use prompt instead of tokenizer\r\n\r\n* rm name main\r\n\r\n* support vlm in tokenize row\r\n\r\n* temp fix for regex in lora_target_module\r\n\r\n* format\r\n\r\n* vdpo\r\n\r\n* tmp float16 hard code\r\n\r\n* concatenated_forward support for vision\r\n\r\n* style and new command line\r\n\r\n* all-linear\r\n\r\n* format\r\n\r\n* delete old examples\r\n\r\n* get image\r\n\r\n* upcast\r\n\r\n* new test\r\n\r\n* modified test\r\n\r\n* new strat for tokenizer\r\n\r\n* rm token transfer\r\n\r\n* integrate vision in dpo example\r\n\r\n* format\r\n\r\n* add FDivergenceType back\r\n\r\n* precommit\r\n\r\n* pillow test dep\r\n\r\n* optional prompt\r\n\r\n* `evaluation_strategy` to `eval_strategy`\r\n\r\n* revert vsft change (oos)\r\n\r\n* update test\r\n\r\n* test\r\n\r\n* comment and support more in process\r\n\r\n* update process\r\n\r\n* update doc for vdpo\r\n\r\n* caution about limited support\r\n\r\n* Update docs/source/dpo_trainer.mdx\r\n\r\nCo-authored-by: Kashif Rasul \r\n\r\n* revert DPO example changes\r\n\r\n* cleaner way to check if a model is vision\r\n\r\n* comment\r\n\r\n* update vdpo example\r\n\r\n* rename\r\n\r\n---------\r\n\r\nCo-authored-by: Quentin Gallouédec \r\nCo-authored-by: Kashif Rasul ","shortMessageHtmlLink":"Visual DPO (#1647)"}},
{"before":"3ed0b71eb4815973cd5821f3298c468662c47f6c","after":"d6761fdb1ff9d86526fa8c7e5fb2db5c854b0fe6","ref":"refs/heads/dataset-processor","pushedAt":"2024-06-26T13:37:19.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vwxyzjn","name":"Costa Huang","path":"/vwxyzjn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5555347?s=80&v=4"},"commit":{"message":"precommit","shortMessageHtmlLink":"precommit"}},
{"before":"7a5fbd0f46740477d3382ea192d7518e4447db95","after":"3ed0b71eb4815973cd5821f3298c468662c47f6c","ref":"refs/heads/dataset-processor","pushedAt":"2024-06-26T13:32:39.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vwxyzjn","name":"Costa Huang","path":"/vwxyzjn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5555347?s=80&v=4"},"commit":{"message":"update the docs","shortMessageHtmlLink":"update the docs"}},
{"before":"ca059d8275a130a1a4b10a533afcc158f61a4fe6","after":"7a5fbd0f46740477d3382ea192d7518e4447db95","ref":"refs/heads/dataset-processor","pushedAt":"2024-06-26T13:32:30.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vwxyzjn","name":"Costa Huang","path":"/vwxyzjn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5555347?s=80&v=4"},"commit":{"message":"update reward model script","shortMessageHtmlLink":"update reward model script"}},
{"before":"3479606c8c6dbb5da96e4990b491e63a48fc7483","after":"c8c01cc05569f5ffea6726b2111f799a63e03aaa","ref":"refs/heads/main","pushedAt":"2024-06-26T09:23:37.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"kashif","name":"Kashif Rasul","path":"/kashif","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8100?s=80&v=4"},"commit":{"message":"Fix Documentation Overflow Issues for Long URLs in SFTConfig (#1774)\n\n* Update sft_config.py\r\n\r\n* Update sft_config.py","shortMessageHtmlLink":"Fix Documentation Overflow Issues for Long URLs in SFTConfig (#1774)"}},
{"before":"7965b7834052ab3d60a1cc5de382e2f56b3772e7","after":"3479606c8c6dbb5da96e4990b491e63a48fc7483","ref":"refs/heads/main","pushedAt":"2024-06-26T07:18:22.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"kashif","name":"Kashif Rasul","path":"/kashif","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8100?s=80&v=4"},"commit":{"message":"Remove the leading space in the tldr preference dataset (#1773)","shortMessageHtmlLink":"Remove the leading space in the tldr preference dataset (#1773)"}},
{"before":"e747c06339f43c23960467d9c1a03a4e6eee63a1","after":"ca059d8275a130a1a4b10a533afcc158f61a4fe6","ref":"refs/heads/dataset-processor","pushedAt":"2024-06-25T19:16:34.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vwxyzjn","name":"Costa Huang","path":"/vwxyzjn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5555347?s=80&v=4"},"commit":{"message":"update docs","shortMessageHtmlLink":"update docs"}},
{"before":"39a0245df35fcfe1f49f3b733c9c14a1d7e12f2a","after":"e747c06339f43c23960467d9c1a03a4e6eee63a1","ref":"refs/heads/dataset-processor","pushedAt":"2024-06-25T18:23:13.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vwxyzjn","name":"Costa Huang","path":"/vwxyzjn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5555347?s=80&v=4"},"commit":{"message":"better visualization","shortMessageHtmlLink":"better visualization"}},
{"before":"56bd1bba26ac52aad976c1a1a0b3d9e1137b18c7","after":"7965b7834052ab3d60a1cc5de382e2f56b3772e7","ref":"refs/heads/main","pushedAt":"2024-06-25T14:47:32.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"kashif","name":"Kashif Rasul","path":"/kashif","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8100?s=80&v=4"},"commit":{"message":"add Efficient Exact Optimization (EXO) (#1735)\n\n* add exo\r\n\r\n* fix a detail\r\n\r\n* Update trl/trainer/dpo_trainer.py\r\n\r\n* Update trl/trainer/dpo_trainer.py\r\n\r\n* Update trl/trainer/dpo_trainer.py\r\n\r\n---------\r\n\r\nCo-authored-by: Kashif Rasul ","shortMessageHtmlLink":"add Efficient Exact Optimization (EXO) (#1735)"}},
{"before":"94d53e6617edc6434a38b2ac51c21e5da3329cda","after":"56bd1bba26ac52aad976c1a1a0b3d9e1137b18c7","ref":"refs/heads/main","pushedAt":"2024-06-25T14:14:26.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"vwxyzjn","name":"Costa Huang","path":"/vwxyzjn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5555347?s=80&v=4"},"commit":{"message":"`evaluation_strategy` to `eval_strategy` (#1771)\n\nCo-authored-by: Quentin Gallouédec ","shortMessageHtmlLink":"evaluation_strategy to eval_strategy (#1771)"}},
{"before":"b5be100ae0b37d743cd49435297f917eb54a0574","after":"94d53e6617edc6434a38b2ac51c21e5da3329cda","ref":"refs/heads/main","pushedAt":"2024-06-24T19:27:00.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"kashif","name":"Kashif Rasul","path":"/kashif","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8100?s=80&v=4"},"commit":{"message":"MoE Models: option to add load balancing loss (#1765)\n\n* KTO: add aux loss\r\n\r\n* use router_aux_loss_coef in KtoTrainer when aux_loss enabled\r\n\r\n* align optional aux_loss in DPO, KTO, CPO, ORPO\r\n\r\n* precommit changes\r\n\r\n* fix KL forward kwargs\r\n\r\n* add aux_loss doku entry\r\n\r\n* apply docs suggestions\r\n\r\n---------\r\n\r\nCo-authored-by: Clara Luise Pohland ","shortMessageHtmlLink":"MoE Models: option to add load balancing loss (#1765)"}},
{"before":"6e1652bc5e8ff6d348c7f06048f4102a050f1544","after":"b5be100ae0b37d743cd49435297f917eb54a0574","ref":"refs/heads/main","pushedAt":"2024-06-24T16:05:44.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"vwxyzjn","name":"Costa Huang","path":"/vwxyzjn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5555347?s=80&v=4"},"commit":{"message":"Added Reward Backpropogation Support (#1585)\n\n* added alignprop template\r\n\r\n* added alignprop support\r\n\r\n* Update alignprop_trainer.mdx\r\n\r\n* Update alignprop_trainer.mdx\r\n\r\n* added better why statement\r\n\r\n* fixed inference code\r\n\r\n* changed self to pipeline\r\n\r\n* removed aesthetic classifier\r\n\r\n* added aesthetic to auxiliary models\r\n\r\n* added unseen prompt logging\r\n\r\n* removed unseen prompt log\r\n\r\n* fixed minor\r\n\r\n* remove not needed import in trl/__init__.py\r\n\r\nCo-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>\r\n\r\n* fixed styling\r\n\r\n* updated _toctree\r\n\r\n---------\r\n\r\nCo-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>","shortMessageHtmlLink":"Added Reward Backpropogation Support (#1585)"}},
{"before":"932a78cdfd1b12062754683eea90043b1def4ed8","after":"39a0245df35fcfe1f49f3b733c9c14a1d7e12f2a","ref":"refs/heads/dataset-processor","pushedAt":"2024-06-24T16:03:37.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vwxyzjn","name":"Costa Huang","path":"/vwxyzjn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5555347?s=80&v=4"},"commit":{"message":"update docs for reward model","shortMessageHtmlLink":"update docs for reward model"}},
{"before":"65374c6a711709157ea59297dce43dfb458d1c78","after":"6e1652bc5e8ff6d348c7f06048f4102a050f1544","ref":"refs/heads/main","pushedAt":"2024-06-23T16:54:30.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"kashif","name":"Kashif Rasul","path":"/kashif","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8100?s=80&v=4"},"commit":{"message":"Add CPO-SimPO method (#1760)\n\n* enable cpo-simpo\r\n\r\n* highlight SimPO and CPO-SimPO\r\n\r\n* add test for cpo_alpha\r\n\r\n* formatting\r\n\r\n* Update docs/source/cpo_trainer.mdx\r\n\r\n---------\r\n\r\nCo-authored-by: Kashif Rasul ","shortMessageHtmlLink":"Add CPO-SimPO method (#1760)"}},
{"before":"57035934bcb0b64f8dc3fbd83fc1ec5740df71ce","after":null,"ref":"refs/heads/new-sentiment-descriptiveness-dataset","pushedAt":"2024-06-21T15:20:54.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"vwxyzjn","name":"Costa Huang","path":"/vwxyzjn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5555347?s=80&v=4"}},
{"before":"99560911123f739226b77813f27d5c90ed7f9ba2","after":"65374c6a711709157ea59297dce43dfb458d1c78","ref":"refs/heads/main","pushedAt":"2024-06-21T15:20:54.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"vwxyzjn","name":"Costa Huang","path":"/vwxyzjn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5555347?s=80&v=4"},"commit":{"message":"New sentiment and descriptiveness dataset (#1757)\n\n* push changes\r\n\r\n* handle edge cases where the chosen and the rejected are the same","shortMessageHtmlLink":"New sentiment and descriptiveness dataset (#1757)"}},
{"before":"07f3bf879aed081b847f195fb6d0035c3f8263b6","after":"932a78cdfd1b12062754683eea90043b1def4ed8","ref":"refs/heads/dataset-processor","pushedAt":"2024-06-21T15:17:18.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"vwxyzjn","name":"Costa Huang","path":"/vwxyzjn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5555347?s=80&v=4"},"commit":{"message":"refactor RM training","shortMessageHtmlLink":"refactor RM training"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEeaXQZwA","startCursor":null,"endCursor":null}},"title":"Activity · huggingface/trl"}