Releases: huggingface/optimum-habana

v1.13.2: Patch release

06 Sep 20:17

Llava(-next) improvements

This patch release adds multi-card support for Llava(-next) and lets users turn recomputation for flash attention on or off; see the sketch after the list below.

  • Llava: added a flash_attention_recompute arg to enable/disable recompute #1278 @tthakkal
  • Add the DeepSpeed injection_policy for Mistral #1309 @yuanwu2017
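
For context, a minimal sketch of how the new flag can be passed at generation time, assuming the flash-attention kwargs are forwarded through `generate` as in optimum-habana's other examples; the checkpoint and image URL are placeholders:

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

# Swap in the Gaudi-optimized model implementations.
adapt_transformers_to_gaudi()

model_id = "llava-hf/llava-1.5-7b-hf"  # placeholder checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("hpu")

image = Image.open(requests.get("https://example.com/cat.png", stream=True).raw)  # placeholder
inputs = processor(
    text="USER: <image>\nWhat is shown in this image? ASSISTANT:",
    images=image,
    return_tensors="pt",
).to("hpu")

output = model.generate(
    **inputs,
    max_new_tokens=64,
    use_flash_attention=True,        # assumed kwarg, as in optimum-habana's generation examples
    flash_attention_recompute=True,  # the new toggle from #1278; False trades HPU memory for speed
)
print(processor.decode(output[0], skip_special_tokens=True))
```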

Full Changelog: v1.13.1...v1.13.2

v1.13.1: Patch release

25 Aug 13:34

Fixed memory regressions

  • Remove _expand_inputs_for_generation for greedy search (#1266) @libinta
  • Fix memory regression for modeling llama (#1271) @libinta

FSDP

FSDP checkpoint saving is fixed.

Known limitations

  • ESMFold does not work on Gaudi1; this will be fixed in a future version

Full Changelog: v1.13.0...v1.13.1

v1.13.0: Stable Diffusion 3, Sentence Transformers, SAM, DETR, Kubernetes example

16 Aug 14:25

SynapseAI 1.17

  • Upgrade SynapseAI version to 1.17.0 #1217

Transformers 4.43

Diffusers 0.29

  • Upgrade optimum-habana diffusers dependency from 0.26.3 to 0.29.2 #1150 @dsocek

Stable Diffusion 3

Training with Sentence Transformers

Model optimizations

SAM, FastViT, VideoMAE, OpenCLIP, DETR, Table Transformer, DeciLM

Stable Diffusion inpainting, unconditional image generation

  • Add Stable Diffusion inpainting support #869 @yuanwu2017
  • Enable unconditional image generation on Gaudi2 [Diffuser/Tasks] #859 @cfgfung
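
For illustration, a minimal inpainting sketch, assuming the new pipeline follows the Gaudi* naming and constructor arguments used elsewhere in optimum.habana.diffusers; the checkpoint and image URLs are placeholders:

```python
from diffusers.utils import load_image
from optimum.habana.diffusers import GaudiStableDiffusionInpaintPipeline  # assumed class name

pipe = GaudiStableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",  # placeholder checkpoint
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)

init_image = load_image("https://example.com/photo.png")  # placeholder
mask_image = load_image("https://example.com/mask.png")   # placeholder; white = region to repaint
result = pipe(
    prompt="a white cat sitting on a park bench",
    image=init_image,
    mask_image=mask_image,
)
result.images[0].save("inpainted.png")
```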

Text feature extraction example

Tensor parallelism

  • Tensor parallel distributed strategy without using deepspeed #1121 @kalyanjk
  • Disable torch.compile for all_reduce when parallel_strategy is set to "tp" #1174 @kalyanjk

Kubernetes cluster example

  • Add a Helm chart, Dockerfile, and instructions for running examples on a Kubernetes cluster #1099 @dmsuehir
  • Fix PyTorch version in the Kubernetes docker-compose to match image #1246 @dmsuehir

FP8 training

Other

Known limitations

  • For Llama, some large batch sizes that previously worked now lead to out-of-memory errors

v1.12.1: Patch release

11 Jul 13:51

Fix first-token latency measurement

Fix for Mixtral

Other

  • Fix for selective seq length test with batch size 1 #1110 @libinta

Full Changelog: v1.12.0...v1.12.1

v1.12: Qwen2, Gemma, SVD, Dreambooth, speculative sampling

22 Jun 18:28

SynapseAI v1.16

Transformers 4.40

Speculative Sampling

Model optimizations

Stable Video Diffusion

PEFT

TRL

Object Segmentation Example

  • Add an example of object segmentation (CLIPSeg) #801 @cfgfung
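
Roughly what the example does, in minimal form; the CLIPSeg classes come from stock transformers, and running them on the hpu device is the Gaudi-specific part (the image URL is a placeholder):

```python
import habana_frameworks.torch  # noqa: F401  # registers the "hpu" device
import requests
import torch
from PIL import Image
from transformers import CLIPSegForImageSegmentation, CLIPSegProcessor

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined").to("hpu")

image = Image.open(requests.get("https://example.com/cats.png", stream=True).raw)  # placeholder
prompts = ["a cat", "a remote control"]
inputs = processor(
    text=prompts, images=[image] * len(prompts), padding=True, return_tensors="pt"
).to("hpu")

with torch.no_grad():
    outputs = model(**inputs)
masks = torch.sigmoid(outputs.logits)  # one low-resolution segmentation map per prompt
```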

Dreambooth

  • Diffusers DreamBooth full/LoRA/LoKr/LoHa/OFT fine-tuning and DreamBooth XL LoRA fine-tuning #881 @sywangyi

Others

v1.11.1: Patch release

20 Apr 05:28

Llama3 has been validated on Gaudi

Fix issue with pytest

The latest SynapseAI Docker images come with pytest v8 preinstalled, which is incompatible with the Transformers library and leads to errors in a few non-test cases. As a temporary workaround, pytest is pinned and made a hard dependency.

Other

Full Changelog: v1.11.0...v1.11.1

v1.11: SDXL fine-tuning, Whisper, Phi, ControlNet

04 Apr 14:55

SynapseAI v1.15

The codebase is fully validated for the latest version of Habana SDK, SynapseAI v1.15.0.

SDXL fine-tuning

Whisper

Phi

ControlNet

Transformers v4.38

The codebase is fully validated for Transformers v4.38.

Model optimizations

Image-to-text and VQA examples

  • Add image-to-text and visual question answering example #738 @sywangyi
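
A condensed sketch of the pipeline flow in the new example, assuming the model implementations are adapted to Gaudi before the pipeline is built; the checkpoint and image URL are placeholders:

```python
import torch
from transformers import pipeline
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

# Replace supported model implementations with their Gaudi-optimized versions.
adapt_transformers_to_gaudi()

captioner = pipeline(
    "image-to-text",
    model="Salesforce/blip-image-captioning-base",  # placeholder checkpoint
    torch_dtype=torch.bfloat16,
    device="hpu",
)
print(captioner("https://example.com/photo.png"))  # placeholder image URL
```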

torch.compile

Bug fixes

Others

Known issue

v1.10.4: Patch release

23 Feb 03:26

Fix Llama memory issue with DeepSpeed ZeRO-3

  • Fix Llama initialization #712

Full Changelog: v1.10.2...v1.10.4

v1.10.2: Patch release

18 Feb 02:23

Upgrade to Transformers v4.37

  • Upgrade to Transformers 4.37 #651

Full Changelog: v1.10.0...v1.10.2

v1.10: SDXL, Textual-Inversion, TRL, SynapseAI v1.14

30 Jan 21:50

SynapseAI v1.14

The codebase is fully validated for the latest version of Habana SDK, SynapseAI v1.14.0.

Stable Diffusion XL

SDXL is now supported and optimized for Gaudi.
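
For example, a minimal text-to-image sketch with the new pipeline; the constructor flags follow the existing Gaudi Stable Diffusion API:

```python
from optimum.habana.diffusers import GaudiStableDiffusionXLPipeline

pipe = GaudiStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    use_habana=True,
    use_hpu_graphs=True,  # capture HPU graphs to speed up repeated calls
    gaudi_config="Habana/stable-diffusion",
)
image = pipe(prompt="an astronaut riding a green horse").images[0]
image.save("sdxl.png")
```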

Textual inversion fine-tuning

An example of textual-inversion fine-tuning has been added.

TRL

The 🤗 TRL library is now supported on Gaudi for performing DPO and SFT; a short sketch follows the list below.

  • Add TRL DPO and SFT support on Gaudi, with examples #601
  • Restructure example/trl/stack_llama_2 for generic DPO #635 @libinta
  • Add TRL DPO to README.md #652 @libinta
  • Add a seed in DPO to make training results reproducible #646 @sywangyi
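
A rough SFT sketch, assuming the Gaudi wrappers mirror TRL's SFTTrainer API as in the new examples; the model name, dataset, and column name are placeholders:

```python
from datasets import load_dataset
from optimum.habana import GaudiConfig, GaudiTrainingArguments
from optimum.habana.trl import GaudiSFTTrainer  # assumed import path, mirroring examples/trl

dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")  # placeholder dataset

args = GaudiTrainingArguments(
    output_dir="./sft-output",  # placeholder
    use_habana=True,
    use_lazy_mode=True,
    bf16=True,
)
trainer = GaudiSFTTrainer(
    model="meta-llama/Llama-2-7b-hf",  # placeholder checkpoint
    args=args,
    gaudi_config=GaudiConfig(use_fused_adam=True, use_fused_clip_norm=True),
    train_dataset=dataset,
    dataset_text_field="text",  # placeholder column name
    max_seq_length=512,
)
trainer.train()
```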

Full bf16 evaluation

Full bf16 evaluation inside the trainer can now be performed, as in Transformers; see the sketch below.
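
Concretely, this is the standard bf16_full_eval flag from Transformers' TrainingArguments, now usable with the Gaudi variant (the output path is a placeholder):

```python
from optimum.habana import GaudiTrainingArguments

args = GaudiTrainingArguments(
    output_dir="./eval-output",  # placeholder
    use_habana=True,
    use_lazy_mode=True,
    bf16_full_eval=True,  # run the whole evaluation loop in bf16
)
```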

Text-generation pipeline

A text-generation pipeline fully optimized for Gaudi has been added.

Model optimizations

TGI

TGI on Gaudi has been moved to a dedicated repo: https://github.com/huggingface/tgi-gaudi

Various fixes

Others