
Intx Quantization Tensor Class #468

Merged: 37 commits merged into pytorch:main on Aug 7, 2024

Conversation

@vayuda (Collaborator) commented Jul 2, 2024:

PR fulfilling #439

Benchmark results: (benchmark images)

Performance with dtypes whose bit widths aren't multiples of 2 is significantly worse, but that was to be expected without custom kernels.

pytorch-bot (bot) commented Jul 2, 2024:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/468

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1505bca with merge base de4a1fb:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Jul 2, 2024.
Review thread on torchao/_models/llama/eval.py (outdated, resolved)
@msaroufim (Member) commented:
  • @andrewor14 and @Hanxian97 who have been looking into intX for QAT. Do y'all mind giving this a quick review as well?

@jerryzh168 (Contributor) left a comment:

I think this is a good starting point. Next I think we could think more about kernels; the current path we hit is dequantize() and then the F.linear path, I think. Also, we can refactor IntxTensor to work with the native uint1 to uint7 dtypes as well.
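
For illustration, here is a minimal sketch of the fallback path described above (hypothetical names, not the code in this PR): a wrapper tensor subclass whose F.linear handling simply dequantizes the packed weight and calls the stock dense kernel.

import torch
import torch.nn.functional as F

class ToyQuantWeight(torch.Tensor):
    """Toy wrapper: stores a uint8 payload plus a scale, dequantizes on use."""

    @staticmethod
    def __new__(cls, payload, scale, shape):
        return torch.Tensor._make_wrapper_subclass(cls, shape, dtype=scale.dtype)

    def __init__(self, payload, scale, shape):
        self.payload = payload  # stand-in for the bit-packed shards
        self.scale = scale      # per-tensor scale, for simplicity

    def dequantize(self):
        # unpack/rescale back to a dense float tensor
        return self.payload.to(self.scale.dtype) * self.scale

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is F.linear:
            x, w, *rest = args
            # the fallback path: materialize the float weight, then use F.linear
            return F.linear(x, w.dequantize(), *rest, **kwargs)
        with torch._C.DisableTorchFunctionSubclass():
            return func(*args, **kwargs)

w = ToyQuantWeight(torch.randint(0, 4, (4, 8), dtype=torch.uint8),
                   torch.tensor(0.5), (4, 8))
y = F.linear(torch.randn(2, 8), w)  # dequantize + dense matmul, no custom kernel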

@vayuda (Collaborator, Author) commented Aug 7, 2024:

  • @andrewor14 and @Hanxian97 who have been looking into intX for QAT. Do y'all mind giving this a quick review as well?

Not an expert on QAT, but does this mean we would have to enable support for autograd?

Comment on lines 88 to 89
2 bit shard: [0b00100111, 0b00010001]
4 bit shard: [0b00000000, 0b01101001, 0b10010111, 0b00100101]
@jerryzh168 (Contributor) commented Aug 7, 2024:

So this seems to be interleaved packing, that is, packing two elements that are far apart together. Will packing adjacent elements together be more efficient because of data locality? I guess this might be covered by setting different pack_dim values, but I still want to see if we want to explicitly test this.

@vayuda (Collaborator, Author) replied:

Yea, it is interleaved. I think it's something worth exploring when making optimized kernels. The issue is that if you pack adjacent elements, then you have to perform interleaved shifting and bit-wise ORs. I'm not sure which is faster without using a memory/compute profiler.
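
For a concrete picture of the trade-off, here is a small sketch (illustrative only, not the torchao bitpacking code) that packs eight 2-bit values into two uint8 bytes both ways: the interleaved layout shifts whole contiguous slices, while the adjacent layout works on strided columns or per-element shifts.

import torch

vals = torch.tensor([0, 1, 2, 3, 3, 2, 1, 0], dtype=torch.uint8)  # eight 2-bit values

# Interleaved: byte j holds elements j, j+2, j+4, j+6, so each quarter of the
# input is shifted as one contiguous slice with a single shift amount.
interleaved = (vals[0:2]
               | (vals[2:4] << 2)
               | (vals[4:6] << 4)
               | (vals[6:8] << 6))

# Adjacent: byte j holds elements 4j..4j+3; neighbours share a byte (better
# locality when unpacking a few elements), but packing uses strided column views.
rows = vals.view(2, 4)
adjacent = rows[:, 0] | (rows[:, 1] << 2) | (rows[:, 2] << 4) | (rows[:, 3] << 6)

# both are 2-byte uint8 tensors; they only disagree on which element sits where
print(interleaved, adjacent)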

@andrewor14 (Contributor) left a comment:

Looks great! Mostly just minor comments and suggestions on testing

Review thread on torchao/prototype/intx/Intx.py (outdated, resolved)
shards = [shard.to(torch.uint8) for shard in shards]
self.shard = shards
for i, atrib in enumerate(self.bits_to_shard[bit_size]):
setattr(self, atrib, shards[i])
A reviewer (Contributor) commented:

should we assert len(shards) == len(bits_to_shard[self.bit_size])?

@vayuda (Collaborator, Author) replied:

Users can only pass in the data to be quantized and the element_size. The pack function (called by the constructor) will automatically create the correct number of shards based on the element size, so I don't really think it would ever break.
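
Concretely, the shard widths appear to follow the binary decomposition of the element size (for example, the 6-bit layout quoted earlier splits into a 2-bit shard and a 4-bit shard). A hypothetical helper, not the PR's pack function, that computes the shard widths:

def shard_bit_widths(element_size: int) -> list[int]:
    # one shard per set bit of the element size: 6 -> [2, 4], 5 -> [1, 4], 3 -> [1, 2]
    assert 1 <= element_size <= 7, "sub-byte element sizes only"
    return [1 << i for i in range(3) if element_size & (1 << i)]

print(shard_bit_widths(6))  # [2, 4], matching the 2-bit and 4-bit shards above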

Review thread on torchao/prototype/intx/Intx.py (outdated, resolved)
Three review threads on torchao/prototype/intx/bitpacking.py (outdated, resolved)

aten = torch.ops.aten

class UintxTensor(torch.Tensor):
@jerryzh168 (Contributor) commented Aug 7, 2024:

nit: UIntx feels better I think, same for the filename

@vayuda (Collaborator, Author) replied:

I was thinking about that, but I felt it looks too much like UI (user interface).

@jerryzh168 (Contributor) replied:

Oh OK, I was mainly concerned about the naming convention, but Uintx seems fine for the naming convention as well (assuming it means the uintx dtype with a capitalized U), although it looks a bit weird.

setattr(self, attrib, shards[i])

self.packed_shape = packed_shape
self.bit_size = bit_size
A reviewer (Contributor) commented:

Also, I feel it would be helpful to add an assert for the accepted bit_size values. Also, maybe rename this to bit_width? That feels more common.

@vayuda (Collaborator, Author) replied:

What would the assert be checking?

@jerryzh168 (Contributor) replied Aug 7, 2024:

It should be one of 1 to 7, right?
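
For reference, the suggested guard could be as simple as the following (hypothetical placement in the constructor, not necessarily the merged code):

# reject unsupported sub-byte widths up front
assert 1 <= bit_size <= 7, f"bit_size must be between 1 and 7, got {bit_size}"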

@jerryzh168 (Contributor) commented:

Merging now since the remaining comments are minor nits.

@jerryzh168 merged commit 87869f2 into pytorch:main on Aug 7, 2024
13 checks passed
@vayuda deleted the intx branch on August 8, 2024 at 16:42
Labels: CLA Signed
6 participants