
ggml : fix iq4_nl dot product with odd number of blocks #8549

Merged: 2 commits merged into master from sl/fix-iqnl-odd-blocks on Jul 19, 2024

Conversation

@slaren (Collaborator) commented Jul 17, 2024

Ref: #8495

Runs the last block on the pure C implementation if there is an odd number of blocks. Only AVX and AVX2 were tested; this likely affects NEON and others as well.
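For readers unfamiliar with the kernel structure, here is a minimal standalone sketch of the approach (not the actual ggml code; the block layouts and names are simplifications, and the iq4_nl lookup table is omitted): the SIMD path consumes blocks two at a time, and any trailing odd block falls through to the scalar C path.

```cpp
#include <cstdint>
#include <cstddef>

// Placeholder block layouts standing in for block_iq4_nl / block_q8_0
// (illustrative only, not the real ggml structs).
struct block_a { float d; uint8_t qs[16]; };  // 32 packed 4-bit values + scale
struct block_b { float d; int8_t  qs[32]; };  // 32 int8 values + scale

// Scalar reference dot product for a single block pair. The real iq4_nl
// kernel additionally maps each 4-bit index through a nonlinear lookup
// table; plain nibble extraction is shown here for brevity.
static float dot_one_block(const block_a & x, const block_b & y) {
    int sumi = 0;
    for (int j = 0; j < 16; ++j) {
        sumi += (x.qs[j] & 0x0F) * y.qs[j];       // low nibbles
        sumi += (x.qs[j] >>   4) * y.qs[j + 16];  // high nibbles
    }
    return x.d * y.d * sumi;
}

float vec_dot(const block_a * x, const block_b * y, size_t nb) {
    float sumf = 0.0f;
    size_t ib = 0;

    // Main loop: two blocks per iteration. In the real kernels this is the
    // AVX/AVX2/NEON path; two scalar calls stand in for it here.
    for (; ib + 1 < nb; ib += 2) {
        sumf += dot_one_block(x[ib],     y[ib]);
        sumf += dot_one_block(x[ib + 1], y[ib + 1]);
    }

    // Tail: when nb is odd, the last block is handled by the pure C path,
    // which is the essence of the fix.
    for (; ib < nb; ++ib) {
        sumf += dot_one_block(x[ib], y[ib]);
    }
    return sumf;
}
```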

@github-actions bot added the labels "testing" (Everything test related) and "ggml" (changes relating to the ggml tensor library for machine learning) on Jul 17, 2024
@oldgithubman commented:

> Ref: #8495
>
> Runs the last block on the pure C implementation if there is an odd number of blocks. Only AVX and AVX2 were tested; this likely affects NEON and others as well.

Nice solution

@ggerganov (Owner) commented:

On M2 Ultra, test-backend-ops runs successfully.

Enabling the random tests causes many failures:

ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2 Ultra
ggml_metal_init: picking default device: Apple M2 Ultra
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name:   Apple M2 Ultra
ggml_metal_init: GPU family: MTLGPUFamilyApple8  (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 154618.82 MB
  MUL_MAT(type_a=q4_1,type_b=f32,m=43,n=92,k=288,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 0.105420734 > 0.000500000 FAIL
  MUL_MAT(type_a=q5_1,type_b=f32,m=12,n=24,k=480,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 11773.473165934 > 0.000500000 FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=4,n=38,k=480,bs=[1,1],nr=[1,1]): [MUL_MAT] NaN at index 148 (Metal=-5.410255 CPU=nan) FAIL
  MUL_MAT(type_a=q5_0,type_b=f32,m=69,n=86,k=96,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 12530281.861510953 > 0.000500000 FAIL
  MUL_MAT(type_a=q5_1,type_b=f32,m=75,n=113,k=96,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 65.765954396 > 0.000500000 FAIL
  MUL_MAT(type_a=q5_1,type_b=f32,m=74,n=114,k=480,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 3563.260783020 > 0.000500000 FAIL
  MUL_MAT(type_a=q5_0,type_b=f32,m=7,n=126,k=288,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 43.655039701 > 0.000500000 FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=20,n=115,k=480,bs=[1,1],nr=[1,1]): [MUL_MAT] NaN at index 19 (Metal=11.000481 CPU=nan) FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=56,n=38,k=352,bs=[1,1],nr=[1,1]): [MUL_MAT] NaN at index 55 (Metal=-16.780495 CPU=nan) FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=15,n=16,k=160,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 10617966.006262200 > 0.000500000 FAIL
  MUL_MAT(type_a=q5_0,type_b=f32,m=103,n=126,k=352,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 10811661558.555959702 > 0.000500000 FAIL
  MUL_MAT(type_a=q5_1,type_b=f32,m=122,n=67,k=352,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 1.668788946 > 0.000500000 FAIL
  MUL_MAT(type_a=q5_1,type_b=f32,m=21,n=19,k=352,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 7835.887396502 > 0.000500000 FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=77,n=120,k=352,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 137.825738287 > 0.000500000 FAIL
  MUL_MAT(type_a=q5_0,type_b=f32,m=31,n=102,k=160,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 0.192240393 > 0.000500000 FAIL
  MUL_MAT(type_a=q5_0,type_b=f32,m=127,n=42,k=288,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 1.022682184 > 0.000500000 FAIL
  MUL_MAT(type_a=q5_0,type_b=f32,m=121,n=70,k=416,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 0.436010703 > 0.000500000 FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=90,n=120,k=288,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 153120430276.364410400 > 0.000500000 FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=12,n=47,k=480,bs=[1,1],nr=[1,1]): [MUL_MAT] NaN at index 552 (Metal=13.171962 CPU=nan) FAIL
  MUL_MAT(type_a=q5_0,type_b=f32,m=51,n=127,k=224,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 162.877185397 > 0.000500000 FAIL
  MUL_MAT(type_a=q5_1,type_b=f32,m=97,n=24,k=352,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 20659224025105313792.000000000 > 0.000500000 FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=45,n=26,k=288,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 0.159211832 > 0.000500000 FAIL
  MUL_MAT(type_a=q5_0,type_b=f32,m=68,n=58,k=288,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 59558404564.251739502 > 0.000500000 FAIL
  MUL_MAT(type_a=q5_1,type_b=f32,m=4,n=112,k=160,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 35656903797.003738403 > 0.000500000 FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=70,n=124,k=416,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 0.864959230 > 0.000500000 FAIL
  MUL_MAT(type_a=q5_0,type_b=f32,m=108,n=49,k=224,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 21854.549477808 > 0.000500000 FAIL
  MUL_MAT(type_a=q5_1,type_b=f32,m=70,n=71,k=224,bs=[1,1],nr=[1,1]): [MUL_MAT] NMSE = 44287746.743597277 > 0.000500000 FAIL
GGML_ASSERT: ggml/src/ggml-metal.m:1790: ne00 >= nth0*nth1

Will look into those now
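For context, test-backend-ops compares each backend's output against the CPU reference using a normalized mean squared error and fails a case when it exceeds the 5e-4 threshold shown in the log. A minimal sketch of that comparison, assuming the conventional NMSE definition (the helper is illustrative, not the actual test code):

```cpp
#include <cstddef>

// Illustrative NMSE check: sum of squared differences between the backend
// output and the CPU reference, normalized by the energy of the reference.
static double nmse(const float * test, const float * ref, size_t n) {
    double err = 0.0, norm = 0.0;
    for (size_t i = 0; i < n; ++i) {
        const double d = (double) test[i] - (double) ref[i];
        err  += d * d;
        norm += (double) ref[i] * (double) ref[i];
    }
    return norm > 0.0 ? err / norm : err;
}

// A case such as MUL_MAT(m=12, n=24, ...) fails when
// nmse(metal_out, cpu_out, 12 * 24) > 0.0005, or when the CPU reference
// itself produces NaN (the "NaN at index ..." lines above).
```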

@JohannesGaessler (Collaborator) commented:

Does the intended scope of this PR include the random tests?

@slaren (Collaborator, Author) commented Jul 18, 2024

No, this PR does not add random tests. It adds some commented-out code to test-backend-ops that other people may find useful for testing this or other PRs, so I kept it there, disabled by default; it can be removed if preferred.
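As a rough, standalone illustration of what randomized shape coverage for this kind of bug can look like (this is not the commented-out code in test-backend-ops, just a sketch): pick random m and n, and choose k so that the per-row block count varies between even and odd values.

```cpp
#include <cstdio>
#include <random>

int main() {
    std::mt19937 rng(1234);                            // fixed seed for reproducible runs
    std::uniform_int_distribution<int> dim(1, 128);    // random m and n
    std::uniform_int_distribution<int> blocks(1, 16);  // number of 32-wide quant blocks in k

    for (int i = 0; i < 25; ++i) {
        const int m = dim(rng);
        const int n = dim(rng);
        const int k = 32 * blocks(rng);                // odd block counts exercise the scalar tail
        std::printf("MUL_MAT(type_a=iq4_nl, type_b=f32, m=%d, n=%d, k=%d)  blocks=%d\n",
                    m, n, k, k / 32);
    }
    return 0;
}
```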

@mofosyne added the label "Review Complexity: High" (generally requires in-depth knowledge of LLMs or GPUs) on Jul 19, 2024
* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix q4_1

* ggml : fix q5_0

* ggml : fix q5_1

* ggml : fix iq4_nl metal

ggml-ci

* ggml : fix q4_0

* ggml : fix q8_0

ggml-ci

* ggml : remove special Q4_0 code for first 2 blocks

* ggml : fix sumf redefinition

---------

Co-authored-by: slaren <[email protected]>
@slaren merged commit 87e397d into master on Jul 19, 2024
58 checks passed
@slaren deleted the sl/fix-iqnl-odd-blocks branch on July 19, 2024 at 15:17
arthw pushed a commit to arthw/llama.cpp that referenced this pull request on Jul 27, 2024
* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix odd blocks for ARM_NEON (ggerganov#8556)

* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix q4_1

* ggml : fix q5_0

* ggml : fix q5_1

* ggml : fix iq4_nl metal

ggml-ci

* ggml : fix q4_0

* ggml : fix q8_0

ggml-ci

* ggml : remove special Q4_0 code for first 2 blocks

* ggml : fix sumf redefinition

---------

Co-authored-by: slaren <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>