ggml : add Q5_0 and Q5_1 quantization #1187

Merged · 10 commits merged into master from q5_0 on Apr 26, 2023
Conversation

ggerganov (Owner) commented Apr 26, 2023

Follow-up to the idea by @ikawrakow in #729 (comment).

Q5_0

#define QK5_0 32
typedef struct {
    ggml_fp16_t d;          // delta
    uint8_t qh[4];          // 5-th bit of quants (uint32_t)
    uint8_t qs[QK5_0 / 2];  // nibbles / quants
} block_q5_0;

On M1 Pro, it evaluates at about 53 ms / token for the 7B model.
This format is bigger than Q4_0 and Q4_2.
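
For reference, a minimal scalar sketch of how one block might be dequantized. The packing is an assumption read off the struct above: low nibbles hold quants 0..15, high nibbles hold quants 16..31, and bit j of qh carries the 5th bit of quant j; GGML_FP16_TO_FP32 is ggml's fp16 conversion, and memcpy needs <string.h>.

static void dequantize_block_q5_0(const block_q5_0 * b, float * y) {
    const float d = GGML_FP16_TO_FP32(b->d);

    uint32_t qh;
    memcpy(&qh, b->qh, sizeof(qh)); // 32 "5th" bits, one per quant

    for (int j = 0; j < QK5_0/2; ++j) {
        // move the 5th bit of each quant into bit position 4
        const uint8_t xh_0 = ((qh >> j) << 4) & 0x10;
        const uint8_t xh_1 = ( qh >> (j + 12)) & 0x10;

        // 5-bit quants in [0, 31], re-centered to [-16, 15]
        const int32_t x0 = ((b->qs[j] & 0x0F) | xh_0) - 16;
        const int32_t x1 = ((b->qs[j] >>   4) | xh_1) - 16;

        y[j          ] = x0*d;
        y[j + QK5_0/2] = x1*d;
    }
}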

Perplexity for 7B: 6.0139

main: seed = 1682523351
llama.cpp: loading model from ../models/7B/ggml-model-q5_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 8 (mostly Q5_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4525003.11 KB
llama_model_load_internal: mem required  = 6210.95 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 12 / 64 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
perplexity : calculating perplexity over 655 chunks, batch_size=512
1.90 seconds per pass - ETA 20 minutes
[1]4.2484,[2]4.7547,[3]5.6316,[4]6.2345,[5]6.3575,[6]6.3361,[7]6.5288,[8]6.6259,[9]6.9688,[10]7.2116,[11]7.4185,[12]7.4477,[13]7.3615,[14]7.4028,[15]7.6442,[16]7.2662,[17]7.1544,[18]7.1013,[19]6.7447,[20]6.7341,[21]6.6423,[22]6.4727,[23]6.4417,[24]6.3490,[25]6.3524,[26]6.1948,[27]6.0225,[28]5.9267,[29]5.8384,[30]5.6840,[31]5.6553,[32]5.6751,[33]5.6167,[34]5.6509,[35]5.6727,[36]5.7108,[37]5.7162,[38]5.7249,[39]5.7569,[40]5.8063,[41]5.8181,[42]5.8563,[43]5.8172,[44]5.8757,[45]5.8774,[46]5.8511,[47]5.8708,[48]5.8464,[49]5.8466,[50]5.8085,[51]5.8044,[52]5.7966,[53]5.8401,[54]5.8246,[55]5.8020,[56]5.8319,[57]5.8511,[58]5.8723,[59]5.8888,[60]5.9307,[61]5.9239,[62]5.9817,[63]6.0119,[64]6.0250,[65]6.0682,[66]6.0762,[67]6.0930,[68]6.1060,[69]6.1291,[70]6.1593,[71]6.1827,[72]6.2131,[73]6.2709,[74]6.2743,[75]6.2872,[76]6.2990,[77]6.3104,[78]6.2966,[79]6.3240,[80]6.3168,[81]6.3290,[82]6.3332,[83]6.2818,[84]6.2637,[85]6.2520,[86]6.2305,[87]6.1659,[88]6.1399,[89]6.1208,[90]6.1060,[91]6.1289,[92]6.1233,[93]6.1227,[94]6.1208,[95]6.1480,[96]6.1481,[97]6.1437,[98]6.1379,[99]6.1246,[100]6.1235,[101]6.1472,[102]6.1415,[103]6.1609,[104]6.1672,[105]6.1674,[106]6.1840,[107]6.1833,[108]6.1953,[109]6.1898,[110]6.1861,[111]6.2078,[112]6.2281,[113]6.2300,[114]6.2262,[115]6.2329,[116]6.2228,[117]6.2279,[118]6.2566,[119]6.2776,[120]6.3119,[121]6.3264,[122]6.3510,[123]6.3874,[124]6.4045,[125]6.3951,[126]6.4335,[127]6.4703,[128]6.5000,[129]6.4848,[130]6.4939,[131]6.4899,[132]6.4836,[133]6.4700,[134]6.4798,[135]6.4761,[136]6.4651,[137]6.4581,[138]6.4401,[139]6.4302,[140]6.4270,[141]6.3973,[142]6.3939,[143]6.3640,[144]6.3438,[145]6.3341,[146]6.3221,[147]6.3254,[148]6.3252,[149]6.3196,[150]6.3152,[151]6.3173,[152]6.3077,[153]6.2910,[154]6.2824,[155]6.2890,[156]6.2838,[157]6.3001,[158]6.3039,[159]6.3088,[160]6.3113,[161]6.3228,[162]6.2941,[163]6.2831,[164]6.2598,[165]6.2292,[166]6.2024,[167]6.1659,[168]6.1355,[169]6.1222,[170]6.1113,[171]6.0850,[172]6.0680,[173]6.0515,[174]6.0219,[175]6.0007,[176]5.9895,[177]5.9700,[178]5.9476,[179]5.9303,[180]5.9207,[181]5.8998,[182]5.8821,[183]5.8682,[184]5.8678,[185]5.8605,[186]5.8607,[187]5.8668,[188]5.8631,[189]5.8800,[190]5.8808,[191]5.9013,[192]5.9171,[193]5.9332,[194]5.9440,[195]5.9652,[196]5.9808,[197]6.0014,[198]6.0161,[199]6.0190,[200]6.0240,[201]6.0190,[202]6.0373,[203]6.0446,[204]6.0430,[205]6.0534,[206]6.0602,[207]6.0560,[208]6.0648,[209]6.0689,[210]6.0739,[211]6.0842,[212]6.0916,[213]6.1022,[214]6.1043,[215]6.1072,[216]6.1210,[217]6.1388,[218]6.1515,[219]6.1514,[220]6.1479,[221]6.1431,[222]6.1408,[223]6.1310,[224]6.1242,[225]6.1201,[226]6.1407,[227]6.1492,[228]6.1545,[229]6.1608,[230]6.1582,[231]6.1744,[232]6.1626,[233]6.1464,[234]6.1317,[235]6.1126,[236]6.1058,[237]6.0962,[238]6.0987,[239]6.0844,[240]6.0742,[241]6.0768,[242]6.0802,[243]6.0784,[244]6.0674,[245]6.0641,[246]6.0532,[247]6.0416,[248]6.0345,[249]6.0322,[250]6.0368,[251]6.0298,[252]6.0264,[253]6.0170,[254]6.0116,[255]6.0000,[256]5.9825,[257]5.9702,[258]5.9622,[259]5.9603,[260]5.9523,[261]5.9478,[262]5.9425,[263]5.9367,[264]5.9148,[265]5.9142,[266]5.9126,[267]5.9060,[268]5.9154,[269]5.9131,[270]5.9141,[271]5.9219,[272]5.9253,[273]5.9252,[274]5.9277,[275]5.9362,[276]5.9423,[277]5.9579,[278]5.9679,[279]5.9771,[280]5.9799,[281]5.9897,[282]5.9957,[283]6.0103,[284]6.0183,[285]6.0268,[286]6.0399,[287]6.0392,[288]6.0455,[289]6.0373,[290]6.0221,[291]6.0074,[292]5.9927,[293]5.9794,[294]5.9817,[295]5.9804,[296]5.9848,[297]5.9835,[298]5.9864,[299]5.9839,[300]5.9735,[301]5.9736,[302]5.9659,[303]5.9570,[304]5.9485,[305]5.9450,[30
6]5.9325,[307]5.9347,[308]5.9381,[309]5.9224,[310]5.9169,[311]5.9105,[312]5.9130,[313]5.9078,[314]5.9061,[315]5.8907,[316]5.8854,[317]5.8697,[318]5.8496,[319]5.8614,[320]5.8732,[321]5.8779,[322]5.8741,[323]5.8675,[324]5.8646,[325]5.8744,[326]5.8745,[327]5.8768,[328]5.8806,[329]5.8864,[330]5.8888,[331]5.9009,[332]5.8980,[333]5.9046,[334]5.8992,[335]5.8932,[336]5.8969,[337]5.8943,[338]5.8933,[339]5.8882,[340]5.8840,[341]5.8921,[342]5.8947,[343]5.8994,[344]5.8996,[345]5.9001,[346]5.8977,[347]5.9012,[348]5.9044,[349]5.9067,[350]5.9034,[351]5.9042,[352]5.9046,[353]5.8989,[354]5.8991,[355]5.9041,[356]5.9069,[357]5.9036,[358]5.9126,[359]5.9150,[360]5.9116,[361]5.9112,[362]5.9180,[363]5.9290,[364]5.9354,[365]5.9405,[366]5.9415,[367]5.9496,[368]5.9472,[369]5.9480,[370]5.9495,[371]5.9441,[372]5.9489,[373]5.9536,[374]5.9518,[375]5.9520,[376]5.9588,[377]5.9543,[378]5.9570,[379]5.9628,[380]5.9551,[381]5.9519,[382]5.9471,[383]5.9465,[384]5.9459,[385]5.9449,[386]5.9444,[387]5.9443,[388]5.9407,[389]5.9354,[390]5.9286,[391]5.9210,[392]5.9171,[393]5.9157,[394]5.9183,[395]5.9171,[396]5.9099,[397]5.9167,[398]5.9206,[399]5.9285,[400]5.9288,[401]5.9302,[402]5.9312,[403]5.9331,[404]5.9394,[405]5.9302,[406]5.9272,[407]5.9267,[408]5.9283,[409]5.9398,[410]5.9506,[411]5.9615,[412]5.9771,[413]5.9875,[414]5.9950,[415]6.0003,[416]6.0078,[417]6.0197,[418]6.0234,[419]6.0302,[420]6.0391,[421]6.0505,[422]6.0545,[423]6.0617,[424]6.0719,[425]6.0805,[426]6.0869,[427]6.0912,[428]6.0997,[429]6.1048,[430]6.1131,[431]6.1270,[432]6.1308,[433]6.1302,[434]6.1262,[435]6.1271,[436]6.1297,[437]6.1392,[438]6.1467,[439]6.1436,[440]6.1426,[441]6.1377,[442]6.1362,[443]6.1377,[444]6.1378,[445]6.1361,[446]6.1386,[447]6.1417,[448]6.1458,[449]6.1432,[450]6.1442,[451]6.1402,[452]6.1267,[453]6.1184,[454]6.1129,[455]6.1138,[456]6.1184,[457]6.1204,[458]6.1181,[459]6.1187,[460]6.1272,[461]6.1247,[462]6.1232,[463]6.1274,[464]6.1262,[465]6.1234,[466]6.1157,[467]6.1158,[468]6.1155,[469]6.1175,[470]6.1179,[471]6.1131,[472]6.1174,[473]6.1121,[474]6.1135,[475]6.1075,[476]6.1092,[477]6.1020,[478]6.1010,[479]6.1070,[480]6.1113,[481]6.1133,[482]6.1088,[483]6.1046,[484]6.1065,[485]6.1049,[486]6.0994,[487]6.0992,[488]6.0971,[489]6.0926,[490]6.0904,[491]6.0875,[492]6.0820,[493]6.0792,[494]6.0777,[495]6.0774,[496]6.0738,[497]6.0683,[498]6.0665,[499]6.0624,[500]6.0532,[501]6.0467,[502]6.0470,[503]6.0463,[504]6.0378,[505]6.0400,[506]6.0406,[507]6.0350,[508]6.0310,[509]6.0304,[510]6.0339,[511]6.0384,[512]6.0419,[513]6.0439,[514]6.0502,[515]6.0448,[516]6.0439,[517]6.0446,[518]6.0445,[519]6.0473,[520]6.0499,[521]6.0512,[522]6.0538,[523]6.0544,[524]6.0600,[525]6.0632,[526]6.0643,[527]6.0661,[528]6.0610,[529]6.0615,[530]6.0564,[531]6.0551,[532]6.0596,[533]6.0619,[534]6.0606,[535]6.0629,[536]6.0575,[537]6.0554,[538]6.0603,[539]6.0614,[540]6.0651,[541]6.0654,[542]6.0666,[543]6.0681,[544]6.0691,[545]6.0673,[546]6.0682,[547]6.0641,[548]6.0594,[549]6.0594,[550]6.0565,[551]6.0532,[552]6.0513,[553]6.0478,[554]6.0457,[555]6.0427,[556]6.0423,[557]6.0445,[558]6.0410,[559]6.0407,[560]6.0405,[561]6.0408,[562]6.0384,[563]6.0380,[564]6.0423,[565]6.0443,[566]6.0443,[567]6.0423,[568]6.0427,[569]6.0415,[570]6.0443,[571]6.0446,[572]6.0457,[573]6.0458,[574]6.0425,[575]6.0419,[576]6.0417,[577]6.0402,[578]6.0383,[579]6.0389,[580]6.0326,[581]6.0291,[582]6.0280,[583]6.0288,[584]6.0291,[585]6.0216,[586]6.0149,[587]6.0154,[588]6.0202,[589]6.0255,[590]6.0285,[591]6.0305,[592]6.0295,[593]6.0265,[594]6.0274,[595]6.0252,[596]6.0284,[597]6.0265,[598]6.0236,[599]6.0257,[600]6.0253,[601]6.0240,[602]6
.0252,[603]6.0281,[604]6.0290,[605]6.0323,[606]6.0345,[607]6.0328,[608]6.0295,[609]6.0304,[610]6.0339,[611]6.0321,[612]6.0346,[613]6.0311,[614]6.0262,[615]6.0191,[616]6.0218,[617]6.0160,[618]6.0112,[619]6.0058,[620]5.9924,[621]5.9857,[622]5.9840,[623]5.9856,[624]5.9860,[625]5.9861,[626]5.9850,[627]5.9872,[628]5.9874,[629]5.9869,[630]5.9900,[631]5.9954,[632]6.0009,[633]5.9995,[634]6.0030,[635]6.0036,[636]6.0002,[637]5.9969,[638]5.9994,[639]5.9963,[640]5.9972,[641]5.9975,[642]6.0040,[643]6.0063,[644]6.0075,[645]6.0056,[646]6.0096,[647]6.0059,[648]6.0067,[649]6.0068,[650]6.0106,[651]6.0159,[652]6.0168,[653]6.0206,[654]6.0144,[655]6.0139,
llama_print_timings:        load time =  4416.87 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 1163995.97 ms / 335360 tokens (    3.47 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 1198461.81 ms

Q5_1

#define QK5_1 32
typedef struct {
    ggml_fp16_t d;          // delta
    ggml_fp16_t m;          // min
    uint32_t qh;            // 5-th bit of quants
    uint8_t qs[QK5_1 / 2];  // nibbles / quants
} block_q5_1;

This format is the same size as Q4_1 and Q4_3.
On M1 Pro, it evaluates at about 55 ms / token for the 7B model.
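
A matching scalar sketch for this format, under the same packing assumptions as the Q5_0 sketch above; here the quants stay unsigned and the per-block minimum m is added back:

static void dequantize_block_q5_1(const block_q5_1 * b, float * y) {
    const float d = GGML_FP16_TO_FP32(b->d);
    const float m = GGML_FP16_TO_FP32(b->m);

    const uint32_t qh = b->qh; // the 5th bits are a plain uint32_t here

    for (int j = 0; j < QK5_1/2; ++j) {
        const uint8_t xh_0 = ((qh >> j) << 4) & 0x10;
        const uint8_t xh_1 = ( qh >> (j + 12)) & 0x10;

        // unsigned 5-bit quants in [0, 31]
        const int32_t x0 = (b->qs[j] & 0x0F) | xh_0;
        const int32_t x1 = (b->qs[j] >>   4) | xh_1;

        y[j          ] = x0*d + m;
        y[j + QK5_1/2] = x1*d + m;
    }
}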

The AVX implementation might make use of the following trick: https://stackoverflow.com/a/24242696
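
A sketch of what that trick could look like with AVX2 (the function name is illustrative; the constants follow the linked answer): broadcast the 32 bits, replicate byte k into bytes 8k..8k+7 with _mm256_shuffle_epi8, then use a compare to blow each bit up into a full 0x00/0xFF byte:

#include <immintrin.h>
#include <stdint.h>
#include <string.h>

// expand 32 packed bits into 32 bytes of 0x00 / 0xFF
static inline __m256i bytes_from_bits_32(const uint8_t * x) {
    uint32_t x32;
    memcpy(&x32, x, sizeof(x32));

    // byte k of x32 -> bytes 8k..8k+7
    const __m256i shuf = _mm256_set_epi64x(
        0x0303030303030303, 0x0202020202020202,
        0x0101010101010101, 0x0000000000000000);
    __m256i bytes = _mm256_shuffle_epi8(_mm256_set1_epi32(x32), shuf);

    // byte j of each 8-byte group carries bit j; set every other bit,
    // so the compare yields 0xFF exactly when bit j was set
    const __m256i bit_mask = _mm256_set1_epi64x(0x7fbfdfeff7fbfdfe);
    bytes = _mm256_or_si256(bytes, bit_mask);
    return _mm256_cmpeq_epi8(bytes, _mm256_set1_epi64x(-1));
}

The resulting byte mask can then be ANDed with 0x10 and ORed onto the unpacked nibbles.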

Perplexity for 7B: 5.9934

main: seed = 1682491079
llama.cpp: loading model from ../models/7B/ggml-model-q5_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 8 (mostly Q5_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4936267.11 KB
llama_model_load_internal: mem required  = 6612.57 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 12 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
perplexity : calculating perplexity over 655 chunks, batch_size=512
4.47 seconds per pass - ETA 48 minutes
[1]4.2726,[2]4.7565,[3]5.6331,[4]6.2042,[5]6.3451,[6]6.3059,[7]6.4909,[8]6.5871,[9]6.9243,[10]7.1597,[11]7.3774,[12]7.4015,[13]7.3209,[14]7.3676,[15]7.6199,[16]7.2420,[17]7.1286,[18]7.0729,[19]6.7181,[20]6.7082,[21]6.6191,[22]6.4438,[23]6.4184,[24]6.3280,[25]6.3274,[26]6.1686,[27]5.9965,[28]5.8979,[29]5.8120,[30]5.6595,[31]5.6332,[32]5.6517,[33]5.5956,[34]5.6265,[35]5.6486,[36]5.6873,[37]5.6899,[38]5.7015,[39]5.7330,[40]5.7819,[41]5.7887,[42]5.8273,[43]5.7886,[44]5.8450,[45]5.8481,[46]5.8224,[47]5.8428,[48]5.8164,[49]5.8186,[50]5.7792,[51]5.7755,[52]5.7657,[53]5.8109,[54]5.7964,[55]5.7747,[56]5.8023,[57]5.8232,[58]5.8428,[59]5.8607,[60]5.9020,[61]5.8953,[62]5.9527,[63]5.9840,[64]5.9978,[65]6.0403,[66]6.0480,[67]6.0658,[68]6.0795,[69]6.1037,[70]6.1335,[71]6.1559,[72]6.1870,[73]6.2448,[74]6.2483,[75]6.2627,[76]6.2745,[77]6.2864,[78]6.2724,[79]6.3003,[80]6.2942,[81]6.3078,[82]6.3123,[83]6.2612,[84]6.2434,[85]6.2310,[86]6.2091,[87]6.1446,[88]6.1200,[89]6.1001,[90]6.0861,[91]6.1102,[92]6.1045,[93]6.1043,[94]6.1014,[95]6.1292,[96]6.1288,[97]6.1234,[98]6.1171,[99]6.1039,[100]6.1026,[101]6.1260,[102]6.1220,[103]6.1422,[104]6.1490,[105]6.1488,[106]6.1662,[107]6.1657,[108]6.1787,[109]6.1732,[110]6.1700,[111]6.1917,[112]6.2121,[113]6.2146,[114]6.2101,[115]6.2159,[116]6.2056,[117]6.2103,[118]6.2398,[119]6.2614,[120]6.2956,[121]6.3101,[122]6.3337,[123]6.3701,[124]6.3873,[125]6.3786,[126]6.4164,[127]6.4521,[128]6.4821,[129]6.4672,[130]6.4757,[131]6.4718,[132]6.4630,[133]6.4508,[134]6.4598,[135]6.4561,[136]6.4451,[137]6.4376,[138]6.4205,[139]6.4098,[140]6.4064,[141]6.3775,[142]6.3740,[143]6.3440,[144]6.3233,[145]6.3139,[146]6.3020,[147]6.3048,[148]6.3045,[149]6.2989,[150]6.2941,[151]6.2961,[152]6.2859,[153]6.2701,[154]6.2611,[155]6.2679,[156]6.2632,[157]6.2792,[158]6.2835,[159]6.2884,[160]6.2909,[161]6.3036,[162]6.2761,[163]6.2647,[164]6.2420,[165]6.2117,[166]6.1852,[167]6.1488,[168]6.1189,[169]6.1056,[170]6.0951,[171]6.0693,[172]6.0527,[173]6.0368,[174]6.0077,[175]5.9864,[176]5.9749,[177]5.9553,[178]5.9332,[179]5.9165,[180]5.9070,[181]5.8855,[182]5.8680,[183]5.8547,[184]5.8541,[185]5.8471,[186]5.8478,[187]5.8534,[188]5.8494,[189]5.8663,[190]5.8672,[191]5.8874,[192]5.9032,[193]5.9191,[194]5.9298,[195]5.9514,[196]5.9668,[197]5.9877,[198]6.0027,[199]6.0056,[200]6.0104,[201]6.0051,[202]6.0232,[203]6.0304,[204]6.0287,[205]6.0390,[206]6.0462,[207]6.0426,[208]6.0506,[209]6.0543,[210]6.0596,[211]6.0700,[212]6.0769,[213]6.0873,[214]6.0898,[215]6.0925,[216]6.1063,[217]6.1243,[218]6.1372,[219]6.1368,[220]6.1330,[221]6.1274,[222]6.1253,[223]6.1157,[224]6.1089,[225]6.1052,[226]6.1252,[227]6.1332,[228]6.1387,[229]6.1447,[230]6.1416,[231]6.1583,[232]6.1464,[233]6.1301,[234]6.1153,[235]6.0955,[236]6.0891,[237]6.0797,[238]6.0823,[239]6.0676,[240]6.0576,[241]6.0593,[242]6.0630,[243]6.0612,[244]6.0501,[245]6.0469,[246]6.0357,[247]6.0245,[248]6.0174,[249]6.0149,[250]6.0194,[251]6.0127,[252]6.0091,[253]5.9995,[254]5.9941,[255]5.9830,[256]5.9653,[257]5.9534,[258]5.9457,[259]5.9432,[260]5.9354,[261]5.9313,[262]5.9261,[263]5.9209,[264]5.8991,[265]5.8985,[266]5.8963,[267]5.8899,[268]5.8988,[269]5.8969,[270]5.8974,[271]5.9052,[272]5.9085,[273]5.9088,[274]5.9112,[275]5.9192,[276]5.9254,[277]5.9410,[278]5.9508,[279]5.9598,[280]5.9624,[281]5.9722,[282]5.9780,[283]5.9927,[284]6.0004,[285]6.0087,[286]6.0218,[287]6.0211,[288]6.0267,[289]6.0185,[290]6.0030,[291]5.9883,[292]5.9739,[293]5.9609,[294]5.9629,[295]5.9619,[296]5.9666,[297]5.9652,[298]5.9680,[299]5.9656,[300]5.9551,[301]5.9552,[302]5.9477,[303]5.9390,[304]5.9306,[305]5.9271,[30
6]5.9146,[307]5.9170,[308]5.9200,[309]5.9045,[310]5.8993,[311]5.8931,[312]5.8954,[313]5.8900,[314]5.8883,[315]5.8731,[316]5.8680,[317]5.8523,[318]5.8324,[319]5.8440,[320]5.8560,[321]5.8602,[322]5.8562,[323]5.8497,[324]5.8470,[325]5.8572,[326]5.8572,[327]5.8595,[328]5.8633,[329]5.8690,[330]5.8718,[331]5.8836,[332]5.8808,[333]5.8874,[334]5.8822,[335]5.8763,[336]5.8801,[337]5.8777,[338]5.8769,[339]5.8718,[340]5.8677,[341]5.8756,[342]5.8786,[343]5.8832,[344]5.8834,[345]5.8837,[346]5.8812,[347]5.8851,[348]5.8883,[349]5.8905,[350]5.8873,[351]5.8881,[352]5.8884,[353]5.8827,[354]5.8831,[355]5.8882,[356]5.8912,[357]5.8877,[358]5.8967,[359]5.8994,[360]5.8959,[361]5.8954,[362]5.9023,[363]5.9135,[364]5.9194,[365]5.9243,[366]5.9256,[367]5.9341,[368]5.9317,[369]5.9326,[370]5.9342,[371]5.9290,[372]5.9336,[373]5.9381,[374]5.9366,[375]5.9368,[376]5.9433,[377]5.9389,[378]5.9416,[379]5.9473,[380]5.9395,[381]5.9361,[382]5.9314,[383]5.9308,[384]5.9304,[385]5.9293,[386]5.9290,[387]5.9288,[388]5.9252,[389]5.9201,[390]5.9134,[391]5.9059,[392]5.9018,[393]5.9004,[394]5.9029,[395]5.9016,[396]5.8946,[397]5.9016,[398]5.9053,[399]5.9129,[400]5.9131,[401]5.9146,[402]5.9158,[403]5.9176,[404]5.9238,[405]5.9143,[406]5.9112,[407]5.9105,[408]5.9121,[409]5.9233,[410]5.9344,[411]5.9455,[412]5.9610,[413]5.9716,[414]5.9790,[415]5.9843,[416]5.9918,[417]6.0035,[418]6.0069,[419]6.0136,[420]6.0222,[421]6.0337,[422]6.0376,[423]6.0445,[424]6.0550,[425]6.0634,[426]6.0697,[427]6.0739,[428]6.0821,[429]6.0871,[430]6.0952,[431]6.1090,[432]6.1126,[433]6.1119,[434]6.1079,[435]6.1090,[436]6.1115,[437]6.1211,[438]6.1284,[439]6.1254,[440]6.1246,[441]6.1199,[442]6.1185,[443]6.1197,[444]6.1202,[445]6.1184,[446]6.1208,[447]6.1238,[448]6.1280,[449]6.1256,[450]6.1265,[451]6.1228,[452]6.1093,[453]6.1006,[454]6.0949,[455]6.0958,[456]6.1004,[457]6.1024,[458]6.1000,[459]6.1005,[460]6.1089,[461]6.1062,[462]6.1049,[463]6.1089,[464]6.1079,[465]6.1052,[466]6.0977,[467]6.0981,[468]6.0979,[469]6.0999,[470]6.1005,[471]6.0958,[472]6.1001,[473]6.0948,[474]6.0960,[475]6.0902,[476]6.0920,[477]6.0848,[478]6.0837,[479]6.0895,[480]6.0941,[481]6.0959,[482]6.0915,[483]6.0873,[484]6.0891,[485]6.0871,[486]6.0815,[487]6.0812,[488]6.0790,[489]6.0743,[490]6.0720,[491]6.0692,[492]6.0636,[493]6.0608,[494]6.0590,[495]6.0584,[496]6.0547,[497]6.0491,[498]6.0474,[499]6.0433,[500]6.0340,[501]6.0274,[502]6.0276,[503]6.0270,[504]6.0184,[505]6.0206,[506]6.0214,[507]6.0157,[508]6.0117,[509]6.0112,[510]6.0145,[511]6.0192,[512]6.0226,[513]6.0245,[514]6.0305,[515]6.0252,[516]6.0243,[517]6.0253,[518]6.0248,[519]6.0278,[520]6.0301,[521]6.0312,[522]6.0338,[523]6.0343,[524]6.0400,[525]6.0431,[526]6.0440,[527]6.0455,[528]6.0406,[529]6.0411,[530]6.0362,[531]6.0350,[532]6.0395,[533]6.0417,[534]6.0399,[535]6.0421,[536]6.0369,[537]6.0349,[538]6.0398,[539]6.0409,[540]6.0446,[541]6.0449,[542]6.0459,[543]6.0475,[544]6.0486,[545]6.0468,[546]6.0478,[547]6.0437,[548]6.0391,[549]6.0390,[550]6.0361,[551]6.0327,[552]6.0306,[553]6.0271,[554]6.0251,[555]6.0221,[556]6.0218,[557]6.0242,[558]6.0206,[559]6.0204,[560]6.0202,[561]6.0205,[562]6.0183,[563]6.0180,[564]6.0224,[565]6.0244,[566]6.0242,[567]6.0220,[568]6.0226,[569]6.0212,[570]6.0240,[571]6.0245,[572]6.0253,[573]6.0253,[574]6.0218,[575]6.0213,[576]6.0213,[577]6.0196,[578]6.0177,[579]6.0181,[580]6.0117,[581]6.0080,[582]6.0070,[583]6.0079,[584]6.0081,[585]6.0007,[586]5.9940,[587]5.9947,[588]5.9994,[589]6.0049,[590]6.0079,[591]6.0100,[592]6.0089,[593]6.0056,[594]6.0066,[595]6.0042,[596]6.0076,[597]6.0054,[598]6.0028,[599]6.0048,[600]6.0044,[601]6.0030,[602]6
.0040,[603]6.0069,[604]6.0078,[605]6.0111,[606]6.0132,[607]6.0116,[608]6.0082,[609]6.0091,[610]6.0127,[611]6.0111,[612]6.0137,[613]6.0101,[614]6.0053,[615]5.9983,[616]6.0008,[617]5.9949,[618]5.9903,[619]5.9850,[620]5.9717,[621]5.9650,[622]5.9634,[623]5.9650,[624]5.9655,[625]5.9658,[626]5.9647,[627]5.9670,[628]5.9672,[629]5.9668,[630]5.9699,[631]5.9754,[632]5.9810,[633]5.9795,[634]5.9829,[635]5.9834,[636]5.9800,[637]5.9767,[638]5.9791,[639]5.9760,[640]5.9770,[641]5.9771,[642]5.9836,[643]5.9857,[644]5.9869,[645]5.9851,[646]5.9890,[647]5.9850,[648]5.9860,[649]5.9862,[650]5.9901,[651]5.9952,[652]5.9963,[653]6.0002,[654]5.9940,[655]5.9934,
llama_print_timings:        load time =  6541.18 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 2917328.96 ms / 335360 tokens (    8.70 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 2951478.31 ms

TODO:

  • cuBLAS perplexity
  • dot scalar
  • dot ARM
  • dot AVX

@ggerganov added the "high priority" and "generation quality" labels on Apr 26, 2023
@ggerganov changed the title from "ggml : add Q5_0 quantization" to "ggml : add Q5_0 and Q5_1 quantization" on Apr 26, 2023
sw (Collaborator) commented Apr 26, 2023

_mm256_shuffle_epi8 is indeed a bit faster than the 256-entry lookup table, so maybe you don't want to keep the preprocessor-generated tables; and a uint32_t qh would be more convenient for AVX2 than uint8_t qh[4]. But we should probably keep it consistent between Q5_0 and Q5_1.

ggerganov (Owner, Author) commented

> uint32_t qh would be more convenient for AVX2

If it is just for convenience, then let's keep uint8_t qh[4]; for consistency (unless AVX2 becomes slower).

Regarding the tables: we still need the table_b2b_u table for ARM NEON.
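
For readers following along: table_b2b_u expands one byte of 5th bits into eight 0x00/0x10 bytes, so NEON can OR a whole group of high bits onto eight unpacked nibbles with a single 64-bit load. A sketch of the idea, built at runtime here rather than with the preprocessor (the real table's exact bit order may differ):

static uint64_t table_b2b_u[1 << 8];

static void init_table_b2b_u(void) {
    for (int b = 0; b < 256; ++b) {
        uint64_t v = 0;
        for (int j = 0; j < 8; ++j) {
            if (b & (1 << j)) {
                v |= (uint64_t)0x10 << (8*j); // bit j -> byte j = 0x10
            }
        }
        table_b2b_u[b] = v;
    }
}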

@ggerganov merged commit 574406d into master on Apr 26, 2023
@ggerganov deleted the q5_0 branch on April 26, 2023, 20:14
@mofosyne added the "Tensor Encoding Scheme" and "Review Complexity : High" labels on May 25, 2024