Our Llama 3 quantized models

Download our quantized models from Hugging Face at https://huggingface.co/intelpen

We provide models quantized with GPTQ at 4 and 8 bits, fine-tuned on WikiText.
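As a rough sketch of how such a model can be produced (not necessarily our exact pipeline), the Transformers GPTQConfig API supports calibrating on WikiText out of the box; it requires the optimum and auto-gptq packages, and the base model id and output path below are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

base_model = "meta-llama/Meta-Llama-3-8B"  # illustrative base model id

tokenizer = AutoTokenizer.from_pretrained(base_model)

# GPTQ calibrates the quantized weights on a small dataset;
# Transformers ships a built-in "wikitext2" calibration option.
gptq_config = GPTQConfig(bits=4, dataset="wikitext2", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    quantization_config=gptq_config,
)
model.save_pretrained("llama-3-8b-gptq-4bit")  # illustrative output path
```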

We also have bitsandbytes 4-bit versions (no fine-tuning).
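For comparison, here is a minimal sketch of bitsandbytes 4-bit (NF4) quantization, which is applied on the fly at load time with no calibration or fine-tuning step; the base model id is again illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# bitsandbytes quantizes the weights to 4 bits while loading;
# no calibration dataset or fine-tuning is involved.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # illustrative base model id
    quantization_config=bnb_config,
    device_map="auto",
)
```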

Load them with the Hugging Face AutoModelForCausalLM constructor. The current versions are blazing fast and run in less than 7 GB of memory (you can run them on almost any modern NVIDIA GPU).
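A minimal loading sketch (the repository name below is a placeholder; see the Hugging Face page above for the exact model ids):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "intelpen/<model-name>"  # placeholder — pick a model from the page above

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

# Quick sanity check of the memory footprint and generation.
print(f"Model weights: {model.get_memory_footprint() / 1e9:.1f} GB")

inputs = tokenizer("Quantization is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```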

Performance is lower than the full Llama 3, but still better than most other 8B-parameter LLMs.

*These are now public models under the Meta Llama 3 license.
