Llama 2 RO 4-bit (super fast, 7 GB VRAM)
Try it live on Gradio: How does it work? As easy as 1, 2, 3: load our 4-bit model from Hugging Face, using the standard RoLlama2 tokenizer. …
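As a minimal sketch of that loading step, the snippet below pulls a 4-bit checkpoint with the Hugging Face transformers library and a bitsandbytes quantization config. The repo id `intelpen/llama-2-ro-4bit` is a hypothetical placeholder; check https://huggingface.co/intelpen for the actual model name.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Hypothetical repo id -- browse https://huggingface.co/intelpen for the real one.
MODEL_ID = "intelpen/llama-2-ro-4bit"

# 4-bit quantization config; NF4 with fp16 compute is a common default.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)  # standard RoLlama2 tokenizer
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",  # should fit in ~7 GB of VRAM per the post
)

prompt = "Salut! Ce mai faci?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```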
Our Llama 3 quantized models
Download our quantized models from Hugging Face at https://huggingface.co/intelpen. We have models quantized with GPTQ to 4 and 8 bits, which were finetuned on WikiText. We also have Bits & …
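GPTQ checkpoints published on Hugging Face carry their quantization config in the repo, so they load through the standard transformers API (with the optimum and auto-gptq packages installed). A minimal sketch, again with a hypothetical repo id:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo name; see https://huggingface.co/intelpen for the exact one.
MODEL_ID = "intelpen/llama-3-8b-gptq-4bit"

# transformers reads the GPTQ quantization config stored in the repo,
# so no extra quantization arguments are needed here.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
```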
Test our LLMs
This is a preview of our 8B LLM models finetuned on customer data. The models are quantized to 4 bits (reducing the memory requirement from 48 GB to ~6.5 GB), and the last …
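To sanity-check a memory figure like that, you can load the 4-bit checkpoint and ask transformers for its footprint. The repo id below is a hypothetical placeholder for the preview model:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hypothetical preview checkpoint name; the actual repo may differ.
MODEL_ID = "intelpen/llm-8b-customer-4bit"

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    ),
    device_map="auto",
)

# Rough check of the quantized footprint (~6.5 GB expected for an 8B model).
print(f"{model.get_memory_footprint() / 1e9:.1f} GB")
```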