Try it live on Gradio. How does it work? As easy as 1, 2, 3: load our 4-bit model from Hugging Face, using the standard RoLlama2 tokenizer (a minimal loading sketch follows below). …
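A minimal sketch of that loading step, assuming the `transformers` library; the repo id below is a placeholder, and the actual model names live under https://huggingface.co/intelpen:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- substitute one of the actual 4-bit models
# published under https://huggingface.co/intelpen
model_id = "intelpen/example-4bit-model"

# The repo ships the tokenizer alongside the quantized weights
tokenizer = AutoTokenizer.from_pretrained(model_id)

# For a GPTQ checkpoint, from_pretrained picks up the quantization
# config stored in the repo (requires the optimum and auto-gptq packages)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Hello, how are you?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```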
Download our quantized models from Hugging Face at https://huggingface.co/intelpen. We have models quantized with GPTQ at 4 and 8 bits, finetuned on WikiText. We also have BitsAndBytes …
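To browse or fetch the checkpoints programmatically rather than through the web UI, the `huggingface_hub` client can list the organization's models and download a snapshot; a short sketch (the repo id passed to `snapshot_download` is a placeholder):

```python
from huggingface_hub import list_models, snapshot_download

# List everything published under the intelpen organization
for model in list_models(author="intelpen"):
    print(model.id)

# Download one repo locally (placeholder id -- pick one from the list above)
local_dir = snapshot_download(repo_id="intelpen/example-4bit-model")
print("Files saved to:", local_dir)
```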
This is a preview of our 8B LLM models finetuned on customer data. The models are quantized to 4 bits (reducing the memory footprint from 48 GB to ~6.5 GB), and the last …
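For readers who want a similar reduction on a full-precision checkpoint of their own, here is a hedged sketch of on-the-fly 4-bit loading with `bitsandbytes`; the repo id is hypothetical, and the exact memory total depends on sequence length and runtime buffers:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Back-of-envelope for an 8B-parameter model: the weights alone take
#   8e9 * 4 bytes   ~ 30 GB in float32
#   8e9 * 0.5 bytes ~  4 GB at 4 bits
# so a total footprint of roughly 6.5 GB is plausible once runtime
# buffers and embeddings are added on top of the 4-bit weights.

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # normal-float 4-bit weights
    bnb_4bit_compute_dtype=torch.float16,  # dequantize to fp16 for matmuls
)

# Hypothetical full-precision base repo; quantized on the fly at load time
model = AutoModelForCausalLM.from_pretrained(
    "intelpen/example-base-model",
    quantization_config=bnb_config,
    device_map="auto",
)
print(f"Approx. weight memory: {model.get_memory_footprint() / 2**30:.1f} GiB")
```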