May 2024 – AI RESEARCH LABORATORY

Llama 2 RO 4-bit (super fast, 7 GB VRAM)

Try it live on Gradio (new page). How does it work? As easy as 1, 2, 3: load our 4-bit model from Hugging Face, using the standard Rollama2 tokenizer. …

Uncategorized

Our Llama 3 Quantized Models

Download our quantized models from Hugging Face at https://huggingface.co/intelpen. We have models generated with GPTQ, in 4 and 8 bits, which were fine-tuned on wikitext. We also have Bits & …
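
To give a rough sense of what the 4 and 8 bit widths mean for memory, here is a minimal Python sketch of the arithmetic for weight storage alone. The 8-billion parameter count is an assumption for illustration (the post does not state the model size), the function name is hypothetical, and the estimate ignores quantization metadata such as scales and zero points, as well as activations and the KV cache, so real VRAM use will be higher.

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 1024**3                    # bytes -> GiB


if __name__ == "__main__":
    # Assumption: an 8-billion-parameter model; substitute the real count.
    NUM_PARAMS = 8_000_000_000
    for bits in (16, 8, 4):
        size = weight_memory_gib(NUM_PARAMS, bits)
        print(f"{bits:>2}-bit weights: {size:.2f} GiB")
```

Halving the bit width halves the weight footprint, which is why a 4-bit model can fit on a GPU that a 16-bit copy of the same model would not.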
