I don’t know how many here are also interested in the current craze of large language models. ChatGPT, Bing AI, and Bard have certainly been garnering a lot of attention lately.
I’ve always thought that the worst aspect of these language models is that they were trained on public data and then closed off. How can you make math proprietary? It doesn’t make sense to me.
Others agreed: the llama.cpp GitHub repo makes it simple to run Meta’s LLaMA model locally on your own hardware, and it supports the ever-growing ecosystem of fine-tuned derivatives like Stanford Alpaca and Alpaca-LoRA.
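In case it’s useful, here’s roughly what getting it set up looked like for me. These commands are from memory, so treat them as a sketch; the repo README is the authority on current build steps, and the LLaMA weights themselves have to be obtained and converted separately.

```
# Clone and build llama.cpp (plain CPU build, no GPU needed)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# The LLaMA weights aren't shipped with the repo; the README walks
# through converting them into the 4-bit quantized format used below.
```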
I wanted to try out the experience with this last model on the Framework 13 1240P. I have a 32GB memory stick installed and was able to load the (apparently 4-bit quantized) 13-billion-parameter model with llama.cpp. Imagine my surprise when I discovered that it not only ran quickly on the Framework, but used only about 8GB of RAM.
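That figure actually checks out: 13 billion parameters at 4 bits each works out to roughly 13 × 10⁹ × 0.5 bytes ≈ 6.5GB for the weights alone, and the context and scratch buffers account for most of the rest, which lands you right around 8GB.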
The 1240P’s 12 cores (16 threads) really seemed to help with the highly parallel calculations needed to respond to prompts; I got responses within a minute. It obviously doesn’t reach the level of OpenAI’s proprietary system, but it gives remarkable performance all on a laptop without any GPU! I can’t wait to see how the Framework 16 stacks up…
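For anyone who wants to reproduce this, here’s roughly the invocation I used. The flags are from the version I built (check ./main --help on yours), and the model path and prompt are just examples:

```
# Run the 4-bit quantized 13B model across all 16 hardware threads
./main -m ./models/13B/ggml-model-q4_0.bin \
       -t 16 \
       -n 256 \
       -p "Explain why repairable laptops matter."
```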