llama.cpp

From Wikipedia, the free encyclopedia
llama.cpp
Original author(s): Georgi Gerganov
Developer(s): Georgi Gerganov and community
Initial release: Alpha (b1083) / August 26, 2023
Written in: C++
License: MIT License
Website: github.com/ggerganov/llama.cpp

llama.cpp is an open-source software library that performs inference on various large language models, such as LLaMA.[1] It is written in C++.

History

llama.cpp began as Georgi Gerganov's effort to implement LLaMA in pure C++ with no dependencies.[2] The advantage of this approach was that it could run on a wider range of hardware than inference libraries that depend on hardware-specific, closed-source libraries such as CUDA.[3] As of April 2024, the project has over 55,000 stars on GitHub.[4] Before llama.cpp, Gerganov worked on a similar library called whisper.cpp,[5] which implemented Whisper, OpenAI's speech-to-text model. llama.cpp gained traction among users without specialized hardware because it could run on a CPU alone, including on Android devices.[6]

Architecture

llama.cpp initially could run only on CPUs but can now run on GPUs through multiple back-ends, including Vulkan and SYCL. These back-ends make up the GGML tensor library, which is used by the front-end, model-specific llama.cpp code.[7] llama.cpp has its own model file format called GGUF (which superseded the earlier GGML format).[8] llama.cpp supports ahead-of-time model quantization, as opposed to on-the-fly quantization.[9]
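llama.cpp's actual quantization kernels are optimized C/C++ code, but the core idea of block-wise ahead-of-time quantization can be sketched in a few lines of Python. The sketch below is modeled loosely on GGML's Q8_0 scheme (blocks of 32 weights, each block storing one floating-point scale plus 32 signed 8-bit integers); the function names and exact layout here are illustrative, not llama.cpp's real API:

```python
def quantize_q8_0(weights, block_size=32):
    """Quantize floats into blocks of int8 values plus one scale per block.

    Each block stores (scale, [q0, q1, ...]) where weight ≈ scale * q,
    loosely following GGML's Q8_0 idea (illustrative, not the real layout).
    """
    blocks = []
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        amax = max(abs(w) for w in block)           # largest magnitude in block
        scale = amax / 127.0 if amax > 0 else 1.0   # map amax to int8 range
        qs = [round(w / scale) for w in block]      # signed 8-bit integers
        blocks.append((scale, qs))
    return blocks

def dequantize_q8_0(blocks):
    """Reconstruct approximate float weights from quantized blocks."""
    out = []
    for scale, qs in blocks:
        out.extend(scale * q for q in qs)
    return out
```

Because quantization is done once, ahead of time, the resulting file is smaller and inference reads the compact integer blocks directly, trading a small per-block rounding error for reduced memory and bandwidth.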

References

  1. ^ Connatser, Matthew. "How this open source LLM chatbot runner hit the gas on x86, Arm CPUs". theregister.com. Retrieved 15 April 2024.
  2. ^ Connatser, Matthew. "How this open source LLM chatbot runner hit the gas on x86, Arm CPUs". theregister.com. Retrieved 15 April 2024.
  3. ^ Connatser, Matthew. "How this open source LLM chatbot runner hit the gas on x86, Arm CPUs". theregister.com. Retrieved 15 April 2024.
  4. ^ "ggerganov/llama.cpp". GitHub.
  5. ^ "ggerganov/whisper.cpp". GitHub.
  6. ^ Edwards, Benj (13 March 2023). "You can now run a GPT-3-level AI model on your laptop, phone, and Raspberry Pi". arstechnica.com. Retrieved 15 April 2024.
  7. ^ "GGML - AI at the edge". ggml.ai. Retrieved 16 April 2024.
  8. ^ Pounder, Les (25 March 2023). "How To Create Your Own AI Chatbot Server With Raspberry Pi 4". tomshardware.com. Retrieved 16 April 2024.
  9. ^ Walkowiak, Bartosz; Walkowiak, Tomasz (2024). "Implementation of language models within an infrastructure designed for Natural Language Processing" (PDF). International Journal of Electronics and Telecommunications. 70 (1): 153–159. doi:10.24425/ijet.2024.149525. Retrieved 8 May 2024.