llama.cpp

From Wikipedia, the free encyclopedia
llama.cpp
Original author(s): Georgi Gerganov
Developer(s): Georgi Gerganov and community
Initial release: Alpha (b1083) / August 26, 2023
Written in: C++
License: MIT License
Website: github.com/ggerganov/llama.cpp

llama.cpp is an open-source software library that performs inference on various large language models, such as LLaMA.[1] It is written in C++.

History

llama.cpp began as Georgi Gerganov's effort to implement LLaMA in pure C++ with no dependencies.[2] The advantage of this approach was that it could run on a wider range of hardware than inference libraries that depend on hardware-specific, closed-source libraries such as CUDA.[3] As of April 2024, the project has over 55,000 stars on GitHub.[4] Before llama.cpp, Gerganov worked on a similar library called whisper.cpp,[5] which implemented Whisper, OpenAI's speech-to-text model. llama.cpp gained traction among users without specialized hardware because it could run on a CPU alone, including on Android devices.[6]

Architecture

llama.cpp initially could run only on CPUs but can now run on GPUs through multiple back-ends, including Vulkan and SYCL. These back-ends make up the GGML tensor library, which is used by the front-end, model-specific llama.cpp code.[7] llama.cpp has its own model file format called GGUF (which superseded the earlier GGML format).[8] llama.cpp supports ahead-of-time model quantization, as opposed to on-the-fly quantization.[9]
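llama.cpp's actual quantization kernels are optimized C/C++ code, but the core idea of block-wise ahead-of-time quantization can be sketched in a few lines of Python. The sketch below is modeled loosely on GGML's Q8_0 scheme (blocks of 32 weights, each block storing one floating-point scale plus 32 signed 8-bit integers); the function names and exact layout here are illustrative, not llama.cpp's real API:

```python
def quantize_q8_0(weights, block_size=32):
    """Quantize floats into blocks of int8 values plus one scale per block.

    Each block stores (scale, [q0, q1, ...]) where weight ≈ scale * q,
    loosely following GGML's Q8_0 idea (illustrative, not the real layout).
    """
    blocks = []
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        amax = max(abs(w) for w in block)           # largest magnitude in block
        scale = amax / 127.0 if amax > 0 else 1.0   # map amax to int8 range
        qs = [round(w / scale) for w in block]      # signed 8-bit integers
        blocks.append((scale, qs))
    return blocks

def dequantize_q8_0(blocks):
    """Reconstruct approximate float weights from quantized blocks."""
    out = []
    for scale, qs in blocks:
        out.extend(scale * q for q in qs)
    return out
```

Because quantization is done once, ahead of time, the resulting file is smaller and inference reads the compact integer blocks directly, trading a small per-block rounding error for reduced memory and bandwidth.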

References

  1. ^ Connatser, Matthew. "How this open source LLM chatbot runner hit the gas on x86, Arm CPUs". theregister.com. Retrieved 15 April 2024.
  2. ^ Connatser, Matthew. "How this open source LLM chatbot runner hit the gas on x86, Arm CPUs". theregister.com. Retrieved 15 April 2024.
  3. ^ Connatser, Matthew. "How this open source LLM chatbot runner hit the gas on x86, Arm CPUs". theregister.com. Retrieved 15 April 2024.
  4. ^ "ggerganov/llama.cpp". GitHub.
  5. ^ "ggerganov/whisper.cpp". GitHub.
  6. ^ Edwards, Benj (13 March 2023). "You can now run a GPT-3-level AI model on your laptop, phone, and Raspberry Pi". arstechnica.com. Retrieved 15 April 2024.
  7. ^ "GGML - AI at the edge". ggml.ai. Retrieved 16 April 2024.
  8. ^ Pounder, Les (25 March 2023). "How To Create Your Own AI Chatbot Server With Raspberry Pi 4". tomshardware.com. Retrieved 16 April 2024.
  9. ^ Walkowiak, Bartosz; Walkowiak, Tomasz (2024). "Implementation of language models within an infrastructure designed for Natural Language Processing" (PDF). International Journal of Electronics and Telecommunications. 70 (1): 153–159. doi:10.24425/ijet.2024.149525. Retrieved 8 May 2024.