How to run any LLM anywhere fast
We built a high-level native library for local LLM inference on top of llama.cpp, and solved a number of interesting problems along the way. This talk is about those problems.
Specifically, it will cover:
- standardization efforts in the LLM space
- differences between the major LLM inference tools
- challenges with making assumptions about Turing-complete templating systems
- methods of constraining LLM output (see the grammar sketch after this list)
- the jungle of sampler configurations and how to reason about them (see the sampler-chain sketch after this list)
- some implementation details of the libllama C++ API
- and a few unique challenges of integrating LLMs with game engines
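As a rough illustration of the sampler point above (not taken from the talk itself), the sketch below shows how a sampler configuration might be expressed against llama.cpp's sampler-chain API. The SamplerConfig struct and make_sampler_chain helper are hypothetical names, and the llama.cpp function signatures are assumptions based on recent builds; they have changed between versions.

```cpp
// Sketch: building a llama.cpp sampler chain from a small config struct.
// Assumes the sampler-chain API present in recent llama.cpp builds
// (llama_sampler_chain_init, llama_sampler_init_top_k, ...); names and
// signatures may differ across versions.
#include <cstdint>
#include "llama.h"

struct SamplerConfig {                  // hypothetical config type for illustration
    int      top_k       = 40;          // keep only the k most likely tokens
    float    top_p       = 0.95f;       // nucleus sampling threshold
    float    temperature = 0.8f;        // flattens or sharpens the distribution
    uint32_t seed        = 1234;        // seed for the final random draw
};

// Order matters: each sampler filters or reshapes the candidate
// distribution before the final probabilistic draw at the end of the chain.
static llama_sampler * make_sampler_chain(const SamplerConfig & cfg) {
    llama_sampler * chain = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(chain, llama_sampler_init_top_k(cfg.top_k));
    llama_sampler_chain_add(chain, llama_sampler_init_top_p(cfg.top_p, /*min_keep=*/1));
    llama_sampler_chain_add(chain, llama_sampler_init_temp(cfg.temperature));
    llama_sampler_chain_add(chain, llama_sampler_init_dist(cfg.seed));
    return chain; // caller releases with llama_sampler_free()
}
```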
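For the bullet on constraining output, one widely used mechanism in llama.cpp is GBNF grammars, which mask out tokens that cannot continue a valid parse. The sketch below is again an assumption-laden illustration rather than the talk's approach: it attaches a tiny yes/no grammar as a grammar sampler, and the first argument of llama_sampler_init_grammar (model handle vs. vocabulary handle) has varied across llama.cpp versions.

```cpp
// Sketch: constraining generation to "yes" or "no" with a GBNF grammar sampler.
// Assumes a recent llama.cpp build where a grammar can be added to a sampler
// chain; treat the exact signature of llama_sampler_init_grammar as illustrative.
#include "llama.h"

static llama_sampler * make_constrained_chain(const llama_vocab * vocab) {
    // GBNF: the root rule accepts only the literal strings "yes" or "no".
    static const char * kGrammar = R"(root ::= "yes" | "no")";

    llama_sampler * chain = llama_sampler_chain_init(llama_sampler_chain_default_params());
    // The grammar sampler removes every token that cannot extend a valid parse.
    llama_sampler_chain_add(chain, llama_sampler_init_grammar(vocab, kGrammar, "root"));
    // A final sampler still has to pick among the tokens the grammar allows.
    llama_sampler_chain_add(chain, llama_sampler_init_greedy());
    return chain; // caller releases with llama_sampler_free()
}
```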
Metadata
To be recorded: Yes
To be streamed: Yes
Schedule
Not scheduled yet