How to run any LLM anywhere fast

We built a high-level native library for local LLM inference on top of llama.cpp, and solved a number of interesting problems along the way. This talk is about those problems.

The talk will cover:

  • standardization efforts in the LLM space
  • differences between the major LLM inference tools
  • the challenges of making assumptions about Turing-complete templating systems
  • methods of constraining LLM output
  • the jungle of sampler configurations and how to reason about them (see the sketch after this list)
  • some implementation details of the libllama C++ API
  • and a few unique challenges of integrating LLMs with game engines
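
To give a flavour of the sampler-configuration topic: below is a minimal sketch of building a sampler chain with llama.cpp's C sampling API (llama_sampler_chain_init and friends). The parameter values are illustrative defaults, not recommendations from the talk; the point is that the chain is ordered, so each stage filters the candidates seen by the next one.

    #include "llama.h"

    // Minimal sketch, assuming the current llama.cpp sampler-chain API.
    // Values are illustrative; the ordering of the stages is what matters.
    llama_sampler * make_sampler_chain() {
        llama_sampler * chain =
            llama_sampler_chain_init(llama_sampler_chain_default_params());

        // keep only the 40 most likely tokens
        llama_sampler_chain_add(chain, llama_sampler_init_top_k(40));
        // drop tokens whose probability is far below the current best
        llama_sampler_chain_add(chain, llama_sampler_init_min_p(0.05f, 1));
        // rescale the remaining logits
        llama_sampler_chain_add(chain, llama_sampler_init_temp(0.8f));
        // final probabilistic pick from what survived the filters
        llama_sampler_chain_add(chain, llama_sampler_init_dist(LLAMA_DEFAULT_SEED));

        return chain;
    }

    // Per decode step: llama_token tok = llama_sampler_sample(chain, ctx, -1);
    // When finished:   llama_sampler_free(chain);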


Speakers for How to run any LLM anywhere fast:


Metadata for How to run any LLM anywhere fast

To be recorded: Yes
To be streamed: Yes

URLs for How to run any LLM anywhere fast

No URLs found.


Schedule for How to run any LLM anywhere fast

    Not scheduled yet