How to run any LLM anywhere fast

We built a high-level native library for local LLM inference on top of llama.cpp, and solved a number of interesting problems along the way. This talk is about those problems.

The talk will cover:

  • standardization efforts in the LLM space
  • differences between the major LLM inference tools
  • the challenges of making assumptions about Turing-complete templating systems
  • methods of constraining LLM output
  • the jungle of sampler configurations and how to reason about them (see the sketch after this list)
  • some implementation details of the libllama C++ API
  • and a few unique challenges of integrating LLMs with game engines
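
To give a flavour of the sampler-configuration topic: below is a minimal sketch of building a sampler chain with llama.cpp's C sampling API (llama_sampler_chain_init and friends). The parameter values are illustrative defaults, not recommendations from the talk; the point is that the chain is ordered, so each stage filters the candidates seen by the next one.

    #include "llama.h"

    // Minimal sketch, assuming the current llama.cpp sampler-chain API.
    // Values are illustrative; the ordering of the stages is what matters.
    llama_sampler * make_sampler_chain() {
        llama_sampler * chain =
            llama_sampler_chain_init(llama_sampler_chain_default_params());

        // keep only the 40 most likely tokens
        llama_sampler_chain_add(chain, llama_sampler_init_top_k(40));
        // drop tokens whose probability is far below the current best
        llama_sampler_chain_add(chain, llama_sampler_init_min_p(0.05f, 1));
        // rescale the remaining logits
        llama_sampler_chain_add(chain, llama_sampler_init_temp(0.8f));
        // final probabilistic pick from what survived the filters
        llama_sampler_chain_add(chain, llama_sampler_init_dist(LLAMA_DEFAULT_SEED));

        return chain;
    }

    // Per decode step: llama_token tok = llama_sampler_sample(chain, ctx, -1);
    // When finished:   llama_sampler_free(chain);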


Speakers for How to run any LLM anywhere fast:


Metadata for How to run any LLM anywhere fast

To be recorded: Yes
To be streamed: Yes

URLs for How to run any LLM anywhere fast

No URLs found.


Schedule for How to run any LLM anywhere fast

    Not scheduled yet