Projects
Five open-source tools covering every layer of LLM inference, from bare metal to the mobile screen.
mullama
Python / Rust · Run any LLM locally. Use it from any language. Deploy anywhere. Drop-in Ollama replacement with native bindings for Python, Node.js, Go, PHP, Rust, and C/C++.
LLM Server · Polyglot · OpenAI API · Anthropic API
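Since mullama is described as a drop-in Ollama replacement with an OpenAI-compatible API, a client could talk to it the way it talks to Ollama. This sketch only constructs the request rather than sending it; the port (11434, Ollama's default), the endpoint path, and the model name are assumptions, not documented mullama behavior.

```python
import json
import urllib.request

# Assumed endpoint: Ollama exposes an OpenAI-compatible API at this
# path and port, and mullama claims to be a drop-in replacement.
URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Construct (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,  # model name is an assumption for illustration
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("llama3", "Say hello in one word.")
print(req.full_url)
# With a mullama server running locally, urllib.request.urlopen(req)
# would return the usual OpenAI-style JSON response.
```

Because the wire format is the standard OpenAI chat-completions schema, any existing OpenAI client library pointed at the local base URL should work the same way.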
llamafu
Dart · Run AI models directly on mobile devices. Flutter FFI plugin for on-device inference with vision, tool calling, and streaming.
Mobile AI · Flutter · On-Device · Multimodal
unillm
Rust · A modular LLM inference runtime written in Rust. 47 model architectures, unified interface, type-safe and composable.
Runtime · 47 Architectures · Modular · KV Cache
cllm
C · A bare-metal C unikernel for serving LLMs. No OS, no overhead. Boots directly on hardware and serves inference over HTTP.
Unikernel · Bare Metal · x86 · HTTP Server
zigllm
Zig · Learn how LLMs work by building one in Zig. 18 model families, 285+ tests, progressive architecture from tensors to text.
Educational · SIMD · 18 Architectures · 285+ Tests