.NET / C# · NuGet · .NET · GGUF · In-Process

llmdot

Run local GGUF language models from .NET — one package, one format, one programming model. Native in-process inference for C# and the whole .NET ecosystem, no Python and no external server.

Key Features

📦

One NuGet Package

Add Llmdot to any .NET project and run GGUF models in-process. No sidecar, no HTTP, no Python.

⚙️

Native GGUF Inference

Load and run quantized GGUF models directly from C#, F#, or VB.NET through a single programming model.

🔌

Microsoft.Extensions.AI

Plugs into the standard .NET AI abstractions so it drops into existing apps and pipelines.

🖥️

Cross-Platform

Runs anywhere .NET runs — Windows, Linux, and macOS — with the same code and the same model files.

🔒

Local & Private

All inference stays on your hardware. No cloud calls, no per-token billing, works fully offline.

Familiar to .NET Devs

Idiomatic C# API — no FFI boilerplate, no marshalling ceremony. Just install and call.

Quick Start

dotnet add package Llmdot.Core

using Llmdot;

// Load a local GGUF model
using var model = LlmModel.Load("llama-3.2-1b-instruct.Q4_K_M.gguf");

// Generate text in-process — no server, no Python
var response = model.Generate("Explain gravity in one sentence.");
Console.WriteLine(response);

Why llmdot

The .NET ecosystem has lacked a first-class way to run local LLMs. Existing options mean calling an external server over HTTP, wrapping Python, or hand-rolling P/Invoke bindings against llama.cpp. llmdot gives .NET developers the same in-process, native experience that Python and Rust developers already enjoy in the rest of the Cognisoc stack: one package, one model format, one programming model.

  • In-process — the model runs inside your application. No HTTP hop, no daemon to manage.
  • GGUF-native — use the same quantized model files as the rest of the local-LLM world.
  • Standard abstractions — implements Microsoft.Extensions.AI interfaces so it composes with the tools you already use.
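To make the "standard abstractions" point concrete, here is a minimal sketch of application code written only against Microsoft.Extensions.AI's IChatClient interface. This page does not show how llmdot hands out such a client, so the sketch uses a hypothetical stand-in (CannedChatClient) where an llmdot-backed client would go. Summarizer and AskAsync are illustrative names too, and the interface shape assumes a recent Microsoft.Extensions.AI.Abstractions release (the one with GetResponseAsync).

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

// Stand-in client. In a real app this would be whatever IChatClient
// llmdot provides (not shown on this page, so treated as an assumption).
IChatClient client = new CannedChatClient("Gravity is the attraction between masses.");

string answer = await Summarizer.AskAsync(client, "Explain gravity in one sentence.");
Console.WriteLine(answer);

// Application logic: depends on the abstraction, not on any specific backend.
static class Summarizer
{
    public static async Task<string> AskAsync(
        IChatClient client, string question, CancellationToken ct = default)
    {
        var messages = new List<ChatMessage>
        {
            new(ChatRole.System, "Answer concisely."),
            new(ChatRole.User, question),
        };

        ChatResponse response = await client.GetResponseAsync(messages, cancellationToken: ct);
        return response.Text;
    }
}

// Hypothetical stub that always returns the same reply; used only so the
// sketch runs without a model file.
sealed class CannedChatClient : IChatClient
{
    private readonly string _reply;

    public CannedChatClient(string reply) => _reply = reply;

    public Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
        => Task.FromResult(new ChatResponse(new ChatMessage(ChatRole.Assistant, _reply)));

    public async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        await Task.Yield();
        yield return new ChatResponseUpdate
        {
            Role = ChatRole.Assistant,
            Contents = new List<AIContent> { new TextContent(_reply) },
        };
    }

    public object? GetService(Type serviceType, object? serviceKey = null) => null;

    public void Dispose() { }
}
```

Because AskAsync only sees IChatClient, swapping the stub for a local GGUF-backed client (or any other implementation) should not require changes to the calling code.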

Where it fits

llmdot is the .NET member of the Cognisoc family of local inference runtimes — the same philosophy as mullama (Python/Go/PHP/Node/Rust) and llamafu (mobile/Flutter), built for the languages .NET teams actually ship in.