# cllm
A bare-metal C unikernel for serving large language models. No OS, no overhead. Boots directly on hardware and serves inference over HTTP.
## Key Features

### No Operating System
The kernel *is* the application. Zero OS overhead, zero abstraction layers.

### Custom Libc
Built-in `malloc`, `snprintf`, string ops — everything needed, nothing wasted.
### PCI + e1000 NIC
PCI bus enumeration and an Intel e1000 network driver for direct hardware access.

### HTTP Server
Built-in HTTP server with REST API endpoints for inference requests.

### llama.cpp Compatible
API endpoints compatible with the llama.cpp ecosystem.

### QEMU + Bare Metal
Run in QEMU for development, or boot directly on x86 hardware.
## Quick Start

```sh
# Prerequisites
sudo apt-get install gcc gcc-multilib make qemu-system-x86

# Build and run
git clone https://github.com/cognisoc/cllm.git
cd cllm
make run
```

Serial output appears on your terminal. Press `Ctrl-A X` to exit QEMU.
## Architecture

```
+-----------------------------------------------------------+
|            QEMU / Bare Metal (x86, Multiboot)             |
+-----------------------------------------------------------+
| boot.S     Multiboot entry, stack, serial init            |
| kernel.c   Kernel main, VGA terminal, serial I/O          |
| memory.c   Heap allocator (malloc/free)                   |
| string.c   libc subset (snprintf, memcpy, ...)            |
| network.c  PCI enumeration + e1000 NIC driver             |
| http.c     HTTP server, request routing                   |
| api_v1.c   llama.cpp-compatible REST API                  |
| llm.c      Model loading and inference interface          |
+-----------------------------------------------------------+
```
The kernel boots via Multiboot, initializes serial and VGA output, brings up an e1000 network interface via PCI, and enters a packet-processing loop that serves HTTP requests for LLM inference.
## Make Targets

| Target | Description |
|---|---|
| `make` | Build release kernel (`build/kernel.bin`) |
| `make debug` | Build with debug symbols |
| `make run` | Build and boot in QEMU (serial on stdio) |
| `make run-vga` | Build and boot in QEMU (VGA window) |
| `make run-debug` | Build and boot paused for GDB on `:1234` |
| `make clean` | Remove build artifacts |
## Roadmap

- [x] Multiboot kernel with VGA + serial output
- [x] Custom libc (malloc, snprintf, string ops)
- [x] PCI enumeration and e1000 NIC driver
- [x] HTTP server with REST API endpoints
- [x] llama.cpp-compatible API (v1 endpoints)
- [ ] Integrate llama.cpp inference engine
- [ ] GPU passthrough (CUDA backend)
- [ ] Streaming token generation
- [ ] vLLM-style optimizations for transformer serving