# cllm
A bare-metal C unikernel for serving large language models. No OS, no overhead. Boots directly on hardware and serves inference over HTTP.
## Key Features

### No Operating System
The kernel *is* the application. Zero OS overhead, zero abstraction layers.

### Custom Libc
Built-in `malloc`, `snprintf`, string ops — everything needed, nothing wasted.
### PCI + e1000 NIC
PCI bus enumeration and an Intel e1000 network driver for direct hardware access.

### HTTP Server
Built-in HTTP server with REST API endpoints for inference requests.

### llama.cpp Compatible
API endpoints compatible with the llama.cpp ecosystem.

### QEMU + Bare Metal
Run in QEMU for development, or boot directly on x86 hardware.
## Quick Start

```sh
# Prerequisites
sudo apt-get install gcc gcc-multilib make qemu-system-x86

# Build and run
git clone https://github.com/cognisoc/cllm.git
cd cllm
make run
```

Serial output appears on your terminal. Press `Ctrl-A X` to exit QEMU.
## Architecture

```
+-----------------------------------------------------------+
|            QEMU / Bare Metal (x86, Multiboot)             |
+-----------------------------------------------------------+
| boot.S     Multiboot entry, stack, serial init            |
| kernel.c   Kernel main, VGA terminal, serial I/O          |
| memory.c   Heap allocator (malloc/free)                   |
| string.c   libc subset (snprintf, memcpy, ...)            |
| network.c  PCI enumeration + e1000 NIC driver             |
| http.c     HTTP server, request routing                   |
| api_v1.c   llama.cpp-compatible REST API                  |
| llm.c      Model loading and inference interface          |
+-----------------------------------------------------------+
```
The kernel boots via Multiboot, initializes serial and VGA output, brings up an e1000 network interface via PCI, and enters a packet-processing loop that serves HTTP requests for LLM inference.
## Make Targets

| Target | Description |
|---|---|
| `make` | Build release kernel (`build/kernel.bin`) |
| `make debug` | Build with debug symbols |
| `make run` | Build and boot in QEMU (serial on stdio) |
| `make run-vga` | Build and boot in QEMU (VGA window) |
| `make run-debug` | Build and boot paused for GDB on `:1234` |
| `make clean` | Remove build artifacts |
## Roadmap

- [x] Multiboot kernel with VGA + serial output
- [x] Custom libc (malloc, snprintf, string ops)
- [x] PCI enumeration and e1000 NIC driver
- [x] HTTP server with REST API endpoints
- [x] llama.cpp-compatible API (v1 endpoints)
- [ ] Integrate llama.cpp inference engine
- [ ] GPU passthrough (CUDA backend)
- [ ] Streaming token generation
- [ ] vLLM-style optimizations for transformer serving