
zigllm

Learn how LLMs work by building one in Zig. 18 model families, 285+ tests, progressive architecture from tensors to text generation.

Key Features

📚 Progressive Architecture

6 layers from tensors to text generation. Each layer builds on the last.

🏗️ 18 Model Families

LLaMA, Mistral, GPT-2, Falcon, Mamba, BERT, Gemma, StarCoder, and more.

285+ Tests

Every test is executable documentation. Each demonstrates a concept and validates the math.

SIMD Acceleration

First-class SIMD intrinsics for 3-5x speedup on matrix operations.
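Zig exposes SIMD directly through the built-in `@Vector` type, so kernels can be vectorized without intrinsics headers or wrapper libraries. The sketch below is a hypothetical illustration of the idea, not zigllm's actual kernel: a dot product that processes 8 floats per step and falls back to scalar code for the tail.

```zig
const std = @import("std");

/// Dot product over 8-wide SIMD lanes; illustrative only.
fn dotSimd(a: []const f32, b: []const f32) f32 {
    const Lane = @Vector(8, f32);
    var acc: Lane = @splat(0.0);
    var i: usize = 0;
    while (i + 8 <= a.len) : (i += 8) {
        const va: Lane = a[i..][0..8].*; // array -> vector coercion
        const vb: Lane = b[i..][0..8].*;
        acc += va * vb; // fused multiply-accumulate across all lanes
    }
    var sum: f32 = @reduce(.Add, acc); // horizontal sum of the lanes
    while (i < a.len) : (i += 1) sum += a[i] * b[i]; // scalar tail
    return sum;
}

test "dotSimd matches the scalar result" {
    const a = [_]f32{ 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    const b = [_]f32{ 9, 8, 7, 6, 5, 4, 3, 2, 1 };
    try std.testing.expectApproxEqAbs(@as(f32, 165), dotSimd(&a, &b), 1e-5);
}
```

Because `@Vector` compiles to the target's native SIMD instructions (AVX/AVX2 on x86), the same source runs everywhere while still getting hardware acceleration where available.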

📉 18+ Quantization Formats

K-quantization, IQ-quantization, up to 95% memory reduction.
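The memory savings come from storing weights at low bit widths with shared per-block scales. As a minimal sketch of the principle (in the spirit of GGUF's Q8_0, and far simpler than the K- and IQ-formats zigllm actually implements), each block of 32 floats is reduced to one f32 scale plus 32 signed bytes:

```zig
const std = @import("std");

/// Simplified 8-bit block quantization: 32 floats share one scale.
const Block = struct {
    scale: f32,
    qs: [32]i8,
};

fn quantize(values: *const [32]f32) Block {
    var amax: f32 = 0;
    for (values) |v| amax = @max(amax, @abs(v));
    const scale = amax / 127.0; // map the largest magnitude to +/-127
    var out: Block = .{ .scale = scale, .qs = undefined };
    for (values, 0..) |v, i| {
        out.qs[i] = if (scale == 0) 0 else @intFromFloat(@round(v / scale));
    }
    return out;
}

fn dequantize(b: Block, out: *[32]f32) void {
    for (b.qs, 0..) |q, i| out[i] = @as(f32, @floatFromInt(q)) * b.scale;
}

test "round-trip error is bounded by half a quantization step" {
    var vals: [32]f32 = undefined;
    for (&vals, 0..) |*v, i| v.* = @as(f32, @floatFromInt(i)) - 16.0;
    const b = quantize(&vals);
    var back: [32]f32 = undefined;
    dequantize(b, &back);
    for (vals, back) |orig, rec|
        try std.testing.expect(@abs(orig - rec) <= b.scale * 0.5 + 1e-6);
}
```

Here 128 bytes of f32 weights shrink to 36 bytes per block (~72% reduction); the sub-4-bit formats push this much further, toward the 95% figure above.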

🔧 Zig for ML/AI

Comptime generics, manual memory control, no runtime or garbage collector.

Quick Start

git clone https://github.com/cognisoc/zigllm.git
cd zigllm
zig build test    # Run all 285+ tests

Prerequisites

  • Zig 0.14+
  • A modern CPU (AVX/AVX2 recommended but not required)

Progressive Architecture

zigllm builds understanding through 6 layers:

 6. Inference         Text generation, sampling, KV caching, streaming
 5. Models            LLaMA, GPT-2, Mistral, Falcon, GGUF loading, tokenization
 4. Transformers      Multi-head attention, feed-forward networks, full blocks
 3. Neural Primitives Activations (SwiGLU, GELU), normalization (RMSNorm), RoPE
 2. Linear Algebra    SIMD matrix ops, K-quantization, IQ-quantization (18+ formats)
 1. Foundation        Tensors, memory management, memory mapping

Each layer only depends on the layers below it. Start at the bottom and work up.

Model Architectures

Category      Architectures
Core LLMs     LLaMA/LLaMA2, Mistral, GPT-2, Falcon, Qwen, Phi, GPT-J, GPT-NeoX, BLOOM
Specialized   Mamba (state-space), BERT (bidirectional), Gemma, StarCoder (code)
Advanced      Mixture of Experts (MoE), Multi-modal (vision-language), BLAS integration

Key Capabilities

  • KV Caching — 20x speedup for autoregressive generation
  • SIMD Acceleration — 3-5x speedup on matrix operations
  • 18+ Quantization Formats — Up to 95% memory reduction
  • Memory-Mapped Loading — Efficient model loading for large files
  • Sampling — Greedy, top-k, top-p, temperature, Mirostat, grammar-constrained generation
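All of the samplers listed above start from the same raw material: a vector of logits, one per vocabulary token. The sketch below (hypothetical, not zigllm's API) shows the two simplest pieces — greedy argmax selection and temperature scaling — on which top-k, top-p, and Mirostat build:

```zig
const std = @import("std");

/// Greedy decoding: pick the token with the largest logit.
fn argmax(logits: []const f32) usize {
    var best: usize = 0;
    for (logits, 0..) |v, i| {
        if (v > logits[best]) best = i;
    }
    return best;
}

/// Temperature scaling in place: T < 1 sharpens the distribution
/// toward the argmax, T > 1 flattens it toward uniform.
fn applyTemperature(logits: []f32, temp: f32) void {
    for (logits) |*v| v.* /= temp;
}

test "greedy picks the largest logit" {
    var logits = [_]f32{ 0.1, 2.0, -1.0, 0.5 };
    try std.testing.expectEqual(@as(usize, 1), argmax(&logits));
    applyTemperature(&logits, 2.0);
    try std.testing.expectEqual(@as(f32, 1.0), logits[1]);
}
```

Stochastic samplers then softmax the (temperature-scaled) logits and draw from the resulting distribution, optionally after truncating it by rank (top-k) or cumulative mass (top-p).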

Why Zig for ML?

Zig offers unique advantages for ML/AI workloads:

  • Comptime generics — Type-safe tensor operations resolved at compile time
  • First-class SIMD — Direct intrinsics without wrapper libraries
  • Manual memory control — Deterministic allocation, no GC pauses
  • No hidden allocations — Every allocation is explicit and trackable
  • Cross-compilation — Build for any target from any host
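To make the comptime-generics point concrete, here is a minimal sketch of a fixed-shape tensor type (illustrative only; zigllm's actual Tensor API may differ). The element type and shape are comptime parameters, so every instantiation is a distinct, fully checked type with no runtime dispatch or hidden allocation:

```zig
const std = @import("std");

/// A comptime-generic 2-D tensor: shape is part of the type.
fn Tensor(comptime T: type, comptime rows: usize, comptime cols: usize) type {
    return struct {
        data: [rows * cols]T, // storage size known at compile time

        const Self = @This();

        fn fill(value: T) Self {
            var t: Self = undefined;
            @memset(&t.data, value);
            return t;
        }

        fn at(self: *const Self, r: usize, c: usize) T {
            std.debug.assert(r < rows and c < cols);
            return self.data[r * cols + c]; // row-major indexing
        }
    };
}

test "comptime tensor instantiation" {
    const M = Tensor(f32, 2, 3); // a concrete type, resolved at comptime
    const m = M.fill(1.5);
    try std.testing.expectEqual(@as(f32, 1.5), m.at(1, 2));
}
```

Because `Tensor(f32, 2, 3)` and `Tensor(f32, 3, 2)` are different types, shape mismatches become compile errors rather than runtime crashes, with zero cost at inference time.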