
zigllm

Learn how LLMs work by building one in Zig. 18 model families, 285+ tests, progressive architecture from tensors to text generation.

Key Features

📚 Progressive Architecture

6 layers from tensors to text generation. Each layer builds on the last.

🏗️ 18 Model Families

LLaMA, Mistral, GPT-2, Falcon, Mamba, BERT, Gemma, StarCoder, and more.

285+ Tests

Every test is executable documentation. Each demonstrates a concept and validates the math.

SIMD Acceleration

First-class SIMD intrinsics for 3-5x speedup on matrix operations.
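Zig exposes SIMD directly through the built-in `@Vector` type, so kernels can be vectorized without intrinsics headers or wrapper libraries. The sketch below is a hypothetical illustration of the idea, not zigllm's actual kernel: a dot product that processes 8 floats per step and falls back to scalar code for the tail.

```zig
const std = @import("std");

/// Dot product over 8-wide SIMD lanes; illustrative only.
fn dotSimd(a: []const f32, b: []const f32) f32 {
    const Lane = @Vector(8, f32);
    var acc: Lane = @splat(0.0);
    var i: usize = 0;
    while (i + 8 <= a.len) : (i += 8) {
        const va: Lane = a[i..][0..8].*; // array -> vector coercion
        const vb: Lane = b[i..][0..8].*;
        acc += va * vb; // fused multiply-accumulate across all lanes
    }
    var sum: f32 = @reduce(.Add, acc); // horizontal sum of the lanes
    while (i < a.len) : (i += 1) sum += a[i] * b[i]; // scalar tail
    return sum;
}

test "dotSimd matches the scalar result" {
    const a = [_]f32{ 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    const b = [_]f32{ 9, 8, 7, 6, 5, 4, 3, 2, 1 };
    try std.testing.expectApproxEqAbs(@as(f32, 165), dotSimd(&a, &b), 1e-5);
}
```

Because `@Vector` compiles to the target's native SIMD instructions (AVX/AVX2 on x86), the same source runs everywhere while still getting hardware acceleration where available.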

📉 18+ Quantization Formats

K-quantization, IQ-quantization, up to 95% memory reduction.
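The memory savings come from storing weights at low bit widths with shared per-block scales. As a minimal sketch of the principle (in the spirit of GGUF's Q8_0, and far simpler than the K- and IQ-formats zigllm actually implements), each block of 32 floats is reduced to one f32 scale plus 32 signed bytes:

```zig
const std = @import("std");

/// Simplified 8-bit block quantization: 32 floats share one scale.
const Block = struct {
    scale: f32,
    qs: [32]i8,
};

fn quantize(values: *const [32]f32) Block {
    var amax: f32 = 0;
    for (values) |v| amax = @max(amax, @abs(v));
    const scale = amax / 127.0; // map the largest magnitude to +/-127
    var out: Block = .{ .scale = scale, .qs = undefined };
    for (values, 0..) |v, i| {
        out.qs[i] = if (scale == 0) 0 else @intFromFloat(@round(v / scale));
    }
    return out;
}

fn dequantize(b: Block, out: *[32]f32) void {
    for (b.qs, 0..) |q, i| out[i] = @as(f32, @floatFromInt(q)) * b.scale;
}

test "round-trip error is bounded by half a quantization step" {
    var vals: [32]f32 = undefined;
    for (&vals, 0..) |*v, i| v.* = @as(f32, @floatFromInt(i)) - 16.0;
    const b = quantize(&vals);
    var back: [32]f32 = undefined;
    dequantize(b, &back);
    for (vals, back) |orig, rec|
        try std.testing.expect(@abs(orig - rec) <= b.scale * 0.5 + 1e-6);
}
```

Here 128 bytes of f32 weights shrink to 36 bytes per block (~72% reduction); the sub-4-bit formats push this much further, toward the 95% figure above.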

🔧 Zig for ML/AI

Comptime generics, manual memory control, no runtime or garbage collector.

Quick Start

git clone https://github.com/cognisoc/zigllm.git
cd zigllm
zig build test    # Run all 285+ tests

Prerequisites

  • Zig 0.14+
  • A modern CPU (AVX/AVX2 recommended but not required)

Progressive Architecture

zigllm builds understanding through 6 layers:

 6. Inference         Text generation, sampling, KV caching, streaming
 5. Models            LLaMA, GPT-2, Mistral, Falcon, GGUF loading, tokenization
 4. Transformers      Multi-head attention, feed-forward networks, full blocks
 3. Neural Primitives Activations (SwiGLU, GELU), normalization (RMSNorm), RoPE
 2. Linear Algebra    SIMD matrix ops, K-quantization, IQ-quantization (18+ formats)
 1. Foundation        Tensors, memory management, memory mapping

Each layer only depends on the layers below it. Start at the bottom and work up.

Model Architectures

Category      Architectures
Core LLMs     LLaMA/LLaMA2, Mistral, GPT-2, Falcon, Qwen, Phi, GPT-J, GPT-NeoX, BLOOM
Specialized   Mamba (state-space), BERT (bidirectional), Gemma, StarCoder (code)
Advanced      Mixture of Experts (MoE), Multi-modal (vision-language), BLAS integration

Key Capabilities

  • KV Caching — 20x speedup for autoregressive generation
  • SIMD Acceleration — 3-5x speedup on matrix operations
  • 18+ Quantization Formats — Up to 95% memory reduction
  • Memory-Mapped Loading — Efficient model loading for large files
  • Sampling — Greedy, top-k, top-p, temperature, Mirostat, grammar-constrained generation
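All of the samplers listed above start from the same raw material: a vector of logits, one per vocabulary token. The sketch below (hypothetical, not zigllm's API) shows the two simplest pieces — greedy argmax selection and temperature scaling — on which top-k, top-p, and Mirostat build:

```zig
const std = @import("std");

/// Greedy decoding: pick the token with the largest logit.
fn argmax(logits: []const f32) usize {
    var best: usize = 0;
    for (logits, 0..) |v, i| {
        if (v > logits[best]) best = i;
    }
    return best;
}

/// Temperature scaling in place: T < 1 sharpens the distribution
/// toward the argmax, T > 1 flattens it toward uniform.
fn applyTemperature(logits: []f32, temp: f32) void {
    for (logits) |*v| v.* /= temp;
}

test "greedy picks the largest logit" {
    var logits = [_]f32{ 0.1, 2.0, -1.0, 0.5 };
    try std.testing.expectEqual(@as(usize, 1), argmax(&logits));
    applyTemperature(&logits, 2.0);
    try std.testing.expectEqual(@as(f32, 1.0), logits[1]);
}
```

Stochastic samplers then softmax the (temperature-scaled) logits and draw from the resulting distribution, optionally after truncating it by rank (top-k) or cumulative mass (top-p).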

Why Zig for ML?

Zig offers unique advantages for ML/AI workloads:

  • Comptime generics — Type-safe tensor operations resolved at compile time
  • First-class SIMD — Direct intrinsics without wrapper libraries
  • Manual memory control — Deterministic allocation, no GC pauses
  • No hidden allocations — Every allocation is explicit and trackable
  • Cross-compilation — Build for any target from any host
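To make the comptime-generics point concrete, here is a minimal sketch of a fixed-shape tensor type (illustrative only; zigllm's actual Tensor API may differ). The element type and shape are comptime parameters, so every instantiation is a distinct, fully checked type with no runtime dispatch or hidden allocation:

```zig
const std = @import("std");

/// A comptime-generic 2-D tensor: shape is part of the type.
fn Tensor(comptime T: type, comptime rows: usize, comptime cols: usize) type {
    return struct {
        data: [rows * cols]T, // storage size known at compile time

        const Self = @This();

        fn fill(value: T) Self {
            var t: Self = undefined;
            @memset(&t.data, value);
            return t;
        }

        fn at(self: *const Self, r: usize, c: usize) T {
            std.debug.assert(r < rows and c < cols);
            return self.data[r * cols + c]; // row-major indexing
        }
    };
}

test "comptime tensor instantiation" {
    const M = Tensor(f32, 2, 3); // a concrete type, resolved at comptime
    const m = M.fill(1.5);
    try std.testing.expectEqual(@as(f32, 1.5), m.at(1, 2));
}
```

Because `Tensor(f32, 2, 3)` and `Tensor(f32, 3, 2)` are different types, shape mismatches become compile errors rather than runtime crashes, with zero cost at inference time.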