Our Mission

AI inference should run everywhere.

We believe large language models shouldn't be locked behind cloud APIs. Every device — from a mobile phone to a bare-metal server — should be able to run AI locally. That's why we're building the full inference stack, open source.

The Problem

  • Cloud LLM APIs create vendor lock-in and unpredictable costs
  • Sensitive data must leave your infrastructure for every inference call
  • Existing local tools only solve one layer of the stack
  • Most languages have no native LLM bindings — HTTP is the only option
  • Mobile and edge devices are underserved by current inference tooling

Our Approach

  • Build every layer of the inference stack as open source
  • Provide native bindings for 6+ programming languages
  • Target every deployment surface: servers, desktops, mobile, bare metal
  • Maintain API compatibility with existing ecosystems (Ollama, OpenAI); see the sketch after this list
  • Invest in education to grow the community of local-AI practitioners
  • Explore open hardware reference designs purpose-built for local inference
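
Compatibility with the OpenAI wire format means existing client code can point at a local endpoint instead of a hosted one. The sketch below is illustrative only: it assumes a Cognisoc server such as mullama is listening locally and exposes the standard OpenAI-style /v1/chat/completions route; the port, model name, and crate choices (reqwest, serde_json) are placeholders rather than documented Cognisoc defaults.

```rust
// Sketch: calling a local, OpenAI-compatible chat endpoint from Rust.
// Assumes a server such as mullama is running on this machine; the port,
// route, and model name are placeholders, not documented Cognisoc defaults.
use serde_json::{json, Value};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let request = json!({
        "model": "llama3",
        "messages": [{ "role": "user", "content": "Summarize why local inference matters." }]
    });

    // Same request shape a hosted OpenAI-style API accepts,
    // but the data never leaves the machine.
    let response: Value = reqwest::blocking::Client::new()
        .post("http://localhost:11434/v1/chat/completions")
        .json(&request)
        .send()?
        .error_for_status()?
        .json()?;

    println!("{}", response["choices"][0]["message"]["content"]);
    Ok(())
}
```

Because the wire format matches, moving an existing OpenAI-style client between a cloud endpoint and a local one is typically just a base-URL change.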

The Cognisoc Stack

Five projects, each purpose-built for a specific layer of the inference problem.

unillm

Modular Rust inference runtime — the engine that powers everything

mullama

Local LLM server with polyglot bindings — making inference accessible

llamafu

Mobile inference via Flutter — AI in every pocket

cllm

Bare-metal unikernel — pushing inference to the silicon

zigllm

Educational implementation — growing the next generation of ML engineers

By the Numbers

  • 47 model architectures
  • 6 language bindings
  • 7 GPU backends
  • 5 deployment targets

What's Next: Open Hardware

Software is only half the stack. We're exploring open hardware reference designs optimized for local LLM inference — purpose-built boards and configurations designed to run Cognisoc software from boot.

Inference Accelerators

Single-board designs with NPUs and RISC-V cores, running cllm directly on bare metal.

FPGA Capes

Reconfigurable accelerator boards for custom quantization and novel attention mechanisms.

Cluster Blueprints

GPU cluster rack designs with optimized networking for distributed inference with unillm.

Open schematics. Open firmware. Open software. If you're in the hardware space, reach out.

Open Source, Open Future

Every project in the Cognisoc ecosystem is open source under MIT or Apache-2.0 licenses. We believe the infrastructure layer for AI inference should be a public good — not a proprietary moat.

We welcome contributions from developers, researchers, and organizations who share our vision of democratizing LLM inference. Whether it's adding a new model architecture to unillm, improving mobile performance in llamafu, or writing educational content for zigllm — there's a place for you.

Get Involved

Whether you're a developer, investor, or cloud architect — we'd love to hear from you.