
Collaboration proposal: @sparseskip inference optimization + ternary MoE model + MCP tooling #376

@simeon-kepp

Hi — we came across Off Grid via a cold email from Saiganesh and immediately recognized the overlap with our work.

Who we are: RFI-IRFOS, a small research team building the Ternary Intelligence Stack — a full-layer research and inference platform built on ternary computation:

  • albert. — ternary MoE language model, trained from scratch
  • @sparseskip — patent-pending sparse inference (skips zero-weight expert activations at runtime; 83 tok/s on modest hardware)
  • ternlang — ternary programming language and runtime
  • TernStudio — IDE built for ternary-native development
  • MCP infrastructure — live endpoint on Smithery + Fly.io, auth, KPI pipeline

All MIT-licensed. github.com/rfi-irfos | ternlang.com

Three concrete angles we'd like to explore:

1. @sparseskip — sparse inference for your pipeline

@sparseskip skips zero-weight expert activations at inference time. Ternary weights ({-1, 0, +1}) have a very high zero-weight rate by design, and every zero is a multiply-accumulate that can be skipped, so the gains are especially large on ternary models. We're hitting 83 tok/s on modest hardware in our own benchmarks. On mobile CPUs, where every cycle matters, this could meaningfully improve your tok/s numbers. Happy to discuss how it could fit into the llama.rn / llama.cpp layer.
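To make the general idea concrete, here is a minimal sketch of why zeros in a ternary weight matrix translate into skipped work. This is not the @sparseskip implementation (which is not described in this post); it is a generic illustration, and the function names `pack_nonzero` and `sparse_ternary_matvec` are hypothetical. It precomputes, per output row, which inputs carry a +1 or -1 weight, then computes the matrix-vector product with additions and subtractions only, never touching the zero entries.

```python
from typing import List, Sequence, Tuple

Packed = List[Tuple[List[int], List[int]]]

def pack_nonzero(weights: Sequence[Sequence[int]]) -> Packed:
    """For each row, record the column indices holding +1 and -1 (zeros are dropped)."""
    packed = []
    for row in weights:
        plus = [j for j, w in enumerate(row) if w == 1]
        minus = [j for j, w in enumerate(row) if w == -1]
        packed.append((plus, minus))
    return packed

def sparse_ternary_matvec(packed: Packed, x: Sequence[float]) -> List[float]:
    """Compute W @ x using only adds/subtracts over the nonzero weights."""
    return [sum(x[j] for j in plus) - sum(x[j] for j in minus)
            for plus, minus in packed]

def dense_matvec(weights: Sequence[Sequence[int]], x: Sequence[float]) -> List[float]:
    """Reference implementation that multiplies every weight, zeros included."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

# Toy example: 8 of the 12 weights are zero and are never visited at runtime.
weights = [
    [1, 0, -1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
x = [2.0, 3.0, 5.0, 7.0]

packed = pack_nonzero(weights)
result = sparse_ternary_matvec(packed, x)
zero_rate = sum(w == 0 for row in weights for w in row) / 12
```

In this toy case the sparse path visits 4 weights instead of 12 and produces the same result as the dense reference. A production kernel would work on packed bit representations rather than Python lists, so actual speedups depend on memory layout and hardware.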

2. albert. as a model in your browser

albert. will export to GGUF. As far as we know, a ternary MoE at 4–8GB would be the first model of its kind in a mobile app. The quality-per-size tradeoff is the whole point of ternary quantization, and it fits your 4GB device constraint story well.
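As a rough back-of-the-envelope check on the size argument, the sketch below estimates how many weights fit in a given byte budget at different bit widths. The figures are illustrative assumptions, not albert.'s actual format: log2(3) ≈ 1.585 bits is the information-theoretic minimum for a ternary weight, 2 bits is a simple packing, and 16 bits stands in for an unquantized half-precision model. It ignores scales, embeddings, the KV cache, and runtime overhead, so usable capacity on a real device will be lower.

```python
import math

GB = 10**9  # decimal gigabytes, an assumption for this estimate

BITS_TERNARY_IDEAL = math.log2(3)  # theoretical minimum per ternary weight
BITS_TERNARY_2BIT = 2.0            # simple 2-bit packing (assumed, not a specific GGUF type)
BITS_FP16 = 16.0                   # half-precision baseline

def max_params(budget_bytes: int, bits_per_weight: float) -> int:
    """Number of weights that fit in budget_bytes at the given bit width."""
    return int(budget_bytes * 8 / bits_per_weight)

budget = 4 * GB
params_fp16 = max_params(budget, BITS_FP16)
params_2bit = max_params(budget, BITS_TERNARY_2BIT)
params_ideal = max_params(budget, BITS_TERNARY_IDEAL)

for label, n in [("fp16", params_fp16),
                 ("ternary, 2-bit packing", params_2bit),
                 ("ternary, ideal packing", params_ideal)]:
    print(f"{label}: about {n / 1e9:.1f}B weights in 4 GB")
```

Under these assumptions, a 4 GB budget holds roughly 2B weights at fp16 versus 16B at 2-bit ternary packing, which is the gap the quality-per-size argument relies on.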

3. MCP tooling — we have a head start

We noticed MCP server support is on your Pro roadmap. We have a live MCP endpoint (published on Smithery, running on Fly.io) and have been building that infrastructure for a while. If you're building the client side and we have the server side, this is a natural handoff.
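For reference, the client/server boundary in MCP is a JSON-RPC 2.0 message exchange. The sketch below builds the two requests a client would typically send to discover and invoke tools (`tools/list` and `tools/call`). The helper `jsonrpc_request`, the tool name `example_tool`, and its arguments are hypothetical placeholders, not part of our endpoint; the transport and the `initialize` handshake that precedes these calls are omitted.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_request_ids = count(1)  # JSON-RPC requests need a unique id per request

def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object for the given method."""
    message: Dict[str, Any] = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
    }
    if params is not None:
        message["params"] = params
    return message

# Ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Invoke one tool by name; the name and arguments here are made up.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "example_tool", "arguments": {"query": "hello"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_tool))
```

An app-side client that can emit these messages and parse the responses would be able to talk to any conforming MCP server, which is what makes the client/server split a clean handoff.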


We use Claude Code, move fast, and are genuinely excited about where this could go — a fully offline ternary LLM app with a complete tool ecosystem is not a small thing. Not proposing anything formal — just opening the conversation.
