Local LLM - powered by Gemma

Moose thinks on your machine.

Every brief, insight, and draft comes from a real Gemma model running locally - unlimited, private, and free. No prompt ever leaves your desk. Bring your own key when you want a frontier model in the loop.

Download free · See the models

Runs offline. Works on Windows & macOS.

Hi, Moose · Chat Gemma 4B · local

draft me a tight FAQ answer on "is local AI private"

on it. running this through Gemma right here - nothing's going to the cloud.

Running on your machine

cost

tokens sent out

100%

on device

Per query, always

The local model has no per-token bill. Chat, brief, and draft as much as you like.

Unlimited

Chats, with memory

Moose keeps your context and projects in mind across every conversation.

Nothing

Leaves your desk

Prompts, context, and drafts stay on disk - unless you connect a key yourself.

Pick your size

A model that fits the laptop you already have.

Gemma comes in four sizes. Start small on an older machine, or load the big one on a workstation for the deepest analysis. The model downloads once, then runs fully offline - and you can switch any time.

Downloads in the background, runs offline forever after.

Uses your GPU when it can, your CPU when it can't.

Switch sizes whenever - the cost stays exactly the same.

Choose your model size

Light

Gemma 1B

8 GB RAM · ~0.8 GB

Balanced

Gemma 4B

16 GB RAM · ~3 GB

Pro

Gemma 12B

32 GB RAM · ~8 GB

Max

Gemma 27B

64 GB or GPU · ~17 GB

Gemma 4B · ≈1.2s a reply

The everyday default. Strong briefs and insights with no perceptible wait. Most people stay here.

Best for

Everyday work

Needs

16 GB RAM
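For readers who want the size table above as a rule of thumb, here is a minimal sketch of picking the largest tier a machine's RAM can hold. The tier names, RAM figures, and download sizes come from the list above; the `Tier` class and `largest_tier_that_fits` function are hypothetical names for illustration, not part of Hi, Moose. The sketch also assumes RAM is the only input and ignores the "or GPU" path listed for the Max tier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    label: str          # marketing name of the tier
    model: str          # Gemma size
    min_ram_gb: int     # RAM listed for the tier
    download_gb: float  # approximate download size


# Values copied from the size list; ordered smallest to largest.
TIERS = [
    Tier("Light", "Gemma 1B", 8, 0.8),
    Tier("Balanced", "Gemma 4B", 16, 3.0),
    Tier("Pro", "Gemma 12B", 32, 8.0),
    Tier("Max", "Gemma 27B", 64, 17.0),
]


def largest_tier_that_fits(ram_gb: float) -> Optional[Tier]:
    """Return the biggest tier whose listed RAM fits, or None if none do."""
    fitting = [t for t in TIERS if t.min_ram_gb <= ram_gb]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    tier = largest_tier_that_fits(16)
    print(tier.label if tier else "No tier fits this machine")
```

A machine with a capable GPU may be able to run the Max tier with less system RAM, which this RAM-only check would miss.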

Private by default

Your work stays on your disk. Not on someone's server.

Most AI tools ship your prompts, your data, and your drafts to a cloud you don't control. Hi, Moose flips that: the model lives next to your files, so the default is privacy, not a setting you have to find.

Stays on your machine

Every prompt you type to the local model

Your context, brand voice, and memory

Drafts, briefs, and your library

Search Console data you connect

Leaves only if you say so

The one time anything goes out is when you plug in your own API key for a frontier model, or connect a CMS to publish. Both are opt-in, and Moose tells you plainly each time.

A frontier prompt - only with a key you added

A publish - only to a CMS you connected

Bring your own key

Want a frontier model in the loop? Plug in your key.

When a job calls for GPT, Claude, or Gemini, add your own API key and Moose routes that one task to it. You pay your provider directly, at their price - never a markup to us. The local model handles everything else for free.

No markup, ever. Your key, your provider, their rate.

Per-task routing. Use frontier power only where it earns its cost.

Keys stay local. Stored on your machine, never on ours.
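The per-task routing described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Hi, Moose's actual implementation: `route_task`, `LOCAL_MODEL`, and the key dictionary are hypothetical names, and falling back to the local model when no key is present is an assumption based on the rule that a frontier prompt goes out only with a key you added.

```python
from typing import Dict, Optional

# Hypothetical label for the always-available local model.
LOCAL_MODEL = "Gemma 4B (local)"


def route_task(requested_provider: Optional[str], keys: Dict[str, str]) -> str:
    """Decide which model handles a single task.

    A task goes to a frontier provider only when the user asked for that
    provider AND has stored a non-empty key for it. Everything else stays
    on the local model (assumed fallback).
    """
    if requested_provider and keys.get(requested_provider):
        return requested_provider
    return LOCAL_MODEL


if __name__ == "__main__":
    # Placeholder keys kept in a local dict for the example only.
    stored_keys = {"GPT": "example-key-1", "Claude": "example-key-2"}
    print(route_task("Claude", stored_keys))  # a key exists, so the task is routed out
    print(route_task("Gemini", stored_keys))  # no key added, so it stays local
    print(route_task(None, stored_keys))      # no provider requested, so it stays local
```

Routing each task separately keeps the default private and free, and sends work to a paid provider only when the user asks for it.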

Connected models

Gemma 4B · local

Always on · free · private

Default

GPT · your key

Billed by OpenAI

Connected

Claude · your key

Billed by Anthropic

Connected

Gemini · your key

Billed by Google

Add key


Keep the smarts on your side of the wire.

Free to download, unlimited to use, private by default. Bring your own keys only when you want to.

Download for Mac · Download for Windows
