A coding agent for any model
Works with frontier models, but also strives to be as reliable as possible with smaller models, including local ones. Designed for tight context windows and limited resources. Free, open-source, and easy to set up.
Works with your stack
Why Swival
Built for the reality of small context windows, limited resources, and models that need careful prompting to produce good output.
Reliable with small models
Context management is one of its strengths: the agent keeps the prompt clean and focused and avoids unnecessary bloat. Graduated compaction and persistent state mean it doesn't lose track of work, even under tight limits.
Your models, your way
Auto-discovers your LM Studio or llama.cpp model, or point it at HuggingFace, OpenRouter, Google Gemini, ChatGPT Plus/Pro, AWS Bedrock, or any OpenAI-compatible server. You pick the model and the infrastructure.
Review loop and benchmarking
A configurable review loop with LLM-as-a-judge support. JSON reports capture timing, tool usage, and context events. Useful for comparing models, settings, skills, and MCP servers on real coding tasks.
Secrets stay on your machine
With --encrypt-secrets enabled, API keys and credentials in LLM messages are encrypted before they leave your machine. The model never sees the real values. Decryption happens locally when the response comes back, so tools still work normally.
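The round trip can be sketched as follows. This is an illustrative stand-in only: it uses opaque token substitution in place of real encryption, and it is not Swival's actual implementation.

```python
import secrets

def redact(message: str, secret_values: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each secret with an opaque placeholder before the message is sent out."""
    mapping: dict[str, str] = {}
    for value in secret_values:
        token = f"<ENC:{secrets.token_hex(8)}>"  # opaque stand-in for the ciphertext
        mapping[token] = value
        message = message.replace(value, token)
    return message, mapping

def restore(response: str, mapping: dict[str, str]) -> str:
    """Swap the placeholders back locally so tools still see the real values."""
    for token, value in mapping.items():
        response = response.replace(token, value)
    return response

# The model only ever sees the placeholder, never the real key.
msg, mapping = redact("curl -H 'Authorization: Bearer sk-live-abc123'", ["sk-live-abc123"])
```

The key point is that the mapping from placeholder back to real value never leaves the local process.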
Cross-session memory
The agent remembers things across sessions. Relevant past notes are retrieved via BM25 ranking, so context from earlier work carries forward without bloating the prompt. Use /learn to teach it on the spot.
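The retrieval step can be illustrated with a minimal BM25 ranker. This is a sketch of the ranking scheme itself, not Swival's code; the tokenization and parameter choices here are assumptions.

```python
import math
from collections import Counter

def bm25_rank(query: str, notes: list[str], k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Rank stored notes against a query with BM25, best match first."""
    docs = [note.lower().split() for note in notes]
    avgdl = sum(len(d) for d in docs) / len(docs)  # average note length
    n = len(docs)
    df: Counter = Counter()  # document frequency of each term
    for d in docs:
        df.update(set(d))
    scored = []
    for d, note in zip(docs, notes):
        tf = Counter(d)
        score = 0.0
        for term in set(query.lower().split()):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, note))
    return [note for score, note in sorted(scored, key=lambda s: -s[0])]
```

Because only the top-ranked notes are injected, past context carries forward without flooding the prompt.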
Pick up where you left off
When a session is interrupted — Ctrl+C, max turns, context overflow — Swival saves its state to disk and resumes automatically next time you run it in the same directory.
A2A server mode
Run swival --serve and your agent becomes an A2A endpoint that other agents can call over HTTP. Multi-turn context, streaming, and bearer auth are built in.
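From the caller's side, talking to the endpoint is an authenticated HTTP POST. The sketch below builds such a request with the standard library; the URL and message shape are illustrative assumptions, not Swival's documented wire format.

```python
import json
import urllib.request

def a2a_request(url: str, text: str, token: str) -> urllib.request.Request:
    """Build an authenticated JSON POST for an A2A-style agent endpoint."""
    # Message shape is a placeholder; consult the A2A spec for the real schema.
    body = json.dumps({"message": {"role": "user", "parts": [{"text": text}]}}).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # bearer auth, as the server expects
        },
        method="POST",
    )

req = a2a_request("http://localhost:8000", "Summarize the repo", "my-token")
```

Sending it with `urllib.request.urlopen(req)` (against a running server) would return the agent's reply.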
Skills, MCP, and more
Extend the agent with SKILL.md-based skills, MCP servers, and A2A agents. Small and hackable: pure Python with no framework. Easy to read and modify.
Get started in seconds
Pick your provider and run your first task.
- Install LM Studio and load a model with tool-calling support. Recommended first model: qwen3-coder-next (great quality/speed tradeoff on local hardware). Start the server.
- Install Swival:
  uv tool install swival
  or on macOS:
  brew install swival/tap/swival
- Run:
swival "Simplify the error handling in src/api.py"
- Start llama-server with a model (--fit on auto-sizes context, -hf downloads from HuggingFace):
  llama-server --reasoning auto --fit on \
    -hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL
- Install Swival:
  uv tool install swival
  or on macOS:
  brew install swival/tap/swival
- Run (model is auto-discovered):
swival --provider llamacpp "Simplify the error handling in src/api.py"
- Create a token at huggingface.co/settings/tokens and export it:
  export HF_TOKEN=hf_...
- Install Swival:
  uv tool install swival
  or on macOS:
  brew install swival/tap/swival
- Run with a tool-calling model (e.g. GLM-5):
swival "Simplify the error handling in src/api.py" \
  --provider huggingface --model zai-org/GLM-5
- Sign up at openrouter.ai and export your API key:
  export OPENROUTER_API_KEY=sk_or_...
- Install Swival:
  uv tool install swival
  or on macOS:
  brew install swival/tap/swival
- Run with any model on the platform (e.g. GLM-5):
swival "Simplify the error handling in src/api.py" \
  --provider openrouter --model z-ai/glm-5
Swival also works with ChatGPT Plus/Pro (uses your existing subscription via OAuth), Google Gemini, AWS Bedrock, and any OpenAI-compatible server (ollama, mlx_lm.server, vLLM, etc.). See all providers.
💬 Interactive mode
For back-and-forth sessions, just run swival with no arguments. The agent keeps the full conversation in memory, so you can iterate on a task across multiple turns.
swival
🐍 Python library
Embed an agent loop directly in your own code. The simplest way is swival.run():
import swival
answer = swival.run(
    "What files handle authentication?",
    provider="openrouter",
    model="z-ai/glm-5",
)
For multi-turn conversations, use the Session class — see the Python API docs.