Raspy: Putting Claude in a Box
Two months ago I bought a cyberdeck kit. Then a 3D printer. Now I have an ESP32 that talks to Claude.
Two months ago I bought a Raspberry Pi cyberdeck kit off Etsy. Cool little thing - Pi 5, 3D printed case, mechanical keyboard. Two weeks later I bought a 3D printer because I wanted to modify the case. One thing led to another.
Now I have an ESP32-S3 that listens for my voice, sends it to my homelab, runs it through Claude, and speaks back. It has a camera. It knows where I am via GPS. It switches between home WiFi and mobile hotspot automatically.
This is Raspy.
What It Is
Raspy is a voice-first interface to Claude that runs on an ESP32-S3. But the ESP32 isn't doing anything smart - it's a thin client. Record audio, ship it to the server, play back whatever comes back. All the intelligence lives on my homelab server.
The pipeline looks like this:
[ESP32 Mic] → WebSocket → [ubuntu-homelab]
├── Wyoming Whisper (speech-to-text)
├── Claude CLI (the brain)
└── Kokoro TTS (text-to-speech)
→ HTTP download → [ESP32 Speaker]
Push a button, talk, release. A few seconds later Claude responds through the speaker.
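To make the shape of that loop concrete, here's a rough sketch of the server side. The WebSocket framing, the end-of-utterance marker, and the transcribe()/synthesize() helpers are placeholders standing in for the real Wyoming Whisper and Kokoro calls; only the claude -p invocation reflects how the CLI is actually driven.

```python
# Sketch of the server side of the loop (not the actual code): collect audio
# over a WebSocket, transcribe it, ask Claude, synthesize the reply.
import asyncio
import subprocess
import websockets

def transcribe(pcm: bytes) -> str:
    """Placeholder for the Wyoming Whisper (speech-to-text) call."""
    raise NotImplementedError

def synthesize(text: str) -> str:
    """Placeholder for the Kokoro TTS call; returns a URL to the rendered audio."""
    raise NotImplementedError

async def handle_utterance(ws):
    chunks = []
    async for msg in ws:            # raw audio chunks from the ESP32 mic
        if msg == b"EOS":           # hypothetical end-of-utterance marker
            break
        chunks.append(msg)

    text = transcribe(b"".join(chunks))
    reply = subprocess.run(
        ["claude", "-p", text],     # non-interactive Claude CLI
        capture_output=True, text=True, timeout=120,
    ).stdout.strip()

    # The ESP32 downloads this over plain HTTP and plays it through the speaker.
    await ws.send(synthesize(reply))

async def main():
    async with websockets.serve(handle_utterance, "0.0.0.0", 8765):
        await asyncio.Future()      # run forever

if __name__ == "__main__":
    asyncio.run(main())
```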
The Status Bar
The display shows everything happening at a glance:
- WiFi signal strength
- Server connection dot (green = connected)
- GPS satellite count (G3 = 3 satellites locked)
- Camera status (green CAM or red NO CAM)
- Battery percentage
- Current time
- Network name (home or mobile)
Each element only redraws when it changes.
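The firmware itself is C++, but the redraw logic boils down to a pattern that's easy to sketch in a few lines of Python: remember the last value drawn for each field and only repaint the ones that changed. The field names and draw callback here are illustrative, not the actual firmware API.

```python
# Redraw-on-change sketch (field names and callback are illustrative).
class StatusBar:
    def __init__(self, draw_field):
        self._draw_field = draw_field   # callback that actually paints a field
        self._last = {}                 # field name -> last value drawn

    def update(self, **fields):
        for name, value in fields.items():
            if self._last.get(name) != value:
                self._draw_field(name, value)   # repaint only this element
                self._last[name] = value

# Example: only the clock repaints on the second call.
bar = StatusBar(lambda name, value: print(f"draw {name}: {value}"))
bar.update(wifi=-61, gps_sats=3, battery=84, clock="14:32")
bar.update(wifi=-61, gps_sats=3, battery=84, clock="14:33")
```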
The Camera
Triple-tap the button and Raspy enters camera mode. Live MJPEG stream from a separate ESP32-CAM shows up on the display. Tap to take a snapshot. Hold to exit back to voice mode.
The snapshots save to my homelab with GPS coordinates in the filename:
~/raspy-snapshots/2026-01-22/143256_40.6782_-73.9442.jpg
Later I can ask Claude "what did I take a picture of earlier?" and it knows where to look.
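Writing a frame in that layout takes only a few lines. The function name and root-path default below are illustrative, but the date folder and HHMMSS_lat_lon.jpg filename match the example above.

```python
# Sketch of saving a snapshot with GPS baked into the filename.
from datetime import datetime
from pathlib import Path

def save_snapshot(jpeg: bytes, lat: float, lon: float,
                  root: Path = Path.home() / "raspy-snapshots") -> Path:
    now = datetime.now()
    day_dir = root / now.strftime("%Y-%m-%d")
    day_dir.mkdir(parents=True, exist_ok=True)
    path = day_dir / f"{now.strftime('%H%M%S')}_{lat:.4f}_{lon:.4f}.jpg"
    path.write_bytes(jpeg)
    return path

# save_snapshot(frame, 40.6782, -73.9442)
# -> ~/raspy-snapshots/2026-01-22/143256_40.6782_-73.9442.jpg
```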
The Setup
Two devices working together:
- Raspy (ESP32-S3): Voice I/O, display, GPS, main interface
- ESP32-CAM: Cheap camera module that serves JPEG over HTTP
Both connect to the same network. Claude has a capture tool it can use when it wants to look - if I say "what's in front of me" it grabs a frame from the ESP32-CAM. GPS location gets passed in with each request so Claude knows where I am.
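On the server, the capture tool can be as simple as an HTTP GET against the camera module. The hostname and /capture path below are assumptions (the stock ESP32-CAM web server exposes a still-image endpoint along these lines), not the project's actual config.

```python
# Sketch of a "look" tool: grab one JPEG frame from the ESP32-CAM over HTTP.
import urllib.request

ESP32_CAM_URL = "http://esp32-cam.local/capture"   # hypothetical address

def capture_frame(timeout: float = 5.0) -> bytes:
    """Return a single JPEG frame from the ESP32-CAM, or raise on failure."""
    with urllib.request.urlopen(ESP32_CAM_URL, timeout=timeout) as resp:
        return resp.read()
```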
Home and Mobile Mode
This thing needed to work both at home and on the go. At home it connects to my regular WiFi and talks to the homelab directly. When I'm mobile, the iPhone plugs into the Raspberry Pi (raspdeck) to provide a hotspot, and the Pi broadcasts a WiFi network for Raspy and the ESP32-CAM to connect to.
The switching is automatic. Plug in the iPhone and raspdeck becomes an AP; unplug it and everything falls back to the home network. Both devices have both networks configured and auto-reconnect.
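I won't claim this is how the raspdeck actually does the toggle, but one plausible shape for it is a small watcher that brings the access point up when the iPhone's tethering interface appears and tears it down when it disappears. The interface name, hostapd, and the polling approach are all assumptions for illustration.

```python
# Illustrative raspdeck-side watcher: AP up when tethered, down when unplugged.
import subprocess
import time
from pathlib import Path

TETHER_IFACE = "eth1"          # iPhone USB tethering often shows up as eth1

def tether_present() -> bool:
    return (Path("/sys/class/net") / TETHER_IFACE).exists()

def set_ap(enabled: bool) -> None:
    action = "start" if enabled else "stop"
    subprocess.run(["systemctl", action, "hostapd"], check=False)

def main(poll_seconds: float = 2.0) -> None:
    ap_on = False
    while True:
        present = tether_present()
        if present != ap_on:
            set_ap(present)
            ap_on = present
        time.sleep(poll_seconds)

if __name__ == "__main__":
    main()
```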
Everything's on Tailscale. The voice server runs on ubuntu-homelab - Claude itself is invoked via claude -p.
The Hardware
The parts:
- Freenove FNK0104 (non-touch) - ESP32-S3 with display and audio codec
- NEO-7M GPS module - for location context
- ESP32-CAM - separate camera module
- Small speaker - for TTS output
The Freenove needs 8MB of PSRAM - without it the audio buffers don't fit. The case is modeled in OpenSCAD and printed on a Bambu A1 Mini.
The green sparkle case is v1 - functional but chunky. Will reprint eventually.
Tinkering with adding a custom wake word ("Hey Raspy") but running into challenges. Will eventually get it working.
Building This
Most of this was built with Claude Code. Not in an "I asked AI to write code" way - more like pair programming where I described what I wanted and we worked through the problems together. The button state machine, the network switching logic, the camera streaming - all of that came out of back-and-forth sessions.
The ESP32 firmware is about 1500 lines of C++. The server is maybe 600 lines of Python. Not a huge codebase, but a lot of moving parts: WebSockets, HTTP, I2S audio, TFT display, GPS parsing, WiFi management.
The source isn't public yet, but I might open source it eventually.