Family App

The AI assistant my own household runs on. No public users, no demo data. Real memory, real failover, real observability, because it has to actually work every day.

Why this stack

This had to run on my wife's phone every day, not just in a demo. React Native + Expo was the only realistic path to a real iOS app with native calendar and reminders access (EventKit) and, later, real Siri App Intents, without maintaining a separate native codebase. The backend is a Vercel-deployed Node service using the Vercel AI SDK's tool-calling loop, which meant the same tool definitions worked unchanged across Claude, OpenAI, and xAI when I added provider failover. Supabase covers auth, Postgres, and row-level security in one hosted service instead of three. None of this was theoretical: it replaced an earlier WhatsApp-bot version of the same assistant that had a habit of dropping operations and going silent during provider outages, and this rebuild exists specifically to close those failure modes.

AI-assist note

Spec-driven across 35 specs, built with Claude Code: I wrote every spec and reviewed every diff; Claude Code wrote most of the implementation. The memory-scoping design, the failover chain, the observability fixes, and the Siri App Intents work below all came out of that spec-plan-implement-review loop.

Stack

TypeScript
React Native (Expo SDK 52)
Swift (App Intents)
Node.js
Vercel AI SDK
Supabase (Postgres + Auth + RLS)
Mem0 Platform
Claude (+ OpenAI, xAI Grok via failover)

Domains

Agentic AI
Mobile / Native Integration
Systems Reliability
Observability

Live: 2 mo

Infra: TypeScript / Node.js on Vercel, React Native + Expo SDK 52, Supabase Postgres (RLS), Mem0 Platform (dual-scope memory)

Auth: Supabase Auth (email/password) + per-provider API keys (Anthropic, OpenAI, xAI, Mem0, YNAB, Resend) via Vercel env

Specs: 35

Why this exists

Every family logistics tool I’d tried treated calendar, reminders, budget, and meal planning as separate apps that didn’t talk to each other. This app is the opposite bet: 1 assistant that reads calendar and reminder data straight off the device, talks to the household budget API, holds a memory layer scoped per person and per household, and reasons over all of it in a single chat interface. It grew out of an earlier WhatsApp-based version of the same idea that worked until it didn’t: a provider outage or a dropped API call would fail silently, and nobody found out until something was visibly missing. This rebuild’s entire engineering story is about closing exactly those silent-failure paths.

Architecture

Every external call, device-native or cloud, goes through 1 generic resilience wrapper before it reaches anything else: a retry queue with exponential backoff and a per-integration circuit breaker, so a failing integration degrades instead of cascading. The chat backend’s own AI calls get the same treatment through a 3-tier provider chain: Claude first, OpenAI and then xAI Grok only on retriable failures, never on a 400-class bug. A structured log line and a 15-minute health-check cron sit downstream of all of it, watching for the specific failure shape (a technically-successful, functionally-empty response) that used to go unnoticed for days. Most runs end quietly at the mobile app. A detected failure instead produces a cooldown-gated push alert, so the household never has to be the one to notice something broke.

What shipped

35 specs took this from a WhatsApp bot that quietly dropped operations to a mobile app with a memory layer that keeps personal and household facts properly scoped, a 3-provider failover chain with a standing test path to verify it on demand, a generic retry-queue-and-circuit-breaker wrapper covering every external integration, a health-check cron that surfaces a silent failure as an alert within 15 minutes, and a native Siri and App Intents layer that finished the deep-link wiring an earlier feature had left half-done. It runs on one real household’s calendar, reminders, and budget every day. There’s no public URL for this one: it holds real family data, so the repo and the live instance both stay private. What’s shown here is the engineering, not the household.

Skill stories

Each story below covers the decision, what broke, how it got measured, and how it got fixed.

Dual-Scope Memory Layer (Agentic AI / Memory Systems)
Multi-Provider AI Failover (Systems Reliability)
Silent-Failure Observability (Observability)
Native Siri / App Intents Integration (Mobile / Native Integration)
Generic Retry Queue + Circuit Breaker (Systems Reliability)

Agentic AI / Memory Systems

Dual-Scope Memory Layer

Decision: The assistant's memory needed to tell 2 different kinds of facts apart: things true of 1 person and things true of the whole household. I split storage into a user scope (keyed per family member, personal preferences and communication style) and a family scope (keyed per household, shared routines and facts), searched both in parallel per message, and merged the results into the assistant's context.
What broke: Before this, memory was flat. A preference set by 1 family member had no scope boundary, so nothing stopped it from leaking into another member's suggestions, and there was no way to distinguish 'the assistant learned this from a conversation' from 'someone explicitly told it to remember this.'
How I measured it: Tested with 2 members setting conflicting preferences in the same category and confirmed each member's queries only reflected their own scope. Separately, forced a 5-second delay against the memory service to test the 3-second timeout cutoff, and confirmed chat latency stayed within 500ms of the no-memory baseline.
How I fixed it: Added scope-keyed storage (`user:{id}` / `family:{id}`), a timeout wrapper around every memory call so a slow or dead memory service can never delay a chat response, and structured failure logging so a silent memory outage shows up in logs instead of just quietly returning nothing forever.
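
The scoped lookup and timeout wrapper described above could look roughly like the sketch below. It is an illustration under assumptions, not the app's actual code: `MemoryHit`, `MemorySearch`, `withTimeout`, and `recallForMessage` are hypothetical names, the search function stands in for whatever client talks to the memory service, and only the `user:{id}` / `family:{id}` keys and the 3-second cutoff come from the text.

```typescript
// Hypothetical shape of a memory hit; not the Mem0 SDK's real types.
interface MemoryHit {
  scope: string;
  text: string;
}

// Stand-in for whatever client actually queries the memory service.
type MemorySearch = (scopeKey: string, query: string) => Promise<MemoryHit[]>;

// Settle with `fallback` if `work` rejects or takes longer than `ms`,
// logging a structured line so the miss is visible instead of silent.
function withTimeout<T>(work: Promise<T>, ms: number, fallback: T, label: string): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => {
      console.warn(JSON.stringify({ event: "memory_timeout", label, ms }));
      resolve(fallback);
    }, ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err: unknown) => {
        clearTimeout(timer);
        console.warn(JSON.stringify({ event: "memory_error", label, error: String(err) }));
        resolve(fallback);
      },
    );
  });
}

// Search the personal and household scopes in parallel and merge the results.
// A slow or failing scope contributes an empty list rather than blocking chat.
async function recallForMessage(
  search: MemorySearch,
  userId: string,
  familyId: string,
  query: string,
  timeoutMs: number = 3000,
): Promise<MemoryHit[]> {
  const userKey = `user:${userId}`;
  const familyKey = `family:${familyId}`;
  const [personal, shared] = await Promise.all([
    withTimeout<MemoryHit[]>(search(userKey, query), timeoutMs, [], userKey),
    withTimeout<MemoryHit[]>(search(familyKey, query), timeoutMs, [], familyKey),
  ]);
  return [...personal, ...shared];
}
```

Because each scope is wrapped separately, a stalled family-scope search still lets the personal-scope results through, and the worst case for the whole lookup is one timeout window rather than two.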

Systems Reliability

Multi-Provider AI Failover

Decision: Every chat message and every scheduled job's AI call went through a single hardcoded provider. I built an ordered fallback chain instead: Claude primary, then OpenAI, then xAI Grok, each with its own timeout, retrying only on retriable failures (429, 500, 502, 503, timeouts) and passing 400-class errors straight through since those are bugs, not outages.
What broke: The predecessor version of this assistant (a WhatsApp bot) had exactly 1 provider wired in. A provider outage didn't produce an error message, it produced nothing: a scheduled job would silently fail and the household would just never get that day's update.
How I measured it: Simulated a primary-provider outage with a test-mode env flag, sent chat messages through it, and checked structured logs for provider name, which attempts failed, latency per attempt, and a failover flag, across both interactive chat and scheduled jobs.
How I fixed it: Built the 3-tier chain with per-provider timeouts and a standing simulated-outage test path, so failover is something I can verify on demand instead of something I hope works the next time a provider actually goes down.
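
A minimal sketch of an ordered fallback chain with the retriable/non-retriable split described above. Everything here is assumed for illustration: the `Provider` interface, `runWithFailover`, and the convention that a failed call rejects with an error carrying a numeric `status` are hypothetical, and `call` stands in for whatever actually invokes the model. Only the ordering, the per-provider timeout, and the retriable set (429, 500, 502, 503, timeouts) come from the text.

```typescript
// Hypothetical provider handle; `call` stands in for the real model invocation.
interface Provider {
  name: string;
  timeoutMs: number;
  call: (prompt: string) => Promise<string>;
}

interface AttemptLog {
  provider: string;
  ok: boolean;
  latencyMs: number;
  error?: string;
}

interface FailoverResult {
  text: string;
  provider: string;
  failedOver: boolean;
  attempts: AttemptLog[];
}

const RETRIABLE_STATUSES = new Set<number>([429, 500, 502, 503]);

// Assumption: failed calls reject with an error object carrying a numeric `status`.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Timeouts and the listed statuses are outages; anything else is treated as a bug.
function isRetriable(err: unknown): boolean {
  if (err instanceof Error && err.name === "TimeoutError") return true;
  const status = statusOf(err);
  return status !== undefined && RETRIABLE_STATUSES.has(status);
}

function callWithTimeout(provider: Provider, prompt: string): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    const timer = setTimeout(() => {
      const err = new Error(`${provider.name} timed out after ${provider.timeoutMs}ms`);
      err.name = "TimeoutError";
      reject(err);
    }, provider.timeoutMs);
    provider.call(prompt).then(
      (text) => {
        clearTimeout(timer);
        resolve(text);
      },
      (err: unknown) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Try providers in order; move on only when the failure is retriable.
async function runWithFailover(providers: Provider[], prompt: string): Promise<FailoverResult> {
  const attempts: AttemptLog[] = [];
  for (const provider of providers) {
    const started = Date.now();
    try {
      const text = await callWithTimeout(provider, prompt);
      attempts.push({ provider: provider.name, ok: true, latencyMs: Date.now() - started });
      return { text, provider: provider.name, failedOver: attempts.length > 1, attempts };
    } catch (err) {
      attempts.push({
        provider: provider.name,
        ok: false,
        latencyMs: Date.now() - started,
        error: String(err),
      });
      if (!isRetriable(err)) throw err; // 400-class: pass straight through
    }
  }
  throw new Error(`all providers failed: ${attempts.map((a) => a.provider).join(", ")}`);
}
```

The `attempts` array carries the fields the measurement step checks for (provider name, which attempts failed, latency per attempt) and `failedOver` is the failover flag, so the same object can be written out as the structured log line.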

Observability

Silent-Failure Observability

Decision: An AI call can return HTTP 200 with empty text and no usable tool output, which looks like success to everything downstream. I made that specific shape count as an error instead of a stored success, added it to the existing error tracker, and added a 15-minute cron that scans every household for repeated failures, stuck conversations, open circuit breakers, and disabled background jobs.
What broke: That empty-response shape was the root cause of a failure that ran for days before anyone noticed, because nothing distinguished it from a normal successful turn. It just sat in the conversation history looking fine.
How I measured it: Forced an empty-text, no-tool-result response in a test conversation and confirmed the error tracker caught it and the user saw a clear fallback message instead of blank output. Checked the health endpoint separately, confirming it reports pass/warn/fail per component (database, background jobs, circuit breakers) instead of a single up/down flag.
How I fixed it: Converted the empty-response case from a silent success into a tracked, alertable error, added a cooldown-gated push notification so repeat failures for the same household don't spam duplicate alerts, and added structured per-interaction logging (conversation id, provider, latency, tool-call count, error type) so the next incident is a log search instead of a guess.
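
The empty-response check and the alert cooldown can be sketched as below. This is an illustration under assumptions: `TurnResult`, `isSilentFailure`, `finalizeTurn`, `AlertCooldown`, and the fallback wording are all hypothetical, and the cooldown length is left as a parameter because the text does not state one.

```typescript
// Hypothetical summary of one AI turn as seen by the chat backend.
interface TurnResult {
  httpStatus: number;
  text: string;
  toolResults: unknown[];
}

// The failure shape from the text: HTTP 200, empty text, no usable tool output.
function isSilentFailure(turn: TurnResult): boolean {
  return turn.httpStatus === 200 && turn.text.trim().length === 0 && turn.toolResults.length === 0;
}

// Hypothetical wording for the user-facing fallback.
const FALLBACK_MESSAGE = "Sorry, I couldn't produce an answer just now. Please try again.";

// Classify the turn so an empty reply is recorded as an error, not stored as a success.
function finalizeTurn(turn: TurnResult): { status: "ok" | "error"; text: string } {
  if (isSilentFailure(turn)) {
    return { status: "error", text: FALLBACK_MESSAGE };
  }
  return { status: "ok", text: turn.text };
}

// Allows at most one alert per household per cooldown window.
class AlertCooldown {
  private readonly lastSent = new Map<string, number>();
  private readonly cooldownMs: number;

  constructor(cooldownMs: number) {
    this.cooldownMs = cooldownMs;
  }

  // Returns true and records the send time only when the window has elapsed.
  shouldAlert(householdId: string, nowMs: number): boolean {
    const last = this.lastSent.get(householdId);
    if (last !== undefined && nowMs - last < this.cooldownMs) {
      return false;
    }
    this.lastSent.set(householdId, nowMs);
    return true;
  }
}
```

This sketch keeps the cooldown state in memory for brevity. In a serverless deployment that state would need to live somewhere durable, since an in-process map does not survive between cron invocations.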

Mobile / Native Integration

Native Siri / App Intents Integration

Decision: The app's voice layer started as URL-scheme shortcuts that had to be manually configured. I replaced that with 6 real iOS App Intents, written in Swift and injected through a custom Expo config plugin, that register themselves with Siri and Spotlight automatically on install, plus 4 home-screen quick actions, without hand-maintaining a parallel native project.
What broke: The deep-link listener those intents needed to target was only half-wired, left over from an earlier feature that never finished the navigation-layer plumbing. Voice commands had nowhere reliable to land.
How I measured it: Tested all 6 registered phrases across cold launch, backgrounded, and foregrounded app states, and separately verified the 1 free-form intent ('ask about X') correctly extracts and sends the spoken text, accounting for normal Siri transcription accuracy rather than expecting it to be perfect.
How I fixed it: Finished the deep-link routing so every recognized path lands on the right screen, added a fallback to the default tab for anything unrecognized instead of an error or crash, and confirmed the app degrades gracefully on iOS versions below the App Intents minimum: it just loses the voice layer, while deep links and quick actions keep working.
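
On the JavaScript side, routing with a default-tab fallback might look like the sketch below. The `familyapp://` scheme, the screen names, the `ask` path, and the `q` parameter are all hypothetical; the text only establishes that recognized paths land on the right screen, unrecognized ones fall back to the default tab, and the free-form intent forwards the spoken text.

```typescript
// Hypothetical screen names; the real app's tabs are not listed in the text.
type Screen = "chat" | "calendar" | "reminders" | "budget";

interface Route {
  screen: Screen;
  query?: string; // free-form text from an "ask about X" style intent
}

const DEFAULT_SCREEN: Screen = "chat";

// Hypothetical path-to-screen table. A Map avoids accidental matches on
// inherited object keys such as "constructor".
const ROUTES = new Map<string, Screen>([
  ["chat", "chat"],
  ["calendar", "calendar"],
  ["reminders", "reminders"],
  ["budget", "budget"],
  ["ask", "chat"],
]);

// Pull one parameter out of a raw query string; malformed encoding yields undefined.
function queryParam(query: string, key: string): string | undefined {
  for (const pair of query.split("&")) {
    const eq = pair.indexOf("=");
    const name = eq === -1 ? pair : pair.slice(0, eq);
    if (name !== key) continue;
    const raw = eq === -1 ? "" : pair.slice(eq + 1);
    try {
      return decodeURIComponent(raw.replace(/\+/g, " "));
    } catch {
      return undefined;
    }
  }
  return undefined;
}

// Map any incoming URL to a route. Anything unparseable or unrecognized
// lands on the default screen instead of throwing.
function resolveDeepLink(url: string): Route {
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^?#]*)(?:\?([^#]*))?/i.exec(url.trim());
  if (match === null) return { screen: DEFAULT_SCREEN };
  const path = (match[1] ?? "").replace(/^\/+|\/+$/g, "").toLowerCase();
  const screen = ROUTES.get(path);
  if (screen === undefined) return { screen: DEFAULT_SCREEN };
  const query = path === "ask" ? queryParam(match[2] ?? "", "q") : undefined;
  return query === undefined ? { screen } : { screen, query };
}
```

The URL is parsed with a regular expression rather than the `URL` class so the sketch does not depend on how completely a given runtime implements custom-scheme URL parsing.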

Systems Reliability

Generic Retry Queue + Circuit Breaker

Decision: Every integration (calendar writes, budget updates, memory storage) was handling failure differently, if at all. I built 1 generic `withResilience()` wrapper providing retry-with-exponential-backoff, a durable Supabase-backed retry queue, and a per-integration circuit breaker, so any external call gets the same failure handling without hand-rolling it at each call site.
What broke: The predecessor bot's failure mode for a failed external call was that the operation just disappeared. A dropped calendar write or a dropped budget update had no queue, no retry, and no record that it had ever been attempted.
How I measured it: Forced 5 consecutive failures against 1 integration and confirmed the circuit opened, meaning later calls were skipped immediately instead of each one burning a 25-plus-second timeout. Waited out the cooldown window and confirmed the half-open test call closed the circuit again on success.
How I fixed it: Every transient failure now lands in a durable queue with an idempotency key, retries on a backoff schedule, and moves to a dead-letter state with a push notification if it exhausts its attempts. The failure mode changed from silently lost to queued, retried, or explicitly surfaced.
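
A minimal sketch of the breaker and backoff logic such a wrapper might rest on. This is not the actual `withResilience()` implementation: `CircuitBreaker`, `backoffDelayMs`, and `planRetry` are hypothetical names, the thresholds and delays are parameters rather than the app's real values, and the durable Supabase-backed queue and idempotency keys are left out.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// One breaker per integration: opens after `threshold` consecutive failures,
// then allows a single trial call once `cooldownMs` has passed.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold: number, cooldownMs: number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  state(nowMs: number): BreakerState {
    if (this.openedAt === null) return "closed";
    return nowMs - this.openedAt >= this.cooldownMs ? "half-open" : "open";
  }

  // While open, callers skip the integration immediately instead of waiting on a timeout.
  canAttempt(nowMs: number): boolean {
    return this.state(nowMs) !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  // A failure during half-open restarts the cooldown, because the count is already at threshold.
  recordFailure(nowMs: number): void {
    this.failures += 1;
    if (this.failures >= this.threshold) {
      this.openedAt = nowMs;
    }
  }
}

// Exponential backoff: baseMs, 2x, 4x, ... capped at maxMs. `attempt` is zero-based.
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

type RetryPlan = { action: "retry"; delayMs: number } | { action: "dead-letter" };

// Decide what happens to a queued operation after `attemptsSoFar` failed tries.
function planRetry(attemptsSoFar: number, maxAttempts: number, baseMs: number, maxMs: number): RetryPlan {
  if (attemptsSoFar >= maxAttempts) {
    return { action: "dead-letter" }; // surface it explicitly instead of dropping it
  }
  return { action: "retry", delayMs: backoffDelayMs(attemptsSoFar - 1, baseMs, maxMs) };
}
```

Passing the current time in as `nowMs` instead of reading the clock inside the class keeps the breaker deterministic, so the open, half-open, and closed transitions can be checked without waiting out a real cooldown.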