AI System

Multi-provider AI with streaming responses, tool-loop orchestration, and a two-tier memory pipeline.

Providers

| Provider | SDK | Models |
| --- | --- | --- |
| OpenAI | @ai-sdk/openai | gpt-5.4 |
| Anthropic | @ai-sdk/anthropic | sonnet-4.6 |
| Google | @ai-sdk/google | gemini-3-flash, gemini-3.1-pro-preview |

Default model: gpt-5.4

Specialized models used by specific flows:

  • gemini-3-flash-preview (proactive greeting, applet text mode, chat-room auto replies, memory extraction, and daily-notes processing)
  • gemini-3.1-flash-image-preview (applet image generation)

graph TD
    A[User Message] --> B[Chat API]
    B --> C{Provider Selection}
    C -->|OpenAI| D[gpt-5.4]
    C -->|Anthropic| E[claude-sonnet-4-6]
    C -->|Google| F[gemini-3-flash / gemini-3.1-pro-preview]
    D --> G[AI SDK Stream]
    E --> G
    F --> G
    G --> H[Tool Loop + Response Handler]
    H --> I[UI Update]
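
The provider-selection step in the diagram can be pictured as a lookup from model name to provider. The following is a minimal sketch, not the ryOS implementation: `MODEL_PROVIDERS` and `resolveModel` are hypothetical names, the model identifiers are copied from the Providers table, and falling back to the default model for unknown input is an assumption.

```typescript
// Hypothetical sketch: names and fallback behavior are assumptions, not ryOS source.
type Provider = "openai" | "anthropic" | "google";

const DEFAULT_MODEL = "gpt-5.4";

// Model identifiers as they appear in the Providers table.
const MODEL_PROVIDERS: Record<string, Provider> = {
  "gpt-5.4": "openai",
  "sonnet-4.6": "anthropic",
  "gemini-3-flash": "google",
  "gemini-3.1-pro-preview": "google",
};

interface ResolvedModel {
  model: string;
  provider: Provider;
}

// Resolve a requested model to its provider. Missing or unknown names fall
// back to the default model (assumed behavior).
function resolveModel(requested?: string | null): ResolvedModel {
  if (requested && Object.prototype.hasOwnProperty.call(MODEL_PROVIDERS, requested)) {
    return { model: requested, provider: MODEL_PROVIDERS[requested] as Provider };
  }
  return { model: DEFAULT_MODEL, provider: "openai" };
}
```

Keeping the mapping in one table means that adding a model is a one-line change and the fallback path stays explicit.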

Available Tools

| Tool | Description |
| --- | --- |
| launchApp | Open applications (supports Internet Explorer URL + year time-travel launch) |
| closeApp | Close applications |
| ipodControl | iPod playback control: toggle/play/pause/playKnown/addAndPlay/next/previous (+ video/fullscreen/lyrics translation options) |
| karaokeControl | Karaoke playback control (shared music library with iPod, independent playback state) |
| generateHtml | Create HTML applets with title and emoji icon |
| aquarium | Render interactive emoji aquarium in chat |
| list | List VFS items: /Applets, /Documents (includes document names), /Applications, /Music, /Applets Store |
| open | Open files/apps/media from virtual file system |
| read | Read file contents (applets, documents, Applets Store items) |
| write | Create/modify markdown documents (overwrite/append/prepend modes) |
| edit | Edit existing files with precise text replacement |
| searchSongs | Search YouTube for songs (with API-key rotation and retry) |
| settings | Change language, theme, volume, speech, check-for-updates |
| stickiesControl | List/create/update/delete/clear sticky notes |
| infiniteMacControl | Control Infinite Mac emulator (launch system, screen read, mouse/keyboard actions, pause state) |
| calendarControl | Create, update, delete, and list calendar events; create, toggle, delete, and list todos |
| contactsControl | Search, create, update, and delete contacts (with Telegram field support) |
| documentsControl | List, read, write, and edit cloud-synced markdown documents |
| memoryWrite | Unified memory writer (long_term or daily) |
| memoryRead | Unified memory reader (long_term by key or daily by date) |
| memoryDelete | Delete long-term memory by key |
| songLibraryControl | Search ryOS song libraries and cached metadata from server (list/search/get/searchYoutube/add; scopes: user/global/any; Telegram/server-side) |
| web_search | OpenAI provider web search (GPT-5.4 only, authenticated users, with geolocation context) |
| google_search | Google provider web search (Gemini 3 Flash only, authenticated users) |
| webFetch | Server-side URL fetch with HTML-to-text extraction for Ryo (sanitized) |

API Endpoints

| Endpoint | Purpose |
| --- | --- |
| /api/chat | Main chat with streaming, tool-calling, and context-aware prompt assembly |
| /api/ai/extract-memories | Single-pass extraction of daily notes + long-term memories from chat history |
| /api/ai/process-daily-notes | Background processing of past daily notes into long-term memories |
| /api/ai/ryo-reply | Auto-reply generation for chat rooms |
| /api/applet-ai | Applet AI assistant (text + image mode, multimodal input) |
| /api/ie-generate | Internet Explorer time-travel page generation |
| /api/speech | Text-to-speech synthesis |
| /api/audio-transcribe | Audio transcription |
| /api/webhooks/telegram | Telegram bot webhook for DM chat (image support, web search, AI tool execution) |
| /api/cron/telegram-heartbeat | AI-powered proactive heartbeat messages via Telegram (cron-triggered) |
| /api/telegram/link/create | Generate Telegram account linking code |
| /api/telegram/link/status | Check Telegram link status |
| /api/telegram/link/disconnect | Disconnect linked Telegram account |

Architecture

sequenceDiagram
    participant U as User
    participant C as Chat UI
    participant API as /api/chat
    participant M as AI Model
    participant T as Tool Runtime
    participant S as ryOS State

    U->>C: Send message
    C->>API: POST messages + systemState
    API->>M: streamText (static+dynamic prompts)
    M-->>API: Tool call(s)
    API->>T: Execute server tools / emit client tools
    T->>S: Read or mutate state
    S-->>T: Result
    T-->>M: Tool result
    M-->>API: Final tokens
    API-->>C: UI message stream
    C-->>U: Display response
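
The exchange above amounts to a loop: call the model, run any tool calls it returns, feed the results back, and stop when it produces final text. The sketch below illustrates that loop with a stubbed, synchronous model. It is an assumption-laden illustration only: the real endpoint streams through the AI SDK, and every type and function name here (`runToolLoop`, `StubModel`, the `id` argument of `launchApp`) is hypothetical.

```typescript
// Hypothetical sketch of a tool loop; none of these names come from ryOS.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

type ModelTurn =
  | { kind: "tool-calls"; calls: ToolCall[] }
  | { kind: "text"; text: string };

interface LoopMessage {
  role: "user" | "assistant" | "tool";
  content: string;
}

type StubModel = (messages: LoopMessage[]) => ModelTurn;
type ToolRuntime = Record<string, (args: Record<string, unknown>) => unknown>;

function runToolLoop(model: StubModel, tools: ToolRuntime, userText: string, maxSteps = 5): string {
  const messages: LoopMessage[] = [{ role: "user", content: userText }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = model(messages);
    if (turn.kind === "text") return turn.text; // final answer ends the loop
    for (const call of turn.calls) {
      const known = Object.prototype.hasOwnProperty.call(tools, call.name);
      const result = known ? tools[call.name](call.args) : { error: `unknown tool: ${call.name}` };
      // Tool results go back into the transcript for the next model call.
      messages.push({ role: "tool", content: JSON.stringify({ tool: call.name, result }) });
    }
  }
  throw new Error("tool loop exceeded maxSteps");
}

// Usage with a stub that requests one tool call, then answers.
const stubModel: StubModel = (messages) =>
  messages.some((m) => m.role === "tool")
    ? { kind: "text", text: "Opened TextEdit." }
    : { kind: "tool-calls", calls: [{ name: "launchApp", args: { id: "textedit" } }] };

const launched: string[] = [];
const reply = runToolLoop(
  stubModel,
  {
    launchApp: (args) => {
      launched.push(String(args.id));
      return { ok: true };
    },
  },
  "open textedit",
);
```

The `maxSteps` bound is a design choice of this sketch: it keeps a model that never stops calling tools from looping forever.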

Tool Handlers

The backend tool registry lives in api/chat/tools/.

Tool profiles control which tools are available per channel:

  • all (web chat): Full tool set — all client-side and server-side tools
  • telegram: Server-side subset — memory, calendar, stickies, contacts, documents, songLibraryControl (all execute server-side via Redis)
  • memory: Memory tools only
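
One way to picture profile filtering is as an allow-list applied to the full registry. This is a rough sketch under assumptions: `selectTools` and the registry shape are hypothetical, and mapping the words "memory, calendar, stickies, contacts, documents" to the tool names from the Available Tools table is an inference rather than something this page states.

```typescript
// Hypothetical sketch: registry shape, helper name, and name mapping are assumptions.
type ToolProfile = "all" | "telegram" | "memory";

const MEMORY_TOOLS = ["memoryWrite", "memoryRead", "memoryDelete"];

// Server-side subset described for the telegram profile.
const TELEGRAM_TOOLS = [
  ...MEMORY_TOOLS,
  "calendarControl",
  "stickiesControl",
  "contactsControl",
  "documentsControl",
  "songLibraryControl",
];

function selectTools<T>(registry: Record<string, T>, profile: ToolProfile): Record<string, T> {
  if (profile === "all") return { ...registry };
  const allowed = profile === "telegram" ? TELEGRAM_TOOLS : MEMORY_TOOLS;
  const selected: Record<string, T> = {};
  for (const name of allowed) {
    const tool = registry[name];
    if (tool !== undefined) selected[name] = tool; // skip names the registry lacks
  }
  return selected;
}
```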

Client execution handlers remain in src/apps/chats/tools/:

  • appHandlers.ts - Launch/close app execution
  • ipodHandler.ts / karaokeHandler.ts - Media control execution
  • calendarHandler.ts - Calendar event management execution
  • contactsHandler.ts - Contact management execution
  • settingsHandler.ts - System settings updates
  • stickiesHandler.ts - Sticky note operations
  • infiniteMacHandler.ts - Infinite Mac control bridge

Shared conversation preparation lives in api/_utils/ryo-conversation.ts:

  • prepareRyoConversationModelInput() - Unified entry point for both web chat and Telegram channels
  • Assembles static system prompt, dynamic context (memories, daily notes, system state), and tools
  • Handles model selection, message enrichment, and OpenAI web search injection

Tool schema highlights

  • launchApp requires internet-explorer launches to provide url and year together (or neither), and validates the year range.
  • ipodControl and karaokeControl schemas enforce action-specific arguments (e.g. addAndPlay requires id; playback-state actions must not include track identifiers).
  • memoryWrite / memoryRead are unified schemas using a type field:
    • long_term (default): key-based memory operations
    • daily: journal-style per-day operations
  • infiniteMacControl supports multimodal screen inspection by returning screen captures that can be converted into model-readable image content.
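
The two rules that are easiest to get wrong, the url/year pairing and the default memory type, can be sketched as plain TypeScript checks. This does not reproduce the real schemas: the validator names, the `id`, `key` and `content` fields, the numeric `year`, and the year bounds passed in by the caller are all assumptions made for illustration.

```typescript
// Hypothetical validators; the real schemas and the accepted year range are not shown here.
interface LaunchAppArgs {
  id: string;
  url?: string;
  year?: number;
}

// Returns a list of problems; an empty list means the arguments pass.
function validateLaunchApp(args: LaunchAppArgs, minYear: number, maxYear: number): string[] {
  const errors: string[] = [];
  if (args.id !== "internet-explorer") return errors;
  const hasUrl = args.url !== undefined;
  const hasYear = args.year !== undefined;
  if (hasUrl !== hasYear) {
    errors.push("url and year must be provided together, or both omitted");
  }
  if (args.year !== undefined && (args.year < minYear || args.year > maxYear)) {
    errors.push(`year must be between ${minYear} and ${maxYear}`);
  }
  return errors;
}

type MemoryType = "long_term" | "daily";

interface MemoryWriteArgs {
  type?: MemoryType;
  key?: string;
  content: string;
}

// Applies the documented default (long_term) and checks the key requirement
// that key-based operations imply (assumed to be mandatory here).
function normalizeMemoryWrite(args: MemoryWriteArgs): MemoryWriteArgs & { type: MemoryType } {
  const type: MemoryType = args.type ?? "long_term";
  if (type === "long_term" && !args.key) {
    throw new Error("long_term writes need a key");
  }
  return { ...args, type };
}
```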

System Prompts

Core prompt constants are defined in api/_utils/_aiPrompts.ts:

  • CORE_PRIORITY_INSTRUCTIONS - Priority and memory-override rules
  • RYO_PERSONA_INSTRUCTIONS - Ryo identity and background
  • ANSWER_STYLE_INSTRUCTIONS - Style and language behavior
  • CODE_GENERATION_INSTRUCTIONS - Applet generation constraints
  • CHAT_INSTRUCTIONS - Chats behavior and memory usage guidance
  • TELEGRAM_CHAT_INSTRUCTIONS - Telegram DM-specific behavior (plain text only, calendar/stickies/contacts/documents tool guidance)
  • TOOL_USAGE_INSTRUCTIONS - VFS and tool workflow rules
  • MEMORY_INSTRUCTIONS - Two-tier memory strategy and tool usage policy
  • IE_HTML_GENERATION_INSTRUCTIONS - Internet Explorer HTML generation rules

Channel-specific prompt composition (via ryo-conversation.ts):

  • Web chat (/api/chat): CORE_PRIORITY + ANSWER_STYLE + RYO_PERSONA + CHAT + TOOL_USAGE + MEMORY + CODE_GENERATION, then appends dynamic user/system state.
  • Telegram (/api/webhooks/telegram): CORE_PRIORITY + ANSWER_STYLE + RYO_PERSONA + TELEGRAM_CHAT + MEMORY, then appends dynamic user context.
  • /api/applet-ai uses a dedicated compact applet system prompt for embedded UI contexts.
  • /api/ie-generate splits prompts into static + dynamic sections for year/URL-aware generation.
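
The channel-specific composition for web chat and Telegram can be sketched as an ordered list of sections per channel, joined with the dynamic context at the end. In this illustration the placeholder strings stand in for the real constants in api/_utils/_aiPrompts.ts, and `composeSystemPrompt`, `CHANNEL_SECTIONS` and the blank-line separator are hypothetical.

```typescript
// Hypothetical sketch: placeholder text replaces the real prompt constants.
const PROMPTS = {
  CORE_PRIORITY_INSTRUCTIONS: "<core priority>",
  ANSWER_STYLE_INSTRUCTIONS: "<answer style>",
  RYO_PERSONA_INSTRUCTIONS: "<ryo persona>",
  CHAT_INSTRUCTIONS: "<chat>",
  TELEGRAM_CHAT_INSTRUCTIONS: "<telegram chat>",
  TOOL_USAGE_INSTRUCTIONS: "<tool usage>",
  MEMORY_INSTRUCTIONS: "<memory>",
  CODE_GENERATION_INSTRUCTIONS: "<code generation>",
};

type PromptName = keyof typeof PROMPTS;
type Channel = "web" | "telegram";

// Section order per channel, as listed above.
const CHANNEL_SECTIONS: Record<Channel, PromptName[]> = {
  web: [
    "CORE_PRIORITY_INSTRUCTIONS",
    "ANSWER_STYLE_INSTRUCTIONS",
    "RYO_PERSONA_INSTRUCTIONS",
    "CHAT_INSTRUCTIONS",
    "TOOL_USAGE_INSTRUCTIONS",
    "MEMORY_INSTRUCTIONS",
    "CODE_GENERATION_INSTRUCTIONS",
  ],
  telegram: [
    "CORE_PRIORITY_INSTRUCTIONS",
    "ANSWER_STYLE_INSTRUCTIONS",
    "RYO_PERSONA_INSTRUCTIONS",
    "TELEGRAM_CHAT_INSTRUCTIONS",
    "MEMORY_INSTRUCTIONS",
  ],
};

// Static sections first, dynamic user/system context appended last.
function composeSystemPrompt(channel: Channel, dynamicContext: string): string {
  const staticSections = CHANNEL_SECTIONS[channel].map((name) => PROMPTS[name]);
  return [...staticSections, dynamicContext].join("\n\n");
}
```

Keeping the static sections in a fixed order and appending the dynamic context last gives each channel a stable prefix, which also matches the static + dynamic split the sequence diagram mentions.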

Memory System

ryOS uses a two-tier Redis-backed memory model:

Tier 1: Daily Notes (journal memory)

  • Append-only entries grouped by date (YYYY-MM-DD) in user timezone.
  • Each entry stores:
    • Unix timestamp (timestamp)
    • UTC ISO timestamp (isoTimestamp)
    • Local date/time (localDate, localTime)
    • Timezone (timeZone)
    • Entry content
  • Daily notes auto-expire after 30 days (TTL).
  • Recent notes from the last 3 days are injected into chat prompt context.
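
A daily-note entry with the fields listed above might be built as follows. This is only a sketch: the field names follow the list, but `buildDailyNoteEntry`, the `content` field name, the millisecond unit for `timestamp`, and the HH:mm:ss shape of `localTime` are assumptions.

```typescript
// Hypothetical sketch: timestamp unit and localTime format are assumptions.
interface DailyNoteEntry {
  timestamp: number; // Unix time, assumed to be milliseconds
  isoTimestamp: string; // UTC ISO 8601
  localDate: string; // YYYY-MM-DD in the user's timezone
  localTime: string; // HH:mm:ss in the user's timezone (assumed format)
  timeZone: string; // IANA zone name, e.g. "Asia/Tokyo"
  content: string;
}

function buildDailyNoteEntry(content: string, timeZone: string, now: Date = new Date()): DailyNoteEntry {
  // formatToParts avoids depending on any locale's date layout.
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    second: "2-digit",
    hourCycle: "h23",
  }).formatToParts(now);
  const get = (type: string): string => parts.find((p) => p.type === type)?.value ?? "";

  return {
    timestamp: now.getTime(),
    isoTimestamp: now.toISOString(),
    localDate: `${get("year")}-${get("month")}-${get("day")}`,
    localTime: `${get("hour")}:${get("minute")}:${get("second")}`,
    timeZone,
    content,
  };
}
```

Because the grouping key is the local date, an entry written at 23:30 UTC for a user in Asia/Tokyo lands under the next calendar day.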

Tier 2: Long-Term Memories

  • Two-layer structure:
    • Index: key + summary + updatedAt (always visible to the model)
    • Detail: full content + createdAt + updatedAt
  • Capped at 50 memories per user.
  • Canonical key guidance (e.g. name, preferences, projects, instructions) is used by extraction pipelines.
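
The index/detail split and the 50-memory cap can be illustrated with an in-memory store. This sketch makes several assumptions: the Redis layout is not modeled, `LongTermMemoryStore` is a hypothetical name, and rejecting new keys once the cap is reached is one possible policy that this page does not confirm.

```typescript
// Hypothetical in-memory sketch; Redis storage and cap behavior are assumptions.
const MAX_MEMORIES = 50;

interface MemoryIndexEntry {
  key: string;
  summary: string;
  updatedAt: number;
}

interface MemoryDetail {
  content: string;
  createdAt: number;
  updatedAt: number;
}

class LongTermMemoryStore {
  private index = new Map<string, MemoryIndexEntry>();
  private details = new Map<string, MemoryDetail>();

  // Returns false when the store is full and the key is new (assumed policy).
  write(key: string, summary: string, content: string, now: number = Date.now()): boolean {
    const existing = this.details.get(key);
    if (!existing && this.index.size >= MAX_MEMORIES) return false;
    this.index.set(key, { key, summary, updatedAt: now });
    this.details.set(key, {
      content,
      createdAt: existing ? existing.createdAt : now, // keep the original creation time
      updatedAt: now,
    });
    return true;
  }

  // The lightweight layer that is always visible to the model.
  listIndex(): MemoryIndexEntry[] {
    return Array.from(this.index.values());
  }

  // The full content, fetched on demand by key.
  readDetail(key: string): MemoryDetail | undefined {
    return this.details.get(key);
  }
}
```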

Stale-memory cleanup

  • Long-term hygiene includes automatic cleanup of stale temporary memories.
  • Temporary, context-like memories (e.g. short-lived travel or meeting context) are removed once they are older than the retention window (default: 7 days) and heuristics classify them as transient.
  • Cleanup runs as part of the daily-notes processing cycle before new extraction.
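
The cleanup rule combines two conditions, age and transience, and a memory is removed only when both hold. The sketch below shows that shape. The keyword-based `looksTransient` check is a placeholder invented for illustration, since the actual heuristics are not described here, and measuring age from `updatedAt` is likewise an assumption.

```typescript
// Hypothetical sketch: the heuristic and the age field are assumptions.
const DAY_MS = 24 * 60 * 60 * 1000;
const DEFAULT_RETENTION_DAYS = 7;

interface StoredMemory {
  key: string;
  summary: string;
  updatedAt: number; // Unix ms
}

// Placeholder heuristic: flags summaries that mention travel or meetings.
function looksTransient(memory: StoredMemory): boolean {
  return /\b(trip|travel|flight|meeting)\b/i.test(memory.summary);
}

function cleanupStaleMemories(
  memories: StoredMemory[],
  now: number,
  retentionDays: number = DEFAULT_RETENTION_DAYS,
  isTransient: (memory: StoredMemory) => boolean = looksTransient,
): { kept: StoredMemory[]; removed: StoredMemory[] } {
  const cutoff = now - retentionDays * DAY_MS;
  const kept: StoredMemory[] = [];
  const removed: StoredMemory[] = [];
  for (const memory of memories) {
    // Both conditions must hold: past the retention window AND transient.
    const stale = memory.updatedAt < cutoff && isTransient(memory);
    (stale ? removed : kept).push(memory);
  }
  return { kept, removed };
}
```

Requiring both conditions means durable facts such as a name or preference survive regardless of age, and fresh temporary context survives until it ages out.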

Daily Notes Processing Pipeline

flowchart TD
    A[Conversation / tool writes] --> B["/api/ai/extract-memories"]
    B --> C[Append daily notes + optional long-term updates]
    C --> D[Unprocessed daily notes accumulate]
    D --> E["/api/chat proactive greeting trigger"]
    E --> F["/api/ai/process-daily-notes"]
    F --> G[Cleanup stale temporary memories]
    G --> H[Process past days only, oldest first]
    H --> I[Extract + consolidate long-term memories]
    I --> J[Mark daily note processed]

Pipeline behavior:

  1. extract-memories performs single-pass extraction from chat history (daily notes + candidate long-term facts).
  2. Daily notes continue collecting while a day is active.
  3. process-daily-notes processes unprocessed past days (excludes today), consolidates overlaps, and marks each processed day.
  4. Chat endpoint can trigger process-daily-notes in the background during proactive greeting flow.
  5. Telegram heartbeat cron also triggers process-daily-notes and extracts memories from new Telegram chat messages since the last heartbeat.
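
Step 3 hinges on selecting the right days: unprocessed, strictly before today, oldest first. A minimal sketch of that selection follows. `DailyNote`, `selectDaysToProcess` and `processDailyNotes` are hypothetical names, and the `consolidate` callback stands in for the model-driven extraction and consolidation step.

```typescript
// Hypothetical sketch; consolidation itself is left to a caller-supplied callback.
interface DailyNote {
  date: string; // YYYY-MM-DD in the user's timezone
  processed: boolean;
  entries: string[];
}

// YYYY-MM-DD strings sort chronologically under plain string comparison.
function selectDaysToProcess(notes: DailyNote[], today: string): DailyNote[] {
  return notes
    .filter((note) => !note.processed && note.date < today)
    .sort((a, b) => (a.date < b.date ? -1 : a.date > b.date ? 1 : 0));
}

function processDailyNotes(
  notes: DailyNote[],
  today: string,
  consolidate: (note: DailyNote) => void,
): string[] {
  const processedDates: string[] = [];
  for (const note of selectDaysToProcess(notes, today)) {
    consolidate(note);
    note.processed = true; // mark only after consolidation succeeds
    processedDates.push(note.date);
  }
  return processedDates;
}
```

Marking a day as processed only after `consolidate` returns means a failure partway through leaves that day eligible for the next run.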

apiHandler Pattern

AI endpoints use a shared api/_utils/api-handler.ts wrapper for consistency:

  • CORS and allowed-origin checks
  • Method gating + automatic OPTIONS handling
  • Optional JSON body parsing
  • Shared per-request context injection (req, res, redis, logger, origin, user, body)
  • Unified auth modes (none, optional, required) with optional expired-token allowance
  • Unified top-level error handling and status logging

Common endpoint configurations in this AI stack:

  • /api/chat: auth: "optional", allowExpiredAuth: true, parseJsonBody: true, contentType: null
  • /api/applet-ai: auth: "optional", parseJsonBody: true, contentType: null
  • /api/ie-generate: auth: "none", parseJsonBody: true, contentType: null
  • /api/ai/extract-memories: auth: "required", parseJsonBody: true
  • /api/ai/process-daily-notes: auth: "required", parseJsonBody: true
  • /api/ai/ryo-reply: auth: "required", parseJsonBody: true
  • /api/webhooks/telegram: Custom handler (webhook secret validation, Telegram-specific auth via linked accounts)
  • /api/cron/telegram-heartbeat: Custom handler (cron secret via Authorization: Bearer header)
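
To make the wrapper pattern concrete, here is a reduced sketch covering method gating, OPTIONS handling, auth modes, JSON parsing and top-level error handling. It does not reflect the real api/_utils/api-handler.ts signature, which this page does not show. All types are invented, the handler is synchronous for brevity, and CORS checks, Redis, logging and the contentType option are omitted.

```typescript
// Hypothetical sketch; every type and name here is an assumption.
type AuthMode = "none" | "optional" | "required";

interface HandlerOptions {
  methods: string[];
  auth: AuthMode;
  allowExpiredAuth?: boolean;
  parseJsonBody?: boolean;
}

interface SketchRequest {
  method: string;
  headers: Record<string, string | undefined>;
  rawBody?: string;
}

interface SketchResponse {
  status: number;
  body: unknown;
}

interface SketchUser {
  username: string;
  expired: boolean;
}

interface HandlerContext {
  req: SketchRequest;
  user: SketchUser | null;
  body: unknown;
}

function apiHandler(
  options: HandlerOptions,
  resolveUser: (req: SketchRequest) => SketchUser | null,
  handler: (ctx: HandlerContext) => SketchResponse,
): (req: SketchRequest) => SketchResponse {
  return (req) => {
    try {
      if (req.method === "OPTIONS") return { status: 204, body: null };
      if (options.methods.indexOf(req.method) === -1) {
        return { status: 405, body: { error: "Method not allowed" } };
      }

      // Expired tokens count as anonymous unless the endpoint opts in.
      let user = options.auth === "none" ? null : resolveUser(req);
      if (user && user.expired && !options.allowExpiredAuth) user = null;
      if (options.auth === "required" && !user) {
        return { status: 401, body: { error: "Unauthorized" } };
      }

      let body: unknown = undefined;
      if (options.parseJsonBody && req.rawBody) {
        try {
          body = JSON.parse(req.rawBody);
        } catch {
          return { status: 400, body: { error: "Invalid JSON body" } };
        }
      }

      return handler({ req, user, body });
    } catch {
      // Single top-level catch so handlers never leak raw errors.
      return { status: 500, body: { error: "Internal server error" } };
    }
  };
}
```

With this shape, an endpoint's behavior is mostly declared in its options object, which is what makes configuration lines like the ones above comparable at a glance.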

Additional AI Capabilities

  • Proactive greetings: /api/chat supports a proactive greeting mode for logged-in users with memories. Uses gemini-3-flash-preview to generate a short, context-aware greeting referencing recent activity or memories. Triggers background daily-note processing on each greeting.
  • Telegram bot DM chat: /api/webhooks/telegram enables private Telegram DM conversations with Ryo. Supports image attachments (downloaded and injected as multimodal content), web search, and server-side tool execution (memory, calendar, stickies, contacts, documents). Users link accounts via /api/telegram/link/* endpoints. Includes per-user burst and account-window rate limiting.
  • Telegram heartbeat insights: /api/cron/telegram-heartbeat runs on a 30-minute cron schedule. Analyzes today's daily notes, recent Telegram conversation, and heartbeat history to decide whether to proactively message the user. Processes daily notes and extracts memories from new chat messages before each decision. Uses gating logic to avoid redundant or stale nudges.
  • Web search: Authenticated users get a search tool based on the selected model: web_search (OpenAI) for gpt-5.4 with geolocation context, or google_search (Google) for gemini-3-flash. Anonymous users do not get search tools.
  • Chat-room auto replies: /api/ai/ryo-reply generates room messages as ryo with dedicated rate limits.
  • Applet multimodal AI: /api/applet-ai supports text chat, image attachments in message history, and binary image generation responses.
  • Infinite Mac visual loop: infiniteMacControl can return screenshots for model-visible state inspection.
  • Internet Explorer caching: /api/ie-generate stores cleaned generated HTML snapshots in Redis for recent-history retrieval.
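
The web-search gating described above reduces to a small decision function. The sketch below is illustrative only: `selectSearchTool` is a hypothetical name, and the model strings are the ones this page uses.

```typescript
// Hypothetical sketch of the search-tool gating rule.
type SearchTool = "web_search" | "google_search";

function selectSearchTool(model: string, isAuthenticated: boolean): SearchTool | null {
  if (!isAuthenticated) return null; // anonymous users get no search tool
  if (model === "gpt-5.4") return "web_search"; // OpenAI provider search
  if (model === "gemini-3-flash") return "google_search"; // Google provider search
  return null; // other models: no search tool
}
```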