๐Ÿ† AWS H0 Hackathon โ€” Database Showcase

How ChatScroll uses AWS Databases

A technical overview of every AWS Database feature used โ€” from pgvector semantic search to DynamoDB TTL.

๐Ÿ”pgvector โ€” Semantic search๐ŸŒฒltree โ€” Folder hierarchy๐Ÿ“tsvector โ€” Full-text searchโšกDynamoDB TTL โ€” Auto-expiry๐Ÿ”‘Composite Keys โ€” Message indexing๐Ÿค–gemini-embedding-001

Aurora PostgreSQL (Serverless v2)

Relational core โ€” notes, folders, users, conversations, semantic search

Notes / Scrolls Table

notes
  • pgvector extension stores 3072-dim embeddings per scroll
  • gemini-embedding-001 generates embeddings on save via TrySaveEmbeddingAsync
  • Cosine similarity search surfaces semantically related scrolls
  • tsvector full-text index for exact keyword matching
  • Hybrid search: semantic nearest-neighbour combined with keyword results
-- Cosine distance (pgvector)
embedding <=> queryVector

-- Semantic search with threshold
WHERE 1 - (embedding <=> $vec) > 0.5
ORDER BY embedding <=> $vec
LIMIT 3

-- Full-text search
WHERE search_vector @@ plainto_tsquery('english', $q)
ORDER BY ts_rank(search_vector, ...) DESC

-- Helper
SELECT vector_dims(embedding) -- โ†’ 3072

Folders Table

folders
  • ltree extension stores folder paths as dot-separated label trees
  • Enables hierarchical queries without recursive CTEs
  • e.g. programming.containers โ†’ Programming โ€บ Containers
  • Related-scrolls scoping: find root parent, then include all descendant folder IDs
  • parent_id FK used for 2-level hierarchy traversal in API
-- ltree path matching
WHERE path ~ 'programming.*'

-- All folders in a subtree
WHERE path = $root
   OR path ~ ($root || '.*')

-- Foreign-key traversal (used in GetRelated)
WHERE folder_id = ANY($allowedIds::uuid[])

Conversations Table

conversations
  • Conversation metadata (title, created_at, user_id) lives in Aurora
  • Message content lives in DynamoDB โ€” dual-database pattern
  • Aurora handles referential integrity; DynamoDB handles message volume
  • Anonymous session IDs bridge the two stores before login
  • On login, migrate-anonymous re-parents both Aurora rows and DynamoDB items
Dual-database pattern: Aurora owns structure + search; DynamoDB owns the high-volume chat message stream.

Users Table

users
  • Cognito JWT sub claim decoded server-side โ€” no middleware or JWT library needed
  • Anonymous users get a browser-generated UUID via X-User-Id header
  • EnsureUserExistsAsync auto-creates the row on first API request โ€” no registration step needed
  • User profile (display name, email) synced from Cognito JWT claims on each call
-- Identity resolution order (ApiControllerBase)
1. Authorization: Bearer <cognitoJwt>
   โ†’ decode payload โ†’ parse sub as UUID
2. X-User-Id header โ†’ parse as UUID
3. Hard-coded dev fallback (local only)

Amazon DynamoDB

High-volume chat messages โ€” composite keys, TTL, and pay-per-request

Chat Messages Table

chatscroll-messages
  • Partition key groups all messages in a conversation together
  • Sort key timestamp#messageId enables chronological reads and time-range queries
  • TTL attribute auto-expires messages after 90 days โ€” no cron jobs, no manual deletes
  • PAY_PER_REQUEST billing โ€” scales to millions of messages with zero capacity planning
  • Each item stores role (user/assistant), content, folderSuggestion metadata

Access Patterns

PKconversationId = :convId
SKbegins_with SK, ''

Fetch all messages in a conversation, ordered chronologically

PKconversationId = :convId
SKbetween :t1 and :t2

Fetch messages within a time range (e.g. last 100 messages)

PK conversationId (String)
SK {timestamp}#{messageId} (String)
TTL expiresAt (Unix epoch, 90d)

System Architecture

Request flow from browser to databases

Browser
Next.js / Vercel
โ†’
AWS ECS Fargate
ASP.NET Core API
โ†“
โ†“
Aurora PostgreSQL
๐Ÿ“œ scrolls + embeddings
๐ŸŒฒ folders (ltree)
๐Ÿ‘ค users (Cognito sub)
๐Ÿ’ฌ conversation metadata
DynamoDB
๐Ÿ’ฌ chat messages
โšก TTL (90-day expiry)
๐Ÿ”‘ composite sort keys
๐Ÿ“ˆ pay-per-request scale
Next.js
App Router ยท Vercel
ASP.NET Core
ECS Fargate ยท ECR
Aurora PG
Serverless v2 ยท pgvector
DynamoDB
On-demand ยท TTL

CI/CD Deployment

GitHub Actions โ†’ ECR โ†’ ECS Fargate

Pipeline
  • Push to master triggers GitHub Actions workflow
  • Docker image built and pushed to Amazon ECR
  • ECS task definition registered automatically per deploy
  • ECS service updated; polling loop waits for stabilization
Infrastructure
  • ECS Fargate โ€” serverless containers, no EC2 management
  • ALB with HTTPS termination in front of ECS tasks
  • Secrets (DB password, Gemini key) via ECS task environment
  • Aurora Serverless v2 โ€” auto-pauses when idle
Observability
  • Health check endpoint: GET /api/health
  • Live stats endpoint: GET /api/stats
  • Embedding backfill: POST /api/notes/admin/backfill-embeddings
  • Storage type exposed in stats response: aurora_pgvector
git pushโ†’GitHub Actionsโ†’docker buildโ†’ECR pushโ†’ECS deployโ†’โœ… live

See it in action โ€” save an AI answer as a Scroll and watch the embedding get generated.

Try ChatScroll โ†’