ReviewData (RD) — Engineering Onboarding & Architecture Guide

Page 1 · Orientation

System at a Glance

What the system does, and the shape of it in one screen.

ReviewData (RD) is backend integrator infrastructure. It has no customers of its own — other platforms (called integrators, chiefly Reputation Management / RM) call its API to get four jobs done, and RD reports back over webhooks.

Operation	Kind	How RD handles it
Scrape reviews	Long-running	Temporal workflow → reviews written to S3 → webhook.
Find publisher URLs	Long-running	Temporal workflow → URL written to DB → webhook.
Generate AI responses	In-process	A single OpenAI call in the API service — no Temporal.
Post responses	Long-running	Temporal workflow (or a human via QA) → webhook.

The whole system on one diagram

  Integrator (RM)          Internal staff            QA staff
        │ API key               │ JWT login             │ Chrome extension
        ▼                       ▼                       ▼
┌───────────────────────────────────────────────────────────────────────────────┐
│                         FastAPI service (async, ASGI)                         │
│      /api/v1 integrator endpoints · /admin · /qa · /internal (extension)      │
└───────┬───────────────┬───────────────┬───────────────┬───────────────────────┘
        │ read/write    │ start wf      │ cache/locks   │ AI (OpenAI)
        ▼               ▼               ▼               ▼
   ┌─────────┐   ┌──────────────┐   ┌────────┐  (in-process call)
   │ Postgres│   │   Temporal   │   │ Redis  │
   │ (schema)│   │   (durable   │   └────────┘
   └────▲────┘   │  workflows)  │
        │        └──────┬───────┘
        │ same DB       │ polls task queues
        │        ┌──────┴───────────────────────────────┐
        │        │ Temporal Workers (Workflow Team)     │
        └────────┤ scrapers · posters · url-finders     │──► S3 (.jl reviews) ─► CloudFront ─► URLs
                 │ + Platform's webhook-delivery worker │──► HTTP POST ────────► Integrator webhook
                 └──────────────────────────────────────┘

The one-sentence mental model

The FastAPI service accepts work and hands off a durable ticket (a job row + a Temporal workflow); Workers do the heavy lifting and write results back to the same database and to S3; webhooks tell the integrator it's done. Everything is correlated by request_id and task_id.

Explicitly out of scope

✗ No customer SPA
✗ No billing (that's RM)
✗ No review content in the DB (reviews live in S3; the DB stores URLs + metadata)
✗ No scraper/poster implementation here (that's the Workflow Team, Page 4)

Auth & security, at a glance

Concern	Choice	Where used
Admin/QA login	JWT (access + refresh)	Admin/QA UI; also an extension-type JWT for the Chrome extension.
Integrator auth	API key — SHA-256 hashed at rest	In the request payload; looked up by prefix, then hash-verified.
Password hashing	Argon2id (`argon2-cffi`)	User passwords.
Webhook integrity	HMAC-SHA256 (`X-RD-Signature`)	Signs every outbound webhook body.
Secret-at-rest	Fernet symmetric encryption	Publisher credentials, cookies, OAuth tokens.
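
The two hash-based rows above (API key lookup and webhook signing) can be sketched with the standard library alone. This is a minimal illustration, not RD's implementation: the prefix length, the hex encoding of the signature, and every function name below are assumptions made for the example.

```python
import hashlib
import hmac

PREFIX_LEN = 8  # assumed prefix length; the real value is not stated in this guide


def hash_api_key(api_key: str) -> str:
    """SHA-256 hex digest of the key; only this digest is stored at rest."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def find_and_verify(api_key: str, rows_by_prefix: dict[str, str]) -> bool:
    """Look up the stored hash by key prefix, then compare in constant time."""
    stored_hash = rows_by_prefix.get(api_key[:PREFIX_LEN])
    if stored_hash is None:
        return False
    return hmac.compare_digest(hash_api_key(api_key), stored_hash)


def sign_body(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 over the exact bytes that go on the wire."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Recompute the signature and compare it with the received header value."""
    expected = sign_body(secret, body)
    return hmac.compare_digest(expected.encode("ascii"), header_value.encode("utf-8"))
```

Both checks use `hmac.compare_digest` rather than `==` so the comparison time does not depend on how many leading characters match. The signature must be computed over the raw body bytes, not over a re-serialized copy of the JSON.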

The independent processes

Process	Owner	Responsibility
API service (Gunicorn/Uvicorn)	Platform	All HTTP: integrator REST, Admin/QA, extension endpoints. Starts workflows. Runs the in-process AI Response call and the background queue-manager loops.
Temporal server	Self-hosted	Durable state store + task-queue broker. Nothing business-specific lives here.
Scraper / poster / URL-finder workers	Workflow Team	Poll task queues, execute the actual publisher logic (Playwright, HTTP, captcha), write results.
webhook-delivery worker	Platform	The one worker Platform owns — runs the outbound webhook delivery workflow + retry schedule.
PostgreSQL	Shared	Single source of truth. Both teams read/write; only Platform writes migrations.
Redis	Shared	Ephemeral state: rate-limit counters, distributed locks, cookie/session store, scraper session tokens.

Who writes each row

Step	Owner
Validate request, create job row, set initial status, start workflow	Platform
Update status during execution; write final result (S3 URLs, errors)	Workflow Team
Fire the completion webhook (via the shared emit service)	Workflow Team

4 · Idempotency at every async boundary

Boundary	Idempotency key
Starting a workflow	Deterministic ID `scrape-{task_id}` → a duplicate start raises `WorkflowAlreadyStartedError`, caught & treated as success.
DB writes in an activity	`INSERT ... ON CONFLICT DO NOTHING/UPDATE`.
S3 writes	Deterministic key `{request_id}/{task_id}-batch-{n}.jl` — overwrite is safe (same content).
Outbound webhook	`delivery_id` — integrator dedupes on its side.
Integrator retry of a POST	De-dup on `(api_key, foreign_key, endpoint)` for a short window.
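
The first and third rows share one idea: derive the identifier from data you already have, so a retry lands on the same name. The sketch below shows that pattern with an in-memory stand-in for the Temporal client. `FakeTemporalClient`, `WorkflowAlreadyStarted`, and `start_scrape_once` are hypothetical names for illustration; real code would catch the SDK's `WorkflowAlreadyStartedError` instead.

```python
import asyncio


class WorkflowAlreadyStarted(Exception):
    """Stand-in for the Temporal SDK's WorkflowAlreadyStartedError."""


class FakeTemporalClient:
    """In-memory stand-in for a Temporal client; rejects duplicate workflow IDs."""

    def __init__(self) -> None:
        self.started: set[str] = set()

    async def start_workflow(self, workflow_id: str) -> str:
        if workflow_id in self.started:
            raise WorkflowAlreadyStarted(workflow_id)
        self.started.add(workflow_id)
        return workflow_id


def scrape_workflow_id(task_id: str) -> str:
    """Deterministic workflow ID: the same task always maps to the same ID."""
    return f"scrape-{task_id}"


def batch_key(request_id: str, task_id: str, n: int) -> str:
    """Deterministic S3 key: rewriting batch n overwrites the same object."""
    return f"{request_id}/{task_id}-batch-{n}.jl"


async def start_scrape_once(client: FakeTemporalClient, task_id: str) -> str:
    """Start the workflow; a duplicate start is treated as success."""
    workflow_id = scrape_workflow_id(task_id)
    try:
        await client.start_workflow(workflow_id)
    except WorkflowAlreadyStarted:
        pass  # someone already started it: the desired state already holds
    return workflow_id
```

Calling `start_scrape_once` twice with the same `task_id` returns the same workflow ID both times and starts exactly one workflow, which is the property that makes an API-level retry safe.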

The tools by resource

Resource	Use	Never
Database	`AsyncSession`, `session.scalar()`/`scalars()`, `postgresql+asyncpg://`	`session.query()` (1.x style), `psycopg2`
HTTP	`httpx.AsyncClient`, reused from `app.state`	`requests`; a new client per call
Redis	`redis.asyncio` via `RedisService`, one shared pool	the sync `redis` API; a client built ad-hoc
S3	`aioboto3` (or `boto3` inside `asyncio.to_thread`)	blocking `boto3` on the loop
Sleep / locks	`await asyncio.sleep()`, `asyncio.Lock`	`time.sleep()`, `threading.Lock`
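
Two of the rows above are easy to show without any third-party library: pushing a blocking call off the event loop with `asyncio.to_thread`, and guarding shared state with `asyncio.Lock`. In this sketch `blocking_upload` is a hypothetical stand-in for a synchronous SDK call such as a `boto3` upload; nothing here touches S3.

```python
import asyncio
import time


def blocking_upload(key: str) -> str:
    """Stand-in for a synchronous SDK call (for example a boto3 upload)."""
    time.sleep(0.05)  # fine here: this runs in a worker thread, not on the loop
    return f"uploaded:{key}"


async def upload_all(keys: list[str]) -> list[str]:
    """Run each blocking call in the default thread pool, concurrently."""
    return await asyncio.gather(
        *(asyncio.to_thread(blocking_upload, key) for key in keys)
    )


async def bump(counter: dict[str, int], lock: asyncio.Lock) -> None:
    """Read-modify-write across an await point, made safe by the lock."""
    async with lock:
        current = counter["n"]
        await asyncio.sleep(0)  # yields to other tasks while holding the lock
        counter["n"] = current + 1


async def main() -> tuple[list[str], int]:
    lock = asyncio.Lock()  # created inside the running loop
    counter = {"n": 0}
    results, _ = await asyncio.gather(
        upload_all(["a.jl", "b.jl"]),
        asyncio.gather(*(bump(counter, lock) for _ in range(50))),
    )
    return results, counter["n"]
```

Without the lock, the `await` inside `bump` would let other tasks read the same stale value and the counter would lose updates. A `threading.Lock` in the same place would block the whole event loop while it waits.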

Choosing where background work runs

Work	Mechanism
Short, non-critical (cache warm, fire-and-forget log)	`asyncio.create_task(...)` in the handler
Durable, retryable, multi-step (scrape, post, webhook)	Temporal workflow
Recurring poll loops (the queue managers)	Long-lived task started in `lifespan`
Anything you'd have used Celery for	Temporal. Celery is banned.
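
The first and third rows can be sketched with plain `asyncio`. Two details matter in practice: the event loop holds only a weak reference to a task, so a fire-and-forget task needs a strong reference until it finishes, and a long-lived loop must be cancelled on shutdown. The `lifespan` below is a simplified stand-in (FastAPI's real lifespan receives the app object), and `fire_and_forget` and `poll_loop` are hypothetical names.

```python
import asyncio
import contextlib

# Strong references keep fire-and-forget tasks alive until they complete.
background_tasks: set[asyncio.Task] = set()


def fire_and_forget(coro) -> asyncio.Task:
    """Schedule short, non-critical work without awaiting it."""
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return task


async def poll_loop(ticks: list[int], interval: float) -> None:
    """Recurring loop in the style of a queue manager; runs until cancelled."""
    n = 0
    while True:
        ticks.append(n)
        n += 1
        await asyncio.sleep(interval)


@contextlib.asynccontextmanager
async def lifespan(ticks: list[int]):
    """Start the loop on startup and cancel it cleanly on shutdown."""
    loop_task = asyncio.create_task(poll_loop(ticks, 0.01))
    try:
        yield
    finally:
        loop_task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await loop_task
```

Neither mechanism survives a process restart, which is why anything that must not be lost (scrape, post, webhook) goes to a Temporal workflow instead.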

The identifiers (know the difference cold)

Identifier	Generated by	Purpose
`request_id`	Platform (inbound)	Correlates one request → all downstream work + every log line. Never null.
`task_id`	Platform (job create)	The job ID; reused verbatim as the Temporal workflow ID.
`delivery_id`	Platform (per attempt)	Changes on each webhook retry; the integrator's idempotency key.
`foreign_key`	Integrator	RM's own reference; RD only echoes it back.
`external_location_id`	Integrator	RM's `business.id`.
`internal_business_id`	Discovered while scraping	Publisher-side business ID (e.g. place_id); required to post.
`internal_review_id`	Discovered while scraping	Publisher-side review ID; sent to the publisher when posting a reply.
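
For `request_id` to reach every log line without being passed through each function call, one common approach is a `contextvars.ContextVar` plus a logging filter. The sketch below assumes that approach; the variable name, logger name, and log format are illustrative, not RD's actual ones.

```python
import asyncio
import contextvars
import logging

# Each asyncio task runs in its own copy of the context, so values do not leak.
request_id_var: contextvars.ContextVar[str] = contextvars.ContextVar(
    "request_id", default="-"
)


class RequestIdFilter(logging.Filter):
    """Stamp every record with the request_id of the current context."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


def build_logger(stream) -> logging.Logger:
    """Logger whose lines are prefixed with the current request_id."""
    logger = logging.getLogger("rd.demo")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(request_id)s %(message)s"))
    handler.addFilter(RequestIdFilter())
    logger.addHandler(handler)
    return logger


async def handle(logger: logging.Logger, request_id: str) -> None:
    """Set the ID once at the boundary; code further down just logs."""
    request_id_var.set(request_id)
    await asyncio.sleep(0)  # let the other request interleave
    logger.info("job created")


async def main(stream) -> None:
    logger = build_logger(stream)
    await asyncio.gather(handle(logger, "req-1"), handle(logger, "req-2"))
```

The two handlers interleave at the `await`, yet each log line carries its own request's ID, because `asyncio.gather` wraps each coroutine in a task with a separate context copy.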

The queues

Queue	Purpose	Concurrency profile
`scraping-internal`	Internal SAU scraper workflows	normal
`scraping-lde`	LDE-routed scrapes	normal
`scrape-<publisher>`	Per-publisher scrape workflows + their HTTP activities	normal (per queue)
`scraping-browser`	Activity-routing target: only the browser-backed activities hop here	bounded by the browser-api session envelope (~12)
`posting`	Response posting (most common)	normal
`posting-browser`	Cookie/credential posting (Playwright)	1 per worker — Playwright is heavy
`url-finder`	URL discovery	normal
`webhook-delivery`	Outbound webhook delivery + retry	high — horizontally scaled
`maintenance`	Cron-style: cookie health, alias rotation, archival	low
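
The naming rules in the table can be captured in a few pure functions. This is a sketch under assumptions: the guide does not say how publisher names are normalized or how posting auth types are labelled, so the lowercasing and the `"cookie"` / `"credential"` labels below are hypothetical.

```python
# Auth-type labels assumed for illustration; the real enum is not shown here.
BROWSER_AUTH_TYPES = {"cookie", "credential"}


def scrape_workflow_queue(publisher: str) -> str:
    """Per-publisher queue for scrape workflows and their HTTP activities."""
    return f"scrape-{publisher.strip().lower()}"


def scrape_activity_queue(publisher: str, needs_browser: bool) -> str:
    """Only browser-backed activities hop to the shared browser queue."""
    if needs_browser:
        return "scraping-browser"
    return scrape_workflow_queue(publisher)


def posting_queue(auth_type: str) -> str:
    """Cookie/credential posting needs Playwright, so it gets its own queue."""
    if auth_type in BROWSER_AUTH_TYPES:
        return "posting-browser"
    return "posting"
```

Keeping the routing in one place means a workflow and its activities never disagree about queue names, and the browser-bound queues stay the only ones with tight concurrency limits.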

The inbound REST surface

Endpoint	Kind	Does
`POST /request-reviews`	Async → `task_id`	Kick off a scrape.
`POST /retrieve-task`	Sync, paginated	Read scrape results.
`POST /get-publisher-info`	Sync	Look up publisher/account info.
`POST /submit-response`	Async → `task_id`	Post a reply to a review.
`POST /retrieve-posting-data`	Sync, paginated	Read posting results.
`POST /ai-response`	Sync/async (per config)	Generate an AI reply (in-process OpenAI).
`POST /url-finder/per-business` · `/file-based`	Async → `task_id`	URL discovery, single or bulk.

Outbound webhook events

Event	Fires when
`EVENT.URL_UPDATED`	A publisher URL is resolved (auto or QA).
`EVENT.DATA_RESULT`	A scrape completed.
`EVENT.CREDENTIAL_UPDATED` · `EVENT.COOKIE_UPDATED`	QA added credentials / captured cookies.
`EVENT.RESPONSE_SUBMISSION`	A posting attempt finished (success or failure).
`EVENT.PUBLISHER_DISCONNECTED`	An account got disconnected during posting.
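
From the integrator's side, consuming these events comes down to routing on the event name and dropping deliveries already processed. The sketch below assumes the webhook body is JSON with `event` and `delivery_id` fields; that payload shape and the `WebhookReceiver` class are assumptions for illustration, not the documented contract.

```python
import json
from typing import Callable

Handler = Callable[[dict], None]


class WebhookReceiver:
    """Routes webhook payloads by event name and skips repeated deliveries."""

    def __init__(self) -> None:
        self._handlers: dict[str, Handler] = {}
        self._seen: set[str] = set()  # a real receiver would persist this

    def on(self, event: str, handler: Handler) -> None:
        """Register the handler for one event name."""
        self._handlers[event] = handler

    def receive(self, raw_body: bytes) -> str:
        """Return "handled", "ignored" (no handler), or "duplicate"."""
        payload = json.loads(raw_body)
        delivery_id = payload["delivery_id"]
        if delivery_id in self._seen:
            return "duplicate"
        handler = self._handlers.get(payload["event"])
        if handler is not None:
            handler(payload)
        # Mark as seen only after the handler returned without raising.
        self._seen.add(delivery_id)
        return "handled" if handler is not None else "ignored"
```

Recording the `delivery_id` after the handler runs means a handler that raises leaves the delivery eligible for reprocessing on the next attempt.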

Question	Where
"What is X / is X in scope?"	`dev-docs/00-foundation/` + glossary
"How should I write this?"	`dev-docs/03-engineering-standards/`
"How does feature Y work?"	`dev-docs/02-components/Y.md` + its matching skill
"Who owns this table/flow?"	`00-foundation/03-team-boundaries.md`
"How do I run a migration?"	skill: `run-migration`

System at a Glance

The whole system on one diagram

Explicitly out of scope

The Technology Stack — a Roadmap

Auth & security, at a glance

Tooling & the frontend

Testing

Quality

Ship

Admin/QA UI

Runtime Architecture & Processes

The independent processes

Lifespan singletons — one client per process

How one request travels the whole system

Two Teams, One Database

Platform Team — this repo

Workflow Team — separate repo

The three shared contracts

Who writes each row

The MCS Layering

Why the layering earns its keep

Testability

Maintainability

Consistency

Clean errors

The canonical shape

Domain organization (not type organization)

Design Patterns in RD

1 · Service object over a session

2 · Exception hierarchy → single envelope

3 · Pydantic contract at the boundary

4 · Idempotency at every async boundary

5 · Correlation via contextvars

6 · Lifespan singletons + explicit DI

The Async & Concurrency Model

The tools by resource

Concurrency patterns you'll use

Choosing where background work runs

The Data Model

The location hierarchy

Configuration & identity

PublisherConfiguration

IdentityAddress

Jobs & cross-cutting tables

The identifiers (know the difference cold)

Temporal Orchestration

Workflows vs activities

Workflow

Activity

Conventions the Platform team enforces

Human-in-the-loop: signals + wait conditions

How Platform triggers a workflow (from a service)

Task Queues & Worker Topology

The queues

Two concepts that trip people up

Worker fleet shape (prod)

The Queue Engine: Dispatch, Fairness & Autoscaling

The problem it solves

The control loop

The five moving parts

The same idea, applied to LLM reports

Request & Webhook Flows

The inbound REST surface

A scrape, end to end

Outbound webhook events

Delivery is itself a durable workflow

Posting, QA & Error Handling

Posting: mode × auth type

Modes

Auth types

The QA Task Board — five streams

Error handling: one envelope, always

Observability

Foundations to Master First

Week 1 — Language & async fundamentals

Week 2 — The frameworks

Week 3 — What makes RD special

System Understanding & First Tasks

Trace these end-to-end (on paper, then in the code)

Component specs to read, in order
