
Building a Multi-Tenant SaaS Platform: Architecture Lessons from the Trenches

James Okonkwo
Principal Engineer
February 14, 2026 · 8 min read

Building a multi-tenant SaaS platform is one of the most challenging architectural problems in software engineering. Every decision — from database design to authentication to billing — has to account for the fact that hundreds of businesses will share the same infrastructure while expecting complete isolation. Here is what we learned building one.

What Multi-Tenancy Really Means

At its core, multi-tenancy means a single instance of your application serves multiple customers (tenants). Each tenant gets their own data, configuration, and experience, but they all run on the same codebase and infrastructure. The alternative, deploying a separate instance for each customer, works at small scale but in our experience becomes an operational nightmare past roughly 50 customers.

The challenge is threefold: data isolation (Tenant A must never see Tenant B's data), configuration isolation (each tenant has their own settings, branding, and workflows), and performance isolation (one tenant's traffic spike should not degrade another tenant's experience).

Database Strategy: Shared Schema with Row-Level Isolation

We evaluated three approaches: separate databases per tenant, separate schemas per tenant, and shared schema with row-level isolation. We chose the third.

Every table includes a tenant_id column, and every query filters by it. This sounds simple, but the devil is in the details:

  • Query safety: Every ORM query must include the tenant filter. We enforce this through middleware that injects the tenant context, and through code reviews that flag any query missing the filter.
  • Indexing: Composite indexes on (tenant_id, ...) are essential for performance. A query that scans millions of rows across all tenants when it should only touch a few hundred is a common anti-pattern.
  • Migrations: Schema changes affect all tenants simultaneously. We use careful, additive-only migrations (ALTER TABLE ADD COLUMN) with sensible defaults, avoiding destructive changes that could cause downtime.

The shared-schema approach gives us operational simplicity (one database to back up, monitor, and scale) while row-level isolation gives us the security guarantees tenants expect.
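As a rough illustration of the tenant filter and composite index described above, here is a minimal sketch using Python's built-in sqlite3 module rather than the ORM and PostgreSQL setup the platform runs on. The bookings table, its columns, the index name, and the TenantScope helper are all hypothetical names chosen for the example. The idea is that callers never write a tenant filter by hand, because the tenant is fixed when the scope is created.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one tenant-scoped table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bookings ("
        " id INTEGER PRIMARY KEY,"
        " tenant_id INTEGER NOT NULL,"
        " customer TEXT NOT NULL,"
        " starts_at TEXT NOT NULL)"
    )
    # Composite index led by tenant_id, so a per-tenant lookup does not
    # have to scan rows that belong to other tenants.
    conn.execute(
        "CREATE INDEX ix_bookings_tenant_starts ON bookings (tenant_id, starts_at)"
    )
    return conn


class TenantScope:
    """Data access bound to one tenant_id, fixed at construction time."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: int) -> None:
        self._conn = conn
        self._tenant_id = tenant_id

    def add_booking(self, customer: str, starts_at: str) -> None:
        # The tenant_id always comes from the scope, never from the caller.
        self._conn.execute(
            "INSERT INTO bookings (tenant_id, customer, starts_at) VALUES (?, ?, ?)",
            (self._tenant_id, customer, starts_at),
        )

    def list_bookings(self) -> list:
        # Every read carries the tenant filter.
        rows = self._conn.execute(
            "SELECT customer, starts_at FROM bookings"
            " WHERE tenant_id = ? ORDER BY starts_at",
            (self._tenant_id,),
        )
        return rows.fetchall()


conn = open_demo_db()
tenant_a = TenantScope(conn, tenant_id=1)
tenant_b = TenantScope(conn, tenant_id=2)
tenant_a.add_booking("Ada", "2026-03-01T09:00")
tenant_b.add_booking("Grace", "2026-03-01T10:00")
```

In a real application the scope would typically be built by the request middleware from the authenticated tenant context, so application code cannot choose a tenant ID itself.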

Authentication and Authorization

Multi-tenant auth has three layers:

Layer 1: Identity. Who is this user? We use JWTs with a 4-hour expiry, stored in cookies that set the HttpOnly, Secure, and SameSite attributes. The token contains the user ID and tenant ID, so every authenticated request carries its tenant context.

Layer 2: Tenant Binding. Does this user belong to this tenant? A user in Tenant A cannot access Tenant B's resources, even while holding a valid JWT for their own tenant. The tenant ID in the token is checked against the tenant that owns the resource on every request.

Layer 3: Role-Based Access — What can this user do within their tenant? We implement three roles: admin (full access), developer (API access), and user (limited access). Roles are enforced at both the API layer and the UI layer, with the API being the source of truth.
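Layers 2 and 3 can be sketched as a single check that runs after the token has been verified. This is an illustration only: the Claims type, the Forbidden exception, and the permission names in ROLE_PERMISSIONS are assumptions made for the example, since the article names the three roles but not their exact permissions.

```python
from dataclasses import dataclass

# Hypothetical permission sets; only the role names come from the article.
ROLE_PERMISSIONS = {
    "admin": {"manage_tenant", "use_api", "read"},
    "developer": {"use_api", "read"},
    "user": {"read"},
}


@dataclass(frozen=True)
class Claims:
    """Fields assumed to be taken from an already-verified JWT."""
    user_id: int
    tenant_id: int
    role: str


class Forbidden(Exception):
    """Raised when a request fails the tenant or role check."""


def authorize(claims: Claims, resource_tenant_id: int, permission: str) -> None:
    # Layer 2: the token's tenant must own the resource being accessed.
    if claims.tenant_id != resource_tenant_id:
        raise Forbidden("resource belongs to another tenant")
    # Layer 3: the role must grant the requested permission.
    # Unknown roles get an empty permission set, so they are denied.
    if permission not in ROLE_PERMISSIONS.get(claims.role, set()):
        raise Forbidden(f"role {claims.role!r} lacks {permission!r}")
```

Running the tenant check before the role check means that even an admin token is rejected when it targets another tenant's resource.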

API keys follow a similar pattern: each key is scoped to a tenant and a role. We store only the SHA-256 hash of the key in the database, never the plaintext. Keys use prefixes (ua_ for regular, ua_admin_ for admin) so they can be easily identified in logs without exposing their value.
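A minimal sketch of that key scheme, using only the Python standard library. The function names and the 32-byte key length are assumptions for the example. The prefixes and the SHA-256 hashing follow the description above.

```python
import hashlib
import hmac
import secrets


def generate_api_key(admin: bool = False) -> tuple[str, str]:
    """Return (plaintext_key, sha256_hex_digest).

    The plaintext is shown to the tenant once; only the digest is stored.
    """
    prefix = "ua_admin_" if admin else "ua_"
    # Hex output cannot contain the letters "m", "n" or an underscore,
    # so a regular key can never accidentally start with "ua_admin_".
    plaintext = prefix + secrets.token_hex(32)
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    return plaintext, digest


def verify_api_key(presented: str, stored_digest: str) -> bool:
    """Hash the presented key and compare it to the stored digest in constant time."""
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, stored_digest)
```

A plain SHA-256 hash is reasonable here because the key is a long random value, not a human-chosen password, so it cannot be guessed from a dictionary.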

Per-Tenant Configuration

Each tenant can customize their AI agent's personality, greeting, knowledge base, booking rules, notification preferences, and branding. This configuration lives in the database and is loaded into memory on each request.

The key architectural decision here is the balance between flexibility and complexity. We use a structured configuration model (explicit columns for common settings) rather than a generic key-value store. This makes validation straightforward, migrations predictable, and the code readable.

For less common customizations, we provide a JSON settings column that tenants can use for advanced configuration. This hybrid approach covers 95% of use cases with structured fields while allowing power users to customize the remaining 5%.
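The hybrid model might look something like the sketch below: explicit, validated fields for common settings plus a free-form dictionary parsed from the JSON column. The field names, defaults, the settings_json column name, and the config_from_row helper are all hypothetical. The article does not list the actual columns.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TenantConfig:
    tenant_id: int
    # Structured fields: easy to validate and migrate (names are assumed).
    greeting: str = "Hello! How can I help?"
    brand_color: str = "#336699"
    notifications_enabled: bool = True
    # Advanced settings from the JSON column; unvalidated by design.
    extra: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.greeting.strip():
            raise ValueError("greeting must not be empty")
        if not (self.brand_color.startswith("#") and len(self.brand_color) == 7):
            raise ValueError("brand_color must look like #RRGGBB")


def config_from_row(row: dict[str, Any]) -> TenantConfig:
    """Build a TenantConfig from a database row represented as a dict."""
    extra = json.loads(row.get("settings_json") or "{}")
    if not isinstance(extra, dict):
        raise ValueError("settings_json must contain a JSON object")
    return TenantConfig(
        tenant_id=row["tenant_id"],
        greeting=row["greeting"],
        brand_color=row["brand_color"],
        notifications_enabled=bool(row["notifications_enabled"]),
        extra=extra,
    )
```

Keeping validation on the structured fields and leaving the JSON column open reflects the trade-off described above: most tenants get checked, predictable settings, while the JSON column absorbs the unusual cases.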

Scaling Considerations

Vertical scaling (bigger servers) is surprisingly effective for multi-tenant applications, especially when your database queries are well-indexed. We run our entire platform on a single VPS with PostgreSQL, Redis, and the application server — handling hundreds of tenants comfortably.

When you do need to scale horizontally, the tenant_id-based architecture makes it straightforward. You can shard your database by tenant_id, route requests to different application instances based on tenant, and use Redis for shared state like session management and rate limiting.
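One simple way to route by tenant_id is a stable hash, sketched below. This is an assumed approach for illustration, not a description of how the platform shards today. The shard_for_tenant function is a hypothetical helper.

```python
import hashlib


def shard_for_tenant(tenant_id: int, shard_count: int) -> int:
    """Map a tenant to a shard index in [0, shard_count).

    Uses SHA-256 instead of Python's built-in hash(), because the mapping
    must stay identical across processes and restarts.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(str(tenant_id).encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Modulo hashing moves most tenants to a different shard when the shard count changes. A lookup table that maps each tenant to a shard is a common alternative when tenants need to be moved one at a time.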

Connection pooling is critical. With hundreds of tenants making concurrent requests, you need to manage database connections carefully. We use SQLAlchemy's connection pool with a pool size of 20 and max overflow of 40, with aggressive recycling to prevent stale connections.
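In SQLAlchemy, those pool settings are passed to create_engine. The sketch below uses the pool size and overflow stated above. The pool_recycle value and the use of pool_pre_ping are assumptions, since the article says only that recycling is aggressive, and make_engine is a hypothetical helper.

```python
# Keyword arguments accepted by sqlalchemy.create_engine for its default queue pool.
POOL_OPTIONS = {
    "pool_size": 20,        # persistent connections kept open (from the article)
    "max_overflow": 40,     # extra connections allowed under burst load (from the article)
    "pool_recycle": 300,    # assumed value: replace connections older than 5 minutes
    "pool_pre_ping": True,  # assumed: test each connection before handing it out
}


def make_engine(database_url: str):
    """Create an engine with the pool settings above.

    SQLAlchemy is imported here so the settings can be inspected
    without the library or a database driver installed.
    """
    from sqlalchemy import create_engine

    return create_engine(database_url, **POOL_OPTIONS)
```

With these numbers, one application process can hold at most 60 connections at a time (20 pooled plus 40 overflow), which should stay below the database server's connection limit once multiplied by the number of processes.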

Lessons Learned

  1. Add tenant_id filtering from day one. Retrofitting it is painful and error-prone.
  2. Test with multiple tenants in your test suite. A bug that only manifests when two tenants interact is the worst kind of bug.
  3. Rate limit per tenant, not just per IP. A single tenant should not be able to monopolize your resources.
  4. Log tenant_id in every log line. When debugging production issues, tenant context is invaluable.
  5. Plan your billing model around tenants. Usage tracking, plan limits, and overage handling all need tenant-level granularity.
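Lesson 3 can be illustrated with a small fixed-window limiter keyed by tenant. This in-memory version is a sketch for a single process. The class name, the window scheme, and the injectable clock are assumptions, and a shared store such as Redis would be needed once there are several application instances.

```python
import time
from collections import defaultdict
from typing import Callable


class TenantRateLimiter:
    """Allow at most `limit` requests per tenant in each fixed time window."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        # (tenant_id, window index) -> requests seen in that window.
        # Old windows are never evicted in this sketch; a production
        # version would expire them.
        self._counts: dict = defaultdict(int)

    def allow(self, tenant_id: int) -> bool:
        window_index = int(self._clock() // self._window)
        key = (tenant_id, window_index)
        if self._counts[key] >= self._limit:
            return False
        self._counts[key] += 1
        return True
```

Because the counter is keyed by tenant instead of by IP address, one tenant exhausting its budget has no effect on the requests of any other tenant.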

Multi-tenancy is not a feature you add; it is a foundation you build on. Get it right early, and everything else becomes easier.

#saas #architecture #multi-tenant #database #engineering