How do you design a database schema for a high-traffic e-commerce application?

Designing a robust and scalable database schema is paramount for a high-traffic e-commerce application. It directly impacts performance, data integrity, and the ability to handle large volumes of transactions and user requests. A well-thought-out schema ensures efficient data retrieval, supports complex business logic, and facilitates future growth and feature additions.

Core Entities and Relationships

The foundation of an e-commerce schema revolves around a few core entities: Users, Products, Orders, and Carts. Understanding their attributes and relationships is crucial for defining the tables and their structures.

Users

The Users table stores customer information. It's often linked to addresses, payment methods, and orders. Consider separating sensitive user data into a different table if needed for compliance or security.

Column	Data Type	Constraints	Description
user_id	BIGINT	PRIMARY KEY, AUTO_INCREMENT	Unique identifier for the user
username	VARCHAR(50)	UNIQUE, NOT NULL	User's chosen username
email	VARCHAR(255)	UNIQUE, NOT NULL	User's email address
password_hash	VARCHAR(255)	NOT NULL	Hashed password for security
first_name	VARCHAR(100)	NULLABLE	User's first name
last_name	VARCHAR(100)	NULLABLE	User's last name
created_at	TIMESTAMP	DEFAULT CURRENT_TIMESTAMP	Timestamp of user creation
updated_at	TIMESTAMP	DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP	Timestamp of last update

Products

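Products are central to an e-commerce system. The schema needs to account for categories, variants (e.g., size, color), and inventory. Under high traffic, stock counts change far more often than catalog data, so keeping them in a separate inventory table reduces lock contention on product rows; the ProductVariants table below keeps stock_quantity inline for simplicity.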

Column	Data Type	Constraints	Description
product_id	BIGINT	PRIMARY KEY, AUTO_INCREMENT	Unique identifier for the product
name	VARCHAR(255)	NOT NULL	Product name
description	TEXT	NULLABLE	Detailed product description
price	DECIMAL(10, 2)	NOT NULL, > 0	Base price of the product
category_id	BIGINT	FOREIGN KEY (Categories)	Reference to product category
sku	VARCHAR(50)	UNIQUE, NOT NULL	Stock Keeping Unit
image_url	VARCHAR(512)	NULLABLE	URL for the main product image
is_active	BOOLEAN	DEFAULT TRUE	Whether the product is currently active
created_at	TIMESTAMP	DEFAULT CURRENT_TIMESTAMP	Timestamp of product creation

Example: Categories table to organize products.

Column	Data Type	Constraints	Description
category_id	BIGINT	PRIMARY KEY, AUTO_INCREMENT	Unique identifier for the category
name	VARCHAR(100)	UNIQUE, NOT NULL	Category name (e.g., 'Electronics')
parent_category_id	BIGINT	FOREIGN KEY (Categories)	Self-referencing for hierarchical categories
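
The self-referencing parent_category_id column can be walked with a recursive query. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database; the lowercase table name, the sample rows, and the category_path helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (
    category_id        INTEGER PRIMARY KEY,
    name               TEXT NOT NULL UNIQUE,
    parent_category_id INTEGER REFERENCES categories(category_id)
);
-- Sample data (hypothetical)
INSERT INTO categories VALUES
    (1, 'Electronics', NULL),
    (2, 'Computers', 1),
    (3, 'Laptops', 2),
    (4, 'Clothing', NULL);
""")


def category_path(conn, category_id):
    """Return category names from the root down to the given category."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestors(category_id, name, parent_category_id, depth) AS (
            -- Start at the requested category...
            SELECT category_id, name, parent_category_id, 0
            FROM categories
            WHERE category_id = ?
            UNION ALL
            -- ...then repeatedly step up to the parent.
            SELECT c.category_id, c.name, c.parent_category_id, a.depth + 1
            FROM categories c
            JOIN ancestors a ON c.category_id = a.parent_category_id
        )
        SELECT name FROM ancestors ORDER BY depth DESC
        """,
        (category_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

A call such as category_path(conn, 3) yields the breadcrumb for a product page. MySQL 8.0 and PostgreSQL accept the same WITH RECURSIVE form.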

Example: ProductVariants table for different options (size, color, etc.) and specific inventory tracking.

Column	Data Type	Constraints	Description
variant_id	BIGINT	PRIMARY KEY, AUTO_INCREMENT	Unique ID for a specific product variant
product_id	BIGINT	FOREIGN KEY (Products), NOT NULL	References the base product
sku	VARCHAR(50)	UNIQUE, NOT NULL	Unique SKU for this variant
attributes	JSON	NULLABLE	JSON object for variant attributes (e.g., {"color": "red", "size": "L"})
price_modifier	DECIMAL(10, 2)	DEFAULT 0.00	Additional price for this variant
stock_quantity	INT	NOT NULL, >= 0	Current stock level for this variant
image_url	VARCHAR(512)	NULLABLE	Specific image for this variant

Orders

The Orders tables track customer purchases. This usually involves an Orders header table and an OrderItems detail table.

Column	Data Type	Constraints	Description
order_id	BIGINT	PRIMARY KEY, AUTO_INCREMENT	Unique identifier for the order
user_id	BIGINT	FOREIGN KEY (Users), NOT NULL	References the user who placed the order
order_date	TIMESTAMP	DEFAULT CURRENT_TIMESTAMP	Date and time of order placement
total_amount	DECIMAL(10, 2)	NOT NULL, >= 0	Total cost of the order
status	VARCHAR(50)	NOT NULL	Order status (e.g., 'pending', 'processing', 'shipped', 'completed', 'cancelled')
shipping_address_id	BIGINT	FOREIGN KEY (Addresses)	Reference to shipping address (separate table)
billing_address_id	BIGINT	FOREIGN KEY (Addresses)	Reference to billing address
payment_method_id	BIGINT	FOREIGN KEY (PaymentMethods)	Reference to payment method

Example: OrderItems table to detail each item within an order.

Column	Data Type	Constraints	Description
order_item_id	BIGINT	PRIMARY KEY, AUTO_INCREMENT	Unique identifier for the order item
order_id	BIGINT	FOREIGN KEY (Orders), NOT NULL	References the parent order
variant_id	BIGINT	FOREIGN KEY (ProductVariants), NOT NULL	References the specific product variant purchased
quantity	INT	NOT NULL, > 0	Quantity of the item ordered
price_at_purchase	DECIMAL(10, 2)	NOT NULL, >= 0	Price of the item at the time of purchase (important for historical data)
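
The header/detail split and the price_at_purchase column can be sketched as one transactional order-placement routine. This is an illustration under stated assumptions, not a definitive implementation: sqlite3 stands in for the production database, money is stored as integer cents because SQLite has no fixed-point DECIMAL type, and place_order, OutOfStock, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_variants (
    variant_id     INTEGER PRIMARY KEY,
    price_cents    INTEGER NOT NULL CHECK (price_cents >= 0),
    stock_quantity INTEGER NOT NULL CHECK (stock_quantity >= 0)
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL,
    total_cents INTEGER NOT NULL CHECK (total_cents >= 0),
    status      TEXT NOT NULL
);
CREATE TABLE order_items (
    order_item_id           INTEGER PRIMARY KEY,
    order_id                INTEGER NOT NULL REFERENCES orders(order_id),
    variant_id              INTEGER NOT NULL REFERENCES product_variants(variant_id),
    quantity                INTEGER NOT NULL CHECK (quantity > 0),
    price_at_purchase_cents INTEGER NOT NULL CHECK (price_at_purchase_cents >= 0)
);
-- Sample data (hypothetical)
INSERT INTO product_variants VALUES (1, 1999, 3), (2, 4999, 2);
""")


class OutOfStock(Exception):
    """Raised when a requested variant cannot be reserved."""


def place_order(conn, user_id, items):
    """Create an order atomically; items is a list of (variant_id, quantity)."""
    with conn:  # one transaction: commit on success, roll back on any exception
        order_id = conn.execute(
            "INSERT INTO orders (user_id, total_cents, status) VALUES (?, 0, 'pending')",
            (user_id,),
        ).lastrowid
        total = 0
        for variant_id, qty in items:
            reserved = conn.execute(
                "UPDATE product_variants SET stock_quantity = stock_quantity - ? "
                "WHERE variant_id = ? AND stock_quantity >= ?",
                (qty, variant_id, qty),
            ).rowcount
            if reserved != 1:
                raise OutOfStock(variant_id)  # undoes the order row and earlier items
            (price,) = conn.execute(
                "SELECT price_cents FROM product_variants WHERE variant_id = ?",
                (variant_id,),
            ).fetchone()
            # Copy the current price so later catalog changes do not alter history.
            conn.execute(
                "INSERT INTO order_items "
                "(order_id, variant_id, quantity, price_at_purchase_cents) "
                "VALUES (?, ?, ?, ?)",
                (order_id, variant_id, qty, price),
            )
            total += price * qty
        conn.execute(
            "UPDATE orders SET total_cents = ? WHERE order_id = ?", (total, order_id)
        )
        return order_id
```

If any item cannot be reserved, the exception rolls back the order header, the items already inserted, and the stock already decremented, so no partial order is left behind.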

Carts

Shopping carts hold items a user intends to purchase. They are typically temporary and may be cleared after a period of inactivity or once an order is placed. Because carts are read very frequently, denormalizing them (for example, storing the product name alongside each cart item) can be worth considering for performance.

Column	Data Type	Constraints	Description
cart_id	BIGINT	PRIMARY KEY, AUTO_INCREMENT	Unique identifier for the cart
user_id	BIGINT	FOREIGN KEY (Users), UNIQUE, NOT NULL	References the user who owns the cart (one cart per user)
created_at	TIMESTAMP	DEFAULT CURRENT_TIMESTAMP	Timestamp of cart creation
updated_at	TIMESTAMP	DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP	Timestamp of last cart modification

Example: CartItems table to detail items within a user's cart.

Column	Data Type	Constraints	Description
cart_item_id	BIGINT	PRIMARY KEY, AUTO_INCREMENT	Unique identifier for the cart item
cart_id	BIGINT	FOREIGN KEY (Carts), NOT NULL	References the parent cart
variant_id	BIGINT	FOREIGN KEY (ProductVariants), NOT NULL	References the specific product variant in the cart
quantity	INT	NOT NULL, > 0	Quantity of the item in the cart
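
Adding the same variant to a cart twice should normally increase the quantity rather than create a second row. One way to sketch this is an upsert against a UNIQUE (cart_id, variant_id) constraint. Note that this constraint is an assumption and is not part of the table above; the add_to_cart and cart_contents helpers are likewise hypothetical, and the ON CONFLICT syntax shown requires SQLite 3.24 or later (PostgreSQL uses the same form, MySQL uses ON DUPLICATE KEY UPDATE).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cart_items (
    cart_item_id INTEGER PRIMARY KEY,
    cart_id      INTEGER NOT NULL,
    variant_id   INTEGER NOT NULL,
    quantity     INTEGER NOT NULL CHECK (quantity > 0),
    UNIQUE (cart_id, variant_id)  -- assumption: one row per variant per cart
);
""")


def add_to_cart(conn, cart_id, variant_id, quantity):
    """Insert a cart line, or bump its quantity if the variant is already there."""
    with conn:
        conn.execute(
            """
            INSERT INTO cart_items (cart_id, variant_id, quantity)
            VALUES (?, ?, ?)
            ON CONFLICT (cart_id, variant_id)
            DO UPDATE SET quantity = quantity + excluded.quantity
            """,
            (cart_id, variant_id, quantity),
        )


def cart_contents(conn, cart_id):
    """Return [(variant_id, quantity), ...] for one cart, ordered by variant."""
    return conn.execute(
        "SELECT variant_id, quantity FROM cart_items "
        "WHERE cart_id = ? ORDER BY variant_id",
        (cart_id,),
    ).fetchall()
```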

Key Design Principles

Normalization vs. Denormalization

Normalization reduces data redundancy and improves data integrity, while denormalization can improve read performance by reducing joins. High-traffic e-commerce often uses a hybrid approach.

Normalize: For write-heavy operations, frequently updated data (e.g., inventory stock_quantity), and master data (users, products, categories) where integrity is paramount.
Denormalize: For read-heavy, performance-critical paths like product display (e.g., storing a product's category name directly in the product table, or even embedding product details into order items to freeze the state at purchase time). Search indexes are often highly denormalized.

Indexing Strategy

Proper indexing is crucial for query performance. Without it, the database has to scan entire tables to find data.

Primary Keys: Automatically indexed.
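Foreign Keys: Index foreign key columns to speed up joins and referential-integrity checks. MySQL (InnoDB) creates these indexes automatically; PostgreSQL does not, so add them explicitly there.
Frequently Queried Columns: Index columns used in WHERE clauses (e.g., user_id and status on Orders, category_id on Products).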
Sort/Group By Columns: Index columns used in ORDER BY or GROUP BY clauses.
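Unique Constraints: Columns declared UNIQUE (e.g., email, sku) are backed by a unique index in MySQL and PostgreSQL, so they do not need a separate one.
Avoid Over-indexing: Every index must be maintained on each INSERT, UPDATE, or DELETE that touches its columns, so too many indexes slow down writes.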
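
The effect of an index can be checked by asking the database for its query plan before and after creating it. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's or PostgreSQL's EXPLAIN; the index name idx_orders_user_date, the query, and the query_plan helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    status     TEXT NOT NULL,
    order_date TEXT NOT NULL
);
""")

# A typical "my orders" page: filter by user, newest first.
QUERY = "SELECT order_id FROM orders WHERE user_id = ? ORDER BY order_date DESC"


def query_plan(conn, sql, params=()):
    """Return the human-readable plan steps SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


plan_before = query_plan(conn, QUERY, (42,))

# Composite index: the filter column first, then the sort column.
conn.execute("CREATE INDEX idx_orders_user_date ON orders (user_id, order_date)")

plan_after = query_plan(conn, QUERY, (42,))
```

Printing plan_before and plan_after shows the change from a full table scan to a search that uses the new index. The exact wording of the plan text varies between SQLite versions.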

Data Types and Constraints

Choosing appropriate data types and applying constraints ensures data quality and optimizes storage.

Use BIGINT for IDs to accommodate a very large number of records.
Use VARCHAR with appropriate length for strings, TEXT for longer descriptions.
Use DECIMAL for monetary values to avoid floating-point inaccuracies.
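Use TIMESTAMP or DATETIME for date/time values. In MySQL, TIMESTAMP is converted to and from UTC but only covers dates up to early 2038, while DATETIME has a wider range and no time-zone conversion.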
Utilize NOT NULL, UNIQUE, DEFAULT constraints, and check constraints to enforce business rules.
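
The reason for DECIMAL carries over to application code: binary floats cannot represent most decimal fractions exactly. A short sketch using Python's decimal module; the line_total helper, the sample price, and the tax rate are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point accumulates representation error...
float_total = 0.1 + 0.2

# ...while Decimal keeps exact base-10 values, like a DECIMAL(10, 2) column.
decimal_total = Decimal("0.10") + Decimal("0.20")


def line_total(unit_price, quantity, tax_rate):
    """Price * quantity plus tax, rounded to whole cents."""
    subtotal = unit_price * quantity
    return (subtotal * (1 + tax_rate)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
```

Constructing Decimal values from strings (not floats) matters here, since Decimal(0.1) would inherit the float's representation error.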

Idempotency and Transactions

For critical operations such as order placement and payment processing, wrap the related writes in a transaction so they get ACID guarantees (atomicity, consistency, isolation, durability). Design these operations to be idempotent so that a retried request does not repeat side effects such as charging the customer twice.
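
A common way to make a payment write idempotent is to have the client send a unique key with each logical request and to enforce uniqueness on that key in the database. The following is a minimal sketch under that assumption; the payments table, the idempotency_key column, and the charge_once helper are hypothetical and not part of the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (
    payment_id      INTEGER PRIMARY KEY,
    idempotency_key TEXT NOT NULL UNIQUE,  -- one row per logical request
    order_id        INTEGER NOT NULL,
    amount_cents    INTEGER NOT NULL CHECK (amount_cents > 0)
);
""")


def charge_once(conn, idempotency_key, order_id, amount_cents):
    """Record a charge at most once per key; return (payment_id, created)."""
    try:
        with conn:  # commit on success, roll back on error
            cur = conn.execute(
                "INSERT INTO payments (idempotency_key, order_id, amount_cents) "
                "VALUES (?, ?, ?)",
                (idempotency_key, order_id, amount_cents),
            )
            return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # The key already exists: this is a retry, so return the original result.
        row = conn.execute(
            "SELECT payment_id FROM payments WHERE idempotency_key = ?",
            (idempotency_key,),
        ).fetchone()
        return row[0], False
```

In a real system the call to the payment provider would be tied to the same key, so that a retry after a timeout returns the stored outcome instead of issuing a second charge.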

Scalability and Performance Considerations

Sharding/Partitioning

For extremely high traffic, a single database server may not suffice. Sharding distributes data across multiple database instances to improve performance and availability. Common strategies include horizontal partitioning based on a key (e.g., user_id, order_id).

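Hash-based sharding: Hashes the shard key to spread data evenly, at the cost of making range queries touch every shard.
Range-based sharding: Keeps contiguous key ranges together, which suits range and localized queries but can create hot spots.
Directory-based sharding: Uses a lookup service to map each key to its shard, which adds flexibility but also an extra hop and a component that must itself be highly available.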
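
The first two strategies can be sketched as small routing functions. This is an illustration only: the shard count, the range boundaries, and the function names are assumptions, and a production router would also need a plan for resharding, since changing the number of shards in a plain modulo scheme remaps most keys.

```python
import bisect
import hashlib


def hash_shard_for(key, num_shards):
    """Hash-based routing: a stable hash of the key, modulo the shard count."""
    # Python's built-in hash() is randomized per process for strings,
    # so a cryptographic digest is used to get a stable value.
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical exclusive upper bounds for shards 0, 1 and 2;
# anything at or above the last bound falls into shard 3.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]


def range_shard_for(user_id):
    """Range-based routing: contiguous user_id ranges live on the same shard."""
    return bisect.bisect_right(RANGE_UPPER_BOUNDS, user_id)
```

With hash routing, neighboring user IDs scatter across shards; with range routing they stay together, which is what makes range scans cheap and hot spots possible.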

Replication

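Read replicas scale read operations by directing read queries to secondary database instances, offloading the primary write instance. This is highly effective for e-commerce, where reads (browsing, search) typically far outnumber writes (orders). Because replication is usually asynchronous, replicas can lag behind the primary, so reads that must see a just-committed write should go to the primary.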
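
At the application level, read/write splitting often comes down to a small router that picks a connection per query. The sketch below is a simplified illustration; the ConnectionRouter class, its method names, and the require_fresh flag are hypothetical, and the connection handles are plain strings so the example stays self-contained.

```python
import itertools


class ConnectionRouter:
    """Send writes to the primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas; fall back to the primary if there are none.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def for_write(self):
        return self.primary

    def for_read(self, require_fresh=False):
        # Reads that must observe the caller's own recent write
        # (e.g. an order confirmation page) bypass possibly lagging replicas.
        if require_fresh or self._replicas is None:
            return self.primary
        return next(self._replicas)


router = ConnectionRouter("primary-db", ["replica-1", "replica-2"])
```

Catalog browsing would call router.for_read(), while checkout would use router.for_write() and then router.for_read(require_fresh=True) when showing the result.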

Caching

Implement caching layers (e.g., Redis, Memcached) for frequently accessed, relatively static data such as product details, category listings, and popular items. This reduces the load on the database significantly.
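
The usual access pattern for such a layer is cache-aside: read from the cache, fall back to the database on a miss, then populate the cache with an expiry. A minimal sketch follows, with an in-process dictionary standing in for Redis or Memcached; the TTLCache class, the get_product helper, and the "product:<id>" key format are assumptions for illustration.

```python
import time


class TTLCache:
    """Tiny in-process cache with per-entry expiry (stand-in for Redis/Memcached)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)

    def delete(self, key):
        self._entries.pop(key, None)


def get_product(product_id, cache, load_from_db):
    """Cache-aside read: try the cache, fall back to the database, then fill."""
    key = f"product:{product_id}"
    product = cache.get(key)
    if product is None:
        product = load_from_db(product_id)
        if product is not None:
            cache.set(key, product)
    return product
```

When a product is updated, the writer would call cache.delete for that key so the next read reloads it; the TTL bounds how long stale data can survive if an invalidation is missed.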

Asynchronous Operations

Decouple non-critical operations (e.g., sending email notifications, updating analytics, complex inventory adjustments) using message queues. This frees up immediate request paths and ensures a snappier user experience.
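
The decoupling can be illustrated with an in-process queue and a background worker. This is only a sketch: a production system would use a durable broker, and the event shape, the handle_checkout_request function, and the sent_emails list (standing in for a mail service) are hypothetical.

```python
import queue
import threading

events = queue.Queue()
sent_emails = []  # stand-in for an email service


def worker():
    """Consume events in the background until a None sentinel arrives."""
    while True:
        event = events.get()
        try:
            if event is None:
                return
            sent_emails.append(
                f"Order {event['order_id']} confirmation to {event['email']}"
            )
        finally:
            events.task_done()  # lets events.join() know this item is finished


def handle_checkout_request(order_id, email):
    """Request path: enqueue the slow work and respond immediately."""
    events.put({"type": "order_placed", "order_id": order_id, "email": email})
    return {"order_id": order_id, "status": "pending"}


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()
```

The request handler returns as soon as the event is enqueued; the confirmation email is produced later by the worker, so a slow mail provider does not slow down checkout.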

Database Choice

Relational Databases (SQL)

Often the default choice due to ACID compliance, strong consistency, and complex query capabilities.

Strengths: Strong consistency, complex joins, mature ecosystem, well-suited for transactional data (Orders, Users).
Examples: PostgreSQL, MySQL, Amazon Aurora.

NoSQL Databases

Can complement SQL databases for specific use cases requiring high scalability, flexibility, or eventually consistent models.

Document Stores (e.g., MongoDB): Good for flexible product catalogs, user profiles, content management where schemas might evolve frequently.
Key-Value Stores (e.g., Redis, DynamoDB): Excellent for caching, session management, shopping cart data due to fast read/write performance.
Graph Databases (e.g., Neo4j): Useful for recommendation engines, social connections, complex product relationships.
Search Engines (e.g., Elasticsearch): Critical for product search and filtering, often replicating and denormalizing data from the primary database.

Many high-traffic e-commerce platforms adopt a polyglot persistence strategy, combining different database types to leverage their respective strengths.
