Design Booking System (Inventory & Reservations)

Problem Context

🏨 Hotel booking platforms like Booking.com and Airbnb manage millions of properties worldwide. Booking.com alone processes over 1.5 million room nights daily across 500,000+ properties.

Unlike ticketing systems where each seat is unique, hotel inventory is interchangeable. The challenge here is managing date-range availability across millions of rooms while handling concurrent bookings.

Functional Requirements

Hotel booking systems have a massive scope, so you need to narrow it down with your interviewer. Here, we will focus on inventory and reservations.

Core Functional Requirements

FR1: Users should be able to search for room availability across a date range.
FR2: Users should be able to reserve rooms (temporarily hold inventory).
FR3: Users should be able to confirm a booking (payment) or cancel it.

Out of Scope:

Property management and listing creation.
Pricing engine and dynamic rate calculation.
Reviews and ratings.
Recommendations and personalization.
Payment processing internals.

Booking systems have complex business logic around pricing, taxes, and policies. Acknowledging these as out of scope shows you understand the full product.

Non-Functional Requirements

Core Non-Functional Requirements

NFR1: Search queries should return within 500ms.
NFR2: Zero double-booking of the same room-night.
NFR3: System should handle 100K+ concurrent searches.
NFR4: System should be highly available (99.9%+).

Let's get to work!

The Set Up

Planning the Approach

Based on our requirements, we have two fundamental challenges:

The Date-Range Problem: A booking for Jan 5-8 must check and reserve inventory across ALL three nights (5th, 6th, 7th). One unavailable night means the entire booking fails.
Interchangeability: Unlike seats, rooms of the same type are interchangeable. For available inventory, we track counts (5 King Rooms available), not specific room assignments.
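To make the date-range rule concrete, here is a minimal sketch in Python. The helper names (stay_nights, can_book) and the sample counts are illustrative assumptions, not part of any real booking API. It treats the check-out day as not being a night of the stay, which is why Jan 5-8 covers three nights.

```python
from datetime import date, timedelta


def stay_nights(check_in: date, check_out: date) -> list[date]:
    """Return each night of a stay; the check-out day is not a night."""
    if check_out <= check_in:
        raise ValueError("check_out must be after check_in")
    return [check_in + timedelta(days=i) for i in range((check_out - check_in).days)]


def can_book(available: dict[date, int], check_in: date, check_out: date) -> bool:
    """A stay is bookable only if every night has at least one room left."""
    return all(available.get(night, 0) >= 1 for night in stay_nights(check_in, check_out))


# Hypothetical counts of King Rooms per night.
king_rooms = {date(2024, 1, 5): 5, date(2024, 1, 6): 0, date(2024, 1, 7): 2}

nights = stay_nights(date(2024, 1, 5), date(2024, 1, 8))
bookable = can_book(king_rooms, date(2024, 1, 5), date(2024, 1, 8))
print(nights)    # the 5th, 6th, and 7th
print(bookable)  # False, because the 6th is sold out
```

Because rooms of a type are interchangeable, the check only needs a count per night, never a specific room number.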

In the interview, start with a working system first. Acknowledge the flaws, but get something functional before optimizing (you can optimize in the deep dives).

Defining the Core Entities

For this problem, we have several entities to work with:

Property: A hotel or vacation rental with location, amenities, and policies.
Room Type: A category of room at a property (e.g., "King Suite", "Double Queen").
Inventory: Available count for a specific room type on a specific date.
Reservation: A hold on inventory across a date range.
Booking: A confirmed, paid reservation.

API Interface

Our APIs naturally split into search (read-heavy, needs speed) and booking (write-heavy, needs consistency).

Search APIs (Read-Heavy): FR1

These power the search experience when users look for hotels.

1. Search Available Properties

GET /search?location=NYC&checkIn=2024-03-15&checkOut=2024-03-18&guests=2

Response:
{
  "properties": [
    {
      "propertyId": "prop_marriott_123",
      "name": "Marriott Times Square",
      "location": "New York, NY",
      "roomTypes": [
        {
          "roomTypeId": "rt_king_suite",
          "name": "King Suite",
          "available": true,
          "pricePerNight": 299
        },
        {
          "roomTypeId": "rt_double_queen",
          "name": "Double Queen",
          "available": false,
          "pricePerNight": 249
        }
      ]
    }
  ]
}

2. Get Detailed Availability

GET /properties/{propertyId}/availability?checkIn=2024-03-15&checkOut=2024-03-18

Response:
{
  "propertyId": "prop_marriott_123",
  "dates": {
    "2024-03-15": {"rt_king_suite": 5, "rt_double_queen": 0},
    "2024-03-16": {"rt_king_suite": 3, "rt_double_queen": 2},
    "2024-03-17": {"rt_king_suite": 4, "rt_double_queen": 1}
  },
  "bookableRoomTypes": ["rt_king_suite"]
}

Why show per-date breakdown? The UI might show "Only 3 left for March 16!" as urgency messaging. The bookableRoomTypes array contains only room types available on ALL dates.
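As a rough illustration, bookableRoomTypes can be derived from the per-date map in the response above. The function name is an assumption made for this sketch; only the field names and sample values come from the example response.

```python
def bookable_room_types(dates: dict[str, dict[str, int]]) -> list[str]:
    """Room types with at least one room left on every date in the range."""
    if not dates:
        return []
    # Collect every room type mentioned on any date.
    room_types = set().union(*(counts.keys() for counts in dates.values()))
    return sorted(
        rt for rt in room_types
        if all(counts.get(rt, 0) >= 1 for counts in dates.values())
    )


dates = {
    "2024-03-15": {"rt_king_suite": 5, "rt_double_queen": 0},
    "2024-03-16": {"rt_king_suite": 3, "rt_double_queen": 2},
    "2024-03-17": {"rt_king_suite": 4, "rt_double_queen": 1},
}

bookable = bookable_room_types(dates)
print(bookable)  # ['rt_king_suite']
```

The Double Queen is excluded because it has 0 rooms on March 15, even though the other two nights are open.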

Booking systems typically don't assign a specific room until check-in at the hotel. This lets us store only counts and decrement them when a reservation is created.

Booking APIs (Write-Heavy): FR2, FR3

These handle the actual reservation and confirmation flow.

1. Create Reservation

POST /reservations

Request:
{
  "propertyId": "prop_marriott_123",
  "roomTypeId": "rt_king_suite",
  "checkIn": "2024-03-15",
  "checkOut": "2024-03-18",
  "guestId": "user_456"
}

Response:
{
  "reservationId": "res_abc789",
  "status": "pending",
  "expiresAt": "2024-03-01T10:30:00Z",
  "totalPrice": 897,
  "nights": 3
}

What happens here? Inventory is decremented for all three nights. Other users now see one fewer King Suite available, and we give the guest 30 minutes to pay.

2. Confirm Booking (Payment)

POST /bookings

Request:
{
  "reservationId": "res_abc789",
  "paymentToken": "tok_visa_4242"
}

Response:
{
  "bookingId": "book_xyz123",
  "status": "confirmed",
  "confirmationCode": "MRQ-789456",
  "checkIn": "2024-03-15",
  "checkOut": "2024-03-18"
}

What happens here? Reservation converts to a confirmed booking. The inventory stays decremented.

3. Cancel Reservation/Booking

DELETE /reservations/{reservationId}

Response:
{
  "status": "cancelled",
  "inventoryRestored": true,
  "refundAmount": 897
}

Cancelled reservations must restore inventory. If a King Suite on March 16 was decremented, it gets incremented back.

High-Level Design

Let's start with our functional requirements:

FR1: Search room availability (date range)
FR2: Reserve rooms (hold inventory)
FR3: Confirm booking or cancel

We'll start with the simplest design and fix problems as we go. In an interview, you can start at Diagram 3.

1) The Simplest System: FR2, FR3 (Reservations)

Let's start with the simplest system where a user picks a room and we mark it booked.

Table: rooms

room_type_id	date	booked
king_suite	2024-03-15	true
king_suite	2024-03-16	true
king_suite	2024-03-17	true

This works for a single room. The user books, we mark it.

But what breaks?

Multiple rooms of same type: Hotels have 50 King Suites, not 1. We need to track counts, not boolean flags.

2) Track Inventory Counts: FR2 (Reservations)

Instead of booked=true, we track how many rooms are available per night:

Table: inventory

room_type_id	date	available
king_suite	2024-03-15	5
king_suite	2024-03-16	3
king_suite	2024-03-17	4

Now we can have 50 King Suites and track how many are still available each night.

But what breaks?

Partial booking: What if March 16 has 0 available but March 15 and 17 have 5? We'd decrement 15 and 17, then fail on 16. Now inventory is wrong.
Race conditions: Two users book the last room at the same time. Both decrements succeed, and we would sell the same room twice.
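The partial-booking problem is easy to reproduce. The sketch below is deliberately buggy and uses a plain dict as a stand-in for the inventory table; the function name is hypothetical. It walks the nights in order, so it fails on March 16 after already decrementing March 15, but any order that decrements before checking every night leaves inventory wrong in the same way.

```python
def naive_reserve(inventory: dict[str, int], nights: list[str]) -> bool:
    """Buggy on purpose: decrements night by night with no rollback."""
    for night in nights:
        if inventory[night] < 1:
            return False  # earlier nights were already decremented
        inventory[night] -= 1
    return True


inventory = {"2024-03-15": 5, "2024-03-16": 0, "2024-03-17": 5}
ok = naive_reserve(inventory, ["2024-03-15", "2024-03-16", "2024-03-17"])

print(ok)         # False: the booking failed
print(inventory)  # March 15 now shows 4, although nothing was booked
```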

3) Atomic Date-Range Booking: FR2 (Reservations) ✅

We need to ensure ALL dates have availability before decrementing ANY of them. This must be atomic:

Key Points:

FOR UPDATE locks the rows during the transaction. Other transactions wait.
All-or-nothing: either all dates are decremented, or none are.
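The transaction described above can be sketched as follows. This is a minimal, runnable approximation that uses Python's built-in sqlite3 module with an in-memory database, not the PostgreSQL setup the design assumes. SQLite has no SELECT ... FOR UPDATE, so BEGIN IMMEDIATE stands in for it by taking the write lock up front; in PostgreSQL the SELECT would carry FOR UPDATE instead. The table layout and function names are assumptions.

```python
import sqlite3


def reserve(conn: sqlite3.Connection, room_type_id: str, nights: list[str]) -> bool:
    """Decrement every night or none. Returns False if any night is sold out."""
    placeholders = ",".join("?" * len(nights))
    params = [room_type_id, *nights]
    conn.execute("BEGIN IMMEDIATE")  # stand-in for locking rows with FOR UPDATE
    try:
        rows = conn.execute(
            "SELECT available FROM inventory "
            f"WHERE room_type_id = ? AND date IN ({placeholders})",
            params,
        ).fetchall()
        # Check ALL nights before touching ANY of them.
        if len(rows) != len(nights) or any(available < 1 for (available,) in rows):
            conn.execute("ROLLBACK")
            return False
        conn.execute(
            "UPDATE inventory SET available = available - 1 "
            f"WHERE room_type_id = ? AND date IN ({placeholders})",
            params,
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise


def counts(conn: sqlite3.Connection, room_type_id: str) -> list[int]:
    """Available counts for a room type, ordered by date."""
    rows = conn.execute(
        "SELECT available FROM inventory WHERE room_type_id = ? ORDER BY date",
        (room_type_id,),
    ).fetchall()
    return [available for (available,) in rows]


# isolation_level=None lets the sketch issue BEGIN/COMMIT itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE inventory (room_type_id TEXT, date TEXT, available INTEGER, "
    "PRIMARY KEY (room_type_id, date))"
)
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    [
        ("king_suite", "2024-03-15", 5), ("king_suite", "2024-03-16", 3),
        ("king_suite", "2024-03-17", 4), ("double_queen", "2024-03-15", 2),
        ("double_queen", "2024-03-16", 0), ("double_queen", "2024-03-17", 1),
    ],
)

stay = ["2024-03-15", "2024-03-16", "2024-03-17"]
king_ok = reserve(conn, "king_suite", stay)     # all three nights decremented
queen_ok = reserve(conn, "double_queen", stay)  # March 16 is sold out, nothing changes
print(king_ok, counts(conn, "king_suite"))
print(queen_ok, counts(conn, "double_queen"))
```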

This is pessimistic locking. Unlike Ticketmaster where optimistic locking works (fast fail on contention), hotel bookings span multiple rows that must all succeed together.

What breaks?

Search is slow: To check availability for a search, we need to scan inventory rows for every property, room type, and date. That's potentially millions of rows.

4) Separate Read and Write Paths: FR1 (Search) + FR2

Reads and writes have very different needs:

Writes (reservations): Must be strongly consistent, can be slower
Reads (search): Need to be fast, can tolerate slight staleness

The solution is to use a cache for reads that we update after each write.

How it works:

Write path: Booking Service writes to PostgreSQL (source of truth), then updates Redis
Read path: Search Service reads from Redis (fast, handles 100K+ concurrent users)

The cache might be stale for a few milliseconds between the DB write and cache update, but that's acceptable for search results.

What breaks?

Stale search results: The user sees "5 available", clicks to book, but in the 200ms since their search, someone else booked the last one.

5) Handle Stale Availability: FR1 (Search)

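The staleness problem is inherent in any cached system. We handle it with a split check: search reads the cache, which may be slightly stale, and the booking path always re-checks the database before reserving.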

Why this is acceptable:

Most searches result in browsing, not immediate booking
The window of staleness is typically < 1 second
When someone does try to book, we always verify against the source of truth
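Here is a small sketch of the split check, with two dicts standing in for PostgreSQL (source of truth) and Redis (cache). The class and method names are illustrative assumptions. The cache refresh is a separate step so the stale window can be shown explicitly.

```python
class BookingSystem:
    """Dicts stand in for PostgreSQL (source of truth) and Redis (cache)."""

    def __init__(self, inventory: dict[str, int]):
        self.db = dict(inventory)
        self.cache = dict(inventory)

    def search(self, night: str) -> int:
        # Phase 1: fast, possibly stale read used for browsing.
        return self.cache.get(night, 0)

    def book(self, night: str) -> bool:
        # Phase 2: the authoritative check ignores the cache entirely.
        if self.db.get(night, 0) < 1:
            return False
        self.db[night] -= 1
        return True

    def refresh_cache(self, night: str) -> None:
        # Runs shortly after the DB write; until then the cache is stale.
        self.cache[night] = self.db[night]


system = BookingSystem({"2024-03-16": 1})
seen_by_a = system.search("2024-03-16")  # user A sees 1 available
b_booked = system.book("2024-03-16")     # user B books the last room
stale = system.search("2024-03-16")      # cache still says 1
a_booked = system.book("2024-03-16")     # A's attempt is rejected by the DB check
system.refresh_cache("2024-03-16")
fresh = system.search("2024-03-16")      # cache now says 0

print(seen_by_a, b_booked, stale, a_booked, fresh)
```

User A gets a "no longer available" response instead of a double-booking, which is the trade-off this design accepts.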

What breaks?

We still need to actually find properties efficiently. We will handle this in the deep dives.

6) Complete System: FR1, FR2, FR3 ✅

Now let's add a proper Search Service that can filter properties before checking availability:

This is our baseline architecture.

Now we can address our non-functional requirements and more in the deep dives:

NFR2 (Consistency): How exactly do we prevent double-booking?
NFR1 (Latency): How do we make search fast across dates?
NFR3 (Scale): How do we handle high-demand properties?

Potential Deep Dives

1) How do we prevent double-booking?: NFR2 (Zero Double-Booking)

In Diagram 3, we showed the transaction. But what actually prevents two concurrent bookings for the last room?

The key is FOR UPDATE: This lock prevents two transactions from locking or modifying the same row at the same time. The second transaction's SELECT ... FOR UPDATE waits until the first commits or rolls back, and only then reads the current count.

Without FOR UPDATE, both users would read available=1, both would decrement, and you'd end up with available=-1 (double-booked).
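The difference can be shown with a deterministic simulation. The sketch below does not use a real database or threads; it replays the two interleavings by hand on a dict, and the function names are hypothetical. In the unlocked version every transaction reads before any of them writes, and in the locked version each transaction finishes before the next one starts.

```python
def book_without_lock(db: dict[str, int], night: str, transactions: int) -> int:
    """Unsafe interleaving: all transactions read first, then all write."""
    snapshots = [db[night] for _ in range(transactions)]  # everyone reads available=1
    sold = 0
    for seen in snapshots:
        if seen >= 1:  # each transaction trusts its own stale read
            db[night] -= 1
            sold += 1
    return sold


def book_with_lock(db: dict[str, int], night: str, transactions: int) -> int:
    """Serialized: each transaction reads and writes before the next begins."""
    sold = 0
    for _ in range(transactions):
        if db[night] >= 1:
            db[night] -= 1
            sold += 1
    return sold


unsafe_db = {"2024-03-16": 1}
unsafe_sold = book_without_lock(unsafe_db, "2024-03-16", 2)

safe_db = {"2024-03-16": 1}
safe_sold = book_with_lock(safe_db, "2024-03-16", 2)

print(unsafe_sold, unsafe_db)  # 2 rooms sold, available is -1
print(safe_sold, safe_db)      # 1 room sold, available is 0
```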

2) How do we efficiently model date-range inventory?: NFR1 (Search Latency)

Storing one row per (room_type, date) seems simple, but a hotel with 10 room types bookable for the next 365 days needs 10 × 365 = 3,650 rows. Across 500K properties, that is roughly 1.8 billion rows.

For the cache layer (Redis), we use a simple key structure:

Key:   inventory:{property_id}:{room_type_id}:{date}
Value: 5

Example:
  inventory:prop_123:king_suite:2024-03-15 = 5
  inventory:prop_123:king_suite:2024-03-16 = 3

Checking availability for a date range means fetching one key per night and checking that every value is ≥ 1.
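A minimal sketch of that lookup follows. A dict stands in for Redis so the example runs on its own; with a real Redis client, fetching the values would typically be a single MGET round trip. The function names are assumptions, and treating a missing key as "not available" is a design choice made for this sketch.

```python
from datetime import date, timedelta


def inventory_keys(property_id: str, room_type_id: str,
                   check_in: date, check_out: date) -> list[str]:
    """One cache key per night of the stay (check-out day excluded)."""
    nights = (check_out - check_in).days
    return [
        f"inventory:{property_id}:{room_type_id}:{(check_in + timedelta(days=i)).isoformat()}"
        for i in range(nights)
    ]


def is_available(cache: dict[str, str], keys: list[str]) -> bool:
    """True only if every night has a cached count of at least 1."""
    values = [cache.get(key) for key in keys]  # one MGET against real Redis
    return all(value is not None and int(value) >= 1 for value in values)


cache = {
    "inventory:prop_123:king_suite:2024-03-15": "5",
    "inventory:prop_123:king_suite:2024-03-16": "3",
}

two_nights = inventory_keys("prop_123", "king_suite", date(2024, 3, 15), date(2024, 3, 17))
three_nights = inventory_keys("prop_123", "king_suite", date(2024, 3, 15), date(2024, 3, 18))

print(two_nights)
print(is_available(cache, two_nights))    # True
print(is_available(cache, three_nights))  # False: no key for March 17
```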

3) How do we handle reservation expiry?: NFR4 (Availability)

Reservations that expire (user abandons checkout) must restore inventory. The most effective approach for us is a pool of expiry workers.

The expiry worker should process in batches and use transactions to ensure inventory restoration is atomic with marking the reservation expired.
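One way an expiry worker could look is sketched below, again using sqlite3 in memory so it runs as-is. The schema, column names, and expire_batch function are assumptions. The status check inside the UPDATE ("AND status = 'pending'") is what keeps two workers from restoring the same reservation twice. In PostgreSQL, a pool of workers would more commonly claim rows with SELECT ... FOR UPDATE SKIP LOCKED, which SQLite does not support.

```python
import sqlite3


def expire_batch(conn: sqlite3.Connection, now: str, batch_size: int = 100) -> int:
    """Expire up to batch_size overdue reservations and restore their inventory."""
    overdue = conn.execute(
        "SELECT id, room_type_id, check_in, check_out FROM reservations "
        "WHERE status = 'pending' AND expires_at <= ? LIMIT ?",
        (now, batch_size),
    ).fetchall()
    expired = 0
    for res_id, room_type_id, check_in, check_out in overdue:
        conn.execute("BEGIN IMMEDIATE")
        try:
            cur = conn.execute(
                "UPDATE reservations SET status = 'expired' "
                "WHERE id = ? AND status = 'pending'",
                (res_id,),
            )
            if cur.rowcount == 1:  # we won the race for this reservation
                conn.execute(
                    "UPDATE inventory SET available = available + 1 "
                    "WHERE room_type_id = ? AND date >= ? AND date < ?",
                    (room_type_id, check_in, check_out),
                )
                expired += 1
            conn.execute("COMMIT")  # status change and restore land together
        except sqlite3.Error:
            conn.execute("ROLLBACK")
            raise
    return expired


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE inventory (room_type_id TEXT, date TEXT, available INTEGER);
    CREATE TABLE reservations (id TEXT PRIMARY KEY, room_type_id TEXT,
        check_in TEXT, check_out TEXT, status TEXT, expires_at TEXT);
    INSERT INTO inventory VALUES ('king_suite', '2024-03-15', 4),
        ('king_suite', '2024-03-16', 2), ('king_suite', '2024-03-17', 3);
    INSERT INTO reservations VALUES ('res_abc789', 'king_suite',
        '2024-03-15', '2024-03-18', 'pending', '2024-03-01T10:30:00Z');
""")

first_run = expire_batch(conn, now="2024-03-01T10:31:00Z")
second_run = expire_batch(conn, now="2024-03-01T10:32:00Z")
restored = [row[0] for row in conn.execute(
    "SELECT available FROM inventory ORDER BY date")]
print(first_run, second_run, restored)
```

Running the worker a second time finds nothing to expire, so inventory is restored exactly once.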

4) How do we handle cancellations?

Unlike expiry (system-driven), cancellations are user-driven and may have business rules:

Inventory restoration must be in the same transaction as the status update. If they were separate and the process failed after incrementing inventory but before marking the reservation cancelled, a retry would restore the inventory a second time and leave you with double inventory.
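The sketch below illustrates why the single transaction matters. It uses sqlite3 in memory, and the schema, the cancel function, and the fail_midway flag (which simulates a crash between the two writes) are all assumptions made for the demonstration. Refund rules are left out entirely. The "with conn" block commits on success and rolls back if an exception escapes.

```python
import sqlite3


def cancel(conn: sqlite3.Connection, reservation_id: str, fail_midway: bool = False) -> bool:
    """Restore inventory and mark the reservation cancelled in one transaction."""
    with conn:  # commit on success, roll back if an exception escapes
        row = conn.execute(
            "SELECT room_type_id, check_in, check_out FROM reservations "
            "WHERE id = ? AND status IN ('pending', 'confirmed')",
            (reservation_id,),
        ).fetchone()
        if row is None:
            return False  # unknown or already cancelled: nothing to restore
        conn.execute(
            "UPDATE inventory SET available = available + 1 "
            "WHERE room_type_id = ? AND date >= ? AND date < ?",
            row,
        )
        if fail_midway:
            raise RuntimeError("simulated crash between the two writes")
        conn.execute(
            "UPDATE reservations SET status = 'cancelled' WHERE id = ?",
            (reservation_id,),
        )
    return True


def available(conn: sqlite3.Connection) -> list[int]:
    return [row[0] for row in conn.execute("SELECT available FROM inventory ORDER BY date")]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (room_type_id TEXT, date TEXT, available INTEGER);
    CREATE TABLE reservations (id TEXT PRIMARY KEY, room_type_id TEXT,
        check_in TEXT, check_out TEXT, status TEXT);
    INSERT INTO inventory VALUES ('king_suite', '2024-03-15', 4),
        ('king_suite', '2024-03-16', 2), ('king_suite', '2024-03-17', 3);
    INSERT INTO reservations VALUES ('res_abc789', 'king_suite',
        '2024-03-15', '2024-03-18', 'confirmed');
""")

try:
    cancel(conn, "res_abc789", fail_midway=True)
except RuntimeError:
    pass
after_crash = available(conn)        # the increment was rolled back

first = cancel(conn, "res_abc789")   # the retry succeeds
second = cancel(conn, "res_abc789")  # a repeat is a no-op
after_cancel = available(conn)
print(after_crash, first, second, after_cancel)
```

Because the failed attempt rolled back its increment, the retry restores each night exactly once.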

5) How do we handle high-demand properties?: NFR3 (Scale)

Some properties get massive traffic (popular resort during holiday season). The FOR UPDATE lock becomes a bottleneck:

Sharding adds complexity, so only use it for properties with measured high contention. Most properties DO NOT need this.
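The text above does not spell out the sharding scheme, so the following is only one plausible interpretation: split a hot night's count across several sub-counters (each would be its own row in the database) so that concurrent bookings usually lock different rows. The class name, bucket layout, and fallback scan are all assumptions, and the sketch covers a single night in memory rather than a full date-range transaction.

```python
import random


class ShardedCounter:
    """One night's inventory split across several buckets (hypothetical layout)."""

    def __init__(self, total: int, shards: int):
        base, extra = divmod(total, shards)
        # Spread the rooms as evenly as possible across the buckets.
        self.buckets = [base + (1 if i < extra else 0) for i in range(shards)]

    def available(self) -> int:
        return sum(self.buckets)

    def take_one(self, rng: random.Random) -> bool:
        # Start at a random bucket so concurrent bookings tend to hit different
        # rows, then fall back to the others so the last rooms can still be sold.
        start = rng.randrange(len(self.buckets))
        for offset in range(len(self.buckets)):
            i = (start + offset) % len(self.buckets)
            if self.buckets[i] >= 1:
                self.buckets[i] -= 1
                return True
        return False


rng = random.Random(0)
counter = ShardedCounter(total=50, shards=4)
initial_buckets = list(counter.buckets)

results = [counter.take_one(rng) for _ in range(51)]
print(initial_buckets)      # [13, 13, 12, 12]
print(results.count(True))  # 50 bookings succeed, the 51st fails
print(counter.available())  # 0
```

The cost is visible even in this toy version: reading total availability now means summing every bucket, and selling the last few rooms may touch several of them.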

What to Expect?

That covered a lot! Here's what you should focus on at each level.

Mid-level

Breadth over Depth (80/20): Focus on getting the basic flow working. Show you understand why date-range bookings are complex and how atomic transactions solve the problem.
Expect Probing: "What happens if one date in the range is unavailable?" "Why do we need transactions?" You should explain the partial-booking problem clearly.
Assisted Driving: You lead the initial design, but the interviewer may guide you toward the inventory model and cache separation.
The Booking System Bar: Demonstrate understanding of the core reservation flow (Diagrams 1-4). Articulate why fungible inventory with counts differs from unique-seat systems.

Senior

Balanced Breadth & Depth (60/40): Go deeper on inventory modeling, cache consistency, and the two-phase availability check. Explain trade-offs in row-per-date vs. range-based storage.
Proactive Problem-Solving: Identify the stale-cache problem before the interviewer mentions it. Bring up how search might show availability that's already gone.
Articulate Trade-offs: "Row-per-date is simple but uses more storage. For 500K properties, that's billions of rows, but storage is cheap and the queries are straightforward, so we don't need range-based storage."
The Booking System Bar: Complete the full system (Diagram 6) and proactively dive into 2 deep dives: locking mechanics (Deep Dive 1) and inventory modeling (Deep Dive 2).

Staff

Depth over Breadth (40/60): The interviewer assumes you know the basics. Spend ~15 minutes getting to the complete system, then go deep on interesting scaling problems.
Experience-Backed Decisions: You've built or operated similar systems. You know when to shard inventory and can discuss real-world contention scenarios.
Full Proactivity: You drive the entire conversation. You suggest alternatives: "We could use Redis for locks, but database-level locking is simpler and sufficient here."
The Booking System Bar: Address all deep dives without prompting: locking, inventory modeling, expiry handling, cancellations, and high-demand sharding. Convey that given enough time, you could actually build this.
