ClickHouse event analytics database design
10:23 08 Apr 2026

I’m designing a high-volume analytics system and I’m trying to choose the data model before the dataset becomes too large to change easily.

Current setup:

  • Custom app events are stored in ClickHouse

  • Device and user records are stored in MongoDB

  • MongoDB is where current device/user dimensions live, such as:

    • os_name

    • os_version

    • device_model

    • app_version

    • current user_uuid

  • ClickHouse is the event store

Important concepts:

  • device_uuid

  • user_uuid

  • client event timestamp

  • server-side received_at

Important product behavior:

  • A device can start as anonymous

    For example, at first user_uuid = device_uuid

  • Later the same device can become a real user

    For example, user_uuid = John

  • When that happens, past anonymous events should also be attributed to the real user

Because of that, storing user_uuid directly in the huge ClickHouse events table creates expensive update/mutation problems.

So one design I’m considering is:

events table in ClickHouse, immutable:

project_id 
device_uuid 
timestamp 
received_at 
event_code 
session_id 
properties

Separate ownership table in ClickHouse:

project_id 
device_uuid 
valid_from 
valid_to 
user_uuid

Then user resolution is done by joining events to ownership intervals using device_uuid + received_at.

Short example:

  • 10:00 device D1 is anonymous, so owner is D1

  • 10:05 and 10:06 events are inserted for device D1

  • 10:10 user logs in, now owner becomes John

  • Those past anonymous events should also appear under John

My concern is not ingestion. My concern is analytics later.

For example, I may need queries like:

  • unique users who did app_open -> permission_granted

  • windowFunnel by user

  • retention / cohort analysis by user

  • “show events from iOS 18.6 devices for users who later converted”

  • segment by event properties + resolved user + device/app dimensions

  • high-volume dashboards with many users and many events

Questions:

  1. Is keeping user_uuid out of the main events table still the right approach at large scale?

  2. Is a separate ownership table in ClickHouse a practical production-ready model for this?

  3. If device/app/os dimensions stay in MongoDB while events stay in ClickHouse, is that a bad sign for analytics?
    For example, would queries like “events from iOS 18.6 devices for converted users” become awkward because they depend on both MongoDB dimensions and ClickHouse events?

Notes:

  • I want to choose the most optimized approach in terms of RAM, CPU, and disk usage.

  • I also considered storing `user_uuid` directly in the `events` table and using ClickHouse lightweight updates when updates are needed, but since that feature is still experimental, it feels risky.

  • The strange part is that Claude, ChatGPT, and Gemini all suggested the separate ownership table approach above, but I couldn’t find any article or documentation describing a similar architecture specifically for ClickHouse.

database mongodb clickhouse