Question Body
I am working on an embedded Linux project (i.MX8 platform) where I need to share raw camera video across multiple processes using a producer/consumer architecture.
Architecture
I have:
Producer process
Captures the camera using v4l2src
Sends raw NV12 1080p30 frames to shared memory using shmsink
Multiple consumer processes
One records to file
One performs AI inference
One (or more) provides RTSP streaming
All consume raw frames via shmsrc
Example producer pipeline:
gst-launch-1.0 \
v4l2src device=/dev/video3 io-mode=dmabuf ! \
video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! \
queue ! \
shmsink socket-path=/tmp/cam.sock wait-for-connection=false sync=false
Example consumer (RTSP branch):
shmsrc socket-path=/tmp/cam.sock is-live=true do-timestamp=true ! \
video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! \
videoscale ! videorate ! \
v4l2h264enc ! h264parse ! rtph264pay
Problem
CPU usage is extremely high.
The producer alone consumes ~100% of one core
Producer + 3 RTSP branches consume ~360% of the 400% total (quad-core system)
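For reference, a per-process breakdown (sketch; assumes the sysstat package provides pidstat, and the pgrep pattern is illustrative) shows where the cycles go:

```shell
# Sample per-process CPU usage once per second, five times, for every
# gst-launch process (adjust the pattern to the actual process names).
pidstat -u -p "$(pgrep -d, -f gst-launch)" 1 5
```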
This is unexpected because:
Encoding is hardware accelerated (v4l2h264enc)
Capture uses io-mode=dmabuf
No software decoding is involved
However, sharing raw frames across processes appears very expensive.
Observations
Each branch reads raw 1080p frames from shared memory.
Each branch performs scaling and framerate conversion independently.
Shared memory causes one memory copy per branch.
Total memory bandwidth becomes very high:
1920 × 1080 × 1.5 bytes ≈ 3 MB per NV12 frame
3 MB × 30 fps ≈ 90 MB/s per branch, multiplied across branches
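The arithmetic can be checked directly with shell arithmetic (the branch count of 3 matches the three RTSP branches above):

```shell
# NV12 is 12 bits/pixel, i.e. 1.5 bytes/pixel.
FRAME_BYTES=$((1920 * 1080 * 3 / 2))   # bytes per 1080p NV12 frame
PER_BRANCH=$((FRAME_BYTES * 30))       # bytes/s per consumer at 30 fps
BRANCHES=3                             # e.g. the three RTSP branches
TOTAL=$((PER_BRANCH * BRANCHES))
echo "frame:      ${FRAME_BYTES} B (~3.1 MB)"
echo "per branch: ${PER_BRANCH} B/s (~93 MB/s)"
echo "total:      ${TOTAL} B/s (~280 MB/s)"
```

And that total counts only the reads; the shmsink write and any intermediate copies add to it.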
Using tee inside a single process reduces CPU usage significantly, but:
Static tee does not meet my requirements.
I need dynamic branch creation and removal.
I need true inter-process separation (producer/consumer model).
appsrc introduces heavy copying and does not share buffers across processes.
I want to strictly avoid re-encoding and re-decoding between processes.
UDP/RTP transport is not preferred because it requires encoding.
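For completeness, here is an untested sketch of the "encode once, share the compressed stream" variant I am asking about below (the /tmp/cam264.sock path and the config-interval choice are mine, not from a working setup; shmsrc caps must be set by hand because shm does not carry caps):

```shell
# Hypothetical encode-once producer: hardware-encode a single time,
# then fan the compressed stream out over shared memory.
gst-launch-1.0 \
  v4l2src device=/dev/video3 io-mode=dmabuf ! \
  video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! \
  v4l2h264enc ! h264parse config-interval=-1 ! \
  shmsink socket-path=/tmp/cam264.sock wait-for-connection=false sync=false

# Hypothetical RTSP consumer: no re-encode, only parse + payload.
gst-launch-1.0 \
  shmsrc socket-path=/tmp/cam264.sock is-live=true do-timestamp=true ! \
  video/x-h264,stream-format=byte-stream,alignment=au ! \
  h264parse ! rtph264pay
```

This would cut the shared bandwidth from raw frames to an H.264 elementary stream, but the recording and AI branches would then receive compressed rather than raw video.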
Question
What is the best architecture to:
Share raw camera video across multiple processes
Avoid excessive CPU usage
Avoid repeated encoding/decoding
Allow dynamic branch creation
Maintain process isolation
Specifically:
Is shmsink/shmsrc inherently copy-heavy and memory-bandwidth limited?
Is there a way to share DMABUF across processes without copying?
Would encoding once and sharing compressed stream be the only scalable solution?
Are there NXP/i.MX-specific mechanisms (DMA-BUF export, V4L2 memory sharing, imx plugins) better suited for this?
Is there a recommended design pattern for this use case on embedded systems?
Goal
My goal is to:
Avoid multiple encode/decode cycles
Avoid unnecessary copies
Keep CPU usage minimal
Support dynamic consumer processes
Any architectural guidance or NXP-specific recommendations would be highly appreciated.