How to share a large CustomClass object with workers in Python multiprocessing on Windows (spawn)?
07:09 30 Nov 2025

I'm trying to run calculations on multiple cores in Python across multiple platforms (Linux, macOS, Windows). I need to pass a large CustomClass object and a dict (both read-only) to all workers. So far I have used multiprocessing with Pool. On platforms that use fork (Linux, macOS) this gives a clear speedup. On Windows (spawn) multiprocessing is much slower and memory-inefficient: the worker processes are spawned sequentially, and starting them takes much longer than the actual computation.

import multiprocessing as mp
import numpy as np
from tqdm import tqdm

class CustomClass:
  def __init__(self, data):
    ...  # assigns multiple huge vectors and objects (~1GB in total)


class ParallelComputation:
  ...
  def compute(self, n_jobs=-1):
    custom_obj = CustomClass(data)  # ~1GB of memory
    cfg = dict(some_config=True)
    inputs = [(0, np.random.rand(1, 300)), (1, np.random.rand(1, 300)), ...]  # list of (index, row) tuples
    n_rows = len(inputs)
    out = np.empty((n_rows, 300))
    procs = mp.cpu_count() if n_jobs == -1 else n_jobs

    with mp.Pool(processes=procs, initializer=_set_globals, initargs=(custom_obj, cfg)) as pool:
      for result in tqdm(pool.imap(worker_task, inputs, chunksize=8), total=n_rows):
        i, row = result
        out[i, :] = row
    ...


def worker_task(args):
  # runs inside a worker process and reads the globals set by the initializer
  lib = _G_LIB
  cfg = _G_CFG
  i, row = args
  # some computation using lib and cfg
  return i, row.ravel()


def _set_globals(custom_obj, cfg):
  # Pool initializer: runs once per worker process; with spawn the objects are
  # pickled and sent to every worker
  global _G_LIB, _G_CFG
  _G_LIB = custom_obj
  _G_CFG = cfg

When running the computation I also have to guard the entry point on Windows:

if __name__ == "__main__":
  parallel = ParallelComputation(...)
  parallel.compute()

I also tried joblib, which at least spawns the worker processes in parallel, but it also consumes a lot of memory (a simplified sketch of that attempt is below). I never modify custom_obj or cfg, so SharedMemory or a Manager do not seem suitable. How can I use multiprocessing with a large CustomClass object efficiently on Windows?
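For reference, the joblib attempt looked roughly like this (simplified; joblib_worker is a stand-in for my real worker function, and the large objects are passed as arguments to every call):

from joblib import Parallel, delayed

def joblib_worker(custom_obj, cfg, args):
  # custom_obj (~1GB) and cfg are pickled and sent to the worker processes,
  # so memory usage grows with the number of workers
  i, row = args
  # some computation using custom_obj and cfg
  return i, row.ravel()

results = Parallel(n_jobs=-1)(delayed(joblib_worker)(custom_obj, cfg, args) for args in inputs)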

python parallel-processing multiprocessing python-multiprocessing joblib