How to Build Long-Running Tasks with BullMQ (Queue + Worker)

Background

Some tasks should not run inside an HTTP request:

Email delivery (verification, tracking links)
Reconciliation / scheduled sweeps (fallback when webhooks are missed)
Batch processing (imports, exports, backfills)

The common failure mode is predictable: the request path gets slow, timeouts increase, and downstream providers get spammed or rate-limited.

A reliable approach is a Queue + Worker model (BullMQ on top of Redis): enqueue work fast, return the HTTP response, and let background workers process jobs with retries/backoff, concurrency limits, and lifecycle events.

Goals

Reduce HTTP latency: return quickly while heavy work runs asynchronously.
Reliability: automatic retries/backoff; job lifecycle decoupled from requests.
Rate-limit and concurrency control: prevent spam and protect downstream providers (e.g. email API, payment gateway).
Operational visibility: logs/metrics for job states and failures.

Building blocks (code map)

Background entrypoint: a NestJS module that imports queue modules (e.g. EmailQueueModule, PaymentsReconcileQueueModule)
Queue names / job names: central constants so producers/consumers share the same identifiers (e.g. QueueName.email, QueueName.paymentsReconcile)
Job payload types: explicit interfaces per job so you can evolve payloads safely (e.g. IVerifyEmailJob, IOrderTrackingLinkJob)
Redis config: a single BullMQ Redis connection shared by producers, workers, and schedulers
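
A minimal sketch of what the shared constants and payload types could look like. The queue and job names come from this document; the payload fields and the `buildEmailJob` helper are hypothetical, and in a real codebase these declarations would be exported from a shared module.

```typescript
// Shared identifiers so producers and consumers never drift apart.
const QueueName = {
  email: 'email',
  paymentsReconcile: 'payments-reconcile',
} as const;

// Job names handled by the email queue's processor.
const EmailJobName = {
  emailVerification: 'email-verification',
  orderTrackingLink: 'order-tracking-link',
} as const;

// Payload fields below are hypothetical; the real interfaces may differ.
interface IVerifyEmailJob {
  userId: string;
  email: string;
  token: string;
}

interface IOrderTrackingLinkJob {
  orderId: string;
  email: string;
  trackingUrl: string;
}

// Maps each job name to its payload type so enqueue helpers can be type-checked.
interface EmailJobPayloads {
  'email-verification': IVerifyEmailJob;
  'order-tracking-link': IOrderTrackingLinkJob;
}

// Hypothetical helper: the compiler rejects a payload that does not match the job name.
function buildEmailJob<N extends keyof EmailJobPayloads>(
  name: N,
  data: EmailJobPayloads[N],
): { name: N; data: EmailJobPayloads[N] } {
  return { name, data };
}

const verificationJob = buildEmailJob(EmailJobName.emailVerification, {
  userId: 'u1',
  email: 'user@example.com',
  token: 'abc123',
});
```

Keeping the name-to-payload map in one place means that adding a field to a payload produces compile errors at every producer that has not been updated.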

Producer / Consumer model

Producer (enqueue job)

In feature service layers, inject the queue via @InjectQueue(QueueName.…).
Enqueue with queue.add(jobName, data, opts).
HTTP flows do not wait for job completion.
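
The producer side can be sketched as follows. To keep the example self-contained it uses a small structural interface (`QueueLike`) in place of BullMQ's `Queue`; the service name, method name, and payload shape are assumptions for illustration, not the project's actual code.

```typescript
// Minimal structural view of the one queue method used here;
// BullMQ's Queue exposes add(name, data, opts) with a compatible shape.
interface JobOptionsLike {
  attempts?: number;
  backoff?: { type: 'fixed' | 'exponential'; delay: number };
}

interface QueueLike<T> {
  add(name: string, data: T, opts?: JobOptionsLike): Promise<unknown>;
}

// Hypothetical payload shape for the verification email job.
interface VerifyEmailPayload {
  userId: string;
  email: string;
}

// In NestJS the queue would typically be injected with
// @InjectQueue(QueueName.email) in the constructor.
class RegistrationService {
  private readonly emailQueue: QueueLike<VerifyEmailPayload>;

  constructor(emailQueue: QueueLike<VerifyEmailPayload>) {
    this.emailQueue = emailQueue;
  }

  async register(userId: string, email: string): Promise<{ userId: string }> {
    // ... persist the user here (omitted) ...

    // Await only the enqueue (a quick Redis write), never the job's completion.
    await this.emailQueue.add(
      'email-verification',
      { userId, email },
      { attempts: 3, backoff: { type: 'exponential', delay: 60_000 } },
    );

    // The HTTP handler can respond now; a worker sends the email later.
    return { userId };
  }
}
```

Because the service depends only on the `add` method, it can be unit-tested with an in-memory fake instead of a live Redis instance.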

Consumer/Worker (process job)

Each queue has a @Processor(QueueName.…) that implements WorkerHost.process(job).
Workers run in the same process as the NestJS API (there is no separate worker service).

If jobs become CPU-heavy or very high volume, running workers in-process can degrade API throughput. Consider splitting workers into a dedicated process/service when you hit that threshold.

Queue: `email` (async email delivery)

Registration: apps/api/src/background/queues/email-queue/email-queue.module.ts
- Caps the BullMQ events stream length (streams.events.maxLen: 1000)
Processor: apps/api/src/background/queues/email-queue/email.processor.ts
- concurrency: 1 (sequential delivery)
- limiter: { max: 1, duration: 150 } (rate limit ~ 1 job / 150ms)
- removeOnComplete: garbage-collect completed jobs by age + count to keep Redis bounded
- Dispatch by job.name:
  - email-verification → EmailQueueService.sendEmailVerification(...)
  - order-tracking-link → EmailQueueService.sendOrderTrackingLink(...)
Queue events listener: apps/api/src/background/queues/email-queue/email-queue.events.ts
- Logs lifecycle events (added/waiting/active/completed/failed)
Business email send: apps/api/src/background/queues/email-queue/email-queue.service.ts
- Calls MailService (Brevo integration)
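
The processor settings and the dispatch-by-name step above can be sketched like this. The `removeOnComplete` thresholds are placeholders (the actual values are not stated here), and `JobLike` and `EmailSender` are hypothetical stand-ins for the BullMQ job and `EmailQueueService`.

```typescript
// Worker settings described above, in the shape BullMQ worker options use.
const emailWorkerOptions = {
  concurrency: 1,
  limiter: { max: 1, duration: 150 }, // at most 1 job per 150 ms
  removeOnComplete: { age: 3600, count: 1000 }, // assumed values; age is in seconds
};

// Upper bound on throughput implied by the limiter.
function maxJobsPerMinute(limiter: { max: number; duration: number }): number {
  return Math.floor((60_000 / limiter.duration) * limiter.max);
}

const throughputCeiling = maxJobsPerMinute(emailWorkerOptions.limiter); // 400

// Minimal structural stand-ins for the job and the email service.
interface JobLike {
  name: string;
  data: unknown;
}

interface EmailSender {
  sendEmailVerification(data: unknown): Promise<void>;
  sendOrderTrackingLink(data: unknown): Promise<void>;
}

// Body of a process(job) method: route by job.name and fail loudly on unknown
// names, so a mistyped job name shows up as a failed job instead of a silent no-op.
async function processEmailJob(job: JobLike, service: EmailSender): Promise<void> {
  switch (job.name) {
    case 'email-verification':
      return service.sendEmailVerification(job.data);
    case 'order-tracking-link':
      return service.sendOrderTrackingLink(job.data);
    default:
      throw new Error(`Unknown email job name: ${job.name}`);
  }
}
```

With these limiter values the queue drains at no more than 400 emails per minute, which is worth comparing against the email provider's own rate limit.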

Queue: `payments-reconcile` (scheduled reconciliation / long-running batch)

Registration: a queue module that registers a reconcile queue + processor
Scheduler (cron):
- On module init, if PAYMENTS_RECONCILE_ENABLED=true:
  - upsertJobScheduler(..., { pattern: '0 * * * *' }, { name: 'payments-sweep' }) (hourly, at minute 0)
- Goal: a fallback when webhooks are missed (or delayed) by reconciling pending orders.
Processor:
- concurrency: 1
- Batch:
  - WINDOW: last 24 hours
  - LIMIT: 200 orders / run
- For each order: PaymentsService.reconcileProviderOrder(orderId) + logs outcomes
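
A rough sketch of the scheduler registration and the batch loop described above. The scheduler id value, the `SweepDeps` ports, and the choice to catch errors per order are assumptions; only the cron pattern, job name, window, and limit come from this document.

```typescript
// Structural view of the BullMQ Queue method used for cron-style scheduling.
interface SchedulerQueueLike {
  upsertJobScheduler(
    schedulerId: string,
    repeat: { pattern: string },
    template: { name: string },
  ): Promise<unknown>;
}

// Hypothetical id; keeping it stable lets repeated calls update one scheduler.
const SCHEDULER_ID = 'payments-reconcile-hourly';

async function registerSweep(queue: SchedulerQueueLike, enabled: boolean): Promise<boolean> {
  if (!enabled) return false; // mirrors the PAYMENTS_RECONCILE_ENABLED=true gate
  await queue.upsertJobScheduler(SCHEDULER_ID, { pattern: '0 * * * *' }, { name: 'payments-sweep' });
  return true;
}

const WINDOW_MS = 24 * 60 * 60 * 1000; // last 24 hours
const LIMIT = 200; // orders per run

// Hypothetical ports for data access, reconciliation, and logging.
interface SweepDeps {
  findPendingOrderIds(since: Date, limit: number): Promise<string[]>;
  reconcileProviderOrder(orderId: string): Promise<void>;
  log(message: string): void;
}

async function runSweep(deps: SweepDeps, now: Date): Promise<{ ok: number; failed: number }> {
  const since = new Date(now.getTime() - WINDOW_MS);
  const orderIds = await deps.findPendingOrderIds(since, LIMIT);
  let ok = 0;
  let failed = 0;
  for (const orderId of orderIds) {
    try {
      await deps.reconcileProviderOrder(orderId);
      ok += 1;
    } catch (err) {
      // One bad order should not abort the rest of the batch.
      failed += 1;
      deps.log(`reconcile failed for ${orderId}: ${String(err)}`);
    }
  }
  deps.log(`sweep done: ok=${ok} failed=${failed}`);
  return { ok, failed };
}
```

Processing orders one at a time matches the concurrency of 1 above and keeps load on the payment provider predictable; the per-run limit bounds how long a single sweep can take.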

Retry / backoff policy (examples)

Email jobs are enqueued in:
- apps/api/src/api/auth/auth.service.ts (register → enqueue email-verification)
- apps/api/src/api/payments/payments.service.ts (payment success → enqueue order-tracking-link)
Common pattern:
- attempts: 3
- backoff: { type: 'exponential', delay: 60000 }
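
To make the retry timing concrete, the sketch below computes the wait before each retry. It assumes the built-in exponential strategy waits `delay * 2^(attemptsMade - 1)` with no jitter configured; `retryDelaysMs` is an illustrative helper, not part of BullMQ.

```typescript
// Job options matching the common pattern above.
const emailJobOptions = {
  attempts: 3,
  backoff: { type: 'exponential' as const, delay: 60_000 },
};

// Delay before each retry, assuming delay * 2^(attemptsMade - 1) and no jitter.
function retryDelaysMs(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  // `attempts` includes the first try, so there are attempts - 1 retries.
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    delays.push(baseDelayMs * 2 ** (attemptsMade - 1));
  }
  return delays;
}

const retryDelays = retryDelaysMs(emailJobOptions.attempts, emailJobOptions.backoff.delay);
// retryDelays is [60000, 120000]: first retry after 1 minute, second after 2 more minutes.
```

Under that assumption, a job that keeps failing is marked failed roughly three minutes after its first attempt, plus processing time.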

Observability

Worker events (processor-level): log active/progress/completed/failed/stalled/error
Queue events (queue-level stream): log added/waiting/active/completed/failed
Overall monitoring stack: Grafana Cloud + Loki/Mimir (see Monitoring Stack section)
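
As an illustration of processor-level event logging, the sketch below attaches one listener per worker event. `EmitterLike`, `attachWorkerLogging`, and the log line format are assumptions; a BullMQ `Worker` exposes `on(event, listener)` and could be passed where `EmitterLike` is expected.

```typescript
// Structural view of an event emitter such as a BullMQ Worker.
interface EmitterLike {
  on(event: string, listener: (...args: unknown[]) => void): unknown;
}

type LogFn = (line: string) => void;

// Reduce an event argument to a short, log-friendly fragment.
function describeArg(arg: unknown): string {
  if (arg instanceof Error) return `error=${arg.message}`;
  if (typeof arg === 'string') return `jobId=${arg}`; // e.g. stalled passes a job id
  if (typeof arg === 'object' && arg !== null && 'id' in arg) {
    return `jobId=${String((arg as { id: unknown }).id)}`;
  }
  return '';
}

// Attach one listener per lifecycle event named in the list above.
function attachWorkerLogging(worker: EmitterLike, queueName: string, log: LogFn): void {
  const events = ['active', 'progress', 'completed', 'failed', 'stalled', 'error'];
  for (const event of events) {
    worker.on(event, (...args: unknown[]) => {
      const parts = [`[${queueName}]`, event, describeArg(args[0])];
      // failed events carry the error as the second argument.
      if (args[1] instanceof Error) parts.push(describeArg(args[1]));
      log(parts.filter((p) => p !== '').join(' '));
    });
  }
}
```

Emitting one structured line per event keeps the logs easy to query in Loki, for example by filtering on the queue name and the `failed` event.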

Execution flows (sequence diagrams)

A) Sign-up → enqueue email verification


B) Payment confirmed → enqueue order tracking email


C) Hourly payment reconcile sweep (missed webhook fallback)


Scaling & pitfalls (important)

In-process workers: because processors share the API process, heavy jobs or slow I/O can impact API throughput. If background volume grows, consider moving workers to a separate service/process to scale independently.
Scheduler & multi-replica: the scheduler is registered on module init, so every API replica sharing the same Redis runs the registration. Calling upsertJobScheduler with a stable SCHEDULER_ID updates the existing scheduler instead of creating a second one, which reduces the risk of “double scheduling”, but validate this in HA setups.
DLQ/alerts: if failures surface mainly through logs, the next maturity step is dashboards and alerts for failed jobs (e.g. Bull Board, plus alert rules on Loki/Mimir/Sentry).
