VuThanhThien®
LinkedIn

© 2026 Vũ Thanh Thiên. All rights reserved.

GitHubLinkedInFacebookEmail
Back to Blog
May 9, 2026
5 min read

How to Build Long-Running Tasks with BullMQ (Queue + Worker)

BackendNestJSBullMQRedisQueuesSystem DesignReliability

Background

Some tasks should not run inside an HTTP request:

  • Email delivery (verification, tracking links)
  • Reconciliation / scheduled sweeps (fallback when webhooks are missed)
  • Batch processing (imports, exports, backfills)

The common failure mode is predictable: the request path gets slow, timeouts increase, and downstream providers get spammed or rate-limited.

A reliable approach is a Queue + Worker model (BullMQ on top of Redis): enqueue work fast, return the HTTP response, and let background workers process jobs with retries/backoff, concurrency limits, and lifecycle events.


Goals

  • Reduce HTTP latency: return quickly while heavy work runs asynchronously.
  • Reliability: automatic retries/backoff; job lifecycle decoupled from requests.
  • Rate-limit and concurrency control: prevent spam and protect downstream providers (e.g. email API, payment gateway).
  • Operational visibility: logs/metrics for job states and failures.

Building blocks (code map)

  • Background entrypoint: a NestJS module that imports queue modules (e.g. EmailQueueModule, PaymentsReconcileQueueModule)
  • Queue names / job names: central constants so producers/consumers share the same identifiers (e.g. QueueName.email, QueueName.paymentsReconcile)
  • Job payload types: explicit interfaces per job so you can evolve payloads safely (e.g. IVerifyEmailJob, IOrderTrackingLinkJob)
  • Redis config: a single BullMQ Redis connection shared by producers, workers, and schedulers

Producer / Consumer model

Producer (enqueue job)

  • In feature service layers, inject the queue via @InjectQueue(QueueName.…).
  • Enqueue with queue.add(jobName, data, opts).
  • HTTP flows do not wait for job completion.

Consumer/Worker (process job)

  • Each queue has a @Processor(QueueName.…) that implements WorkerHost.process(job).
  • Workers run in the same process as the NestJS API (there is no separate worker service).

If jobs become CPU-heavy or very high volume, running workers in-process can degrade API throughput. Consider splitting workers into a dedicated process/service when you hit that threshold.


Queue: email (async email delivery)

  • Registration: apps/api/src/background/queues/email-queue/email-queue.module.ts
    • Configures BullMQ streams events (maxLen: 1000)
  • Processor: apps/api/src/background/queues/email-queue/email.processor.ts
    • concurrency: 1 (sequential delivery)
    • limiter: { max: 1, duration: 150 } (rate limit ~ 1 job / 150ms)
    • removeOnComplete: garbage-collect completed jobs by age + count to keep Redis bounded
    • Dispatch by job.name:
      • email-verification → EmailQueueService.sendEmailVerification(...)
      • order-tracking-link → EmailQueueService.sendOrderTrackingLink(...)
  • Queue events listener: apps/api/src/background/queues/email-queue/email-queue.events.ts
    • Logs lifecycle events (added/waiting/active/completed/failed)
  • Business email send: apps/api/src/background/queues/email-queue/email-queue.service.ts
    • Calls MailService (Brevo integration)

Queue: payments-reconcile (scheduled reconciliation / long-running batch)

  • Registration: a queue module that registers a reconcile queue + processor
  • Scheduler (cron):
    • On module init, if PAYMENTS_RECONCILE_ENABLED=true:
      • upsertJobScheduler(..., { pattern: '0 * * * *' }, { name: payments-sweep })
    • Goal: a fallback when webhooks are missed (or delayed) by reconciling pending orders.
  • Processor:
    • concurrency: 1
    • Batch:
      • WINDOW: last 24 hours
      • LIMIT: 200 orders / run
    • For each order: PaymentsService.reconcileProviderOrder(orderId) + logs outcomes

Retry / backoff policy (examples)

  • Email jobs are enqueued in:
    • apps/api/src/api/auth/auth.service.ts (register → enqueue email-verification)
    • apps/api/src/api/payments/payments.service.ts (payment success → enqueue order-tracking-link)
  • Common pattern:
    • attempts: 3
    • backoff: { type: 'exponential', delay: 60000 }

Observability

  • Worker events (processor-level): log active/progress/completed/failed/stalled/error
  • Queue events (queue-level stream): log added/waiting/active/completed/failed
  • Overall monitoring stack: Grafana Cloud + Loki/Mimir (see Monitoring Stack section)

Execution flows (sequence diagrams)

A) Sign-up → enqueue email verification

Rendering diagram…

B) Payment confirmed → enqueue order tracking email

Rendering diagram…

C) Hourly payment reconcile sweep (missed webhook fallback)

Rendering diagram…

Scaling & pitfalls (important)

  • In-process workers: because processors share the API process, heavy jobs or slow I/O can impact API throughput. If background volume grows, consider moving workers to a separate service/process to scale independently.
  • Scheduler & multi-replica: the scheduler runs on module init. With multiple API replicas sharing the same Redis, ensure you don’t accidentally “double schedule”. Using upsertJobScheduler with a stable SCHEDULER_ID reduces risk, but validate in HA setups.
  • DLQ/alerts: if failures are mostly log-driven, maturity improvements include dashboards/alerts for failed jobs (Bull Board + alert rules on Loki/Mimir/Sentry).

Written by Vũ Thanh Thiên