14 posts tagged with "Node.js"

dotenv silently truncates values at #? Wrap .env values in double quotes

· 4 min read

Encountered this while building AI Ops: LLM-powered analytics that surfaces market trends, user behavior, and sales data for precise operational strategy.

TL;DR

dotenv treats # in unquoted values as an inline comment. KEY=value#hash is actually loaded as value, with #hash dropped: no warning, no error. Fix: wrap any .env value containing #, spaces, or special characters in double quotes, for example KEY="value#hash".

Symptom

Backend calls to an upstream service keep returning 401 Invalid credentials:

POST /api/v1/dag/trigger → 500
Stack: Airflow JWT auth failed (401): {"detail":"Invalid credentials"}
at getJwtToken (airflow-client.ts)

Investigation shows the password written in .env is 24 chars and contains # and &:

AIRFLOW_PASSWORD=ooGR0^kThVI&ag#RyCpUmbIr

But the value loaded into process.env.AIRFLOW_PASSWORD is only 14 chars long: the trailing 10 characters, #RyCpUmbIr, are gone. Calling the upstream auth endpoint with the full password from CLAUDE.md returns 201; calling it with the truncated value from .env returns 401. The credentials are fine; the value loaded from .env is truncated.

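Root Cause

dotenv treats the first # in an unquoted value as the start of an inline comment, even in the middle of a token. That is stricter than a POSIX shell, where # only starts a comment at the beginning of a word.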

# .env
AIRFLOW_PASSWORD=ooGR0^kThVI&ag#RyCpUmbIr
# dotenv actually parses:
# AIRFLOW_PASSWORD = "ooGR0^kThVI&ag"
# #RyCpUmbIr ← dropped

This behavior is documented, but there is no warning or log. What the runtime gets is a silently truncated string. Combined with shell-escaping semantics for &, spaces, and $, the bug is even more hidden:

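| Character | Behavior when unquoted |
| --- | --- |
| # | Everything from the # onward is treated as an inline comment and dropped |
| (space) | Leading and trailing whitespace is trimmed; inner spaces survive in dotenv but break any shell that sources the file |
| $VAR | Left as-is by plain dotenv; expanded (possibly to an empty string) when dotenv-expand or a similar expansion step is enabled |
| & | Shell background operator; dotenv usually preserves it but it bites again when joined into shell commands |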

Randomly generated secrets such as JWT_SECRET and API_KEY, and connection strings like DATABASE_URL that embed a password, frequently contain #, which makes them high-risk.
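To make the rule concrete, here is a deliberately simplified stand-in for the quoted/unquoted distinction. It is not dotenv's real parser: `parseEnvLine` is a hypothetical helper that only mirrors the behavior described above, so treat it as an illustration rather than a reference.

```javascript
// Simplified illustration of the inline-comment rule; NOT dotenv's actual parser.
// parseEnvLine is a hypothetical helper written for this example.
function parseEnvLine(line) {
  const eq = line.indexOf('=');
  if (eq === -1) return null;
  const key = line.slice(0, eq).trim();
  const raw = line.slice(eq + 1).trim();

  // Quoted value: keep everything between the matching quotes verbatim.
  const quoted = raw.match(/^(["'])(.*?)\1/);
  if (quoted) return { key, value: quoted[2] };

  // Unquoted value: the first # starts an inline comment.
  const hash = raw.indexOf('#');
  const value = (hash === -1 ? raw : raw.slice(0, hash)).trim();
  return { key, value };
}

const unquoted = parseEnvLine('AIRFLOW_PASSWORD=ooGR0^kThVI&ag#RyCpUmbIr');
const quotedLine = parseEnvLine('AIRFLOW_PASSWORD="ooGR0^kThVI&ag#RyCpUmbIr"');

console.log(unquoted.value.length, quotedLine.value.length); // 14 24
```

The unquoted line loses its last 10 characters, while the quoted line keeps all 24.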

Solution

Wrap any value with special characters in double quotes in .env:

# .env
AIRFLOW_PASSWORD="ooGR0^kThVI&ag#RyCpUmbIr"
JWT_SECRET="abc#def$ghi jkl"
DATABASE_URL="postgres://user:p@ss#word@host:5432/db"

Restart the service to apply:

# pm2
pm2 restart analytics-api --update-env

# docker compose
docker compose restart api

# systemd
sudo systemctl restart api

Why it works: when dotenv sees double quotes, it reads the value literally up to the closing quote. #, spaces, and $ are not special-cased (unless you explicitly enable expand). Verify the loaded value immediately after the fix:

// Validate critical env vars at startup to catch truncation early
const required = ['AIRFLOW_PASSWORD', 'JWT_SECRET', 'DATABASE_URL'] as const;
for (const key of required) {
  const v = process.env[key];
  if (!v || v.length < 16) {
    throw new Error(`${key} not loaded correctly (length ${v?.length ?? 0}); check .env quoting`);
  }
}

This turns dotenv's silent failure into a startup failure, exposing the bug immediately the next time.
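A complementary check is to lint the .env text itself before it is ever loaded. The sketch below is a heuristic, not part of dotenv: `findRiskyEnvLines` is a hypothetical helper that flags unquoted values where a # directly follows a non-space character (or starts the value), on the assumption that an intentional inline comment is usually preceded by whitespace.

```javascript
// Hypothetical pre-flight lint: flag unquoted .env values that contain a
// '#' glued to the value, which is the pattern that causes silent truncation.
function findRiskyEnvLines(envText) {
  const risky = [];
  envText.split(/\r?\n/).forEach((line, index) => {
    const trimmed = line.trim();
    if (trimmed === '' || trimmed.startsWith('#')) return; // blank or full-line comment
    const eq = trimmed.indexOf('=');
    if (eq === -1) return;
    const key = trimmed.slice(0, eq).trim();
    const value = trimmed.slice(eq + 1).trim();
    const isQuoted = /^["'`]/.test(value);
    // Assumption: " # comment" (space before #) is deliberate; "abc#def" is not.
    if (!isQuoted && /(^|\S)#/.test(value)) {
      risky.push({ line: index + 1, key });
    }
  });
  return risky;
}

const sampleEnv = [
  '# credentials',
  'AIRFLOW_PASSWORD=ooGR0^kThVI&ag#RyCpUmbIr',
  'JWT_SECRET="abc#def"',
  'PORT=3000 # inline comment on purpose',
].join('\n');

console.log(findRiskyEnvLines(sampleEnv)); // [ { line: 2, key: 'AIRFLOW_PASSWORD' } ]
```

Running this in CI or a pre-commit hook catches the problem before deployment; the startup check above still covers values that slip through.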

Caveats

  • Single quotes also work, but dotenv does not expand $VAR inside single quotes; it does inside double quotes. For passwords you usually want literal values, so prefer double quotes and avoid writing ${...}.
  • dotenv versions: v15+ behaves as described; earlier versions (pre-v8) handle # slightly differently. Check the CHANGELOG before upgrading.
  • Docker / Kubernetes Secrets: variables injected via environment: don't go through dotenv and aren't affected. Only .env files and dotenv.config() paths are.
  • CI environments: GitHub Actions and GitLab CI inject secrets into the env context, also bypassing dotenv.

FAQ

Why does a password with # in .env get shorter?

dotenv treats everything after # as an inline comment by default and drops it. Unquoted KEY=value#hash is loaded as just value, with no error or log. Wrap the value in double quotes, as in KEY="value#hash", to preserve the full content.

How do I debug dotenv not working?

Three steps: first confirm dotenv.config() runs before all imports (ES Module imports are hoisted statically; see debugging silent JWT signature failures); then verify .env values have no unescaped # or spaces; finally print process.env.XXX length and characters at startup and diff them char-by-char against the .env source file.

CCLEE

Independent developer, 24 years in e-commerce, focused on grounding AI in real business scenarios.

Work with me

Node.js AsyncLocalStorage Returns undefined in a Callback? EventEmitter Escapes Its Context

· 5 min read

The request-logging middleware reads AsyncLocalStorage's traceId inside the res.on('finish') callback, and getStore() returns undefined, so every response log ends up missing its traceId.

Encountered this while building the ecommerce data collection tool for a client: the server uses ALS to carry each request's traceId through the entire handling chain, but response logs stubbornly refused to correlate. The culprit turned out to be "late callbacks" losing the context.

TL;DR

EventEmitter callbacks like res.on('finish') run outside the async context they were registered in, so als.getStore() can't find the request's store. The most reliable fix is to capture the value into a closure variable during the synchronous segment and use that closure inside the callback; when you need the full store, rebuild the context with als.run(store, fn) inside the callback.

The Problem

An innocent-looking request-logging middleware:

// middleware/requestLog.js
import { als } from '../utils/als.js';

app.use((req, res, next) => {
  res.on('finish', () => {
    const store = als.getStore();
    logger.info({
      traceId: store?.traceId, // always undefined in the response log
      statusCode: res.statusCode,
    }, 'request');
  });
  next();
});

The middleware order is fine, the traceId is readable everywhere else in the request chain (routes, business logic), but not inside res.on('finish'). The really confusing part: move als.getStore() into the synchronous segment before next(), and it has a value.

Root Cause

AsyncLocalStorage relies on Node's async_hooks to bind the store to the currently active async context and propagate it down the async call chain. The semantics of als.run(store, fn) are: during fn's execution (and any async tasks it spawns), getStore() returns this store.

The problem is EventEmitter. res.on('finish', cb) registers cb as a listener, to be fired by EventEmitter's dispatch loop after the response is sent. The async context that fires cb is the one active where dispatch happens, not the request's context. And since the response is usually sent after the request-handling chain, the als.run scope for that request may already have exited.

So als.getStore() inside cb reads the store of "whatever context is active right now," which doesn't belong to this request. The result is undefined (or worse, a different request's store).

Any callback with "registered in one context, fired in another" has this trap: res.on('finish'), once, some setTimeout/setInterval, chrome.alarms listeners, and so on.
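The "registered in one context, fired in another" effect can be reproduced without an HTTP server. In this minimal sketch a plain EventEmitter stands in for `res`, and the names (`emitter`, `seen`, the `req-1` id) are made up for the example:

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');
const { EventEmitter } = require('node:events');

const als = new AsyncLocalStorage();
const emitter = new EventEmitter(); // stands in for `res`
const seen = [];

// Register the listener inside the request's als.run scope...
als.run({ traceId: 'req-1' }, () => {
  emitter.on('finish', () => {
    seen.push(als.getStore()?.traceId);
  });
});

// ...but fire it from outside that scope, as a late dispatch would.
emitter.emit('finish');

console.log(seen); // [ undefined ]
```

The listener runs in whatever context calls `emit`, so the store from the registering scope is not visible.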

Solution

Two patterns, depending on what you need.

Pattern A: capture the value in a closure

If your callback only needs a couple of values from the store (most often just traceId), the simplest and most robust approach is to capture them during the synchronous segment, when the store is guaranteed alive, into a closure, and use that closure in the callback without ever touching ALS:

app.use((req, res, next) => {
  // Synchronous segment: we're inside the als.run scope, getStore() always has a value
  const traceId = als.getStore()?.traceId;
  const start = Date.now();

  res.on('finish', () => {
    // Use the closure's traceId, never touch ALS
    logger.info({
      traceId, // reliably present
      statusCode: res.statusCode,
      durationMs: Date.now() - start,
    }, 'request');
  });

  next();
});

This swaps the uncertainty of "is the async context still alive" for a deterministic closure reference. When the callback fires no longer matters, because the value is already captured.

Pattern B: rebuild the context with als.run

When the callback invokes a blob of code that internally depends on getStore() (a logger mixin, Sentry scope injection), rewriting each call to use a closure is impractical. Instead, rebuild the context at the callback's entry:

app.use((req, res, next) => {
  // Synchronous segment: capture the value while the request's store is still reachable
  const capturedTraceId = als.getStore()?.traceId;
  const start = Date.now();

  res.on('finish', () => {
    if (capturedTraceId) {
      // Re-establish the ALS context inside the callback so record()'s internal getStore() works
      als.run({ traceId: capturedTraceId }, () => record(res, start));
    } else {
      record(res, start);
    }
  });

  next();
});

als.run(store, fn) creates a new, independent async context, binds the store to it, and makes it visible to fn and every async task it spawns. It's safer than als.enterWith, which mutates the "current shared context" and causes cross-talk under concurrency. That's a separate trap, covered in AsyncLocalStorage reads the wrong value under concurrency? Replace enterWith with run.
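Pattern B can be exercised end to end with a plain EventEmitter standing in for `res`. This is a sketch under that assumption; `record` here is a hypothetical function that reads ALS internally, like the logger mixin mentioned above.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');
const { EventEmitter } = require('node:events');

const als = new AsyncLocalStorage();
const res = new EventEmitter(); // stand-in for the HTTP response object
const records = [];

// Hypothetical existing code that depends on getStore() internally.
function record(statusCode) {
  records.push({ traceId: als.getStore()?.traceId, statusCode });
}

als.run({ traceId: 'req-42' }, () => {
  // Synchronous segment: the store is reachable here, so capture it.
  const store = als.getStore();

  res.on('finish', () => {
    // Rebuild the context at the callback's entry before calling into record().
    if (store) als.run(store, () => record(200));
    else record(200);
  });
});

res.emit('finish'); // fired outside the original als.run scope

console.log(records); // [ { traceId: 'req-42', statusCode: 200 } ]
```

Even though the event fires outside the original scope, `record` sees the request's store because the listener re-enters it explicitly.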

Caveats

  • To tell whether a callback will lose context, ask whether registration and firing are separated. res.on('finish'), once, and cross-tick setTimeout are suspect; await and fetch().then() inherit naturally along the async chain and need no handling.
  • Prefer pattern A. It reduces the problem to an ordinary closure: best readability, no implicit "context rebuild" behavior. Only reach for pattern B when the callback wraps a lot of existing code that depends on getStore().
  • Don't patch this with als.enterWith in the callback. It mutates the shared parent context under concurrency and causes cross-talk, a far harder bug to diagnose than a lost context.

FAQ

Why can't I read AsyncLocalStorage inside the res.on('finish') callback?

res.on('finish', cb) registers cb as an EventEmitter listener that only fires after the response is sent. The async context active when it fires is the dispatch context, not the request's, and the request's als.run scope may have already exited, so getStore() returns undefined.

How do I make an EventEmitter callback see the AsyncLocalStorage context again?

The simplest way is to capture the value into a closure variable during the synchronous segment and use the closure in the callback. If the callback wraps a lot of code that depends on getStore(), rebuild the context at the callback's entry with als.run(store, fn). Prefer the former; reserve the latter for retrofitting existing logic.

Node.js AsyncLocalStorage Reads the Wrong Value Under Concurrency? Replace enterWith with run

· 6 min read

A BullMQ worker with concurrency: 3 goes live, and the logs and Sentry reports of concurrent jobs are all crossed: job A's error stack lands under job B's traceId, and you spend ages staring at the wrong trace.

Encountered this while building the AI Analytics platform: intelligent analysis of market trends, user behavior, and sales data for ecommerce operations. The backend runs analysis jobs concurrently on BullMQ, each stamping a traceId via AsyncLocalStorage for log correlation, and the moment concurrency ramped up the traceIds started crossing.

TL;DR

als.enterWith(store) mutates the currently active shared parent context, so concurrent tasks overwrite each other when they interleave at an await: the last write wins, and every interleaved task reads the same wrong value. The fix is to switch to als.run(store, fn) and wrap the entire processor in it: it creates a fresh, independent context per call and restores the previous one on exit, so no amount of concurrency causes cross-talk.

The Problem

Each job stamps its own traceId into ALS on entry, and the processor (which contains awaits) reads that traceId for logging and Sentry reporting:

// worker.js: cross-talk version
new Worker('analytics', (job) => {
  als.enterWith({ traceId: job.data.executionId }); // stamp on entry
  return processJob(job); // internally: multiple awaits + logger.info({ traceId: als.getStore().traceId })
}, { concurrency: 3 });

It works in isolation, but turn on concurrency: 3 and the weirdness begins:

# job A (executionId: aaa) and job B (executionId: bbb) enter almost simultaneously
[worker] job A start traceId=aaa
[worker] job B start traceId=bbb
# A hits an await and yields; B calls enterWith({bbb}); when A resumes:
[worker] job A step2 traceId=bbb ← crossed into B
[worker] job A error traceId=bbb ← reported under B's trace in Sentry

This is not intermittent: it reproduces deterministically whenever there's concurrency, and the traceId always equals "the value of the most recent enterWith."

Root Cause

The key is that enterWith doesn't write to a "this-call-only" context; it writes to the currently active shared parent context.

AsyncLocalStorage contexts form a tree: one async context can be shared by multiple child tasks. The semantics of als.enterWith(store) are "write this store onto the context I'm currently in." When the worker runs with concurrency: 3, the three job processors share the same parent context (the worker loop's context), so:

  • job A calls enterWith({aaa}) → the shared context is written to aaa;
  • job A awaits and yields;
  • job B calls enterWith({bbb}) → the same shared context is overwritten to bbb;
  • job A resumes and reads getStore() → it gets bbb.

That's classic last-write-wins cross-talk. The more await points and the higher the concurrency, the more frequent the overwrites and the worse the corruption. Under concurrency: 1 it looks fine simply because there's no interleaving, which is exactly what makes it so dangerous: running one job at a time during development never surfaces it.

The Node docs are explicit about this: the store set by enterWith() persists for the rest of the current synchronous execution, which is why they recommend preferring run().
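The scoping difference can be seen in a few synchronous lines. This sketch does not reproduce the BullMQ interleaving; it only shows, with made-up helper names, that a store written by `enterWith` outlives the function that set it, while `run` restores the previous context on return.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

const als = new AsyncLocalStorage();

// Hypothetical helpers for the comparison.
function stampWithRun(traceId) {
  // Scoped: the store is visible only inside the callback.
  return als.run({ traceId }, () => als.getStore().traceId);
}

function stampWithEnterWith(traceId) {
  // Unscoped: writes onto the context the caller is currently in.
  als.enterWith({ traceId });
}

const insideRun = stampWithRun('aaa');
const afterRun = als.getStore(); // undefined: run() restored the previous context

stampWithEnterWith('aaa');
stampWithEnterWith('bbb');
const afterEnterWith = als.getStore().traceId; // 'bbb': the last write wins

console.log(insideRun, afterRun, afterEnterWith); // aaa undefined bbb
```

Two callers that share a context and both use `enterWith` end up reading whichever value was written last, which is the overwrite described above.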

Solution

Swap enterWith for run, and wrap the entire processor (not just one segment) with it:

// worker.js: isolated version
new Worker('analytics', (job) => {
  return als.run(
    { traceId: job.data.executionId },
    () => processJob(job), // the whole processor runs in its own context
  );
}, { concurrency: 3 });

The semantics of als.run(store, fn): create a brand-new, independent async context, bind the store to it, and make it visible to fn and every async task it spawns; when fn returns, the context restores to what it was before the call.

Because each run call establishes a fresh context scoped to that invocation, concurrent tasks are isolated by construction: job A's context always holds aaa, job B's always holds bbb, no matter how they interleave at await points.

The payoff is direct:

  • Per-call snapshot: the traceId is bound on job entry, and the entire handling chain (every await, sub-function, Sentry scope) reads this job's own value;
  • Auto-restore on exit: when the job ends the context resets, with no leakage into the next job or the worker's main loop;
  • Concurrency-safe: crank concurrency as high as you like, and behavior stays identical to single-threaded.
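The isolation can be checked without BullMQ. In this sketch two hypothetical jobs interleave at an await, and each reads its own traceId before and after yielding; `runJob`, `processJob`, and the delays are illustrative names and values, not part of any library.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

const als = new AsyncLocalStorage();
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical processor: reads the traceId before and after an await.
async function processJob(job) {
  const seen = [als.getStore().traceId];
  await sleep(job.delayMs); // yield so the other job can run in between
  seen.push(als.getStore().traceId);
  return seen;
}

// Wrap the whole processor in als.run, as in the worker above.
function runJob(job) {
  return als.run({ traceId: job.id }, () => processJob(job));
}

const results = Promise.all([
  runJob({ id: 'aaa', delayMs: 20 }),
  runJob({ id: 'bbb', delayMs: 5 }),
]);

results.then((r) => console.log(r)); // [ [ 'aaa', 'aaa' ], [ 'bbb', 'bbb' ] ]
```

Job B starts and finishes while job A is still waiting, yet job A still reads aaa when it resumes.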

If the processor is an extracted function (say processWorkflowJob, processAtomicJob), wrap it once at the Worker construction site; there is no need to touch the processor internals:

new Worker(queue, (job) => als.run({ traceId: job.data.id }, () => processWorkflowJob(job)), { concurrency });

Caveats

  • Whenever there's concurrency (worker concurrency > 1, concurrent HTTP requests, Promise.all batching), don't use enterWith. It's designed for "set once, single-threaded, sequential" use and will always cross-talk under concurrency.
  • run must wrap the entire processor, not just the synchronous entry. Otherwise the code after an await inside the processor falls back to the shared context and you've fixed nothing.
  • concurrency: 1 hides this bug. Always load-test with the target concurrency during development, or it only surfaces in production.
  • The other frequent AsyncLocalStorage trap is reading undefined inside a callback (a lost context); see Node.js AsyncLocalStorage returns undefined in a callback? EventEmitter escapes its context.

FAQ

What's the difference between als.enterWith and als.run?

enterWith writes the store onto the currently active shared parent context, so concurrent async tasks overwrite each other; run creates a fresh, independent context for the callback, binds the store to it, and restores the previous context on exit, so each call is isolated. Node officially recommends run over enterWith.

Why do concurrent tasks read the wrong traceId and cross into another request?

When concurrent tasks interleave at an await, the value written by enterWith is overwritten by the most recent call, so every interleaved task reads the same wrong traceId. Switching to als.run gives each call its own isolated context, so no amount of concurrency causes cross-talk.

Node.js require('nanoid') Throws ERR_REQUIRE_ESM? Alternatives After v5 Went ESM-Only

· 5 min read

In a CommonJS project, require('nanoid') to generate a unique ID throws ERR_REQUIRE_ESM the moment the process starts, and it exits immediately.

Encountered this while building the ecommerce data collection tool for a client: a browser-side scraper that captures product images, SKUs, prices, and reviews in real time, then cleans and exports them as structured files. The server needed a stable traceId per request for cross-service log correlation.

TL;DR

nanoid has been ESM-only since v4, and v5 is no different. On Node versions where require() cannot load ES modules (the default in older releases; Node 22.12+ enables require() of synchronous ES modules), CommonJS require() fails with ERR_REQUIRE_ESM every time. If your project is still CJS, the simplest replacement is Node's built-in crypto.randomUUID(): zero dependencies, works in both CJS and ESM, and produces a standard UUID.

The Problem

A perfectly ordinary import in a CJS project:

// server.js (CommonJS)
const { nanoid } = require('nanoid');

const traceId = nanoid();

It crashes on startup, the stack pointing at nanoid's entry file:

node server.js

internal/modules/cjs/loader.js:905
Error [ERR_REQUIRE_ESM]: require() of ES Module
/node_modules/nanoid/index.js from server.js not supported.

Instead change the require of index.js in server.js to a CommonJS module,
or use a dynamic import() call.

Note that this isn't an intermittent or environment-specific error; on an affected Node version it's a deterministic crash. Once you're on v5, the CJS path simply does not work.

Root Cause

nanoid dropped its CommonJS build in v4, and v5 remains ESM-only: its package.json no longer ships a CommonJS entry, only ESM. On the Node versions affected here, the CommonJS loader behind require() is synchronous and refuses to load an ES module, so it throws ERR_REQUIRE_ESM.

This isn't a nanoid bug; it's the ecosystem's module-format evolution. More and more packages ship ESM-only (got v12+, node-fetch v3, uuid v12+ all do the same). As long as your host project is CommonJS, you'll hit the same wall with every one of them.

If you've also hit "module not found" with dynamic import(), that's the same ESM resolution rules at work; see Node.js ESM dynamic import can't find the module? Check the file extension.
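One way to spot an ESM-only dependency before it crashes at startup is to inspect its package.json. The sketch below is a rough heuristic with a hypothetical name (`looksEsmOnly`); real packages can declare entry points in more ways than it covers, and the manifests in the example are invented for illustration rather than copied from any package.

```javascript
// Heuristic only: treats a package as ESM-only when it declares
// "type": "module" and offers no CommonJS entry we can recognize.
function looksEsmOnly(pkg) {
  // A "require" condition anywhere in "exports" signals a CommonJS entry.
  const hasRequireCondition = JSON.stringify(pkg.exports ?? {}).includes('"require"');
  // A .cjs main file is CommonJS regardless of "type".
  const mainIsCjs = typeof pkg.main === 'string' && pkg.main.endsWith('.cjs');
  if (hasRequireCondition || mainIsCjs) return false;
  return pkg.type === 'module';
}

// Invented manifests for illustration.
const esmOnly = { type: 'module', exports: './index.js' };
const dual = { type: 'module', exports: { '.': { import: './index.js', require: './index.cjs' } } };
const classicCjs = { main: 'index.js' };

console.log(looksEsmOnly(esmOnly), looksEsmOnly(dual), looksEsmOnly(classicCjs)); // true false false
```

In practice you would feed it the parsed contents of node_modules/<package>/package.json.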

Solution

Three options, ordered by how little they cost to adopt.

Option 1: use the built-in crypto.randomUUID()

When you're generating unique IDs, nanoid's core value is "short and unique." But as long as the ID doesn't need to fit in a URL or be aggressively shortened, a standard UUID is more than enough, and it's built into Node 14.17+, with zero dependencies:

// CommonJS shown; in ESM, use: import { randomUUID } from 'node:crypto'
const { randomUUID } = require('node:crypto');

const traceId = randomUUID();
// => '1b9d6bcd-bbfd-4b2d-9b5d-ab8dfbbd4bed'

This single change solves three problems at once:

  • Zero dependencies: no more third-party package, so its module format can never hold you hostage;
  • Format alignment: UUID is a universal cross-language, cross-service format, handy for log correlation and database primary keys;
  • CJS/ESM agnostic: node:crypto is built into Node and behaves the same under both module systems.

The only tradeoff is length: a UUID is 36 characters versus nanoid's default 21. For traceIds and primary keys that cost is negligible; only if you need it in a short link should you keep reading.

Option 2: pin nanoid v3

nanoid's v3.x is the last major version that supports CommonJS, and require works directly:

// package.json: explicitly pin v3
{
  "dependencies": {
    "nanoid": "^3.3.7"
  }
}

// server.js (CommonJS)
const { nanoid } = require('nanoid');
const id = nanoid(); // 21-char short ID

Good for when you genuinely want short IDs but can't migrate the project to ESM yet. The cost is staying on an old version and missing v5's later updates.

Option 3: async dynamic import

If you must use v5, the only way in is ESM's async loader:

// In CommonJS, load the ESM package with dynamic import()
async function makeId() {
  const { nanoid } = await import('nanoid');
  return nanoid();
}

// CommonJS has no top-level await, so the call site has to be async too
(async () => {
  const id = await makeId();
})();

It works, but nanoid is fundamentally a synchronous ID generator. Wrapping it in async/await forces async to propagate up the entire call chain, which is rarely worth it.

Caveats

  • This trap isn't unique to nanoid: uuid v12+, node-fetch v3, and got v12+ are all ESM-only, and require-ing them in a CJS project throws the identical ERR_REQUIRE_ESM. The way to tell is to check the target package's package.json for "type": "module" or an "import"-only entry.
  • crypto.randomUUID() requires Node 14.17+; on older runtimes, crypto.randomBytes(16).toString('hex') gives you a 32-character random hex ID (unique, but not UUID-formatted).
  • Don't require('nanoid') in a CJS project while also import-ing nanoid in an ESM one. Mixing them leaves both old and new copies in the dependency tree, making behavior much harder to predict.
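If an older runtime still needs a UUID-shaped ID, the random bytes can be formatted by hand. This is a minimal sketch of a version 4 UUID built from crypto.randomBytes; `uuidV4Fallback` and `makeTraceId` are hypothetical names, and the plain 'crypto' specifier is used because the node: prefix is not available to require() on every old release.

```javascript
const crypto = require('crypto');

// Build an RFC 4122 version 4 UUID from 16 random bytes.
function uuidV4Fallback() {
  const bytes = crypto.randomBytes(16);
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // set the version nibble to 4
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // set the variant bits to 10xx
  const hex = bytes.toString('hex');
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join('-');
}

// Prefer the built-in when the runtime has it.
const makeTraceId = typeof crypto.randomUUID === 'function'
  ? () => crypto.randomUUID()
  : uuidV4Fallback;

console.log(makeTraceId().length); // 36
```

Both branches return a 36-character string in the same layout, so callers do not need to know which one ran.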

FAQ

Why does require('nanoid') throw ERR_REQUIRE_ESM in Node.js?

Because nanoid ships only ESM artifacts in v4 and v5, and on the affected Node versions the CommonJS require() loads synchronously and cannot load an ES module, so it throws ERR_REQUIRE_ESM the moment it hits nanoid's entry. This is a hard boundary between the CJS and ESM module systems, not a configuration issue.

Can I still use nanoid v5 in a CommonJS project?

Yes, but either load it asynchronously with await import('nanoid') (which forces the whole call chain async) or pin the version to v3.x, which is still CJS-compatible. If you only need a unique ID, Node's built-in crypto.randomUUID() is the simplest path: zero dependencies and supported under both module systems.

Node.js fetch ignores proxy env vars? undici doesn't read http_proxy

· 4 min read

In a WSL2 environment with https_proxy properly set, Node.js fetch() still times out when accessing external URLs.

Encountered this issue while building an AI-powered e-commerce tool for a client. Here's the root cause and solution.

TL;DR

Node.js 22+ built-in fetch() is powered by undici, which by design does not read http_proxy/https_proxy environment variables. Solution: install node-fetch@3 + https-proxy-agent and create a proxy-aware fetch instance. It falls back to a direct connection when no proxy is configured, as in production.

Problem

WSL2 environment with https_proxy correctly set. curl works fine:

echo $https_proxy
# http://172.30.224.1:7897

curl -I https://httpbin.org/ip
# HTTP/1.1 200 OK

But Node.js fetch() times out:

await fetch('https://httpbin.org/ip');
// TypeError: fetch failed
// cause: TimeoutError: Headers Timeout Error

If you're also seeing WSL2 proxy completely unreachable (even curl fails), check your firewall settings first.

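Root Cause

On Node.js v22+, the global fetch() is provided by the bundled undici (7.x on Node 24). By default it does not read the http_proxy/https_proxy environment variables; this is by design, not a bug. Recent releases add an opt-in, NODE_USE_ENV_PROXY=1, but nothing is picked up automatically.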

Proxy behavior across different HTTP clients:

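| Client | Reads env vars | Uses proxy |
| --- | --- | --- |
| curl | Auto-reads https_proxy | ✅ |
| Node.js http/https modules | Does not read | ❌ |
| axios | Reads https_proxy | ✅ |
| node-fetch@3 | Does not read; takes an explicit agent option | ✅ with https-proxy-agent |
| Node.js built-in fetch() (undici) | Does not read | ❌ |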

This causes fetch() to time out in environments that require a proxy to access external networks (WSL2, corporate networks).

Solution

Install node-fetch@3 and https-proxy-agent:

npm install node-fetch@3 https-proxy-agent

Create a proxy-aware fetch instance:

import fetch from 'node-fetch';
import { HttpsProxyAgent } from 'https-proxy-agent';

// Check both casings: env var names are case-sensitive on Linux/WSL2
const proxyUrl =
  process.env.HTTPS_PROXY || process.env.https_proxy ||
  process.env.HTTP_PROXY || process.env.http_proxy;
const agent = proxyUrl ? new HttpsProxyAgent(proxyUrl) : undefined;

export async function fetchWithProxy(url, options = {}) {
  return fetch(url, { ...options, agent });
}

Usage is almost identical to the native fetch():

// Before
const res = await fetch('https://httpbin.org/ip');

// After
const res = await fetchWithProxy('https://httpbin.org/ip');
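If HTTP and HTTPS targets need different proxies, the lookup can be made per protocol. The helper below is a sketch, not part of node-fetch or https-proxy-agent: `resolveProxyUrl` is a hypothetical name, it takes the environment as a parameter so it can be tested, and it assumes curl-style behavior where https targets consult only the https variable.

```javascript
// Hypothetical helper: choose the proxy URL for a target, checking both the
// lowercase and uppercase variable names (they are distinct on Linux).
function resolveProxyUrl(targetUrl, env = process.env) {
  const { protocol } = new URL(targetUrl);
  const candidates = protocol === 'https:'
    ? [env.https_proxy, env.HTTPS_PROXY]
    : [env.http_proxy, env.HTTP_PROXY];
  const proxy = candidates.find((v) => typeof v === 'string' && v.trim() !== '');
  return proxy ? proxy.trim() : undefined; // undefined means "connect directly"
}

const viaProxy = resolveProxyUrl('https://httpbin.org/ip', {
  https_proxy: 'http://172.30.224.1:7897',
});
const direct = resolveProxyUrl('https://httpbin.org/ip', {});

console.log(viaProxy, direct); // http://172.30.224.1:7897 undefined
```

The returned URL can be passed to `new HttpsProxyAgent(...)`; when it is undefined, skip the agent and connect directly.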

Why node-fetch instead of undici's ProxyAgent?

Node.js v24 ships with undici 7.x, but the ProxyAgent from the npm undici@8 package is incompatible with the built-in version:

import { ProxyAgent, setGlobalDispatcher } from 'undici';

// Node v24 error: UND_ERR_INVALID_ARG
// npm undici@8 ProxyAgent is incompatible with built-in undici@7 setGlobalDispatcher
setGlobalDispatcher(new ProxyAgent(proxyUrl));

node-fetch@3 + https-proxy-agent is version-agnostic with no compatibility issues. In production without a proxy, agent is undefined and it connects directly.

Caveats

  • Don't try to override global fetch with setGlobalDispatcher. Changes don't propagate to worker modules under tsx watch hot reload
  • The FormData types from npm undici@8 are incompatible with the global FormData; mixing them causes TypeScript compilation errors
  • node-fetch@3 is ESM-only, so use import; there is no require() support

FAQ

Why doesn't Node.js fetch read the http_proxy environment variable?

Node.js 22+ built-in fetch is powered by undici, which by design does not read http_proxy/https_proxy environment variables. Use node-fetch or undici's ProxyAgent to configure proxy manually.

How to make Node.js fetch work through a proxy?

Install node-fetch@3 and https-proxy-agent, then create a fetch instance with proxy support. When no proxy is configured in production, it falls back to direct connection, independent of Node version.

WSL2 has other networking pitfalls: Docker Desktop's host mode also makes container ports unreachable from WSL2. The debugging approach is similar: verify curl connectivity first, then check application-level configuration.

Node version upgrades can introduce other issues too, for example JWT key format changes in Node 24. Worth checking when upgrading.

Puppeteer Blocked by Anti-Bot? From Chrome CDP to Electron Alternative

· 7 min read

Encountered this issue while building a data collection tool for a client. Here's the full troubleshooting journey from Puppeteer to Electron.

TL;DR

Puppeteer stealth plugin cannot bypass advanced CAPTCHA-based anti-bot systems. Switching to Chrome CDP remote debugging led to WSL2 network isolation and Chrome single-instance issues. The final solution: use Electron BrowserWindow to load the target site, let the user log in manually, and automatically extract cookies via session.cookies.get(), eliminating anti-bot detection and cross-platform problems entirely.

Scenario 1: Puppeteer Blocked by Anti-Bot

Problem

Using puppeteer-extra + puppeteer-extra-plugin-stealth for automated login, the browser triggers CAPTCHA interception immediately after launch. Even after passing the CAPTCHA, subsequent pages detect the automation environment and force an exit.

import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';

puppeteer.use(StealthPlugin());

const browser = await puppeteer.launch({
  headless: false,
  args: [
    '--disable-blink-features=AutomationControlled',
    '--no-sandbox',
    '--disable-infobars',
  ],
});

const page = await browser.newPage();
await page.goto('https://target-site.com/login');
// Anti-bot CAPTCHA system detects automation, page is blocked

Root Cause

Advanced anti-bot CAPTCHA systems don't just check basic fingerprints like navigator.webdriver. They detect automation through multiple dimensions: Chromium build signatures, Canvas/WebGL rendering differences, mouse trajectory patterns, and even DevTools Protocol call stacks. The stealth plugin fixes known fingerprint leaks but cannot eliminate the fundamental differences between Puppeteer's Chromium and a real Chrome browser.

After 8 rounds of debugging (removing timeouts, listening for disconnect events, investigating environment differences, fully configuring stealth plugin, etc.), we confirmed that no configuration could bypass the detection.

Solution

Abandoned Puppeteer automated login entirely. Switched to Chrome DevTools Protocol (CDP) to connect to the user's real browser and extract cookies.

Scenario 2: WSL2 Cannot Connect to Windows Chrome CDP

Problem

Running a Node.js script in WSL2 and connecting to Windows Chrome's debugging port via chrome-remote-interface fails with a refused connection:

import CDP from 'chrome-remote-interface';

const client = await CDP({
  host: 'localhost',
  port: 9222,
});
// Error: connect ECONNREFUSED 127.0.0.1:9222

Chrome launched on Windows with:

chrome.exe --remote-debugging-port=9222

Running curl localhost:9222/json in Windows PowerShell works fine, but WSL2 cannot connect.

Root Cause

Chrome's --remote-debugging-port=9222 binds to 127.0.0.1 by default, which is the Windows loopback address. WSL2 and Windows have separate network stacks. localhost inside WSL2 points to the Linux loopback address, not Windows. So accessing localhost:9222 from WSL2 actually hits Linux's port 9222, not Windows Chrome.

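Solution

Add --remote-debugging-address=0.0.0.0 when launching Chrome to make CDP listen on all network interfaces (some Chrome builds honor this flag only in headless mode, so verify that the port is actually reachable from WSL2):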

chrome.exe --remote-debugging-port=9222 --remote-debugging-address=0.0.0.0

Or configure port forwarding using Windows netsh:

netsh interface portproxy add v4tov4 listenport=9222 listenaddress=0.0.0.0 connectport=9222 connectaddress=127.0.0.1

Important

--remote-debugging-address=0.0.0.0 exposes the CDP port to the local network, which is a security risk. Only use this in internal development environments. For production, always combine with firewall rules to restrict access.
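Once the port is exposed, the script in WSL2 still has to target the Windows host rather than localhost. The sketch below rests on one assumption that does not hold for every setup: under WSL2's default NAT networking, the nameserver listed in /etc/resolv.conf is often the Windows host address (mirrored networking or DNS tunneling change this). `parseNameserver` and `windowsHostFromWsl` are hypothetical helpers.

```javascript
const fs = require('node:fs');

// Extract the first nameserver entry from resolv.conf-style text.
function parseNameserver(resolvConf) {
  const match = resolvConf.match(/^\s*nameserver\s+(\S+)/m);
  return match ? match[1] : null;
}

// Assumption: default NAT networking, where that nameserver is the Windows host.
function windowsHostFromWsl(path = '/etc/resolv.conf') {
  try {
    return parseNameserver(fs.readFileSync(path, 'utf8'));
  } catch {
    return null; // file missing or unreadable
  }
}

const sampleResolvConf = '# generated by WSL\nnameserver 172.30.224.1\n';
console.log(parseNameserver(sampleResolvConf)); // 172.30.224.1
```

The result would then replace 'localhost' in the host option passed to CDP. If the connection still fails, compare against the default gateway reported by `ip route` inside WSL2.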

Scenario 3: Chrome Ignores --remote-debugging-port

Problem

Chrome is already running. Launching Chrome again with --remote-debugging-port=9222 silently ignores the flag. Chrome simply opens a new tab in the existing window, and the CDP port is not opened:

# Chrome already running
chrome.exe --remote-debugging-port=9222
# No error, but port 9222 is not listening

Root Cause

Chrome is designed as a single-instance application. When it detects an existing Chrome process, the newly launched Chrome forwards the URL (and other command-line arguments) to the existing process and exits. Flags like --remote-debugging-port only take effect when the process is first created; the existing process never dynamically loads these parameters.

Solution

Close all Chrome processes first, then relaunch with the flag:

# Windows
taskkill /F /IM chrome.exe
chrome.exe --remote-debugging-port=9222

# macOS
pkill -f "Google Chrome"
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

You can also use --user-data-dir to specify a separate profile directory, avoiding conflicts with your daily Chrome:

chrome.exe --remote-debugging-port=9222 --user-data-dir="C:\chrome-debug-profile"
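Because Chrome gives no error when it ignores the flag, it helps to probe the port before trying to attach. This is a minimal sketch using Node's net module; `isPortOpen` is a hypothetical helper, and the default host and timeout are assumptions to adjust for your setup.

```javascript
const net = require('node:net');

// Resolve true if a TCP connection to host:port succeeds within timeoutMs.
function isPortOpen(port, host = '127.0.0.1', timeoutMs = 1000) {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    const finish = (open) => {
      socket.destroy();
      resolve(open);
    };
    socket.setTimeout(timeoutMs);
    socket.once('connect', () => finish(true));
    socket.once('timeout', () => finish(false));
    socket.once('error', () => finish(false)); // e.g. ECONNREFUSED
  });
}

isPortOpen(9222).then((open) => {
  console.log(open ? 'CDP port is listening' : 'CDP port is not listening');
});
```

If the probe reports the port as closed right after launching Chrome with the flag, an existing Chrome process most likely swallowed the new launch.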

The three scenarios above show that relying on an external Chrome process for cookie extraction is too fragile under WSL2 + anti-bot detection + process management constraints. The final solution: replace external Chrome with Electron's BrowserWindow.

Why Electron Works

  1. No anti-bot detection: Electron's built-in Chromium shares the same rendering engine as Chrome. Anti-bot systems do not flag it as an automated environment.
  2. No external Chrome dependency: No need to manage Chrome processes, CDP ports, or profile directories.
  3. No WSL2 networking issues: Electron runs natively on the target OS, so there are no cross-OS network isolation problems.
  4. No single-instance conflicts: Electron creates its own BrowserWindow, so there is no conflict with the user's daily Chrome.

Complete Implementation

Open login window and poll for cookies:

import { BrowserWindow } from 'electron';

let cookieWindow = null;

function openLoginWindow() {
  cookieWindow = new BrowserWindow({
    width: 1000,
    height: 700,
    title: 'Login to Target Platform',
    webPreferences: {
      // Key: use isolated session, separate from main window
      partition: 'cookie-login',
      contextIsolation: true,
      nodeIntegration: false,
    },
  });

  cookieWindow.loadURL('https://target-site.com/login');

  // Poll for target cookie
  const interval = setInterval(async () => {
    const cookies = await cookieWindow.webContents.session.cookies.get({
      domain: '.target-site.com',
    });

    const sessionCookie = cookies.find((c) => c.name === 'session_token');
    if (sessionCookie) {
      clearInterval(interval);

      // Build full cookie string
      const cookieStr = cookies
        .map((c) => c.name + '=' + c.value)
        .join('; ');

      // Write to environment variable
      process.env.SESSION_COOKIE = cookieStr;
      updateEnvFile('SESSION_COOKIE', cookieStr);

      cookieWindow.close();
    }
  }, 2000);

  cookieWindow.on('closed', () => {
    clearInterval(interval);
    cookieWindow = null;
  });
}

Preload script exposing IPC interface:

import { contextBridge, ipcRenderer } from 'electron';

contextBridge.exposeInMainWorld('electronAPI', {
  extractCookie: () => ipcRenderer.invoke('extract-cookie'),
  onCookieExtracted: (callback) =>
    ipcRenderer.on('cookie-extracted', (_, data) => callback(data)),
});

Renderer process usage:

// Detect Electron environment
if (window.electronAPI) {
  document.getElementById('extractBtn').addEventListener('click', () => {
    window.electronAPI.extractCookie();
  });

  window.electronAPI.onCookieExtracted((data) => {
    console.log('Cookie extracted:', data);
  });
}

Important

  • partition: 'cookie-login' creates an isolated session. The login window's cookies won't pollute the main window. If you need to share login state, remove the partition option or use the same partition name.
  • A 2-second polling interval balances UX and performance. Don't replace setInterval with a tight while + await loop: without a pause between iterations it would keep the main process (where this polling runs) busy.
  • session.cookies.get() only returns cookies from the current session. You cannot read cookies across partitions.
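
The openLoginWindow code calls updateEnvFile, which the post does not show. A minimal sketch of what such a helper could look like, assuming a plain KEY=value .env file; the name upsertEnvValue and the quoting rule are assumptions for illustration, not the author's implementation. The value is written inside quotes so that characters such as # or spaces in a cookie string are not cut off when dotenv reads the file back:

```typescript
import { existsSync, readFileSync, writeFileSync } from 'node:fs';

// Pick a quote style that does not collide with the value (sketch only).
function quoteEnvValue(value: string): string {
  if (/[\r\n]/.test(value)) throw new Error('multi-line values are not handled in this sketch');
  if (!value.includes('"')) return `"${value}"`;
  if (!value.includes("'")) return `'${value}'`;
  throw new Error('values mixing both quote styles are not handled in this sketch');
}

// Pure helper: returns new .env content with `key` set to the quoted value.
function upsertEnvValue(content: string, key: string, value: string): string {
  const line = `${key}=${quoteEnvValue(value)}`;
  const trimmed = content.replace(/\n+$/, '');
  const lines = trimmed === '' ? [] : trimmed.split('\n');
  const index = lines.findIndex((l) => l.startsWith(`${key}=`));
  if (index >= 0) lines[index] = line; // replace the existing entry
  else lines.push(line); // or append a new one
  return lines.join('\n') + '\n';
}

// Hypothetical stand-in for the updateEnvFile(key, value) call used above.
function updateEnvFile(key: string, value: string, file = '.env'): void {
  const current = existsSync(file) ? readFileSync(file, 'utf8') : '';
  writeFileSync(file, upsertEnvValue(current, key, value));
}
```

Keeping the string manipulation in a pure function makes the quoting behavior easy to unit test without touching the real .env file.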

FAQ

What to do when Puppeteer stealth plugin is still detected by anti-bot?

The stealth plugin only bypasses basic fingerprint checks (like navigator.webdriver). Advanced anti-bot systems detect automation through browser behavior and low-level API characteristics, including Chromium build signatures, Canvas rendering differences, and mouse trajectory patterns. The stealth plugin cannot fully cover all detection vectors. Use Electron BrowserWindow to load the target site and extract cookies via session.cookies.get() after the user logs in manually.

Why is Chrome --remote-debugging-port not working?

Chrome is a single-instance application. When a Chrome process is already running, the --remote-debugging-port flag is silently ignored. The newly launched Chrome simply opens a new tab in the existing instance without enabling the debugging port. You need to close all Chrome processes first using taskkill /F /IM chrome.exe (Windows) or pkill -f "Google Chrome" (macOS), then relaunch with the flag. Alternatively, use --user-data-dir to specify a separate profile directory to avoid conflicts.

How to connect to Chrome debugging port from WSL2?

Chrome's --remote-debugging-port binds to Windows 127.0.0.1 by default, but WSL2 has its own network stack where localhost points to the Linux loopback address. Two solutions: first, add --remote-debugging-address=0.0.0.0 when launching Chrome to make CDP listen on all interfaces; second, configure port forwarding on Windows using netsh interface portproxy to forward WSL2 requests to the Windows CDP port.



Vercel Serverless Function Multi-Level Route 404? Bypass the Catch-All Trap with Rewrites

· 5 min read

Encountered these issues while deploying a Hono backend to Vercel Serverless Functions. Three problems cascaded: esbuild bundling format errors, native dependency resolution failures, and, most subtle of all, catch-all routes silently failing on nested paths.

TL;DR

  1. esbuild + Hono: Use --format=cjs, exclude native deps with --external
  2. Multi-level route 404: api/[[...path]].ts is unreliable. Use api/index.ts + vercel.json rewrite instead
  3. Better Auth client: baseURL requires a full URL, use window.location.origin to build it

Issue 1: Vercel's Built-in TS Compilation Fails

api/[[...path]].ts directly imports Hono's app.ts. Vercel compiles it with built-in TypeScript 5.9.3 (nodenext mode), which throws:

Relative import paths need explicit file extensions
Cannot find name 'process'
Module '"@libsql/client"' declares 'Client' locally, but it is not exported

The root cause is Vercel's built-in TS compiler enforcing strict nodenext module resolution, which is incompatible with how Hono and Turso client libraries export their types.

Solution: Pre-bundle with esbuild and have Vercel execute the compiled output directly.

esbuild server/src/app.ts \
--bundle --platform=node --format=cjs \
--outfile=dist/_server.cjs \
--external:@libsql/client

api/[[...path]].ts becomes a thin wrapper:

import app from '../dist/_server.cjs';
export default app;

Why CJS Instead of ESM?

Bundling with --format=esm causes dotenv's internal require("path") to throw Dynamic require is not supported. ESM output has no built-in require, so the call fails at runtime; CJS output uses Node's real require and is unaffected.

If you've also run into module resolution issues with Node.js ESM dynamic import, the root cause is the same: ESM enforces strict module format rules while CJS is more lenient.

Native Dependency Handling

@libsql/client includes native binaries (@libsql/linux-x64-gnu). After esbuild bundling, the runtime can't find the module. Solution:

  1. Exclude with --external:@libsql/client
  2. Declare @libsql/client as a dependency in root package.json so Vercel installs it to node_modules

Issue 2: Nested Routes Intercepted by Vercel

After fixing esbuild bundling, single-level routes like /api/health and /api/tasks worked fine. But all nested paths (e.g., /api/auth/sign-in/email) returned a Vercel-level 404:

HTTP/2 404
x-vercel-error: NOT_FOUND
content-type: text/plain; charset=utf-8

Debugging Process

Comparing response headers made the point of failure clear:

| Path | Reaches Hono? | Key Indicators |
| --- | --- | --- |
| /api/health | ✅ | x-vercel-cache: MISS, CORS headers present |
| /api/gate | ✅ | CORS headers present |
| /api/gate/session | ❌ | x-vercel-error: NOT_FOUND, no CORS |

Single-level paths reach the function, nested paths get intercepted. The issue is unrelated to path naming (paths without auth also 404) and unrelated to Vercel's internal routing rules. The root cause: api/[[...path]].ts catch-all pattern only matches single-level paths.

Solution

Drop the catch-all. Use api/index.ts + vercel.json rewrite instead:

{
  "rewrites": [
    { "source": "/api/(.*)", "destination": "/api/index" },
    { "source": "/((?!api/).*)", "destination": "/index.html" }
  ]
}

Rename api/[[...path]].ts to api/index.ts (content unchanged). All /api/* requests are forwarded to /api/index by the rewrite rule, and Hono handles internal routing.
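
To see which rule a given path falls under, the two rewrite sources can be approximated with plain regular expressions. This is only a sketch: Vercel compiles source patterns itself, and the hand-translated RegExp objects and the resolveRewrite helper below are assumptions for illustration, not Vercel's matcher.

```typescript
// Hand-translated, anchored versions of the two `source` patterns in vercel.json.
const rewrites = [
  { source: /^\/api\/(.*)$/, destination: '/api/index' },
  { source: /^\/((?!api\/).*)$/, destination: '/index.html' },
];

// First matching rule wins, mirroring the top-to-bottom order of the rewrites array.
function resolveRewrite(pathname: string): string | null {
  const rule = rewrites.find((r) => r.source.test(pathname));
  return rule ? rule.destination : null;
}

for (const path of ['/api/health', '/api/gate/session', '/dashboard/settings']) {
  console.log(path, '->', resolveRewrite(path));
}
// /api/health -> /api/index
// /api/gate/session -> /api/index
// /dashboard/settings -> /index.html
```

Nested API paths land on the same function as single-level ones, and the negative lookahead keeps the SPA fallback away from anything under /api/.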

Post-deploy verification:

curl -sI https://example.com/api/gate/session
# HTTP/2 404
# access-control-allow-credentials: true   ← Reaches Hono
# x-vercel-cache: MISS                     ← No longer a Vercel-level 404

All three paths /api/gate, /api/gate/, /api/gate/session now reach Hono. POST requests also work:

curl -X POST https://example.com/api/gate/sign-in/email \
  -H "Content-Type: application/json" \
  -d '{"email":"user@example.com","password":"123456"}'
# {"message":"Invalid email or password","code":"INVALID_EMAIL_OR_PASSWORD"}

Auth routes are fully functional.

Important Notes

  • Rewrite rule order matters: /api/(.*) must come before the SPA fallback
  • After deployment, browsers may cache old JS files. If the frontend still requests old paths, use Ctrl+Shift+R to force refresh
  • If you're also troubleshooting stale deployments, check out Debugging Frontend Deploy Not Updating

Issue 3: Better Auth Client baseURL Error

With routes working, the frontend threw:

BetterAuthError: Invalid base URL: /api/gate
Caused by: TypeError: Failed to construct 'URL': Invalid URL

Better Auth client internally uses new URL() to parse baseURL, which fails with relative paths.

Solution: Use window.location.origin to build a full URL:

export const authClient = createAuthClient({
  baseURL: `${window.location.origin}/api/gate`,
});

Locally this expands to http://localhost:5173/api/gate (Vite proxy forwards it), and in production to https://your-domain.com/api/gate. No environment variables are needed.
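
The underlying URL behavior can be reproduced without Better Auth. A minimal sketch, runnable in Node or a browser; toAbsoluteBaseURL is a hypothetical helper name, and the localhost origin stands in for window.location.origin:

```typescript
// new URL() accepts an absolute URL, or a relative one plus a base.
function toAbsoluteBaseURL(path: string, origin: string): string {
  return new URL(path, origin).toString();
}

let relativeFails = false;
try {
  new URL('/api/gate'); // a bare relative path has no base to resolve against
} catch (err) {
  relativeFails = err instanceof TypeError;
}

console.log(relativeFails); // true
console.log(toAbsoluteBaseURL('/api/gate', 'http://localhost:5173'));
// http://localhost:5173/api/gate
```

In the browser, passing window.location.origin as the second argument gives the same result as the template string used in the fix.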

Complete Configuration Reference

The three key files in their final working state:

vercel.json

{
  "buildCommand": "pnpm run check && pnpm exec esbuild server/src/app.ts --bundle --platform=node --format=cjs --outfile=dist/_server.cjs --external:@libsql/client && pnpm -F client run build",
  "outputDirectory": "client/dist",
  "rewrites": [
    { "source": "/api/(.*)", "destination": "/api/index" },
    { "source": "/((?!api/).*)", "destination": "/index.html" }
  ]
}

api/index.ts

import app from '../dist/_server.cjs';
export default app;

client/src/lib/auth-client.ts

import { createAuthClient } from 'better-auth/react';

export const authClient = createAuthClient({
  baseURL: `${window.location.origin}/api/gate`,
});

FAQ

Why does Vercel api/[[...path]].ts not match multi-level paths?

Vercel's catch-all pattern only matches single-level paths (/api/foo), not nested paths (/api/foo/bar). Use api/index.ts with a vercel.json rewrite rule to forward all /api/* requests explicitly.

How to bundle Hono with esbuild for Vercel Serverless?

Bundle with esbuild in CJS format (--format=cjs), exclude native dependencies like @libsql/client with --external, and declare them in root package.json so Vercel installs them.

How to fix Better Auth client Invalid base URL error?

createAuthClient's baseURL does not accept relative paths. Use window.location.origin to build a full URL that works in both local development and production.



UPSERT Writes All Zeros? Drizzle sql Template Pitfall with Parameterized Values vs SQL Expressions

· 3 min read

Encountered this issue while building an e-commerce analytics platform for a client. Here's the root cause and solution.

TL;DR

In Drizzle ORM's sql template tag, sql.join(values.map(v => sql`${v}`)) parameterizes all values. If the values array contains SQL expressions (like date_trunc('week', '2026-05-17'::date)::date), PostgreSQL treats them as plain strings and throws invalid input syntax for type date. SQL expressions must go through sql.raw() or be written directly in the template.

The Problem

Data collection pipeline: Chrome extension → CCLHub server → Analytics API → PostgreSQL. Symptoms:

  1. CCLHub logs show correct collected data (uv: 403, payAmt: 19478.47)
  2. Analytics API returns 200 success
  3. But database query shows all zeros: uv: 0, pay_amt: 0.00

-- Actual database data
 report_date | uv  | pay_amt | reveal_cnt
-------------+-----+---------+------------
 2026-05-12  | 392 | 7333.67 | 11879       -- old data fine
 2026-05-13  |   0 |    0.00 | 0           -- new data all zeros!

Analytics error log reveals:

PostgresError: invalid input syntax for type date:
"date_trunc('week', '2026-05-17'::date)::date"

Root Cause

The original code mixed parameterized values with SQL expressions:

// ❌ Problem code
const insertVals: (string | number | null)[] = [
  String(shop_id),
  String(platform_id),
  reportDate,
  tenant_id,
  `date_trunc('week', '${reportDate}'::date)::date`, // ← SQL expression
];

// sql.join parameterizes ALL values, including the date_trunc expression
await db.execute(sql`
  INSERT INTO table (..., week_start_date)
  VALUES (${sql.join(insertVals.map(v => sql`${v}`), sql`,`)})
  ...
`);

Generated SQL:

-- PostgreSQL receives $5 as a literal string value
INSERT INTO table (..., week_start_date)
VALUES ($1, $2, $3, $4, $5, ...)
-- $5 = "date_trunc('week', '2026-05-17'::date)::date" ← treated as string!

PostgreSQL tries to parse "date_trunc('week', '2026-05-17'::date)::date" as a date value, which fails with the error above.

Why zeros instead of an error? Because the same table has a separate inquiry INSERT (PARTIAL UPSERT) that succeeded, creating rows with dashboard columns defaulting to 0. The daily report UPSERT failed but didn't roll back the existing rows.

Solution

Separate SQL expressions from parameterized values using sql.raw() or direct template embedding:

// ✅ Fix: separate parameterized values from SQL expressions
const insertCols = ['shop_id', 'platform_id', 'report_date', 'tenant_id'];
const insertVals: (string | number | null)[] = [
  String(shop_id), String(platform_id), reportDate, tenant_id,
];

// 19 data columns parameterized normally
for (const [apiKey, dbCol] of Object.entries(DAILY_COLUMNS)) {
  insertCols.push(dbCol);
  insertVals.push(row[apiKey] != null ? String(row[apiKey]) : '0');
}

// week_start_date uses SQL expression, NOT in parameterized array
await db.execute(sql`
  INSERT INTO table (${sql.raw(insertCols.join(', '))}, week_start_date)
  VALUES (
    ${sql.join(insertVals.map(v => sql`${v}`), sql`,`)},
    date_trunc('week', ${reportDate}::date)::date -- ← directly in template
  )
  ...
`);

Key distinction:

| Approach | How Drizzle handles it | What PostgreSQL receives |
| --- | --- | --- |
| sql template interpolation | Parameterized ($N) | String literal |
| sql.raw(expression) | Inlined into SQL | SQL expression |
| Direct in sql template | Part of template | SQL expression |
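
The distinction can be made concrete with a toy tagged template. This is not Drizzle's implementation; toySql and raw are hypothetical names that only mimic the idea that every interpolation becomes a $N placeholder unless it is explicitly marked as raw SQL:

```typescript
// Toy model of a parameterizing SQL tag (illustration only, not Drizzle).
type Raw = { raw: string };
const raw = (text: string): Raw => ({ raw: text });
const isRaw = (v: unknown): v is Raw =>
  typeof v === 'object' && v !== null && 'raw' in v;

function toySql(strings: TemplateStringsArray, ...values: unknown[]) {
  const params: unknown[] = [];
  let text = strings[0];
  values.forEach((value, i) => {
    if (isRaw(value)) {
      text += value.raw; // inlined into the SQL text
    } else {
      params.push(value); // sent separately as a bound value
      text += `$${params.length}`;
    }
    text += strings[i + 1];
  });
  return { text, params };
}

const reportDate = '2026-05-17';
const expr = `date_trunc('week', '${reportDate}'::date)::date`;

// Expression passed as a value: the database only sees a string in $1.
const bad = toySql`VALUES (${expr})`;
// Expression written in the template: only the date itself is a parameter.
const good = toySql`VALUES (date_trunc('week', ${reportDate}::date)::date)`;
// Column list inlined explicitly, as with sql.raw() in the fix.
const cols = toySql`INSERT INTO t (${raw('shop_id, report_date')})`;

console.log(bad, good, cols);
```

In the bad case the whole expression ends up in params, which matches the invalid input syntax error: the database is asked to cast that string to a date.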

Caveats

  • sql.raw() carries SQL injection risk, so never use it for user input. In this example it only inlines column names from the internal DAILY_COLUMNS mapping, while reportDate (from an internal API with a controlled format) is still sent as a bound parameter
  • Drizzle's sql template tag auto-parameterizes all interpolations. This is a safety feature, but SQL expressions such as function calls must not be passed as parameters
  • If the entire SQL is dynamically constructed, consider using Drizzle's query builder API instead of raw SQL
  • Database connection config has its own pitfalls: if you're connecting to the wrong PostgreSQL instance, Docker might be silently occupying the port
  • Environment variable loading order is another common trap: JWT signing silently failing is a classic example of dotenv running after the import chain

JWT Signing Silently Fails? Check Your Node.js Environment Variable Loading Order

· 3 min read

Encountered this issue while building a SaaS authentication system for a client. Here's the root cause and solution.

TL;DR

In Node.js ES Modules, import statements execute before dotenv.config(). If module-level code reads process.env.JWT_SECRET, it gets undefined, so the signing key is silently built from a missing secret: no errors are thrown, but all token verification fails. The fix: lazy initialization.

The Problem

JWT login returns 200, but all subsequent requests return 401. Investigation reveals:

  1. Tokens generated at login can't be verified by jwtVerify()
  2. Every server restart invalidates all previously issued tokens
  3. console.log(process.env.JWT_SECRET) outputs undefined

// jwt.ts: module-level code
import crypto from 'crypto';

// ❌ This line executes BEFORE dotenv.config(), JWT_SECRET is undefined
const SECRET = crypto.createSecretKey(
  new TextEncoder().encode(process.env.JWT_SECRET)
);

The worst part: no error is thrown. new TextEncoder().encode(undefined) does not reject the missing value; it falls back to its default empty-string input and returns a byte array, so createSecretKey() still produces a valid-looking but wrong key.

Root Cause

ES Module import statements are statically hoisted:

// server.ts (entry file)
import dotenv from 'dotenv';
import { router } from './routes/auth'; // ← runs first
import { authenticateToken } from './middleware/auth'; // ← runs first

dotenv.config(); // ← runs AFTER all imported modules execute

Execution order:

  1. Node.js scans all import statements and builds the dependency graph
  2. Executes all imported modules' top-level code depth-first (jwt.ts's const SECRET = ... runs here)
  3. Returns to server.ts, runs dotenv.config()
  4. Now .env is loaded into process.env

So jwt.ts module-level code reads process.env.JWT_SECRET as undefined.
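
The effect of that ordering can be reproduced in a single file. In this sketch, DEMO_JWT_SECRET is a made-up variable name and the manual assignment stands in for dotenv.config() running late; it is an illustration of eager versus deferred reads, not the project's code:

```typescript
// DEMO_JWT_SECRET is a hypothetical variable; clear it so the demo is deterministic.
const KEY = 'DEMO_JWT_SECRET';
delete process.env[KEY];

// Eager: evaluated once, when the module's top-level code runs (what jwt.ts did).
const eagerSecret = process.env[KEY];

// Lazy: evaluated each time the function is called.
const readSecret = (): string | undefined => process.env[KEY];

// Stand-in for dotenv.config() running later in the entry file.
process.env[KEY] = 'loaded-from-dotenv';

console.log(eagerSecret); // undefined
console.log(readSecret()); // loaded-from-dotenv
```

The eager constant keeps the value it saw at load time, no matter what is loaded into process.env afterwards.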

Solution

Option 1: Lazy Initialization

Move secret initialization into a function, so the env var is read on first call, not at import time:

import crypto from 'crypto';
import { SignJWT } from 'jose';

let _secret: crypto.KeyObject | null = null;

// Build the key on first use, after dotenv has had a chance to run
function getSecret(): crypto.KeyObject {
  if (!_secret) {
    const secretValue = process.env.JWT_SECRET;
    if (!secretValue) {
      throw new Error('JWT_SECRET environment variable not set');
    }
    _secret = crypto.createSecretKey(
      new TextEncoder().encode(secretValue)
    );
  }
  return _secret;
}

// Use getSecret() everywhere the key is needed
export async function generateToken(payload: any): Promise<string> {
  return new SignJWT(payload)
    .setProtectedHeader({ alg: 'HS256' })
    .sign(getSecret()); // ← deferred until runtime
}

No dependency on entry file import order: safe regardless of when it is called.

Option 2: Call dotenv at the Very Top of Entry File

// server.ts: ensure this line comes before ALL other imports
import 'dotenv/config'; // or require('dotenv').config()
import express from 'express';
// ...other imports

Limitation: If another entry point (cron job, worker) forgets this line, the bug resurfaces.

Caveats

  • This pitfall affects all module-level env var reads, not just JWT: database connections, API keys, etc. ESM module resolution has another common gotcha: missing .js extensions in dynamic imports cause module-not-found errors in production
  • require('dotenv').config() only guarantees order in CommonJS; ES Module import always executes before runtime code
  • Lazy initialization works well for: secrets, DB connection pools, external API clients, and other one-time resources; if you're using the jose library with Node 24, also watch out for KeyObject format changes
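
The same pattern generalizes beyond the JWT secret. A minimal sketch of a reusable helper; lazy and API_BASE_URL are hypothetical names chosen for this example, and the manual assignment again stands in for dotenv loading:

```typescript
// Generic memoized initializer: runs `init` on the first call, then reuses the result.
// If `init` throws, nothing is cached, so a later call can retry.
function lazy<T>(init: () => T): () => T {
  let cached: { value: T } | null = null;
  return () => {
    if (cached === null) cached = { value: init() };
    return cached.value;
  };
}

let initCalls = 0;
// API_BASE_URL is a made-up variable used only for this demo.
const getApiBaseUrl = lazy(() => {
  initCalls += 1;
  const value = process.env.API_BASE_URL;
  if (!value) throw new Error('API_BASE_URL environment variable not set');
  return value;
});

process.env.API_BASE_URL = 'https://api.example.com'; // stand-in for dotenv loading
getApiBaseUrl();
getApiBaseUrl();
console.log(initCalls); // 1
```

Wrapping the result in an object lets the helper cache values that are themselves null or undefined, and throwing inside init keeps a missing variable loud instead of silent.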

Node.js ESM Dynamic Import Can't Find Module? Check the File Extension

· 3 min read

Encountered this issue while building a SaaS analytics platform for a client. Here's the root cause and solution.

TL;DR

In Node.js ESM mode, import('./path/to/module') doesn't auto-resolve ./path/to/module.js. If the TypeScript build output is missing .js extensions, the module throws ERR_MODULE_NOT_FOUND. When this import is inside deferred logic (timers, conditionals), the app starts fine but crashes later, and PM2 shows a climbing restart count.

Fix: Ensure all ESM dynamic imports include .js extensions, and automate this in the build pipeline.

Problem

PM2 showed the app restarting continuously:

│ name          │ ↺ │ status │ uptime │
│ analytics-api │ 9 │ online │ 28m    │

Error log repeated every few minutes:

Error [ERR_MODULE_NOT_FOUND]: Cannot find module '/app/dist/domains/video/cleanup'
imported from /app/dist/server.js

But the file clearly exists:

$ ls dist/domains/video/
cleanup.js executor.js queue.js

Root Cause

ESM doesn't auto-resolve file extensions

Node.js CommonJS (require()) automatically tries .js, .json, and other extensions. ESM (import) does not.

// ❌ ESM can't find the module
import('./domains/video/cleanup')
// Node.js looks for: ./domains/video/cleanup (exact path, no extension)
// Actual file: ./domains/video/cleanup.js

// ✅ Must include .js extension
import('./domains/video/cleanup.js')

Deferred imports hide the problem

The import was inside a deferred execution:

// server.ts: the failure doesn't surface immediately on startup
// (specifier shown as it appeared in the deployed build, without .js)
import('./domains/video/cleanup').then(({ startCleanupScheduler }) => {
  startCleanupScheduler(); // triggers seconds later
});

The app starts successfully (DB connection, port binding all fine). When the timer fires, the import fails → process crashes → PM2 restarts → starts fine again → timer fires again → crashes again. This creates a crash loop.
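
Independently of fixing the path, a failed optional import does not have to take the whole process down. A hedged sketch: tryLoad is a hypothetical helper, and the rejecting loader only simulates the missing-extension failure. Whether swallowing the error is acceptable depends on how critical the module is.

```typescript
// Hypothetical helper: load an optional module without letting a failed
// import() turn into an unhandled rejection that kills the process.
async function tryLoad<T>(name: string, load: () => Promise<T>): Promise<T | null> {
  try {
    return await load();
  } catch (err) {
    console.error(`[startup] optional module "${name}" failed to load:`, err);
    return null;
  }
}

// Simulated loader that fails the same way the extensionless import did.
const brokenLoader = (): Promise<{ startCleanupScheduler(): void }> =>
  Promise.reject(new Error("Cannot find module '/app/dist/domains/video/cleanup'"));

const loaded = tryLoad('video-cleanup', brokenLoader).then((mod) => {
  mod?.startCleanupScheduler(); // skipped when the module failed to load
  return mod !== null;
});

// In server.ts the loader would be: () => import('./domains/video/cleanup.js')
```

With this in place the error shows up once in the logs and the API keeps serving requests, rather than PM2 restarting the process in a loop.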

Why did it work before?

Previous deployment used a build script that included a post-build step to fix import paths. One deployment skipped this step and deployed raw tsc output; tsc doesn't modify import paths in output files.

Solution

1. Write .js extensions in TypeScript source

TypeScript officially recommends writing .js extensions even in .ts files:

// ✅ Write .js even in .ts source files
import('./domains/video/cleanup.js').then(({ startCleanupScheduler }) => {
  startCleanupScheduler();
});

2. Automated post-build fix (recommended)

Add an import-fixing script to the build pipeline:

{
  "scripts": {
    "build": "tsc && node fix-imports.js"
  }
}

Core logic of fix-imports.js:

import { readFileSync, writeFileSync, readdirSync } from 'fs';
import { join } from 'path';

// Recursively append .js to extensionless relative imports in compiled output
function fixImports(dir) {
  for (const file of readdirSync(dir, { withFileTypes: true })) {
    const fullPath = join(dir, file.name);
    if (file.isDirectory()) {
      fixImports(fullPath);
    } else if (file.name.endsWith('.js')) {
      const content = readFileSync(fullPath, 'utf8');
      // Fix dynamic imports
      const fixed = content.replace(
        /import\(['"](\.[^'"]+)['"]\)/g,
        (match, path) => path.endsWith('.js') ? match : match.replace(path, path + '.js')
      );
      // Fix static imports
      const fixed2 = fixed.replace(
        /from\s+['"](\.[^'"]+)['"]/g,
        (match, path) => path.endsWith('.js') ? match : match.replace(path, path + '.js')
      );
      if (fixed2 !== content) {
        writeFileSync(fullPath, fixed2);
      }
    }
  }
}

// Assumption: tsc emits to ./dist, as the paths in the error log suggest
fixImports('dist');

The build pipeline automatically appends .js to all relative import paths, keeping TypeScript source extension-free while preventing module-not-found errors in ESM deployments.
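
Since the original incident came from a deployment that skipped the post-build step, a check that fails the build when extensionless specifiers remain can act as a safety net. A minimal sketch; findExtensionlessImports is a hypothetical helper working on source text, and the regex covers only the common import forms:

```typescript
// Returns relative specifiers that lack a file extension in the given JS source.
function findExtensionlessImports(source: string): string[] {
  const pattern = /(?:import\(|from\s+|import\s+)['"](\.{1,2}\/[^'"]+)['"]/g;
  const missing: string[] = [];
  for (const match of source.matchAll(pattern)) {
    const specifier = match[1];
    // Treat anything ending in .js, .mjs, .cjs or .json as already resolvable.
    if (!/\.(?:js|mjs|cjs|json)$/.test(specifier)) missing.push(specifier);
  }
  return missing;
}

const sample = [
  "import { queue } from './queue';",
  "import { run } from './executor.js';",
  "import('./domains/video/cleanup').then((m) => m.startCleanupScheduler());",
].join('\n');

console.log(findExtensionlessImports(sample));
// [ './queue', './domains/video/cleanup' ]
```

Run over every file in the build output, a non-empty result could exit with a non-zero code so that a raw tsc build never reaches production unnoticed.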

Note

  • This only affects Node.js ESM mode ("type": "module" or .mjs files). CommonJS is unaffected.
  • Static imports (import ... from './foo') have the same limitation, not just dynamic import(). Import hoisting also causes another common issue: dotenv runs after the import chain, leaving env vars undefined
  • Using tsx or ts-node in development won't show this error (they auto-resolve extensions), but node dist/server.js in production will fail.
  • PM2 crash loop signature: restart count (↺) keeps growing, uptime never exceeds a few minutes.