
NodeJS and NPM Phantom Dependencies: Understanding and Mitigation

Overview

NodeJS and NPM manage package dependencies by physically representing the dependency graph on disk within node_modules folders. This system, combined with NodeJS’s module resolution algorithm, introduces phantom dependencies: undeclared packages a project implicitly relies upon due to the flattened node_modules structure or ancestral node_modules directories.

Key Insights

Technical Details

Traditional vs. NodeJS Dependency Resolution

Conventional package managers represent package dependencies as a directed acyclic graph (DAG), where a central store often houses packages, and module resolvers traverse this graph. DAGs can feature “diamond dependencies,” where multiple packages depend on a common sub-dependency.

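NodeJS and NPM adopt a distinct approach: the dependency graph is physically represented on disk as node_modules folders, and require() resolves a package by probing those folders rather than consulting a central store.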

Unique Characteristics of NPM’s node_modules Approach

NPM’s disk-based model presents several unique behaviors:

Consequences: The Problem with Phantom Dependencies

A phantom dependency occurs when a project uses a package not explicitly listed in its package.json file.

Consider this example:

my-library/package.json

{
  "name": "my-library",
  "version": "1.0.0",
  "main": "lib/index.js",
  "dependencies": {
    "minimatch": "^3.0.4"
  },
  "devDependencies": {
    "rimraf": "^2.6.2"
  }
}

my-library/lib/index.js

var minimatch = require('minimatch');
var expand = require('brace-expansion'); // ???
var glob = require('glob'); // ???

// (more code here that uses those libraries)

In this scenario, brace-expansion is a dependency of minimatch, and glob is a dependency of rimraf. During installation, NPM often flattens these into my-library/node_modules. NodeJS’s require() function finds them without consulting package.json files, making it appear to work correctly. However, this is a bug, not a feature, and leads to critical issues:
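One way to make the problem concrete is to compare what a source file requires against what package.json declares. The sketch below is a minimal illustration, not a production tool: the helper names findPhantomDependencies and packageNameOf are hypothetical, and it assumes the list of required specifiers has already been collected from the source (for example, by a linter or a parser).

```javascript
// Hypothetical helper: report required packages that package.json does not declare.
const { builtinModules } = require('module');

// '@scope/name/sub/path' -> '@scope/name'; 'name/sub/path' -> 'name'
function packageNameOf(specifier) {
  const parts = specifier.split('/');
  return specifier.startsWith('@') ? parts.slice(0, 2).join('/') : parts[0];
}

function findPhantomDependencies(manifest, requiredSpecifiers) {
  // Every package name the manifest declares, across all dependency fields.
  const declared = new Set([
    ...Object.keys(manifest.dependencies || {}),
    ...Object.keys(manifest.devDependencies || {}),
    ...Object.keys(manifest.peerDependencies || {}),
    ...Object.keys(manifest.optionalDependencies || {}),
  ]);

  const phantoms = new Set();
  for (const specifier of requiredSpecifiers) {
    // Relative and absolute paths refer to local files, not packages.
    if (specifier.startsWith('.') || specifier.startsWith('/')) continue;
    // Built-in modules such as 'fs' never need a declaration.
    if (specifier.startsWith('node:') || builtinModules.includes(specifier)) continue;

    const name = packageNameOf(specifier);
    if (!declared.has(name)) phantoms.add(name);
  }
  return [...phantoms].sort();
}

// Usage with the my-library example above.
const manifest = {
  dependencies: { minimatch: '^3.0.4' },
  devDependencies: { rimraf: '^2.6.2' },
};
console.log(findPhantomDependencies(manifest, ['minimatch', 'brace-expansion', 'glob']));
// → [ 'brace-expansion', 'glob' ]
```

The check deliberately ignores what is present in node_modules: a package being resolvable on disk says nothing about whether it was declared.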

Phantom node_modules Folders in Monorepos

Monorepos introduce another class of phantom dependency problem through ancestral node_modules folders.

Consider a monorepo with a root-level package.json:

my-monorepo/package.json

{
  "name": "my-monorepo",
  "version": "0.0.0",
  "scripts": {
    "deploy-app": "node ./deploy-app.js"
  },
  "devDependencies": {
    "semver": "~5.6.0"
  }
}

This root package.json might include semver as a devDependency for a deploy-app script. The resulting folder structure after npm install could be:

- my-monorepo/
  - package.json
  - node_modules/
    - semver/
    - ...
  - my-library/
    - package.json
    - lib/
      - index.js
    - node_modules/
      - brace-expansion/
      - minimatch/
      - ...

Due to NodeJS’s parent folder probing, my-library/lib/index.js can successfully execute require("semver"), even though semver is neither declared in my-library/package.json nor installed under my-library/node_modules. This is an especially insidious phantom dependency: my-library implicitly relies on a package installed higher up in the file system hierarchy, outside its own declared scope.
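The parent folder probing described above can be sketched as a function that lists the node_modules folders NodeJS would try for a given starting directory. This is a simplified illustration for POSIX-style paths only; the function name candidateNodeModulesDirs is hypothetical, and the real resolver also handles details not shown here (such as global folders and package exports).

```javascript
const path = require('path');

// Sketch: walk from the starting directory up to the file system root,
// collecting the node_modules folder that would be probed at each level.
function candidateNodeModulesDirs(fromDir) {
  const dirs = [];
  let current = path.posix.resolve('/', fromDir);

  while (true) {
    // A folder that is itself named node_modules is not given a nested
    // node_modules/node_modules candidate.
    if (path.posix.basename(current) !== 'node_modules') {
      dirs.push(path.posix.join(current, 'node_modules'));
    }
    const parent = path.posix.dirname(current);
    if (parent === current) break; // reached the root
    current = parent;
  }
  return dirs;
}

console.log(candidateNodeModulesDirs('/my-monorepo/my-library/lib'));
// → [
//   '/my-monorepo/my-library/lib/node_modules',
//   '/my-monorepo/my-library/node_modules',
//   '/my-monorepo/node_modules',
//   '/node_modules'
// ]
```

The third entry is the one that lets my-library pick up semver from the monorepo root. In a real project, the list NodeJS actually uses can be inspected with module.paths or require.resolve.paths('semver').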

Mitigating Phantom Dependencies with Rush

Rush directly addresses phantom dependency issues by implementing a symlinking strategy for project dependencies. This strategy ensures that:

For even stricter control, the PNPM package manager, when used with Rush, extends these protections to all indirect dependencies. “Bad” packages that do not work under this stricter model can be worked around via a pnpmfile.js configuration.
