Quantifying Code “Understandability” Part 1: Bounded Contexts

I recently read an excellent article on “what makes code difficult to read” which pointed out that there is no commonly used and accepted measure for code readability. The author outlined many of the core heuristics for more readable code (for example, line/operator/operand count, novelty, conditional simplicity, nesting, and variable lifespan) but didn’t attempt to quantify them.

What if we did try to measure these heuristics?

Where would we start?

This is what I have been pondering recently.

Finding the Context Boundary

If we are creating a system that mirrors a developer’s experience, then my first thought is to begin by attempting to define the context window that a developer must keep in mind when reading a section of the codebase to understand it.

As a codebase scales, there’s no chance a developer can keep the entire codebase in their head when making a change–instead they compartmentalise chunks of the codebase and hide complexity behind mental abstractions.

At any one point there is a certain amount of code a developer must store in their quick-access memory–I have decided to call this a “bounded context” and define it like so:

📖

Bounded Context: the smallest region of the codebase within which a developer must maintain a mental model to confidently modify or use that code.

A codebase can be either highly-coupled (fewer, larger bounded contexts) or highly-factored (many smaller bounded contexts).

Bigger bounded contexts mean more cognitive overhead to understand a single piece of code.

Before we can even consider quantifying any of the readability heuristics, we must first find the scope within which to measure them–I don’t believe this is a trivial task.

Let’s take some basic examples in TypeScript (although this principle is language agnostic).

Basic Examples of Bounded Context

A Function


// a pure function
function increment(count) {
  return count + 1;
}

Here, a pure function is the bounded context. For the developer to understand the code, they don’t need to look anywhere outside the function declaration–everything they need to keep in their head is here.

Bounded contexts stop at function calls–they do not leak into child function calls (so long as those function calls do not mutate any data). For example, in the following:


function login(username, password) {
    if (areValidUserCredentials(username, password)) {
        return initialiseSession(username);
    }

    throw new Error('Invalid user credentials');
}

function areValidUserCredentials(username, password) {
    const user = getUserByUsername(username);
    return password === decrypt(user.passwordHash);
}

We have two bounded contexts, one for each function. This is because, in order to understand what the code is doing in login, a developer doesn’t need to read the code inside areValidUserCredentials. Sure, sometimes function names lie to us, but naming is hard and I don’t think static analysis has any business measuring the subjective quality of variable names. Ultimately, I want to reward these kinds of abstractions, which put a name to functionality and hide the implementation details of different levels of abstraction, so I’m saying that the boundary stops there.

A Source File


// sourceFile.js
let globalVar = "Hello World";
console.log(globalVar)

Nothing is exported, therefore the scope is bounded by the source file.

If I add more code, it gets added to the context–even if that additional code “has nothing to do with” our globalVar. Let’s clarify why:


// sourceFile.js
let globalVar = "Hello World";
console.log(globalVar)

/* Lots of unrelated code
	...
	...
*/

globalVar = "Mutated Hello World"; // ❌ developers need to scan everything to find this

Even though a long script file may contain several “unrelated” parts of the program, the bounded context is still the whole file, because nothing stops any of that code from mutating global variables. The developer can’t be sure that their section is isolated until they have read all of the code and confirmed that no side effects mutate globalVar. (Once they have checked which sections of the script affect globalVar, they might mentally drop the irrelevant lines from that bounded context, but they still had to do the scanning.)

A Class


class ExampleClass {
  private value = 42;

  doSomething() {
    // References 'this.value' => belongs to the class's scope
    this.value++;

    // A nested arrow function referencing 'this.value'
    const nested = () => {
      // Because it reads 'this.value', it merges with doSomething() as well
      return this.value * 2;
    };

    // Another nested arrow function (pure) doesn't reference outer scope
    const pureNested = (x: number) => x + 10;
    

    return { nested, pureNested };
  }
  
  doSomethingElse() {
    // Reads 'this.value', so it shares the class's context
    ExampleClass.pureFunction(String(this.value)); // bounded context ends here
  }

  // ==================
  // BOUNDED CONTEXT 2
  static pureFunction(msg: string) {
    console.log(msg);
  }
  // BOUNDED CONTEXT 2
  // ==================
}

Here we see another container for a context boundary. Classes provide encapsulation of variables, so we can say that a developer could understand just a single class in order to confidently make a change to the code.

In the above example, I see two bounded contexts: the first is doSomething + doSomethingElse + value within the class, and the second is the static pureFunction.

The first context covers the whole class (minus pureFunction) because doSomething mutates value and doSomethingElse reads it, so both depend on the same shared state.

pureFunction is its own bounded context because I can confidently make a change to the code (within the scope of pureFunction) just by understanding the code within the function itself.

This hints at what is defining these bounded contexts: the mutation of state.

Contaminating Bounded Contexts

The above seems to illustrate a rule of bounded contexts–shared mutable state contaminates a bounded context. Once two bounded contexts share state, they merge. Consider:


function login(username, password) {

    function areValidUserCredentials() {
        // INNER FUNCTION IS IMPURE: it captures username and password
        const user = getUserByUsername(username);
        return password === decrypt(user.passwordHash);
    }

    if (areValidUserCredentials()) {
        return initialiseSession(username);
    }

    throw new Error('Invalid user credentials');
}

There is only one bounded context here, as username and password are captured within the areValidUserCredentials closure. Closures can mutate the variables they capture, so a developer must understand the closure to confidently make a change to the code. By turning the inner function into a pure function that takes username and password as parameters, we create two smaller bounded contexts.
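To make the closure point concrete, here is a minimal sketch contrasting a closure that writes to a captured variable with a pure function that receives its input explicitly. The names countWithClosure, recordAttempt, and countWithPureFunction are hypothetical and exist only for this illustration.

```typescript
// The closure captures `attempts` and mutates it, so `countWithClosure`
// cannot be understood without reading `record`: one bounded context.
function countWithClosure(): number {
  let attempts = 0;
  const record = (): void => {
    attempts += 1; // hidden write to the outer variable
  };
  record();
  record();
  return attempts;
}

// The pure version receives its input and returns a new value, so it can
// sit in its own bounded context.
function recordAttempt(attempts: number): number {
  return attempts + 1;
}

function countWithPureFunction(): number {
  let attempts = 0;
  attempts = recordAttempt(attempts);
  attempts = recordAttempt(attempts);
  return attempts;
}

console.log(countWithClosure(), countWithPureFunction()); // 2 2
```

Both versions compute the same result. The difference is that every write in the second version is visible at the call site as an assignment, so the reader never has to open recordAttempt to know what changed.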

Files can contaminate one another by exporting variables.


// file-a.js
export let COUNT = 0

function increment() {
	COUNT++
}

// nasty code here
setTimeout(() => { COUNT++ }, 30000)


// file-b.js
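import { COUNT } from './file-a'

// COUNT is a live, read-only binding here: `COUNT++` would fail,
// but file-a can still change the value underneath us

// must read file-a to understand the unexpected result
setTimeout(() => { console.log(COUNT) }, 50000);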

Here we have a single bounded context–file-a leaks into file-b by exporting some mutable state.

Another example of nested contamination:


let COUNT = 0

// BY CALLING f1, WE EVENTUALLY CALL f3, WHICH IS CONTAMINATED BY THE GLOBAL SCOPE
function f1 () {
	// ...
	f2()
}

function f2 () {
	// ...
	f3()
}

function f3 () {
	COUNT++
}

As f3 is in the file’s bounded context, so is f2 by proxy, and in turn f1. This coupling is hidden, but the developer really needs to understand it to confidently make a change to f1.
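This "by proxy" reasoning is a reachability question over the call graph. Here is a rough sketch, assuming a tool has already extracted who calls whom and which functions write to outer state directly. The CallGraph type, the findContaminated function, and the extra helper entry are all hypothetical.

```typescript
// Hypothetical inputs a tool might extract: who calls whom, and which
// functions touch outer mutable state directly.
type CallGraph = Record<string, string[]>;

function findContaminated(calls: CallGraph, directMutators: string[]): Set<string> {
  const contaminated = new Set<string>(directMutators);
  let changed = true;
  // Repeat until a full pass adds nothing new (a simple fixed point).
  while (changed) {
    changed = false;
    for (const [caller, callees] of Object.entries(calls)) {
      if (contaminated.has(caller)) continue;
      if (callees.some((callee) => contaminated.has(callee))) {
        contaminated.add(caller);
        changed = true;
      }
    }
  }
  return contaminated;
}

const calls: CallGraph = {
  f1: ["f2"],
  f2: ["f3"],
  f3: [],
  helper: [], // hypothetical function that calls nothing impure
};

const result = findContaminated(calls, ["f3"]);
console.log(Array.from(result).sort()); // f1, f2 and f3; helper stays isolated
```

The fixed-point loop is the simplest approach that works. A worklist over reverse edges would scale better on a large codebase.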

An example of contamination through parameter mutation:


let STATE = { count: 0 }

function f1 (state) {
	// ...
	state.count += 1 // ❌ MUTATION: THIS CONTAMINATES THE CONTEXT
}

f1(STATE)

This only applies when passing objects (including arrays), because the parameter holds a reference to the same object the caller holds. The inner function can therefore mutate the caller’s state, meaning this context is no longer isolated and should be attached to that of the calling code.
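The difference between objects and primitives can be checked directly. In this small sketch, the State type and all three function names are hypothetical.

```typescript
type State = { count: number };

// Mutating version: writes through the shared reference.
function incrementInPlace(state: State): void {
  state.count += 1;
}

// Non-mutating version: copies, leaving the caller's object untouched.
function incremented(state: State): State {
  return { ...state, count: state.count + 1 };
}

// Primitives are copied on the way in, so reassigning the parameter
// is invisible to the caller.
function incrementNumber(count: number): number {
  count += 1;
  return count;
}

const shared: State = { count: 0 };
incrementInPlace(shared);
console.log(shared.count); // 1: the caller's object changed

const original: State = { count: 0 };
const next = incremented(original);
console.log(original.count, next.count); // 0 1

const n = 0;
incrementNumber(n);
console.log(n); // 0: the caller's number is unchanged
```

Only incrementInPlace contaminates its caller. The other two can be read in isolation, because everything they change comes back through the return value.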

Possible Bounded Context Containers in TypeScript

Source File: Any code placed at the top level of a .ts file can be considered one “container,” especially if it declares and owns top-level variables or logic.


// sourceFile.ts

console.log("Top-level code here.");

// Perhaps some variables or imports, exported functions, etc.
const globalVar = 42;

function topLevelFunction() {
  // ...
}

Arrow Function: A concise function form often assigned to a variable or used inline.


const greet = (name: string): string => {
  return `Hello, ${name}`;
};

Function Expression: Similar to an arrow function but uses the classic function keyword. Can be anonymous or named.


const greet = function(name: string): string {
  return `Hello, ${name}`;
};

Function Declaration: A named function defined at any scope (usually top-level or nested).


function greet(name: string) {
  return `Hello, ${name}`;
}

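Object Literal Methods: Functions defined as properties of an object literal, using method shorthand, a function expression, or an arrow function.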


const obj = {
  foo() { /* ... */ },
  bar: function() { /* ... */ },
  baz: () => { /* ... */ }
};

Class Declaration: Represents a blueprint for objects; can own state (this.value) and methods.


class ExampleClass {
  // class-level fields
  private value = 42;

  methodName() {
    // ...
  }

  constructor() {
    // ...
  }
}

Method Declaration (typically static): A method can be its own container if it is pure–otherwise it defaults back to the class declaration context container.


class ExampleClass {

  static doSomething() {
    // ...
  }
}

Module Declaration (namespaces): A legacy TS style (“internal modules”) – can contain variables, functions, classes. Often acts as a single container.


namespace MyNamespace {
  export const x = 123;

  export function doStuff() {
    return x + 1;
  }
}

Static Blocks (Class static initialisation blocks): TS 4.4+ supports static { /* ... */ } in class declarations, which might hold logic and references.


class WithStaticBlock {
  static counter = 0;

  static {
    // Runs once on class loading
    this.counter = 100;
  }
}

I have omitted constructor declarations and get/set accessors, as these so often default back to the enclosing class as their bounded context (so much so that we can probably assume these containers do not form an isolated bounded context).

I have also omitted try/catch blocks, although these could be candidates so long as they only call pure functions:


let outerGlobalVar = 0; // not referenced in try/catch
//...
try {
    // this is similar to a pure function
    let count = 0;
    increment(count);
    increment(count);
} catch {
    // another potential bounded context
}

//...

In Practice, an Algorithm to Identify Bounded Contexts

1. Start with the TypeScript AST
2. Identify all the possible (nested) containers for a bounded context
3. Start at the child nodes of the containers (i.e. highest depth)
4. For each child of the container (line of code), find the “ownership” of each inner statement (i.e. looking for variable mutation)
5. If a child is owned by an outer container, eliminate the child and attach it to the highest shared node of ownership
6. Work upwards through the nodes from highest-depth first, repeating steps 4 and 5 until no more eliminations of candidate bounded contexts are possible

For example:


function login(username, password) {

    function decrypt(pass) {
        return lib.decrypt(pass);
    }

    function areValidUserCredentials() {
        const user = getUserByUsername(username);
        return password === decrypt(user.passwordHash);
    }

    if (areValidUserCredentials()) {
        return initialiseSession(username);
    }

    throw new Error('Invalid user credentials');
}

The AST (just for containers) may look something like this: login at the root, with decrypt and areValidUserCredentials as its two child containers.

We have identified the three possible bounded contexts.

We start at the deepest children, which are the two inner functions (in either order).

For areValidUserCredentials, we look through each statement and see that, in the first one, username is owned by login.

We eliminate areValidUserCredentials as a possible bounded context (after seeing no other ownership higher in the AST than login), and its context merges with that of login.

Repeating this for decrypt, which is a pure function, we see only local state, so it can remain a bounded context.

This results in two bounded contexts: login (with areValidUserCredentials merged in) and decrypt.

This might be represented like so:


{
	"function-login-<HASH>": {
		"context-declaration-file-name": "a/b/c.ts",
		"context-declaration-line": 5,
		"context-source-locations": [ { "file": "...", "line": 0 }, ...],
		"node": {...},
	},
	"function-decrypt-<HASH>": {
		"context-declaration-file-name": "a/b/c.ts",
		"context-declaration-line": 10,
		"context-source-locations": [...],
		"node": {...},
	}
}
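The elimination steps described above can be sketched on a simplified model of the container tree. This is not a real implementation: the Container shape and the findBoundedContexts function are hypothetical, ownership is assumed to be precomputed, and container names are assumed to be unique. A real tool would derive all of this from the TypeScript AST. The sketch collects escaping references bottom-up, then resolves each container to its final context top-down.

```typescript
// Simplified stand-in for an AST container node (hypothetical shape).
interface Container {
  name: string;
  // Names of enclosing containers whose variables this body reads or writes.
  ownersReferenced: string[];
  children: Container[];
}

// Returns a map from each surviving bounded context to the containers merged into it.
function findBoundedContexts(root: Container): Map<string, string[]> {
  const escaping = new Map<string, string[]>();

  // Bottom-up pass: collect the outer owners referenced anywhere in each subtree.
  function collect(node: Container): string[] {
    const owners = new Set<string>(node.ownersReferenced);
    for (const child of node.children) {
      for (const owner of collect(child)) owners.add(owner);
    }
    owners.delete(node.name); // references to the node's own variables stay inside it
    const outer = Array.from(owners);
    escaping.set(node.name, outer);
    return outer;
  }
  collect(root);

  // Top-down pass: a container with escaping references is eliminated and
  // attached to the context of its outermost referenced owner.
  const contextOf = new Map<string, string>();
  const contexts = new Map<string, string[]>();
  function resolve(node: Container, ancestors: string[]): void {
    const owners = escaping.get(node.name) ?? [];
    const outermost = ancestors.find((ancestor) => owners.includes(ancestor));
    const context =
      outermost === undefined ? node.name : (contextOf.get(outermost) ?? outermost);
    contextOf.set(node.name, context);
    contexts.set(context, [...(contexts.get(context) ?? []), node.name]);
    for (const child of node.children) resolve(child, [...ancestors, node.name]);
  }
  resolve(root, []);
  return contexts;
}

// The login example: areValidUserCredentials captures login's parameters.
const login: Container = {
  name: "login",
  ownersReferenced: [],
  children: [
    { name: "decrypt", ownersReferenced: [], children: [] },
    { name: "areValidUserCredentials", ownersReferenced: ["login"], children: [] },
  ],
};

const boundedContexts = findBoundedContexts(login);
console.log(boundedContexts.size); // 2
console.log(boundedContexts.get("login")); // login and areValidUserCredentials
console.log(boundedContexts.get("decrypt")); // decrypt only
```

Any container sitting between a nested function and the owner it references is merged as well, since its body contains the escaping reference.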

Measuring and Optimising Bounded Context as a Heuristic

This exploration is intended as a first step towards measuring further heuristics within the bounded contexts of a program; however, defining and analysing these bounded contexts seems beneficial in itself. Reducing the average size of bounded contexts in a codebase, along with the number of contamination points, is a good start even before we measure the readability heuristics within a single context.
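As a rough illustration of what such a measurement could look like, the sketch below aggregates a list of already-identified contexts. The ContextSummary shape, its field names, and the sample numbers are all assumptions made for this example, and these particular metrics have not been validated as readability measures.

```typescript
// Hypothetical per-context summary (field names are assumptions).
interface ContextSummary {
  id: string;
  lineCount: number; // lines of code the reader must hold in mind
  contaminationPoints: number; // writes to state shared with other containers
}

function summarise(contexts: ContextSummary[]) {
  const totalLines = contexts.reduce((sum, context) => sum + context.lineCount, 0);
  const totalContamination = contexts.reduce(
    (sum, context) => sum + context.contaminationPoints,
    0,
  );
  return {
    contextCount: contexts.length,
    averageLines: contexts.length === 0 ? 0 : totalLines / contexts.length,
    largestContextLines: Math.max(0, ...contexts.map((context) => context.lineCount)),
    totalContamination,
  };
}

// Made-up numbers for the login example.
const report = summarise([
  { id: "function-login", lineCount: 12, contaminationPoints: 1 },
  { id: "function-decrypt", lineCount: 3, contaminationPoints: 0 },
]);
console.log(report.averageLines); // 7.5
```

Tracking the largest context alongside the average matters because one oversized context can hide behind many small ones.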

Smaller contexts mean less scanning of the code before you can be confident in your change. They also mean that we have more abstractions (i.e. function calls) which label the intent and construct a mental barrier for the reader, beyond which they don’t need to consider the implementation details.

The disadvantage of encouraging the extraction of functions and static methods is that misleading names can lead to an incorrect mental model of the code and the introduction of bugs.