Knowledge Factories: Why AI Strategy Is Organisation Design
It was easy, in 2023, to imagine how generative AI would change the workplace.
With GPT-3.5 newly released, those who were paying attention watched text stream into their chat windows: coherent responses to almost anything they could imagine. It wasn’t perfect, but it was tempting to think that with a little more engineering and smarter models, we could integrate this into our work systems and accelerate our output.
Three years later, the models have got smarter and AI permeates the workplace, yet the rapid generation of coherent text has not translated directly into enterprise-level productivity gains. In 2024, Microsoft reported that 75% of knowledge workers used generative AI, with 78% of those users bringing their own AI tools to work. In McKinsey’s The state of AI in 2025, 88% of respondents said AI was used in at least one function within their organisation, and yet most reported no EBIT impact at the enterprise level.
With 60% of leaders worried that their organisation lacks a plan and vision, what strategies must leaders adopt to capture the value of generative AI?
Optimise the System, Not the Station
The reason AI usage looks impressive in a demo but underwhelming at scale is that text generation – the thing LLMs speed up most visibly – is rarely the system-level bottleneck.
Faster text generation can be useful, but those gains are confined to a single station. Value is only captured if they help the organisation deliver more useful outcomes to customers more quickly.
Value capture is the result of two things: getting the right output at each station and keeping it moving steadily through the system.
This is because work is not a collection of isolated tasks; it flows through the organisation as part of a production system.
A production system is an arrangement of people, procedures, tools, and input materials that transform demand into delivered goods or services. Production systems are not unique to physical goods. They span all domains, from software to sales to customer support to compliance.
In software, a customer need becomes a feature request, then a specification, then a ticket, then code, then a reviewed change, then a program running in production. In sales, a lead becomes a conversation, then a proposal, then a negotiation, then a contract, then an account handover. In customer support, a problem becomes a ticket, then an investigation, then a response, then perhaps escalation, documentation, or product feedback.
In each case, the structure is the same, even if the details differ: pulled by some form of demand, a piece of work moves through the system where at each station, an actor receives an input, transforms it, and passes an output downstream.
A station is defined by its handover points: it begins when one worker receives work from upstream and ends when that worker passes the transformed work downstream to another worker.
Well-integrated AI can improve station-level productivity: a station can now draft faster, summarise faster, generate code faster, or produce analysis faster. But organisations capture value through system-level productivity: the speed and reliability with which the whole piece of work flows from demand to delivered value.
The system's overall productivity is determined by the flow of work, not by the speed at which individual stations complete their parts. The system can only produce as quickly as its constraints allow.
Faster Stations, Stagnant Systems
If every station in a system gets faster, shouldn’t the whole system get faster too?
This only holds if the speed of those individual stations is what constrains the system.
If work waits between stations, faster processing at one station only creates more waiting, not more value. If downstream validation rejects the output, faster production overwhelms the system with rework.
This is the difference between processing time and lead time. Processing time is the time spent actively transforming the work. Lead time is the time from demand to delivered value. AI can reduce processing time; however, organisations only increase productivity through reduced lead time.
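A toy calculation makes the distinction concrete. The numbers below are invented for illustration, but the shape is typical of multi-station knowledge work: most of the lead time is queueing between stations, so even halving every station’s processing time barely moves the total.

```python
# Illustrative only: invented numbers for a four-station flow.
# Each station has active processing time and upstream queue time, in hours.
stations = [
    {"name": "spec",    "processing": 2.0, "wait": 16.0},
    {"name": "build",   "processing": 6.0, "wait": 24.0},
    {"name": "review",  "processing": 1.0, "wait": 40.0},
    {"name": "release", "processing": 0.5, "wait": 8.0},
]

processing_time = sum(s["processing"] for s in stations)  # 9.5 hours of actual work
waiting_time = sum(s["wait"] for s in stations)           # 88 hours of queueing
lead_time = processing_time + waiting_time                # 97.5 hours end to end

# Suppose AI halves every station's processing time:
new_lead_time = processing_time / 2 + waiting_time        # 92.75 hours

print(f"lead time: {lead_time}h -> {new_lead_time}h")
# Lead time falls by about 5%, because roughly 90% of it was waiting.
```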
This helps explain why AI can feel transformative in personal work while much less so inside a large organisation.
Take the example of a developer using AI on a personal project. That same person is the customer, product owner, designer, developer, reviewer, tester, and release manager. The system consists of a single station with no handover points. When an AI coding tool helps that developer write code faster, the project itself moves faster because station-level productivity and system-level productivity are equivalent.
That pattern breaks when more stations are added to the system, such as in a large organisation. A developer in a company does not usually own the whole flow from customer demand to production value. Requirements move from customer to product to design to engineering. Code moves through review, testing, security, release, etc. Each handover point introduces the possibility of waiting, misunderstanding, rejection, rework, or changed priorities.
This is why two things can be true at the same time. Individual workers can report that AI makes them much more productive, and organisations can still struggle to show equivalent enterprise-level gains.
The station can be faster while the system remains slow.
Once you see this distinction, the AI productivity conundrum resolves. Most AI adoption has accelerated stations, whereas the missing value is at the system level.
AI Has Joined the Production Line
AI only generates plausible text; it does not automatically ground itself in your business reality, understand your customer, know your organisation’s standards, or align with your interests. When LLMs fail to produce output right the first time, workers waste time reviewing, rewriting, and reformatting. Worse than lost time, unnoticed mistakes can ship to the customer and cause real harm.
Consequently, the industry has spent recent years focused on the essential problem of station-level quality.
Firstly, the underlying models have improved in several ways. We have trained larger models that produce more acceptable text under a wider variety of conditions. Larger context windows allow models to work with more surrounding information, and reasoning capabilities have improved performance on tasks that require multi-step thought. Advancements have also led to refined smaller models that can run closer to the user, including on-device in some cases, broadening access to LLM capability.
Secondly, we have become more AI-literate, both as individuals and organisations. We have learned that providing the right context and a precise prompt gives significantly better results. We have developed web search and retrieval tools that can provide up-to-date context, which the LLM can interpret to reduce hallucinations and make outputs easier to verify.
But these improvements still work best when the task is narrow.
An LLM performs much better when it is given clear context, a defined output format, and a specific standard to meet. It performs much worse when asked to complete a broad, ambiguous workflow in one step. OpenAI’s guidance on building agents reflects this: agents should be designed around clear instructions, tools, guardrails, and human intervention where needed, rather than treated as vague autonomous magic.
This should not surprise us. Humans also struggle more with broad, ambiguous tasks compared to narrow, well-defined ones. Even the most skilled workers, who can operate in ambiguity, tend to succeed by using frameworks to split the work into smaller, more achievable tasks.
A vague instruction like “sort out the client proposal” creates ambiguity, rework, and delay. A clearer task like “turn this discovery note into a two-page proposal using this template and these pricing assumptions” is much easier to execute and review for both LLMs and humans.
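As a sketch of that difference, here is the same request specified both ways. The template and placeholder names below are illustrative, not a prescription.

```python
# The same request, specified two ways. Placeholder names are hypothetical.

vague_task = "Sort out the client proposal."

precise_task = """
Turn the discovery note below into a two-page proposal.

Context:
{discovery_note}

Output format:
- Page 1: problem summary and proposed approach (max 400 words)
- Page 2: scope, timeline, and a pricing table using the assumptions below

Pricing assumptions:
{pricing_assumptions}

Standard to meet:
- Use the section headings from the attached template
- Make no commitments beyond the stated scope
- Flag missing information as open questions rather than guessing
"""

# The precise version gives the actor (LLM or colleague) clear context,
# a defined output format, and an explicit standard to review against.
```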
We have found that, with a leading model, accurate context, and a good system prompt, an LLM can generate high-quality text that meets the standards set by a domain-expert operator at a station.
As station-level output quality has improved, attention has shifted towards integration. The hypothesis is that once model output is good enough, productivity is limited by the manual duct-taping of output from a chat interface into the system where the work actually lives. If we integrate AI directly into our tools, we can remove this manual step and capture more value.
Integration is valuable as it removes waste during handover by connecting actors to tools.
In software, this shift is easy to see. The first pattern was to ask ChatGPT for code in a browser, then copy the answer into an integrated development environment (IDE) by hand. The model was useful, but the handover cost was obvious: text was produced in one place, while the real work happened elsewhere.
Tools like Cursor moved AI into the IDE where the LLM could inspect the codebase, understand local context, suggest changes that fit the current design, and write across multiple files. MCP and similar integration patterns push in the same direction: they make it easier for models to connect to tools and systems rather than living in a detached chat window. Agent harnesses go a step further by wrapping the model in a controlled environment with tools, instructions, memory, checks, and permissions to act more reliably inside a workflow.
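To make the harness pattern concrete, here is a minimal sketch. Every name in it (Action, TOOLS, call_model) is a hypothetical stand-in rather than any vendor’s API; the point is the structure: instructions, tools, a permission guardrail, memory of past steps, and a step budget with human escalation.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Action:
    type: str                     # "tool" or "finish"
    tool: str = ""
    args: dict = field(default_factory=dict)
    output: str = ""

TOOLS: dict[str, Callable[..., str]] = {
    "read_file": lambda path: f"<contents of {path}>",  # stub tool
}
ALLOWED = {"read_file"}                                 # permission guardrail

def call_model(history: list[dict]) -> Action:
    # Stub: a real harness would call an LLM here and parse its reply.
    return Action(type="finish", output="done")

def run_agent(instructions: str, task: str, max_steps: int = 20) -> str:
    history = [{"role": "system", "content": instructions},
               {"role": "user", "content": task}]
    for _ in range(max_steps):
        action = call_model(history)
        if action.type == "finish":
            return action.output
        if action.tool not in ALLOWED:                  # check before acting
            result = f"Refused: {action.tool} is not permitted."
        else:
            result = TOOLS[action.tool](**action.args)  # act inside the harness
        history.append({"role": "tool", "content": result})  # remember the step
    return "Stopped: step budget exhausted; escalate to a human."
```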
As AI integration increases, it becomes less like a tool beside the work and more like an actor inside the system.
Not an actor in a human or moral sense, but operationally. Something is an actor in the system when work can be delegated to it, it transforms inputs into outputs, and its output changes what happens downstream.
A word processor is a tool used to write. A calculator is a tool used to calculate. But an LLM drafting a client email is taking on part of the operation: structure, phrasing, tone, and content. A coding agent modifying files is even more clearly an actor because its output is used directly downstream.
As AI becomes more integrated and more capable of producing usable outputs, it matters less and less for productivity whether the actor at a station is human or AI. What matters is that the station receives the right input, performs the right transformation, and produces output that the next station can use.
That is the point at which our AI strategy becomes organisation design.
Don’t Build Smart Agents and Get Out of the Way
Once AI has entered the production system, the inevitable question is: with potentially unlimited access to capable actors, how should we leverage them?
The naive answer is to add more workers to the system. If one agent is useful, use ten; if ten are useful, use a hundred. Perhaps we flood the organisation with capable workers and wait for productivity to rise.
But organisations do not scale just by adding workers. They scale when the work is designed so that more actors create more value, not more coordination overhead.
Cursor’s January 2026 experiment is useful because it makes this coordination problem visible in a concentrated form. Cursor sought to understand how far they could push autonomous coding by running hundreds of concurrent agents to build a web browser from scratch. Their early approach was exactly this: flood the system with capable agents, giving them equal status and letting them self-coordinate through a shared file. Each agent would check what others were doing, claim a task, and update its status. To avoid two agents taking the same task, Cursor used a locking mechanism.
This failed in familiar ways.
Agents held locks for too long or forgot to release them, and the lock itself became a bottleneck. Cursor reported that “twenty agents would slow down to the effective throughput of two or three”, with most time spent waiting. Even when the locking worked correctly, the system was brittle.
More interestingly, the agents became risk-averse. With no hierarchy, they avoided difficult tasks and made small, safe changes instead. No agent took responsibility for the harder end-to-end problems.
The lesson here is not about the limits of coding agents, but about what it takes to scale an organisation of workers.
Cursor improved their system through organisational change, by separating agents by role into planners and workers. Planners explored the codebase and created tasks. Workers focused on completing assigned tasks. A judge agent decided whether to continue each cycle. In other words, they introduced role separation, clearer ownership, and feedback loops.
They were designing an organisation.
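To see the structural difference, here is an illustrative reconstruction of the role-separated loop. It is not Cursor’s actual code: plan_tasks, do_task, and should_continue are stubs standing in for planner, worker, and judge agents. Note that no locking is needed, because the planner owns task creation and workers never contend over who does what.

```python
import queue

def plan_tasks(goal: str) -> list[str]:
    # Planner: explore the current state and break the goal into tasks.
    return [f"{goal}: step {i}" for i in range(3)]   # stub

def do_task(task: str) -> str:
    # Worker: complete one assigned task; no peer coordination required.
    return f"done({task})"                           # stub

def should_continue(results: list[str]) -> bool:
    # Judge: decide whether another plan/work cycle is worthwhile.
    return False                                     # stub: stop after one cycle

def run_factory(goal: str) -> list[str]:
    results: list[str] = []
    while True:
        tasks: queue.SimpleQueue = queue.SimpleQueue()
        for t in plan_tasks(goal):     # only the planner creates tasks,
            tasks.put(t)               # so there is nothing to lock
        while not tasks.empty():
            results.append(do_task(tasks.get()))  # workers pull and execute
        if not should_continue(results):
            return results
```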
Even allowing for the task’s complexity, the results were underwhelming given the number of agents and the duration of the run. The issue was not the underlying model, but the coordination structure.
AI agents are not uniquely hard to coordinate – any group of actors, human or AI, becomes hard to coordinate as the work becomes larger, more interdependent, and longer-running. That is, as the production system gains more stations.
The philosophy of “build smart agents and get out of the way” does not scale – it assumes that the limiting factor is individual model performance. Even an organisation of the smartest individuals must be coordinated to produce value at scale. We cannot assume that intelligent organisational design will emerge simply by hiring smarter people.
So what’s the alternative?
Knowledge Factories
In earlier periods of industrial change, productivity did not come only from better tools. It came from redesigning the way work was organised. Physical crafts that had once been performed end-to-end by skilled artisans were broken into smaller repeatable tasks. Those tasks could be systematised and sequenced into a production flow.
That transition had many costs and tensions, and the analogy should not be stretched too far. Knowledge work is not identical to physical manufacturing, but the parallel is still important: complex work has inputs, transformations, standards, and handovers, just as physical production systems do.
New technology enabled the automation of some physical labour, but the industry still had to learn how to organise the work to capture the value. A machine standing alone did not create a modern production system. Stations, quality standards, and handover points needed to be introduced to allow work to flow through the system.
Over time, industrial production learned that productivity is not just about the capability of each station. The Toyota Production System (TPS), shaped by Toyota’s postwar production experiments and the work of Taiichi Ohno, captures many of these lessons in a particularly disciplined form, built on two core pillars. The first is Jidoka, often translated as automation with a human touch: quality is built into the process rather than inspected only at the end. The second is Just-in-Time: each process produces exactly what is needed, exactly when it is requested.
The TPS makes this relationship explicit: value capture depends on both the quality produced at each station and the flow of work through the system.
Quality must be created at every station. If defects are allowed to move downstream, this creates rework, waiting, and interruption later in the process.
Flow has to be designed between stations. If work piles up as unfinished inventory, waits for review, or moves in large batches, lead time grows even when individual stations are fast.
In physical manufacturing, this problem might look like parts waiting between machines. In knowledge work, it looks like unfinished tickets, draft documents waiting for review, unmerged pull requests, unresolved comments, half-approved proposals, unanswered questions, and AI outputs waiting for someone to copy, check, or rework.
Much of today’s knowledge work is still treated like craft work. A skilled person takes a broad request, gathers context, makes decisions, writes something, checks it, reformats it, sends it, chases feedback, and carries the work across the organisation by hand.
While not all knowledge work is equally routine, Toyota’s training system (as described in Toyota Talent) claims that the essential components of any skill learned by one person are potentially learnable by most other people, special talent aside. By decomposing complex work into clear, teachable steps, we can design systems that allow both humans and AI to contribute to an effective production system.
This does not mean removing humans from the system, but changing where they operate.
Building a Knowledge Factory Means Redesigning Work
If value is captured at the system level, then the AI strategy must start with the system.
Many AI adoption programmes begin by asking where to add AI. That tends to produce station-level adoption: new tools appear, but the underlying flow of work remains mostly unchanged.
A better starting point is to step back and look at the flow of work and design of the stations. Where is the flow interrupted, and what would have to be true to enable constant, steady movement of value through the system?
Crucially, answering these questions will increase productivity with or without AI – the underlying challenges exist in any system with many actors and stations. These questions should be posed before considering AI at all. By asking them, you lay the groundwork for your organisation to scale, which in turn enables the effective introduction of AI.
This means mapping the production system as it exists today. Where does demand enter? What are the stations? What does each receive? What must each produce? Where are the handover points? What standard must the output meet before it moves downstream? Where does work wait? Where is rework created? Where do humans currently act as the integration layer between disconnected tools? This process is known as Value Stream Mapping.
We must start at the end of the flow, understanding what the customer actually needs, then design the flow backwards from that demand. From customer demand we can then derive “internal customers”: each station becomes the customer of the station upstream, defining the exact quality of input it needs in order to supply its own customer downstream.
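A value-stream map needs no special tooling to start; even a plain data structure forces the right questions. The sketch below is illustrative: the stations, field names, and numbers are invented, and formal Value Stream Mapping uses richer notation.

```python
# A value stream as data: stations in flow order, each with what it
# receives, what it must produce, its standard, and where time goes.
value_stream = [
    {"station": "intake",  "receives": "customer request", "produces": "qualified ticket",
     "standard": "problem, impact, and acceptance criteria filled in",
     "avg_wait_h": 12, "avg_process_h": 1, "rework_rate": 0.10},
    {"station": "build",   "receives": "qualified ticket", "produces": "reviewed change",
     "standard": "tests pass and the change fits the design",
     "avg_wait_h": 30, "avg_process_h": 8, "rework_rate": 0.25},
    {"station": "release", "receives": "reviewed change", "produces": "running feature",
     "standard": "deployed and verified in production",
     "avg_wait_h": 6, "avg_process_h": 1, "rework_rate": 0.05},
]

# Where does work wait longest, and where is rework created?
worst_wait = max(value_stream, key=lambda s: s["avg_wait_h"])
worst_rework = max(value_stream, key=lambda s: s["rework_rate"])
print(worst_wait["station"], worst_rework["station"])  # build, build
```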
Once the flow is visible, AI can be placed more intentionally. Some existing stations may be good candidates for high autonomy because the task is narrow, inputs are clear, output is easy to validate, and the cost of error is low. Other stations may need human review because the decision is high-risk, ambiguous, political, or ethical.
At each station, we need a standard: a clear definition of the quality that output must meet before it moves on. If a worker produces output that does not meet the standard, the defect is caught immediately, not five steps later. Defects discovered late are more expensive because they interrupt more people, invalidate more work, and create more rework. Having a standard at each station is a form of shift-left quality.
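Where outputs are text, parts of the standard can often be made executable, so the gate runs at the station itself. A minimal sketch, with invented checks for the proposal example used earlier:

```python
def meets_standard(proposal: str) -> list[str]:
    """Return a list of defects; an empty list means the work may move on."""
    defects = []
    if len(proposal.split()) > 800:
        defects.append("over the two-page length budget")
    if "TODO" in proposal or "[placeholder]" in proposal:
        defects.append("contains unresolved placeholders")
    if "Pricing" not in proposal:
        defects.append("missing pricing section")
    return defects

draft = "TODO: confirm scope with client. Pricing: [placeholder]"
for defect in meets_standard(draft):
    print("caught at this station, not five steps later:", defect)
```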
This is where the organisational learning happens. Every defect is evidence that the production system is not yet good enough. Maybe the input was unclear. Maybe the standard was implicit. Maybe the agent had the wrong tools. Maybe the handover point was badly designed. Maybe the work should not have been automated. Maybe the human review came too late.
Good organisation design is the multiplier on AI capability.
The organisations that capture value from AI will be those that redesign their work such that additional actors create additional value, rather than additional coordination overhead.
The pressure to adopt AI has only exposed existing organisational inefficiencies. The 2025 McKinsey report confirms this: “Redesigning workflows is a key success factor” in AI adoption.
A Place for Humans in the Knowledge Factory
If knowledge work becomes arranged into knowledge factories, the human role changes.
In the industrial revolution, machines automated parts of physical labour, but humans did not vanish – their roles shifted. They came to design, configure, maintain, improve, and manage the systems that produced the work.
Something similar is likely to happen in knowledge work.
Today, many knowledge workers are still producers on the critical path. They receive a broad request, gather context, make judgements, produce the output, move it into the right system, chase feedback, and repair whatever breaks along the way.
As AI becomes more capable at station-level transformations, humans move away from performing every transformation by hand and towards tending the organisational scaling problem itself. That means defining the work clearly enough for agents to perform it.
In software, we already see early signs of this through files like SKILLS.md, which encode the preferences and standards of human domain experts so that agents can produce high-quality output.

Many valuable outputs are not merely correct or incorrect. They are appropriate or inappropriate, clear or confusing, commercially sensible or naive, legally risky or acceptable, aligned or misaligned with the organisation’s intent. Those standards often live tacitly in experienced people. In a knowledge factory, part of the human role is to make those standards explicit enough that AI can work with them.
Humans will also design the boundaries of autonomy. Some stations can run with little intervention. Others should require review. Some should stop and ask for help when uncertainty is high. Some tasks are too hazardous to delegate. The point is not to maximise autonomy everywhere, but to design the production system so that human judgment appears where it creates the most value.
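One way to hold those boundaries is to treat autonomy as explicit configuration rather than ad-hoc judgement. A small sketch, with hypothetical stations and policies:

```python
# Hypothetical stations mapped to autonomy policies.
AUTONOMY = {
    "summarise_support_ticket": "autonomous",    # narrow, low-risk, easy to validate
    "draft_client_proposal":    "human_review",  # commercially sensitive
    "change_pricing_terms":     "human_only",    # too hazardous to delegate
}

def route(station: str, output: str) -> str:
    policy = AUTONOMY.get(station, "human_review")  # default to review, not autonomy
    if policy == "autonomous":
        return f"pass downstream: {output}"
    if policy == "human_review":
        return f"queue for human review: {output}"
    return "stop: this station is not delegated to AI"
```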
When defects occur, humans become problem solvers for the system. The bad response, broken test, poor proposal, unsafe recommendation, or rejected output should trigger the question, “What in the system allowed this defect to occur, and how do we prevent it from recurring?”
In a system like this, we can more readily scale the number of workers to produce more value, rather than more overhead.
That is why the strategy we need for AI is really a strategy for scaling organisations.
Better models improve station-level productivity. Better integration brings AI into the production line. But only better organisation design unlocks system-level productivity.
The companies that capture the most value from AI will not be the ones that deploy the most agents or create the smoothest integrations with their tools. They will be the ones who understand that faster stations do not automatically create faster systems, and use this insight to get the right output at each station and keep it moving steadily through the system.