
2025

Imperative vs. Declarative: A Concept That Built an Empire

For self-taught developers and those early in their careers, certain foundational concepts often remain unexamined. You pick up syntax, frameworks, and tools rapidly, but the underlying paradigms that shape how we think about code get overlooked.

These terms, imperative and declarative, sound academic but represent a simple distinction that transforms how you approach problems.

The Two Ways to Think About Problems

Imperative programming tells the computer exactly how to do something, step by step. Like giving someone furniture assembly instructions: "First, attach part A to part B using screw 3. Then insert dowel C into hole D."

Example:

const numbers = [1, 2, 3, 4, 5];

const doubled = [];
for (let i = 0; i < numbers.length; i++) {
  doubled.push(numbers[i] * 2);
}

Declarative programming describes what you want, letting the system figure out how. Like ordering at a restaurant: "I want a pepperoni pizza." You don't explain kneading dough or managing oven temperature.

Example:

const numbers = [1, 2, 3, 4, 5];

const doubled = numbers.map(n => n * 2);

Same result. But notice how the second approach abstracts away the loop management, index tracking, and array building. You declare your intent—"transform each number by doubling it"—and map handles the mechanics.

Why This Matters Beyond Syntax

The power becomes obvious when you scale up complexity. Consider drawing graphics:

Imperative (HTML Canvas):

const ctx = canvas.getContext('2d');
ctx.beginPath();
ctx.rect(50, 50, 100, 75);
ctx.fillStyle = 'red';
ctx.fill();
ctx.closePath();

Declarative (SVG):

<rect x="50" y="50" width="100" height="75" fill="red" />

The imperative version commands each drawing step. The declarative version simply states "there is a red rectangle here" and lets the renderer handle pixel manipulation, memory allocation, and screen updates.

The same pattern appears everywhere in modern development:

Database queries: SQL is declarative. You specify what data you want, not how to scan tables or optimize joins.

SELECT name FROM users WHERE age > 25

Configuration management: Tools like Ansible let you declare desired system states rather than scripting installation steps.

- name: Ensure Apache is installed
  package:
    name: apache2
    state: present
- name: Ensure Apache is running
  service:
    name: apache2
    state: started

Modern JavaScript: Methods like filter, reduce, and find let you declare transformations instead of managing loops.

// Instead of imperative loops
const adults = [];
for (let i = 0; i < users.length; i++) {
  if (users[i].age >= 18) {
    adults.push(users[i]);
  }
}

// Write declarative transformations
const adults = users.filter(user => user.age >= 18);

The Billion-Dollar Story

Now let me tell you how this simple principle reshaped the entire tech industry. In 1999, a buddy of mine, Google employee #21, tried to recruit me from Microsoft. "You probably don't even need to interview," he said, showing me their server racks filled with cheap consumer hardware. While competitors bought expensive fault-tolerant systems, Google was betting everything on commodity machines and a radical programming approach.

Processing the entire web on that hardware with an imperative approach would be a coordination nightmare: send chunk 1 to server 47, chunk 2 to server 134, wait for responses, handle server failures, retry failed chunks, merge partial results... Multiply that across thousands of machines and you get unmanageable complexity.

Instead, Google developed what became known as MapReduce: a declarative paradigm for distributed computing. Engineers could write: "Here's my map function (extract words from web pages). Here's my reduce function (count word frequencies). Process the entire web." This framework handled all the imperative details: data distribution, failure recovery, load balancing, result aggregation. Engineers declared what they wanted computed. The system figured out how to coordinate thousands of servers.
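To see just how declarative the model is, here is a toy, single-machine sketch of the word-count idea using JavaScript's own map and reduce primitives. It is an illustration only: Google's framework ran these same two phases across thousands of machines, and none of the names below are Google's actual API.

// Map phase: every page emits (word, 1) pairs.
const pages = ['the quick brown fox', 'the lazy dog', 'the quick dog'];

const pairs = pages.flatMap(page =>
  page.split(/\s+/).map(word => [word, 1])
);

// Reduce phase: sum the counts for each word.
const wordCounts = pairs.reduce((counts, [word, n]) => {
  counts[word] = (counts[word] || 0) + n;
  return counts;
}, {});

console.log(wordCounts); // { the: 3, quick: 2, brown: 1, fox: 1, lazy: 1, dog: 2 }

The engineer declares the two transformations; where the pages live, which machine sums which words, and what happens when a node dies are all the framework's problem.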

This wasn't just elegant computer science. It was competitive advantage. While competitors struggled with complex distributed systems built on expensive hardware, Google's engineers focused on algorithms and data insights. Their declarative approach to distributed computing let them scale faster and cheaper than anyone thought possible.

What my friend was showing me in 1999, commodity hardware coordinated by smart software that abstracted away distributed complexity, was MapReduce in action, years before the famous 2004 paper. That paper didn't introduce a new concept; it documented the practices that had already powered Google's rise to dominance.

Beyond Obsolescence: The Modest Proposal for LLM-Native Workflow Automation

Our prior analysis, "The Beginning and End of LLM Workflow Software: How MCP Will Obsolesce Workflows," posited that Large Language Models (LLMs), amplified by the Model Context Protocol (MCP), will fundamentally reshape enterprise workflow automation. This follow-up expands on that foundational argument.

The impending shift is not one of outright elimination, but of a profound transformation. Rather than becoming entirely obsolete, the human-centric graphical user interface (GUI) for workflows will largely recede from direct human interaction, as the orchestration of processes evolves to be managed primarily by LLMs.

This critical pivot signifies a change in agency: the primary "user" of workflow capabilities shifts from human to AI. Here, we lay out a modest proposal for a reference architecture that brings this refined vision to life, detailing how LLMs will interact with and harness these next-generation workflow systems.

The Modest Proposal: An LLM-Native Workflow Architecture

Our vision for the future of workflow automation centers on LLMs as the primary orchestrators of processes, with human interaction occurring at a much higher, conversational level. This shifts the complexity away from the human and into the intelligent automation system itself.

MCP Servers: The Secure Hands of the LLM

The foundation of this architecture is the Model Context Protocol (MCP), or similar secure resource access protocols. At Lit.ai, our approach is built on a fundamental philosophy that ensures governance and auditability: any action a user initiates via our platform ultimately executes as that user on the host system. For instance, when a user uploads a file through our web interface, an ls -l command reveals that the file is literally "owned" by that user on disk. Similarly, when they launch a training process, a data build, or any other compute-intensive task, a ps aux command reveals that the process was launched by that user's identity, not a shared service account. This granular control is seamlessly integrated with enterprise identity and access management through Keycloak, enabling features like single sign-on (SSO) and federated security. You can delve deeper into our "Execute as User" principle here: https://docs.lit.ai/reference/philosophy/#execute-as-user-a-foundation-of-trust-and-control.

We've now seamlessly extended this very philosophy to our MCP servers. When launched for LLM interactions, these servers inherit the user's existing permissions and security context, ensuring the LLM's actions are strictly governed by the user's defined access rights. This isn't a speculative new security model for AI; it's an intelligent evolution of established enterprise security practices. All LLM-initiated actions are inherently auditable through existing system logs, guaranteeing accountability and adherence to the principle of least privilege.

The LLM's Workflow Interface: Submerged and Powerful

In this new era, legacy visual workflow software won't vanish entirely; instead, it transforms into sophisticated tools primarily used by the LLM. Consider an LLM's proven ability to generate clean JSON documents from natural language prompts. This is precisely how it will interact with the underlying workflow system.
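To make that concrete, here is a hedged sketch of the kind of JSON workflow definition an LLM might emit from a single natural-language request. The field names and structure are illustrative assumptions, not a published schema:

{
  "workflow": "new-customer-credit-check",
  "trigger": { "event": "customer.signup" },
  "steps": [
    {
      "id": "fetch-score",
      "action": "http.request",
      "params": { "url": "https://credit-api.example.com/score", "method": "POST" }
    },
    {
      "id": "store-score",
      "action": "db.insert",
      "dependsOn": ["fetch-score"],
      "params": { "table": "credit_scores" }
    },
    {
      "id": "notify-risk-team",
      "action": "notification.send",
      "dependsOn": ["store-score"],
      "condition": "steps['fetch-score'].output.score < 600",
      "params": { "channel": "risk-assessment" }
    }
  ]
}

Because the definition is plain data, the LLM can add a property, a step, or an entire branch on the fly without waiting for a vendor to surface it in a GUI.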

This LLM-native interface offers distinct advantages over traditional human GUIs, because it's designed for programmatic interaction, not visual clicks and drags:

  • Unconstrained by Human UIs: The LLM doesn't need to visually parse a flowchart or navigate menus. It interacts directly with the workflow system's deepest configuration layers. This means the workflow tool's capabilities are no longer limited by what a human developer could represent in a GUI. For example, instead of waiting for a vendor to build UI components for a new property or function, the LLM can define and leverage these dynamically. The underlying workflow definition could be a flexible data structure like a JSON document, infinitely extensible on the fly by the LLM.

  • Unrivaled Efficiency: An LLM can interpret and generate the precise underlying code, API calls, or domain-specific language that defines the process. This direct programmatic access is orders of magnitude more efficient than any human-driven clicks and drags. Imagine the difference between writing machine code directly versus meticulously configuring a complex circuit board by hand—the LLM operates at a vastly accelerated conceptual level.

  • Dynamic Adaptation and Reactive Feature Generation: The LLM won't just create workflows from scratch; it will dynamically modify them in real-time. This includes its remarkable ability to write and integrate code changes on the fly to add features to a live workflow, or adapt to unforeseen circumstances. This provides a reactive, agile automation layer that can self-correct and enhance processes as conditions change, all without human intervention in a visual design tool.

  • Autonomous Optimization: Leveraging its analytical capabilities, the LLM could continuously monitor runtime data, identify bottlenecks or inefficiencies within the workflow's execution, and even implement optimizations to the process's internal logic. This moves from human-initiated process improvement to continuous, AI-driven refinement.

This approach creates a powerful separation: humans define what needs to happen through natural language, and the LLM handles how it happens, managing the intricate details of process execution within its own highly efficient, automated interface.

Illustrative Scenarios: Realizing Value with LLM-Native Workflows

Let's look at how this translates into tangible value creation:

Empowering Customer Service with Conversational Data Access

Imagine a customer service representative (CSR) on a call. In a traditional setup, the CSR might navigate a legacy Windows application, click through multiple tabs, copy-paste account numbers, and wait for various system queries to retrieve customer data. This is often clunky, slow, and distracting.

In an LLM-native environment, the CSR simply asks their AI assistant: "What is John Doe's current account balance and recent purchase history for product X?" Behind the scenes, the LLM, via MCP acting as the CSR, seamlessly accesses the CRM, payment system, and order database. It orchestrates the necessary API calls, pulls disparate data, and synthesizes a concise, relevant answer instantly. The entire "workflow" of retrieving, joining, and presenting this data happens invisibly, managed by the LLM, eliminating manual navigation and dramatically improving customer experience.
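Under the hood, that orchestration may amount to a handful of tool calls issued in parallel and merged into one answer. The sketch below is illustrative JavaScript; the mcp client object, tool names, and fields are assumptions standing in for whatever systems the CSR is actually authorized to reach:

// Illustrative only: fan out three backend lookups as the CSR's identity,
// then synthesize a single response for the assistant to phrase.
async function answerCsrQuestion(mcp, customerId) {
  const [account, payments, orders] = await Promise.all([
    mcp.call('crm.getAccount', { customerId }),
    mcp.call('payments.getBalance', { customerId }),
    mcp.call('orders.recent', { customerId, product: 'X' })
  ]);

  return {
    status: account.status,
    balance: payments.balance,
    recentPurchases: orders.items
  };
}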

Accelerating Marketing Campaigns with AI Orchestration

Consider a marketing professional launching a complex, multi-channel campaign. Historically, this might involve using a dedicated marketing automation platform to visually design a workflow: dragging components for email sends, social media posts, ad placements, and follow-up sequences. Each component needs manual configuration, integration setup, and testing.

With an LLM-native approach, the marketing person converses with the AI: "Launch a campaign for our new Q3 product, target customers in segments A and B, include a personalized email sequence, a social media push on LinkedIn and X, and a retargeting ad on Google Ads. If a customer clicks the email link, send a follow-up SMS."

The LLM interprets this narrative. Using its access to marketing platforms via MCP, it dynamically constructs the underlying "workflow"—configuring the email platform, scheduling social posts, setting up ad campaigns, and integrating trigger-based SMS. If the marketing team later says, "Actually, let's add TikTok to that social push," the LLM seamlessly updates the live campaign's internal logic, reacting and adapting in real-time, requiring no manual GUI manipulation.

Dynamic Feature Enhancement for Core Business Logic

Imagine a core business process, like loan application review. Initially, the LLM-managed workflow handles standard credit checks and document verification. A new regulation requires a specific new bankruptcy check and a conditional review meeting for certain applicants.

Instead of a developer manually coding changes into a workflow engine, a subject matter expert (SME) simply tells the LLM: "For loan applications, also check if the applicant has had a bankruptcy in the last five years. If so, automatically flag the application and schedule a review call with our financial advisor team, ensuring it respects their calendar availability."

The LLM, understanding the existing process and having access to the bankruptcy database API and scheduling tools via MCP, dynamically writes or modifies the necessary internal code for the loan review "workflow." It adds the new conditional logic and scheduling steps, demonstrating its reactive ability to enhance core features without human intervention in a visual design tool.
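The code the LLM writes into the live process could be as small as one conditional step. The following is a hedged sketch; bankruptcyApi and scheduler are hypothetical helpers standing in for whatever MCP-exposed services the organization actually provides:

// Hypothetical step the LLM might generate and splice into the loan review process.
async function bankruptcyCheck(application, bankruptcyApi, scheduler) {
  const history = await bankruptcyApi.lookup({
    applicantId: application.applicantId,
    sinceYears: 5
  });

  if (history.records.length > 0) {
    application.flags.push('bankruptcy-last-5-years');
    await scheduler.scheduleCall({
      team: 'financial-advisors',
      applicationId: application.id,
      respectCalendarAvailability: true
    });
  }
  return application;
}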

Human Expertise: The Indispensable LLM Coaches

In this evolved landscape, human expertise isn't diminished; it's transformed and elevated. The "citizen developer" who mastered a specific GUI gives way to the LLM Coach or Context Engineer. These are the subject matter experts (SMEs) within an organization who deeply understand their domain, the organization's data, and its unique business rules. Their role becomes one of high-level guidance:

  • Defining Context: Providing the LLM with the nuanced information it needs about available APIs, data schemas, and precise business rules.

  • Prompt Strategy & Oversight: Guiding the LLM in structuring effective prompts and conversational patterns, and defining the overarching strategy for how the LLM interacts with its context to achieve optimal results. This involves ensuring the LLM understands and applies the best practices for prompt construction, even as it increasingly manages the literal generation of those prompts itself.

  • Feedback and Coaching: Collaborating with the LLM to refine its behavior, validate its generated logic, and ensure it accurately meets complex requirements.

  • Strategic Oversight: Auditing LLM-generated logic and ensuring compliance, especially for critical functions.

This evolution redefines human-AI collaboration, leveraging the strengths of both. It ensures that the profound knowledge held by human experts is amplified, not replaced, by AI.

Anticipating Counterarguments and Refutations

We're aware that such a fundamental shift invites scrutiny. Let's address some common counterarguments head-on:

"This is too complex to set up initially."

While the initial phase requires defining the LLM's operational context – exposing APIs, documenting data models, and ingesting business rules – this is a one-time strategic investment in foundational enterprise knowledge. This effort shifts from continuous, tool-specific GUI configuration (which itself is complex and time-consuming) to building a reusable, LLM-consumable knowledge base. Furthermore, dedicated "LLM Coaches" (SMEs) will specialize in streamlining this process, making the setup efficient and highly valuable.

"What about the 'black box' problem for critical processes?"

For critical functions where deterministic behavior and explainability are paramount, our architecture directly addresses this. The LLM is empowered to generate deterministic, auditable code (e.g., precise Python functions or specific machine learning models) for these decision points. This generated code can be inspected, verified, and integrated into existing compliance frameworks, ensuring transparency where it matters most. The "black box" is no longer the LLM's inference, but the transparent, verifiable code it outputs.

"Humans need visual workflows to understand processes."

While humans do value visualizations, these will become "on-demand" capabilities, generated precisely when needed. The LLM can produce contextually relevant diagrams (like Mermaid diagrams), data visualizations, or flowcharts based on natural language queries. The visual representation becomes a result of the LLM's understanding and orchestration, not the primary, cumbersome means of defining it. Users won't be forced to manually configure diagrams; they'll simply ask the LLM to show them the process.

The Dawn of LLM-Native Operations

The future of workflow automation isn't about better diagrams and drag-and-drop interfaces for humans. It's about a fundamental transformation where intelligent systems, driven by natural language, directly orchestrate the intricate processes of the enterprise. Workflow tools, rather than being obsolesced, will evolve to serve a new primary user: the LLM itself.

The Beginning and End of LLM Workflow Software: How MCP Will Obsolesce Workflows

In the rapidly evolving landscape of enterprise software, we're witnessing the meteoric rise of workflow automation tools. These platforms promise to streamline operations through visual interfaces where users can design, implement, and monitor complex business processes. Yet despite their current popularity, these GUI-based workflow solutions may represent the last generation of their kind—soon to be replaced by more versatile Large Language Model (LLM) interfaces.

The Current Workflow Software Boom

The workflow automation market is experiencing unprecedented growth, projected to reach $78.8 billion by 2030 at a staggering 23.1% compound annual growth rate. This explosive expansion is evident in both funding activity and market adoption: Workato secured a $200 million Series E round at a $5.7 billion valuation, while established players like ServiceNow and Appian continue to report record subscription revenues.

A quick glance at a typical workflow builder interface reveals the complexity these tools embrace:

[Screenshot: a typical visual workflow builder interface]

The landscape is crowded with vendors aggressively competing for market share:

  • Enterprise platforms: ServiceNow, Pega, Appian, and IBM Process Automation dominate the high-end market, offering comprehensive solutions tightly integrated with their broader software ecosystems.
  • Integration specialists: Workato, Tray.io, and Zapier focus specifically on connecting disparate applications through visual workflow builders, catering to the growing API economy.
  • Emerging players: Newer entrants like Bardeen, n8n, and Make (formerly Integromat) are gaining traction with innovative approaches and specialized capabilities.

This workflow automation boom follows a familiar pattern we've seen before. Between 2018 and 2022, Robotic Process Automation (RPA) experienced a similar explosive growth cycle. Companies like UiPath reached a peak valuation of $35 billion before a significant market correction as limitations became apparent. RPA promised to automate routine tasks by mimicking human interactions with existing interfaces—essentially screen scraping and macro recording at an enterprise scale—but struggled with brittle connections, high maintenance overhead, and limited adaptability to changing interfaces.

Today's workflow tools attempt to address these limitations by focusing on API connections rather than UI interactions, but they still follow the same fundamental paradigm: visual programming interfaces that require specialized knowledge to build and maintain.

So why are organizations pouring billions into these platforms despite the lessons from RPA? Several factors drive this investment:

  • Digital transformation imperatives: COVID-19 dramatically accelerated organizations' need to automate processes as remote work became essential and manual, paper-based workflows proved impossible to maintain.
  • The automation gap: Companies recognize the potential of AI and automation but have lacked accessible tools to implement them across the organization without heavy IT involvement.
  • Democratization promise: Workflow tools market themselves as empowering "citizen developers"—business users who can automate their own processes without coding knowledge.
  • Pre-LLM capabilities: Until recently, organizations had few alternatives for process automation that didn't require extensive software development.

What we're witnessing is essentially a technological stepping stone—organizations hungry for AI-powered results before true AI was ready to deliver them at scale. But as we'll see, that technological gap is rapidly closing, with profound implications for the workflow software category.

Why LLMs Will Disrupt Workflow Software

While current workflow tools represent incremental improvements on decades-old visual programming paradigms, LLMs offer a fundamentally different approach—one that aligns with how humans naturally express process logic and intent. The technical capabilities enabling this shift are advancing rapidly, creating the conditions for widespread disruption.

The Technical Foundation: Resource Access Protocols

The key technical enabler for LLM-driven workflows is the development of secure protocols that allow these models to access and manipulate resources. Model Context Protocol (MCP) represents one of the most promising approaches:

MCP provides a standardized way for LLMs to:

  • Access data from various systems through controlled APIs
  • Execute actions with proper authentication and authorization
  • Maintain context across multiple interactions
  • Document actions taken for compliance and debugging

Unlike earlier attempts at AI automation, MCP and similar protocols solve the "last mile" problem by creating secure bridges between conversational AI and the systems that need to be accessed or manipulated. Major cloud providers are already implementing variations of these protocols, with Microsoft's Azure AI Actions, Google's Gemini API, and Anthropic's Claude Tools representing early implementations.

The proliferation of these standards means that instead of building custom integrations for each workflow tool, organizations can create a single set of LLM-compatible APIs that work across any AI interface.
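In practice, "LLM-compatible" mostly means describing each capability as a tool the model can discover and call over a standard protocol. The request below is a hedged sketch in the JSON-RPC style that MCP builds on; the tool name and argument fields are assumptions for illustration:

{
  "jsonrpc": "2.0",
  "id": 42,
  "method": "tools/call",
  "params": {
    "name": "crm.lookup_customer",
    "arguments": { "customerId": "C-10293", "fields": ["balance", "recent_orders"] }
  }
}

Once a system is described this way, any protocol-aware model can invoke it, which is what lets a single API surface serve every AI interface.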

Natural Language vs. GUI Interfaces

The cognitive load difference between traditional workflow tools and LLM interfaces becomes apparent when comparing approaches to the same problem:

Traditional Workflow Tool Process
  1. Open workflow designer application
  2. Create a new workflow and name it
  3. Drag "Trigger" component (Customer Signup)
  4. Configure webhook or database monitor
  5. Drag "HTTP Request" component
  6. Configure endpoint URL for credit API
  7. Add authentication parameters (API key, tokens)
  8. Add request body parameters and format
  9. Connect to "JSON Parser" component
  10. Define schema for response parsing
  11. Create variable for credit score
  12. Add "Decision" component
  13. Configure condition (score < 600)
  14. For "True" path, add "Notification" component
  15. Configure recipients, subject, and message template
  16. Add error handling for API timeout
  17. Add error handling for data format issues
  18. Test with sample data
  19. Debug connection issues
  20. Deploy to production environment
  21. Configure monitoring alerts
LLM Approach
When a new customer signs up, retrieve their credit score from our API, 
store it in our database, and if the score is below 600, notify the risk 
assessment team.

The workflow tool approach requires not only understanding the business logic but also learning the specific implementation patterns of the tool itself. Users must know which components to use, how to properly connect them, and how to configure each element—skills that rarely transfer between different workflow platforms.

Dynamic Adaptation Through Conversation

Real business processes rarely remain static. Consider how process changes propagate in each paradigm:

Traditional Workflow Change Process
  1. Open existing workflow in designer
  2. Identify components that need modification
  3. Add new components for bankruptcy check
  4. Configure API connection to bankruptcy database
  5. Add new decision branch
  6. Connect positive result to new components
  7. Add calendar integration component
  8. Configure meeting details and attendees
  9. Update documentation to reflect changes
  10. Redeploy updated workflow
  11. Test all paths, including existing functionality
  12. Update monitoring for new failure points
LLM Approach
Actually, let's also check if they've had a bankruptcy in the last five 
years, and if so, automatically schedule a review call with our financial 
advisor team.

The LLM simply incorporates the new requirement conversationally. Behind the scenes, it maintains a complete understanding of the existing process and extends it appropriately—adding the necessary API calls, conditional logic, and scheduling actions without requiring the user to manipulate visual components.

Early implementations of this approach are already appearing. GitHub Copilot for Docs can update software configuration by conversing with developers about their intentions, rather than requiring them to parse documentation and make manual changes. Similarly, companies like Adept are building AI assistants that can operate existing software interfaces based on natural language instructions.

Self-Healing Systems: The Maintenance Advantage

Perhaps the most profound advantage of LLM-driven workflows is their ability to adapt to changing environments without breaking. Traditional workflows are notoriously brittle:

Traditional Workflow Failure Scenarios:

  • An API endpoint changes its structure
  • A data source modifies its authentication requirements
  • A third-party service deprecates a feature
  • A database schema is updated
  • Operating system or runtime dependencies change

When these changes occur, traditional workflows break and require manual intervention. Someone must diagnose the issue, understand the change, modify the workflow components, test the fixes, and redeploy. This maintenance overhead is substantial—studies suggest organizations spend 60-80% of their workflow automation resources on maintenance rather than creating new value.

LLM-Driven Workflow Adaptation: LLMs with proper resource access can automatically adapt to many changes:

  • When an API returns errors, the LLM can examine documentation, test alternative approaches, and adjust parameters
  • If authentication requirements change, the LLM can interpret error messages and modify its approach
  • When services deprecate features, the LLM can find and implement alternatives based on its understanding of the underlying intent
  • Changes in database schemas can be discovered and accommodated dynamically
  • Environmental changes can be detected and worked around

Rather than breaking, LLM-driven workflows degrade gracefully and can often self-heal without human intervention. When they do require assistance, the interaction is conversational:

User: The customer onboarding workflow seems to be failing at the credit check 
step.
LLM: I've investigated the issue. The credit API has changed its response 
format. I've updated the workflow to handle the new format. Would you like 
me to show you the specific changes I made?

This self-healing capacity drastically reduces maintenance overhead and increases system reliability. Organizations using early LLM-driven processes report up to 70% reductions in workflow maintenance time and significantly improved uptime.

Compliance and Audit Superiority

Perhaps counterintuitively, LLM-driven workflows can provide superior compliance capabilities. Several financial institutions are already piloting LLM systems that maintain comprehensive audit logs that surpass traditional workflow tools:

  • Granular Action Logging: Every step, decision point, and data access is logged with complete context
  • Natural Language Explanations: Each action includes an explanation of why it was taken
  • Cryptographic Verification: Logs can be cryptographically signed and verified for tamper detection
  • Full Data Lineage: Complete tracking of where data originated and how it was transformed
  • Semantic Search: Compliance teams can query logs using natural language questions
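As a hedged illustration of the first two points, a single entry in such a log might look like this; the field names are assumptions, not a standard:

{
  "timestamp": "2025-03-14T16:22:07Z",
  "actor": "llm-agent:onboarding",
  "on_behalf_of": "jdoe",
  "action": "credit_api.get_score",
  "inputs": { "customerId": "C-10293" },
  "outputs": { "score": 612 },
  "explanation": "Retrieved the credit score because the onboarding policy requires a check before account activation.",
  "signature": "sha256:9f2c1ab4"
}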

A major U.S. bank recently compared their existing workflow tool's audit capabilities with a prototype LLM-driven system and found the LLM approach provided 3.5x more detailed audit information with 65% less storage requirements, due to the elimination of redundant metadata and more efficient logging.

Visualization On Demand

For scenarios where visual representation is beneficial, LLMs offer a significant advantage: contextually appropriate visualizations generated precisely when needed.

Rather than being limited to pre-designed dashboards and reports, users can request visualizations tailored to their current needs:

User: Show me a diagram of how the customer onboarding process changes with 
the new bankruptcy check.

LLM: Generates a Mermaid diagram showing the modified process flow with the 
new condition highlighted

User: How will this affect our approval rates based on historical data?

LLM: Generates a bar chart showing projected approval rate changes based on 
historical bankruptcy data

Companies like Observable and Vercel are already building tools that integrate LLM-generated visualizations into business workflows, allowing users to create complex data visualizations through conversation rather than manual configuration.

Current State of Adoption

While the technical capabilities exist, we're still in the early stages of this transition. Rather than presenting hypothetical examples as established successes, it's more accurate to examine how organizations are currently experimenting with LLM-driven workflow approaches:

  • Prototype implementations: Several companies are building prototype systems that use LLMs to orchestrate workflows, but these remain largely experimental and haven't yet replaced enterprise-wide workflow systems.
  • Augmentation rather than replacement: Most organizations are currently using LLMs to augment existing workflow tools—helping users configure complex components or troubleshoot issues—rather than replacing the tools entirely.
  • Domain-specific applications: The most successful early implementations focus on narrow domains with well-defined processes, such as content approval workflows or customer support triage, rather than attempting to replace entire workflow platforms.
  • Hybrid approaches: Organizations are finding success with approaches that combine traditional workflow engines with LLM interfaces, allowing users to interact conversationally while maintaining the robustness of established systems.

While we don't yet have large-scale case studies with verified metrics showing complete workflow tool replacement, the technological trajectory is clear. As LLM capabilities continue to improve and resource access protocols mature, the barriers to adoption will steadily decrease.

Investment Implications

The disruption of workflow automation by LLMs isn't a gradual shift—it's happening now. For decision-makers, this isn't about careful transitions or hedged investments; it's about immediate and decisive action to avoid wasting resources on soon-to-be-obsolete technology.

Halt Investment in Traditional Workflow Tools Immediately

Stop signing or renewing licenses for traditional workflow automation platforms. These systems will be obsolete within weeks, not years. Any new investment in these platforms represents resources that could be better allocated to LLM+MCP approaches. If you've recently purchased licenses, investigate termination options or ways to repurpose these investments.

Redirect Resources to LLM Infrastructure

Immediately reallocate budgets from workflow software to:

  • Enterprise-grade LLM deployment on your infrastructure
  • Implementation of MCP or equivalent protocols
  • API development for all internal systems
  • Prompt engineering training for existing workflow specialists

Install LLM+MCP on Every Desktop Now

Rather than planning gradual rollouts, deploy LLM+MCP capabilities across your organization immediately. Every day that employees continue to build workflows in traditional tools is a day of wasted effort creating systems that will need to be replaced. Local or server-based LLMs with proper resource access should become standard tools alongside word processors and spreadsheets.

Retrain Teams for the New Paradigm

Your workflow specialists need to become prompt engineers—not next quarter, but this week:

  • Cancel scheduled workflow tool training
  • Replace it with intensive prompt engineering workshops
  • Focus on teaching conversational process design rather than visual programming
  • Develop internal guides for effective LLM workflow creation

For organizations with existing contracts for workflow platforms:

  • Review termination clauses and calculate the cost of early exits
  • Investigate whether remaining license terms can be applied to API access rather than visual workflow tools
  • Consider whether vendors might offer transitions to their own LLM offerings in lieu of contracted services

Vendors: Pivot or Perish

For workflow automation companies, there's no time for careful transitions:

  • Immediately halt development on visual workflow designers
  • Redirect all engineering resources to LLM interfaces and connectors
  • Open all APIs and create comprehensive documentation for LLM interaction
  • Develop prompt libraries that encapsulate existing workflow patterns

The AI-assisted development cycle is accelerating innovation at unprecedented rates. What would have taken years is now happening in weeks. Organizations that try to manage this as a gradual transition will find themselves outpaced by competitors who embrace the immediate shift to LLM-driven processes.

Our Own Evolution

We need to acknowledge our own journey in this space. At Lit.ai, we initially invested in building the Workflow Canvas - a visual tool for designing LLM-powered workflows that made the technology more accessible. We created this product with the belief that visual workflow builders would remain essential for orchestrating complex LLM interactions.

However, our direct experience with customers and the rapid evolution of LLM capabilities has caused us to reassess this position. The very technology we're building is becoming sophisticated enough to make our own workflow canvas increasingly unnecessary for many use cases. Rather than clinging to this approach, we're now investing heavily in Model Context Protocol (MCP) and direct LLM resource access.

This pivot represents our commitment to following the technology where it leads, even when that means disrupting our own offerings. We believe the most valuable contribution we can make isn't building better visual workflow tools, but rather developing the connective tissue that allows LLMs to directly access and manipulate the resources they need to execute workflows without intermediary interfaces.

Our journey mirrors what we expect to see across the industry - an initial investment in workflow tools as a stepping stone, followed by a recognition that the real value lies in direct LLM orchestration with proper resource access protocols.

Timeline and Adoption Considerations

While the technical capabilities enabling this shift are rapidly advancing, several factors will influence adoption timelines:

Enterprise Inertia

Large organizations with established workflow infrastructure and trained teams will transition more slowly. Expect these environments to adopt hybrid approaches initially, where LLMs complement rather than replace existing workflow tools.

High-Stakes Domains

Industries with mission-critical workflows (healthcare, finance, aerospace) will maintain traditional interfaces longer, particularly for processes with significant safety or regulatory implications. However, even in these domains, LLMs will gradually demonstrate their reliability for increasingly complex tasks.

Security and Control Concerns

Organizations will need to develop comfort with LLM-executed workflows, particularly regarding security, predictability, and control. Establishing appropriate guardrails and monitoring will be essential for building this confidence.

Conclusion

The current boom in workflow automation software represents the peak of a paradigm that's about to be disrupted. As LLMs gain direct access to resources and demonstrate their ability to understand and execute complex processes through natural language, the value of specialized GUI-based workflow tools will diminish.

Forward-thinking organizations should prepare for this shift by investing in API infrastructure, LLM integration capabilities, and domain-specific knowledge engineering rather than committing deeply to soon-to-be-legacy workflow platforms. The future of workflow automation isn't in better diagrams and drag-and-drop interfaces—it's in the natural language interaction between users and increasingly capable AI systems.

In fact, this very article demonstrates the principle in action. Rather than using a traditional publishing workflow tool with multiple steps and interfaces, it was originally drafted in Google Docs, then an LLM was instructed to:

Translate this to markdown, save it to a file on the local disk, execute a 
build, then upload it to AWS S3.

The entire publishing workflow—format conversion, file system operations, build process execution, and cloud deployment—was accomplished through a simple natural language request to an LLM with the appropriate resource access, eliminating the need for specialized workflow interfaces.

This perspective challenges conventional wisdom about enterprise software evolution. Decision-makers who recognize this shift early will gain significant advantages in operational efficiency, technology investment, and organizational agility.

The Rising Value of Taxonomies in the Age of LLMs

Introduction

Large Language Models (LLMs) are increasing the demand for structured data, creating a significant opportunity for companies specializing in organizing that data. This article explores how this trend is making expertise in taxonomies and data-matching increasingly valuable for businesses seeking to utilize LLMs effectively.

LLMs Need Structure

LLMs excel at understanding and generating human language. However, they perform even better when that language is organized in a structured way, which improves accuracy, consistency, and reliability. Imagine asking an LLM to find all research papers related to a specific protein interaction in a particular type of cancer. If the LLM only has access to general scientific abstracts and articles, it might provide a broad overview of cancer research but struggle to pinpoint the highly specific information you need. You might get a lot of information about cancer in general, but not a precise list of papers that focus on the specific protein interaction.

However, if the LLM has access to a structured database of scientific literature with detailed metadata and relationships, it can perform much more targeted research. This database would include details like:

  • Protein names and identifiers
  • Cancer types and subtypes
  • Experimental methods and results
  • Genetic and molecular pathways
  • Relationships to other research papers and datasets

With this structured data, the LLM can quickly identify the relevant papers, analyze their findings, and provide a more focused and accurate summary of the research. This structured approach ensures that the LLM considers critical scientific details and avoids generalizations that might not be relevant to the specific research question. Taxonomies and ontologies are essential for organizing and accessing this kind of complex scientific information.

Large Language Models often benefit significantly from a technique called Retrieval-Augmented Generation (RAG). RAG involves retrieving relevant information from an external knowledge base and providing it to the LLM as context for generating a response. However, RAG systems are only as effective as the data they retrieve. Without well-structured data, the retrieval process can return irrelevant, ambiguous, or incomplete information, leading to poor LLM output. This is where taxonomies, ontologies, and metadata become crucial. They provide the 'well-defined scope' and 'high-quality retrievals' that are essential for successful RAG implementation. By organizing information into clear categories, defining relationships between concepts, and adding rich context, taxonomies enable RAG systems to pinpoint the most relevant data and provide LLMs with the necessary grounding for accurate and insightful responses.
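A minimal sketch of the retrieval half of RAG shows where the taxonomy does its work; the documents, tags, and helper below are illustrative assumptions, not a particular RAG library:

// Toy retrieval step: taxonomy tags are what make the filter precise.
const documents = [
  { title: 'Paper A', tags: ['protein:TP53', 'cancer:breast'] },
  { title: 'Paper B', tags: ['protein:TP53', 'cancer:lung'] },
  { title: 'Paper C', tags: ['protein:EGFR', 'cancer:lung'] }
];

function retrieve(docs, requiredTags) {
  return docs.filter(doc => requiredTags.every(tag => doc.tags.includes(tag)));
}

// Only the matching papers are handed to the LLM as grounding context.
console.log(retrieve(documents, ['protein:EGFR', 'cancer:lung'])); // -> Paper C only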

To address these challenges and provide the necessary structure, we can turn to taxonomies. Let's delve into what exactly a taxonomy is and how it can benefit LLMs.

What is a Taxonomy?

A taxonomy is a way of organizing information into categories and subcategories. Think of it as a hierarchical classification system. A good example is the biological taxonomy used to classify animals. For instance, red foxes are classified as follows:

  • Domain: Eukarya (cells with nuclei)
  • Kingdom: Animalia (all animals)
  • Phylum: Chordata (animals with a backbone)
  • Class: Mammalia (mammals)
  • Order: Carnivora (carnivores)
  • Family: Canidae (the dog family)
  • Genus: Vulpes (foxes)
  • Species: Vulpes vulpes (red fox)

[Image credit: Annina Breen, CC BY-SA 4.0, via Wikimedia Commons]

This hierarchical structure shows how we move from a very broad category (all animals) to a very specific one (Red Fox). Just like this animal taxonomy, other taxonomies organize information in a structured way.
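Represented as data that an LLM or retrieval system could consume, the same classification might look like this; the shape is a hedged sketch and the property names are illustrative:

const redFox = {
  domain: 'Eukarya',
  kingdom: 'Animalia',
  phylum: 'Chordata',
  class: 'Mammalia',
  order: 'Carnivora',
  family: 'Canidae',
  genus: 'Vulpes',
  species: 'Vulpes vulpes',
  commonName: 'red fox'
};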

Taxonomies provide structure by:

  • Improving Performance: Taxonomies help LLMs focus on specific areas, reducing the risk of generating incorrect or nonsensical information and improving the relevance of their output.
  • Facilitating Data Integration: Taxonomies can integrate data from various sources, providing LLMs with a more comprehensive and unified view of information. This is crucial for tasks that require broad knowledge and context.
  • Providing Contextual Understanding: Taxonomies offer a framework for understanding the relationships between concepts, enabling LLMs to generate more coherent and contextually appropriate responses.

Types of Taxonomies

There are several different types of taxonomies, each with its own strengths and weaknesses, and each relevant to how LLMs can work with data:

Hierarchical Taxonomies: Organize information in a tree-like structure, with broader categories at the top and more specific categories at the bottom. This is the most common type, often used in library classification or organizational charts. For LLMs, this provides a clear, nested structure that aids in understanding relationships and navigating data.

Faceted Taxonomies: Allow information to be categorized in multiple ways, enabling users to filter and refine their searches. Think of e-commerce product catalogs with filters for size, color, and price. This is particularly useful for LLMs that need to handle complex queries and provide highly specific results, as they can leverage multiple facets to refine their output.

Polyhierarchical Taxonomies: A type of hierarchical taxonomy where a concept can belong to multiple parent categories. For example, "tomato" could be classified under both "fruits" and "red foods." This allows LLMs to understand overlapping categories and handle ambiguity in classification.

Associative Taxonomies: Focus on relationships between concepts, rather than just hierarchical structures. For example, a taxonomy of "car" could include terms like "wheel," "engine," "road," and "transportation," highlighting the interconnectedness of these concepts. This helps LLMs understand the broader context and semantic relationships between terms, improving their ability to generate coherent and relevant responses.

Ultimately, the increasing reliance on LLM-generated content necessitates the implementation of well-defined taxonomies to unlock its full potential. The specific type of taxonomy may vary depending on the application, but the underlying principle remains: taxonomies are essential for enhancing the value and utility of LLM outputs.

LLMs and Internal Knowledge Representation

While we've discussed various types of external taxonomies, it's important to note that LLMs also develop their own internal representations of knowledge. These internal representations differ significantly from human-curated taxonomies and play a crucial role in how LLMs process information.

One way LLMs represent knowledge is through word vectors. These are numerical representations of words where words with similar meanings are located close to each other in a multi-dimensional space. For example, the relationship "king - man + woman = queen" can be captured through vector arithmetic, demonstrating how LLMs can represent semantic relationships.

[Image: Word Vector Illustration by Ben Vierck, CC0 1.0]

The word vector graph illustrates semantic relationships captured by LLMs using numerical representations of words. Each word is represented as a vector in a multi-dimensional space. In this example, the vectors for 'royal,' 'king,' and 'queen' originate at the coordinate (0,0), depicting their positions in this space. The vector labeled 'man' extends from the end of the 'royal' vector to the end of the 'king' vector, while the vector labeled 'woman' extends from the end of the 'royal' vector to the end of the 'queen' vector. This arrangement demonstrates how LLMs can represent semantic relationships such as 'king' being 'royal' plus 'man,' and 'queen' being 'royal' plus 'woman.' The spatial relationships between these vectors reflect the conceptual relationships between the words they represent.
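The arithmetic itself is easy to demonstrate with toy two-dimensional vectors chosen so that king = royal + man and queen = royal + woman. Real embeddings have hundreds of learned dimensions, so the numbers below are illustrative only:

// Toy 2-D word vectors (real embeddings are learned and much higher-dimensional).
const royal = [0, 1], man = [1, 0], woman = [-1, 0];

const add = (a, b) => a.map((x, i) => x + b[i]);
const sub = (a, b) => a.map((x, i) => x - b[i]);

const king = add(royal, man);    // [ 1, 1]
const queen = add(royal, woman); // [-1, 1]

// "king - man + woman" lands exactly on "queen" in this toy space.
console.log(add(sub(king, man), woman)); // [-1, 1]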

However, these internal representations, unlike human-curated taxonomies, are:

  • Learned, Not Curated: Acquired through exposure to massive amounts of text data, rather than through a process of human design and refinement. This means the LLM infers relationships, rather than having them explicitly defined.
  • Unstructured: The relationships learned by LLMs may not always fit into a clear, hierarchical structure.
  • Context-Dependent: The meaning of a word or concept can vary depending on the surrounding text, making it difficult for LLMs to consistently apply a single, fixed categorization.
  • Incomplete: It's important to understand that LLMs don't know what they don't know. They might simply be missing knowledge of specific domains or specialized terminology that wasn't included in their training data.

This is where taxonomies become crucial. They provide an external, structured framework that can:

  • Constrain LLM Output: By mapping LLM output to a defined taxonomy, we can ensure that the information generated is consistent, accurate, and relevant to a specific domain.
  • Ground LLM Knowledge: Taxonomies can provide LLMs with access to authoritative, curated knowledge that may be missing from their training data.
  • Bridge the Gap: Taxonomies can bridge the gap between the unconstrained, often ambiguous language that humans use and the more structured, formal representations that LLMs can effectively process.

Taxonomies as Service Providers

Companies that specialize in creating and managing taxonomies and developing metadata schemas and ontologies to complement taxonomies are well-positioned to become key service providers in the LLM ecosystem. Their existing expertise in organizing information and structuring data makes them uniquely qualified to help businesses harness LLMs effectively.

For example, companies that specialize in organizing complex data for specific industries, such as healthcare or finance, often create proprietary systems to analyze and categorize information for their clients. In the healthcare sector, a company might create a proprietary methodology for evaluating healthcare plan value, categorizing patients based on risk factors and predicting healthcare outcomes. In the realm of workforce development, a company might develop a detailed taxonomy of job skills, enabling employers to evaluate their current workforce capabilities and identify skill gaps. This same taxonomy can also empower job seekers to understand the skills needed for emerging roles and navigate the path to acquiring them. These companies develop expertise in data acquisition, market understanding, and efficient data processing to deliver valuable insights.

Companies that specialize in creating and managing taxonomies are not only valuable for general LLM use but also for improving the effectiveness of Retrieval-Augmented Generation systems. RAG's limitations, such as retrieving irrelevant or ambiguous information, often stem from underlying data organization issues. Taxonomy providers can address these issues by creating robust knowledge bases, defining clear data structures, and adding rich metadata. This ensures that RAG systems can retrieve the most relevant and accurate information, thereby significantly enhancing the quality of LLM outputs. In essence, taxonomy experts can help businesses transform their RAG systems from potentially unreliable tools into highly effective knowledge engines.

Strategic Opportunities for Taxonomy Providers in the LLM Era

The rapid advancement and adoption of LLMs are driving an increase in demand for automated content generation. Businesses are increasingly looking to replace human roles with intelligent agents capable of handling various tasks, from customer service and marketing to data analysis and research. This drive towards agent-driven automation creates a fundamental need for well-structured data and robust taxonomies. Companies specializing in these areas are strategically positioned to capitalize on this demand.

Here's how taxonomy companies can leverage this market shift:

1. Capitalizing on the Content Generation Boom:

Demand-Driven Growth: The primary driver will be the sheer volume of content that businesses want to generate using LLMs and agents. Taxonomies are essential to ensure this content is organized, accurate, and aligned with specific business needs; the core opportunity lies in meeting this growing demand.

Agent-Centric Focus: The demand is not just for general content but for content that powers intelligent agents, which requires taxonomies that are not merely broad but highly specific and contextually rich.

2. Building Partnerships:

The surge in demand for LLM-powered applications and intelligent agents is creating a wave of new organizations focused on developing these solutions. Many of these companies will need specialized data, including job skills taxonomies, to power their agents effectively. This presents a unique opportunity for the job skills taxonomy provider to forge strategic partnerships.

Addressing the "Build vs. Buy" Decision: Many new agent builders will face the decision of whether to build their own skills taxonomy from scratch or partner with an existing provider. Given the rapid pace of LLM development and the complexity of creating and maintaining a robust taxonomy, partnering often proves to be the most efficient and cost-effective route. The taxonomy company can highlight the advantages of partnering:

  • Faster time to market
  • Higher quality data
  • Ongoing updates and maintenance

By targeting these emerging agent-building organizations, the job skills taxonomy company can capitalize on the growing demand for LLM-powered solutions and establish itself as a critical data provider in the evolving AI-driven workforce development landscape. This approach focuses on the new opportunities created by the LLM boom, rather than the existing operations of the taxonomy provider.

Seamless Integration via MCP: To further enhance the value proposition, taxonomy providers should consider surfacing their capabilities using the Model Context Protocol (MCP). MCP allows for standardized communication between different AI agents and systems, enabling seamless integration and interoperability. By making their taxonomies accessible via MCP, providers can ensure that agent builders can easily incorporate their data into their workflows, reducing friction and accelerating development.

3. Capitalizing on Existing Expertise as an Established Player:

Market Advantage: Established taxonomy companies have a significant advantage thanks to their existing expertise, data assets, and client relationships. This position allows them to adapt quickly to the agent-driven market.

Economic Efficiency: Using an established taxonomy provider is typically more cost-effective than building an in-house solution. Businesses looking to deploy agents quickly will likely prefer to partner with existing experts.

By focusing on the demand for content generation driven by the rise of intelligent agents and by targeting partnerships with agent-building organizations, taxonomy companies can position themselves for significant growth and success in this evolving market.

Why This Matters to You

We rely on AI more and more every day. From getting quick answers to complex research, we expect AI to provide us with accurate and reliable information. But what happens when the volume of information becomes overwhelming? What happens when AI systems need to sift through massive amounts of data to make critical decisions?

That's where organized data becomes vital. Imagine AI as a powerful detective tasked with solving a complex case. Without a well-organized case file (a robust taxonomy), the detective might get lost in a sea of clues, missing crucial details or drawing the wrong conclusions. But with a meticulously organized file, the detective can:

  • Quickly Identify Key Evidence: AI can pinpoint the most relevant and reliable information, even in a sea of data.
  • Connect the Dots: AI can understand the complex relationships between different pieces of information, revealing hidden patterns and insights.
  • Ensure a Clear Narrative: AI can present a coherent and accurate picture of the situation, avoiding confusion or misinterpretation.

In essence, the better the data is organized, the more effectively AI can serve as a reliable source of truth. It's about ensuring that AI doesn't just process information, but that it processes it in a way that promotes clarity, accuracy, and ultimately, a shared understanding of the world. This is why the role of taxonomies, ontologies, and metadata is so critical—they are the foundation for building AI systems that help us navigate an increasingly complex information landscape with confidence.

The Indispensable Role of Human Curation

While LLMs can be valuable tools in the taxonomy development process, they cannot fully replace human expertise (yet). Human curation is essential because taxonomies are ultimately designed for human consumption. Human curators can ensure that taxonomies are intuitive, user-friendly, and aligned with how people naturally search for and understand information. Human experts are needed not just for creating the taxonomy itself, but also for defining and maintaining the associated metadata and ontologies.

For example, imagine an LLM generating a taxonomy for a complex subject like "fine art." While it might group works by artist or period, a human curator would also consider factors like artistic movement, cultural significance, and thematic connections, creating a taxonomy that is more nuanced and useful for art historians, collectors, and enthusiasts.

[Image: By Michelangelo, Public Domain, https://commons.wikimedia.org/w/index.php?curid=9097336]

Developing a high-quality taxonomy often requires specialized knowledge of a particular subject area. Human experts can bring this knowledge to the process, ensuring that the taxonomy accurately reflects the complexities of the domain (for now).

Challenges and Opportunities

The rise of LLMs directly fuels the demand for sophisticated taxonomies. While LLMs can assist in generating content, taxonomies ensure that this content is organized, accessible, and contextually relevant. This dynamic creates both opportunities and challenges for taxonomy providers. The evolving nature of LLMs requires constant adaptation in taxonomy strategies, and the integration of metadata and ontologies becomes essential to maximize the utility of LLM-generated content. So, the expertise in developing and maintaining these taxonomies becomes a critical asset in the age of LLMs.

Enhanced Value Through Metadata and Ontologies

The value of taxonomies is significantly amplified when combined with robust metadata and ontologies. Metadata provides detailed descriptions and context, making taxonomies more searchable and understandable for LLMs. Ontologies, with their intricate relationships and defined properties, enable LLMs to grasp deeper contextual meanings and perform complex reasoning.

Metadata is data that describes other data. For example, the title, author, and publication date of a book are metadata. High-quality metadata, such as detailed descriptions, keywords, and classifications, makes taxonomies more easily searchable and understandable by both humans and machines, including LLMs. This rich descriptive information provides essential context that enhances the utility of the taxonomy.

Ontologies are related to taxonomies but go beyond simple hierarchical classification. While taxonomies primarily focus on organizing information into categories and subcategories, often representing "is-a" relationships (e.g., "A dog is a mammal"), ontologies provide a more detailed, formal, and expressive representation of knowledge. They define concepts, their properties, and the complex relationships between them. Ontologies answer questions like "What is this?", "What are its properties?", "How is it related to other things?", and "What can we infer from these relationships?"

Key Distinctions:

  • Relationship Types: Taxonomies mostly deal with hierarchical ("is-a") relationships. Ontologies handle many different types of relationships (e.g., causal, temporal, spatial, "part-of," "has-property").
  • Formality: Taxonomies can be informal and ad-hoc. Ontologies are more formal and often use standardized languages and logic (e.g., OWL - Web Ontology Language).
  • Expressiveness: Taxonomies are less expressive and can't represent complex rules or constraints. Ontologies are highly expressive and can represent complex knowledge and enable sophisticated reasoning.
  • Purpose: Taxonomies are primarily for organizing and categorizing. Ontologies are for representing knowledge, defining relationships, and enabling automated reasoning.

For instance, an ontology about products would not only categorize them (e.g., "electronics," "clothing") but also define properties like "manufacturer," "material," "weight," and "price," as well as relationships such as "is made of," "is sold by," and "is a component of." This rich, interconnected structure allows an LLM to understand not just the category of a product but also its attributes and how it relates to other products. This added layer of detail is what makes ontologies so valuable for LLMs, as they provide the deep, contextual understanding needed for complex reasoning and knowledge-based tasks. However, this level of detail also makes them more complex to develop and maintain, requiring specialized expertise and ongoing updates.
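To make the distinction concrete, here is a minimal sketch in Python using plain data structures rather than any particular ontology language or library. The product names, properties, and relation types are purely illustrative; the point is simply that a taxonomy captures only the "is-a" hierarchy, while an ontology adds typed properties and non-hierarchical relationships that support simple reasoning.

```python
from dataclasses import dataclass, field

# Taxonomy: a pure "is-a" hierarchy. Each key is a category, each value its subcategories.
taxonomy = {
    "product": ["electronics", "clothing"],
    "electronics": ["laptop", "battery"],
    "clothing": ["jacket"],
}

@dataclass
class Concept:
    """An ontology concept: a category plus typed properties and relationships."""
    name: str
    is_a: str
    properties: dict = field(default_factory=dict)  # e.g. manufacturer, weight, price
    relations: list = field(default_factory=list)   # (relation_type, target) pairs

# Ontology: the same items, now with properties and non-hierarchical relations.
laptop = Concept(
    name="ExampleBook 13",  # hypothetical product
    is_a="laptop",
    properties={"manufacturer": "Acme", "weight_kg": 1.3, "price_usd": 999},
    relations=[("has_component", "LithiumCell A1"), ("is_sold_by", "Acme Store")],
)

battery = Concept(
    name="LithiumCell A1",  # hypothetical component
    is_a="battery",
    properties={"material": "lithium-ion", "capacity_wh": 52},
    relations=[("is_component_of", "ExampleBook 13")],
)

def components_of(concept: Concept) -> list[str]:
    """Trivial 'reasoning' a bare taxonomy could not support:
    follow a typed relation instead of the is-a hierarchy."""
    return [target for rel, target in concept.relations if rel == "has_component"]

print(components_of(laptop))  # ['LithiumCell A1']
```

A real deployment would typically express this in a standard like OWL or SKOS, but even this toy version shows why the relational layer, not just the category tree, is what gives an LLM something to reason over.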

Therefore, companies that can integrate and provide these elements alongside taxonomies will offer a more compelling and valuable service in the LLM ecosystem. The combination of well-structured taxonomies, rich metadata, and detailed ontologies provides the necessary context and depth for LLMs to operate at their full potential.

Conclusion

The rise of LLMs is creating a classic supply and demand scenario. As more businesses adopt LLMs and techniques like RAG, the demand for structured data and the services of taxonomy providers will increase. However, it's crucial to recognize that the effectiveness of RAG hinges on high-quality data organization. Companies specializing in creating robust taxonomies, ontologies, and metadata are positioned to meet this demand by providing the essential foundation for successful RAG implementations. Their expertise ensures that LLMs and RAG systems can retrieve and utilize information effectively, making their services increasingly valuable for organizations looking to take advantage of LLM-generated content.

The AI-Driven Transformation of Software Development

1. Introduction: The Seismic Shift in Software Development

The software development landscape is undergoing a seismic shift, driven by the rapid advancement of artificial intelligence. This transformation transcends simple automation; it fundamentally alters how software is created, acquired, and utilized, leading to a re-evaluation of the traditional 'build versus buy' calculus. The pace of this transformation is likely to accelerate, making it crucial for businesses and individuals to stay adaptable and informed.

2. The Rise of AI-Powered Development Tools

For decades, the software industry has been shaped by a tension between bespoke, custom-built solutions and readily available commercial products. The complexity and cost associated with developing software tailored to specific needs often pushed businesses towards purchasing off-the-shelf solutions, even if those solutions weren't a perfect fit. This gave rise to the dominance of large software vendors and the Software-as-a-Service (SaaS) model. However, AI is poised to disrupt this paradigm.

Introduction to AI-Powered Automation

Large Language Models (LLMs) are revolutionizing software development by understanding natural language instructions and generating code snippets, functions, or even entire modules. Imagine describing a software feature in plain language and having an AI produce the initial code. Many are already using tools like ChatGPT in this way, coaching the AI, suggesting revisions, and identifying improvements before testing the output.

This is 'vibe coding,' where senior engineers guide LLMs with high-level intent rather than writing every line of code. While this provides a significant productivity boost—say, a 5x improvement—the true transformative potential lies in a one-to-many dynamic, where a single expert can exponentially amplify their impact by managing numerous AI agents simultaneously, each focused on different project aspects.

Expanding AI Applications in Development

AI also powers code review tools that automatically identify potential issues and suggest improvements, while cloud providers offer AI-driven development platforms such as AWS CodeWhisperer and Google Cloud's AI Platform. In testing and debugging, AI assists by identifying potential bugs, suggesting fixes, and automating test cases.

Composable Architectures and Orchestration

Beyond code completion and generation, AI tools are also facilitating the development of reusable components and services. This move toward composable architectures allows developers to break down complex tasks into smaller, modular units. These units, powered by AI, can then be easily assembled and orchestrated to create larger applications, increasing efficiency and flexibility. Model Context Protocol (MCP) could play a role in standardizing the discovery and invocation of these services.

Furthermore, LLM workflow orchestration is also becoming more prevalent, where AI models can manage and coordinate the execution of these modular services. This allows for dynamic and adaptable workflows that can be quickly changed or updated as needed.
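As a rough illustration of this composable, orchestrated style, here is a minimal sketch in Python. It is not the actual MCP specification (whose wire format and SDKs differ); the service names, the `register` decorator, and the `run_workflow` helper are all hypothetical, standing in for whatever discovery and invocation mechanism a real system would use.

```python
from typing import Callable

# A toy service catalog: each entry declares a named capability and how to call it.
# In an MCP-style setup, this discovery step would go through the protocol itself.
SERVICES: dict[str, Callable[[str], str]] = {}

def register(name: str):
    """Decorator that adds a function to the catalog so an orchestrator can discover it."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        SERVICES[name] = fn
        return fn
    return wrap

@register("summarize")
def summarize(text: str) -> str:
    # Placeholder for an LLM call; here we just truncate.
    return text[:60] + "..."

@register("translate_to_spanish")
def translate(text: str) -> str:
    # Placeholder for a translation service.
    return f"[ES] {text}"

def run_workflow(steps: list[str], payload: str) -> str:
    """A minimal orchestrator: look up each step by name and chain the outputs."""
    for step in steps:
        payload = SERVICES[step](payload)
    return payload

print(run_workflow(["summarize", "translate_to_spanish"], "A long requirements document ..."))
```

The design point is that the orchestrator only needs the catalog and the declared step names, not the implementations, which is what makes the individual services easy to swap, reuse, and recombine.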

Human Role and Importance

However, it's crucial to recognize that AI is a tool. Humans will still be needed to guide its development, provide creative direction, and critically evaluate the AI-generated outputs. Human problem-solving skills and domain expertise remain essential for ensuring software quality and effectiveness.

Impact on Productivity and Innovation

These tools are not just incremental improvements. They have the potential to dramatically increase developer productivity (potentially enabling the same output with half the staff, or even a fivefold efficiency gain in the near term), to lower the barrier to entry for software creation, and to enable fast iteration of new features.

Impact on Offshoring

Furthermore, AI tools have the potential to level the playing field for offshore development teams. Traditionally, challenges such as time zone differences, communication barriers, and perceived differences in skill level have sometimes put offshore teams at a disadvantage. However, AI-powered development tools can mitigate these challenges:

  • Enhanced Productivity and Efficiency: AI tools can automate many tasks, allowing offshore teams to deliver faster and more efficiently, overcoming potential time zone delays.
  • Improved Code Quality and Consistency: AI-assisted code generation, review, and testing tools can ensure high code quality and consistency, regardless of the team's location.
  • Reduced Communication Barriers: AI-powered translation and documentation tools can facilitate clearer communication and knowledge sharing.
  • Access to Cutting-Edge Technology: With cloud-based AI tools, offshore teams can access the same advanced technology as onshore teams, eliminating the need for expensive local infrastructure.
  • Focus on Specialization: Offshore teams can specialize in specific AI-related tasks, such as AI model training, data annotation, or AI-driven testing, becoming highly competitive in these areas.

By embracing AI tools, offshore teams can overcome traditional barriers and compete on an equal footing with onshore teams, offering high-quality software development services at potentially lower costs. This could lead to a more globalized and competitive software development landscape.

3. The Explosion of New Software and Features

This evolution is leading to an explosion of new software products and features. Individuals and small teams can now bring their ideas to life with unprecedented speed and efficiency. This is made possible by AI tools that can quickly translate high-level descriptions into working code, allowing for quicker prototyping and development cycles.

Crucial to the effectiveness of these AI tools is the quality of their training data. High-quality, diverse datasets enable AI models to generate more accurate and robust code. This is particularly impactful in niche markets, where highly specialized software solutions, previously uneconomical to develop, are now becoming viable.

For instance, AI could revolutionize enterprise applications with greater automation and integration capabilities, lead to more personalized and intuitive consumer apps, accelerate scientific discoveries by automating data analysis and simulations, or make embedded systems more intelligent and adaptable.

Furthermore, AI can analyze user data to identify areas for improvement and drive innovation, making software more responsive to user needs. While AI automates many tasks, human creativity and critical thinking are still vital for defining the vision and goals of software projects.

It's important to consider the potential environmental impact of this increased software development, including the energy consumption of training and running AI models. However, AI-driven software also offers opportunities for more efficient resource management and sustainability in other sectors, such as optimizing supply chains or reducing energy waste.

Software will evolve at an unprecedented pace, with AI facilitating fast feature iteration, updates, and highly personalized user experiences. This surge in productivity will likely lead to an explosion of new software products, features, and niche applications, democratizing software creation and lowering the barrier to entry.

4. The Transformation of the Commercial Software Market

This evolution is reshaping the commercial software market. The proliferation of high-quality, AI-enhanced open-source alternatives is putting significant pressure on proprietary vendors. As companies find they can achieve their software needs through internal development or by leveraging robust open-source solutions, they are becoming more price-sensitive and demanding greater value from commercial offerings.

This is forcing vendors to innovate not only in terms of features but also in their business models, with a greater emphasis on value-added services such as consulting, support, and integration expertise. Strategic partnerships and collaboration with open-source communities will also become crucial for commercial vendors to remain competitive.

Commercial software vendors will need to adapt to this shift by offering their functionalities as discoverable services via protocols like MCP. Instead of selling large, complex products, they might provide specialized services that can be easily integrated into other applications. This could lead to new business models centered around providing best-in-class, composable AI capabilities.

Specifically, this shift is changing priorities and value perceptions. As open-source alternatives become more competitive, vendors will likely lean further into those value-added services, while buyers place a greater emphasis on software that can be easily customized and integrated with their existing systems, driving demand for more flexible and modular solutions.

Furthermore, commercial vendors may need to explore strategic partnerships and collaborations with open-source communities to remain competitive and utilize the collective intelligence of the open-source ecosystem.

Overall, AI-driven development has the potential to transform the software landscape, creating a more level playing field for open-source projects and putting significant pressure on the traditional commercial software market. Companies will likely need to adapt their strategies and offerings to remain competitive in this evolving environment.

5. The Impact on the Open-Source Ecosystem

The open-source ecosystem is experiencing a significant transformation driven by AI. AI-powered tools are not only lowering the barriers to contribution, making it easier for developers to participate and contribute, but they are also fundamentally changing the competitive landscape.

Specifically, AI fuels the creation of more robust, feature-rich, and well-maintained open-source software, making these projects even more viable alternatives to commercial offerings. Businesses, especially those sensitive to cost, will have more compelling free options to consider. This acceleration is leading to faster feature parity, where AI could enable open-source projects to rapidly catch up to or even surpass the feature sets of commercial software in certain domains, further reducing the perceived value proposition of paid solutions.

Moreover, the ability for companies to customize open-source software using AI tools could eliminate the need for costly customization services offered by commercial vendors, potentially resulting in customization at zero cost. The agility and flexibility of open-source development, aided by AI, enable quick innovation and experimentation, allowing companies to try new features and technologies more quickly and potentially reducing their reliance on proprietary software that might not be able to keep pace.

AI tools can also help expose open-source components as discoverable services, making them even more accessible and reusable. This can further accelerate the development and adoption of open-source software, as companies can easily integrate these services into their own applications.

Furthermore, the vibrant and collaborative nature of open-source communities, combined with AI tools, provides companies with access to a vast pool of expertise and support at no additional cost. This is accelerating the development cycle, improving code quality, and fostering an even more collaborative and innovative environment. As open-source projects become more mature and feature-rich, they present an increasingly compelling alternative to commercial software, further fueling the shift away from traditional proprietary solutions.

6. The Changing "Build Versus Buy" Calculus

Ultimately, the rise of AI in software development is driving a fundamental shift in the "build versus buy" calculus. The rise of composable architectures means that 'building' now often entails assembling and orchestrating existing services, rather than developing everything from scratch. This dramatically lowers the barrier to entry and makes building tailored solutions even more cost-effective.

Companies are finding that building their own tailored solutions, often on cloud infrastructure, is becoming increasingly cost-effective and strategically advantageous. The ability for companies to customize open-source software using AI could eliminate the need for costly customization services offered by commercial vendors.

Innovation and experimentation in open-source, aided by AI, could further reduce reliance on proprietary software. Robotic Process Automation (RPA) bots can also be exposed as services via MCP, allowing companies to integrate automated tasks into their workflows more easily. This further enhances the 'build' option, as businesses can employ pre-built RPA services to automate repetitive processes.

7. Cloud vs. On-Premise: A Re-evaluation

The potential for AI-driven, easier on-premise app development could indeed have significant implications for the cloud versus on-premise landscape, potentially leading to a shift in reliance on big cloud applications like Salesforce.

There's potential for reduced reliance on big cloud apps. If AI tools drastically simplify and accelerate the development of custom on-premise applications, companies that previously opted for cloud solutions due to the complexity and cost of in-house development might reconsider. They could build tailored solutions that precisely meet their unique needs without the ongoing subscription costs and potential vendor lock-in associated with large cloud platforms.

Furthermore, for organizations with strict data sovereignty requirements, regulatory constraints, or internal policies favoring on-premise control, the ability to easily build and maintain their own applications could be a major advantage. They could retain complete control over their data and infrastructure, addressing concerns that might have pushed them towards cloud solutions despite these preferences.

While cloud platforms offer extensive customization, truly bespoke requirements or deep integration with legacy on-premise systems can sometimes be challenging or costly to achieve. AI-powered development could empower companies to build on-premise applications that seamlessly integrate with their existing infrastructure and are precisely tailored to their workflows.

Composable architectures can also make on-premise development more manageable. Instead of building large, monolithic applications, companies can assemble smaller, more manageable services. This can reduce the complexity of on-premise development and make it a more viable option.

Additionally, while the initial investment in on-premise infrastructure and development might still be significant, the elimination of recurring subscription fees for large cloud platforms could lead to lower total cost of ownership (TCO) over the long term, especially for organizations with stable and predictable needs.

Finally, some organizations have security concerns related to storing sensitive data in the cloud, even with robust security measures in place. The ability to develop and host applications on their own infrastructure might offer a greater sense of control and potentially address these concerns, even if the actual security posture depends heavily on their internal capabilities.

However, several factors might limit the shift away from big cloud apps:

The "As-a-Service" Value Proposition

Cloud platforms like Salesforce offer more than just the application itself. They provide a comprehensive suite of services, including infrastructure management, scalability, security updates, platform maintenance, and often a rich ecosystem of integrations and third-party apps. Building and maintaining all of this in-house, even with AI assistance, could still be a significant undertaking.

Moreover, major cloud vendors invest heavily in research and development, constantly adding new features and capabilities, often leveraging cutting-edge AI themselves. This pace of innovation in the cloud might be difficult for on-premise development to match, even with AI tools.

Cloud platforms are inherently designed for scalability and elasticity, allowing businesses to easily adjust resources based on demand. Replicating this level of flexibility on-premise can be complex and expensive. Many companies prefer to focus on their core business activities rather than managing IT infrastructure and application development, even if AI makes it easier; the "as-a-service" model offloads this burden.

Large cloud platforms often have vibrant ecosystems of developers, partners, and a wealth of documentation and community support. Building an equivalent internal ecosystem for on-premise development could be challenging. Some advanced features, particularly those leveraging large-scale data analytics and AI capabilities offered by the cloud providers themselves, might be difficult or impossible to replicate effectively on-premise.

Cloud providers might also shift towards offering more granular, composable services that can be easily integrated into various applications. This would allow companies to leverage the cloud's scalability and infrastructure while still maintaining flexibility and control over their applications.

Therefore, a more likely scenario might be the rise of hybrid approaches, where companies use AI to build custom on-premise applications for specific, sensitive, or highly customized needs, while still relying on cloud platforms for other functions like CRM, marketing automation, and general productivity tools.

While the advent of AI tools that simplify on-premise application development could certainly empower more companies to build their own solutions and potentially reduce their reliance on monolithic cloud applications like Salesforce, a complete exodus is unlikely. The value proposition of cloud platforms extends beyond just the software itself to encompass infrastructure management, scalability, innovation, and ecosystem.

Companies will likely weigh the benefits of greater control and customization offered by on-premise solutions against the convenience, scalability, and breadth of services provided by the cloud. We might see a more fragmented landscape where companies strategically choose the deployment model that best fits their specific needs and capabilities.

8. The AI-Driven Software Revolution: A Summary

The integration of advanced AI into software development is poised to trigger a profound shift, fundamentally altering how software is created, acquired, and utilized. This shift is characterized by:

1. Exponential Increase in Productivity and Innovation:

AI as a Force Multiplier: AI tools are drastically increasing developer productivity, potentially enabling the same output with half the staff or even leading to a fivefold increase in efficiency in the near term.

Cambrian Explosion of Software: This surge in productivity will likely lead to an explosion of new software products, features, and niche applications, democratizing software creation and lowering the barrier to entry.

Rapid Iteration and Personalization: Software will evolve at an unprecedented pace, with AI facilitating fast feature iteration, updates, and highly personalized user experiences. This will often involve complex LLM workflow orchestration to manage and coordinate the various AI-driven processes.

This impact will be felt across various types of software, from enterprise solutions to consumer apps, scientific tools, and embedded systems. The effectiveness of these AI tools relies heavily on the quality of their training data, and the ability to analyze user data will drive further innovation and personalization.

We must also consider the sustainability implications, including the energy consumption of AI models and the potential for AI-driven software to promote resource efficiency in other sectors. These changes are not static; they are part of a dynamic and rapidly evolving landscape. Tools like GitHub Copilot and AWS CodeWhisperer are already demonstrating the power of AI in modern development workflows.

2. Transformation of the Software Development Landscape:

Evolving Roles: The traditional role of a "coder" will diminish, with remaining developers focusing on AI prompt engineering; system architecture, including the design and management of complex LLM workflow orchestration; integration and service orchestration; MCP management; quality assurance; and ethical considerations.

This shift is particularly evident in the rise of vibe coding. More significantly, we're moving towards a one-to-many model where a single subject matter expert (SME) or senior engineer will manage and direct many LLM coding agents, each working on different parts of a project. This orchestration of AI agents will dramatically amplify the impact of senior engineers, allowing them to oversee and guide complex projects with unprecedented efficiency.

AI-Native Companies: New companies built around AI-driven development processes will emerge, potentially disrupting established software giants.

Democratization of Creation: Individuals in non-technical roles will become "citizen developers," creating and customizing software with AI assistance.

3. Broader Economic and Societal Impacts:

Automation Across Industries: The ease of creating custom software will accelerate automation in all sectors, leading to increased productivity but also potential job displacement.

Lower Software Costs: Development cost reductions will translate to lower software prices, making powerful tools more accessible.

New Business Models: New ways to monetize software will emerge, such as LLM features, data analytics, integration services, and specialized composable services offered via MCP.

Workforce Transformation: Educational institutions will need to adapt to train a workforce for skills like AI ethics, prompt engineering, and high-level system design.

Ethical and Security Concerns: Increased reliance on AI raises ethical concerns about bias, privacy, and security vulnerabilities. This includes the challenges of handling sensitive data when using AI tools.

4. Implications for Purchasing Software Today:

Short-Term vs. Long-Term: Businesses must balance immediate needs with the potential for cheaper and better AI-driven alternatives in the future.

Flexibility and Scalability: Prioritizing flexible, scalable, and cloud-based solutions is crucial.

Avoiding Lock-In: Companies should be cautious about long-term contracts and proprietary solutions that might become outdated quickly.

5. Google Firebase Studio as an Example:

AI-Powered Development: Firebase Studio's integration of Gemini and AI agents for prototyping, feature development, and code assistance exemplifies the trend towards AI-driven development environments.

Rapid Prototyping and Iteration: The ability to create functional prototypes from prompts and iterate quickly with AI support validates the potential for an explosion of new software offerings.

In essence, the AI-driven software revolution represents a fundamental shift in the "build versus buy" calculus, empowering businesses and individuals to create tailored solutions more efficiently and affordably. While challenges exist, the long-term trend points towards a more open, flexible, and dynamic software ecosystem. It's important to remember that AI is a tool that amplifies human capabilities, and human ingenuity will remain at the core of software innovation.

9. Conclusion: A More Open and Dynamic Software Ecosystem

In conclusion, the advancements in AI are ushering in an era of unprecedented change in software development. This transformation promises to democratize software creation, accelerate innovation, and empower businesses to build highly customized solutions. While challenges remain, the long-term trend suggests a move towards a more open, composable, flexible, and user-centric software ecosystem, increasingly driven by discoverable services. Furthermore, the pace of these changes is likely to accelerate, making adaptability and continuous learning crucial for both businesses and individuals.

2025 01 31 prompt guide

Background

While this post is mainly intended for users who want to write custom prompts for agentic interpretation of various trading indicators, fundamental and macroeconomic analyses, and other relevant inputs, the information contained within this post can be informative for other varieties of LLM prompting.

General Notes

I’ve found that there is a generally good ‘order’ to doing prompts.

  1. General background information that needs to be included somewhere
  2. Specific data and analysis unique to this specific instance of prediction
  3. Reminders and anti-hallucination hints

The reasoning for this is that the most recent tokens in general have the most impact on the output when there is a conflict. So, we want things that are preventing hallucinations and critical operational information to be the most recent tokens, otherwise they could be outweighed by less relevant information that could trigger a hallucination or other response malformation.
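As a minimal sketch of that ordering (the section contents below are placeholders, not a recommended wording), you can assemble the prompt so the reminders always land last:

```python
def build_prompt(background: str, instance_data: str, reminders: str) -> str:
    """Assemble a prompt so the anti-hallucination reminders are the most recent tokens."""
    return "\n\n".join([
        background.strip(),     # 1. general background that needs to be included somewhere
        instance_data.strip(),  # 2. data and analysis unique to this specific prediction
        reminders.strip(),      # 3. reminders and anti-hallucination hints, deliberately last
    ])

prompt = build_prompt(
    background="You are analyzing the output of a model that makes a buy / don't-buy decision.",
    instance_data="Latest value 0.41 (min 0.354, max 0.532, mean 0.401).",
    reminders="Use only the numbers provided above. Do not invent additional metrics.",
)
print(prompt)
```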

For background information, a lot of it could be viewed as ‘reminders’. Most of this information should exist somewhere in the training data/saved weights. We just want to make sure that the critical stuff is at the ‘forefront of the mind’ so to speak. The other reason to add this information is to remove any ambiguity that might exist in the prompt. I’ll go more into specific examples later.

For specific data and analysis, try not to be too wordy/verbose. The last thing you want to have is the LLM clinging on to the linguistic flavor that you included. THESE MODELS ARE NOT PEOPLE and you would do very well to remember that. You can have some oddly formatted or abrupt sentences as long as all of the information included within is accurate and correctly formed. Words like “Please” and “Thank You” are just going to be extra tokens adding spacing between related tokens. Sometimes these agents can get a little lost and think that they are having a conversation with you. That is not what you want - all the information it outputs should be entirely mission focused. If your agent is saying things like “I’m sorry but” or “Certainly, I can” then you likely need to clamp down on your prompt. These outputs are being fed into other automated systems, which don’t need that information, and these kinds of outputs can even cause errors depending on the system consuming the data.

Additionally, it is important to remember that LLMs ARE NOT A FORM OF ARTIFICIAL GENERAL INTELLIGENCE. LLMs are extremely advanced token prediction models. There are actual benchmarks and tests for AGI that LLMs are not even close to meeting, and the actual experts in the field are unanimous in their agreement that LLMs are not AGI. No matter what you hear from a CEO or tech influencer, remember this information. Chain of thought, or whatever new paradigm is being pushed, is not the model actually thinking. These models are not capable of actual thought and they are not sentient or sapient. Whenever you are trying to figure out if an LLM can solve a problem or why it is failing to solve a problem, you should ask yourself if a highly advanced token prediction system can actually solve the problem. If the answer is no, then the LLM is not going to be able to do it. This guide was written in 2025, and while there will likely be amazing advances in the future, I can say with confidence that LLMs are not anywhere close to breaking the boundaries of AGI. If this AGI boundary is actually broken, then this prompting guide will be wholly obsolete and none of the information contained within will be relevant. So, if you are referencing this guide to solve a problem, then this section is still completely accurate. The entirety of this document is written with the understanding of this section as being an absolute, non-arguable axiom, and if it were not true then this document would be worthless. While some readers might view this section as strange/obvious/irrelevant, I can assure you that some people using this guide to write prompts absolutely require this information.

For the following specific examples and analysis, the prompt is formatted with an underline, while my analysis and suggestions are inserted between specific sections of the prompt. This is so that the information is better localized for your reading. If you want to know what the original prompt was, just read the underlined sections, skipping any text that is not underlined.


You are running an analysis on the output of an advanced deep learning model that is making a buy or don't buy decision for this stock.

We start out here with the basic background information that sets the stage for what the model is doing and why. In a standard human-to-human conversation, this information makes sense as the first part of the conversation, so it is likely that the training data for the model follows this kind of linguistic flow. Furthermore, it is pretty unlikely that hallucinations or any other information included within this prompt would take the model off task here. Most of the hallucinations I see involve misattributing data points between metrics or adding in extra information. If you end up having an issue with this (likely due to having an extremely long prompt), then a follow-up reminder at the end is your best bet. Even in that event, I'd still recommend leaving this sentence in at the start, both for the reasons described above and because an issue severe enough to warrant additional action is worth guarding against in more than one place.

The model gives a value between 0 and 1, with 0 being a low confidence for a buy and 1 being a high confidence for a buy. This model is shown to have a 60% accuracy on back test data. The most recent value from the model is the most important indicator, as older values are for times in the past. This historical change in the model's output is included for informative purposes, as an increasing value indicates that market conditions are increasingly favorable for a buy. When making your analysis, don't get confused by the min, max, and mean values provided. These are to be considered in comparison to the most recent value, which is the value you should be considering. In general, a value above a 0.5 is considered a buy signal and a value below 0.5 is considered to not be a buy signal.

Now we have our information on how to actually interpret the results of the model. My first critique of this section (which for the record I wrote) is that the 0 and 1 explanations are ambiguous and likely to cause issues. This observation is backed up by empirical evidence as well, as I’ve seen this model list values of 0.4 as “high confidence”. A better formulation might be:

“0 is the lowest confidence in being a good buy, and 1 is the highest confidence for a good buy. Values below 0.5 are evaluated by training and validation algorithms as not containing a buy signal, while values above 0.5 are considered to be buy signals. By this formulation, taking all values above 0.5 as being a buy gives a 60% accuracy on back test data.”

By doing this we are also placing the information about the 0.5 boundary closer to the information on how to evaluate the range between 0 and 1, which is likely to give us better results. Furthermore, as this is some of the most critical information on how to interpret the model, I would recommend moving it lower in the prompt, especially if the agent is making mistakes in analyzing if individual predictions are good buys or not good buys.

We also have in this section the disclaimer about the historical values that we include. This was originally included as the agent was thinking that some of those historical and aggregate values represented the actual prediction of the model. We included that data to allow the agent to look for outliers and trends. If we see a model sitting around 0.35 for a period of time, and then our most recent value is a 0.49, one might conclude that market conditions had drastically changed, and it might be a good time to buy even though we haven’t crossed the 0.5 threshold. On the other hand, if we’ve been oscillating between 0.48 and 0.52, a value of 0.49 is much less significant. All of this information would be lost if we simply provided the most recent value, as both situations would be showing a 0.49. You might also wonder why we don’t just build out a mathematical model that makes this decision for us. While this is often a good approach when dealing with LLM models, this situation is one filled with ambiguity and judgement calls. What if rather than a value of 0.35, it had been a value of 0.38/0.4/0.45? Where does the cutoff exist? One could use fuzzy logic or a similar approach, but one could also just leave this decision up to the agent. If something has an objective, mathematical, modelable, right/wrong answer, then you should use that mathematical model to give you an answer instead of the LLM agent. If something needs a judgement call and you cannot determine the right answer until after a prediction is made, then the LLM agent is more appropriate.
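As a sketch of that division of labor (the numbers and field names are illustrative, echoing the example that follows), compute everything that has an objective answer in ordinary code and hand only the judgment call to the agent:

```python
from statistics import mean

def summarize_signal(history: list[float], threshold: float = 0.5) -> dict:
    """Deterministic facts the LLM should never be asked to re-derive."""
    latest = history[-1]
    return {
        "latest": round(latest, 3),
        "min": round(min(history), 3),
        "max": round(max(history), 3),
        "mean": round(mean(history), 3),
        "above_threshold": latest > threshold,  # the objective buy-signal check
    }

facts = summarize_signal([0.36, 0.35, 0.37, 0.41])

# Only the judgment call ("is a sub-threshold jump still meaningful?") goes to the agent.
prompt_fragment = (
    f"Latest value {facts['latest']} (min {facts['min']}, max {facts['max']}, "
    f"mean {facts['mean']}). The 0.5 threshold was "
    f"{'crossed' if facts['above_threshold'] else 'not crossed'}. "
    "Judge whether the recent trend makes this a meaningful change in market conditions."
)
print(prompt_fragment)
```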

Analysis of ensemble_output: min=0.3541967868804931, max=0.5319923162460327, mean=0.40127307176589966. Regression coef=[-0.00171258], score=0.07836451998035432, direction=decreasing. Latest value=0.4099920988082886.

Now we get to the actual information for this specific prediction. Everything up to this point is going to be identical for every single prediction we make. The first thing of note is all those significant digits on all of the numeric values. For Llama 3.2, the string "min=0.3541967868804931" has 11 tokens, while “min=0.354” has 5 tokens. Are all of those extra digits actually meaningful for our prediction? While you might be asking yourself if this matters, consider that "min=0.3541967868804931, max=0.5319923162460327, mean=0.40127307176589966" and "Wow! we have a min value of 0.354, a max value of 0.532, and a mean (not angry mean tho) value of 0.401" both have 35 tokens using Llama 3.2. If you saw that second string, wouldn’t you go in and trim out all that useless extra text? None of those extra tokens provide any meaningful value to our analysis, and only serve to dilute and separate information. Meanwhile, "min of 0.354, max of 0.532, mean of 0.401" has 18 tokens, with a reduction of almost 50%.

What about “Regression coef=[-0.00171258]”? For just 1 more token, we could instead have "Regression coef of -0.002, indicating a downward trend". Which one of those strings contains more useful information? And with all those savings in token count, we could turn "score=0.07836451998035432" (with a token count of 12) into "and an r squared score of 0.078, indicating that this data is not very linear in nature" (with a token count of 21). What a difference in interpretability and information density 9 tokens can make!
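If you want to check token counts yourself, here is a minimal sketch using Hugging Face transformers. It assumes you have access to a Llama 3.2 tokenizer (the official meta-llama repositories are gated); any modern tokenizer will show the same pattern, though the exact counts quoted above are specific to Llama 3.2.

```python
from transformers import AutoTokenizer

# Assumes access to the (gated) Llama 3.2 tokenizer; substitute any tokenizer
# you have locally for a rough comparison -- exact counts will differ.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

candidates = [
    "min=0.3541967868804931, max=0.5319923162460327, mean=0.40127307176589966",
    "min of 0.354, max of 0.532, mean of 0.401",
    "Regression coef=[-0.00171258]",
    "Regression coef of -0.002, indicating a downward trend",
]

for text in candidates:
    n = len(tok.encode(text, add_special_tokens=False))
    print(f"{n:3d} tokens | {text}")
```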

That's all for now.

2025 01 30 deepseek

Just a quick heads up: as early as Monday afternoon, the DeepSeek-R1 reasoning model was available to all of our agent technology licensees.


2025 01 28 deepseek plausible

DeepSeek's Key Innovations: A Brief Analysis

1. FP8 Mixed Precision Training
  • What it does: Reduces memory and compute requirements by representing numbers with 8-bit precision instead of 16-bit (FP16) or 32-bit (FP32).
  • Impact: FP8 mixed precision training nearly doubles throughput on H800 GPUs for tensor operations like matrix multiplications, which are central to transformer workloads. The Hopper architecture’s Tensor Cores are designed for FP8 precision, making it highly effective for large-scale deep learning tasks that require both computational efficiency and high throughput.
  • Estimated Gain: ~1.8x performance boost, critical for achieving high token throughput.
2. MoE Architecture
  • What it does: Activates only 37B parameters per token out of 671B, significantly reducing the compute cost for forward and backward passes.
  • Impact: Sparse activation significantly reduces computational overhead without compromising representational power.
  • Estimated Gain: 5–10x improvement in compute efficiency compared to dense architectures (a rough version of this comparison is sketched in code after this list).
3. Auxiliary-Loss-Free Load Balancing
  • What it does: Eliminates the auxiliary-loss typically used to balance expert activation in MoE, reducing inefficiencies and avoiding performance degradation.
  • Impact: Improves token processing efficiency without wasting GPU cycles on balancing overhead.
  • Estimated Gain: ~5–10% boost in efficiency, depending on the prior impact of auxiliary losses.
4. Multi-Token Prediction (MTP)
  • What it does: Predicts two tokens per forward pass instead of one, reducing the number of forward passes required for training and decoding. The speculative decoding framework validates the second token, with an acceptance rate of 85–90%.
  • Impact:
    • Fewer forward passes accelerate training, improving throughput.
    • The high acceptance rate ensures minimal overhead from corrections.
  • Estimated Gain: ~1.8x improvement in token processing efficiency, depending on model configuration and workload.
5. Communication-Compute Overlap
  • What it does: Optimizes distributed training by overlapping inter-GPU communication with computation, addressing a common bottleneck in large-scale MoE systems.
  • Impact: Removes inefficiencies that typically reduce utilization in cross-node setups.
  • Estimated Gain: Allows near-100% utilization of GPU capacity during training.
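As a back-of-the-envelope comparison of dense versus sparse (MoE) training compute, here is a short sketch using the common approximation of roughly 6 FLOPs per active parameter per training token. The 671B total and 37B active parameter counts come from the list above; the naive ratio it prints overstates the practical gain, which is why the estimate above is 5–10x rather than ~18x.

```python
# Back-of-the-envelope training-compute comparison: dense vs. MoE sparse activation.
# Uses the common ~6 FLOPs per (active) parameter per training token approximation.
TOTAL_PARAMS = 671e9    # all expert parameters (DeepSeek-V3 total)
ACTIVE_PARAMS = 37e9    # parameters activated per token
TOKENS = 14.8e12        # pretraining tokens

flops_dense_equivalent = 6 * TOTAL_PARAMS * TOKENS   # if every parameter were active
flops_moe = 6 * ACTIVE_PARAMS * TOKENS               # with sparse activation

print(f"Dense-equivalent compute: {flops_dense_equivalent:.2e} FLOPs")
print(f"MoE (sparse) compute:     {flops_moe:.2e} FLOPs")
print(f"Naive reduction factor:   {flops_dense_equivalent / flops_moe:.1f}x")
# Prints roughly 18x; the practical 5-10x figure above is lower because a
# quality-matched dense model would have fewer than 671B total parameters and
# MoE adds routing and communication overhead.
```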

Hardware Considerations

DeepSeek trained its model on a cluster of 2,048 H800 GPUs, leveraging Nvidia's Hopper architecture. These GPUs are designed to excel at tasks like matrix multiplications and sparse attention, particularly when using FP8 mixed precision. While the H800 has lower interconnect bandwidth compared to the H100 due to export regulations, its computational efficiency remains strong for the kinds of workloads needed in large-scale AI training.

Token Throughput Calculation

Using their stated figures, let’s verify whether the throughput aligns with the claimed GPU-hour budget.

  1. Throughput per GPU-Hour:

    • Tokens Processed: 14.8 trillion tokens.
    • GPU-Hours Used: 2.664 million.
    • Tokens per GPU-Hour:

\(\frac{14.8 \, \text{trillion tokens}}{2.664 \, \text{million GPU-hours}} = 5.56 \, \text{million tokens per GPU-hour}.\)

  2. Cluster Throughput:
    • GPUs Used: 2,048.
    • Tokens Processed per Hour:

\(2,048 \, \text{GPUs} \times 5.56 \, \text{million tokens per GPU-hour} = 11.38 \, \text{billion tokens per hour}.\)

  3. Time to Process 14.8T Tokens:
    • Total Time:

\(\frac{14.8 \, \text{trillion tokens}}{11.38 \, \text{billion tokens per hour}} = 1,300 \, \text{hours (or ~54 days)}.\)

This aligns with their claim of completing pretraining in less than two months.
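The same arithmetic as a quick, runnable sanity check (all figures are the reported ones quoted above):

```python
# Sanity check of the reported pretraining budget.
tokens = 14.8e12        # total training tokens
gpu_hours = 2.664e6     # reported GPU-hours
gpus = 2048             # H800 GPUs in the cluster

tokens_per_gpu_hour = tokens / gpu_hours              # ~5.56 million
cluster_tokens_per_hour = gpus * tokens_per_gpu_hour  # ~11.38 billion
total_hours = tokens / cluster_tokens_per_hour        # ~1,300 hours

print(f"{tokens_per_gpu_hour / 1e6:.2f} M tokens per GPU-hour")
print(f"{cluster_tokens_per_hour / 1e9:.2f} B tokens per hour across the cluster")
print(f"{total_hours:.0f} hours, or about {total_hours / 24:.0f} days of wall-clock time")
```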


Conclusion

DeepSeek-V3’s claim of processing 14.8T tokens in 2.664M GPU-hours is plausible. The numbers are internally consistent, and the described techniques align with established principles of efficient large-scale training. While reproduction by other labs will provide final confirmation, the absence of red flags suggests that DeepSeek's reported achievements are feasible.

For more details on DeepSeek-V3 and its training methodology, refer to the technical report on arXiv.


Qualifications: Through my work on the LIT platform, I’ve developed tools that enable data scientists to efficiently design, train, and deploy deep learning models, including advanced workflows for LLMs. Prior to that, I spent 8 years providing professional services in deep learning, building custom AI solutions across diverse industries. In support of that work I’ve read and analyzed hundreds of technical reports and academic papers. My expertise lies in building tooling, pipelines, and integrations for both predictive and generative AI, supported by a strong foundation in deep learning and software engineering.

What is AI? A human-written breakdown of AI.

"AI" is a neat concept, and it is a concept that grows more exciting and complex every single day. Literally, if you work with AI, and fell asleep for a week, you would end up "behind the curve". This document is meant to be a gentle introduction to what AI is, how it works, and if you're interested-- how to get involved. Also, this document is 100% human written and sourced. I personally have reviewed the source material myself to make sure that it is APPLICABLE and ACCURATE.

The reason why I do this, and why it is important to have a human-written digest of what AI is, is because when you source information from an AI, there is a chance that you can get back information that doesn't apply, isn't accurate, or in some cases, AI can create entire source material that doesn't even exist. (And we'll talk about hallucinations later, or if it really interests you, click that link to go there now.)

But come back! There's lots of important material here that will give you an idea of how AI can behave and what you can do to avoid the pitfalls. Understanding AI helps to frame its behavior so that you know what to expect.

The Basics: How Did the Idea of AI evolve?

It's really hard to pinpoint the exact genesis of the idea of AI, but it has been around roughly as long as computing. The idea of directing a computer to think like a human, or to internalize a strict set of rules and behave by them, has probably been around since the early 1900s, but Alan Turing theorized that it was possible to create artificial intelligence around 1935. The idea really took off from there, and the field (and industry) began to blossom in the 1950s with the release of his 1950 paper "Computing Machinery and Intelligence", in which Turing unveiled the idea of the "Turing Test". It was at this point that people really began to think it was possible to make a machine that not only exhibited intelligence, but could fool a human interrogator more than 50% of the time. As you have probably already imagined, this idea, born in 1950, still reverberates through AI to this very day, and is something data scientists still consider when creating or training an AI.

The crucial limiting problem was that machines of the time couldn't store any information. They would read in giant stacks of cards with the programming and data on them and perform operations as the data moved through the computer. How can you create an AI that can modify its own behavior or be trained if it cannot "remember" what has happened in the past? Second to that was the immense expense of running a computer able to perform even the most basic operations-- on the order of $200,000 per month in 50s bucks. That's a staggering expense, but data scientists of the time were not dissuaded. The industry of computing was moving so fast that a solution was bound to present itself.

In 1955, the RAND Corporation introduced a program called "Logic Theorist". It was designed to mimic the problem-solving behavior of a human by proving mathematical theorems, and to do it faster than humans could. It was able to prove 38 out of 52 theorems from Whitehead and Russell's "Principia Mathematica". It is largely considered the first actual application of artificial intelligence on a real computer at a real scale, and it introduced the ideas of heuristic programming, AI as a field, and the Information Processing Language. If you want to learn all there is to know about "Logic Theorist", tap on this link to leave the site and see the most complete archive of this work: Logic Theorist

This is a lot.

We're just getting started and 3 new fields of study have been created and a program written to demonstrate that AI can be accomplished. This is probably going to be a "living document", meaning that it will grow and change over time. There's just so much to write, and so little time to educate everyone. But look... buckle up and we'll keep it as low-key as we can because AI is likely the most complex and growing field of study that humans now have. There are so many threads to follow and so many things to try-- but as you'll see later, we're very held back by the amount of computing power that an individual or even a corporation can get their hands on! Today, in late 2024, if you wanted to get properly started in the field of AI, the investment can slide in at $200,000.... what it used to cost for a month of computing time in 1950.

There are amazing things that a committed data scientist can do to assemble a workable system for a lot less, but as you can imagine, the tradeoff is computing speed and response time of a particular model.

Let's Go!

From the late '50s through the mid '70s, work on AI grew at an amazing pace because computers finally had the ability to store data (starting in the late '50s), which was integral to AI's growth. Being able to store results and react to a changing environment was obviously a game-changer. Progress now was never going to stop. The gentlemen who created Logic Theorist, not to be stopped, released "General Problem Solver"; Joseph Weizenbaum released "ELIZA"; and then Kenneth Colby released "PARRY", which he described as "ELIZA, but with attitude". That was in 1972. Then DARPA got involved and decided to start funding the future of AI... but by the mid '70s, data scientists were already starting to figure out that the overwhelming need for computing power was always going to be a sticking point-- computers simply didn't have the storage or processing power to digest the mountains of data. DARPA began to realize that they were putting a lot of money into a science that still needed the technology to catch up, and the funding dwindled as we arrive in the '80s.

Scientists in the '80s were able to enhance the algorithmic toolkit they used to mimic human intelligence, and those algorithms were optimized and began running faster on newer emerging hardware. Corporations and government entities turned the funding hose back on, and work on AI accelerated again. Edward Feigenbaum introduced "expert systems", applications that leveraged computational power to mirror the decision-making processes of a human expert.

It was astounding, as expert systems would directly ingest answers from an expert on a topic and could then help other users find those accurate answers. Non-experts could be quickly educated by the AI, and the answers of a topic specialist could be quickly given to many non-experts in a one-to-many style.

Even the Japanese government got involved, deeply funding expert systems and related projects to the tune of $400 million 1980s bucks through its "Fifth Generation Computer Project", colloquially known as the FGCP. This project was funded from 1982 to 1990.

The moonshot goals of the FGCP weren't met, but the result was more scientists in the industry. Nevertheless, AI was no longer "in", though the science wasn't finished. AI thrived in the '90s and 2000s despite the lack of significant funding, simply because of the dedicated scientists who really believed they could make AI a reality.

For example, Garry Kasparov, the world-renowned chess grandmaster, played a series of games against IBM's Deep Blue, an AI system designed ONLY to play chess. Kasparov lost, and AI regained the focus of the people who wrote the checks.

"Dragon Simply Speaking" was a voice-to-text software that could be used to live dictate notes or whole pages or chapters of a book. Simply Speaking used a primitive form of AI to determine the likely word you are trying to say-- which dramatically increased its efficiency when working with persons with speed impediments, and this software is beloved to this day.

Cynthia Breazeal introduced "Kismet", a robot that used AI to understand and simulate human emotions. Even in the 1990s, toys were a target of AI. I used to own a great toy called "20Q". It was a small ball that could fit in the palm of your hand. It had buttons for YES and NO and SELECT. The point of the game is.... 20 questions. The AI in the game was tasked with asking you 20 questions to determine an item you were thinking about... and it was EERILY accurate. I had games that lasted 3 questions.... and also games that lasted 25. If the AI couldn't figure it out in 20, it would kindly assist itself by adding on more questions to figure out your word.

AlphaGo was developed by Google's DeepMind to play a far more challenging game against a world-champion Go player. The game of Go is outside the scope of this document, but if you want to appreciate how monumental it was that AlphaGo was able to beat a human player, go learn about Go and then come back and enjoy our content about AlphaGo.

Of course there's so much stuff in the middle that I missed and people I didn't recognize for their monumental contributions to the field of Artificial Intelligence.... but the great thing about this document is that you can always come back to focus on something you would like to know about or something you missed. I will do my best to include the most accurate data.

TODAY we live in an age where AI touches every part of the world around us. Telephone calls are made and answered by AI, AI that understands your frustration by the tone of your voice. AI that tries to resolve that frustration. You may not even need to talk to a human now with a properly trained and configured model to handle the calls. I once encountered a complex problem with my taxes and asked the AI what to do and it immediately recognized my problem and told me who to call and what to change to make my taxes legal-- and I have to say, I was impressed. A problem I thought might take hours and up to 10 contacts was resolved without a phone call. This is the power of AI.

Take a deep breath.

If you feel like this is moving really fast-- that's the field of AI. Have a break. Have some water and some toast. Then come back and I'll start teaching you some of the terminology and standard knowledge for working in AI. If there is something missing or incorrect, I always want to know. In most cases you won't even have to cite the source of my error. Just tell me where it is and I'll do the lifting for everyone after that.

Terminology - You gotta know how to speak the lingo

When data scientists talk, they will reference a lot of common things all the time. One of them you may have already encountered in my sprawling explanations here:

MODEL - A model is the programming, data, and analysis that makes an AI work and make decisions with little to no human intervention.

DATA SCIENCE - The methods and practices used to apply machine learning techniques and analytics, often guided by subject matter experts, to extract insights from data and improve a model's effectiveness.

MACHINE LEARNING - This is the practice of using the appropriate algorithms to "learn" from massive amounts of data. The AI is able to absorb this corpus and extract patterns from the data that may be of use to the user.

CORPUS - Corpus is the word that is commonly used to describe a "group" of data that is to be ingested to be used by the model. A corpus can contain any type of data as long as the model is able to recognize the data.

What are the types of AI and what do they do?

This is a great question, because it recognizes that not all AI is created the same. Not all AIs are capable of doing the same work. So what is out there? Well, let me lead you through today's most popular types of AI.

There are two general categories a system can fall under. The first is GOALS: AI systems that are created and trained for a specific outcome (a goal). The second is TECHNIQUES: ways of training or teaching a computer to respond to input as though it were human... to replicate human intelligence. It's pretty easy (usually) to place an AI under one of those two categories, but let's see how I do.

Computer Vision

This is a pretty exciting and interesting field of AI. In this type of AI, we train a neural network algorithm to generate or analyze images and image data. We want to help computers learn to "see" objects, even in strange circumstances. We want computers to recognize objects in images and tell us what they are. This is a precursor to real-time vision processing like what an AI robot would need to navigate our complex world.

Generative Adversarial Networks

This is one of my favorite types of AI. You actually train 2 AIs and tell them, "When you disagree, you need to argue with each other until both of you agree on an answer." It's basically small-scale computer warfare. One of the neural networks is the generator and the other is the discriminator. If you say "Give me a photo of a gopher in a crossing guard uniform," the generator produces photos of the gopher and presents them to the discriminator, which asks "does this look real to me?" If not, the discriminator effectively tells the generator, "Do it again. Better this time." GANs have been widely used for image generation and editing, though many of today's popular tools-- such as Photoshop's AI features and Stable Diffusion, which you can run yourself-- are now built on diffusion models rather than GANs.
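
To make the generator-versus-discriminator loop concrete, here is a minimal sketch in Python, assuming PyTorch is available. The toy task (making the generator mimic a 1-D Gaussian distribution), the network sizes, and the learning rates are all illustrative choices of mine, not anything from the text above:

# Minimal GAN sketch: the generator learns to mimic a 1-D Gaussian (mean 4, std 1.25)
# while the discriminator learns to tell real samples from fakes.
# Assumes PyTorch is installed; sizes and learning rates are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

def real_batch(n):
    return torch.randn(n, 1) * 1.25 + 4.0   # samples from the "real" distribution

for step in range(3000):
    # Train the discriminator: real samples should score 1, fakes should score 0.
    real = real_batch(64)
    fake = generator(torch.randn(64, 8)).detach()
    d_loss = loss_fn(discriminator(real), torch.ones(64, 1)) + \
             loss_fn(discriminator(fake), torch.zeros(64, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Train the generator: try to make the discriminator score fakes as 1.
    fake = generator(torch.randn(64, 8))
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

samples = generator(torch.randn(1000, 8))
print("generated mean:", round(samples.mean().item(), 2), "std:", round(samples.std().item(), 2))
# The generated statistics should drift toward the target (4.0 / 1.25) as training runs.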

Machine Learning

With machine learning, we use algorithms that can ingest large amounts of data at once and perform tasks like text or image generation, classification (of many types), and prediction. If you are involved in Machine Learning (ML), there are two broad kinds of learning: supervised and unsupervised. In supervised learning you still ingest a comparable amount of data, but every example comes with a label-- you repeatedly tell the AI the correct classification of items until it finally learns how to do the job on its own. In unsupervised learning, the data is unlabeled and the algorithm has to find structure in it by itself.
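
To make the supervised case concrete, here is a minimal sketch in plain Python: a nearest-centroid classifier "learns" from labeled examples and then classifies new, unlabeled points. The fruit measurements and labels are made up for illustration:

# Supervised learning in miniature: labeled examples (weight in grams, diameter in cm)
# teach a nearest-centroid classifier to tell apples from grapes. Data is made up.
from statistics import mean
from math import dist

labeled_examples = [
    ((150, 7.5), "apple"), ((170, 8.0), "apple"), ((140, 7.2), "apple"),
    ((5, 1.5), "grape"), ((6, 1.7), "grape"), ((4, 1.4), "grape"),
]

# "Training": compute the average point (centroid) of each labeled class.
centroids = {}
for label in {label for _, label in labeled_examples}:
    points = [features for features, l in labeled_examples if l == label]
    centroids[label] = tuple(mean(axis) for axis in zip(*points))

# "Prediction": assign a new, unlabeled fruit to the closest centroid.
def classify(features):
    return min(centroids, key=lambda label: dist(features, centroids[label]))

print(classify((160, 7.8)))  # -> apple
print(classify((5, 1.6)))    # -> grape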

Natural Language Processing

In this type of AI, we use neural network algorithms to look at text data. We can feed a massive corpus of text to an AI that can read and understand the context of the documents we send. NLP underlies most of the AI you have used in the last few years: Microsoft Copilot, ChatGPT, etc. These products, however, often combine NLP models with computer vision models.
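
A tiny first step in NLP is turning raw text into something a model can count. Here is a minimal bag-of-words sketch in Python; the sentences and tokenizer are invented for illustration, and real systems use far richer representations:

# A first step in NLP: turn raw text into token counts (a "bag of words").
# Sentences are made up; real NLP pipelines go far beyond simple counts.
from collections import Counter
import re

corpus = [
    "The gopher wore a crossing guard uniform.",
    "The crossing guard waved at the gopher.",
]

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

vocabulary = sorted({token for doc in corpus for token in tokenize(doc)})

for doc in corpus:
    counts = Counter(tokenize(doc))
    vector = [counts.get(word, 0) for word in vocabulary]
    print(vector)

print(vocabulary)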

Neural Networks

This AI is trained using layers of simple processing units ("neurons") connected in a way meant to loosely mimic the human brain. And while neural networks are the spark behind many AI methods, they require very large amounts of data, computing power, and resources, and therefore aren't recommended for projects that can be accomplished more easily with a different type of AI.

Deep Learning

Deep learning utilizes several different philosophies, and the term itself covers a lot of territory, looping in other disciplines. At its core, it is machine learning done with neural networks. Your models can be either 'shallow' or 'deep' and can contain 1 to MANY layers. A model with more layers is deeper and thus requires more computational effort.
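
To make "deeper means more work" concrete, here is a small Python sketch that counts the weights and biases in a shallow versus a deep fully connected network. The layer sizes are arbitrary choices for illustration:

# Counting parameters in fully connected networks: a layer with n_in inputs and
# n_out outputs has n_in * n_out weights plus n_out biases. Layer sizes are arbitrary.
def parameter_count(layer_sizes):
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

shallow = [784, 32, 10]                  # one hidden layer
deep = [784, 256, 256, 256, 256, 10]     # four hidden layers

print("shallow:", parameter_count(shallow), "parameters")
print("deep:   ", parameter_count(deep), "parameters")

More layers mean more parameters to store and update, which is where the extra computational effort comes from.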

Reinforcement Learning

Reinforcement Learning uses a system of penalties and rewards to train the system. For example, let's say we have an oval track and the objective is for a car to drive itself around the track as fast as possible. We give the car an accelerator, a brake, and a steering system, as well as a rudimentary gear selector containing Park, Reverse, and Drive. Once we have set this all up, we tell the trainer: "You need to keep that car on the center line and go around the track as quickly as possible. If you deviate from the center line, you will lose 40 points per meter of deviation. Keep the car on the center line (the line is somewhere underneath the car as it drives) and we'll give you 100 points per meter." Now you turn this AI loose to train itself. If you watch, the cars start by acting just insane-- reverse at full throttle, then slam it into park, running into walls and gardens-- but over time, the AI learns the juicy secret to keeping those points: drive forward, drive fast, keep the car on the line. Soon you will notice cars moving almost in perfect synchronicity as they speed around the track. That's reinforcement learning. Reinforcement learning can also incorporate human feedback, where people rate or correct the system's behavior and those judgments shape the reward. This is called "Reinforcement Learning from Human Feedback", or RLHF.
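
Here is a minimal sketch of the reward-and-penalty idea using tabular Q-learning on a toy "stay on the center line" problem. The reward values mirror the example above (+100 on the line, -40 per meter off it), but the states, random drift, and learning settings are illustrative assumptions of mine:

# Tabular Q-learning on a toy "keep the car on the center line" task.
# States are lane offsets -2..2 meters, actions steer left/straight/right, and the
# reward mirrors the example above. Drift and learning settings are illustrative.
import random

random.seed(0)
STATES = [-2, -1, 0, 1, 2]
ACTIONS = [-1, 0, 1]            # steer left, go straight, steer right
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def step(state, action):
    drift = random.choice([-1, 0, 1])               # wind, bumps, bad luck
    new_state = max(-2, min(2, state + action + drift))
    reward = 100 if new_state == 0 else -40 * abs(new_state)
    return new_state, reward

alpha, gamma, epsilon = 0.1, 0.9, 0.1
for episode in range(2000):
    state = random.choice(STATES)
    for _ in range(20):
        if random.random() < epsilon:               # explore occasionally
            action = random.choice(ACTIONS)
        else:                                       # otherwise exploit what was learned
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        new_state, reward = step(state, action)
        best_next = max(Q[(new_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = new_state

# After training, the learned policy steers back toward the center line.
for s in STATES:
    best_action = max(ACTIONS, key=lambda a: Q[(s, a)])
    print("offset", s, "-> steer", best_action)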

Artificial General Intelligence

This is the holy grail of data scientists everywhere. If a model is able to reason, think, perceive, and react, it is commonly known as AGI. Data scientists are working towards a goal where AGI allows an AI to reason well enough to create its own solutions to problems from the available data, all without human intervention. AGI is also a very hot topic because there are people who believe that once we achieve AGI, it will not be long before the AI rules or destroys us. There are even people who believe that AGI is already real and being contained by OpenAI, the company behind ChatGPT, one of the most advanced AI systems available to the public. You can easily imagine this to be the case when you get eerily accurate responses from ChatGPT, almost like you're talking to a friend who is ravenous about getting information for you!

History

Alan Turing

Alan Turing, the Father of AI, was an English mathematician and computer scientist. His work at Bletchley Park on breaking the German naval Enigma codes is credited with shortening World War II by several years, and he delivered the core of that amazing feat within about six months of joining the effort.

After the war, Turing joined the National Physical Laboratory, where he designed the Automatic Computing Engine (ACE). It is credited as one of the first designs for a stored-program computer-- a computer that holds an entire program in its own memory, rather than being fed one instruction at a time with nothing retained for later use.

Later, Turing ran into a problem: the Official Secrets Act forbade him from talking about his wartime work, or even explaining the basis of his confidence that such a machine would work, which contributed to delays in starting the ACE project. So in 1947 he took a sabbatical year-- a year which produced the seminal work "Intelligent Machinery", not published in his lifetime. While he was away, the Pilot ACE was built in his absence, and it executed its first program on May 10, 1950. The full version of the ACE was not built until after Turing's death by suicide in 1954.

In 1951, Turing began work in mathematical biology. He published what many call his "masterpiece", "The Chemical Basis of Morphogenesis", in January 1952. He was interested in how patterns and shapes develop in biological organisms. His work led to complex calculations that had to be solved by hand, because the computers of the time were not yet powerful enough to handle them quickly.

Even though "The Chemical Basis of Morphogenesis" was published before the structure of DNA was fully understood, Turing's morphogenesis work is still to this day considered his seminal work in mathematical biology. His understanding of morphogenesis has been relevant all the way to a 2023 study about the growth of chia seeds.

It was Turing's 1950 paper asking if it was possible for a machine to think, and the development of a test to answer that question, that solidifies Turing's spot in the field of artificial intelligence.

The Turing Test

The phrase "Turing Test" is more broadly used when referring to certain kinds of behavioral tests designed by humans to test for presence of mind, thought, or simple intelligence. Philosophically, this idea goes, in part, back to Descartes' Discourse on the Method, back further to even the 1669 writings of the Cartesian de Cordemoy. There is evidence that Turing had already read on Descartes' language test when he wrote the paper that changed the trajectory of mechanical thinking in 1950 with his paper "Computing Machinery and Intelligence" which introduces to us the idea that a machine may be able to exhibit some intelligent behavior that is equivalent or indistinguishable from a human participant..

We certainly can't talk about AI without talking about the Turing Test, a proposal made by Alan Turing in 1950 as a way to deal with the question "Can machines think?" Even Turing, the father of the test, thought the question itself "too meaningless" to deserve discussion.

However, Turing considered a related question: whether a machine could do well at what he called the "Imitation Game". There, from Turing's perspective, we have a philosophical question worth considering.

We could write whole chapters just on Turing himself and the philosophy of the test. There are loads of published works that go far beyond what I could discuss here, but a simple search will bring a vast inventory for you to observe.

What is the Turing Test?

The test is a game involving a machine, a person, and an interrogator. The interrogator asks questions of both the person and the machine. For a machine to "pass", it must imitate a person well enough that, after five minutes of questioning, an average interrogator has no more than a 70 percent chance of correctly identifying which participant is the machine. That's the simple explanation. It gets far more extreme and intense than that. Turing was a genius and thought of a lot of things that laypersons simply would not.

Here's another question. Systems built to pass the Turing Test are essentially chatbots trained to respond in a certain way to the questions and statements we pose to them, and many kinds of chatbots have been created to take the test. The question: are these machines thinking, or are they just really good at assembling responses to our inputs?

By the end of the 20th century, machines were still, by and large, far below the standards Turing imagined. Humans are complex, our challenge-and-response use of language often requires real knowledge, and machines often just couldn't cut the mustard.

A barrage of objections to Turing's theories was lobbed, and Turing's discussions of those objections were complete and thoughtful. It is far beyond the scope of this document to cover all of the contributions Turing made to the field of AI through his careful handling of these objections.

You can look up the objections and answers using the information below:

  • The 'Theological' Objection
  • The 'Heads in the Sand' Objection
  • The 'Mathematical' Objection
  • The Argument from Consciousness
  • Arguments from Various Disabilities
  • Lady Lovelace's Objection
  • Argument from Continuity of the Nervous System
  • Argument from Informality of Behavior
  • Argument from Extra-Sensory Perception

These are amazing-- and to me in particular, the Arguments from Various Disabilities are the most poignant. They argue that a computer may never be able to assess or purposefully exhibit beauty, kindness, resourcefulness, or friendliness, have its own ambition and initiative, have a true sense of humor, and more. It is one of the most solid objections to thinking machines, and it remains a philosophical conundrum to this very day.

Frank Rosenblatt

Frank Rosenblatt, the Father of AI, was an American psychologist who is primarily notable in the field of AI and is sometimes called the "father of deep learning", as he was a pioneer of artificial neural networks.

For his PhD thesis, Rosenblatt designed and built a custom computer, the Electronic Profile Analyzing Computer (EPAC), designed to perform "multidimensional analysis" for psychometrics. Multidimensional analysis, in its simplest form, is the computation of data across two or more categories. Race speeds of drag cars in the left and right lanes across multiple years of races would be data suited to multidimensional analysis. Datasets can extend into many more dimensions, which increases the computational complexity.

Rosenblatt is probably best known for the Perceptron, first demonstrated in 1957, which was built on biological principles and showed an ability to learn from its previous runs. The program was connected to a camera "eye", and when a triangle was held in front of the "eye", the image was sent along a random succession of lines to "response units", where the image of the triangle was registered in memory. The entire process was simulated on an IBM 704 system.

The perceptron was used by the US National Photographic Interpretation Center to develop a useful algorithm that could ease the burden on human photo interpreters.

The Mark I Perceptron had 3 layers. One version of the Mark I was as follows:

  • An array of 400 photocells arranged in a 20x20 grid, named "sensory units", S-Units, or the "input retina". Each S-Unit could connect to up to 40 A-Units.
  • A hidden layer of 512 perceptrons called "association units" or "A-Units".
  • An output layer of 8 perceptrons called "response units" or "R-Units".

The S-Units were randomly assigned to A-Units via a plugboard, meant to eliminate any particular intentional bias in the perceptron; these connections were fixed rather than learned. Rosenblatt designed the machine to closely imitate human visual perception.

The perceptron was held up by the Navy, who expected that it would soon be able to walk, talk, see, write, and reproduce itself, and even reach the apex of AI: being conscious of its own existence. The CIA used the Perceptron to recognize militarily interesting photographs for four years, from 1960 to 1964. However, it became clear that single-layer perceptrons could not recognize many classes of patterns. This caused research into neural networks to slow to a crawl for years, until AI scientists established that feed-forward neural networks, or multilayer perceptrons, had far greater power to recognize images than a single-layer approach.

To completely explain the perceptron would require a PhD in mathematics, but the idea of the perceptron unlocked weighted products, bias, multiple inputs, and the idea of the artificial neuron. Perceptrons, as an idea, have expanded into the core of AI, far further than Rosenblatt could have ever imagined, and the field of the perceptron is awash in mathematics in the modern era.
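
As a small taste of those ideas-- weighted inputs, a bias, and a simple learning rule-- here is a minimal sketch of the perceptron learning rule in plain Python, learning the logical AND function. The training data, learning rate, and epoch count are illustrative choices:

# A single perceptron: output 1 if (weights . inputs + bias) > 0, else 0.
# Rosenblatt's learning rule nudges the weights whenever the prediction is wrong.
# Here it learns the logical AND function; data and learning rate are illustrative.
training_data = [
    ((0, 0), 0),
    ((0, 1), 0),
    ((1, 0), 0),
    ((1, 1), 1),
]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.1

def predict(inputs):
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0

for epoch in range(20):
    for inputs, target in training_data:
        error = target - predict(inputs)          # -1, 0, or +1
        weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
        bias += learning_rate * error

for inputs, target in training_data:
    print(inputs, "->", predict(inputs), "(expected", target, ")")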

John McCarthy

John McCarthy, the Father of AI, was a computer scientist and cognitive scientist. He is regarded as one of the founders of the discipline we call artificial intelligence.

John McCarthy was the co-author of a document that coined the term "artificial intelligence". He was the developer of the computer programming language LISP.

He popularized computer time-sharing, a system where many programs can run at once by sharing slices of time between each program. This also enabled multi-user environments where scientists, students, or enthusiasts could run programs and experiments without scheduling time with the system operator.

John McCarthy invented "garbage collection", a system by which a program determines which data is no longer needed for operations and can be cleared from memory. If a large chunk of memory is allocated to the program and is no longer needed, it can be freed by a garbage collection routine, freeing programmers and operators from the nasty task of manual memory management.
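
As a small illustration of the idea in a modern garbage-collected language, this Python sketch creates two objects that reference each other, drops the last handle on them, and lets the collector reclaim them. The class and its fields are invented for the example:

# Garbage collection in action: two made-up objects reference each other, so simple
# reference counting alone can't free them once we drop our handle on the pair.
# Python's cycle collector finds and reclaims the unreachable objects.
import gc

class Node:
    def __init__(self, name):
        self.name = name
        self.partner = None

a = Node("a")
b = Node("b")
a.partner = b          # a -> b
b.partner = a          # b -> a (a reference cycle)

del a
del b                  # no names point at the pair anymore, but the cycle remains

collected = gc.collect()   # the collector walks memory and frees unreachable objects
print("objects reclaimed by the collector:", collected)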

John McCarthy is one of the "founding fathers" of AI, in rare company with Marvin Minsky, Allen Newell, Herbert Simon, and Alan Turing. The term "artificial intelligence" was coined in a proposal written by McCarthy, Minsky, Nathaniel Rochester, and Claude Shannon for the Dartmouth conference in 1956, where AI got its start as an actual field of computing.

In 1958, McCarthy proposed the "Advice Taker", a hypothetical computer program that would use logic to represent information in a computer itself, not just as subject matter for another program. The paper may also have been the very first to propose common-sense reasoning ability as the key to AI, a proposal that is still being evaluated today.

Later work inspired by the Advice Taker included question answering and logic programming, but time-sharing systems are the most visible part of his legacy, because every computer in use today uses some form of time-sharing to run all of the programs we have going at once. Imagine if we could only run ONE tab in a browser... only the browser, and no music playing at the same time. We could only get our social media notifications by stopping what we were doing and running the social media app, giving it all of the computer's time. Computing would be a nightmare!

In 1966, at Stanford, McCarthy and his team wrote a program that was used to play a few chess games with counterparts in what was then known as the Soviet Union. The program lost two games and drew two games.

In 1979, McCarthy wrote an article called "Ascribing Mental Qualities to Machines", in which he wrote, "Machines as simple as thermostats can be said to have beliefs, and having beliefs seems to be a characteristic of most machines capable of problem-solving performance."

In 1980 John Searle responded to McCarthy saying that machines cannot have beliefs because they are not conscious, and that machines lack "intentionality", which is the mental ability to refer to or represent something-- the ability of one's mind to create representations of something that may or may not be complete. It is a philosophical concept applied to machines.

Marvin Minsky

Minsky, the Father of AI, is credited with helping to create today's vision of Artificial Intelligence. Following his Navy service from 1944 to 1945, Minsky enrolled in Harvard University in 1946, where he was free to explore his intellectual interests to their fullest; in that vein, he completed research in physics, neurophysiology, and psychology. He graduated with honors in mathematics in 1950. He was truly a busy guy with his finger in a lot of pies!

Not content, he enrolled in Princeton University in 1951 and while there he built the world's first neural network simulator. After earning his doctorate in mathematics at Princeton, Minsky returned to Harvard in 1954. In 1955, Minsky invented the confocal scanning microscope.

Marvin had fire, and in 1957 Minsky moved to the Massachusetts Institute of Technology in order to pursue his interest in modeling and understanding human thought using machines.

Among the others at MIT interested in AI was John McCarthy, an MIT professor of electrical engineering and the creator and developer of the LISP programming language. McCarthy contributed to the development of time-sharing on computers, a method where multiple programs are given small slices of time, very quickly, to accomplish their tasks. This made it appear that the computer was doing several things at once, and it allowed multiple users to connect to one system to get work done without having to schedule time manually.

In 1959, Minsky and McCarthy joined forces and cofounded the Artificial Intelligence Project. It quickly became ground zero for research in the nascent field of artificial intelligence. The project eventually grew into the MIT Computer Science and Artificial Intelligence Laboratory. Catchy, eh? Those in the know call it CSAIL, which is a lot easier to pronounce and write.

Minsky finally found a home at MIT and stayed there for the rest of his career.

Minsky had a definition of AI-- "the science of making machines do things that would require intelligence if done by men"-- but AI researchers found it hard to catch that lightning in a bottle. It proved extraordinarily difficult to capture the essence of the entire world in the syntax of the computers of the day, even with the most powerful machines in the world and the most powerful languages to run on them.

In 1975, Minsky came up with the concept of "frames" to capture the precise information that must be programmed into a computer before offering it more specific direction. For example, to capture our world, a computer must understand the concept of doors: that doors may be locked; that they may swing in one direction or both; that they may slide sideways, up, or even down; that a door may or may not have a knob, which may turn one direction, the other, or both. In a frame, doors are described in a way that an AI system can work with. With that in place, we should be able to tell an AI how to navigate a simple set of connected rooms.
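
Here is a minimal sketch of the frame idea in Python: a "door" frame with named slots and sensible defaults that a specific door can override. The slot names and values are invented for illustration, not taken from Minsky's papers:

# A toy "frame": a named bundle of slots with defaults that specific instances can
# override -- roughly how Minsky proposed packaging knowledge about everyday things
# like doors. Slot names and values are invented for illustration.
from dataclasses import dataclass

@dataclass
class DoorFrame:
    locked: bool = False
    swings: str = "inward"         # "inward", "outward", "both", or "slides"
    has_knob: bool = True
    knob_turns: str = "clockwise"  # ignored if has_knob is False

    def can_pass_through(self) -> bool:
        return not self.locked

# A specific door fills in only the slots that differ from the defaults.
office_door = DoorFrame(locked=True)
barn_door = DoorFrame(swings="slides", has_knob=False)

for name, door in [("office door", office_door), ("barn door", barn_door)]:
    print(name, "- passable:", door.can_pass_through(), "- opens by:", door.swings)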

Minsky expanded this view in his 1986 book "The Society of Mind". He proposed that the mind is composed of many individual agents performing basic functions, such as telling the body when it is hungry or comparing two boxes of macaroni at the store for weight, nutrition, and price. The criticism, however, is that "The Society of Mind" is not especially useful to AI researchers and serves mainly to enlighten laypersons.

Minsky went on writing books, all the way until 2006, developing his theories about higher-level thought and emotions.

John Von Neumann

To avoid typing the name over and over again, I'm going to use JVN to signify John Von Neumann, the Father of AI, because I'm only human and the repetition of his name may drive me mad.

JVN pioneered many of the foundations of the modern computer, such as the idea of RAM (which JVN posited could be an abstraction of the human brain) and the idea of long-term storage-- long-term memory that later took shape as the hard drive.

The first hard drive I ever worked with was a 5MB hard drive in a box that was 1 meter cubed and rattled like a bucket with a bunch of bolts in it, but it drove production in our hospital and was important to our organization.

JVN wasn't done with just RAM and hard drives; he also cemented his name in computing history by drafting the theoretical model for what is now known as the CPU in his 1945 paper "First Draft of a Report on the EDVAC."

Alan Turing crossed paths with JVN at Princeton, where von Neumann offered him a post as his assistant, so it was no wonder that Turing was excited to continue JVN's work.

JVN also came up with the idea of a 'universal constructor', a self-replicating machine whose job is to construct other machines. It grew out of JVN's work on the theory of cellular automata, which he developed in the 1940s.

Cellular automata are models (in the mathematical sense) designed to simulate behaviors in complex systems. They do this by breaking systems down into simple components that are discrete and easy to predict. Of course this goes even deeper. These men were not clowns.

JVN was a Hungarian-born American mathematician, physicist, computer scientist, and "father of AI" to some. He made huge contributions to set theory and the emerging field of game theory, along with methods for the numerical solution of systems of linear equations that remain influential in numerical analysis today.

JVN also did work on human memory, theorizing about how our brains store and retrieve information. According to his theories, memories are stored in the neurons of the brain as patterns of electrical activity that can be stored and retrieved with something like a mathematical algorithm.

For those who work in or have studied AI, you probably know where I'm going here, but if you don't... keep reading.

JVN proposed the "learning machine" which, as the name described, is a machine designed to improve over time by learning from various inputs including human intervention.

JVN's contribution to mathematics and computing are more staggering than I can possibly give him credit for.

As time went on, JVN developed the technological singularity hypothesis, which describes a process by which ever-accelerating technology reaches a point of no return, changing the mode of human life to one where there is little difference between man and machine.

The theory of cellular automata influences AI to this day and remains a common approach in research on self-replicating and self-teaching machines.

The Limits of RAG: Why It Fails in Unconstrained AI Applications

Introduction

RAG (Retrieval Augmented Generation) has gained popularity as a technique to enhance LLMs by retrieving information from external sources. However, this approach has significant limitations. This article argues that RAG, as it is currently conceived and applied, is fundamentally flawed for open-ended, unconstrained problems. While it may have niche applications in highly controlled environments, its inherent limitations make it unsuitable for the majority of real-world AI use cases. In many cases, RAG is inappropriately used when an agent-based approach would be more suitable. Model Context Protocol (MCP) offers a more promising way forward.

The Limitations of RAG

The core flaw of RAG goes beyond the "garbage in, garbage out" problem. The unconstrained nature of user input, especially in conversational interfaces, creates a fundamental challenge for retrieval systems. Even with vector search, which aims to capture semantic similarity, RAG struggles with nuanced queries and often disregards crucial metadata, leading to inaccurate or irrelevant results. The chat interface inherently encourages open-ended queries, creating an unbounded input space. Retrieval systems, even with adaptive learning, rely on the assumption that the space of possible queries is finite and predictable. When that assumption breaks, so does the system.

To understand RAG's limitations, it's helpful to categorize common failure scenarios:

Informational Retrieval Failures

While RAG is designed for this, it still fails when the information is nuanced, requires synthesis from multiple sources, or involves complex relationships.

Example: A question requiring understanding of cause-and-effect across documents.

Aggregate Query Failures

RAG struggles with calculations and summaries over a dataset.

Example: "What is the total revenue from product X in Q3?"

Temporal Query Failures

RAG has difficulty handling time-based queries and the reasoning they require.

Example: "Show me all the commits that Bob made between March 13th and March 30th, 2020."

Logical Reasoning Failures

While LLMs can exhibit some semblance of logical reasoning, their reliability is questionable. RAG's reliance on retrieved context can further hinder this capability, introducing noise and irrelevant information that throws off the LLM's reasoning process. Given the LLM's inherent limitations in this area, depending on RAG for logical reasoning is a risky proposition.

Example: "If all birds can fly and a penguin is a bird, can a penguin fly?"

Counterfactual Query Failures

LLMs can attempt counterfactual reasoning, but this is a cutting-edge and imperfect capability. RAG adds another layer of complexity, as the retrieved context may or may not be relevant to the counterfactual scenario. The results are often speculative and unreliable.

Example: "What would have happened if World War II had not occurred?"

Multimodal Query Failures

Multimodal queries pose a significant challenge for RAG. Consider the query, "Which animal makes this sound?" where the user vocalizes a kitten's meow. While a human easily recognizes the sound, current RAG systems struggle to process non-textual input. Even if the sound is transcribed, nuances like tone and pitch, crucial for accurate retrieval, are often lost. This highlights RAG's fundamental limitation in handling information beyond text.

Example: "Describe this image."

Business Logic/Policy Failures

RAG systems often fail to adequately incorporate business logic and policies. For example, a chatbot might incorrectly authorize the multiple use of a single-use coupon, leading to financial repercussions. Similarly, a RAG system could provide medical advice that violates healthcare regulations, potentially endangering patients. The contrast is instructive: the performance of a RAG system in the medical domain can be greatly enhanced with a taxonomy and metadata (i.e., a raw RAG search through medical publications vs. also having a full taxonomy and metadata linking medicines with diseases). This highlights a counterintuitive truth: taxonomies, ontologies, and metadata are more valuable in the age of LLMs, even though LLMs might seem to drive down the cost of producing them.

Furthermore, a RAG application might disclose personally identifiable information due to inadequate data filtering, resulting in privacy violations and legal issues.

Example: A chatbot incorrectly authorizing the multiple use of a single-use coupon.

These examples demonstrate a common thread: RAG struggles when queries require more than just simple keyword matching or semantic similarity. It lacks the ability to effectively utilize structured knowledge, such as taxonomies, ontologies, and metadata, which are often essential for accurate and reliable information retrieval.

Introducing Model Context Protocol (MCP)

Model Context Protocol (MCP) offers a new approach to providing LLMs with the context they need to function effectively. Unlike RAG, which retrieves context at query time, MCP standardizes how models declare their context requirements upfront. This proactive approach has the potential to address many of the limitations of RAG.

MCP as a Solution

MCP offers a more robust and future-proof way to provide context to LLMs. Consider an MCP service wrapped around a traditional SQL database. An LLM agent system, instead of relying on RAG to retrieve potentially irrelevant text snippets, can use MCP to precisely query the database for the exact information it needs. This approach offers several advantages:

  1. Constrained Input: By defining context needs upfront, MCP avoids the problem of unconstrained input. The LLM agent only queries for information that is known to be relevant and available.

  2. Query-Retrieval Alignment: MCP ensures that the query is perfectly aligned with the retrieval mechanism (e.g., a SQL query retrieves structured data from a database). This eliminates the "garbage in, garbage out" problem that plagues RAG.

  3. Structured Context: MCP facilitates the use of structured knowledge sources like databases, knowledge graphs, and semantic networks. This allows LLMs to access and utilize information in a more precise and compositional way, compared to retrieving large chunks of unstructured text.

  4. Reduced Complexity: By providing a standardized protocol for context acquisition, MCP reduces the need for ad-hoc patching and refinement that is typical of RAG systems.
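
To ground the database example above, here is a minimal Python sketch of the underlying idea-- a tool with a declared, constrained interface wrapped around a SQL database, which an agent can call for exact answers instead of retrieving text chunks. This is not the actual MCP specification; the table, the tool declaration, and the stand-in "agent" are all invented for illustration:

# A minimal sketch of the idea behind an MCP-style tool (not the real MCP spec):
# the context source declares a narrow, typed interface up front, and the agent calls
# it for exact answers instead of retrieving loosely related text. The table, tool
# declaration, and "agent" below are all invented for illustration.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (product TEXT, quarter TEXT, revenue REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("X", "Q3", 1200), ("X", "Q3", 800), ("X", "Q3", 450),
    ("X", "Q3", 975), ("X", "Q2", 600), ("Y", "Q3", 2000),
])

# The "protocol" part: the tool declares exactly what context it can provide and what
# inputs it accepts, so the input space is constrained and known in advance.
TOOL_DECLARATION = {
    "name": "total_revenue",
    "description": "Total revenue for a product in a given quarter.",
    "parameters": {"product": "TEXT", "quarter": "TEXT"},
}

def total_revenue(product: str, quarter: str) -> float:
    row = db.execute(
        "SELECT COALESCE(SUM(revenue), 0) FROM sales WHERE product = ? AND quarter = ?",
        (product, quarter),
    ).fetchone()
    return row[0]

# A stand-in for the LLM agent: having seen the declaration, it fills in the parameters
# and receives a precise, structured answer -- no text chunks, no guesswork.
answer = total_revenue(product="X", quarter="Q3")
print("Total revenue for product X in Q3:", answer)   # 3425.0

Because the query and the retrieval mechanism are the same structured interface, the "garbage in, garbage out" mismatch between a free-form question and a pile of retrieved snippets simply does not arise.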

The Power of Structured Knowledge

MCP's ability to leverage taxonomies, ontologies, and metadata is key to its potential. In contrast to RAG, which often struggles to extract meaning from unstructured text, MCP enables LLMs to interact with structured knowledge in a way that is both efficient and reliable. This is particularly important for complex queries that require:

  • Precise Definitions: Taxonomies and ontologies provide clear and unambiguous definitions of concepts, ensuring that the LLM is operating on a solid foundation of knowledge.

  • Relationship Understanding: Structured knowledge captures the relationships between concepts, allowing LLMs to perform complex reasoning and inference.

  • Contextual Awareness: Metadata provides additional context about data points, enabling LLMs to filter and retrieve information with greater accuracy.

Conclusion: The Future of Context

RAG, as it is currently conceived and applied, is fundamentally flawed for open-ended, unconstrained problems. Its reliance on query-time retrieval makes it inherently susceptible to the challenges of unconstrained input, query-retrieval misalignment, and the need for constant patching. MCP offers a promising alternative. By shifting to a proactive approach that defines context needs upfront and leverages structured knowledge, MCP has the potential to provide LLMs with the precise and relevant information they need to function effectively.

Further research and development of MCP and similar protocols are crucial for building robust and reliable AI systems that can truly understand and interact with the world. The future of LLMs and AI depends on our ability to move beyond the limitations of RAG and embrace more structured and controlled ways of providing context.