This is a ‘living’ article which will be updated as progress is made in tests and other research emerges. The original article was written on January 13, 2025, and major edits will be dated.
There are also demos of this process in action, the latest being the presentation to the Virtual AI Engineer Summit in NY in March 2025.
Updates to add
Utilize domain-specific heuristics (rules of thumb) for few-shot training
Include validation loops to review and assess results against heuristics, goals and constraints
Run scenarios on top of a worldsim to allow the model to draw general, real world knowledge into the deliberations
Challenges:
Domains that are highly System II dependent lack robust heuristics and rules of thumb
Strict rules could constrain creativity
Executive Summary
Large Language Models (LLMs) continue to improve, but data deficits in specialized domains and the fundamental differences between predictive completion and expert reasoning mean that we are still a long way from creating specialist analytical agents for high-value, specialized tasks (e.g. negotiations, crisis management or intelligence analysis). Current LLM development remains founded on enhancing pattern-recognition systems through better training, which will not overcome this shortfall; nor will the introduction of chain-of-thought processes.
However, management consultancy and other roles that require high-level problem-solving give us a human model for overcoming domain inexperience. People who are not subject-matter experts (SMEs) can be given domain-specific heuristics, or rules of thumb, as guidance. These heuristics supplement their otherwise capable reasoning skills, allowing them to perform in unfamiliar conditions. We can take a similar approach with capable LLMs, providing them with SME-generated, domain-specific heuristics they can apply as necessary. Unlike few-shot training, the base model is not being trained on these rules: instead, it recursively applies these guidelines as it iterates through the problem. This approach is not without difficulty: many domains lack a clean set of higher-order guiding principles, so developing the heuristics will be time-consuming and complex in its own right.
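As a rough illustration of how such a loop might look, the sketch below applies a set of heuristics to each draft and feeds any unmet rules back into the next round. It is a minimal sketch under stated assumptions, not the implementation used in the tests: the names Heuristic, build_prompt, apply_heuristics and stub_model are all hypothetical, the "walk-away point" rule is an invented negotiation example, and stub_model stands in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Heuristic:
    name: str                     # short label used in feedback
    guidance: str                 # rule of thumb, as an SME might phrase it
    check: Callable[[str], bool]  # crude validator applied to a draft


def build_prompt(problem: str, heuristics: List[Heuristic], unmet: List[str]) -> str:
    """Assemble the problem, the rules of thumb and any feedback into one prompt."""
    lines = [f"Problem: {problem}", "Apply these rules of thumb:"]
    lines += [f"- {h.guidance}" for h in heuristics]
    if unmet:
        lines.append("The previous draft did not satisfy: " + ", ".join(unmet))
    return "\n".join(lines)


def apply_heuristics(
    problem: str,
    heuristics: List[Heuristic],
    model: Callable[[str], str],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Re-prompt the model until every heuristic check passes or rounds run out."""
    unmet: List[str] = []
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = model(build_prompt(problem, heuristics, unmet))
        unmet = [h.name for h in heuristics if not h.check(draft)]
        if not unmet:
            return draft, round_no
    return draft, max_rounds


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: improves only when given feedback."""
    if "previous draft did not satisfy" in prompt:
        return "Set a walk-away point, then open with a price offer."
    return "Open with a price offer."


# Illustrative heuristic (an assumption, not taken from the article's tests).
rules = [
    Heuristic(
        name="walk-away",
        guidance="Know your walk-away point before making an offer.",
        check=lambda draft: "walk-away" in draft.lower(),
    )
]

final_draft, rounds_used = apply_heuristics("Negotiate a supply contract.", rules, stub_model)
print(rounds_used, final_draft)
```

The point of the design is that the heuristics live in the prompt and the validation step, not in the model weights, so an SME can revise the rules without any retraining.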
Utilizing the world simulation capabilities of some models further enhances performance: it allows the model to build its reasoning on a realistic view of the world, so that its suggestions are practical and take into account relevant factors not specified in the problem. Initial test results using corporate negotiations have been promising and showed that the concept is valid.
This Domain Speciality Paradox – where the most complex, high-value tasks are least supported by artificial intelligence (AI) – is likely to persist until new approaches to model architecture, training and application are discovered. However, while this gap persists, the application of domain-specific heuristics offers a straightforward solution to this problem, allowing us to apply these LLMs to high-value tasks with encouraging results.
The Domain Speciality Paradox
As a domain becomes more specialized, narrow and high-value, the availability of suitable training data appears to fall, creating a fundamental barrier to deploying general-purpose LLMs in these contexts. This paradox manifests in three critical ways:
Data Scarcity: Specialized domains inherently generate less documentable data
Data Privacy: High-value domains often have strict confidentiality requirements
Data Quality: The available data may not capture crucial decision-making processes
These shortfalls are characteristic of high-level analytical tasks such as complex business negotiations, crisis management, or intelligence forecasting: activities we’ll define here as narrow specialties. This is distinct from broad specialties like trading, medicine or coding, where there is an abundance of publicly available, well-structured data.
This creates a fundamental challenge: the most valuable applications of AI in terms of expert analysis might be the least suited to current frontier LLM approaches.
Differentiation of Tasks
Underst
Carpe tomorrow!