The Right Tool for the Job: Getting the best out of AI for analytical tasks

Before we jump in, don’t be put off by the technical aspects of this: the overall concept is pretty straightforward and something you can put into practice yourself (I explain how at the end).

Using the Right Tool for the Job

Despite their incredible capabilities, even the best LLM*-based AI suffers from a tendency to exaggerate or make up facts – AKA hallucinate. That’s not such an issue when you’re asking it to help tweak some text or brainstorm ideas, but it’s a big problem when you’re trying to use AI for analysis. (*Large language model.)

On the other hand, simply dumping a mountain of data into a report or sending someone a spreadsheet without any commentary creates a different set of problems.

However, when we take a combined approach, we can get some pretty impressive results as long as we make sure we’re using the right tool for the job.

Here’s an example of how I’ve combined these approaches in DCDR, most recently to create a series of quarterly country stability assessment reports. These reports require a combination of data (e.g. news reports), analysis, and text summarization.

I’ve combined these individual functions into a single process made up of separate stages, each focused on a single task. Here’s how these elements work together.

Example: The DCDR Assessment Chain

(Note that I’m just using DCDR as an example; you can apply the same techniques anytime you’re conducting analysis with AI.)

Step 1 – Gather the News (DATA)

The app gathers news headlines for each country multiple times per day via an API (an application programming interface, in this case, a way for me to access a news data feed). The parameters of the API can be adjusted to narrow the results down, but this isn’t perfect, so I need a way to deal with non-relevant stories.
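As a rough illustration of this step, here is a minimal sketch in Python. The endpoint, the parameter names, and the sample headlines are all assumptions made up for the example, since the post doesn't name the news API DCDR uses. The sketch only builds the request URL and parses a response that has already been downloaded, so it runs without a network connection.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: the real news API is not named in this post.
NEWS_API_URL = "https://news.example.com/v1/headlines"


def build_news_url(country, keywords, page_size=50):
    """Build a request URL; the parameter names here are assumptions."""
    params = {
        "country": country,
        "q": " OR ".join(keywords),  # narrows the feed, but imperfectly
        "pageSize": page_size,
    }
    return f"{NEWS_API_URL}?{urlencode(params)}"


def parse_headlines(raw_json):
    """Pull the headline text out of a JSON response body."""
    payload = json.loads(raw_json)
    return [a["title"] for a in payload.get("articles", []) if a.get("title")]


# Made-up sample response so the sketch runs offline.
sample_response = json.dumps({
    "articles": [
        {"title": "Central bank raises interest rates"},
        {"title": "Local team wins cup final"},  # off-topic, but still returned
    ]
})

url = build_news_url("ke", ["election", "protest"])
headlines = parse_headlines(sample_response)
```

Note that the off-topic sports story still comes back even with keyword parameters set, which is why a separate relevance check is needed.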

Step 2 – Check the News Relevance (ANALYSIS)

Each story is passed to a specially trained instance of an LLM (currently an OpenAI model) which is asked to determine if it is ‘relevant’ or ‘not relevant’ with respect to country stability. This takes advantage of an LLM’s ability to classify text, much as it would when detecting sentiment; in this case, the test is relevance to DCDR’s definition of country stability. Each story is tagged for relevance so irrelevant news can be ignored.
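The tagging logic can be sketched in Python as follows. This is not DCDR's code: the model call is replaced by a simple keyword stub so the example runs offline, and the function names, sample headlines, and the fallback to ‘not relevant’ for unexpected replies are all assumptions for illustration.

```python
from typing import Callable, Dict, List

ALLOWED_LABELS = {"relevant", "not relevant"}


def tag_relevance(headlines: List[str], classify: Callable[[str], str]) -> List[Dict[str, str]]:
    """Tag each headline with one of the two allowed labels."""
    tagged = []
    for headline in headlines:
        label = classify(headline).strip().lower()
        if label not in ALLOWED_LABELS:
            # Assumption: treat any unexpected reply as 'not relevant'.
            label = "not relevant"
        tagged.append({"headline": headline, "label": label})
    return tagged


def keyword_stub(headline: str) -> str:
    """Stand-in for the model call; a real system would query the LLM here."""
    triggers = ("election", "protest", "coup", "interest rates")
    hit = any(t in headline.lower() for t in triggers)
    return "Relevant" if hit else "Not relevant"


stories = tag_relevance(
    ["Central bank raises interest rates", "Local team wins cup final"],
    keyword_stub,
)
relevant_only = [s["headline"] for s in stories if s["label"] == "relevant"]
```

Restricting the output to two fixed labels is the useful part: anything the model says outside those labels can be caught in code rather than passed down the chain.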

Step 3 – Extract Keywords and Summarize the News (SUMMARIZATION)

Next, DCDR uses an LLM’s straightforward text management abilities to extract a list of keywords from stories and write a summary of the news for the period in question.
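Here is one way this step might look in Python, as a sketch only. The prompt wording, the function names, and the comma-separated keyword format are assumptions for illustration, not the prompt DCDR uses. The example builds a prompt from the relevant headlines and tidies up a keyword line once the model has replied.

```python
def build_summary_prompt(country, period, headlines):
    """Assemble a prompt from relevant headlines; the wording is illustrative."""
    bullet_list = "\n".join(f"- {h}" for h in headlines)
    return (
        "Using only the headlines below, list up to 10 keywords on one line, "
        f"comma-separated, then write a short summary of the news for {country} "
        f"during {period}. Do not add facts that are not in the headlines.\n\n"
        f"{bullet_list}"
    )


def parse_keywords(keyword_line):
    """Turn a comma-separated keyword line into a clean, de-duplicated list."""
    seen = []
    for raw in keyword_line.split(","):
        word = raw.strip().lower()
        if word and word not in seen:
            seen.append(word)
    return seen


prompt = build_summary_prompt(
    "Country X", "the last quarter", ["Central bank raises interest rates"]
)
# A made-up model reply, used only to show the clean-up step.
keywords = parse_keywords("Inflation, protests,  inflation, ,Election")
```

Supplying the headlines in the prompt, rather than asking the model what happened, keeps the LLM on summarization and leaves the facts to the data feed.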

At this point, you’ll note that we’ve used three different techniques (data, analysis, and summarization) to achieve different outcomes at different stages of the process. You’ll also see how inappropriate the wrong tool could be. For example, trying to get the API to only give us news relevant to country stability would give us very mixed results, and asking an LLM for country news would provide unverifiable, possibly incorrect, results.

Now you’ve got the gist of the approach, we’ll speed through the rest of the process, but here’s an overview of the whole sequence so you can see how DCDR switches between functions.

Step 4 – Collect Background Country Information (DATA)

An API call collects factual background information on the country and stores this in a database.

Step 5 – Conduct a Stability Assessment (ASSESSMENT)

The stability assessment uses a very narrowly trained, fine-tuned version of an OpenAI model. This uses the country’s news feed and background data to ascribe a stability rating for the country.

Step 6 – Search for Upcoming Events (DATA)

Another API call searches the web for upcoming events in the country. This produces up-to-date but raw search results, so we need a way to clean these up.

Step 7 – Summarize Upcoming Events (SUMMARIZATION)

Another LLM reviews the raw search results, extracts relevant upcoming events, and summarizes these in a readable format. As with the other LLMs, this model has a different prompt and set of conditions to narrow its focus and improve consistency.

Step 8 – Conduct Forward Assessment (ASSESSMENT)

The final stage uses the specially trained country assessment model from step 5 to conduct a forward-looking stability assessment for the country. (There’s an example of an assessment at the very bottom of this post.)

Time-Consuming, But Time Well Spent

This is a time-consuming process that requires careful sequencing and a lot of computation. Cutting down the number of steps and hoping that a general LLM gives us a decent response would be significantly faster and cheaper from a processing standpoint.

However, chaining the different processes together like this, and using the right tool for the job, plays to the strengths of each component. This ensures that we get the desired results at each stage of the process and decreases the opportunity for error and hallucination.
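The chaining idea itself is simple to sketch in Python. In the example below, each step is a stub standing in for an API call or an LLM call, and the step names, the shared dictionary, and the sample data are all assumptions for illustration rather than DCDR's implementation.

```python
from typing import Callable, Dict, List

Step = Callable[[Dict], Dict]


def run_chain(steps: List[Step], state: Dict) -> Dict:
    """Run each step in order, handing the accumulated state to the next one."""
    for step in steps:
        state = step(state)
    return state


# Stub steps: each one stands in for an API call or an LLM call.
def gather_news(state):
    headlines = ["Central bank raises interest rates", "Local team wins cup final"]
    return {**state, "headlines": headlines}


def filter_relevant(state):
    kept = [h for h in state["headlines"] if "interest rates" in h.lower()]
    return {**state, "relevant": kept}


def summarize(state):
    count = len(state["relevant"])
    return {**state, "summary": f"{count} relevant story this period."}


result = run_chain(
    [gather_news, filter_relevant, summarize],
    {"country": "Country X"},
)
```

Because every step only reads what earlier steps produced, a single stage can be swapped out (say, a different news source or a different model) without touching the rest of the chain.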

This is an occasion when the juice is definitely worth the squeeze.

The Non-Technical Approach

At the start, I promised that this was something you could do without getting too deep into the technology, so what does that look like?

In short, you take the same approach, but instead of chaining together a series of tools, you move along the process manually.

For example, instead of an API call for the news, simply filter your browser search for ‘news’. Then, extract the data from those results and clean it up a little before dropping that into the LLM prompt. Next, adjust the LLM prompt to give you the results you want.

Getting the prompt right can take time but once you have one that works, save it for future use. (And, for a real meta-experience, ask your LLM to improve the prompt before you use it in anger.)

Here’s an example of one of the prompts DCDR uses.

The upcoming events data, about 1,500 words, is supplied to the request separately. You could do the same with a version of this prompt, followed by a cut and paste of the web search results.
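If you'd rather script the cut and paste, a saved prompt and the pasted results can be joined in a few lines of Python. This is a sketch under assumptions: the prompt text below is illustrative and not DCDR's actual prompt, and the 1,500-word cap is simply borrowed from the figure mentioned above.

```python
# Illustrative saved prompt; not the prompt DCDR uses.
SAVED_PROMPT = (
    "From the search results below, extract upcoming events for {country} "
    "and summarize them as a short, dated list. Ignore anything that has "
    "already happened or is not in the results."
)


def build_request(country, pasted_results, max_words=1500):
    """Combine the saved prompt with pasted results, capped at max_words."""
    words = pasted_results.split()
    trimmed = " ".join(words[:max_words])  # keep the data to a manageable size
    return SAVED_PROMPT.format(country=country) + "\n\nSEARCH RESULTS:\n" + trimmed


# Stand-in for text pasted from a web search.
pasted = "result " * 2000
request = build_request("Country X", pasted)
```

Keeping the prompt and the data separate like this means the same saved prompt can be reused each time, with only the pasted results changing.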

So the overall concept is simple and remains the same: use the right tool for the job and deliberately pass the output from one stage of the process to the next, whether that’s in code or manually.
