
2025-01-31 Prompt Guide

Background

While this post is mainly intended for users who want to write custom prompts for agentic interpretation of trading indicators, fundamental and macroeconomic analyses, and other relevant inputs, the information here applies to other varieties of LLM prompting as well.

General Notes

I’ve found that there is a generally good ‘order’ for structuring prompts.

  1. General background information that needs to be included somewhere
  2. Specific data and analysis unique to this specific instance of prediction
  3. Reminders and anti-hallucination hints

The reasoning for this is that the most recent tokens generally have the most impact on the output when there is a conflict. So we want the anti-hallucination reminders and critical operational information to be the most recent tokens; otherwise they can be outweighed by less relevant information that could trigger a hallucination or other response malformation.
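As a minimal sketch of that ordering (the function name and example strings below are placeholders of my own, not a prescribed template), the assembly can be made mechanical:

```python
def build_prompt(background: str, data_and_analysis: str, reminders: str) -> str:
    """Assemble a prompt in the recommended order: general background first,
    instance-specific data in the middle, and anti-hallucination reminders last
    so the most critical guidance sits in the most recent tokens."""
    return "\n\n".join([background, data_and_analysis, reminders])


prompt = build_prompt(
    background="You are running an analysis on the output of an advanced deep learning model ...",
    data_and_analysis="Analysis of ensemble_output: min of 0.354, max of 0.532, mean of 0.401 ...",
    reminders="Only report conclusions supported by the values above. Do not invent metrics.",
)
```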

For background information, a lot of it could be viewed as ‘reminders’. Most of this information should exist somewhere in the training data/saved weights. We just want to make sure that the critical stuff is at the ‘forefront of the mind’, so to speak. The other reason to add this information is to remove any ambiguity that might exist in the prompt. I’ll go into more specific examples later.

For specific data and analysis, try not to be too wordy or verbose. The last thing you want is the LLM clinging to the linguistic flavor that you included. THESE MODELS ARE NOT PEOPLE, and you would do well to remember that. You can have some oddly formatted or abrupt sentences as long as all of the information within is accurate and correctly formed. Words like “Please” and “Thank You” are just extra tokens adding spacing between related tokens. Sometimes these agents can get a little lost and think that they are having a conversation with you. That is not what you want: all the information the agent outputs should be entirely mission focused. If your agent is saying things like “I’m sorry but” or “Certainly, I can”, then you likely need to clamp down on your prompt. These outputs are being fed into other automated systems, which don’t need that information, and these kinds of outputs can even cause errors depending on the system consuming the data.
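One way to enforce this downstream, assuming you control the code that consumes the agent’s output, is a simple guard that rejects conversational filler before it reaches an automated system. The marker list and function name below are hypothetical, not from any library:

```python
# Hypothetical guard: reject outputs that read like a conversation rather than
# mission-focused analysis, before they reach downstream automated systems.
CONVERSATIONAL_MARKERS = ("i'm sorry", "certainly, i can", "as an ai", "thank you")


def is_mission_focused(output: str) -> bool:
    """Return False if the agent output contains conversational filler."""
    lowered = output.lower()
    return not any(marker in lowered for marker in CONVERSATIONAL_MARKERS)


# Example: this output would be flagged before being trusted by downstream systems.
agent_output = "Certainly, I can analyze that for you! The latest value is 0.41."
if not is_mission_focused(agent_output):
    print("Conversational filler detected; clamp down on the prompt.")
```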

Additionally, it is important to remember that LLMs ARE NOT A FORM OF ARTIFICIAL GENERAL INTELLIGENCE. LLMs are extremely advanced token prediction models. There are actual benchmarks and tests for AGI that LLMs are not even close to meeting, and the actual experts in the field are unanimous in their agreement that LLMs are not AGI. No matter what you hear from a CEO or tech influencer, remember this information. Chain of thought, or whatever new paradigm is being pushed, is not the model actually thinking. These models are not capable of actual thought and they are not sentient or sapient. Whenever you are trying to figure out if an LLM can solve a problem or why it is failing to solve a problem, you should ask yourself if a highly advanced token prediction system can actually solve the problem. If the answer is no, then the LLM is not going to be able to do it. This guide was written in 2025, and while there will likely be amazing advances in the future, I can say with confidence that LLMs are not anywhere close to breaking the boundaries of AGI. If this AGI boundary is actually broken, then this prompting guide will be wholly obsolete and none of the information contained within will be relevant. So, if you are referencing this guide to solve a problem, then this section is still completely accurate. The entirety of this document is written with this section treated as an absolute, non-arguable axiom, and if it were not true then this document would be worthless. While some readers might view this section as strange/obvious/irrelevant, I can assure you that some people using this guide to write prompts absolutely require this information.

For the following specific examples and analysis, the prompt is formatted with an underline, while my analysis and suggestions are inserted between specific sections of the prompt. This is so that the information is better localized for your reading. If you want to know what the original prompt was, just read the underlined sections, skipping any text that is not underlined.


You are running an analysis on the output of an advanced deep learning model that is making a buy or don't buy decision for this stock.

We start out here with the basic background information that sets the stage for what the model is doing and why. In a standard human-to-human conversation, this information makes sense as the opening of the conversation, so the model’s training data likely follows this kind of linguistic flow. Furthermore, it is pretty unlikely that hallucinations or any other information included within this prompt would take the model off task here. Most of the hallucinations I see involve misattributing data points between metrics or adding in extra information. If you do end up having an issue with this (likely due to having an extremely long prompt), then a follow-up reminder at the end is your best bet. Even in that event, I’d still recommend leaving this sentence in at the start, both for the reasons described above and because an issue severe enough to warrant additional action is worth reinforcing in both places.

The model gives a value between 0 and 1, with 0 being a low confidence for a buy and 1 being a high confidence for a buy. This model is shown to have a 60% accuracy on back test data. The most recent value from the model is the most important indicator, as older values are for times in the past. This historical change in the model's output is included for informative purposes, as an increasing value indicates that market conditions are increasingly favorable for a buy. When making your analysis, don't get confused by the min, max, and mean values provided. These are to be considered in comparison to the most recent value, which is the value you should be considering. In general, a value above a 0.5 is considered a buy signal and a value below 0.5 is considered to not be a buy signal.

Now we have our information on how to actually interpret the results of the model. My first critique of this section (which for the record I wrote) is that the 0 and 1 explanations are ambiguous and likely to cause issues. This observation is backed up by empirical evidence: I’ve seen the agent label values of 0.4 as “high confidence”. A better formulation might be:

“0 is the lowest confidence in being a good buy, and 1 is the highest confidence for a good buy. Values below 0.5 are evaluated by training and validation algorithms as not containing a buy signal, while values above 0.5 are considered to be buy signals. By this formulation, taking all values above 0.5 as being a buy gives a 60% accuracy on back test data.”

By doing this we are also placing the information about the 0.5 boundary closer to the information on how to evaluate the range between 0 and 1, which is likely to give us better results. Furthermore, as this is some of the most critical information on how to interpret the model, I would recommend moving it lower in the prompt, especially if the agent is making mistakes in analyzing whether individual predictions are good buys or not.
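As a rough sketch of what “moving it lower” could look like in practice (the surrounding template strings are illustrative placeholders; the interpretation wording is the reformulation quoted above):

```python
# The reformulated interpretation text, placed after the per-prediction data so that
# the most critical guidance sits in the most recent tokens.
INTERPRETATION = (
    "0 is the lowest confidence in being a good buy, and 1 is the highest confidence for a good buy. "
    "Values below 0.5 are evaluated by training and validation algorithms as not containing a buy signal, "
    "while values above 0.5 are considered to be buy signals. By this formulation, taking all values "
    "above 0.5 as being a buy gives a 60% accuracy on back test data."
)

prompt = "\n\n".join([
    "You are running an analysis on the output of an advanced deep learning model ...",  # background
    "Analysis of ensemble_output: ...",                                                  # per-prediction data
    INTERPRETATION,                                                                       # interpretation last
])
```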

This section also contains the disclaimer about the historical values that we include. It was originally added because the agent was treating some of those historical and aggregate values as the actual prediction of the model. We include that data to allow the agent to look for outliers and trends. If we see the model sitting around 0.35 for a period of time and then our most recent value is 0.49, one might conclude that market conditions had drastically changed and it might be a good time to buy, even though we haven’t crossed the 0.5 threshold. On the other hand, if we’ve been oscillating between 0.48 and 0.52, a value of 0.49 is much less significant. All of this information would be lost if we simply provided the most recent value, as both situations would show a 0.49.

You might also wonder why we don’t just build a mathematical model that makes this decision for us. While that is often a good approach when working with LLMs, this situation is one filled with ambiguity and judgement calls. What if, rather than a value of 0.35, it had been 0.38, 0.4, or 0.45? Where does the cutoff exist? One could use fuzzy logic or a similar approach, but one could also just leave this decision up to the agent. If something has an objective, mathematical, modelable, right/wrong answer, then you should use that mathematical model to give you the answer instead of the LLM agent. If something needs a judgement call and you cannot determine the right answer until after a prediction is made, then the LLM agent is more appropriate.
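For reference, the summary line shown next is exactly the kind of thing that belongs in the objective, mathematical bucket. Here is a minimal sketch of how such a line might be produced, assuming the historical ensemble outputs are available as a list of floats. The example values and variable names are my own, and I am deliberately mirroring the full-precision format of the line below, which the following sections critique:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical history of the ensemble model's outputs, oldest to newest.
history = [0.36, 0.37, 0.41, 0.35, 0.53, 0.40, 0.41]

values = np.array(history)
X = np.arange(len(values)).reshape(-1, 1)  # time index as the single regression feature

reg = LinearRegression().fit(X, values)
direction = "increasing" if reg.coef_[0] > 0 else "decreasing"

summary = (
    f"Analysis of ensemble_output: min={values.min()}, max={values.max()}, mean={values.mean()}. "
    f"Regression coef={reg.coef_}, score={reg.score(X, values)}, direction={direction}. "
    f"Latest value={values[-1]}."
)
print(summary)
```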

Analysis of ensemble_output: min=0.3541967868804931, max=0.5319923162460327, mean=0.40127307176589966. Regression coef=[-0.00171258], score=0.07836451998035432, direction=decreasing. Latest value=0.4099920988082886.

Now we get to the actual information for this specific prediction. Everything up to this point is going to be identical for every single prediction we make. The first thing of note is all those significant digits on the numeric values. For Llama 3.2, the string "min=0.3541967868804931" has 11 tokens, while “min=0.354” has 5 tokens. Are all of those extra digits actually meaningful for our prediction? If you’re asking yourself whether this matters, consider that "min=0.3541967868804931, max=0.5319923162460327, mean=0.40127307176589966" and "Wow! we have a min value of 0.354, a max value of 0.532, and a mean (not angry mean tho) value of 0.401" both have 35 tokens using Llama 3.2. If you saw that second string, wouldn’t you go in and trim out all that useless extra text? None of those extra tokens provide any meaningful value to our analysis; they only serve to dilute and separate information. Meanwhile, "min of 0.354, max of 0.532, mean of 0.401" has 18 tokens, a reduction of almost 50%.

What about “Regression coef=[-0.00171258]”? For just 1 more token, we could instead have "Regression coef of -0.002, indicating a downward trend". Which one of those strings contains more useful information? And with all those savings in token count, we could turn "score=0.07836451998035432" (with a token count of 12) into "and an r squared score of 0.078, indicating that this data is not very linear in nature" (with a token count of 21). What a difference in interpretability and information density 9 tokens can make!
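If you want to check token counts like these yourself, a tokenizer makes it a one-liner. The sketch below assumes access to the Llama 3.2 tokenizer through Hugging Face transformers; the repo name is an assumption, the model is gated, and exact counts may differ depending on which tokenizer you load:

```python
from transformers import AutoTokenizer

# Assumes access to the (gated) Llama 3.2 tokenizer; any tokenizer shows the same
# effect, though the exact counts may differ from the ones quoted above.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

candidates = [
    "min=0.3541967868804931, max=0.5319923162460327, mean=0.40127307176589966",
    "min of 0.354, max of 0.532, mean of 0.401",
    "Regression coef of -0.002, indicating a downward trend",
]

for text in candidates:
    print(len(tokenizer.encode(text, add_special_tokens=False)), text)
```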

That's all for now.