Summary of Attempted Methods in Steering LLMs
A transformer consists of two main components: (i) an Encoder and (ii) a Decoder. The Encoder is a stack of layers, each combining an attention mechanism with feed-forward (linear) sub-layers. When tokens are converted into vector representations (token embeddings combined with positional embeddings), they first pass through the attention block, which relates tokens to one another via query-key-value projections learned from the training corpus. The outputs are then processed by the feed-forward layers, which project them through a series of transformations to produce the embeddings known as hidden states.
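As an illustration, the sketch below builds a single encoder block in PyTorch that turns token embeddings into hidden states. The MiniEncoderBlock class, layer sizes, and tensor shapes are illustrative assumptions, not the layout of any particular model.

```python
import torch
import torch.nn as nn

class MiniEncoderBlock(nn.Module):
    """Minimal encoder block: self-attention followed by a feed-forward stack."""

    def __init__(self, d_model=64, n_heads=4, d_ff=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Self-attention relates every token to every other token via Q/K/V projections.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Feed-forward layers project the result into the final hidden states.
        return self.norm2(x + self.ffn(x))

# Token plus positional embeddings for one sequence of 10 tokens.
tokens = torch.randn(1, 10, 64)
hidden_states = MiniEncoderBlock()(tokens)   # shape: (1, 10, 64)
```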
The current challenge is to steer large language models without training or fine-tuning. Table 1 below categorizes various methods based on the architectural components they target within the transformer model, specifically the Encoder and Decoder blocks, to achieve controlled manipulation of the model's behavior. Each method is described along with its challenges and level of difficulty.
Transformer Block: This identifies the primary component of the transformer model that the method influences, whether the Encoder or Decoder block.
Description: A brief explanation of how this method works.
Challenges: Potential limitations, drawbacks, or constraints associated with the method.
Difficulty: An estimation of how complex or resource-intensive the method is to implement.
Encoder outputs are used as the target contextual embeddings for subsequent text generation. One can assume that a separate model is built that combines the encoder's output with the streamed outputs when generating the next responses.
The challenge with this method lies in the ranking algorithm and in managing the database of cached vector embeddings.
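A minimal sketch of how such a cache might be managed, assuming an in-memory list of embeddings ranked by cosine similarity; the add_to_cache and rank_cached helpers are hypothetical, and a production system would use a proper vector database instead.

```python
import numpy as np

# Illustrative in-memory cache: each entry pairs an encoder embedding with its response text.
cache = []  # list of (embedding: np.ndarray, response: str)

def add_to_cache(embedding, response):
    """Store a normalized encoder output alongside the response it produced."""
    cache.append((embedding / np.linalg.norm(embedding), response))

def rank_cached(query_embedding, top_k=3):
    """Rank cached entries by cosine similarity to the current context embedding."""
    q = query_embedding / np.linalg.norm(query_embedding)
    scored = [(float(q @ emb), resp) for emb, resp in cache]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]

# Usage: cache a few embeddings, then retrieve the closest ones for the next generation.
rng = np.random.default_rng(0)
for text in ["response A", "response B", "response C"]:
    add_to_cache(rng.normal(size=64), text)
print(rank_cached(rng.normal(size=64)))
```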
Fine-tuning a pre-trained model by merging the model's weights with those of another model. This involves building a transformation function that rotates or translates the weights into another space of interest.
A background in geometric analysis is a core prerequisite for succeeding with this method. Because the concept is still new, few theories or prior works are available to justify any analysis or discoveries. Furthermore, certain algebraic properties must be satisfied, such as orthonormality of the combined linear spaces, which is difficult to maintain in dense, high-dimensional weight matrices.
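A minimal sketch of the kind of transformation involved, assuming two weight matrices of equal shape and an orthogonal Procrustes rotation; the simple averaging rule in merge_weights is an illustrative assumption, not an established merging recipe.

```python
import numpy as np

def orthogonal_rotation(w_src, w_dst):
    """Orthogonal Procrustes solution: the orthonormal R minimizing ||w_src @ R - w_dst||_F."""
    u, _, vt = np.linalg.svd(w_src.T @ w_dst)
    return u @ vt

def merge_weights(w_a, w_b, alpha=0.5):
    """Rotate w_a into w_b's space, then linearly interpolate the two weight matrices."""
    r = orthogonal_rotation(w_a, w_b)
    return alpha * (w_a @ r) + (1.0 - alpha) * w_b

# Illustrative 64x64 weight matrices standing in for the same layer of two models.
rng = np.random.default_rng(1)
w_a, w_b = rng.normal(size=(64, 64)), rng.normal(size=(64, 64))
r = orthogonal_rotation(w_a, w_b)
assert np.allclose(r @ r.T, np.eye(64), atol=1e-8)   # R is orthonormal by construction
merged = merge_weights(w_a, w_b)
```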
Fine-tuning a pre-trained model using the LoRA technique on smaller datasets. This technique involves hooking low-rank linear adapters onto certain Encoder layers.
Selecting which linear layers to hook the adapters onto.
As with most model-training workflows, determining suitable learning parameters (e.g., adapter rank and learning rate) can be challenging.
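A minimal sketch of a LoRA-style adapter hooked onto a single linear layer, assuming plain PyTorch; the rank and alpha values shown are illustrative defaults.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank adapter: y = W x + scale * B(A(x))."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze the pre-trained weights
            p.requires_grad_(False)
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)        # the adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Hook the adapter onto one chosen linear layer; only the adapter's parameters are trained.
adapted = LoRALinear(nn.Linear(64, 64))
out = adapted(torch.randn(2, 64))
trainable = [name for name, p in adapted.named_parameters() if p.requires_grad]
print(trainable)   # ['lora_a.weight', 'lora_b.weight']
```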
The decoder parameter tuning method applies exploratory search to locate the optimal set of hyper-parameters that produce responses with a specific contextual quality, for example expressiveness or extroverted behaviour. The method involves running multiple trials with varying decoder parameters (e.g., temperature, top-k, top-p) and analyzing their impact on text expressiveness. This allows desired expressiveness levels to be mapped to optimal decoding configurations, enabling structured control over language model outputs. Techniques like Bayesian optimization, reinforcement learning, or evolutionary search can further refine the parameter selection process for more adaptive and efficient tuning; a minimal search sketch follows the challenges below.
This method relies on prompting pre-trained models. The classifier responsible for scoring the traits of the LLM's responses may not be accurate due to bias error; furthermore, the same classifier must be used consistently across all trials.
LLM responses are conditioned on the given prompt. Since there are many ways of expressing the same intention, extra trials may need to be performed regardless of the similarity scores of the prompts' contextual embeddings.
Decoding algorithms remain an open research topic, with ongoing efforts to refine and optimize their effectiveness. That said, current decoding methods may not always capture the full range of linguistic expressiveness, coherence, or contextual accuracy needed for certain applications. Limitations in existing approaches, such as exposure bias, repetition, or lack of controllability, highlight the need for more adaptive and robust decoding strategies.
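A minimal sketch of the parameter search described above, assuming a random-search strategy; generate and score_expressiveness are hypothetical stand-ins for the model's decoding call and the trait classifier.

```python
import random

def generate(prompt, temperature, top_k, top_p):
    """Stand-in for the LLM's decoding call; replace with a real generation call."""
    return f"{prompt} [t={temperature:.2f}, k={top_k}, p={top_p:.2f}]"

def score_expressiveness(text):
    """Stand-in for the trait classifier; replace with a real scorer."""
    return random.random()

def search_decoder_params(prompt, n_trials=20, seed=0):
    """Random search over decoding parameters, keeping the best-scoring configuration."""
    rng = random.Random(seed)
    best = (float("-inf"), None)
    for _ in range(n_trials):
        params = {
            "temperature": rng.uniform(0.3, 1.5),   # higher values increase randomness
            "top_k": rng.choice([20, 50, 100]),     # restrict sampling to the k most likely tokens
            "top_p": rng.uniform(0.7, 1.0),         # nucleus sampling threshold
        }
        score = score_expressiveness(generate(prompt, **params))
        if score > best[0]:
            best = (score, params)
    return best

best_score, best_params = search_decoder_params("Describe your weekend plans.")
```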