visualizing how large language models steer through goal-space
This vector flow diagram visualizes how language model outputs shift in goal-space: a space whose axes are
measurable attributes of text, such as reading level and formality.
More specifically, imagine a text-rewriting request:
This plot shows the three steerability metrics proposed in our
steerability measurement framework. In summary,
we propose modeling user requests as multi-dimensional vectors in goal-space and measuring
steerability in terms of goal-space "distance."
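To make the vector framing concrete, here is a minimal sketch in Python. The dimension names, the scores, and the use of Euclidean distance are illustrative assumptions, not the framework's actual scoring functions or metrics. A source text is treated as a point in goal-space, a request as a displacement vector, and the goal as the source plus that displacement.

```python
import math

# Hypothetical goal-space axes; scores are assumed to be normalized to [0, 1].
DIMENSIONS = ("reading_level", "formality", "length")


def to_vector(scores: dict[str, float]) -> list[float]:
    """Place a text in goal-space from its (assumed) per-dimension scores."""
    return [scores[d] for d in DIMENSIONS]


def goal_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two goal-space points (an illustrative choice)."""
    return math.dist(a, b)


source = to_vector({"reading_level": 0.8, "formality": 0.6, "length": 0.3})

# "Make this longer, but simplify the language" as a hypothetical request vector:
# lower the reading level, leave formality alone, increase length.
request = [-0.3, 0.0, 0.4]
goal = [s + r for s, r in zip(source, request)]

# Hypothetical scores for the model's rewrite.
output = to_vector({"reading_level": 0.6, "formality": 0.5, "length": 0.7})

print(f"requested movement: {goal_distance(source, goal):.3f}")
print(f"output-to-goal distance: {goal_distance(output, goal):.3f}")
```

A smaller output-to-goal distance means the rewrite landed closer to what was asked. This toy example only illustrates the geometry; it does not reproduce the framework's three metrics.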
For example, in text-rewriting tasks we often
ask for changes along multiple dimensions of text (e.g., "make this longer, but simplify the language").
When an LLM rewrites text in response to such a request, its output also changes the text along multiple dimensions.
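The following sketch, again using hypothetical coordinates and dimension names, computes one arrow of a vector-flow diagram as the per-dimension shift from source to output, and flags shifts along dimensions the request never mentioned.

```python
# Hypothetical goal-space axes and coordinates for a single toy rewrite.
DIMENSIONS = ("reading_level", "formality", "length")
source = (0.8, 0.6, 0.3)
output = (0.6, 0.5, 0.7)

# Dimensions the request explicitly asked to change; formality was not mentioned.
requested = {"reading_level", "length"}


def movement(src: tuple[float, ...], out: tuple[float, ...]) -> dict[str, float]:
    """Per-dimension shift from source to output: one arrow in a vector-flow plot."""
    return {d: round(o - s, 3) for d, s, o in zip(DIMENSIONS, src, out)}


shift = movement(source, output)
# Nonzero shifts along dimensions the user never asked to change.
unrequested = {d: v for d, v in shift.items() if d not in requested and v != 0}

print(shift)        # {'reading_level': -0.2, 'formality': -0.1, 'length': 0.4}
print(unrequested)  # {'formality': -0.1}
```

Rounding to three decimals only hides floating-point noise in the toy numbers; a real evaluation would use whatever tolerance suits its scoring functions.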
To evaluate performance, we need to account for changes in all of these dimensions. In our work,
we motivate three main steerability metrics. Informally: