DeLLMa: A Framework for Decision Making Under Uncertainty with Large Language Models

1 University of Southern California * Equal Contribution

Classical decision theory has helped humans make rational decisions for decades. Can it do the same for Large Language Models (LLMs)? DeLLMa (pronounced "dilemma") is a Decision-making Large Language Model assistant that combines Large Language Models (GPT-4) with classical decision theory to help LLMs make decisions under uncertainty.


dellma teaser

Abstract

Large language models (LLMs) are increasingly used across society, including in domains like business, engineering, and medicine. These fields often grapple with decision-making under uncertainty, a critical yet challenging task. In this paper, we show that directly prompting LLMs on these types of decision-making problems yields poor results, especially as the problem complexity increases. To overcome this limitation, we propose DeLLMa (Decision-making Large Language Model assistant), a framework designed to enhance decision-making accuracy in uncertain environments. DeLLMa involves a multi-step scaffolding procedure, drawing upon principles from decision theory and utility theory, to provide an optimal and human-auditable decision-making process. We validate our framework on decision-making environments involving real agriculture and finance data. Our results show that DeLLMa can significantly improve LLM decision-making performance, achieving up to a 40% increase in accuracy over competing methods.

Project Overview

Given a decision query from a user, our framework DeLLMa (Decision-making LLM assistant) aims to perform optimal decision making under uncertainty.

DeLLMa consists of four main steps:

  1. Identify relevant unknown states based on the problem description and user goals.
  2. Forecast the values of the unknown states given in-context information.
  3. Elicit a utility function that aligns with the user’s goals.
  4. Use this utility function to identify the decision that maximizes expected utility.

We show the full DeLLMa algorithm below:

DeLLMa Algorithm

We can illustrate the decision tree used internally by DeLLMa for decision making under uncertainty. In the following figure, we show this decision tree for our agriculture planning environment (described below). DeLLMa uses these types of decision trees to compute and maximize the expected utility of each available action.

agriculture decision tree
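The expected-utility step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the forecast from step 2 is already available as a discrete probability distribution over states and that the elicited utility is a plain function of a state and an action. The names `expected_utility`, `best_action`, and the toy numbers are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable

State = Hashable
Action = Hashable


def expected_utility(
    action: Action,
    state_probs: Dict[State, float],
    utility: Callable[[State, Action], float],
) -> float:
    """Probability-weighted average utility of `action` over all states."""
    return sum(p * utility(s, action) for s, p in state_probs.items())


def best_action(
    actions: Iterable[Action],
    state_probs: Dict[State, float],
    utility: Callable[[State, Action], float],
) -> Action:
    """Return the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(a, state_probs, utility))


# Toy inputs (illustrative placeholders, not results from the paper).
state_probs = {"high_price": 0.6, "low_price": 0.4}
toy_utility = {
    ("high_price", "apple"): 10.0,
    ("low_price", "apple"): 2.0,
    ("high_price", "pear"): 6.0,
    ("low_price", "pear"): 5.0,
}
choice = best_action(
    ["apple", "pear"], state_probs, lambda s, a: toy_utility[(s, a)]
)
print(choice)  # apple (expected utility 6.8 vs 5.6)
```

Each leaf of the decision tree corresponds to one (state, action) pair here, so the sum in `expected_utility` is the average over the leaves under a given action.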

Some Experimental Results

We illustrate DeLLMa below on two problems of decision making under uncertainty: agriculture planning (Agriculture) and finance investing (Stocks). Both problems involve sizable degrees of uncertainty from diverse sources, and they represent different data modalities (natural language and tabular) involved in decision making.

First, we show results on the Agriculture environment. We collect bi-annual reports published by the United States Department of Agriculture (USDA) that analyze supply-and-demand conditions in the U.S. fruit markets. To emulate real-life farming timelines, we use the report published in September 2021 as context for planning the forthcoming agricultural year. We supplement these natural language contexts with USDA-issued price and yield statistics for California.

We define the utility of planting a fruit as its price × yield reported in the forthcoming year. We identify 7 fruits (apple, avocado, grape, grapefruit, lemon, peach, and pear) that are both studied in the September 2021 report and endowed with these statistics in 2021 and 2022. We create decision-making problems by enumerating all possible combinations of available fruits. For each decision-making instance, we use the related sections of the USDA report and the current-year price and yield statistics as context. In the figure below, we show that all DeLLMa variants outperform baseline methods; DeLLMa-Pairs performs best, followed by Top1 and Naive. This result implies that the full ranking of state-action pairs is useful for utility elicitation.

agriculture scores
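The problem construction for the Agriculture environment can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: it assumes a decision problem needs at least two candidate fruits (the `min_size=2` default is our assumption, since a single-option problem has nothing to decide), and the helper names `decision_problems`, `utility`, and `best_fruit` are hypothetical. The price and yield tables are passed in by the caller; no USDA figures are reproduced here.

```python
from itertools import combinations
from typing import Dict, Iterator, Sequence, Tuple

FRUITS = ["apple", "avocado", "grape", "grapefruit", "lemon", "peach", "pear"]


def decision_problems(
    options: Sequence[str], min_size: int = 2
) -> Iterator[Tuple[str, ...]]:
    """Yield every subset of `options` with at least `min_size` elements."""
    for k in range(min_size, len(options) + 1):
        yield from combinations(options, k)


def utility(fruit: str, price: Dict[str, float], yield_: Dict[str, float]) -> float:
    """Utility of planting `fruit`: next-year price times next-year yield."""
    return price[fruit] * yield_[fruit]


def best_fruit(
    candidates: Sequence[str], price: Dict[str, float], yield_: Dict[str, float]
) -> str:
    """Ground-truth best choice among `candidates` under the price x yield utility."""
    return max(candidates, key=lambda f: utility(f, price, yield_))


problems = list(decision_problems(FRUITS))
print(len(problems))  # 120 subsets of size 2 through 7
```

With a different minimum subset size the count changes, so the 120 figure holds only under the `min_size=2` assumption.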

Next, we show results on the Stocks environment. The action space A is limited to combinations of 7 stocks: AMD, DIS, GME, GOOGL, META, NVDA, and SPY. Unlike the agriculture data, where the context C is collected from USDA reports, we use historical stock prices as the context for this problem. Each stock is presented with 24 months of historical prices. To prevent possible data leakage and to encourage the LLM to use its common-sense knowledge in making decisions, when using gpt4-1106-preview as the LLM checkpoint we provide historical prices from December 2021 to November 2023 as the context C. The authors collected these historical monthly prices via Yahoo Finance.

The goal of the LLM agent is to choose which stock to buy on 2023-12-01 and sell on the last trading day of that month (2023-12-29) so that the return is maximized. In the figure below, we show that on average, DeLLMa-Top1 outperforms all baselines. DeLLMa-Pairs is slightly worse than its Top1 counterpart, which suggests that ranking state-action pairs remains a challenging task.

stock scores
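The objective in the Stocks environment can be made concrete with a short sketch. It assumes that "return" means the simple percentage return between the buy price and the sell price, which the text above does not spell out; the function names and the placeholder tickers and prices are hypothetical and are not market data.

```python
from typing import Dict


def simple_return(buy_price: float, sell_price: float) -> float:
    """Fractional gain from buying at `buy_price` and selling at `sell_price`."""
    if buy_price <= 0:
        raise ValueError("buy_price must be positive")
    return (sell_price - buy_price) / buy_price


def best_stock(buy_prices: Dict[str, float], sell_prices: Dict[str, float]) -> str:
    """Ticker with the highest simple return over the holding period."""
    return max(
        buy_prices,
        key=lambda t: simple_return(buy_prices[t], sell_prices[t]),
    )


# Placeholder prices for two hypothetical tickers (not real market data).
buy = {"AAA": 100.0, "BBB": 50.0}
sell = {"AAA": 110.0, "BBB": 52.0}
print(best_stock(buy, sell))  # AAA (10% return vs 4%)
```

Under this reading, a decision counts as correct when the chosen stock matches the output of `best_stock` for the candidate set in that problem.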

BibTeX

@misc{liu2024dellma,
      title={{D}e{L}{L}{M}a: {A} {F}ramework for {D}ecision {M}aking {U}nder {U}ncertainty with {L}arge {L}anguage {M}odels}, 
      author={Ollie Liu$^*$ and Deqing Fu$^*$ and Dani Yogatama and Willie Neiswanger},
      year={2024},
      eprint={2402.02392},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}