DeLLMa: Decision Making Under Uncertainty with Large Language Models

Given a decision query from a user, our framework DeLLMa (Decision-making LLM assistant) aims to perform accurate decision making under uncertainty via inference-time reasoning methods.

DeLLMa consists of four main steps:

Identify relevant unknown states based on the problem description and user goals.
Forecast the values of the unknown states given in-context information.
Elicit a utility function that aligns with the user’s goals.
Use this utility function to identify the decision that maximizes expected utility.

We show the full DeLLMa algorithm below:

DeLLMa Algorithm

We can illustrate the decision tree used internally by DeLLMa for decision making under uncertainty. In the following figure, we show this decision tree for our agriculture planning environment (described below). DeLLMa uses these types of decision trees to compute and maximize the expected utility of each available action.

agriculture decision tree

Some Experimental Results

We illustrate DeLLMa below on two decision making under uncertainty problems: agriculture planning (Agriculture) and finance investing (Stocks). Both problems involve sizable degrees of uncertainty from diverse sources, and are representative of different data modalities (natural language and tabular) involved in decision making

First, we show results on the Agriculture environment. We collect bi-annual reports published by the United States Department of Agriculture (USDA) that provide analysis of supply-and-demand conditions in the U.S. fruit markets. To emulate real-life farming timelines, we use the report published in September 2021 as context for planning the forthcoming agricultural year. We additionally supplement these natural language contexts with USDA issued price and yield statistics in California.

We define the utility of planting a fruit as its price × yield reported in the forthcoming year. We identify 7 fruits — apple, avocado, grape, grapefruit, lemon, peach, and pear— that are both studied in the September 2021 report, and endowed with these statistics in 2021 and 2022. We create decision making problems by enumerating all possible combinations of availble fruits. For each decision-making instance, we use related sections of the USDA report and current-year price and yield statistics as context. In the figure below, we show that all DeLLMa variants outperform baseline methods; DeLLMa-Pairs is the best, followed by Top1 and Naive. This result implies that the full ranking of state-action pairs is useful for utility elicitation.

agriculture scores

Next, results on the Stocks environment. The action space A is limited to combinations of 7 stocks: AMD, DIS, GME, GOOGL, META, NVDA and SPY. Unlike agriculture data where the context C are collected through USDA reports, we collect historical stock prices as the context for this problem. Each stock is presented with 24 monthly price in history. In preventing possible data leakage and promoting LLMs to use their common-sense knowledge in making decisions, when using gpt4-1106-preview as the LLM checkpoint, historical price between December 2021 to November 2023 are provided as the context C. These historical monthly prices are collected via Yahoo Finance by the authors.

The goal of the LLM agent is to choose which stock to invest on 2023-12-01 and sell on the last trading day of that month (2023-12-29) so that the return is maximized. In the figure below, we show that on average, DeLLMa-Top1 outperforms all baselines. DeLLMa-Pairs is slightly worse than its Top1 counterpart, meaning that ranking state-action pairs is still a challenging task.

DeLLMa: Decision Making Under Uncertainty with Large Language Models

Abstract

Project Overview

Some Experimental Results

BibTeX