
PAL Prompting

Introduction

Gao et al., 2022 introduced the PAL framework, which offloads the computation of the reasoning steps and the final result from the LLM in order to ensure answer accuracy. This lowers the failure rate caused by the LLM's logical and calculation errors.

By outsourcing the reasoning steps to an interpreter, PAL removes the need for the LLM to perform calculations itself. The results show that PAL still outperforms chain-of-thought (CoT) in accuracy, even when the LLM using CoT is stronger. Moreover, PAL also works with weaker LLMs, and its advantage widens when it is paired with stronger LLMs.

How It Works

While LLMs can decompose natural-language problems into multiple steps and perform simple arithmetic operations, their performance drops sharply when the arithmetic is complex or the numbers are large. In fact, even when a PaLM-based model is fine-tuned on 164B tokens of explicit mathematical content, its two most commonly reported failures are "incorrect reasoning" and "incorrect calculation".

Because previous LLMs performed computation only internally, they could not handle large calculations or complex algorithms. PAL decomposes the problem into multiple reasoning steps and delegates those steps and their computation to a Python interpreter. As a result, as long as the LLM's coding ability is sufficient, it can carry out any calculation accurately.

Figure: Overview of PAL (pal.png)
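
A minimal sketch of this pipeline in Python, assuming a hypothetical generate callable that wraps whatever LLM API is in use (not part of PAL itself): the model writes a solution() function as text, and the Python interpreter, not the model, produces the final answer.

from typing import Callable

def run_pal(question: str, few_shot_prompt: str,
            generate: Callable[[str], str]):
    """Minimal PAL loop: the LLM writes code, the interpreter computes."""
    # Few-shot examples followed by the new question, as in the prompt below.
    prompt = f"{few_shot_prompt}\n\nQ: {question}\n\n# solution using Python:\n"

    # `generate` stands in for your LLM call; it should return the
    # generated `def solution(): ...` block as a string.
    code = generate(prompt)

    # Delegate the computation to the Python interpreter instead of the model.
    namespace = {}
    exec(code, namespace)           # defines solution() in the namespace
    return namespace["solution"]()  # the interpreter computes the answer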

Prompt Example

Prompt

Q: I have a drum, a flute, a clarinet, a violin, four accordions, a piano, a trombone, and a trumpet. How many musical instruments do I have?

# solution using Python:

def solution():
    """Q: I have a drum, a flute, a clarinet, a violin, four accordions, a piano, a trombone, and a trumpet. How many musical instruments do I have?"""
    musical_instruments_to_count = {
        'drum': 1,
        'flute': 1,
        'clarinet': 1,
        'violin': 1,
        'accordion': 4,
        'piano': 1,
        'trombone': 1,
        'trumpet': 1
    }
    num_musical_instruments = sum(musical_instruments_to_count.values())
    return num_musical_instruments



Q: I have a chair, two ovens, and three tables. How many objects do I have?

# solution using Python:

def solution():
    """Q: I have a chair, two ovens, and three tables. How many objects do I have?
    """
    objects_to_count = {
        'chair': 1,
        'oven': 2,
        'table': 3
    }
    num_objects = sum(objects_to_count.values())
    return num_objects



Q: I have a chair, two potatoes, a cauliflower, a lettuce head, two tables, a cabbage, two onions, and three fridges. How many vegetables do I have?

# solution using Python:

def solution():
    """Q: I have a chair, two potatoes, a cauliflower, a lettuce head, two tables, a cabbage, two onions, and three fridges. How many vegetables do I have?
    """
    # note: I'm not counting the chair, tables, or fridges as vegetables
    vegetables_to_count = {
        'potato': 2,
        'cauliflower': 1,
        'lettuce head': 1,
        'cabbage': 1,
        'onion': 2
    }
    num_vegetables = sum(vegetables_to_count.values())
    return num_vegetables



Q: I have a raspberry, a cat, a rabbit, a mouse, a pig, two snails, a fish, two cows, a snake, a goat, and a duck. How many animals do I have?

# solution using Python:

def solution():
    """Q: I have a raspberry, a cat, a rabbit, a mouse, a pig, two snails, a fish, two cows, a snake, a goat, and a duck. How many animals do I have?
    """
    # note: I'm not counting the raspberry as an animal
    animals_to_count = {
        'cat': 1,
        'rabbit': 1,
        'mouse': 1,
        'pig': 1,
        'snail': 2,
        'fish': 1,
        'cow': 2,
        'snake': 1,
        'goat': 1,
        'duck': 1
    }
    num_animals = sum(animals_to_count.values())
    return num_animals



Q: {There are three chairs, two tables, and three beds in the room. How many pieces of furniture are there in total?}

# solution using Python:

Output

Here is the Python code:

def solution():
    """Q: 屋子里有三把椅子两个桌子,三张床,请问一共有多少的家具?"""
    furniture_to_count = {
        'chair': 3,
        'table': 2,
        'bed': 3
    }
    num_furniture = sum(furniture_to_count.values())
    return num_furniture
The answer is: you have 8 pieces of furniture.
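
The answer itself comes from running the generated function in a Python interpreter rather than from the model's own arithmetic; for example (a quick check, with the generated code pasted in as a string):

generated_code = """
def solution():
    furniture_to_count = {
        'chair': 3,
        'table': 2,
        'bed': 3
    }
    return sum(furniture_to_count.values())
"""

namespace = {}
exec(generated_code, namespace)  # define the model-written solution()
print(namespace["solution"]())   # -> 8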

Datasets

The experiments cover three broad categories of reasoning tasks:

(1) mathematical problems from widely used datasets such as GSM8K, SVAMP, ASDIV, and MAWPS; (2) symbolic reasoning from BIG-Bench Hard; and (3) algorithmic problems from BIG-Bench Hard.

GSM8K

GSM8K is a dataset of 8.5K high-quality, linguistically diverse grade-school math word problems, split into 7.5K training problems and 1K test problems. Each problem takes two to eight steps to solve, and the final answer is reached through a sequence of elementary calculations using the basic arithmetic operations (+, −, ×, ÷).
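
For illustration, a GSM8K-style word problem written as a PAL solution might look like the sketch below (a made-up problem, not an actual GSM8K item), with each arithmetic step carried out as one line of Python:

def solution():
    """Q: A baker bakes 3 batches of 12 muffins and then sells 5 of them.
    How many muffins are left? (illustrative problem, not from GSM8K)"""
    muffins_baked = 3 * 12            # step 1: total muffins baked
    muffins_left = muffins_baked - 5  # step 2: subtract the muffins sold
    return muffins_left               # 31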

Defects4J

This dataset contains known defects and their fixes from 5 Java projects, including Commons Math and Joda-Time.

MultiArith

The MultiArith dataset is a multi-step arithmetic dataset containing 600 elementary-school, scenario-based math word problems.

Django

This dataset is also a natural-language-to-code semantic parsing dataset; it contains Python code from the Django web framework paired with natural-language descriptions.

Codeforces

Codeforces is an online competitive programming platform; this dataset contains millions of lines of code submitted by programmers from around the world.

SQuAD

This dataset is a question-answering dataset widely used in natural language processing tasks; it contains questions about Wikipedia articles together with their corresponding answers.

参考文献

[1] Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P ., Neelakantan, A., Shyam, P ., Sastry, G., Askell, A., Agarwal, S., Herbert-V oss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., and Amodei, D. Language Models are Few-Shot Learners. In NeurIPS, 2020.

[2] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., and Zhou, D. Self-Consistency Improves Chain of Thought Reasoning in Language Models. arXiv preprint arXiv:2203.11171, 2022b.

[3] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E., Le, Q., and Zhou, D. Chain of Thought Prompting Elicits Reasoning in Large Language Models. arXiv preprint arXiv:2201.11903, 2022.