使用大型語言模型(LLM)完成Text-to-SQL任務 - 透過Langchain SQL Agent建立資料問答系統

上一篇解釋了如何透過Langchain SQL Chain建立資料問答系統。使用 LangChain 的 SQL Chain 可以將問題轉換為 SQL 查詢，這過程中會將不同的動作連結在一起，最後通過執行完整的 Chain 一步步地完成每個步驟，最終獲得結果。如果我們希望 LLM 能夠更主動地與環境互動並完成特定任務，就需要建立代理（Agent）。

LLM Powered Autonomous Agents

代理（Agent）可以想像成一個有大腦的系統（LLM），在設定目標後，代理能在無需人為干預的情況下自主完成任務。由於代理需要在每次情境變化時進行思考（Thought）並執行最適當的動作（Action），因此我們需要給代理一個工具箱，讓它能從中選擇最合適的工具（Tool）。此外，代理具有記憶功能，會觀察（Observation）執行結果，並根據情境和環境的變化，不斷調整自己的行為，直到完成任務為止。

在使用Langchain建立SQL Agent前，先把資料庫和大型語言服務準備好。

準備資料庫

建立Chinook這個資料庫來當作範例：

import sqlite3
import requests

# 下載 SQL 腳本
url = "https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sql"
response = requests.get(url)

# 儲存 SQL 腳本到本地檔案
with open('Chinook_Sqlite.sql', 'w') as f:
    f.write(response.text)

# 連接到 SQLite 資料庫 (會自動建立資料庫檔案)
conn = sqlite3.connect('Chinook.db')
cursor = conn.cursor()

# 執行 SQL 腳本來建立資料庫
with open('Chinook_Sqlite.sql', 'r') as f:
    cursor.executescript(f.read())

# 提交變更並關閉連接
conn.commit()
conn.close()

使用langchain的SQLDatabase來建立資料庫的查詢引擎

from langchain_community.utilities import SQLDatabase
# 準備sqlite db
db = SQLDatabase.from_uri("sqlite:///Chinook.db")

print(db.dialect)
print(db.get_usable_table_names())

1
2

sqlite
[(1, 'AC/DC'), (2, 'Accept'), (3, 'Aerosmith'), (4, 'Alanis Morissette'), (5, 'Alice In Chains'), (6, 'Antônio Carlos Jobim'), (7, 'Apocalyptica'), (8, 'Audioslave'), (9, 'BackBeat'), (10, 'Billy Cobham')]

準備大型語言模型服務

再來準備一個大型語言模型服務，這裡使用Gemini當範例

import os
import getpass
import google.generativeai as genai
# https://python.langchain.com/docs/integrations/chat/google_generative_ai/

# Configure Gemini
GEMINI_API_KEY = getpass.getpass() # input your gemini api key
genai.configure(api_key=GEMINI_API_KEY)

llm = genai.GenerativeModel('gemini-1.5-flash')

建立SQL Agent

from langchain_community.agent_toolkits import create_sql_agent
agent_executor = create_sql_agent(llm, db=db)

agent_executor.run("How many employees are there?")

建立SQL Agent並輸入問題後，就會得到和SQL Chain一樣的回答結果。

1	There are 8 employees.

SQLAgent執行過程

使用SQL Chain和SQL Agent的差別在於，SQL Chain會把整個問答過程的步驟先定義好後依序執行。

Agent是先定義出不同的工具（Tool），並由大語言模型（LLM）這個大腦動態地根據執行當下的情況來思考（Thought）應該執行什麼動作（Action）。此時，LLM會挑選出其中一種工具，決定工具的輸入（Action Input）並執行。接著，LLM會觀察（Observation）執行完的結果，然後進行下一輪的思考/動作/動作輸入/觀察，直到得到最終答案。

下面可以透過把verbose打開看到LLM的思考/動作/動作輸入/觀察的執行過程。

from langchain_community.agent_toolkits import create_sql_agent
agent_executor = create_sql_agent(llm, db=db, verbose=True)

agent_executor.run("How many employees are there?")

在Text-to-SQL任務中會先有一個起始的情境，也就是讓LLM要先到資料庫裡找到所有可以查詢的資料表，這時Agent選擇了要執行sql_db_list_tables這個動作，在輸入空值的情況下會得到所有的資料表清單。

> Entering new SQL Agent Executor chain...
Thought: I should look at the tables in the database to see what I can query.  Then I should query the schema of the most relevant tables.
Action: sql_db_list_tables
Action Input: 
Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, Track

Agent在取得所有資料表清單後，開始進行思考並決定要使用Employee這張資料，因為這張資料表和輸入問題最相關。於是Agent選擇執行sql_db_schema並輸入Employee查詢出這張資料表的schema還有三筆資料。

Thought: The table 'Employee' likely contains information about employees. I'll check its schema.
Action: sql_db_schema
Action Input: Employee
CREATE TABLE "Employee" (
	"EmployeeId" INTEGER NOT NULL, 
	"LastName" NVARCHAR(20) NOT NULL, 
	"FirstName" NVARCHAR(20) NOT NULL, 
	"Title" NVARCHAR(30), 
	"ReportsTo" INTEGER, 
	"BirthDate" DATETIME, 
	"HireDate" DATETIME, 
	"Address" NVARCHAR(70), 
	"City" NVARCHAR(40), 
	"State" NVARCHAR(40), 
	"Country" NVARCHAR(40), 
	"PostalCode" NVARCHAR(10), 
	"Phone" NVARCHAR(24), 
	"Fax" NVARCHAR(24), 
	"Email" NVARCHAR(60), 
	PRIMARY KEY ("EmployeeId"), 
	FOREIGN KEY("ReportsTo") REFERENCES "Employee" ("EmployeeId")
)

/*
3 rows from Employee table:
EmployeeId	LastName	FirstName	Title	ReportsTo	BirthDate	HireDate	Address	City	State	Country	PostalCode	Phone	Fax	Email
1	Adams	Andrew	General Manager	None	1962-02-18 00:00:00	2002-08-14 00:00:00	11120 Jasper Ave NW	Edmonton	AB	Canada	T5K 2N1	+1 (780) 428-9482	+1 (780) 428-3457	andrew@chinookcorp.com
2	Edwards	Nancy	Sales Manager	1	1958-12-08 00:00:00	2002-05-01 00:00:00	825 8 Ave SW	Calgary	AB	Canada	T2P 2T3	+1 (403) 262-3443	+1 (403) 262-3322	nancy@chinookcorp.com
3	Peacock	Jane	Sales Support Agent	2	1973-08-29 00:00:00	2002-04-01 00:00:00	1111 6 Ave SW	Calgary	AB	Canada	T2P 5M5	+1 (403) 262-3443	+1 (403) 262-6712	jane@chinookcorp.com

Agent接著在看完Employee的schema和範例資料後，思考完認為可以透過計算這張資料表的列數來回答問題，於是Agent執行sql_db_query並輸入SQL查詢得到查詢結果。

Thought: I can count the number of rows in the Employee table to answer the question.
Action: sql_db_query
Action Input: SELECT COUNT(*) FROM Employee;
[(8,)]

Agent在看完查詢結果後認為已經能回答最後答案了，於是回答出最終的答案。

Thought: I now know the final answer
Final Answer: There are 8 employees.

> Finished chain.
There are 8 employees.

Chain和Agent差異

從Agent在完成Text-to-SQL的任務過程中可以發現，透過Agent可以讓LLM作為大腦來依據不同的情境動態決定下一步該執行什麼動作，Chain和Agent的比較可以參考下表(The table is craeted by Copilot)

	Chains	Agents
目的	設計來執行一系列預定的步驟，按照特定順序進行。每個步驟處理輸入並將其傳遞到下一步。	設計來執行動態任務，步驟並非預定。在當前上下文和可用工具的基礎上即時做出決策。
結構	線性結構，每個步驟明確定義並按順序執行。	使用思考、行動和觀察的循環。根據當前上下文和結果調整步驟。
靈活性	靈活性較低，遵循預定路徑，不會根據新信息或環境變化動態調整行為。	高度靈活，能夠適應新信息和變化的條件。可以根據當前上下文從多種行動和工具中進行選擇。
使用案例	適合具有明確線性工作流程的任務，如數據處理管道、文本轉換序列和已知步驟的例行操作。	適用於複雜、不可預測的任務，如互動應用程序、實時決策系統和需要適應行為的任務。
總結	線性，預定步驟，靈活性較低，適合例行的線性工作流程。	動態，決策循環，高度靈活，適合複雜、適應性任務。

參考資料:
Q&A over SQL - Agents
Weng, Lilian. (Jun 2023). “LLM-powered Autonomous Agents”. Lil’Log.