SQLDatabaseChain

SQLDatabaseChain is a langchain_experimental chain for interacting with SQL Database.

It makes it easier to query your DB in natural language, in the post we shall be seeing an example of connecting to a Postgres DB and query it.

Fetch the dependencies:

pip install psycopg2 -q
pip install langchain_experimental -q
pip install google.generativeai
pip install langchain

psycopg is the most popular PostgreSQL database adapter for Python. It fully implements the Python DB API 2.0 specification and allows for thread safety.

langchain_experimental package contains experimental LangChain code for research purposes. Certain code in this package can be unsafe if not deployed in a sandboxed environment.

google.generativeai Python client library for Google's large language model PaLM API

Create llm instance:

from langchain.llms import GooglePalm
llm = GooglePalm(google_api_key=PLAM_API_KEY)
llm.temperature = 0.1

Import the required modules:

from langchain_experimental.sql import SQLDatabaseChain
from langchain.sql_database import SQLDatabase
from langchain.llms import GooglePalm

SQLDatabaseChain class handles SQL execution via the _call method. It uses inputs and a run manager to run SQL queries against a database.

SQLDatabase class is a SQLAlchemy wrapper around a database.

GooglePalm langchain's class to communicate with Google PaLM models.

Connect to the Postgress DB:

pg_uri = f"postgresql+psycopg2://<username>:<password>@<host>/<database>"

make sure to have ^ postgresql+psycopg2 in the URI

remote_db = SQLDatabase.from_uri(pg_uri)

In this example I shall be connecting to postgress DB hosted on elephantsql.

Create a prompt:

PROMPT = """
I will act as a helpful assistant adept at querying databases. When given a natural language question about the data, I will:

Analyze the question to understand the intent and entities. What data fields or tables are being asked about?
Formulate a SQL query that is syntactically correct and will retrieve the requested data. Focus on succinct, valid SQL with minimal unnecessary syntax.
Run the generated SQL query against the provided database to retrieve results.
Interpret the results and summarize or format them in a user-friendly way to answer the original question. Convey the essence clearly and concisely.
If I cannot understand the question or generate a suitable query, be honest and state that more clarification is needed. Ask relevant follow-up questions.
Optimize for informativeness, clarity, accuracy and brevity. Avoid irrelevant details or overly verbose responses.
Given this natural language question:

{question}

Please provide a helpful SQL-based response:
"""

SQLDatabaseChain from an LLM and DB connection:

db_chain = SQLDatabaseChain.from_llm(llm=llm, db=remote_db, verbose=True, top_k=3)

Query the DB:

query = "Describe the table"
db_chain.run(PROMPT.format(question=query))
query = "Get me the uniq URLs from the table"
db_chain.run(PROMPT.format(question=query))

Without langchain_experimental's SQLDatabaseChain we would have to do something like below:

from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit

toolkit = SQLDatabaseToolkit(db=remote_db, llm=llm)

agent_executor = create_sql_agent(
    llm=llm,
    toolkit=toolkit,
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)

agent_executor.run(PROMPT.format(question=query))

Final code:

from langchain_experimental.sql import SQLDatabaseChain
from langchain.sql_database import SQLDatabase
from langchain.llms import GooglePalm

llm = GooglePalm(google_api_key=PLAM_API_KEY)
llm.temperature = 0.1

pg_uri = f"postgresql+psycopg2://lccqjuny:[email protected]/lccqjuny"

remote_db = SQLDatabase.from_uri(pg_uri)

PROMPT = """
I will act as a helpful assistant adept at querying databases. When given a natural language question about the data, I will:

Analyze the question to understand the intent and entities. What data fields or tables are being asked about?
Formulate a SQL query that is syntactically correct and will retrieve the requested data. Focus on succinct, valid SQL with minimal unnecessary syntax.
Run the generated SQL query against the provided database to retrieve results.
Interpret the results and summarize or format them in a user-friendly way to answer the original question. Convey the essence clearly and concisely.
If I cannot understand the question or generate a suitable query, be honest and state that more clarification is needed. Ask relevant follow-up questions.
Optimize for informativeness, clarity, accuracy and brevity. Avoid irrelevant details or overly verbose responses.
Given this natural language question:

{question}

Please provide a helpful SQL-based response:
"""

db_chain = SQLDatabaseChain.from_llm(llm=llm, db=remote_db, verbose=True, top_k=3)

query = "Describe the table"
db_chain.run(PROMPT.format(question=query))

Video demo:


With ❤️ Hemanth.HM