This blog post demonstrates how to use LangChain, OllamaFunctions and the Mistral model to extract structured data from unstructured text.
We’ll start with a simple example, extracting information about a person from a text block.
Prerequisites
- Python 3.8.1 or higher (LangChain no longer supports Python 3.7)
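- LangChain with the experimental package, which provides OllamaFunctions (install command below)
- A local Ollama installation with the Mistral model pulled (ollama pull mistral)
pip install langchain_experimental -qU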
Code
from langchain_experimental.llms.ollama_functions import OllamaFunctions
from langchain.chains import create_extraction_chain
# Define the text
data="""
Born May 11, 1918
Richard Phillips Feynman
New York City, U.S.
Died February 15, 1988 (aged 69)
Los Angeles, California, U.S.
Resting place Mountain View Cemetery and Mausoleum
Education
Massachusetts Institute of Technology (SB)
Princeton University (PhD)
"""
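# Define the schema for the extracted data
schema = {
    "properties": {
        "name": {"type": "string"},
        # Dates come back as free-form text, so they are declared as strings;
        # "date" is not a JSON Schema type.
        "born": {"type": "string"},
        "died": {"type": "string"},
        "city": {"type": "string"},
        "education": {"type": "string"},
        "resting_place": {"type": "string"},
    },
    # Every required field must also be declared under "properties".
    "required": ["name"],
}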
# Initialize the OllamaFunctions LLM
llm = OllamaFunctions(model="mistral", temperature=0)
# Create the extraction chain
chain = create_extraction_chain(schema, llm)
# Run the chain on the data
result = chain.run(data)
# Print the extracted data
print(result)
Output
[{
    'name': 'Richard Phillips Feynman',
    'born': 'May 11, 1918',
    'city': 'New York City, U.S.',
    'died': 'February 15, 1988 (aged 69)',
    'resting_place': 'Mountain View Cemetery and Mausoleum',
    'education': 'Massachusetts Institute of Technology (SB) Princeton University (PhD)'
}]
Explanation
- We define a schema with the expected fields and types for the extracted data.
- We instantiate an OllamaFunctions LLM using the Mistral model.
- We create an extraction chain with create_extraction_chain, using the schema and the LLM.
- The chain is run on the input text.
- The output is a list of dictionaries, each dictionary representing an extracted data point.
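Because the schema drives everything the chain returns, a quick sanity check before building the chain can catch simple mistakes, such as a required field that is never declared or a type name that JSON Schema does not define. The sketch below is one possible check and not part of LangChain; the helper name check_schema and the deliberately flawed example schema are assumptions made for illustration.

```python
from typing import List

# The primitive type names defined by JSON Schema; anything else (for example "date") is not one of them.
VALID_TYPES = {"string", "number", "integer", "boolean", "array", "object", "null"}


def check_schema(schema: dict) -> List[str]:
    """Return human-readable problems found in an extraction schema (hypothetical helper)."""
    problems = []
    properties = schema.get("properties", {})
    # Flag every field whose declared type is not a JSON Schema primitive.
    for name, spec in properties.items():
        field_type = spec.get("type")
        if field_type not in VALID_TYPES:
            problems.append(f"field '{name}' has unsupported type '{field_type}'")
    # Flag every required field that is missing from "properties".
    for name in schema.get("required", []):
        if name not in properties:
            problems.append(f"required field '{name}' is not declared in properties")
    return problems


# A deliberately flawed schema, used only to show what the check reports.
flawed = {
    "properties": {"name": {"type": "string"}, "born": {"type": "date"}},
    "required": ["name", "year"],
}
for problem in check_schema(flawed):
    print(problem)
```

The model may still tolerate a loosely written schema, so this check is a convenience for the author rather than something the chain requires.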
Key Takeaways
- OllamaFunctions provides a simple and convenient way to use Mistral with LangChain.
- The create_extraction_chain function allows for easy data extraction based on a defined schema.
- The resulting structured data can be used for further processing and analysis.
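As an example of such further processing, the extracted date fields arrive as free-form strings (one of them with a trailing "(aged 69)"), so a small clean-up step can turn them into real date objects. This is a minimal sketch using only the standard library; the helper names parse_date and normalize are hypothetical, and it assumes dates written in the English "Month D, YYYY" style shown in the output above.

```python
import re
from datetime import date, datetime
from typing import Optional

# Matches dates written like "May 11, 1918" anywhere inside a longer string.
DATE_PATTERN = re.compile(r"[A-Z][a-z]+ \d{1,2}, \d{4}")


def parse_date(text: object) -> Optional[date]:
    """Return the first 'Month D, YYYY' date found in text, or None if there is none."""
    if not isinstance(text, str):
        return None
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    try:
        # %B expects full English month names under the default (C/English) locale.
        return datetime.strptime(match.group(0), "%B %d, %Y").date()
    except ValueError:
        return None


def normalize(record: dict) -> dict:
    """Return a copy of an extracted record with 'born' and 'died' converted to dates."""
    cleaned = dict(record)
    for key in ("born", "died"):
        if key in cleaned:
            cleaned[key] = parse_date(cleaned[key])
    return cleaned


record = {
    "name": "Richard Phillips Feynman",
    "born": "May 11, 1918",
    "died": "February 15, 1988 (aged 69)",
}
person = normalize(record)
print(person["born"].isoformat(), person["died"].isoformat())  # 1918-05-11 1988-02-15
```

Values that do not contain a recognizable date become None, which makes gaps in the model's output easy to spot downstream.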
This example shows how LangChain, OllamaFunctions, and Mistral work together to extract structured data from text. To adapt the approach to other text sources, change the schema to describe the fields you need and pass in your own input text.
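One way to make that adaptation less repetitive is to generate the schema from a plain field-to-type mapping and wrap the chain call in a function. The sketch below reuses only the calls shown in this post (OllamaFunctions, create_extraction_chain, chain.run); the helper names build_schema and extract, and the book example, are assumptions made for illustration. Calling extract requires a running local Ollama server with the model available.

```python
from typing import Dict, Iterable, Optional


def build_schema(fields: Dict[str, str], required: Optional[Iterable[str]] = None) -> dict:
    """Build an extraction schema from a {field_name: json_type} mapping (hypothetical helper)."""
    required_list = list(required or [])
    # Refuse required fields that the schema never declares.
    unknown = [name for name in required_list if name not in fields]
    if unknown:
        raise ValueError(f"required fields not in schema: {unknown}")
    return {
        "properties": {name: {"type": json_type} for name, json_type in fields.items()},
        "required": required_list,
    }


def extract(text: str, schema: dict, model: str = "mistral") -> list:
    """Run the extraction chain from this post on text; needs a local Ollama server."""
    # Imported lazily so build_schema can be used without LangChain installed.
    from langchain_experimental.llms.ollama_functions import OllamaFunctions
    from langchain.chains import create_extraction_chain

    llm = OllamaFunctions(model=model, temperature=0)
    return create_extraction_chain(schema, llm).run(text)


# A different domain: the same helpers, a new schema.
book_schema = build_schema(
    {"title": "string", "author": "string", "year": "integer"},
    required=["title"],
)
print(book_schema)
```

With Ollama running, extract(some_text, book_schema) would return a list of dictionaries in the same shape as the output shown earlier.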
About Hemanth HM
Hemanth HM is a Sr. Machine Learning Manager at PayPal, Google Developer Expert, TC39 delegate, FOSS advocate, and community leader with a passion for programming, AI, and open-source contributions.