OllamaFunctions and Mistral

This blog post demonstrates how to use LangChain, OllamaFunctions and the Mistral model to extract structured data from unstructured text.

We'll start with a simple example, extracting information about a person from a text block.

Prerequisites:

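Python 3.8.1 or higher (recent LangChain releases no longer support 3.7)
LangChain and its experimental package (install using pip install -qU langchain langchain-experimental)
Ollama installed and running locally, with the Mistral model pulled (ollama pull mistral)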

Code:

from langchain_experimental.llms.ollama_functions import OllamaFunctions
from langchain.chains import create_extraction_chain

# Define the text (an infobox-style snippet about Richard Feynman)
data = """
Born May 11, 1918
Richard Phillips Feynman
New York City, U.S.
Died	February 15, 1988 (aged 69)
Los Angeles, California, U.S.
Resting place	Mountain View Cemetery and Mausoleum
Education
Massachusetts Institute of Technology (SB)
Princeton University (PhD)
"""

# Define the schema for the extracted data
schema = {
    "properties": {
        "name": {"type": "string"},
        # Dates come back as text; "date" is not a JSON Schema type
        "born": {"type": "string"},
        "died": {"type": "string"},
        "city": {"type": "string"},
        "education": {"type": "string"},
        "resting_place": {"type": "string"},
    },
    # Every required key must also be declared under "properties"
    "required": ["name"],
}

# Initialize the OllamaFunctions LLM
llm = OllamaFunctions(model="mistral", temperature=0)

# Create the extraction chain
chain = create_extraction_chain(schema, llm)

# Run the chain on the data
result = chain.run(data)

# Print the extracted data
print(result)

Output:

[{'name': 'Richard Phillips Feynman',
  'born': 'May 11, 1918',
  'city': 'New York City, U.S.',
  'died': 'February 15, 1988 (aged 69)',
  'resting_place': 'Mountain View Cemetery and Mausoleum',
  'education': 'Massachusetts Institute of Technology (SB) Princeton University (PhD)'}]

Explanation:

We define a schema with the expected fields and types for the extracted data.
We instantiate an OllamaFunctions LLM using the Mistral model.
We create an extraction chain from the schema and the LLM with create_extraction_chain.
The chain is run on the input text.
The output is a list of dictionaries, one per entity extracted from the text, keyed by the schema's property names.
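The schema step is the easiest one to get subtly wrong, for instance by listing a key under "required" that was never declared under "properties". Here is a minimal sketch of a check you could run before building the chain. The helper name undeclared_required and the example schema are hypothetical, and the check uses only the standard library, not any LangChain API.

```python
from typing import Any, Dict, List


def undeclared_required(schema: Dict[str, Any]) -> List[str]:
    """Return names listed under "required" that are missing from "properties"."""
    declared = set(schema.get("properties", {}))
    return [name for name in schema.get("required", []) if name not in declared]


# Hypothetical schema with a deliberate mistake: "year" is never declared.
example_schema = {
    "properties": {
        "name": {"type": "string"},
        "born": {"type": "string"},
    },
    "required": ["name", "year"],
}

print(undeclared_required(example_schema))  # prints ['year']
```

An empty list means every required key is declared, so the schema is at least internally consistent.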

Key takeaways:

OllamaFunctions wraps a locally served Ollama model such as Mistral so that it can be used with LangChain's function-calling chains.
The create_extraction_chain function allows for easy data extraction based on a defined schema.
The resulting structured data can be used for further processing and analysis.
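As an illustration of further processing, the date fields in the output above are plain strings, and one of them carries an "(aged 69)" suffix. The sketch below turns such strings into date objects. The helper parse_infobox_date is hypothetical, the record is copied by hand from the output shown earlier, and the month-name parsing assumes an English locale.

```python
import re
from datetime import date, datetime
from typing import Optional


def parse_infobox_date(text: str) -> Optional[date]:
    """Parse strings such as 'February 15, 1988 (aged 69)' into a date.

    Returns None when no 'Month D, YYYY' pattern is found.
    """
    match = re.search(r"[A-Z][a-z]+ \d{1,2}, \d{4}", text)
    if match is None:
        return None
    try:
        # %B matches the full month name (locale-dependent; English assumed)
        return datetime.strptime(match.group(0), "%B %d, %Y").date()
    except ValueError:
        # The capitalised word was not a month name, or the day was invalid
        return None


# Fields copied from the extraction output shown earlier in the post
record = {
    "name": "Richard Phillips Feynman",
    "born": "May 11, 1918",
    "died": "February 15, 1988 (aged 69)",
}

born = parse_infobox_date(record["born"])
died = parse_infobox_date(record["died"])
print(born, died)
```

Keeping the raw strings alongside the parsed values is a reasonable choice, since the model may return dates in formats this pattern does not cover.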

This example shows how LangChain, OllamaFunctions, and Mistral can be combined to extract structured data from text. You can adapt the approach to other text sources by changing the schema to describe the fields you want.
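To adapt the schema to another kind of text, it can help to build the dictionary from a list of field names instead of typing it out. The following is a rough sketch under that assumption. The helper make_string_schema and the book fields are hypothetical, and every field is typed as a string for simplicity.

```python
from typing import Any, Dict, Iterable, Sequence


def make_string_schema(
    fields: Iterable[str], required: Sequence[str] = ()
) -> Dict[str, Any]:
    """Build a schema dict in the same shape as the one used in this post."""
    return {
        "properties": {name: {"type": "string"} for name in fields},
        "required": list(required),
    }


# Hypothetical fields for extracting book details instead of a biography
book_schema = make_string_schema(
    ["title", "author", "published"], required=["title"]
)
print(book_schema)
```

The resulting dictionary has the same "properties" and "required" layout as the schema in the example, so it could be passed to create_extraction_chain in its place.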

Feel free to share this article. You can also ping me on Twitter.

Published 18 May 2024