Langchain

Date: Jan 16, 2024
Category: Langchain, AI, Generative AI, LLM
  • Prompts: Templatize, dynamically select, and manage model inputs
  • Chat models: Models that are backed by a language model but take a list of Chat Messages as input and return a Chat Message
  • LLMs: Models that take a text string as input and return a text string

LLM / Chat Model

 
There are two types of language models:
  • LLM: underlying model takes a string as input and returns a string
  • ChatModel: underlying model takes a list of messages as input and returns a message
 
Strings are simple, but what exactly are messages? The base message interface is defined by BaseMessage, which has two required attributes:
  • content: The content of the message. Usually a string.
  • role: The entity from which the BaseMessage is coming.
 
LangChain provides several objects to easily distinguish between different roles:
  • HumanMessage: A BaseMessage coming from a human/user.
  • AIMessage: A BaseMessage coming from an AI/assistant.
  • SystemMessage: A BaseMessage coming from the system.
  • FunctionMessage / ToolMessage: A BaseMessage containing the output of a function or tool call.
If none of those roles sound right, there is also a ChatMessage class where you can specify the role manually.
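For instance, a minimal sketch of constructing a few messages with these classes (all imports from langchain.schema):

from langchain.schema import AIMessage, ChatMessage, HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="What is LangChain?"),
    AIMessage(content="LangChain is a framework for building LLM applications."),
    # ChatMessage lets you set an arbitrary role manually
    ChatMessage(role="critic", content="Keep answers under two sentences."),
]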
 
LangChain provides a common interface that's shared by both LLMs and ChatModels. However, it's useful to understand the difference in order to construct prompts most effectively for a given language model.
The simplest way to call an LLM or ChatModel is using .invoke(), the universal synchronous call method for all LangChain Expression Language (LCEL) objects:
  • LLM.invoke: Takes in a string, returns a string.
  • ChatModel.invoke: Takes in a list of BaseMessage, returns a BaseMessage.
from langchain.llms import GooglePalm
from langchain.chat_models import ChatGooglePalm
from langchain.schema import HumanMessage
import os
from dotenv import load_dotenv

load_dotenv()  # .env file contains: GOOGLE_API_KEY=<your-api-key>

llm = GooglePalm(google_api_key=os.environ["GOOGLE_API_KEY"], temperature=0.1)
llm.temperature = 0.2  # parameters can also be changed after instantiation

model = ChatGooglePalm(
    google_api_key=os.environ["GOOGLE_API_KEY"],
    temperature=0.7,
    model_name="models/chat-bison-001",
    top_k=40,
    top_p=1,
    verbose=True,
)

text = "I love programming."
message = [HumanMessage(content=text)]

print(llm.invoke(text))        # LLM: string in, string out
print(model.invoke(message))   # ChatModel: messages in, message out

Prompt templates

# llm prompt
from langchain.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template("Tell me a {adjective} joke about {content}.")
print(prompt_template.format(adjective="funny", content="chickens"))

# chat prompt
from langchain.prompts import ChatPromptTemplate

chat_template = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful AI bot. Your name is {name}."),
        ("human", "Hello, how are you doing?"),
        ("ai", "I'm doing well, thanks!"),
        ("human", "{user_input}"),
    ]
)
messages = chat_template.format_messages(name="Bob", user_input="What is your name?")
For additional validation, specify input_variables explicitly. These variables will be compared against the variables present in the template string during instantiation, raising an exception if there is a mismatch. For example:
from langchain.prompts import PromptTemplate

invalid_prompt = PromptTemplate(
    input_variables=["adjective"],
    template="Tell me a {adjective} joke about {content}.",
)
ValidationError: 1 validation error for PromptTemplate
__root__
  Invalid prompt schema; check for mismatched or missing input parameters. 'content' (type=value_error)
ChatPromptTemplate.from_messages accepts a variety of message representations. For example, in addition to the 2-tuple representation of (type, content) used above, you could pass in an instance of MessagePromptTemplate or BaseMessage.
import os

from langchain.chat_models import ChatGooglePalm
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.schema.messages import SystemMessage

model = ChatGooglePalm(google_api_key=os.environ["GOOGLE_API_KEY"])

chat_template = ChatPromptTemplate.from_messages(
    [
        SystemMessage(
            content=(
                "You are a helpful assistant that re-writes the user's text to "
                "make it in 20 words"
            )
        ),
        HumanMessagePromptTemplate.from_template("{text}"),
    ]
)
print(model(chat_template.format_messages(text="i dont like studying")))
PromptTemplate and ChatPromptTemplate implement the Runnable interface, the basic building block of the LangChain Expression Language (LCEL). This means they support invoke, ainvoke, stream, astream, batch, abatch, and astream_log calls.
PromptTemplate accepts a dictionary (of the prompt variables) and returns a StringPromptValue. A ChatPromptTemplate accepts a dictionary and returns a ChatPromptValue.
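A minimal sketch of this, reusing the prompt_template and chat_template defined above:

prompt_value = prompt_template.invoke({"adjective": "funny", "content": "chickens"})
print(prompt_value)              # StringPromptValue
print(prompt_value.to_string())

chat_value = chat_template.invoke({"name": "Bob", "user_input": "What is your name?"})
print(chat_value)                # ChatPromptValue
print(chat_value.to_messages())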

Connecting to a Feature Store

 
This concept is extremely relevant when considering putting LLM applications in production. In order to personalize LLM applications, you may want to combine LLMs with up-to-date information about particular users. Feature stores can be a great way to keep that data fresh, and LangChain provides an easy way to combine that data with LLMs.

Feast

Feast (Feature Store) is an open source feature store for machine learning. It is the fastest path to productionizing analytic data from your existing infrastructure for model training and online inference.
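A rough sketch of the idea, assuming a local Feast repository with a driver_hourly_stats feature view (the repo path, feature names, and entity key below are hypothetical):

from feast import FeatureStore
from langchain.prompts import PromptTemplate, StringPromptTemplate

# Hypothetical Feast repository with a driver_hourly_stats feature view
feast_store = FeatureStore(repo_path="my_feature_repo/")

template = """Given the driver's up to date stats, write them a note regarding their conversation rate:

Conversation rate: {conv_rate}

Your notes:"""
prompt = PromptTemplate.from_template(template)

class FeastPromptTemplate(StringPromptTemplate):
    def format(self, **kwargs) -> str:
        driver_id = kwargs.pop("driver_id")
        # Pull the freshest feature values for this driver from the feature store
        feature_vector = feast_store.get_online_features(
            features=["driver_hourly_stats:conv_rate"],
            entity_rows=[{"driver_id": driver_id}],
        ).to_dict()
        kwargs["conv_rate"] = feature_vector["conv_rate"][0]
        return prompt.format(**kwargs)

feast_prompt = FeastPromptTemplate(input_variables=["driver_id"])
print(feast_prompt.format(driver_id=1001))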


******** IMPORTANT BUT NEED TO BE CONTINUED… ********

Custom prompt template

There are essentially two distinct prompt templates available: string prompt templates and chat prompt templates. String prompt templates produce a simple prompt in string format, while chat prompt templates produce a more structured prompt to be used with a chat API.
import inspect

def get_source_code(function_name):
    # Get the source code of the function
    return inspect.getsource(function_name)
from langchain.prompts import StringPromptTemplate
from pydantic import BaseModel, validator

PROMPT = """\
Given the function name and source code, generate an English language explanation of the function.
Function Name: {function_name}
Source Code:
{source_code}
Explanation:
"""

class FunctionExplainerPromptTemplate(StringPromptTemplate, BaseModel):
    """A custom prompt template that takes in the function name as input,
    and formats the prompt template to provide the source code of the function."""

    @validator("input_variables")
    def validate_input_variables(cls, v):
        """Validate that the input variables are correct."""
        if len(v) != 1 or "function_name" not in v:
            raise ValueError("function_name must be the only input_variable.")
        return v

    def format(self, **kwargs) -> str:
        # Get the source code of the function
        source_code = get_source_code(kwargs["function_name"])
        # Generate the prompt to be sent to the language model
        prompt = PROMPT.format(
            function_name=kwargs["function_name"].__name__, source_code=source_code
        )
        return prompt

    def _prompt_type(self):
        return "function-explainer"

fn_explainer = FunctionExplainerPromptTemplate(input_variables=["function_name"])

# Generate a prompt for the function "get_source_code"
prompt = fn_explainer.format(function_name=get_source_code)
print(prompt)

Few-shot prompt templates

A few-shot prompt template can be constructed from either a set of examples, or from an Example Selector object.
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate

examples = [
    {
        "question": "Who lived longer, Muhammad Ali or Alan Turing?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali
""",
    },
    {
        "question": "When was the founder of craigslist born?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952
""",
    },
    {
        "question": "Who was the maternal grandfather of George Washington?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball
""",
    },
    {
        "question": "Are both the directors of Jaws and Casino Royale from the same country?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who is the director of Jaws?
Intermediate Answer: The director of Jaws is Steven Spielberg.
Follow up: Where is Steven Spielberg from?
Intermediate Answer: The United States.
Follow up: Who is the director of Casino Royale?
Intermediate Answer: The director of Casino Royale is Martin Campbell.
Follow up: Where is Martin Campbell from?
Intermediate Answer: New Zealand.
So the final answer is: No
""",
    },
]

example set

example_prompt = PromptTemplate(
    input_variables=["question", "answer"], template="Question: {question}\n{answer}"
)
print(example_prompt.format(**examples[0]))
prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    suffix="Question: {input}",
    input_variables=["input"],
)
print(prompt.format(input="Who was the father of Mary Ball Washington?"))

example selector

from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

example_selector = SemanticSimilarityExampleSelector.from_examples(
    # This is the list of examples available to select from.
    examples,
    # This is the embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(),
    # This is the VectorStore class that is used to store the embeddings and do a similarity search over.
    Chroma,
    # This is the number of examples to produce.
    k=1,
)

# Select the most similar example to the input.
question = "Who was the father of Mary Ball Washington?"
selected_examples = example_selector.select_examples({"question": question})
print(f"Examples most similar to the input: {question}")
for example in selected_examples:
    print("\n")
    for k, v in example.items():
        print(f"{k}: {v}")
prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    suffix="Question: {input}",
    input_variables=["input"],
)
print(prompt.format(input="Who was the father of Mary Ball Washington?"))

Example ———>

from langchain.prompts import (
    PromptTemplate,
    FewShotPromptTemplate,
)

examples = [
    {"question": "dog is", "answer": "pet animal"},
    {"question": "tiger is", "answer": "wild animal"},
    {"question": "bear is", "answer": "wild animal"},
    {"question": "cat is", "answer": "pet animal"},
]

example_prompt = PromptTemplate(
    template="My Question: {question}\nAI answer: {answer}",
    input_variables=["question", "answer"],
)

few_shot_prompt = FewShotPromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
    suffix="My Question: {input}",
    input_variables=["input"],
)
print(few_shot_prompt.format(input="crocodile is"))

from langchain.llms import GooglePalm
import os
from dotenv import load_dotenv

load_dotenv()  # .env file contains: GOOGLE_API_KEY=<your-api-key>
llm = GooglePalm(google_api_key=os.environ["GOOGLE_API_KEY"])

chain = few_shot_prompt | llm
print(chain.invoke({"input": "crocodile is"}))

Partial prompt templates

from datetime import datetime

def _get_datetime():
    now = datetime.now()
    return now.strftime("%m/%d/%Y, %H:%M:%S")
prompt = PromptTemplate(
    template="Tell me a {adjective} joke about the day {date}",
    input_variables=["adjective", "date"],
)
partial_prompt = prompt.partial(date=_get_datetime)
print(partial_prompt.format(adjective="funny"))
OR
prompt = PromptTemplate(
    template="Tell me a {adjective} joke about the day {date}",
    input_variables=["adjective"],
    partial_variables={"date": _get_datetime},
)
print(prompt.format(adjective="funny"))

Composition - PipeLining

from langchain.prompts.pipeline import PipelinePromptTemplate
from langchain.prompts.prompt import PromptTemplate

full_template = """{introduction}

{example}

{start}"""
full_prompt = PromptTemplate.from_template(full_template)

introduction_template = """You are impersonating {person}."""
introduction_prompt = PromptTemplate.from_template(introduction_template)

example_template = """Here's an example of an interaction:
Q: {example_q}
A: {example_a}"""
example_prompt = PromptTemplate.from_template(example_template)

start_template = """Now, do this for real!
Q: {input}
A:"""
start_prompt = PromptTemplate.from_template(start_template)

input_prompts = [
    ("introduction", introduction_prompt),
    ("example", example_prompt),
    ("start", start_prompt),
]
pipeline_prompt = PipelinePromptTemplate(
    final_prompt=full_prompt, pipeline_prompts=input_prompts
)

print(
    pipeline_prompt.format(
        person="Elon Musk",
        example_q="What's your favorite car?",
        example_a="Tesla",
        input="What's your favorite social media site?",
    )
)

from langchain.llms import GooglePalm
import os
from dotenv import load_dotenv

load_dotenv()  # .env file contains: GOOGLE_API_KEY=<your-api-key>
llm = GooglePalm(google_api_key=os.environ["GOOGLE_API_KEY"])

chain = pipeline_prompt | llm
print(
    chain.invoke(
        {
            "person": "Elon Musk",
            "example_q": "What's your favorite car?",
            "example_a": "Tesla",
            "input": "What's your favorite social media site?",
        }
    )
)

Prompt pipelining
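Prompt templates and messages can also be composed with the + operator. A minimal sketch:

from langchain.prompts import PromptTemplate
from langchain.schema import AIMessage, HumanMessage, SystemMessage

# String prompt pipelining: templates and plain strings are concatenated with +
prompt = (
    PromptTemplate.from_template("Tell me a joke about {topic}")
    + ", make it funny"
    + "\n\nand in {language}"
)
print(prompt.format(topic="sports", language="spanish"))

# Chat prompt pipelining: messages (and strings) combine into a chat prompt
new_prompt = (
    SystemMessage(content="You are a nice pirate")
    + HumanMessage(content="hi")
    + AIMessage(content="what?")
    + "{input}"
)
print(new_prompt.format_messages(input="i said hi"))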

 

Serialization

It is often preferable to store prompts not as Python code but as files. This makes it easy to share, store, and version prompts. This section covers how to do that in LangChain, across the different types of prompts and serialization options.
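A minimal sketch of saving a prompt template to disk and loading it back with load_prompt:

from langchain.prompts import PromptTemplate, load_prompt

prompt = PromptTemplate.from_template("Tell me a {adjective} joke about {content}.")
prompt.save("joke_prompt.json")   # JSON or YAML, based on the file extension

reloaded_prompt = load_prompt("joke_prompt.json")
print(reloaded_prompt.format(adjective="funny", content="chickens"))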

Parsers

Output parser

from typing import List

from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema import BaseOutputParser

class CommaSeparatedListOutputParser(BaseOutputParser[List[str]]):
    """Parse the output of an LLM call to a comma-separated list."""

    def parse(self, text: str) -> List[str]:
        """Parse the output of an LLM call."""
        return text.strip().split(", ")

template = """You are a helpful assistant who generates comma separated lists.
A user will pass in a category, and you should generate 5 objects in that category in a comma separated list.
ONLY return a comma separated list, and nothing more."""
human_template = "{text}"

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", template),
    ("human", human_template),
])
chain = chat_prompt | ChatOpenAI() | CommaSeparatedListOutputParser()
chain.invoke({"text": "colors"})
# >> ['red', 'blue', 'green', 'yellow', 'orange']

SimpleJsonOutputParser

from langchain.output_parsers.json import SimpleJsonOutputParser

json_prompt = PromptTemplate.from_template(
    "Return a JSON object with an `answer` key that answers the following question: {question}"
)
json_parser = SimpleJsonOutputParser()
json_chain = json_prompt | model | json_parser
list(json_chain.stream({"question": "Who invented the microscope?"}))

List parser

from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

output_parser = CommaSeparatedListOutputParser()
format_instructions = output_parser.get_format_instructions()

prompt = PromptTemplate(
    template="List five {subject}.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions},
)

model = OpenAI(temperature=0)
_input = prompt.format(subject="ice cream flavors")
output = model(_input)
output_parser.parse(output)

Datetime parser

from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.output_parsers import DatetimeOutputParser
from langchain.prompts import PromptTemplate

output_parser = DatetimeOutputParser()
template = """Answer the users question:

{question}

{format_instructions}"""
prompt = PromptTemplate.from_template(
    template,
    partial_variables={"format_instructions": output_parser.get_format_instructions()},
)
chain = LLMChain(prompt=prompt, llm=OpenAI())
output = chain.run("around when was bitcoin founded?")
output_parser.parse(output)
# datetime.datetime(2008, 1, 3, 18, 15, 5)

Enum parser

from langchain.output_parsers.enum import EnumOutputParser
from enum import Enum

class Colors(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"

parser = EnumOutputParser(enum=Colors)

parser.parse("red")
# <Colors.RED: 'red'>

# And new lines
parser.parse("blue\n")
# <Colors.BLUE: 'blue'>

# And raises errors when appropriate
parser.parse("yellow")
# OutputParserException: Response 'yellow' is not one of the expected values: ['red', 'green', 'blue']

Auto-fixing parser

Pandas DataFrame Parser

Pydantic (JSON) parser

Retry parser

Structured output parser
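A minimal sketch of the structured output parser, which uses ResponseSchema definitions to build format instructions (the commented chain at the end assumes a model defined elsewhere):

from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate

response_schemas = [
    ResponseSchema(name="answer", description="answer to the user's question"),
    ResponseSchema(name="source", description="source used to answer the question"),
]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
format_instructions = output_parser.get_format_instructions()

prompt = PromptTemplate(
    template="Answer the users question as best as possible.\n{format_instructions}\n{question}",
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions},
)

# chain = prompt | model | output_parser
# chain.invoke({"question": "What is the capital of France?"})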

XML parser

Validator

from langchain.llms import OpenAI
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain.pydantic_v1 import BaseModel, Field, validator

model = OpenAI(model_name="text-davinci-003", temperature=0.0)

# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    # You can add custom validation logic easily with Pydantic.
    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        if field[-1] != "?":
            raise ValueError("Badly formed question!")
        return field

# Set up a parser + inject instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# And a query intended to prompt a language model to populate the data structure.
prompt_and_model = prompt | model
output = prompt_and_model.invoke({"query": "Tell me a joke."})
parser.invoke(output)

Retrieval

Many LLM applications require user-specific data that is not part of the model's training set. The primary way of accomplishing this is through Retrieval Augmented Generation (RAG). In this process, external data is retrieved and then passed to the LLM when doing the generation step.

Document loaders

Use document loaders to load data from a source as Documents. A Document is a piece of text and associated metadata. For example, there are document loaders for loading a simple .txt file, for loading the text contents of any web page, or even for loading a transcript of a YouTube video.
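A Document is simple to construct by hand as well; for example:

from langchain.schema import Document

doc = Document(
    page_content="LangChain makes it easy to work with LLMs.",
    metadata={"source": "notes.md", "page": 1},
)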

TextLoader

from langchain.document_loaders import TextLoader

loader = TextLoader("./index.md")
loader.load()

CSV

from langchain.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(file_path='./example_data/mlb_teams_2012.csv')
loader.load()

# Customizing the CSV parsing and loading
loader = CSVLoader(file_path='./example_data/mlb_teams_2012.csv', csv_args={
    'delimiter': ',',
    'quotechar': '"',
    'fieldnames': ['MLB Team', 'Payroll in millions', 'Wins']
})
data = loader.load()

# Specify a column to identify the document source
loader = CSVLoader(file_path='./example_data/mlb_teams_2012.csv', source_column="Team")
data = loader.load()

File Directory

from langchain.document_loaders import DirectoryLoader

loader = DirectoryLoader('../', glob="**/*.md")
docs = loader.load()
len(docs)
Show a progress bar
loader = DirectoryLoader('../', glob="**/*.md", show_progress=True)
Use multithreading
loader = DirectoryLoader('../', glob="**/*.md", use_multithreading=True)
Change loader class
By default this uses the UnstructuredLoader class. However, you can change up the type of loader pretty easily.
from langchain.document_loaders import TextLoader

loader = DirectoryLoader('../', glob="**/*.md", loader_cls=TextLoader)

from langchain.document_loaders import PythonLoader

loader = DirectoryLoader('../../../../../', glob="**/*.py", loader_cls=PythonLoader)
Auto-detect file encodings with TextLoader
In this example we will see some strategies that can be useful when loading a large list of arbitrary files from a directory using the TextLoader class.
path = '../../../../../tests/integration_tests/examples'
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader)
A. Default Behavior
The file example-non-utf8.txt uses a different encoding, so the load() function fails with a helpful message indicating which file failed decoding.
With the default behavior of TextLoader any failure to load any of the documents will fail the whole loading process and no documents are loaded.
B. Silent fail
We can pass the parameter silent_errors to the DirectoryLoader to skip the files which could not be loaded and continue the load process.
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader, silent_errors=True)
docs = loader.load()

doc_sources = [doc.metadata['source'] for doc in docs]
doc_sources
C. Auto detect encodings
We can also ask TextLoader to auto detect the file encoding before failing, by passing the autodetect_encoding to the loader class.
text_loader_kwargs = {'autodetect_encoding': True}
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
docs = loader.load()

doc_sources = [doc.metadata['source'] for doc in docs]
doc_sources

HTML

from langchain.document_loaders import UnstructuredHTMLLoader

loader = UnstructuredHTMLLoader("example_data/fake-content.html")
data = loader.load()
Loading HTML with BeautifulSoup4
from langchain.document_loaders import BSHTMLLoader

loader = BSHTMLLoader("example_data/fake-content.html")
data = loader.load()

JSON

Markdown

(readme.md)

PDF
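One common option is PyPDFLoader (requires pypdf); a minimal sketch with a hypothetical file path:

# pip install pypdf
from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("example_data/layout-parser-paper.pdf")  # hypothetical path
pages = loader.load_and_split()   # one Document per page
print(pages[0].page_content[:200])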

 


Document transformers

Once you've loaded documents, you'll often want to transform them to better suit your application. The simplest example is you may want to split a long document into smaller chunks that can fit into your model's context window. LangChain has a number of built-in document transformers that make it easy to split, combine, filter, and otherwise manipulate documents.

Text splitters

The default recommended text splitter is the RecursiveCharacterTextSplitter.
By default the characters it tries to split on are ["\n\n", "\n", " ", ""]
  • length_function: how the length of chunks is calculated. Defaults to just counting number of characters, but it's pretty common to pass a token counter here.
  • chunk_size: the maximum size of your chunks (as measured by the length function).
  • chunk_overlap: the maximum overlap between chunks. It can be nice to have some overlap to maintain some continuity between chunks (e.g. do a sliding window).
  • add_start_index: whether to include the starting position of each chunk within the original document in the metadata.
# This is a long document we can split up.
with open('../../state_of_the_union.txt') as f:
    state_of_the_union = f.read()

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size=100,
    chunk_overlap=20,
    length_function=len,
    add_start_index=True,
)
texts = text_splitter.create_documents([state_of_the_union])
print(texts[0])
print(texts[1])
Filter redundant docs, translate docs, extract metadata, and more
We can perform a number of transformations on docs beyond simply splitting the text. With the EmbeddingsRedundantFilter we can identify similar documents and filter out redundancies. With integrations like doctran we can do things like translate documents from one language to another, extract desired properties and add them to metadata, and convert conversational dialogue into a Q/A format set of documents.

HTMLHeaderTextSplitter

Similar in concept to the `MarkdownHeaderTextSplitter`, the `HTMLHeaderTextSplitter` is a "structure-aware" chunker that splits text at the element level and adds metadata for each header "relevant" to any given chunk. It can return chunks element by element or combine elements with the same metadata, with the objectives of (a) keeping related text grouped (more or less) semantically and (b) preserving context-rich information encoded in document structures. It can be used with other text splitters as part of a chunking pipeline.

Usage examples

1) With an HTML string:

from langchain.text_splitter import HTMLHeaderTextSplitter

html_string = """
<!DOCTYPE html>
<html>
<body>
    <div>
        <h1>Foo</h1>
        <p>Some intro text about Foo.</p>
        <div>
            <h2>Bar main section</h2>
            <p>Some intro text about Bar.</p>
            <h3>Bar subsection 1</h3>
            <p>Some text about the first subtopic of Bar.</p>
            <h3>Bar subsection 2</h3>
            <p>Some text about the second subtopic of Bar.</p>
        </div>
        <div>
            <h2>Baz</h2>
            <p>Some text about Baz</p>
        </div>
        <br>
        <p>Some concluding text about Foo</p>
    </div>
</body>
</html>
"""

headers_to_split_on = [
    ("h1", "Header 1"),
    ("h2", "Header 2"),
    ("h3", "Header 3"),
]

html_splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
html_header_splits = html_splitter.split_text(html_string)
html_header_splits

2) Pipelined to another splitter, with html loaded from a web URL:

from langchain.text_splitter import RecursiveCharacterTextSplitter

url = "https://plato.stanford.edu/entries/goedel/"

headers_to_split_on = [
    ("h1", "Header 1"),
    ("h2", "Header 2"),
    ("h3", "Header 3"),
    ("h4", "Header 4"),
]

html_splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)

# for local file use html_splitter.split_text_from_file(<path_to_file>)
html_header_splits = html_splitter.split_text_from_url(url)

chunk_size = 500
chunk_overlap = 30
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size, chunk_overlap=chunk_overlap
)

# Split
splits = text_splitter.split_documents(html_header_splits)
splits[80:85]

Limitations

There can be quite a bit of structural variation from one HTML document to another, and while HTMLHeaderTextSplitter will attempt to attach all “relevant” headers to any given chunk, it can sometimes miss certain headers. For example, the algorithm assumes an informational hierarchy in which headers are always at nodes “above” associated text, i.e. prior siblings, ancestors, and combinations thereof. In the following news article (as of the writing of this document), the document is structured such that the text of the top-level headline, while tagged “h1”, is in a distinct subtree from the text elements that we’d expect it to be “above”—so we can observe that the “h1” element and its associated text do not show up in the chunk metadata (but, where applicable, we do see “h2” and its associated text):
url = "https://www.cnn.com/2023/09/25/weather/el-nino-winter-us-climate/index.html" headers_to_split_on = [ ("h1", "Header 1"), ("h2", "Header 2"), ] html_splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on) html_header_splits = html_splitter.split_text_from_url(url) print(html_header_splits[1].page_content[:500])

Split by character

# This is a long document we can split up.
with open('../../../state_of_the_union.txt') as f:
    state_of_the_union = f.read()

from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    separator="\n\n",
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    is_separator_regex=False,
)
texts = text_splitter.create_documents([state_of_the_union])
print(texts[0])

metadatas = [{"document": 1}, {"document": 2}]
documents = text_splitter.create_documents(
    [state_of_the_union, state_of_the_union], metadatas=metadatas
)
print(documents[0])

text_splitter.split_text(state_of_the_union)[0]

Split code

from langchain.text_splitter import (
    RecursiveCharacterTextSplitter,
    Language,
)

# Full list of supported languages
[e.value for e in Language]

# You can also see the separators used for a given language
RecursiveCharacterTextSplitter.get_separators_for_language(Language.PYTHON)

PYTHON_CODE = """
def hello_world():
    print("Hello, World!")

# Call the function
hello_world()
"""
python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=50, chunk_overlap=0
)
python_docs = python_splitter.create_documents([PYTHON_CODE])
python_docs
spaCy - spaCy is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython.
#!pip install spacy
# This is a long document we can split up.
with open("../../../state_of_the_union.txt") as f:
    state_of_the_union = f.read()

from langchain.text_splitter import SpacyTextSplitter

text_splitter = SpacyTextSplitter(chunk_size=1000)

texts = text_splitter.split_text(state_of_the_union)
print(texts[0])
Hugging Face tokenizer - We use the Hugging Face GPT2TokenizerFast tokenizer to count text length in tokens.
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# This is a long document we can split up.
with open("../../../state_of_the_union.txt") as f:
    state_of_the_union = f.read()

from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter.from_huggingface_tokenizer(
    tokenizer, chunk_size=100, chunk_overlap=0
)
texts = text_splitter.split_text(state_of_the_union)

print(texts[0])

Lost in the middle: The problem with long contexts

No matter the architecture of your model, there is a substantial performance degradation when you include 10+ retrieved documents. In brief: When models must access relevant information in the middle of long contexts, they tend to ignore the provided documents.
To avoid this issue you can re-order documents after retrieval to avoid performance degradation.
from langchain.chains import LLMChain, StuffDocumentsChain
from langchain.document_transformers import LongContextReorder
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.prompts import PromptTemplate
from langchain.vectorstores import Chroma
from langchain.llms import GooglePalm
import os
from dotenv import load_dotenv

load_dotenv()  # .env file contains: GOOGLE_API_KEY=<your-api-key>
llm = GooglePalm(google_api_key=os.environ["GOOGLE_API_KEY"])

# Get embeddings.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

texts = [
    "Basquetball is a great sport.",
    "Fly me to the moon is one of my favourite songs.",
    "The Celtics are my favourite team.",
    "This is a document about the Boston Celtics",
    "I simply love going to the movies",
    "The Boston Celtics won the game by 20 points",
    "This is just a random text.",
    "Elden Ring is one of the best games in the last 15 years.",
    "L. Kornet is one of the best Celtics players.",
    "Larry Bird was an iconic NBA player.",
]

# Create a retriever
retriever = Chroma.from_texts(texts, embedding=embeddings).as_retriever(
    search_kwargs={"k": 10}
)
query = "What can you tell me about the Celtics?"

# Get relevant documents ordered by relevance score
docs = retriever.get_relevant_documents(query)
docs
# Reorder the documents:
# Less relevant document will be at the middle of the list and more
# relevant elements at beginning / end.
reordering = LongContextReorder()
reordered_docs = reordering.transform_documents(docs)

# Confirm that the 4 relevant documents are at beginning and end.
reordered_docs
# We prepare and run a custom Stuff chain with reordered docs as context.

# Override prompts
document_prompt = PromptTemplate(
    input_variables=["page_content"], template="{page_content}"
)
document_variable_name = "context"
stuff_prompt_override = """Given this text extracts:
-----
{context}
-----
Please answer the following question:
{query}"""
prompt = PromptTemplate(
    template=stuff_prompt_override, input_variables=["context", "query"]
)

# Instantiate the chain
llm_chain = LLMChain(llm=llm, prompt=prompt)
chain = StuffDocumentsChain(
    llm_chain=llm_chain,
    document_prompt=document_prompt,
    document_variable_name=document_variable_name,
)
chain.run(input_documents=reordered_docs, query=query)

Text embedding models

from langchain.embeddings import HuggingFaceEmbeddings

embeddings_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

embeddings = embeddings_model.embed_documents(
    [
        "Hi there!",
        "Oh, hello!",
        "What's your name?",
        "My friends call me World",
        "Hello World!",
    ]
)
len(embeddings), len(embeddings[0])
# (5, 384)  -- all-MiniLM-L6-v2 produces 384-dimensional vectors

embedded_query = embeddings_model.embed_query("What was the name mentioned in the conversation?")
embedded_query[:5]

CacheBackedEmbeddings

Embeddings can be stored or temporarily cached to avoid needing to recompute them.
Caching embeddings can be done using a CacheBackedEmbeddings. The cache backed embedder is a wrapper around an embedder that caches embeddings in a key-value store. The text is hashed and the hash is used as the key in the cache.
The main supported way to initialize a CacheBackedEmbeddings is from_bytes_store. This takes in the following parameters:
  • underlying_embedder: The embedder to use for embedding.
  • document_embedding_cache: Any ByteStore for caching document embeddings.
  • namespace: (optional, defaults to "") The namespace to use for document cache. This namespace is used to avoid collisions with other caches. For example, set it to the name of the embedding model used.
Attention: Be sure to set the namespace parameter to avoid collisions of the same text embedded using different embeddings models.
from langchain.embeddings import CacheBackedEmbeddings
Using with a Vector Store
!pip install openai faiss-cpu

from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.storage import LocalFileStore
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS

underlying_embeddings = OpenAIEmbeddings()

store = LocalFileStore("./cache/")

cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings, store, namespace=underlying_embeddings.model
)

list(store.yield_keys())

raw_documents = TextLoader("../../state_of_the_union.txt").load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

db = FAISS.from_documents(documents, cached_embedder)
db2 = FAISS.from_documents(documents, cached_embedder)

list(store.yield_keys())[:5]
Swapping the ByteStore
In order to use a different ByteStore, just use it when creating your CacheBackedEmbeddings. Below, we create an equivalent cached embeddings object, except using the non-persistent InMemoryByteStore instead:
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import InMemoryByteStore

store = InMemoryByteStore()

cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings, store, namespace=underlying_embeddings.model
)

Vector stores

from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

# Load the document, split it into chunks, embed each chunk and load it into the vector store.
raw_documents = TextLoader('../../../state_of_the_union.txt').load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
db = Chroma.from_documents(documents, OpenAIEmbeddings())

Similarity search

query = "What did the president say about Ketanji Brown Jackson" docs = db.similarity_search(query) print(docs[0].page_content)

Similarity search by vector

It is also possible to do a search for documents similar to a given embedding vector using similarity_search_by_vector which accepts an embedding vector as a parameter instead of a string.
embedding_vector = OpenAIEmbeddings().embed_query(query)
docs = db.similarity_search_by_vector(embedding_vector)
print(docs[0].page_content)
The query is the same, and so the result is also the same.

Asynchronous operations

Vector stores are usually run as a separate service that requires some IO operations, and therefore they might be called asynchronously. That gives performance benefits as you don't waste time waiting for responses from external services. That might also be important if you work with an asynchronous framework, such as FastAPI.
LangChain supports async operation on vector stores. All the methods might be called using their async counterparts, with the prefix a, meaning async.
Qdrant is a vector store, which supports all the async operations, thus it will be used in this walkthrough.
pip install qdrant-client
from langchain.vectorstores import Qdrant

Create a vector store asynchronously

db = await Qdrant.afrom_documents(documents, embeddings, "http://localhost:6333")

Similarity search

query = "What did the president say about Ketanji Brown Jackson" docs = await db.asimilarity_search(query) print(docs[0].page_content)

Similarity search by vector

embedding_vector = embeddings.embed_query(query)
docs = await db.asimilarity_search_by_vector(embedding_vector)

Maximum marginal relevance search (MMR)

Maximal marginal relevance optimizes for similarity to query and diversity among selected documents. It is also supported in async API.
query = "What did the president say about Ketanji Brown Jackson" found_docs = await qdrant.amax_marginal_relevance_search(query, k=2, fetch_k=10) for i, doc in enumerate(found_docs): print(f"{i + 1}.", doc.page_content, "\n")

Retrievers

A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.
Retrievers implement the Runnable interface, the basic building block of the LangChain Expression Language (LCEL). This means they support invoke, ainvoke, stream, astream, batch, abatch, and astream_log calls.
Retrievers accept a string query as input and return a list of Documents as output.
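A minimal sketch, using the Chroma db built in the vector stores section as the backbone:

retriever = db.as_retriever(search_kwargs={"k": 4})

# Retrievers are Runnables, so .invoke works like any other LCEL component
docs = retriever.invoke("What did the president say about Ketanji Brown Jackson")
print(docs[0].page_content)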

MultiQueryRetriever
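The MultiQueryRetriever uses an LLM to generate several variants of the user query and returns the unique union of the retrieved documents. A minimal sketch, again assuming the db vector store from earlier:

from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.chat_models import ChatOpenAI

question = "What did the president say about Ketanji Brown Jackson"
llm = ChatOpenAI(temperature=0)

retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=db.as_retriever(), llm=llm
)
unique_docs = retriever_from_llm.get_relevant_documents(query=question)
len(unique_docs)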

Contextual compression


Ensemble Retriever

MultiVector Retriever

Parent Document Retriever

When splitting documents for retrieval, there are often conflicting desires:
  1. You may want to have small documents, so that their embeddings can most accurately reflect their meaning. If too long, then the embeddings can lose meaning.
  2. You want to have long enough documents that the context of each chunk is retained.
The ParentDocumentRetriever strikes that balance by splitting and storing small chunks of data. During retrieval, it first fetches the small chunks but then looks up the parent ids for those chunks and returns those larger documents.
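A minimal sketch of the idea (the file name here is hypothetical):

from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

docs = TextLoader("paul_graham_essay.txt").load()  # hypothetical file

# Small chunks get embedded; the full parent documents live in the docstore
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)
vectorstore = Chroma(collection_name="full_documents", embedding_function=OpenAIEmbeddings())
store = InMemoryStore()

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)
retriever.add_documents(docs)

# Small chunks are searched, but the larger parent documents are returned
retrieved_docs = retriever.get_relevant_documents("justice breyer")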

Self-querying

A self-querying retriever is one that, as the name suggests, has the ability to query itself. Specifically, given any natural language query, the retriever uses a query-constructing LLM chain to write a structured query and then applies that structured query to its underlying VectorStore. This allows the retriever to not only use the user-input query for semantic similarity comparison with the contents of stored documents but to also extract filters from the user query on the metadata of stored documents and to execute those filters.
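A minimal sketch (requires lark; the documents and metadata fields here are illustrative):

# pip install lark chromadb
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.schema import Document
from langchain.vectorstores import Chroma

docs = [
    Document(
        page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        metadata={"year": 1993, "genre": "science fiction"},
    ),
    Document(
        page_content="Toys come alive and have a blast doing so",
        metadata={"year": 1995, "genre": "animated"},
    ),
]
vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())

metadata_field_info = [
    AttributeInfo(name="genre", description="The genre of the movie", type="string"),
    AttributeInfo(name="year", description="The year the movie was released", type="integer"),
]
document_content_description = "Brief summary of a movie"

llm = ChatOpenAI(temperature=0)
retriever = SelfQueryRetriever.from_llm(
    llm, vectorstore, document_content_description, metadata_field_info
)

# The year constraint is extracted into a metadata filter, not just a semantic query
retriever.get_relevant_documents("What's a movie from the 1990s about toys")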

Time-weighted vector store retriever









Agents

Agent

This is the chain responsible for deciding what step to take next. This is powered by a language model and a prompt. The inputs to this chain are:
  1. Tools: Descriptions of available tools
  2. User input: The high level objective
  3. Intermediate steps: Any (action, tool output) pairs previously executed in order to achieve the user input
The output is the next action(s) to take or the final response to send to the user (AgentActions or AgentFinish). An action specifies a tool and the input to that tool.

Tools

Tools are functions that an agent can invoke. There are two important design considerations around tools:
  1. Giving the agent access to the right tools
  2. Describing the tools in a way that is most helpful to the agent

Toolkits

For many common tasks, an agent will need a set of related tools. For this LangChain provides the concept of toolkits - groups of around 3-5 tools needed to accomplish specific objectives. For example, the GitHub toolkit has a tool for searching through GitHub issues, a tool for reading a file, a tool for commenting, etc.

AgentExecutor

The agent executor is the runtime for an agent. This is what actually calls the agent, executes the actions it chooses, passes the action outputs back to the agent, and repeats. In pseudocode, this looks roughly like:
next_action = agent.get_action(...)
while next_action != AgentFinish:
    observation = run(next_action)
    next_action = agent.get_action(..., next_action, observation)
return next_action

Other types of agent runtimes

The AgentExecutor class is the main agent runtime supported by LangChain. However, there are other, more experimental runtimes we also support. These include:
Some important terminology (and schema) to know:
  1. AgentAction: This is a dataclass that represents the action an agent should take. It has a tool property (which is the name of the tool that should be invoked) and a tool_input property (the input to that tool)
  2. AgentFinish: This is a dataclass that signifies that the agent has finished and should return to the user. It has a return_values parameter, which is a dictionary to return. It often only has one key - output - that is a string, and so often it is just this key that is returned.
  3. intermediate_steps: These represent previous agent actions and corresponding outputs that are passed around. These are important to pass to future iterations so the agent knows what work it has already done. This is typed as a List[Tuple[AgentAction, Any]]. Note that observation is currently left as type Any to be maximally flexible. In practice, this is often a string.

Define the agent

First, let’s load the language model we’re going to use to control the agent.
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
We can see that it struggles to count the letters in the string “educa”.
llm.invoke("how many letters in the word educa?")
AIMessage(content='There are 6 letters in the word "educa".')
Next, let’s define some tools to use. Let’s write a really simple Python function to calculate the length of a word that is passed in.
from langchain.agents import tool

@tool
def get_word_length(word: str) -> int:
    """Returns the length of a word."""
    return len(word)

tools = [get_word_length]
Now let us create the prompt. Because OpenAI Function Calling is fine-tuned for tool usage, we hardly need any instructions on how to reason or how to format output. We will just have two input variables: input and agent_scratchpad. input should be a string containing the user objective. agent_scratchpad should be a sequence of messages that contains the previous agent tool invocations and the corresponding tool outputs.
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are very powerful assistant, but bad at calculating lengths of words.",
        ),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)
How does the agent know what tools it can use? In this case we’re relying on OpenAI function calling LLMs, which take functions as a separate argument and have been specifically trained to know when to invoke those functions.
To pass in our tools to the agent, we just need to format them to the OpenAI function format and pass them to our model. (By bind-ing the functions, we’re making sure that they’re passed in each time the model is invoked.)
from langchain.tools.render import format_tool_to_openai_function

llm_with_tools = llm.bind(functions=[format_tool_to_openai_function(t) for t in tools])
Putting those pieces together, we can now create the agent. We will import two last utility functions: a component for formatting intermediate steps (agent action, tool output pairs) to input messages that can be sent to the model, and a component for converting the output message into an agent action/agent finish.
from langchain.agents.format_scratchpad import format_to_openai_function_messages
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser

agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        ),
    }
    | prompt
    | llm_with_tools
    | OpenAIFunctionsAgentOutputParser()
)
Now that we have our agent, let’s play around with it! Let’s pass in a simple question and empty intermediate steps and see what it returns:
agent.invoke({"input": "how many letters in the word educa?", "intermediate_steps": []})
AgentActionMessageLog(tool='get_word_length', tool_input={'word': 'educa'}, log="\nInvoking: `get_word_length` with `{'word': 'educa'}`\n\n\n", message_log=[AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{\n "word": "educa"\n}', 'name': 'get_word_length'}})])
We can see that it responds with an AgentAction to take (it’s actually an AgentActionMessageLog - a subclass of AgentAction which also tracks the full message log).
If we've set up LangSmith, we'll see a trace that lets us inspect the input and output of each step in the sequence: https://smith.langchain.com/public/04110122-01a8-413c-8cd0-b4df6eefa4b7/r

Define the runtime

So this is just the first step - now we need to write a runtime for this. The simplest one is just one that continuously loops, calling the agent, then taking the action, and repeating until an AgentFinish is returned. Let’s code that up below:
from langchain_core.agents import AgentFinish

user_input = "how many letters in the word educa?"
intermediate_steps = []

while True:
    output = agent.invoke(
        {
            "input": user_input,
            "intermediate_steps": intermediate_steps,
        }
    )
    if isinstance(output, AgentFinish):
        final_result = output.return_values["output"]
        break
    else:
        print(f"TOOL NAME: {output.tool}")
        print(f"TOOL INPUT: {output.tool_input}")
        tool = {"get_word_length": get_word_length}[output.tool]
        observation = tool.run(output.tool_input)
        intermediate_steps.append((output, observation))

print(final_result)
TOOL NAME: get_word_length
TOOL INPUT: {'word': 'educa'}
There are 5 letters in the word "educa".
Woo! It’s working.

Using AgentExecutor

To simplify this a bit, we can import and use the AgentExecutor class. This bundles up all of the above and adds in error handling, early stopping, tracing, and other quality-of-life improvements that reduce the safeguards you need to write yourself.
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
Now let’s test it out!
agent_executor.invoke({"input": "how many letters in the word educa?"})
> Entering new AgentExecutor chain...
Invoking: `get_word_length` with `{'word': 'educa'}`
5
There are 5 letters in the word "educa".
> Finished chain.
{'input': 'how many letters in the word educa?', 'output': 'There are 5 letters in the word "educa".'}
And looking at the trace, we can see that all of our agent calls and tool invocations are automatically logged: https://smith.langchain.com/public/957b7e26-bef8-4b5b-9ca3-4b4f1c96d501/r

Adding memory

This is great - we have an agent! However, this agent is stateless - it doesn’t remember anything about previous interactions. This means you can’t ask follow up questions easily. Let’s fix that by adding in memory.
In order to do this, we need to do two things:
  1. Add a place for memory variables to go in the prompt
  2. Keep track of the chat history
First, let’s add a place for memory in the prompt. We do this by adding a placeholder for messages with the key "chat_history". Notice that we put this ABOVE the new user input (to follow the conversation flow).
from langchain.prompts import MessagesPlaceholder

MEMORY_KEY = "chat_history"
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are very powerful assistant, but bad at calculating lengths of words.",
        ),
        MessagesPlaceholder(variable_name=MEMORY_KEY),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)
We can then set up a list to track the chat history
from langchain_core.messages import AIMessage, HumanMessage

chat_history = []
We can then put it all together!
agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        ),
        "chat_history": lambda x: x["chat_history"],
    }
    | prompt
    | llm_with_tools
    | OpenAIFunctionsAgentOutputParser()
)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
When running, we now need to track the inputs and outputs as chat history
input1 = "how many letters in the word educa?"
result = agent_executor.invoke({"input": input1, "chat_history": chat_history})
chat_history.extend(
    [
        HumanMessage(content=input1),
        AIMessage(content=result["output"]),
    ]
)
agent_executor.invoke({"input": "is that a real word?", "chat_history": chat_history})
> Entering new AgentExecutor chain...
Invoking: `get_word_length` with `{'word': 'educa'}`
5
There are 5 letters in the word "educa".
> Finished chain.

> Entering new AgentExecutor chain...
No, "educa" is not a real word in English.
> Finished chain.
{'input': 'is that a real word?', 'chat_history': [HumanMessage(content='how many letters in the word educa?'), AIMessage(content='There are 5 letters in the word "educa".')], 'output': 'No, "educa" is not a real word in English.'}