Anyone who has played with LlamaIndex has surely bumped into its data loaders! One such loader, YoutubeTranscriptReader from LlamaHub (with youtube_transcript_api as its dependency), provides an easy interface to fetch the text transcript of YouTube videos, over which we can build an index and run queries.
from llama_hub.youtube_transcript import YoutubeTranscriptReader
loader = YoutubeTranscriptReader()
documents = loader.load_data(ytlinks=['https://www.youtube.com/watch?v=bSHp7WVpPgc'])
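Under the hood, the loader only needs the 11-character video id from each link before it can ask for a transcript. A minimal sketch of that extraction in plain Python (not the loader's actual code, just the idea):

```python
from urllib.parse import urlparse, parse_qs

def video_id(url: str) -> str:
    """Pull the video id out of a standard watch?v=... YouTube URL."""
    query = parse_qs(urlparse(url).query)
    return query["v"][0]

print(video_id("https://www.youtube.com/watch?v=bSHp7WVpPgc"))  # -> bSHp7WVpPgc
```

(Short links like youtu.be/... would need extra handling; the loader takes care of that for you.)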
Sounds simple, right? Then you decide to create a VectorStoreIndex from the documents created above.
But if you don’t have an OpenAI API key set, you would see something like:
ValueError: No API key found for OpenAI.
Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization.
Ah, that’s expected, as LlamaIndex defaults to OpenAI. Then you would probably think: let me use another LLM, say Vertex:
from llama_index.llms.vertex import Vertex
llm = Vertex(model="text-bison", temperature=0, additional_kwargs={})
So, we create a new ServiceContext and pass it to VectorStoreIndex.from_documents:
from llama_index.llms.vertex import Vertex
from llama_index import ServiceContext
llm = Vertex(model="text-bison", temperature=0, additional_kwargs={})
service_context = ServiceContext.from_defaults(
llm=llm,
chunk_size=800,
chunk_overlap=20)
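To make the chunk_size and chunk_overlap parameters concrete: before embedding, the transcript is split into overlapping windows so that context isn't lost at chunk boundaries. This is only a character-level sketch of the idea, not LlamaIndex's actual token-aware splitter:

```python
def chunk(text: str, chunk_size: int = 800, chunk_overlap: int = 20) -> list[str]:
    """Sliding window: each chunk starts chunk_size - chunk_overlap after the previous one."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "".join(str(i % 10) for i in range(2000))  # stand-in for a transcript
pieces = chunk(text)
print(len(pieces))                        # -> 3
print(pieces[1][:20] == pieces[0][-20:])  # -> True (the 20-char overlap)
```

Larger chunks keep more context per embedding; the overlap stops a sentence straddling a boundary from being lost to both neighbours.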
It should work fine, right? Nope, we notice:
ValueError: No API key found for OpenAI.
Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization.
API keys can be found or created at https://platform.openai.com/account/api-keys
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/llama_index/embeddings/utils.py in resolve_embed_model(embed_model)
48 validate_openai_api_key(embed_model.api_key)
49 except ValueError as e:
---> 50 raise ValueError(
51 "\n******\n"
52 "Could not load OpenAI embedding model. "
That’s because we aren’t setting an embed model, and it’s defaulting to OpenAI. We should configure another embed_model in the service context or provide an OpenAI key!
So, here we go:
service_context = ServiceContext.from_defaults(
llm=llm,
chunk_size=800,
chunk_overlap=20,
embed_model="local:BAAI/bge-base-en-v1.5")
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
Voilà, it works! Now we can query the document:
engine = index.as_query_engine()
response = engine.query("Summarize the video")
print(f"{response}")
Would output:
Hemanth HM, a Google developer expert,
shares his journey into coding and his passion for open-source software.
He explains the concept of open-source using the example of traditional
Indian recipes and emphasizes the importance of free and open code.
Hemanth also discusses how he manages to balance his full-time
work with his contributions to open-source projects,
comparing it to a daily routine like taking a shower.
For those aspiring to become Google developer experts,
Hemanth suggests actively contributing to the community,
experimenting with cutting-edge technologies,
and sharing knowledge through blogs, articles, tweets, and GIFs.
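Conceptually, the query engine embeds the question with the same embed model used for the chunks and retrieves the nearest chunks by cosine similarity before the LLM writes the answer. A toy sketch with made-up 3-dimensional vectors (real embeddings from bge-base-en-v1.5 have 768 dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy "embeddings" of two transcript chunks; real ones come from the embed model.
chunks = {
    "open source is like a shared recipe": [0.9, 0.1, 0.0],
    "balancing work and contributions":    [0.1, 0.8, 0.2],
}
query_vec = [0.85, 0.2, 0.05]  # pretend this is the embedded question

best = max(chunks, key=lambda c: cosine(chunks[c], query_vec))
print(best)  # -> open source is like a shared recipe
```

The retrieved chunks are then stuffed into the LLM prompt, which is why the llm and embed_model can be swapped independently.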
embed_model can be any Hugging Face embedding model! The crux here is to use an embed_model along with the llm of your choice in the ServiceContext. Hope this was useful, happy hacking!