


PERSIST_DIRECTORY: The folder where you want your vectorstore to be stored
MODEL_PATH: Path to your GPT4All or LlamaCpp supported LLM
MODEL_N_CTX: Maximum token limit for the LLM model
MODEL_N_BATCH: Number of tokens in the prompt that are fed into the model at a time. The optimal value differs a lot depending on the model (8 works well for GPT4All, and 1024 is better for LlamaCpp)
EMBEDDINGS_MODEL_NAME: SentenceTransformers embeddings model name
TARGET_SOURCE_CHUNKS: The number of chunks (sources) that will be used to answer a question

Note: because of the way langchain loads the SentenceTransformers embeddings, the first time you run the script it will require an internet connection to download the embeddings model itself.

This repo uses a state of the union transcript as an example.

Instructions for ingesting your own dataset

Put any and all of your files into the source_documents directory.

Run the ingestion command to ingest all the data. The output should look like this:

Loaded 1 new documents from source_documents
Using embedded DuckDB with persistence: data will be stored in: db
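
As a rough illustration of how the environment variables PERSIST_DIRECTORY, MODEL_PATH, MODEL_N_CTX, MODEL_N_BATCH, EMBEDDINGS_MODEL_NAME and TARGET_SOURCE_CHUNKS could be read and validated in Python, here is a minimal sketch. The names `Settings` and `load_settings` are hypothetical and are not taken from this repo, and the example values (including the model path and embeddings model name) are placeholders chosen for illustration only.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables described in this README."""
    persist_directory: str
    model_path: str
    model_n_ctx: int
    model_n_batch: int
    embeddings_model_name: str
    target_source_chunks: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the settings from `env`, or from the process environment if omitted."""
    source = os.environ if env is None else env

    def text(name: str) -> str:
        # Treat a missing or blank variable as a configuration error.
        value = source.get(name, "").strip()
        if not value:
            raise ValueError(f"{name} is not set")
        return value

    def positive_int(name: str) -> int:
        raw = text(name)
        try:
            number = int(raw)
        except ValueError:
            raise ValueError(f"{name} must be an integer, got {raw!r}") from None
        if number <= 0:
            raise ValueError(f"{name} must be positive, got {number}")
        return number

    return Settings(
        persist_directory=text("PERSIST_DIRECTORY"),
        model_path=text("MODEL_PATH"),
        model_n_ctx=positive_int("MODEL_N_CTX"),
        model_n_batch=positive_int("MODEL_N_BATCH"),
        embeddings_model_name=text("EMBEDDINGS_MODEL_NAME"),
        target_source_chunks=positive_int("TARGET_SOURCE_CHUNKS"),
    )


if __name__ == "__main__":
    # Placeholder values for illustration; adjust them to your own setup.
    example_env = {
        "PERSIST_DIRECTORY": "db",
        "MODEL_PATH": "models/example-model.bin",
        "MODEL_N_CTX": "1000",
        "MODEL_N_BATCH": "8",
        "EMBEDDINGS_MODEL_NAME": "all-MiniLM-L6-v2",
        "TARGET_SOURCE_CHUNKS": "4",
    }
    print(load_settings(example_env))
```

Validating the values up front means that a typo such as a non-numeric MODEL_N_BATCH fails immediately with a clear message, rather than later while the model is loading.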
