Allowing configuration of the ingestion and retrieval pipelines via a yaml file #3214

Closed
jacopo-chevallard opened this issue Sep 17, 2024 — with Linear · 1 comment
Labels
area: backend Related to backend functionality or under the /backend directory

We want to let users configure the ingestion and retrieval pipelines via a YAML file. Users will be able to choose which steps each pipeline includes, as well as the configuration of each step.

Below is an example of a YAML configuration file we are testing:

ingestion_config:
  parser_config:
    megaparse_config:
      strategy: "fast"
      pdf_parser: "unstructured"
    splitter_config:
      chunk_size: 400
      chunk_overlap: 100

retrieval_config:
  workflow_config:
    name: "standard RAG"
    nodes:
      - name: "filter_history"
        edges: ["rewrite"]

      - name: "rewrite"
        edges: ["retrieve"]

      - name: "retrieve"
        edges: ["generate"]

      - name: "generate"
        edges: ["END"]

  # Maximum number of previous conversation iterations
  # to include in the context of the answer
  max_history: 10

  prompt: "my prompt"

  max_files: 20

  reranker_config:
    # The reranker supplier to use
    supplier: "cohere"

    # The model to use for the reranker for the given supplier
    model: "rerank-multilingual-v3.0"

    # Number of chunks returned by the reranker
    top_n: 5

  llm_config:
    # The LLM supplier to use
    supplier: "openai"

    # The model to use for the LLM for the given supplier
    model: "gpt-3.5-turbo-0125"

    max_input_tokens: 2000

    # Maximum number of tokens to pass to the LLM
    # as a context to generate the answer
    max_output_tokens: 2000

    temperature: 0.7
    streaming: true
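
To illustrate how the `nodes` / `edges` lists in `workflow_config` describe a pipeline, here is a minimal sketch of a sanity check that could run on the workflow before it is built. It is not the project's implementation. The `validate_workflow` function is hypothetical. Two things are assumptions made for this example: that the first listed node is the entry point, and that the workflow is linear, so only the first edge of each node is followed. The dict literal stands in for what a YAML parser (for example PyYAML's `yaml.safe_load`) would return for the `workflow_config` section.

```python
from typing import Any

END = "END"  # sentinel edge target used in the example config

# The workflow_config section of the example, as a YAML parser would return it.
workflow_config: dict[str, Any] = {
    "name": "standard RAG",
    "nodes": [
        {"name": "filter_history", "edges": ["rewrite"]},
        {"name": "rewrite", "edges": ["retrieve"]},
        {"name": "retrieve", "edges": ["generate"]},
        {"name": "generate", "edges": ["END"]},
    ],
}


def validate_workflow(config: dict[str, Any]) -> list[str]:
    """Hypothetical helper: check the edges and return the node path up to END.

    Assumes the first listed node is the entry point and that the
    workflow is linear (only the first edge of each node is followed).
    """
    nodes = config["nodes"]
    if not nodes:
        raise ValueError("workflow has no nodes")

    # Map each node name to its outgoing edges, rejecting duplicate names.
    edges: dict[str, list[str]] = {}
    for node in nodes:
        name = node["name"]
        if name in edges:
            raise ValueError(f"duplicate node name: {name!r}")
        edges[name] = list(node.get("edges", []))

    # Every edge must point at a declared node or at the END sentinel.
    for name, targets in edges.items():
        for target in targets:
            if target != END and target not in edges:
                raise ValueError(f"node {name!r} points to unknown node {target!r}")

    # Walk from the entry node and make sure END is reached without a cycle.
    path: list[str] = []
    current = nodes[0]["name"]
    while current != END:
        if current in path:
            raise ValueError(f"cycle detected at node {current!r}")
        path.append(current)
        if not edges[current]:
            raise ValueError(f"node {current!r} has no outgoing edge")
        current = edges[current][0]
    return path


print(validate_workflow(workflow_config))
# → ['filter_history', 'rewrite', 'retrieve', 'generate']
```

A check of this kind would catch a misspelled step name in the YAML file (for example an edge to `"retreive"`) before the pipeline is built, rather than at query time.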
@dosubot dosubot bot added the area: backend Related to backend functionality or under the /backend directory label Sep 17, 2024
@jacopo-chevallard jacopo-chevallard self-assigned this Sep 19, 2024