<span id="ai-attachments-and-rag-end-to-end-guide"></span>
= AI Attachments and RAG: End-to-End Guide =

This document explains how the BioInsights AI uses attached documents (personality files, progress note templates, and patient documents) when generating progress notes. It is intended for '''teams''' (product, engineering, support) and '''clients''' who need a clear picture of how the system works, what the “Full content” vs “RAG” toggle means, and how retrieval is performed.

-----
<span id="purpose-and-scope"></span>
== 1. Purpose and scope ==
This guide covers three kinds of attached documents:
* '''Personality documents''' – Guidelines, tone, and instructions attached to an AI personality.
* '''Template documents''' – Instructions and structure attached to a progress note template.
* '''Patient documents''' – Files attached to the current encounter (e.g. lab results, referrals).

The AI does '''not''' receive raw file binaries. Instead, it receives '''text''' that comes from those documents in one of two ways:

# '''Full content''' – The entire document text is fetched and placed in the AI’s context.
# '''RAG (Retrieval-Augmented Generation)''' – Only the most relevant parts of the document (chunks) are retrieved using semantic search and then added to the context.

This guide describes both modes, how documents are prepared (indexing, chunking), how retrieval works (including multi-query and the configurable chunk limit), and the end-to-end flow from setup to AI response.
-----
<span id="high-level-overview"></span>
== 2. High-level overview ==

<pre>┌─────────────────────────────────────────────────────────────────────────────┐
│ Admin configures personality / template with attached files                 │
│ → Each file can be "Full content" or "RAG only"                             │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ Documents are stored and indexed                                            │
│ → Full content: read at request time via Document Extractor                 │
│ → RAG: split into chunks, embedded, stored in Solr vector store             │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ User runs AI progress note (with conversation + optional patient files)     │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ System builds AI context:                                                   │
│ • Full-content files → full text injected into system prompt                │
│ • RAG files → semantic search over conversation messages → top chunks       │
│ • Patient docs → same RAG retrieval (vector search by patient + file IDs)   │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ OpenAI API is called with augmented prompt (no file_search tool)            │
│ → AI generates the note using only the provided context                     │
└─────────────────────────────────────────────────────────────────────────────┘</pre>
-----
<span id="attachment-modes-full-content-vs-rag"></span>
== 3. Attachment modes: “Full content” vs “RAG” ==

When you attach a file to a '''personality''' or a '''progress note template''', you can choose how that file is used:

{| class="wikitable"
!width="6%"| Mode
!width="48%"| What it means
!width="44%"| When to use it
|-
| '''Full content'''
| The '''entire''' document text is loaded and added to the AI’s system prompt.
| Short, critical docs (e.g. short guidelines, required structure) where nothing should be missed.
|-
| '''RAG only'''
| The document is '''not''' sent in full. Only '''relevant chunks''' are retrieved using the current conversation and injected as context.
| Longer docs (e.g. long manuals, large templates) where you want the AI to focus on the parts that match the conversation.
|}

* '''Patient documents''' (files attached to the encounter) are always retrieved via '''RAG''' (vector search); there is no “full content” option for them.
* If you do '''not''' set the toggle (e.g. older templates/personalities with only a list of file IDs), the system treats all non-patient attachments as '''full content''' for backward compatibility.
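The toggle described above is persisted as two file-ID lists (see <code>file_ids</code> and <code>file_ids_full_content</code> in section 6). Below is a minimal TypeScript sketch of how those fields might be interpreted, including the backward-compatibility fallback; the type and function names are illustrative, not the actual implementation:

```typescript
// Illustrative sketch only: resolve saved attachment fields into per-file
// modes. All names here are assumptions, not the real code.
type AttachmentMode = "full_content" | "rag_only";

interface AttachmentRecord {
  file_ids: string[];               // all attached (non-patient) files
  file_ids_full_content?: string[]; // absent on older templates/personalities
}

function resolveAttachmentModes(rec: AttachmentRecord): Map<string, AttachmentMode> {
  const modes = new Map<string, AttachmentMode>();
  // Backward compatibility: no toggle stored → treat every attachment as full content.
  if (rec.file_ids_full_content === undefined) {
    for (const id of rec.file_ids) modes.set(id, "full_content");
    return modes;
  }
  const full = new Set(rec.file_ids_full_content);
  for (const id of rec.file_ids) {
    modes.set(id, full.has(id) ? "full_content" : "rag_only");
  }
  return modes;
}
```

Note the legacy branch: a record saved before the toggle existed has no full-content list, so every non-patient attachment resolves to full content.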
-----
<span id="how-documents-are-prepared-for-the-ai"></span>
== 4. How documents are prepared for the AI ==

<span id="storing-and-indexing"></span>
=== 4.1 Storing and indexing ===

* '''Storage''': Files are stored in the application’s file storage (e.g. S3 or local drive) and linked to the personality or template (or to the patient/encounter for patient documents).
* '''Vector store (for RAG)''':
** Documents that can be used for RAG are '''chunked''' (split into overlapping segments of roughly 1,500 characters).
** Each chunk is '''embedded''' and stored in the '''Solr vector store''', indexed by file (and by patient, for patient documents).
** When the user runs the AI, the system runs '''semantic search''' over these chunks using the conversation as the query (see below).
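The chunking step can be sketched as follows; the ~1,500-character target comes from the text above, while the 200-character overlap and the function shape are assumptions:

```typescript
// Sketch of character-based chunking with overlap: consecutive chunks share
// `overlap` characters so text spanning a boundary lands in both chunks.
function chunkText(text: string, size = 1500, overlap = 200): string[] {
  if (size <= overlap) throw new Error("chunk size must exceed overlap");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // final chunk reached the end of the text
  }
  return chunks;
}
```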
<span id="full-content-path"></span>
=== 4.2 Full-content path ===

* For files marked '''“Full content”''', the system does '''not''' use the vector store at request time.
* It uses the '''Document Extractor''' to read the file (e.g. PDF, DOCX) and get the full text.
* That full text is then injected into the system prompt so the model sees the whole document.
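A hedged sketch of this injection step (the wrapper format and function name are illustrative; the real prompt layout may differ):

```typescript
// Sketch: append the extracted full text of each "Full content" file to the
// system prompt, labelled with its filename. Wrapper format is an assumption.
interface ExtractedDoc {
  name: string; // original filename
  text: string; // full text from the Document Extractor
}

function injectFullContent(systemPrompt: string, docs: ExtractedDoc[]): string {
  const sections = docs.map((d) => `--- Document: ${d.name} ---\n${d.text}`);
  return [systemPrompt, ...sections].join("\n\n");
}
```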
<span id="rag-path"></span>
=== 4.3 RAG path ===

* For files marked '''“RAG only”''' (and for patient documents), the system uses '''only''' the vector store.
* It does '''not''' send the full document. It runs a '''multi-query retrieval''' (see next section), then injects only the retrieved chunks into the prompt, up to a character budget.
-----
<span id="how-rag-retrieval-works"></span>
== 5. How RAG retrieval works ==

Previously, retrieval used '''only the last''' user/developer/assistant message and a '''fixed''' number of chunks (e.g. 20). The current behavior is:

<span id="multi-query-retrieval"></span>
=== 5.1 Multi-query retrieval ===
* The system runs a '''separate vector search for each conversation message''' (user, developer, and assistant), not just the last one:
** Results are collected and then '''merged'''.

<span id="deduplication-and-limit"></span>
=== 5.2 Deduplication and limit ===

* Chunks are identified by a key (e.g. <code>fileId:chunkIndex</code>). If the same chunk appears in results for multiple messages, it is '''deduplicated''' (one entry per chunk, keeping the best score).
* After merging and sorting by score, the system keeps at most '''N''' chunks for non-patient docs and '''N''' for patient docs, where '''N''' is the '''configurable RAG chunk limit''' (see Configuration below).
* So: “each message” improves recall (earlier context can pull in relevant chunks); the limit and deduplication keep context size and cost under control.
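The merge, deduplication, and limit steps can be sketched as follows (the <code>ScoredChunk</code> shape and function name are assumptions; the <code>fileId:chunkIndex</code> key and best-score rule come from the text above):

```typescript
// Sketch of merging per-message search results: dedupe by fileId:chunkIndex
// (keeping the best score), sort by score, keep at most `limit` chunks.
interface ScoredChunk {
  fileId: string;
  chunkIndex: number;
  score: number; // higher = more relevant
  text: string;
}

function mergeChunks(perMessageResults: ScoredChunk[][], limit: number): ScoredChunk[] {
  const best = new Map<string, ScoredChunk>();
  for (const results of perMessageResults) {
    for (const chunk of results) {
      const key = `${chunk.fileId}:${chunk.chunkIndex}`;
      const existing = best.get(key);
      if (existing === undefined || chunk.score > existing.score) {
        best.set(key, chunk); // keep the best-scoring copy of a duplicated chunk
      }
    }
  }
  return Array.from(best.values())
    .sort((a, b) => b.score - a.score) // highest score first
    .slice(0, limit);                  // configurable RAG chunk limit
}
```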
<span id="where-the-chunk-limit-came-from"></span>
=== 5.3 Where the chunk limit came from ===

* The previous hardcoded value (e.g. 20) was an arbitrary default, not derived from a formal requirement.
* The design intention was always to make this '''configurable''' via environment (see <code>SOLR_RAG_CHUNK_LIMIT</code> in <code>env.ts</code>). The code now uses that setting everywhere instead of a fixed 20.
* Default in config is '''50'''; the application caps it between 1 and 500 for safety.
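A small sketch of how the limit might be read and clamped; the default of 50 and the 1–500 cap come from the text above, while the parsing details are assumptions (the real logic lives in <code>env.ts</code>):

```typescript
// Sketch of reading and clamping the chunk limit. Default 50 and the 1–500
// cap match the documentation; parsing details are illustrative.
function resolveChunkLimit(raw: string | undefined, fallback = 50): number {
  const parsed = raw !== undefined && raw !== "" ? Number(raw) : NaN;
  const value = Number.isFinite(parsed) ? Math.trunc(parsed) : fallback;
  return Math.min(500, Math.max(1, value)); // cap between 1 and 500 for safety
}
```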
<span id="context-budget"></span>
=== 5.4 Context budget ===

* Even if many chunks are retrieved, the total '''character count''' of the RAG context sent to the model is capped (e.g. <code>RAG_CONTEXT_MAX_CHARS</code> or a model-specific override). Chunks are added in score order until the budget is reached; the rest are dropped and a warning is logged.
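The budget check can be sketched as follows (illustrative names; separator characters are not counted against the budget here, for simplicity):

```typescript
// Sketch of the character budget: chunks arrive sorted by score and are
// appended until the next one would exceed the budget; the rest are dropped.
function packContext(chunks: string[], maxChars: number): { context: string; dropped: number } {
  const kept: string[] = [];
  let used = 0;
  let taken = 0;
  for (const chunk of chunks) {
    if (used + chunk.length > maxChars) break; // budget reached; drop the rest
    kept.push(chunk);
    used += chunk.length;
    taken++;
  }
  return { context: kept.join("\n\n"), dropped: chunks.length - taken };
}
```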
-----
<span id="end-to-end-flow-step-by-step"></span>
== 6. End-to-end flow (step by step) ==

# '''Setup (personality / template)'''
#* Admin attaches files and, for each file, chooses '''Full content''' or '''RAG only'''.
#* Data is saved (e.g. <code>file_ids</code> and <code>file_ids_full_content</code>).
#* Files are stored; RAG-only (and patient) documents are chunked and indexed in Solr when the indexing pipeline runs.
# '''User starts progress note'''
#* The user can attach '''patient documents''' to the encounter (e.g. lab results, referrals).
#* Those are also chunked and indexed in Solr (by patient and file).
# '''User runs AI'''
#* The frontend sends the conversation (messages) and options (e.g. template ID, personality ID, patient ID, attached file IDs).
#* Backend resolves which files are personality/template and which are patient, and which of the former are full-content vs RAG-only.
# '''Building context'''
#* '''Full-content files''': the Document Extractor reads each file and the full text is injected into the system prompt.
#* '''RAG-only (non-patient) files''' and '''patient files''':
#** If there is already full-document context from the step above, the system runs multi-query RAG for '''RAG-only non-patient''' files and appends those chunks to the system message.
#** If there is '''no''' full-document context (e.g. all attachments are RAG-only or only patient docs), the system uses the '''fallback''' path: multi-query RAG for both non-patient and patient files, then builds a single RAG context (chunks + file list + instructions) and injects it into the system prompt.
# '''API call'''
#* The AI provider (e.g. OpenAI) is called with the '''augmented''' system prompt and the conversation.
#* No “file_search” or similar tool is used; all document content is in the prompt.
# '''Response'''
#* The model generates the note using only the provided context. The UI shows the note and, where applicable, “Documents referenced” so the user knows which files were used.
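The branching in the “Building context” step above can be sketched as a small planning function (the <code>ResolvedFiles</code> shape and names are assumptions, and the handling of patient files on the append path is simplified):

```typescript
// Sketch of the branch in "Building context" (names assumed): with any
// full-content text present, RAG runs for the RAG-only non-patient files;
// otherwise the fallback covers non-patient and patient files together.
interface ResolvedFiles {
  fullContent: string[]; // non-patient file IDs marked "Full content"
  ragOnly: string[];     // non-patient file IDs marked "RAG only"
  patient: string[];     // patient file IDs (always RAG)
}

function planRetrieval(files: ResolvedFiles): { path: "append" | "fallback"; ragFileIds: string[] } {
  if (files.fullContent.length > 0) {
    // Full-document context exists → append RAG chunks for RAG-only files.
    return { path: "append", ragFileIds: files.ragOnly };
  }
  // No full-document context → single fallback RAG context for everything.
  return { path: "fallback", ragFileIds: [...files.ragOnly, ...files.patient] };
}
```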
-----
<span id="configuration-for-operations-team"></span>
== 7. Configuration (for operations / team) ==

Relevant environment variables:

{| class="wikitable"
!width="13%"| Variable
!width="43%"| Purpose
!width="42%"| Default / notes
|-
| <code>SOLR_RAG_CHUNK_LIMIT</code>
| Maximum number of chunks kept after multi-query retrieval (applied separately to non-patient and patient documents).
| '''50''' by default; capped between 1 and 500 (see section 5.3).
|-
| <code>RAG_CONTEXT_MAX_CHARS</code>
| Character budget for the retrieved-chunk context injected into the prompt; chunks are added in score order until the budget is reached.
| Model-specific overrides are possible (see section 5.4 and <code>env.ts</code>).
|}

Other Solr/vector and embedding settings (e.g. core name, dimensions) are documented in <code>env.ts</code> and in the Solr/vector store docs referenced below.

-----
<span id="summary-for-clients"></span>
== 8. Summary for clients ==

* '''Two ways to use a file''': “Full content” (entire document in the prompt) or “RAG only” (only the most relevant parts, based on the conversation).
* '''RAG''' uses the '''whole conversation''' (every user/assistant message) to find relevant sections, not just the last message.
* The '''number of chunks''' used is '''configurable''' (<code>SOLR_RAG_CHUNK_LIMIT</code>), so you can tune how much document material the AI sees.
* '''Patient documents''' are always used via RAG (semantic search); personality and template documents can be either full content or RAG.

-----
''Last updated: February 2025''