<span id="ai-attachments-and-rag-end-to-end-guide"></span>
= AI Attachments and RAG: End-to-End Guide =

This document explains how the BioInsights AI uses attached documents (personality files, progress note templates, and patient documents) when generating progress notes. It is intended for '''teams''' (product, engineering, support) and '''clients''' who need a clear picture of how the system works, what the “Full content” vs “RAG” toggle means, and how retrieval is performed.

-----
<span id="purpose-and-scope"></span>
== 1. Purpose and scope ==
This guide covers three kinds of attached documents:
* '''Personality documents''' – Guidelines, tone, and instructions attached to an AI personality.
* '''Template documents''' – Instructions and structure attached to a progress note template.
* '''Patient documents''' – Files attached to the current encounter (e.g. lab results, referrals).

The AI does '''not''' receive raw file binaries. Instead, it receives '''text''' that comes from those documents in one of two ways:

# '''Full content''' – The entire document text is fetched and placed in the AI’s context.
# '''RAG (Retrieval-Augmented Generation)''' – Only the most relevant parts of the document (chunks) are retrieved using semantic search and then added to the context.

This guide describes both modes, how documents are prepared (indexing, chunking), how retrieval works (including multi-query and the configurable chunk limit), and the end-to-end flow from setup to AI response.
-----
<span id="high-level-overview"></span>
== 2. High-level overview ==

<pre>┌─────────────────────────────────────────────────────────────────────────────┐
│ Admin configures personality / template with attached files                 │
│ → Each file can be "Full content" or "RAG only"                             │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ Documents are stored and indexed                                            │
│ → Full content: read at request time via Document Extractor                 │
│ → RAG: split into chunks, embedded, stored in Solr vector store             │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ User runs AI progress note (with conversation + optional patient files)     │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ System builds AI context:                                                   │
│ • Full-content files → full text injected into system prompt                │
│ • RAG files → semantic search over conversation messages → top chunks       │
│ • Patient docs → same RAG retrieval (vector search by patient + file IDs)   │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ OpenAI API is called with augmented prompt (no file_search tool)            │
│ → AI generates the note using only the provided context                     │
└─────────────────────────────────────────────────────────────────────────────┘</pre>
-----
<span id="attachment-modes-full-content-vs-rag"></span>
== 3. Attachment modes: “Full content” vs “RAG” ==

When you attach a file to a '''personality''' or a '''progress note template''', you can choose how that file is used:

{| class="wikitable"
!width="6%"| Mode
!width="48%"| What it means
!width="44%"| When to use it
|-
| '''Full content'''
| The '''entire''' document text is loaded and added to the AI’s system prompt.
| Short, critical docs (e.g. short guidelines, required structure) where nothing should be missed.
|-
| '''RAG only'''
| The document is '''not''' sent in full. Only '''relevant chunks''' are retrieved using the current conversation and injected as context.
| Longer docs (e.g. long manuals, large templates) where you want the AI to focus on the parts that match the conversation.
|}

* '''Patient documents''' (files attached to the encounter) are always retrieved via '''RAG''' (vector search); there is no “full content” option for them.
* If you do '''not''' set the toggle (e.g. older templates/personalities with only a list of file IDs), the system treats all non-patient attachments as '''full content''' for backward compatibility.
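The toggle described above is persisted as two file-ID lists (see <code>file_ids</code> and <code>file_ids_full_content</code> in section 6). Below is a minimal TypeScript sketch of how those fields might be interpreted, including the backward-compatibility fallback; the type and function names are illustrative, not the actual implementation:

```typescript
// Illustrative sketch only: resolve saved attachment fields into per-file
// modes. All names here are assumptions, not the real code.
type AttachmentMode = "full_content" | "rag_only";

interface AttachmentRecord {
  file_ids: string[];               // all attached (non-patient) files
  file_ids_full_content?: string[]; // absent on older templates/personalities
}

function resolveAttachmentModes(rec: AttachmentRecord): Map<string, AttachmentMode> {
  const modes = new Map<string, AttachmentMode>();
  // Backward compatibility: no toggle stored → treat every attachment as full content.
  if (rec.file_ids_full_content === undefined) {
    for (const id of rec.file_ids) modes.set(id, "full_content");
    return modes;
  }
  const full = new Set(rec.file_ids_full_content);
  for (const id of rec.file_ids) {
    modes.set(id, full.has(id) ? "full_content" : "rag_only");
  }
  return modes;
}
```

Note the legacy branch: a record saved before the toggle existed has no full-content list, so every non-patient attachment resolves to full content.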
-----
<span id="how-documents-are-prepared-for-the-ai"></span>
== 4. How documents are prepared for the AI ==

<span id="storing-and-indexing"></span>
=== 4.1 Storing and indexing ===

* '''Storage''': Files are stored in the application’s file storage (e.g. S3 or local drive) and linked to the personality or template (or to the patient/encounter for patient documents).
* '''Vector store (for RAG)''':
** Documents that can be used for RAG are '''chunked''' (split into overlapping segments of roughly 1,500 characters).
** Each chunk is '''embedded''' and stored in the '''Solr vector store''', indexed by file (and by patient, for patient documents).
** When the user runs the AI, the system runs '''semantic search''' over these chunks using the conversation as the query (see below).
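The chunking step can be sketched as follows; the ~1,500-character target comes from the text above, while the 200-character overlap and the function shape are assumptions:

```typescript
// Sketch of character-based chunking with overlap: consecutive chunks share
// `overlap` characters so text spanning a boundary lands in both chunks.
function chunkText(text: string, size = 1500, overlap = 200): string[] {
  if (size <= overlap) throw new Error("chunk size must exceed overlap");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // final chunk reached the end of the text
  }
  return chunks;
}
```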
<span id="full-content-path"></span>
=== 4.2 Full-content path ===

* For files marked '''“Full content”''', the system does '''not''' use the vector store at request time.
* It uses the '''Document Extractor''' to read the file (e.g. PDF, DOCX) and get the full text.
* That full text is then injected into the system prompt so the model sees the whole document.
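A hedged sketch of this injection step (the wrapper format and function name are illustrative; the real prompt layout may differ):

```typescript
// Sketch: append the extracted full text of each "Full content" file to the
// system prompt, labelled with its filename. Wrapper format is an assumption.
interface ExtractedDoc {
  name: string; // original filename
  text: string; // full text from the Document Extractor
}

function injectFullContent(systemPrompt: string, docs: ExtractedDoc[]): string {
  const sections = docs.map((d) => `--- Document: ${d.name} ---\n${d.text}`);
  return [systemPrompt, ...sections].join("\n\n");
}
```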
<span id="rag-path"></span>
=== 4.3 RAG path ===

* For files marked '''“RAG only”''' (and for patient documents), the system uses '''only''' the vector store.
* It does '''not''' send the full document. It runs a '''multi-query retrieval''' (see next section), then injects only the retrieved chunks into the prompt, up to a character budget.
-----
<span id="how-rag-retrieval-works"></span>
== 5. How RAG retrieval works ==

Previously, retrieval used '''only the last''' user/developer/assistant message and a '''fixed''' number of chunks (e.g. 20). The current behavior is:

<span id="multi-query-retrieval"></span>
=== 5.1 Multi-query retrieval ===
* The system runs a '''separate vector search for each conversation message''' (user, developer, and assistant), not just the last one:
** Results are collected and then '''merged'''.

<span id="deduplication-and-limit"></span>
=== 5.2 Deduplication and limit ===

* Chunks are identified by a key (e.g. <code>fileId:chunkIndex</code>). If the same chunk appears in results for multiple messages, it is '''deduplicated''' (one entry per chunk, keeping the best score).
* After merging and sorting by score, the system keeps at most '''N''' chunks for non-patient docs and '''N''' for patient docs, where '''N''' is the '''configurable RAG chunk limit''' (see Configuration below).
* So: “each message” improves recall (earlier context can pull in relevant chunks); the limit and deduplication keep context size and cost under control.
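The merge, deduplication, and limit steps can be sketched as follows (the <code>ScoredChunk</code> shape and function name are assumptions; the <code>fileId:chunkIndex</code> key and best-score rule come from the text above):

```typescript
// Sketch of merging per-message search results: dedupe by fileId:chunkIndex
// (keeping the best score), sort by score, keep at most `limit` chunks.
interface ScoredChunk {
  fileId: string;
  chunkIndex: number;
  score: number; // higher = more relevant
  text: string;
}

function mergeChunks(perMessageResults: ScoredChunk[][], limit: number): ScoredChunk[] {
  const best = new Map<string, ScoredChunk>();
  for (const results of perMessageResults) {
    for (const chunk of results) {
      const key = `${chunk.fileId}:${chunk.chunkIndex}`;
      const existing = best.get(key);
      if (existing === undefined || chunk.score > existing.score) {
        best.set(key, chunk); // keep the best-scoring copy of a duplicated chunk
      }
    }
  }
  return Array.from(best.values())
    .sort((a, b) => b.score - a.score) // highest score first
    .slice(0, limit);                  // configurable RAG chunk limit
}
```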
<span id="where-the-chunk-limit-came-from"></span>
=== 5.3 Where the chunk limit came from ===

* The previous hardcoded value (e.g. 20) was an arbitrary default, not derived from a formal requirement.
* The design intention was always to make this '''configurable''' via environment (see <code>SOLR_RAG_CHUNK_LIMIT</code> in <code>env.ts</code>). The code now uses that setting everywhere instead of a fixed 20.
* Default in config is '''50'''; the application caps it between 1 and 500 for safety.
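A small sketch of how the limit might be read and clamped; the default of 50 and the 1–500 cap come from the text above, while the parsing details are assumptions (the real logic lives in <code>env.ts</code>):

```typescript
// Sketch of reading and clamping the chunk limit. Default 50 and the 1–500
// cap match the documentation; parsing details are illustrative.
function resolveChunkLimit(raw: string | undefined, fallback = 50): number {
  const parsed = raw !== undefined && raw !== "" ? Number(raw) : NaN;
  const value = Number.isFinite(parsed) ? Math.trunc(parsed) : fallback;
  return Math.min(500, Math.max(1, value)); // cap between 1 and 500 for safety
}
```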
<span id="context-budget"></span>
=== 5.4 Context budget ===

* Even if many chunks are retrieved, the total '''character count''' of the RAG context sent to the model is capped (e.g. <code>RAG_CONTEXT_MAX_CHARS</code> or a model-specific override). Chunks are added in score order until the budget is reached; the rest are dropped and a warning is logged.
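The budget check can be sketched as follows (illustrative names; separator characters are not counted against the budget here, for simplicity):

```typescript
// Sketch of the character budget: chunks arrive sorted by score and are
// appended until the next one would exceed the budget; the rest are dropped.
function packContext(chunks: string[], maxChars: number): { context: string; dropped: number } {
  const kept: string[] = [];
  let used = 0;
  let taken = 0;
  for (const chunk of chunks) {
    if (used + chunk.length > maxChars) break; // budget reached; drop the rest
    kept.push(chunk);
    used += chunk.length;
    taken++;
  }
  return { context: kept.join("\n\n"), dropped: chunks.length - taken };
}
```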
-----
<span id="end-to-end-flow-step-by-step"></span>
== 6. End-to-end flow (step by step) ==

# '''Setup (personality / template)'''
#* Admin attaches files and, for each file, chooses '''Full content''' or '''RAG only'''.
#* Data is saved (e.g. <code>file_ids</code> and <code>file_ids_full_content</code>).
#* Files are stored; RAG-only (and patient) documents are chunked and indexed in Solr when the indexing pipeline runs.
# '''User starts progress note'''
#* The user can attach '''patient documents''' to the encounter (e.g. lab results, referrals).
#* Those are also chunked and indexed in Solr (by patient and file).
# '''User runs AI'''
#* The frontend sends the conversation (messages) and options (e.g. template ID, personality ID, patient ID, attached file IDs).
#* Backend resolves which files are personality/template and which are patient, and which of the former are full-content vs RAG-only.
# '''Building context'''
#* '''Full-content files''': the Document Extractor reads each file and the full text is injected into the system prompt.
#* '''RAG-only (non-patient) files''' and '''patient files''':
#** If there is already full-document context from the step above, the system runs multi-query RAG for '''RAG-only non-patient''' files and appends those chunks to the system message.
#** If there is '''no''' full-document context (e.g. all attachments are RAG-only or only patient docs), the system uses the '''fallback''' path: multi-query RAG for both non-patient and patient files, then builds a single RAG context (chunks + file list + instructions) and injects it into the system prompt.
# '''API call'''
#* The AI provider (e.g. OpenAI) is called with the '''augmented''' system prompt and the conversation.
#* No “file_search” or similar tool is used; all document content is in the prompt.
# '''Response'''
#* The model generates the note using only the provided context. The UI shows the note and, where applicable, “Documents referenced” so the user knows which files were used.
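The branching in the “Building context” step above can be sketched as a small planning function (the <code>ResolvedFiles</code> shape and names are assumptions, and the handling of patient files on the append path is simplified):

```typescript
// Sketch of the branch in "Building context" (names assumed): with any
// full-content text present, RAG runs for the RAG-only non-patient files;
// otherwise the fallback covers non-patient and patient files together.
interface ResolvedFiles {
  fullContent: string[]; // non-patient file IDs marked "Full content"
  ragOnly: string[];     // non-patient file IDs marked "RAG only"
  patient: string[];     // patient file IDs (always RAG)
}

function planRetrieval(files: ResolvedFiles): { path: "append" | "fallback"; ragFileIds: string[] } {
  if (files.fullContent.length > 0) {
    // Full-document context exists → append RAG chunks for RAG-only files.
    return { path: "append", ragFileIds: files.ragOnly };
  }
  // No full-document context → single fallback RAG context for everything.
  return { path: "fallback", ragFileIds: [...files.ragOnly, ...files.patient] };
}
```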
-----
<span id="configuration-for-operations-team"></span>
== 7. Configuration (for operations / team) ==

Relevant environment variables:

{| class="wikitable"
!width="13%"| Variable
!width="43%"| Purpose
!width="42%"| Default / notes
|-
| <code>SOLR_RAG_CHUNK_LIMIT</code>
| Maximum number of chunks kept after multi-query retrieval (applied separately to non-patient and patient documents).
| '''50''' by default; capped between 1 and 500 (see section 5.3).
|-
| <code>RAG_CONTEXT_MAX_CHARS</code>
| Character budget for the retrieved-chunk context injected into the prompt; chunks are added in score order until the budget is reached.
| Model-specific overrides are possible (see section 5.4 and <code>env.ts</code>).
|}

Other Solr/vector and embedding settings (e.g. core name, dimensions) are documented in <code>env.ts</code> and in the Solr/vector store docs referenced below.

-----
<span id="summary-for-clients"></span>
== 8. Summary for clients ==

* '''Two ways to use a file''': “Full content” (entire document in the prompt) or “RAG only” (only the most relevant parts, based on the conversation).
* '''RAG''' uses the '''whole conversation''' (every user/assistant message) to find relevant sections, not just the last message.
* The '''number of chunks''' used is '''configurable''' (<code>SOLR_RAG_CHUNK_LIMIT</code>), so you can tune how much document material the AI sees.
* '''Patient documents''' are always used via RAG (semantic search); personality and template documents can be either full content or RAG.

-----
''Last updated: February 2025''