diff --git a/README.md b/README.md
index d9febba..67f2bc3 100644
--- a/README.md
+++ b/README.md
@@ -87,6 +87,12 @@ Note: as an alternative to explicitly passing the API keys in the constructor yo
 
 Request a non-streaming completion from the model.
 
+#### `requestChatCompletion(model: Model, options: { messages: Message[], generationOptions: CompletionOptions }, chunkCb: ChunkCb): Promise<CompletionResponse[]>`
+
+Request a completion from the model using a sequence of chat messages, each of which has a role. A message looks like
+`{ role: 'user', content: 'Hello!' }` or `{ role: 'system', content: 'Hi!' }`. The `role` can be `user`, `system` or `assistant`,
+regardless of the model in use.
+
 ### ChatSession
 
 #### `constructor(completionService: CompletionService, model: string, systemPrompt: string)`
@@ -110,6 +116,13 @@ loadPrompt("Hello, may name is %%%(NAME)%%%", { NAME: "Omega" })
 * `importPromptSync(path: string, variables: Record<string, string>): string` - Load a prompt from a file with the given variables
 * `importPrompt(path: string, variables: Record<string, string>): Promise<string>` - Load a prompt from a file with the given variables, asynchronously returning a Promise
 
+### Flow
+
+For building complex multi-round conversations or agents, see the `Flow` class. It allows you to define a flow of messages
+and responses, and then run them in a sequence, with the ability to ask follow-up questions based on the previous responses.
+
+See the documentation [here](./docs/flow.md).
+
 ### More information
 
 For the full API, see the [TypeScript types](./src/index.d.ts). Not all methods are documented in README, the types are
diff --git a/docs/MarkdownProcessing.md b/docs/MarkdownProcessing.md
index be0b95b..780a2ca 100644
--- a/docs/MarkdownProcessing.md
+++ b/docs/MarkdownProcessing.md
@@ -1,4 +1,4 @@
-LXL contains a simple templating system that allows you to conditionally insert data into the markdown at runtime.
+LXL contains a simple templating system ("MDP") that allows you to conditionally insert data into the markdown at runtime.
 
 To insert variables, you use `%%%(VARIABLE_NAME)%%%`. To insert a string based on a specific condition, you use ```%%%[...] if CONDITION%%%``` or with an else clause, ```%%%[...] if BOOLEAN_VAR else [...]%%%```. Note that the square brackets work like quotation mark strings in programming languages, so you can escape them with a backslash, like `\]`.
 
@@ -12,6 +12,7 @@ Would be loaded in JS like this:
 ```js
 const { importPromptSync } = require('langxlang')
 const prompt = importPromptSync('path-to-prompt.md', { NAME: 'Omega', HAS_PROMPT: true, IS_AI_STUDIO: false, LLM_NAME: 'Gemini 1.5 Pro' })
+console.log(prompt) // string
 ```
 
 And would result in:
@@ -38,50 +39,48 @@ You are running via API.
 
 Note: you can optionally put 2 spaces of tabulation in each line after an IF or ELSE block to make the code more readable. This is optional, and will be removed by the parser. If you actually want to put 2+ spaces of tabs, you can add an extra 2 spaces of tabulation (eg use 4 spaces to get 2, 6 to get 4, etc.).
 
-### Guidance Region
+#### Comments
 
-LLMs don't always give you the output you desire for your prompt. One approach to fixing
-this is by modifying your prompt to be more clear and adding more examples.
-Another standard approach is to provide some initial guidance to the model for what the response 
-should look like, to guarantee that the model will give you output similar to that you desire.
-For example, instead of just asking your model to output JSON or YAML and then
-ending the prompt with a question for the model to answer, you might end it with
-a markdown code block (like <code>```yml</code>), that the LLM would then complete the
-body for.
+You can specify a comment by using `<!---` and `-->` (note the additional dash in the starting token). This will be removed by the parser and not included in the output. We don't remove standard `<!--` comments to allow literal HTML comments in the markdown, which may be helpful when you expect markdown responses from the model.
 
-However, this can lead to messy code for you as you then have to prefix that guidance
-to the response you get from LXL. To make this easier, LXL provides a way to mark
-regions of your prompt as guidance. The guidance will then be automatically prepended
-to the model output you get from LXL. 
+### Roles
 
-As for how we send this data to the LLM, if the LLM is in chat mode, we split the user prompt
-by the base prompt and the guidance region, and mark the guidance region as being under
-the role `model` (Google Gemini API) or `assistant` (OpenAI GPT API) and the former as `system` or `user`.
-If the LLM is in completion mode, we simply append the guidance region to the prompt.
+LXL can take a markdown prompt template like the one above and split it into a chat session made up of several messages, each with its own role, instead of returning a single prompt string. All of the pre-processing features above work with this system. For example, given the following prompt template:
 
-To notate a region like this, you can use the following marker <code>%%%$GUIDANCE_START$%%%</code> in the prompt:
-
-User Prompt (prompt.md):
-<pre>
-Please convert this YAML to JSON:
-```yml
-hello: world
+```md
+<|SYSTEM|>
+Respond to the user like a pirate.
+<|USER|>
+How are you today?
+<|ASSISTANT|>
+Arrr, I be doin' well, matey! How can I help ye today?
+<|USER|>
+%%%(PROMPT)%%%
+<|ASSISTANT|>
 ```
-%%%$GUIDANCE_START$%%%
-```json
-</pre>
 
-This will result in:
-- role: system, message: "Please convert this YAML to JSON:\n```yml\nhello: world\n```\n"
-- role: model, message: "```json\n"
-
-And LXL's output will include the <code>```json</code> and the rest of the output as if they were both part of the model's output (this includes streaming).
+By using the following code:
+```js
+const { importPromptSync } = require('langxlang')
+const messages = importPromptSync('path-to-prompt.md', { PROMPT: 'What is the weather like?' }, {
+  roles: { // passing roles will return a messages array as opposed to a string
+    '<|SYSTEM|>': 'system',
+    '<|USER|>': 'user',
+    '<|ASSISTANT|>': 'assistant'
+  }
+})
+console.log(messages) // array
+```
 
-The usage in JS would look like:
+We get a messages array that looks like this:
 ```js
-const { ChatSession, importPromptSync } = require('langxlang')
-const session = new ChatSession(service, 'gpt-3.5-turbo', '', {})
-// importPromptSync returns an object with the prompt and the guidance, that can be passed to sendMessage
-const prompt = importPromptSync('prompt.md', {})
-session.sendMessage(prompt).then(console.log)
-```
\ No newline at end of file
+[
+  { role: 'system', content: 'Respond to the user like a pirate.' },
+  { role: 'user', content: 'How are you today?' },
+  { role: 'assistant', content: 'Arrr, I be doin\' well, matey! How can I help ye today?' },
+  { role: 'user', content: 'What is the weather like?' },
+  // LXL will automatically remove empty messages
+]
+```
+
+This can be helpful for building dialogues beyond what a basic `ChatSession` back-and-forth can give you. For more advanced use cases (such as multi-round agents), see the [Flow](./flow.md) documentation.
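
The role splitting documented in this hunk can be pictured with a small standalone sketch. This is not LXL's parser: `splitByRoles` is a hypothetical helper written only to illustrate how a marker-to-role map could turn one rendered prompt into `{ role, content }` messages and drop empty ones, as the docs describe.

```js
// Hypothetical helper (an assumption for illustration, not part of langxlang):
// split a rendered prompt into chat messages using a map of marker -> role.
function splitByRoles (text, roles) {
  const escape = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
  // The capturing group makes split() keep the markers in its output
  const pattern = new RegExp('(' + Object.keys(roles).map(escape).join('|') + ')')
  const messages = []
  let role = null
  for (const piece of text.split(pattern)) {
    if (Object.prototype.hasOwnProperty.call(roles, piece)) {
      role = roles[piece]
    } else if (role && piece.trim()) {
      // Empty messages, such as a trailing assistant marker, are skipped
      messages.push({ role, content: piece.trim() })
    }
  }
  return messages
}

const messages = splitByRoles(
  '<|SYSTEM|>\nRespond like a pirate.\n<|USER|>\nHow are you?\n<|ASSISTANT|>\n',
  { '<|SYSTEM|>': 'system', '<|USER|>': 'user', '<|ASSISTANT|>': 'assistant' }
)
console.log(messages)
```

Any text that appears before the first marker is ignored in this sketch; the real implementation may treat it differently.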
diff --git a/docs/extras.md b/docs/extras.md
new file mode 100644
index 0000000..7c180a7
--- /dev/null
+++ b/docs/extras.md
@@ -0,0 +1,34 @@
+### Guidance Region
+
+LLMs don't always give you the output you desire for your prompt. One approach to fixing
+this is by modifying your prompt to be more clear and adding more examples.
+Another standard approach is to provide some initial guidance to the model for what the response 
+should look like, to guarantee that the model will give you output similar to that you desire.
+For example, instead of just asking your model to output JSON or YAML and then
+ending the prompt with a question for the model to answer, you might end it with
+a markdown code block (like <code>```yml</code>), that the LLM would then complete the
+body for.
+
+However, this can lead to messy code for you as you then have to prefix that guidance
+to the response you get from LXL. To make this easier, LXL provides a way to mark
+regions of your prompt as guidance. The guidance will then be automatically prepended
+to the model output you get from LXL. 
+
+By setting a message role as `guidance`, that message will be sent as a `model` or `assistant` (depending on platform) message to the LLM and then be prepended to the response you get from LXL.
+
+Here is an example:
+```js
+const { CompletionService } = require('langxlang')
+const service = new CompletionService()
+const [response] = await service.requestChatCompletion('gemini-1.0-pro', {
+  messages: [
+    { role: 'user', content: 'Please convert this YAML to JSON:\n```yml\nhello: world\n```\n' },
+    { role: 'guidance', content: '```json\n' }
+  ]
+})
+console.log(response) // { text: '```json\n{"hello": "world"}\n' }
+```
+
+Note: there can only be one guidance message, and it must be the last one. Remove it from the
+messages array before your next call to `requestChatCompletion`. This feature works best when
+used with the role parsing system described in [MarkdownProcessing.md](./MarkdownProcessing.md).
\ No newline at end of file
diff --git a/docs/flow.md b/docs/flow.md
index f7a49b3..85a9a35 100644
--- a/docs/flow.md
+++ b/docs/flow.md
@@ -38,12 +38,21 @@ Hello, how are you doing today on this %%%(DAY_OF_WEEK)%%%?
 %%%endif
 ````
 
-We can construct the following chain to ask the user how they are doing, 
-then ask them what day of the week tomorrow is, support a follow-up question 
-after the first response to ask the model to turn the response into YAML format:
+We can construct the following chain to:
+* ask the user how they are doing
+* then ask them what day of the week tomorrow is
+* then support a follow-up question after the first response (where they said whether they were OK), asking the model to turn its response into YAML format
+
 ```js
 const chain = (params) => ({
-  prompt: importRawSync('./prompt.md'),
+  prompt: {
+    // if we're using a prompt file with roles, we specify `text` and `roles`. Otherwise, we can specify `user` and `system` as below.
+    text: importRawSync('./prompt.md'),
+    roles: {
+      '<USER>': 'user',
+      '<ASSISTANT>': 'assistant'
+    }
+  },
   with: {
     DAY_OF_WEEK: params.dayOfWeek
   },
@@ -93,7 +102,12 @@ A chain looks like this:
 ```js
 function chain (initialArguments) {
   return {
-    prompt: 'the prompt here',
+    prompt: {
+      // the system and user prompt will be propagated down the chain but you can override them at any point
+      // by specifying them in another `prompt` object
+      system: 'the system prompt here',
+      user: 'the user prompt here'
+    },
     // the with section contains the parameters to pass, like tools.loadPrompt(prompt, with)
     with: {
       SOME_PARAM: 'value'
@@ -116,7 +130,9 @@ and should return a string. The string will be used to find the next function to
 
 ```js
 const chain = (params) => ({
-  prompt: 'the prompt here',
+  prompt: {
+    user: 'the prompt here',
+  },
   with: {
     SOME_PARAM: 'value'
   },
diff --git a/src/ChatSession.js b/src/ChatSession.js
index 80e7244..fd40d9a 100644
--- a/src/ChatSession.js
+++ b/src/ChatSession.js
@@ -177,7 +177,7 @@ class ChatSession {
       }
       response.content = message.guidanceText + response.content
     }
-    return { text: response.content, calledFunctions: this._calledFunctionsForRound }
+    return { content: response.content, text: response.content, calledFunctions: this._calledFunctionsForRound }
   }
 }
 
diff --git a/src/CompletionService.js b/src/CompletionService.js
index e7f00de..80bb96d 100644
--- a/src/CompletionService.js
+++ b/src/CompletionService.js
@@ -1,7 +1,7 @@
 const openai = require('./openai')
 const palm2 = require('./palm2')
 const gemini = require('./gemini')
-const { cleanMessage, getModelInfo, checkDoesGoogleModelSupportInstructions, knownModels } = require('./util')
+const { cleanMessage, getModelInfo, checkDoesGoogleModelSupportInstructions, checkGuidance, knownModels } = require('./util')
 const caching = require('./caching')
 
 class CompletionService {
@@ -33,43 +33,16 @@ class CompletionService {
     return { openai: openaiModels, google: geminiModels }
   }
 
-  async _requestCompletionOpenAI (model, system, user, { maxTokens, stopSequences, temperature, topP }, chunkCb) {
-    if (!this.openaiApiKey) throw new Error('OpenAI API key not set')
-    const guidance = system?.guidanceText || user?.guidanceText || ''
-    const response = await openai.generateCompletion(model, system.basePrompt || system, user.basePrompt || user, {
-      apiKey: this.openaiApiKey,
-      guidanceMessage: guidance,
-      generationConfig: {
-        max_tokens: maxTokens,
-        stop: stopSequences,
-        temperature,
-        top_p: topP
-      }
-    })
-    return response.choices.map((choice) => ({ text: guidance + choice.content }))
+  async _requestCompletionOpenAI (model, system, user, options, chunkCb) {
+    const messages = [{ role: 'user', content: user }]
+    if (system) messages.unshift({ role: 'system', content: system })
+    return this._requestChatCompleteOpenAI(model, messages, options, undefined, chunkCb)
   }
 
-  async _requestCompletionGemini (model, system, user, { maxTokens, stopSequences, temperature, topP, topK }, chunkCb) {
-    if (!this.geminiApiKey) throw new Error('Gemini API key not set')
-    const guidance = system?.guidanceText || user?.guidanceText || ''
-    // April 2024 - Only Gemini 1.5 supports instructions
-    if (!checkDoesGoogleModelSupportInstructions(model)) {
-      const mergedPrompt = [system, user].join('\n')
-      system = ''
-      user = mergedPrompt
-    }
-    const result = await gemini.generateCompletion(model, system, user, {
-      apiKey: this.geminiApiKey,
-      generationConfig: {
-        maxOutputTokens: maxTokens,
-        stopSequences,
-        temperature,
-        topP,
-        topK
-      }
-    }, chunkCb)
-    chunkCb?.({ done: true, delta: '' })
-    return [{ text: guidance + result.text() }]
+  async _requestCompletionGemini (model, system, user, options, chunkCb) {
+    const messages = [{ role: 'user', content: user }]
+    if (system) messages.unshift({ role: 'system', content: system })
+    return this._requestChatCompleteGemini(model, messages, options, undefined, chunkCb)
   }
 
   async requestCompletion (model, system, user, chunkCb, options = {}) {
@@ -84,16 +57,19 @@ class CompletionService {
         return cachedResponse
       }
     }
-
-    function saveIfCaching (response) {
-      if (response && response.text && options.enableCaching) {
-        caching.addResponseToCache(model, [system, user], response)
+    function saveIfCaching (responses) {
+      for (const response of responses) {
+        if (response && response.content && options.enableCaching) {
+          caching.addResponseToCache(model, [system, user], response)
+        }
       }
-      return response
+      return responses
     }
+
     const genOpts = {
       ...this.defaultGenerationOptions,
-      ...options
+      ...options,
+      enableCaching: false // caching is already handled here; some models alias to chat, so this avoids caching twice
     }
     const { family } = getModelInfo(model)
     switch (family) {
@@ -102,15 +78,16 @@ class CompletionService {
       case 'palm2': {
         if (!this.palm2ApiKey) throw new Error('PaLM2 API key not set')
         const result = await palm2.requestPalmCompletion(system + '\n' + user, this.palm2ApiKey, model)
-        return saveIfCaching({ text: result })
+        return saveIfCaching([{ text: result, content: result }])
       }
       default:
         throw new Error(`Model '${model}' not supported for completion, available models: ${knownModels.join(', ')}`)
     }
   }
 
-  async _requestStreamingChatOpenAI (model, messages, { maxTokens, stopSequences, temperature, topP }, functions, chunkCb) {
+  async _requestChatCompleteOpenAI (model, messages, { maxTokens, stopSequences, temperature, topP }, functions, chunkCb) {
     if (!this.openaiApiKey) throw new Error('OpenAI API key not set')
+    const guidance = checkGuidance(messages, chunkCb)
     const response = await openai.generateChatCompletionIn(
       model,
       messages.map((entry) => {
@@ -118,7 +95,7 @@ class CompletionService {
         if (msg.role === 'model') msg.role = 'assistant'
         if (msg.role === 'guidance') msg.role = 'assistant'
         return msg
-      }),
+      }).filter((msg) => msg.content),
       {
         apiKey: this.openaiApiKey,
         functions,
@@ -139,23 +116,27 @@ class CompletionService {
         content_filter: 'safety', // an error would be thrown before this
         tool_calls: 'function'
       }[choice.finishReason] ?? 'unknown'
-      return { type: choiceType, isTruncated: choice.finishReason === 'length', ...choice }
+      const content = guidance ? guidance.content + choice.content : choice.content
+      return { type: choiceType, isTruncated: choice.finishReason === 'length', ...choice, content, text: content }
     })
   }
 
-  async _requestStreamingChatGemini (model, messages, { maxTokens, stopSequences, temperature, topP, topK }, functions, chunkCb) {
+  async _requestChatCompleteGemini (model, messages, { maxTokens, stopSequences, temperature, topP, topK }, functions, chunkCb) {
     if (!this.geminiApiKey) throw new Error('Gemini API key not set')
+    // April 2024 - Only Gemini 1.5 supports instructions
+    const supportsSystemInstruction = checkDoesGoogleModelSupportInstructions(model)
+    const guidance = checkGuidance(messages, chunkCb)
     const geminiMessages = messages.map((msg) => {
       const m = structuredClone(msg)
       if (msg.role === 'assistant') m.role = 'model'
-      if (msg.role === 'system') m.role = 'system'
+      if (msg.role === 'system') m.role = supportsSystemInstruction ? 'system' : 'user'
       if (msg.role === 'guidance') m.role = 'model'
       if (msg.content != null) {
         delete m.content
         m.parts = [{ text: msg.content }]
       }
       return m
-    })
+    }).filter((msg) => msg.parts && (msg.parts.length > 0))
     const response = await gemini.generateChatCompletionEx(model, geminiMessages, {
       apiKey: this.geminiApiKey,
       functions,
@@ -170,7 +151,8 @@ class CompletionService {
     if (response.text()) {
       const answer = response.text()
       chunkCb?.({ done: true, delta: '' })
-      const result = { type: 'text', content: answer }
+      const content = guidance ? guidance + answer : answer
+      const result = { type: 'text', content, text: content }
       return [result]
     } else if (response.functionCalls()) {
       const calls = response.functionCalls()
@@ -190,11 +172,30 @@ class CompletionService {
     }
   }
 
-  async requestChatCompletion (model, { messages, functions, generationOptions }, chunkCb) {
+  async requestChatCompletion (model, { messages, functions, enableCaching, generationOptions }, chunkCb) {
+    if (enableCaching) {
+      const cachedResponse = await caching.getCachedResponse(model, messages)
+      if (cachedResponse) {
+        chunkCb?.({ done: false, content: cachedResponse.text })
+        chunkCb?.({ done: true, delta: '' })
+        return cachedResponse
+      }
+    }
+    function saveIfCaching (responses) {
+      for (const response of responses) {
+        if (response && response.content && enableCaching) {
+          caching.addResponseToCache(model, messages, response)
+        }
+      }
+      return responses
+    }
+
     const { family } = getModelInfo(model)
     switch (family) {
-      case 'openai': return this._requestStreamingChatOpenAI(model, messages, { ...this.defaultGenerationOptions, ...generationOptions }, functions, chunkCb)
-      case 'gemini': return this._requestStreamingChatGemini(model, messages, { ...this.defaultGenerationOptions, ...generationOptions }, functions, chunkCb)
+      case 'openai':
+        return saveIfCaching(await this._requestChatCompleteOpenAI(model, messages, { ...this.defaultGenerationOptions, ...generationOptions }, functions, chunkCb))
+      case 'gemini':
+        return saveIfCaching(await this._requestChatCompleteGemini(model, messages, { ...this.defaultGenerationOptions, ...generationOptions }, functions, chunkCb))
       default:
         throw new Error(`Model '${model}' not supported for streaming chat, available models: ${knownModels.join(', ')}`)
     }
diff --git a/src/Flow.js b/src/Flow.js
index 970d0b7..5bc77ed 100644
--- a/src/Flow.js
+++ b/src/Flow.js
@@ -13,7 +13,7 @@ class Flow {
 
   _hash (...args) {
     const hash = crypto.createHash('sha1')
-    args.filter(e => e != null).map(String).forEach(arg => hash.update(arg))
+    hash.update(JSON.stringify(args))
     return hash.digest('hex')
   }
 
@@ -23,35 +23,50 @@ class Flow {
   async _run (details, inherited, runFollowUp, responses) {
     this.lastFlow = details
     this.lastResponses = responses
-    const promptFile = details.prompt || inherited.prompt
-    if (!promptFile) {
-      throw new Error('No prompt provided')
-    }
-    const usingVars = { ...inherited.with, ...details.with }
-    const userPrompt = tools.loadPrompt(promptFile, usingVars)
-    const systemPrompt = details.systemPrompt
-      ? tools.loadPrompt(details.systemPrompt, usingVars)
-      : (inherited.systemPrompt && tools.loadPrompt(inherited.systemPrompt, usingVars))
     const model = details.model || this.defaultModel
+    const usingVars = { ...inherited.with, ...details.with }
+    const prompt = { ...inherited.prompt, ...details.prompt }
+    if (!Object.keys(prompt).length) {
+      throw new Error('No prompt defined')
+    }
+    const messages = []
+
+    if (prompt.text) {
+      const msgs = tools.loadPrompt(prompt.text, usingVars, { roles: prompt.roles })
+      messages.push(...msgs)
+    } else {
+      if (prompt.system) {
+        const system = tools.loadPrompt(prompt.system, usingVars)
+        messages.push({ role: 'system', content: system })
+      }
+      if (prompt.user) {
+        const user = tools.loadPrompt(prompt.user, usingVars)
+        messages.push({ role: 'user', content: user })
+      }
+    }
 
     // This is basically a second layer of caching.
-    const inputHash = this._hash(model, systemPrompt, userPrompt)
+    const inputHash = this._hash(model, messages)
     let resp
     if (runFollowUp && runFollowUp.pastResponses[inputHash]) {
       resp = structuredClone(runFollowUp.pastResponses[inputHash])
     } else {
-      const rs = await this.service.requestCompletion(model, systemPrompt, userPrompt, this.chunkCb, this.generationOpts)
+      const rs = await this.service.requestChatCompletion(model, { messages, generationOptions: this.generationOpts }, this.chunkCb)
       resp = rs[0]
     }
+    if (!resp) {
+      throw new Error('No response from completion service')
+    }
     resp.inputHash = inputHash
     resp.name = details.name
 
     if (details.outputType && details.outputType.codeblock) {
-      const supportedTypes = ['yaml', 'json']
+      const supportedTypes = ['yaml', 'json', 'md']
       if (!supportedTypes.includes(details.outputType.codeblock)) {
-        throw new Error(`Unsupported output type: ${details.outputType.codeblock}`)
+        throw new Error(`Unsupported output type: ${details.outputType.codeblock}. Supported types: ${supportedTypes.join(', ')}`)
       }
       // Abstraction to format/extract the desired format out of the response text, e.g. YAML, JSON, etc.
+      if (!resp.text.includes('```')) throw new Error('No codeblock found in response')
       const [codeblock] = tools.extractCodeblockFromMarkdown(resp.text)
       if (!codeblock) {
         throw new Error('No codeblock found in response')
@@ -60,19 +75,20 @@ class Flow {
         resp.output = yaml.load(codeblock.code)
       } else if (details.outputType.codeblock === 'json') {
         resp.output = JSON.parse(codeblock.code)
+      } else if (details.outputType.codeblock === 'md') {
+        resp.output = codeblock.code
       }
     }
-    responses.push(resp)
 
     const nextInherited = {
       with: usingVars,
-      prompt: promptFile,
-      systemPrompt
+      prompt
     }
 
     if (details.transformResponse) {
       resp = await details.transformResponse(resp)
     }
+    responses.push(resp)
 
     if (runFollowUp && details.followUps[runFollowUp.name]) {
       const f = await details.followUps[runFollowUp.name](resp, runFollowUp.input)
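
The `_hash` change in this file matters once the input is a `messages` array rather than plain strings. The sketch below copies the removed and added hashing lines into two standalone functions (`hashConcat` and `hashJson` are names chosen here for illustration) and compares them on two different message arrays.

```js
const crypto = require('crypto')

// Old approach (removed line): stringify each argument and feed it to the hash
function hashConcat (...args) {
  const hash = crypto.createHash('sha1')
  args.filter(e => e != null).map(String).forEach(arg => hash.update(arg))
  return hash.digest('hex')
}

// New approach (added line): hash a single JSON serialization of all arguments
function hashJson (...args) {
  const hash = crypto.createHash('sha1')
  hash.update(JSON.stringify(args))
  return hash.digest('hex')
}

const a = [{ role: 'user', content: 'Hello' }]
const b = [{ role: 'user', content: 'Goodbye' }]

// String(a) and String(b) are both '[object Object]', so the old hash collides
console.log(hashConcat('gpt-4', a) === hashConcat('gpt-4', b)) // true
console.log(hashJson('gpt-4', a) === hashJson('gpt-4', b)) // false
```

One caveat with the JSON approach: `JSON.stringify` is sensitive to key order, so two messages with the same fields in a different order hash differently.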
diff --git a/src/GoogleAIStudioCompletionService.js b/src/GoogleAIStudioCompletionService.js
index 04a3fd6..63cf03d 100644
--- a/src/GoogleAIStudioCompletionService.js
+++ b/src/GoogleAIStudioCompletionService.js
@@ -1,5 +1,6 @@
 const caching = require('./caching')
 const studioLoader = require('./googleAIStudio')
+const util = require('./util')
 
 const supportedModels = ['gemini-1.0-pro', 'gemini-1.5-pro']
 const modelAliases = {
@@ -47,21 +48,19 @@ class GoogleAIStudioCompletionService {
         return [cachedResponse]
       }
     }
-
     function saveIfCaching (response) {
-      if (response && response.text && options.enableCaching) {
+      if (response && response.content && options.enableCaching) {
         caching.addResponseToCache(model, [system, user], response)
       }
       return response
     }
 
-    const guidance = system?.guidanceText || user?.guidanceText || ''
-    if (guidance) chunkCb?.({ done: false, delta: guidance })
-    const mergedPrompt = [system?.basePrompt || system, user?.basePrompt || user].join('\n')
-    const messages = [{ role: 'user', content: mergedPrompt }]
-    if (guidance) messages.push({ role: 'model', content: guidance })
+    const messages = [{ role: 'user', content: user }]
+    if (system) {
+      messages.unshift({ role: 'system', content: system })
+    }
     const result = await this._studio.generateCompletion(model, messages, chunkCb)
-    let combinedResult = result.text
+    let combinedResult = result.content
     if (options.autoFeed) {
       const until = options.autoFeed.stopLine
       const maxRounds = options.autoFeed.maxRounds || 10
@@ -70,33 +69,37 @@ class GoogleAIStudioCompletionService {
         // Check if the last message is a model message, if not, insert one
         const lastMessage = messages[messages.length - 1]
         if (lastMessage.role !== 'model') {
-          messages.push({ role: 'model', content: result.text })
+          messages.push({ role: 'model', content: result.content })
         } else {
           // Append the result to the last model message
-          lastMessage.content += result.text
+          lastMessage.content += result.content
         }
         for (let i = 0; i < maxRounds; i++) {
           const lastMessage = messages[messages.length - 1]
           const now = await this._studio.generateCompletion(model, messages, chunkCb)
-          lastMessage.content += now.text
-          combinedResult += now.text
-          if (checkContainsStopTokenLine(now.text, until)) {
+          lastMessage.content += now.content
+          combinedResult += now.content
+          if (checkContainsStopTokenLine(now.content, until)) {
             break
           }
         }
       }
     }
     chunkCb?.({ done: true, delta: '\n' })
-    return [saveIfCaching({ text: guidance + combinedResult })]
+    return [saveIfCaching({ type: 'text', text: combinedResult, content: combinedResult })]
   }
 
   async requestChatCompletion (model, { messages, functions, generationOptions }, chunkCb) {
     model = modelAliases[model] || model
-    if (!supportedModels.includes(model)) {
-      throw new Error(`Model ${model} is not supported`)
-    }
+    if (!supportedModels.includes(model)) throw new Error(`Model ${model} is not supported`)
+
+    const guidance = util.checkGuidance(messages, chunkCb)
     const result = await this._studio.requestChatCompletion(model, messages, chunkCb, { ...generationOptions, functions })
     chunkCb?.({ done: true, delta: '\n' })
+    if (result.type === 'text') {
+      const content = guidance ? guidance + result.content : result.content
+      return [{ ...result, content, text: content }]
+    }
     return [result]
   }
 }
diff --git a/src/gemini.js b/src/gemini.js
index 991d54f..e160c9f 100644
--- a/src/gemini.js
+++ b/src/gemini.js
@@ -27,6 +27,7 @@ async function generateChatCompletionEx (model, messages, options, chunkCb) {
   // tools?: Tool[];
   // toolConfig?: ToolConfig;
   // systemInstruction?: Content;
+  messages = mergeDuplicatedRoleMessages(messages)
   const systemMessage = messages.find(m => m.role === 'system')
   const payload = {
     contents: messages.filter(m => m.role !== 'system'),
@@ -130,6 +131,21 @@ async function listModels (apiKey) {
   return response.models
 }
 
+function mergeDuplicatedRoleMessages (messages) {
+  // If 2 consecutive messages have the same role, merge them by appending the later message's text as an extra part.
+  // Otherwise the API can fail with `GoogleGenerativeAIError: [400 Bad Request] Please ensure that multiturn requests ends with a user role or a function response.`
+  const mergedMessages = []
+  for (let i = 0; i < messages.length; i++) {
+    const message = messages[i]
+    if (i > 0 && message.role === messages[i - 1].role) {
+      mergedMessages[mergedMessages.length - 1].parts.push({ text: message.parts[0].text })
+    } else {
+      mergedMessages.push(message)
+    }
+  }
+  return mergedMessages
+}
+
 module.exports = { generateChatCompletionEx, generateChatCompletionIn, generateCompletion, listModels }
 
 /*
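
To make the effect of `mergeDuplicatedRoleMessages` concrete, here is the function from this hunk run on a small input. The input messages are made up for illustration; they mimic the case where a `system` message was remapped to `user` for a model without system instruction support.

```js
// Copied from the hunk above so the example runs on its own
function mergeDuplicatedRoleMessages (messages) {
  const mergedMessages = []
  for (let i = 0; i < messages.length; i++) {
    const message = messages[i]
    if (i > 0 && message.role === messages[i - 1].role) {
      mergedMessages[mergedMessages.length - 1].parts.push({ text: message.parts[0].text })
    } else {
      mergedMessages.push(message)
    }
  }
  return mergedMessages
}

const merged = mergeDuplicatedRoleMessages([
  { role: 'user', parts: [{ text: 'Instructions' }] },
  { role: 'user', parts: [{ text: 'Question' }] },
  { role: 'model', parts: [{ text: '{' }] }
])

console.log(merged.length) // 2
console.log(merged[0].parts) // [ { text: 'Instructions' }, { text: 'Question' } ]
```

Two things to note: only the first part of a merged message is carried over, and the first message of each run is mutated in place because its `parts` array is reused.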
diff --git a/src/googleAIStudio.js b/src/googleAIStudio.js
index 53e912c..07c8886 100644
--- a/src/googleAIStudio.js
+++ b/src/googleAIStudio.js
@@ -16,7 +16,7 @@ function mod () {
   let serverPromise
   let wss
 
-  let throttleTime = 15000
+  let throttleTime = 16000
   let throttle, isBusy
 
   // 1. Run a local server that a local AI Studio client can connect to
@@ -109,7 +109,6 @@ function mod () {
   async function generateCompletion (model, messages, chunkCb, options) {
     await runServer()
     await throttle
-    throttle = sleep(throttleTime)
     if (isBusy) {
       throw new Error('Only one request at a time is supported with AI Studio, please wait for the previous request to finish')
     }
@@ -127,10 +126,12 @@ function mod () {
     }
     serverConnection.off('completionChunk', completionChunk)
     // If the user is using streaming, they won't face any delay getting the response
+    throttle = sleep(throttleTime)
     await throttle
     isBusy = false
     return {
-      text: response.text
+      text: response.text,
+      content: response.text
     }
   }
 
@@ -174,7 +175,7 @@ function mod () {
       prefixedMessages.push({ role: 'model', content: guidanceMessage })
     }
 
-    console.debug('Sending chat completion request to server', model, prefixedMessages)
+    debug('Sending chat completion request to server', model, prefixedMessages)
 
     // const rawResponse = '<FUNCTION_CALL>getWeather({"location":"Beijing"})</FUNCTION_CALL>'
     // return { type: 'function', rawResponse, content: '', fnCalls: [{ name: 'getWeather', args: '{"location":"Beijing"}' }]}
@@ -184,7 +185,7 @@ function mod () {
       ...options,
       stopSequences: stops.concat(options?.stopSequences || [])
     })
-    const text = response.text
+    const text = response.content
     const parts = text.split('<|ASSISTANT|>')
     const result = parts[parts.length - 1].trim()
     const containsFunctionCall = result.includes('<FUNCTION_CALL>')
@@ -204,6 +205,7 @@ function mod () {
     } else {
       return {
         type: 'text',
+        text: result,
         content: result
       }
     }
diff --git a/src/index.d.ts b/src/index.d.ts
index 65d1715..8ec96d0 100644
--- a/src/index.d.ts
+++ b/src/index.d.ts
@@ -1,7 +1,9 @@
-type CompletionResponse = { text: string }
+type CompletionResponse = { content: string, text: string }
 
 declare module 'langxlang' {
   type Model = 'gpt-3.5-turbo-16k' | 'gpt-3.5-turbo' | 'gpt-4' | 'gpt-4-turbo-preview' | 'gemini-1.0-pro' | 'gemini-1.5-pro-latest'
+  type Role = 'system' | 'user' | 'assistant'
+  type Message = { role: Role, content: string }
   type ChunkCb = ({ content: string }) => void
 
   type CompletionOptions = {
@@ -26,11 +28,13 @@ declare module 'langxlang' {
 
     listModels(): Promise<{ openai: Record<string, object>, google: Record<string, object> }>
 
-    // Request a non-streaming completion from the model.
+    // Request a completion from the model with a system prompt and a single user prompt.
     requestCompletion(model: Model, systemPrompt: string, userPrompt: string, _chunkCb?, options?: CompletionOptions & {
       // If true, the response will be cached and returned from the cache if the same request is made again.
       enableCaching?: boolean
     }): Promise<CompletionResponse[]>
+    // Request a completion from the model with a sequence of chat messages which have roles.
+    requestChatCompletion(model: Model, options: { messages: Message[], generationOptions: CompletionOptions }, chunkCb: ChunkCb): Promise<CompletionResponse[]>
   }
 
   // Note: GoogleAIStudioCompletionService does NOT use the official AI Studio API, but instead uses a relay server to forward requests to an AIStudio client.
@@ -61,6 +65,9 @@ declare module 'langxlang' {
       // If true, the response will be cached and returned from the cache if the same request is made again.
       enableCaching?: boolean
     }): Promise<CompletionResponse[]>
+
+    // Request a completion from the model with a sequence of chat messages which have roles.
+    requestChatCompletion(model: Model, options: { messages: Message[], generationOptions: CompletionOptions }, chunkCb: ChunkCb): Promise<CompletionResponse[]>
   }
 
   type SomeCompletionService = CompletionService | GoogleAIStudioCompletionService
@@ -132,15 +139,21 @@ declare module 'langxlang' {
     // gives big chunks over long periods of time unlike OpenAI APIs. Default pipeTo is process.stdout.
     createTypeWriterEffectStream(pipeTo?: NodeJS.WritableStream): (chunk) => void
     // Pre-processes markdown and replaces variables and conditionals with data from `vars`
+    loadPrompt(text: string, vars: Record<string, string>, opts: { roles: Record<string, Role> }): Message[]
     loadPrompt(text: string, vars: Record<string, string>): string
     // Loads a file from disk (from current script's relative path or absolute path) and
     // replaces variables and conditionals with data from `vars`
+    importPromptSync(filePath: string, vars: Record<string, string>, opts: { roles: Record<string, Role> }): Message[]
     importPromptSync(filePath: string, vars: Record<string, string>): string
     // Loads a file from disk (from current script's relative path or absolute path) and
     // replaces variables and conditionals with data from `vars`
+    importPrompt(filePath: string, vars: Record<string, string>, opts: { roles: Record<string, Role> }): Promise<Message[]>
     importPrompt(filePath: string, vars: Record<string, string>): Promise<string>
     // Reads a file from disk and returns the raw contents
     importRawSync(filePath: string): string
+    // Takes in a prompt string and splits it by `roles` into an array of messages that are segmented by role.
+    // The `roles` arg is a record of sequences to role. Default roles are <|SYSTEM|> (system), <|USER|> (user), and <|ASSISTANT|> (assistant).
+    segmentPromptByRoles(prompt: string, roles?: Record<string, string>): { role: Role, content: string }[]
     // Various string manipulation tools to minify/strip down strings
     stripping: {
       stripMarkdown(input: string, options?: StripOptions): string
@@ -161,15 +174,6 @@ declare module 'langxlang' {
   const tools: Tools
   const Func: Func
 
-  // Pre-processes markdown and replaces variables and conditionals with data from `vars`
-  function loadPrompt(text: string, vars: Record<string, string>): string
-  // Loads a file from disk (from current script's relative path or absolute path) and
-  // replaces variables and conditionals with data from `vars`
-  function importPromptSync(filePath: string, vars: Record<string, string>): string
-  // Loads a file from disk (from current script's relative path or absolute path) and
-  // replaces variables and conditionals with data from `vars`
-  function importPrompt(filePath: string, vars: Record<string, string>): Promise<string>
-
   // Flow is a class that can be used to create a chain of prompts and response handling.
   // You can also ask follow up questions somewhere along the chain, and Flow will stop
   // executing the chain and wait for the follow up to be answered.
@@ -194,7 +198,9 @@ declare module 'langxlang' {
 // FLOW
 interface FlowChainObjectBase {
   model: Model
-  prompt: string
+  prompt:
+    | { system: string, user: string }
+    | { text: string, roles: Record<string, Role> }
   with: Record<string, string>
   followUps: Record<string, (resp: CompletionResponse, input: object) => SomeFlowChainObject>
   // The function that transforms the response from the model before it's passed to the followUps/next or returned
diff --git a/src/openai.js b/src/openai.js
index baa95d2..b637ea6 100644
--- a/src/openai.js
+++ b/src/openai.js
@@ -154,7 +154,6 @@ async function generateChatCompletionIn (model, messages, options, chunkCb) {
 async function generateCompletion (model, system, user, options = {}) {
   const messages = [{ role: 'user', content: user }]
   if (system) messages.unshift({ role: 'system', content: system })
-  if (options.guidanceMessage) messages.push({ role: 'assistant', content: options.guidanceMessage })
   const completion = await generateChatCompletionIn(model, messages, options)
   debug('[OpenAI] Completion', JSON.stringify(completion))
   return completion
diff --git a/src/tools.js b/src/tools.js
index 2fc7dd4..5008e82 100644
--- a/src/tools.js
+++ b/src/tools.js
@@ -3,7 +3,8 @@ const viz = require('./tools/viz')
 const xml = require('./tools/xml')
 const yaml = require('./tools/yaml')
 const stripping = require('./tools/stripping')
-const { wrapContentWithSufficientTokens, importPromptRaw, importPromptSync, importPrompt, loadPrompt, preMarkdown } = require('./tools/mdp')
+const mdp = require('./tools/mdp')
+const md = require('./tools/md')
 
 function createTypeWriterEffectStream (to = process.stdout) {
   // Instead of writing everything at once, we want a typewriter effect
@@ -49,13 +50,15 @@ module.exports = {
   concatFilesToMarkdown: codebase.concatFilesToMarkdown,
   createTypeWriterEffectStream,
   extractCodeblockFromMarkdown,
-  wrapContent: wrapContentWithSufficientTokens,
-  preMarkdown,
-  loadPrompt,
-  importPromptRaw,
-  importRawSync: importPromptRaw,
-  importPromptSync,
-  importPrompt,
+  wrapContent: mdp.wrapContentWithSufficientTokens,
+  preMarkdown: mdp.preMarkdown,
+  loadPrompt: mdp.loadPrompt,
+  importPromptRaw: mdp.importPromptRaw,
+  importRawSync: mdp.importPromptRaw,
+  importPromptSync: mdp.importPromptSync,
+  importPrompt: mdp.importPrompt,
+  _segmentPromptByRoles: mdp.segmentByRoles,
+  _parseMarkdown: md.parseMarkdown,
   encodeYAML: yaml.encodeYaml,
   decodeXML: xml.decodeXML
 }
diff --git a/src/tools/md.js b/src/tools/md.js
new file mode 100644
index 0000000..6cb6a9a
--- /dev/null
+++ b/src/tools/md.js
@@ -0,0 +1,109 @@
+const { tokenizeMarkdown } = require('./stripping')
+
+/*
+When tokenizing markdown, this:
+# This is a header
+    some preformat
+
+    preformat code continued!
+
+    even lines with some spaces are OK!
+
+Becomes:
+  ['# This is a header', 'text']
+  ['    some preformat', 'preformat', 'some preformat']
+  ['\n', 'text']
+  ['    preformat code continued!', 'preformat', 'preformat code continued!']
+  ['\n  ', 'text']
+  ['    even lines with some spaces are OK!', 'preformat', 'even lines with some spaces are OK!']
+  ['\n', 'text']
+  Then merged, it becomes:
+  ['# This is a header', 'text']
+  ['    some preformat\n    preformat code continued!\n    even lines with some spaces are OK!', 'preformat', 'some preformat\npreformat code continued!\neven lines with some spaces are OK!']
+*/
+
+/*
+And so here, we want something more structured, like
+[
+  { type: 'text', text: 'Action: request changes <!-- or "approve" or "comment" -->' },
+  { type: 'section', title: 'Comments', children: [
+    { type: 'section', title: 'src/index.js', children: [
+      { type: 'section', title: 'Line', children: [
+        { type: 'text', text: '```js' }
+        { type: 'text', text: 'let aple = 1' }
+        { type: 'text', text: '```' }
+      ] },
+      ...
+    ] }
+  ]}
+]
+*/
+
+function countStart (str, char) {
+  let c = 0
+  for (const s of str) {
+    if (s === char) c++
+    else break
+  }
+  return c
+}
+
+function parseMarkdown (text, options) {
+  const tokens = tokenizeMarkdown(text, {})
+  // pre-process the data a bit
+  const data = []
+  for (const token of tokens) {
+    if (token[1] === 'text') {
+      const text = token[0]
+      for (const line of text.split('\n')) {
+        if (line.startsWith('#')) {
+          const level = countStart(line, '#')
+          data.push({ type: 'header', level, text: line.slice(level).trim() })
+        } else {
+          data.push({ type: 'text', text: line })
+        }
+      }
+    } else if (token[1] === 'code') {
+      data.push({ type: 'code', raw: token[0], lang: token[2], code: token[3] })
+    } else if (token[1] === 'pre') {
+      data.push({ type: 'pre', raw: token[0], lang: token[2], code: token[3] })
+    } else if (token[1] === 'preformat') {
+      data.push({ type: 'preformat', raw: token[0], code: token[2] })
+    } else if (token[1] === 'comment') {
+      data.push({ type: 'comment', raw: token[0] })
+    }
+  }
+
+  const inter = []
+  let current = inter
+  let currentHeader = null
+  for (let i = 0; i < data.length; i++) {
+    const line = data[i]
+    if (line.type === 'header') {
+      if (!currentHeader) {
+        currentHeader = { type: 'section', level: line.level, title: line.text, children: [], parent: null }
+        current.push(currentHeader)
+        current = currentHeader.children
+      } else {
+        while (currentHeader && currentHeader.level >= line.level) {
+          currentHeader = currentHeader.parent
+          current = currentHeader ? currentHeader.children : inter
+        }
+        const newHeader = { type: 'section', level: line.level, title: line.text, children: [], parent: currentHeader }
+        current.push(newHeader)
+        current = newHeader.children
+        currentHeader = newHeader
+      }
+    } else {
+      current.push(line)
+    }
+  }
+  // remove circular references
+  const structured = JSON.parse(JSON.stringify(inter, (key, value) => key === 'parent' ? undefined : value, 2))
+  return {
+    lines: data,
+    structured
+  }
+}
+
+module.exports = { parseMarkdown }
diff --git a/src/tools/mdp.js b/src/tools/mdp.js
index 6f9b76c..6e3cdc8 100644
--- a/src/tools/mdp.js
+++ b/src/tools/mdp.js
@@ -1,16 +1,12 @@
 const fs = require('fs')
 const { join, dirname } = require('path')
 const getCaller = require('caller')
-const { normalizeLineEndings } = require('./stripping')
+const { stripMdpComments, normalizeLineEndings } = require('./stripping')
 
 // See doc/MarkdownPreprocessing.md for more information
 
 class PromptString extends String {
-  constructor (str, basePrompt, guidanceText) {
-    super(str)
-    this.basePrompt = basePrompt
-    this.guidanceText = guidanceText
-  }
+
 }
 
 function preMarkdown (text, vars = {}) {
@@ -24,6 +20,8 @@ function preMarkdown (text, vars = {}) {
   let tokens = []
   let temp = ''
   let result = text
+  // First, strip out all the MDP comments (<!--- ... -->)
+  result = stripMdpComments(text)
 
   // Handle conditional insertions first
   const TOKEN_COND_START = '%%%['
@@ -222,6 +220,35 @@ function preMarkdown (text, vars = {}) {
   return result
 }
 
+const DEFAULT_ROLES = {
+  '<|SYSTEM|>': 'system',
+  '<|USER|>': 'user',
+  '<|ASSISTANT|>': 'assistant'
+}
+
+function segmentByRoles (text, roles) {
+  // split the text into segments based on the roles
+  const segments = []
+  for (let i = 0; i < text.length; i++) {
+    for (const role in roles) {
+      if (text.slice(i, i + role.length) === role) {
+        segments.push({ role, start: i, end: i + role.length })
+        i += role.length - 1 // the loop's i++ then lands on the first character after the marker
+        break
+      }
+    }
+  }
+  // now we can extract the text from each segment
+  const result = []
+  for (let i = 0; i < segments.length; i++) {
+    const segment = segments[i]
+    const nextSegment = segments[i + 1]
+    const roleText = text.slice(segment.end, nextSegment ? nextSegment.start : text.length)
+    result.push({ role: roles[segment.role], content: roleText.trim() })
+  }
+  return result.filter(x => x.content.trim() !== '')
+}
+
 // Wraps the contents by using the specified token character at least 3 times,
 // ensuring that the token is long enough that it's not present in the content
 function wrapContentWithSufficientTokens (content, token = '`', initialTokenSuffix = '') {
@@ -237,17 +264,20 @@ function wrapContentWithSufficientTokens (content, token = '`', initialTokenSuff
   return lines
 }
 
-function loadPrompt (text, vars) {
-  // Prevent user data from affecting the guidance token by using a intermediate random string
-  const TOKEN_GUIDANCE_START = '%%%$GUIDANCE_START$%%%'
-  const tokenGuidanceStart = `%%%$GUIDANCE_START${Math.random()}$%%%`
-  text = text.replaceAll(TOKEN_GUIDANCE_START, tokenGuidanceStart)
+function loadPrompt (text, vars, options = {}) {
+  const newRoles = {}
+  if (options.roles) {
+    // Suffix each role marker with a random number so marker text inside user-supplied variables can't create role boundaries
+    const roles = options.roles === true ? DEFAULT_ROLES : options.roles
+    for (const role in roles) {
+      const newRole = role + Math.random()
+      newRoles[newRole] = roles[role]
+      text = text.replaceAll(role, newRole)
+    }
+  }
   const str = preMarkdown(text.replaceAll('\r\n', '\n'), vars)
-  const guidanceText = str.indexOf(tokenGuidanceStart)
-  if (guidanceText !== -1) {
-    const [basePrompt, guidanceText] = str.split(tokenGuidanceStart)
-    const newStr = str.replace(tokenGuidanceStart, '')
-    return new PromptString(newStr, basePrompt, guidanceText)
+  if (options.roles) {
+    return segmentByRoles(str, newRoles)
   } else {
     return new PromptString(str)
   }
@@ -278,14 +308,14 @@ function importPromptRaw (path) {
   return data.replaceAll('\r\n', '\n')
 }
 
-function importPromptSync (path, vars) {
+function importPromptSync (path, vars, opts) {
   const data = path.startsWith('.')
     ? readSync(path, getCaller(1))
     : readSync(path)
-  return loadPrompt(data, vars)
+  return loadPrompt(data, vars, opts)
 }
 
-async function importPrompt (path, vars) {
+async function importPrompt (path, vars, opts) {
   let fullPath = path
   if (path.startsWith('.')) {
     const caller = getCaller(1)
@@ -295,7 +325,7 @@ async function importPrompt (path, vars) {
   }
   try {
     const text = await fs.promises.readFile(fullPath, 'utf-8')
-    return loadPrompt(text, vars)
+    return loadPrompt(text, vars, opts)
   } catch (e) {
     if (!path.startsWith('.')) {
       throw new Error(`Failed to load prompt at specified path '${path}'. If you want to load a prompt relative to your script's current directory, you need to pass a relative path starting with './'`)
@@ -304,4 +334,4 @@ async function importPrompt (path, vars) {
   }
 }
 
-module.exports = { preMarkdown, wrapContentWithSufficientTokens, loadPrompt, importPromptSync, importPrompt, importPromptRaw }
+module.exports = { preMarkdown, wrapContentWithSufficientTokens, segmentByRoles, loadPrompt, importPromptSync, importPrompt, importPromptRaw }
diff --git a/src/tools/stripping.js b/src/tools/stripping.js
index 7d4919c..0c4b097 100644
--- a/src/tools/stripping.js
+++ b/src/tools/stripping.js
@@ -10,6 +10,28 @@ function normalizeLineEndings (str) {
   return str.replace(/\r\n/g, '\n')
 }
 
+function count (str, char) {
+  let c = 0
+  for (const s of str) {
+    if (s === char) c++
+  }
+  return c
+}
+function countStart (str, char) {
+  let c = 0
+  for (const s of str) {
+    if (s === char) c++
+    else break
+  }
+  return c
+}
+function stripXmlComments (text) {
+  return text.replace(/<!--[\s\S]*?-->/g, '')
+}
+function stripMdpComments (text) {
+  return text.replace(/<!---[\s\S]*?-->/g, '')
+}
+
 function stripJava (code, options) {
   // First, we need to "tokenize" the code, by splitting it into 3 types of data: comments, strings, and code.
   const tokens = []
@@ -426,40 +448,101 @@ function strOnlyContainsCharExcludingWhitespace (str, char) {
 }
 
 function tokenizeMarkdown (comment, options) {
-  // console.log('Tokenize', comment)
   const tokens = []
   let tokenSoFar = ''
   let inCodeBlock = false
   let inCodeLang
+  let inPreTag = false
+  let linePadding = 0
   for (let i = 0; i < comment.length; i++) {
     const currentChar = comment[i]
+    const lastChar = comment[i - 1]
     const slice = comment.slice(i)
-    if (inCodeBlock) {
-      // TODO: Check markdown spec -- does \n have to proceed the code block end?
-      // Because LLMs can't backtrack to escape a codeblock with more backticks after it's started
-      // writing, we need to check \n before closing block to make sure it's actually the end
-      if (slice.startsWith('\n' + inCodeBlock)) {
-        const code = tokenSoFar.slice(inCodeBlock.length + inCodeLang.length + 1) // +1 for the newline
-        tokens.push([tokenSoFar + '\n' + inCodeBlock, 'code', inCodeLang, code + '\n'])
-        i += inCodeBlock.length
+    if (lastChar === '\n') {
+      linePadding = countStart(slice.replace(/\t/g, '    '), ' ')
+    }
+    if (inPreTag) {
+      if (slice.startsWith('</pre>')) {
+        tokens.push([tokenSoFar + '</pre>', 'pre'])
+        i += 5
+        inPreTag = false
+        tokenSoFar = ''
+      } else {
+        tokenSoFar += currentChar
+      }
+    } else if (inCodeBlock) {
+      // This handles backticks closing code blocks. It's tricky as the markdown spec isn't clear on this.
+      // On top of that, once LLMs start generating text (for example with 3 backticks), they can't backtrack
+      // and add more backticks to the enclosing code block to avoid escaping problems. This means we cannot
+      // identify where a code block starts and ends just by counting backticks; we must also check that the
+      // opening and closing fence lines have the same indentation. This seems to work well and also handles
+      // indented fences, for example in a paragraph with 1-3 spaces of indent (4+ would be a preformat block).
+      if (slice.startsWith(inCodeBlock.tag) && (inCodeBlock.padding === linePadding)) {
+        const code = tokenSoFar.slice(inCodeBlock.tag.length + inCodeLang.length + 1) // +1 for the newline after ```
+        tokens.push([tokenSoFar + inCodeBlock.tag, 'code', inCodeLang, code])
+        i += inCodeBlock.tag.length
         inCodeBlock = false
         tokenSoFar = ''
       } else {
         tokenSoFar += currentChar
       }
     } else {
+      if (lastChar === '\n' && slice.startsWith('    ')) {
+        // Handle indented (4-space) preformatted text blocks.
+        // This is a bit tricky as we need to check if the last line is empty or a markdown header before
+        // we can allow a preformatted block to start. Also, multiple subsequent preformatted blocks should
+        // be concatenated, or even text blocks if they are empty, so we have to concat afterwards in postproc.
+        const lastLine = tokenSoFar.slice(0, -1).split('\n').pop()
+        if (lastLine.trim() === '' || lastLine.startsWith('#')) {
+          // 4-space code block for this whole line
+          tokens.push([tokenSoFar, 'text'])
+          tokenSoFar = ''
+          let lineEnd = slice.indexOf('\n')
+          if (lineEnd === -1) lineEnd = slice.length
+          const raw = slice.slice(0, lineEnd + 1)
+          const code = slice.slice(4, lineEnd)
+          tokens.push([raw, 'preformat', code])
+          i += lineEnd
+          continue
+        }
+      }
+      if (slice.startsWith('<!--')) { // Comment
+        const end = slice.indexOf('-->')
+        if (end === -1) {
+          if (options.allowMalformed) {
+            tokens.push([tokenSoFar, 'text'])
+            tokens.push([slice, 'comment'])
+            break
+          } else {
+            throw new Error('Unmatched markdown comment')
+          }
+        } else {
+          tokens.push([tokenSoFar, 'text'])
+          tokens.push([slice.slice(0, end + 3), 'comment'])
+          i += end + 2
+          tokenSoFar = ''
+        }
+        continue
+      }
+      const preMatch = slice.match(/^<pre>/)
       const codeMatch = slice.match(/^([`]{3,})([a-zA-Z]*)\n/)
       if (codeMatch) {
         tokens.push([tokenSoFar, 'text'])
-        inCodeBlock = codeMatch[1]
+        inCodeBlock = { tag: codeMatch[1], padding: linePadding }
         inCodeLang = codeMatch[2]
         tokenSoFar = codeMatch[0]
         i += tokenSoFar.length - 1
+      } else if (preMatch) {
+        tokens.push([tokenSoFar, 'text'])
+        inPreTag = true
+        tokenSoFar = preMatch[0]
+        i += tokenSoFar.length - 1
       } else {
         tokenSoFar += currentChar
       }
     }
   }
+
   if (inCodeBlock) {
     if (options.allowMalformed) {
       tokens.push([tokenSoFar, 'text'])
@@ -468,7 +551,35 @@ function tokenizeMarkdown (comment, options) {
     }
   }
   tokens.push([tokenSoFar, 'text'])
-  return tokens
+
+  // Now we need to merge adjacent preformatted blocks or preformatted blocks with spacing text blocks between
+  const updated = []
+  for (let i = 0; i < tokens.length; i++) {
+    const token = tokens[i]
+    if (token[1] === 'preformat') {
+      let intermediateEmptyLines = ''
+      for (let j = i + 1; j < tokens.length; j++) {
+        const nextToken = tokens[j]
+        if (nextToken[1] === 'preformat') {
+          if (intermediateEmptyLines) {
+            const lineCount = count(intermediateEmptyLines, '\n')
+            token[0] += intermediateEmptyLines
+            token[2] += '\n'.repeat(lineCount)
+            intermediateEmptyLines = ''
+          }
+          token[0] += nextToken[0]
+          token[2] += '\n' + nextToken[2]
+          i = j
+        } else if (nextToken[1] === 'text' && nextToken[0].trim() === '') {
+          intermediateEmptyLines += nextToken[0]
+        } else {
+          break
+        }
+      }
+    }
+    updated.push(token)
+  }
+  return updated
 }
 
 // This mainly erases extraneous new lines outside of code blocks, including ones with empty block quotes
@@ -541,4 +652,4 @@ function stripDiff (diff, options = {}) {
   return result.join('\n')
 }
 
-module.exports = { stripJava, stripPHP, stripGo, stripMarkdown, stripDiff, removeNonAscii, normalizeLineEndings, tokenizeMarkdown }
+module.exports = { stripJava, stripPHP, stripGo, stripMarkdown, stripDiff, removeNonAscii, normalizeLineEndings, tokenizeMarkdown, stripXmlComments, stripMdpComments }
diff --git a/src/tools/viz.js b/src/tools/viz.js
index 4d19bc7..1acec9c 100644
--- a/src/tools/viz.js
+++ b/src/tools/viz.js
@@ -85,15 +85,15 @@ async function makeVizForPrompt (system, user, models, options) {
     if (modelInfo.author === 'googleaistudio') {
       aiStudioService ??= new GoogleAIStudioCompletionService(options?.aiStudioPort)
       await aiStudioService.ready
-      const { text } = await aiStudioService.requestCompletion(model, system, user)
+      const [response] = await aiStudioService.requestCompletion(model, system, user)
       data.models.push([modelInfo.displayName, modelInfo.safeId])
-      data.outputs[modelInfo.safeId] = text
+      data.outputs[modelInfo.safeId] = response.text
       continue
     }
 
-    const { text } = await service.requestCompletion(model, system, user)
+    const [response] = await service.requestCompletion(model, system, user)
     data.models.push([modelInfo.displayName, modelInfo.safeId])
-    data.outputs[modelInfo.safeId] = text
+    data.outputs[modelInfo.safeId] = response.text
   }
 
   aiStudioService?.stop()
diff --git a/src/util.js b/src/util.js
index 210cf8e..fc109cb 100644
--- a/src/util.js
+++ b/src/util.js
@@ -52,4 +52,20 @@ async function sleep (ms) {
   })
 }
 
-module.exports = { sleep, cleanMessage, getModelInfo, getRateLimit, checkDoesGoogleModelSupportInstructions, knownModelInfo, knownModels }
+function checkGuidance (messages, chunkCb) {
+  const guidance = messages.filter((msg) => msg.role === 'guidance')
+  if (guidance.length > 1) {
+    throw new Error('Only one guidance message is supported')
+  } else if (guidance.length) {
+    // ensure it's the last message
+    const lastMsg = messages[messages.length - 1]
+    if (lastMsg !== guidance[0]) {
+      throw new Error('Guidance message must be the last message')
+    }
+    chunkCb?.({ done: false, content: guidance[0].content })
+    return guidance[0].content
+  }
+  return ''
+}
+
+module.exports = { sleep, cleanMessage, getModelInfo, getRateLimit, checkDoesGoogleModelSupportInstructions, checkGuidance, knownModelInfo, knownModels }
diff --git a/test/flow.test.js b/test/flow.test.js
index 0a727e1..9cc3a60 100644
--- a/test/flow.test.js
+++ b/test/flow.test.js
@@ -2,8 +2,7 @@
 const assert = require('assert')
 const { Flow } = require('langxlang')
 
-const chain = (params) => ({
-  prompt: `
+const promptText = `
 <USER>
 Hello, how are you doing today on this %%%(DAY_OF_WEEK)%%%?
 <ASSISTANT>
@@ -22,7 +21,16 @@ Hello, how are you doing today on this %%%(DAY_OF_WEEK)%%%?
   ~~~
   <ASSISTANT>
 %%%endif
-`.trim().replaceAll('~~~', '```'),
+`.trim().replaceAll('~~~', '```')
+
+const chain = (params) => ({
+  prompt: {
+    text: promptText,
+    roles: {
+      '<USER>': 'user',
+      '<ASSISTANT>': 'assistant'
+    }
+  },
   with: {
     DAY_OF_WEEK: params.dayOfWeek
   },
@@ -44,9 +52,10 @@ Hello, how are you doing today on this %%%(DAY_OF_WEEK)%%%?
 })
 
 const dummyCompletionService = {
-  requestCompletion: async (model, systemPrompt, userPrompt, chunkCb, generationOpts) => {
-    const mergedPrompt = [systemPrompt, userPrompt].join('\n')
-    // console.log('mergedPrompt', mergedPrompt)
+  requestChatCompletion: async (model, { messages, generationOpts }, chunkCb) => {
+    assert(messages.every(msg => ['user', 'assistant'].includes(msg.role)))
+    const mergedPrompt = messages.map(m => m.content).join('\n')
+    // console.log('mergedPrompt', [mergedPrompt], messages)
     if (mergedPrompt.includes('YAML')) {
       return [{ text: 'Sure! Here is some yaml:\n```yaml\nare_ok: yes\n```\nI hope that helps!' }]
     } else if (mergedPrompt.includes('tomorrow')) {
diff --git a/test/testPromptRoles.md b/test/testPromptRoles.md
new file mode 100644
index 0000000..f5bc4bd
--- /dev/null
+++ b/test/testPromptRoles.md
@@ -0,0 +1,10 @@
+<|SYSTEM|>
+Respond to the user like a pirate.
+<|USER|>
+How are you today?
+<|ASSISTANT|>
+Arrr, I be doin' well, matey! How can I help ye today?
+<|USER|>
+What is the weather like?
+<|ASSISTANT|>
+Arrr, the weather be fair and mild, matey. Ye be safe to set sail!
\ No newline at end of file
diff --git a/test/testprompt.md b/test/testprompt.md
index 0a4b3f4..a3f53e2 100644
--- a/test/testprompt.md
+++ b/test/testprompt.md
@@ -5,4 +5,4 @@ You are running in Google AI Studio.
 %%%else
 You are running via API.
 %%%endif
-Done!
+Done!<!---comment!-->
diff --git a/test/tooling.test.js b/test/tooling.test.js
index d8bd031..deeea8a 100644
--- a/test/tooling.test.js
+++ b/test/tooling.test.js
@@ -29,6 +29,24 @@ describe('Basic tests', () => {
     // fs.writeFileSync('__encoded.yaml', encoded)
     yaml.load(encoded)
   })
+
+  it('md parsing works', function () {
+    const parsed = tools._parseMarkdown(testMd)
+    const json = JSON.stringify(parsed)
+    // console.log('Parsed', json)
+    assert.strictEqual(json, expectedMd)
+  })
+
+  it('mdp role processing', function () {
+    const prompt = tools.importRawSync('./testPromptRoles.md')
+    const messages = tools._segmentPromptByRoles(prompt, {
+      '<|SYSTEM|>': 'system',
+      '<|USER|>': 'user',
+      '<|ASSISTANT|>': 'assistant'
+    }) // same mapping as the default roles
+    console.log('Messages', JSON.stringify(messages))
+    assert.strictEqual(JSON.stringify(messages), '[{"role":"system","content":"Respond to the user like a pirate."},{"role":"user","content":"How are you today?"},{"role":"assistant","content":"Arrr, I be doin\' well, matey! How can I help ye today?"},{"role":"user","content":"What is the weather like?"},{"role":"assistant","content":"Arrr, the weather be fair and mild, matey. Ye be safe to set sail!"}]')
+  })
 })
 
 describe('stripping', function () {
@@ -129,3 +147,28 @@ const mineflayer3213 = "- [X] The [FAQ](https://github.com/PrismarineJS/mineflay
 
 const expectedStrip33 = 'Are you saying the automatic doc deploy script is broken? It seems to be\nworking to me. Please explain what has changed in the source and what\nshould be different on the html page.'
 const expectedStrip3213 = "- [X] The [FAQ](https://github.com/PrismarineJS/mineflayer/blob/master/docs/FAQ.md) doesn't contain a resolution to my issue \n## Versions\n - mineflayer: 4.14.0\n - server: vanilla 1.20.1\n - node: 20.8.0\n## Detailed description of a problem\nI was trying to get the bot to the middle of a water pool, by walking it over a boat, and it wasn't moving. I eventually teleported the bot to where I wanted it and it started walking backwards.\nThe physics of the bot doesn't support walking on to boats, and instead treats them as walls.\nhttps://github.com/PrismarineJS/mineflayer/assets/55368789/cb8ab56c-939f-483a-9692-e078ec6366ce\n## Your current code\n```js\n// note: replaced chat handling with the code I used in the video\nconst mineflayer = require('mineflayer')\nconst bot = mineflayer.createBot({\n  host: 'localhost',\n  port: 44741,\n  username: 'SentryBuster',\n  auth: 'offline'\n})\nbot.once('login',()=>{\n  bot.settings.skinParts.showCape = false\n  var flipped = false;\n  setInterval(()=>{\n    let skinparts = bot.settings.skinParts;\n    // skinparts.showJacket = flipped;\n    // skinparts.showLeftSleeve = flipped;\n    // skinparts.showRightSleeve = flipped;\n    // skinparts.showLeftPants = flipped;\n    // skinparts.showRightPants = flipped;\n    skinparts.showHat = flipped;\n    flipped=!flipped\n    bot.setSettings(bot.settings);\n  },250)\n})\nbot.on('spawn',()=>{\n  bot.setControlState('forward',true);\n  setTimeout({\n    bot.setControlState('forward',false);\n    bot.setControlState('back',true);\n  },250)\n})\n```\n## Expected behavior\nI expect the bot to walk onto the boat, just like it does for slabs and stairs.\n## Additional context\nI searched the issues and found a similar issue (#228) about the bot having issues with block collision boxes, but no mention of entities.\nI definitely could modify the bot to jump, and it was only for the bot to get to the middle of the water pool, but I feel like it'd be good to make an 
issue about this, if anyone else encounters it, or if it becomes a bigger problem in the future. This is an edge case and may not be worth fixing."
+
+const testMd = `
+Action: request changes <!-- or "approve" or "comment" -->
+
+# Comments
+## src/index.js
+This is a file!
+### Line
+    let aple = 1
+### Comment
+It seems like you misspelled 'apple' here. It should be 'apple'.
+## src/index.js
+### Line
+    // Export the function
+
+  
+    module.exports = {}
+### Comment
+It seems like you forgot to export a function from this module! Here's my suggested correction:
+~~~suggestion
+// Export the function
+module.exports = { myFunction }
+~~~
+`.trim().replaceAll('~~~', '```')
+const expectedMd = JSON.stringify({"lines":[{"type":"text","text":"Action: request changes "},{"type":"comment","raw":"<!-- or \"approve\" or \"comment\" -->"},{"type":"text","text":""},{"type":"text","text":""},{"type":"header","level":1,"text":"Comments"},{"type":"header","level":2,"text":"src/index.js"},{"type":"text","text":"This is a file!"},{"type":"header","level":3,"text":"Line"},{"type":"text","text":""},{"type":"preformat","raw":"    let aple = 1\n","code":"let aple = 1"},{"type":"header","level":3,"text":"Comment"},{"type":"text","text":"It seems like you misspelled 'apple' here. It should be 'apple'."},{"type":"header","level":2,"text":"src/index.js"},{"type":"header","level":3,"text":"Line"},{"type":"text","text":""},{"type":"preformat","raw":"    // Export the function\n\n  \n    module.exports = {}\n","code":"// Export the function\n\n\nmodule.exports = {}"},{"type":"header","level":3,"text":"Comment"},{"type":"text","text":"It seems like you forgot to export a function from this module! Here's my suggested correction:"},{"type":"text","text":""},{"type":"code","raw":"```suggestion\n// Export the function\nmodule.exports = { myFunction }\n```","lang":"suggestion","code":"// Export the function\nmodule.exports = { myFunction }\n"},{"type":"text","text":""}],"structured":[{"type":"text","text":"Action: request changes "},{"type":"comment","raw":"<!-- or \"approve\" or \"comment\" -->"},{"type":"text","text":""},{"type":"text","text":""},{"type":"section","level":1,"title":"Comments","children":[{"type":"section","level":2,"title":"src/index.js","children":[{"type":"text","text":"This is a file!"},{"type":"section","level":3,"title":"Line","children":[{"type":"text","text":""},{"type":"preformat","raw":"    let aple = 1\n","code":"let aple = 1"}]},{"type":"section","level":3,"title":"Comment","children":[{"type":"text","text":"It seems like you misspelled 'apple' here. It should be 'apple'."}]}]},{"type":"section","level":2,"title":"src/index.js","children":[{"type":"section","level":3,"title":"Line","children":[{"type":"text","text":""},{"type":"preformat","raw":"    // Export the function\n\n  \n    module.exports = {}\n","code":"// Export the function\n\n\nmodule.exports = {}"}]},{"type":"section","level":3,"title":"Comment","children":[{"type":"text","text":"It seems like you forgot to export a function from this module! Here's my suggested correction:"},{"type":"text","text":""},{"type":"code","raw":"```suggestion\n// Export the function\nmodule.exports = { myFunction }\n```","lang":"suggestion","code":"// Export the function\nmodule.exports = { myFunction }\n"},{"type":"text","text":""}]}]}]}]}) // eslint-disable-line