diff --git a/docs/reference/openai_text_generation.md b/docs/reference/openai_text_generation.md
index 5845545fa..5eb1c8c1d 100644
--- a/docs/reference/openai_text_generation.md
+++ b/docs/reference/openai_text_generation.md
@@ -1,55 +1,89 @@
 # Generate text with the OpenAI API
 
-Outlines is focused on 🔓 models, but includes an OpenAI integration nevertheless. You can instantiate a model very simply by calling the [outlines.models.openai][] function, with either a chat or non chat model:
+Outlines supports models available via the OpenAI Chat API, e.g. ChatGPT and GPT-4. The following models can be used with Outlines:
 
 ```python
 from outlines import models
 
-model = models.openai("text-davinci-003")
-model = models.openai("gpt4")
+model = models.openai("gpt-3.5-turbo")
+model = models.openai("gpt-4")
 print(type(model))
-# OpenAIAPI
+# OpenAI
 ```
 
-!!! note
+It is possible to pass a system message to the model when initializing it:
+
+```python
+from outlines import models
+
+model = models.openai("gpt-4", system_prompt="You are a useful assistant")
+```
+
+This message will be used for every subsequent use of the model.
+
+## Usage
 
-    It is currently not possible to pass a system message to the model. If that is something you need, please [open an Issue](https://github.com/outlines-dev/outlines/issues) or, better, [submit a Pull Request](https://github.com/outlines-dev/outlines/pulls).
+### Call the model
 
-The OpenAI integration supports the following features:
+OpenAI models can be directly called with a prompt:
 
-- The ability to stop the generation when a specified sequence is found [🔗](#stop-when-a-sequence-is-found)
-- The ability to choose between different choices [🔗](#multiple-choices)
-- Vectorization, i.e. the ability to pass an array of prompts and execute all requests concurrently [🔗](#vectorized-calls)
+```python
+from outlines import models
+
+model = models.openai("gpt-3.5-turbo")
+result = model("Say something", temperature=0, samples=2)
+```
+
+!!! warning
 
-## Stop when a sequence is found
+    This syntax will soon be deprecated; you will then be able to generate text with OpenAI models using the same syntax as for open-source models.
+
+### Stop when a sequence is found
 
 The OpenAI API tends to be chatty and it can be useful to stop the generation once a given sequence has been found, instead of paying for the extra tokens and needing to post-process the output. For instance if you only to generate a single sentence:
 
 ```python
 from outlines import models
 
-model = models.openai("text-davinci-003")
+model = models.openai("gpt-4")
 response = model("Write a sentence", stop_at=['.'])
 ```
 
-## Multiple choices
+### Choose between multiple choices
 
-It can be difficult to deal with a classification problem with the OpenAI API. However well you prompt the model, chances are you are going to have to post-process the output anyway. Sometimes the model will even make up choices. Outlines allows you to *guarantee* that the output of the model will be within a set of choices you specify:
+It can be difficult to deal with a classification problem with the OpenAI API. However well you prompt the model, chances are you are going to have to post-process the output anyway. Sometimes the model will even make up choices. Outlines allows you to *guarantee* that the output of the model will be within a set of choices:
 
 ```python
 from outlines import models
 
-prompt = """
-Review: The OpenAI API is very limited. It does not allow me to do guided generation properly.
-
-Question: What is the overall sentiment of this review?
-Answer:
-"""
+model = models.openai("gpt-3.5-turbo")
+result = model.generate_choice("Red or blue?", ["red", "blue"])
+```
 
-model = models.openai("text-davinci-003")
-response = model(prompt, is_in=['Positive', 'Negative'])
+!!! warning
+
+    This syntax will soon be deprecated; you will then be able to generate text with OpenAI models using the same syntax as for open-source models.
+
+## Monitoring API use
+
+It is important to track your API usage when working with OpenAI's API. The number of prompt tokens and completion tokens is directly accessible via the model instance:
+
+```python
+from outlines import models
+
+model = models.openai("gpt-4")
+
+print(model.prompt_tokens)
+# 0
+
+print(model.completion_tokens)
+# 0
 ```
 
+These numbers are updated every time you call the model.
+
+
 ## Vectorized calls
 
 A unique feature of Outlines is that calls to the OpenAI API are *vectorized* (In the [NumPy sense](https://numpy.org/doc/stable/reference/generated/numpy.vectorize.html) of the word). In plain English this means that you can call an Openai model with an array of prompts with arbitrary shape to an OpenAI model and it will return an array of answers. All calls are executed concurrently, which means this takes roughly the same time as calling the model with a single prompt:
@@ -165,3 +199,21 @@ You may find this useful, e.g., to implement [Tree of Thoughts](https://arxiv.or
 !!! note
 
     Outlines provides an `@outlines.vectorize` decorator that you can use on any `async` python function. This can be useful for instance when you call a remote API within your workflow.
+
+
+## Advanced usage
+
+It is possible to specify the values for `seed`, `presence_penalty`, `frequency_penalty`, `top_p` by passing an instance of `OpenAIConfig` when initializing the model:
+
+```python
+from outlines.models.openai import OpenAIConfig
+from outlines import models
+
+config = OpenAIConfig(
+    presence_penalty=1.,
+    frequency_penalty=1.,
+    top_p=.95,
+    seed=0,
+)
+model = models.openai("gpt-4", config=config)
+```
diff --git a/outlines/models/openai.py b/outlines/models/openai.py
index 7d7ac61c8..45509a9f4 100644
--- a/outlines/models/openai.py
+++ b/outlines/models/openai.py
@@ -57,7 +57,7 @@ class OpenAIConfig:
 
     """
 
-    model: str
+    model: str = ""
     frequency_penalty: float = 0
     logit_bias: Dict[int, int] = field(default_factory=dict)
     max_tokens: Optional[int] = None
@@ -79,6 +79,8 @@ def __init__(
         model_name: str,
         api_key: Optional[str] = None,
         max_retries: int = 6,
+        timeout: Optional[float] = None,
+        system_prompt: Optional[str] = None,
         config: Optional[OpenAIConfig] = None,
     ):
         """Create an `OpenAI` instance.
@@ -93,6 +95,10 @@ def __init__(
            `openai.api_key`.
         max_retries
             The maximum number of retries when calls to the API fail.
+        timeout
+            Duration after which the request times out.
+        system_prompt
+            The content of the system message that precedes the user's prompt.
         config
             An instance of `OpenAIConfig`. Can be useful to specify some
             parameters that cannot be set by calling this class' methods.
@@ -120,7 +126,16 @@ def __init__(
         else:
             self.config = OpenAIConfig(model=model_name)
 
-        self.client = openai.AsyncOpenAI(api_key=api_key, max_retries=max_retries)
+        self.client = openai.AsyncOpenAI(
+            api_key=api_key, max_retries=max_retries, timeout=timeout
+        )
+        self.system_prompt = system_prompt
+
+        # We count the total number of prompt and generated tokens as returned
+        # by the OpenAI API, summed over all the requests performed with this
+        # model instance.
+        self.prompt_tokens = 0
+        self.completion_tokens = 0
 
     def __call__(
         self,
@@ -158,7 +173,13 @@ def __call__(
             )
         )
         if "gpt-" in self.config.model:
-            return generate_chat(prompt, self.client, config)
+            response, usage = generate_chat(
+                prompt, self.system_prompt, self.client, config
+            )
+            self.prompt_tokens += usage["prompt_tokens"]
+            self.completion_tokens += usage["completion_tokens"]
+
+            return response
 
     def generate_choice(
         self, prompt: str, choices: List[str], max_tokens: Optional[int] = None
@@ -210,7 +231,13 @@ def generate_choice(
                 break
 
             config = replace(config, logit_bias=mask, max_tokens=max_tokens_left)
-            response = generate_chat(prompt, self.client, config)
+
+            response, usage = generate_chat(
+                prompt, self.system_prompt, self.client, config
+            )
+            self.completion_tokens += usage["completion_tokens"]
+            self.prompt_tokens += usage["prompt_tokens"]
+
             encoded_response = tokenizer.encode(response)
 
             if encoded_response in encoded_choices_left:
@@ -255,22 +282,46 @@ def __repr__(self):
 
 
 @cache(ignore="client")
-@functools.partial(outlines.vectorize, signature="(),(),()->(s)")
+@functools.partial(outlines.vectorize, signature="(),(),(),()->(s),()")
 async def generate_chat(
-    prompt: str, client: "AsyncOpenAI", config: OpenAIConfig
-) -> np.ndarray:
+    prompt: str,
+    system_prompt: Union[str, None],
+    client: "AsyncOpenAI",
+    config: OpenAIConfig,
+) -> Tuple[np.ndarray, Dict]:
+    """Call OpenAI's Chat Completion API.
+
+    Parameters
+    ----------
+    prompt
+        The prompt we use to start the generation. Passed to the model
+        with the "user" role.
+    system_prompt
+        The system prompt, passed to the model with the "system" role
+        before the prompt.
+    client
+        The API client
+    config
+        An `OpenAIConfig` instance.
+
+    Returns
+    -------
+    A tuple that contains the model's response(s) and usage statistics.
+
+    """
+    system_message = (
+        [{"role": "system", "content": system_prompt}] if system_prompt else []
+    )
+    user_message = [{"role": "user", "content": prompt}]
+
     responses = await client.chat.completions.create(
-        messages=[{"role": "user", "content": prompt}], **asdict(config)  # type: ignore
+        messages=system_message + user_message,
+        **asdict(config),  # type: ignore
     )
-    if config.n == 1:
-        results = np.array([responses.choices[0].message.content])
-    else:
-        results = np.array(
-            [responses.choices[i].message.content for i in range(config.n)]
-        )
+    results = np.array([responses.choices[i].message.content for i in range(config.n)])
 
-    return results
+    return results, responses.usage.model_dump()
 
 
 openai = OpenAI
@@ -292,8 +343,8 @@ def find_response_choices_intersection(
     choices.
 
     Say the response is of the form `[1, 2, 3, 4, 5]` and we have the choices
-    `[[1, 2], [1, 2, 3], [6, 7, 8]` then the function will return `[1, 2]` as the
-    intersection, and `[1, 2, 3]` as the choice that is left.
+    `[[1, 2], [1, 2, 3], [6, 7, 8]]` then the function will return `[1, 2, 3]` as the
+    intersection, and `[[]]` as the list of choices left.
 
     Parameters
     ----------
     response
         The response sequence.
     choices
         The choices which are represented as sequences of token ids.
 
@@ -305,7 +356,8 @@ def find_response_choices_intersection(
     Returns
     -------
     A tuple that contains the longest intersection between the response and the
-    different choices, and the choices which start with this intersection.
+    different choices, and the choices which start with this intersection, with the
+    intersection removed.
 
     """
     max_len_prefix = 0
diff --git a/pyproject.toml b/pyproject.toml
index 87493cb03..48f3e9dd4 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -92,7 +92,7 @@ module = [
     "jinja2",
     "joblib.*",
     "jsonschema.*",
-    "openai",
+    "openai.*",
     "nest_asyncio",
     "numpy.*",
     "perscache.*",