Generate

Interact with different LLM providers. This endpoint is based on the structure of the OpenAI API.

Path Params
uuid
required
Body Params
messages
array of objects
required

A list containing the conversation between the user and the assistant. Each item in the list must be a dictionary with two keys: 'role' and 'content'.

role: Specifies the role of the speaker and can be 'user', 'system', 'assistant', or 'tool'. The system role instructs how the model should answer, e.g. 'You are a helpful assistant'. The user role carries the user's query, and the assistant role carries the model's response. The tool role is for results from external tools used in the conversation.

content: A list of dictionaries. Each dictionary in the 'content' list must contain the key 'type' plus a second key matching that type.

Structure

  • type: Specifies the type of content and can be 'image_url' or 'text'.
  • The second key holds the actual content and depends on 'type':
    • If 'type' is 'image_url', the item must contain 'image_url' (a dictionary with a 'url') and must not contain 'text'.
    • If 'type' is 'text', the item must contain 'text' and must not contain 'image_url'.

Example

[
  {
    "role": "user",
    "content": [
        {
          "type": "text",
          "text": "Describe this image"
        },
        {
        "type": "image_url",
        "image_url": {
          "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        }
    ]
  }
]
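The structure described above can be sanity-checked locally before a request is sent. A minimal sketch, assuming the shape shown in the example (the validator itself is illustrative and not part of the API):

```python
# Illustrative client-side validator for the 'messages' structure; it is not
# part of the API, just a local sanity check before sending a request.

ALLOWED_ROLES = {"user", "system", "assistant", "tool"}

def validate_messages(messages):
    for msg in messages:
        if msg.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"invalid role: {msg.get('role')!r}")
        for item in msg.get("content", []):
            if item.get("type") == "text" and "text" not in item:
                raise ValueError("'text' items need a 'text' key")
            if item.get("type") == "image_url" and "image_url" not in item:
                raise ValueError("'image_url' items need an 'image_url' key")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }
]
validate_messages(messages)  # passes silently; malformed input raises ValueError
```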
model
string
required
length ≥ 1

The OpenAI model to use for the chat completion. This field is required and specifies which language model will process the conversation.

Example values: 'gpt-3.5-turbo', 'gpt-4', 'gpt-4-turbo'

reasoning_effort
string
enum

Choices:

  • 'low': Minimal reasoning, quick responses
  • 'medium': Balanced reasoning approach
  • 'high': In-depth, comprehensive reasoning

Example: 'high' for complex problem-solving tasks

metadata
array of objects

Optional list of metadata associated with the chat request. Can be used to provide additional context or tracking information.

Example:

{
  "metadata": [
    {"key": "conversation_id", "value": "chat_12345"},
    {"key": "source", "value": "customer_support"}
  ]
}
frequency_penalty
double
-2 to 2

Controls repetitiveness of model responses by penalizing frequent tokens. Ranges from -2.0 to 2.0.

Values:

  • Positive values: Reduce token repetition
  • Negative values: Encourage repetition
  • 0.0: Default behavior

Example: 1.5 to significantly reduce repeated phrases

logit_bias
object

Modify the likelihood of specific tokens appearing in the response. A dictionary where keys are token IDs and values are bias scores.

Example:

{
  "logit_bias": {
    "50256": -100,  # Reduce probability of end-of-text token
    "15": 5  # Slightly increase probability of a specific token
  }
}
logprobs
boolean

If set to True, returns log probabilities of the most likely tokens. Useful for advanced token probability analysis.

Example: True to get detailed token likelihood information

top_logprobs
integer
0 to 20

Number of top log probabilities to return with each token. Must be between 0 and 20.

Example: 5 to get top 5 most likely tokens for each position

integer
≥ 1

Maximum number of tokens to generate in the completion. Must be at least 1.

Example: 150 to limit response to approximately 100-150 words

n
integer
≥ 1

Number of chat completion choices to generate.

Example: 3 to generate multiple alternative responses
modalities
array of strings

List of supported input/output modalities for the chat.

Example:

{
  "modalities": ["text", "image", "audio"]
}
prediction
object

Optional field for storing prediction-related information. Flexible dictionary to capture model's predictive metadata.

Example:

{
  "prediction": {
    "confidence_score": 0.85,
    "top_prediction": "response_category"
  }
}
audio
object

Optional dictionary for audio-related parameters or metadata.

Example:

{
  "audio": {
    "language": "en-US",
    "transcription_format": "srt"
  }
}
presence_penalty
double
-2 to 2

Adjusts likelihood of discussing new topics by penalizing existing tokens. Ranges from -2.0 to 2.0.

Values:

  • Positive values: Encourage more diverse topics
  • Negative values: Keep discussion more focused
  • 0.0: Default behavior

Example: 1.0 to promote topic diversity

response_format
object

Specify the desired response format for the completion.

Example:

{
  "response_format": {
    "type": "json_object",
    "schema": {...}
  }
}
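When the response format requests a JSON object, the completion text itself should parse as JSON. A small sketch of handling such a response (the sample completion string is made up for illustration):

```python
import json

# Hypothetical completion text returned under json_object mode;
# the payload contents are made up for illustration.
raw_completion = '{"sentiment": "positive", "confidence": 0.93}'

parsed = json.loads(raw_completion)  # raises an error if the model drifted from JSON
assert isinstance(parsed, dict)
print(parsed["sentiment"])  # -> positive
```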
seed
integer

Set a seed for deterministic sampling to reproduce consistent results.

Example: 42 for a reproducible random generation process

service_tier
string
enum

Choices:

  • 'auto': Automatically select appropriate tier
  • 'default': Use default service configuration
stop
array of strings

List of strings that will cause the model to stop generating.

Example:

{
  "stop": ["\n", "Human:", "AI:"]
}
stream
boolean
Defaults to false

If True, returns tokens as they are generated in a streaming format. Default is False.

Example: True for real-time token streaming

stream_options
object

Additional configuration for streaming responses.

Example:

{
  "stream_options": {
    "include_usage": true
  }
}
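Streamed responses typically arrive as a sequence of server-sent events. A sketch of consuming such a stream; the "data:" framing shown here is the common OpenAI-style format and is an assumption about this endpoint:

```python
import json

# Simulated stream in the common OpenAI-style SSE framing; whether this
# endpoint uses exactly this framing is an assumption for illustration.
sample_stream = """data: {"choices": [{"delta": {"content": "Hel"}}]}

data: {"choices": [{"delta": {"content": "lo"}}]}

data: [DONE]
"""

def collect_stream(lines):
    # Accumulate the content deltas from each 'data:' event until [DONE].
    text = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        text.append(chunk["choices"][0]["delta"].get("content", ""))
    return "".join(text)

print(collect_stream(sample_stream.splitlines()))  # -> Hello
```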
temperature
double
0 to 2

Controls randomness in token selection. Ranges from 0.0 to 2.0.

Values:

  • 0.0: Most deterministic, focused responses
  • 1.0: Balanced randomness
  • 2.0: Most creative, unpredictable responses

Example: 0.7 for a good balance of creativity and focus
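Temperature rescales token logits before sampling. A stdlib sketch of the usual softmax-with-temperature formulation (the provider's exact implementation may differ):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by temperature, then take a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, 0.2)  # low temperature: near-deterministic
flat = softmax_with_temperature(logits, 2.0)   # high temperature: closer to uniform
```

Lower temperatures concentrate probability mass on the highest logit; higher temperatures flatten the distribution toward uniform.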

top_p
double
0 to 1

Nucleus sampling threshold for token selection. Ranges from 0.0 to 1.0. Default is 1.0.

Values:

  • 1.0: Consider all tokens
  • Lower values: More focused, deterministic sampling

Example: 0.9 to select from top 90% most probable tokens
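Nucleus sampling keeps the smallest set of highest-probability tokens whose cumulative probability reaches the threshold. An illustrative restatement, not the provider's code:

```python
def nucleus_filter(probs, top_p):
    # Keep the smallest set of highest-probability tokens whose cumulative
    # probability reaches top_p; sampling then happens within this set.
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token_id, p in ranked:
        kept.append(token_id)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

probs = [0.5, 0.3, 0.15, 0.05]  # already sorted for readability
print(nucleus_filter(probs, 0.9))  # -> [0, 1, 2]
```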

tools
array

List of tools or function definitions available to the model.

Example:

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Retrieve current weather"
      }
    }
  ]
}
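The typical round trip is: the model replies with a tool call, the application executes the named function, and the result goes back into the conversation under the 'tool' role. A local simulation of that loop; the tool-call response shape shown is the common OpenAI-style format and is an assumption here:

```python
import json

# Stand-in implementation of the example tool above; a real application
# would call an actual weather service here.
def get_weather(location):
    return {"location": location, "forecast": "sunny", "temp_c": 21}

# Hypothetical assistant reply requesting a tool call; the exact response
# shape (OpenAI-style) is an assumption for illustration.
tool_call = {
    "id": "call_1",
    "function": {"name": "get_weather", "arguments": '{"location": "Madison"}'},
}

available = {"get_weather": get_weather}
fn = available[tool_call["function"]["name"]]
args = json.loads(tool_call["function"]["arguments"])
result = fn(**args)

# The tool's output is appended to 'messages' under the 'tool' role
# so the model can use it in its next turn.
tool_message = {
    "role": "tool",
    "tool_call_id": tool_call["id"],
    "content": json.dumps(result),
}
```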
tool_choice
string
length ≥ 1

Specify how tools should be used in the completion.

Example values:

  • 'auto': Model decides when to use tools
  • 'none': Disable tool usage
  • Specific tool name to always use a particular tool
parallel_tool_calls
boolean

Allow the model to make multiple tool calls in parallel.

Example: True to enable concurrent tool invocations

user
string
length ≥ 1

Optional identifier for the end-user to help track and monitor API usage.

Example: 'user_123456'

function_call
string
length ≥ 1

Control how function calls are handled.

Example values:

  • 'auto': Default behavior
  • 'none': Disable function calls
  • Specific function name to force its execution
functions
array of objects

List of function definitions available to the model.

Example:

{
  "functions": [
    {
      "name": "get_current_weather",
      "description": "Get the current weather for a location",
      "parameters": {...}
    }
  ]
}
thinking
object

Configuration for enabling Claude's extended thinking. When enabled, responses include thinking content blocks showing Claude's reasoning process before the final answer. Requires a minimum budget of 1,024 tokens, which counts towards your max_tokens limit.

Example:

{
  "thinking": {
    "type": "enabled",
    "budget_tokens": 1024
  }
}
web_search_options
object

Options for web search integration.

Example:

{
  "web_search_options": {
    "search_context_size": "medium"
  }
}

Allowed values for "search_context_size": "low", "medium", "high".

filter_documents
object

Filter uploaded documents based on their metadata. Specify key-value pairs where the key represents the metadata field and the value is the desired metadata value. Please ensure that the provided metadata keys are available in your database.

double
Defaults to 0

A minimum similarity score for a chunk to be returned as a result. Higher values make retrieval more conservative, returning only chunks closely matching the query; lower values allow less similar chunks to be returned.

integer
≥ 1
Defaults to 3

The number of result chunks to return.
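The three retrieval parameters above compose as a pipeline: filter chunks by metadata, drop those below the score threshold, and return the top chunks by score. A local sketch of that pipeline (the chunk data, scores, and parameter names `score_threshold` and `k` are illustrative; the server-side behavior may differ):

```python
# Illustrative retrieval pipeline combining the three parameters above:
# a metadata filter, a minimum similarity score, and a chunk count.
# Chunk data and scores are made up; the server-side behavior may differ.
chunks = [
    {"text": "Refund policy ...", "score": 0.91, "metadata": {"department": "billing"}},
    {"text": "Shipping times ...", "score": 0.72, "metadata": {"department": "logistics"}},
    {"text": "Invoice format ...", "score": 0.55, "metadata": {"department": "billing"}},
    {"text": "Late fees ...", "score": 0.35, "metadata": {"department": "billing"}},
]

def retrieve(chunks, filter_documents, score_threshold, k):
    matching = [
        c for c in chunks
        if all(c["metadata"].get(key) == value for key, value in filter_documents.items())
        and c["score"] >= score_threshold
    ]
    matching.sort(key=lambda c: c["score"], reverse=True)
    return matching[:k]

results = retrieve(chunks, {"department": "billing"}, 0.5, 3)
print([c["score"] for c in results])  # -> [0.91, 0.55]
```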

integer
1 to 16385
Defaults to 100

The maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
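Putting the parameters together, a hedged end-to-end request sketch using only the standard library. The base URL, path, model name, and token are placeholders; check the exact endpoint shape against your deployment before use:

```python
import json
import urllib.request

def build_payload(prompt):
    # A minimal request body combining parameters documented above.
    return {
        "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
        "model": "gpt-4-turbo",
        "temperature": 0.7,
        "max_tokens": 150,
    }

def send(payload, uuid, token):
    # Placeholder base URL; substitute your real endpoint, path, and JWT.
    url = f"https://api.example.com/generate/{uuid}"
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_payload("Summarize the plot of Hamlet in two sentences.")
```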

Responses

Authentication: Bearer token (JWT)
application/json