LLM and Go: Investigating OpenAI Chat Completions API
·2132 words·11 mins·
In the previous article I covered the fundamentals of the Chat Completions API: setting up a client, maintaining conversation history, and integrating tools. That was enough to build a working conversational agent. This article goes a level deeper — into the API parameters that shape what the model returns and how it thinks.
Two parameters stand out as particularly useful in production: response_format and reasoning_effort. The first gives you control over the structure of the model’s output. The second controls how much the model reasons before responding — which turns out to matter more than you might expect once you start caring about latency and cost.
Chat Completions API details

The Chat Completions API endpoint accepts a rich set of parameters. Most have sensible defaults and you will rarely touch them, but understanding what is available saves you from reaching for workarounds that already exist in the API. The table below covers the current non-deprecated parameters from the API reference:
| Parameter | Type | Description |
|---|---|---|
| model | string | ID of the model to use |
| messages | array | Conversation history as an ordered list of messages |
| response_format | object | Output format: text, json_object, or json_schema |
| reasoning_effort | string | Reasoning intensity for reasoning models: low, medium, high |
| temperature | number | Sampling temperature from 0 to 2; higher values produce more random output |
| top_p | number | Alternative to temperature; nucleus sampling probability mass |
| max_completion_tokens | integer | Maximum tokens the model may generate in the response |
| n | integer | Number of completion choices to return |
| stream | boolean | Stream partial responses as server-sent events |
| stop | string/array | Sequences at which the API stops generating |
| presence_penalty | number | Penalises new tokens based on whether they appear in the text so far |
| frequency_penalty | number | Penalises new tokens based on their frequency in the text so far |
| tools | array | List of tools (functions) the model may call |
| tool_choice | string/object | Controls which tool the model calls |
| seed | integer | Seed for deterministic sampling |
| user | string | Unique identifier for the end user |

In this article we focus on response_format and reasoning_effort — two parameters with a direct, visible impact on production systems.
Information extraction with response_format

The response_format parameter controls how the model structures its output. The default is plain text. Setting it to json_object tells the model to return valid JSON, but gives you no control over the schema. Setting it to json_schema goes further: you provide a JSON Schema document and the model guarantees its output will conform to it. OpenAI calls this structured output.