In the previous article I covered the fundamentals of the Chat Completions API: setting up a client, maintaining conversation history, and integrating tools. That was enough to build a working conversational agent. This article goes a level deeper — into the API parameters that shape what the model returns and how it thinks.
Two parameters stand out as particularly useful in production: response_format and reasoning_effort. The first
gives you control over the structure of the model’s output. The second controls how much the model reasons before
responding — which turns out to matter more than you might expect once you start caring about latency and cost.
Chat Completions API details #
The Chat Completions API endpoint accepts a rich set of parameters. Most have sensible defaults and you will rarely touch them, but understanding what is available saves you from reaching for workarounds that already exist in the API. The table below covers the current non-deprecated parameters from the API reference:
| Parameter | Type | Description |
|---|---|---|
| model | string | ID of the model to use |
| messages | array | Conversation history as an ordered list of messages |
| response_format | object | Output format: text, json_object, or json_schema |
| reasoning_effort | string | Reasoning intensity for reasoning models: minimal, low, medium, or high |
| temperature | number | Sampling temperature from 0 to 2; higher values produce more random output |
| top_p | number | Alternative to temperature; nucleus sampling probability mass |
| max_completion_tokens | integer | Maximum tokens the model may generate in the response |
| n | integer | Number of completion choices to return |
| stream | boolean | Stream partial responses as server-sent events |
| stop | string/array | Sequences at which the API stops generating |
| presence_penalty | number | Penalises new tokens based on whether they appear in the text so far |
| frequency_penalty | number | Penalises new tokens based on their frequency in the text so far |
| tools | array | List of tools (functions) the model may call |
| tool_choice | string/object | Controls which tool the model calls |
| seed | integer | Seed for deterministic sampling |
| user | string | Unique identifier for the end user |
In this article we focus on response_format and reasoning_effort — two parameters with a direct, visible impact on
production systems.
Information extraction with response_format #
The response_format parameter controls how the model structures its output. The default is plain text. Setting it to
json_object tells the model to return valid JSON, but gives you no control over the schema. Setting it to
json_schema goes further: you provide a JSON Schema document and the model
guarantees its output will conform to it. OpenAI calls this structured output.
Structured output matters whenever downstream code needs to parse the model’s response. Without it, you are parsing free text — which works until the model changes a field name or adds a sentence before the JSON block. With it, you get a contract. The model’s output either matches the schema or the call fails with an error you can handle, rather than silently producing malformed data.
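To make the difference concrete, here is a minimal sketch of how the two JSON modes are selected with openai-go. It assumes the response-format union exposes an OfJSONObject variant alongside the OfJSONSchema variant used later in this article, so treat it as an illustration rather than a drop-in snippet.

// Minimal sketch; assumes the response-format union also exposes an OfJSONObject
// variant, mirroring the OfJSONSchema variant used in the full example below.

// json_object: the model returns valid JSON, but you do not control its shape.
loose := openai.ChatCompletionNewParamsResponseFormatUnion{
	OfJSONObject: &shared.ResponseFormatJSONObjectParam{Type: "json_object"},
}

// json_schema: the output must conform to the schema you provide (jsonSchema is defined later in this article).
strict := openai.ChatCompletionNewParamsResponseFormatUnion{
	OfJSONSchema: &shared.ResponseFormatJSONSchemaParam{
		Type:       "json_schema",
		JSONSchema: jsonSchema,
	},
}

// Either value goes into the ResponseFormat field of ChatCompletionNewParams.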
A good production case for this is extracting company details from a website. I have used exactly this approach: given a company URL, extract name, description, and address as structured data, regardless of the language or layout of the page.
package main
import (
// ...
"github.com/openai/openai-go"
"github.com/openai/openai-go/option"
)
var client openai.Client
func extractWebsiteContent(websiteURL string) (string, error) {
// ...
}
func main() {
apiKey := os.Getenv("OPENAI_API_KEY")
if apiKey == "" {
fmt.Fprintln(os.Stderr, "error: OPENAI_API_KEY environment variable is not set")
os.Exit(1)
}
client = openai.NewClient(option.WithAPIKey(apiKey))
scanner := bufio.NewScanner(os.Stdin)
fmt.Println("Please provide the company's website URL:")
fmt.Println()
if !scanner.Scan() {
return
}
website := strings.TrimSpace(scanner.Text())
if website == "" {
return
}
companyInformation, err := extractWebsiteContent(website)
if err != nil {
fmt.Fprintf(os.Stderr, "error: %v\n", err)
return
}
// ...
}

The structure is familiar from the previous article. We initialize the client, read a URL from standard input, and pass
it to extractWebsiteContent. That function is the first meaningful piece — it fetches the page and converts it into a
form the model can work with efficiently.
import (
// ...
"github.com/k3a/html2text"
)
func extractWebsiteContent(websiteURL string) (string, error) {
req, err := http.NewRequest("GET", websiteURL, nil)
if err != nil {
return "", err
}
httpClient := &http.Client{}
response, err := httpClient.Do(req)
if err != nil {
return "", err
}
defer response.Body.Close()
// Fail fast on error pages so we do not feed an error document to the model.
if response.StatusCode != http.StatusOK {
return "", fmt.Errorf("unexpected status %s fetching %s", response.Status, websiteURL)
}
data, err := io.ReadAll(response.Body)
if err != nil {
return "", err
}
return html2text.HTML2Text(string(data)), nil
}

We fetch the page using Go’s standard http package and convert the
HTML body to plain text using github.com/k3a/html2text. That conversion
step is more important than it looks. Raw HTML sent to the model is full of tags, scripts, and attributes that add
tokens without adding meaning. Stripping them down to plain text significantly reduces the size of the input, which
lowers cost and reduces the chance of the model getting distracted by irrelevant markup. Cleaner input tends to produce
more accurate output.
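If you want to see how much the conversion saves, a quick size comparison inside extractWebsiteContent makes the point. This is a minimal sketch for illustration only; the byte counts are a rough proxy for token counts.

// Illustrative check: compare the raw HTML size with the plain-text version.
raw := string(data)
text := html2text.HTML2Text(raw)
fmt.Printf("raw HTML: %d bytes, plain text: %d bytes\n", len(raw), len(text))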
With the content in hand, we define the schema we want the model to conform to.
import (
// ...
"github.com/openai/openai-go/shared"
)
var jsonSchema = shared.ResponseFormatJSONSchemaJSONSchemaParam{
Name: "CompanyInformation",
Strict: openai.Bool(true),
Schema: map[string]interface{}{
"type": "object",
"properties": map[string]interface{}{
"company": map[string]interface{}{
"type": "object",
"description": "The company information.",
"properties": map[string]interface{}{
"name": map[string]interface{}{
"type": "string",
"description": "The name of the company.",
},
"description": map[string]interface{}{
"type": "string",
"description": "The description of the company.",
},
"address": map[string]interface{}{
"type": "object",
"description": "The address of the company.",
"properties": map[string]interface{}{
"streetName": map[string]interface{}{
"type": []string{"string", "null"},
"description": "The street name of the company.",
},
"streetNumber": map[string]interface{}{
"type": []string{"string", "null"},
"description": "The street number of the company.",
},
"city": map[string]interface{}{
"type": []string{"string", "null"},
"description": "The city of the company.",
},
},
"required": []string{
"streetName",
"streetNumber",
"city",
},
"additionalProperties": false,
},
},
"required": []string{
"name",
"description",
"address",
},
"additionalProperties": false,
},
},
"required": []string{
"company",
},
"additionalProperties": false,
},
}

This is a standard JSON Schema document expressed as a Go map.
The Strict: true flag tells the model to follow the schema exactly. Address fields like streetName and city use
a union type of string or null because not every company website publishes a full postal address, and strict structured
output requires every property to be listed as required, so optional fields are expressed as nullable instead.
The additionalProperties: false constraint at each object level prevents the model from adding fields outside the schema.
func extractCompanyInformation(companyInformation string) (string, error) {
resp, err := client.Chat.Completions.New(context.Background(), openai.ChatCompletionNewParams{
Model: openai.ChatModelGPT4_1Mini,
Messages: []openai.ChatCompletionMessageParamUnion{
openai.SystemMessage(
"You are a legal advisor. " +
"Given a company's information, extract the company's name, description, and address. " +
"You must provide the company description in English.",
),
openai.UserMessage(companyInformation),
},
ResponseFormat: openai.ChatCompletionNewParamsResponseFormatUnion{
OfJSONSchema: &shared.ResponseFormatJSONSchemaParam{
Type: "json_schema",
JSONSchema: jsonSchema,
},
},
})
if err != nil {
return "", fmt.Errorf("API call failed: %w", err)
} else if len(resp.Choices) != 1 {
return "", fmt.Errorf("unexpected API response: number of choices are %d", len(resp.Choices))
}
return resp.Choices[0].Message.Content, nil
}

The ResponseFormat field is set to OfJSONSchema with the schema we defined above. The system message defines the
persona — "You are a legal advisor." — which primes the model to treat the task with the seriousness a legal context
implies, rather than producing casual summaries. We also instruct the model to respond in English regardless of the
source page language, because company websites can be in any language and we want consistent output. Notice the
validation on resp.Choices: rather than silently taking Choices[0] and panicking on an empty slice, we return an
explicit error if the response is not what we expect.
func main() {
// ...
companyInformation, err := extractWebsiteContent(website)
if err != nil {
fmt.Fprintf(os.Stderr, "error: %v\n", err)
return
}
result, err := extractCompanyInformation(companyInformation)
if err != nil {
fmt.Fprintf(os.Stderr, "error: %v\n", err)
return
}
fmt.Println("Result: " + result)
}

The main function sequences the two calls: fetch and convert the page, then extract the structured data. Each step returns an error that terminates the process cleanly. The result is printed directly — at this stage it is already valid JSON conforming to our schema, ready to be unmarshalled into a Go struct by any consuming code.
Please provide the company's website URL:
https://thinksurance.de/kontakt/
Result: {
"company": {
"address": {
"city": "Frankfurt am Main",
"streetName": "Niddastraße",
"streetNumber": "91"
},
"description": "Thinksurance GmbH is a company offering...",
"name": "Thinksurance GmbH"
}
}
Process finished with the exit code 0

The output is a clean JSON object with the company information extracted from a German-language page — delivered in English, as instructed. The model correctly identified that the website is in German, followed the language instruction in the system message, and returned the address in the exact structure the schema required.
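Because the output is guaranteed to conform to the schema, unmarshalling it into a Go struct is mechanical. Here is a minimal sketch of what the consuming side could look like, assuming encoding/json is imported; the struct name and field layout are illustrative and simply mirror the schema above.

type CompanyInformation struct {
	Company struct {
		Name        string `json:"name"`
		Description string `json:"description"`
		Address     struct {
			StreetName   *string `json:"streetName"`   // nullable in the schema
			StreetNumber *string `json:"streetNumber"` // nullable in the schema
			City         *string `json:"city"`         // nullable in the schema
		} `json:"address"`
	} `json:"company"`
}

var info CompanyInformation
if err := json.Unmarshal([]byte(result), &info); err != nil {
	fmt.Fprintf(os.Stderr, "error: %v\n", err)
	return
}
fmt.Println("Company name:", info.Company.Name)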
What is reasoning_effort? #
The reasoning_effort parameter controls how much internal reasoning the model does before generating its response.
It applies to models that support extended thinking — the newer gpt-5 family. The options are minimal, low, medium,
and high. Higher effort means the model thinks longer, produces more thorough analysis, and handles complex or
ambiguous tasks better. Lower effort is faster and cheaper. For straightforward extraction tasks with a well-defined
schema, high reasoning effort is wasted compute.
This trade-off becomes concrete when you add timing to the same extraction request we built above.
func extractCompanyInformation(companyInformation string) (string, error) {
start := time.Now()
resp, err := client.Chat.Completions.New(context.Background(), openai.ChatCompletionNewParams{
Model: openai.ChatModelGPT4_1Mini,
Messages: []openai.ChatCompletionMessageParamUnion{
openai.SystemMessage(
"You are a legal advisor. " +
"Given a company's information, extract the company's name, description, and address." +
"You must provide company description in English",
),
openai.UserMessage(companyInformation),
},
ResponseFormat: openai.ChatCompletionNewParamsResponseFormatUnion{
OfJSONSchema: &shared.ResponseFormatJSONSchemaParam{
Type: "json_schema",
JSONSchema: jsonSchema,
},
},
})
duration := time.Since(start)
// ...
}

Adding time.Now() before the call and time.Since(start) after gives us the wall-clock duration of the API round-trip.
With gpt-4.1-mini, the baseline looks like this:
Baseline Duration
Please provide the company's website URL:
https://thinksurance.de/kontakt/
Duration: 1.502282375s
Result: {
"company": {
"address": {
"city": "Frankfurt am Main",
"streetName": "Niddastraße",
"streetNumber": "91"
},
"description": "Thinksurance GmbH is a company offering...",
"name": "Thinksurance GmbH"
}
}
Process finished with the exit code 0

Around 1.5 seconds for a simple extraction — reasonable. Now let’s switch to gpt-5-nano, which does not yet have its
own constant in the openai-go SDK and must be specified as a raw string.
func extractCompanyInformation(companyInformation string) (string, error) {
start := time.Now()
resp, err := client.Chat.Completions.New(context.Background(), openai.ChatCompletionNewParams{
Model: "gpt-5-nano",
Messages: []openai.ChatCompletionMessageParamUnion{
openai.SystemMessage(
"You are a legal advisor. " +
"Given a company's information, extract the company's name, description, and address." +
"You must provide company description in English",
),
openai.UserMessage(companyInformation),
},
ResponseFormat: openai.ChatCompletionNewParamsResponseFormatUnion{
OfJSONSchema: &shared.ResponseFormatJSONSchemaParam{
Type: "json_schema",
JSONSchema: jsonSchema,
},
},
})
duration := time.Since(start)
// ...
}

The only change is the model name. Same task, same schema, same system prompt. The result is surprising.
Please provide the company's website URL:
https://thinksurance.de/kontakt/
Duration: 14.115678208s
...

Fourteen seconds. The gpt-5 family has reasoning enabled by default — gpt-5-nano and gpt-5-mini both default to
medium reasoning effort. For a task as clear-cut as extracting a company name and address from a schema, the model
is doing far more internal work than the task requires. The output quality is no better; the time cost is ten times higher.
The fix is one line: set ReasoningEffort to minimal.
func extractCompanyInformation(companyInformation string) (string, error) {
start := time.Now()
resp, err := client.Chat.Completions.New(context.Background(), openai.ChatCompletionNewParams{
Model: "gpt-5-nano",
Messages: []openai.ChatCompletionMessageParamUnion{
openai.SystemMessage(
"You are a legal advisor. " +
"Given a company's information, extract the company's name, description, and address." +
"You must provide company description in English",
),
openai.UserMessage(companyInformation),
},
ReasoningEffort: "minimal",
ResponseFormat: openai.ChatCompletionNewParamsResponseFormatUnion{
OfJSONSchema: &shared.ResponseFormatJSONSchemaParam{
Type: "json_schema",
JSONSchema: jsonSchema,
},
},
})
duration := time.Since(start)
// ...
}

Here ReasoningEffort is set to "minimal", which tells the model to skip extended internal reasoning entirely for this request.
Please provide the company's website URL:
https://thinksurance.de/kontakt/
Duration: 1.349047041s
Result: {
"company": {
"address": {
"city": "Frankfurt am Main",
"streetName": "Niddastraße",
"streetNumber": "91"
},
"description": "Thinksurance GmbH is a company offering...",
"name": "Thinksurance GmbH"
}
}
Process finished with the exit code 0

Back to 1.3 seconds, with identical output quality. To be clear, reasoning itself is not the problem; the point is that the amount of reasoning should
match the task. For complex analysis, multi-step problem solving, or ambiguous requirements, higher reasoning_effort
earns its cost. For deterministic extraction with a well-defined schema, minimal is the right setting. Matching the
effort level to the task is one of the more impactful tuning decisions you can make when running LLMs in production at volume.
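In practice this can be as simple as branching on the kind of request before building the parameters. A minimal sketch, where requiresDeepAnalysis is a hypothetical flag supplied by the caller:

// Hypothetical sketch: pick the effort level from the nature of the task.
params := openai.ChatCompletionNewParams{
	Model: "gpt-5-nano",
	// ... Messages and ResponseFormat as in the examples above
}
if requiresDeepAnalysis {
	// Ambiguous or multi-step work: let the model think longer.
	params.ReasoningEffort = "high"
} else {
	// Schema-bound extraction: skip extended thinking for speed and cost.
	params.ReasoningEffort = "minimal"
}
resp, err := client.Chat.Completions.New(context.Background(), params)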
Conclusion #
The Chat Completions API gives you more control than most developers use. response_format with a JSON Schema turns
the model into a reliable data extraction tool — no parsing hacks, no fragile string matching, just structured output
you can trust. reasoning_effort lets you tune the cost-latency trade-off for models that reason by default, and
the difference between the wrong and right setting can be an order of magnitude. Neither parameter requires complex
code changes; both have an immediate effect on what the model produces and how quickly it produces it.