LLM and Go: Investigating ChatGPT Chat Completions API

·2132 words·11 mins·
Marko Milojevic
Software engineer and architect. Golang and LLM enthusiast. Awful chess player, gym rat, harmonica newbie and cat lover.
LLM and Go - This article is part of a series.
Part 2: This Article

In the previous article I covered the fundamentals of the Chat Completions API: setting up a client, maintaining conversation history, and integrating tools. That was enough to build a working conversational agent. This article goes a level deeper — into the API parameters that shape what the model returns and how it thinks.

Two parameters stand out as particularly useful in production: response_format and reasoning_effort. The first gives you control over the structure of the model’s output. The second controls how much the model reasons before responding — which turns out to matter more than you might expect once you start caring about latency and cost.

Chat Completions API details

The Chat Completions API endpoint accepts a rich set of parameters. Most have sensible defaults and you will rarely touch them, but understanding what is available saves you from reaching for workarounds that already exist in the API. The table below covers the current non-deprecated parameters from the API reference:

Parameter               Type            Description
model                   string          ID of the model to use
messages                array           Conversation history as an ordered list of messages
response_format         object          Output format: text, json_object, or json_schema
reasoning_effort        string          Reasoning intensity for reasoning models: minimal, low, medium, or high
temperature             number          Sampling temperature from 0 to 2; higher values produce more random output
top_p                   number          Alternative to temperature; nucleus sampling probability mass
max_completion_tokens   integer         Maximum tokens the model may generate in the response
n                       integer         Number of completion choices to return
stream                  boolean         Stream partial responses as server-sent events
stop                    string/array    Sequences at which the API stops generating
presence_penalty        number          Penalises new tokens based on whether they appear in the text so far
frequency_penalty       number          Penalises new tokens based on their frequency in the text so far
tools                   array           List of tools (functions) the model may call
tool_choice             string/object   Controls which tool the model calls
seed                    integer         Seed for deterministic sampling
user                    string          Unique identifier for the end user

In this article we focus on response_format and reasoning_effort — two parameters with a direct, visible impact on production systems.

Information extraction with response_format

The response_format parameter controls how the model structures its output. The default is plain text. Setting it to json_object tells the model to return valid JSON, but gives you no control over the schema. Setting it to json_schema goes further: you provide a JSON Schema document and the model guarantees its output will conform to it. OpenAI calls this structured output.

Structured output matters whenever downstream code needs to parse the model’s response. Without it, you are parsing free text — which works until the model changes a field name or adds a sentence before the JSON block. With it, you get a contract. The model’s output either matches the schema or the call fails with an error you can handle, rather than silently producing malformed data.
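
To make the contrast concrete, here is a small self-contained sketch (the reply strings are invented for illustration): a consumer that expects raw JSON breaks the moment the model wraps its answer in prose, while a schema-conforming reply parses directly.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseName is a hypothetical consumer that expects the model's reply to be
// exactly one JSON document — the contract json_schema gives you.
func parseName(reply string) (string, error) {
	var v struct {
		Name string `json:"name"`
	}
	if err := json.Unmarshal([]byte(reply), &v); err != nil {
		return "", err
	}
	return v.Name, nil
}

func main() {
	// A free-text reply wrapping the JSON in prose breaks the parser...
	if _, err := parseName("Sure! Here is the data:\n{\"name\": \"Acme\"}"); err != nil {
		fmt.Println("free-text reply: parse error")
	}
	// ...while a schema-conforming reply parses directly.
	if name, err := parseName(`{"name": "Acme"}`); err == nil {
		fmt.Println("structured reply:", name)
	}
}
```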

A good production case for this is extracting company details from a website. I have used exactly this approach: given a company URL, extract name, description, and address as structured data, regardless of the language or layout of the page.

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"

	"github.com/openai/openai-go"
	"github.com/openai/openai-go/option"
)

var client openai.Client

func extractWebsiteContent(websiteURL string) (string, error) {
	// ...
}

func main() {
	apiKey := os.Getenv("OPENAI_API_KEY")
	if apiKey == "" {
		fmt.Fprintln(os.Stderr, "error: OPENAI_API_KEY environment variable is not set")
		os.Exit(1)
	}

	client = openai.NewClient(option.WithAPIKey(apiKey))

	scanner := bufio.NewScanner(os.Stdin)
	fmt.Println("Please provide the company's website URL:")
	fmt.Println()
	if !scanner.Scan() {
		return
	}

	website := strings.TrimSpace(scanner.Text())
	if website == "" {
		return
	}

	companyInformation, err := extractWebsiteContent(website)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: %v\n", err)
		return
	}
	
	// ...
}

The structure is familiar from the previous article. We initialize the client, read a URL from standard input, and pass it to extractWebsiteContent. That function is the first meaningful piece — it fetches the page and converts it into a form the model can work with efficiently.

import (
	"fmt"
	"io"
	"net/http"

	"github.com/k3a/html2text"
)

func extractWebsiteContent(websiteURL string) (string, error) {
	req, err := http.NewRequest("GET", websiteURL, nil)
	if err != nil {
		return "", err
	}

	httpClient := &http.Client{}
	response, err := httpClient.Do(req)
	if err != nil {
		return "", err
	}
	defer response.Body.Close()

	// An error page parsed as company content would silently poison the
	// extraction, so reject anything other than a 200.
	if response.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %d fetching %s", response.StatusCode, websiteURL)
	}

	data, err := io.ReadAll(response.Body)
	if err != nil {
		return "", err
	}

	return html2text.HTML2Text(string(data)), nil
}

We fetch the page using Go’s standard http package and convert the HTML body to plain text using github.com/k3a/html2text. That conversion step is more important than it looks. Raw HTML sent to the model is full of tags, scripts, and attributes that add tokens without adding meaning. Stripping them down to plain text significantly reduces the size of the input, which lowers cost and reduces the chance of the model getting distracted by irrelevant markup. Cleaner input tends to produce more accurate output.
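
To see why the conversion matters, here is a deliberately naive tag-stripper — a stand-in for html2text used only to illustrate the size reduction, not something to run against real pages:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// stripTags is a naive illustration of HTML-to-text conversion: drop
// script/style blocks, replace remaining tags with spaces, collapse whitespace.
func stripTags(html string) string {
	text := regexp.MustCompile(`(?s)<(script|style).*?</(script|style)>`).ReplaceAllString(html, "")
	text = regexp.MustCompile(`<[^>]+>`).ReplaceAllString(text, " ")
	return strings.Join(strings.Fields(text), " ")
}

func main() {
	page := `<html><head><style>body{color:red}</style></head>` +
		`<body><div class="hero"><h1>Acme GmbH</h1><p>We build widgets.</p></div></body></html>`
	plain := stripTags(page)

	// The plain text carries the same meaning in a fraction of the bytes —
	// and therefore a fraction of the tokens.
	fmt.Printf("html: %d bytes, text: %d bytes\n", len(page), len(plain))
	fmt.Println(plain) // Acme GmbH We build widgets.
}
```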

With the content in hand, we define the schema we want the model to conform to.

import (
	// ...

	"github.com/openai/openai-go/shared"
)

var jsonSchema = shared.ResponseFormatJSONSchemaJSONSchemaParam{
	Name:   "CompanyInformation",
	Strict: openai.Bool(true),
	Schema: map[string]interface{}{
		"type": "object",
		"properties": map[string]interface{}{
			"company": map[string]interface{}{
				"type":        "object",
				"description": "The company information.",
				"properties": map[string]interface{}{
					"name": map[string]interface{}{
						"type":        "string",
						"description": "The name of the company.",
					},
					"description": map[string]interface{}{
						"type":        "string",
						"description": "The description of the company.",
					},
					"address": map[string]interface{}{
						"type":        "object",
						"description": "The address of the company.",
						"properties": map[string]interface{}{
							"streetName": map[string]interface{}{
								"type":        []string{"string", "null"},
								"description": "The street name of the company.",
							},
							"streetNumber": map[string]interface{}{
								"type":        []string{"string", "null"},
								"description": "The street number of the company.",
							},
							"city": map[string]interface{}{
								"type":        []string{"string", "null"},
								"description": "The city of the company.",
							},
						},
						"required": []string{
							"streetName",
							"streetNumber",
							"city",
						},
						"additionalProperties": false,
					},
				},
				"required": []string{
					"name",
					"description",
					"address",
				},
				"additionalProperties": false,
			},
		},
		"required": []string{
			"company",
		},
		"additionalProperties": false,
	},
}

This is a standard JSON Schema document expressed as a Go map. The Strict: true flag tells the model to follow the schema exactly. Address fields like streetName and city use a union type of string or null because not every company website publishes a full postal address; strict mode requires every property to appear in the required list, so optional fields are expressed as nullable instead. The additionalProperties: false constraint at each object level prevents the model from adding fields outside the schema.

func extractCompanyInformation(companyInformation string) (string, error) {
	resp, err := client.Chat.Completions.New(context.Background(), openai.ChatCompletionNewParams{
		Model: openai.ChatModelGPT4_1Mini,
		Messages: []openai.ChatCompletionMessageParamUnion{
			openai.SystemMessage(
				"You are a legal advisor. " +
					"Given a company's information, extract the company's name, description, and address. " +
					"You must provide the company description in English.",
			),
			openai.UserMessage(companyInformation),
		},
		ResponseFormat: openai.ChatCompletionNewParamsResponseFormatUnion{
			OfJSONSchema: &shared.ResponseFormatJSONSchemaParam{
				Type:       "json_schema",
				JSONSchema: jsonSchema,
			},
		},
	})

	if err != nil {
		return "", fmt.Errorf("API call failed: %w", err)
	} else if len(resp.Choices) != 1 {
		return "", fmt.Errorf("unexpected API response: number of choices is %d", len(resp.Choices))
	}

	return resp.Choices[0].Message.Content, nil
}

The ResponseFormat field is set to OfJSONSchema with the schema we defined above. The system message defines the persona — "You are a legal advisor." — which primes the model to treat the task with the seriousness a legal context implies, rather than producing casual summaries. We also instruct the model to respond in English regardless of the source page language, because company websites can be in any language and we want consistent output. Notice the validation on resp.Choices: rather than silently taking Choices[0] and panicking on an empty slice, we return an explicit error if the response is not what we expect.

func main() {
	// ...

	companyInformation, err := extractWebsiteContent(website)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: %v\n", err)
		return
	}

	result, err := extractCompanyInformation(companyInformation)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: %v\n", err)
		return
	}

	fmt.Println("Result: " + result)
}

The main function sequences the two calls: fetch and convert the page, then extract the structured data. Each step returns an error that terminates the process cleanly. The result is printed directly — at this stage it is already valid JSON conforming to our schema, ready to be unmarshalled into a Go struct by any consuming code.

Please provide the company's website URL:

https://thinksurance.de/kontakt/

Result: {
  "company": {
    "address": {
      "city": "Frankfurt am Main",
      "streetName": "Niddastraße",
      "streetNumber": "91"
    },
    "description": "Thinksurance GmbH is a company offering...",
    "name": "Thinksurance GmbH"
  }
}
Process finished with the exit code 0

The output is a clean JSON object with the company information extracted from a German-language page — delivered in English, as instructed. The model correctly identified that the website is in German, followed the language instruction in the system message, and returned the address in the exact structure the schema required.

What is reasoning_effort?
#

The reasoning_effort parameter controls how much internal reasoning the model does before generating its response. It applies to models that support extended thinking — the newer gpt-5 family. The options are minimal, low, medium, and high. Higher effort means the model thinks longer, produces more thorough analysis, and handles complex or ambiguous tasks better. Lower effort is faster and cheaper. For straightforward extraction tasks with a well-defined schema, high reasoning effort is wasted compute.

This trade-off becomes concrete when you add timing to the same extraction request we built above.

func extractCompanyInformation(companyInformation string) (string, error) {
	start := time.Now()

	resp, err := client.Chat.Completions.New(context.Background(), openai.ChatCompletionNewParams{
		Model: openai.ChatModelGPT4_1Mini,
		Messages: []openai.ChatCompletionMessageParamUnion{
			openai.SystemMessage(
				"You are a legal advisor. " +
					"Given a company's information, extract the company's name, description, and address. " +
					"You must provide the company description in English.",
			),
			openai.UserMessage(companyInformation),
		},
		ResponseFormat: openai.ChatCompletionNewParamsResponseFormatUnion{
			OfJSONSchema: &shared.ResponseFormatJSONSchemaParam{
				Type:       "json_schema",
				JSONSchema: jsonSchema,
			},
		},
	})
	duration := time.Since(start)
	// ...
}

Adding time.Now() before the call and time.Since(start) after gives us the wall-clock duration of the API round-trip. With gpt-4.1-mini, the baseline looks like this:

Baseline Duration

Please provide the company's website URL:

https://thinksurance.de/kontakt/

Duration: 1.502282375s

Result: {
  "company": {
    "address": {
      "city": "Frankfurt am Main",
      "streetName": "Niddastraße",
      "streetNumber": "91"
    },
    "description": "Thinksurance GmbH is a company offering...",
    "name": "Thinksurance GmbH"
  }
}
Process finished with the exit code 0

Around 1.5 seconds for a simple extraction — reasonable. Now let’s switch to gpt-5-nano, which does not yet have its own constant in the openai-go SDK and must be specified as a raw string.

func extractCompanyInformation(companyInformation string) (string, error) {
	start := time.Now()

	resp, err := client.Chat.Completions.New(context.Background(), openai.ChatCompletionNewParams{
		Model: "gpt-5-nano",
		Messages: []openai.ChatCompletionMessageParamUnion{
			openai.SystemMessage(
				"You are a legal advisor. " +
					"Given a company's information, extract the company's name, description, and address. " +
					"You must provide the company description in English.",
			),
			openai.UserMessage(companyInformation),
		},
		ResponseFormat: openai.ChatCompletionNewParamsResponseFormatUnion{
			OfJSONSchema: &shared.ResponseFormatJSONSchemaParam{
				Type:       "json_schema",
				JSONSchema: jsonSchema,
			},
		},
	})
	duration := time.Since(start)
	// ...
}

The only change is the model name. Same task, same schema, same system prompt. The result is surprising.

Please provide the company's website URL:

https://thinksurance.de/kontakt/

Duration: 14.115678208s
...

Fourteen seconds. The gpt-5 family has reasoning enabled by default — gpt-5-nano and gpt-5-mini both default to medium reasoning effort. For a task as clear-cut as extracting a company name and address from a schema, the model is doing far more internal work than the task requires. The output quality is no better; the time cost is ten times higher.

The fix is one line: set ReasoningEffort to minimal.

func extractCompanyInformation(companyInformation string) (string, error) {
	start := time.Now()

	resp, err := client.Chat.Completions.New(context.Background(), openai.ChatCompletionNewParams{
		Model: "gpt-5-nano",
		Messages: []openai.ChatCompletionMessageParamUnion{
			openai.SystemMessage(
				"You are a legal advisor. " +
					"Given a company's information, extract the company's name, description, and address. " +
					"You must provide the company description in English.",
			),
			openai.UserMessage(companyInformation),
		},
		ReasoningEffort: "minimal",
		ResponseFormat: openai.ChatCompletionNewParamsResponseFormatUnion{
			OfJSONSchema: &shared.ResponseFormatJSONSchemaParam{
				Type:       "json_schema",
				JSONSchema: jsonSchema,
			},
		},
	})
	duration := time.Since(start)
	// ...
}

Setting ReasoningEffort to "minimal" tells the model to skip extended internal reasoning entirely for this request.

Please provide the company's website URL:

https://thinksurance.de/kontakt/

Duration: 1.349047041s

Result: {
  "company": {
    "address": {
      "city": "Frankfurt am Main",
      "streetName": "Niddastraße",
      "streetNumber": "91"
    },
    "description": "Thinksurance GmbH is a company offering...",
    "name": "Thinksurance GmbH"
  }
}
Process finished with the exit code 0

Back to 1.3 seconds, with identical output quality. To be clear, reasoning itself is not the problem — the effort level should match the task. For complex analysis, multi-step problem solving, or ambiguous requirements, higher reasoning_effort earns its cost. For deterministic extraction with a well-defined schema, minimal is the right setting. Matching the effort level to the task is one of the more impactful tuning decisions you can make when running LLMs in production at volume.
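
One way to keep that decision deliberate is to centralise it in one place instead of defaulting to medium everywhere. A sketch with made-up task categories — the classification is ours, not an API concept:

```go
package main

import "fmt"

// TaskKind is a rough, illustrative classification of LLM workloads.
type TaskKind int

const (
	TaskExtraction    TaskKind = iota // well-defined schema, deterministic output
	TaskSummarisation                 // some judgement, little ambiguity
	TaskAnalysis                      // multi-step or ambiguous problems
)

// effortFor maps a task kind to a reasoning_effort value, so the
// latency/cost trade-off is decided once, deliberately.
func effortFor(kind TaskKind) string {
	switch kind {
	case TaskExtraction:
		return "minimal"
	case TaskSummarisation:
		return "low"
	default:
		return "high"
	}
}

func main() {
	fmt.Println(effortFor(TaskExtraction)) // minimal
}
```

The returned string can be passed straight to the ReasoningEffort field of the request parameters.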

Conclusion
#

The Chat Completions API gives you more control than most developers use. response_format with a JSON Schema turns the model into a reliable data extraction tool — no parsing hacks, no fragile string matching, just structured output you can trust. reasoning_effort lets you tune the cost-latency trade-off for models that reason by default, and the difference between the wrong and right setting can be an order of magnitude. Neither parameter requires complex code changes; both have an immediate effect on what the model produces and how quickly it produces it.

