For most of my career, integrating external intelligence into an application meant calling a rules engine, training a custom classifier, or encoding business logic that someone had painfully documented in a spreadsheet. The idea that I could describe a task in plain language and have a model respond with genuine reasoning was not something I expected to become production-ready in my working life. Then GPT happened, and it changed what backend developers need to know.
This article is the first in a series on using LLMs in Go. We start with the OpenAI Chat Completions API — the stateless, request-based interface that gives you direct control over every aspect of the conversation. By the end, you will have a working conversational agent that can call external tools to answer questions it otherwise could not.
A short introduction to ChatGPT #
The path to large language models runs through a decade of incremental progress in deep learning. Early models like word2vec and GloVe learned to embed words into dense vector spaces, capturing semantic relationships between terms. The transformer architecture, introduced by Google in 2017, changed the trajectory of the field — it processes sequences in parallel using attention mechanisms that capture long-range dependencies far more effectively than recurrent networks. This architectural shift made it practical to train models on orders of magnitude more data. GPT-1 in 2018 showed that large-scale unsupervised pre-training followed by fine-tuning could match or beat purpose-built models across a range of language tasks.
Understanding what these models actually do removes a lot of the mysticism around them. An LLM is, at its core, a next-token predictor. It takes a sequence of tokens as input and outputs a probability distribution over the vocabulary for the next token. The transformer’s attention mechanism allows every token in the input to attend to every other token, building a rich contextual representation before making that prediction. Training adjusts billions of parameters to minimise prediction error across enormous text corpora. What emerges is a model with broad world knowledge encoded in its weights — not because it was taught facts directly, but because predicting text well requires internalising the structure of the world that produced that text.
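To make the loop concrete, here is a toy sketch of greedy next-token decoding. The hand-written bigram table stands in for a real model; the vocabulary and probabilities are invented purely for illustration.

```go
package main

import "fmt"

// Toy "model": for each token, a probability distribution over the next token.
var bigram = map[string]map[string]float64{
	"the": {"cat": 0.6, "dog": 0.4},
	"cat": {"sat": 0.7, "ran": 0.3},
	"sat": {"down": 0.9, "up": 0.1},
}

// predictNext returns the most probable next token (greedy decoding).
func predictNext(token string) (string, bool) {
	dist, ok := bigram[token]
	if !ok {
		return "", false
	}
	best, bestP := "", -1.0
	for tok, p := range dist {
		if p > bestP {
			best, bestP = tok, p
		}
	}
	return best, true
}

func main() {
	token := "the"
	out := []string{token}
	// Generate until the toy model has no continuation.
	for {
		next, ok := predictNext(token)
		if !ok {
			break
		}
		out = append(out, next)
		token = next
	}
	fmt.Println(out) // [the cat sat down]
}
```

A real LLM does the same thing with a vocabulary of tens of thousands of tokens and a distribution computed by billions of parameters, often sampling from the distribution rather than always taking the argmax.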
ChatGPT is OpenAI’s conversational product built on the GPT model series. What set it apart from raw GPT-3 was the addition of reinforcement learning from human feedback (RLHF) — a technique that fine-tunes the base model to follow instructions and produce responses that human raters judge as helpful and safe. When ChatGPT launched in late 2022, it became one of the fastest-adopted consumer products in history. For developers, the more relevant artefact is the API behind it — specifically the Chat Completions API, which gives programmatic access to the same models powering the product.
Chat Completions API vs Responses API #
OpenAI exposes its models through two primary APIs. The Chat Completions API is the older and more fundamental of the two. It is stateless: you send a list of messages representing the full conversation history, and the model returns the next message. Every request is self-contained. The application is entirely responsible for maintaining that history and re-sending it with every call.
The Responses API, introduced in 2025, takes a different approach. It manages conversation state on the server side, supports built-in tools such as web search and file search, and is designed for agent-oriented use cases where you want OpenAI’s infrastructure to handle more of the orchestration. The trade-off is less control over what is sent to the model and tighter coupling to OpenAI’s platform.
| Feature | Chat Completions API | Responses API |
|---|---|---|
| Conversation state | Client-managed | Server-managed |
| History management | Manual — sent with every request | Automatic |
| Tool support | Manual function calling | Built-in tools (web search, code interpreter) |
| Streaming | Yes | Yes |
| Control | Full | Limited |
| Vendor coupling | Low | Higher |
| Best for | Custom agents, full control | Rapid prototyping, built-in tooling |
In this series we use the Chat Completions API. It requires more wiring on your side, but it gives you a clearer mental model of what is actually happening — which matters when things go wrong in production.
First AI agent #
The official Go SDK for the OpenAI API is openai-go, maintained by
OpenAI directly. It provides typed bindings for all major API endpoints and handles authentication, serialisation, and
retries. It covers Chat Completions, Responses, embeddings, images, and more. For this series, we use it exclusively —
no third-party wrappers.
To call the API, you need a secret key from OpenAI. Create an account at platform.openai.com, navigate to the API Keys section, and generate a new key. Store it in an environment variable and never commit it to version control.
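On a Unix-like system, that setup might look like this; the key value below is a placeholder, not a real key.

```shell
# Set the key for the current shell session (placeholder value).
export OPENAI_API_KEY="sk-..."

# Confirm it is set without printing the key itself.
test -n "$OPENAI_API_KEY" && echo "OPENAI_API_KEY is set"
```

For anything beyond a quick experiment, put the export in a file that is excluded from version control (for example via direnv or your shell profile) rather than typing it each session.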
package main
import (
    // ...
    "github.com/openai/openai-go"
    "github.com/openai/openai-go/option"
)
var history []openai.ChatCompletionMessageParamUnion
var client openai.Client
func main() {
    apiKey := os.Getenv("OPENAI_API_KEY")
    if apiKey == "" {
        fmt.Fprintln(os.Stderr, "error: OPENAI_API_KEY environment variable is not set")
        os.Exit(1)
    }
    client = openai.NewClient(option.WithAPIKey(apiKey))
    history = append(history, openai.SystemMessage(
        "You are a helpful assistant.",
    ))
}

Two things are initialised here: the API client and the conversation history. The client is created with the API key read
from the environment — if the variable is missing, the application exits immediately rather than failing later with a
cryptic authentication error. The history slice is the conversation state. Unlike the Responses API, the Chat Completions API
has no memory of its own. Every request must include the full conversation history so the model has the context it needs
to respond correctly. This slice is where we maintain it. The System Message at the top of the history defines the model’s
behaviour and persona for the entire session. It is always the first message and shapes how the model interprets everything that follows.
import (
    "bufio"
    // ...
)

func talkToAgent() (string, error) {
    return "", nil
}
func main() {
    // ...
    scanner := bufio.NewScanner(os.Stdin)
    fmt.Println("AI assistant ready. Type your question or 'exit' to quit.")
    fmt.Println()
    for {
        fmt.Print("Human: ")
        if !scanner.Scan() {
            break
        }
        input := strings.TrimSpace(scanner.Text())
        if input == "" {
            continue
        }
        if strings.EqualFold(input, "exit") {
            fmt.Println("Bye.")
            break
        }
        history = append(history, openai.UserMessage(input))
        answer, err := talkToAgent()
        if err != nil {
            fmt.Fprintf(os.Stderr, "error: %v\n", err)
            continue
        }
        fmt.Printf("Agent: %s\n\n", answer)
    }
}

The main loop reads from standard input using bufio.Scanner, which
handles line-by-line reading cleanly. Empty inputs are skipped. Typing exit terminates the application. For any other
input, the user’s message is appended to the history as a UserMessage and passed to talkToAgent. The result is printed
back to the terminal. At this stage, talkToAgent is a stub that returns nothing — we fill it in next.
func talkToAgent() (string, error) {
    resp, err := client.Chat.Completions.New(context.Background(), openai.ChatCompletionNewParams{
        Model:    openai.ChatModelGPT4_1Mini,
        Messages: history,
    })
    if err != nil {
        return "", fmt.Errorf("API call failed: %w", err)
    }
    choice := resp.Choices[0]
    history = append(history, choice.Message.ToParam())
    return choice.Message.Content, nil
}

Here Chat.Completions.New sends the full history to the model and returns a completion response. The Model field
specifies which model to use — here we use gpt-4.1-mini to keep costs low during development. The response contains a
Choices slice; we take the first choice, which is the model’s response. The important step is calling ToParam() on
the response message before appending it to the history. This converts the assistant’s response into the format expected
for the chat history, so the model will remember what it said in subsequent turns.
The three GPT-4.1 models cover different points on the cost-quality spectrum:
| Model | Input cost (per 1M tokens) | Output cost (per 1M tokens) | Best for |
|---|---|---|---|
| gpt-4.1 | $2.00 | $8.00 | Complex reasoning, production quality |
| gpt-4.1-mini | $0.40 | $1.60 | Balanced cost and quality |
| gpt-4.1-nano | $0.10 | $0.40 | High-volume, low-complexity tasks |
For development and experimentation, gpt-4.1-mini or gpt-4.1-nano are the right starting point. The quality is
sufficient for most tasks, and the cost difference against gpt-4.1 is significant at scale.
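To put that gap in concrete terms, here is a quick back-of-the-envelope calculation using the prices from the table. The traffic numbers (10,000 requests at 2,000 input and 500 output tokens each) are hypothetical, chosen only to illustrate the scaling.

```go
package main

import "fmt"

// requestCost returns the dollar cost of one request, given token counts
// and per-million-token prices for input and output.
func requestCost(inTokens, outTokens int, inPrice, outPrice float64) float64 {
	return float64(inTokens)/1e6*inPrice + float64(outTokens)/1e6*outPrice
}

func main() {
	const requests = 10_000
	const inTok, outTok = 2_000, 500

	// Prices per 1M tokens, taken from the table above.
	full := requestCost(inTok, outTok, 2.00, 8.00) * requests
	mini := requestCost(inTok, outTok, 0.40, 1.60) * requests

	fmt.Printf("gpt-4.1:      $%.2f\n", full) // $80.00
	fmt.Printf("gpt-4.1-mini: $%.2f\n", mini) // $16.00
}
```

A factor of five on the same workload: negligible during development, but worth a deliberate decision before production traffic arrives.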
AI assistant ready. Type your question or 'exit' to quit.
Human: hi
Agent: Hello! How can I assist you today?
Human: what is my name?
Agent: I am sorry, I do not know your name. Feel free to tell me!
Human: my name is Marko
Agent: Nice to meet you, Marko! How can I help you today?
Human: what is my name?
Agent: Your name is Marko! How can I assist you further?

The key thing to notice in this exchange is that the agent remembers the user’s name. That is not magic — it is the history slice doing its job. Every time a user sends a message and the agent replies, both messages are appended to the history. On the next request, the full history is sent to the model, giving it complete context. Without that history, the model would have no idea who Marko (me!) is.
First tool integration #
Even with full conversation history, the model has a fundamental limitation: it cannot access real-time information. Ask it for the current time, and it will tell you so.
Human: what is the current time in Frankfurt?
Agent: I can't provide real-time information, including the current time. However, Frankfurt is in the
Central European Time Zone (CET), which is UTC+1, and it observes Central European Summer Time (CEST),
which is UTC+2 during the summer months. You can easily check the current time using a clock or a
smartphone. Is there anything else you would like to know?

This is where tools come in. A tool is a function that the model can request to be called on its behalf. You define the tool — its name, description, and parameter schema — and send those definitions alongside the conversation history. When the model determines it needs information it cannot generate from its weights alone, it responds not with text but with a list of tool calls it wants executed. Your application runs those calls, sends the results back, and the model incorporates them into its final response. The model never executes code directly; it only requests calls and receives results.
The integration process has a clear shape: send the tool definitions with every request, check whether the response contains tool calls, execute each call, append the results to the history as tool messages, and send another request. Repeat until the model returns a plain assistant message with no tool calls. This loop is what makes the agent able to use multiple tools in sequence when needed.
The first step is building the function the tool will call.
func getCurrentTime(timezone string) (string, error) {
    location, err := time.LoadLocation(timezone)
    if err != nil {
        return "", fmt.Errorf("failed to load location: %w", err)
    }
    now := time.Now().In(location)
    return now.Format("2006-01-02 15:04:05"), nil
}

This is a plain Go function with no awareness of the LLM. It takes a timezone name, loads the corresponding location, and returns the current time formatted as a string. Keeping tool implementations as ordinary functions is intentional — it keeps them testable and reusable outside the agent context.
With the function in place, we need to describe it to the model and update the request loop.
var getTimeToolDefinition = openai.ChatCompletionToolParam{
    Type: "function",
    Function: openai.FunctionDefinitionParam{
        Name:        "get_time",
        Description: openai.String("Fetch the current time in a given timezone."),
        Parameters: openai.FunctionParameters{
            "type": "object",
            "properties": map[string]interface{}{
                "timezone": map[string]interface{}{
                    "type":        "string",
                    "description": "The international timezone name, e.g. America/Los_Angeles.",
                },
            },
            "required":             []string{"timezone"},
            "additionalProperties": false,
        },
    },
}
func talkToAgent() (string, error) {
    for {
        resp, err := client.Chat.Completions.New(context.Background(), openai.ChatCompletionNewParams{
            Model:    openai.ChatModelGPT4_1Mini,
            Messages: history,
            Tools: []openai.ChatCompletionToolParam{
                getTimeToolDefinition,
            },
        })
        // ...
    }
}

The tool definition is a struct that describes the function to the model. The Name field is what the model uses when
it requests a call. The Description is critical — it is what the model reads to decide whether to use this tool at
all, so it should be precise. The Parameters block follows the JSON Schema format and tells the model exactly what
arguments to provide.
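For reference, once serialised, the definition above reaches the API as JSON along these lines (formatting and field order are mine, reconstructed from the struct rather than captured from the wire):

```json
{
  "type": "function",
  "function": {
    "name": "get_time",
    "description": "Fetch the current time in a given timezone.",
    "parameters": {
      "type": "object",
      "properties": {
        "timezone": {
          "type": "string",
          "description": "The international timezone name, e.g. America/Los_Angeles."
        }
      },
      "required": ["timezone"],
      "additionalProperties": false
    }
  }
}
```

This is the only view of the tool the model ever gets, which is why the name, description, and schema carry all the weight.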
The request loop in talkToAgent is now wrapped in a for loop because a single user message may require multiple
round-trips: the model calls a tool, you return the result, and the model may call another tool before finally producing its answer.
type timeArgs struct {
    Timezone string `json:"timezone"`
}

type timeResult struct {
    Time string `json:"time"`
}

func callTool(call openai.ChatCompletionMessageToolCall) (string, error) {
    // ...
}

func talkToAgent() (string, error) {
    for {
        resp, err := client.Chat.Completions.New(context.Background(), openai.ChatCompletionNewParams{
            Model:    openai.ChatModelGPT4_1Mini,
            Messages: history,
            Tools: []openai.ChatCompletionToolParam{
                getTimeToolDefinition,
            },
        })
        if err != nil {
            return "", fmt.Errorf("API call failed: %w", err)
        }
        choice := resp.Choices[0]
        history = append(history, choice.Message.ToParam())
        if len(choice.Message.ToolCalls) == 0 {
            return choice.Message.Content, nil
        }
        for _, call := range choice.Message.ToolCalls {
            fmt.Printf(" [tool] %s(%s)\n", call.Function.Name, call.Function.Arguments)
            result, err := callTool(call)
            if err != nil {
                return "", err
            }
            history = append(history, openai.ToolMessage(result, call.ID))
        }
    }
}

When the model returns a response with no tool calls, ToolCalls is empty and we return the content directly to the
caller. When it does contain tool calls, we iterate over each one, print it for visibility, execute it via callTool,
and append the result to the history as a ToolMessage tied to the call’s ID. The model uses that ID to match results
back to the requests it made. The loop then fires another request with the updated history, and the cycle continues
until the model is satisfied.
func callTool(call openai.ChatCompletionMessageToolCall) (string, error) {
    switch call.Function.Name {
    case "get_time":
        var args timeArgs
        if err := json.Unmarshal([]byte(call.Function.Arguments), &args); err != nil {
            return "", fmt.Errorf("failed to parse tool arguments: %w", err)
        }
        currentTime, err := getCurrentTime(args.Timezone)
        if err != nil {
            errMsg, _ := json.Marshal(map[string]string{"error": err.Error()})
            return string(errMsg), nil
        }
        result := timeResult{
            Time: currentTime,
        }
        out, err := json.Marshal(result)
        if err != nil {
            return "", fmt.Errorf("failed to marshal tool result: %w", err)
        }
        return string(out), nil
    default:
        return "", fmt.Errorf("unknown tool: %q", call.Function.Name)
    }
}

The function callTool is a router. It receives the tool call struct from the model, switches on the function name,
and dispatches to the appropriate implementation. For get_time, it unmarshals the JSON arguments the model provided,
calls getCurrentTime, and serialises the result back to JSON. If the tool execution fails — for example, an invalid
timezone — the error is returned as a JSON payload rather than bubbling up to the caller. This keeps the loop running:
the model receives the error, understands what went wrong, and can either try a corrected call or inform the user
gracefully. Any tool name the application does not recognise returns an error, which surfaces cleanly through the loop.
The last piece is updating the system prompt to give the model clear instructions about when to use the tool.
func main() {
    // ...
    client = openai.NewClient(option.WithAPIKey(apiKey))
    history = append(history, openai.SystemMessage(
        "You are a helpful time assistant. "+
            "When asked about time at particular locations, "+
            "always use the get_time tool to get current time in a given timezone. "+
            "Never guess the time.",
    ))
    // ...
}

The system prompt now explicitly instructs the model to use the get_time tool whenever time at a specific location
is requested. Without this instruction, the model might attempt to reason from its training data and produce a plausible
but wrong answer. The instruction removes the ambiguity, especially by saying that the model should never guess the time
itself.
AI assistant ready. Type your question or 'exit' to quit.
Human: hi
Agent: Hello! How can I assist you today?
Human: what is the current time in Frankfurt?
[tool] get_time({"timezone":"Europe/Berlin"})
Agent: The current time in Frankfurt is 14:23 on March 5, 2026.

The [tool] line shows the model calling get_time with the correct IANA timezone for Frankfurt. The model resolved
“Frankfurt” to Europe/Berlin on its own, based on the tool description and its world knowledge. The full source for
this example is available at llm-and-golang-examples.
Conclusion #
The Chat Completions API is the right starting point for anyone building LLM-powered features in Go. It is explicit, stateless, and gives you full control over what the model sees. The history management is your responsibility, but that also means you understand exactly what is happening in every request. Adding tools extends the model’s reach into live data without complicating the core loop much — define the tool, check for calls, execute, return results, repeat. This pattern scales to multiple tools cleanly. The next article in this series will look at the Responses API and where it makes more sense than the Chat Completions approach.