
2026

LLM and Go: OpenAI integration via Responses API

·2365 words·12 mins·
The previous two articles in this series covered the Chat Completions API — how to set up a client, maintain conversation history manually, call external tools, and control output structure with response_format. That API gives you full control and a clear mental model of what goes over the wire. This article covers the other primary OpenAI interface: the Responses API.

The Responses API moves conversation state from the client to OpenAI’s servers. You no longer maintain a history slice and re-send it with every call. Instead, you track a response ID and pass it back on the next request. That is a meaningful shift for agent-oriented applications — less maintenance, but also less transparency. Understanding the trade-offs between the two APIs is worth doing before choosing which one to build on.

Responses API #

OpenAI introduced the Responses API in 2025, positioning it as the foundation for building agents. The Chat Completions API is stateless — every request must carry the full conversation history, and the client owns that state entirely. The Responses API inverts this: conversation state lives on OpenAI’s servers, and you reference previous turns by ID rather than re-sending them. Both APIs give you access to the same underlying models and tool-calling mechanics. The difference is where the orchestration responsibility sits. The table below, first introduced in the Chat Completions API article, summarizes the trade-offs:

| Feature | Chat Completions API | Responses API |
|---|---|---|
| Conversation state | Client-managed | Server-managed |
| History management | Manual — sent with every request | Automatic |
| Tool support | Manual function calling | Built-in tools (web search, code interpreter) |
| Streaming | Yes | Yes |
| Control | Full | Limited |
| Vendor coupling | Low | Higher |
| Best for | Custom agents, full control | Rapid prototyping, built-in tooling |

The Chat Completions API is the right default when you want to control exactly what the model sees and when you need portability across providers.
The Responses API reduces boilerplate and fits well when you want to prototype quickly or lean on OpenAI’s managed tooling. In this article we build the same conversational agent we built before — but with the Responses API driving state management.

First AI agent #

The openai-go SDK covers both APIs under one package. The same client initialization you used for Chat Completions works here. To call the API, you need a secret key from platform.openai.com — navigate to the API Keys section, generate a key, and store it in an environment variable.
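The shift from a client-held history slice to a server-side response ID is easiest to see at the wire level. The sketch below builds the minimal JSON body for a `POST /v1/responses` call: the first turn omits `previous_response_id`, and each later turn passes back the `id` the server returned. The model name and the `resp_abc123` ID are placeholder assumptions, and nothing is actually sent over the network here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// responsesRequest mirrors the minimal JSON body of POST /v1/responses.
// previous_response_id links this turn to the conversation state that
// OpenAI keeps server-side; omitempty drops it on the first turn.
type responsesRequest struct {
	Model              string `json:"model"`
	Input              string `json:"input"`
	PreviousResponseID string `json:"previous_response_id,omitempty"`
}

// buildBody marshals a request body; an empty prevID means "start a new conversation".
func buildBody(model, input, prevID string) ([]byte, error) {
	return json.Marshal(responsesRequest{Model: model, Input: input, PreviousResponseID: prevID})
}

func main() {
	// First turn: no previous response to reference.
	first, _ := buildBody("gpt-4o-mini", "What is the capital of France?", "")
	fmt.Println(string(first))
	// → {"model":"gpt-4o-mini","input":"What is the capital of France?"}

	// Pretend the server answered with ID "resp_abc123"; chain the next turn to it
	// instead of re-sending the whole history.
	next, _ := buildBody("gpt-4o-mini", "And its population?", "resp_abc123")
	fmt.Println(string(next))
	// → {"model":"gpt-4o-mini","input":"And its population?","previous_response_id":"resp_abc123"}
}
```

The point to notice is what is absent: no messages array and no accumulated history — a single ID stands in for every previous turn.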

LLM and Go: Investigating OpenAI Chat Completions API

·2132 words·11 mins·
In the previous article I covered the fundamentals of the Chat Completions API: setting up a client, maintaining conversation history, and integrating tools. That was enough to build a working conversational agent. This article goes a level deeper — into the API parameters that shape what the model returns and how it thinks. Two parameters stand out as particularly useful in production: response_format and reasoning_effort. The first gives you control over the structure of the model’s output. The second controls how much the model reasons before responding — which turns out to matter more than you might expect once you start caring about latency and cost.

Chat Completions API details #

The Chat Completions API endpoint accepts a rich set of parameters. Most have sensible defaults and you will rarely touch them, but understanding what is available saves you from reaching for workarounds that already exist in the API. The table below covers the current non-deprecated parameters from the API reference:

| Parameter | Type | Description |
|---|---|---|
| model | string | ID of the model to use |
| messages | array | Conversation history as an ordered list of messages |
| response_format | object | Output format: text, json_object, or json_schema |
| reasoning_effort | string | Reasoning intensity for reasoning models: low, medium, high |
| temperature | number | Sampling temperature from 0 to 2; higher values produce more random output |
| top_p | number | Alternative to temperature; nucleus sampling probability mass |
| max_completion_tokens | integer | Maximum tokens the model may generate in the response |
| n | integer | Number of completion choices to return |
| stream | boolean | Stream partial responses as server-sent events |
| stop | string/array | Sequences at which the API stops generating |
| presence_penalty | number | Penalises new tokens based on whether they appear in the text so far |
| frequency_penalty | number | Penalises new tokens based on their frequency in the text so far |
| tools | array | List of tools (functions) the model may call |
| tool_choice | string/object | Controls which tool the model calls |
| seed | integer | Seed for deterministic sampling |
| user | string | Unique identifier for the end user |

In this article we focus on response_format and reasoning_effort — two parameters with a direct, visible impact on production systems.

Information extraction with response_format #

The response_format parameter controls how the model structures its output. The default is plain text. Setting it to json_object tells the model to return valid JSON, but gives you no control over the schema. Setting it to json_schema goes further: you provide a JSON Schema document and the model guarantees its output will conform to it. OpenAI calls this structured output.
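To make json_schema concrete, here is a minimal sketch in plain Go of the response_format object such a request carries. The `structuredFormat` helper and the contact-extraction schema are hypothetical illustrations, not part of any SDK; the surrounding shape (`type: "json_schema"` wrapping a named, strict schema) follows the structured-output request format, which also requires `additionalProperties: false` and every property listed in `required`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// structuredFormat builds the response_format field of a Chat Completions
// request for structured output: a named JSON Schema with strict enforcement.
func structuredFormat(name string, schema map[string]any) map[string]any {
	return map[string]any{
		"type": "json_schema",
		"json_schema": map[string]any{
			"name":   name,
			"strict": true, // the model's output must conform exactly to the schema
			"schema": schema,
		},
	}
}

func main() {
	// A hypothetical extraction schema: pull a name and an email out of free text.
	schema := map[string]any{
		"type": "object",
		"properties": map[string]any{
			"name":  map[string]any{"type": "string"},
			"email": map[string]any{"type": "string"},
		},
		"required":             []string{"name", "email"},
		"additionalProperties": false,
	}
	body, _ := json.MarshalIndent(structuredFormat("contact", schema), "", "  ")
	fmt.Println(string(body))
}
```

With this in place, the model's reply is guaranteed to unmarshal cleanly into a matching Go struct — no defensive parsing of free-form text.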

LLM and Go: OpenAI Integration via Chat Completions API

·2721 words·13 mins·
For most of my career, integrating external intelligence into an application meant calling a rules engine, training a custom classifier, or encoding business logic that someone had painfully documented in a spreadsheet. The idea that I could describe a task in plain language and have a model respond with genuine reasoning was not something I expected to become production-ready in my working life. Then GPT happened, and it changed what backend developers need to know.

This article is the first in a series on using LLMs in Go. We start with the OpenAI Chat Completions API — the stateless, request-based interface that gives you direct control over every aspect of the conversation. By the end, you will have a working conversational agent that can call external tools to answer questions it otherwise could not.

A short introduction to ChatGPT and OpenAI #

The path to large language models runs through a decade of incremental progress in deep learning. Early models like word2vec and GloVe learned to embed words into dense vector spaces, capturing semantic relationships between terms. The transformer architecture, introduced by Google in 2017, changed the trajectory of the field — it processes sequences in parallel using attention mechanisms that capture long-range dependencies far more effectively than recurrent networks. This architectural shift made it practical to train models on orders of magnitude more data. GPT-1 in 2018 showed that large-scale unsupervised pre-training followed by fine-tuning could match or beat purpose-built models across a range of language tasks.

Understanding what these models actually do removes a lot of the mysticism around them. An LLM is, at its core, a next-token predictor. It takes a sequence of tokens as input and outputs a probability distribution over the vocabulary for the next token. The transformer’s attention mechanism allows every token in the input to attend to every other token, building a rich contextual representation before making that prediction. Training adjusts billions of parameters to minimise prediction error across enormous text corpora. What emerges is a model with broad world knowledge encoded in its weights — not because it was taught facts directly, but because predicting text well requires internalising the structure of the world that produced that text.

ChatGPT is OpenAI’s conversational product built on the GPT model series. What set it apart from raw GPT-3 was the addition of reinforcement learning from human feedback (RLHF) — a technique that fine-tunes the base model to follow instructions and produce responses that human raters judge as helpful and safe. When ChatGPT launched in late 2022, it became one of the fastest-adopted consumer products in history. For developers, the more relevant artefact is the API behind it — specifically the Chat Completions API, which gives programmatic access to the same models powering the product.