What is LLM function calling? How AI agents connect to external tools and APIs

LLM function calling lets agents generate structured JSON to invoke external tools instead of hallucinating data. This article covers how it works and its most common use cases.

13 min

Your AI agent confidently returns stock prices from its training data. But the numbers are two years out of date. Users complain, and you realize the model can reason about problems but can't access live data.

LLM function calling solves this by letting language models generate structured JSON requests for external tools rather than free-form text. When a user asks for real-time stock prices, the model doesn't hallucinate numbers. Instead, it outputs structured JSON like {"function": "get_stock_price", "parameters": {"symbol": "AAPL"}}. Your application executes this against a live API and returns accurate information.

Language models can reason about problems, but they can't directly trigger external actions. Function calling bridges this gap to turn models from passive reasoning systems into active orchestrators. The model specifies which tools to invoke and what parameters to use. Meanwhile, your application handles execution. This separation maintains security while creating genuine utility.
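This division of labor can be sketched in a few lines. The snippet below is illustrative, not any provider's SDK: `TOOLS` is a hypothetical registry, and `get_stock_price` is a stub standing in for a real market-data API.

```python
import json

# Hypothetical tool registry mapping function names to Python callables.
# get_stock_price is a stub standing in for a real market-data API call.
TOOLS = {
    "get_stock_price": lambda symbol: {"symbol": symbol, "price": 189.84},
}

def handle_model_output(model_output: str) -> dict:
    """Parse the model's structured output and dispatch to the matching tool.

    The model only specifies the call; this application code executes it.
    """
    call = json.loads(model_output)
    fn = TOOLS[call["function"]]
    return fn(**call["parameters"])

result = handle_model_output(
    '{"function": "get_stock_price", "parameters": {"symbol": "AAPL"}}'
)
print(result)  # {'symbol': 'AAPL', 'price': 189.84}
```

The model never touches the API itself; everything after `json.loads` is application code you control, which is where security boundaries belong.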

Function calling directly impacts your risk surface, compliance posture, operational overhead, and infrastructure costs as you scale AI agents into production. Understanding these tradeoffs helps you design secure, performant, and cost‑effective systems rather than inheriting them as unplanned technical debt. This article covers how function calling works, the infrastructure challenges it creates, and common implementation patterns.

What is LLM function calling?

Function definitions use JSON Schema to describe available tools. Each schema specifies the function name, a description the model uses to decide when to call it, and parameter definitions with types and constraints. The description matters more than you might expect. Models rely on it to match user intent to the right function.
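Here is a sketch of what such a definition looks like. The function name, fields, and constraints are illustrative; exact envelope details vary slightly by provider.

```python
# A function definition in the JSON Schema style most providers accept.
# The "description" fields do the routing work: the model matches user
# intent against them, so they should be specific rather than generic.
get_order_history = {
    "name": "get_order_history",
    "description": (
        "Retrieve a customer's recent orders, newest first. "
        "Use for questions about past or recent purchases."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {
                "type": "string",
                "description": "Internal customer ID",
            },
            "limit": {
                "type": "integer",
                "minimum": 1,
                "maximum": 50,
                "description": "Maximum number of orders to return",
            },
        },
        "required": ["customer_id"],
    },
}
```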

When multiple functions are available, the model evaluates which one best matches the user's request. A query about "recent orders" routes to get_order_history() rather than get_product_details() based on semantic matching against function descriptions. But ambiguous queries sometimes trigger the wrong function or no function at all. Clear, specific descriptions reduce routing errors.

Your application must handle malformed outputs. Models occasionally generate invalid JSON, reference nonexistent functions, or pass parameters that fail type validation. Robust implementations wrap function call parsing in try-catch blocks, validate against the original schema before execution, and return structured errors that help the model self-correct on retry.
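A minimal version of that validation layer might look like the following. The registry and the `customer_id` check are assumptions for illustration; real implementations validate against the full schema.

```python
import json

KNOWN_FUNCTIONS = {"get_order_history"}  # hypothetical function registry

def parse_function_call(raw: str):
    """Validate a model-generated function call before executing anything.

    Returns (call, None) on success, or (None, error) with a structured
    error object the model can use to self-correct on retry.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, {"error": "invalid_json", "message": str(exc)}

    name = call.get("function")
    if name not in KNOWN_FUNCTIONS:
        return None, {"error": "unknown_function",
                      "message": f"No function named {name!r}"}

    params = call.get("parameters", {})
    if not isinstance(params.get("customer_id"), str):
        return None, {"error": "invalid_parameters",
                      "message": "customer_id must be a string"}
    return call, None

# Type mismatch: customer_id arrives as an integer, not a string.
call, err = parse_function_call(
    '{"function": "get_order_history", "parameters": {"customer_id": 42}}'
)
print(err["error"])  # invalid_parameters
```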

A practical example: coding agents for PR review

Consider a coding agent that helps engineering teams analyze and review code changes. Without function calling, the agent can only discuss code review best practices in general terms.

With function calling, the agent becomes genuinely useful. It can access the codebase through functions like fetch_pull_request(), run_tests(), and check_coverage().

When a developer asks the agent to review a pull request, the model calls fetch_pull_request(pr_id="1234") to retrieve the code changes for analysis. It then executes run_tests(branch="feature-x") to run the test suite in an isolated sandbox environment. Finally, it calls check_coverage(files=["auth.py", "utils.py"]) to analyze test coverage.

The agent orchestrates results from multiple function calls into a coherent response. This pattern extends to other coding agent use cases: generating code, executing it in sandboxes, analyzing results, and iterating based on feedback.
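The sequential shape of that workflow can be sketched with stubbed tools. In a real agent the model decides each call; here the stubs stand in for the GitHub API and a CI sandbox, and the return values are invented for illustration.

```python
# Stubbed tools standing in for real integrations (VCS API, CI sandbox).
def fetch_pull_request(pr_id):
    return {"pr_id": pr_id, "files": ["auth.py", "utils.py"]}

def run_tests(branch):
    return {"branch": branch, "passed": 41, "failed": 1}

def check_coverage(files):
    return {f: 0.9 for f in files}

def review_pr(pr_id, branch):
    """Sequential workflow: each step's output feeds the next call."""
    pr = fetch_pull_request(pr_id)
    tests = run_tests(branch)
    coverage = check_coverage(pr["files"])
    # A real agent would have the model synthesize these into prose;
    # here we simply combine the raw results.
    return {"pr": pr, "tests": tests, "coverage": coverage}

summary = review_pr("1234", "feature-x")
print(summary["tests"]["failed"])  # 1
```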

How does LLM function calling work?

Function schemas follow JSON Schema Draft 2020-12 with provider-specific extensions. Each function definition includes a name, description, and parameters object. The parameters object defines argument types, required fields, enums for constrained values, and nested object structures. OpenAI's implementation adds a strict mode that enforces exact schema compliance. Anthropic requires explicit type annotations for each parameter.

The model doesn't switch between "text mode" and "function mode" as a binary state. During token generation, the model predicts the next token from its vocabulary, which includes special tokens that signal function call intent. When the model generates a function call token, subsequent tokens are constrained to produce valid JSON matching the provided schemas. This constrained decoding ensures syntactic validity but doesn't guarantee semantic correctness. The model might generate valid JSON with nonsensical parameter values.

Parallel and sequential function calling follow different patterns. Sequential calling waits for each result before proceeding. The model calls function A, receives the result, then decides whether to call function B based on that result. Parallel calling allows the model to request multiple functions simultaneously when the calls are independent. For example, a weather agent might request forecasts for three cities in parallel rather than waiting for each response sequentially. Your application receives an array of function calls and can execute them concurrently.
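Executing such an array of independent calls concurrently is straightforward with a thread pool. The forecast function below is a stub with a simulated delay; the point is that three calls complete in roughly one call's latency instead of three.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def get_forecast(city):
    time.sleep(0.1)  # simulate the network latency of one API call
    return {"city": city, "high_f": 72}

# The model returned an array of independent function calls.
calls = [{"function": "get_forecast", "parameters": {"city": c}}
         for c in ["Austin", "Boston", "Chicago"]]

# Independent calls: execute concurrently rather than one at a time.
# pool.map preserves input order, so results line up with the calls.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda c: get_forecast(**c["parameters"]), calls))

print([r["city"] for r in results])  # ['Austin', 'Boston', 'Chicago']
```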

Multi-turn context accumulates through message history. Each function call and its result become part of the conversation context. The model sees the full sequence: user message, function call request, function result, assistant response. This history consumes tokens. Long workflows with many function calls can exceed context limits. So production systems often summarize or prune older function results to stay within token budgets.
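One simple pruning strategy, assuming a chat-style message list with a `tool` role, is to collapse the bodies of older tool results while keeping the message sequence intact so the model still sees that the calls happened:

```python
def prune_tool_results(messages, keep_last=2, placeholder="[result truncated]"):
    """Collapse older tool results so the history stays within budget.

    Keeps the most recent `keep_last` tool results intact; earlier ones
    are replaced with a placeholder while preserving message order.
    """
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    to_prune = tool_indices[:-keep_last] if keep_last else tool_indices
    for i in to_prune:
        messages[i] = {**messages[i], "content": placeholder}
    return messages

history = [
    {"role": "user", "content": "Review PR 1234"},
    {"role": "tool", "content": "diff: 400 lines ..."},
    {"role": "tool", "content": "tests: 41 passed, 1 failed"},
    {"role": "tool", "content": "coverage: auth.py 90%"},
]
pruned = prune_tool_results(history, keep_last=2)
print(pruned[1]["content"])  # [result truncated]
```

Summarizing old results instead of truncating them preserves more signal at the cost of an extra model call; which tradeoff is right depends on the workflow.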

Error handling requires structured feedback. When a function fails, return an error object that the model can interpret. Include the error type, a human-readable message, and suggested corrections if applicable. Models can often self-correct on retry when given specific feedback about what went wrong. A response like {"error": "invalid_date_format", "message": "Expected ISO 8601 format", "received": "March 5"} gives the model enough information to reformulate the request correctly.
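The self-correction loop around that error format looks roughly like this. `get_events` is a hypothetical tool, and `fake_model_retry` stands in for the LLM reformulating its call after reading the structured error:

```python
from datetime import date

def get_events(on: str):
    """Hypothetical tool that requires an ISO 8601 date string."""
    try:
        day = date.fromisoformat(on)
    except ValueError:
        # A structured error the model can act on, not a raw stack trace.
        return {"error": "invalid_date_format",
                "message": "Expected ISO 8601 format (YYYY-MM-DD)",
                "received": on}
    return {"date": day.isoformat(), "events": []}

def fake_model_retry(previous_params, error):
    """Stand-in for the LLM reformulating its call after seeing the error."""
    if error["error"] == "invalid_date_format":
        return {"on": "2025-03-05"}  # the model maps "March 5" to ISO 8601
    return previous_params

result = get_events("March 5")
if "error" in result:
    result = get_events(**fake_model_retry({"on": "March 5"}, result))
print(result["date"])  # 2025-03-05
```

In production, cap the number of retries; a model that keeps producing the same malformed call should fail fast rather than loop.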

The infrastructure layer

Function calling documentation emphasizes prompt engineering and schema design. But production deployments reveal a more complex infrastructure challenge. The LLM outputs structured function specifications. Actual function execution occurs across diverse infrastructure environments.

The isolation technology you choose determines your security posture. Containers share the host operating system kernel. This creates potential container escape vectors when executing untrusted LLM-generated code. An attacker who compromises a container can potentially access the host system or other tenants' workloads.

MicroVMs solve this problem through hardware-level isolation. Each workload runs in its own kernel with dedicated memory boundaries enforced by the hypervisor. Executing untrusted code in multi-tenant environments requires stronger isolation than containers provide. While containers boot quickly with minimal overhead, they don't meet the security requirements for production AI agents executing arbitrary code.

Infrastructure latency becomes visible when agents make multiple sequential function calls. Traditional serverless cold starts take two to five seconds to initialize a new execution environment, depending on the runtime.

For multi-call workflows, this compounds. At two to five seconds per cold start, a three-call sequential workflow can accumulate six to fifteen seconds of infrastructure overhead before any model reasoning happens. Some specialized platforms address this by keeping execution environments in standby mode so they can resume in milliseconds with complete state preserved.

Agent co-hosting further reduces latency by running agent logic and function execution in the same environment. Traditional architectures require network round-trip between the orchestrator and the function execution layer. Co-hosted agents eliminate this overhead entirely. The agent and its tools share the same memory space.

State management across function calls

LLMs are stateless. Each function call executes independently unless context is explicitly preserved. Production systems require deliberate architectural decisions.

Traditional serverless offers effectively unlimited horizontal scaling and pay-per-use pricing. However, it requires external state stores for multi-turn workflows. Redis provides sub-100ms access for active sessions, and databases add durability, but running both increases complexity and cost.

Some specialized platforms take a different approach. They maintain state automatically during active sessions without external stores. State persists across function calls within a session. You write to durable storage only when the session ends or on periodic intervals.

The production-grade pattern for traditional serverless loads agent state from Redis for fast read/write access, then writes to a database on periodic intervals for durability. This hybrid keeps hot-path latency low while limiting how often you pay for durable writes, though you still operate and reconcile two storage systems.
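The pattern can be sketched with an in-memory dict standing in for Redis and another for the database; the class, interval, and session shape are all illustrative:

```python
import time

class SessionStore:
    """Hybrid state pattern: a fast cache (standing in for Redis) on the
    hot path, flushed to a durable store (standing in for a database)
    on a periodic interval."""

    def __init__(self, flush_interval_s=30):
        self.cache = {}      # hot path: sub-100ms on a real Redis
        self.durable = {}    # stand-in for a database table
        self.flush_interval_s = flush_interval_s
        self._last_flush = time.monotonic()

    def write(self, session_id, state):
        self.cache[session_id] = state
        if time.monotonic() - self._last_flush >= self.flush_interval_s:
            self.flush()

    def read(self, session_id):
        # Prefer the cache; fall back to durable storage on cache miss.
        return self.cache.get(session_id) or self.durable.get(session_id)

    def flush(self):
        self.durable.update(self.cache)
        self._last_flush = time.monotonic()

store = SessionStore(flush_interval_s=0)  # flush on every write, for the demo
store.write("sess-1", {"turn": 3, "pending_tool": "run_tests"})
print(store.durable["sess-1"]["turn"])  # 3
```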

Advantages of LLM function calling

Function calling creates several measurable advantages over traditional text-only completions:

  • Structured output allows reliable integration: Function calling produces JSON that validates against predefined schemas before API calls, replacing brittle free-text parsing and reducing the risk of hallucinated values.
  • Real-time data access overcomes training limitations: Models can query databases, fetch current prices, and retrieve customer records during conversations. For instance, an e-commerce agent checks live inventory instead of relying on stale training data.
  • Multi-step workflows become possible: Agents can orchestrate complex sequences: query a database, call a search API, execute generated code, and synthesize results.
  • Function calling becomes the universal interface: Users interact with multiple backend systems through structured API integration rather than learning specialized UIs.

These capabilities transform what agents can accomplish, but they introduce new requirements around security, latency, and state management that teams must address.

Challenges of LLM function calling

Production deployments face several challenges that require deliberate architectural decisions:

  • Security vulnerabilities multiply: Standard containers share the host kernel, creating escape vectors when executing LLM-generated code. Production systems require microVM isolation.
  • Privilege escalation risks emerge: According to OWASP's Top 10 for LLM Applications 2025, improper validation of LLM outputs can lead to downstream security exploits. This includes code execution that compromises systems.
  • Data exfiltration becomes possible: Prompt injection attacks can manipulate agents into extracting sensitive information through function calls. Multi-tenant environments face additional risks. Embeddings from one customer could be inadvertently retrieved for another.
  • Latency overhead affects user experience: Each function call adds network round trips, infrastructure boot time, and execution duration.
  • State management adds complexity: Multi-turn conversations require external mechanisms to preserve context.

These challenges explain why many agent projects stall between prototype and production. The gap isn't in the LLM capabilities but in the infrastructure required to run function execution securely and reliably.

Common use cases of LLM function calling

Function calling patterns vary by workload type. Understanding where they're most established helps teams evaluate whether their infrastructure can support the requirements each pattern creates.

Coding agents represent the most validated production use case. These agents generate code, execute it in sandboxes, analyze results, and iterate. PR review agents fetch code changes, run test suites in isolated environments, and analyze coverage. Code generation agents write functions, execute them against test cases, and refine based on failures. Both patterns require secure isolation for untrusted code, fast execution for interactive feedback, and state persistence across complex multi-file projects.

Teams building agent‑native developer tools often combine coding agents with real‑time previews and fast sandbox provisioning, like how SpawnLabs enables real‑time previews for coding agents using Blaxel.

Other popular LLM function calling use cases include:

  • Data analysis workflows convert natural language questions into SQL queries, execute them against databases, and produce readable reports.
  • Customer support systems access order history, initiate refunds, schedule callbacks, and update account information based on customer needs.
  • Enterprise knowledge systems query internal databases, retrieve documents, and enforce access controls during conversations.

Each of these use cases requires security, speed, and state management simultaneously. Traditional infrastructure forces tradeoffs between these requirements.

How to secure your function calling implementation

Production AI agents executing LLM-generated code require security at multiple layers. Input validation, permission scoping, and infrastructure isolation each address different attack vectors.

Validate all function call outputs before execution

Schema validation catches type mismatches and missing required fields. But semantic validation matters too. Check that referenced IDs exist in your systems. Verify that requested operations fall within the user's permissions. Rate limit function calls to prevent runaway execution.

For sensitive operations like payments or data deletion, require explicit human approval regardless of how confident the model appears.
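The checks above can be layered into a single gate that runs before any execution. Everything here is a sketch under assumed names: the sensitive-operation list, the permission set, and the ID check are illustrative.

```python
SENSITIVE_OPERATIONS = {"issue_refund", "delete_account"}  # illustrative list

def authorize_call(call, user_permissions, known_order_ids):
    """Layered checks before executing a model-requested function call.

    Returns (allowed, reason). Sensitive operations are always routed
    to human approval, regardless of how confident the model appears.
    """
    name, params = call["function"], call["parameters"]

    if name not in user_permissions:
        return False, "operation not permitted for this user"
    if "order_id" in params and params["order_id"] not in known_order_ids:
        return False, "referenced order does not exist"
    if name in SENSITIVE_OPERATIONS:
        return False, "requires explicit human approval"
    return True, "ok"

allowed, reason = authorize_call(
    {"function": "issue_refund", "parameters": {"order_id": "A-100"}},
    user_permissions={"issue_refund", "get_order_history"},
    known_order_ids={"A-100"},
)
print(allowed, reason)  # False requires explicit human approval
```

Rate limiting would wrap this gate with a per-session call counter; the ordering matters, since cheap checks should reject before expensive lookups run.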

Scope permissions to the minimum required

Each function should have access only to the data and operations it needs. A function that reads order history shouldn't have write access to the orders table. A function that generates code shouldn't have network access unless specifically required.

Implement these boundaries at the infrastructure level, not just in application code. If a malicious prompt tricks the model into requesting elevated permissions, the infrastructure should reject the request.

Sanitize outputs before returning to users

Agents can inadvertently include sensitive information in responses. A customer support agent might expose internal IDs, pricing logic, or data from other customers if outputs aren't filtered.

Implement output validation that checks for PII, internal identifiers, and data that shouldn't leave the system. Make sure to log all function calls and their results for audit trails.
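A minimal redaction filter might look like the following. The two patterns are deliberately narrow examples; production filters need far broader coverage (names, addresses, tokens) and the `cust_` ID format is an assumption.

```python
import re

# Illustrative patterns only; real deployments need much broader coverage.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.\w+"),
    "internal_id": re.compile(r"\bcust_[0-9a-f]{8}\b"),  # assumed ID format
}

def sanitize_output(text: str) -> str:
    """Redact PII and internal identifiers before text reaches the user."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} redacted]", text)
    return text

raw = "Refund issued for cust_1a2b3c4d, confirmation sent to jane@example.com."
print(sanitize_output(raw))
# Refund issued for [internal_id redacted], confirmation sent to [email redacted].
```

Pattern-based redaction is a backstop, not a substitute for scoping what data functions can return in the first place.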

Use VM-level isolation for code execution

Running untrusted code in containers creates escape risks because containers share the host kernel. MicroVMs provide hardware-enforced boundaries where each workload runs in its own kernel. This prevents a compromised sandbox from accessing the host system or other tenants' data.

Try a perpetual sandbox platform for LLM function calling

Coding agents represent the most validated production use case for function calling infrastructure. These agents generate code, execute it in isolated environments, analyze results, and iterate. They require the exact combination of security isolation, low latency, and state persistence that traditional infrastructure struggles to provide.

Blaxel is a perpetual sandbox platform that provides microVM-based compute environments designed specifically for AI agent workloads. Each sandbox runs in its own hardware-enforced boundary, eliminating container escape risks. Sandboxes resume from standby in under 25ms with complete state preserved. After 15 seconds of inactivity, they scale to zero but maintain filesystem and memory for instant resume. So you pay compute costs only during active execution.

Function calling is fundamentally about tool execution. MCP Servers Hosting deploys your tool servers as serverless endpoints with automatic authentication, rate limiting, and built-in observability. Agents discover and invoke tools dynamically through the Model Context Protocol without hardcoded function definitions. Each MCP server boots in 25ms and handles requests up to 15 minutes in duration. For data analysis workflows that split datasets across parallel queries, Batch Jobs scales to thousands of concurrent executions.

Sign up free to deploy your first agent, or book a demo to discuss your infrastructure requirements with the Blaxel team.

FAQs about LLM function calling

What is the difference between function calling and tool use in LLMs?

Function calling and tool use describe the same capability. Different providers use varying terminology. The underlying architecture is identical. The model outputs structured JSON specifying which external function to invoke and what parameters to pass. Your application executes the function and returns results to the model.

Which major LLM providers support function calling?

Major frontier models from OpenAI, Anthropic, and Google all support function calling through their APIs. OpenAI uses explicit JSON tool definitions through a tools parameter. Anthropic implements through tool use parameters. Google Gemini supports both OpenAPI-compatible JSON Schema and Python function definitions.

How does function calling affect LLM costs?

Function calling increases costs in two ways. First, function schemas consume tokens in every request. Second, multi-turn function calling workflows require multiple API calls. Optimize by including only relevant functions per request and consolidating related tools where possible.

What happens when an LLM hallucinates function parameters?

Your application must validate all function call outputs before execution. Implement schema validation to catch type mismatches. Verify that referenced IDs exist in your systems. Apply rate limiting to prevent runaway function calls. For security-sensitive operations, require human approval or use deterministic validation rules independent of LLM reasoning.