Tutorial 3 — Adding AI to Your Worker

What Is the AI Binding?

Every Cloudflare Worker can be given access to powerful AI models, including Meta’s Llama models, running directly on Cloudflare’s network. There is no API key to manage and no separate account to create, and Workers AI includes a free usage allocation that is typically enough for small experiments like the ones in this series. You connect it like any other binding (the same way you’ll later connect storage or a database), and one line of code does the rest.

💡 This is the foundation of every chat app in this series. PrivateAI, Qluv, DistantGhost — every one of them is this same binding, called from a Worker exactly like the one you built in Tutorial 2.

Step 1 — Add the AI Binding

1. Open your Worker’s settings

Go to your Worker from Tutorial 2 in the Cloudflare dashboard. Click the Bindings tab.

2. Add a binding

Click Add a binding. Choose Workers AI from the list. For the variable name, type exactly: AI (all capitals). Click Save and deploy.

💡 Why capital AI? Whatever name you give the binding here is the exact name you’ll use in your code as env.AI. Naming it AI keeps your code clean and matches every example in this series.
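
To make the link between the binding name and env.AI concrete, here is a minimal sketch of a guard that checks for the binding before using it. The helper name getAiBinding, the error wording, and the stand-in env objects are hypothetical and not part of Cloudflare's API; the only thing taken from this tutorial is that the binding appears on env under the exact variable name you typed.

```javascript
// Hypothetical helper: returns the Workers AI binding, or throws a readable
// error if the binding is missing or was saved under a different name.
function getAiBinding(env) {
  if (!env || typeof env.AI === "undefined") {
    throw new Error('No binding named "AI" found. Check the variable name on the Bindings tab.');
  }
  return env.AI;
}

// Stand-in env objects for illustration only; in a real Worker,
// Cloudflare passes env to your fetch handler.
const goodEnv = { AI: { run: async () => ({ response: "hi" }) } };
const misnamedEnv = { ai: {} }; // lowercase name, so env.AI is undefined

console.log(getAiBinding(goodEnv) === goodEnv.AI); // true

try {
  getAiBinding(misnamedEnv);
} catch (err) {
  console.log(err.message);
}
```

Names are case-sensitive, so a binding saved as ai would not be visible as env.AI.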

Step 2 — The One Line That Calls AI

With the binding in place, calling an AI model is a single line:

const response = await env.AI.run(modelId, { messages });

That’s it. modelId tells Cloudflare which AI model to use, and messages is the conversation you’re sending it — a list of who said what.
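
As an illustration of what "a list of who said what" can look like, the sketch below builds a messages array using the same role/content shape that appears in this tutorial's Worker. The helper name buildMessages and the system prompt wording are assumptions made for this example; the system and assistant roles follow the common chat-message convention and are not used in this tutorial's own code.

```javascript
// Hypothetical helper: assembles the conversation sent as `messages`.
// Each entry records who spoke (role) and what they said (content).
function buildMessages(question, history = []) {
  return [
    // Optional instructions for the model (assumed wording).
    { role: "system", content: "You are a concise, friendly assistant." },
    // Earlier turns, if any, in the order they happened.
    ...history,
    // The newest user message goes last.
    { role: "user", content: question }
  ];
}

const messages = buildMessages("And what about Italy?", [
  { role: "user", content: "What is the capital of France?" },
  { role: "assistant", content: "The capital of France is Paris." }
]);

console.log(messages.length); // 4
console.log(messages[messages.length - 1].role); // user
```

In a Worker, an array like this would be passed as the messages value in env.AI.run(modelId, { messages }).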

⚠️ Use the exact model name — this one mistake breaks more sites than anything else. Cloudflare periodically retires older model names. A site that worked perfectly for a month can suddenly fail with no warning if the model it calls gets deprecated. The fix is always the same: swap in a current model name. The one used throughout this series, reliable as of this writing, is:

"@cf/meta/llama-3.1-8b-instruct-fp8-fast"

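Notice the -fp8-fast suffix. It is part of the model ID, so it has to be typed exactly: fp8 conventionally indicates weights quantized to 8-bit floating point, and fast marks a speed-optimized variant. If you ever see a Worker fail with a vague "exception" error and nothing else changed, check the model name first.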
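
One way to make a retired model name easier to diagnose is to keep the model ID in a single constant and catch failures from the call. The sketch below assumes that env.AI.run rejects when the model cannot be used; the helper name askModel, the shape of the object it returns, and the stand-in env objects are all hypothetical.

```javascript
// Keep the model ID in one place so it is easy to swap when a model is retired.
const MODEL_ID = "@cf/meta/llama-3.1-8b-instruct-fp8-fast";

// Hypothetical helper: calls the model and turns a failure into a
// readable result instead of an unexplained exception.
async function askModel(env, question) {
  try {
    const result = await env.AI.run(MODEL_ID, {
      messages: [{ role: "user", content: question }]
    });
    return { ok: true, answer: result.response };
  } catch (err) {
    return {
      ok: false,
      error: "AI call failed for model " + MODEL_ID +
        ". Check that the model name is still current. Details: " + err.message
    };
  }
}

// Stand-in env objects for illustration only; they imitate success and failure.
const workingEnv = { AI: { run: async () => ({ response: "Hello!" }) } };
const brokenEnv = { AI: { run: async () => { throw new Error("model not found"); } } };

askModel(workingEnv, "Hi").then((r) => console.log(r.ok, r.answer));
askModel(brokenEnv, "Hi").then((r) => console.log(r.ok, r.error));
```

With the model ID in one constant, switching to a newer model is a one-line change.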

Step 3 — Build a Question-Answering Worker

Let’s extend the Worker from Tutorial 2 so it answers a question typed right into the URL. Replace your code with this:

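// Escape characters that have special meaning in HTML so that neither the
// visitor's question nor the model's answer can inject markup into the page.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

export default {
  async fetch(request, env) {
    // Read the question from the ?q= query parameter, or fall back to a default.
    const url = new URL(request.url);
    const question = url.searchParams.get("q") || "Say hello and introduce yourself in one sentence.";

    // Send the question to the model as a one-message conversation.
    const response = await env.AI.run("@cf/meta/llama-3.1-8b-instruct-fp8-fast", {
      messages: [
        { role: "user", content: question }
      ]
    });

    // The generated text is in response.response. Escape both values before building the page.
    const html = "<!DOCTYPE html><html><head><title>Ask AI</title></head><body><h1>Question:</h1><p>" + escapeHtml(question) + "</p><h1>Answer:</h1><p>" + escapeHtml(response.response) + "</p></body></html>";

    return new Response(html, {
      headers: { "Content-Type": "text/html;charset=UTF-8" }
    });
  }
};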

Click Save and deploy.

1. Test with the default question

Visit your Worker’s URL with nothing added. You’ll see the AI introduce itself.

2. Ask your own question

Add ?q= to the end of your URL followed by a question, like:

yourworker.workers.dev/?q=What is the capital of France

Press Enter to load the page and you’ll get a real, live AI-generated answer, running entirely on your own Worker. Your browser encodes the spaces in the question automatically.
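
To show how the question travels through the URL, here is a small sketch using the standard URL API, the same one the Worker uses to read q. The hostname is the placeholder from the example above, and the "default question" string is a stand-in for the Worker's real fallback text.

```javascript
// Build a URL with a question in the q parameter.
const url = new URL("https://yourworker.workers.dev/");
url.searchParams.set("q", "What is the capital of France");

// Spaces are encoded when the URL is serialized.
console.log(url.href); // https://yourworker.workers.dev/?q=What+is+the+capital+of+France

// Reading the parameter back decodes it, which is what the Worker sees.
const question = new URL(url.href).searchParams.get("q");
console.log(question); // What is the capital of France

// With no q parameter, get() returns null, so the Worker's || fallback applies.
const fallback = new URL("https://yourworker.workers.dev/").searchParams.get("q") || "default question";
console.log(fallback); // default question
```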

🎉 Take a moment with this. You just built something that, a few years ago, would have required a server, an API key, a monthly subscription, and hundreds of lines of setup code. Now it’s one binding and a handful of lines, running on Cloudflare’s global network within the Workers AI free allocation.

What You Learned

✓ What the Workers AI binding is and why it needs no API key
✓ How to add the AI binding to a Worker
✓ The exact line of code that calls an AI model: env.AI.run()
✓ Why model names matter and what -fp8-fast means
✓ How to read a question from the URL and return a real AI answer
