Build a Telegram AI Agent in Python: Complete Tutorial 2026
Building an AI agent that lives in Telegram is one of the most practical applications of agentic AI today. Users already spend their time in Telegram, and meeting them there is a better product decision than asking them to open yet another app.
This tutorial walks you through building a production-ready Telegram AI agent with persistent memory, context awareness, and a warm personality. We'll use Python, the Telegram Bot API, and Ollama for local LLM inference.
Prerequisites
Before we build, make sure you have:
- Python 3.10+
- A Telegram account and a bot token from @BotFather
- Ollama running locally with a model (phi4:14b recommended)
- Basic Python knowledge
pip install python-telegram-bot ollama
Note: sqlite3 ships with Python's standard library, so there's nothing to install for it.
Architecture Overview
Our Telegram AI agent has these core components:
- Telegram handler — receives and sends messages
- Context manager — maintains conversation history per user
- Memory store — SQLite database for persistent memory
- LLM interface — calls Ollama for response generation
- Persona layer — defines the agent's character and tone
The key insight in agentic AI design is that memory makes the difference. A stateless chatbot is just a search engine. An agent with memory becomes a relationship.
Setting Up the Telegram Bot
First, get your bot token from BotFather:
1. Open @BotFather in Telegram
2. Send /newbot
3. Choose a name: "MyAI Assistant"
4. Choose a username: "myai_assistant_bot"
5. Copy the token — keep it secret!
Now let's build the core structure:
# agent.py
import os
import sqlite3
from datetime import datetime
from telegram import Update
from telegram.ext import Application, MessageHandler, filters, ContextTypes
import ollama
BOT_TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]
MODEL = "phi4:14b" # or any Ollama model
# Agent persona — this is what makes your AI feel human
SYSTEM_PROMPT = """You are Alex, a warm and knowledgeable AI assistant.
You remember previous conversations with users and reference them naturally.
You're concise (under 150 words per message), friendly, and genuinely helpful.
Never say you're an AI unless directly asked. Focus on being useful."""
Building the Memory System
Persistent memory is what separates an AI agent from a dumb chatbot. Here's a simple but effective SQLite-based memory store:
def init_db():
    conn = sqlite3.connect("agent_memory.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS messages (
            id INTEGER PRIMARY KEY,
            user_id INTEGER,
            role TEXT,
            content TEXT,
            timestamp TEXT
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS user_profiles (
            user_id INTEGER PRIMARY KEY,
            name TEXT,
            preferences TEXT,
            last_seen TEXT
        )
    """)
    conn.commit()
    return conn
def get_conversation_history(conn, user_id: int, limit: int = 20) -> list:
    """Retrieve recent conversation history for a user."""
    cursor = conn.execute(
        "SELECT role, content FROM messages WHERE user_id=? ORDER BY id DESC LIMIT ?",
        (user_id, limit)
    )
    messages = [{"role": row[0], "content": row[1]} for row in cursor.fetchall()]
    return list(reversed(messages))  # chronological order
def save_message(conn, user_id: int, role: str, content: str):
    conn.execute(
        "INSERT INTO messages (user_id, role, content, timestamp) VALUES (?,?,?,?)",
        (user_id, role, content, datetime.now().isoformat())
    )
    conn.commit()
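Before wiring these helpers into the bot, it's worth sanity-checking the round trip. This standalone sketch duplicates the two functions against an in-memory database (`:memory:`), so it runs without touching agent_memory.db and shows that history comes back per-user and in chronological order:

```python
import sqlite3
from datetime import datetime

# Same schema as agent_memory.db, but kept entirely in memory for the check.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS messages (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        role TEXT,
        content TEXT,
        timestamp TEXT
    )
""")

def save_message(conn, user_id, role, content):
    conn.execute(
        "INSERT INTO messages (user_id, role, content, timestamp) VALUES (?,?,?,?)",
        (user_id, role, content, datetime.now().isoformat())
    )
    conn.commit()

def get_conversation_history(conn, user_id, limit=20):
    cursor = conn.execute(
        "SELECT role, content FROM messages WHERE user_id=? ORDER BY id DESC LIMIT ?",
        (user_id, limit)
    )
    return list(reversed([{"role": r, "content": c} for r, c in cursor.fetchall()]))

save_message(conn, 42, "user", "hi")
save_message(conn, 42, "assistant", "hello!")
save_message(conn, 99, "user", "unrelated user")

history = get_conversation_history(conn, 42)
print(history)
# [{'role': 'user', 'content': 'hi'}, {'role': 'assistant', 'content': 'hello!'}]
```

Note that user 99's message never leaks into user 42's history; the `WHERE user_id=?` filter is what keeps conversations isolated.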
The AI Response Pipeline
This is where the magic happens — combining context, memory, and the LLM:
async def generate_response(user_id: int, user_message: str, user_name: str) -> str:
    conn = init_db()
    # Build conversation context
    history = get_conversation_history(conn, user_id)
    # Add system context with user info
    messages = [
        {"role": "system", "content": f"{SYSTEM_PROMPT}\nUser's name: {user_name}"}
    ]
    messages.extend(history)
    messages.append({"role": "user", "content": user_message})
    # Save user message
    save_message(conn, user_id, "user", user_message)
    # Generate response via Ollama's AsyncClient, so a slow generation
    # doesn't block the event loop and stall other users' messages
    response = await ollama.AsyncClient().chat(
        model=MODEL,
        messages=messages,
        options={"temperature": 0.7, "num_predict": 300}
    )
    assistant_message = response["message"]["content"]
    # Save assistant response
    save_message(conn, user_id, "assistant", assistant_message)
    conn.close()
    return assistant_message
Handling Telegram Messages
Now wire everything together with the Telegram handler:
async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE):
    user = update.effective_user
    user_id = user.id
    user_name = user.first_name or "there"
    message_text = update.message.text
    # Show typing indicator while generating
    await context.bot.send_chat_action(
        chat_id=update.effective_chat.id,
        action="typing"
    )
    response = await generate_response(user_id, message_text, user_name)
    await update.message.reply_text(response)

def main():
    app = Application.builder().token(BOT_TOKEN).build()
    app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle_message))
    print("Bot is running...")
    app.run_polling()

if __name__ == "__main__":
    main()
Running and Scaling Your Agent
To deploy on a Mac Mini or any server:
# Run the agent
python agent.py
# Or use PM2 for production
pm2 start agent.py --interpreter python3 --name "my-ai-agent"
pm2 save
For scaling to 10,000+ users, consider:
- Rate limiting — cap requests per user per minute
- Async processing — use job queues for slow LLM responses
- Model caching — Ollama handles this automatically
- Conversation pruning — limit history to last 20 messages per user
In practice, persistent memory is one of the strongest retention levers a chat agent has: users come back to an assistant that remembers them, and rarely return to one that doesn't.
Advanced: Adding Tool Use
The real power of agentic AI comes when your agent can take actions. Here's how to add a simple weather tool:
def get_weather(city: str) -> str:
    # Call a weather API here
    return f"Current weather in {city}: 28°C, partly cloudy"

TOOLS = {
    "get_weather": get_weather,
}
# In your system prompt, describe available tools:
SYSTEM_PROMPT += """
You have access to these tools:
- get_weather(city): Get current weather for a city
When a user asks about weather, call get_weather with the city name.
Format tool calls as: TOOL:get_weather(city="Jakarta")
"""
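The `TOOL:` convention only works if something actually parses those markers out of the model's reply. Here is a minimal dispatcher sketch that assumes the model emits exactly the `TOOL:name(arg="value")` format requested in the prompt; real model output is messier, so treat this as a starting point rather than a robust parser:

```python
import re

def get_weather(city: str) -> str:
    # Stub -- a real implementation would call a weather API
    return f"Current weather in {city}: 28°C, partly cloudy"

TOOLS = {"get_weather": get_weather}

# Matches a single keyword argument: TOOL:name(arg="value")
TOOL_PATTERN = re.compile(r'TOOL:(\w+)\((\w+)="([^"]*)"\)')

def run_tool_calls(reply: str) -> str:
    """Replace each TOOL:name(arg="value") marker with the tool's result."""
    def dispatch(match: re.Match) -> str:
        name, arg_name, arg_value = match.groups()
        tool = TOOLS.get(name)
        if tool is None:
            return match.group(0)  # unknown tool: leave the text untouched
        return tool(**{arg_name: arg_value})
    return TOOL_PATTERN.sub(dispatch, reply)

reply = 'Let me check. TOOL:get_weather(city="Jakarta")'
print(run_tool_calls(reply))
# Let me check. Current weather in Jakarta: 28°C, partly cloudy
```

Run the model's reply through `run_tool_calls` before sending it to Telegram. For production use, the newer Ollama models also support structured tool calling natively, which avoids parsing free text altogether.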
For more advanced tool use patterns, see our guide on multi-agent system architecture and agent orchestration patterns.
FAQ: Building Telegram AI Agents
Do I need a GPU to run this? No! Ollama runs efficiently on Mac M-series chips and modern CPUs. phi4:14b runs well on a Mac Mini M4 with 16GB RAM.
How do I handle multiple users simultaneously? Python's asyncio handles the I/O, but note that python-telegram-bot processes updates sequentially by default; enable concurrent handling with Application.builder().token(BOT_TOKEN).concurrent_updates(True).build().
Can I deploy this to the cloud? Yes — any VPS with 8GB+ RAM works. Or run it locally on a Mac Mini and expose via Cloudflare Tunnel.
How do I add voice message support? Use Telegram's voice message handler combined with a local Whisper model for speech-to-text.
What's the cost of running this? With Ollama and local models, the compute cost is zero (electricity aside). No API fees.