Build a Telegram AI Agent in Python: Complete Tutorial 2026
Building an AI agent that lives in Telegram is one of the most practical applications of agentic AI today. Users already spend their time in Telegram, and meeting them there is a better product decision than asking them to open yet another app.
This tutorial walks you through building a production-ready Telegram AI agent with persistent memory, context awareness, and a warm personality. We'll use Python, the Telegram Bot API, and Ollama for local LLM inference.
Prerequisites
Before we build, make sure you have:
- Python 3.10+
- A Telegram account and a bot token from @BotFather
- Ollama running locally with a model (phi4:14b recommended)
- Basic Python knowledge
pip install python-telegram-bot ollama
Note: sqlite3 ships with Python's standard library, so there's nothing to install for it.
Architecture Overview
Our Telegram AI agent has these core components:
- Telegram handler — receives and sends messages
- Context manager — maintains conversation history per user
- Memory store — SQLite database for persistent memory
- LLM interface — calls Ollama for response generation
- Persona layer — defines the agent's character and tone
The key insight in agentic AI design is that memory makes the difference. A stateless chatbot is just a search engine. An agent with memory becomes a relationship.
Setting Up the Telegram Bot
First, get your bot token from BotFather:
1. Open @BotFather in Telegram
2. Send /newbot
3. Choose a name: "MyAI Assistant"
4. Choose a username: "myai_assistant_bot"
5. Copy the token — keep it secret!
Now let's build the core structure:
# agent.py
import os
import sqlite3
from datetime import datetime
from telegram import Update
from telegram.ext import Application, MessageHandler, filters, ContextTypes
import ollama
BOT_TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]
MODEL = "phi4:14b" # or any Ollama model
# Agent persona — this is what makes your AI feel human
SYSTEM_PROMPT = """You are Alex, a warm and knowledgeable AI assistant.
You remember previous conversations with users and reference them naturally.
You're concise (under 150 words per message), friendly, and genuinely helpful.
Never say you're an AI unless directly asked. Focus on being useful."""
Building the Memory System
Persistent memory is what separates an AI agent from a dumb chatbot. Here's a simple but effective SQLite-based memory store:
def init_db():
    conn = sqlite3.connect("agent_memory.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS messages (
            id INTEGER PRIMARY KEY,
            user_id INTEGER,
            role TEXT,
            content TEXT,
            timestamp TEXT
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS user_profiles (
            user_id INTEGER PRIMARY KEY,
            name TEXT,
            preferences TEXT,
            last_seen TEXT
        )
    """)
    conn.commit()
    return conn
def get_conversation_history(conn, user_id: int, limit: int = 20) -> list:
    """Retrieve recent conversation history for a user."""
    cursor = conn.execute(
        "SELECT role, content FROM messages WHERE user_id=? ORDER BY id DESC LIMIT ?",
        (user_id, limit)
    )
    messages = [{"role": row[0], "content": row[1]} for row in cursor.fetchall()]
    return list(reversed(messages))  # chronological order
def save_message(conn, user_id: int, role: str, content: str):
    conn.execute(
        "INSERT INTO messages (user_id, role, content, timestamp) VALUES (?,?,?,?)",
        (user_id, role, content, datetime.now().isoformat())
    )
    conn.commit()
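Before wiring these helpers into the bot, it's worth sanity-checking the round trip. This standalone sketch duplicates the two functions against an in-memory database (`:memory:`), so it runs without touching agent_memory.db and shows that history comes back per-user and in chronological order:

```python
import sqlite3
from datetime import datetime

# Same schema as agent_memory.db, but kept entirely in memory for the check.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS messages (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        role TEXT,
        content TEXT,
        timestamp TEXT
    )
""")

def save_message(conn, user_id, role, content):
    conn.execute(
        "INSERT INTO messages (user_id, role, content, timestamp) VALUES (?,?,?,?)",
        (user_id, role, content, datetime.now().isoformat())
    )
    conn.commit()

def get_conversation_history(conn, user_id, limit=20):
    cursor = conn.execute(
        "SELECT role, content FROM messages WHERE user_id=? ORDER BY id DESC LIMIT ?",
        (user_id, limit)
    )
    return list(reversed([{"role": r, "content": c} for r, c in cursor.fetchall()]))

save_message(conn, 42, "user", "hi")
save_message(conn, 42, "assistant", "hello!")
save_message(conn, 99, "user", "unrelated user")

history = get_conversation_history(conn, 42)
print(history)
# [{'role': 'user', 'content': 'hi'}, {'role': 'assistant', 'content': 'hello!'}]
```

Note that user 99's message never leaks into user 42's history; the `WHERE user_id=?` filter is what keeps conversations isolated.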
The AI Response Pipeline
This is where the magic happens — combining context, memory, and the LLM:
async def generate_response(user_id: int, user_message: str, user_name: str) -> str:
    conn = init_db()
    # Build conversation context
    history = get_conversation_history(conn, user_id)
    # Add system context with user info
    messages = [
        {"role": "system", "content": f"{SYSTEM_PROMPT}\nUser's name: {user_name}"}
    ]
    messages.extend(history)
    messages.append({"role": "user", "content": user_message})
    # Save user message
    save_message(conn, user_id, "user", user_message)
    # Generate response via Ollama's AsyncClient, so a slow generation
    # doesn't block the event loop and stall other users' messages
    response = await ollama.AsyncClient().chat(
        model=MODEL,
        messages=messages,
        options={"temperature": 0.7, "num_predict": 300}
    )
    assistant_message = response["message"]["content"]
    # Save assistant response
    save_message(conn, user_id, "assistant", assistant_message)
    conn.close()
    return assistant_message
Handling Telegram Messages
Now wire everything together with the Telegram handler:
async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE):
    user = update.effective_user
    user_id = user.id
    user_name = user.first_name or "there"
    message_text = update.message.text
    # Show typing indicator while generating
    await context.bot.send_chat_action(
        chat_id=update.effective_chat.id,
        action="typing"
    )
    response = await generate_response(user_id, message_text, user_name)
    await update.message.reply_text(response)

def main():
    app = Application.builder().token(BOT_TOKEN).build()
    app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle_message))
    print("Bot is running...")
    app.run_polling()

if __name__ == "__main__":
    main()
Running and Scaling Your Agent
To deploy on a Mac Mini or any server:
# Run the agent
python agent.py
# Or use PM2 for production
pm2 start agent.py --interpreter python3 --name "my-ai-agent"
pm2 save
For scaling to 10,000+ users, consider:
- Rate limiting — cap requests per user per minute
- Async processing — use job queues for slow LLM responses
- Model caching — Ollama handles this automatically
- Conversation pruning — limit history to last 20 messages per user
In practice, persistent memory is one of the strongest retention levers a chat agent has: users come back to an assistant that remembers them, and rarely return to one that doesn't.
Advanced: Adding Tool Use
The real power of agentic AI comes when your agent can take actions. Here's how to add a simple weather tool:
def get_weather(city: str) -> str:
    # Call a weather API here
    return f"Current weather in {city}: 28°C, partly cloudy"

TOOLS = {
    "get_weather": get_weather,
}
# In your system prompt, describe available tools:
SYSTEM_PROMPT += """
You have access to these tools:
- get_weather(city): Get current weather for a city
When a user asks about weather, call get_weather with the city name.
Format tool calls as: TOOL:get_weather(city="Jakarta")
"""
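The `TOOL:` convention only works if something actually parses those markers out of the model's reply. Here is a minimal dispatcher sketch that assumes the model emits exactly the `TOOL:name(arg="value")` format requested in the prompt; real model output is messier, so treat this as a starting point rather than a robust parser:

```python
import re

def get_weather(city: str) -> str:
    # Stub -- a real implementation would call a weather API
    return f"Current weather in {city}: 28°C, partly cloudy"

TOOLS = {"get_weather": get_weather}

# Matches a single keyword argument: TOOL:name(arg="value")
TOOL_PATTERN = re.compile(r'TOOL:(\w+)\((\w+)="([^"]*)"\)')

def run_tool_calls(reply: str) -> str:
    """Replace each TOOL:name(arg="value") marker with the tool's result."""
    def dispatch(match: re.Match) -> str:
        name, arg_name, arg_value = match.groups()
        tool = TOOLS.get(name)
        if tool is None:
            return match.group(0)  # unknown tool: leave the text untouched
        return tool(**{arg_name: arg_value})
    return TOOL_PATTERN.sub(dispatch, reply)

reply = 'Let me check. TOOL:get_weather(city="Jakarta")'
print(run_tool_calls(reply))
# Let me check. Current weather in Jakarta: 28°C, partly cloudy
```

Run the model's reply through `run_tool_calls` before sending it to Telegram. For production use, the newer Ollama models also support structured tool calling natively, which avoids parsing free text altogether.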
For more advanced tool use patterns, see our guide on multi-agent system architecture and agent orchestration patterns.
FAQ: Building Telegram AI Agents
Do I need a GPU to run this? No! Ollama runs efficiently on Mac M-series chips and modern CPUs. phi4:14b runs well on a Mac Mini M4 with 16GB RAM.
How do I handle multiple users simultaneously? Python's asyncio handles the I/O, but note that python-telegram-bot processes updates sequentially by default; enable concurrent handling with Application.builder().token(BOT_TOKEN).concurrent_updates(True).build().
Can I deploy this to the cloud? Yes — any VPS with 8GB+ RAM works. Or run it locally on a Mac Mini and expose via Cloudflare Tunnel.
How do I add voice message support? Use Telegram's voice message handler combined with a local Whisper model for speech-to-text.
What's the cost of running this? With Ollama and local models, the compute cost is zero (electricity aside). No API fees.