Can an AI agent tutor students through math — teaching them to find the answer rather than handing it over — without hallucinating?

[

Overview

]

Fibo is our internal R&D project exploring how AI can support education. It's an AI math tutor for secondary and high-school students that helps them understand fundamental concepts and practice — through a teaching-first chat, learning videos, and quizzes — proving an AI agent can guide learning, not just spit out answers.

Fibo — AI Math Tutor

[ Year ]

2024

[ Context ]

We wanted to test whether an AI agent could stand in for a tutor: making one-to-one help more accessible and affordable. The central risk was obvious — AI hallucinations. If the tutor confidently teaches the wrong method, it does more harm than good.

[ Solution ]

A from-scratch proof of concept: a cross-platform Flutter app, built by a team of two, backed by a NestJS/PostgreSQL/Redis service, with LangChain and OpenAI driving a teaching-first chat.

The problem space

The promise of AI in education is one-to-one tutoring at near-zero marginal cost — something only wealthier families can buy today. But education is exactly where hallucinations are most dangerous: a tutor that's confidently wrong teaches bad methods. So the real R&D question wasn't "can AI answer math questions" (ChatGPT can, roughly) — it was "can AI *teach* math reliably enough to trust with a student," which means controlling accuracy and resisting hallucination on exam-level problems.

developers built the full cross-platform MVP (Flutter)

ways to learn in one app: AI chat tutor, learning videos, and quizzes

answers handed over without teaching — Fibo guides students to find them

Technology choices

What we evaluated, what we chose, and why.

Chosen

Flutter

One codebase let just two developers ship a high-quality MVP for Android and iOS — the lean-team economics that make internal R&D viable.

Chosen

NestJS + PostgreSQL (+ Redis)

A progressive Node.js framework serving the API, PostgreSQL for reliable data, and Redis to push performance-heavy work to background threads — keeping response times low.

Chosen

LangChain + OpenAI

The NLP/generation core, enabling context-aware tutoring interactions.

Chosen

Tree-of-Thought prompt chain

A chain of prompts based on the Tree-of-Thought method, which (per our tests) dramatically improved accuracy and relevance — the key lever against hallucination.

Chosen

TeX

A typesetting standard for rendering mathematical formulas precisely on screen — a genuine front-end challenge for a math app.

Evaluated

Raw ChatGPT, no prompt engineering

Rejected as insufficient alone. It performed adequately on basic exercises but hallucinated on advanced problems — which is exactly why the Tree-of-Thought chain was needed.

Evaluated

Custom / fine-tuned math model

Not needed at PoC stage. Prompt engineering plus Tree-of-Thought met the accuracy bar far faster.

The POC in action

The working thing — capabilities, not a scope list.

Teaching-first AI chat

The tutor is tuned to help students understand *how* to reach an answer rather than just providing it — the difference between a tutor and an answer key.

Accurate math, properly rendered

A Tree-of-Thought prompt chain improves answer accuracy, and TeX renders formulas cleanly — we iterated specifically on getting AI-generated formulas to display correctly.

In-app knowledge base

Learning videos and quizzes let students study and practice alongside the chat.

Lean, cross-platform

A two-person team delivered the MVP on both Android and iOS via Flutter.

Results & takeaways

Honest feasibility findings.

Confirmed

AI can effectively support students learning math

We proved an AI agent can act as a teaching-first tutor — guiding students to find answers themselves, not just supplying them.

Confirmed

Tree-of-Thought prompt chaining reins in hallucination

Chaining prompts via the Tree-of-Thought method dramatically improved accuracy on exam-level problems versus naive prompting — the crux of making AI trustworthy enough for education.

Limitation found

Validating an AI product is a two-track, never-ending cycle

Quality splits into AI-independent testing (deterministic, like normal QA) and AI-dependent testing (non-deterministic — small prompt changes swing results). The latter demands continuous iteration; it's an ongoing loop, not a one-time sign-off.

Next step

Scale the teaching-first agent across the edtech market

The same Tree-of-Thought, guide-don't-answer approach extends beyond foundational math into other STEM subjects, exam and test prep, and language learning — and licenses well to schools, tutoring platforms, and corporate upskilling. Proven hallucination control is what makes it sellable into education, not just demoable.