Mozhii AL
An AI-powered Tamil chemistry tutoring platform for Sri Lankan Advanced Level students — providing structured past-paper explanations, topic learning paths, and related question discovery across 45 years of exam history.
01 Product Overview
Mozhii AL is a Tamil-first, AI-powered A/L Chemistry tutoring platform built for Sri Lankan Advanced Level students. The platform provides structured past-paper question lookup, detailed Tamil explanations, topic-based learning paths, and related question discovery — built on a low-cost, production-ready architecture.
Mission
Core Problem Being Solved
- Tamil-medium A/L students lack quality explanation resources in their language
- Past-paper websites show answers only — no explanation of why
- Private tutors are expensive and inaccessible in rural areas
- No platform connects related questions across 1980–2025 exam history
Competitive Advantage
| Feature | Other Platforms | Mozhii AL |
|---|---|---|
| Language | English / Sinhala | Tamil (Primary) |
| Explanation | Answer only | Step-by-step Tamil |
| Related Questions | None | 1980–2025 linked |
| Topic Learning | None | Full topic pages |
| Sri Lanka Syllabus | Partial | 100% aligned |
| Cost | Subscription | Low / Free tier |
02 Product Scope — Phase 1
Launch Scope
- Subject: Chemistry only
- Question count: 2,000 questions
- Years covered: 1980 – 2025
- Question type: MCQ (Paper 1)
- Medium: Tamil
What Students Can Do
- Search any past-paper question by year and number
- View step-by-step Tamil explanation
- See why each option is correct or wrong
- Discover related questions from across 45 years
- Navigate topics: Organic Chemistry, Physical Chemistry, etc.
- Click a topic and get theory + practice questions
Excluded from Phase 1
- Student login / accounts
- Payment system
- Teacher dashboard
- Mobile app (web-first)
- Voice input / output
- Fine-tuned AI model
- Advanced analytics
- Multi-subject support
03 System Architecture
Production Flow
Tech Stack
LLM Usage Rules
The LLM is called ONLY in these situations:
- Student asks for a longer or simpler explanation
- Student asks for a related concept explained
- Topic page theory formatting
- First-time generation of an explanation (then cached)
The LLM is NEVER called for:
- Answering what the correct answer is
- Returning a stored explanation that is already cached
- Simple answer-only requests
04 Data Architecture
Every question in Mozhii AL is stored across three interconnected data layers.
Layer 1 — Question Layer
The core past-paper question database containing raw question text, options, and correct answer.
Layer 2 — Explanation Layer
Stored separately so explanations can be reviewed and updated independently of the source question. This is what makes Mozhii AL better than normal past-paper websites.
Layer 3 — Topic Layer
Every question is tagged to the A/L Chemistry syllabus. This enables topic-click learning and related question discovery.
Database Schema
Table: questions
Table: explanations
Table: topics
05 Query Parser
Students may type questions in many different formats. The parser must handle all of them and produce one normalized internal object.
Supported Input Formats
| Student Input | Parsed Result |
|---|---|
| Chemistry - 22 - 2002 | { year: 2002, q: 22 } |
| Chem 2002 22 | { year: 2002, q: 22 } |
| 2002 chem q22 explain | { year: 2002, q: 22 } |
| Chemistry 2002 Q22 | { year: 2002, q: 22 } |
| chem q22 2002 | { year: 2002, q: 22 } |
| இரசாயனவியல் 2002 22 | { year: 2002, q: 22 } |
Parser Implementation
Validation Rules
- Year missing → Ask: "Which year? (e.g., 2002)"
- Question number missing → Ask: "Which question number?"
- Subject missing → Assume Chemistry (Phase 1 only)
- Multiple matches → Show options list
06 Response Format
Every question lookup returns this exact structure, displayed to the student in Tamil.
Standard Question Response
API Response JSON
07 Data Collection Workflow
Folder Structure
Master CSV Columns
| Column | Description |
|---|---|
| question_id | Unique ID e.g. chem_2002_p1_q22 |
| subject | Chemistry |
| medium | Tamil |
| year | e.g. 2002 |
| paper | Paper 1 / Paper 2 |
| question_no | e.g. 22 |
| question_text | Full question in Tamil |
| option_a/b/c/d | MCQ options |
| correct_answer | A / B / C / D |
| main_topic | e.g. Organic Chemistry |
| subtopic | e.g. Alcohols |
| concepts | Comma-separated |
| difficulty | easy / medium / hard |
| explanation_status | draft / ai_generated / human_checked / teacher_verified |
| source_file | PDF filename |
Data Collection Phases
- Phase 1: Collect 2,000 questions with correct answers
- Phase 2: Add topic, subtopic, concept tags to all questions
- Phase 3: Generate draft Tamil explanations using LLM
- Phase 4: Teacher review and verification of explanations
- Phase 5: Generate embeddings for vector similarity search
Explanation Priority Order
Do not try to explain all 2,000 questions at once. Prioritize:
- Most-repeated topics (Organic Chemistry, Electrochemistry)
- Questions from recent 10 years (2015–2025)
- High-difficulty questions
- Old rare questions from 1980–1990
Explanation Status Workflow
08 Related Questions System
How Related Questions Are Found
- Embed every question using
text-embedding-3-small - For each question, find top 10 most similar vectors
- Filter: same
main_topicgets highest priority - Filter: same
subtopicgets very high priority - Save top 5 as related questions in
similar_questionstable - Teacher reviews important ones manually
Similarity Priority Rules
| Similarity Type | Priority |
|---|---|
| Same topic AND same concept | Very High |
| Same topic, different concept | High |
| Same concept, different topic | Medium-High |
| Same year nearby | Low |
| Similar wording only | Medium |
Example
09 Topic Pages
Topic Navigation Structure
Topic Page Content
When a student clicks a topic (e.g., Organic Chemistry → Alcohols), they see:
- Topic explanation in Tamil
- Key theory and definitions
- Important reactions and formulas
- Common exam mistakes
- All past paper questions from this topic (1980–2025)
- 5 practice questions
Topic Page Flow
10 Cost Architecture
Estimated Monthly Cost at 4,000 Users
| Service | Plan | Est. Monthly Cost |
|---|---|---|
| Supabase (DB + pgvector) | Free tier | LKR 0 |
| Vercel (Frontend) | Free tier | LKR 0 |
| Cloudflare R2 (Storage) | Free 10GB | LKR 0 – 600 |
| Gemini Flash (LLM) | Pay per use | LKR 600 – 2,400 |
| OpenAI Embeddings | One-time setup | LKR 120 (once) |
| TOTAL | LKR 600 – 3,000/month |
Cost-Saving Rules
answer_cache. All future students get the cached version instantly — no LLM cost.11 Development Roadmap
- Collect 2,000 Chemistry questions
- Add correct answers from official keys
- Add topic / subtopic / concept tags
- Create source file references
- Build query parser
- Build exact DB lookup
- Build basic frontend: search + results
- Add explanation fields to DB
- Generate draft explanations in batch
- Manual review of high-priority questions
- Publish verified explanations
- Create question embeddings
- Run similarity search for all questions
- Filter by topic, save related IDs
- Display related questions on page
- Create topic hierarchy in DB
- Build topic pages with theory
- Add vector search for theory chunks
- Add answer_cache for all common queries
- Add IP rate limiting
- Add feedback / report mistake button
- Deploy to Vercel + Supabase production
- Soft launch to first 1,000 users
12 What NOT to Build First
| Feature | Reason to Defer |
|---|---|
| Fine-tuning AI model | Expensive — not needed with good RAG |
| Full chatbot memory | Adds complexity with little benefit at start |
| Student dashboard | Build after login is proven useful |
| Payment system | Start free, validate product first |
| Teacher dashboard | Manual process is fine at 2,000 questions |
| Mobile app | Web-first, responsive design covers mobile |
| Voice input/output | High engineering cost, low initial demand |
| Multi-subject | Chemistry only for Phase 1 |
| Advanced analytics | Not needed until you have steady traffic |
13 Hosting Infrastructure
Architecture Diagram
Upgrade Path
Start with free tiers. Upgrade only when you hit limits:
| When | Action |
|---|---|
| DB > 500MB | Upgrade Supabase to $25/month plan |
| LLM cost > $20/month | Increase caching, review batch generation |
| Traffic > 10k req/day | Add Redis cache layer |
| Need mobile app | Build React Native from existing Next.js logic |
14 Security & Rate Limiting
Rate Limiting Strategy
Input Sanitization
- Sanitize all student query inputs before DB lookup
- Use parameterized SQL queries — never string concatenation
- Validate year range: 1980 to 2025 only
- Validate question number range: 1 to 100
- Reject inputs exceeding 500 characters
LLM Prompt Injection Protection
15 Frontend Design Guidelines
UI Principles
- Mobile-first: Most students access from phones
- Tamil font support: Use Google Fonts Noto Sans Tamil
- High contrast: Readable in bright sunlight
- Minimal data usage: Compress assets, lazy load images
- Fast load: Target <2 seconds on 3G connection
Key Pages
| Page | URL | Purpose |
|---|---|---|
| Home / Search | / | Main search interface |
| Question Detail | /q/[id] | Answer + explanation + related |
| Topic Index | /topics | Browse all topics |
| Topic Page | /topics/[slug] | Theory + practice questions |
| Year Browse | /years/[year] | All questions from one year |
| Search Results | /search | Full-text search results |
| About | /about | Platform info |
Search Box Behavior
16 Summary
Most Important Work Order
- Build clean Chemistry question database (2,000 questions)
- Add topic tagging to every question
- Add explanation fields and pre-generate in Tamil
- Build query parser for all student input formats
- Add related question search via embeddings
- Add topic pages with theory + practice
- Add LLM only for explanation formatting
- Cache everything — generate once, serve always
Architecture Summary
| Component | Choice |
|---|---|
| Database | Supabase Postgres + pgvector |
| Frontend | Next.js on Vercel (free) |
| Backend | Next.js API Routes |
| Storage | Cloudflare R2 |
| LLM | Gemini 2.0 Flash |
| Embeddings | text-embedding-3-small |
| Caching | answer_cache DB table |
| Rate Limiting | Next.js middleware (free) |
| Est. Monthly Cost | LKR 600 – 3,000 for 4,000 users |
