Automated Test Paper Checker
An AI-powered automated test paper checking system built for UPAY, an educational organisation struggling with a significant operational challenge in student assessment. UPAY was manually processing between 2,000 and 4,000 student answer sheets every month, with teachers spending an average of 15 minutes per paper on mechanical grading tasks. At peak volume this translated to roughly 1,000 hours of monthly labour dedicated solely to evaluation, resulting in delayed feedback cycles and teacher burnout.
The solution needed to address three critical constraints: maintain grading accuracy comparable to human teachers, operate within an extremely limited budget, and preserve educational quality through human oversight. The resulting system reduces grading time to near-zero while maintaining 85-90% accuracy through a human-in-the-loop workflow, operating at just $0.00375 per paper compared to commercial alternatives costing $0.10-0.50 per page.
Client
UPAY NGO
Technology
React, N8N, Vercel
Timeline
AI-powered automated checker for multilingual test papers

The system architecture leverages a carefully selected technology stack optimized for cost efficiency and scalability. Google Gemini 2.5 Flash serves as the AI grading engine, chosen for its superior multimodal capabilities in processing handwritten text across multiple languages (English, Hindi, and regional languages) while being 60% more cost-effective than alternatives like GPT-4. N8N handles workflow orchestration, providing visual automation that eliminates the need for weeks of custom backend development. Supabase offers both PostgreSQL database persistence and serverless Edge Functions for email delivery, while a React-based admin dashboard provides the critical human oversight interface.
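To make the data flow concrete, the shape of a submission record as it moves between N8N, Supabase, and the React dashboard can be sketched roughly as below. Field names and the PENDING/APPROVED states are illustrative assumptions; only the GRADED and RECHECKING statuses and the per-question breakdown are described above.

```typescript
// Illustrative sketch of a submission record stored in Supabase.
// Field names are assumptions for clarity, not the production schema.
type SubmissionStatus = "PENDING" | "GRADED" | "RECHECKING" | "APPROVED";

interface QuestionResult {
  question: number;   // question number on the paper
  awarded: number;    // marks given by the AI grader
  maximum: number;    // marks available for the question
  feedback: string;   // short AI-generated justification
}

interface Submission {
  id: string;               // primary key in Supabase
  studentEmail: string;     // destination for the results email
  subject: string;          // used to look up the answer key
  driveFileId: string;      // answer sheet image in Google Drive
  status: SubmissionStatus;
  totalScore: number;
  results: QuestionResult[]; // per-question breakdown from Gemini
  adminFeedback?: string;    // instructions attached when a recheck is requested
}
```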
The complete workflow begins when teachers submit answer sheets through Google Forms, which automatically uploads documents to Google Drive and populates a tracking spreadsheet. Every 60 seconds, N8N polls for new submissions and triggers a multi-step pipeline: downloading answer sheets from Drive, retrieving subject-specific answer keys when available, constructing structured prompts with explicit grading criteria, calling the Gemini API with base64-encoded images, receiving JSON-formatted responses with per-question breakdowns, and storing results in Supabase with a "GRADED" status.
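The heart of that pipeline is a single multimodal API call. A minimal sketch of the equivalent request, which in the production system is issued from an N8N HTTP node rather than application code, might look like the following; the prompt wording, environment variable name, and expected JSON schema are illustrative assumptions.

```typescript
// Rough sketch of the grading step. The endpoint shape follows the public
// Gemini REST API; prompt text and response schema are assumptions.
const GEMINI_URL =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent";

async function gradePaper(imageBase64: string, answerKey: string) {
  const prompt = [
    "You are grading a student answer sheet.",
    "Use the answer key below and return JSON with one entry per question:",
    '{ "results": [{ "question": 1, "awarded": 2, "maximum": 5, "feedback": "..." }] }',
    "Answer key:",
    answerKey,
  ].join("\n");

  const response = await fetch(`${GEMINI_URL}?key=${process.env.GEMINI_API_KEY}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      contents: [{
        parts: [
          { text: prompt },
          // Answer sheet image sent inline as base64, as in the N8N workflow.
          { inlineData: { mimeType: "image/jpeg", data: imageBase64 } },
        ],
      }],
      generationConfig: { responseMimeType: "application/json" },
    }),
  });

  const data = await response.json();
  // Gemini returns the generated text in candidates[0].content.parts[0].text.
  return JSON.parse(data.candidates[0].content.parts[0].text);
}
```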
The critical differentiator is the mandatory admin review layer. Rather than immediately distributing AI-generated grades, the system presents all results in a filterable dashboard where admins can view detailed question-by-question analysis, compare student responses against answer keys, and identify potential grading errors. When discrepancies are found, admins can trigger an iterative recheck workflow by providing specific feedback, such as "Re-evaluate Question 3 for partial credit." This updates the submission status to "RECHECKING," sends the paper back to Gemini with additional context and admin instructions, and generates revised scores that preserve the full audit trail. Only after explicit admin approval are results distributed via automated emails containing total scores, detailed breakdowns, and personalised AI-generated feedback.
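From the dashboard's side, triggering a recheck is essentially a status flip plus a webhook call. The sketch below shows one plausible way to wire it with supabase-js; the table name, column names, and webhook URL are assumptions rather than the production setup.

```typescript
import { createClient } from "@supabase/supabase-js";

// Sketch of the recheck trigger fired from the React admin dashboard.
// Table, column, and webhook names are illustrative assumptions.
const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!
);

async function requestRecheck(submissionId: string, adminFeedback: string) {
  // Flip the status so both the dashboard and N8N see the paper as in-flight again.
  const { error } = await supabase
    .from("submissions")
    .update({ status: "RECHECKING", admin_feedback: adminFeedback })
    .eq("id", submissionId);
  if (error) throw error;

  // Nudge the N8N recheck workflow; it re-sends the paper to Gemini with
  // the admin's instructions appended to the original grading context.
  await fetch(`${process.env.N8N_WEBHOOK_URL}/recheck`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ submissionId, adminFeedback }),
  });
}
```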






This project validates a fundamental principle in applied AI: effective integration requires augmentation rather than replacement of human expertise. By positioning AI as a first-pass grader that handles mechanical evaluation while humans provide quality control and nuanced judgment, the system achieves the productivity gains of automation without sacrificing educational integrity. Teachers who previously spent hours on routine checking now focus their expertise on providing meaningful pedagogical feedback and addressing learning gaps identified through the grading process.
The technical implementation demonstrates that production-grade AI systems can be built on minimal budgets through strategic architecture decisions. By leveraging free-tier infrastructure (Supabase's 500MB database allowance, Vercel's hobby tier, self-hosted N8N) and cost-optimized AI models, the system processes thousands of papers monthly for approximately $15 in variable costs. The open-source codebase serves as a practical blueprint for educational institutions and developers building similar workflow automation systems where reliability, human oversight, and cost efficiency are non-negotiable requirements.
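As a quick sanity check on that figure: at $0.00375 per paper, UPAY's upper-bound volume of 4,000 papers works out to 4,000 × $0.00375 = $15 per month in variable AI costs, with the remaining infrastructure covered by free tiers.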