Evaluating complex projects, such as hackathon submissions or technical demos, is challenging because innovation, technical depth, user experience, and market potential all matter at once. Traditional evaluation methods rely on human judges, which can introduce bias, subjectivity, and inconsistency. In this article, we explore a different approach: a multi-agent AI system that integrates video analysis, audio transcription, and specialized evaluation agents to deliver structured, objective, and comprehensive project assessments.
Table of Contents
- Introduction
- Architecture Overview
- Key Features
- Practical Use Cases
- Step By Step Guide
- Final Thoughts
Introduction
Evaluating project demos and presentations can be tricky, as there's a lot to consider, from technical complexity and innovation to user experience and market potential. What if you could make this process faster, more objective, and even a bit smarter? In this guide, we'll walk you through a multi-agent AI system that analyzes project videos, transcribes audio explanations, and then scores each submission across multiple dimensions. By the end, you'll be able to build an AI that helps you generate consistent evaluations, actionable insights, and professional reports, making project assessment more efficient, transparent, and insightful than ever before.
Architecture Overview
Video & Audio Extraction
The first step in our evaluation pipeline is to extract meaningful content from project videos. Key frames are captured to highlight important visual moments, while the audio is transcribed into text for analysis. This dual extraction ensures that both visual and spoken content are available for the AI agents, allowing a comprehensive understanding of the project’s presentation and technical details.
Multi-Agent Evaluation
Once the content is extracted, specialized AI agents evaluate different aspects of the project. Each agent focuses on a specific dimension (technical complexity, design and user experience, or market potential), allowing for expert-level analysis across multiple facets. By dividing responsibilities, the system can provide more accurate and nuanced feedback than a single evaluator could achieve.
Tech Agent
The Tech Agent is responsible for scoring the project’s technical complexity and identifying key innovations. It analyzes the algorithm design and problem-solving approach. By highlighting strengths and potential weaknesses, this agent ensures that technical merit is thoroughly evaluated, helping judges or stakeholders understand the depth, feasibility, and originality of the project’s implementation.
Design Agent
The Design Agent evaluates the project’s user experience, presentation quality, and overall completeness. It examines how effectively the interface communicates functionality, whether the flow is intuitive, and how visually engaging the presentation is. By providing feedback on usability and aesthetics, the agent ensures that projects are not only technically sound but also accessible, polished, and impactful for end users.
Market Agent
The Market Agent assesses scalability, business relevance, and market potential. It evaluates how well the project could fit into a real-world context, identifying opportunities for adoption or commercialization. By considering factors like target audience, growth potential, and industry trends, this agent provides insights that go beyond technical performance, helping teams understand the broader implications of their solution.
Aggregator Agent
The Aggregator Agent merges the outputs of all individual evaluators into a unified, structured evaluation. It standardizes scores, compiles qualitative feedback, and generates an output that captures strengths, weaknesses, technical highlights, and recommendations. This ensures a consistent, reproducible assessment that combines insights from multiple perspectives into a single, actionable evaluation for stakeholders or judges.
Report & Visualization
Finally, the system generates interactive reports and visualizations. PDF summaries, CSV datasets, radar charts, and bar charts present the project’s evaluation in a clear, digestible format. By combining textual feedback with visual analytics, this step allows stakeholders to quickly grasp overall performance, compare submissions, and make informed decisions, transforming raw AI evaluations into professional, presentation-ready insights.
Key Features
- Weighted Scoring: Evaluations are calculated using customizable weights, balancing innovation, technical complexity, UX, presentation quality, and completeness.
- Frame-Based Visual Analysis: Evenly spaced frame extraction ensures representative visual moments are assessed.
- Audio Transcription & Analysis: Whisper-based transcription enables the system to understand and evaluate spoken explanations.
- Automated Reporting: Generates professional PDF reports with scores, visualizations, and strengths/weaknesses.
- Interactive Visualization: Charts highlight evaluation breakdowns, aiding rapid insight for stakeholders.
Practical Use Cases
Hackathon Evaluation
Quickly evaluate multiple hackathon submissions with consistent, unbiased scoring of technical, design, and market aspects.
Technical Demos & POCs
Assess demos and POCs efficiently, analyzing innovation, feasibility, and technical depth for actionable improvement insights.
Academic Project Assessment
Support faculty in grading complex projects with objective metrics.
Investor Pitches
Provide structured insights on technical feasibility and market readiness.
Step By Step Guide
Step 1: Install Dependencies
!pip install praisonaiagents[llm] opencv-python moviepy pywhisper reportlab matplotlib pandas plotly openai-whisper
Installs all required Python libraries for AI agents, video/audio processing, reporting, and visualization.
Step 2: Import Libraries
import os
import cv2
import re
import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from typing import List, Dict
from pydantic import BaseModel, Field
from praisonaiagents import Agent, Task, PraisonAIAgents
from moviepy.editor import VideoFileClip
from IPython.display import display, Image
import whisper
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Image as RLImage, Table
from reportlab.lib.styles import getSampleStyleSheet
Imports all necessary modules for video processing, AI agents, data handling, and report generation.
Step 3: Configure Evaluation Settings
CONFIG = {
    "num_frames": 5,        # how many frames to extract
    "keep_frames": False,   # delete extracted frames after evaluation
    "export_pdf": True,     # generate PDF report or not
    "output_dir": "/content/project_eval",
    "weights": {
        "innovation": 0.2,
        "technical_complexity": 0.25,
        "user_experience": 0.2,
        "presentation_quality": 0.2,
        "completeness": 0.15
    }
}

# Make sure the output directory exists
os.makedirs(CONFIG["output_dir"], exist_ok=True)
Sets up parameters like number of frames, output paths, scoring weights, and ensures output directory exists.
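Because these weights are applied directly to the 0-100 category scores in Step 8, keeping them summed to 1.0 keeps the overall score on the same 0-100 scale. Here is an optional sanity check plus a worked example; the category scores below are purely hypothetical and only illustrate the arithmetic.

# Optional sanity check: the weights should sum to 1.0 so the weighted overall score stays on the 0-100 scale.
assert abs(sum(CONFIG["weights"].values()) - 1.0) < 1e-9, "Adjust weights so they sum to 1.0"

# Worked example with hypothetical category scores (not real evaluation output)
example_scores = {
    "innovation": 80,
    "technical_complexity": 70,
    "user_experience": 85,
    "presentation_quality": 90,
    "completeness": 88,
}
overall = sum(example_scores[k] * CONFIG["weights"][k] for k in CONFIG["weights"])
print(f"Weighted overall score: {overall:.2f}/100")  # 0.2*80 + 0.25*70 + 0.2*85 + 0.2*90 + 0.15*88 = 81.70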
Step 4: Define Evaluation Data Model
class ProjectEvaluation(BaseModel):
    innovation_score: int = Field(..., ge=0, le=100)
    technical_complexity: int = Field(..., ge=0, le=100)
    presentation_quality: int = Field(..., ge=0, le=100)
    user_experience: int = Field(..., ge=0, le=100)
    completeness: int = Field(..., ge=0, le=100)
    overall_score: float
    key_strengths: List[str]
    areas_for_improvement: List[str]
    technical_highlights: List[str]
    recommendations: List[str]
    market_potential: str
    scalability_assessment: str
Defines the structured schema for storing evaluation scores, feedback, and insights.
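As a quick illustration (with hypothetical values, not real agent output), Pydantic rejects any score outside the 0-100 bounds, which is what lets the aggregator's JSON be validated automatically later in the pipeline.

# Hypothetical example: out-of-range scores fail validation at construction time.
from pydantic import ValidationError

try:
    ProjectEvaluation(
        innovation_score=150,  # invalid: must be <= 100
        technical_complexity=70, presentation_quality=80,
        user_experience=75, completeness=85, overall_score=0.0,
        key_strengths=[], areas_for_improvement=[], technical_highlights=[],
        recommendations=[], market_potential="TBD", scalability_assessment="TBD"
    )
except ValidationError as e:
    print(e)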
Step 5: Extract Key Video Frames
def extract_frames_scene_based(video_path: str, num_frames: int = 5) -> List[str]:
    frames_dir = os.path.join(CONFIG["output_dir"], "frames")
    os.makedirs(frames_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Sample frames at evenly spaced intervals, skipping the very start and end of the video
    interval = max(total_frames // (num_frames + 1), 1)
    frame_paths = []
    for i in range(1, num_frames + 1):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * interval)
        ret, frame = cap.read()
        if not ret:
            break
        frame_path = os.path.join(frames_dir, f"frame_{i}.jpg")
        cv2.imwrite(frame_path, frame)
        frame_paths.append(frame_path)
    cap.release()
    return frame_paths
Captures evenly spaced video frames for visual analysis.
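Before running the full pipeline, it can be worth eyeballing the sampled frames. A small sketch, assuming the same video path used in Step 12 and the display helpers imported in Step 2:

# Preview the sampled frames inline in the notebook.
frames = extract_frames_scene_based("/content/presentation.mp4", num_frames=CONFIG["num_frames"])
for path in frames:
    display(Image(filename=path, width=320))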
Step 6: Transcribe Audio from Video
def extract_audio_transcript(video_path: str) -> str:
    clip = VideoFileClip(video_path)
    audio_path = video_path.replace(".mp4", ".wav")
    clip.audio.write_audiofile(audio_path)
    model = whisper.load_model("base")
    result = model.transcribe(audio_path)
    return result["text"]
Converts spoken content in the video into text for AI analysis.
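To confirm transcription quality before involving the agents, you can preview the transcript first; this sketch again assumes the Step 12 video path. Larger Whisper models such as "small" or "medium" trade speed for accuracy.

# Quick check of the transcript; long videos take a while on the "base" model.
transcript = extract_audio_transcript("/content/presentation.mp4")
print(f"{len(transcript.split())} words transcribed")
print(transcript[:300], "...")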
Step 7: Initialize AI Agents
from IPython.display import display, Markdown
os.environ["GEMINI_API_KEY"] = ""
tech_agent = Agent(name="TechEvaluator", role="Tech Expert", goal="Evaluate technical complexity", llm="gemini/gemini-2.0-flash")
design_agent = Agent(name="DesignEvaluator", role="UX Designer", goal="Evaluate presentation & UX", llm="gemini/gemini-2.0-flash")
market_agent = Agent(name="MarketAnalyst", role="Business Analyst", goal="Evaluate scalability & market potential", llm="gemini/gemini-2.0-flash")
aggregator = Agent(name="Aggregator", role="Lead Judge", goal="Merge scores & finalize evaluation", llm="gemini/gemini-2.0-flash")
Creates specialized agents to evaluate the technical, design, and market aspects, plus an aggregator to merge their results.
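Before running the full evaluation, it can help to smoke-test the setup with a single trivial task. This is a minimal optional sketch that reuses the same PraisonAIAgents pattern as Step 8, just to confirm the API key and model name are valid.

# Minimal smoke test (optional): one agent, one throwaway task.
smoke_test = PraisonAIAgents(
    agents=[tech_agent],
    tasks=[Task(name="ping", description="Reply with the single word OK.", agent=tech_agent)],
    process="sequential",
    verbose=False
)
print(smoke_test.start())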
Step 8: Define Project Evaluation Function
def evaluate_project(video_path: str) -> ProjectEvaluation:
    # === Frame & Audio Extraction ===
    frames = extract_frames_scene_based(video_path, CONFIG["num_frames"])
    transcript = extract_audio_transcript(video_path)

    # === Tasks for Agents ===
    tasks = [
        Task(
            name="tech_eval",
            description=f"Evaluate the *technical complexity* of this project.\nTranscript:\n{transcript}",
            agent=tech_agent,
            images=frames
        ),
        Task(
            name="design_eval",
            description=f"Evaluate the *user experience & presentation quality*.\nTranscript:\n{transcript}",
            agent=design_agent,
            images=frames
        ),
        Task(
            name="market_eval",
            description=f"Evaluate the *market potential & scalability*.\nTranscript:\n{transcript}",
            agent=market_agent
        ),
        Task(
            name="aggregate_eval",
            description="""
            Merge the previous evaluations into a single JSON following this schema:
            {
                "innovation_score": int (0-100),
                "technical_complexity": int (0-100),
                "presentation_quality": int (0-100),
                "user_experience": int (0-100),
                "completeness": int (0-100),
                "overall_score": float,
                "key_strengths": [str],
                "areas_for_improvement": [str],
                "technical_highlights": [str],
                "recommendations": [str],
                "market_potential": str,
                "scalability_assessment": str
            }
            """,
            agent=aggregator,
            output_pydantic=ProjectEvaluation
        )
    ]
    # === Run Agents ===
    agents = PraisonAIAgents(
        agents=[tech_agent, design_agent, market_agent, aggregator],
        tasks=tasks,
        process="sequential",
        verbose=True
    )
    response = agents.start()

    # === Parse Final Aggregator Output ===
    output = None
    if isinstance(response, dict) and "task_results" in response:
        final_task = response["task_results"][-1]
        if hasattr(final_task, "pydantic") and final_task.pydantic:
            output = final_task.pydantic
        elif hasattr(final_task, "raw") and final_task.raw:
            try:
                raw = final_task.raw.strip()
                output = ProjectEvaluation(**json.loads(raw))
            except Exception as e:
                raise ValueError(f"Could not parse aggregator raw output: {e}")
    elif isinstance(response, str):
        try:
            output = ProjectEvaluation(**json.loads(response))
        except Exception as e:
            raise ValueError(f"Aggregator returned invalid JSON string: {e}")
    else:
        raise ValueError(f"Unexpected response type from agents: {type(response)}")
    if output is None:
        raise ValueError("Aggregator task produced no parsable output")
    # === Weighted Overall Score ===
    weighted = sum([
        output.innovation_score * CONFIG['weights']['innovation'],
        output.technical_complexity * CONFIG['weights']['technical_complexity'],
        output.user_experience * CONFIG['weights']['user_experience'],
        output.presentation_quality * CONFIG['weights']['presentation_quality'],
        output.completeness * CONFIG['weights']['completeness']
    ])
    output.overall_score = round(weighted, 2)

    # === PDF Report (generated before frames are cleaned up) ===
    if CONFIG["export_pdf"]:
        pdf_path = generate_pdf_report(output, frames)
        print(f"📄 PDF report saved at: {pdf_path}")

    # === Cleanup Frames ===
    if not CONFIG["keep_frames"]:
        for f in frames:
            try:
                os.remove(f)
            except OSError:
                pass
        try:
            os.rmdir(os.path.join(CONFIG["output_dir"], "frames"))
        except OSError:
            pass
    # === Markdown Report ===
    strengths_md = "\n- ".join(output.key_strengths)
    improvements_md = "\n- ".join(output.areas_for_improvement)
    highlights_md = "\n- ".join(output.technical_highlights)
    recommendations_md = "\n- ".join(output.recommendations)
    report_md = f"""
# 📊 Final Evaluation Report
**Overall Score:** {output.overall_score}/100
---
### ✅ Key Strengths
- {strengths_md}
### ⚠️ Areas for Improvement
- {improvements_md}
### 🔧 Technical Highlights
- {highlights_md}
### 💡 Recommendations
- {recommendations_md}
---
### 🌍 Market Potential
{output.market_potential}
---
### 📈 Scalability Assessment
{output.scalability_assessment}
"""
    display(Markdown(report_md))
    return output  # structured, JSON-serializable ProjectEvaluation
Runs all AI agents sequentially on extracted frames and transcript, returning structured evaluation results.
Step 9: Save Results to JSON & CSV
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Image as RLImage, Table
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.pagesizes import A4
def save_results(project_name: str, result: ProjectEvaluation):
    json_path = os.path.join(CONFIG["output_dir"], f"{project_name}.json")
    csv_path = os.path.join(CONFIG["output_dir"], "results.csv")
    with open(json_path, "w") as f:
        json.dump(result.dict(), f, indent=2)
    df = pd.DataFrame([result.dict()])
    df.insert(0, "project_name", project_name)  # keep track of which submission each row belongs to
    if os.path.exists(csv_path):
        df.to_csv(csv_path, mode="a", header=False, index=False)
    else:
        df.to_csv(csv_path, index=False)
    return json_path, csv_path
Stores evaluation data in both JSON and CSV formats for record-keeping.
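Because save_results appends every evaluation (with its project name) to the same results.csv, comparing submissions later is a one-liner. A small sketch, assuming a few rows have accumulated; the column names follow the ProjectEvaluation fields.

# Rank all evaluated projects by overall score.
leaderboard = pd.read_csv(os.path.join(CONFIG["output_dir"], "results.csv"))
cols = ["project_name", "overall_score", "innovation_score", "technical_complexity", "user_experience"]
print(leaderboard.sort_values("overall_score", ascending=False)[cols])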
Step 10: Generate PDF Report
def generate_pdf_report(evaluation: ProjectEvaluation, frame_paths: List[str], output_path="/content/evaluation_report.pdf"):
    doc = SimpleDocTemplate(output_path, pagesize=A4)
    styles = getSampleStyleSheet()
    elements = []
    elements.append(Paragraph("📊 Hackathon Project Evaluation Report", styles['Title']))
    elements.append(Spacer(1, 12))
    elements.append(Paragraph(f"Overall Score: {evaluation.overall_score}/100", styles['Heading2']))
    elements.append(Spacer(1, 12))

    # Category Scores Table
    data = [
        ["Innovation", evaluation.innovation_score],
        ["Technical Complexity", evaluation.technical_complexity],
        ["Presentation Quality", evaluation.presentation_quality],
        ["User Experience", evaluation.user_experience],
        ["Completeness", evaluation.completeness]
    ]
    table = Table(data, hAlign="LEFT")
    elements.append(table)
    elements.append(Spacer(1, 20))

    # Sections
    def add_section(title, items):
        elements.append(Paragraph(title, styles['Heading2']))
        if isinstance(items, list):
            for i in items:
                elements.append(Paragraph(f"- {i}", styles['Normal']))
        else:
            elements.append(Paragraph(items, styles['Normal']))
        elements.append(Spacer(1, 12))

    add_section("✅ Key Strengths", evaluation.key_strengths)
    add_section("⚠️ Areas for Improvement", evaluation.areas_for_improvement)
    add_section("🔧 Technical Highlights", evaluation.technical_highlights)
    add_section("💡 Recommendations", evaluation.recommendations)
    add_section("🌍 Market Potential", evaluation.market_potential)
    add_section("📈 Scalability Assessment", evaluation.scalability_assessment)

    # Add Frames (Screenshots)
    elements.append(Paragraph("🎞️ Extracted Frames", styles['Heading2']))
    for frame in frame_paths:
        try:
            elements.append(RLImage(frame, width=250, height=150))
            elements.append(Spacer(1, 12))
        except Exception:
            pass

    doc.build(elements)
    return output_path
Creates a professional PDF report including scores, feedback, and extracted frames.
Step 11: Visualize Results
def visualize_results(result: ProjectEvaluation):
    categories = ["Innovation", "Tech", "UX", "Presentation", "Completeness"]
    scores = [result.innovation_score, result.technical_complexity, result.user_experience,
              result.presentation_quality, result.completeness]

    # Radar chart
    fig = go.Figure(data=go.Scatterpolar(r=scores, theta=categories, fill='toself'))
    fig.update_layout(title="Project Evaluation Radar Chart",
                      polar=dict(radialaxis=dict(visible=True, range=[0, 100])))
    fig.show()

    # Bar chart
    plt.bar(categories, scores)
    plt.title("Evaluation Breakdown")
    plt.show()
Displays radar and bar charts for a quick visual understanding of project scores.
Step 12: Run Full Evaluation
video_file = "/content/presentation.mp4"
try:
result = evaluate_project(video_file)
json_path, csv_path = save_results("project1", result)
visualize_results(result)
print(f"Results saved: {json_path}, {csv_path}")
except Exception as e:
print(f"❌ Error: {e}")
Executes the complete evaluation pipeline on a video, saves results, and shows visualizations.
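For a hackathon with many entries, the same pipeline can be looped over a folder of videos. This is a hedged sketch that assumes a hypothetical /content/submissions directory holding one .mp4 per team.

# Batch evaluation sketch: score every .mp4 in an (assumed) submissions folder.
import glob

for path in sorted(glob.glob("/content/submissions/*.mp4")):
    project_name = os.path.splitext(os.path.basename(path))[0]
    try:
        res = evaluate_project(path)
        save_results(project_name, res)
        print(f"✅ {project_name}: {res.overall_score}/100")
    except Exception as e:
        print(f"❌ {project_name} failed: {e}")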
Output
As you can see, the project scores highly in presentation quality and completeness, with strong performance in innovation and UX, while technical complexity lags slightly behind the other areas.
Final Thoughts
The integration of AI-driven multi-agent evaluation systems offers an objective, scalable, and reproducible approach to project assessment. By combining video analysis, audio transcription, and specialized evaluators, organizations can standardize evaluation metrics, improve transparency, and accelerate decision-making. Whether for hackathons, academic projects, or investor pitches, this system provides a powerful blueprint for modern project assessment.