Lattice | Volume 3 ISSUE-1

Blended Document Similarity based on Text & Image Features

Author(s): Anand Jha

Abstract:

Document similarity can serve as a building block for many useful applications, including information retrieval, document clustering, and question-answering systems, to name a few. In the modern digital world, informative documents are often composed of text, images, and videos. In such a scenario, similarity based purely on text, images, or video may not be adequate. Hence, a metric that blends similarity across all of these aspects should be used. In this paper, a weighted similarity measure based on text and images has been developed using some popular open-source machine learning (ML) libraries. This provides a flexible and easy-to-use method that does not require the large training datasets that ML tasks often demand.
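To make the idea of a weighted text-and-image measure concrete, here is a minimal sketch. It does not reproduce the paper's method. It assumes that text similarity is a plain bag-of-words cosine, and that each image has already been turned into a numeric feature vector by some external model (for example, a pretrained CNN). The names `text_similarity`, `blended_similarity`, and `text_weight` are hypothetical and exist only for illustration. The blend is a convex combination, w · text_sim + (1 − w) · image_sim.

```python
import math
from collections import Counter
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def text_similarity(doc_a: str, doc_b: str) -> float:
    """Bag-of-words cosine similarity (assumed stand-in for richer text features)."""
    counts_a = Counter(doc_a.lower().split())
    counts_b = Counter(doc_b.lower().split())
    vocab = sorted(set(counts_a) | set(counts_b))  # shared vocabulary, fixed order
    return cosine([counts_a[w] for w in vocab], [counts_b[w] for w in vocab])


def blended_similarity(
    text_a: str,
    text_b: str,
    image_vec_a: Sequence[float],  # hypothetical precomputed image features
    image_vec_b: Sequence[float],
    text_weight: float = 0.5,  # hypothetical weight w given to the text score
) -> float:
    """Weighted blend: w * text_sim + (1 - w) * image_sim, with w in [0, 1]."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    t = text_similarity(text_a, text_b)
    i = cosine(image_vec_a, image_vec_b)
    return text_weight * t + (1.0 - text_weight) * i
```

For example, `blended_similarity("red sports car", "a red car", [0.9, 0.1, 0.0], [0.8, 0.2, 0.1], text_weight=0.6)` weights the text match slightly above the image match. If all feature values are non-negative, the result stays in [0, 1]. Embeddings that contain negative values can produce lower scores. Setting `text_weight` to 1.0 or 0.0 reduces the measure to pure text or pure image similarity.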