Mastering Lightweight AI with Falcon 3: A Hands-On Guide

Falcon 3 redefines AI with its optimized architecture, extended context handling, and quantized models for efficient deployment. This guide covers its features, implementation, and real-world applications.

AI is redefining industries and transforming the way we communicate with technology. Its full potential, however, is constrained by infrastructure and accessibility challenges. Enter Falcon 3, TII’s most recent open-source large language model (LLM). Falcon 3, which can run smoothly on small devices, combines remarkable performance with unusual efficiency in an effort to democratise powerful AI. This article offers a thorough guide to Falcon 3’s architecture, capabilities, and real-world applications.

Table of Contents

  1. Introduction to Falcon 3
  2. Falcon 3’s Key Features
  3. Hands-On Implementation
  4. Technical Deep Dive
  5. Enhanced Capabilities

Introduction to Falcon 3

The cutting-edge LLM Falcon 3 redefines efficiency and scalability. It performs exceptionally well on tasks such as reasoning, language comprehension, and code generation, and comes in four model sizes: 1B, 3B, 7B, and 10B. Thanks to its quantised variants (GGUF, AWQ, and GPTQ) and optimised decoder-only architecture, Falcon 3 delivers strong performance even on devices with limited resources.

Why Choose Falcon 3?

  • High Accessibility: Runs on lightweight infrastructures.
  • State-of-the-Art Performance: Surpasses global benchmarks for small LLMs.
  • Versatile Applications: Supports generative tasks, conversational AI, and more.

Falcon 3’s Key Features

1. Optimized Architecture

Falcon 3 employs a decoder-only design with flash attention and Grouped Query Attention (GQA), reducing memory overhead while enhancing speed and efficiency.
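To see why sharing Key-Value heads saves memory, here is a back-of-the-envelope KV-cache comparison between standard multi-head attention and GQA. All dimensions below (layer count, head counts, head size) are illustrative placeholders, not Falcon 3’s published configuration:

```python
# Back-of-the-envelope KV-cache comparison: standard multi-head attention
# vs. Grouped Query Attention (GQA). Dimensions are illustrative only.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    # 2x for the separate Key and Value tensors; fp16 = 2 bytes per value.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

# Standard MHA: every query head keeps its own K/V pair (32 KV heads).
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=32_768)

# GQA: query heads share a small number of K/V heads (here, 4).
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=4, head_dim=128, seq_len=32_768)

print(f"MHA KV cache: {mha / 2**30:.1f} GiB")  # 16.0 GiB
print(f"GQA KV cache: {gqa / 2**30:.1f} GiB")  # 2.0 GiB, 8x smaller
```

With these assumed dimensions, shrinking 32 KV heads down to 4 cuts the cache for a 32K-token context by a factor of 8, which is exactly the kind of saving that makes long-context inference feasible on modest hardware.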

2. Advanced Tokenization

The tokenizer supports an extensive vocabulary of 131K tokens, double that of Falcon 2, enabling superior compression and strong performance across diverse tasks.

3. Extended Context Handling

With native training on a 32K context size, Falcon 3 excels at processing long and complex inputs.

4. Quantization for Efficiency

Quantized versions (int4, int8, and a 1.58-bit BitNet variant) enable deployment in low-resource environments with minimal performance compromise.
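The memory impact of these precisions is easy to estimate. The arithmetic below is a rough sketch for a 10B-parameter model; real quantized files also carry scales and metadata, so actual sizes are somewhat larger:

```python
# Rough weight-memory estimate for a 10B-parameter model at different
# numeric precisions. Pure arithmetic for illustration only.

PARAMS = 10e9  # 10 billion parameters

def weights_gib(bits_per_param):
    # bits -> bytes (divide by 8), bytes -> GiB (divide by 2**30)
    return PARAMS * bits_per_param / 8 / 2**30

for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4), ("1.58-bit", 1.58)]:
    print(f"{name:>8}: ~{weights_gib(bits):.1f} GiB")
```

At fp16 the weights alone need roughly 18.6 GiB, while int4 brings that under 5 GiB, which is why quantization is what makes a 10B model practical on consumer GPUs and CPUs.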

Performance Benchmark

Hands-On Implementation

We will test Falcon 3 using Ollama on Google Colab.

Step 1: Installing Dependencies

The first stage involves preparing your Colab environment. You’ll need to install two key components:

  • pciutils: helps Ollama detect GPU configurations.
  • The Ollama installation script: sets up the Ollama service.
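In a Colab notebook, this step can be sketched as the following shell commands (prefix each line with `!` when running it in a notebook cell; the install script URL is Ollama’s official one):

```shell
# Install pciutils so Ollama can detect the GPU, then run the
# official Ollama install script from ollama.com.
sudo apt-get update -qq
sudo apt-get install -y -qq pciutils
curl -fsSL https://ollama.com/install.sh | sh
```

These are one-time environment-setup commands; nothing else in the tutorial works until the install script has finished.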

Step 2: Starting the Ollama Service

Since Jupyter Notebooks run code sequentially, we’ll use Python’s threading to run the Ollama service in the background:
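A minimal sketch of this step, assuming Ollama was installed in Step 1, launches the server in a daemon thread so the cell returns immediately:

```python
import subprocess
import threading

def run_ollama_serve():
    # 'ollama serve' blocks until the server exits, so we run it
    # in a background thread to keep the notebook responsive.
    subprocess.run(["ollama", "serve"])

# daemon=True lets the notebook shut down without waiting on the server.
ollama_thread = threading.Thread(target=run_ollama_serve, daemon=True)
ollama_thread.start()
```

After this cell runs, the Ollama API is available on its default local port for the steps that follow.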

Step 3: Pulling a Language Model

Ollama offers a wide range of models. For this article, we will pull the Falcon 3 10B model.
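With the server running, the model can be downloaded with a single command (prefix with `!` in a Colab cell; `falcon3:10b` is the tag used in Ollama’s model library):

```shell
# Download the Falcon 3 10B model weights from the Ollama library.
ollama pull falcon3:10b
```

The 10B download is several gigabytes, so expect this step to take a few minutes on Colab.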

Step 4: Integrating with LangChain

To interact with the model, we’ll use LangChain’s Ollama integration:
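A minimal example using the `langchain-ollama` integration package might look like the following (the prompt is just a placeholder; this assumes the server from Step 2 is running and the model from Step 3 has been pulled):

```python
# Requires: pip install langchain-ollama
from langchain_ollama import OllamaLLM

# Point LangChain at the locally running Ollama server and the
# model we pulled in the previous step.
llm = OllamaLLM(model="falcon3:10b")

response = llm.invoke("Explain Grouped Query Attention in two sentences.")
print(response)
```

`invoke` returns the model’s completion as a plain string, so the same `llm` object can be dropped into any LangChain chain or prompt template.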

Output

Technical Deep Dive

Training Paradigm
  • Trained on 14 trillion tokens, double the training data of its predecessor, Falcon 2.
  • Enhanced with multi-stage training to improve reasoning and mathematical capabilities.

Deployment Insights
  • Grouped Query Attention (GQA): Optimizes inference by minimizing Key-Value (KV) cache memory.
  • Quantized Models: Int4 and Int8 models ensure Falcon 3 runs efficiently without GPU acceleration.

Model Specifications

Advancements in Falcon 3

Enhanced Capabilities

The Falcon 3 family excels across scientific, reasoning, and general knowledge tasks, as demonstrated by internal evaluations using lm-evaluation-harness. Key highlights include:

  • Math Capabilities: 10B-Base achieves 22.9 on MATH-Lvl5 and 83.0 on GSM8K, showcasing its ability to tackle complex mathematical problems.
  • Coding Proficiency: 10B-Base scores 73.8 on MBPP, while 10B-Instruct achieves 45.8 on MultiPL-E, demonstrating strong generalization in programming-related tasks.
  • Extended Context Handling: Models support up to 32K tokens (8K for Falcon3-1B), with 10B-Instruct scoring 86.3 on BFCL.
  • Improved Reasoning: Falcon3-7B-Base and Falcon3-10B-Base achieve 51.0 and 59.7 on BBH, respectively, reflecting advanced reasoning capabilities.
  • Scientific Knowledge Expansion: Performance on MMLU benchmarks highlights domain-specific strengths, with Falcon3-7B-Base scoring 67.4/39.2 (MMLU/MMLU-PRO) and Falcon3-10B-Base achieving 73.1/42.5 (MMLU/MMLU-PRO).

Final Words

Falcon 3 sets a new standard in accessible AI, offering strong performance and versatility. Whether you’re a researcher exploring innovative applications or a developer building efficient AI systems, Falcon 3 empowers you to achieve more with less. Start your journey today by downloading Falcon 3 and exploring its capabilities.

References


Falcon 3’s Official Website

Aniruddha Shrikhande

Aniruddha Shrikhande is an AI enthusiast and technical writer with a strong focus on Large Language Models (LLMs) and generative AI. Committed to demystifying complex AI concepts, he specializes in creating clear, accessible content that bridges the gap between technical innovation and practical application. Aniruddha's work explores cutting-edge AI solutions across various industries. Through his writing, Aniruddha aims to inspire and educate, contributing to the dynamic and rapidly expanding field of artificial intelligence.
