Hands-On Guide to Running LLMs Locally using Ollama

Explore how Ollama enables local execution of large language models for enhanced privacy and cost savings.

Downloading and running large language models locally can be beneficial in terms of both privacy and cost. The data stays on the device and isn’t subject to third-party terms of service, and there are no per-token inference costs, which makes tasks that consume a large number of tokens affordable. Ollama is one of the most widely used tools for running LLMs locally. 

This article explores the Ollama platform for downloading and running LLMs locally. 

Table of Contents

  1. Understanding Ollama 
  2. Local Execution of LLMs using Ollama Shell
  3. Ollama API Calling through Python

Understanding Ollama

Ollama is an open-source tool designed to help users set up and run large language models such as Phi-2, Llama 3, etc. locally. It is built on top of llama.cpp, a C/C++ library designed for efficient local inference of LLMs across different platforms and hardware configurations. Llama.cpp is an inference implementation of Meta’s base Llama model in pure C/C++, and it supports a wide range of LLMs for fine-tuning and other advanced tasks. 

Ollama supports a wide range of LLMs, which can be browsed in the model library on the official Ollama website (https://ollama.com/library) as well as in the Git repository (https://github.com/ollama/ollama). 

Ollama Supported Model Examples

Local Execution of LLMs using Ollama Shell

Step 1: Visit https://ollama.com/download, download the Ollama installer and install it. 

Step 2: Once the installation is finished, verify the setup by running the ollama command in the terminal; it prints the usage help if Ollama is installed correctly –  

sachintripathi@Sachins-MacBook-Air ~ % ollama

Step 3: Download and run the open-source LLM of your choice. I’m using the Phi-2 model for demonstration here – 

sachintripathi@Sachins-MacBook-Air ~ % ollama run phi

Step 4: Once the model has been pulled and is running, we can provide prompts and generate responses. 

>>> can you tell me about India in brief? 

Step 5: The /? command lists the commands available inside the Ollama shell; to exit the shell, use the /bye command –

>>> /?
>>> /bye

Step 6: Let’s execute a prompt and save its response to a file for easy access – 

sachintripathi@Sachins-MacBook-Air ~ % ollama run phi "can you tell me about India in brief?" >> response.md

The file response.md now contains the model’s response to the entered prompt. 

Ollama API Calling through Python

Ollama also exposes a local REST API that can be called from Python, provided the Ollama server is running. 
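
Before writing the full script, it can help to confirm that the server is actually reachable. The minimal sketch below assumes the default Ollama port 11434; hitting the root endpoint should return a short status message such as "Ollama is running".

import requests

# Check that the local Ollama server is reachable on the default port.
try:
    r = requests.get("http://localhost:11434/", timeout=5)
    print(r.status_code, r.text)
except requests.exceptions.ConnectionError:
    print("Ollama server is not running: start the Ollama app or run `ollama serve`.")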

Step 1: Create a Python script and import the requests and json packages – 

import json
import requests

Step 2: Create the URL, headers and data variables encapsulating the Ollama localhost URL, content type and model details – 

url = "http://localhost:11434/api/generate"

headers = {
   "Content-Type": "application/json"
}

data = {
   "model": "phi",
   "prompt": "can you tell me about India in brief?",
   "stream": False
}
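
Optionally, generation settings can also be passed through an "options" field in the same payload. The snippet below is a sketch; the option names (e.g. temperature, num_predict) follow the Ollama API documentation as I understand it and are worth verifying against your installed version –

data = {
   "model": "phi",
   "prompt": "can you tell me about India in brief?",
   "stream": False,
   "options": {
      "temperature": 0.7,     # sampling temperature (assumed option name)
      "num_predict": 256      # maximum number of tokens to generate (assumed option name)
   }
}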

Step 3: Call the post method of the requests package, passing the variables declared in Step 2. Store the response in a variable and extract the generated text from the returned JSON object using the code shown below – 

# Send the prompt to the local Ollama server
response = requests.post(url, headers=headers, data=json.dumps(data))

if response.status_code == 200:
   # Parse the JSON body and pull out the generated text
   data = json.loads(response.text)
   actual_response = data['response']
   print(actual_response)
else:
   print("Error:", response.status_code, response.text)

Output

Running the script prints the response to the given prompt, generated by the Phi-2 open-source LLM running locally. 
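
Since "stream" was set to False above, the API returned the whole answer in a single JSON object. If you set it to True, Ollama instead streams the answer as newline-delimited JSON objects, each carrying a fragment of the text. A rough sketch of handling such a stream against the same /api/generate endpoint is shown below –

import json
import requests

url = "http://localhost:11434/api/generate"
data = {"model": "phi", "prompt": "can you tell me about India in brief?", "stream": True}

# Each streamed line is a JSON object with a "response" fragment;
# the final object has "done" set to true.
with requests.post(url, json=data, stream=True) as response:
    for line in response.iter_lines():
        if line:
            chunk = json.loads(line)
            print(chunk.get("response", ""), end="", flush=True)
            if chunk.get("done"):
                break
    print()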

Final Words

By enabling users to run and prompt LLMs locally on their own machines, Ollama empowers a wider audience to use and learn generative AI safely and easily. This open-source project makes LLMs more accessible, lowering the cost and privacy barriers and allowing users to innovate and experiment freely. 

References

  1. Link to Code
  2. Ollama Official Website – https://ollama.com
  3. Llama.cpp Git Repo – https://github.com/ggerganov/llama.cpp
  4. Ollama Git Repo – https://github.com/ollama/ollama