Author: Brendan Martin
Founder of LearnDataSci

How to use Open Source LLMs locally for Free: Ollama + Python

LearnDataSci is reader-supported. When you purchase through links on our site, earned commissions help support our team of writers, researchers, and designers at no extra cost to you.

LLM APIs, like those behind ChatGPT, Gemini, and Claude, charge for every input and output token you send and receive. If you're getting started with an early-stage project, you can prototype apps easily and cheaply using your own computer's hardware and open-source LLMs.

Enter Ollama

Ollama is a local command-line application that lets you install and serve many popular open-source LLMs. Follow the installation instructions for your OS on their GitHub.

I'm on Windows, so I downloaded and ran their Windows installer. Once you've clicked through the setup process, you should be able to open a terminal and type the following command to see the help output:

ollama --help

Out:
Large language model runner

Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model from a Modelfile
  show        Show information for a model
  run         Run a model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

Flags:
  -h, --help      help for ollama
  -v, --version   Show version information

Use "ollama [command] --help" for more information about a command.

The run command runs a model, pulling and serving it all at once (you can browse the available models in the Ollama library).

Here, I'll run Llama 3, Meta's flagship model, which is around 5 GB in size:

ollama run llama3

Using Ctrl-D will exit the interactive CLI, but the model will keep serving in the background (you can verify this with ollama ps).

You can now interact with the model directly in the terminal.

Using Ollama in Python

You can use Ollama directly in Python with their official Python client. Install it with pip:

pip install ollama

Now, we can import the library, reference the model, and submit a query:

import ollama

# Messages use the familiar role/content format (system, user, assistant)
messages = [
    {
        'role': 'system',
        'content': 'you only talk like a 1950s gangster, and you limit your responses to 20 words'
    },
    {
        'role': 'user',
        'content': 'why is the sky blue?'
    }
]

# Send the conversation to the locally served llama3 model
response = ollama.chat(model='llama3', messages=messages)

print(response['message']['content'])
Out:
"Listen here, pal, it's because of some fancy-schmancy thing called light refraction, but don't you worry 'bout that, just enjoy the view, see?"

Like ChatGPT, we can provide system, assistant, and user role messages to direct and maintain conversations. To simulate a chat, you would keep appending messages and resending the entire context.
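
For example, here's a minimal sketch of a multi-turn chat loop, assuming the same llama3 model pulled above, where each turn appends to the history and resends it:

import ollama

# The running conversation; the full history is resent on every call
history = [
    {'role': 'system', 'content': 'you are a concise, helpful assistant'}
]

def chat(user_message):
    # Add the user's new message to the history
    history.append({'role': 'user', 'content': user_message})
    response = ollama.chat(model='llama3', messages=history)
    reply = response['message']['content']
    # Add the assistant's reply so the next turn has the full context
    history.append({'role': 'assistant', 'content': reply})
    return reply

print(chat('why is the sky blue?'))
print(chat('now explain it to a five-year-old'))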

Using Ollama with LangChain

If you need to build advanced LLM pipelines that use NLP, vector stores, RAG, and agents, then we can connect an orchestrator, like LangChain, to our Ollama server.

Note: when you're ready to go into production, you can easily switch from Ollama to an LLM API, like ChatGPT.

First, we need to install langchain-community:

pip install langchain-community

Since LangChain is a fuller-featured library focused on pipelines, we need to use a bit more code to achieve the same chat interface.

The central difference is the piping of content using the pipe operator (|). Below, we make a prompt from the messages, pipe it to Ollama, then pipe it to a string output parser.

I've made the user input a variable that gets replaced during invocation since that's how you usually work with LangChain.

from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Point LangChain at the locally served llama3 model
llm = ChatOllama(model='llama3')

# {input} is a template variable filled in at invocation time
messages = [
    ('system', 'You are a sentiment analysis model that only outputs the sentiment of my input'),
    ('user', '{input}')
]

prompt = ChatPromptTemplate.from_messages(messages)

# Pipe the prompt into the model, then parse the response to a plain string
chain = prompt | llm | StrOutputParser()

review = "The course is very inconsistent, it repeats the same one minute in all of the videos, when reviewing the dataframe. Many times some things are asked without providing previous explanation, and the final assignment is also an example, I had to search all over internet to resolve it, because I couldn't find any reference in the content provided."

print(chain.invoke({'input': review}))

The input is a negative review from one of IBM's courses on Coursera (and one that I advised readers to avoid in my best AI courses article). The model correctly followed our system prompt and output only the review's sentiment.
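
Because the chain is a standard LangChain runnable, you can also score several reviews in one call with .batch(). Here's a minimal sketch reusing the chain defined above, with a couple of made-up reviews as inputs:

# Hypothetical reviews; .batch() invokes the chain once per input dict
reviews = [
    {'input': 'Loved the hands-on labs and the clear explanations.'},
    {'input': 'The lectures were outdated and the quizzes barely matched them.'}
]

print(chain.batch(reviews))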

For more information on using LangChain, see their docs.

Next, we'll see how to use Ollama with another popular orchestrator.

Using Ollama with LlamaIndex

LlamaIndex offers similar pipeline-building functionality to LangChain, but it specializes more in indexing and search.

To use our Ollama model, we first need to install LlamaIndex with Ollama support:

pip install llama-index llama-index-llms-ollama

The syntax for interfacing with Ollama is slightly different from LangChain's: you use the ChatMessage class instead of tuples. The chat interface is also less verbose, since there's no piping involved.

from llama_index.core.llms import ChatMessage
from llama_index.llms.ollama import Ollama

# Connect to the locally served llama3 model
llm = Ollama(model='llama3')

messages = [
    ChatMessage(
        role='system',
        content='you are a thesaurus bot that replaces the words in news headlines with more esoteric synonyms'
    ),
    ChatMessage(
        role='user',
        content='A heat wave not seen in decades will send temperatures soaring for more than half the US population'
    )
]

# No piping needed; call chat() on the LLM directly
response = llm.chat(messages=messages)

print(response)
Out:
assistant: "A scorching diapausal phenomenon unheralded in recent annals will propel thermometric indices to stratospheric heights, encompassing nigh on 50% of the American populace."

We see the LLM correctly used our system message to produce a worse version of the news headline.
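
Note that printing the ChatResponse object includes the assistant: role prefix. If you only want the reply text, you can read it from the message attribute:

# Just the reply text, without the role prefix
print(response.message.content)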

For more information on what this library can do, see the LlamaIndex docs.

Final words

You're now set up to develop a state-of-the-art LLM application locally for free.

Once you're ready to launch your app, you can easily swap Ollama for any of the big API providers, as sketched below. Bear in mind that open-source model performance varies relative to premium API services, like ChatGPT, so your prompts may produce unexpected results after swapping models.
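
For example, with the LangChain pipeline from earlier, the swap is mostly a one-line change. This is a hypothetical sketch that assumes you've installed langchain-openai, set an OPENAI_API_KEY environment variable, and picked a hosted model name:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Swap the local ChatOllama model for a hosted one; the rest of the chain is unchanged
# (the model name here is just an example)
llm = ChatOpenAI(model='gpt-4o-mini')

prompt = ChatPromptTemplate.from_messages([
    ('system', 'You are a sentiment analysis model that only outputs the sentiment of my input'),
    ('user', '{input}')
])

chain = prompt | llm | StrOutputParser()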


Meet the Authors

Brendan Martin

Chief Editor at LearnDataSci and Software Engineer
