How to Use Open-Source LLMs Locally for Free: Ollama + Python
Hosted LLM APIs, like those behind ChatGPT, Gemini, and Claude, charge for every input and output token. If you're getting started on an early-stage project, you can easily and cheaply prototype apps using your own computer's hardware and open-source LLMs.
Enter Ollama
Ollama is a local command-line application that lets you install and serve many popular open-source LLMs. Follow the installation instructions for your OS on their GitHub page.
I'm on Windows, so I downloaded and ran their Windows installer. Once you've clicked through the setup process, you should be able to open a terminal and type the following command to see the help output:
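```
ollama --help
```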
The run command runs a model, pulling and serving it all at once (view available models in the Ollama library).
Here, I'll run Llama 3, Meta's flagship model, which is around 5 GB in size:
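```
ollama run llama3
```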
Pressing Ctrl+D will exit the interactive CLI but keep the model serving in the background.
We can interact with the model by typing prompts directly into the terminal.
Using Ollama in Python
You can use Ollama directly in Python with their Python client. Install it with pip:
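```
pip install ollama
```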
Now, we can import the library, reference the model, and submit a query:
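Here's a minimal query, assuming the llama3 model pulled above (the prompt itself is just a placeholder):

```python
import ollama

# Send a single user message to the locally served llama3 model
response = ollama.chat(
    model="llama3",
    messages=[
        {"role": "user", "content": "Why is the sky blue?"},
    ],
)

# The reply text lives under message -> content
print(response["message"]["content"])
```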
Like ChatGPT, we can provide system, assistant, and user role messages to direct and maintain conversations. To simulate a chat, you would keep appending messages and resending the entire context.
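A rough sketch of that pattern, with illustrative messages:

```python
import ollama

# Running conversation history; each turn gets appended and resent in full
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Give me one fact about the Moon."},
]

reply = ollama.chat(model="llama3", messages=messages)

# Keep the assistant's answer in the context for the next turn
messages.append({"role": "assistant", "content": reply["message"]["content"]})

# The follow-up question relies on the earlier context
messages.append({"role": "user", "content": "Now phrase it as a haiku."})
reply = ollama.chat(model="llama3", messages=messages)
print(reply["message"]["content"])
```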
Using Ollama with LangChain
If you need to build advanced LLM pipelines that use NLP tooling, vector stores, RAG, or agents, you can connect an orchestrator, like LangChain, to your Ollama server.
Note: when you're ready to go into production, you can easily switch from Ollama to an LLM API, like ChatGPT's.
First, we need to install langchain-community:
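```
pip install langchain-community
```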
Since LangChain is a fuller-featured library focused on pipelines, we need to use a bit more code to achieve the same chat interface.
The central difference is the piping of content using the pipe operator (|). Below, we make a prompt from the messages, pipe it to Ollama, and then pipe the result to a string output parser.
I've made the user input a variable that gets replaced during invocation since that's how you usually work with LangChain.
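Putting it together, a sketch of the chain looks like this; the exact system prompt and review text are placeholders standing in for the originals:

```python
from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Point LangChain at the locally served llama3 model
llm = ChatOllama(model="llama3")

# The system message constrains the output; {review} is filled in at invoke time
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a sentiment classifier. Reply with a single word: positive, negative, or neutral."),
    ("user", "{review}"),
])

# Pipe the prompt into the model, then into a string output parser
chain = prompt | llm | StrOutputParser()

# Placeholder text -- substitute the actual course review here
review = "The lectures were outdated and the labs barely worked. I would not recommend this course."
print(chain.invoke({"review": review}))
```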
The input is a negative review from one of IBM's courses on Coursera (and one that I advised readers to avoid in my best AI courses article). The model correctly followed our system prompt and output only the review's sentiment.
For more information on using LangChain, see their docs.
Next, we'll see how to use Ollama with another popular orchestrator.
Using Ollama with LlamaIndex
Like LangChain, LlamaIndex offers functionality for building pipelines, but it specializes more in indexing and search.
To use our Ollama model, we first need to install LlamaIndex with Ollama support:
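```
pip install llama-index llama-index-llms-ollama
```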
The syntax to interface with Ollama is slightly different from LangChain's; you need to use the ChatMessage() class instead of tuples. The chat interface is also less verbose, since there's no piping involved.
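A sketch of the equivalent chat, with an illustrative system prompt and headline:

```python
from llama_index.llms.ollama import Ollama
from llama_index.core.llms import ChatMessage

# Connect to the locally served llama3 model
llm = Ollama(model="llama3", request_timeout=120.0)

messages = [
    ChatMessage(
        role="system",
        content="Rewrite the user's news headline to be as dull and clumsy as possible.",
    ),
    ChatMessage(role="user", content="Scientists Discover Water Ice at the Moon's South Pole"),
]

# chat() takes the message list directly -- no piping needed
response = llm.chat(messages)
print(response)
```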
We see the LLM correctly used our system message to produce a worse version of the news headline.
For more information on what this library can do, see the LlamaIndex docs.
Final words
You're now set up to develop a state-of-the-art LLM application locally for free.
Once you're ready to launch your app, you can easily swap Ollama for any of the big API providers. Bear in mind that open-source model performance fluctuates relative to premium API services, like ChatGPT, so your prompts may produce unexpected results when you swap models.
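In LangChain, for instance, that swap can be as small as changing the model class. This sketch assumes the langchain-openai package is installed and an OPENAI_API_KEY environment variable is set; the rest of the chain stays the same:

```python
# Local development with Ollama
from langchain_community.chat_models import ChatOllama
llm = ChatOllama(model="llama3")

# Production with a hosted provider (pip install langchain-openai)
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")
```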