Generative AI has been a big part of tech over the past 2 years, from the latest AI nonsense OpenAI has cooked up to new competition from DeepSeek’s R1. Let me make a different proposal from what you might hear from a lot of more traditional tech users: we need all the AI. And I mean all the AI. I’m not talking about stuff like Claude or Perplexity; it’s all about local models. Here’s why you need local AI models that serve your needs, plus some examples of how you can use them to augment the work that you do.
The AI That Respects You Is Open
Generative AI can be a powerful tool, but there’s more to consider than just capability. Companies like Google and Facebook now run their own versions of generative AI, and they also leverage their platforms to further advantage themselves.
A big problem with a lot of generative AI tools is that many of them are developed in secret, and we have very little knowledge of what kind of information they were trained on beyond “publicly available information.”
What’s more, as AI becomes more prevalent, people expose their most sensitive selves when they may not have intended to. Many of the major players in generative AI either have a vested interest in selling personal information (Google, Facebook, Microsoft, etc.) or work with other surveillance giants and governments (ChatGPT, Claude, DeepSeek, etc.).
On top of that, large language models like ChatGPT and Perplexity are only available in the cloud and have a significant environmental impact (swept under the rug, of course). While AI companies are quick to release research papers about their AI, reviewing the sources of those papers reveals that they are often pushed out with inaccurate sources and references to boost their credibility and bypass academic peer review.
Owning Offline AI
That being said, I’m going to be up front and admit I’m neither a moralist nor an accelerationist. The cat is out of the bag, and since many of these tools are freely available, you should use them where you think they will work.
There’s a cultural problem endemic to tech enthusiasts: either you are an accelerationist who relies so much on AI that you fail to admit its shortcomings, or you are the doomer who will shout to high heaven at every mistake and copyright violation.
Read Ethan Zuckerman’s article: “Two warring visions of AI”
Owning your AI is the solution to this problem. Even if someone from your government threatens to ban the app, open source will find a way. Even if your model is censored by its makers, open source will find a way. Any law banning or regulating AI only punishes law-abiding citizens, while the nefarious netizens will continue to develop in secret. It’s too late to put the genie back in the bottle, so you might as well make the best of the situation.
When you use offline AI, it’s just you and the AI. There’s total privacy because it all happens on your device. You can give the AI greater access to your data because you’re operating in one system. The best part is integration with stuff that’s already on your operating system: you can write scripts and find ways to bring it into the work you do every day.
The Weakness of Offline AI
Before we get started, there are some drawbacks. While models like DeepSeek have shaken up the industry, the cloud-based models are still better in terms of performance and quality of results. The tradeoff, of course, is that you give up your data.
There’s always pushback from people online that AI is a bubble, generates garbage slop that ruins the internet, and is making people lazier. All of those things are true, but the inverse is true as well. AI is being used to propagate knowledge and provide new forms of accessibility.
Blanket statements for and against AI do not accomplish anything, but a real, tangible problem AI has is the lack of a use case for the individual outside of academia and the workplace. Want to get some code written quickly? Need to proofread a document? Want an answer to your math homework? AI has you covered, but otherwise, there’s no reason to use AI at all.
Steep Hardware Requirements
Before you get excited about offline large language models, you should be aware of the hardware required to run many of these models. I have a high-end NVIDIA card and running some AI is no problem, but it’s incredibly power intensive and largely favors NVIDIA hardware.
There’s also a major concern with storage requirements. While you can get some memory-efficient models, they often don’t perform as well as their larger counterparts. DeepSeek may advertise itself as an offline ChatGPT, but what they don’t tell you is that the full model requires over 400 GB of storage to operate, in addition to steep GPU requirements.
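Before committing to a large download, it can help to check how much room a drive actually has. The sketch below is one way to do that with standard tools; `has_free_gb` is a hypothetical helper name, and the 400 figure is simply the storage number quoted above.

```shell
# Succeeds when the filesystem holding DIR has at least NEED_GB (GiB) free.
# has_free_gb is a hypothetical helper, not part of any AI tool.
has_free_gb() {
    dir=$1
    need_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Example: is there room for a 400 GB model in the home directory?
if has_free_gb "$HOME" 400; then
    echo "Enough free space for the full model"
else
    echo "Not enough free space, consider a smaller model"
fi
```

Pointing the check at whichever drive will hold your models gives a quick yes or no before you start a download that takes hours.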
This is a Developing Story
The last thing to be aware of is that AI is rapidly changing and advancements are being made all the time. Things you hear from me will likely be outdated within a year. And for those of you who are still skeptical, if you don’t support open source AI, you are allowing proprietary companies like OpenAI, Anthropic, and Perplexity to dominate the conversation.
If you are interested in my other thoughts about AI, I wrote about them last year. All this being said, if you want to support open source software, we need to welcome and use open source AI.
Ollama
Now we get into tooling. There are plenty of options available, but the most popular is a program called Ollama. It pulls models from some of the big contributors to open source AI and provides a nice command line front-end.
This is where the complications come in, because Ollama is installed differently depending on which operating system you use. On Windows/Mac, there’s tray icon support. On Linux, the key difference is that you don’t get a tray icon. If you prefer a normal graphical Linux frontend, try Jeff Samuel’s (AKA Jeffser) Alpaca, which automates the installation through Flatpak and lets you pick and choose which models you want. Alpaca also makes it easy to manage previous chats and upload documents.
If you are using Linux (or Windows) and are interested in doing more with Ollama from the command line or with custom server commands, you can try running the official Docker container.
For example, I use the Ollama Docker image in a Distrobox with access to my NVIDIA card. Then I export the Ollama binary to my host system. Whether you are running Windows, Mac, or Linux, you will need to run the Ollama server on your device to make your AI chat work (even if you use Alpaca).
distrobox create -i ollama/ollama -n ollama --nvidia
distrobox enter ollama -- distrobox-export -b /usr/bin/ollama
If you choose the Docker container route, you will need to periodically update the container image. Because a pulled image does not update itself, it’s also prudent to subscribe to Ollama’s GitHub RSS to get update notifications. You can also configure a Podman Quadlet or systemd job to auto-update Ollama for you.
docker pull docker.io/ollama/ollama
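For the Quadlet option, a rough sketch could look like the following. This is not my own setup: it assumes rootless Podman with systemd, and the helper name `write_ollama_quadlet`, the file name `ollama.container`, and the unit contents are illustrative. The port and volume path mirror the defaults of the official container image. GPU passthrough is left out and would need extra configuration.

```shell
# Writes a Quadlet unit for Ollama into DIR.
# Hypothetical helper; the unit contents are an assumption, adjust to taste.
write_ollama_quadlet() {
    dir=$1
    mkdir -p "$dir"
    cat > "$dir/ollama.container" <<'EOF'
[Unit]
Description=Ollama server

[Container]
Image=docker.io/ollama/ollama:latest
# Lets podman-auto-update pull newer images from the registry
AutoUpdate=registry
PublishPort=11434:11434
Volume=ollama:/root/.ollama

[Install]
WantedBy=default.target
EOF
}

# Typical rootless usage (commented out so nothing changes until you opt in):
# write_ollama_quadlet "$HOME/.config/containers/systemd"
# systemctl --user daemon-reload
# systemctl --user start ollama.service
# systemctl --user enable --now podman-auto-update.timer
```

With the timer enabled, Podman checks the registry on a schedule and restarts the service when a newer image is available, which replaces the manual pull.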
From here, I can run the Ollama server:
ollama serve
Then in a new tab/window, launch the Ollama client.
# List installed models
ollama list
# Install a new model
ollama pull gemma2
# Run a model, install if not available
ollama run deepseek-r1
# Remove a model
ollama rm llama3.2-vision
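The client also accepts a prompt as an argument and exits after answering, which is what makes it easy to script. Here is a minimal sketch of wrapping that in a function; `proofread_file` is a hypothetical helper name, the prompt wording is only an example, and `gemma2` is used as the default simply because it was pulled above.

```shell
# Asks a local model to proofread a text file and prints the reply.
# proofread_file is a hypothetical helper; requires "ollama serve" to be running.
proofread_file() {
    file=$1
    model=${2:-gemma2}
    # The whole file is passed inline as part of the prompt,
    # so this suits short documents rather than books.
    ollama run "$model" "Proofread the following text and list any corrections: $(cat "$file")"
}

# Usage (commented out because it needs a running server and a real file):
# proofread_file draft.md
# proofread_file draft.md deepseek-r1 > review.txt
```

Because the reply goes to standard output, it can be redirected to a file or piped into other tools like any other command.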
What Would I Use AI For?
This raises the question: I store video games and family photos on my computer, so I have limited space. How can I make the best use of my storage and what AI models should I use?
I want to break this up into a few categories, then give some blanket recommendations. One general-purpose model can cover several of these categories, so don’t go downloading all of them; just pick and choose what you are comfortable with.
- Real world answers: This is where you ask questions that you would normally ask a search engine. The benefit is that you don’t involve a third party service and it’s all done on your device. The downside is you might need to fact check because AI is not perfect. These are the big models most people associate with AI: Facebook’s Llama, Google’s Gemma, and DeepSeek.
- Image description: This is very useful for alt-text or for those with visual impairments, but very prone to error, so be prepared to edit responses. The best as of writing is Facebook’s Llama with its special vision models.
- Mathematics: Models like Phi3.5+ and Qwen excel in solving advanced algebra and calculus when most general models fail. The best way to word your prompts is like word problems. An example is “Jack and Joe leave their homes at the same time and drive towards each other. Jack drives at 60 mph, while Joe drives at 30 mph. They pass each other in 10 minutes. How far apart were Jack and Joe when they started?”
- Coding: If you are a programmer or server maintainer, AI can save you the headache of trying to search forums and documentation. Results may vary, so don’t blindly ship the code, but test it. It’s also a great way to experience programming languages you don’t know or may otherwise never learn.
- Proofreading/Summarization: If you are having writer’s block or you need your work reviewed, feed your work to an AI and get it proofed. It can often correct grammar or introduce counterpoints to your arguments.
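Since answers still need fact checking, it helps to work a prompt by hand before trusting the model. For the Jack and Joe problem above, the two cars close the gap at 60 + 30 = 90 mph, and 10 minutes is one sixth of an hour, so they started 15 miles apart. The same check as a small sketch (`miles_apart` is a hypothetical helper name):

```shell
# Distance between two drivers heading towards each other.
# Arguments: speed A (mph), speed B (mph), travel time (minutes).
# Uses integer arithmetic, so results are truncated to whole miles.
miles_apart() {
    echo $(( ($1 + $2) * $3 / 60 ))
}

miles_apart 60 30 10   # prints 15
```

If the model’s answer disagrees with a check like this, that is your cue to reword the prompt or try one of the math-focused models.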
Video Credits
- #MadeByGoogle ‘24: Keynote
- DeepSeek’s X (Formerly Twitter)
- Introducing GPT-4o
- Meta Connect 2024
- More than a decade after a stroke, Randy Travis sings again, courtesy of AI - Lee Cowan et al.; CBS News
- Ollama’s blogpost for the Windows preview build
- Perplexity Is a Bullshit Machine - Dhruv Mehrotra and Tim Marchman; WIRED and animation by Jacqui VanLiew; Getty Images
Track Listing
- Minobe Yutaka (蓑部雄崇) - City (シティ) from Yu-Gi-Oh! 5Ds (遊☆戯☆王5D’s(ファイブディーズ))
- gooset - Bittersweet
- The song for the capital of Assyria scroll is Minobe Yutaka (蓑部雄崇) - Break time! (休み時間) from Yu-Gi-Oh! GX (遊☆戯☆王デュエル モンスターズGX)
- zukisuzuki BGM - Manager
- Outro: Khaim - Neon Lamp