Run AI on Your Own Computer: A Plain-English Guide to Local Models

Professor: Sikh Archive · Source: Sikh Archive

6 lessons · 8-question test · 80% to pass
Created by AI. Drafted with AI and reviewed for accuracy.

What you'll learn

  • Explain in simple words what an open-weight AI model is and how it differs from a closed cloud model.
  • Describe what tools that run models on your own machine actually do, in everyday language.
  • Read a model's size and quantization label and guess whether it will fit on your computer.
  • Estimate the RAM and hardware you need to run a small, medium, or large model.
  • Weigh the privacy, cost, and offline benefits of local AI against its trade-offs versus the cloud.
  • Take the first practical steps to install a runner, download a model, and chat with it.

Key terms (glossary)

Term | Definition
Open-weight model | An AI model whose learned numbers (its 'weights') are made public so anyone can download and run it themselves.
Weights | The huge list of numbers a model learned during training. They are the model. Running a model means doing math with these numbers.
Local AI | Running the model on your own laptop or desktop instead of sending your words to a company's servers over the internet.
Runner / inference tool | A program that loads a model file and lets you chat with it. Examples include desktop apps and command-line helpers.
Quantization | Shrinking a model by storing its numbers with less detail (like rounding). It makes the file smaller and the model faster to run, with a small quality cost.
Parameters | The count of weights in a model, usually written like 7B (7 billion). More parameters often means smarter but heavier.
RAM / VRAM | The fast memory your model has to fit into. RAM is general computer memory; VRAM is the memory on a graphics card (GPU).
Token | A small chunk of text (roughly a word piece) that the model reads and writes. Speed is often measured in tokens per second.

Lessons

1. Why Run AI on Your Own Machine?

When you use a popular AI chatbot online, your words travel over the internet to a company's computers. The model 'thinks' there and sends an answer back. That is cloud AI. It is easy and powerful, but you are renting someone else's computer and trusting them with what you type.

Local AI flips this around. You download a model file onto your own laptop or desktop, and the thinking happens right there. Nothing leaves your machine. No internet needed once it is set up.

People choose local AI for a few plain reasons:

  • Privacy. Your notes, journals, or work documents never leave your computer.
  • Cost. After the free download, there is no monthly bill and no per-question charge.
  • Offline. It works on a plane, in a village with no signal, or when the internet is down.
  • Control. The model will not change or disappear on you. It is yours to keep.

The catch is that a model running on your own computer is usually smaller and a bit less capable than the giant ones in the cloud. The rest of this course explains the words you need, the tools that make it easy, and how to start.

References: Mozilla AI guide to running open-source LLMs locally; MIT Technology Review on open-weight models.

2. What 'Open Weights' Really Means

Every AI model is, at heart, a giant pile of numbers called weights. During training, the model adjusts these numbers until it gets good at predicting text. Once training is done, those numbers are the model. To use the model, your computer just does math with them.

A closed model keeps its weights secret. You can only reach it through the company's website or app. You never get the numbers themselves.

An open-weight model is the opposite: the company publishes the weights so anyone can download the file and run it on their own machine. This is what makes local AI possible. Note that 'open weights' is not always the same as fully 'open source' (which would also share the training data and code), but for running a model at home, having the weights is what counts.

Why does this matter to you?

  • You can run the model with no permission and no account.
  • You can use it privately and offline.
  • Many open-weight models are free for personal use; always glance at the licence.

Think of a closed model as a meal you order at a restaurant, and an open-weight model as a recipe you take home and cook yourself. Both feed you, but only one lets you keep the recipe.

References: Hugging Face model hub overview; MIT Technology Review on open-weight AI.

3. The Tools That Run Models for You

You do not need to be a programmer to run a model. A runner (also called an inference tool) is a program that loads the model file and gives you a chat box. There are three common styles:

  • Simple desktop apps. You install one program, click to download a model from a built-in list, and start chatting in a window. This is the easiest path for most people.
  • Command-line helpers. You type a short command and the tool downloads and runs the model in your terminal. Great for tinkerers and for connecting models to other software.
  • Lightweight engines. These are the core technology that many of the other tools are built on. They are highly efficient and can run models even on ordinary computers without a fancy graphics card.

All of these do the same basic job: take your message, feed it through the model's weights, and stream back an answer one piece at a time. Many also offer a small built-in 'server' so other apps on your computer can talk to the model.
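As a rough sketch of how another program might talk to that built-in server, the Python below prepares a request for a runner's local address. It assumes Ollama is installed and listening on its default address (http://localhost:11434) and that a model has already been downloaded. The model name "llama3.2" is only a placeholder, and other runners use different addresses and request formats.

```python
import json
import urllib.request

# Assumption: Ollama's local server on its default port. Other runners differ.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt, model="llama3.2"):
    """Build (but do not send) a request asking the local model for one full answer."""
    # "stream": False asks for the whole answer at once instead of piece by piece.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask_local_model(prompt, model="llama3.2"):
    """Send the prompt to the local server and return the answer text."""
    with urllib.request.urlopen(build_request(prompt, model)) as reply:
        return json.loads(reply.read())["response"]

# Usage, only once the runner is running and the model is downloaded:
# print(ask_local_model("Explain RAM in one sentence."))
```

The request never leaves your computer: "localhost" is your own machine, so the privacy benefit of local AI still holds.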

You do not need to pick the 'perfect' tool. Start with a simple desktop app, see if you like local AI, and explore the others later. They all run the same kinds of open-weight model files.

References: Ollama documentation; LM Studio documentation.

4. Model Sizes and Quantization Made Simple

Model names often include a number like 3B, 7B, 13B, or 70B. The 'B' means billions of parameters (weights). More parameters usually means a smarter model, but also a bigger file and more memory needed. A 7B model is a comfortable middle ground for many home computers.

You will also see labels like Q4, Q5, or Q8. This is quantization, and the number is roughly how many bits are used to store each weight. The model's numbers are normally very precise, which makes the file large. Quantization rounds them off to save space, like writing 3.14 instead of 3.14159265. A lower number (Q4) means a smaller, faster model with a small drop in quality. A higher number (Q8) keeps more quality but needs more memory. For most people, Q4 is a great balance.
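To make the rounding idea concrete, here is a toy sketch in Python. It is a simplified illustration, not how any particular model format works: real quantized files use more elaborate schemes, and the weight values below are made up. The idea is that each weight is replaced by a small whole number plus one shared scale factor.

```python
def quantize(weights, bits):
    """Round each weight to one of a small set of evenly spaced levels."""
    levels = 2 ** (bits - 1) - 1          # 7 steps each side of zero for 4-bit, 127 for 8-bit
    largest = max(abs(w) for w in weights)
    scale = largest / levels if largest else 1.0
    codes = [round(w / scale) for w in weights]   # the small integers that get stored
    return codes, scale

def dequantize(codes, scale):
    """Turn the stored integers back into approximate weights."""
    return [c * scale for c in codes]

def worst_error(weights, bits):
    """Largest gap between an original weight and its rounded version."""
    codes, scale = quantize(weights, bits)
    restored = dequantize(codes, scale)
    return max(abs(w - r) for w, r in zip(weights, restored))

weights = [0.12, -0.53, 0.98, -0.07, 0.31]   # made-up example values
print("4-bit worst error:", worst_error(weights, 4))
print("8-bit worst error:", worst_error(weights, 8))
```

Running it shows the trade-off described above: fewer bits means less storage per weight but a larger rounding error.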

Here is a rough guide for how a 7B model shrinks with quantization:

Quantization | Quality | Approx. file size (7B model) | Speed
Full precision (16-bit, no quantization) | Best | ~14 GB | Slowest
Q8 | Very high | ~7-8 GB | Slower
Q5 | High | ~5 GB | Medium
Q4 | Good (recommended) | ~4 GB | Fast

So the size of the file you download depends on both the parameter count and the quantization. A small, well-quantized model can fit on a modest laptop.
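The arithmetic behind those sizes can be sketched in a few lines of Python. This assumes the Q number is roughly the bits stored per weight and that 1 GB is a billion bytes; the function name is only for illustration.

```python
def approx_file_size_gb(params_billions, bits_per_weight):
    """Parameters times bits per weight, converted to gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8   # 8 bits in a byte
    return total_bytes / 1e9

for label, bits in [("16-bit", 16), ("Q8", 8), ("Q5", 5), ("Q4", 4)]:
    print(f"7B at {label}: about {approx_file_size_gb(7, bits):.1f} GB")
```

Real files tend to come out a little larger than this estimate because they store some extra information alongside the weights, so treat the result as a lower bound.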

References: Hugging Face quantization guides; LM Studio documentation on model formats.

5. Hardware: Will It Fit on My Computer?

The single most important question is: will the model fit in memory? A model has to load into your fast memory to run. That memory is either your computer's main RAM or, if you have a graphics card, its VRAM. A graphics card (GPU) makes answers come back much faster, but many small models run fine on the regular processor (CPU) using ordinary RAM, just more slowly.

A simple rule of thumb: the model needs roughly its file size in free memory, plus a little extra for working room. So a 4 GB model file wants around 5-6 GB of free memory.
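That rule of thumb can be written as a small Python check. The 1 to 2 GB of working room is an assumption read off the 4 GB example above, not a fixed law; long conversations and larger models may want more.

```python
def memory_needed_gb(file_size_gb, extra_low=1.0, extra_high=2.0):
    """Return a (low, high) free-memory estimate: file size plus working room."""
    return file_size_gb + extra_low, file_size_gb + extra_high

def fits(file_size_gb, free_memory_gb):
    """True if free memory covers even the higher estimate."""
    return free_memory_gb >= memory_needed_gb(file_size_gb)[1]

low, high = memory_needed_gb(4)
print(f"A 4 GB model file wants roughly {low:.0f}-{high:.0f} GB of free memory")
print("Fits in 8 GB free?", fits(4, 8))
```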

Model size (Q4) | File size | Recommended memory | Good for
1B-3B | ~1-2 GB | 8 GB RAM | Older or budget laptops, quick tasks
7B-8B | ~4-5 GB | 16 GB RAM | Most modern laptops; the sweet spot
13B | ~8 GB | 16-32 GB RAM | Strong laptops and desktops
70B | ~40 GB | 48-64 GB RAM or a big GPU | Powerful workstations only
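As a quick way to read the table, the Python sketch below looks up the largest model size that should be comfortable for a given amount of RAM. The numbers are copied from the table; where the table gives a range, the sketch takes the upper end, which is a deliberately cautious assumption.

```python
# (model size at Q4, recommended memory in GB, taking the upper end of any range)
TIERS = [
    ("1B-3B", 8),
    ("7B-8B", 16),
    ("13B", 32),
    ("70B", 64),
]

def largest_comfortable_tier(ram_gb):
    """Return the biggest tier whose recommended memory is within ram_gb, else None."""
    best = None
    for name, needed_gb in TIERS:
        if ram_gb >= needed_gb:
            best = name   # tiers are listed smallest first, so the last match wins
    return best

print(largest_comfortable_tier(16))
```

A result of None means the machine is below the smallest row in the table, not that nothing will run at all.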

Apple computers with the M-series chips are especially handy here, because their memory is shared between the processor and graphics, so a model can use most of it. On Windows or Linux, a graphics card with 8 GB or more of VRAM gives a big speed boost.

If your first model feels slow, pick a smaller one or a lower quantization level (a smaller Q number). It is normal to experiment until you find a model that is both useful and comfortable on your hardware.

References: Ollama hardware guidance; LM Studio system requirements documentation.

6. Trade-offs and Your First Steps

Local AI is wonderful, but it is fair to know the trade-offs before you start.

Where local AI wins: privacy (your data stays put), no monthly cost, works offline, and full control over which model you keep.

Where the cloud still wins: the biggest cloud models are usually smarter and more up to date, they need no setup, and they run on someone else's powerful hardware so your laptop fan stays quiet. Local models are also frozen in time at their training date and may know less about very recent events.

Many people end up using both: a private local model for personal or sensitive tasks, and a cloud model when they need maximum power.

Your first steps:

  1. Check your computer's memory. 16 GB of RAM is a comfortable target.
  2. Install one simple desktop runner app.
  3. From its built-in list, download a small, popular model, ideally a 7B or 8B model in Q4.
  4. Open the chat window and type a question, just like any chatbot.
  5. If it feels slow, try a smaller model or a lower quantization level (a smaller Q number).
  6. Once comfortable, explore command-line tools or larger models.
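For step 1, your system settings will show the installed memory, but if you have Python handy the sketch below is one way to check. It relies on a standard-library call that is typically available on Linux and macOS and not on Windows, in which case it simply reports that it could not tell. The 10% margin is an assumption, added because machines often report slightly less than the advertised figure.

```python
import os

def total_ram_gb():
    """Return total physical RAM in GB, or None if it cannot be read this way."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None   # for example on Windows, where os.sysconf does not exist
    if page_size <= 0 or page_count <= 0:
        return None
    return page_size * page_count / 1024**3

def describe(ram_gb, target_gb=16):
    """Compare the measured RAM with the comfortable target from step 1."""
    if ram_gb is None:
        return "Could not read memory size; check your system settings instead."
    if ram_gb >= target_gb * 0.9:   # small margin for memory the system reserves
        return f"About {ram_gb:.0f} GB: at or near the {target_gb} GB target."
    return f"About {ram_gb:.0f} GB: below the {target_gb} GB target, so start with a smaller model."

print(describe(total_ram_gb()))
```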

That is the whole journey. Within an afternoon you can have a private, free, offline AI assistant living on your own machine, ready whenever you are.

References: Ollama getting-started documentation; Mozilla AI guide to running LLMs locally.

Course test

Pass with 80% or higher to complete the course and unlock the next one.

1. What does 'open weights' mean for a model?
2. What is the main privacy benefit of running AI locally?
3. In a model name, what does the 'B' in '7B' refer to?
4. What does quantization do to a model?
5. For most home users, which quantization level is a good balance of size and quality?
6. About how much free memory does a roughly 4 GB model file want to run comfortably?
7. Which model size is often called the 'sweet spot' for most modern laptops?
8. Which is a real trade-off of local AI compared to big cloud models?

References & further reading

  1. Ollama official documentation and model library (ollama.com)
  2. LM Studio documentation (lmstudio.ai)
  3. Hugging Face model hub and learning guides (huggingface.co)
  4. Mozilla's article on running open-source LLMs locally (Mozilla AI / Mozilla blog)
  5. MIT Technology Review reporting on open-weight AI models
