For true ease of use, you’ll want to start with one of these applications. They package the models and provide a simple interface (either graphical or a single command) to get you started in minutes, with no coding required.
Ollama: This is arguably the easiest and most popular command-line tool. It bundles model weights, configuration, and a server into one simple package. You install Ollama, then run a single command like ollama run llama3 in your terminal to download the model and start chatting. It’s available for Windows, macOS, and Linux.
LM Studio: A fantastic desktop application with a graphical user interface (GUI). It allows you to browse and download a massive library of models (in the popular GGUF format), configure settings, and chat with the model, all within a user-friendly window. It’s perfect if you prefer not to use the command line.
GPT4All: Another great GUI-based option that is optimized to run a wide variety of quantized models on your computer’s CPU, making it accessible even without a powerful graphics card.
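Beyond the chat window, Ollama also exposes a local HTTP API (by default at http://localhost:11434), so you can script against your local model. A minimal Python sketch, assuming `ollama serve` is running and the `llama3` model has already been pulled; the payload shape follows Ollama’s documented `/api/generate` endpoint:

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str) -> dict:
    """Build the JSON payload for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_ollama(prompt: str, model: str = "llama3",
               host: str = "http://localhost:11434") -> str:
    """Send a prompt to a locally running Ollama server and return the reply.

    Requires `ollama serve` to be running and the model already pulled.
    """
    payload = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate", data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (with the server running):
#   print(ask_ollama("Why is the sky blue? Answer in one sentence."))
```

The same endpoint is what LM Studio-style GUIs wrap for you; only the host/port differs.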
Top Open-Source LLMs for Personal Use
These models are great because they offer a fantastic balance of performance and manageable size, making them ideal for running on consumer hardware like modern laptops and desktops.
General Purpose & Chat
Meta Llama 3
Why it’s great: This is the current state-of-the-art open-source model. It’s incredibly capable for chatting, writing, summarizing, and coding.
Best Version for Personal Use: Llama 3 8B Instruct. The “8B” stands for 8 billion parameters. It’s the sweet spot, requiring about 8 GB of RAM/VRAM to run smoothly.
Supported by: Ollama, LM Studio, GPT4All.
Mistral 7B
Why it’s great: Before Llama 3, this model was the king of its size class. It’s known for being very fast, coherent, and excellent at following instructions and coding, often outperforming larger models.
Best Version for Personal Use: Mistral 7B Instruct. It’s very lightweight and efficient.
Supported by: Ollama, LM Studio, GPT4All.
Google Gemma
Why it’s great: Developed by Google, these models are built with the same technology as the powerful Gemini models. They are solid all-rounders.
Best Version for Personal Use: Gemma 7B for powerful machines, or Gemma 2B for less powerful ones (like laptops without a dedicated GPU).
Supported by: Ollama, LM Studio.
Specialized & Lightweight Models
Microsoft Phi-3
Why it’s great: A new generation of “small language models” (SLMs) that pack a surprising punch. They are designed to run very efficiently on low-resource devices, including phones.
Best Version for Personal Use: Phi-3 Mini 3.8B. It performs at a level far above what you’d expect from such a small model, making it perfect for laptops or older desktops.
Supported by: Ollama, LM Studio.
Qwen2 (from Alibaba Cloud)
Why it’s great: A very strong family of models with excellent multilingual capabilities and strong performance in both chat and coding. They come in many sizes.
Best Version for Personal Use: Qwen2 7B is a great Llama 3 alternative. For lower-spec machines, Qwen2 1.5B is a fantastic and fast option.
Supported by: Ollama, LM Studio.
What You Need to Consider
VRAM (GPU Memory): This is the most important factor. The model needs to be loaded into your graphics card’s memory. At the roughly 8-bit quantization most tools download by default, a model’s size (e.g., 7B) roughly corresponds to the VRAM needed in GB (e.g., a 7B model needs about 7-8 GB of VRAM); full 16-bit precision roughly doubles that.
Quantization: This is a technique to shrink models to run on less powerful hardware, with a small trade-off in performance. Tools like LM Studio and Ollama handle this for you automatically, downloading pre-quantized versions so you don’t have to worry about it.
CPU vs. GPU: While you can run these models on your CPU, it will be much slower. For a good interactive experience, a modern dedicated GPU (like an NVIDIA RTX 3060 or better) with at least 8 GB of VRAM is recommended.
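The rule of thumb above can be turned into a rough calculator. A sketch under my own simplifying assumptions (weights dominate; add ~20% overhead for activations and the KV cache — this is an approximation, not an official formula):

```python
def estimate_vram_gb(params_billions: float, bits_per_weight: int = 8,
                     overhead: float = 0.2) -> float:
    """Rough VRAM estimate: weight memory plus a fudge factor for
    activations and the KV cache.

    bits_per_weight: 16 for FP16, 8 or 4 for common quantized GGUF builds.
    """
    # 1e9 params * (bits/8) bytes per param ~= that many GB of weights
    weight_gb = params_billions * bits_per_weight / 8
    return round(weight_gb * (1 + overhead), 1)

# A 7B model at different precisions:
for bits in (16, 8, 4):
    print(f"7B @ {bits}-bit ~= {estimate_vram_gb(7, bits)} GB")
```

At 8-bit this lands on the “7B needs about 7-8 GB” figure quoted above, and it shows why 4-bit quantization is what lets these models fit on modest laptops.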
LangChain: The oldest and most comprehensive framework, offering extensive integrations but often criticized for its steep learning curve and boilerplate code.
LlamaIndex: Primarily focused on data-intensive applications, excelling at connecting language models to external data sources through advanced retrieval and indexing.
AutoGen (Microsoft): A multi-agent framework that shines at creating conversational agents that can collaborate and delegate tasks to solve complex problems.
CrewAI: Designed for orchestrating role-playing autonomous agents, making it easy to define agents with specific jobs and have them work together in a structured crew.
AgentVerse: A versatile framework that provides a “lego-like” approach to building and composing customized multi-agent environments for various applications.
ChatDev: A “virtual software company” framework where different agents (CEO, programmer, tester) simulate a software development lifecycle to complete coding tasks.
SuperAGI: A developer-centric framework focused on building autonomous agents with useful features like provisioning, deployment, and a graphical user interface.
AI Droid (by Vicuna): A lightweight and fast framework designed for mobile and edge devices, prioritizing efficiency and low-resource consumption.
GPTeam: Similar to ChatDev, this framework uses role-playing agents (like product managers and engineers) to collaboratively work on development tasks from a single prompt.
Agenta: An open-source platform that helps developers evaluate, test, and deploy language model applications with features for prompt management and A/B testing.
OpenAI Assistants API: OpenAI’s native solution for building stateful, assistant-like agents directly on their platform, handling conversation history and tool integration internally.
LangGraph: Built on LangChain, this framework is specifically for creating cyclical, stateful multi-agent workflows, treating agent interactions as steps in a graph.
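Despite the variety, most of these frameworks share the same core loop: agents with a role and a message history take turns producing messages. A framework-free sketch of that pattern (the `respond` method is a stand-in for a real LLM call; names are illustrative, not from any of the libraries above):

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    role: str                         # system-style instruction for this agent
    history: list = field(default_factory=list)

    def respond(self, message: str) -> str:
        """Stand-in for an LLM call; a real framework would send
        self.role + self.history + message to a model here."""
        self.history.append(message)
        return f"[{self.name}] ack: {message}"

def run_turns(a: Agent, b: Agent, task: str, turns: int = 2) -> list:
    """Alternate messages between two agents, the way AutoGen/CrewAI-style
    orchestrators drive a conversation."""
    transcript, msg = [], task
    for _ in range(turns):
        msg = a.respond(msg)
        transcript.append(msg)
        msg = b.respond(msg)
        transcript.append(msg)
    return transcript

pm = Agent("pm", "You plan the work.")
dev = Agent("dev", "You write the code.")
print(run_turns(pm, dev, "Build a todo app", turns=1))
```

What the frameworks add on top of this skeleton is tooling: tool/function calling, memory, termination conditions, and (in LangGraph’s case) an explicit graph over these turns.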
Alpine.js : Alpine is a rugged, minimal tool for composing behavior directly in your markup. Think of it like jQuery for the modern web. Plop in a script tag and get going.
Something in between a Product Manager and a Software Engineer: the Product Engineer, i.e. PMs are sometimes not technical enough and SWEs are sometimes not product-oriented enough. https://refactoring.fm/p/how-to-become-a-product-engineer
Meta, for instance, trained its new Llama 3 models with about 10 times more data and 100 times more compute than Llama 2. Amid a chip shortage, it used two 24,000 GPU clusters, with each chip running around the price of a luxury car. It employed so much data in its AI work, it considered buying the publishing house Simon & Schuster to find more.
Redis forks (after the licence change):
– redict: https://redict.io/ (Drew DeVault + others?)
– valkey: https://valkey.io/ backed by AWS, Google, Oracle, Ericsson, and Snap, with the Linux Foundation; more to come imo.
golang fasthttp: a replacement for the standard net/http if you need “to handle thousands of small to medium requests per second and needs a consistent low millisecond response time”. “Currently fasthttp is successfully used by VertaMedia in a production serving up to 200K rps from more than 1.5M concurrent keep-alive connections per physical server.” https://github.com/valyala/fasthttp
I find truly interesting the point around promoting a writing culture (Execs/Directors on the tech blog, SWEs on tech blogs/internal technical documents): https://newsletter.pragmaticengineer.com/i/140970283/writing-culture I’m a long-time believer that writing clarifies thinking more than talking, and that writing persists information and makes it searchable, while talking does not. “Verba volant, scripta manent,” as the Latins used to say. But this idea has shifted into “just enough” documentation (which often means none) in the latest software engineering methodologies, so it is interesting that a multi-billion company like Stripe is going totally against the tide.
(or the Attraction to Complexity) There is a very common tendency in computer science: complicating solutions. This complication is often referred to as incidental/accidental complexity, i.e. anything we coders/designers do to make a simple matter more complex. Sometimes this is called over-engineering, and it stems from the best intentions:
Attraction to Complexity: there’s often a misconception that more complex solutions are inherently better or more sophisticated. This can lead to choosing complicated approaches over simpler, more effective ones.
Technological Enthusiasm: developers might be eager to try out new technologies, patterns, or architectures. While innovation is important, using new tech for its own sake can lead to unnecessary complexity.
Anticipating Future Needs: developers may try to build solutions that are overly flexible to accommodate potential future requirements. This often leads to complex designs that are not needed for the current scope of the project.
Lack of Experience or Misjudgment: less experienced developers might not yet have the insight to choose the simplest effective solution, while even seasoned developers can sometimes overestimate what’s necessary for a project.
Avoiding Refactoring: In an attempt to avoid refactoring in the future, developers might add layers of abstraction or additional features they think might be needed later, resulting in over-engineered solutions.
Miscommunication or Lack of Clear Requirements: without clear requirements or effective communication within a team, developers might make assumptions about what’s needed, leading to solutions that are more complex than necessary.
Premature Optimization: trying to optimize every aspect of a solution from the beginning can lead to complexity. The adage “premature optimization is the root of all evil” highlights the pitfalls of optimizing before it’s clear that performance is an issue.
Unclear Problem Definition: not fully understanding the problem that needs to be solved can result in solutions that are more complicated than needed. A clear problem definition is essential for a simple and effective solution.
Personal Preference or Style: sometimes, the preference for certain coding styles, architectures, or patterns can lead to more complex solutions, even if simpler alternatives would suffice.
Fear of Under-Engineering: there can be a fear of delivering a solution that appears under-engineered or too simplistic, leading to adding unnecessary features or layers of abstraction.
You’ve probably seen these acronyms around: SSG, SSR, MPA, SPA, PWA. Web design is getting complex, and this section tries to explain (with the help of other good content) what these acronyms mean:
PWA : Progressive Web App, web apps developed using a number of specific technologies and standard patterns to allow them to take advantage of both web and native app features. Not sure if this is a web design pattern
MPA : Multi Page Application, every operation requests data from server, receives a new page (html+css+js) and renders the data in the browser.
SPA : Single Page Application, runs inside a browser and does not require page reloading during its use. The initial html+css+js is obtained from the server, and then all logic is handled by JavaScript on the browser side.
CSR : Client Side Rendering, all rendering happens in the browser. Use when the UI is complex, there is a lot of dynamic data, you need auth, and SEO is not a major concern.
SSR : Server Side Rendering, as the acronym implies, renders content on the server and sends ready HTML+JS files to the browser. The browser still executes JS to update pages. Use when the UI has little interactivity, and when you need the best SEO and faster initial loading.
SSG : Static Site Generation, all pages are generated and rendered ahead of time, at build time, and served as static files.
There are several popular frameworks that can be used for static site generation (SSG) and server-side rendering (SSR).
For SSG, some popular options include Gatsby, Next.js, and Jekyll. These frameworks use a variety of technologies, such as React (Gatsby, Next.js) and Ruby (Jekyll), to generate static HTML pages from dynamic content.
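The SSG approach is simple enough to sketch without any of these frameworks: render every page to HTML once, at build time, so a plain file server can do the rest. A toy illustration (the template and page data are made up):

```python
from pathlib import Path

PAGES = {
    "index": ("Home", "Welcome to the site."),
    "about": ("About", "A toy static site generator."),
}

TEMPLATE = ("<html><head><title>{title}</title></head>"
            "<body><p>{body}</p></body></html>")

def build_site(out_dir: str = "_site") -> list:
    """Render every page to a static .html file, SSG-style."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    written = []
    for slug, (title, body) in PAGES.items():
        html = TEMPLATE.format(title=title, body=body)
        (out / f"{slug}.html").write_text(html)
        written.append(f"{slug}.html")
    return written

print(build_site())  # any static file server can now serve _site/
```

Real SSG frameworks add templating languages, content pipelines, and hydration, but the build-once/serve-many shape is the same.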
For SSR, some popular options include Express and Hapi (Node.js) and Flask (Python). These frameworks generate the HTML for a page on the server, per request, and send it to the client.
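For contrast, in SSR the HTML is produced per request. A minimal sketch using only Python’s standard WSGI interface as a stand-in for Express/Flask/Hapi (the handler names are illustrative):

```python
def render_page(path: str) -> str:
    """Build the full HTML on the server, per request -- the essence of SSR."""
    return f"<html><body><h1>You requested {path}</h1></body></html>"

def app(environ, start_response):
    """A minimal WSGI application that server-side renders every request."""
    html = render_page(environ["PATH_INFO"]).encode()
    start_response("200 OK", [("Content-Type", "text/html")])
    return [html]

# To actually serve it (any WSGI server works; wsgiref is in the stdlib):
#   from wsgiref.simple_server import make_server
#   with make_server("", 8000, app) as srv:
#       srv.serve_forever()   # then visit http://localhost:8000/
```

The trade-off the section above describes falls out of this shape: the server does rendering work on every request, but the browser receives finished HTML, which is what helps SEO and first paint.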
Overall, there are many different frameworks available for both SSG and SSR, and the best choice will depend on the specific needs and goals of the project. It is important to carefully evaluate the features and capabilities of each framework to determine which one is the best fit for the project.