Category Archives: technology


August 2025 – Open source LLMs deployable for personal use

Easiest Deployment Tools

For true ease of use, you’ll want to start with one of these applications. They package the models and provide a simple interface (either graphical or a single command) to get you started in minutes, with no coding required.

  • Ollama: This is arguably the easiest and most popular command-line tool. It bundles model weights, configuration, and a server into one simple package. You install Ollama, then run a single command like ollama run llama3 in your terminal to download the model and start chatting. It’s available for Windows, macOS, and Linux.
  • LM Studio: A fantastic desktop application with a graphical user interface (GUI). It allows you to browse and download a massive library of models (in the popular GGUF format), configure settings, and chat with the model, all within a user-friendly window. It’s perfect if you prefer not to use the command line.
  • GPT4All: Another great GUI-based option that is optimized to run a wide variety of quantized models on your computer’s CPU, making it accessible even without a powerful graphics card.
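To illustrate how little glue these tools need, here is a minimal Python sketch that talks to Ollama's local REST API (by default on port 11434; the `/api/generate` endpoint and field names below follow Ollama's documented API, but verify against your installed version):

```python
import json
import urllib.request

# Ollama serves a local REST API on this port by default.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> bytes:
    """Build the JSON body for a non-streaming generate call."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama server and return the reply.
    Requires `ollama serve` to be running and the model already pulled."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server up, `ask("llama3", "Say hello")` returns the model's text in one call; no SDK is needed beyond the standard library.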

Top Open-Source LLMs for Personal Use

These models are great because they offer a fantastic balance of performance and manageable size, making them ideal for running on consumer hardware like modern laptops and desktops.

General Purpose & Chat

  1. Meta Llama 3
    • Why it’s great: This is the current state-of-the-art open-source model. It’s incredibly capable for chatting, writing, summarizing, and coding.
    • Best Version for Personal Use: Llama 3 8B Instruct. The “8B” stands for 8 billion parameters. It’s the sweet spot, requiring about 8 GB of RAM/VRAM to run smoothly.
    • Supported by: Ollama, LM Studio, GPT4All.
  2. Mistral 7B
    • Why it’s great: Before Llama 3, this model was the king of its size class. It’s known for being very fast, coherent, and excellent at following instructions and coding, often outperforming larger models.
    • Best Version for Personal Use: Mistral 7B Instruct. It’s very lightweight and efficient.
    • Supported by: Ollama, LM Studio, GPT4All.
  3. Google Gemma
    • Why it’s great: Developed by Google, these models are built with the same technology as the powerful Gemini models. They are solid all-rounders.
    • Best Version for Personal Use: Gemma 7B for powerful machines, or Gemma 2B for less powerful ones (like laptops without a dedicated GPU).
    • Supported by: Ollama, LM Studio.

Specialized & Lightweight Models

  1. Microsoft Phi-3
    • Why it’s great: A new generation of “small language models” (SLMs) that pack a surprising punch. They are designed to run very efficiently on low-resource devices, including phones.
    • Best Version for Personal Use: Phi-3 Mini 3.8B. It performs at a level far above what you’d expect from such a small model, making it perfect for laptops or older desktops.
    • Supported by: Ollama, LM Studio.
  2. Qwen2 (from Alibaba Cloud)
    • Why it’s great: A very strong family of models with excellent multilingual capabilities and strong performance in both chat and coding. They come in many sizes.
    • Best Version for Personal Use: Qwen2 7B is a great Llama 3 alternative. For lower-spec machines, Qwen2 1.5B is a fantastic and fast option.
    • Supported by: Ollama, LM Studio.

What You Need to Consider

  • VRAM (GPU Memory): This is the most important factor. The model must fit in your graphics card’s memory. As a rule of thumb for the quantized builds these tools download, the parameter count roughly matches the VRAM needed in GB (e.g., a 7B model needs about 7-8 GB of VRAM); unquantized FP16 weights need roughly twice that.
  • Quantization: This is a technique to shrink models to run on less powerful hardware, with a small trade-off in performance. Tools like LM Studio and Ollama handle this for you automatically, downloading pre-quantized versions so you don’t have to worry about it.
  • CPU vs. GPU: While you can run these models on your CPU, it will be much slower. For a good interactive experience, a modern dedicated GPU (like an NVIDIA RTX 3060 or better) with at least 8 GB of VRAM is recommended.
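To make the VRAM rule of thumb concrete, here is a back-of-the-envelope calculation (the 20% overhead factor for KV cache and activations is my own rough assumption, not a published figure):

```python
def vram_gb(params_billion: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight storage plus ~20% for KV cache/activations."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return round(weight_bytes * overhead / 1e9, 1)

# A 7B model at common precisions:
print(vram_gb(7, 16))  # FP16 (unquantized)
print(vram_gb(7, 8))   # 8-bit quantized
print(vram_gb(7, 4))   # 4-bit quantized
```

This is why a 4-bit quantized 7B model fits comfortably on an 8 GB card while the FP16 original does not.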

June 2025 : current AI Agent frameworks

Just a recap for my personal use

  • LangChain: The oldest and most comprehensive framework, offering extensive integrations but often criticized for its steep learning curve and boilerplate code.
  • LlamaIndex: Primarily focused on data-intensive applications, excelling at connecting language models to external data sources through advanced retrieval and indexing.
  • AutoGen (Microsoft): A multi-agent framework that shines at creating conversational agents that can collaborate and delegate tasks to solve complex problems.
  • CrewAI: Designed for orchestrating role-playing autonomous agents, making it easy to define agents with specific jobs and have them work together in a structured crew.
  • AgentVerse: A versatile framework that provides a “lego-like” approach to building and composing customized multi-agent environments for various applications.
  • ChatDev: A “virtual software company” framework where different agents (CEO, programmer, tester) simulate a software development lifecycle to complete coding tasks.
  • SuperAGI: A developer-centric framework focused on building autonomous agents with useful features like provisioning, deployment, and a graphical user interface.
  • AI Droid (by Vicuna): A lightweight and fast framework designed for mobile and edge devices, prioritizing efficiency and low-resource consumption.
  • GPTeam: Similar to ChatDev, this framework uses role-playing agents (like product managers and engineers) to collaboratively work on development tasks from a single prompt.
  • Agenta: An open-source platform that helps developers evaluate, test, and deploy language model applications with features for prompt management and A/B testing.
  • OpenAI Assistants API: OpenAI’s native solution for building stateful, assistant-like agents directly on their platform, handling conversation history and tool integration internally.
  • LangGraph: Built on LangChain, this framework is specifically for creating cyclical, stateful multi-agent workflows, treating agent interactions as steps in a graph.

2024 Reading list

“While AI can suggest statistically optimal moves, it cannot explain its reasoning, leading to a form of rote learning that lacks the deep reflection on intention that characterized traditional Go.” – How AI has changed the game of Go : https://medium.com/digital-architecture-lab/where-did-go-go-a-case-study-of-a-mechanized-mind-e609f3a1139e

Open Source Observatory (OSOR) Fab City OS Suite: Open Source for Circular Economy and Transparency.

Building Blocks for Renewable Energy Systems : https://libre.solar/

Alpine.js : Alpine is a rugged, minimal tool for composing behavior directly in your markup. Think of it like jQuery for the modern web. Plop in a script tag and get going.

Figures from the Global Carbon Budget 2024 : https://robbieandrew.github.io/GCB2024/

C Just In Time : cjit https://dyne.org/cjit/ : fast like hell

State of HTML (and related) https://2024.stateofhtml.com/en-US

Neural Networks : Zero to Hero https://karpathy.ai/zero-to-hero.html

state of swe jobs market https://newsletter.pragmaticengineer.com/p/state-of-eng-market-2024

reinforcement learning explained : https://ai.gopubby.com/how-did-alphago-beat-lee-sedol-1a160d76612b

O(1) lfu : http://dhruvbird.com/lfu.pdf
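The paper's idea: keep a value map, a key→frequency map, and per-frequency buckets of keys plus the running minimum frequency, so both get and put are O(1). A minimal Python sketch of that idea (using OrderedDict buckets rather than the paper's exact doubly-linked-list layout):

```python
from collections import defaultdict, OrderedDict

class LFUCache:
    """Constant-time LFU: hash maps plus frequency buckets, no scanning."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.key_val = {}                     # key -> value
        self.key_freq = {}                    # key -> access count
        self.buckets = defaultdict(OrderedDict)  # freq -> keys, LRU-ordered
        self.min_freq = 0

    def _touch(self, key):
        """Move key from its frequency bucket to the next one up."""
        f = self.key_freq[key]
        del self.buckets[f][key]
        if not self.buckets[f]:
            del self.buckets[f]
            if self.min_freq == f:
                self.min_freq = f + 1
        self.key_freq[key] = f + 1
        self.buckets[f + 1][key] = None

    def get(self, key):
        if key not in self.key_val:
            return None
        self._touch(key)
        return self.key_val[key]

    def put(self, key, value):
        if self.capacity == 0:
            return
        if key in self.key_val:
            self.key_val[key] = value
            self._touch(key)
            return
        if len(self.key_val) >= self.capacity:
            # Evict the least-recently-used key of the lowest frequency.
            evict, _ = self.buckets[self.min_freq].popitem(last=False)
            del self.key_val[evict], self.key_freq[evict]
        self.key_val[key] = value
        self.key_freq[key] = 1
        self.buckets[1][key] = None
        self.min_freq = 1
```

Eviction picks from `buckets[min_freq]`, so no pass over all keys is ever needed.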

C11 atomics (atomic_int for ex) are still not supported by C++ when including C code : https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0943r6.html and https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0063r0.html

Round Robin DNS and how clients actually use the multiple answers : Happy Eyeballs v2 (RFC 8305) https://datatracker.ietf.org/doc/html/rfc8305

Srinivasa Ramanujan : https://www.quantamagazine.org/srinivasa-ramanujan-was-a-genius-math-is-still-catching-up-20241021/

What is legacy code ? According to https://understandlegacycode.com/blog/key-points-of-working-effectively-with-legacy-code/ “Legacy Code is code without tests”

[thoughts] we are transitioning from swe tools to product design tools; llms are blending the boundaries between code, UI/UX, and product ideation.

Are LLMs reasoning ? https://arxiv.org/pdf/2410.05229

Open-Meteo is an open-source weather API https://open-meteo.com/en/docs

Remote work is young and we have not built up methodologies or just even habits or practices : https://intenseminimalism.com/2024/the-myth-of-the-missing-remote-work-culture/

A life spent watching the sky : https://www.majakmikkelsen.com/film

Hard life for rust and linux : a proposal for a rust interface to fs .. https://www.youtube.com/watch?v=WiPp9YEBV0Q&t=67s

Segment anything : a new AI model from Meta AI that can “cut out” any object, in any image, with a single click https://segment-anything.com/

14 years since Go launched : the good and the bad by Rob Pike https://commandcenter.blogspot.com/2024/01/what-we-got-right-what-we-got-wrong.html

Writebook : everything you need to edit and publish your online books

Merchants of complexity : https://world.hey.com/dhh/merchants-of-complexity-4851301b (on the attraction to complexity, see “The charm of complication” below)

Tired of slack and not owning the data ? https://once.com/campfire#requirements

Something in between a Product Manager and a Software Engineer : the Product Engineer, i.e. PMs are sometimes not technical enough and SWEs are sometimes not product-oriented enough https://refactoring.fm/p/how-to-become-a-product-engineer

myspace reborn https://spacehey.com/

Stephen Wolfram on neural nets : https://writings.stephenwolfram.com/2024/08/whats-really-going-on-in-machine-learning-some-minimal-models/

Some good recommendations https://levelup.gitconnected.com/follow-these-6-patterns-or-i-will-reject-your-pull-request-fc08f908e7fe :

  • Early return and align the happy path left
  • Avoid booleans in method signatures
  • Avoid double negations
  • Use default values to avoid unnecessary else in initializations
  • Avoid functions with side effects
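The first pattern (early return, happy path aligned left) in a tiny hypothetical example: guard clauses fail fast, so the success case stays at the top indentation level instead of nesting inside if/else ladders.

```python
def ship_order(order: dict) -> str:
    """Guard clauses handle the failure cases first; the happy path
    sits un-nested at the end of the function."""
    if order is None:
        raise ValueError("no order")
    if not order.get("paid"):
        return "awaiting payment"
    if not order.get("in_stock", True):
        return "backordered"
    # Happy path: one indentation level, no else branches.
    return f"shipped {order['id']}"
```

The `order` schema here is made up for illustration; the point is the shape of the control flow.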


3D Mesh generation with object images : https://omages.github.io/

Hetzner de servers auction https://www.hetzner.com/sb

Ransomware victims : https://www.ransomware.live/#/recent

Red and Blue teams in cybersecurity : https://anywhere.epam.com/en/blog/red-team-vs-blue-team

How google is using AI internally https://research.google/blog/ai-in-software-engineering-at-google-progress-and-the-path-ahead/

Protecting artists from gen ai : https://glaze.cs.uchicago.edu/what-is-glaze.html

configuring core dumps in linux/docker https://ddanilov.me/how-to-configure-core-dump-in-docker-container

dolt, a version controlled database mysql compatible https://github.com/dolthub/dolt

MS/DOS 4.01 is open source https://cloudblogs.microsoft.com/opensource/2024/04/25/open-sourcing-ms-dos-4-0/

Content shortage for AI :

Meta, for instance, trained its new Llama 3 models with about 10 times more data and 100 times more compute than Llama 2. Amid a chip shortage, it used two 24,000 GPU clusters, with each chip running around the price of a luxury car. It employed so much data in its AI work, it considered buying the publishing house Simon & Schuster to find more. 

https://www.bigtechnology.com/p/are-llms-about-to-hit-a-wall

Stop doing cloud if not necessary (I’ve been saying this for years..) https://grski.pl/self-host

Redis forks (after the licence change) :
– redict : https://redict.io/ Drew DeVault + others?
– valkey : https://valkey.io/ backed by AWS, Google, Oracle, Ericsson, and Snap, with the Linux Foundation; more to come imo.

nginx new fork https://freenginx.org/ (other forks include OpenResty)

Too much hype about Devin : Debunking Devin: “First AI Software Engineer” Upwork lie exposed! https://www.youtube.com/watch?v=tNmgmwEtoWE

Matt Mullenweg buys Beeper (already owns Texts.com and Element (New Vector)) consolidating his position in Matrix.org based messaging services : https://techcrunch.com/2024/04/09/wordpress-com-owner-automattic-acquires-multi-service-messaging-app-beeper-for-125m/

golang fasthttp : a replacement for the standard net/http if you need “to handle thousands of small to medium requests per second and needs a consistent low millisecond response time”. “Currently fasthttp is successfully used by VertaMedia in a production serving up to 200K rps from more than 1.5M concurrent keep-alive connections per physical server.” https://github.com/valyala/fasthttp

Back to basics 🙂 Bloom filter https://en.wikipedia.org/wiki/Bloom_filter
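A toy Bloom filter to go with the link. The k hash positions are derived by salting sha256 (a common trick, not the only choice); membership tests can yield false positives but never false negatives:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter backed by a Python int used as a bit array."""
    def __init__(self, size_bits: int = 1024, hashes: int = 3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0

    def _positions(self, item: str):
        # Derive k independent positions by salting the hash input.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: str) -> bool:
        # True may be a false positive; False is always definitive.
        return all(self.bits >> p & 1 for p in self._positions(item))
```

Sizing the bit array and the number of hashes against the expected item count controls the false-positive rate; the Wikipedia article above gives the formulas.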

1 billion row challenge : https://github.com/gunnarmorling/1brc
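The challenge's task (per-station min/mean/max over a billion `name;temperature` lines) fits in a few lines of naive Python; the fun of 1BRC is making it fast, which this sketch does not attempt:

```python
from collections import defaultdict

def aggregate(lines):
    """Compute (min, mean, max) per station from 'name;temp' lines,
    returned sorted by station name as the challenge requires."""
    stats = defaultdict(lambda: [float("inf"), float("-inf"), 0.0, 0])
    for line in lines:
        name, temp = line.split(";")
        t = float(temp)
        s = stats[name]
        s[0] = min(s[0], t)   # running min
        s[1] = max(s[1], t)   # running max
        s[2] += t             # running sum
        s[3] += 1             # count
    return {n: (s[0], round(s[2] / s[3], 1), s[1])
            for n, s in sorted(stats.items())}
```

The winning Java entries replace this with memory-mapped I/O, custom parsing, and per-core partitions, but the aggregation logic stays exactly this.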

golang : alternative to cgo ? https://github.com/ebitengine/purego

Command line benchmark tool : https://github.com/sharkdp/hyperfine

New jpegli, jpeg-xl derived : https://giannirosato.com/blog/post/jpegli/

Edge CDN techniques : shielding from Fastly, i.e. on an edge cache miss, go to a designated edge cache (the “shield”) instead of the origin https://docs.fastly.com/en/guides/shielding

Apple car not interesting anymore : https://www.bloomberg.com/news/articles/2024-02-27/apple-cancels-work-on-electric-car-shifts-team-to-generative-ai

golang error handling the Uber way : https://github.com/uber-go/guide/blob/master/style.md#errors

nginx forking : Maxim Dounin announces https://freenginx.org/en/ on the nginx forum https://forum.nginx.org/read.php?2,299130

Quad 9 free dns 9.9.9.9 : https://www.quad9.net/

UI testing the netflix way : https://netflixtechblog.com/introducing-safetest-a-novel-approach-to-front-end-testing-37f9f88c152d

Check it out : the new super-ide https://zed.dev/

Lex/Yacc today : https://langium.org/

Inside Stripe Engineering Culture, a series of posts : https://newsletter.pragmaticengineer.com/p/stripe

I find truly interesting the point about promoting a writing culture (execs/directors on the tech blog, SWEs on tech blogs and internal technical documents) : https://newsletter.pragmaticengineer.com/i/140970283/writing-culture
I’m a long-time believer that writing clarifies thinking more than talking does, and that writing persists information and makes it searchable, while talking does not. “Verba volant, scripta manent”, as the Latins used to say. But this idea has shifted into “just enough” documentation (which often means none at all) in recent software engineering methodologies, so it is interesting that a multi-billion-dollar company like Stripe is going totally against the tide.

Chat/Instant Messaging protocols comparison

Comparison Table

| Protocol | Decentralized | Encryption | Main Use Case | Examples |
| --- | --- | --- | --- | --- |
| XMPP | Yes | Optional | Federated messaging | ejabberd, Prosody |
| Matrix | Yes | Yes | Decentralized chat | Element, Synapse |
| Signal | No | Yes | Secure messaging | Signal, WhatsApp |
| SIP | No | Optional | Multimedia communication | Asterisk, Linphone |
| IRC | No | No | Community channels | Libera Chat, EFnet |
| ActivityPub | Yes | Optional | Social networking | Mastodon, Pleroma |
| WebRTC | Peer-to-peer | Optional | Real-time communication | Video calls, games |
| Tox | Yes | Yes | Peer-to-peer messaging | qTox, µTox |
| Slack RTM | No | No | Team collaboration | Slack |
| MTProto | No | Yes | Secure messaging | Telegram |
| Jingle | Yes | Optional | Real-time multimedia (via XMPP) | Conversations, Dino |

Matrix protocol servers :

| Server Name | Repository | License | Language | Description/Focus | Maturity | Key Features |
| --- | --- | --- | --- | --- | --- | --- |
| Synapse | github.com/matrix-org/synapse | Apache 2.0 | Python | Reference implementation, feature-rich, large-scale deployments. | Mature | Full spec compliance, federation, E2EE, bridges, application services, admin APIs, horizontal scaling. |
| Dendrite | github.com/matrix-org/dendrite | Apache 2.0 | Go | “Second-generation” homeserver, performance-focused, smaller footprint. | Developing | Good performance, smaller footprint, aims for full spec compliance, monolithic/polylith deployments, sliding sync. |
| Conduit | gitlab.com/famedly/conduit | AGPLv3 | Rust | Community-driven, speed, simplicity, ease of self-hosting, lightweight. | Developing | Fast, lightweight, simple deployment, SQLite/PostgreSQL/MySQL support, good for small/medium deployments. |
| Construct | github.com/matrix-construct/construct | ISC | C++ | High-performance server for large, complex deployments. | Experimental | Highly performant, aims for very large deployments, customizability. |
| Ruma | (No single repo – see description) | MIT | Rust | Collection of Rust libraries for building Matrix clients/servers/services. | Varies | Building blocks for custom Matrix software in Rust, high customization. |

Matrix protocol compatible clients :

| Client Name | Platforms | License | Language(s) | Description/Focus | Spec Compliance | UI Technology | Key Features |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Element | Web, Desktop (Linux, macOS, Windows), Mobile (iOS, Android) | Apache 2.0 | JavaScript (React), TypeScript, Swift, Kotlin | Flagship client, feature-rich, modern UI. | High | Web (React), various | E2EE, Spaces, Threads, Voice/Video, Widgets, Rich Text, Polls, Location Sharing, Communities, Cross-signing, Key verification. |
| SchildiChat | Web, Desktop (Linux, macOS, Windows), Mobile (Android) | AGPLv3 | JavaScript (React), TypeScript | Fork of Element, improved UX. | High | Web (React) | All Element features + UI/UX improvements (themes, faster startup, better notifications, media handling). |
| FluffyChat | Mobile (iOS, Android), Web, Desktop (Linux, macOS, Windows) | AGPLv3 | Dart (Flutter) | User-friendly, ease of use, clean interface, multi-account. | Medium | Flutter | E2EE, simple UI, fast, cross-platform, push notifications, multi-account. |
| Nheko | Desktop (Linux, macOS, Windows) | GPLv3 | C++ (Qt) | Native desktop client, speed, efficiency, keyboard-centric. | Medium | Qt | E2EE, fast, native look, keyboard shortcuts, reactions, redactions, room upgrades, basic Spaces support. |
| NeoChat | Desktop (Linux, macOS, Windows), Mobile (Android) | GPLv3 | C++ (Qt, Kirigami) | KDE-based client, KDE Plasma integration. | Medium | Qt, Kirigami | E2EE, KDE integration, clean interface, follows KDE Human Interface Guidelines. |
| Hydrogen | Web | Apache 2.0 | JavaScript (vanilla) | Lightweight web client, speed, minimal resource usage. Runs well on low-powered devices. | Medium | Custom (vanilla JS) | Fast, lightweight, low-end hardware support, E2EE, basic features. |
| weechat-matrix | Terminal (Linux, macOS, Windows via WSL) | GPLv3 | C, Python, Lua, etc. | Plugin for WeeChat IRC client, terminal Matrix support. | Medium | Terminal (ncurses) | Terminal-based, customizable, integrates with WeeChat’s features (scripting, triggers). |
| gomuks | Terminal (Linux, macOS, Windows via WSL) | AGPLv3 | Go | Terminal-based client in Go, inspired by weechat-matrix. | High | Terminal (tview) | E2EE, fast, relatively feature-rich for a terminal client (image previews, reactions). |
| matrix-commander | Terminal (Linux, macOS, Windows via WSL) | MIT | Python | Command-line tool | | | |

The charm of complication

(or the Attraction to Complexity) There is a very common tendency in computer science: complicating solutions. This complication is often referred to as incidental/accidental complexity, i.e. anything we coders/designers do to make a simple matter more complex. Sometimes this is called over-engineering, and it stems from the best intentions :

  1. Attraction to Complexity: there’s often a misconception that more complex solutions are inherently better or more sophisticated. This can lead to choosing complicated approaches over simpler, more effective ones.
  2. Technological Enthusiasm: developers might be eager to try out new technologies, patterns, or architectures. While innovation is important, using new tech for its own sake can lead to unnecessary complexity.
  3. Anticipating Future Needs: developers may try to build solutions that are overly flexible to accommodate potential future requirements. This often leads to complex designs that are not needed for the current scope of the project.
  4. Lack of Experience or Misjudgment: less experienced developers might not yet have the insight to choose the simplest effective solution, while even seasoned developers can sometimes overestimate what’s necessary for a project.
  5. Avoiding Refactoring: In an attempt to avoid refactoring in the future, developers might add layers of abstraction or additional features they think might be needed later, resulting in over-engineered solutions.
  6. Miscommunication or Lack of Clear Requirements: without clear requirements or effective communication within a team, developers might make assumptions about what’s needed, leading to solutions that are more complex than necessary.
  7. Premature Optimization: trying to optimize every aspect of a solution from the beginning can lead to complexity. The adage “premature optimization is the root of all evil” highlights the pitfalls of optimizing before it’s clear that performance is an issue.
  8. Unclear Problem Definition: not fully understanding the problem that needs to be solved can result in solutions that are more complicated than needed. A clear problem definition is essential for a simple and effective solution.
  9. Personal Preference or Style: sometimes, the preference for certain coding styles, architectures, or patterns can lead to more complex solutions, even if simpler alternatives would suffice.
  10. Fear of Under-Engineering: there can be a fear of delivering a solution that appears under-engineered or too simplistic, leading to adding unnecessary features or layers of abstraction.