Talos Linux Homelab Pt. 2: Cluster and Argo CD Bootstrapping
No, I’m not going to duplicate the tutorial
I began writing this article about a year ago. Some of the content may be outdated and I’m now hazy on the details. An attempt was made to keep it accurate, but note that this was never meant to be a tutorial or a step-by-step guide.
This article is not meant as a replacement for the Talos documentation page on getting started with a cluster. Instead, I’ll just mention what I’ve done and how easy it is to set it up, as well as the steps that I took to set up Argo CD and an App of Apps repository. For reference, the repository I ended up with is available on GitHub under d3adb5/homelab.
Wrapping the CLI with a Makefile
Talos has a neat CLI called talosctl, but to perform operations on multiple
nodes at a time it is necessary to use the --nodes flag. If you want to
apply-config on any node, you have to also pass the configuration file, and
since there is no automatic detection of patch files, I took inspiration from
someone else’s article on Talos and wrote a Makefile.
Why a Makefile? GNU Make is good for generating files, and that’s precisely
what we’ll be doing at the core of managing a Talos cluster. Talos luckily
implements IaC at the cluster management level. It does have state, don’t get
me wrong, and sometimes you might want to talosctl reset a node, but for the
most part it is as simple as editing the machine configuration and applying it.
Let’s start with the targets in the output of make help:
config Generate configuration files for all nodes
config-control Generate config files for controlplane nodes
config-workers Generate config files for worker nodes
apply Apply configuration files to all nodes
apply-control-<node> Apply config file to controlplane node
apply-worker-<node> Apply config file to worker node
upgrade Upgrade all nodes to desired installer image
upgrade-control Upgrade only controlplane nodes
upgrade-workers Upgrade only worker nodes
upgrade-control-<node> Upgrade a controlplane node
upgrade-worker-<node> Upgrade a worker node
help Display this help message
clean Remove generated configuration files
Essentially, this Makefile is used to generate Talos machine configuration
files for each node in both the control and data planes. The machine
configuration files will end up under a directory called generated, which can
be reset with a simple make clean. Upon updating the source files, new machine
configuration can be applied through make apply. New nodes are added by
creating new files under nodes/ with their name, followed by .yaml.
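At the heart of all this is talosctl gen config. To give an idea of what the
generation targets reduce to, here is a minimal sketch of a rule for worker
nodes; the cluster name, endpoint, and exact rule layout are assumptions for
illustration rather than a copy of my Makefile:
# Hypothetical rule: build generated/<node>.yaml from nodes/workers/<node>.yaml,
# reusing the shared secrets bundle and merging every patch under patches/.
CLUSTER_NAME     := homelab
CLUSTER_ENDPOINT := https://ottawa.lan:6443
PATCH_FLAGS      := $(foreach p,$(wildcard patches/*.yaml),--config-patch @$(p))

generated/%.yaml: nodes/workers/%.yaml secrets.yaml $(wildcard patches/*.yaml)
	mkdir -p generated
	talosctl gen config $(CLUSTER_NAME) $(CLUSTER_ENDPOINT) \
		--with-secrets secrets.yaml \
		--output-types worker \
		--config-patch @$< $(PATCH_FLAGS) \
		--output $@ --force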
Directories and file hierarchy
I’ll briefly explain what the file hierarchy is like, and perhaps copy this
section into my repository’s README.md for future reference. Here is the file
tree I’m currently working with:
├── Makefile
├── nodes
│   ├── controlplane
│   │   └── ottawa.yaml
│   └── workers
│       ├── edmonton.yaml
│       ├── halifax.yaml
│       ├── regina.yaml
│       └── victoria.yaml
├── patches
│   ├── allow-controlplane-workloads.yaml
│   ├── cluster-name.yaml
│   ├── longhorn-mount.yaml
│   ├── resolve-cluster-members.yaml
│   ├── unprivileged-user-ns-creation.yaml
│   └── use-calico-and-flannel-for-cni.yaml
├── schematic.yaml
└── secrets.yaml
secrets.yaml
This file is generated through talosctl gen secrets. The command produces a
“secret bundle”, as Talos calls it: a collection of credentials and
certificate keypairs that are used to secure the cluster. By keeping it in a
separate file we can reuse it without having to repeat it across multiple
files.
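If you are starting from scratch, generating and reusing the bundle looks
roughly like this (cluster name and endpoint are placeholders):
talosctl gen secrets -o secrets.yaml       # create the secrets bundle once
talosctl gen config my-cluster https://ottawa.lan:6443 \
  --with-secrets secrets.yaml              # reuse it whenever configs are generated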
schematic.yaml
This file is the schematic for the node image. It is used to grab the right Talos image based on the system extensions I want included in it. You can read more about system extensions on the Talos documentation page, and for reference this is what I’m using as of the time of writing:
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/i915-ucode
      - siderolabs/intel-ucode
      - siderolabs/iscsi-tools
      - siderolabs/gvisor
The way this file is used in the Makefile is as follows:
FACTORY_URL := https://factory.talos.dev/schematics
TALOS_VERSION := 1.9.0
INSTALL_IMAGE_ID := $(shell curl -s --data-binary @schematic.yaml $(FACTORY_URL) | jq -r '.id')
INSTALL_IMAGE := factory.talos.dev/installer/$(INSTALL_IMAGE_ID):v$(TALOS_VERSION)
This was later changed into a directory, to allow setting a default Talos image schematic as well as node-specific schematics. This is because one of my nodes has an NVIDIA GPU and should use related extensions.
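I won’t reproduce the change here, but the gist fits in a few lines of Make.
Assuming a schematics/ directory containing default.yaml plus optional
per-node overrides (a layout I’m using purely for illustration), the per-node
installer image could be computed like so:
# Pick schematics/<node>.yaml if it exists, otherwise fall back to the default schematic.
schematic_for = $(firstword $(wildcard schematics/$(1).yaml) schematics/default.yaml)
image_id_for  = $(shell curl -s --data-binary @$(call schematic_for,$(1)) $(FACTORY_URL) | jq -r '.id')
image_for     = factory.talos.dev/installer/$(call image_id_for,$(1)):v$(TALOS_VERSION)

# Example: $(call image_for,edmonton) resolves to the GPU node's installer image
# whenever schematics/edmonton.yaml exists.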
patches/
This is the most important directory. All the YAML files within will be passed
to talosctl gen config through the --config-patch flag, merging them with each
base node machine configuration file. An example of what that looks like:
cluster:
  allowSchedulingOnControlPlanes: true
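For a meatier example, the longhorn-mount.yaml patch is presumably along the
lines of the extra kubelet mount shown in the Talos documentation for Longhorn
(reproduced from that guide rather than from my repository):
machine:
  kubelet:
    extraMounts:
      - destination: /var/lib/longhorn
        type: bind
        source: /var/lib/longhorn
        options:
          - bind
          - rshared
          - rw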
nodes/
These files are the base machine configuration that’ll be applied to each
individual node. They are separated into controlplane/ and workers/
directories, and are used to generate the final machine configuration files.
They can be as simple as just stating the node’s network hostname and
identifying which disk Talos should be installed to. Here’s an example:
machine:
  network:
    hostname: regina.lan
  install:
    disk: /dev/nvme0n1
Important operations
Since Talos implements something akin to stateless IaC, the only operations we
truly need to make easy through this Makefile — other than, of course,
generating machine configuration — is applying the configuration and
upgrading Talos itself. The latter is done through the upgrade
command, and
is a separate operation from applying machine configuration or upgrading the
Kubernetes version in use.
It’s sadly somewhat necessary to separate these operations by control and data plane, since Talos often routes them through the node where the cluster was bootstrapped rather than going from the client machine to each target node individually. Combine that with the fact that my cluster currently has a single control plane node and no external load balancer to route traffic to the appropriate endpoint, and it pays to be deliberate about which nodes get touched, and when.
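For reference, the Makefile targets ultimately wrap plain talosctl invocations
along these lines (node names match my layout above, and the installer image
reference comes from the schematic as shown earlier):
# Apply a node's generated machine configuration.
talosctl apply-config --nodes regina.lan --file generated/regina.yaml

# Upgrade Talos itself on that node to the desired installer image.
talosctl upgrade --nodes regina.lan --image factory.talos.dev/installer/<id>:v1.9.0

# Upgrading Kubernetes is a separate operation again, pointed at a control plane node.
talosctl upgrade-k8s --nodes ottawa.lan --to <version>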
Argo CD and App of Apps
With the cluster properly set up, bootstrapped, and everything, it’s pretty
simple to set up Argo CD and the App of Apps for the first time, provided the
repository where the App of Apps will live is immediately accessible. Since my
repository will contain more than just my Argo CD top level application, I
created a directory called argo/ where I placed most of the things Argo CD
will need to deploy services on the cluster.
The directory structure as of 2025-08-25 is as follows:
├── argo                    # Argo CD-reachable manifests
│   ├── app-of-apps         # Top level Argo CD application as a Helm chart
│   ├── cert-manager        # cert-manager issuer configuration
│   ├── cluster-utils       # Miscellaneous manifests for cluster stuff
│   ├── ddclient            # ddclient-related secrets and release values
│   ├── democratic-csi      # democratic-csi application manifests + secrets
│   ├── keycloak            # Manifests used to deploy Keycloak (e.g. database)
│   ├── longhorn            # Additional manifests related to Longhorn
│   ├── media               # Extra resources for media services
│   ├── metallb             # Extra MetalLB manifests (e.g. IPAddressPool)
│   ├── monitoring          # Extra resources for the monitoring stack
│   ├── network-policies    # Network policies to apply across the cluster
│   └── olmv0               # Operator Lifecycle Manager v0 manifests
├── democratic-csi          # democratic-csi driver configuration (encrypted)
├── images                  # Container image build files
├── talos                   # Talos-related configuration and Makefile
└── terraform               # Infrastructure as Code for, well, infrastructure
    ├── dns                 # DNS records for multiple domains
    └── keycloak            # Keycloak configuration
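If the App of Apps pattern is new to you: the app-of-apps chart simply
templates one Argo CD Application per directory above. A generic sketch of one
rendered manifest, with paths and sync options chosen for illustration rather
than copied from my chart:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager-config
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/d3adb5/homelab
    targetRevision: HEAD
    path: argo/cert-manager
  destination:
    server: https://kubernetes.default.svc
    namespace: cert-manager
  syncPolicy:
    automated:
      prune: true
      selfHeal: true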
Bootstrapping Argo CD is as simple as creating a Helm release using the values
defined in the Application manifest I wrote for it. After that, the App of
Apps (top level application) chart itself can be released and subsequently
managed through Argo CD. It’s like magic, honestly, seeing all the resources
pop up in the Argo CD UI.
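In concrete terms, the bootstrap amounts to something like the following; the
values file name is hypothetical, and the chart is the official one from
argoproj.github.io/argo-helm:
# Install Argo CD itself from the official Helm chart.
helm repo add argo https://argoproj.github.io/argo-helm
helm install argocd argo/argo-cd --namespace argocd --create-namespace -f argocd-values.yaml

# Release the app-of-apps chart once; from then on Argo CD manages it and everything else.
helm install app-of-apps ./argo/app-of-apps --namespace argocd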
WARNING: The network policies created by enabling network policies in the official Argo CD Helm chart are NOT sufficient. You will need to create additional policies allowing traffic from Argo CD’s server, repo-server, and application-controller pods to reach Redis (or the Redis HA HAProxy if you have enabled high availability).
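As an illustration only, the missing piece is a policy roughly like the one
below; the label selectors follow the upstream chart’s usual naming, so
double-check them against your release before relying on it:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-argocd-to-redis
  namespace: argocd
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: argocd-redis
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: argocd-server
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: argocd-repo-server
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: argocd-application-controller
      ports:
        - protocol: TCP
          port: 6379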
Container Network Interface
When I first wrote this and up until some time ago, I was using “Canal”, or Calico (for policies) and Flannel (for networking), as my CNI. This was to keep things as simple as possible while enabling network policies. However, as part of troubleshooting an issue with Argo CD — that I later discovered was due to network policies being too restrictive — I switched to full-on Calico.
This is how that was done:
# talos/patches/use-calico-for-cni.yaml
cluster:
  network:
    cni:
      name: custom
      urls:
        - https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/operator-crds.yaml
        - https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/tigera-operator.yaml
This adds manifests for Talos to install the Tigera Operator, which will be responsible for bringing up Calico itself with our help:
curl -O -L https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/custom-resources.yaml
nvim custom-resources.yaml # Edit to your liking!
kubectl apply -f custom-resources.yaml
This was taken from the Calico installation docs for on-premises deployments.
There must be a better way to do this that doesn’t involve manual steps or uploading manifests elsewhere that we’d then depend on. Maybe Calico could be installed through Argo CD, but I don’t like the idea of having the CNI installed by a service that is already running on the cluster and virtually dependent on it.