What You’ll Build

By the end of this tutorial, you’ll have a reusable Ansible role that can deploy a K3s Kubernetes cluster to any set of Linux machines — homelab hardware, cloud VPS nodes, or both. The setup includes:

  • Automated K3s installation on server and agent nodes
  • WireGuard-encrypted pod-to-pod networking
  • Embedded etcd with HA support and S3 snapshot backups
  • iptables firewall rules tailored for K3s traffic
  • Multi-environment inventories so one codebase manages multiple clusters

If you’ve ever SSH’d into a box, run curl https://get.k3s.io | sh, and hoped for the best — this replaces that with something you can version-control, audit, and rebuild from scratch.

Prerequisites

Before you start, you’ll need:

  • Ansible 2.9+ and Python 3.8+ on your control machine (your laptop, a jumpbox, etc.)
  • SSH access to your target nodes (key-based auth recommended)
  • Two or more Linux machines running Debian, Ubuntu, RHEL, or SUSE — these will become your K3s cluster
  • Basic familiarity with Ansible (inventories, playbooks, roles)

Step 1: Set Up the Project Structure

Create a Galaxy-style Ansible project. This structure separates inventories, playbooks, and roles so you can manage multiple environments with the same code:

mkdir -p ansible/{inventories/home/group_vars,playbooks,roles/k3s/{defaults,handlers,tasks,templates}}
cd ansible

Your directory should look like this:

ansible/
├── ansible.cfg
├── requirements.yml
├── Makefile
├── inventories/
│   └── home/
│       ├── hosts.yml
│       └── group_vars/
│           └── all.yml
├── playbooks/
│   ├── setup-k3s-cluster.yml
│   └── uninstall-k3s.yml
└── roles/
    └── k3s/
        ├── defaults/main.yml
        ├── handlers/main.yml
        ├── tasks/
        │   ├── main.yml
        │   ├── preflight.yml
        │   ├── install.yml
        │   ├── server.yml
        │   ├── agent.yml
        │   ├── post-install.yml
        │   └── uninstall.yml
        └── templates/
            ├── config.yaml.j2
            ├── k3s.env.j2
            └── registries.yaml.j2

The key idea: inventories define where to deploy, group vars define how to configure, and the role handles all the logic. You can add a second inventory (e.g., inventories/production/) later and deploy a completely different cluster with the same role.

Step 2: Configure Ansible

Create ansible.cfg in your project root. This tells Ansible where to find roles and inventories, enables SSH pipelining for performance, and disables host key checking (you can tighten this for production):

[defaults]
stdout_callback = default
result_format = yaml
roles_path = ./roles
inventory = ./inventories/home/hosts.yml
retry_files_enabled = False
host_key_checking = False
gathering = smart
forks = 10
timeout = 30

[privilege_escalation]
become = True
become_method = sudo
become_user = root
become_ask_pass = False

[ssh_connection]
pipelining = True

And a requirements.yml for any Galaxy dependencies:

---
collections:
  - name: ansible.posix
    version: 1.5.4
  - name: community.general
    version: 8.0.2

Install them:

ansible-galaxy collection install -r requirements.yml
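
You can confirm the collections are available with the following check (the list subcommand needs Ansible 2.10 or newer):

ansible-galaxy collection list | grep -E 'ansible\.posix|community\.general'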

Step 3: Define Your Inventory

The inventory tells Ansible which machines are servers (control plane) and which are workers. Create inventories/home/hosts.yml:

---
all:
  children:
    home:
      children:
        k3s:
          children:
            k3s_master:
              hosts:
                master.lab.example.com:
                  ansible_host: 192.168.1.10
                  k3s_cluster_init: true
                  k3s_kubelet_args:
                    - "max-pods=110"
            k3s_workers:
              hosts:
                worker-01.lab.example.com:
                  ansible_host: 192.168.1.11
                  k3s_kubelet_args:
                    - "max-pods=45"
      vars:
        ansible_user: deploy
        env: home

Replace the hostnames and IPs with your actual machines. The important parts:

  • k3s_master — these nodes run the K3s server (API server, scheduler, controller manager)
  • k3s_workers — these nodes run the K3s agent and your workloads
  • k3s_cluster_init: true — only set this on the first server node; it bootstraps the cluster
  • k3s_kubelet_args — optional per-node kubelet tuning

For a multi-cloud or multi-region cluster, you can organize workers into subgroups and add metadata:

k3s_workers:
  children:
    k3s_workers_eu:
      hosts:
        worker-eu-01.example.com:
          ansible_host: 203.0.113.20
          location: amsterdam
          datacenter: EU-WEST
          edge: true
    k3s_workers_na:
      hosts:
        worker-na-01.example.com:
          ansible_host: 203.0.113.21
          location: virginia
          datacenter: US-EAST
          edge: true

You can surface these location and datacenter variables as Kubernetes node labels by feeding them into each host's k3s_node_labels (see the sketch below), which makes it easy to schedule workloads to specific regions.
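
For example, a worker's host vars can pass its own metadata straight through as labels. A minimal sketch reusing the EU worker above; the Jinja references resolve per host when the role templates config.yaml:

worker-eu-01.example.com:
  ansible_host: 203.0.113.20
  location: amsterdam
  datacenter: EU-WEST
  edge: true
  k3s_node_labels:
    - "location={{ location }}"
    - "datacenter={{ datacenter }}"
    - "edge=true"

Host-level k3s_node_labels override the group-level list you'll set in the next step, so repeat any cluster-wide labels (like env=home) that you still want on these nodes. Workloads can then pin themselves to a region with a nodeSelector such as location: amsterdam.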

Step 4: Set Group Variables

Create inventories/home/group_vars/all.yml. This is where you configure the K3s role behavior for all nodes in this inventory:

---
# K3s version - pin to a specific version or use "stable"
k3s_version: stable

# Network configuration
k3s_cluster_cidr: "10.42.0.0/16"
k3s_service_cidr: "10.43.0.0/16"

# Flannel backend - "vxlan" for local networks, "wireguard-native" for
# encrypted traffic across untrusted networks
k3s_flannel_backend: "wireguard-native"

# TLS SANs - add any hostnames/IPs you'll use to reach the API server
k3s_tls_san:
  - "*.lab.example.com"
  - "master.lab.example.com"

# Kubeconfig permissions
k3s_write_kubeconfig_mode: "0644"

# Node labels applied to every node
k3s_node_labels:
  - "env=home"

# Components to disable - to run your own ingress/LB, replace the empty
# list, e.g.:
# k3s_disable_components:
#   - traefik
#   - servicelb
k3s_disable_components: []

# Secrets encryption at rest
k3s_secrets_encryption: true

# Embedded etcd (required for HA with multiple server nodes)
k3s_embedded_etcd: false  # set to true if running 3+ server nodes

# etcd snapshots
k3s_etcd_snapshot_enabled: true
k3s_etcd_snapshot_schedule_cron: "0 */12 * * *"
k3s_etcd_snapshot_retention: 5

# Firewall settings
firewall_k3s_enabled: true
firewall_k3s_pod_cidr: "10.42.0.0/16"
firewall_k3s_service_cidr: "10.43.0.0/16"

WireGuard or VXLAN? If your nodes are on a trusted private network (same LAN, VPC peered, VPN), vxlan is fine. If traffic crosses the public internet — cloud VPS nodes in different providers, for example — use wireguard-native. It encrypts all inter-node pod traffic with WireGuard’s Noise protocol at the kernel level. K3s manages the keys automatically; you don’t touch WireGuard config at all.
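
After deployment you can confirm the encrypted backend actually took effect on a node. A quick check, assuming flannel's usual interface naming for wireguard-native (flannel-wg on current releases; adjust if yours differs):

# List WireGuard interfaces and peers - each other node should appear as a peer
sudo wg show

# Inspect the overlay interface flannel created
ip addr show flannel-wg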

Step 5: Write the K3s Role Defaults

Create roles/k3s/defaults/main.yml. These are the default values; anything set in your group vars or inventory overrides them:

---
# Installation mode: server, agent, or uninstall
k3s_role: server
k3s_version: stable

# Cluster configuration
k3s_cluster_init: false
k3s_cluster_token: ""
k3s_server_url: ""

# High Availability
k3s_ha_enabled: false
k3s_embedded_etcd: false

# Network
k3s_cluster_cidr: "10.42.0.0/16"
k3s_service_cidr: "10.43.0.0/16"
k3s_cluster_dns: "10.43.0.10"
k3s_cluster_domain: "cluster.local"
k3s_cni: "flannel"
k3s_flannel_backend: "vxlan"
k3s_flannel_iface: ""

# Node
k3s_node_name: "{{ ansible_facts['hostname'] }}"
k3s_node_ip: "{{ ansible_facts['default_ipv4']['address'] }}"
k3s_node_external_ip: ""
k3s_node_labels: []
k3s_node_taints: []

# Components to disable
k3s_disable_components: []

# Kubelet / API server args
k3s_kubelet_args: []
k3s_kube_apiserver_args: []
k3s_kube_controller_manager_args: []
k3s_kube_scheduler_args: []

# TLS
k3s_tls_san: []
k3s_https_listen_port: 6443

# Security
k3s_secrets_encryption: false
k3s_protect_kernel_defaults: false
k3s_selinux: false

# Paths
k3s_data_dir: "/var/lib/rancher/k3s"
k3s_config_dir: "/etc/rancher/k3s"
k3s_install_dir: "/usr/local/bin"
k3s_kubeconfig: "/etc/rancher/k3s/k3s.yaml"
k3s_write_kubeconfig_mode: "0644"

# Installation
k3s_install_method: "script"
k3s_install_script_url: "https://get.k3s.io"
k3s_airgap: false

# Service
k3s_service_enabled: true
k3s_service_state: started

# Container runtime
k3s_docker: false
k3s_container_runtime_endpoint: ""

# Registry configuration
k3s_registries: {}

# Environment variables
k3s_env_vars: {}

# etcd snapshots
k3s_etcd_snapshot_enabled: false
k3s_etcd_snapshot_schedule_cron: "0 */12 * * *"
k3s_etcd_snapshot_retention: 5
k3s_etcd_snapshot_dir: "/var/lib/rancher/k3s/server/db/snapshots"

# S3 backup
k3s_etcd_s3_enabled: false
k3s_etcd_s3_endpoint: "s3.amazonaws.com"
k3s_etcd_s3_bucket: ""
k3s_etcd_s3_region: "us-east-1"
k3s_etcd_s3_folder: "k3s-snapshots"
k3s_etcd_s3_access_key: ""
k3s_etcd_s3_secret_key: ""

# Uninstall
k3s_uninstall_remove_data: true
k3s_force_reinstall: false

# Extra args
k3s_server_extra_args: ""
k3s_agent_extra_args: ""

# Feature gates
k3s_feature_gates: []

This is intentionally comprehensive. Every knob you might need is here with a sensible default. Most deployments only override a handful of these in group vars.
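
For example, to keep regular workloads off the control-plane node, you could override the taint default for just that host in the inventory. A sketch using the CriticalAddonsOnly taint commonly applied to dedicated K3s servers:

master.lab.example.com:
  ansible_host: 192.168.1.10
  k3s_cluster_init: true
  k3s_node_taints:
    - "CriticalAddonsOnly=true:NoExecute"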

Step 6: Write the Role Tasks

This is the core of the tutorial. The role is split into task files that run conditionally based on k3s_role.

tasks/main.yml — The Dispatcher

This file routes to the correct task file based on whether you’re installing a server, an agent, or uninstalling:

---
- name: Include preflight checks
  ansible.builtin.include_tasks: preflight.yml
  tags: [k3s, preflight]

- name: Include uninstall tasks
  ansible.builtin.include_tasks: uninstall.yml
  when: k3s_role == "uninstall"
  tags: [k3s, uninstall]

- name: Include installation tasks
  ansible.builtin.include_tasks: install.yml
  when: k3s_role in ['server', 'agent']
  tags: [k3s, install]

- name: Include server configuration
  ansible.builtin.include_tasks: server.yml
  when: k3s_role == "server"
  tags: [k3s, server]

- name: Include agent configuration
  ansible.builtin.include_tasks: agent.yml
  when: k3s_role == "agent"
  tags: [k3s, agent]

- name: Include post-installation tasks
  ansible.builtin.include_tasks: post-install.yml
  when: k3s_role in ['server', 'agent']
  tags: [k3s, post-install]

One role, three behaviors. The k3s_role variable does all the routing.

tasks/preflight.yml — Validate the Environment

Before installing anything, verify the target node is suitable:

---
- name: Check if running on supported OS
  ansible.builtin.assert:
    that:
      - ansible_facts['os_family'] in ['Debian', 'RedHat', 'Suse']
    fail_msg: "K3s requires Debian, RedHat, or Suse based OS"

- name: Check minimum kernel version
  ansible.builtin.shell: |
    uname -r | awk -F. '{print $1"."$2}'
  register: kernel_version
  changed_when: false

- name: Verify kernel is 3.10 or higher
  ansible.builtin.assert:
    that:
      - kernel_version.stdout is version('3.10', '>=')
    fail_msg: "K3s requires kernel 3.10 or higher"

- name: Install curl
  ansible.builtin.package:
    name: curl
    state: present
  when: k3s_install_method == "script"

- name: Install WireGuard (Debian/Ubuntu)
  ansible.builtin.apt:
    name: [wireguard, wireguard-tools]
    state: present
  when:
    - ansible_facts['os_family'] == "Debian"
    - k3s_flannel_backend == "wireguard-native"

- name: Install WireGuard (RedHat/CentOS)
  ansible.builtin.yum:
    name: [wireguard-tools]
    state: present
  when:
    - ansible_facts['os_family'] == "RedHat"
    - k3s_flannel_backend == "wireguard-native"

- name: Load WireGuard kernel module
  community.general.modprobe:
    name: wireguard
    state: present
  when: k3s_flannel_backend == "wireguard-native"

- name: Ensure WireGuard module loads on boot
  ansible.builtin.lineinfile:
    path: /etc/modules-load.d/wireguard.conf
    line: wireguard
    create: yes
  when: k3s_flannel_backend == "wireguard-native"

- name: Ensure required directories exist
  ansible.builtin.file:
    path: "{{ item }}"
    state: directory
    mode: '0755'
  loop:
    - "{{ k3s_config_dir }}"
    - "{{ k3s_data_dir }}"

- name: Check if K3s is already installed
  ansible.builtin.stat:
    path: "{{ k3s_install_dir }}/k3s"
  register: k3s_binary

The WireGuard steps only run when you’ve set k3s_flannel_backend: "wireguard-native". If you’re using plain VXLAN, they’re skipped entirely.

tasks/install.yml — Download and Configure

This handles the actual K3s installation. The key design choice: lay down the config file first, install the binary with INSTALL_K3S_SKIP_START=true, and only start the service in a later task file. This avoids K3s coming up with default settings and needing an immediate restart:

---
- name: Generate cluster token if not provided
  ansible.builtin.set_fact:
    k3s_cluster_token: "{{ lookup('password', '/dev/null chars=ascii_letters,digits length=32') }}"
  when:
    - k3s_cluster_token == ""
    - k3s_cluster_init or k3s_role == "server"
  run_once: true
  delegate_to: localhost

- name: Create K3s config directory
  ansible.builtin.file:
    path: "{{ k3s_config_dir }}"
    state: directory
    mode: '0755'

- name: Create K3s environment file
  ansible.builtin.template:
    src: k3s.env.j2
    dest: "/etc/systemd/system/{{ 'k3s' if k3s_role == 'server' else 'k3s-agent' }}.service.env"
    mode: '0644'
  when: k3s_env_vars | length > 0
  notify: restart k3s

- name: Create K3s config file
  ansible.builtin.template:
    src: config.yaml.j2
    dest: "{{ k3s_config_dir }}/config.yaml"
    mode: '0644'
  notify: restart k3s

- name: Create registries config
  ansible.builtin.template:
    src: registries.yaml.j2
    dest: "{{ k3s_config_dir }}/registries.yaml"
    mode: '0644'
  when: k3s_registries | length > 0
  notify: restart k3s

- name: Download K3s installation script
  ansible.builtin.get_url:
    url: "{{ k3s_install_script_url }}"
    dest: /tmp/k3s-install.sh
    mode: '0755'
  when: k3s_install_method == "script"

- name: Install K3s using script
  ansible.builtin.shell: |
    INSTALL_K3S_VERSION="{{ k3s_version if k3s_version not in ['stable', 'latest'] else '' }}" \
    INSTALL_K3S_CHANNEL="{{ k3s_version if k3s_version in ['stable', 'latest'] else '' }}" \
    INSTALL_K3S_SKIP_START=true \
    /tmp/k3s-install.sh {{ 'server' if k3s_role == 'server' else 'agent' }}
  when:
    - k3s_install_method == "script"
    - not k3s_binary.stat.exists or k3s_version != 'stable'
  notify: restart k3s

Note the token generation: if you don’t provide a k3s_cluster_token in your group vars, the role auto-generates a 32-character random token and stores it as a fact. This fact then flows to worker nodes via hostvars — no separate coordination step needed.

tasks/server.yml — Start the Control Plane

After installation, the server task starts K3s, waits for the API to be reachable, and retrieves the cluster join token:

---
- name: Start K3s server service
  ansible.builtin.systemd:
    name: k3s
    state: started
    enabled: "{{ k3s_service_enabled }}"
    daemon_reload: yes

- name: Wait for K3s server to be ready
  ansible.builtin.wait_for:
    host: "{{ k3s_node_ip }}"
    port: "{{ k3s_https_listen_port }}"
    timeout: 300

- name: Wait for K3s token to be available
  ansible.builtin.wait_for:
    path: /var/lib/rancher/k3s/server/node-token
    timeout: 300
  when: k3s_cluster_init

- name: Retrieve K3s token from master
  ansible.builtin.slurp:
    src: /var/lib/rancher/k3s/server/node-token
  register: k3s_token_file
  when: k3s_cluster_init

- name: Set K3s token fact
  ansible.builtin.set_fact:
    k3s_cluster_token: "{{ k3s_token_file.content | b64decode | trim }}"
  when:
    - k3s_cluster_init
    - k3s_token_file is defined

- name: Wait for kubeconfig to be created
  ansible.builtin.wait_for:
    path: "{{ k3s_kubeconfig }}"
    timeout: 300

- name: Set kubeconfig permissions
  ansible.builtin.file:
    path: "{{ k3s_kubeconfig }}"
    mode: "{{ k3s_write_kubeconfig_mode }}"

- name: Create kubectl symlink
  ansible.builtin.file:
    src: /usr/local/bin/k3s
    dest: /usr/local/bin/kubectl
    state: link

The slurp + set_fact pattern is how the server’s join token gets shared with agents. After this play runs, any subsequent play targeting worker nodes can reference hostvars[groups['k3s_master'][0]]['k3s_cluster_token'].

tasks/agent.yml — Join Workers to the Cluster

Agent setup is simpler — it just needs the server URL and token, then starts the agent service:

---
- name: Verify server URL is provided
  ansible.builtin.assert:
    that:
      - k3s_server_url != ""
    fail_msg: "k3s_server_url must be provided for agent nodes"

- name: Verify cluster token is provided
  ansible.builtin.assert:
    that:
      - k3s_cluster_token != ""
    fail_msg: "k3s_cluster_token must be provided for agent nodes"

- name: Start K3s agent service
  ansible.builtin.systemd:
    name: k3s-agent
    state: started
    enabled: "{{ k3s_service_enabled }}"
    daemon_reload: yes

- name: Wait for K3s agent to be ready
  ansible.builtin.wait_for:
    path: /var/lib/rancher/k3s/agent/kubelet.kubeconfig
    timeout: 300

The assertions catch misconfigurations early — if you forget to set the server URL or token, you get a clear error message instead of a cryptic systemd failure.

tasks/post-install.yml — Verify Everything Works

---
- name: Verify K3s is running
  ansible.builtin.systemd:
    name: "{{ 'k3s' if k3s_role == 'server' else 'k3s-agent' }}"
    state: started

- name: Get K3s version
  ansible.builtin.command: k3s --version
  register: k3s_version_output
  changed_when: false

- name: Display K3s version
  ansible.builtin.debug:
    msg: "{{ k3s_version_output.stdout_lines[0] }}"

- name: Get node status (server only)
  ansible.builtin.command: kubectl get nodes
  register: k3s_nodes
  changed_when: false
  when: k3s_role == "server"
  environment:
    KUBECONFIG: "{{ k3s_kubeconfig }}"

- name: Display node status
  ansible.builtin.debug:
    msg: "{{ k3s_nodes.stdout_lines }}"
  when: k3s_role == "server"

tasks/uninstall.yml — Clean Teardown

---
- name: Stop K3s services
  ansible.builtin.systemd:
    name: "{{ item }}"
    state: stopped
    enabled: no
  loop: [k3s, k3s-agent]
  failed_when: false

- name: Run K3s server uninstall script
  ansible.builtin.command: /usr/local/bin/k3s-uninstall.sh
  when: k3s_role == "server"
  failed_when: false

- name: Run K3s agent uninstall script
  ansible.builtin.command: /usr/local/bin/k3s-agent-uninstall.sh
  when: k3s_role == "agent"
  failed_when: false

- name: Remove K3s data directory
  ansible.builtin.file:
    path: "{{ k3s_data_dir }}"
    state: absent
  when: k3s_uninstall_remove_data

- name: Remove K3s config directory
  ansible.builtin.file:
    path: "{{ k3s_config_dir }}"
    state: absent
  when: k3s_uninstall_remove_data

- name: Remove kubectl symlink
  ansible.builtin.file:
    path: /usr/local/bin/kubectl
    state: absent

handlers/main.yml

---
- name: restart k3s
  ansible.builtin.systemd:
    name: "{{ 'k3s' if k3s_role == 'server' else 'k3s-agent' }}"
    state: restarted
    daemon_reload: yes
  listen: restart k3s

Step 7: Write the Config Template

Create roles/k3s/templates/config.yaml.j2. This is a Jinja2 template that generates the K3s config file. It handles server, agent, networking, node labels, security settings, and etcd backup — all driven by variables:

# K3s configuration - managed by Ansible

{% if k3s_role == "server" %}
{% if k3s_cluster_init %}
cluster-init: true
{% elif k3s_server_url is defined and k3s_server_url != "" %}
server: "{{ k3s_server_url }}"
{% endif %}

{% if k3s_cluster_token != "" %}
token: "{{ k3s_cluster_token }}"
{% endif %}

{% if k3s_tls_san | length > 0 %}
tls-san:
{% for san in k3s_tls_san %}
  - "{{ san }}"
{% endfor %}
{% endif %}

{% if k3s_disable_components | length > 0 %}
disable:
{% for component in k3s_disable_components %}
  - {{ component }}
{% endfor %}
{% endif %}

{% if k3s_https_listen_port != 6443 %}
https-listen-port: {{ k3s_https_listen_port }}
{% endif %}

{% if k3s_etcd_snapshot_enabled %}
etcd-snapshot-schedule-cron: "{{ k3s_etcd_snapshot_schedule_cron }}"
etcd-snapshot-retention: {{ k3s_etcd_snapshot_retention }}
etcd-snapshot-dir: "{{ k3s_etcd_snapshot_dir }}"
{% endif %}

{% if k3s_etcd_s3_enabled %}
etcd-s3: true
etcd-s3-endpoint: "{{ k3s_etcd_s3_endpoint }}"
etcd-s3-bucket: "{{ k3s_etcd_s3_bucket }}"
etcd-s3-region: "{{ k3s_etcd_s3_region }}"
etcd-s3-folder: "{{ k3s_etcd_s3_folder }}"
{% if k3s_etcd_s3_access_key != "" %}
etcd-s3-access-key: "{{ k3s_etcd_s3_access_key }}"
{% endif %}
{% if k3s_etcd_s3_secret_key != "" %}
etcd-s3-secret-key: "{{ k3s_etcd_s3_secret_key }}"
{% endif %}
{% endif %}

{% if k3s_cluster_cidr != "10.42.0.0/16" %}
cluster-cidr: "{{ k3s_cluster_cidr }}"
{% endif %}
{% if k3s_service_cidr != "10.43.0.0/16" %}
service-cidr: "{{ k3s_service_cidr }}"
{% endif %}
{% if k3s_secrets_encryption %}
secrets-encryption: true
{% endif %}
{% endif %}

{% if k3s_role == "agent" %}
server: "{{ k3s_server_url }}"
token: "{{ k3s_cluster_token }}"
{% endif %}

{% if k3s_flannel_backend != "vxlan" %}
flannel-backend: "{{ k3s_flannel_backend }}"
{% endif %}

{% if k3s_flannel_iface != "" %}
flannel-iface: "{{ k3s_flannel_iface }}"
{% endif %}

{% if k3s_node_labels | length > 0 %}
node-label:
{% for label in k3s_node_labels %}
  - "{{ label }}"
{% endfor %}
{% endif %}

{% if k3s_node_taints | length > 0 %}
node-taint:
{% for taint in k3s_node_taints %}
  - "{{ taint }}"
{% endfor %}
{% endif %}

{% if k3s_kubelet_args | length > 0 %}
kubelet-arg:
{% for arg in k3s_kubelet_args %}
  - "{{ arg }}"
{% endfor %}
{% endif %}

{% if k3s_write_kubeconfig_mode != "0644" %}
write-kubeconfig-mode: "{{ k3s_write_kubeconfig_mode }}"
{% endif %}

{% if k3s_docker %}
docker: true
{% endif %}

Also create the two supporting templates:

roles/k3s/templates/k3s.env.j2:

# K3s environment variables - managed by Ansible
{% for key, value in k3s_env_vars.items() %}
{{ key }}="{{ value }}"
{% endfor %}

roles/k3s/templates/registries.yaml.j2:

# K3s registries - managed by Ansible
{% if k3s_registries.mirrors is defined %}
mirrors:
{% for registry, config in k3s_registries.mirrors.items() %}
  "{{ registry }}":
    endpoint:
{% for endpoint in config.endpoint %}
      - "{{ endpoint }}"
{% endfor %}
{% endfor %}
{% endif %}

Step 8: Write the Orchestration Playbook

Create playbooks/setup-k3s-cluster.yml. This sequences the deployment in the correct order — server first, then workers one at a time, then verification:

---
- name: Install K3s on master node
  hosts: k3s_master
  become: true
  roles:
    - k3s
  vars:
    k3s_role: server
  tags: [k3s, master]

- name: Gather facts from master for worker configuration
  hosts: k3s_master
  become: true
  gather_facts: true
  tasks:
    - name: Master facts gathered
      ansible.builtin.debug:
        msg: "Master facts gathered for worker nodes"
  tags: [k3s, workers]

- name: Install K3s on worker nodes
  hosts: k3s_workers
  become: true
  serial: 1
  roles:
    - k3s
  vars:
    k3s_role: agent
    k3s_server_url: "https://{{ hostvars[groups['k3s_master'][0]]['ansible_host'] }}:6443"
    k3s_cluster_token: "{{ hostvars[groups['k3s_master'][0]]['k3s_cluster_token'] }}"
  tags: [k3s, workers]

- name: Verify K3s cluster
  hosts: k3s_master
  become: true
  tasks:
    - name: Wait for all nodes to be ready
      ansible.builtin.shell: |
        kubectl get nodes --no-headers | grep -v " Ready" | wc -l
      register: not_ready_nodes
      until: not_ready_nodes.stdout == "0"
      retries: 30
      delay: 10
      environment:
        KUBECONFIG: /etc/rancher/k3s/k3s.yaml
      changed_when: false

    - name: Display cluster status
      ansible.builtin.command: kubectl get nodes -o wide
      register: cluster_status
      environment:
        KUBECONFIG: /etc/rancher/k3s/k3s.yaml
      changed_when: false

    - name: Show cluster nodes
      ansible.builtin.debug:
        msg: "{{ cluster_status.stdout_lines }}"
  tags: [k3s, verify]

The serial: 1 on workers is important — it joins nodes one at a time. This prevents a thundering herd from overwhelming the API server during bootstrap. It’s slower, but it’s reliable.
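
The same structure also covers growing the cluster later: add the new host to the inventory and rerun the playbook limited to the master plus the newcomer, so the join-token fact is re-gathered and only the new node is touched. A sketch, assuming a new worker named worker-02.lab.example.com:

# Keep k3s_master in the limit so its token fact is available to the worker play
ansible-playbook -i inventories/home playbooks/setup-k3s-cluster.yml \
  --limit "k3s_master,worker-02.lab.example.com"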

Also create playbooks/uninstall-k3s.yml for teardown:

---
- name: Uninstall K3s from worker nodes
  hosts: k3s_workers
  become: yes
  serial: 1
  roles:
    - k3s
  vars:
    k3s_role: uninstall
    k3s_uninstall_remove_data: true

- name: Uninstall K3s from master node
  hosts: k3s_master
  become: yes
  roles:
    - k3s
  vars:
    k3s_role: uninstall
    k3s_uninstall_remove_data: true

Workers first, then master. Always.

Step 9: Deploy

You’re ready. Verify connectivity first:

ansible -i inventories/home all -m ping

You should see SUCCESS for every node. If not, fix your SSH access before continuing.

Then deploy:

ansible-playbook -i inventories/home playbooks/setup-k3s-cluster.yml

That’s it. Ansible will:

  1. Run preflight checks (OS, kernel version, WireGuard if needed)
  2. Generate a cluster token
  3. Template the K3s config file
  4. Download and install the K3s binary
  5. Start the server, wait for the API to be ready
  6. Retrieve the join token
  7. Join each worker node, one at a time
  8. Wait for all nodes to report Ready
  9. Print the cluster status

When it finishes, SSH into your master node and confirm:

sudo kubectl get nodes -o wide

Step 10: Optional Enhancements

Enable High Availability

If you have three or more server nodes, enable embedded etcd for HA. In your group vars:

k3s_embedded_etcd: true
k3s_ha_enabled: true

The first server gets k3s_cluster_init: true in the inventory. Additional servers get k3s_server_url pointing at the first server — they join as additional control plane members. Lose any one server and the cluster keeps running.
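
A sketch of what the k3s_master group could look like with three servers (hostnames and IPs are placeholders); the second and third servers hit the elif branch in config.yaml.j2 and render a server: line instead of cluster-init:

k3s_master:
  hosts:
    master-01.lab.example.com:
      ansible_host: 192.168.1.10
      k3s_cluster_init: true
    master-02.lab.example.com:
      ansible_host: 192.168.1.11
      k3s_server_url: "https://192.168.1.10:6443"
    master-03.lab.example.com:
      ansible_host: 192.168.1.12
      k3s_server_url: "https://192.168.1.10:6443"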

Back Up etcd to S3

Add to your group vars (use Ansible Vault for the credentials):

k3s_etcd_s3_enabled: true
k3s_etcd_s3_endpoint: "s3.amazonaws.com"
k3s_etcd_s3_bucket: "my-etcd-backups"
k3s_etcd_s3_region: "us-east-1"
k3s_etcd_s3_folder: "k3s-etcd-snapshots"
k3s_etcd_s3_access_key: "{{ vault_s3_access_key }}"
k3s_etcd_s3_secret_key: "{{ vault_s3_secret_key }}"
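
To keep those credentials out of plaintext, encrypt them inline with Ansible Vault and paste the output into your group vars (the values below are placeholders):

ansible-vault encrypt_string 'AKIA-EXAMPLE-ACCESS-KEY' --name 'vault_s3_access_key'
ansible-vault encrypt_string 'example-secret-key' --name 'vault_s3_secret_key'

Run playbooks with --ask-vault-pass (or --vault-password-file) so Ansible can decrypt them at runtime.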

Add Firewall Rules

If your nodes are on the public internet, you need a firewall. Without one, your K3s API server (port 6443), etcd (2379/2380), and kubelet (10250) are exposed to the entire internet. The approach here is an iptables-based firewall role that’s K3s-aware — it knows which CIDRs to allow, which ports to lock down, and which IPs to trust.

Create the firewall role structure

mkdir -p roles/firewall/{defaults,handlers,tasks,templates}

roles/firewall/defaults/main.yml

---
manage_firewall: true
firewall_backend: iptables

# Default policies
firewall_default_input_policy: DROP
firewall_default_forward_policy: DROP
firewall_default_output_policy: ACCEPT

# Trusted IPs with full access
firewall_trusted_ips: []

# Public ports accessible from anywhere
firewall_public_ports: []

# Basic rules
firewall_allow_ping: true
firewall_allow_established: true
firewall_allow_loopback: true

# Logging
firewall_enable_logging: true
firewall_log_dropped: true
firewall_log_prefix: "[FIREWALL-DROP] "

# SSH rate limiting
firewall_ssh_rate_limit: true
firewall_ssh_rate_limit_connections: 10
firewall_ssh_rate_limit_seconds: 60

# K3s-specific settings
firewall_k3s_enabled: false
firewall_k3s_pod_cidr: "10.42.0.0/16"
firewall_k3s_service_cidr: "10.43.0.0/16"

The defaults lock everything down: DROP on input and forward, ACCEPT on output. Nothing gets in unless explicitly permitted.

roles/firewall/tasks/main.yml

The main task file dispatches based on the chosen backend:

---
- name: Include iptables tasks
  ansible.builtin.include_tasks: iptables.yml
  when:
    - manage_firewall
    - firewall_backend == "iptables"
  tags: [firewall, iptables]

roles/firewall/tasks/iptables.yml

This installs iptables, deploys the rules script as a Jinja2 template, creates a systemd service to apply rules on boot, and runs them immediately:

---
- name: Install iptables (Debian/Ubuntu)
  ansible.builtin.apt:
    name: [iptables, iptables-persistent]
    state: present
  when: ansible_os_family == "Debian"

- name: Deploy iptables rules script
  ansible.builtin.template:
    src: iptables-rules.sh.j2
    dest: /etc/iptables-rules.sh
    mode: '0750'
    owner: root
    group: root
  notify: apply iptables rules

- name: Create systemd service for iptables rules
  ansible.builtin.copy:
    dest: /etc/systemd/system/iptables-custom.service
    mode: '0644'
    content: |
      [Unit]
      Description=Custom iptables rules
      After=network.target

      [Service]
      Type=oneshot
      ExecStart=/etc/iptables-rules.sh
      RemainAfterExit=yes

      [Install]
      WantedBy=multi-user.target
  notify:
    - reload systemd
    - apply iptables rules

- name: Enable iptables-custom service
  ansible.builtin.systemd:
    name: iptables-custom
    enabled: yes
    daemon_reload: yes

- name: Apply iptables rules immediately
  ansible.builtin.command: /etc/iptables-rules.sh
  changed_when: true

The systemd service ensures rules survive reboots. The template gets re-deployed and re-applied whenever variables change.
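
Once the play has run, a quick sanity check on any node confirms the service is enabled and the rules are loaded:

# The oneshot service should be enabled and reported as active (exited)
systemctl status iptables-custom --no-pager

# Inspect the generated INPUT chain with hit counters
sudo iptables -L INPUT -n -v --line-numbers | head -n 25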

roles/firewall/handlers/main.yml

---
- name: apply iptables rules
  ansible.builtin.command: /etc/iptables-rules.sh
  listen: apply iptables rules

- name: reload systemd
  ansible.builtin.systemd:
    daemon_reload: yes
  listen: reload systemd

roles/firewall/templates/iptables-rules.sh.j2

This is the core of the firewall — a bash script generated from your Ansible variables. It sets up chains, allows K3s internal traffic, creates a trusted IP allowlist, opens public ports, blocks K8s-sensitive ports from the internet, and drops everything else:

#!/bin/bash
# Managed by Ansible - do not edit manually

# Flush custom rules (preserve K3s chains like KUBE-*, CNI-*, FLANNEL)
iptables -F INPUT
iptables -F OUTPUT
iptables -F FORWARD 2>/dev/null || true

# Remove custom chains only
for chain in $(iptables -L -n | grep "^Chain" \
  | grep -v -E "INPUT|OUTPUT|FORWARD|KUBE-|CNI-|FLANNEL" \
  | awk '{print $2}'); do
  iptables -F "$chain" 2>/dev/null || true
  iptables -X "$chain" 2>/dev/null || true
done

# Default policies
iptables -P INPUT {{ firewall_default_input_policy }}
iptables -P FORWARD {{ firewall_default_forward_policy }}
iptables -P OUTPUT {{ firewall_default_output_policy }}

{% if firewall_allow_loopback %}
# Loopback
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT
{% endif %}

{% if firewall_allow_established %}
# Established/related connections
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
{% endif %}

{% if firewall_k3s_enabled %}
# ---- K3s traffic ----
# Pod-to-pod and pod-to-service forwarding
iptables -A FORWARD -s {{ firewall_k3s_pod_cidr }} -j ACCEPT
iptables -A FORWARD -d {{ firewall_k3s_pod_cidr }} -j ACCEPT
iptables -A INPUT -s {{ firewall_k3s_service_cidr }} -j ACCEPT
iptables -A INPUT -d {{ firewall_k3s_service_cidr }} -j ACCEPT

# CNI bridge traffic
iptables -A INPUT -i cni0 -j ACCEPT
iptables -A OUTPUT -o cni0 -j ACCEPT
iptables -A FORWARD -i cni0 -j ACCEPT
iptables -A FORWARD -o cni0 -j ACCEPT

# Flannel overlay traffic (flannel.1, flannel-wg, etc.)
iptables -A INPUT -i flannel+ -j ACCEPT
iptables -A OUTPUT -o flannel+ -j ACCEPT
iptables -A FORWARD -i flannel+ -j ACCEPT
iptables -A FORWARD -o flannel+ -j ACCEPT
{% endif %}

{% if firewall_allow_ping %}
# ICMP
iptables -A INPUT -p icmp --icmp-type echo-request -j ACCEPT
iptables -A OUTPUT -p icmp --icmp-type echo-reply -j ACCEPT
{% endif %}

# ---- Trusted IPs (SECURE_ACCESS chain) ----
iptables -N SECURE_ACCESS 2>/dev/null || iptables -F SECURE_ACCESS
iptables -A SECURE_ACCESS -j ACCEPT

{% if firewall_trusted_ips | length > 0 %}
{% for ip in firewall_trusted_ips %}
iptables -A INPUT -s {{ ip }} -j SECURE_ACCESS \
  -m comment --comment "SECURE_ACCESS: {{ ip }}"
{% endfor %}
{% endif %}

{% if firewall_public_ports | length > 0 %}
# ---- Public ports ----
{% for port in firewall_public_ports %}
{% set port_num = port.split('/')[0] %}
{% set protocol = port.split('/')[1] if '/' in port else 'tcp' %}
iptables -A INPUT -p {{ protocol }} --dport {{ port_num }} -j ACCEPT \
  -m comment --comment "Public: {{ port }}"
{% endfor %}
{% endif %}

{% if firewall_k3s_enabled %}
# ---- Block K8s ports from public (trusted IPs already matched above) ----
iptables -A INPUT -p tcp --dport 6443 -m conntrack --ctstate NEW -j DROP \
  -m comment --comment "Block K8s API from public"
iptables -A INPUT -p tcp --dport 10250 -m conntrack --ctstate NEW -j DROP \
  -m comment --comment "Block Kubelet from public"
iptables -A INPUT -p tcp --dport 2379 -m conntrack --ctstate NEW -j DROP \
  -m comment --comment "Block etcd from public"
iptables -A INPUT -p tcp --dport 2380 -m conntrack --ctstate NEW -j DROP \
  -m comment --comment "Block etcd peer from public"
{% endif %}

# Block SSH from public (trusted IPs already have access)
iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW -j DROP \
  -m comment --comment "Block SSH from public"

{% if firewall_log_dropped %}
# Log dropped packets
iptables -A INPUT -m limit --limit 5/min \
  -j LOG --log-prefix "{{ firewall_log_prefix }}" --log-level 7
{% endif %}

# Persist rules
{% if ansible_os_family == "Debian" %}
netfilter-persistent save
{% elif ansible_os_family == "RedHat" %}
service iptables save
{% endif %}

echo "Firewall rules applied successfully"

The order of rules matters because iptables evaluates each chain top to bottom and stops at the first matching rule:

  1. Loopback and established connections pass immediately
  2. K3s internal traffic (pod/service CIDRs, CNI bridge, Flannel overlay) is allowed
  3. ICMP (ping) is allowed
  4. Trusted IPs get full access via the SECURE_ACCESS chain — your cluster nodes, your home IP, management hosts
  5. Public ports (80, 443, etc.) are open to everyone
  6. K8s-sensitive ports (6443, 10250, 2379, 2380) are explicitly dropped for non-trusted IPs
  7. SSH is dropped for non-trusted IPs
  8. Everything else is dropped by the default INPUT DROP policy

Because trusted IPs match before the drop rules, cluster nodes can still reach the API server and etcd. But a random scanner on the internet hits the drops.
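
You can verify the lockdown from a machine whose IP is not in firewall_trusted_ips (the address below is a placeholder for a node's public IP):

# Should hang and time out: new connections to the API port are dropped
curl -k --connect-timeout 5 https://203.0.113.10:6443/

# Should still respond if you exposed 443 in firewall_public_ports
curl -k --connect-timeout 5 https://203.0.113.10/ -o /dev/null -w '%{http_code}\n'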

Configure the firewall in group vars

Add these to your inventories/home/group_vars/all.yml:

# Firewall
manage_firewall: true
firewall_backend: iptables

firewall_k3s_enabled: true
firewall_k3s_pod_cidr: "10.42.0.0/16"
firewall_k3s_service_cidr: "10.43.0.0/16"

firewall_trusted_ips:
  - 198.51.100.1        # your home IP
  - 203.0.113.10        # master node
  - 203.0.113.20        # worker node 1
  - 203.0.113.21        # worker node 2
  # add every node's public IP here

firewall_public_ports:
  - 80/tcp
  - 443/tcp

firewall_ssh_rate_limit: true
firewall_ssh_rate_limit_connections: 10
firewall_ssh_rate_limit_seconds: 60

Every node’s public IP goes in every other node’s trusted list. It’s explicit, auditable, and means you can reason about exactly who can talk to whom.
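
If maintaining that list by hand gets tedious, you can derive the node IPs from the inventory and only hand-maintain the extras. A sketch that assumes ansible_host holds each node's public IP; firewall_extra_trusted_ips is just a helper name for this example:

firewall_extra_trusted_ips:
  - 198.51.100.1        # your home IP

firewall_trusted_ips: "{{ (groups['k3s'] | map('extract', hostvars, 'ansible_host') | list) + firewall_extra_trusted_ips }}"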

Add the firewall to the orchestration playbook

Update playbooks/setup-k3s-cluster.yml to run the firewall role before K3s installation. Add this play at the top, before the K3s server play:

- name: Configure firewall on all nodes
  hosts: k3s
  become: true
  roles:
    - firewall
  vars:
    manage_firewall: true
    firewall_k3s_enabled: true
  tags: [firewall]

This ensures every node has firewall rules in place before K3s starts, so the API server is never exposed — even briefly — during deployment.

Add a Second Inventory

Want to manage a staging cluster alongside your homelab? Copy the inventory:

cp -r inventories/home inventories/staging

Edit inventories/staging/hosts.yml with your staging hosts and update the group vars. Then deploy with:

ansible-playbook -i inventories/staging playbooks/setup-k3s-cluster.yml

Same playbook. Different inventory. Different cluster.
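
This is also where the Makefile from the project tree earns its keep: thin wrappers so you never have to remember the -i flags. A minimal sketch (targets and variable names are just suggestions; recipe lines must be indented with tabs):

ENV ?= home

.PHONY: ping deploy uninstall

ping:
	ansible -i inventories/$(ENV) all -m ping

deploy:
	ansible-playbook -i inventories/$(ENV) playbooks/setup-k3s-cluster.yml

uninstall:
	ansible-playbook -i inventories/$(ENV) playbooks/uninstall-k3s.yml

Then make deploy ENV=staging targets the staging cluster, and plain make deploy defaults to the homelab.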

Tearing It Down

If you need to start over:

ansible-playbook -i inventories/home playbooks/uninstall-k3s.yml

This stops services, runs K3s’s own uninstall scripts, and removes all data. You can rebuild from scratch in minutes.

What’s Next

Once the cluster is running, consider:

  • cert-manager for automated TLS certificate management
  • ArgoCD or Flux for GitOps-based workload deployment — Ansible gets you the cluster, GitOps manages what runs on it
  • Ansible Vault for encrypting cluster tokens and S3 credentials in your group vars
  • Monitoring with Prometheus and Grafana via Helm charts

The whole point of this approach is that your cluster is a Git repo. If a node dies, replace the hardware, update the inventory, and re-run the playbook. If the whole cluster is toast, rebuild it from scratch. Infrastructure as code means you never have to remember what you did — you just run it again.