What You’ll Build
By the end of this tutorial, you’ll have a reusable Ansible role that can deploy a K3s Kubernetes cluster to any set of Linux machines — homelab hardware, cloud VPS nodes, or both. The setup includes:
- Automated K3s installation on server and agent nodes
- WireGuard-encrypted pod-to-pod networking
- Embedded etcd with HA support and S3 snapshot backups
- iptables firewall rules tailored for K3s traffic
- Multi-environment inventories so one codebase manages multiple clusters
If you’ve ever SSH’d into a box, run curl https://get.k3s.io | sh, and hoped for the best — this replaces that with something you can version-control, audit, and rebuild from scratch.
Prerequisites
Before you start, you’ll need:
- Ansible 2.9+ and Python 3.8+ on your control machine (your laptop, a jumpbox, etc.)
- SSH access to your target nodes (key-based auth recommended)
- Two or more Linux machines running Debian, Ubuntu, RHEL, or SUSE — these will become your K3s cluster
- Basic familiarity with Ansible (inventories, playbooks, roles)
Step 1: Set Up the Project Structure
Create a Galaxy-style Ansible project. This structure separates inventories, playbooks, and roles so you can manage multiple environments with the same code:
mkdir -p ansible/{inventories/home/group_vars,playbooks,roles/k3s/{defaults,handlers,tasks,templates}}
cd ansible
Your directory should look like this:
ansible/
├── ansible.cfg
├── requirements.yml
├── Makefile
├── inventories/
│ └── home/
│ ├── hosts.yml
│ └── group_vars/
│ └── all.yml
├── playbooks/
│ ├── setup-k3s-cluster.yml
│ └── uninstall-k3s.yml
└── roles/
└── k3s/
├── defaults/main.yml
├── handlers/main.yml
├── tasks/
│ ├── main.yml
│ ├── preflight.yml
│ ├── install.yml
│ ├── server.yml
│ ├── agent.yml
│ ├── post-install.yml
│ └── uninstall.yml
└── templates/
├── config.yaml.j2
├── k3s.env.j2
└── registries.yaml.j2
The key idea: inventories define where to deploy, group vars define how to configure, and the role handles all the logic. You can add a second inventory (e.g., inventories/production/) later and deploy a completely different cluster with the same role.
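The tree also lists a Makefile, which the tutorial never fills in. If you want one-command wrappers, a minimal sketch could look like this (the targets and the INVENTORY default are assumptions; recipe lines must be tab-indented):
# Makefile
INVENTORY ?= inventories/home

.PHONY: ping deploy uninstall

ping:
	ansible -i $(INVENTORY) all -m ping

deploy:
	ansible-playbook -i $(INVENTORY) playbooks/setup-k3s-cluster.yml

uninstall:
	ansible-playbook -i $(INVENTORY) playbooks/uninstall-k3s.yml
Running make deploy INVENTORY=inventories/staging then targets a different inventory with the same playbook.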
Step 2: Configure Ansible
Create ansible.cfg in your project root. This tells Ansible where to find roles and inventories, enables SSH pipelining for performance, and disables host key checking (you can tighten this for production):
[defaults]
stdout_callback = default
callback_result_format = yaml
roles_path = ./roles
inventory = ./inventories/home/hosts.yml
retry_files_enabled = False
host_key_checking = False
gathering = smart
forks = 10
timeout = 30
[privilege_escalation]
become = True
become_method = sudo
become_user = root
become_ask_pass = False
[ssh_connection]
pipelining = True
And a requirements.yml for any Galaxy dependencies:
---
collections:
- name: ansible.posix
version: 1.5.4
- name: community.general
version: 8.0.2
Install them:
ansible-galaxy collection install -r requirements.yml
Step 3: Define Your Inventory
The inventory tells Ansible which machines are servers (control plane) and which are workers. Create inventories/home/hosts.yml:
---
all:
children:
home:
children:
k3s:
children:
k3s_master:
hosts:
master.lab.example.com:
ansible_host: 192.168.1.10
k3s_cluster_init: true
k3s_kubelet_args:
- "max-pods=110"
k3s_workers:
hosts:
worker-01.lab.example.com:
ansible_host: 192.168.1.11
k3s_kubelet_args:
- "max-pods=45"
vars:
ansible_user: deploy
env: home
Replace the hostnames and IPs with your actual machines. The important parts:
- k3s_master — these nodes run the K3s server (API server, scheduler, controller manager)
- k3s_workers — these nodes run the K3s agent and your workloads
- k3s_cluster_init: true — only set this on the first server node; it bootstraps the cluster
- k3s_kubelet_args — optional per-node kubelet tuning
For a multi-cloud or multi-region cluster, you can organize workers into subgroups and add metadata:
k3s_workers:
children:
k3s_workers_eu:
hosts:
worker-eu-01.example.com:
ansible_host: 203.0.113.20
location: amsterdam
datacenter: EU-WEST
edge: true
k3s_workers_na:
hosts:
worker-na-01.example.com:
ansible_host: 203.0.113.21
location: virginia
datacenter: US-EAST
edge: true
These location and datacenter host variables can be surfaced as Kubernetes node labels, making it easy to schedule workloads to specific regions; the sketch below shows one way to wire them up.
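The role in this tutorial only reads k3s_node_labels, so one option is a group_vars file scoped to the workers group that references the per-host values. A sketch under that assumption; the file name and the default('unknown') fallbacks are illustrative:
# inventories/home/group_vars/k3s_workers.yml
k3s_node_labels:
  - "env=home"
  - "location={{ location | default('unknown') }}"
  - "datacenter={{ datacenter | default('unknown') }}"
Workloads can then target a region with a plain nodeSelector such as location: amsterdam.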
Step 4: Set Group Variables
Create inventories/home/group_vars/all.yml. This is where you configure the K3s role behavior for all nodes in this inventory:
---
# K3s version - pin to a specific version or use "stable"
k3s_version: stable
# Network configuration
k3s_cluster_cidr: "10.42.0.0/16"
k3s_service_cidr: "10.43.0.0/16"
# Flannel backend - "vxlan" for local networks, "wireguard-native" for
# encrypted traffic across untrusted networks
k3s_flannel_backend: "wireguard-native"
# TLS SANs - add any hostnames/IPs you'll use to reach the API server
k3s_tls_san:
- "*.lab.example.com"
- "master.lab.example.com"
# Kubeconfig permissions
k3s_write_kubeconfig_mode: "0644"
# Node labels applied to every node
k3s_node_labels:
- "env=home"
# Components to disable (uncomment to use your own ingress/LB)
k3s_disable_components: []
# - traefik
# - servicelb
# Secrets encryption at rest
k3s_secrets_encryption: true
# Embedded etcd (required for HA with multiple server nodes)
k3s_embedded_etcd: false # set to true if running 3+ server nodes
# etcd snapshots
k3s_etcd_snapshot_enabled: true
k3s_etcd_snapshot_schedule_cron: "0 */12 * * *"
k3s_etcd_snapshot_retention: 5
# Firewall settings
firewall_k3s_enabled: true
firewall_k3s_pod_cidr: "10.42.0.0/16"
firewall_k3s_service_cidr: "10.43.0.0/16"
WireGuard or VXLAN? If your nodes are on a trusted private network (same LAN, VPC peered, VPN), vxlan is fine. If traffic crosses the public internet — cloud VPS nodes in different providers, for example — use wireguard-native. It encrypts all inter-node pod traffic with WireGuard’s Noise protocol at the kernel level. K3s manages the keys automatically; you don’t touch WireGuard config at all.
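Once the cluster is up, you can sanity-check that the WireGuard backend is actually in use. The interface name flannel-wg is what the wireguard-native backend typically creates; treat the exact name as an assumption that can vary between K3s versions:
# On any node, after the cluster is up
sudo wg show flannel-wg       # should list the other nodes as peers
ip -d link show flannel-wg    # confirms the link is of type wireguard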
Step 5: Write the K3s Role Defaults
Create roles/k3s/defaults/main.yml. These are the default values; anything set in your group vars or inventory overrides them:
---
# Installation mode: server, agent, or uninstall
k3s_role: server
k3s_version: stable
# Cluster configuration
k3s_cluster_init: false
k3s_cluster_token: ""
k3s_server_url: ""
# High Availability
k3s_ha_enabled: false
k3s_embedded_etcd: false
# Network
k3s_cluster_cidr: "10.42.0.0/16"
k3s_service_cidr: "10.43.0.0/16"
k3s_cluster_dns: "10.43.0.10"
k3s_cluster_domain: "cluster.local"
k3s_cni: "flannel"
k3s_flannel_backend: "vxlan"
k3s_flannel_iface: ""
# Node
k3s_node_name: "{{ ansible_facts['hostname'] }}"
k3s_node_ip: "{{ ansible_facts['default_ipv4']['address'] }}"
k3s_node_external_ip: ""
k3s_node_labels: []
k3s_node_taints: []
# Components to disable
k3s_disable_components: []
# Kubelet / API server args
k3s_kubelet_args: []
k3s_kube_apiserver_args: []
k3s_kube_controller_manager_args: []
k3s_kube_scheduler_args: []
# TLS
k3s_tls_san: []
k3s_https_listen_port: 6443
# Security
k3s_secrets_encryption: false
k3s_protect_kernel_defaults: false
k3s_selinux: false
# Paths
k3s_data_dir: "/var/lib/rancher/k3s"
k3s_config_dir: "/etc/rancher/k3s"
k3s_install_dir: "/usr/local/bin"
k3s_kubeconfig: "/etc/rancher/k3s/k3s.yaml"
k3s_write_kubeconfig_mode: "0644"
# Installation
k3s_install_method: "script"
k3s_install_script_url: "https://get.k3s.io"
k3s_airgap: false
# Service
k3s_service_enabled: true
k3s_service_state: started
# Container runtime
k3s_docker: false
k3s_container_runtime_endpoint: ""
# Registry configuration
k3s_registries: {}
# Environment variables
k3s_env_vars: {}
# etcd snapshots
k3s_etcd_snapshot_enabled: false
k3s_etcd_snapshot_schedule_cron: "0 */12 * * *"
k3s_etcd_snapshot_retention: 5
k3s_etcd_snapshot_dir: "/var/lib/rancher/k3s/server/db/snapshots"
# S3 backup
k3s_etcd_s3_enabled: false
k3s_etcd_s3_endpoint: "s3.amazonaws.com"
k3s_etcd_s3_bucket: ""
k3s_etcd_s3_region: "us-east-1"
k3s_etcd_s3_folder: "k3s-snapshots"
k3s_etcd_s3_access_key: ""
k3s_etcd_s3_secret_key: ""
# Uninstall
k3s_uninstall_remove_data: true
k3s_force_reinstall: false
# Extra args
k3s_server_extra_args: ""
k3s_agent_extra_args: ""
# Feature gates
k3s_feature_gates: []
This is intentionally comprehensive. Every knob you might need is here with a sensible default. Most deployments only override a handful of these in group vars.
Step 6: Write the Role Tasks
This is the core of the tutorial. The role is split into task files that run conditionally based on k3s_role.
tasks/main.yml — The Dispatcher
This file routes to the correct task file based on whether you’re installing a server, an agent, or uninstalling:
---
- name: Include preflight checks
ansible.builtin.include_tasks: preflight.yml
tags: [k3s, preflight]
- name: Include uninstall tasks
ansible.builtin.include_tasks: uninstall.yml
when: k3s_role == "uninstall"
tags: [k3s, uninstall]
- name: Include installation tasks
ansible.builtin.include_tasks: install.yml
when: k3s_role in ['server', 'agent']
tags: [k3s, install]
- name: Include server configuration
ansible.builtin.include_tasks: server.yml
when: k3s_role == "server"
tags: [k3s, server]
- name: Include agent configuration
ansible.builtin.include_tasks: agent.yml
when: k3s_role == "agent"
tags: [k3s, agent]
- name: Include post-installation tasks
ansible.builtin.include_tasks: post-install.yml
when: k3s_role in ['server', 'agent']
tags: [k3s, post-install]
One role, three behaviors. The k3s_role variable does all the routing.
tasks/preflight.yml — Validate the Environment
Before installing anything, verify the target node is suitable:
---
- name: Check if running on supported OS
ansible.builtin.assert:
that:
- ansible_facts['os_family'] in ['Debian', 'RedHat', 'Suse']
fail_msg: "K3s requires Debian, RedHat, or Suse based OS"
- name: Check minimum kernel version
ansible.builtin.shell: |
uname -r | awk -F. '{print $1"."$2}'
register: kernel_version
changed_when: false
- name: Verify kernel is 3.10 or higher
ansible.builtin.assert:
that:
- kernel_version.stdout is version('3.10', '>=')
fail_msg: "K3s requires kernel 3.10 or higher"
- name: Install curl
ansible.builtin.package:
name: curl
state: present
when: k3s_install_method == "script"
- name: Install WireGuard (Debian/Ubuntu)
ansible.builtin.apt:
name: [wireguard, wireguard-tools]
state: present
when:
- ansible_facts['os_family'] == "Debian"
- k3s_flannel_backend == "wireguard-native"
- name: Install WireGuard (RedHat/CentOS)
ansible.builtin.yum:
name: [wireguard-tools]
state: present
when:
- ansible_facts['os_family'] == "RedHat"
- k3s_flannel_backend == "wireguard-native"
- name: Load WireGuard kernel module
community.general.modprobe:
name: wireguard
state: present
when: k3s_flannel_backend == "wireguard-native"
- name: Ensure WireGuard module loads on boot
ansible.builtin.lineinfile:
path: /etc/modules-load.d/wireguard.conf
line: wireguard
create: yes
when: k3s_flannel_backend == "wireguard-native"
- name: Ensure required directories exist
ansible.builtin.file:
path: "{{ item }}"
state: directory
mode: '0755'
loop:
- "{{ k3s_config_dir }}"
- "{{ k3s_data_dir }}"
- name: Check if K3s is already installed
ansible.builtin.stat:
path: "{{ k3s_install_dir }}/k3s"
register: k3s_binary
The WireGuard steps only run when you’ve set k3s_flannel_backend: "wireguard-native". If you’re using plain VXLAN, they’re skipped entirely.
tasks/install.yml — Download and Configure
This handles the actual K3s installation. The key design choice: install the binary first with INSTALL_K3S_SKIP_START=true, lay down the config file, then start the service. This avoids K3s starting with default settings and having to restart:
---
- name: Generate cluster token if not provided
ansible.builtin.set_fact:
k3s_cluster_token: "{{ lookup('password', '/dev/null chars=ascii_letters,digits length=32') }}"
when:
- k3s_cluster_token == ""
- k3s_cluster_init or k3s_role == "server"
run_once: true
delegate_to: localhost
- name: Create K3s config directory
ansible.builtin.file:
path: "{{ k3s_config_dir }}"
state: directory
mode: '0755'
- name: Create K3s environment file
ansible.builtin.template:
src: k3s.env.j2
dest: /etc/systemd/system/k3s.service.env
mode: '0644'
when: k3s_env_vars | length > 0
notify: restart k3s
- name: Create K3s config file
ansible.builtin.template:
src: config.yaml.j2
dest: "{{ k3s_config_dir }}/config.yaml"
mode: '0644'
notify: restart k3s
- name: Create registries config
ansible.builtin.template:
src: registries.yaml.j2
dest: "{{ k3s_config_dir }}/registries.yaml"
mode: '0644'
when: k3s_registries | length > 0
notify: restart k3s
- name: Download K3s installation script
ansible.builtin.get_url:
url: "{{ k3s_install_script_url }}"
dest: /tmp/k3s-install.sh
mode: '0755'
when: k3s_install_method == "script"
- name: Install K3s using script
ansible.builtin.shell: |
INSTALL_K3S_VERSION="{{ k3s_version if k3s_version not in ['stable', 'latest'] else '' }}" \
INSTALL_K3S_CHANNEL="{{ k3s_version if k3s_version in ['stable', 'latest'] else '' }}" \
INSTALL_K3S_SKIP_START=true \
/tmp/k3s-install.sh {{ 'server' if k3s_role == 'server' else 'agent' }}
when:
- k3s_install_method == "script"
- not k3s_binary.stat.exists or k3s_version != 'stable'
notify: restart k3s
Note the token generation: if you don’t provide a k3s_cluster_token in your group vars, the role auto-generates a 32-character random token and stores it as a fact. This fact then flows to worker nodes via hostvars — no separate coordination step needed.
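If you would rather pin the token than rely on auto-generation (handy when re-running plays against a long-lived cluster), one option is an inline vaulted value in your group vars. A sketch; the token string is a placeholder:
# Generate an encrypted value, then paste the command's output into group_vars/all.yml
ansible-vault encrypt_string 'replace-with-a-long-random-string' --name 'k3s_cluster_token'
Remember to add --ask-vault-pass (or point Ansible at a vault password file) when you run the playbook.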
tasks/server.yml — Start the Control Plane
After installation, the server task starts K3s, waits for the API to be reachable, and retrieves the cluster join token:
---
- name: Start K3s server service
ansible.builtin.systemd:
name: k3s
state: started
enabled: "{{ k3s_service_enabled }}"
daemon_reload: yes
- name: Wait for K3s server to be ready
ansible.builtin.wait_for:
host: "{{ k3s_node_ip }}"
port: "{{ k3s_https_listen_port }}"
timeout: 300
- name: Wait for K3s token to be available
ansible.builtin.wait_for:
path: /var/lib/rancher/k3s/server/node-token
timeout: 300
when: k3s_cluster_init
- name: Retrieve K3s token from master
ansible.builtin.slurp:
src: /var/lib/rancher/k3s/server/node-token
register: k3s_token_file
when: k3s_cluster_init
- name: Set K3s token fact
ansible.builtin.set_fact:
k3s_cluster_token: "{{ k3s_token_file.content | b64decode | trim }}"
when:
- k3s_cluster_init
- k3s_token_file is defined
- name: Wait for kubeconfig to be created
ansible.builtin.wait_for:
path: "{{ k3s_kubeconfig }}"
timeout: 300
- name: Set kubeconfig permissions
ansible.builtin.file:
path: "{{ k3s_kubeconfig }}"
mode: "{{ k3s_write_kubeconfig_mode }}"
- name: Create kubectl symlink
ansible.builtin.file:
src: /usr/local/bin/k3s
dest: /usr/local/bin/kubectl
state: link
The slurp + set_fact pattern is how the server’s join token gets shared with agents. After this play runs, any subsequent play targeting worker nodes can reference hostvars[groups['k3s_master'][0]]['k3s_cluster_token'].
tasks/agent.yml — Join Workers to the Cluster
Agent setup is simpler — it just needs the server URL and token, then starts the agent service:
---
- name: Verify server URL is provided
ansible.builtin.assert:
that:
- k3s_server_url != ""
fail_msg: "k3s_server_url must be provided for agent nodes"
- name: Verify cluster token is provided
ansible.builtin.assert:
that:
- k3s_cluster_token != ""
fail_msg: "k3s_cluster_token must be provided for agent nodes"
- name: Start K3s agent service
ansible.builtin.systemd:
name: k3s-agent
state: started
enabled: "{{ k3s_service_enabled }}"
daemon_reload: yes
- name: Wait for K3s agent to be ready
ansible.builtin.wait_for:
path: /var/lib/rancher/k3s/agent/kubelet.kubeconfig
timeout: 300
The assertions catch misconfigurations early — if you forget to set the server URL or token, you get a clear error message instead of a cryptic systemd failure.
tasks/post-install.yml — Verify Everything Works
---
- name: Verify K3s is running
ansible.builtin.systemd:
name: "{{ 'k3s' if k3s_role == 'server' else 'k3s-agent' }}"
state: started
- name: Get K3s version
ansible.builtin.command: k3s --version
register: k3s_version_output
changed_when: false
- name: Display K3s version
ansible.builtin.debug:
msg: "{{ k3s_version_output.stdout_lines[0] }}"
- name: Get node status (server only)
ansible.builtin.command: kubectl get nodes
register: k3s_nodes
changed_when: false
when: k3s_role == "server"
environment:
KUBECONFIG: "{{ k3s_kubeconfig }}"
- name: Display node status
ansible.builtin.debug:
msg: "{{ k3s_nodes.stdout_lines }}"
when: k3s_role == "server"
tasks/uninstall.yml — Clean Teardown
---
- name: Stop K3s services
ansible.builtin.systemd:
name: "{{ item }}"
state: stopped
enabled: no
loop: [k3s, k3s-agent]
failed_when: false
- name: Run K3s server uninstall script
ansible.builtin.command: /usr/local/bin/k3s-uninstall.sh
when: k3s_role == "server"
failed_when: false
- name: Run K3s agent uninstall script
ansible.builtin.command: /usr/local/bin/k3s-agent-uninstall.sh
when: k3s_role == "agent"
failed_when: false
- name: Remove K3s data directory
ansible.builtin.file:
path: "{{ k3s_data_dir }}"
state: absent
when: k3s_uninstall_remove_data
- name: Remove K3s config directory
ansible.builtin.file:
path: "{{ k3s_config_dir }}"
state: absent
when: k3s_uninstall_remove_data
- name: Remove kubectl symlink
ansible.builtin.file:
path: /usr/local/bin/kubectl
state: absent
handlers/main.yml
---
- name: restart k3s
ansible.builtin.systemd:
name: "{{ 'k3s' if k3s_role == 'server' else 'k3s-agent' }}"
state: restarted
daemon_reload: yes
listen: restart k3s
Step 7: Write the Config Template
Create roles/k3s/templates/config.yaml.j2. This is a Jinja2 template that generates the K3s config file. It handles server, agent, networking, node labels, security settings, and etcd backup — all driven by variables:
# K3s configuration - managed by Ansible
{% if k3s_role == "server" %}
{% if k3s_cluster_init %}
cluster-init: true
{% elif k3s_server_url is defined and k3s_server_url != "" %}
server: "{{ k3s_server_url }}"
{% endif %}
{% if k3s_cluster_token != "" %}
token: "{{ k3s_cluster_token }}"
{% endif %}
{% if k3s_tls_san | length > 0 %}
tls-san:
{% for san in k3s_tls_san %}
- "{{ san }}"
{% endfor %}
{% endif %}
{% if k3s_disable_components | length > 0 %}
disable:
{% for component in k3s_disable_components %}
- {{ component }}
{% endfor %}
{% endif %}
{% if k3s_https_listen_port != 6443 %}
https-listen-port: {{ k3s_https_listen_port }}
{% endif %}
{% if k3s_etcd_snapshot_enabled %}
etcd-snapshot-schedule-cron: "{{ k3s_etcd_snapshot_schedule_cron }}"
etcd-snapshot-retention: {{ k3s_etcd_snapshot_retention }}
etcd-snapshot-dir: "{{ k3s_etcd_snapshot_dir }}"
{% endif %}
{% if k3s_etcd_s3_enabled %}
etcd-s3: true
etcd-s3-endpoint: "{{ k3s_etcd_s3_endpoint }}"
etcd-s3-bucket: "{{ k3s_etcd_s3_bucket }}"
etcd-s3-region: "{{ k3s_etcd_s3_region }}"
etcd-s3-folder: "{{ k3s_etcd_s3_folder }}"
{% if k3s_etcd_s3_access_key != "" %}
etcd-s3-access-key: "{{ k3s_etcd_s3_access_key }}"
{% endif %}
{% if k3s_etcd_s3_secret_key != "" %}
etcd-s3-secret-key: "{{ k3s_etcd_s3_secret_key }}"
{% endif %}
{% endif %}
{% if k3s_cluster_cidr != "10.42.0.0/16" %}
cluster-cidr: "{{ k3s_cluster_cidr }}"
{% endif %}
{% if k3s_service_cidr != "10.43.0.0/16" %}
service-cidr: "{{ k3s_service_cidr }}"
{% endif %}
{% if k3s_secrets_encryption %}
secrets-encryption: true
{% endif %}
{% endif %}
{% if k3s_role == "agent" %}
server: "{{ k3s_server_url }}"
token: "{{ k3s_cluster_token }}"
{% endif %}
{% if k3s_flannel_backend != "vxlan" %}
flannel-backend: "{{ k3s_flannel_backend }}"
{% endif %}
{% if k3s_flannel_iface != "" %}
flannel-iface: "{{ k3s_flannel_iface }}"
{% endif %}
{% if k3s_node_labels | length > 0 %}
node-label:
{% for label in k3s_node_labels %}
- "{{ label }}"
{% endfor %}
{% endif %}
{% if k3s_node_taints | length > 0 %}
node-taint:
{% for taint in k3s_node_taints %}
- "{{ taint }}"
{% endfor %}
{% endif %}
{% if k3s_kubelet_args | length > 0 %}
kubelet-arg:
{% for arg in k3s_kubelet_args %}
- "{{ arg }}"
{% endfor %}
{% endif %}
{% if k3s_write_kubeconfig_mode != "0644" %}
write-kubeconfig-mode: "{{ k3s_write_kubeconfig_mode }}"
{% endif %}
{% if k3s_docker %}
docker: true
{% endif %}
Also create the two supporting templates:
roles/k3s/templates/k3s.env.j2:
# K3s environment variables - managed by Ansible
{% for key, value in k3s_env_vars.items() %}
{{ key }}="{{ value }}"
{% endfor %}
roles/k3s/templates/registries.yaml.j2:
# K3s registries - managed by Ansible
{% if k3s_registries.mirrors is defined %}
mirrors:
{% for registry, config in k3s_registries.mirrors.items() %}
"{{ registry }}":
endpoint:
{% for endpoint in config.endpoint %}
- "{{ endpoint }}"
{% endfor %}
{% endfor %}
{% endif %}
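For reference, the k3s_registries structure this template expects looks like this in group vars; the mirror endpoint is a placeholder:
k3s_registries:
  mirrors:
    docker.io:
      endpoint:
        - "https://registry-mirror.example.com"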
Step 8: Write the Orchestration Playbook
Create playbooks/setup-k3s-cluster.yml. This sequences the deployment in the correct order — server first, then workers one at a time, then verification:
---
- name: Install K3s on master node
hosts: k3s_master
become: true
roles:
- k3s
vars:
k3s_role: server
tags: [k3s, master]
- name: Gather facts from master for worker configuration
hosts: k3s_master
become: true
gather_facts: true
tasks:
- name: Master facts gathered
ansible.builtin.debug:
msg: "Master facts gathered for worker nodes"
tags: [k3s, workers]
- name: Install K3s on worker nodes
hosts: k3s_workers
become: true
serial: 1
roles:
- k3s
vars:
k3s_role: agent
k3s_server_url: "https://{{ hostvars[groups['k3s_master'][0]]['ansible_host'] }}:6443"
k3s_cluster_token: "{{ hostvars[groups['k3s_master'][0]]['k3s_cluster_token'] }}"
tags: [k3s, workers]
- name: Verify K3s cluster
hosts: k3s_master
become: true
tasks:
- name: Wait for all nodes to be ready
ansible.builtin.shell: |
kubectl get nodes --no-headers | grep -v " Ready" | wc -l
register: not_ready_nodes
until: not_ready_nodes.stdout == "0"
retries: 30
delay: 10
environment:
KUBECONFIG: /etc/rancher/k3s/k3s.yaml
changed_when: false
- name: Display cluster status
ansible.builtin.command: kubectl get nodes -o wide
register: cluster_status
environment:
KUBECONFIG: /etc/rancher/k3s/k3s.yaml
changed_when: false
- name: Show cluster nodes
ansible.builtin.debug:
msg: "{{ cluster_status.stdout_lines }}"
tags: [k3s, verify]
The serial: 1 on workers is important — it joins nodes one at a time. This prevents a thundering herd overwhelming the API server during bootstrap. It’s slower, but it’s reliable.
Also create playbooks/uninstall-k3s.yml for teardown:
---
- name: Uninstall K3s from worker nodes
hosts: k3s_workers
become: yes
serial: 1
roles:
- k3s
vars:
k3s_role: uninstall
k3s_uninstall_remove_data: true
- name: Uninstall K3s from master node
hosts: k3s_master
become: yes
roles:
- k3s
vars:
k3s_role: uninstall
k3s_uninstall_remove_data: true
Workers first, then master. Always.
Step 9: Deploy
You’re ready. Verify connectivity first:
ansible -i inventories/home all -m ping
You should see SUCCESS for every node. If not, fix your SSH access before continuing.
Then deploy:
ansible-playbook -i inventories/home playbooks/setup-k3s-cluster.yml
That’s it. Ansible will:
- Run preflight checks (OS, kernel version, WireGuard if needed)
- Generate a cluster token
- Template the K3s config file
- Download and install the K3s binary
- Start the server, wait for the API to be ready
- Retrieve the join token
- Join each worker node, one at a time
- Wait for all nodes to report Ready
- Print the cluster status
When it finishes, SSH into your master node and confirm:
sudo kubectl get nodes -o wide
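For a quick smoke test beyond node status, a throwaway deployment works; the name is arbitrary:
sudo kubectl create deployment hello --image=nginx
sudo kubectl rollout status deployment/hello
sudo kubectl get pods -o wide
sudo kubectl delete deployment hello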
Step 10: Optional Enhancements
Enable High Availability
If you have three or more server nodes, enable embedded etcd for HA. In your group vars:
k3s_embedded_etcd: true
k3s_ha_enabled: true
The first server gets k3s_cluster_init: true in the inventory. Additional servers get k3s_server_url pointing at the first server — they join as additional control plane members. Lose any one server and the cluster keeps running.
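In the inventory, that could look like the following; hostnames and IPs are placeholders, and only the first server sets k3s_cluster_init:
k3s_master:
  hosts:
    master-01.lab.example.com:
      ansible_host: 192.168.1.10
      k3s_cluster_init: true
    master-02.lab.example.com:
      ansible_host: 192.168.1.12
      k3s_server_url: "https://192.168.1.10:6443"
    master-03.lab.example.com:
      ansible_host: 192.168.1.13
      k3s_server_url: "https://192.168.1.10:6443"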
Back Up etcd to S3
Add to your group vars (use Ansible Vault for the credentials):
k3s_etcd_s3_enabled: true
k3s_etcd_s3_endpoint: "s3.amazonaws.com"
k3s_etcd_s3_bucket: "my-etcd-backups"
k3s_etcd_s3_region: "us-east-1"
k3s_etcd_s3_folder: "k3s-etcd-snapshots"
k3s_etcd_s3_access_key: "{{ vault_s3_access_key }}"
k3s_etcd_s3_secret_key: "{{ vault_s3_secret_key }}"
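Snapshots then run on the cron schedule from Step 4, and you can trigger or inspect them by hand on a server node (subcommand names may differ slightly between K3s versions):
# Take an on-demand snapshot; it is uploaded to S3 when etcd-s3 is enabled
sudo k3s etcd-snapshot save --name pre-upgrade

# List existing snapshots
sudo k3s etcd-snapshot list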
Add Firewall Rules
If your nodes are on the public internet, you need a firewall. Without one, your K3s API server (port 6443), etcd (2379/2380), and kubelet (10250) are exposed to the entire internet. The approach here is an iptables-based firewall role that’s K3s-aware — it knows which CIDRs to allow, which ports to lock down, and which IPs to trust.
Create the firewall role structure
mkdir -p roles/firewall/{defaults,handlers,tasks,templates}
roles/firewall/defaults/main.yml
---
manage_firewall: true
firewall_backend: iptables
# Default policies
firewall_default_input_policy: DROP
firewall_default_forward_policy: DROP
firewall_default_output_policy: ACCEPT
# Trusted IPs with full access
firewall_trusted_ips: []
# Public ports accessible from anywhere
firewall_public_ports: []
# Basic rules
firewall_allow_ping: true
firewall_allow_established: true
firewall_allow_loopback: true
# Logging
firewall_enable_logging: true
firewall_log_dropped: true
firewall_log_prefix: "[FIREWALL-DROP] "
# SSH rate limiting
firewall_ssh_rate_limit: true
firewall_ssh_rate_limit_connections: 10
firewall_ssh_rate_limit_seconds: 60
# K3s-specific settings
firewall_k3s_enabled: false
firewall_k3s_pod_cidr: "10.42.0.0/16"
firewall_k3s_service_cidr: "10.43.0.0/16"
The defaults lock everything down: DROP on input and forward, ACCEPT on output. Nothing gets in unless explicitly permitted.
roles/firewall/tasks/main.yml
The main task file dispatches based on the chosen backend:
---
- name: Include iptables tasks
ansible.builtin.include_tasks: iptables.yml
when:
- manage_firewall
- firewall_backend == "iptables"
tags: [firewall, iptables]
roles/firewall/tasks/iptables.yml
This installs iptables, deploys the rules script as a Jinja2 template, creates a systemd service to apply rules on boot, and runs them immediately:
---
- name: Install iptables (Debian/Ubuntu)
ansible.builtin.apt:
name: [iptables, iptables-persistent]
state: present
when: ansible_os_family == "Debian"
- name: Deploy iptables rules script
ansible.builtin.template:
src: iptables-rules.sh.j2
dest: /etc/iptables-rules.sh
mode: '0750'
owner: root
group: root
notify: apply iptables rules
- name: Create systemd service for iptables rules
ansible.builtin.copy:
dest: /etc/systemd/system/iptables-custom.service
mode: '0644'
content: |
[Unit]
Description=Custom iptables rules
After=network.target
[Service]
Type=oneshot
ExecStart=/etc/iptables-rules.sh
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
notify:
- reload systemd
- apply iptables rules
- name: Enable iptables-custom service
ansible.builtin.systemd:
name: iptables-custom
enabled: yes
daemon_reload: yes
- name: Apply iptables rules immediately
ansible.builtin.command: /etc/iptables-rules.sh
changed_when: true
The systemd service ensures rules survive reboots. The template gets re-deployed and re-applied whenever variables change.
roles/firewall/handlers/main.yml
---
- name: apply iptables rules
ansible.builtin.command: /etc/iptables-rules.sh
listen: apply iptables rules
- name: reload systemd
ansible.builtin.systemd:
daemon_reload: yes
listen: reload systemd
roles/firewall/templates/iptables-rules.sh.j2
This is the core of the firewall — a bash script generated from your Ansible variables. It sets up chains, allows K3s internal traffic, creates a trusted IP allowlist, opens public ports, blocks K8s-sensitive ports from the internet, and drops everything else:
#!/bin/bash
# Managed by Ansible - do not edit manually
# Flush custom rules (preserve K3s chains like KUBE-*, CNI-*, FLANNEL)
iptables -F INPUT
iptables -F OUTPUT
iptables -F FORWARD 2>/dev/null || true
# Remove custom chains only
for chain in $(iptables -L -n | grep "^Chain" \
| grep -v -E "INPUT|OUTPUT|FORWARD|KUBE-|CNI-|FLANNEL" \
| awk '{print $2}'); do
iptables -F "$chain" 2>/dev/null || true
iptables -X "$chain" 2>/dev/null || true
done
# Default policies
iptables -P INPUT {{ firewall_default_input_policy }}
iptables -P FORWARD {{ firewall_default_forward_policy }}
iptables -P OUTPUT {{ firewall_default_output_policy }}
{% if firewall_allow_loopback %}
# Loopback
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT
{% endif %}
{% if firewall_allow_established %}
# Established/related connections
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
{% endif %}
{% if firewall_k3s_enabled %}
# ---- K3s traffic ----
# Pod-to-pod and pod-to-service forwarding
iptables -A FORWARD -s {{ firewall_k3s_pod_cidr }} -j ACCEPT
iptables -A FORWARD -d {{ firewall_k3s_pod_cidr }} -j ACCEPT
iptables -A INPUT -s {{ firewall_k3s_service_cidr }} -j ACCEPT
iptables -A INPUT -d {{ firewall_k3s_service_cidr }} -j ACCEPT
# CNI bridge traffic
iptables -A INPUT -i cni0 -j ACCEPT
iptables -A OUTPUT -o cni0 -j ACCEPT
iptables -A FORWARD -i cni0 -j ACCEPT
iptables -A FORWARD -o cni0 -j ACCEPT
# Flannel overlay traffic (flannel.1, flannel-wg, etc.)
iptables -A INPUT -i flannel+ -j ACCEPT
iptables -A OUTPUT -o flannel+ -j ACCEPT
iptables -A FORWARD -i flannel+ -j ACCEPT
iptables -A FORWARD -o flannel+ -j ACCEPT
{% endif %}
{% if firewall_allow_ping %}
# ICMP
iptables -A INPUT -p icmp --icmp-type echo-request -j ACCEPT
iptables -A OUTPUT -p icmp --icmp-type echo-reply -j ACCEPT
{% endif %}
# ---- Trusted IPs (SECURE_ACCESS chain) ----
iptables -N SECURE_ACCESS 2>/dev/null || iptables -F SECURE_ACCESS
iptables -A SECURE_ACCESS -j ACCEPT
{% if firewall_trusted_ips | length > 0 %}
{% for ip in firewall_trusted_ips %}
iptables -A INPUT -s {{ ip }} -j SECURE_ACCESS \
-m comment --comment "SECURE_ACCESS: {{ ip }}"
{% endfor %}
{% endif %}
{% if firewall_public_ports | length > 0 %}
# ---- Public ports ----
{% for port in firewall_public_ports %}
{% set port_num = port.split('/')[0] %}
{% set protocol = port.split('/')[1] if '/' in port else 'tcp' %}
iptables -A INPUT -p {{ protocol }} --dport {{ port_num }} -j ACCEPT \
-m comment --comment "Public: {{ port }}"
{% endfor %}
{% endif %}
{% if firewall_k3s_enabled %}
# ---- Block K8s ports from public (trusted IPs already matched above) ----
iptables -A INPUT -p tcp --dport 6443 -m conntrack --ctstate NEW -j DROP \
-m comment --comment "Block K8s API from public"
iptables -A INPUT -p tcp --dport 10250 -m conntrack --ctstate NEW -j DROP \
-m comment --comment "Block Kubelet from public"
iptables -A INPUT -p tcp --dport 2379 -m conntrack --ctstate NEW -j DROP \
-m comment --comment "Block etcd from public"
iptables -A INPUT -p tcp --dport 2380 -m conntrack --ctstate NEW -j DROP \
-m comment --comment "Block etcd peer from public"
{% endif %}
# Block SSH from public (trusted IPs already have access)
iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW -j DROP \
-m comment --comment "Block SSH from public"
{% if firewall_log_dropped %}
# Log dropped packets
iptables -A INPUT -m limit --limit 5/min \
-j LOG --log-prefix "{{ firewall_log_prefix }}" --log-level 7
{% endif %}
# Persist rules
{% if ansible_os_family == "Debian" %}
netfilter-persistent save
{% elif ansible_os_family == "RedHat" %}
service iptables save
{% endif %}
echo "Firewall rules applied successfully"
The order of rules matters — iptables evaluates a chain from top to bottom and acts on the first matching rule:
- Loopback and established connections pass immediately
- K3s internal traffic (pod/service CIDRs, CNI bridge, Flannel overlay) is allowed
- ICMP (ping) is allowed
- Trusted IPs get full access via the SECURE_ACCESS chain — your cluster nodes, your home IP, management hosts
- Public ports (80, 443, etc.) are open to everyone
- K8s-sensitive ports (6443, 10250, 2379, 2380) are explicitly dropped for non-trusted IPs
- SSH is dropped for non-trusted IPs
- Everything else is dropped by the default INPUT DROP policy
Because trusted IPs match before the drop rules, cluster nodes can still reach the API server and etcd. But a random scanner on the internet hits the drops.
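A quick way to verify the resulting rule order on a node after the role runs:
# Show the INPUT chain with rule positions and counters
sudo iptables -L INPUT -n -v --line-numbers

# Confirm the K8s-sensitive drops are in place
sudo iptables -S INPUT | grep -E '6443|10250|2379|2380'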
Configure the firewall in group vars
Add these to your inventories/home/group_vars/all.yml:
# Firewall
manage_firewall: true
firewall_backend: iptables
firewall_k3s_enabled: true
firewall_k3s_pod_cidr: "10.42.0.0/16"
firewall_k3s_service_cidr: "10.43.0.0/16"
firewall_trusted_ips:
- 198.51.100.1 # your home IP
- 203.0.113.10 # master node
- 203.0.113.20 # worker node 1
- 203.0.113.21 # worker node 2
# add every node's public IP here
firewall_public_ports:
- 80/tcp
- 443/tcp
firewall_ssh_rate_limit: true
firewall_ssh_rate_limit_connections: 10
firewall_ssh_rate_limit_seconds: 60
Every node’s public IP goes in every other node’s trusted list. It’s explicit, auditable, and means you can reason about exactly who can talk to whom.
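If hand-maintaining that list becomes tedious, you could derive the node IPs from the inventory and append your management IPs. A sketch, assuming ansible_host holds each node's public IP:
firewall_trusted_ips: >-
  {{ (groups['k3s'] | map('extract', hostvars, 'ansible_host') | list)
     + ['198.51.100.1'] }}
The trade-off is that you lose some of the at-a-glance auditability of the explicit list.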
Add the firewall to the orchestration playbook
Update playbooks/setup-k3s-cluster.yml to run the firewall role before K3s installation. Add this play at the top, before the K3s server play:
- name: Configure firewall on all nodes
hosts: k3s
become: true
roles:
- firewall
vars:
manage_firewall: true
firewall_k3s_enabled: true
tags: [firewall]
This ensures every node has firewall rules in place before K3s starts, so the API server is never exposed — even briefly — during deployment.
Add a Second Inventory
Want to manage a staging cluster alongside your homelab? Copy the inventory:
cp -r inventories/home inventories/staging
Edit inventories/staging/hosts.yml with your staging hosts and update the group vars. Then deploy with:
ansible-playbook -i inventories/staging playbooks/setup-k3s-cluster.yml
Same playbook. Different inventory. Different cluster.
Tearing It Down
If you need to start over:
ansible-playbook -i inventories/home playbooks/uninstall-k3s.yml
This stops services, runs K3s’s own uninstall scripts, and removes all data. You can rebuild from scratch in minutes.
What’s Next
Once the cluster is running, consider:
- cert-manager for automated TLS certificate management
- ArgoCD or Flux for GitOps-based workload deployment — Ansible gets you the cluster, GitOps manages what runs on it
- Ansible Vault for encrypting cluster tokens and S3 credentials in your group vars
- Monitoring with Prometheus and Grafana via Helm charts
The whole point of this approach is that your cluster is a Git repo. If a node dies, replace the hardware, update the inventory, and re-run the playbook. If the whole cluster is toast, rebuild it from scratch. Infrastructure as code means you never have to remember what you did — you just run it again.