homelab/AGENTS.md
Gonçalo Rodrigues 13b7149614 First Commit
2026-06-13 11:25:23 +01:00

# AGENTS.md
## Repo map
```
apps/<namespace>/services/<name>/ # one service per directory
├── main/ # Go service entrypoint (only if Go)
│ ├── main.go
│ └── handler.go
├── Dockerfile # build context = project root
├── Makefile # single include line (see below)
├── k8s/ # deployment.yaml, service.yaml, ingress.yaml
└── package.json # only for Astro frontend services
infrastructure/
├── k3d/k3d.sh # cluster create/delete
├── Makefile/service.mk # shared build/deploy targets
├── terraform/ # All infrastructure (MongoDB, monitoring, namespaces)
└── mongodb/deploy.sh # unused standalone script (Terraform-managed now)
pkg/ # shared Go packages (logger, setup, auth, mongo)
packages/ui/ # @homelab/ui Astro primitive library
```
## Commands
```sh
# Full dev cycle (requires running `make up` first)
make dev # k3d cluster → terraform infra → build+deploy all services
# Cluster lifecycle
make up # create k3d cluster
make down # delete k3d cluster
make infra # terraform apply + Traefik metrics + copy MongoDB secret
# Service lifecycle (run from any service dir)
make build-deploy # docker build → k3d import → kubectl apply
# Bulk operations
make deploy-all # build+load+deploy every discovered service
make restart-all # rollout restart all deployments
```
## Build conventions
- **Docker build context is project root**, not the service directory. The `Dockerfile` references paths relative to root.
- **Go services**: listen on `:8080` (set by `setup.Default`). K8s Service maps `80 → 8080`.
- **Astro services**: Node build → nginx serving `/dist` on port 80.
- **Image naming**: `homelab/<service-name>:latest` (inferred from directory name by `service.mk`).
- **`imagePullPolicy: IfNotPresent`** on all deployments — images loaded via `k3d image import`.
- Go base image: `golang:1.25-alpine` builder → `alpine:3.21` runtime.
- Node base image: `node:26-alpine` builder → `nginx:alpine`.
## Service Makefiles
Every per-service Makefile only sets a few path variables and includes the shared `service.mk`:
```makefile
# Go service:
PROJECT_ROOT := ../../../../
include ../../../../infrastructure/Makefile/service.mk
# Astro:
PROJECT_ROOT = $(abspath ../../../..)
SERVICE_DIR = .
include ../../../../infrastructure/Makefile/service.mk
```
`SERVICE_NAME` and `NAMESPACE` are auto-inferred by `service.mk`: `NAMESPACE` from the `apps/<namespace>/...` path and `SERVICE_NAME` from the directory name. It also infers Go vs Node from the presence of `package.json`.
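The inference rules above can be sketched in Go for illustration. This is not how `service.mk` is implemented (its contents are not shown here); `InferService` and `ServiceInfo` are hypothetical names, and the example path is only an assumption consistent with the repo map.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// ServiceInfo holds the values the shared makefile is described as inferring.
type ServiceInfo struct {
	Namespace string // from apps/<namespace>/...
	Name      string // from the service directory name
	Kind      string // "node" if package.json is present, otherwise "go"
	Image     string // homelab/<service-name>:latest
}

// InferService derives the values from a service directory path that is
// relative to the repository root. hasPackageJSON tells whether the
// directory contains a package.json file.
func InferService(dir string, hasPackageJSON bool) (ServiceInfo, error) {
	parts := strings.Split(filepath.ToSlash(filepath.Clean(dir)), "/")
	// Expect exactly apps/<namespace>/services/<name>.
	if len(parts) != 4 || parts[0] != "apps" || parts[2] != "services" {
		return ServiceInfo{}, fmt.Errorf("unexpected service path %q", dir)
	}
	kind := "go"
	if hasPackageJSON {
		kind = "node"
	}
	return ServiceInfo{
		Namespace: parts[1],
		Name:      parts[3],
		Kind:      kind,
		Image:     "homelab/" + parts[3] + ":latest",
	}, nil
}

func main() {
	info, err := InferService("apps/auth/services/gateway", false)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", info)
}
```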
## Observability
### Traces (OpenTelemetry OTLP gRPC)
- `OTEL_EXPORTER_OTLP_ENDPOINT=jaeger.monitoring.svc:4317` set on gateway, users, example-service deployments
- `pkg/trace` provides OTLP gRPC trace exporter + HTTP middleware (creates spans per request)
- Jaeger all-in-one deployed in `monitoring` namespace, ingress at `jaeger.homelab.local`
- Every service uses `trace.Middleware(metrics.Middleware(mux))` via `setup.Run`
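The wrapping order `trace.Middleware(metrics.Middleware(mux))` means the trace middleware is outermost, so its span covers the time spent in the metrics middleware and the handler. The sketch below shows that ordering with stdlib stand-ins; `tag` and `CallOrder` are hypothetical helpers, not part of `pkg/trace` or `pkg/metrics`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// tag returns a stand-in middleware that records when it starts and finishes.
// The real pkg/trace and pkg/metrics middlewares would create spans and
// update collectors at these points instead.
func tag(name string, events *[]string) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*events = append(*events, name+":start")
			next.ServeHTTP(w, r)
			*events = append(*events, name+":end")
		})
	}
}

// CallOrder sends one request through trace(metrics(mux)) and returns the
// order in which each layer ran.
func CallOrder() []string {
	var events []string
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		events = append(events, "handler")
		w.WriteHeader(http.StatusOK)
	})
	traceMW := tag("trace", &events)
	metricsMW := tag("metrics", &events)

	h := traceMW(metricsMW(mux)) // same shape as trace.Middleware(metrics.Middleware(mux))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
	return events
}

func main() {
	fmt.Println(CallOrder())
}
```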
### Metrics (Prometheus)
- `pkg/metrics` exposes: `http_requests_total{method,path,status}`, `http_request_duration_seconds{method,path}`, `http_requests_in_flight`
- `/metrics` endpoint added automatically by `setup.Run` via `promhttp.Handler()`
- Go runtime metrics from default Prometheus registry
- ServiceMonitors (with `release: kps` label required by Prometheus operator):
  - `gateway` (auth): scrapes `:http/metrics`
  - `users` (auth): scrapes `:http/metrics`
  - `example-service` (test): scrapes `:http/metrics`
  - `traefik` (monitoring): scrapes `:9100/metrics`
- Prometheus operator selects ServiceMonitors via `serviceMonitorSelector.matchLabels.release: kps`
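To make the label set of `http_requests_total{method,path,status}` and the in-flight gauge concrete, here is a minimal stdlib sketch of the bookkeeping such a middleware needs. It is an assumption-laden stand-in: the real `pkg/metrics` is described as using Prometheus collectors, and `Counters`, `statusRecorder`, and `Demo` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// statusRecorder remembers the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// Counters is a stdlib stand-in for the Prometheus collectors.
type Counters struct {
	mu       sync.Mutex
	Requests map[string]int // key: "METHOD path status"
	InFlight int
}

func NewCounters() *Counters {
	return &Counters{Requests: map[string]int{}}
}

// Middleware counts requests by method, path, and status, and tracks how
// many requests are currently being served.
func (c *Counters) Middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		c.mu.Lock()
		c.InFlight++
		c.mu.Unlock()

		// Default to 200: handlers that never call WriteHeader imply it.
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)

		key := fmt.Sprintf("%s %s %d", r.Method, r.URL.Path, rec.status)
		c.mu.Lock()
		c.InFlight--
		c.Requests[key]++
		c.mu.Unlock()
	})
}

// Demo serves three requests through the middleware and returns the counters.
func Demo() *Counters {
	c := NewCounters()
	mux := http.NewServeMux()
	mux.HandleFunc("/ok", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	h := c.Middleware(mux)
	for _, p := range []string{"/ok", "/ok", "/missing"} {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, p, nil))
	}
	return c
}

func main() {
	c := Demo()
	fmt.Println(c.Requests, c.InFlight)
}
```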
### Traefik Metrics
- HelmChartConfig in `kube-system` enables prometheus metrics on port 9100
- Traefik service patched to expose `metrics` port 9100
- ServiceMonitor in `monitoring` namespace scrapes it
## Auth system
- **Traefik ForwardAuth**: `auth-forward-auth` Middleware in `auth` namespace. Any Ingress can use it via annotation `traefik.ingress.kubernetes.io/router.middlewares: auth-forward-auth@kubernetescrd`.
- The `/verify` endpoint on gateway returns a **302 redirect to login** (not 401) so unauthenticated browser users get redirected seamlessly.
- Cookie is set with `Domain: homelab.local` so it works on all subdomains.
- Gateway calls the users service internally via `USERS_SERVICE=http://users` (port 80).
- Users service auto-seeds admin on first startup from `ADMIN_EMAIL`/`ADMIN_PASSWORD` env vars.
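The `/verify` behaviour and the cookie domain described above could look roughly like the following sketch. The cookie name, the login URL, and the presence-only check are assumptions for illustration; the gateway's actual handler and token validation are not shown in this file.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

const (
	cookieName = "session"                         // hypothetical cookie name
	loginURL   = "http://auth.homelab.local/login" // hypothetical login URL
)

// Verify is a ForwardAuth-style check. A 2xx response lets Traefik forward
// the original request; any other response is returned to the client, so a
// 302 sends unauthenticated browsers to the login page.
func Verify(w http.ResponseWriter, r *http.Request) {
	c, err := r.Cookie(cookieName)
	if err != nil || c.Value == "" {
		http.Redirect(w, r, loginURL, http.StatusFound)
		return
	}
	// A real gateway would validate the token here; this sketch only
	// checks that the cookie is present.
	w.WriteHeader(http.StatusOK)
}

// SetSession sets the cookie on the parent domain so every subdomain of
// homelab.local receives it.
func SetSession(w http.ResponseWriter, token string) {
	http.SetCookie(w, &http.Cookie{
		Name:     cookieName,
		Value:    token,
		Domain:   "homelab.local",
		Path:     "/",
		HttpOnly: true,
	})
}

// Probe calls Verify with or without a session cookie and returns the
// status code and Location header.
func Probe(withCookie bool) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/verify", nil)
	if withCookie {
		req.AddCookie(&http.Cookie{Name: cookieName, Value: "token"})
	}
	rec := httptest.NewRecorder()
	Verify(rec, req)
	return rec.Code, rec.Header().Get("Location")
}

func main() {
	code, loc := Probe(false)
	fmt.Println(code, loc)
	code, loc = Probe(true)
	fmt.Println(code, loc)
}
```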
## Frontend
- npm workspaces at root. Shared primitives in `packages/ui/` (`@homelab/ui`), consumed via Vite alias (not workspace exports) to avoid `.astro` resolution issues across packages.
- Tailwind v4: `@source "../"` in shared CSS so JIT scans `packages/ui/` for class usage.
- App-specific components go in `apps/<app>/services/ui/src/components/`, not in `packages/ui/`.
## Infra (Terraform at `infrastructure/terraform/`)
### Architecture (local-exec for all native K8s resources)
The Terraform Kubernetes provider (`hashicorp/kubernetes` v2.32.0 and v2.38.0) **hangs on all write operations** (Create) against k3d v1.33.6's API server. The helm provider works fine. Therefore:
- **Namespaces**: `terraform_data` + `local-exec` with `kubectl create namespace --dry-run=client -o yaml | kubectl apply -f -`
- **MongoDB Secret, Service, StatefulSet**: `terraform_data` + `local-exec` with inline YAML piped to `kubectl apply -f -`
- **Helm releases** (kube-prometheus-stack, Jaeger, Loki, Fluent Bit): `helm_release` resource — works fine
- **random_password**: used for MongoDB root password and Grafana admin password
- **Provider auth**: explicit client certificate/key/CA from the k3d kubeconfig (decoded from `client-certificate-data`/`client-key-data`/`certificate-authority-data`), with `0.0.0.0` replaced by `127.0.0.1` in the server URL. `config_path` causes a provider crash. `insecure=true` conflicts with `cluster_ca_certificate`.
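The two transformations in the provider auth bullet (base64-decoding the kubeconfig `*-data` fields and swapping the bind address for loopback) are shown below in Go purely for illustration. The Terraform configuration presumably does the equivalent in HCL, which is not shown here; `LoopbackServer`, `DecodeField`, and the port in the example are hypothetical.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// LoopbackServer rewrites a 0.0.0.0 server address to 127.0.0.1 and leaves
// any other URL untouched.
func LoopbackServer(server string) string {
	return strings.Replace(server, "//0.0.0.0:", "//127.0.0.1:", 1)
}

// DecodeField decodes one base64 kubeconfig field (for example
// client-certificate-data) into the PEM text a provider expects.
func DecodeField(b64 string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(strings.TrimSpace(b64))
	if err != nil {
		return "", fmt.Errorf("decode kubeconfig field: %w", err)
	}
	return string(raw), nil
}

func main() {
	// Example values only; a real kubeconfig carries PEM blocks and the
	// port k3d assigned to the API server.
	fmt.Println(LoopbackServer("https://0.0.0.0:6443"))
	pem, err := DecodeField("aGVsbG8=")
	if err != nil {
		panic(err)
	}
	fmt.Println(pem)
}
```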
### Terraform state contents
- `terraform_data` via local-exec: 5 namespaces (auth, home, test, monitoring, mongodb) plus `mongodb_secret`, `mongodb_service`, `mongodb_statefulset`
- 2× `random_password` (mongodb, grafana)
- 4× `helm_release` (kube-prometheus-stack, jaeger, loki, fluent-bit)
### Monitoring stack
- kube-prometheus-stack (Prometheus + Grafana), Jaeger v2, Loki, Fluent Bit — all via `helm_release` into `monitoring` namespace.
- Prometheus operator selects ServiceMonitors via `serviceMonitorSelector.matchLabels.release: kps`.
- Grafana: `admin` / password in `kps-grafana` K8s Secret, ingress at `grafana.homelab.local`.
- Jaeger: OTLP gRPC `jaeger.monitoring.svc:4317`, OTLP HTTP `:4318`, UI at `jaeger.homelab.local`.
- Traefik metrics: Pre-enabled in k3d, but the `metrics` port must be added to the Traefik service manually (`kubectl patch svc -n kube-system traefik`). A `ServiceMonitor` in `monitoring` namespace scrapes it.
### MongoDB
- StatefulSet `mongo:8` deployed by `terraform_data` (local-exec kubectl apply)
- Secret `mongodb` in `mongodb` namespace with `MONGO_INITDB_ROOT_PASSWORD`, `MONGO_URI`, `MONGO_DB`
- MongoDB secret is copied to `auth`, `finance`, `test` namespaces (as both `mongodb` and `mongo` names) via `infrastructure/copy-mongo-secret.sh` (run by `make infra`)
- Deployments reference it as `mongo` via `envFrom.secretRef`
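Because deployments load the `mongo` Secret through `envFrom.secretRef`, its keys arrive in the container as plain environment variables. A service could read them along these lines; `MongoConfig` and `LoadMongoConfig` are hypothetical names, and the actual `pkg/mongo` code may differ.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// MongoConfig carries the connection settings taken from the Secret keys.
type MongoConfig struct {
	URI string // MONGO_URI
	DB  string // MONGO_DB
}

// LoadMongoConfig reads the settings through getenv (os.Getenv in
// production) and reports every missing variable in one error.
func LoadMongoConfig(getenv func(string) string) (MongoConfig, error) {
	cfg := MongoConfig{URI: getenv("MONGO_URI"), DB: getenv("MONGO_DB")}
	var missing []string
	if cfg.URI == "" {
		missing = append(missing, "MONGO_URI")
	}
	if cfg.DB == "" {
		missing = append(missing, "MONGO_DB")
	}
	if len(missing) > 0 {
		return MongoConfig{}, fmt.Errorf("missing env vars: %s", strings.Join(missing, ", "))
	}
	return cfg, nil
}

func main() {
	cfg, err := LoadMongoConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("database:", cfg.DB)
}
```

Passing `getenv` as a parameter keeps the function testable without touching the process environment.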
### Traefik metrics
- HelmChartConfig in `infrastructure/traefik-metrics.sh` (applied by `make infra`)
- Requires `kubectl patch svc traefik -n kube-system` to add metrics port
## DNS
All subdomains must resolve to `127.0.0.1`. This is currently configured in `/etc/hosts`; the commands below replace the existing entries and need `sudo`:
```sh
# BSD/macOS sed syntax; GNU sed takes -i without the empty '' argument
sudo sed -i '' '/homelab.local/d' /etc/hosts && \
echo '127.0.0.1 homelab.local auth.homelab.local grafana.homelab.local jaeger.homelab.local finance.homelab.local' | \
sudo tee -a /etc/hosts
```
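To check that every required subdomain is mapped, the hosts file contents can be parsed as in the sketch below. `MissingHosts` is a hypothetical helper and the sample input in `main` is made up; it only inspects text and does not perform DNS lookups.

```go
package main

import (
	"fmt"
	"strings"
)

// MissingHosts returns the names in want that hostsFile does not map to
// 127.0.0.1.
func MissingHosts(hostsFile string, want []string) []string {
	mapped := map[string]bool{}
	for _, line := range strings.Split(hostsFile, "\n") {
		// Drop trailing comments, then split "IP name1 name2 ...".
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i]
		}
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] != "127.0.0.1" {
			continue
		}
		for _, name := range fields[1:] {
			mapped[name] = true
		}
	}
	var missing []string
	for _, name := range want {
		if !mapped[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	hosts := "127.0.0.1 localhost\n127.0.0.1 homelab.local auth.homelab.local grafana.homelab.local\n"
	want := []string{"homelab.local", "auth.homelab.local", "grafana.homelab.local", "jaeger.homelab.local"}
	fmt.Println(MissingHosts(hosts, want))
}
```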
## Known issues
### kube-prometheus-stack upgrade hangs
Any change to the `kube_prometheus_stack` helm_release (even `create_namespace: false → true`) triggers an upgrade that hangs for 2+ minutes due to CRD processing. Workaround: avoid changing it, or set `create_namespace = true` and leave it unchanged. If stuck in `pending-upgrade`, rollback via `helm rollback kps <revision> -n monitoring`.
## Local dev
- `k3d` cluster must be running (`k3d cluster list` to check).
- No lint/typecheck/test commands exist yet.
- No CI, no pre-commit hooks.