25 Commits

Author SHA1 Message Date
Gonçalo Rodrigues
bd174be094 fix(cicd): switch act runner to Docker mode with node:20 image
Host mode lacks Node.js so actions/checkout@v4 fails. Switch label to
ubuntu-latest:docker:node:20 which has Node.js for JS actions. Install
Docker CLI in the deploy job since node:20 doesn't include it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-26 23:43:44 +01:00
Gonçalo Rodrigues
f5f2251e24 fix(k8s): move ServiceMonitor manifests to k8s/monitoring/ subdirectory
The k8s/*.yaml glob in each skaffold.yaml picks up servicemonitor.yaml
and fails when monitoring is disabled (CRD not installed). Moving them
to k8s/monitoring/ keeps the config but excludes them from the default
deploy. Apply manually when enable_monitoring=true.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-26 22:43:04 +01:00
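Note on f5f2251e24: the fix relies on a single-star glob not descending into subdirectories. A minimal sketch of that rule using Go's `path.Match`, assuming Skaffold's manifest glob follows the same shell-style convention (an assumption, not confirmed by the commit). The file name `deployment.yaml` and the helper `matchesManifestGlob` are hypothetical.

```go
package main

import (
	"fmt"
	"path"
)

// matchesManifestGlob reports whether file is matched by pattern.
// In path.Match, "*" matches any run of characters except "/".
func matchesManifestGlob(pattern, file string) bool {
	ok, err := path.Match(pattern, file)
	return err == nil && ok
}

func main() {
	pattern := "k8s/*.yaml"
	files := []string{
		"k8s/deployment.yaml",                // hypothetical app manifest
		"k8s/servicemonitor.yaml",            // old location
		"k8s/monitoring/servicemonitor.yaml", // new location
	}
	for _, f := range files {
		fmt.Printf("%-38s matched=%v\n", f, matchesManifestGlob(pattern, f))
	}
}
```

Under that assumption, only the manifests directly inside k8s/ are picked up by the default deploy, which is why the ServiceMonitor files have to be applied by hand when monitoring is enabled.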
Gonçalo Rodrigues
ba3fa6e46d fix(infra): switch MongoDB to 7 LTS (jemalloc, ARM64 stable)
MongoDB 8.x (both 8.0 and 8.2) uses tcmalloc-google which segfaults
(exit 139) on Hetzner ARM64 kernels with transparent hugepages disabled.
MongoDB 7 LTS uses jemalloc and runs cleanly on the same hardware.
PVC was already wiped so there is no FCV incompatibility.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-26 22:28:33 +01:00
Gonçalo Rodrigues
8d824b3e19 fix(infra): pin MongoDB to 8.0 LTS to avoid ARM64 segfault
mongo:8 resolves to 8.2 which uses tcmalloc-google. That allocator
segfaults (exit 139) when transparent hugepages are disabled, which is
the default on Hetzner kernels. MongoDB 8.0 LTS uses jemalloc and does
not have this issue.

PVC must be deleted before applying since FCV 8.2 data files can't be
opened by 8.0. Finance API seeds admin on startup so no data is lost.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-26 22:25:25 +01:00
Gonçalo Rodrigues
81e804206d fix(infra): revert to mongo:8, keep cache-size arg removed
mongo:7 can't open data files written by mongo:8 (exit code 62 =
NeedsDowngrade). Stay on mongo:8 — the SIGSEGV was caused by the
--wiredTigerCacheSizeGB=0.25 flag, not the version. Removing the flag
is the actual fix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-26 22:21:11 +01:00
Gonçalo Rodrigues
de48ba2206 fix(infra): switch MongoDB to v7 to fix ARM64 segfault
mongo:8 was crashing with exit code 139 (SIGSEGV) on the Hetzner CAX11
ARM64 instance. Switch to mongo:7 (LTS) which has more stable ARM64
support. Also remove the --wiredTigerCacheSizeGB=0.25 arg since the
512Mi memory limit already bounds memory use adequately.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-26 22:17:48 +01:00
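Note on de48ba2206 and 81e804206d: the two exit codes in these messages are of different kinds. By the usual shell convention, a code above 128 means the process was killed by signal (code - 128), so 139 decodes to signal 11, which is SIGSEGV on Linux. A code such as 62 is an ordinary status chosen by the process itself. A small sketch of that arithmetic; the helper name `signalFromExitCode` is hypothetical.

```go
package main

import "fmt"

// signalFromExitCode returns the signal number encoded in an exit code,
// using the convention exit = 128 + signal. The boolean is false when the
// code is an ordinary exit status rather than a signal.
func signalFromExitCode(code int) (int, bool) {
	if code > 128 && code < 256 {
		return code - 128, true
	}
	return 0, false
}

func main() {
	for _, code := range []int{139, 62} {
		if sig, ok := signalFromExitCode(code); ok {
			fmt.Printf("exit %d: terminated by signal %d\n", code, sig)
		} else {
			fmt.Printf("exit %d: status set by the process itself\n", code)
		}
	}
}
```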
Gonçalo Rodrigues
92fc9843c2 fix(gitea): use Recreate strategy to prevent LevelDB lock conflict
SQLite and LevelDB can't be accessed by two pods simultaneously.
RollingUpdate starts a new pod before the old one stops, causing
the queue lock to fail on startup. Recreate terminates the old
pod first.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-26 22:08:24 +01:00
Gonçalo Rodrigues
6dd7592ac9 fix(gitea): add TLS, scheme helper, and Skaffold registry config (#41)
Changes from PR #40 that didn't make it into main:
- local.scheme derived from var.domain (http for homelab.local, https otherwise)
- Gitea ROOT_URL and runner bootstrap URLs use local.scheme
- Gitea Helm ingress gets TLS + letsencrypt certresolver annotations
- Skaffold CI profile sets defaultRepo=git.gugagr.xyz/admin

Co-authored-by: Gonçalo Rodrigues <guga@Goncalos-MacBook-Pro.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-26 22:06:06 +01:00
Gonçalo Rodrigues
d4ccff518e feat: switch to gugagr.xyz with TLS via Let's Encrypt (#39)
Adds a Traefik Helm release (kube-system) with the ACME HTTP-01 challenge
configured for Let's Encrypt, replacing the bundled Traefik, which is
disabled in k3s.

Migrates all hostnames from *.homelab.local to *.gugagr.xyz and upgrades
all ingresses to HTTPS with certresolver=letsencrypt annotations.

Adds var.domain (default homelab.local) to Terraform so the domain is
a single config point for monitoring and Gitea ingresses.

Gateway reads DOMAIN env var at runtime — falls back to homelab.local
so local k3d dev continues to work without changes.

Co-authored-by: Gonçalo Rodrigues <guga@Goncalos-MacBook-Pro.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-26 21:45:19 +01:00
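Note on d4ccff518e: the runtime fallback described for the gateway could look roughly like the sketch below. This is an illustration under assumptions, not the gateway's actual code: the names `domainFrom`, `hostFor` and `defaultDomain` are hypothetical, and the lookup function is injected only so the behaviour can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// defaultDomain keeps local k3d development working when DOMAIN is unset.
const defaultDomain = "homelab.local"

// domainFrom returns the DOMAIN value from lookup, or defaultDomain when the
// variable is missing or empty.
func domainFrom(lookup func(string) (string, bool)) string {
	if v, ok := lookup("DOMAIN"); ok && v != "" {
		return v
	}
	return defaultDomain
}

// hostFor builds a per-service hostname such as finance.gugagr.xyz.
func hostFor(service, domain string) string {
	return service + "." + domain
}

func main() {
	domain := domainFrom(os.LookupEnv)
	fmt.Println(hostFor("finance", domain))
}
```

Treating an empty DOMAIN the same as an unset one is a design choice in this sketch; it avoids generating hostnames like "finance." from a blank value.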
Gonçalo Rodrigues
8436295bbc feat(infra): gate observability stack behind var.enable_monitoring (#38)
Adds enable_monitoring variable (default true) that controls whether
Prometheus/Grafana, Loki, Fluent Bit, and Jaeger are deployed.
Setting it to false saves ~1.5 GB RAM, making the stack viable on
a 2–4 GB VPS without touching the application services.

Also caps MongoDB WiredTiger cache at 256 MB (--wiredTigerCacheSizeGB=0.25)
so it doesn't balloon on memory-constrained hosts.

Co-authored-by: Gonçalo Rodrigues <guga@Goncalos-MacBook-Pro.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-26 17:44:14 +01:00
Gonçalo Rodrigues
464bde2ee6 chore: update Makefiles for Skaffold-based workflow
Root Makefile:
- Replace deploy-*/deploy-all/restart-all with skaffold dev/run
- Add dev-<service> targets for per-service watch mode
- Rename dev → skaffold dev (was: up+infra+deploy-all)
- Rename to bootstrap for the full first-time setup
- Add test-integration target

service.mk:
- Remove REGISTRY variable (image is now homelab/<svc>, no registry prefix)
- Remove skaffold-gen (skaffold.yaml files are committed)
- Update skaffold-dev/run to pass -p local
- Keep build-deploy as a manual fallback

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-20 16:34:08 +01:00
Gonçalo Rodrigues
a7ba0a9dd6 refactor(infra): gate Gitea and act-runner behind var.enable_gitea
All Gitea and runner resources use count = var.enable_gitea ? 1 : 0
(or for_each with an empty set when false). The gitea namespace is
conditionally included. Default is false.

To enable: terraform apply -var enable_gitea=true

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-20 16:14:57 +01:00
Gonçalo Rodrigues
c3b7003725 chore(infra): disable Gitea and act-runner — postponed until dedicated server
Empties gitea.tf and act-runner.tf so terraform apply removes all Gitea
and runner resources. Drops the gitea namespace from the managed list.
Full config preserved in git history.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-20 16:06:32 +01:00
Gonçalo Rodrigues
f5c08d6f02 fix: add git.homelab.local registry prefix and imagePullSecrets to all app deployments
auth/gateway, auth/users, and test/example-service were referencing
images without a registry prefix, causing k8s to fall back to Docker Hub
(which doesn't have these images).

Also generalises the gitea-registry imagePullSecret to all app namespaces
(auth, finance, home, test) via a for_each in Terraform.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-20 16:01:55 +01:00
Gonçalo Rodrigues
e39840cca2 fix(infra): use GET not POST for Gitea runner registration token API
The endpoint GET /api/v1/admin/runners/registration-token returns the
token — POST returns 405. Bootstrapper was silently failing, leaving
the secret empty and the act-runner unable to register.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-20 15:49:26 +01:00
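Note on e39840cca2: the silent failure can be avoided by treating any non-200 response as an error. The sketch below assumes basic auth with the admin account and a JSON response of the form {"token": "..."}; both are assumptions, as are the names `requestRegistrationToken`, `fakeGitea` and `newFakeClient`. The fake transport stands in for a Gitea instance so the example runs without a server.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

const tokenPath = "/api/v1/admin/runners/registration-token"

// requestRegistrationToken calls the endpoint with the given method and
// returns an error for any non-200 status instead of an empty token.
func requestRegistrationToken(c *http.Client, method, baseURL, user, pass string) (string, error) {
	req, err := http.NewRequest(method, baseURL+tokenPath, nil)
	if err != nil {
		return "", err
	}
	req.SetBasicAuth(user, pass) // assumption: admin basic auth
	resp, err := c.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("registration token request failed: %s", resp.Status)
	}
	var body struct {
		Token string `json:"token"` // assumption: response field name
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return "", err
	}
	if body.Token == "" {
		return "", fmt.Errorf("registration token is empty")
	}
	return body.Token, nil
}

// fakeGitea is a stand-in transport: GET returns a token, other methods 405.
type fakeGitea struct{ token string }

func (f fakeGitea) RoundTrip(r *http.Request) (*http.Response, error) {
	rec := httptest.NewRecorder()
	switch {
	case r.URL.Path != tokenPath:
		rec.WriteHeader(http.StatusNotFound)
	case r.Method != http.MethodGet:
		rec.WriteHeader(http.StatusMethodNotAllowed)
	default:
		json.NewEncoder(rec).Encode(map[string]string{"token": f.token})
	}
	return rec.Result(), nil
}

func newFakeClient(token string) *http.Client {
	return &http.Client{Transport: fakeGitea{token: token}}
}

func main() {
	c := newFakeClient("example-token")
	for _, m := range []string{http.MethodPost, http.MethodGet} {
		tok, err := requestRegistrationToken(c, m, "http://gitea.example", "admin", "secret")
		fmt.Printf("%s: token=%q err=%v\n", m, tok, err)
	}
}
```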
Gonçalo Rodrigues
07c2dc3ecb feat(infra): auto-generate Gitea admin password and runner token
- Replace var.gitea_admin_password with random_password (like Grafana)
- Replace var.gitea_runner_token with terraform_data bootstrapper that
  calls the Gitea admin API after first deploy and patches the secret
- Empty variables.tf — no manual secrets needed on terraform apply

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-20 15:43:10 +01:00
Gonçalo Rodrigues
dee8b5b40a fix(infra): simplify Gitea to SQLite + in-process — drop PostgreSQL and Valkey
Removes 6 pods (3x postgresql-ha, 1x pgpool, 2x valkey-cluster) in favour
of SQLite for the database, a LevelDB queue, and in-memory cache and
session storage. Appropriate for a single-user homelab instance with no
HA requirements.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-20 15:34:28 +01:00
Gonçalo Rodrigues
3c981b6ba4 fix(infra): bump Gitea chart 10.x → 12.x to fix ImagePullBackOff
Chart 10.x pinned bitnami/redis-cluster:7.2.3-debian-11 and
bitnami/postgresql-repmgr:16.1.0-debian-11 — both removed from
Docker Hub by Bitnami. Chart 12.x replaces Redis with Valkey and
uses bitnamilegacy/ images that are still available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-20 15:29:37 +01:00
Gonçalo Rodrigues
079ffae90b fix(infra): remove double-dollar escape in Fluent Bit label_keys
In Terraform quoted strings $var is literal — only ${var} triggers
interpolation. The $$ was passing through as literal $$kube_* to
Fluent Bit, causing a record accessor syntax error on startup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-20 15:23:46 +01:00
Gonçalo Rodrigues
99ed992d98 obs: request access log middleware + Loki label enrichment (#36)
Adds two targeted observability improvements across all homelab services.

pkg/logger/access.go (new)
  HTTP access log middleware that logs one structured line per request:
    method, path, status, ms, trace_id
  The trace_id comes from the OTel span already in context (created by
  trace.Middleware which runs outside this one), so each log entry in
  Loki has a clickable link into Jaeger. Health/metrics endpoints are
  excluded to avoid noise. Level is ERROR for 5xx, WARN for 4xx, INFO
  otherwise.

pkg/setup/setup.go
  Wire the new middleware between trace.Middleware (which creates the
  span) and metrics.Middleware:
    trace → AccessMiddleware → metrics → mux
  Order matters: span must exist before AccessMiddleware reads it.

infrastructure/terraform/monitoring.tf
  Fluent Bit was shipping all container logs to Loki with a single
  static label (job=fluent-bit), making it impossible to filter logs
  by service. Added a `nest/lift` filter that flattens the kubernetes
  metadata block to top-level fields (kube_namespace_name,
  kube_container_name, …), then promoted those as Loki label_keys.
  After this change you can query:
    {kube_namespace_name="finance"} |= "trace_id"
  and LogQL will only return finance-api logs.

Co-authored-by: Gonçalo Rodrigues <guga@Goncalos-MacBook-Pro.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-20 15:15:06 +01:00
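Note on 99ed992d98: a minimal sketch of an access log middleware along the lines described, not the contents of pkg/logger/access.go. It uses the standard library's log/slog; the trace ID lookup is passed in as a function because the real code reads it from the OTel span, which is not reproduced here. The skipped paths /healthz and /metrics and all identifiers are assumptions.

```go
package main

import (
	"context"
	"log/slog"
	"net/http"
	"net/http/httptest"
	"os"
	"time"
)

// statusRecorder remembers the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// levelForStatus maps 5xx to ERROR, 4xx to WARN, and everything else to INFO.
func levelForStatus(status int) slog.Level {
	switch {
	case status >= 500:
		return slog.LevelError
	case status >= 400:
		return slog.LevelWarn
	default:
		return slog.LevelInfo
	}
}

// skipPath excludes noisy endpoints; the exact paths are hypothetical.
func skipPath(p string) bool {
	return p == "/healthz" || p == "/metrics"
}

// AccessMiddleware logs one structured line per request after it completes.
func AccessMiddleware(log *slog.Logger, traceID func(context.Context) string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if skipPath(r.URL.Path) {
			next.ServeHTTP(w, r)
			return
		}
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)
		log.Log(r.Context(), levelForStatus(rec.status), "request",
			"method", r.Method,
			"path", r.URL.Path,
			"status", rec.status,
			"ms", time.Since(start).Milliseconds(),
			"trace_id", traceID(r.Context()),
		)
	})
}

func main() {
	log := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	noTrace := func(context.Context) string { return "" } // placeholder lookup
	h := AccessMiddleware(log, noTrace, http.NotFoundHandler())
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/missing", nil))
}
```

Because the middleware reads the trace ID from the request context, it has to run inside whatever middleware creates the span, which matches the ordering constraint stated in the commit.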
Gonçalo Rodrigues
91796c9fb9 test(finance): expand unit test coverage from ~55% to 64.7% (#34)
* infra(terraform): manage finance session secret via random_password

Replace the hand-rolled variable (with insecure hardcoded default) with a
random_password resource so Terraform auto-generates a 48-char secret and
owns the finance-api-secrets k8s Secret lifecycle.

To rotate: terraform taint random_password.finance_session_secret && terraform apply

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(finance): active sessions panel + account deletion with full data purge

Sessions panel (/account):
- AuthSession now stores IPAddress and Device (browser + OS hint)
  populated from X-Forwarded-For / User-Agent on every login
- Lists all active sessions with device icon, IP, sign-in time
- Current session badge ("This device") — cannot be self-revoked
- DELETE /sessions/:id revokes any other session (user-scoped)

Account deletion (POST /account/delete):
- Password accounts require password confirmation
- OAuth accounts require typing email address to confirm
- deleteAllUserData purges all 12 finance collections + user record
  in a single call: accounts, categories, transactions, trades,
  ticker_mappings, goals, import_schedules, properties, loans,
  permissions, households, sessions → then the user itself
- Clears session cookie and redirects to login with success message

Infrastructure:
- findAuthUserByID added to store + storeIface
- getSessionsByUserID, deleteSessionForUser added to store + storeIface
- contains() added to template FuncMap
- accountTmpl registered; GET /account, POST /account/delete,
  DELETE /sessions/:id routes wired
- 🔐 nav icon links to /account page
- Full EN + PT i18n coverage for all new strings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
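
The two revocation rules above (user-scoped deletion, and no self-revocation of the current session) can be sketched with an in-memory store. This is an illustration only: the real store is MongoDB-backed, and apart from the deleteSessionForUser name every type and helper here is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

type session struct {
	ID, UserID, Device, IP string
}

var (
	errCurrentSession = errors.New("cannot revoke the current session")
	errNotFound       = errors.New("session not found")
)

// sessionStore is an in-memory stand-in for the real store.
type sessionStore struct {
	byID map[string]session
}

// deleteSessionForUser removes a session only if it belongs to userID, so a
// user can never revoke someone else's session by guessing its ID.
func (s *sessionStore) deleteSessionForUser(userID, sessionID string) error {
	sess, ok := s.byID[sessionID]
	if !ok || sess.UserID != userID {
		return errNotFound
	}
	delete(s.byID, sessionID)
	return nil
}

// revokeSession refuses to revoke the session the request came from.
func revokeSession(s *sessionStore, userID, currentID, targetID string) error {
	if targetID == currentID {
		return errCurrentSession
	}
	return s.deleteSessionForUser(userID, targetID)
}

// newDemoStore returns a store with two sessions for u1 and one for u2.
func newDemoStore() *sessionStore {
	return &sessionStore{byID: map[string]session{
		"s1": {ID: "s1", UserID: "u1", Device: "Firefox on Linux", IP: "10.0.0.1"},
		"s2": {ID: "s2", UserID: "u1", Device: "Safari on macOS", IP: "10.0.0.2"},
		"s3": {ID: "s3", UserID: "u2", Device: "Chrome on Android", IP: "10.0.0.3"},
	}}
}

func main() {
	st := newDemoStore()
	fmt.Println(revokeSession(st, "u1", "s1", "s1")) // current session
	fmt.Println(revokeSession(st, "u1", "s1", "s3")) // another user's session
	fmt.Println(revokeSession(st, "u1", "s1", "s2")) // own other session
}
```

Returning the same "not found" error for a missing session and for another user's session avoids revealing which session IDs exist.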

* test(finance): expand unit test coverage from ~55% to 64.7%

- Add handler_coverage_test.go (~3300 lines) covering auth flows,
  org request lifecycle, CSV bank import, property/loan views,
  fiscal year operations, session management, and cross-handler
  consistency (values shown on one page match actions on others)
- Add handler_org_test.go (~1800 lines) covering the full org
  handler surface: teams, members, invites, events, budget lines,
  tx requests (all status transitions), ledger, analysis, and reports
- Extend handler_test.go mockStore with: properties/loans slice fields,
  authUsers map with session-aware lookup, household field, org maps,
  and updateFiscalYearStatusErr for error-path testing
- Fix nav bar: Business and Account links now show active state and
  use i18n keys (removes hardcoded emoji); add account key to en/pt locales

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Gonçalo Rodrigues <guga@Goncalos-MacBook-Pro.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-20 15:07:29 +01:00
Gonçalo Rodrigues
05dd725579 feat(infra): Gitea self-hosted CI/CD + MongoDB PVC + registry pipeline (#28)
* fix(k8s): expose / without auth so homepage is publicly reachable

Adds a second Ingress (api-public) for the exact path / with no
forward-auth middleware. Traefik prefers the Exact match for the root,
while the Prefix ingress (with auth) still protects all other routes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: homepage renders correctly at / for unauthenticated visitors

Two fixes:
1. Added parseStandalone() helper — parseTmpl() roots on "" but ParseFS()
   stores standalone (no {{define}}) files under their base filename, so
   Execute() ran the empty root and returned Content-Length: 0.
2. Added router.priority: 100 annotation to api-public ingress so Traefik
   picks the Exact / rule over the Prefix / rule (Traefik ranks by rule
   string length by default, which made PathPrefix beat Path).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
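
The template behaviour behind fix 1 can be reproduced with the standard library alone. The sketch below assumes parseTmpl() amounts to template.New("").ParseFS(...) followed by Execute, which is a guess at the original code; the file name home.html and both helper functions are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
	"testing/fstest"
)

// demoFS holds one standalone file with no {{define}} block.
var demoFS = fstest.MapFS{
	"home.html": {Data: []byte("<h1>Hello, {{.}}</h1>")},
}

// renderRoot parses under an empty-named root and executes that root.
// ParseFS registers the file as "home.html", so the root "" stays empty
// and Execute returns an error without writing any output.
func renderRoot() (string, error) {
	t, err := template.New("").ParseFS(demoFS, "home.html")
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	err = t.Execute(&buf, "visitor")
	return buf.String(), err
}

// renderStandalone executes the template by its base filename instead.
func renderStandalone(name string) (string, error) {
	t, err := template.ParseFS(demoFS, name)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	err = t.ExecuteTemplate(&buf, name, "visitor")
	return buf.String(), err
}

func main() {
	out, err := renderRoot()
	fmt.Printf("root:       %q (err: %v)\n", out, err)
	out, err = renderStandalone("home.html")
	fmt.Printf("standalone: %q (err: %v)\n", out, err)
}
```

If a handler ignores the error from Execute, the client sees an empty body, which would be consistent with the Content-Length: 0 symptom described above.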

* fix(k8s): remove forward-auth middleware from finance ingress

The app now handles its own auth at /auth/login — Traefik no longer
needs to forward-auth requests, which was causing redirects to
auth.homelab.local instead of finance.homelab.local.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(auth): harden authentication for cloud deployment

1. Secure cookie flag — set when BASE_URL starts with https://
2. SameSite=Strict on session cookie (was Lax)
3. Rate limiter — per-IP, 10 failures → 15-min lockout, auto-cleanup goroutine
4. Session rotation on login — old session deleted before issuing new one
   (prevents session fixation attacks)
5. bcrypt cost 12 (was DefaultCost = 10; OWASP minimum for cloud)
6. Security headers middleware on all responses:
   X-Content-Type-Options, X-Frame-Options, Referrer-Policy,
   Permissions-Policy, Content-Security-Policy, HSTS (when HTTPS)
7. Structured audit logging — login success/failure/lockout with IP + email
8. Google OAuth state cookie gets Secure flag too

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
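
Item 3 in the list above (per-IP, 10 failures, 15-minute lockout, periodic cleanup) could be sketched as follows. This is one possible shape under stated assumptions, not the committed implementation: all identifiers are hypothetical, the failure counter resets when a lockout starts, and the clock is passed in explicitly so the behaviour is deterministic.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const (
	maxFailures = 10
	lockout     = 15 * time.Minute
)

type entry struct {
	failures    int
	lockedUntil time.Time
}

// loginLimiter tracks failed logins per client IP.
type loginLimiter struct {
	mu   sync.Mutex
	byIP map[string]*entry
}

func newLoginLimiter() *loginLimiter {
	return &loginLimiter{byIP: make(map[string]*entry)}
}

// recordFailure counts a failed login; the 10th failure starts a lockout.
func (l *loginLimiter) recordFailure(ip string, now time.Time) {
	l.mu.Lock()
	defer l.mu.Unlock()
	e := l.byIP[ip]
	if e == nil {
		e = &entry{}
		l.byIP[ip] = e
	}
	e.failures++
	if e.failures >= maxFailures {
		e.lockedUntil = now.Add(lockout)
		e.failures = 0
	}
}

// locked reports whether ip is inside its lockout window at time now.
func (l *loginLimiter) locked(ip string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	e := l.byIP[ip]
	return e != nil && now.Before(e.lockedUntil)
}

// cleanup drops entries with no pending failures and no active lockout.
// A background goroutine could call this from a time.Ticker loop.
func (l *loginLimiter) cleanup(now time.Time) {
	l.mu.Lock()
	defer l.mu.Unlock()
	for ip, e := range l.byIP {
		if e.failures == 0 && !now.Before(e.lockedUntil) {
			delete(l.byIP, ip)
		}
	}
}

// demoClock returns a fixed base time plus the given number of minutes.
func demoClock(minutes int) time.Time {
	base := time.Date(2026, 6, 15, 12, 0, 0, 0, time.UTC)
	return base.Add(time.Duration(minutes) * time.Minute)
}

func main() {
	l := newLoginLimiter()
	for i := 0; i < maxFailures; i++ {
		l.recordFailure("203.0.113.7", demoClock(0))
	}
	fmt.Println("locked after 14 min:", l.locked("203.0.113.7", demoClock(14)))
	fmt.Println("locked after 15 min:", l.locked("203.0.113.7", demoClock(15)))
}
```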

* feat(infra): Gitea self-hosted CI/CD + MongoDB PVC + registry pipeline

- Add Gitea Helm deployment (git hosting, container registry, Gitea Actions)
- Add act runner with DinD sidecar for Docker builds in-cluster
- Add RBAC so act runner can kubectl-deploy to finance namespace
- Fix MongoDB StatefulSet: add volumeClaimTemplates (data was lost on restart)
- Configure k3d containerd to mirror git.homelab.local → Gitea NodePort 30002
- Add .gitea/workflows/finance-api.yml: test → build/push → rolling deploy
- Update finance-api deployment: Gitea registry image, imagePullPolicy Always
- Extract finance-api secrets (SESSION_SECRET, Google OAuth) into Terraform
- Add variables.tf for Gitea admin password and runner token

All changes testable on local k3d before the VPS exists.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Gonçalo Rodrigues <guga@Goncalos-MacBook-Pro.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 21:45:34 +01:00
Gonçalo Rodrigues
85930ef40f ci: switch to self-hosted runner with local k3d image import
Removes all ghcr.io and registry dependencies. Workflows now build
images locally, import them into k3d, and deploy with kubectl set image
— all on the self-hosted runner which already has Docker and kubectl.

Also removes the github Terraform provider and ci.tf since no registry
pull secrets or GitHub Actions secrets are needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-13 14:33:12 +01:00
Gonçalo Rodrigues
e018e627e3 infra: manage CI secrets and ghcr.io pull credentials via Terraform
Adds github provider + ci.tf which provisions:
- KUBECONFIG GitHub Actions secret (from local kubeconfig)
- ghcr-credentials k8s pull secret in finance and auth namespaces

Run `terraform apply -var github_token=<PAT>` once after cluster setup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-13 14:21:15 +01:00
Gonçalo Rodrigues
13b7149614 First Commit
2026-06-13 11:25:23 +01:00