
Agent Workers

Agent Workers are a Cloud plans and Self-Hosted Enterprise feature.

Agent Workers are a fourth execution mode of the Windmill binary: instead of MODE=worker, they use MODE=agent. This mode is most useful when workers are deployed remotely, in insecure environments and/or with high latency to the PostgreSQL database. Note that agent workers can even run without Docker on Linux, Windows and Apple targets, using the cross-compiled binaries of Windmill attached to each release.

The difference is that an agent worker:

  1. will NOT emit noisy warning logs about high latency to the database
  2. will NOT start the embedded server API like normal workers do (the embedded server API is a performance optimization)
  3. (optional) will NOT create an ephemeral token per job, and will instead use a pre-defined static token that must only have access to the workspaces and folders this remote worker is meant to have access to

In this setting, workers still require a database connection, passed with DATABASE_URL, as this is how they:

  • fetch jobs from the queue,
  • emit audit logs to the database,
  • insert finished jobs into the completed_job table.

All other operations are done through the API servers, which the remote workers connect to over HTTP using the BASE_URL defined in the instance settings (overridable by passing BASE_INTERNAL_URL=https://private_instance_url:8000 to the worker).

(Optional) Restrict the role used by the worker to the strict minimum

Not having access to the other tables allows restricting the PostgreSQL role of the DATABASE_URL used by the worker to the strict minimum, mitigating any risks for the rest of the system even in the event of a compromise of the remote worker.

One such role could be defined as follows:

CREATE USER agent WITH PASSWORD 'changeme';
GRANT SELECT, UPDATE, DELETE ON queue TO agent;
GRANT SELECT, INSERT, UPDATE ON completed_job TO agent;
GRANT SELECT ON config TO agent;
GRANT SELECT ON global_settings TO agent;
GRANT SELECT, INSERT, UPDATE ON worker_ping TO agent;
GRANT ALL ON concurrency_counter TO agent;
GRANT SELECT, INSERT ON pip_resolution_cache TO agent;

CREATE POLICY agent ON queue TO agent USING (tag = '<x>' OR tag = '<y>');
CREATE POLICY agent ON completed_job TO agent USING (true);
CREATE POLICY agent ON audit FOR INSERT TO agent WITH CHECK (true);

Note that <x> and <y> need to be replaced with the exact job tags this agent is meant to use.
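Also note that in PostgreSQL, policies have no effect until row-level security is enabled on the table. If your instance does not already enable it, a minimal sketch (be aware that this makes the policies apply to every other non-owner role as well):

ALTER TABLE queue ENABLE ROW LEVEL SECURITY;
ALTER TABLE completed_job ENABLE ROW LEVEL SECURITY;
ALTER TABLE audit ENABLE ROW LEVEL SECURITY;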

Option 1: Using a static job token to avoid having to write complex RLS policies

  1. Create a user with all the permissions that jobs on this remote worker will ever need:

    1. Instance settings -> Global users -> Add user to instance
    2. Log in as that user and create a token
  2. Run the binary or the Docker image on the target with:

DATABASE_URL=postgres://agent:changeme@localhost/windmill JOB_TOKEN=<STATIC_TOKEN> BASE_INTERNAL_URL=http://private_instance_url:8000 MODE=agent ./windmill
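Once the worker is up, you can sanity-check the restricted role from psql (a hedged smoke test, assuming the role and policies above were created and that you are connected as a superuser or a member of agent):

SET ROLE agent;
SELECT count(*) FROM queue;  -- allowed, and filtered by the tag policy
SELECT * FROM token;         -- should fail with a permission denied error
RESET ROLE;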

Option 2: Define an RLS policy to the token table

The token table is the primary mechanism through which privilege escalation is possible, which is why Option 1 is recommended. But if writing RLS policies is your cup of tea, then you can avoid creating a job token and proceed as follows:

Note that the token table has the following schema:

token | label | expiration | workspace_id | owner | email | super_admin | created_at | last_used_at | scopes
GRANT ALL ON token TO agent;
CREATE POLICY agent ON token TO agent USING (<your custom defined policy>);

You will most definitely want to prevent the policy from allowing super_admin to be set to true, and to limit owner to a small set of allowed owners.
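As a purely illustrative sketch, one way to fill in the placeholder above could be the following (the owners u/agent_runner and g/agents are hypothetical examples, not names Windmill defines):

CREATE POLICY agent ON token TO agent
USING (true)
WITH CHECK (
  -- never allow the agent to mint a super admin token
  super_admin IS NOT TRUE
  -- only allow tokens for a restricted set of owners
  AND owner IN ('u/agent_runner', 'g/agents')
);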