
Persistent Storage & Databases

Persistent storage refers to any method of storing data that remains intact and accessible even after a system is powered off, restarted, or experiences a crash.

In the context of Windmill, the question is where to effectively store and manage the data that Windmill manipulates (ETL, data ingestion and preprocessing, data migration and sync, etc.).

TLDR

It is recommended to store only Windmill-specific elements (resources, variables, etc.) on Windmill itself. To store the data that Windmill manipulates, use external storage service providers that can be accessed from Windmill.


This document lists trusted services to use alongside Windmill.


There are 4 kinds of persistent storage in Windmill:

  1. Small data that is relevant between script or flow executions and can be persisted on Windmill itself.

  2. Object storage, such as S3, for large data.

  3. Large structured SQL data that is critical to your services and is stored externally in a SQL database or data warehouse.

  4. NoSQL and document databases, such as MongoDB, and key-value stores.

You already have your own database

If you already have a database from a provider that is part of our list of integrations, you can connect it to Windmill by adding it as a resource.

If your service provider is not already integrated with Windmill, you can create a new resource type to establish the connection (and if you want, share the schema on our Hub).
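As a rough illustration of what a new resource type might contain, here is a minimal sketch. It assumes a resource type is described by a JSON-Schema-like object; the name `MY_DATABASE_RESOURCE_TYPE`, every field in it, and the `missing_fields` helper are hypothetical and not taken from Windmill.

```python
from typing import Any

# Hypothetical resource type for a database provider that is not yet integrated.
# All field names and defaults below are illustrative assumptions.
MY_DATABASE_RESOURCE_TYPE: dict[str, Any] = {
    "type": "object",
    "properties": {
        "host": {"type": "string"},
        "port": {"type": "integer", "default": 5432},
        "user": {"type": "string"},
        "password": {"type": "string"},
        "dbname": {"type": "string"},
    },
    "required": ["host", "user", "password", "dbname"],
}


def missing_fields(schema: dict[str, Any], resource: dict[str, Any]) -> list[str]:
    """Return the required fields of the schema that the resource does not provide."""
    return [name for name in schema.get("required", []) if name not in resource]
```

A resource that fills in every required field can then be checked before use, for example `missing_fields(MY_DATABASE_RESOURCE_TYPE, {"host": "db.example.com", "user": "etl"})` lists the fields still to be supplied.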

Windmill is not designed to store heavy data that must outlive the execution of a script or flow. The worker that runs one computation may not be the worker that ran the previous one, so any data kept on a worker would have to be retrieved from another location.

Instead, Windmill works well alongside data storage providers for manipulating large amounts of data.
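The point about workers can be sketched with a small simulation. This is not Windmill code: `Worker`, `ExternalStore`, and `run_demo` are hypothetical stand-ins, with a temporary directory playing the role of a worker's local disk and an in-memory dictionary playing the role of an external storage provider.

```python
import json
import shutil
import tempfile
from pathlib import Path
from typing import Optional


class Worker:
    """Hypothetical stand-in for a worker with its own private scratch directory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.scratch = Path(tempfile.mkdtemp(prefix=f"{name}-"))

    def write_local(self, filename: str, payload: dict) -> None:
        (self.scratch / filename).write_text(json.dumps(payload))

    def read_local(self, filename: str) -> Optional[dict]:
        path = self.scratch / filename
        return json.loads(path.read_text()) if path.exists() else None


class ExternalStore:
    """Hypothetical stand-in for an external storage provider shared by all workers."""

    def __init__(self) -> None:
        self._objects: dict[str, str] = {}

    def put(self, key: str, payload: dict) -> None:
        self._objects[key] = json.dumps(payload)

    def get(self, key: str) -> Optional[dict]:
        raw = self._objects.get(key)
        return json.loads(raw) if raw is not None else None


def run_demo() -> tuple[Optional[dict], Optional[dict]]:
    """Write from one worker, then try to read from a different worker."""
    store = ExternalStore()
    first, second = Worker("worker-a"), Worker("worker-b")
    batch = {"rows": [1, 2, 3]}

    first.write_local("batch.json", batch)  # only visible on worker-a
    store.put("batch.json", batch)          # visible from any worker

    result = (second.read_local("batch.json"), store.get("batch.json"))
    for worker in (first, second):
        shutil.rmtree(worker.scratch)       # clean up the scratch directories
    return result
```

In this sketch the second worker finds nothing in its own scratch directory but can read the batch back from the shared store, which is the reason for keeping heavy data with an external provider.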

There are, however, built-in methods for persisting data between job executions.


Object Storage for Large Data: S3, R2, MinIO, Azure Blob

For heavier data objects and unstructured data, Amazon S3 (Simple Storage Service), its alternatives Cloudflare R2 and MinIO, and Azure Blob Storage are highly scalable and durable object storage services that provide secure, reliable, and cost-effective storage for a wide range of data types and use cases.

Windmill comes with a native integration with S3 and Azure Blob, making it the recommended storage for large objects like files and binary data.

Figure: S3 integration infographic


Structured SQL Data: Postgres (Supabase, Neon.tech)

For Postgres databases (best for structured data storage and retrieval, where you can define a schema and relationships between entities), we recommend using Supabase or Neon.tech.
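To make "schema and relationships between entities" concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` module as a local stand-in so that it runs without a server; it is not Postgres and does not connect to Supabase or Neon.tech. The `customers` and `orders` tables and both function names are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two related tables (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
    conn.executescript(
        """
        CREATE TABLE customers (
            id    INTEGER PRIMARY KEY,
            email TEXT NOT NULL UNIQUE
        );
        CREATE TABLE orders (
            id           INTEGER PRIMARY KEY,
            customer_id  INTEGER NOT NULL REFERENCES customers(id),
            amount_cents INTEGER NOT NULL
        );
        """
    )
    return conn


def total_per_customer(conn: sqlite3.Connection) -> list:
    """Join the two tables and sum order amounts per customer email."""
    query = """
        SELECT c.email, SUM(o.amount_cents)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.email
        ORDER BY c.email
    """
    return conn.execute(query).fetchall()
```

The foreign key on `orders.customer_id` is the relationship: the database rejects an order that points at a customer that does not exist, and the join can rely on that guarantee.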


NoSQL and Document Databases (MongoDB, Key-Value Stores)

Key-value stores and document databases are a popular choice for managing unstructured data, offering a flexible and scalable solution for various data types and use cases. With Windmill, you can use MongoDB Atlas, Redis, and Upstash to store and manipulate unstructured data effectively.
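The flexibility described above can be sketched as follows. The class `InMemoryDocumentStore` is a hypothetical in-memory stand-in, not the API of MongoDB Atlas, Redis, or Upstash; it only shows that records stored under keys do not have to share a schema.

```python
import json
from typing import Any, Optional


class InMemoryDocumentStore:
    """Hypothetical stand-in for a key-value or document store (no fixed schema)."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def put(self, key: str, doc: dict[str, Any]) -> None:
        # Values are serialized as JSON, so each document can have its own shape.
        self._docs[key] = json.dumps(doc)

    def get(self, key: str) -> Optional[dict[str, Any]]:
        raw = self._docs.get(key)
        return json.loads(raw) if raw is not None else None

    def find(self, field: str, value: Any) -> list[str]:
        """Return the sorted keys of documents whose field equals value."""
        return sorted(
            key
            for key, raw in self._docs.items()
            if json.loads(raw).get(field) == value
        )
```

Two documents with different fields can sit side by side, for example a `user:1` record with a `plan` field and an `event:7` record without one, and `find("plan", "pro")` returns only the keys of documents that actually carry that field.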
