# Cache Pre-Generation

> Schedule dataset cache creation ahead of time so users query warm Parquet caches instead of cold source systems

By default, Liberator builds its cache the first time a query runs (cache warming). For large datasets or slow source databases, that first query can take a long time. **Pre-generated cache** lets Super Admins schedule cache creation ahead of time so data is ready when users query it.
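
The difference between cache warming and pre-generation can be illustrated with a small read-through cache. This is a minimal sketch of the general pattern, not Liberator's implementation: `DatasetCache`, `pregenerate`, `query`, and the in-memory store are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List

Rows = List[dict]


class DatasetCache:
    """Toy in-memory stand-in for a dataset cache (hypothetical, not Liberator's API)."""

    def __init__(self, load_from_source: Callable[[str], Rows]) -> None:
        self._load = load_from_source
        self._store: Dict[str, Rows] = {}
        self.source_reads = 0  # counts how often the slow source was hit

    def _fill(self, dataset: str) -> None:
        # Read from the source system and keep the result in the cache.
        self._store[dataset] = self._load(dataset)
        self.source_reads += 1

    def pregenerate(self, dataset: str) -> None:
        """Build the cache ahead of time, for example from a scheduled job."""
        self._fill(dataset)

    def query(self, dataset: str) -> Rows:
        """Serve from the cache; on a miss, warm the cache from the source first."""
        if dataset not in self._store:
            self._fill(dataset)  # cold path: the user waits for the source
        return self._store[dataset]
```

With cache warming alone, the first `query` call pays for the source read. After `pregenerate` has run, the first user query is already served from the cache and `source_reads` does not change.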

Cache files are written in **Parquet** format, which replaces the older Arrow cache format, and are stored in object storage such as **Amazon S3**, so cached data does not need to live on the Liberator server. **Google Cloud Storage** is also a possible target, but confirm its availability with your CloudQuant account team before relying on it. Parquet files are typically much smaller than the previous Arrow caches, which reduces storage cost at scale.

<Note>
  Super Admin role is required to configure cache storage connections and pre-generation settings.
</Note>

## Step 1 — Create a cache storage connection

Before enabling pre-generated cache on a dataset, create a cache storage destination.

<Steps>
  <Step title="Add a connection">
    Go to **Connections → Add Connection**.
  </Step>

  <Step title="Select cache storage">
    Select **Cache Storage** as the connection type.
  </Step>

  <Step title="Configure storage">
    Select your storage provider (**S3** is supported today; confirm GCS availability with your CloudQuant account team if needed). Enter the bucket name and credentials.
  </Step>

  <Step title="Test and save">
    Click **Test Connection**, then **Save**.
  </Step>
</Steps>
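
Before entering a bucket name in the connection form, it can help to check it locally. The sketch below checks a subset of Amazon S3's naming rules for general purpose buckets (length, allowed characters, no adjacent dots, not an IP address). `bucket_name_problems` is a hypothetical helper, not part of Liberator, and a clean result does not guarantee that **Test Connection** will succeed.

```python
import re
from typing import List

# Lowercase letters, digits, dots, and hyphens; must start and end with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
_IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def bucket_name_problems(name: str) -> List[str]:
    """Return a list of rule violations; an empty list means no problem was found."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("length must be 3-63 characters")
    if not _BUCKET_RE.match(name):
        problems.append(
            "use lowercase letters, digits, dots, and hyphens; "
            "start and end with a letter or digit"
        )
    if ".." in name:
        problems.append("must not contain adjacent dots")
    if _IPV4_RE.match(name):
        problems.append("must not look like an IPv4 address")
    return problems


if __name__ == "__main__":
    # Example names are made up for illustration.
    for candidate in ("liberator-cache-prod", "My_Bucket", "192.168.0.1"):
        print(candidate, bucket_name_problems(candidate) or "ok")
```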

## Step 2 — Enable pre-generated cache on a dataset

<Steps>
  <Step title="Open the dataset">
    Open the dataset and click **Edit**.
  </Step>

  <Step title="Advanced options">
    Scroll to **Advanced Options** and toggle on **Pre-Generated Cache**.
  </Step>

  <Step title="Format and destination">
    Select **Parquet** as the cache format (recommended). Under **Destination**, select the cache storage connection from Step 1.
  </Step>

  <Step title="Test destination">
    Click **Test** next to the destination to confirm connectivity, then save.
  </Step>
</Steps>

## Step 3 — Configure the cache window

| Option               | Use when                                                                                 |
| -------------------- | ---------------------------------------------------------------------------------------- |
| **Rolling window**   | Users mostly query recent data; keeps the last N days cached and advances automatically. |
| **Fixed date range** | You need a known historical slice that does not change.                                  |
| **Full dataset**     | You want complete coverage and the dataset is bounded in size.                           |

Enter the number of days (rolling) or the start/end dates (fixed), then save.
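
The three window options amount to different ways of choosing a date range. The sketch below assumes that a rolling window of N days includes the current day and that a full dataset has no bounds. `resolve_cache_window` and its mode strings are hypothetical, and Liberator's own boundary handling may differ.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def resolve_cache_window(
    mode: str,
    today: date,
    days: Optional[int] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> Tuple[Optional[date], Optional[date]]:
    """Return the inclusive (start, end) dates a cache window covers.

    (None, None) means unbounded, i.e. the full dataset.
    """
    if mode == "rolling":
        if days is None or days < 1:
            raise ValueError("rolling window needs days >= 1")
        # Assumption: the window includes today, so N days end today.
        return today - timedelta(days=days - 1), today
    if mode == "fixed":
        if start is None or end is None or start > end:
            raise ValueError("fixed range needs start <= end")
        return start, end
    if mode == "full":
        return None, None
    raise ValueError(f"unknown mode: {mode}")
```

Calling the function on consecutive days shows how a rolling window advances: a 30-day window evaluated on 2024-03-31 covers 2024-03-02 through 2024-03-31, and on 2024-04-01 it covers 2024-03-03 through 2024-04-01. A fixed range returns the same dates regardless of `today`.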

<Warning>
  Long or open-ended retention periods trigger a **storage impact warning** in the UI. Review the projected volume before saving: large Parquet cache windows can accumulate significant object storage over time.
</Warning>
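
A rough projection can also be done by hand before saving: window length multiplied by the average size of one day of cached data. The sketch below assumes a constant daily size, which you would need to measure on your own dataset. The function names and the 2 GiB per day figure are illustrative, not values reported by Liberator.

```python
def projected_cache_bytes(days: int, avg_bytes_per_day: int) -> int:
    """Estimate total cache size for a window, assuming a constant daily size."""
    if days < 0 or avg_bytes_per_day < 0:
        raise ValueError("days and avg_bytes_per_day must be non-negative")
    return days * avg_bytes_per_day


def human_size(num_bytes: int) -> str:
    """Format a byte count using binary units (KiB, MiB, GiB, TiB)."""
    units = ("B", "KiB", "MiB", "GiB", "TiB")
    size = float(num_bytes)
    index = 0
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    return f"{size:.1f} {units[index]}"


if __name__ == "__main__":
    daily = 2 * 1024**3  # assumed: about 2 GiB of Parquet per day
    for window_days in (30, 365):
        total = projected_cache_bytes(window_days, daily)
        print(f"{window_days} days -> {human_size(total)}")
    # prints "30 days -> 60.0 GiB" and "365 days -> 730.0 GiB"
```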

## Step 4 — Run or schedule generation

**Scheduled:** Once a schedule is configured on the dataset, cache generation runs automatically. No further action is required.

**Manual:** Open the dataset and click **Trigger Cache Pre-Generation**. A status indicator shows whether the job is in progress or complete.

## What users experience

After pre-generated cache is populated, queries against that dataset are much faster because Liberator reads the warm cache instead of the source system. This matters most for large SQL-backed datasets where the first cold query previously took minutes.

## Tips

* Use **rolling window** for datasets queried over recent windows (for example, the last 30 days of market data).
* Use **fixed date range** for historical snapshots that do not change.
* Use **full dataset** only when storage budget and dataset size are understood.
* The **Most queried datasets** view on [System Monitoring](/system-monitoring/overview) helps prioritize which datasets to pre-generate.