System monitoring

The System Monitoring page in the Liberator UI surfaces the most operationally relevant signals from the Liberator stack in a single place: cluster health, queue depth, dataset access patterns, long-running queries, and license usage. It’s intended for super-admins, on-call engineers, and capacity planners. End users do not see this page.
The System Monitoring page is backed by the same Prometheus instance you can connect Grafana to. See the Grafana integration guide if you want the same data alongside metrics from systems outside CloudQuant.

Opening system monitoring

  1. Sign in to the Liberator UI as a super-admin.
  2. Click System Monitoring in the top navigation.
If the menu item doesn’t appear, your account doesn’t have super-admin privileges. Contact your CloudQuant administrator.

Tabs

System Monitoring is organized into five tabs. The first four are backed by Prometheus and share a global time-range selector (6h / 24h) in the upper-right. The fifth is backed by the entitlements database and uses its own dedicated 1d / 1w / 1m / 1y selector.
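As a rough illustration of what the 6h / 24h selector means for anyone reproducing these views against Prometheus directly, the sketch below turns a selector label into the `start`, `end`, and `step` parameters that a Prometheus range query accepts. The function name, the `points` target, and the resulting step sizes are assumptions made for this example; they are not taken from the Liberator UI.

```python
from datetime import datetime, timedelta, timezone

# The two windows offered by the global selector on the Prometheus-backed tabs.
PROMETHEUS_WINDOWS = {"6h": timedelta(hours=6), "24h": timedelta(hours=24)}


def range_query_params(window: str, now: datetime, points: int = 360) -> dict:
    """Build start/end/step values for a Prometheus range query.

    `points` (hypothetical) is the approximate number of samples wanted
    per series; the step is derived from it and is not a product setting.
    """
    span = PROMETHEUS_WINDOWS[window]
    step_seconds = max(1, int(span.total_seconds() // points))
    return {
        "start": (now - span).timestamp(),
        "end": now.timestamp(),
        "step": step_seconds,
    }


params = range_query_params("6h", datetime.now(timezone.utc))
```

With these assumed defaults, a 6h window yields a 60-second step and a 24h window yields a 240-second step, which keeps the number of returned points per series roughly constant across both windows.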

Cluster

Real-time cluster health from the Liberator gateway, application pods, and host nodes. What you’ll see:
  • Gateway request rate and latency percentiles (p50 / p95 / p99)
  • Per-pod CPU and memory utilization for Liberator components
  • Per-node CPU, memory, and filesystem utilization
  • Data-cache worker pool status
Pay particular attention to disk usage on mounted volumes. Sustained usage at or above roughly 85% on any volume warrants immediate attention, because queries can fail when there is not enough space left to write.
Use this tab to answer: “Is the cluster behaving normally right now, and if not, where is the problem?”
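For teams that want to apply the disk-usage guidance above in their own tooling, here is a minimal sketch of a "sustained usage" check. It assumes you already have a series of used-space fractions for one volume (for example, sampled from your own Prometheus queries). Defining "sustained" as several consecutive samples at or above the threshold is an assumption of this example, as are the function and parameter names.

```python
from typing import Iterable

# "Roughly 85%" from the guidance above, expressed as a fraction.
DISK_USAGE_THRESHOLD = 0.85


def sustained_high_usage(
    samples: Iterable[float],
    threshold: float = DISK_USAGE_THRESHOLD,
    min_consecutive: int = 3,  # hypothetical definition of "sustained"
) -> bool:
    """Return True if at least `min_consecutive` samples in a row
    are at or above `threshold`."""
    run = 0
    for used_fraction in samples:
        if used_fraction >= threshold:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0  # a dip below the threshold resets the streak
    return False


alert = sustained_high_usage([0.80, 0.86, 0.87, 0.90])
```

Requiring a streak rather than a single sample avoids flagging short spikes, such as a temporary file that is written and then cleaned up.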

Queue

The Liberator waiting room queue — how many requests are queued, how long they’ve been waiting, and which users own them. What you’ll see:
  • Active connections (in flight) and queued connections (waiting)
  • Per-user breakdown of queue occupancy
  • Maximum in-flight query duration (a useful early-warning signal)
The CQAIOps service account often appears as a high-volume user; this reflects automated platform monitoring and is expected.
Use this tab to answer: “Is anyone being blocked, and by whom?”

Datasets

Dataset access patterns over the selected time range. What you’ll see:
  • Top datasets by query count
  • Top datasets by bytes returned
  • Distribution of access by client (Python, REST, Excel, etc.)
Use this tab for capacity planning and detecting anomalous access patterns (e.g. a previously dormant dataset suddenly receiving heavy traffic).

Long queries

The slowest individual queries in the selected window. What you’ll see:
  • A ranked list of queries with execution time, user, dataset, and from / to window
  • Click-through to see the full query text and result-set size
Use this tab to find candidates for query rewriting, dataset re-partitioning, or user education.

Usage

License utilization from the entitlements database. Distinct from the other tabs in two ways:
  1. Different selector. This tab exposes 1d / 1w / 1m / 1y windows instead of the Prometheus 6h / 24h, because license utilization is measured against per-contract caps that operate on much longer windows.
  2. Different backing store. Numbers come from the entitlements database, not Prometheus, so they survive Prometheus retention rollovers and reflect contract truth.
What you’ll see:
  • Active vs. licensed seat count, by license tier
  • Per-dataset utilization vs. contract caps
  • Trend lines that make it easy to spot accounts approaching their limits
  • Most queried datasets — use this to prioritize cache pre-generation
In some releases, the 1d / 1w selector labels on this tab may not match the underlying data, because the backing entitlements store may still aggregate monthly. Treat long-window utilization as directional until the label semantics match the aggregation period in your environment.

Grafana integration

The action in the upper-right of every tab opens the Grafana Integration dialog. Super-admins can use it to:
  • Issue Bearer tokens for external Prometheus consumers (the full token is shown exactly once at creation, so copy it immediately).
  • List existing tokens with their issue time and issuing user.
  • Revoke tokens that are no longer needed or may have leaked.
See the full setup walkthrough in the Grafana integration guide.

How the data flows

┌────────────────────────────────────────────────────────────────────┐
│  Liberator UI                                                      │
│  ┌────────────────────────┐    ┌──────────────────────────────┐    │
│  │  Cluster / Queue /     │    │  Usage tab                   │    │
│  │  Datasets / Long Q     │    │                              │    │
│  └──┬─────────────────────┘    └──┬───────────────────────────┘    │
└─────┼────────────────────────────┼────────────────────────────────┘
      │ /metrics-api/* (OIDC cookie)│ /admin-api/entitlements/*
      ▼                             ▼
┌──────────────┐              ┌──────────────────┐
│  Prometheus  │              │  Entitlements DB │
│  (read-only) │              │  (PostgreSQL)    │
└──────────────┘              └──────────────────┘

      ▲
      │ /metrics-api-bearer/* (Bearer token)
      │
┌──────────────┐
│  External    │
│  Grafana,    │
│  Federation  │
└──────────────┘
Both Prometheus-fronted endpoints (/metrics-api/* for the in-product UI and /metrics-api-bearer/* for external consumers) expose the same read-only subset of the Prometheus HTTP API. The OIDC-fronted route is what the in-product tabs use; the Bearer-fronted route is what Grafana and federated Prometheus servers use.
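To make the Bearer-fronted route concrete, the sketch below builds (but does not send) an instant-query request from an external script. It assumes that the standard Prometheus HTTP API path `/api/v1/query` is part of the exposed read-only subset and sits directly under `/metrics-api-bearer/`; confirm this against your deployment. The hostname, the token value, and the `up` query are placeholders.

```python
import urllib.parse
import urllib.request


def build_instant_query(base_url: str, token: str, promql: str) -> urllib.request.Request:
    """Prepare a GET request for a Prometheus instant query via the Bearer route.

    Assumption: the route forwards the usual /api/v1/query path unchanged.
    """
    query = urllib.parse.urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}/metrics-api-bearer/api/v1/query?{query}"
    # The token is the one issued from the Grafana Integration dialog.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Hypothetical host and token, for illustration only.
request = build_instant_query("https://liberator.example.com", "TOKEN", "up")
# To send it: urllib.request.urlopen(request, timeout=10)
```

Because the token is shown only once at creation, a script like this would normally read it from a secret store or environment variable rather than from source code.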