REST API Reference
Overview
Section titled “Overview”This page documents the Iceberg gateway, lineage, and webhook endpoints in detail — the commit data path, which is the surface most integrations write against. These are the parts whose request/response contracts and fail-closed semantics operators most need to get exactly right.
The full API is larger than this page. Neksur Core also exposes REST resources for contracts and the lifecycle, the semantic layer and metrics, detection, compliance, and FinOps, plus a GraphQL surface, an MCP server for AI agents, and the SQL-proxy /
pgwire/ XMLA read path. See Architecture → API surface for the map, andinternal/api/openapi.ymlin the source repo for the generated REST schema. The endpoints below are the stable, heavily-exercised core of that surface.
The gateway endpoints comprise four POST routes in three groups:
- Catalog Gateway (write path) — single-table and multi-table commit endpoints that proxy Apache Iceberg REST commits through the Neksur policy engine.
- Lineage ingestion — an OpenLineage v2 RunEvent receiver.
- Polaris webhook — an HMAC-signed callback that triggers the post-commit detection sweep.
All four routes are mounted on the same http.ServeMux in cmd/neksur-server/main.go. The first three (/v1/iceberg/* and /v1/lineage) are wrapped in workosauth.TenantMiddleware, which resolves the calling tenant and attaches the tenant UUID to the request context before any handler logic runs. The fourth (/v1/webhooks/polaris) is intentionally mounted outside the tenant middleware — the HMAC signature verification IS the authentication step (the per-tenant secret IS the principal), and tenant resolution happens against the signed payload only after verification succeeds.
All gateway routes are tenant-scoped via the URL {prefix} segment, which is a Neksur-side catalog alias (e.g. prod-polaris), not the upstream catalog’s own name. See Tenant prefix model below.
Related documents: Architecture describes the 14-step commit pipeline; Deployment covers operational alerting on the 503 fail-closed path.
Authentication
Section titled “Authentication”The L1 gateway and lineage endpoints derive a request principal from a three-step chain defined in internal/gateway/iceberg/principal.go (ExtractPrincipal). The chain is evaluated in priority order; the first step that yields a usable subject wins. The selected step is also recorded verbatim in the audit log’s principal_source column so SecOps can spot which path is in use per request.
Step 1 — mTLS client certificate SAN (preferred, service-to-service)
Section titled “Step 1 — mTLS client certificate SAN (preferred, service-to-service)”If the request arrived over TLS with a client certificate (r.TLS.PeerCertificates[0]), the handler extracts the first URI SAN from the certificate. If no URI SAN is present, it falls back to the first DNS SAN. The trust anchor is the Phase 0.5 Private CA; the upstream terminator (e.g. ALB with mTLS termination) is responsible for chain validation before forwarding.
The audit principal_source value for this path is mtls_san.
Step 2 — Authorization header (Bearer JWT, for human or SDK clients)
Section titled “Step 2 — Authorization header (Bearer JWT, for human or SDK clients)”If no mTLS principal was extracted, the handler reads the Authorization header. It expects the form:
Authorization: Bearer <jwt>The JWT is parsed without signature verification in Phase 1 — Neksur trusts that an upstream proxy or identity provider has already verified the token. The handler extracts:
sub— required; an emptysubcauses the request to fail closed with401 Unauthorized.email— optional; surfaced to CEL policies asprincipal.email.- Roles — pulled from
roles(as a string array or as a space-delimited string) or fromgroups(WorkOS convention). Surfaced to CEL policies asprincipal.roles.
The audit principal_source value for this path is auth_header.
Signed-token (JWKS) signature verification at this boundary is planned; today the gateway trusts an upstream proxy / IdP to have verified the token. Credential vending (compute isolation) is available as a Defense-in-Depth capability — see Concepts: Enforcement model.
Step 3 — Session cookie fallback (browser-initiated)
Section titled “Step 3 — Session cookie fallback (browser-initiated)”If neither mTLS nor Authorization produced a principal, the handler falls back to the tenant UUID attached by workosauth.TenantMiddleware (which itself resolved the tenant from the WorkOS session cookie set during /callback). The principal sub is set to the tenant UUID string.
This path is lower fidelity — it has per-tenant granularity but no per-user attribution. The audit principal_source value is session. Production deployments should configure mTLS or upstream JWT for proper user attribution.
When all three steps fail
Section titled “When all three steps fail”If none of the three steps produces a principal, the handler returns 401 Unauthorized. This also happens if the chosen principal has an empty sub (defence-in-depth against a malformed JWT or an empty SAN slipping into the policy evaluator).
Endpoint reference
Section titled “Endpoint reference”POST /v1/iceberg/{prefix}/namespaces/{namespace}/tables/{table}
Section titled “POST /v1/iceberg/{prefix}/namespaces/{namespace}/tables/{table}”Single-table Iceberg commit. This is the central L1 gateway endpoint — every Spark / Trino / Flink write to an Iceberg table flows through this route. The handler runs the full 14-step pipeline (validate → load → policy fetch → policy evaluate → forward → audit → ingest); see Architecture for the step list.
Path parameters:
| Name | Description |
|---|---|
prefix | Neksur-side catalog alias for this tenant (e.g. prod-polaris). Resolves via catalog_credentials to a concrete upstream catalog. |
namespace | Iceberg namespace (e.g. sales). Single segment in the URL path. |
table | Iceberg table name (e.g. orders). |
All three identifiers must match ^[a-zA-Z0-9_-]+$. Anything else returns 400.
Request body: A standard Iceberg REST UpdateTableRequest JSON object — requirements[] + updates[]. The body is hard-capped at 16 MiB (http.MaxBytesReader); larger payloads return 413 from the wrapper.
Success response: 200 OK with the upstream catalog’s commit response body (a LoadTableResult) echoed verbatim.
HTTP/1.1 200 OKContent-Type: application/json
{ "metadata-location": "s3://warehouse/.../00042-abc123.metadata.json", "metadata": { /* updated table metadata */ }}Error responses:
| Status | Sentinel / cause |
|---|---|
400 | Malformed path identifier; body parse failure; identifier validation failed |
401 | ErrPrincipalMissing (auth chain produced no principal); upstream catalog credentials expired |
403 | ErrPolicyDenied — a P1/P2/P3 policy returned Deny; audit row written; commit_rejected_total{reason="policy_denied"} incremented |
404 | Catalog credentials not configured for this {prefix}; upstream table not found |
405 | Method other than POST |
409 | Upstream commit conflict (Iceberg requirements mismatched — client should rebase and retry) |
413 | Body exceeds 16 MiB |
500 | Tenant context missing (mounted outside TenantMiddleware — config bug); catalog adapter config unmarshal error |
502 | Upstream catalog returned a non-conflict, non-credentials-expired error |
503 | ErrPolicyEngineUnavailable — D-1.09 fail-closed; commit_rejected_total{reason="policy_engine_unavailable"} incremented |
Example:
curl -X POST \ -H "Authorization: Bearer $JWT" \ -H "Content-Type: application/json" \ --data @commit.json \ https://api.neksur.example.com/v1/iceberg/prod-polaris/namespaces/sales/tables/ordersPOST /v1/iceberg/{prefix}/transactions/commit
Section titled “POST /v1/iceberg/{prefix}/transactions/commit”Multi-table Iceberg commit, with Reject-All semantics. All listed tables share a single policy gate: if any one table’s commit would be denied, the entire transaction is rejected and no upstream forwards occur. This prevents the “policy bypass via batched commit” attack pattern where a denied write hides inside a permitted batch.
Path parameters:
| Name | Description |
|---|---|
prefix | Same as the single-table endpoint — Neksur-side catalog alias. |
Request body:
{ "table-changes": [ { "identifier": { "namespace": ["sales"], "name": "orders" }, "requirements": [ /* TableRequirement[] */ ], "updates": [ /* TableUpdate[] */ ] }, { "identifier": { "namespace": ["sales"], "name": "items" }, "requirements": [ /* ... */ ], "updates": [ /* ... */ ] } ]}Each requirements[] and updates[] follows the standard Iceberg REST shape; see the Iceberg REST Catalog Spec. The whole body is capped at 16 MiB. Both namespace[] entries and name per change must match ^[a-zA-Z0-9_-]+$.
Success response: 200 OK with an array of upstream CommitResult objects, one per table, in input order:
HTTP/1.1 200 OKContent-Type: application/json
{ "results": [ { "new-snapshot-id": 1234567, "new-metadata-location": "s3://..." }, { "new-snapshot-id": 1234568, "new-metadata-location": "s3://..." } ]}Error semantics specific to this endpoint:
- Policy gate runs first, for all tables. Any single
Denycauses the offending table to be audit-logged asREJECTED, and the response is403. No upstream forwards happen. NoWriteEventis emitted for the non-offending tables. - Policy-engine-unavailable for any table during the gate phase aborts the transaction with
503(fail-closed). - Upstream forwarding is sequential and independent. Phase 1 does NOT wrap the upstream commits in a cross-catalog transaction (iceberg-go does not yet support multi-table 2PC). If table N’s commit succeeds and table N+1 fails, the prior commits stay. The handler returns
502(or409on conflict) with the offending table name in the message.
Status codes: Same table as the single-table endpoint, plus the message body identifies which table caused a 403 / 409 / 502 (e.g. "multi-table reject-all: policy denied on orders: <reason>").
Example:
curl -X POST \ -H "Authorization: Bearer $JWT" \ -H "Content-Type: application/json" \ --data @transaction.json \ https://api.neksur.example.com/v1/iceberg/prod-polaris/transactions/commitPOST /v1/lineage
Section titled “POST /v1/lineage”OpenLineage v2 RunEvent receiver. Spark, Flink, Airflow, and dbt’s OpenLineage integrations push run events here; the handler persists the raw event to a durable lineage_inbox table FIRST (at-least-once durability per Pitfall 5) and then translates each (input, output) dataset pair into a LINEAGE_OF edge in the property graph.
This endpoint is not bound to {prefix} — lineage is a tenant-scoped concept but spans across catalogs. The tenant is resolved by workosauth.TenantMiddleware.
Request body: A subset of the OpenLineage v2 RunEvent schema — see openlineage.io. The handler types and validates these fields:
| Field | Required | Description |
|---|---|---|
eventType | Yes | One of START, RUNNING, COMPLETE, ABORT, FAIL. |
eventTime | No | RFC 3339 timestamp. Defaults to “now” if absent or unparseable. |
producer | Yes | The emitting tool, e.g. spark/3.5.0. Forms half of the dedup UNIQUE key. |
run.runId | Yes | UUID-shaped per OpenLineage. Forms the other half of the dedup UNIQUE key. |
job.namespace, job.name | No | The logical pipeline identifier. |
inputs[], outputs[] | No | Dataset{namespace, name} pairs. May be empty (run-state-only events). |
schemaURL | No | OpenLineage spec URL. |
The full untyped JSON body is preserved in lineage_inbox.payload so Phase 2 / Phase 3 enrichment can read facets we don’t currently type.
Body cap: 1 MiB (http.MaxBytesReader). Larger payloads should be split across multiple run events.
Success response: 202 Accepted (empty body). The 202 reflects the at-least-once semantics — the event has been durably persisted; downstream MERGE may complete asynchronously.
Error responses:
| Status | Cause |
|---|---|
400 | Body parse failure; missing required field (eventType / run.runId / producer); body exceeds 1 MiB |
405 | Method other than POST |
422 | A LINEAGE_OF edge would create a cycle (ingest.LineageCycleError) |
500 | Tenant context missing |
503 | lineage_inbox persist failure; downstream MERGE failure |
Dedup behavior: Re-POSTing the same (producer, run_id) is a no-op — the UNIQUE (producer, run_id) constraint on lineage_inbox swallows duplicates (ON CONFLICT DO NOTHING). Spark’s OpenLineage HTTP transport retries on transient failures land here safely.
Example:
curl -X POST \ -H "Authorization: Bearer $JWT" \ -H "Content-Type: application/json" \ --data '{ "eventType": "COMPLETE", "eventTime": "2026-05-15T10:00:00Z", "producer": "spark/3.5.0", "run": {"runId": "01900a3b-c4d5-7e6f-89ab-cdef01234567"}, "job": {"namespace": "etl", "name": "load_orders"}, "inputs": [{"namespace": "iceberg", "name": "raw.events"}], "outputs": [{"namespace": "iceberg", "name": "sales.orders"}] }' \ https://api.neksur.example.com/v1/lineagePOST /v1/webhooks/polaris
Section titled “POST /v1/webhooks/polaris”HMAC-signed callback from an upstream Polaris catalog. Used as a trigger source for the L3 detection sweep — when Polaris notifies Neksur of a new snapshot, the handler verifies the signature, decodes the payload, and pushes a Hit{Source: "polaris-webhook"} onto the dispatch channel for the detection pool to consume.
This endpoint is mounted outside workosauth.TenantMiddleware. The HMAC signature verification IS the authentication step.
Required headers:
| Header | Required | Description |
|---|---|---|
X-Polaris-Signature | Yes | Hex-encoded HMAC-SHA256 of the raw request body, computed with the per-tenant webhook secret. |
X-Polaris-Tenant | No | Tenant UUID. Preferred when present; otherwise the handler reads tenant_id from the JSON body. |
Request body:
{ "tenant_id": "01900a3b-c4d5-7e6f-89ab-cdef01234567", "metadata_location": "s3://warehouse/.../00042-abc123.metadata.json", "namespace": "sales", "table_name": "orders"}metadata_location is required (the L3 dedup key); the other fields are advisory. Body is hard-capped at 256 KiB.
Verification flow:
- If
NEKSUR_POLARIS_WEBHOOK_ENABLED=0, the handler returns410 Goneimmediately. - Read the body (256 KiB cap).
- Require
X-Polaris-Signatureheader to be present. - Parse the body to extract
tenant_id(falling back toX-Polaris-Tenantif the body field is empty). - Validate the tenant ID is a properly formatted UUID (strict character class — used as a SQL identifier suffix).
- Look up the per-tenant webhook secret from
catalog_credentials.config_json[webhook_secret]forcatalog_kind = 'polaris'. - Compute HMAC-SHA256 of the body and constant-time compare against the header (
hmac.Equal). - On match, push the
Hit{}onto the dispatch channel and return200 OK.
Success response: 200 OK (empty body).
Error responses:
| Status | Cause |
|---|---|
400 | Body read / parse failure; missing X-Polaris-Signature header; missing tenant ID (neither header nor body); missing metadata_location |
401 | No webhook secret configured for the named tenant; HMAC signature mismatch |
405 | Method other than POST |
410 | Handler disabled via NEKSUR_POLARIS_WEBHOOK_ENABLED=0 |
503 | Admin pool unreachable; catalog_credentials lookup failed; request context cancelled |
Example (with helper signing):
BODY='{"tenant_id":"...", "metadata_location":"s3://..."}'SIG=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "$WEBHOOK_SECRET" -hex | awk '{print $2}')
curl -X POST \ -H "X-Polaris-Signature: $SIG" \ -H "Content-Type: application/json" \ --data "$BODY" \ https://api.neksur.example.com/v1/webhooks/polarisError codes & meanings
Section titled “Error codes & meanings”The L1 gateway maps internal sentinel errors (internal/gateway/iceberg/errors.go) to HTTP status codes deterministically. Operators monitoring the gateway should treat the two policy-related codes — 503 and 403 — very differently.
403 Forbidden — ErrPolicyDenied
Section titled “403 Forbidden — ErrPolicyDenied”A P1 (schema), P2 (write-ACL), or P3 (retention) policy returned Deny during evaluation. This is the expected, business-as-usual rejection path — a customer’s policy correctly blocked a non-conforming commit. The gateway:
- Increments
commit_rejected_total{reason="policy_denied"}. - Writes a
WriteEventaudit row withdecision = 'REJECTED'and the policy’s denial reason. - Returns
403with the reason in the response body (e.g.policy denied: schema field 'pii_email' added without P3 retention tag).
Alert on this via dashboard trend, not via page. A sudden spike may indicate an upstream client misconfiguration; a sustained zero may indicate a policy is missing.
503 Service Unavailable — ErrPolicyEngineUnavailable
Section titled “503 Service Unavailable — ErrPolicyEngineUnavailable”The policy engine could not produce a verdict. Possible causes:
- The policy fetch from the AGE graph store failed (
LoadPoliciesForTablereturned error). - A CEL compile error in the stored policy text.
- A CEL eval error (e.g. type mismatch against the
Inputsmap). - A non-bool return from a policy expression.
- A panic during evaluation.
This is fail-closed by design (D-1.09). The gateway refuses to forward the commit — it would rather reject a legitimate write than silently let a non-evaluated commit through. The gateway:
- Increments
commit_rejected_total{reason="policy_engine_unavailable"}. - Logs the underlying error (operator-visible via
slog). - Returns
503with bodypolicy-engine-unavailable. - Does NOT emit an audit row (no policy verdict was produced).
This is page-severity. If the rate of 503s from this counter exceeds threshold, the on-call should investigate immediately — either the engine itself is unhealthy or a recently-edited policy is malformed. See Deployment for the AlertManager rule.
413 Payload Too Large
Section titled “413 Payload Too Large”Returned by http.MaxBytesReader when the body exceeds the per-endpoint cap:
- 16 MiB for the two gateway endpoints (
/v1/iceberg/...). - 1 MiB for
/v1/lineage. - 256 KiB for
/v1/webhooks/polaris.
Caps are constants in source and are not configurable in Phase 1.
400 Bad Request
Section titled “400 Bad Request”Returned for any malformed input that the handler can determine without consulting external state:
- Identifier validation failure (
{prefix},{namespace},{table}not matching^[a-zA-Z0-9_-]+$). - JSON parse failure on the commit body or lineage payload.
- Missing required fields (
eventType/run.runId/produceron lineage;metadata_location/ signature header on webhook). - Empty
table-changesarray on a multi-table commit.
401 Unauthorized
Section titled “401 Unauthorized”Returned when:
- The principal extraction chain produced no principal (no mTLS, no
Authorization, no tenant context — see Authentication). - The extracted principal has an empty
sub. - The upstream catalog’s stored credentials have expired (
iceberg.ErrCredentialsExpired). - On the Polaris webhook: no secret configured for the named tenant, or HMAC mismatch.
Other codes (summary)
Section titled “Other codes (summary)”| Code | When |
|---|---|
404 | Catalog credentials not found for {prefix}; upstream table not found |
405 | Method other than POST on any of these routes |
409 | Upstream Iceberg commit conflict — requirements mismatch, client must rebase |
410 | Polaris webhook handler disabled |
422 | OpenLineage event would create a lineage cycle |
500 | Tenant context missing (route misconfiguration); catalog adapter config error |
502 | Upstream catalog returned an unexpected error (non-conflict, non-credentials path) |
Tenant prefix model
Section titled “Tenant prefix model”The {prefix} path segment on the gateway routes is a Neksur-side alias, not the upstream catalog’s own name. The mapping lives in the per-tenant catalog_credentials table:
{prefix} (URL) → | Lookup → | (catalog_kind, endpoint, config_json) |
|---|
When a request arrives at POST /v1/iceberg/prod-polaris/namespaces/sales/tables/orders, the handler:
- Validates
prod-polarisagainst^[a-zA-Z0-9_-]+$. - Calls
CredStore.GetCatalogCredentials(ctx, "prod-polaris")— this is RLS-scoped to the tenant resolved byTenantMiddleware, so two tenants can register the same prefix string independently with no collision. - Dispatches on
creds.Kindto the matching adapter (polaris/nessie/glue/unity) viaBuildAdapter. The Phase 1 default kind is Polaris; Nessie ships in Phase 1; Glue and Unity are stub adapters that returnErrAdapterStubon state-mutating calls until Phase 3.
Multi-catalog-per-tenant is supported. A single tenant can register multiple {prefix} aliases — e.g. prod-polaris, stage-polaris, analytics-nessie — each mapping to a different upstream catalog. The CEL policy graph attaches policies to logical table identities (namespace + name), not to prefix aliases, so the same policy can govern commits arriving via different prefixes.
The upstream Polaris catalog’s own name (its warehouse config value) is stored inside config_json and is never visible in the URL path. This decouples the wire URL from upstream catalog renames and lets Neksur transparently swap upstreams without changing the URL contract for client integrations.
Rate limits & quotas
Section titled “Rate limits & quotas”No rate limits are enforced at the Neksur application layer in Phase 1. Deployment-level limits should be applied at the ingress (ALB, API Gateway, or equivalent) — see Deployment for the recommended ingress configuration.
The only request-shape bounds enforced inside Neksur are the per-endpoint body caps documented above (16 MiB / 1 MiB / 256 KiB).
Beyond the gateway endpoints
Section titled “Beyond the gateway endpoints”The four endpoints above are documented here in full because they carry the commit data path. The rest of the surface is available in Core but documented elsewhere as it stabilizes:
| Surface | Where it lives | Status |
|---|---|---|
SQL proxy (read-path enforcement) + pgwire + XMLA | internal/sqlproxy, internal/pgwire, internal/xmla | Core — see Architecture |
| GraphQL (graph / contract queries) | internal/api/graphql | Core |
MCP server (graph.traverse for AI agents) | internal/api/mcp | Core — Access-enforced, clause-whitelisted |
| Contracts / lifecycle, semantic / metrics, detection, compliance, FinOps REST resources | internal/api/rest, internal/api/openapi.yml | Core |
| First-party SDK clients (Go, Python, TypeScript) | — | on the distribution track |
When integrating, treat the gateway/lineage/webhook contracts on this page as stable; for the other surfaces, generate against internal/api/openapi.yml (REST) or the GraphQL schema, which are the source of truth.
See Getting Started for the supported integration path today (deploy neksur-server, point Spark / Trino at the gateway and read-path proxy, register policies via the admin CLI).