
Databricks

Execute SQL queries and manage jobs on Databricks

Databricks is a unified data analytics platform built on Apache Spark, providing a collaborative environment for data engineering, data science, and machine learning. Databricks combines data warehousing, ETL, and AI workloads into a single lakehouse architecture, with support for SQL analytics, job orchestration, and cluster management across major cloud providers.

With the Databricks integration in AACFlow, you can:

  • Execute SQL queries: Run SQL statements against Databricks SQL warehouses with support for parameterized queries and Unity Catalog
  • Manage jobs: List, trigger, and monitor Databricks job runs programmatically
  • Track run status: Get detailed run information including timing, state, and output results
  • Control clusters: List and inspect cluster configurations, states, and resource details
  • Retrieve run outputs: Access notebook results, error messages, and logs from completed job runs

In AACFlow, the Databricks integration enables your agents to interact with your data lakehouse as part of automated workflows. Agents can query large-scale datasets, orchestrate ETL pipelines by triggering jobs, monitor job execution, and retrieve results—all without leaving the workflow canvas. This is ideal for automated reporting, data pipeline management, scheduled analytics, and building AI-driven data workflows that react to query results or job outcomes.

Usage guide

Connect to Databricks to execute SQL queries against SQL warehouses, trigger and monitor job runs, manage clusters, and retrieve run outputs. Requires a Personal Access Token and workspace host URL.
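Databricks REST endpoints authenticate with the Personal Access Token sent as a Bearer token. The sketch below is illustrative of how such a request is typically assembled; the helper name, endpoint path, and credential values are placeholders, not part of the integration's actual implementation.

```python
def build_request(host: str, api_key: str, path: str) -> tuple[str, dict]:
    """Return the full URL and auth headers for a Databricks REST API call."""
    url = f"https://{host}{path}"
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers

url, headers = build_request(
    "dbc-abc123.cloud.databricks.com",
    "dapi-example-token",          # placeholder Personal Access Token
    "/api/2.1/jobs/list",          # Jobs API list endpoint
)
```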

Tools

databricks_execute_sql

Execute a SQL statement against a Databricks SQL warehouse and return results inline. Supports parameterized queries and Unity Catalog.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| warehouseId | string | Yes | The ID of the SQL warehouse to execute against |
| statement | string | Yes | The SQL statement to execute (max 16 MiB) |
| catalog | string | No | Unity Catalog name (equivalent to USE CATALOG) |
| schema | string | No | Schema name (equivalent to USE SCHEMA) |
| rowLimit | number | No | Maximum number of rows to return |
| waitTimeout | string | No | How long to wait for results (e.g., "50s"). Range: "0s" or "5s" to "50s". Default: "50s" |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| statementId | string | Unique identifier for the executed statement |
| status | string | Execution status (SUCCEEDED, PENDING, RUNNING, FAILED, CANCELED, CLOSED) |
| columns | array | Column schema of the result set |
| name | string | Column name |
| position | number | Column position (0-based) |
| typeName | string | Column type (STRING, INT, LONG, DOUBLE, BOOLEAN, TIMESTAMP, DATE, DECIMAL, etc.) |
| data | array | Result rows as a 2D array of strings, where each inner array is one row of column values |
| totalRows | number | Total number of rows in the result |
| truncated | boolean | Whether the result set was truncated due to row_limit or byte_limit |
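Because `data` arrives as rows of strings and `columns` carries the schema separately, a common post-processing step is to pair them up. A minimal sketch, assuming the output is shaped exactly as in the table above (the helper name and sample values are illustrative):

```python
def rows_to_dicts(columns, data):
    """Pair each row of string values with column names from the schema."""
    names = [c["name"] for c in sorted(columns, key=lambda c: c["position"])]
    return [dict(zip(names, row)) for row in data]

# Hypothetical databricks_execute_sql output:
result = {
    "columns": [
        {"name": "id", "position": 0, "typeName": "LONG"},
        {"name": "city", "position": 1, "typeName": "STRING"},
    ],
    "data": [["1", "Berlin"], ["2", "Paris"]],
}
records = rows_to_dicts(result["columns"], result["data"])
# records[0] == {"id": "1", "city": "Berlin"}
```

Note that values stay as strings; casting by `typeName` is left to the consumer.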

databricks_list_jobs

List all jobs in a Databricks workspace with optional filtering by name.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| limit | number | No | Maximum number of jobs to return (range 1-100, default 20) |
| offset | number | No | Offset for pagination |
| name | string | No | Filter jobs by exact name (case-insensitive) |
| expandTasks | boolean | No | Include task and cluster details in the response (max 100 elements) |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| jobs | array | List of jobs in the workspace |
| jobId | number | Unique job identifier |
| name | string | Job name |
| createdTime | number | Job creation timestamp (epoch ms) |
| creatorUserName | string | Email of the job creator |
| maxConcurrentRuns | number | Maximum number of concurrent runs |
| format | string | Job format (SINGLE_TASK or MULTI_TASK) |
| hasMore | boolean | Whether more jobs are available for pagination |
| nextPageToken | string | Token for fetching the next page of results |
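The `hasMore` flag and `offset` input combine into a standard pagination loop. A minimal sketch with a stubbed fetch function standing in for the tool call (all names and sample data are illustrative):

```python
def list_all_jobs(fetch_page, limit=20):
    """Collect every job by advancing offset until hasMore is False."""
    jobs, offset = [], 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        jobs.extend(page["jobs"])
        if not page.get("hasMore"):
            return jobs
        offset += len(page["jobs"])

# Stub standing in for databricks_list_jobs, serving 5 fake jobs:
def fake_fetch(limit, offset):
    all_jobs = [{"jobId": i, "name": f"job-{i}"} for i in range(5)]
    chunk = all_jobs[offset:offset + limit]
    return {"jobs": chunk, "hasMore": offset + limit < len(all_jobs)}

collected = list_all_jobs(fake_fetch, limit=2)
# len(collected) == 5
```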

databricks_run_job

Trigger an existing Databricks job to run immediately with optional job-level or notebook parameters.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| jobId | number | Yes | The ID of the job to trigger |
| jobParameters | string | No | Job-level parameter overrides as a JSON object (e.g., {"key": "value"}) |
| notebookParams | string | No | Notebook task parameters as a JSON object (e.g., {"param1": "value1"}) |
| idempotencyToken | string | No | Idempotency token to prevent duplicate runs (max 64 characters) |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| runId | number | The globally unique ID of the triggered run |
| numberInJob | number | The sequence number of this run among all runs of the job |
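Since `notebookParams` is passed as a JSON string and `idempotencyToken` is capped at 64 characters, a caller typically validates both before triggering. A minimal sketch (the helper name and payload shape are illustrative, not the tool's internal request format):

```python
import json
import uuid

def prepare_run_inputs(job_id, notebook_params_json=None):
    """Validate the JSON-string params and generate a safe idempotency token."""
    inputs = {"jobId": job_id}
    if notebook_params_json:
        json.loads(notebook_params_json)  # fail early on malformed JSON
        inputs["notebookParams"] = notebook_params_json
    # uuid4().hex is 32 characters, well under the 64-character limit
    inputs["idempotencyToken"] = uuid.uuid4().hex
    return inputs

inputs = prepare_run_inputs(123, '{"run_date": "2024-01-01"}')
```

Reusing the same token on a retry makes Databricks return the existing run instead of starting a duplicate.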

databricks_get_run

Get the status, timing, and details of a Databricks job run by its run ID.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| runId | number | Yes | The canonical identifier of the run |
| includeHistory | boolean | No | Include repair history in the response |
| includeResolvedValues | boolean | No | Include resolved parameter values in the response |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| runId | number | The run ID |
| jobId | number | The job ID this run belongs to |
| runName | string | Name of the run |
| runType | string | Type of run (JOB_RUN, WORKFLOW_RUN, SUBMIT_RUN) |
| attemptNumber | number | Retry attempt number (0 for the initial attempt) |
| state | object | Run state information |
| lifeCycleState | string | Lifecycle state (QUEUED, PENDING, RUNNING, TERMINATING, TERMINATED, SKIPPED, INTERNAL_ERROR, BLOCKED, WAITING_FOR_RETRY) |
| resultState | string | Result state (SUCCESS, FAILED, TIMEDOUT, CANCELED, SUCCESS_WITH_FAILURES, UPSTREAM_FAILED, UPSTREAM_CANCELED, EXCLUDED) |
| stateMessage | string | Descriptive message for the current state |
| userCancelledOrTimedout | boolean | Whether the run was cancelled by a user or timed out |
| startTime | number | Run start timestamp (epoch ms) |
| endTime | number | Run end timestamp (epoch ms, 0 if still running) |
| setupDuration | number | Cluster setup duration (ms) |
| executionDuration | number | Execution duration (ms) |
| cleanupDuration | number | Cleanup duration (ms) |
| queueDuration | number | Time spent in queue before execution (ms) |
| runPageUrl | string | URL to the run detail page in the Databricks UI |
| creatorUserName | string | Email of the user who triggered the run |
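A typical monitoring pattern polls this tool until `lifeCycleState` reaches a terminal value, then inspects `resultState`. A minimal sketch with a stub standing in for the tool call (the helper names and simulated states are illustrative):

```python
def wait_for_run(get_run, run_id, pause=lambda: None):
    """Poll until lifeCycleState is terminal, then return the state object."""
    terminal = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}
    while True:
        state = get_run(run_id)["state"]
        if state["lifeCycleState"] in terminal:
            return state
        pause()  # e.g. time.sleep(10) in a real workflow

# Stub simulating databricks_get_run for a run that finishes on the third poll:
states = iter(["PENDING", "RUNNING", "TERMINATED"])
def fake_get_run(run_id):
    s = next(states)
    return {"state": {"lifeCycleState": s,
                      "resultState": "SUCCESS" if s == "TERMINATED" else None}}

final = wait_for_run(fake_get_run, 42)
# final["resultState"] == "SUCCESS"
```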

databricks_list_runs

List job runs in a Databricks workspace with optional filtering by job, status, and time range.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| jobId | number | No | Filter runs by job ID. Omit to list runs across all jobs |
| activeOnly | boolean | No | Only include active runs (PENDING, RUNNING, or TERMINATING) |
| completedOnly | boolean | No | Only include completed runs |
| limit | number | No | Maximum number of runs to return (range 1-24, default 20) |
| offset | number | No | Offset for pagination |
| runType | string | No | Filter by run type (JOB_RUN, WORKFLOW_RUN, SUBMIT_RUN) |
| startTimeFrom | number | No | Filter runs started at or after this timestamp (epoch ms) |
| startTimeTo | number | No | Filter runs started at or before this timestamp (epoch ms) |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| runs | array | List of job runs |
| runId | number | Unique run identifier |
| jobId | number | Job this run belongs to |
| runName | string | Run name |
| runType | string | Run type (JOB_RUN, WORKFLOW_RUN, SUBMIT_RUN) |
| state | object | Run state information |
| lifeCycleState | string | Lifecycle state (QUEUED, PENDING, RUNNING, TERMINATING, TERMINATED, SKIPPED, INTERNAL_ERROR, BLOCKED, WAITING_FOR_RETRY) |
| resultState | string | Result state (SUCCESS, FAILED, TIMEDOUT, CANCELED, SUCCESS_WITH_FAILURES, UPSTREAM_FAILED, UPSTREAM_CANCELED, EXCLUDED) |
| stateMessage | string | Descriptive state message |
| userCancelledOrTimedout | boolean | Whether the run was cancelled by a user or timed out |
| startTime | number | Run start timestamp (epoch ms) |
| endTime | number | Run end timestamp (epoch ms) |
| hasMore | boolean | Whether more runs are available for pagination |
| nextPageToken | string | Token for fetching the next page of results |
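`startTimeFrom` and `startTimeTo` expect epoch milliseconds, which is easy to get wrong when working from wall-clock time. A minimal sketch of computing a "runs from the last 24 hours" window (the helper name is illustrative):

```python
import time

def last_24h_window(now_ms=None):
    """Return (startTimeFrom, startTimeTo) in epoch milliseconds."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    day_ms = 24 * 60 * 60 * 1000  # 86,400,000 ms in a day
    return now_ms - day_ms, now_ms

frm, to = last_24h_window(now_ms=1_700_000_000_000)
# to - frm == 86_400_000
```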

databricks_cancel_run

Cancel a running or pending Databricks job run. Cancellation is asynchronous; poll the run status to confirm termination.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| runId | number | Yes | The canonical identifier of the run to cancel |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| success | boolean | Whether the cancel request was accepted |

databricks_get_run_output

Get the output of a completed Databricks job run, including notebook results, error messages, and logs. For multi-task jobs, use the task run ID (not the parent run ID).

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| runId | number | Yes | The run ID to get output for. For multi-task jobs, use the task run ID |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| notebookOutput | object | Notebook task output (from dbutils.notebook.exit()) |
| result | string | Value passed to dbutils.notebook.exit() (max 5 MB) |
| truncated | boolean | Whether the result was truncated |
| error | string | Error message if the run failed or output is unavailable |
| errorTrace | string | Error stack trace if available |
| logs | string | Log output (last 5 MB) from spark_jar, spark_python, or python_wheel tasks |
| logsTruncated | boolean | Whether the log output was truncated |
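Notebooks often pass structured results by calling `dbutils.notebook.exit()` with a JSON string, so a consumer of this output usually tries to decode it while handling the error and truncation cases. A minimal sketch, assuming the output is shaped as in the table above (the helper name is illustrative):

```python
import json

def parse_notebook_result(output):
    """Decode the value passed to dbutils.notebook.exit(), if present."""
    if output.get("error"):
        raise RuntimeError(output["error"])
    nb = output.get("notebookOutput") or {}
    if nb.get("truncated"):
        return None  # result exceeded the 5 MB limit; treat as incomplete
    try:
        return json.loads(nb.get("result", ""))
    except json.JSONDecodeError:
        return nb.get("result")  # notebook exited with a plain string

# Hypothetical databricks_get_run_output result:
out = {"notebookOutput": {"result": '{"rows": 10}', "truncated": False}}
parsed = parse_notebook_result(out)
# parsed == {"rows": 10}
```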

databricks_list_clusters

List all clusters in a Databricks workspace including their state, configuration, and resource details.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| clusters | array | List of clusters in the workspace |
| clusterId | string | Unique cluster identifier |
| clusterName | string | Cluster display name |
| state | string | Current state (PENDING, RUNNING, RESTARTING, RESIZING, TERMINATING, TERMINATED, ERROR, UNKNOWN) |
| stateMessage | string | Human-readable state description |
| creatorUserName | string | Email of the cluster creator |
| sparkVersion | string | Spark runtime version (e.g., 13.3.x-scala2.12) |
| nodeTypeId | string | Worker node type identifier |
| driverNodeTypeId | string | Driver node type identifier |
| numWorkers | number | Number of worker nodes (for fixed-size clusters) |
| autoscale | object | Autoscaling configuration (null for fixed-size clusters) |
| minWorkers | number | Minimum number of workers |
| maxWorkers | number | Maximum number of workers |
| clusterSource | string | Origin (API, UI, JOB, MODELS, PIPELINE, PIPELINE_MAINTENANCE, SQL) |
| autoterminationMinutes | number | Minutes of inactivity before auto-termination (0 = disabled) |
| startTime | number | Cluster start timestamp (epoch ms) |
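Because fixed-size clusters report `numWorkers` while autoscaling clusters report an `autoscale` object instead, code that inspects cluster capacity has to handle both shapes. A minimal sketch, assuming the output fields above (the helper name and sample clusters are illustrative):

```python
def cluster_capacity(cluster):
    """Return (min, max) worker counts for fixed-size or autoscaling clusters."""
    auto = cluster.get("autoscale")
    if auto:
        return auto["minWorkers"], auto["maxWorkers"]
    n = cluster.get("numWorkers", 0)
    return n, n  # fixed-size: min and max coincide

fixed = {"clusterName": "etl-fixed", "numWorkers": 4, "autoscale": None}
scaling = {"clusterName": "adhoc", "autoscale": {"minWorkers": 2, "maxWorkers": 8}}
# cluster_capacity(fixed) == (4, 4); cluster_capacity(scaling) == (2, 8)
```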
