
Databricks

Execute SQL queries and manage jobs on Databricks

Databricks is a unified data analytics platform built on Apache Spark, providing a collaborative environment for data engineering, data science, and machine learning. Databricks combines data warehousing, ETL, and AI workloads into a single lakehouse architecture, with support for SQL analytics, job orchestration, and cluster management across major cloud providers.

With the Databricks integration in AACFlow, you can:

  • Execute SQL queries: Run SQL statements against Databricks SQL warehouses with support for parameterized queries and Unity Catalog
  • Manage jobs: List, trigger, and monitor Databricks job runs programmatically
  • Track run status: Get detailed run information including timing, state, and output results
  • Control clusters: List and inspect cluster configurations, states, and resource details
  • Retrieve run outputs: Access notebook results, error messages, and logs from completed job runs

In AACFlow, the Databricks integration enables your agents to interact with your data lakehouse as part of automated workflows. Agents can query large-scale datasets, orchestrate ETL pipelines by triggering jobs, monitor job execution, and retrieve results—all without leaving the workflow canvas. This is ideal for automated reporting, data pipeline management, scheduled analytics, and building AI-driven data workflows that react to query results or job outcomes.

Usage guide

Connect to Databricks to execute SQL queries against SQL warehouses, trigger and monitor job runs, manage clusters, and retrieve run outputs. Requires a Personal Access Token and workspace host URL.
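Databricks REST endpoints authenticate with the Personal Access Token sent as a Bearer token. The sketch below is illustrative of how such a request is typically assembled; the helper name, endpoint path, and credential values are placeholders, not part of the integration's actual implementation.

```python
def build_request(host: str, api_key: str, path: str) -> tuple[str, dict]:
    """Return the full URL and auth headers for a Databricks REST API call."""
    url = f"https://{host}{path}"
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers

url, headers = build_request(
    "dbc-abc123.cloud.databricks.com",
    "dapi-example-token",          # placeholder Personal Access Token
    "/api/2.1/jobs/list",          # Jobs API list endpoint
)
```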

Tools

databricks_execute_sql

Execute a SQL statement against a Databricks SQL warehouse and return results inline. Supports parameterized queries and Unity Catalog.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| warehouseId | string | Yes | The ID of the SQL warehouse to execute against |
| statement | string | Yes | The SQL statement to execute (max 16 MiB) |
| catalog | string | No | Unity Catalog name (equivalent to USE CATALOG) |
| schema | string | No | Schema name (equivalent to USE SCHEMA) |
| rowLimit | number | No | Maximum number of rows to return |
| waitTimeout | string | No | How long to wait for results (e.g., "50s"). Range: "0s" or "5s" to "50s". Default: "50s" |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| statementId | string | Unique identifier for the executed statement |
| status | string | Execution status (SUCCEEDED, PENDING, RUNNING, FAILED, CANCELED, CLOSED) |
| columns | array | Column schema of the result set |
| name | string | Column name |
| position | number | Column position (0-based) |
| typeName | string | Column type (STRING, INT, LONG, DOUBLE, BOOLEAN, TIMESTAMP, DATE, DECIMAL, etc.) |
| data | array | Result rows as a 2D array of strings, where each inner array is one row of column values |
| totalRows | number | Total number of rows in the result |
| truncated | boolean | Whether the result set was truncated due to row_limit or byte_limit |
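Because `data` arrives as rows of strings and `columns` carries the schema separately, a common post-processing step is to pair them up. A minimal sketch, assuming the output is shaped exactly as in the table above (the helper name and sample values are illustrative):

```python
def rows_to_dicts(columns, data):
    """Pair each row of string values with column names from the schema."""
    names = [c["name"] for c in sorted(columns, key=lambda c: c["position"])]
    return [dict(zip(names, row)) for row in data]

# Hypothetical databricks_execute_sql output:
result = {
    "columns": [
        {"name": "id", "position": 0, "typeName": "LONG"},
        {"name": "city", "position": 1, "typeName": "STRING"},
    ],
    "data": [["1", "Berlin"], ["2", "Paris"]],
}
records = rows_to_dicts(result["columns"], result["data"])
# records[0] == {"id": "1", "city": "Berlin"}
```

Note that values stay as strings; casting by `typeName` is left to the consumer.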

databricks_list_jobs

List all jobs in a Databricks workspace with optional filtering by name.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| limit | number | No | Maximum number of jobs to return (range 1-100, default 20) |
| offset | number | No | Offset for pagination |
| name | string | No | Filter jobs by exact name (case-insensitive) |
| expandTasks | boolean | No | Include task and cluster details in the response (max 100 elements) |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| jobs | array | List of jobs in the workspace |
| jobId | number | Unique job identifier |
| name | string | Job name |
| createdTime | number | Job creation timestamp (epoch ms) |
| creatorUserName | string | Email of the job creator |
| maxConcurrentRuns | number | Maximum number of concurrent runs |
| format | string | Job format (SINGLE_TASK or MULTI_TASK) |
| hasMore | boolean | Whether more jobs are available for pagination |
| nextPageToken | string | Token for fetching the next page of results |
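The `hasMore` flag and `offset` input combine into a standard pagination loop. A minimal sketch with a stubbed fetch function standing in for the tool call (all names and sample data are illustrative):

```python
def list_all_jobs(fetch_page, limit=20):
    """Collect every job by advancing offset until hasMore is False."""
    jobs, offset = [], 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        jobs.extend(page["jobs"])
        if not page.get("hasMore"):
            return jobs
        offset += len(page["jobs"])

# Stub standing in for databricks_list_jobs, serving 5 fake jobs:
def fake_fetch(limit, offset):
    all_jobs = [{"jobId": i, "name": f"job-{i}"} for i in range(5)]
    chunk = all_jobs[offset:offset + limit]
    return {"jobs": chunk, "hasMore": offset + limit < len(all_jobs)}

collected = list_all_jobs(fake_fetch, limit=2)
# len(collected) == 5
```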

databricks_run_job

Trigger an existing Databricks job to run immediately with optional job-level or notebook parameters.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| jobId | number | Yes | The ID of the job to trigger |
| jobParameters | string | No | Job-level parameter overrides as a JSON object (e.g., {"key": "value"}) |
| notebookParams | string | No | Notebook task parameters as a JSON object (e.g., {"param1": "value1"}) |
| idempotencyToken | string | No | Idempotency token to prevent duplicate runs (max 64 characters) |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| runId | number | The globally unique ID of the triggered run |
| numberInJob | number | The sequence number of this run among all runs of the job |
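Since `notebookParams` is passed as a JSON string and `idempotencyToken` is capped at 64 characters, a caller typically validates both before triggering. A minimal sketch (the helper name and payload shape are illustrative, not the tool's internal request format):

```python
import json
import uuid

def prepare_run_inputs(job_id, notebook_params_json=None):
    """Validate the JSON-string params and generate a safe idempotency token."""
    inputs = {"jobId": job_id}
    if notebook_params_json:
        json.loads(notebook_params_json)  # fail early on malformed JSON
        inputs["notebookParams"] = notebook_params_json
    # uuid4().hex is 32 characters, well under the 64-character limit
    inputs["idempotencyToken"] = uuid.uuid4().hex
    return inputs

inputs = prepare_run_inputs(123, '{"run_date": "2024-01-01"}')
```

Reusing the same token on a retry makes Databricks return the existing run instead of starting a duplicate.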

databricks_get_run

Get the status, timing, and details of a Databricks job run by its run ID.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| runId | number | Yes | The canonical identifier of the run |
| includeHistory | boolean | No | Include repair history in the response |
| includeResolvedValues | boolean | No | Include resolved parameter values in the response |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| runId | number | The run ID |
| jobId | number | The job ID this run belongs to |
| runName | string | Name of the run |
| runType | string | Type of run (JOB_RUN, WORKFLOW_RUN, SUBMIT_RUN) |
| attemptNumber | number | Retry attempt number (0 for the initial attempt) |
| state | object | Run state information |
| lifeCycleState | string | Lifecycle state (QUEUED, PENDING, RUNNING, TERMINATING, TERMINATED, SKIPPED, INTERNAL_ERROR, BLOCKED, WAITING_FOR_RETRY) |
| resultState | string | Result state (SUCCESS, FAILED, TIMEDOUT, CANCELED, SUCCESS_WITH_FAILURES, UPSTREAM_FAILED, UPSTREAM_CANCELED, EXCLUDED) |
| stateMessage | string | Descriptive message for the current state |
| userCancelledOrTimedout | boolean | Whether the run was cancelled by a user or timed out |
| startTime | number | Run start timestamp (epoch ms) |
| endTime | number | Run end timestamp (epoch ms, 0 if still running) |
| setupDuration | number | Cluster setup duration (ms) |
| executionDuration | number | Execution duration (ms) |
| cleanupDuration | number | Cleanup duration (ms) |
| queueDuration | number | Time spent in queue before execution (ms) |
| runPageUrl | string | URL to the run detail page in the Databricks UI |
| creatorUserName | string | Email of the user who triggered the run |
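A typical monitoring pattern polls this tool until `lifeCycleState` reaches a terminal value, then inspects `resultState`. A minimal sketch with a stub standing in for the tool call (the helper names and simulated states are illustrative):

```python
def wait_for_run(get_run, run_id, pause=lambda: None):
    """Poll until lifeCycleState is terminal, then return the state object."""
    terminal = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}
    while True:
        state = get_run(run_id)["state"]
        if state["lifeCycleState"] in terminal:
            return state
        pause()  # e.g. time.sleep(10) in a real workflow

# Stub simulating databricks_get_run for a run that finishes on the third poll:
states = iter(["PENDING", "RUNNING", "TERMINATED"])
def fake_get_run(run_id):
    s = next(states)
    return {"state": {"lifeCycleState": s,
                      "resultState": "SUCCESS" if s == "TERMINATED" else None}}

final = wait_for_run(fake_get_run, 42)
# final["resultState"] == "SUCCESS"
```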

databricks_list_runs

List job runs in a Databricks workspace with optional filtering by job, status, and time range.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| jobId | number | No | Filter runs by job ID. Omit to list runs across all jobs |
| activeOnly | boolean | No | Only include active runs (PENDING, RUNNING, or TERMINATING) |
| completedOnly | boolean | No | Only include completed runs |
| limit | number | No | Maximum number of runs to return (range 1-24, default 20) |
| offset | number | No | Offset for pagination |
| runType | string | No | Filter by run type (JOB_RUN, WORKFLOW_RUN, SUBMIT_RUN) |
| startTimeFrom | number | No | Filter runs started at or after this timestamp (epoch ms) |
| startTimeTo | number | No | Filter runs started at or before this timestamp (epoch ms) |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| runs | array | List of job runs |
| runId | number | Unique run identifier |
| jobId | number | Job this run belongs to |
| runName | string | Run name |
| runType | string | Run type (JOB_RUN, WORKFLOW_RUN, SUBMIT_RUN) |
| state | object | Run state information |
| lifeCycleState | string | Lifecycle state (QUEUED, PENDING, RUNNING, TERMINATING, TERMINATED, SKIPPED, INTERNAL_ERROR, BLOCKED, WAITING_FOR_RETRY) |
| resultState | string | Result state (SUCCESS, FAILED, TIMEDOUT, CANCELED, SUCCESS_WITH_FAILURES, UPSTREAM_FAILED, UPSTREAM_CANCELED, EXCLUDED) |
| stateMessage | string | Descriptive state message |
| userCancelledOrTimedout | boolean | Whether the run was cancelled by a user or timed out |
| startTime | number | Run start timestamp (epoch ms) |
| endTime | number | Run end timestamp (epoch ms) |
| hasMore | boolean | Whether more runs are available for pagination |
| nextPageToken | string | Token for fetching the next page of results |
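`startTimeFrom` and `startTimeTo` expect epoch milliseconds, which is easy to get wrong when working from wall-clock time. A minimal sketch of computing a "runs from the last 24 hours" window (the helper name is illustrative):

```python
import time

def last_24h_window(now_ms=None):
    """Return (startTimeFrom, startTimeTo) in epoch milliseconds."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    day_ms = 24 * 60 * 60 * 1000  # 86,400,000 ms in a day
    return now_ms - day_ms, now_ms

frm, to = last_24h_window(now_ms=1_700_000_000_000)
# to - frm == 86_400_000
```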

databricks_cancel_run

Cancel a running or pending Databricks job run. Cancellation is asynchronous; poll the run status to confirm termination.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| runId | number | Yes | The canonical identifier of the run to cancel |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| success | boolean | Whether the cancel request was accepted |

databricks_get_run_output

Get the output of a completed Databricks job run, including notebook results, error messages, and logs. For multi-task jobs, use the task run ID (not the parent run ID).

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| runId | number | Yes | The run ID to get output for. For multi-task jobs, use the task run ID |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| notebookOutput | object | Notebook task output (from dbutils.notebook.exit()) |
| result | string | Value passed to dbutils.notebook.exit() (max 5 MB) |
| truncated | boolean | Whether the result was truncated |
| error | string | Error message if the run failed or output is unavailable |
| errorTrace | string | Error stack trace if available |
| logs | string | Log output (last 5 MB) from spark_jar, spark_python, or python_wheel tasks |
| logsTruncated | boolean | Whether the log output was truncated |
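Notebooks often pass structured results by calling `dbutils.notebook.exit()` with a JSON string, so a consumer of this output usually tries to decode it while handling the error and truncation cases. A minimal sketch, assuming the output is shaped as in the table above (the helper name is illustrative):

```python
import json

def parse_notebook_result(output):
    """Decode the value passed to dbutils.notebook.exit(), if present."""
    if output.get("error"):
        raise RuntimeError(output["error"])
    nb = output.get("notebookOutput") or {}
    if nb.get("truncated"):
        return None  # result exceeded the 5 MB limit; treat as incomplete
    try:
        return json.loads(nb.get("result", ""))
    except json.JSONDecodeError:
        return nb.get("result")  # notebook exited with a plain string

# Hypothetical databricks_get_run_output result:
out = {"notebookOutput": {"result": '{"rows": 10}', "truncated": False}}
parsed = parse_notebook_result(out)
# parsed == {"rows": 10}
```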

databricks_list_clusters

List all clusters in a Databricks workspace including their state, configuration, and resource details.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| clusters | array | List of clusters in the workspace |
| clusterId | string | Unique cluster identifier |
| clusterName | string | Cluster display name |
| state | string | Current state (PENDING, RUNNING, RESTARTING, RESIZING, TERMINATING, TERMINATED, ERROR, UNKNOWN) |
| stateMessage | string | Human-readable state description |
| creatorUserName | string | Email of the cluster creator |
| sparkVersion | string | Spark runtime version (e.g., 13.3.x-scala2.12) |
| nodeTypeId | string | Worker node type identifier |
| driverNodeTypeId | string | Driver node type identifier |
| numWorkers | number | Number of worker nodes (for fixed-size clusters) |
| autoscale | object | Autoscaling configuration (null for fixed-size clusters) |
| minWorkers | number | Minimum number of workers |
| maxWorkers | number | Maximum number of workers |
| clusterSource | string | Origin (API, UI, JOB, MODELS, PIPELINE, PIPELINE_MAINTENANCE, SQL) |
| autoterminationMinutes | number | Minutes of inactivity before auto-termination (0 = disabled) |
| startTime | number | Cluster start timestamp (epoch ms) |
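Because fixed-size clusters report `numWorkers` while autoscaling clusters report an `autoscale` object instead, code that inspects cluster capacity has to handle both shapes. A minimal sketch, assuming the output fields above (the helper name and sample clusters are illustrative):

```python
def cluster_capacity(cluster):
    """Return (min, max) worker counts for fixed-size or autoscaling clusters."""
    auto = cluster.get("autoscale")
    if auto:
        return auto["minWorkers"], auto["maxWorkers"]
    n = cluster.get("numWorkers", 0)
    return n, n  # fixed-size: min and max coincide

fixed = {"clusterName": "etl-fixed", "numWorkers": 4, "autoscale": None}
scaling = {"clusterName": "adhoc", "autoscale": {"minWorkers": 2, "maxWorkers": 8}}
# cluster_capacity(fixed) == (4, 4); cluster_capacity(scaling) == (2, 8)
```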
