Version: Next

Cassandra

Testing

Important Capabilities

Capability | Status | Notes
Asset Containers | | Enabled by default
Detect Deleted Entities | | Optionally enabled via stateful_ingestion.remove_stale_metadata
Platform Instance | | Enabled by default
Schema Metadata | | Enabled by default

This plugin extracts the following:

  • Metadata for tables
  • Column types associated with each table column
  • The keyspace each table belongs to

CLI based Ingestion

Install the Plugin

The cassandra source works out of the box with acryl-datahub.

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: "cassandra"
  config:
    # Coordinates
    contact_point: "localhost"
    port: 9042

    # Credentials
    username: "admin"
    password: "password"

    # Cloud (Astra DB)
    # cloud_config:
    #   secure_connect_bundle: "Path to Secure Connect Bundle (.zip)"
    #   token: "Application Token"

    # Options
    keyspace_pattern:
      allow: [".*"]

sink:
  # config sinks
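
For DataStax Astra DB, the commented-out cloud_config keys in the starter recipe can be filled in as sketched below. The bundle path and token are placeholders. This sketch also assumes, which this page does not confirm, that contact_point, port, username, and password can be left out when cloud_config is set.

```yaml
source:
  type: "cassandra"
  config:
    # Assumption: host coordinates and username/password are omitted
    # when connecting through cloud_config.
    cloud_config:
      secure_connect_bundle: "/path/to/secure-connect-bundle.zip"  # placeholder path
      token: "<application token>"                                 # placeholder value
      # Optional timeouts in seconds (both default to 60)
      connect_timeout: 60
      request_timeout: 60

sink:
  # config sinks
```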

Config Details

Note that a . is used to denote nested fields in the YAML recipe.
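
As an illustration of the dot notation, the field names keyspace_pattern.deny, keyspace_pattern.ignoreCase, and stateful_ingestion.enabled would be written as nested keys under config. The values shown, including the ^system deny pattern, are examples only and not recommendations.

```yaml
source:
  type: "cassandra"
  config:
    keyspace_pattern:
      deny: ["^system.*"]   # keyspace_pattern.deny (example pattern)
      ignoreCase: true      # keyspace_pattern.ignoreCase
    stateful_ingestion:
      enabled: true         # stateful_ingestion.enabled
```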

Field | Description
contact_point (string) | Domain or IP address of the Cassandra instance (excluding port). Default: localhost
password (string) | Password credential associated with the specified username.
platform_instance (string) | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details.
port (integer) | Port number to connect to the Cassandra instance. Default: 9042
username (string) | Username credential with read access to the system_schema keyspace.
env (string) | The environment that all assets produced by this connector belong to. Default: PROD
cloud_config (CassandraCloudConfig) | Configuration for cloud-based Cassandra, such as DataStax Astra DB.
cloud_config.connect_timeout (integer) | Timeout in seconds for establishing new connections to Cassandra. Default: 60
cloud_config.request_timeout (integer) | Timeout in seconds for individual Cassandra requests. Default: 60
cloud_config.secure_connect_bundle (string) | File path to the Secure Connect Bundle (.zip) used for a secure connection to DataStax Astra DB. Default: (empty)
cloud_config.token (string) | The Astra DB application token used for authentication. Default: (empty)
keyspace_pattern (AllowDenyPattern) | Regex patterns to filter keyspaces for ingestion. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True}
keyspace_pattern.ignoreCase (boolean) | Whether to ignore case during pattern matching. Default: True
keyspace_pattern.allow (array) | List of regex patterns to include in ingestion. Default: ['.*']
keyspace_pattern.allow.string (string) |
keyspace_pattern.deny (array) | List of regex patterns to exclude from ingestion. Default: []
keyspace_pattern.deny.string (string) |
stateful_ingestion (StatefulStaleMetadataRemovalConfig) | Configuration for stateful ingestion and stale metadata removal.
stateful_ingestion.enabled (boolean) | Whether to enable stateful ingestion. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified; otherwise False. Default: False
stateful_ingestion.remove_stale_metadata (boolean) | Soft-deletes entities that were present in the last successful run but are missing from the current run, when stateful_ingestion is enabled. Default: True
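
The stateful_ingestion fields above can be combined as in the sketch below. The pipeline_name value and the server URL are placeholders chosen for illustration. According to the description of stateful_ingestion.enabled, a pipeline_name together with a datahub-rest sink already turns stateful ingestion on, so setting enabled explicitly here is redundant and only makes the intent visible.

```yaml
pipeline_name: "cassandra_ingestion"   # hypothetical name; any stable identifier
source:
  type: "cassandra"
  config:
    contact_point: "localhost"
    port: 9042
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: true      # soft-delete entities missing from this run
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"    # placeholder DataHub server address
```

Keeping pipeline_name unchanged between runs matters under this setup, because stale-entity detection compares the current run against the last successful run.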

Code Coordinates

  • Class Name: datahub.ingestion.source.cassandra.cassandra.CassandraSource
  • Browse on GitHub

Questions

If you've got any questions on configuring ingestion for Cassandra, feel free to ping us on our Slack.