From 6feea46a8e9426c5025ddd8bbd1401fb2a47e19a Mon Sep 17 00:00:00 2001 From: April M <36110273+aimurphy@users.noreply.github.com> Date: Wed, 15 Apr 2026 09:49:38 -0700 Subject: [PATCH 01/12] move zdm proxy logs to new page --- modules/ROOT/nav.adoc | 1 + modules/ROOT/pages/change-read-routing.adoc | 25 +-- .../ROOT/pages/deploy-proxy-monitoring.adoc | 2 +- modules/ROOT/pages/dse-migration-paths.adoc | 17 +- .../ROOT/pages/enable-async-dual-reads.adoc | 13 +- .../ROOT/pages/manage-proxy-instances.adoc | 10 +- modules/ROOT/pages/metrics.adoc | 15 +- .../ROOT/pages/setup-ansible-playbooks.adoc | 11 +- modules/ROOT/pages/troubleshooting-tips.adoc | 179 +----------------- modules/ROOT/pages/zdm-logs.adoc | 176 +++++++++++++++++ modules/ROOT/partials/zdm-logs-intro.adoc | 1 + 11 files changed, 208 insertions(+), 242 deletions(-) create mode 100644 modules/ROOT/pages/zdm-logs.adoc create mode 100644 modules/ROOT/partials/zdm-logs-intro.adoc diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc index e576fbb3..cb350533 100644 --- a/modules/ROOT/nav.adoc +++ b/modules/ROOT/nav.adoc @@ -24,6 +24,7 @@ * xref:ROOT:enable-async-dual-reads.adoc[] * xref:ROOT:change-read-routing.adoc[] * xref:ROOT:connect-clients-to-target.adoc[] +* xref:ROOT:zdm-logs.adoc[] * xref:ROOT:troubleshooting-tips.adoc[] * xref:ROOT:faqs.adoc[] * Release notes diff --git a/modules/ROOT/pages/change-read-routing.adoc b/modules/ROOT/pages/change-read-routing.adoc index 170b4216..f104a5c9 100644 --- a/modules/ROOT/pages/change-read-routing.adoc +++ b/modules/ROOT/pages/change-read-routing.adoc @@ -48,21 +48,17 @@ By design, the data is expected to be identical on both clusters, and your clien For this reason, the only way to manually test read routing is to intentionally write mismatched test data to the clusters. Then, you can send a read request to {product-proxy} and see which cluster-specific data is returned, which indicates the cluster that received the read request. -There are two ways to do this. +There are two ways to do this: -[tabs] -====== Manually create mismatched tables:: -+ --- To manually create mismatched data, you can create a test table on each cluster, and then write different data to each table. - ++ [IMPORTANT] ==== When you write the mismatched data to the tables, make sure you connect to each cluster directly. Don't connect to {product-proxy}, because {product-proxy} will, by design, write the same data to both clusters through dual writes. ==== - ++ . Create a small test table on both clusters, such as a simple key/value table. You can use an existing keyspace, or create one for this test specifically. For example: @@ -71,7 +67,7 @@ For example: ---- CREATE TABLE test_keyspace.test_table(k TEXT PRIMARY KEY, v TEXT); ---- - ++ . Use `cqlsh` to connect _directly to the origin cluster_, and then insert a row with any key and a value that is specific to the origin cluster. For example: + @@ -79,7 +75,7 @@ For example: ---- INSERT INTO test_keyspace.test_table(k, v) VALUES ('1', 'Hello from the origin cluster!'); ---- - ++ . Use `cqlsh` to connect _directly to the target cluster_, and then insert a row with the same key and a value that is specific to the target cluster. For example: + @@ -87,7 +83,7 @@ For example: ---- INSERT INTO test_keyspace.test_table(k, v) VALUES ('1', 'Hello from the target cluster!'); ---- - ++ . Use `cqlsh` to xref:connect-clients-to-proxy.adoc#_connecting_cqlsh_to_the_zdm_proxy[connect to {product-proxy}], and then issue a read request to your test table. 
For example: + @@ -99,22 +95,19 @@ SELECT * FROM test_keyspace.test_table WHERE k = '1'; The cluster-specific value in the response tells you which cluster received the read request. For example: + +-- * If the read request was correctly routed to the target cluster, the result from `test_table` contains `Hello from the target cluster!`. * If the read request was incorrectly routed to the origin cluster, the result from `test_table` contains `Hello from the origin cluster!`. - +-- ++ . When you're done testing, drop the test tables from both clusters. If you created dedicated test keyspaces, drop the keyspaces as well. --- Use the Themis sample client application:: -+ --- The xref:connect-clients-to-proxy.adoc#_themis_client[Themis sample client application] connects directly to the origin cluster, the target cluster, and {product-proxy}. It inserts some test data in its own, dedicated table. Then, you can view the results of reads from each source. For more information, see the https://github.com/absurdfarce/themis/blob/main/README.md[Themis README]. --- -====== === System tables cannot validate read routing diff --git a/modules/ROOT/pages/deploy-proxy-monitoring.adoc b/modules/ROOT/pages/deploy-proxy-monitoring.adoc index a8eb7279..84494215 100644 --- a/modules/ROOT/pages/deploy-proxy-monitoring.adoc +++ b/modules/ROOT/pages/deploy-proxy-monitoring.adoc @@ -212,7 +212,7 @@ For the majority of the migration, leave this set to the default value of `ORIGI At the end of the migration, when you're preparing to switch over to the target cluster permanently, you can change it to `TARGET` after migrating all data from the origin cluster. * `read_mode`: Controls the xref:enable-async-dual-reads.adoc[asynchronous dual reads] feature. Until you reach Phase 3, leave this set to the default value of `PRIMARY_ONLY`. -* `log_level`: You might need to xref:ROOT:troubleshooting-tips.adoc#proxy-logs[modify the log level] when troubleshooting issues. +* `log_level`: You might need to xref:ROOT:zdm-logs.adoc[modify the log level] when troubleshooting issues. Unless you are investigating an issue, leave this set to the default value of `INFO`. [IMPORTANT] diff --git a/modules/ROOT/pages/dse-migration-paths.adoc b/modules/ROOT/pages/dse-migration-paths.adoc index 1d68f039..3732ed00 100644 --- a/modules/ROOT/pages/dse-migration-paths.adoc +++ b/modules/ROOT/pages/dse-migration-paths.adoc @@ -20,31 +20,24 @@ For information about clusters that support the {product-short} tools, including The tools and process for data migration to or from {dse-short} depends on your {dse-short} version and the other database's platform or version. -[tabs] -====== Migrate data to {dse-short}:: +The following information provides guidance on migrations _to_ {dse-short}, with a focus on data transfer tools: + -- -The following information provides guidance on migrations _to_ {dse-short}, with a focus on data transfer tools: - * xref:6.9@dse:managing:operations/migrate-data.adoc[Migrate to {dse-short} 6.9] * xref:6.8@dse:managing:operations/migrate-data.adoc[Migrate to {dse-short} 6.8] * xref:5.1@dse:managing:operations/migrate-data.adoc[Migrate to {dse-short} 5.1] - -Generally, {company} recommends migrating to the latest version of {dse-short}, unless you have a specific functional requirement or a compatibility issue that requires migrating to an earlier version. 
-- ++ +Generally, {company} recommends migrating to the latest version of {dse-short}, unless you have a specific functional requirement or a compatibility issue that requires migrating to an earlier version. Migrate data from {dse-short}:: -+ --- When migrating _from_ {dse-short} to another {cass-short}-based database, follow the migration guidance for your target database to determine cluster compatibility, migration options, and recommendations. For example, for {astra-db}, see xref:ROOT:astra-migration-paths.adoc[], and for {hcd-short}, see xref:ROOT:hcd-migration-paths.adoc[]. - ++ For information about origin and target clusters that are supported by the {product-short} tools, see xref:ROOT:zdm-proxy-migration-paths.adoc[]. - ++ If your target database isn't directly compatible with a migration from {dse-short}, you might need to take interim steps to prepare your data for migration, such as upgrading your {dse-short} version, modifying the data in your existing database to be compatible with the target database, or running an extract, transform, load (ETL) pipeline. --- -====== == Migrate your code diff --git a/modules/ROOT/pages/enable-async-dual-reads.adoc b/modules/ROOT/pages/enable-async-dual-reads.adoc index efe1edb5..7be89a87 100644 --- a/modules/ROOT/pages/enable-async-dual-reads.adoc +++ b/modules/ROOT/pages/enable-async-dual-reads.adoc @@ -38,26 +38,21 @@ Then, perform a rolling restart of your {product-proxy} instances to apply the c . Edit `vars/zdm_proxy_core_config.yml`, and then set the `read_mode` variable: + -[tabs] -====== -Enable asynchronous dual reads:: +* To enable asynchronous dual reads, set `read_mode` to `DUAL_ASYNC_ON_SECONDARY`: + --- [source,yml] ---- read_mode: DUAL_ASYNC_ON_SECONDARY ---- --- -Disable asynchronous dual reads (default):: +* To disable asynchronous dual reads (default), set `read_mode` to `PRIMARY_ONLY`: + --- [source,yml] ---- read_mode: PRIMARY_ONLY ---- --- -====== ++ +The default configuration is disable (`PRIMARY_ONLY`). . xref:ROOT:manage-proxy-instances.adoc#perform-a-rolling-restart-of-the-proxies[Perform a rolling restart] to apply the configuration change to your {product-proxy} instances. diff --git a/modules/ROOT/pages/manage-proxy-instances.adoc b/modules/ROOT/pages/manage-proxy-instances.adoc index 4ffbd819..f66cf03e 100644 --- a/modules/ROOT/pages/manage-proxy-instances.adoc +++ b/modules/ROOT/pages/manage-proxy-instances.adoc @@ -12,7 +12,7 @@ Rolling restarts of the {product-proxy} instances are useful to apply configurat [IMPORTANT] ==== A rolling restart is a destructive action because it stops the previous containers, and then starts new containers. -xref:ROOT:troubleshooting-tips.adoc#proxy-logs[Collect the logs] before you apply the configuration change if you want to keep them. +xref:ROOT:zdm-logs.adoc[Collect the logs] before you apply the configuration change if you want to keep them. ==== [tabs] @@ -82,8 +82,8 @@ To avoid downtime, wait for each instance to fully restart and begin receiving t == Inspect {product-proxy} logs -{product-proxy} logs can help you verify that your {product-proxy} instances are operating normally, investigate how processes are executed, and troubleshoot issues. -For information about configuring, retrieving, and interpreting {product-proxy} logs, see xref:ROOT:troubleshooting-tips.adoc#proxy-logs[Viewing and interpreting {product-proxy} logs]. +include::ROOT:partial$zdm-logs-intro.adoc[] +For more information, see xref:ROOT:zdm-logs.adoc[]. 
[[change-mutable-config-variable]] == Change mutable configuration variables @@ -119,7 +119,7 @@ Set the {product-proxy} log level as `INFO` (default) or `DEBUG`. Only use `DEBUG` while temporarily troubleshooting an issue. Revert to `INFO` as soon as possible because the extra logging can impact performance slightly. + -For more information, see xref:ROOT:troubleshooting-tips.adoc#proxy-logs[Check {product-proxy} logs]. +For more information, see xref:ROOT:zdm-logs.adoc[]. === Mutable variables in `vars/zdm_proxy_cluster_config.yml` @@ -285,7 +285,7 @@ All containers are re-created with the given image version. [IMPORTANT] ==== A version change is a destructive action because the rolling restart playbook removes the previous containers and their logs, replacing them with new containers using the new image. -xref:ROOT:troubleshooting-tips.adoc#proxy-logs[Collect the logs] before you run the playbook if you want to keep them. +xref:ROOT:zdm-logs.adoc[Collect the logs] before you run the playbook if you want to keep them. ==== To check your current {product-proxy} version, see xref:ROOT:troubleshooting-tips.adoc#check-version[Check your {product-proxy} version]. diff --git a/modules/ROOT/pages/metrics.adoc b/modules/ROOT/pages/metrics.adoc index 56f53193..b12e4d02 100644 --- a/modules/ROOT/pages/metrics.adoc +++ b/modules/ROOT/pages/metrics.adoc @@ -83,11 +83,8 @@ If you deployed your monitoring stack on another machine, replace `**MONITORING_ For example, after deploying the {product-proxy} monitoring stack, you can use the `liveness` and `readiness` HTTP endpoints to confirm that your {product-proxy} instances are running. -[tabs] -====== -Liveliness endpoint:: -+ --- +=== Liveliness endpoint + [source,plaintext,subs="+quotes"] ---- http://**ZDM_PROXY_PRIVATE_IP**:**METRICS_PORT**/health/liveness @@ -114,11 +111,8 @@ Example request with plaintext values: curl -G "http://172.18.10.40:14001/health/liveliness" ---- --- +=== Readiness endpoint -Readiness endpoint:: -+ --- [source,plaintext,subs="+quotes"] ---- http://**ZDM_PROXY_PRIVATE_IP**:**METRICS_PORT**/health/readiness @@ -166,9 +160,6 @@ Example result: } ---- --- -====== - == Inspect {product-proxy} metrics {product-proxy} exposes an HTTP endpoint that returns metrics in the Prometheus format. diff --git a/modules/ROOT/pages/setup-ansible-playbooks.adoc b/modules/ROOT/pages/setup-ansible-playbooks.adoc index ea849d8c..0248a0b3 100644 --- a/modules/ROOT/pages/setup-ansible-playbooks.adoc +++ b/modules/ROOT/pages/setup-ansible-playbooks.adoc @@ -38,22 +38,15 @@ You don't need to install Docker on any other machines. If you don't want to pull images from a specific registry, or your servers don't connect to the public internet, there are two alternative Docker configurations you can use. -[tabs] -====== Pull from local cache:: -+ --- If your servers can connect directly to a local Docker registry, the servers can pull containers from the public internet by way of the local Docker registry. With this option, only the local Docker registry is connected to the public internet. For instructions, see the Docker documentation on https://docs.docker.com/docker-hub/mirror/[configuring a pull-through cache]. --- Airgapped local registry:: -+ --- Local registries that aren't connected to the internet require administrators to manually add containers to their registry. 
For {product-utility}, you need the following five containers to install and configure the jumphost, {product-proxy}, and monitoring: - ++ [source,plaintext] ---- grafana/grafana:7.5.17 @@ -62,8 +55,6 @@ datastax/zdm-ansible:2.x prom/node-exporter:latest datastax/zdm-proxy:2.x ---- --- -====== == Use a jumphost to deploy the Ansible Control Host container diff --git a/modules/ROOT/pages/troubleshooting-tips.adoc b/modules/ROOT/pages/troubleshooting-tips.adoc index b87d6ac4..ee261414 100644 --- a/modules/ROOT/pages/troubleshooting-tips.adoc +++ b/modules/ROOT/pages/troubleshooting-tips.adoc @@ -10,183 +10,8 @@ For additional assistance, you can <>, contact [#proxy-logs] == Check {product-proxy} logs -{product-proxy} logs can help you verify that your {product-proxy} instances are operating normally, investigate how processes are executed, and troubleshoot issues. - -[#set-the-zdm-proxy-log-level] -=== Set the {product-proxy} log level - -Set the {product-proxy} log level to print the messages that you need. - -The default log level is `INFO`, which is adequate for most logging. - -If you need more detail for temporary troubleshooting, you can set the log level to `DEBUG`. -However, this can slightly degrade performance, and {company} recommends that you revert to `INFO` logging as soon as possible. - -How you set the log level depends on how you deployed {product-proxy}: - -* If you used {product-automation} to deploy {product-proxy}, set `log_level` in `vars/zdm_proxy_core_config.yml`, and then run the `rolling_update_zdm_proxy.yml` playbook. -For more information, see xref:manage-proxy-instances.adoc#change-mutable-config-variable[Change a mutable configuration variable]. - -* If you didn't use {product-automation} to deploy {product-proxy}, set the `ZDM_LOG_LEVEL` environment variable on each proxy instance, and then restart each instance. - -=== Get {product-proxy} log files - -If you used {product-automation} to deploy {product-proxy}, then you can get logs for a single proxy instance, and you can use a playbook to retrieve logs for all instances. - -[tabs] -====== -View or tail logs for one instance:: -+ --- -{product-proxy} runs as a Docker container on each proxy host. - -To view the logs for a single {product-proxy} instance, connect to a proxy host, and then run the following command: - -[source,bash] ----- -docker container logs zdm-proxy-container ----- - -To tail (stream) the logs as they are written, use the `--follow` (`-f`) option: - -[source,bash] ----- -docker container logs zdm-proxy-container -f ----- - -Keep in mind that Docker logs are deleted if the container is re-created. --- - -Collect logs for multiple instances:: -+ --- -{product-automation} has a dedicated playbook, `collect_zdm_proxy_logs.yml`, that you can use to collect logs for all {product-proxy} instances in a deployment. - -You can view the playbook's configuration in `vars/zdm_proxy_log_collection_config.yml`, but no changes are required to run it. - -. Connect to the Ansible Control Host Docker container. -You can do this from the jumphost machine by running the following command: -+ -[source,bash] ----- -docker exec -it zdm-ansible-container bash ----- -+ -.Result -[%collapsible] -==== -[source,bash] ----- -ubuntu@52772568517c:~$ ----- -==== - -. 
Run the log collection playbook: -+ -[source,bash] ----- -ansible-playbook collect_zdm_proxy_logs.yml -i zdm_ansible_inventory ----- -+ -This playbook creates a single zip file, `zdm_proxy_logs_**TIMESTAMP**.zip`, that contains the logs from all proxy instances. -This archive is stored on the Ansible Control Host Docker container at `/home/ubuntu/zdm_proxy_archived_logs`. - -. To copy the archive from the container to the jumphost, open a shell on the jumphost, and then run the following command: -+ -[source,bash,subs="+quotes"] ----- -docker cp zdm-ansible-container:/home/ubuntu/zdm_proxy_archived_logs/zdm_proxy_logs_**TIMESTAMP**.zip **DESTINATION_DIRECTORY_ON_JUMPHOST** ----- -+ -Replace the following: -+ -* `**TIMESTAMP**`: The timestamp from the name of your log file archive -* `**DESTINATION_DIRECTORY_ON_JUMPHOST**`: The path to the directory where you want to copy the archive --- - -Get logs for deployments that don't use {product-automation}:: -+ --- -If you didn't use {product-automation} to deploy {product-proxy}, you must access the logs another way, depending on your deployment configuration and infrastructure. - -For example, if you used Docker, you can use the following command to export a container's logs to a `log.txt` file: - -[source,bash] ----- -docker logs my-container > log.txt ----- - -Keep in mind that Docker logs are deleted if the container is re-created. --- -====== - -=== Message levels - -Some log messages contain text that seems like an error but they aren't errors. -Instead, the message's `level` indicates severity: - -* `level=info`: Expected and normal messages that typically aren't errors. - -* `level=debug`: Expected and normal messages that typically aren't errors. -However, they can help you find the source of a problem by providing information about the environment and conditions when the error occurred. -+ -`debug` messages are only recorded if you <>. - -* `level=warn`: Reports an event that wasn't fatal to the overall process but might indicate an issue with an individual request or connection. - -* `level=error`: Indicates an issue with {product-proxy}, the client application, or the clusters. -These messages require further examination. - -If the meaning of a `warn` or `error` message isn't clear, you can <>. - -=== Common log messages - -Here are some of the most common messages in the {product-proxy} logs. - -{product-proxy} startup message:: -If the log level doesn't filter out `info` entries, you can look for a `Proxy started` log message to verify that {product-proxy} started correctly. -For example: -+ -[source,json] ----- -{"log":"time=\"2023-01-13T11:50:48Z\" level=info -msg=\"Proxy started. Waiting for SIGINT/SIGTERM to shutdown. -\"\n","stream":"stderr","time":"2023-01-13T11:50:48.522097083Z"} ----- - -{product-proxy} configuration message:: -If the log level doesn't filter out `info` entries, the first few lines of a {product-proxy} log file contain all configuration variables and values in a long JSON string. -+ -The following example log message is truncated for readability: -+ -[source,json] ----- -{"log":"time=\"2023-01-13T11:50:48Z\" level=info -msg=\"Parsed configuration: {\\\"ProxyIndex\\\":1,\\\"ProxyAddresses\\\":"...", -...TRUNCATED... -","stream":"stderr","time":"2023-01-13T11:50:48.339225051Z"} ----- -+ -Configuration settings can help with troubleshooting. -+ -To make this message easier to read, pass it through a JSON formatter or paste it into a text editor that can reformat JSON. 
- -Protocol log messages:: -There are cases where protocol errors are fatal, and they will kill an active connection that was being used to serve requests. -However, it is also possible to get normal protocol log messages that contain wording that sounds like an error. -+ -For example, the following `DEBUG` message contains the phrases `force a downgrade` and `unsupported protocol version`, which can sound like errors: -+ -[source,json] ----- -{"log":"time=\"2023-01-13T12:02:12Z\" level=debug msg=\"[TARGET-CONNECTOR] -Protocol v5 detected while decoding a frame. Returning a protocol message -to the client to force a downgrade: PROTOCOL (code=Code Protocol [0x0000000A], -msg=Invalid or unsupported protocol version (5)).\"\n","stream":"stderr","time":"2023-01-13T12:02:12.379287735Z"} ----- -+ -However, `level=debug` indicates that this is not an error. -Instead, this is a normal part of protocol version negotiation (handshake) during connection initialization. +include::ROOT:partial$zdm-logs-intro.adoc[] +For more information, see xref:ROOT:zdm-logs.adoc[]. [#check-version] == Check your {product-proxy} version diff --git a/modules/ROOT/pages/zdm-logs.adoc b/modules/ROOT/pages/zdm-logs.adoc new file mode 100644 index 00000000..54708b40 --- /dev/null +++ b/modules/ROOT/pages/zdm-logs.adoc @@ -0,0 +1,176 @@ += Configure and read {product-proxy} logs +:navtitle: Get {product-short} logs +:description: Configure and retrieve {product-proxy} logs. + +include::ROOT:partial$zdm-logs-intro.adoc[] + +[#set-the-zdm-proxy-log-level] +== Set the {product-proxy} log level + +Set the {product-proxy} log level to print the messages that you need. + +The default log level is `INFO`, which is adequate for most logging. + +If you need more detail for temporary troubleshooting, you can set the log level to `DEBUG`. +However, this can slightly degrade performance, and {company} recommends that you revert to `INFO` logging as soon as possible. + +How you set the log level depends on how you deployed {product-proxy}: + +* If you used {product-automation} to deploy {product-proxy}, set `log_level` in `vars/zdm_proxy_core_config.yml`, and then run the `rolling_update_zdm_proxy.yml` playbook. +For more information, see xref:manage-proxy-instances.adoc#change-mutable-config-variable[Change a mutable configuration variable]. + +* If you didn't use {product-automation} to deploy {product-proxy}, set the `ZDM_LOG_LEVEL` environment variable on each proxy instance, and then restart each instance. + +== Get {product-proxy} log files + +If you used {product-automation} to deploy {product-proxy}, then you can get logs for a single proxy instance, and you can use a playbook to retrieve logs for all instances. + +=== View or tail logs for one instance + +{product-proxy} runs as a Docker container on each proxy host. + +To view the logs for a single {product-proxy} instance, connect to a proxy host, and then run the following command: + +[source,bash] +---- +docker container logs zdm-proxy-container +---- + +To tail (stream) the logs as they are written, use the `--follow` (`-f`) option: + +[source,bash] +---- +docker container logs zdm-proxy-container -f +---- + +Keep in mind that Docker logs are deleted if the container is re-created. + +=== Collect logs for multiple instances + +{product-automation} has a dedicated playbook, `collect_zdm_proxy_logs.yml`, that you can use to collect logs for all {product-proxy} instances in a deployment. 
+ +You can view the playbook's configuration in `vars/zdm_proxy_log_collection_config.yml`, but no changes are required to run it. + +. Connect to the Ansible Control Host Docker container. +You can do this from the jumphost machine by running the following command: ++ +[source,bash] +---- +docker exec -it zdm-ansible-container bash +---- ++ +.Result +[%collapsible] +==== +[source,bash] +---- +ubuntu@52772568517c:~$ +---- +==== + +. Run the log collection playbook: ++ +[source,bash] +---- +ansible-playbook collect_zdm_proxy_logs.yml -i zdm_ansible_inventory +---- ++ +This playbook creates a single zip file, `zdm_proxy_logs_**TIMESTAMP**.zip`, that contains the logs from all proxy instances. +This archive is stored on the Ansible Control Host Docker container at `/home/ubuntu/zdm_proxy_archived_logs`. + +. To copy the archive from the container to the jumphost, open a shell on the jumphost, and then run the following command: ++ +[source,bash,subs="+quotes"] +---- +docker cp zdm-ansible-container:/home/ubuntu/zdm_proxy_archived_logs/zdm_proxy_logs_**TIMESTAMP**.zip **DESTINATION_DIRECTORY_ON_JUMPHOST** +---- ++ +Replace the following: ++ +* `**TIMESTAMP**`: The timestamp from the name of your log file archive +* `**DESTINATION_DIRECTORY_ON_JUMPHOST**`: The path to the directory where you want to copy the archive + +=== Get logs for deployments that don't use {product-automation} + +If you didn't use {product-automation} to deploy {product-proxy}, you must access the logs another way, depending on your deployment configuration and infrastructure. + +For example, if you used Docker, you can use the following command to export a container's logs to a `log.txt` file: + +[source,bash] +---- +docker logs my-container > log.txt +---- + +Keep in mind that Docker logs are deleted if the container is re-created. + +== Message levels + +Some log messages contain text that seems like an error but they aren't errors. +Instead, the message's `level` indicates severity: + +* `level=info`: Expected and normal messages that typically aren't errors. + +* `level=debug`: Expected and normal messages that typically aren't errors. +However, they can help you find the source of a problem by providing information about the environment and conditions when the error occurred. ++ +`debug` messages are only recorded if you <>. + +* `level=warn`: Reports an event that wasn't fatal to the overall process but might indicate an issue with an individual request or connection. + +* `level=error`: Indicates an issue with {product-proxy}, the client application, or the clusters. +These messages require further examination. + +If the meaning of a `warn` or `error` message isn't clear, you can <>. + +== Common log messages + +Here are some of the most common messages in the {product-proxy} logs. + +{product-proxy} startup message:: +If the log level doesn't filter out `info` entries, you can look for a `Proxy started` log message to verify that {product-proxy} started correctly. +For example: ++ +[source,json] +---- +{"log":"time=\"2023-01-13T11:50:48Z\" level=info +msg=\"Proxy started. Waiting for SIGINT/SIGTERM to shutdown. +\"\n","stream":"stderr","time":"2023-01-13T11:50:48.522097083Z"} +---- + +{product-proxy} configuration message:: +If the log level doesn't filter out `info` entries, the first few lines of a {product-proxy} log file contain all configuration variables and values in a long JSON string. 
++ +The following example log message is truncated for readability: ++ +[source,json] +---- +{"log":"time=\"2023-01-13T11:50:48Z\" level=info +msg=\"Parsed configuration: {\\\"ProxyIndex\\\":1,\\\"ProxyAddresses\\\":"...", +...TRUNCATED... +","stream":"stderr","time":"2023-01-13T11:50:48.339225051Z"} +---- ++ +Configuration settings can help with troubleshooting. ++ +To make this message easier to read, pass it through a JSON formatter or paste it into a text editor that can reformat JSON. + +Protocol log messages:: +There are cases where protocol errors are fatal, and they will kill an active connection that was being used to serve requests. +However, it is also possible to get normal protocol log messages that contain wording that sounds like an error. ++ +For example, the following `DEBUG` message contains the phrases `force a downgrade` and `unsupported protocol version`, which can sound like errors: ++ +[source,json] +---- +{"log":"time=\"2023-01-13T12:02:12Z\" level=debug msg=\"[TARGET-CONNECTOR] +Protocol v5 detected while decoding a frame. Returning a protocol message +to the client to force a downgrade: PROTOCOL (code=Code Protocol [0x0000000A], +msg=Invalid or unsupported protocol version (5)).\"\n","stream":"stderr","time":"2023-01-13T12:02:12.379287735Z"} +---- ++ +However, `level=debug` indicates that this is not an error. +Instead, this is a normal part of protocol version negotiation (handshake) during connection initialization. + +== See also + +* xref:ROOT:troubleshooting-tips.adoc[] \ No newline at end of file diff --git a/modules/ROOT/partials/zdm-logs-intro.adoc b/modules/ROOT/partials/zdm-logs-intro.adoc new file mode 100644 index 00000000..04bd663d --- /dev/null +++ b/modules/ROOT/partials/zdm-logs-intro.adoc @@ -0,0 +1 @@ +{product-proxy} logs can help you verify that your {product-proxy} instances are operating normally, investigate how processes are executed, and troubleshoot issues. \ No newline at end of file From 66055afaef8bb00cef7a2193aaa7a35e058078ff Mon Sep 17 00:00:00 2001 From: April M <36110273+aimurphy@users.noreply.github.com> Date: Wed, 15 Apr 2026 09:52:48 -0700 Subject: [PATCH 02/12] remove small tabs --- .../ROOT/pages/deploy-proxy-monitoring.adoc | 19 +++++-------------- modules/ROOT/pages/troubleshooting-tips.adoc | 16 ++++------------ 2 files changed, 9 insertions(+), 26 deletions(-) diff --git a/modules/ROOT/pages/deploy-proxy-monitoring.adoc b/modules/ROOT/pages/deploy-proxy-monitoring.adoc index 84494215..aae26a6b 100644 --- a/modules/ROOT/pages/deploy-proxy-monitoring.adoc +++ b/modules/ROOT/pages/deploy-proxy-monitoring.adoc @@ -75,28 +75,19 @@ If you previously populated the variables in `zdm_proxy_core_config.yml`, these However, consider updating your configuration to use the new file and take advantage of new features in later releases. ==== -. Get connection credentials for your origin and target clusters. +. Get connection credentials for your origin and target clusters: + -[tabs] -====== Self-managed clusters:: -+ --- For self-managed clusters with authentication enabled, you need a valid username and password for the cluster. - ++ If authentication isn't enabled, no credentials are required. --- {astra-db}:: -+ --- For {astra-db} databases, xref:astra-db-serverless:administration:manage-application-tokens.adoc[generate an application token] with a role that can read and write to your database, such as the {database-administrator-role} role, and then store the token securely. 
- -At minimum, store the core token that is prefixed by `AstraCS:...`. - ++ +At minimum, store the primary token that is prefixed by `AstraCS:...`. ++ For legacy authentication to earlier {astra-db} databases with an older token generated prior to the unified `AstraCS` token, you can use the `clientId` and `secret` instead of the core token. --- -====== . Edit the `zdm_proxy_cluster_config.yml` file. The `vi` and `nano` text editors are available in the container. diff --git a/modules/ROOT/pages/troubleshooting-tips.adoc b/modules/ROOT/pages/troubleshooting-tips.adoc index ee261414..918c8ca1 100644 --- a/modules/ROOT/pages/troubleshooting-tips.adoc +++ b/modules/ROOT/pages/troubleshooting-tips.adoc @@ -462,41 +462,33 @@ These are reported as `level=debug`, so {product-proxy} isn't affected by them. There are two ways to resolve this issue: -[tabs] -====== Disable {metrics-collector} (recommended):: + --- . On the origin {dse-short} cluster, disable {metrics-collector}: + [source,bash] ---- dsetool insights_config --mode DISABLED ---- - ++ . Run the following command to verify that `mode` is set to `DISABLED`: + [source,bash] ---- dsetool insights_config --show_config ---- --- Grant InsightsRpc permissions:: -+ --- Only use this option if you cannot disable {metrics-collector}. - ++ Using a superuser role, grant the appropriate permissions to the user named in the logs: - ++ [source,bash,subs="+quotes"] ---- GRANT EXECUTE ON REMOTE OBJECT InsightsRpc TO **USER**; ---- - ++ Replace **USER** with the actual username given in the logs. --- -====== [#report-an-issue] == Report an issue From 33a09bb20567c4a0f02251432fc530e16cccc8a4 Mon Sep 17 00:00:00 2001 From: April M <36110273+aimurphy@users.noreply.github.com> Date: Wed, 15 Apr 2026 10:08:15 -0700 Subject: [PATCH 03/12] remove some collapsibles and move why migrate --- .../ROOT/pages/cassandra-data-migrator.adoc | 18 +++++---------- modules/ROOT/pages/components.adoc | 13 +++++++++++ .../ROOT/pages/connect-clients-to-target.adoc | 8 +++---- .../ROOT/pages/deploy-proxy-monitoring.adoc | 14 ++---------- .../ROOT/pages/feasibility-checklists.adoc | 6 ++--- modules/ROOT/pages/introduction.adoc | 22 ++++--------------- 6 files changed, 30 insertions(+), 51 deletions(-) diff --git a/modules/ROOT/pages/cassandra-data-migrator.adoc b/modules/ROOT/pages/cassandra-data-migrator.adoc index 526001f6..29346d1a 100644 --- a/modules/ROOT/pages/cassandra-data-migrator.adoc +++ b/modules/ROOT/pages/cassandra-data-migrator.adoc @@ -328,24 +328,18 @@ Specifically, see the {cass-migrator-repo}/blob/main/src/resources/cdm-detailed. == Troubleshoot {cass-migrator-short} -.Java NoSuchMethodError -[%collapsible] -==== +Java NoSuchMethodError:: If you installed {spark-short} as a JAR file, and your {spark-short} and Scala versions aren't compatible with your installed version of {cass-migrator-short}, {cass-migrator-short} jobs can throw exceptions such a the following: - ++ [source,console] ---- Exception in thread "main" java.lang.NoSuchMethodError: 'void scala.runtime.Statics.releaseFence()' ---- - ++ Make sure that your {spark-short} binary is compatible with your {cass-migrator-short} version. If you installed an earlier version of {cass-migrator-short}, you might need to install an earlier {spark-short} binary. 
-==== -.Rerun a failed or partially completed job -[%collapsible] -==== +Rerun a failed or partially completed job:: You can use the `track-run` feature to track the progress of a migration or validation, and then, if necessary, use the `run-id` to rerun a failed job from the last successful migration or validation point. - -For more information, see the {cass-migrator-repo}[{cass-migrator-short} repository] and the {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file]. -==== \ No newline at end of file ++ +For more information, see the {cass-migrator-repo}[{cass-migrator-short} repository] and the {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file]. \ No newline at end of file diff --git a/modules/ROOT/pages/components.adoc b/modules/ROOT/pages/components.adoc index 408b5ff3..474b602a 100644 --- a/modules/ROOT/pages/components.adoc +++ b/modules/ROOT/pages/components.adoc @@ -10,6 +10,19 @@ For live migrations, {product-proxy} orchestrates activity-in-transition on your To move and validate data, you use data migration tools. You can use these tools alone or with {product-proxy}. +== When to use migration tools + +You can use migration tools for database platform migrations, upgrades, and other infrastructure changes that require synchronizing clusters for a period of time. +For example: + +* You want to move to a different database provider, such as {dse} to {hcd}. + +* You need to upgrade a cluster to a new version or new infrastructure, and an in-place upgrade is risky or impossible. + +* You want to move client applications from shared clusters to dedicated clusters for greater control over individual configurations. + +* You want to consolidate client applications running on separate clusters onto one shared cluster to minimize sprawl and maintenance. + == {product-proxy} The main component of the {company} {product} toolkit is {product-proxy-repo}[{product-proxy}], which is designed to be a lightweight proxy that handles all real-time requests generated by your client applications during the migration process. diff --git a/modules/ROOT/pages/connect-clients-to-target.adoc b/modules/ROOT/pages/connect-clients-to-target.adoc index 4597ff3e..887494ea 100644 --- a/modules/ROOT/pages/connect-clients-to-target.adoc +++ b/modules/ROOT/pages/connect-clients-to-target.adoc @@ -113,10 +113,9 @@ a| Required. * Client ID and secret authentication (Legacy): Set to the `secret` generated with your token. |=== -.Driver pseudocode comparison -[%collapsible] -==== -The two pseudocode examples provide a simplified comparison of the way a {cass-short} driver interacts with {astra-db} and self-managed {cass-short} clusters. +==== Compare driver pseudocode + +The following two pseudocode examples provide a simplified comparison of the way a {cass-short} driver interacts with {astra-db} and self-managed {cass-short} clusters. This pseudocode is for illustration purposes only; the exact syntax depends on your driver language and version. The first pseudocode example illustrates the connection to a self-managed {cass-short} cluster. 
@@ -188,7 +187,6 @@ my_cluster.close() // Print the data retrieved from the result set print(release_version) ---- -==== === Verify driver compatibility and update connection strings diff --git a/modules/ROOT/pages/deploy-proxy-monitoring.adoc b/modules/ROOT/pages/deploy-proxy-monitoring.adoc index aae26a6b..4ae4d2b2 100644 --- a/modules/ROOT/pages/deploy-proxy-monitoring.adoc +++ b/modules/ROOT/pages/deploy-proxy-monitoring.adoc @@ -17,13 +17,10 @@ docker exec -it zdm-ansible-container bash ---- + .Result -[%collapsible] -==== [source,bash] ---- ubuntu@52772568517c:~$ ---- -==== . List (`ls`) the contents of the Ansible Control Host Docker container, and then find the `zdm-proxy-automation` directory. @@ -64,8 +61,8 @@ In versions 2.3.0 and later, you can inject the configuration with a YAML file g For the cluster and core configuration, you need to provide connection credentials and details for both the origin and target clusters. -.{product-automation} version 2.1.0 or earlier -[%collapsible] +.Configuration file change in {product-automation} version 2.2.0 +[IMPORTANT] ==== Starting in version 2.2.0 of {product-automation}, all origin and target cluster configuration variables are stored in `zdm_proxy_cluster_config.yml`. In earlier versions, these variables are in the `zdm_proxy_core_config.yml` file. @@ -156,7 +153,6 @@ For example, if you use `target_astra_db_id` and `target_astra_token`, then `tar ====== + .Example: Cluster configuration -[%collapsible] ==== The following example shows the cluster configuration for a migration from a self-managed origin cluster to an {astra-db} target: @@ -502,8 +498,6 @@ docker logs zdm-proxy-container ---- + .Result -[%collapsible] -==== [source,console] ---- time="2023-01-13T22:21:42Z" level=info msg="Initialized origin control connection. Cluster Name: OriginCluster, Hosts: map[3025c4ad-7d6a-4398-b56e-87d33509581d:Host{addr: 191.100.20.61, @@ -514,7 +508,6 @@ port: 9042, host_id: 6973271339454cfea5ee0a84c7377eaa} 6ec35bc3-4ff4-4740-a16c-0 time="2023-01-13T22:21:42Z" level=info msg="Proxy connected and ready to accept queries on 172.18.10.111:9042" time="2023-01-13T22:21:42Z" level=info msg="Proxy started. Waiting for SIGINT/SIGTERM to shutdown." ---- -==== .. In the logs, look for messages containing `Proxy connected` and `Proxy started`: + @@ -532,14 +525,11 @@ docker ps ---- + .Result -[%collapsible] -==== [source,console] ---- CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 02470bbc1338 datastax/zdm-proxy:2.1.x "/main" 2 hours ago Up 2 hours zdm-proxy-container ---- -==== == Troubleshoot deployment issues diff --git a/modules/ROOT/pages/feasibility-checklists.adoc b/modules/ROOT/pages/feasibility-checklists.adoc index e04ca846..7cab8b64 100644 --- a/modules/ROOT/pages/feasibility-checklists.adoc +++ b/modules/ROOT/pages/feasibility-checklists.adoc @@ -20,9 +20,8 @@ Otherwise, a lower protocol version must be used. If the requested version isn't mutually supported, then {product-proxy} can force the client application to downgrade to a mutually supported protocol version. If automatic forced downgrade isn't possible, then the connection fails, and you must modify your client application to request a different protocol version. 
-.Determine your client application's supported and negotiated protocol versions -[%collapsible] -==== +=== Determine your client application's supported and negotiated protocol versions + Outside of a migration scenario (without {product-proxy}), the supported protocol versions depend on your origin cluster's version and client application's driver version. Generally, when connecting to a cluster, the driver requests the highest protocol version that it supports. @@ -34,7 +33,6 @@ For example, if the cluster and driver both support `V5`, then your client appli If you upgrade your cluster, driver, or both to a version with a higher mutually supported protocol version, then the driver automatically starts using the higher version unless you explicitly disable it in your driver configuration. When you introduce {product-proxy}, the target cluster is integrated into the protocol negotiation process to ensure that the negotiated protocol version is supported by the origin cluster, target cluster, and driver. -==== === Considerations and requirements for `V5` diff --git a/modules/ROOT/pages/introduction.adoc b/modules/ROOT/pages/introduction.adoc index 45816c59..259ef8f1 100644 --- a/modules/ROOT/pages/introduction.adoc +++ b/modules/ROOT/pages/introduction.adoc @@ -4,22 +4,6 @@ With the {product} ({product-short}) process, your applications can continue to run while you migrate data from one {cass-short}-based database to another, resulting in little or no downtime and minimal service interruptions. -.Why migrate? -[%collapsible] -==== -There are many reasons that you might need to migrate data and applications. -For example: - -* You want to move to a different database provider. -For example, you might move from self-managed clusters to a cloud-based Database-as-a-Service (DBaaS), such as {astra-db}. - -* You need to upgrade a cluster to a newer version or infrastructure. - -* You want to move client applications from shared clusters to dedicated clusters for greater control over individual configurations. - -* You want to consolidate client applications running on separate clusters onto one shared cluster to minimize sprawl and maintenance. -==== - The {product-short} process uses {product-proxy}, {product-utility}, and {product-automation} to orchestrate live reads and writes on your databases while you move and validate data with a data migration tool, such as {sstable-sideloader}, {cass-migrator}, or {dsbulk}. {product-proxy} keeps your databases in sync at all times through its dual-writes feature, which means you can seamlessly stop or abandon the migration at any point before the last phase of the migration (the final cutover to the new database). @@ -28,10 +12,12 @@ For more information about these tools, see xref:ROOT:components.adoc[]. When the migration is complete, all data is present in the new database, and you can switch your client applications to connect exclusively to the new database. The old database becomes obsolete and can be shut down. -== Requirements for zero downtime - +.Requirements for zero downtime +[IMPORTANT] +==== True zero downtime migration with {product-proxy} is only possible if your database meets the minimum requirements, including cluster compatibility, that are described in xref:ROOT:feasibility-checklists.adoc[] If your database doesn't meet these requirements, you can still complete the migration, but you might not be able to use {product-proxy} and some downtime might be necessary to finish the migration. 
+==== == Migration phases From 56f751ee20060b0931c21d44bf16dfb5d2a62bc4 Mon Sep 17 00:00:00 2001 From: April M <36110273+aimurphy@users.noreply.github.com> Date: Wed, 15 Apr 2026 10:17:07 -0700 Subject: [PATCH 04/12] more collapsibles --- .../ROOT/pages/manage-proxy-instances.adoc | 3 - modules/ROOT/pages/metrics.adoc | 3 - modules/ROOT/pages/troubleshooting-tips.adoc | 16 ++--- modules/ROOT/pages/zdm-logs.adoc | 3 - .../sideloader/pages/migrate-sideloader.adoc | 18 +----- .../sideloader/pages/sideloader-overview.adoc | 63 ++++++++++++++++++- modules/sideloader/partials/import.adoc | 35 ----------- modules/sideloader/partials/initialize.adoc | 24 ------- 8 files changed, 68 insertions(+), 97 deletions(-) delete mode 100644 modules/sideloader/partials/import.adoc delete mode 100644 modules/sideloader/partials/initialize.adoc diff --git a/modules/ROOT/pages/manage-proxy-instances.adoc b/modules/ROOT/pages/manage-proxy-instances.adoc index f66cf03e..3eee2cf8 100644 --- a/modules/ROOT/pages/manage-proxy-instances.adoc +++ b/modules/ROOT/pages/manage-proxy-instances.adoc @@ -39,13 +39,10 @@ docker exec -it zdm-ansible-container bash ---- + .Result -[%collapsible] -==== [source,bash] ---- ubuntu@52772568517c:~$ ---- -==== . Run the rolling restart playbook: + diff --git a/modules/ROOT/pages/metrics.adoc b/modules/ROOT/pages/metrics.adoc index b12e4d02..8966753a 100644 --- a/modules/ROOT/pages/metrics.adoc +++ b/modules/ROOT/pages/metrics.adoc @@ -33,13 +33,10 @@ docker exec -it zdm-ansible-container bash ---- + .Result -[%collapsible] -==== [source,bash] ---- ubuntu@52772568517c:~$ ---- -==== . To configure the Grafana credentials, edit the `zdm_monitoring_config.yml` file that is located at `zdm-proxy-automation/ansible/vars`: + diff --git a/modules/ROOT/pages/troubleshooting-tips.adoc b/modules/ROOT/pages/troubleshooting-tips.adoc index 918c8ca1..72cf5d23 100644 --- a/modules/ROOT/pages/troubleshooting-tips.adoc +++ b/modules/ROOT/pages/troubleshooting-tips.adoc @@ -32,16 +32,13 @@ For example, you can use the following Docker command, replacing `**TAG**` with docker run --rm datastax/zdm-proxy:**TAG** -version ---- -.Result -[%collapsible] -==== The output shows the binary version of {product-proxy} that is currently running: +.Result [source,console] ---- ZDM proxy version 2.1.0 ---- -==== [IMPORTANT] ==== @@ -159,9 +156,10 @@ If you observe this behavior in your logs, <> s If the {product-proxy} logs contain `debug` messages with `Invalid or unsupported protocol version: 3`, this means that one of the origin clusters doesn't support protocol `V3` or later. -.Invalid or unsupported protocol version logs -[%collapsible] -==== +Specifically, this happens with {cass-short} 2.0 and {dse-short} 4.6. +{product-short} cannot be used for these migrations because the {product-proxy} control connections don't perform protocol version negotiation; they only attempt to use `V3`. + +.Example: Invalid or unsupported protocol version logs [source,log] ---- time="2022-10-01T19:58:15+01:00" level=info msg="Starting proxy..." @@ -189,10 +187,6 @@ time="2022-10-01T19:58:15+01:00" level=debug msg="Shutting down the schedulers a time="2022-10-01T19:58:15+01:00" level=info msg="Proxy shutdown complete." time="2022-10-01T19:58:15+01:00" level=error msg="Couldn't start proxy, retrying in 2.229151525s: failed to initialize origin control connection: could not open control connection to ORIGIN, tried endpoints: [127.0.0.1:9042]." ---- -==== - -Specifically, this happens with {cass-short} 2.0 and {dse-short} 4.6. 
-{product-short} cannot be used for these migrations because the {product-proxy} control connections don't perform protocol version negotiation; they only attempt to use `V3`. === Authentication errors diff --git a/modules/ROOT/pages/zdm-logs.adoc b/modules/ROOT/pages/zdm-logs.adoc index 54708b40..e0eab094 100644 --- a/modules/ROOT/pages/zdm-logs.adoc +++ b/modules/ROOT/pages/zdm-logs.adoc @@ -60,13 +60,10 @@ docker exec -it zdm-ansible-container bash ---- + .Result -[%collapsible] -==== [source,bash] ---- ubuntu@52772568517c:~$ ---- -==== . Run the log collection playbook: + diff --git a/modules/sideloader/pages/migrate-sideloader.adoc b/modules/sideloader/pages/migrate-sideloader.adoc index e2835603..1a2a9e75 100644 --- a/modules/sideloader/pages/migrate-sideloader.adoc +++ b/modules/sideloader/pages/migrate-sideloader.adoc @@ -266,11 +266,7 @@ If you don't have jq installed, remove `| jq .` from the end of each command. Use the {devops-api} to initialize the migration and get your migration directory path and credentials. -.What happens during initialization? -[%collapsible] -==== -include::sideloader:partial$initialize.adoc[] -==== +To learn more about the initialization process, see xref:sideloader:sideloader-overview.adoc[About {sstable-sideloader}: Initialize a migration]. The initialization process can take several minutes to complete, especially if the migration bucket doesn't already exist. @@ -521,8 +517,6 @@ include::sideloader:partial$command-placeholders-common.adoc[] + .Example: Upload a snapshot with AWS CLI -[%collapsible] -==== [source,bash] ---- # Set environment variables @@ -534,7 +528,6 @@ export AWS_SESSION_TOKEN=XXXXXXXXXX du -sh /var/lib/cassandra/data/smart_home/*/snapshots/*sensor_readings*; \ aws s3 sync --only-show-errors --exclude '*' --include '*/snapshots/sensor_readings*' /var/lib/cassandra/data/ s3://ds-mig-b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d/272eac1d-df8e-4d1b-a7c6-71d5af232182/sstables/dse0 ---- -==== . Monitor upload progress: + @@ -605,8 +598,6 @@ include::sideloader:partial$command-placeholders-common.adoc[] + .Example: Upload a snapshot with gcloud and gsutil -[%collapsible] -==== [source,bash,subs="attributes"] ---- # Authenticate @@ -615,7 +606,6 @@ gcloud auth activate-service-account --key-file=creds.json # Upload "sensor_readings" snapshot from "dse0" node gsutil -m rsync -r -d /var/lib/cassandra/data/smart_home/{asterisk}{asterisk}/snapshots/sensor_readings/ gs://ds-mig-b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d/272eac1d-df8e-4d1b-a7c6-71d5af232182/sstables/dse0 ---- -==== . Monitor upload progress: + @@ -737,11 +727,7 @@ Data import is a multi-step operation that requires complete success. If one step fails, then the entire import operation stops and the migration fails. //Does all data fail to import or is it possible to have a partial import? -.What happens during data import? -[%collapsible] -====== -include::sideloader:partial$import.adoc[] -====== +To learn more about the data import process, see xref:sideloader:sideloader-overview.adoc[About {sstable-sideloader}: Import data]. 
[WARNING] ==== diff --git a/modules/sideloader/pages/sideloader-overview.adoc b/modules/sideloader/pages/sideloader-overview.adoc index 0c441f9e..c6a9c73a 100644 --- a/modules/sideloader/pages/sideloader-overview.adoc +++ b/modules/sideloader/pages/sideloader-overview.adoc @@ -59,9 +59,33 @@ However, you might need to modify your schema or data model to be compatible wit For specific requirements and more information, see xref:sideloader:migrate-sideloader.adoc#record-schema[Migrate data with {sstable-sideloader}: Configure the target database]. +[#initialize-a-migration] === Initialize a migration -include::sideloader:partial$initialize.adoc[] +After you create snapshots on the origin cluster and pre-configure the schema on the target database, use the {astra} {devops-api} to initialize the migration. + +.{sstable-sideloader} moves data from the migration bucket to {astra-db}. +svg::sideloader:data-importer-workflow.svg[] + +When you initialize a migration, {sstable-sideloader} does the following: + +. Creates a secure migration bucket. ++ +The migration bucket is only created during the first initialization. +All subsequent migrations use different directories in the same migration bucket. ++ +{company} owns the migration bucket, and it is located within the {astra} perimeter. + +. Generates a migration ID that is unique to the new migration. + +. Creates a migration directory within the migration bucket that is unique to the new migration. ++ +The migration directory is also referred to as the `uploadBucketDir`. +In the next phase of the migration process, you will upload your snapshots to this migration directory. + +. Generates upload credentials that grant read/write access to the migration directory. ++ +The credentials are formatted according to the cloud provider where your target database is deployed. For instructions and more information, see xref:sideloader:migrate-sideloader.adoc#initialize-migration[Migrate data with {sstable-sideloader}: Initialize the migration]. @@ -103,9 +127,44 @@ If the data must also change cloud providers, there can be additional delays. In this case, consider creating your target database in a co-located datacenter, and then xref:astra-db-serverless:databases:manage-regions.adoc[deploy your database to other regions] after the migration. |=== +[#import-data] === Import data -include::sideloader:partial$import.adoc[] +After uploading the snapshots to the migration directory, use the {devops-api} to start the data import process. + +During the import process, {sstable-sideloader} does the following: + +. Revokes access to the migration directory. ++ +You cannot read or write to the migration directory after starting the data import process. + +. Discovers all uploaded SSTables in the migration directory, and then groups them into approximately same-sized subsets. + +. Runs validation checks on each subset. + +. Converts all SSTables of each subset. + +. Disables new compactions on the target database. ++ +[WARNING] +==== +This is the last point at which you can xref:sideloader:stop-restart-sideloader.adoc#abort-migration[abort the migration]. + +Once {sstable-sideloader} begins to import SSTable metadata (the next step), you cannot stop the migration. +==== + +. Imports metadata from each SSTable. ++ +If the dataset contains tombstones, any read operations on the target database can return inconsistent results during this step. +Since compaction is disabled, there is no risk of permanent inconsistencies. 
+However, in the context of xref:ROOT:introduction.adoc[{product}], it's important that the {product-short} proxy continues to read from the origin cluster. + +. Re-enables compactions on the {astra-db} Serverless database. + +Each step must finish successfully. +If one step fails, the import operation stops and no data is imported into your target database. + +If all steps finish successfully, the migration is complete and you can access the imported data in your target database. For instructions and more information, see xref:sideloader:migrate-sideloader.adoc#import-data[Migrate data with {sstable-sideloader}: Import data] diff --git a/modules/sideloader/partials/import.adoc b/modules/sideloader/partials/import.adoc deleted file mode 100644 index b164d764..00000000 --- a/modules/sideloader/partials/import.adoc +++ /dev/null @@ -1,35 +0,0 @@ -After uploading the snapshots to the migration directory, use the {devops-api} to start the data import process. - -During the import process, {sstable-sideloader} does the following: - -. Revokes access to the migration directory. -+ -You cannot read or write to the migration directory after starting the data import process. - -. Discovers all uploaded SSTables in the migration directory, and then groups them into approximately same-sized subsets. - -. Runs validation checks on each subset. - -. Converts all SSTables of each subset. - -. Disables new compactions on the target database. -+ -[WARNING] -==== -This is the last point at which you can xref:sideloader:stop-restart-sideloader.adoc#abort-migration[abort the migration]. - -Once {sstable-sideloader} begins to import SSTable metadata (the next step), you cannot stop the migration. -==== - -. Imports metadata from each SSTable. -+ -If the dataset contains tombstones, any read operations on the target database can return inconsistent results during this step. -Since compaction is disabled, there is no risk of permanent inconsistencies. -However, in the context of xref:ROOT:introduction.adoc[{product}], it's important that the {product-short} proxy continues to read from the origin cluster. - -. Re-enables compactions on the {astra-db} Serverless database. - -Each step must finish successfully. -If one step fails, the import operation stops and no data is imported into your target database. - -If all steps finish successfully, the migration is complete and you can access the imported data in your target database. \ No newline at end of file diff --git a/modules/sideloader/partials/initialize.adoc b/modules/sideloader/partials/initialize.adoc deleted file mode 100644 index 3e288f43..00000000 --- a/modules/sideloader/partials/initialize.adoc +++ /dev/null @@ -1,24 +0,0 @@ -After you create snapshots on the origin cluster and pre-configure the schema on the target database, use the {astra} {devops-api} to initialize the migration. - -.{sstable-sideloader} moves data from the migration bucket to {astra-db}. -svg::sideloader:data-importer-workflow.svg[] - -When you initialize a migration, {sstable-sideloader} does the following: - -. Creates a secure migration bucket. -+ -The migration bucket is only created during the first initialization. -All subsequent migrations use different directories in the same migration bucket. -+ -{company} owns the migration bucket, and it is located within the {astra} perimeter. - -. Generates a migration ID that is unique to the new migration. - -. Creates a migration directory within the migration bucket that is unique to the new migration. 
-+ -The migration directory is also referred to as the `uploadBucketDir`. -In the next phase of the migration process, you will upload your snapshots to this migration directory. - -. Generates upload credentials that grant read/write access to the migration directory. -+ -The credentials are formatted according to the cloud provider where your target database is deployed. \ No newline at end of file From e387919d94c4502a0462485a0bc05ae8e4feb153 Mon Sep 17 00:00:00 2001 From: April M <36110273+aimurphy@users.noreply.github.com> Date: Wed, 15 Apr 2026 10:44:37 -0700 Subject: [PATCH 05/12] last collapsibles --- .../sideloader/pages/migrate-sideloader.adoc | 24 +++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/modules/sideloader/pages/migrate-sideloader.adoc b/modules/sideloader/pages/migrate-sideloader.adoc index 1a2a9e75..6cfa2d03 100644 --- a/modules/sideloader/pages/migrate-sideloader.adoc +++ b/modules/sideloader/pages/migrate-sideloader.adoc @@ -48,8 +48,8 @@ Replace *`SNAPSHOT_NAME`* with a descriptive name for the snapshot. Use the same snapshot name on each node. This makes it easier to programmatically upload the snapshots to the migration directory. -.Optional: Use a for loop to simplify snapshot creation -[%collapsible] +.Use a `for` loop to simplify snapshot creation +[TIP] ==== If the nodes in your origin cluster are named in a predictable way (for example, `dse0`, `dse1`, `dse2`, etc.), you can use a `for` loop to simplify snapshot creation. For example: @@ -95,8 +95,8 @@ To include multiple keyspaces, list each keyspace separated by a space as shown Use the same snapshot name on each node. This makes it easier to programmatically upload the snapshots to the migration directory. -.Optional: Use a for loop to simplify snapshot creation -[%collapsible] +.Use a `for` loop to simplify snapshot creation +[TIP] ==== If the nodes in your origin cluster are named in a predictable way (for example, `dse0`, `dse1`, `dse2`, etc.), you can use a `for` loop to simplify snapshot creation. For example: @@ -147,8 +147,8 @@ To include multiple tables from one or more keyspaces, list each *`KEYSPACE_NAME Use the same snapshot name on each node. This makes it easier to programmatically upload the snapshots to the migration directory. -.Optional: Use a for loop to simplify snapshot creation -[%collapsible] +.Use a `for` loop to simplify snapshot creation +[TIP] ==== If the nodes in your origin cluster are named in a predictable way (for example, `dse0`, `dse1`, `dse2`, etc.), you can use a `for` loop to simplify snapshot creation. For example: @@ -549,9 +549,9 @@ However, upload time primarily depends on the snapshot size, network throughput . Repeat the upload process for each snapshot (*`SNAPSHOT_NAME`*) and node (*`NODE_NAME`*) in your origin cluster. + If your credentials expire, see xref:sideloader:troubleshoot-sideloader.adoc#get-new-upload-credentials[Get new upload credentials]. - -.Optional: Use a for loop to simplify snapshot uploads -[%collapsible] ++ +.Use a `for` loop to simplify snapshot creation +[TIP] ==== If the nodes in your origin cluster have predictable names (for example, `dse0`, `dse1`, and `dse2`), then you can use a `for` loop to streamline the execution of the upload commands. 
For example: @@ -625,9 +625,9 @@ The `https://cloud.google.com/storage/docs/gsutil/commands/rsync#description[-m] However, upload time primarily depends on the snapshot size, network throughput from your origin cluster to the migration bucket, and whether the origin cluster and migration bucket are in the same region. . Repeat the upload process for each snapshot (*`SNAPSHOT_NAME`*) and node (*`NODE_NAME`*) in your origin cluster. - -.Optional: Use a for loop to simplify snapshot uploads -[%collapsible] ++ +.Use a `for` loop to simplify snapshot creation +[TIP] ==== If the nodes in your origin cluster have predictable names (for example, `dse0`, `dse1`, and `dse2`), then you can use a `for` loop to streamline the execution of the `gsutil rsync` commands. For example: From 3e5004bb3f96a419cff281d5a0a96c7da5719052 Mon Sep 17 00:00:00 2001 From: April M <36110273+aimurphy@users.noreply.github.com> Date: Wed, 15 Apr 2026 11:05:55 -0700 Subject: [PATCH 06/12] remove tabs from prepare sideloader --- antora.yml | 1 + .../sideloader/pages/prepare-sideloader.adoc | 103 ++++++++---------- 2 files changed, 46 insertions(+), 58 deletions(-) diff --git a/antora.yml b/antora.yml index 8ebc0dc9..2ec95d0c 100644 --- a/antora.yml +++ b/antora.yml @@ -32,6 +32,7 @@ asciidoc: astra: 'Astra' db-serverless: 'Serverless (non-vector)' db-serverless-vector: 'Serverless (vector)' + db-classic: 'Astra Managed Clusters' astra-ui: 'Astra Portal' astra-url: 'https://astra.datastax.com' astra-ui-link: '{astra-url}[{astra-ui}^]' diff --git a/modules/sideloader/pages/prepare-sideloader.adoc b/modules/sideloader/pages/prepare-sideloader.adoc index 8bd78d0a..40b1ea94 100644 --- a/modules/sideloader/pages/prepare-sideloader.adoc +++ b/modules/sideloader/pages/prepare-sideloader.adoc @@ -20,11 +20,16 @@ Make sure you understand how to securely store and use sensitive credentials whe == Target {astra-db} database requirements -* Your {astra} organization must be on an *Enterprise* xref:astra-db-serverless:administration:subscription-plans.adoc[subscription plan]. -+ +The following requirements, recommendations, and limitations apply to the target {astra-db} database. +Review all of these to ensure that your database is compatible with {sstable-sideloader}. + +=== {product-short} subscription plan requirement + +Your {astra} organization must be on an *Enterprise* xref:astra-db-serverless:administration:subscription-plans.adoc[subscription plan]. + {sstable-sideloader} is a premium feature that incurs costs based on usage. This includes the total amount (GB) of data processed as part of the {sstable-sideloader} workload, and the amount of data stored in the migration bucket is metered at the standard {astra-db} storage rate. -+ + [TIP] ==== Migration directories are automatically cleaned up after one week of idle time. @@ -32,50 +37,45 @@ Migration directories are automatically cleaned up after one week of idle time. To minimize costs, you can xref:sideloader:cleanup-sideloader.adoc[manually clean up migration directories] when you no longer need them. ==== -* Your target database must be an {astra-db} Serverless database. -+ -If you don't already have one, xref:astra-db-serverless:databases:create-database.adoc[create a database]. +=== Database type requirement + +Your target database must be an {astra-db} Serverless database. +{sstable-sideloader} isn't compatible with {db-classic} databases. + +If you haven't done so already, xref:astra-db-serverless:databases:create-database.adoc[create a database]. 
You can use either a {db-serverless} or {db-serverless-vector} database. -+ -{db-serverless-vector} databases can store both vector and non-vector data. +{db-serverless-vector} databases support both fixed-schema tables and dynamic-schema collections. -* Your target database must be in a xref:astra-db-serverless:administration:provisioned-capacity-units.adoc[Provisioned Capacity Unit (PCU) group]. +=== PCU group requirement + +Your target database must be in a xref:astra-db-serverless:administration:provisioned-capacity-units.adoc[Provisioned Capacity Unit (PCU) group]. You can use either a flexible capacity PCU group or a committed capacity PCU group, depending on your long-term needs and other PCU group usage. -+ -[tabs] -====== -Flexible capacity PCU group:: -+ --- -Because {sstable-sideloader} operations are typically short-term, resource-intensive events, you can create a flexible capacity PCU group exclusively to support your target database during the migration. -{company} recommends the following flexible capacity PCU group configuration for {sstable-sideloader} migrations. -For instructions, see xref:astra-db-serverless:administration:create-pcu.adoc#flexible-capacity[Create a flexible capacity PCU group]. +==== Use a flexible capacity PCU group (Recommended) -[tabs] -==== -Target database is a {db-serverless} database:: +Because {sstable-sideloader} operations are typically short-term, resource-intensive events, you can xref:astra-db-serverless:administration:create-pcu.adoc#flexible-capacity[create a flexible capacity PCU group] exclusively to support your target database during the migration. + +After the migration, you can move your target database out of the flexible capacity PCU group, and then park or delete the group. +Don't park the PCU group during the {sstable-sideloader} process because databases in a parked PCU group are hibernated and unavailable for use. + +{company} recommends the following configurations for flexible capacity PCU groups for {sstable-sideloader} migrations: + +* **Recommended configuration if the target database is a {db-serverless} database**: + -* Minimum capacity: One or more, depending on the scale of the migration. -* Maximum capacity: Greater than the minimum by several units to allow autoscaling during resource intensive stages of the migration. +** **Minimum capacity**: One or more, depending on the scale of the migration. +** **Maximum capacity**: Greater than the minimum by several units to allow autoscaling during resource intensive stages of the migration. + For non-trivial migrations, consider setting the maximum to 10. For extremely large migrations, contact your {company} account representative or {support-url}[IBM Support] to request more than 10 units to support your migration. -Target database is a {db-serverless-vector} database:: +* **Recommended configuration if the target database is a {db-serverless-vector} database**: + By default, {db-serverless-vector} databases can have no more than one unit per PCU group. For any non-trivial migration, contact your {company} account representative or {support-url}[IBM Support] for assistance configuring a PCU group for your target {db-serverless-vector} database. -==== -After the migration, you can move your target database out of the flexible capacity PCU group, and then park or delete the group. -Don't park the PCU group during the {sstable-sideloader} process because databases in a parked PCU group are hibernated and unavailable for use. 
--- +==== Use a committed capacity PCU group -Committed capacity PCU group:: -+ --- -For a long-term PCU group option, you can use a committed capacity PCU group for your target database. +For a long-term PCU group option, you can use a xref:astra-db-serverless:administration:create-pcu.adoc#committed-capacity[committed capacity PCU group] for your target database. This could be your database's permanent PCU group assignment, or it could be a long-lived PCU group that you use for many migrations over time, adding and removing databases from the group as needed. [IMPORTANT] @@ -86,37 +86,31 @@ If there are any other databases in the same PCU group, the migration process ca To avoid interfering with other databases in the same PCU group, {company} recommends isolating the database during the migration using either a single-database committed capacity PCU group or a flexible capacity PCU group. ==== -{company} recommends the following committed capacity PCU group configuration for {sstable-sideloader} migrations. -For instructions, see xref:astra-db-serverless:administration:create-pcu.adoc#committed-capacity[Create a committed capacity PCU group]. +{company} recommends the following configurations for committed capacity PCU groups for {sstable-sideloader} migrations: -[tabs] -==== -Target database is a {db-serverless} database:: +* **Recommended configuration if the target database is a {db-serverless} database**: + -* Reserved capacity: One or more, depending on the PCU group's normal, long-term workload requirements. +-- +** **Reserved capacity**: One or more, depending on the PCU group's normal, long-term workload requirements. + This is the amount of long-term capacity that you want the group to have after the migration is complete. -* Minimum capacity: Equal to or greater than the reserved capacity. +** **Minimum capacity**: Equal to or greater than the reserved capacity. + If the minimum is greater than the reserved capacity, the surplus capacity is prepared in advance, and there is no autoscaling required to access that capacity. -* Maximum capacity: Greater than the minimum by several units to allow autoscaling during resource intensive stages of the migration. +* **Maximum capacity**: Greater than the minimum by several units to allow autoscaling during resource intensive stages of the migration. + For non-trivial migrations, consider setting the maximum to 10. For extremely large migrations, contact your {company} account representative or {support-url}[IBM Support] to request more than 10 units to support your migration. +-- + After the migration, you can reduce the minimum and maximum capacity down to the levels required for normal database operations. -Target database is a {db-serverless-vector} database:: +* **Recommended configuration if the target database is a {db-serverless-vector} database**: + By default, {db-serverless-vector} databases can have no more than one unit per PCU group. For any non-trivial migration, contact your {company} account representative or {support-url}[IBM Support] for assistance configuring a PCU group for your target {db-serverless-vector} database. -==== --- -====== -+ -For more information, see xref:astra-db-serverless:administration:provisioned-capacity-units.adoc[]. [#origin-cluster-requirements] == Origin cluster requirements @@ -208,13 +202,10 @@ include::ROOT:partial$multi-region-migrations.adoc[] You can migrate data from any number of nodes in your origin cluster to the same target database or multiple target databases. 
When you xref:sideloader:migrate-sideloader.adoc[migrate data with {sstable-sideloader}], there is no difference in the core process when migrating from one node or multiple nodes. -The following steps summarize the process and outline some considerations for migrating multiple nodes. +The following steps summarize the process and considerations for migrating multiple nodes. + +==== Migrate multiple nodes to one database -[tabs] -====== -Migrate multiple nodes to one database:: -+ --- . On your origin cluster, make sure your data is valid and ready to migrate, as explained in <>. . From your origin cluster, create snapshots for all of the nodes that you want to migrate. @@ -242,11 +233,9 @@ The success of the import depends primarily on the validity of the schemas and t . After the import, validate the migrated data to ensure that it matches the data in the origin cluster. For example, you can xref:ROOT:cassandra-data-migrator.adoc#cdm-validation-steps[run {cass-migrator-short} in validation mode]. --- -Migrate multiple nodes to multiple databases:: -+ --- +==== Migrate multiple nodes to multiple databases + Orchestrating concurrent migrations from multiple nodes to multiple target databases can be complex. Consider focusing on one target database at a time, or create a migration plan to track origin nodes, target databases, migration bucket credentials, and timelines for each migration. @@ -285,8 +274,6 @@ The success of the import depends primarily on the validity of the schemas and t . After the import, validate the migrated data to ensure that it matches the data in the origin cluster. For example, you can xref:ROOT:cassandra-data-migrator.adoc#cdm-validation-steps[run {cass-migrator-short} in validation mode]. --- -====== === Multiple migrations to the same database From 1aa2d6d12e3db91a705ea69cc6a9f1320db6637c Mon Sep 17 00:00:00 2001 From: April M <36110273+aimurphy@users.noreply.github.com> Date: Wed, 15 Apr 2026 13:53:43 -0700 Subject: [PATCH 07/12] remove tabs from sidelaoder --- .../sideloader/pages/migrate-sideloader.adoc | 285 +++++++----------- .../idle-migration-directories-note.adoc | 10 + .../staged-snapshots-need-import-ph.adoc | 2 + 3 files changed, 124 insertions(+), 173 deletions(-) create mode 100644 modules/sideloader/partials/idle-migration-directories-note.adoc create mode 100644 modules/sideloader/partials/staged-snapshots-need-import-ph.adoc diff --git a/modules/sideloader/pages/migrate-sideloader.adoc b/modules/sideloader/pages/migrate-sideloader.adoc index 6cfa2d03..8ca43c77 100644 --- a/modules/sideloader/pages/migrate-sideloader.adoc +++ b/modules/sideloader/pages/migrate-sideloader.adoc @@ -30,70 +30,70 @@ Don't create snapshots of system tables or tables that you don't want to migrate The migration can fail if you attempt to migrate snapshots that don't have a matching schema in the target database. {sstable-sideloader} ignores system keyspaces. + -The structure of the `nodetool snapshot` command depends on the keyspaces and tables that you want to migrate. +The structure of the `nodetool snapshot` command depends on the keyspaces and tables that you want to migrate: + -[tabs] -====== -All keyspaces:: -+ --- +Snapshot all keyspaces:: Create a snapshot of all tables in all keyspaces: - ++ [source,bash,subs="+quotes"] ---- nodetool snapshot -t *SNAPSHOT_NAME* ---- ++ +Replace the following: ++ +* *`SNAPSHOT_NAME`*: A descriptive name for the snapshot. 
+Use the same snapshot name for each node's snapshot; this makes it easier to programmatically upload the snapshots to the migration directory. -Replace *`SNAPSHOT_NAME`* with a descriptive name for the snapshot. -Use the same snapshot name on each node. -This makes it easier to programmatically upload the snapshots to the migration directory. - -.Use a `for` loop to simplify snapshot creation -[TIP] -==== -If the nodes in your origin cluster are named in a predictable way (for example, `dse0`, `dse1`, `dse2`, etc.), you can use a `for` loop to simplify snapshot creation. -For example: - +Snapshot specific keyspaces:: +Create a snapshot of all tables in one or more specified keyspaces: ++ +.Snapshot one keyspace [source,bash,subs="+quotes"] ---- -for i in 0 1 2; do ssh dse${i} nodetool snapshot -t *SNAPSHOT_NAME*; done +nodetool snapshot -t *SNAPSHOT_NAME* *KEYSPACE_NAME* ---- - -You can use the same `for` loop to verify that each snapshot was successfully created: - -[source,bash] ++ +.Snapshot multiple keyspaces +[source,bash,subs="+quotes"] ---- -for i in 0 1 2; do ssh dse${i} nodetool listsnapshots; done +nodetool snapshot -t *SNAPSHOT_NAME* *KEYSPACE_NAME_1* *KEYSPACE_NAME_2* ---- -==== --- - -Specific keyspaces:: + --- -Create a snapshot of all tables in one or more keyspaces: +Replace the following: ++ +* *`SNAPSHOT_NAME`*: A descriptive name for the snapshot. +Use the same snapshot name for each node's snapshot; this makes it easier to programmatically upload the snapshots to the migration directory. +* *`KEYSPACE_NAME`*: The name of the keyspace that you want to migrate. ++ +To snapshot multiple keyspaces, pass a space-separated list of keyspace names. +For example, `customer_data product_data purchase_history` specifies three keyspaces. -.Single keyspace +Snapshot specific tables:: +Create a snapshot of one or more specified tables: ++ +.Snapshot one table [source,bash,subs="+quotes"] ---- -nodetool snapshot -t *SNAPSHOT_NAME* *KEYSPACE_NAME* +nodetool snapshot -kt *KEYSPACE_NAME*.*TABLE_NAME* -t *SNAPSHOT_NAME* ---- - -.Multiple keyspaces ++ +.Snapshot multiple tables [source,bash,subs="+quotes"] ---- -nodetool snapshot -t *SNAPSHOT_NAME* *KEYSPACE_NAME_1* *KEYSPACE_NAME_2* +nodetool snapshot -kt *KEYSPACE_NAME_1*.*TABLE_NAME_A* *KEYSPACE_NAME_1*.*TABLE_NAME_B* *KEYSPACE_NAME_2*.*TABLE_NAME_X* -t *SNAPSHOT_NAME* ---- - ++ Replace the following: - -* *`KEYSPACE_NAME`*: The name of the keyspace that contains the tables you want to migrate. + -To include multiple keyspaces, list each keyspace separated by a space as shown in the example above. -* *`SNAPSHOT_NAME`*: A descriptive name for the snapshot. +* *`KEYSPACE_NAME.TABLE_NAME`*: The name of the table that you want to migrate and the keyspace that it belongs to, separated by a period. +For example, `product_data.appliances` specifies the `appliances` table in the `product_data` keyspace. ++ +To snapshot multiple tables, pass a space-separated list of keyspace-table pairs. +For example, `product_data.appliances purchase_history.nevada purchase_history.wisconsin` specifies the `appliances` table in the `product_data` keyspace and the `nevada` and `wisconsin` tables in the `purchase_history` keyspace. + -Use the same snapshot name on each node. -This makes it easier to programmatically upload the snapshots to the migration directory. +* *`SNAPSHOT_NAME`*: A descriptive name for the snapshot. 
+Use the same snapshot name for each node's snapshot; this makes it easier to programmatically upload the snapshots to the migration directory. .Use a `for` loop to simplify snapshot creation [TIP] @@ -101,74 +101,51 @@ This makes it easier to programmatically upload the snapshots to the migration d If the nodes in your origin cluster are named in a predictable way (for example, `dse0`, `dse1`, `dse2`, etc.), you can use a `for` loop to simplify snapshot creation. For example: +Use a `for` loop to snapshot all keyspaces:: +To snapshot all keyspaces on each node, append the `nodetool` command to your `for` loop: ++ [source,bash,subs="+quotes"] ---- -for i in 0 1 2; do ssh dse${i} nodetool snapshot -t *SNAPSHOT_NAME* *KEYSPACE_NAME*; done +for i in 0 1 2; do ssh dse${i} nodetool snapshot -t *SNAPSHOT_NAME*; done ---- -To include multiple keyspaces in the snapshot, include multiple comma-separated `*KEYSPACE_NAME*` values, such as `keyspace1,keyspace2`. - -You can use the same `for` loop to verify that each snapshot was successfully created: - -[source,bash] +Use a `for` loop to snapshot specific keyspaces:: +To snapshot one keyspace on each node, append the `nodetool` command to your `for` loop: ++ +[source,bash,subs="+quotes"] ---- -for i in 0 1 2; do ssh dse${i} nodetool listsnapshots; done +for i in 0 1 2; do ssh dse${i} nodetool snapshot -t *SNAPSHOT_NAME* *KEYSPACE_NAME*; done ---- -==== --- - -Specific tables:: + --- -Create a snapshot of specific tables within one or more keyspaces: - -.Single table +To snapshot multiple specific keyspaces on each node, use commas (not spaces) to separate the keyspace names: ++ [source,bash,subs="+quotes"] ---- -nodetool snapshot -kt *KEYSPACE_NAME*.*TABLE_NAME* -t *SNAPSHOT_NAME* +for i in 0 1 2; do ssh dse${i} nodetool snapshot -t *SNAPSHOT_NAME* *KEYSPACE_NAME_1*,*KEYSPACE_NAME_2*; done ---- -.Multiple tables from one or more keyspaces +Use a `for` loop to snapshot specific tables:: +To snapshot one table on each node, append the `nodetool` command to your `for` loop: ++ [source,bash,subs="+quotes"] ---- -nodetool snapshot -kt *KEYSPACE_NAME_1*.*TABLE_NAME_A* *KEYSPACE_NAME_1*.*TABLE_NAME_B* *KEYSPACE_NAME_2*.*TABLE_NAME_X* -t *SNAPSHOT_NAME* +for i in 0 1 2; do ssh dse${i} nodetool snapshot -kt *KEYSPACE_NAME*.*TABLE_NAME* -t *SNAPSHOT_NAME*; done ---- - -Replace the following: - -* *`KEYSPACE_NAME`*: The name of the keyspace that contains the table you want to migrate. - -* *`TABLE_NAME`*: The name of the table you want to migrate. + -To include multiple tables from one or more keyspaces, list each *`KEYSPACE_NAME.TABLE_NAME`* pair separated by a space as shown in the example above. - -* *`SNAPSHOT_NAME`*: A descriptive name for the snapshot. +To snapshot multiple specific tables on each node, use commas (not spaces) to separate the keyspace-table pairs: + -Use the same snapshot name on each node. -This makes it easier to programmatically upload the snapshots to the migration directory. - -.Use a `for` loop to simplify snapshot creation -[TIP] -==== -If the nodes in your origin cluster are named in a predictable way (for example, `dse0`, `dse1`, `dse2`, etc.), you can use a `for` loop to simplify snapshot creation. 
-For example: - [source,bash,subs="+quotes"] ---- -for i in 0 1 2; do ssh dse${i} nodetool snapshot -kt *KEYSPACE_NAME*.*TABLE_NAME* -t *SNAPSHOT_NAME*; done +for i in 0 1 2; do ssh dse${i} nodetool snapshot -kt *KEYSPACE_NAME_1*.*TABLE_NAME_A*,*KEYSPACE_NAME_1*.*TABLE_NAME_B* -t *SNAPSHOT_NAME*; done ---- -To include multiple tables in the snapshot, include multiple comma-separated `*KEYSPACE_NAME*.*TABLE_NAME*` pairs, such as `keyspace1.table1,keyspace1.table2`. - -You can use the same `for` loop to verify that each snapshot was successfully created: +You can use the same `for` loop structure to verify that each snapshot was successfully created: [source,bash] ---- for i in 0 1 2; do ssh dse${i} nodetool listsnapshots; done ---- ==== --- -====== . Use `xref:6.9@dse:managing:tools/nodetool/list-snapshots.adoc[nodetool listsnapshots]` to verify that the snapshots were created: + @@ -270,6 +247,8 @@ To learn more about the initialization process, see xref:sideloader:sideloader-o The initialization process can take several minutes to complete, especially if the migration bucket doesn't already exist. +=== Get a migration ID + . In your terminal, use the {devops-api} to initialize the data migration: + [source,bash] @@ -313,6 +292,8 @@ export migrationID=*MIGRATION_ID* + Replace *`MIGRATION_ID`* with the `migrationID` returned by the `initialize` endpoint. +=== Check the migration status to verify initialization + . Check the migration status: + include::sideloader:partial$check-status.adoc[] @@ -324,15 +305,15 @@ Proceed to the next step. * `"status": "Initializing"`: The migration is still initializing. Wait a few minutes before you check the status again. -. Get your migration directory path and upload credentials from the response. +=== Get migration directory path and upload credentials + +Get your migration directory path and upload credentials from the response. You need these values to xref:sideloader:migrate-sideloader.adoc#upload-snapshots-to-migration-directory[upload snapshots to the migration directory]. -+ -[tabs] -====== -AWS:: -+ --- -.MigrationStatus with AWS credentials + +==== Get AWS credentials from MigrationStatus + +Securely store the `uploadBucketDir`, `accessKeyID`, `secretAccessKey`, and `sessionToken` from the response: + [source,json] ---- { @@ -355,12 +336,10 @@ AWS:: } ---- -Securely store the `uploadBucketDir`, `accessKeyID`, `secretAccessKey`, and `sessionToken`: - -* `uploadBucketDir` is the migration directory URL. +`uploadBucketDir` is the migration directory URL. Note the trailing slash. -* `uploadCredentials` contains the AWS credentials that authorize uploads to the migration directory, namely `accessKeyID`, `secretAccessKey`, and `sessionToken`. +`uploadCredentials` contains the AWS credentials that authorize uploads to the migration directory, namely `accessKeyID`, `secretAccessKey`, and `sessionToken`. [IMPORTANT] ==== @@ -369,12 +348,11 @@ If your total migration takes longer than one hour, xref:sideloader:troubleshoot If you use automation to handle {sstable-sideloader} migrations, you might need to script a xref:sideloader:stop-restart-sideloader.adoc[pause] every hour so you can generate new credentials without unexpectedly interrupting the migration. ==== --- -Google Cloud:: +==== Get Google Cloud credentials from MigrationStatus + +. 
Find the `uploadBucketDir` and the `uploadCredentials` in the response: + --- -.MigrationStatus with Google Cloud credentials [source,json] ---- { @@ -393,14 +371,13 @@ Google Cloud:: "expectedCleanupTime": "2024-08-14T15:14:38Z" } ---- - -.. Find the `uploadBucketDir` and the `uploadCredentials` in the response: + -* `uploadBucketDir` is the migration directory URL. +`uploadBucketDir` is the migration directory URL. Note the trailing slash. -* `uploadCredentials` includes a base64-encoded file containing Google Cloud credentials that authorize uploads to the migration directory. ++ +`uploadCredentials` contains a base64-encoded file containing Google Cloud credentials that authorize uploads to the migration directory. -.. Pipe the Google Cloud credentials `file` to a `creds.json` file: +. Pipe the Google Cloud credentials `file` to a `creds.json` file: + [source,bash] ---- @@ -411,13 +388,12 @@ curl -X GET \ | base64 -d > creds.json ---- -.. Securely store the `uploadBucketDir` and `creds.json`. --- +. Securely store the `uploadBucketDir` and `creds.json`. + +==== Get Azure credentials from MigrationStatus + +Securely store the `uploadBucketDir` and `urlSignature` from the response: -Microsoft Azure:: -+ --- -.MigrationStatus with Azure credentials [source,json] ---- { @@ -437,17 +413,13 @@ Microsoft Azure:: "expectedCleanupTime": "2025-03-04T15:14:38Z" } ---- -Securely store the `uploadBucketDir` and `urlSignature`: -* `uploadBucketDir` is the migration directory URL. +`uploadBucketDir` is the migration directory URL. Note the trailing slash. -* `uploadCredentials` contains `url` and `urlSignature` keys that represent an https://learn.microsoft.com/en-us/azure/ai-services/translator/document-translation/how-to-guides/create-sas-tokens[Azure Shared Access Signature (SAS) token]. -In the preceding example, these strings are truncated for readability. -+ +`uploadCredentials` contains `url` and `urlSignature` keys that represent an https://learn.microsoft.com/en-us/azure/ai-services/translator/document-translation/how-to-guides/create-sas-tokens[Azure Shared Access Signature (SAS) token]. You need the `urlSignature` to upload snapshots to the migration directory. --- -====== +In the preceding example, these strings are truncated for readability. [#upload-snapshots-to-migration-directory] == Upload snapshots to the migration directory @@ -473,27 +445,8 @@ For more information, see xref:sideloader:prepare-sideloader.adoc[]. * You might need to modify these commands depending on your environment, node names, directory structures, and other variables. 
==== -[tabs] -====== -AWS:: -+ --- -//// -Originals: -[source,bash,subs="+quotes"] ----- -export AWS_ACCESS_KEY_ID=**ACCESS_KEY_ID**; export AWS_SECRET_ACCESS_KEY=**SECRET_ACCESS_KEY**; export AWS_SESSION_TOKEN=**SESSION_TOKEN**; \ -du -sh **CASSANDRA_DATA_DIR**/**KEYSPACE_NAME**/\*/snapshots/***SNAPSHOT_NAME***; \ -aws s3 sync --only-show-errors --exclude '\*' --include '*/snapshots/**SNAPSHOT_NAME***' **CASSANDRA_DATA_DIR**/ **MIGRATION_DIR**/**NODE_NAME** ----- +=== Upload snapshots to AWS -[source,bash] ----- -export AWS_ACCESS_KEY_ID=ASXXXXXXXXXXXXXXXXXX; export AWS_SECRET_ACCESS_KEY=2XXXXXXXXXXXXXXXWqcdV519ZubYbyfuNxbZg1Rw; AWS_SESSION_TOKEN=XXXXXXXXXX; \ -du -sh /var/lib/cassandra/data/smart_home/*/snapshots/*sensor_readings*; \ -aws s3 sync --only-show-errors --exclude '*' --include '*/snapshots/sensor_readings*' /var/lib/cassandra/data/ s3://ds-mig-b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d/272eac1d-df8e-4d1b-a7c6-71d5af232182/sstables/dse0 ----- -//// . Set environment variables for the AWS credentials that were generated when you xref:sideloader:migrate-sideloader.adoc#initialize-migration[initialized the migration]: + [source,bash,subs="+quotes"] @@ -513,8 +466,9 @@ aws s3 sync --only-show-errors --exclude '{asterisk}' --include '{asterisk}/snap + Replace the following: + +-- include::sideloader:partial$command-placeholders-common.adoc[] - +-- + .Example: Upload a snapshot with AWS CLI [source,bash] @@ -550,7 +504,7 @@ However, upload time primarily depends on the snapshot size, network throughput + If your credentials expire, see xref:sideloader:troubleshoot-sideloader.adoc#get-new-upload-credentials[Get new upload credentials]. + -.Use a `for` loop to simplify snapshot creation +.Use a `for` loop to simplify snapshot uploads [TIP] ==== If the nodes in your origin cluster have predictable names (for example, `dse0`, `dse1`, and `dse2`), then you can use a `for` loop to streamline the execution of the upload commands. @@ -569,11 +523,13 @@ for i in 0 1 2; do ssh dse{loop-var} \ aws s3 sync --only-show-errors --exclude '{asterisk}' --include '{asterisk}/snapshots/**SNAPSHOT_NAME**{asterisk}' **CASSANDRA_DATA_DIR**/ **MIGRATION_DIR**dse{loop-var}" & done ---- ==== --- -Google Cloud:: -+ --- +include::sideloader:partial$staged-snapshots-need-import-ph.adoc[] + +include::sideloader:partial$idle-migration-directories-note.adoc[] + +=== Upload snapshots to Google Cloud Storage + . Authenticate to Google Cloud with the `creds.json` file that you created when you xref:sideloader:migrate-sideloader.adoc#initialize-migration[initialized the migration]: + [source,bash,subs="+quotes,attributes"] @@ -594,8 +550,9 @@ gsutil -m rsync -r -d **CASSANDRA_DATA_DIR**/**KEYSPACE_NAME**/{asterisk}{asteri + Replace the following: + +-- include::sideloader:partial$command-placeholders-common.adoc[] - +-- + .Example: Upload a snapshot with gcloud and gsutil [source,bash,subs="attributes"] @@ -626,7 +583,7 @@ However, upload time primarily depends on the snapshot size, network throughput . Repeat the upload process for each snapshot (*`SNAPSHOT_NAME`*) and node (*`NODE_NAME`*) in your origin cluster. + -.Use a `for` loop to simplify snapshot creation +.Use a `for` loop to simplify snapshot uploads [TIP] ==== If the nodes in your origin cluster have predictable names (for example, `dse0`, `dse1`, and `dse2`), then you can use a `for` loop to streamline the execution of the `gsutil rsync` commands. 
@@ -639,21 +596,16 @@ du -sh **CASSANDRA_DATA_DIR**/**KEYSPACE_NAME**/{asterisk}/snapshots/{asterisk}* gsutil -m rsync -r -d **CASSANDRA_DATA_DIR**/**KEYSPACE_NAME**/{asterisk}{asterisk}/snapshots/**SNAPSHOT_NAME**/ **MIGRATION_DIR**dse{loop-var} & done ---- ==== --- -Microsoft Azure:: -+ --- -//---- -//for dir in $(find "$CASSANDRA_DATA_DIR" -type d -path "*/snapshots/${SNAPSHOT_NAME}*"); do -// REL_PATH=${dir#"$CASSANDRA_DATA_DIR"} # Remove the base path -// azcopy sync "$dir" "${MIGRATION_DIR}${NODE_NAME}/${REL_PATH}/"?${AZURE_SAS_TOKEN} --recursive -// done -// ' -//---- +include::sideloader:partial$staged-snapshots-need-import-ph.adoc[] + +include::sideloader:partial$idle-migration-directories-note.adoc[] + +=== Upload snapshots to Azure . Set environment variables for the following values: + +-- * *`AZURE_SAS_TOKEN`*: The `urlSignature` key that was generated when you xref:sideloader:migrate-sideloader.adoc#initialize-migration[initialized the migration]. * *`CASSANDRA_DATA_DIR`*: The absolute file system path to where {cass-short} data is stored on the node, including the trailing slash. For example, `/var/lib/cassandra/data/`. @@ -661,7 +613,7 @@ For example, `/var/lib/cassandra/data/`. * *`MIGRATION_DIR`*: The entire `uploadBucketDir` value that was generated when you xref:sideloader:migrate-sideloader.adoc#initialize-migration[initialized the migration], including the trailing slash. * *`NODE_NAME`*: The host name of the node that your snapshots are from. It is important to use the specific node name to ensure that each node has a unique directory in the migration bucket. - +-- + [source,bash,subs="+quotes"] ---- @@ -673,7 +625,7 @@ export NODE_NAME="**NODE_NAME**" ---- . Use the Azure CLI to upload one snapshot from one node into the migration directory: -+ + [source,bash] ---- for dir in $(find "$CASSANDRA_DATA_DIR" -type d -path "*/snapshots/${SNAPSHOT_NAME}*"); do @@ -700,23 +652,10 @@ Upload time primarily depends on the snapshot size, network throughput from your . Repeat the upload process for each snapshot and node in your origin cluster. Be sure to change the `SNAPSHOT_NAME` and `NODE_NAME` environment variables as needed. --- -====== - -Uploaded snapshots are staged in the migration directory, but the data is not yet written to the target database. -After uploading snapshots, you must xref:sideloader:migrate-sideloader.adoc#import-data[import the data] to finish the migration. - -=== Idle migration directories are evicted - -[WARNING] -==== -For large migrations, it can take several days to upload snapshots and import data. -Make sure you xref:sideloader:cleanup-sideloader.adoc#reschedule-a-cleanup[manually reschedule the cleanup] to avoid automatic cleanup. -==== -As an added security measure, migrations that remain continuously idle for one week are subject to xref:sideloader:cleanup-sideloader.adoc[automatic cleanup], which deletes all associated snapshots, revokes any unexpired upload credentials, and then closes the migration. +include::sideloader:partial$staged-snapshots-need-import-ph.adoc[] -{company} recommends that you xref:sideloader:cleanup-sideloader.adoc#reschedule-a-cleanup[manually reschedule the cleanup] if you don't plan to launch the migration within one week or if you need several days to upload snapshots or import data. 
+include::sideloader:partial$idle-migration-directories-note.adoc[] [#import-data] == Import data diff --git a/modules/sideloader/partials/idle-migration-directories-note.adoc b/modules/sideloader/partials/idle-migration-directories-note.adoc new file mode 100644 index 00000000..bd9983b5 --- /dev/null +++ b/modules/sideloader/partials/idle-migration-directories-note.adoc @@ -0,0 +1,10 @@ +.Idle migration directories are evicted +[WARNING] +==== +As an added security measure, migrations that remain continuously idle for one week are subject to xref:sideloader:cleanup-sideloader.adoc[automatic cleanup], which deletes all associated snapshots, revokes any unexpired upload credentials, and then closes the migration. + +{company} recommends that you xref:sideloader:cleanup-sideloader.adoc#reschedule-a-cleanup[manually reschedule the cleanup] if you don't plan to launch the migration within one week or if you need several days to upload snapshots or import data. + +For large migrations, it can take several days to upload snapshots and import data. +Make sure you xref:sideloader:cleanup-sideloader.adoc#reschedule-a-cleanup[manually reschedule the cleanup] to avoid automatic cleanup. +==== \ No newline at end of file diff --git a/modules/sideloader/partials/staged-snapshots-need-import-ph.adoc b/modules/sideloader/partials/staged-snapshots-need-import-ph.adoc new file mode 100644 index 00000000..f66d47df --- /dev/null +++ b/modules/sideloader/partials/staged-snapshots-need-import-ph.adoc @@ -0,0 +1,2 @@ +Uploaded snapshots are staged in the migration directory, but the data is not yet written to the target database. +After uploading snapshots, you must xref:sideloader:migrate-sideloader.adoc#import-data[import the data] to finish the migration. \ No newline at end of file From af1e18d535bf7f2ed23660fb08e22a5ef1ed3ce8 Mon Sep 17 00:00:00 2001 From: April M <36110273+aimurphy@users.noreply.github.com> Date: Wed, 15 Apr 2026 14:22:43 -0700 Subject: [PATCH 08/12] more tabs --- .../ROOT/pages/connect-clients-to-proxy.adoc | 12 +-- modules/ROOT/pages/create-target.adoc | 30 +++----- .../ROOT/pages/deploy-proxy-monitoring.adoc | 74 +++++++------------ .../ROOT/pages/manage-proxy-instances.adoc | 53 +++---------- 4 files changed, 52 insertions(+), 117 deletions(-) diff --git a/modules/ROOT/pages/connect-clients-to-proxy.adoc b/modules/ROOT/pages/connect-clients-to-proxy.adoc index 68fd8efe..31b730d4 100644 --- a/modules/ROOT/pages/connect-clients-to-proxy.adoc +++ b/modules/ROOT/pages/connect-clients-to-proxy.adoc @@ -22,11 +22,7 @@ Specifically, these examples are for {astra-db} and self-managed {dse-short}, {h This pseudocode is for illustration purposes only; the exact syntax depends on your driver language and version. For specific instructions and examples, see xref:datastax-drivers:connecting:connect-cloud.adoc[]. -[tabs] -====== -Self-managed {cass-short} clusters:: -+ --- +.Self-managed {cass-short} clusters [source,pseudocode] ---- // Create an object to represent a Cassandra cluster @@ -56,9 +52,7 @@ print(release_version) ---- -- -{astra-db}:: -+ --- +.{astra-db} [source,text] ---- // Create an object to represent a Cassandra cluster (an Astra database) @@ -89,8 +83,6 @@ my_cluster.close() // Print the data retrieved from the result set print(release_version) ---- --- -====== Review your client application's code to understand how it connects to your existing {cass-short}-based clusters. Then, proceed to <> to learn how to modify that code to connect to {product-proxy} instead. 
diff --git a/modules/ROOT/pages/create-target.adoc b/modules/ROOT/pages/create-target.adoc
index 02fdc25d..af5aa42f 100644
--- a/modules/ROOT/pages/create-target.adoc
+++ b/modules/ROOT/pages/create-target.adoc
@@ -15,11 +15,8 @@ The preparation steps depend on your target platform.
 For complex migrations, such as those that involve multi-datacenter clusters, many-to-one/one-to-many mappings, or unresolvable mismatched schemas, see the xref:ROOT:feasibility-checklists.adoc#multi-datacenter-clusters-and-other-complex-migrations[considerations for complex migrations].
 ====
 
-[tabs]
-======
-Migrate to {astra}::
-+
---
+== Migrate to {astra}
+
 . Sign in to the {astra-ui-link} and xref:astra-db-serverless:administration:manage-organizations.adoc#switch-organizations[switch to the organization] where you want to create the new database.
 +
 {product-proxy} can be used with any xref:astra-db-serverless:administration:subscription-plans.adoc[{astra} subscription plan].
@@ -70,11 +67,9 @@ As a best practice, omit xref:astra-db-serverless:cql:develop-with-cql.adoc#unsu
 You must adjust your data model and application logic to discard or replace these structures before beginning your migration.
 For more information, see xref:astra-db-serverless:cql:develop-with-cql.adoc#limitations-on-cql-for-astra-db[Limitations on CQL for {astra-db}].
 * If you plan to use {sstable-sideloader} for xref:ROOT:migrate-and-validate-data.adoc[Phase 2], see the xref:sideloader:migrate-sideloader.adoc#record-schema[target database configuration requirements for migrating data with {sstable-sideloader}].
---
 
-Migrate to {hcd-short}, {dse-short}, or open-source {cass-reg}::
-+
---
+== Migrate to {hcd-short}, {dse-short}, or open-source {cass-reg}
+
 . Provision the cluster infrastructure, and then create your {hcd-short}, {dse-short}, or {cass-short} cluster with your desired configuration:
 +
 Determine the correct topology and specifications for your new cluster, and then provision infrastructure that meets those requirements.
@@ -107,22 +102,18 @@ To copy the schema, you can run CQL `DESCRIBE` on the origin cluster to get the
 +
 If your origin cluster is running an earlier version, you might need to edit CQL clauses that are no longer supported in newer versions, such as `COMPACT STORAGE`.
 For specific changes in each version, see the release notes for your database platform and {cass-short} driver.
---
 
-Other CQL-compatible data stores::
-+
---
+== Migrate to other CQL-compatible data stores
+
 Support for other CQL-compatible data stores isn't guaranteed for {product-proxy}.
 If your origin and target clusters meet the xref:ROOT:feasibility-checklists.adoc[protocol version compatibility requirements], you might be able to use {product-proxy} for your migration.
 As with any migration, {company} recommends that you test this in isolation before attempting a full-scale production migration.
 
 See your data store provider's documentation for information about creating your cluster and schema, generating authentication credentials, and gathering the connection details.
---
-======
 
-[TIP]
-====
+== Test the connection to the target cluster
+
 After you create the target cluster, try connecting your client application directly to the target cluster without {product-proxy}.
This ensures that the connection will work when you disconnect {product-proxy} at the end of the migration. 
@@ -132,6 +123,7 @@ This is particularly valuable when migrating to a new platform, such as {dse-sho Depending on the results of your tests, you might need to adjust your application logic, data model, or cluster configuration to achieve your performance goals. For example, you might need to optimize queries to avoid anti-patterns that were acceptable on your origin cluster but degrade performance on the target cluster. -==== -Next, learn about xref:ROOT:rollback.adoc[rollback options] before you begin xref:ROOT:phase1.adoc[Phase 1] of the migration process. \ No newline at end of file +== Next steps + +Learn about xref:ROOT:rollback.adoc[rollback options] before you begin xref:ROOT:phase1.adoc[Phase 1] of the migration process. \ No newline at end of file diff --git a/modules/ROOT/pages/deploy-proxy-monitoring.adoc b/modules/ROOT/pages/deploy-proxy-monitoring.adoc index 4ae4d2b2..0c4898b2 100644 --- a/modules/ROOT/pages/deploy-proxy-monitoring.adoc +++ b/modules/ROOT/pages/deploy-proxy-monitoring.adoc @@ -104,53 +104,42 @@ For example, `origin_username` and `target_username`. The expected values depend on the type of cluster (self-managed or {astra-db}). For example, if your target cluster is an {astra-db} database, provide the {astra-db} connection details in the `TARGET CONFIGURATION` section. + -[tabs] -====== Origin/target configuration for a self-managed cluster:: -+ --- The following configuration is required to connect to a self-managed cluster: - ++ * `*_username` and `*_password`: For a self-managed cluster with authentication enabled, provide a valid username and password to access the cluster. If authentication isn't enabled, leave both variables unset. - * `*_contact_points`: Provide a comma-separated list of IP addresses for the cluster's seed nodes. - * `*_port`: Provide the port on which the cluster listens for client connections. The default is 9042. - * `*_astra_secure_connect_bundle_path`, `*_astra_db_id`, and `*_astra_token`: All of these must be unset. --- Origin/target configuration for {astra-db}:: -+ --- The following configuration is required to connect to an {astra-db} database: - ++ * `*_username` and `*_password`: Set `username` to the literal string `token`, and set `password` to your {astra-db} application token (`AstraCS:...`). + For legacy authentication to earlier {astra-db} databases with an older token generated prior to the unified `token` approach, set the `username` to the token's `clientId`, and set the `password` to the token's `secret`. - ++ * `*_contact_points`: Must be unset. - ++ * `*_port`: Must be unset. - -* `*_astra_secure_connect_bundle_path`, `*_astra_db_id`, and `*_astra_token`: Provide either `*_astra_secure_connect_bundle_path` only, or both `*_astra_db_id` and `*_astra_token`. + -** If you want {product-automation} to automatically download your database's {scb}, use `*_astra_db_id` and `*_astra_token`. +* `*_astra_secure_connect_bundle_path` or `*_astra_db_id` and `*_astra_token`: Provide either of the following, but not both. +The unused option must be unset. +For example, if you use `target_astra_db_id` and `target_astra_token`, then `target_astra_secure_connect_bundle_path` must be unset. ++ +-- +** **Both `{asterisk}_astra_db_id` and `{asterisk}_astra_token`**: If you want {product-automation} to automatically download your database's {scb}, use `*_astra_db_id` and `*_astra_token`. 
Set `*_astra_db_id` to your xref:astra-db-serverless:databases:create-database.adoc#get-db-id[database's ID], and set `*_astra_token` to your application token (`AstraCS:...`).
-** If you want to manually provide database's {scb-short} to the jumphost, use `*_astra_secure_connect_bundle_path`, and manually upload the {scb-short} to the jumphost:
+** **Only `{asterisk}_astra_secure_connect_bundle_path`**: If you want to manually provide your database's {scb-short} to the jumphost, use `*_astra_secure_connect_bundle_path`, and manually upload the {scb-short} to the jumphost:
 +
 .. xref:astra-db-serverless:databases:secure-connect-bundle.adoc[Download your database's {scb-short}].
 .. Upload it to the jumphost.
 .. Open a new shell on the jumphost, and then run `docker cp /path/to/scb.zip zdm-ansible-container:/home/ubuntu` to copy the {scb-short} to the container.
 .. Set `*_astra_secure_connect_bundle_path` to the path to the {scb-short} on the jumphost.
-
-+
-The unused option must be unset.
-For example, if you use `target_astra_db_id` and `target_astra_token`, then `target_astra_secure_connect_bundle_path` must be unset.
 --
-======
+
 
 .Example: Cluster configuration
 ====
@@ -254,13 +243,13 @@ Transportation Layer Security (TLS) encryption is optional and disabled by defau
 
 {product-proxy} supports TLS encryption between {product-proxy} and either or both clusters, and between {product-proxy} and your client application.
 
-To enable TLS encryption, you must provide the necessary files and configure TLS settings in the `zdm_proxy_custom_tls_config.yml` file.
+To enable TLS encryption, you must provide the necessary files and configure TLS settings in the `zdm_proxy_custom_tls_config.yml` file before running the deployment playbook.
+
+When you deploy the {product-proxy} instances with {product-automation}, the deployment playbook automatically distributes the TLS files and applies the TLS configuration to all {product-proxy} instances.
+If you want to enable TLS after the initial deployment, you must rerun the deployment playbook to redeploy the instances with the new TLS configuration.
+
+==== Configure proxy-to-cluster TLS
 
-[tabs]
-======
-Proxy-to-cluster TLS::
-+
---
 Use these steps to enable TLS encryption between {product-proxy} and one or both clusters if required.
 
 Each cluster has its own TLS configuration.
@@ -277,9 +266,10 @@ For {astra-db}, {product-proxy} uses mTLS automatically with the xref:astra-db-s
 . Find the required files for each cluster where you want to enable TLS encryption.
 All files must be in plain-text, non-binary format.
 +
+--
 * **One-way TLS**: Find the server CA.
 * **Mutual TLS**: Find the server CA, the client certificate, and the client key.
-
+--
 +
 If your client application and origin cluster already use TLS encryption, then the required files should already be used in the client application's configuration (TLS client files) and the origin cluster's configuration (TLS Server files).
 
@@ -288,6 +278,7 @@ If your client application and origin cluster already use TLS encryption, then t
 .. If your TLS files are in a JKS keystore, you must extract them as plain text because {product-proxy} cannot accept a JKS keystore.
 You must provide the raw files.
 +
+--
 ... Get the files contained in your JKS keystore and their aliases:
 +
 [source,bash,subs="+quotes"]
 ----
@@ -313,8 +304,8 @@ Replace the following:
 +
 The `-rfc` option extracts the files in non-binary PEM format.
For more information, see the https://docs.oracle.com/javase/8/docs/technotes/tools/windows/keytool.html[keytool syntax documentation]. +-- -+ .. Upload the required TLS files to the jumphost: + ** **One-way TLS**: Upload the server CA. @@ -328,7 +319,7 @@ For more information, see the https://docs.oracle.com/javase/8/docs/technotes/to ---- docker cp **TLS_FILE** zdm-ansible-container:/home/ubuntu/origin_tls_files ---- - ++ ** Copy target files to the `target_tls_files` directory, replacing `**TLS_FILE**` with the path to each required files: + [source,bash,subs="+quotes"] @@ -348,12 +339,10 @@ docker exec -it zdm-ansible-container bash . From this shell, edit the `zdm_proxy_tls_config.yml` file at `zdm-proxy-automation/ansible/vars/zdm_proxy_custom_tls_config.yml`. . Uncomment and populate the TLS configuration variables for the clusters where you want to enable TLS encryption. -For example, if you want to enable TLS encryption for both clusters, configure both sets of variables: `origin_tls_{asterisk}` and `target_tls_{asterisk}`. +For example, if you want to enable TLS encryption for both clusters, configure both sets of variables (`origin_tls_{asterisk}` and `target_tls_{asterisk}`). + In the proxy-to-cluster configuration, the word `server` in the variable names refers to the cluster, which acts as the TLS server, and the word `client` refers to {product-proxy}, which acts as the TLS client. + -[tabs] -==== Origin cluster TLS encryption variables:: + * `origin_tls_user_dir_path`: Use the default value of `/home/ubuntu/origin_tls_files`. @@ -371,12 +360,9 @@ Target cluster TLS encryption variables:: Must be unset for one-way TLS. * `target_tls_client_key_filename`: Required for mTLS only. Provide the filename (without the path) of the client key. Must be unset for one-way TLS. -==== --- -Client application-to-proxy TLS:: -+ --- +==== Configure client application-to-proxy TLS + Use these steps to enable TLS encryption between your client application and {product-proxy} if required. In this case, your client application is the TLS client, and {product-proxy} is the TLS server. @@ -411,10 +397,11 @@ keytool -exportcert -keystore **PATH/TO/KEYSTORE.JKS** -alias **FILE_ALIAS** -fi + Replace the following: + +-- ** `**PATH/TO/KEYSTORE.JKS**`: The path to your JKS keystore ** `**FILE_ALIAS**`: The alias of the file you want to extract ** `**PATH/TO/DESTINATION/FILE**`: The path where you want to save the extracted file - +-- + The `-rfc` option extracts the files in non-binary PEM format. For more information, see the https://docs.oracle.com/javase/8/docs/technotes/tools/windows/keytool.html[keytool syntax documentation]. @@ -448,11 +435,6 @@ The word `server` in the variable names refers to {product-proxy}, which acts as * `zdm_proxy_tls_server_key_filename` : Required. Provide the filename (without the path) of the server key that the proxy must use. * `zdm_proxy_tls_require_client_auth`: Set to `false` (default) for one-way TLS between the application and proxy. Set to `true` to enable mTLS between the application and the proxy. --- -====== - -When you deploy the {product-proxy} instances with {product-automation}, the deployment playbook automatically distributes the TLS files and applies the TLS configuration to all {product-proxy} instances. -If you want to enable TLS after the initial deployment, you must rerun the deployment playbook to redeploy the instances with the new TLS configuration. 
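To make the variable format concrete, here is a minimal, hypothetical sketch of how the client-application-to-proxy variables described above might be populated in `zdm_proxy_custom_tls_config.yml`. The key filename is a placeholder; use the name of the server key file that you provided for the proxy.

[source,yaml]
----
# Hypothetical values for illustration only.
zdm_proxy_tls_server_key_filename: zdm-proxy-server-key.pem  # placeholder filename
zdm_proxy_tls_require_client_auth: false  # set to true to require mTLS from the client application
----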
[#run-the-deployment-playbook] == Run the deployment playbook diff --git a/modules/ROOT/pages/manage-proxy-instances.adoc b/modules/ROOT/pages/manage-proxy-instances.adoc index 3eee2cf8..a4643995 100644 --- a/modules/ROOT/pages/manage-proxy-instances.adoc +++ b/modules/ROOT/pages/manage-proxy-instances.adoc @@ -15,11 +15,8 @@ A rolling restart is a destructive action because it stops the previous containe xref:ROOT:zdm-logs.adoc[Collect the logs] before you apply the configuration change if you want to keep them. ==== -[tabs] -====== -With {product-automation}:: -+ --- +=== Rolling restart with {product-automation} + If you use {product-automation} to manage your {product-proxy} deployment, you can use a dedicated playbook to perform rolling restarts of all {product-proxy} instances in a deployment: . Connect to your Ansible Control Host container. @@ -66,16 +63,12 @@ If all six attempts fail, {product-automation} interrupts the entire rolling res * If the check succeeds, {product-automation} waits a fixed amount of time, and then moves on to the next container. The default pause between containers is 10 seconds. You can change the pause duration in `zdm-proxy-automation/ansible/vars/zdm_playbook_internal_config.yml`. --- -Without {product-automation}:: -+ --- +=== Rolling restart without {product-automation} + If you don't use {product-automation}, you must manually restart each instance. To avoid downtime, wait for each instance to fully restart and begin receiving traffic before restarting the next instance. --- -====== == Inspect {product-proxy} logs @@ -306,18 +299,14 @@ zdm_proxy_image: datastax/zdm-proxy:2.3.4 == Scale {product-proxy} instances -[tabs] -====== -Scale with {product-automation}:: -+ --- +The process for scaling your {product-proxy} instance depends on whether you use {product-automation} to manage your deployment. + +=== Scale with {product-automation} + {product-automation} doesn't provide a way to scale operations up or down in a rolling fashion. If you are using {product-automation} and you need a larger {product-proxy} deployment, you can create a new deployment, or you can add instances to an existing deployment. -[tabs] -==== Create a new deployment (recommended):: -+ This option is the recommended way to scale your {product-proxy} deployment because it requires no downtime. + Create a new {product-proxy} deployment, and then reconfigure your client application to use the new instance: @@ -331,7 +320,6 @@ The application instances switch seamlessly from the old deployment to the new o . After restarting all application instances, you can safely remove the old {product-proxy} deployment. Add instances to an existing deployment:: -+ This option requires manual configuration and a small amount of downtime. + Change the topology of your existing {product-proxy} deployment, and then restart the entire deployment to apply the change: @@ -349,46 +337,30 @@ ansible-playbook deploy_zdm_proxy.yml -i zdm_ansible_inventory + Rerunning the playbook stops the existing instances, destroys them, and then creates and starts a new deployment with new instances based on the amended inventory. This results in a brief interruption of service for your entire {product-proxy} deployment. -==== --- -Scale without {product-automation}:: -+ --- +=== Scale without {product-automation} + If you aren't using {product-automation}, use these steps to add, change, or remove {product-proxy} instances. -[tabs] -==== Add an instance:: + . 
Prepare and configure the new {product-proxy} instances appropriately based on your other instances. -+ Make sure the new instance's configuration references all planned {product-proxy} cluster nodes. -+ . On all {product-proxy} instances, add the new instance's address to the `ZDM_PROXY_TOPOLOGY_ADDRESSES` environment variable. -+ Make sure to include all new nodes. -+ . On the new {product-proxy} instance, set the `ZDM_PROXY_TOPOLOGY_INDEX` to the next sequential integer after the greatest one in your existing deployment. -+ . Perform a rolling restart of all {product-proxy} instances, one at a time. Vertically scale existing instances:: -+ Use these steps to increase or decrease resources for existing {product-proxy} instances, such as CPU or memory. To avoid downtime, perform the following steps on one instance at a time: + . Stop the first {product-proxy} instance that you want to modify. -+ . Modify the instance's resources as required. -+ Make sure the instance's IP address remains the same. -If the IP address changes, you must treat it as a new instance; follow the steps on the **Add an instance** tab. -+ +If the IP address changes, you must treat it as a new instance; follow the steps to **Add an instance** instead. . Restart the modified {product-proxy} instance. -+ . Wait until the instance starts, and then confirm that it is receiving traffic. -+ . Repeat these steps to modify each additional instance, one at a time. Remove an instance:: @@ -396,9 +368,6 @@ Remove an instance:: . On all {product-proxy} instances, remove the unused instance's address from the `ZDM_PROXY_TOPOLOGY_ADDRESSES` environment variable. . Perform a rolling restart of all remaining {product-proxy} instances. . Clean up resources used by the removed instance, such as the container or VM. -==== --- -====== === Proxy topology addresses enable failover and high availability From dcaafc52f2bdfd4a66063f21de779fb43b93ffe8 Mon Sep 17 00:00:00 2001 From: April M <36110273+aimurphy@users.noreply.github.com> Date: Wed, 15 Apr 2026 14:36:25 -0700 Subject: [PATCH 09/12] cdm tabs --- .../ROOT/pages/cassandra-data-migrator.adoc | 128 +++++++----------- 1 file changed, 50 insertions(+), 78 deletions(-) diff --git a/modules/ROOT/pages/cassandra-data-migrator.adoc b/modules/ROOT/pages/cassandra-data-migrator.adoc index 29346d1a..205d39e8 100644 --- a/modules/ROOT/pages/cassandra-data-migrator.adoc +++ b/modules/ROOT/pages/cassandra-data-migrator.adoc @@ -38,30 +38,20 @@ For example, if a new write occurs in your target cluster with a `writetime` of {company} recommends that you always install the latest version of {cass-migrator-short} to get the latest features, dependencies, and bug fixes. -[tabs] -====== -Install as a container:: -+ --- -Get the latest `cassandra-data-migrator` image that includes all dependencies from https://hub.docker.com/r/datastax/cassandra-data-migrator[DockerHub]. +=== Install as a container +Get the latest `cassandra-data-migrator` image that includes all dependencies from https://hub.docker.com/r/datastax/cassandra-data-migrator[DockerHub]. The container's `assets` directory includes all required migration tools: `cassandra-data-migrator`, `dsbulk`, and `cqlsh`. --- -Install as a JAR file:: -+ --- +=== Install as a JAR file on a single VM + +For one-off migrations, you can install the {spark-short} binary on a single VM where you will run the {cass-migrator-short} job: + . Install Java 11 or later, which includes {spark-short} binaries. -. 
Install https://spark.apache.org/downloads.html[{spark-reg}] version 3.5.x with Scala 2.13 and {hadoop-reg} 3.3 and later. -+ -[tabs] -==== -Single VM:: +. Install https://spark.apache.org/downloads.html[{spark-reg}] version 3.5.x with Scala 2.13 and {hadoop-reg} 3.3 and later: + -For one-off migrations, you can install the {spark-short} binary on a single VM where you will run the {cass-migrator-short} job. -+ -. Get the {spark-reg} tarball from the {spark} archive. +.. Get the {spark-reg} tarball from the {spark} archive: + [source,bash,subs="+quotes"] ---- @@ -70,7 +60,7 @@ wget https://archive.apache.org/dist/spark/spark-3.5.**PATCH**/spark-3.5.**PATCH + Replace `**PATCH**` with your {spark-short} patch version. + -. Change to the directory where you want install {spark-short}, and then extract the tarball: +.. Change to the directory where you want install {spark-short}, and then extract the tarball: + [source,bash,subs="+quotes"] ---- @@ -79,12 +69,35 @@ tar -xvzf spark-3.5.**PATCH**-bin-hadoop3-scala2.13.tgz + Replace `**PATCH**` with your {spark-short} patch version. -{spark-reg} cluster:: +. Download the latest {cass-migrator-repo}/packages/1832128/versions[cassandra-data-migrator JAR file] {cass-migrator-shield}. + +. Add the `cassandra-data-migrator` dependency to `pom.xml`: ++ +[source,xml,subs="+quotes"] +---- + + datastax.cdm + cassandra-data-migrator + **VERSION** + +---- + +Replace `**VERSION**` with your {cass-migrator-short} version. + +. Run `mvn install`. + +=== Install as a JAR file on a {spark} cluster or {spark-short} Serverless platform + For large (several terabytes) migrations, complex migrations, and use of {cass-migrator-short} as a long-term data transfer utility, {company} recommends that you use an {spark} cluster or {spark-short} Serverless platform. + +. Install Java 11 or later, which includes {spark-short} binaries. + +. Deploy a https://spark.apache.org/downloads.html[{spark-reg} cluster or {spark-short} Serverless instance] running version 3.5.x with Scala 2.13 and {hadoop-reg} 3.3 and later. + +[IMPORTANT] +==== If you deploy {cass-migrator-short} on a {spark-short} cluster, you must modify your `spark-submit` commands as follows: -+ + * Replace `--master "local[*]"` with the host and port for your {spark-short} cluster, as in `--master "spark://**MASTER_HOST**:**PORT**"`. * Remove parameters related to single-VM installations, such as `--driver-memory` and `--executor-memory`. ==== @@ -106,9 +119,9 @@ Replace `**VERSION**` with your {cass-migrator-short} version. . Run `mvn install`. -If you need to build the JAR for local development or your environment only has Scala version 2.12.x, see the alternative installation instructions in the {cass-migrator-repo}?tab=readme-ov-file[{cass-migrator-short} README]. --- -====== +=== Build for local development or Scala 2.12.x environments + +If you need to build the JAR for local development, or your environment only has Scala version 2.12.x, see the alternative installation instructions in the {cass-migrator-repo}?tab=readme-ov-file[{cass-migrator-short} README]. == Configure {cass-migrator-short} @@ -139,11 +152,7 @@ To optimize large-scale migrations, {cass-migrator-short} can run multiple concu The following `spark-submit` command migrates one table from the origin to the target cluster, using the configuration in your properties file. The migration job is specified in the `--class` argument. 
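If you haven't yet created the properties file that the command references, the following is a rough sketch of a minimal `cdm.properties` created from the command line.
Only `spark.cdm.schema.origin.keyspaceTable` is documented on this page; the connection property names and example values are assumptions, so verify them against the sample properties file in the {cass-migrator-short} README.

[source,bash]
----
# Create a minimal cdm.properties file.
# NOTE: Only spark.cdm.schema.origin.keyspaceTable is documented on this page.
# The connection property names and example addresses below are assumptions --
# confirm them against the sample properties file in the cassandra-data-migrator README.
cat > cdm.properties <<'EOF'
spark.cdm.connect.origin.host           203.0.113.10
spark.cdm.connect.origin.port           9042
spark.cdm.connect.target.host           203.0.113.20
spark.cdm.connect.target.port           9042
spark.cdm.schema.origin.keyspaceTable   test_keyspace.test_table
EOF
----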
-[tabs] -====== -Local installation:: -+ --- +.Migration job using a local installation [source,bash,subs="+quotes,+attributes"] ---- ./spark-submit --properties-file cdm.properties \ @@ -152,24 +161,7 @@ Local installation:: --class com.datastax.cdm.job.Migrate cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt ---- -Replace or modify the following, if needed: - -* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. -+ -Depending on where your properties file is stored, you might need to specify the full or relative file path. - -* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to migrate and the keyspace that it belongs to. -+ -You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. - -* `--driver-memory` and `--executor-memory`: For local installations, specify the appropriate memory settings for your environment. - -* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. --- - -{spark-reg} cluster:: -+ --- +.Migration job using a {spark-reg} cluster [source,bash,subs="+quotes"] ---- ./spark-submit --properties-file cdm.properties \ @@ -188,14 +180,13 @@ Depending on where your properties file is stored, you might need to specify the + You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. -* `--master`: Provide the URL of your {spark-short} cluster. +* `--driver-memory` and `--executor-memory` (local installations only): Specify the appropriate memory settings for your environment. + +* `--master` ({spark-short} cluster deployments only): Provide the URL of your {spark-short} cluster. * `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. --- -====== This command generates a log file (`logfile_name_**TIMESTAMP**.txt`) instead of logging output to the console. - For additional modifications to this command, see <>. [#cdm-validation-steps] @@ -208,11 +199,7 @@ Optionally, {cass-migrator-short} can automatically correct discrepancies in the . Use the following `spark-submit` command to run a data validation job using the configuration in your properties file. The data validation job is specified in the `--class` argument. + -[tabs] -====== -Local installation:: -+ --- +.Validation job using a local installation [source,bash,subs="+quotes,+attributes"] ---- ./spark-submit --properties-file cdm.properties \ @@ -220,25 +207,8 @@ Local installation:: --master "local[{asterisk}]" --driver-memory 25G --executor-memory 25G \ --class com.datastax.cdm.job.DiffData cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt ---- - -Replace or modify the following, if needed: - -* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. -+ -Depending on where your properties file is stored, you might need to specify the full or relative file path. - -* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to validate and the keyspace that it belongs to. -+ -You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. 
- -* `--driver-memory` and `--executor-memory`: For local installations, specify the appropriate memory settings for your environment. - -* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. --- - -{spark-reg} cluster:: + --- +.Validation job using a {spark-reg} cluster [source,bash,subs="+quotes"] ---- ./spark-submit --properties-file cdm.properties \ @@ -246,9 +216,10 @@ You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file --master "spark://**MASTER_HOST**:**PORT**" \ --class com.datastax.cdm.job.DiffData cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt ---- - ++ Replace or modify the following, if needed: - ++ +-- * `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file. + Depending on where your properties file is stored, you might need to specify the full or relative file path. @@ -257,11 +228,12 @@ Depending on where your properties file is stored, you might need to specify the + You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`. -* `--master`: Provide the URL of your {spark-short} cluster. +* `--driver-memory` and `--executor-memory` (local installations only): Specify the appropriate memory settings for your environment. + +* `--master` ({spark-short} cluster deployments only): Provide the URL of your {spark-short} cluster. * `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`. -- -====== . Allow the command some time to run, and then open the log file (`logfile_name_**TIMESTAMP**.txt`) and look for `ERROR` entries. + From dda1c810960799fa2e92d91ee0479b71764f56b9 Mon Sep 17 00:00:00 2001 From: April M <36110273+aimurphy@users.noreply.github.com> Date: Wed, 15 Apr 2026 14:40:33 -0700 Subject: [PATCH 10/12] fix unterminated block --- modules/ROOT/pages/connect-clients-to-proxy.adoc | 1 - 1 file changed, 1 deletion(-) diff --git a/modules/ROOT/pages/connect-clients-to-proxy.adoc b/modules/ROOT/pages/connect-clients-to-proxy.adoc index 31b730d4..a8df1c2f 100644 --- a/modules/ROOT/pages/connect-clients-to-proxy.adoc +++ b/modules/ROOT/pages/connect-clients-to-proxy.adoc @@ -50,7 +50,6 @@ my_cluster.close() // Print the data retrieved from the result set print(release_version) ---- --- .{astra-db} [source,text] From 2ad11c46934dc0d81a458926feb87e49ba47fad9 Mon Sep 17 00:00:00 2001 From: April M <36110273+aimurphy@users.noreply.github.com> Date: Wed, 15 Apr 2026 15:11:31 -0700 Subject: [PATCH 11/12] fix some issues --- modules/ROOT/pages/create-target.adoc | 2 +- .../sideloader/pages/migrate-sideloader.adoc | 83 +++++++++++-------- .../sideloader/pages/prepare-sideloader.adoc | 4 +- 3 files changed, 50 insertions(+), 39 deletions(-) diff --git a/modules/ROOT/pages/create-target.adoc b/modules/ROOT/pages/create-target.adoc index af5aa42f..83bde4b0 100644 --- a/modules/ROOT/pages/create-target.adoc +++ b/modules/ROOT/pages/create-target.adoc @@ -68,7 +68,7 @@ You must adjust your data model and application logic to discard or replace thes For more information, see xref:astra-db-serverless:cql:develop-with-cql.adoc#limitations-on-cql-for-astra-db[Limitations on CQL for {astra-db}]. 
* If you plan to use {sstable-sideloader} for xref:ROOT:migrate-and-validate-data.adoc[Phase 2], see the xref:sideloader:migrate-sideloader.adoc#record-schema[target database configuration requirements for migrating data with {sstable-sideloader}]. -== Migrate to {hcd-short}, {dse-short}, or open-source {cass-reg}:: +== Migrate to {hcd-short}, {dse-short}, or open-source {cass-reg} . Provision the cluster infrastructure, and then create your {hcd-short}, {dse-short}, or {cass-short} cluster with your desired configuration: + diff --git a/modules/sideloader/pages/migrate-sideloader.adoc b/modules/sideloader/pages/migrate-sideloader.adoc index 8ca43c77..ded633ee 100644 --- a/modules/sideloader/pages/migrate-sideloader.adoc +++ b/modules/sideloader/pages/migrate-sideloader.adoc @@ -13,8 +13,10 @@ Before you use {sstable-sideloader} for a migration, xref:sideloader:sideloader- On _each node_ in your origin cluster, use `nodetool` to create a backup of the data that you want to migrate, including all keyspaces and CQL tables that you want to migrate. -. Be aware of the {sstable-sideloader} limitations related to materialized views, secondary indexes, and encrypted data that are described in xref:sideloader:prepare-sideloader.adoc#origin-cluster-requirements[Origin cluster requirements]. -If necessary, modify the data model on your origin cluster to prepare for the migration. +=== Prepare to create snapshots + +. Due to {sstable-sideloader} limitations related to materialized views, secondary indexes, and encrypted data, you might need to modify the data model on your origin cluster to prepare for the migration. +For more information, see xref:sideloader:prepare-sideloader.adoc#origin-cluster-requirements[Origin cluster requirements]. . Optional: Before you create snapshots, consider running `xref:dse:managing:tools/nodetool/cleanup.adoc[nodetool cleanup]` to remove data that no longer belongs to your nodes. This command is particularly useful after adding more nodes to a cluster because it helps ensure that each node only contains the data that it is responsible for, according to the current cluster configuration and partitioning scheme. @@ -24,80 +26,101 @@ Smaller snapshots can lead to lower overall migration times and lower network tr + However, take adequate precautions before you run this command because the cleanup operations can introduce additional load on your origin cluster. -. Use `xref:dse:managing:tools/nodetool/snapshot.adoc[nodetool snapshot]` to create snapshots for the tables that you want to migrate. -+ +=== Run nodetool snapshot + +Use `xref:dse:managing:tools/nodetool/snapshot.adoc[nodetool snapshot]` to create snapshots for the tables that you want to migrate. + Don't create snapshots of system tables or tables that you don't want to migrate. The migration can fail if you attempt to migrate snapshots that don't have a matching schema in the target database. {sstable-sideloader} ignores system keyspaces. -+ -The structure of the `nodetool snapshot` command depends on the keyspaces and tables that you want to migrate: -+ -Snapshot all keyspaces:: + +The structure of the `nodetool snapshot` command depends on the keyspaces and tables that you want to migrate. + +==== Snapshot all keyspaces + Create a snapshot of all tables in all keyspaces: -+ + [source,bash,subs="+quotes"] ---- nodetool snapshot -t *SNAPSHOT_NAME* ---- -+ + Replace the following: -+ + * *`SNAPSHOT_NAME`*: A descriptive name for the snapshot. 
Use the same snapshot name for each node's snapshot; this makes it easier to programmatically upload the snapshots to the migration directory. -Snapshot specific keyspaces:: +==== Snapshot specific keyspaces + Create a snapshot of all tables in one or more specified keyspaces: -+ + .Snapshot one keyspace [source,bash,subs="+quotes"] ---- nodetool snapshot -t *SNAPSHOT_NAME* *KEYSPACE_NAME* ---- -+ + .Snapshot multiple keyspaces [source,bash,subs="+quotes"] ---- nodetool snapshot -t *SNAPSHOT_NAME* *KEYSPACE_NAME_1* *KEYSPACE_NAME_2* ---- -+ + Replace the following: -+ + * *`SNAPSHOT_NAME`*: A descriptive name for the snapshot. Use the same snapshot name for each node's snapshot; this makes it easier to programmatically upload the snapshots to the migration directory. + * *`KEYSPACE_NAME`*: The name of the keyspace that you want to migrate. + To snapshot multiple keyspaces, pass a space-separated list of keyspace names. For example, `customer_data product_data purchase_history` specifies three keyspaces. -Snapshot specific tables:: +==== Snapshot specific tables + Create a snapshot of one or more specified tables: -+ + .Snapshot one table [source,bash,subs="+quotes"] ---- nodetool snapshot -kt *KEYSPACE_NAME*.*TABLE_NAME* -t *SNAPSHOT_NAME* ---- -+ + .Snapshot multiple tables [source,bash,subs="+quotes"] ---- nodetool snapshot -kt *KEYSPACE_NAME_1*.*TABLE_NAME_A* *KEYSPACE_NAME_1*.*TABLE_NAME_B* *KEYSPACE_NAME_2*.*TABLE_NAME_X* -t *SNAPSHOT_NAME* ---- -+ + Replace the following: -+ + * *`KEYSPACE_NAME.TABLE_NAME`*: The name of the table that you want to migrate and the keyspace that it belongs to, separated by a period. For example, `product_data.appliances` specifies the `appliances` table in the `product_data` keyspace. + To snapshot multiple tables, pass a space-separated list of keyspace-table pairs. For example, `product_data.appliances purchase_history.nevada purchase_history.wisconsin` specifies the `appliances` table in the `product_data` keyspace and the `nevada` and `wisconsin` tables in the `purchase_history` keyspace. -+ + * *`SNAPSHOT_NAME`*: A descriptive name for the snapshot. Use the same snapshot name for each node's snapshot; this makes it easier to programmatically upload the snapshots to the migration directory. -.Use a `for` loop to simplify snapshot creation -[TIP] +=== Verify snapshot creation with nodetool listsnapshots + +Use `xref:6.9@dse:managing:tools/nodetool/list-snapshots.adoc[nodetool listsnapshots]` to verify that the snapshots were created: + +[source,bash] +---- +nodetool listsnapshots +---- + +[IMPORTANT] ==== +Snapshots have a specific directory structure, such as `*KEYSPACE_NAME*/*TABLE_NAME*/snapshots/*SNAPSHOT_NAME*/...`. +{sstable-sideloader} relies on this fixed structure to properly interpret the SSTable components. +**Don't modify the snapshot's directory structure; this can cause your migration to fail.** +==== + +=== Optional: Use `for` loops for snapshot creation and validation + If the nodes in your origin cluster are named in a predictable way (for example, `dse0`, `dse1`, `dse2`, etc.), you can use a `for` loop to simplify snapshot creation. For example: @@ -145,18 +168,6 @@ You can use the same `for` loop structure to verify that each snapshot was succe ---- for i in 0 1 2; do ssh dse${i} nodetool listsnapshots; done ---- -==== - -. 
Use `xref:6.9@dse:managing:tools/nodetool/list-snapshots.adoc[nodetool listsnapshots]` to verify that the snapshots were created: -+ -[source,bash] ----- -nodetool listsnapshots ----- -+ -Snapshots have a specific directory structure, such as `*KEYSPACE_NAME*/*TABLE_NAME*/snapshots/*SNAPSHOT_NAME*/...`. -{sstable-sideloader} relies on this fixed structure to properly interpret the SSTable components. -**Don't modify the snapshot's directory structure; this can cause your migration to fail.** [#record-schema] == Configure the target database diff --git a/modules/sideloader/pages/prepare-sideloader.adoc b/modules/sideloader/pages/prepare-sideloader.adoc index 40b1ea94..e4cb4e54 100644 --- a/modules/sideloader/pages/prepare-sideloader.adoc +++ b/modules/sideloader/pages/prepare-sideloader.adoc @@ -23,7 +23,7 @@ Make sure you understand how to securely store and use sensitive credentials whe The following requirements, recommendations, and limitations apply to the target {astra-db} database. Review all of these to ensure that your database is compatible with {sstable-sideloader}. -=== {product-short} subscription plan requirement +=== {astra} subscription plan requirement Your {astra} organization must be on an *Enterprise* xref:astra-db-serverless:administration:subscription-plans.adoc[subscription plan]. @@ -99,7 +99,7 @@ This is the amount of long-term capacity that you want the group to have after t + If the minimum is greater than the reserved capacity, the surplus capacity is prepared in advance, and there is no autoscaling required to access that capacity. -* **Maximum capacity**: Greater than the minimum by several units to allow autoscaling during resource intensive stages of the migration. +** **Maximum capacity**: Greater than the minimum by several units to allow autoscaling during resource intensive stages of the migration. + For non-trivial migrations, consider setting the maximum to 10. For extremely large migrations, contact your {company} account representative or {support-url}[IBM Support] to request more than 10 units to support your migration. From cc11ecaa89fc2f771c6d82a680dc5b4b23695f98 Mon Sep 17 00:00:00 2001 From: April M <36110273+aimurphy@users.noreply.github.com> Date: Thu, 16 Apr 2026 08:34:34 -0700 Subject: [PATCH 12/12] fix numbering --- modules/sideloader/pages/migrate-sideloader.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/modules/sideloader/pages/migrate-sideloader.adoc b/modules/sideloader/pages/migrate-sideloader.adoc index ded633ee..b1fca554 100644 --- a/modules/sideloader/pages/migrate-sideloader.adoc +++ b/modules/sideloader/pages/migrate-sideloader.adoc @@ -636,7 +636,7 @@ export NODE_NAME="**NODE_NAME**" ---- . Use the Azure CLI to upload one snapshot from one node into the migration directory: - ++ [source,bash] ---- for dir in $(find "$CASSANDRA_DATA_DIR" -type d -path "*/snapshots/${SNAPSHOT_NAME}*"); do