1 change: 1 addition & 0 deletions antora.yml
@@ -32,6 +32,7 @@ asciidoc:
astra: 'Astra'
db-serverless: 'Serverless (non-vector)'
db-serverless-vector: 'Serverless (vector)'
db-classic: 'Astra Managed Clusters'
astra-ui: 'Astra Portal'
astra-url: 'https://astra.datastax.com'
astra-ui-link: '{astra-url}[{astra-ui}^]'
1 change: 1 addition & 0 deletions modules/ROOT/nav.adoc
@@ -24,6 +24,7 @@
* xref:ROOT:enable-async-dual-reads.adoc[]
* xref:ROOT:change-read-routing.adoc[]
* xref:ROOT:connect-clients-to-target.adoc[]
* xref:ROOT:zdm-logs.adoc[]
* xref:ROOT:troubleshooting-tips.adoc[]
* xref:ROOT:faqs.adoc[]
* Release notes
146 changes: 56 additions & 90 deletions modules/ROOT/pages/cassandra-data-migrator.adoc
@@ -38,30 +38,20 @@ For example, if a new write occurs in your target cluster with a `writetime` of

{company} recommends that you always install the latest version of {cass-migrator-short} to get the latest features, dependencies, and bug fixes.

=== Install as a container

Get the latest `cassandra-data-migrator` image that includes all dependencies from https://hub.docker.com/r/datastax/cassandra-data-migrator[DockerHub].
The container's `assets` directory includes all required migration tools: `cassandra-data-migrator`, `dsbulk`, and `cqlsh`.
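
For example, a minimal sketch of pulling the image and opening a shell in it (the `latest` tag is an assumption; pin a specific version if you need reproducibility):

[source,bash]
----
docker pull datastax/cassandra-data-migrator:latest
# Open an interactive shell to access the bundled tools in the image's assets directory.
docker run --rm -it datastax/cassandra-data-migrator:latest bash
----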

=== Install as a JAR file on a single VM

For one-off migrations, you can install the {spark-short} binary on a single VM where you will run the {cass-migrator-short} job:

. Install Java 11 or later, which is required to run the {spark-short} binaries.

. Install https://spark.apache.org/downloads.html[{spark-reg}] version 3.5.x with Scala 2.13 and {hadoop-reg} 3.3 and later:
+
.. Get the {spark-reg} tarball from the {spark} archive:
+
[source,bash,subs="+quotes"]
----
wget https://archive.apache.org/dist/spark/spark-3.5.**PATCH**/spark-3.5.**PATCH**-bin-hadoop3-scala2.13.tgz
----
+
Replace `**PATCH**` with your {spark-short} patch version.
.. Change to the directory where you want to install {spark-short}, and then extract the tarball:
+
[source,bash,subs="+quotes"]
----
tar -xvzf spark-3.5.**PATCH**-bin-hadoop3-scala2.13.tgz
----
+
Replace `**PATCH**` with your {spark-short} patch version.
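+
Optionally, you can add the extracted directory to your `PATH` and verify the installation, as in this sketch (the directory name assumes the default tarball layout):
+
[source,bash,subs="+quotes"]
----
export SPARK_HOME="$PWD/spark-3.5.**PATCH**-bin-hadoop3-scala2.13"
export PATH="$SPARK_HOME/bin:$PATH"
# Confirm that the binary runs and reports the expected Spark and Scala versions.
spark-submit --version
----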

. Download the latest {cass-migrator-repo}/packages/1832128/versions[cassandra-data-migrator JAR file] {cass-migrator-shield}.

. Add the `cassandra-data-migrator` dependency to `pom.xml`:
+
[source,xml,subs="+quotes"]
----
<dependency>
<groupId>datastax.cdm</groupId>
<artifactId>cassandra-data-migrator</artifactId>
<version>**VERSION**</version>
</dependency>
----
+
Replace `**VERSION**` with your {cass-migrator-short} version.

. Run `mvn install`.

=== Install as a JAR file on a {spark} cluster or {spark-short} Serverless platform

For large migrations (several terabytes), complex migrations, and long-term use of {cass-migrator-short} as a data transfer utility, {company} recommends that you use an {spark} cluster or {spark-short} Serverless platform.

. Install Java 11 or later, which is required to run the {spark-short} binaries.

. Deploy a https://spark.apache.org/downloads.html[{spark-reg} cluster or {spark-short} Serverless instance] running version 3.5.x with Scala 2.13 and {hadoop-reg} 3.3 and later.
+
[IMPORTANT]
====
If you deploy {cass-migrator-short} on a {spark-short} cluster, you must modify your `spark-submit` commands as follows:

* Replace `--master "local[*]"` with the host and port for your {spark-short} cluster, as in `--master "spark://**MASTER_HOST**:**PORT**"`.
* Remove parameters related to single-VM installations, such as `--driver-memory` and `--executor-memory`.
====
@@ -106,9 +119,9 @@ Replace `**VERSION**` with your {cass-migrator-short} version.

. Run `mvn install`.

=== Build for local development or Scala 2.12.x environments

If you need to build the JAR for local development, or if your environment only has Scala version 2.12.x, see the alternative installation instructions in the {cass-migrator-repo}?tab=readme-ov-file[{cass-migrator-short} README].

== Configure {cass-migrator-short}

@@ -139,11 +152,7 @@ To optimize large-scale migrations, {cass-migrator-short} can run multiple concu
The following `spark-submit` command migrates one table from the origin to the target cluster, using the configuration in your properties file.
The migration job is specified in the `--class` argument.

.Migration job using a local installation
[source,bash,subs="+quotes,+attributes"]
----
./spark-submit --properties-file cdm.properties \
--conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \
--master "local[{asterisk}]" --driver-memory 25G --executor-memory 25G \
--class com.datastax.cdm.job.Migrate cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
----

.Migration job using a {spark-reg} cluster
[source,bash,subs="+quotes"]
----
./spark-submit --properties-file cdm.properties \
--conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \
--master "spark://**MASTER_HOST**:**PORT**" \
--class com.datastax.cdm.job.Migrate cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
----

Replace or modify the following, if needed:

* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file.
+
Depending on where your properties file is stored, you might need to specify the full or relative file path.

* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to migrate and the keyspace that it belongs to.
+
You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same `**KEYSPACE_NAME**.**TABLE_NAME**` format.

* `--driver-memory` and `--executor-memory` (local installations only): Specify the appropriate memory settings for your environment.

* `--master` ({spark-short} cluster deployments only): Provide the URL of your {spark-short} cluster.

* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`.

This command generates a log file (`logfile_name_**TIMESTAMP**.txt`) instead of logging output to the console.
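
A quick way to check a completed run, as a sketch (the exact file name depends on the timestamp when the job started):

[source,bash]
----
# Print any ERROR entries from the most recent log file.
grep ERROR "$(ls -t logfile_name_*.txt | head -n 1)"
----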

For additional modifications to this command, see <<advanced>>.

[#cdm-validation-steps]
@@ -208,47 +199,27 @@ Optionally, {cass-migrator-short} can automatically correct discrepancies in the
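
For example, as a sketch, autocorrection is controlled through `autocorrect` properties in your properties file similar to the following (confirm the exact property names in the fully annotated properties file):

[source,properties]
----
# Insert rows that exist in the origin cluster but are missing from the target cluster.
spark.cdm.autocorrect.missing=true
# Overwrite target rows whose values do not match the corresponding origin rows.
spark.cdm.autocorrect.mismatch=true
----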
. Use the following `spark-submit` command to run a data validation job using the configuration in your properties file.
The data validation job is specified in the `--class` argument.
+
.Validation job using a local installation
[source,bash,subs="+quotes,+attributes"]
----
./spark-submit --properties-file cdm.properties \
--conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \
--master "local[{asterisk}]" --driver-memory 25G --executor-memory 25G \
--class com.datastax.cdm.job.DiffData cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
----
+
.Validation job using a {spark-reg} cluster
[source,bash,subs="+quotes"]
----
./spark-submit --properties-file cdm.properties \
--conf spark.cdm.schema.origin.keyspaceTable="**KEYSPACE_NAME**.**TABLE_NAME**" \
--master "spark://**MASTER_HOST**:**PORT**" \
--class com.datastax.cdm.job.DiffData cassandra-data-migrator-**VERSION**.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
----
+
Replace or modify the following, if needed:
+
--
* `--properties-file cdm.properties`: If your properties file has a different name, specify the actual name of your properties file.
+
Depending on where your properties file is stored, you might need to specify the full or relative file path.

* `**KEYSPACE_NAME**.**TABLE_NAME**`: Specify the name of the table that you want to validate and the keyspace that it belongs to.
+
You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same `**KEYSPACE_NAME**.**TABLE_NAME**` format.

* `--driver-memory` and `--executor-memory` (local installations only): Specify the appropriate memory settings for your environment.

* `--master` ({spark-short} cluster deployments only): Provide the URL of your {spark-short} cluster.

* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`.
--

. Allow the command some time to run, and then open the log file (`logfile_name_**TIMESTAMP**.txt`) and look for `ERROR` entries.
+
@@ -328,24 +300,18 @@ Specifically, see the {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.

== Troubleshoot {cass-migrator-short}

Java NoSuchMethodError::
If you installed {spark-short} as a JAR file, and your {spark-short} and Scala versions aren't compatible with your installed version of {cass-migrator-short}, {cass-migrator-short} jobs can throw exceptions such as the following:
+
[source,console]
----
Exception in thread "main" java.lang.NoSuchMethodError: 'void scala.runtime.Statics.releaseFence()'
----
+
Make sure that your {spark-short} binary is compatible with your {cass-migrator-short} version.
If you installed an earlier version of {cass-migrator-short}, you might need to install an earlier {spark-short} binary.
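+
To check which {spark-short} and Scala versions your binary reports, you can run, for example:
+
[source,bash]
----
# Prints the Spark version and the Scala version it was built against.
./spark-submit --version
----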

Rerun a failed or partially completed job::
You can use the `track-run` feature to track the progress of a migration or validation, and then, if necessary, use the `run-id` to rerun a failed job from the last successful migration or validation point.
+
For more information, see the {cass-migrator-repo}[{cass-migrator-short} repository] and the {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.properties[fully annotated properties file].
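+
As an illustrative sketch only (the `trackRun` property names here are assumptions; confirm the exact configuration in the {cass-migrator-short} repository), a tracked run and a rerun might look like the following:
+
[source,bash,subs="+quotes"]
----
# First run: record progress so that a failed job can be resumed later.
./spark-submit --properties-file cdm.properties \
--conf spark.cdm.trackRun=true \
--class com.datastax.cdm.job.Migrate cassandra-data-migrator-**VERSION**.jar

# Rerun: resume from the last successful point of the earlier run.
./spark-submit --properties-file cdm.properties \
--conf spark.cdm.trackRun.previousRunId=**RUN_ID** \
--class com.datastax.cdm.job.Migrate cassandra-data-migrator-**VERSION**.jar
----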
25 changes: 9 additions & 16 deletions modules/ROOT/pages/change-read-routing.adoc
@@ -48,21 +48,17 @@ By design, the data is expected to be identical on both clusters, and your clien

For this reason, the only way to manually test read routing is to intentionally write mismatched test data to the clusters.
Then, you can send a read request to {product-proxy} and see which cluster-specific data is returned, which indicates the cluster that received the read request.
There are two ways to do this:

[tabs]
======
Manually create mismatched tables::
+
--
To manually create mismatched data, you can create a test table on each cluster, and then write different data to each table.
+
[IMPORTANT]
====
When you write the mismatched data to the tables, make sure you connect to each cluster directly.
Don't connect to {product-proxy}, because {product-proxy} will, by design, write the same data to both clusters through dual writes.
====
+
. Create a small test table on both clusters, such as a simple key/value table.
You can use an existing keyspace, or create one for this test specifically.
For example:
+
[source,cql]
----
CREATE TABLE test_keyspace.test_table(k TEXT PRIMARY KEY, v TEXT);
----
+
. Use `cqlsh` to connect _directly to the origin cluster_, and then insert a row with any key and a value that is specific to the origin cluster.
For example:
+
[source,cql]
----
INSERT INTO test_keyspace.test_table(k, v) VALUES ('1', 'Hello from the origin cluster!');
----
+
. Use `cqlsh` to connect _directly to the target cluster_, and then insert a row with the same key and a value that is specific to the target cluster.
For example:
+
[source,cql]
----
INSERT INTO test_keyspace.test_table(k, v) VALUES ('1', 'Hello from the target cluster!');
----
+
. Use `cqlsh` to xref:connect-clients-to-proxy.adoc#_connecting_cqlsh_to_the_zdm_proxy[connect to {product-proxy}], and then issue a read request to your test table.
For example:
+
[source,cql]
----
SELECT * FROM test_keyspace.test_table WHERE k = '1';
----
+
The cluster-specific value in the response tells you which cluster received the read request.
For example:
+
--
* If the read request was correctly routed to the target cluster, the result from `test_table` contains `Hello from the target cluster!`.
* If the read request was incorrectly routed to the origin cluster, the result from `test_table` contains `Hello from the origin cluster!`.
--
+
. When you're done testing, drop the test tables from both clusters.
If you created dedicated test keyspaces, drop the keyspaces as well.
--

Use the Themis sample client application::
+
--
The xref:connect-clients-to-proxy.adoc#_themis_client[Themis sample client application] connects directly to the origin cluster, the target cluster, and {product-proxy}.
It inserts some test data in its own, dedicated table.
Then, you can view the results of reads from each source.
For more information, see the https://github.com/absurdfarce/themis/blob/main/README.md[Themis README].
--
======

=== System tables cannot validate read routing

13 changes: 13 additions & 0 deletions modules/ROOT/pages/components.adoc
@@ -10,6 +10,19 @@ For live migrations, {product-proxy} orchestrates activity-in-transition on your
To move and validate data, you use data migration tools.
You can use these tools alone or with {product-proxy}.

== When to use migration tools

You can use migration tools for database platform migrations, upgrades, and other infrastructure changes that require synchronizing clusters for a period of time.
For example:

* You want to move to a different database provider, such as from {dse} to {hcd}.

* You need to upgrade a cluster to a new version or new infrastructure, and an in-place upgrade is risky or impossible.

* You want to move client applications from shared clusters to dedicated clusters for greater control over individual configurations.

* You want to consolidate client applications running on separate clusters onto one shared cluster to minimize sprawl and maintenance.

== {product-proxy}

The main component of the {company} {product} toolkit is {product-proxy-repo}[{product-proxy}], which is designed to be a lightweight proxy that handles all real-time requests generated by your client applications during the migration process.