
More tweaks#375

Draft
kshyatt wants to merge 4 commits into main from ksh/cuda_tweaks

Conversation

@kshyatt
Member

@kshyatt kshyatt commented Feb 18, 2026

Needed to get more MPSKit examples working

@@ -0,0 +1,28 @@
function TensorKit._copyto!(A::StridedView{TA, 1, <:CuArray{TA}}, B::StridedView{TB, 2, <:CuArray{TB}}) where {TA, TB}
Member

Does this make sense to include, and should this not simply fall back to the default copyto!?
This really is just a performance optimization to avoid a bunch of the overhead of Strided.jl, but I would be surprised if building the index arrays like this really gives an improvement over just a regular strided copyto!.

I think this entire thing should boil down to the following, which is not obvious and I should have added a comment/fallback definition: (up to some off-by-one errors though)

A[A.offset:stride(A, 1):end] .= B.op.(view(B, div(B.offset, stride(B, 2)):stride(B, 1):size(B, 1), 1:stride(B, 2):size(B, 2)))
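As a plain-Base toy illustration of the pattern in that one-liner (no Strided.jl or CUDA involved; the arrays and strides here are made up), a strided destination can be filled from a strided view in one fused broadcast, with no scalar-indexing loop:

```julia
# Toy sketch: copy a strided selection of a matrix B into a strided
# selection of a vector A with a single fused broadcast assignment.
A = zeros(10)
B = reshape(collect(1.0:12.0), 3, 4)   # 3×4 column-major matrix

# A[1] and A[3] get 2 .* [B[1,1], B[1,3]] == [2.0, 14.0]
A[1:2:3] .= 2 .* view(B, 1, 1:2:3)
```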

Member Author

It seems to be necessary to avoid scalar indexing sadness 🤷 . Happy to use the fallback, though!

Member

Just investigated this a bit more; a couple of comments:

  • My fallback is needlessly complicated, and should have just been Base.copyto!(A, B), which then dispatches to Strided.jl
  • If that fails, the fallback is copy!(sreshape(A, size(B)), B), which I think works with your last changes.

It might be reasonable to turn around the logic here, and simply go from opt-out to opt-in, i.e. TensorKit._copyto!(A, B) = copyto!(A, B) and then only specialize this for <:Vector + <:Memory parent types.
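The opt-in shape proposed here could look roughly like the following sketch (the name `_copyto_sketch!` is made up, and the real specialization would dispatch on StridedView parent types rather than plain `Vector`):

```julia
# Generic fallback: defer to Base.copyto!, so GPU-backed arrays never
# hit a hand-written scalar-indexing loop.
_copyto_sketch!(A::AbstractArray, B::AbstractArray) = copyto!(A, B)

# Opt-in fast path, only for CPU-backed arrays (a stand-in for the
# `<:Vector` / `<:Memory` parent types mentioned above).
function _copyto_sketch!(A::Vector{T}, B::Vector{T}) where {T}
    @inbounds for i in eachindex(A, B)
        A[i] = B[i]
    end
    return A
end
```

With this arrangement, any array type not explicitly opted in gets whatever `copyto!` dispatch already does, which is the maintainable default.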

Member Author

TBH all this might be obviated by the fixes that have now been merged into Strided and StridedViews, right? Why don't I just nuke this and we can see if we need it?

Member

Yes, but this function still bypasses all of the Strided stuff because, in this really specific case, I have a bit more information and could squeeze out a tiny bit more performance. Ultimately though, if this turns out to be too much of a hassle, it might be reasonable to choose maintainability and simply replace this at the call sites.


const _GenericTransformerData{T, N} = Tuple{
Matrix{T},
DenseMatrix{T},
Member

I think this change makes the types below abstractly typed, do we need this?

Member Author

Yes, in order to allow device-side matrices to get passed in. Otherwise you get attempts to multiply a CuMatrix by a Matrix outside of the constructors.

Member

Ok, but in that case we would really have to make that an additional type parameter in the GenericTreeTransformer struct -- these were introduced to hyper specialize and get maximal efficiency, so I don't think we can eat a type-instability here.
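The inference concern here is the standard abstract-field problem, and the extra type parameter suggested above is the usual fix. A minimal sketch (struct names invented for illustration, not the actual GenericTreeTransformer definition):

```julia
# Abstract field: the concrete matrix type cannot be recovered from the
# struct's type parameters, so field accesses are not inferable.
struct AbstractlyTyped{T}
    data::DenseMatrix{T}
end

# Extra type parameter M: the field type is concrete once M is fixed,
# and M can be Matrix, CuMatrix, or any other DenseMatrix subtype.
struct ConcretelyTyped{T, M <: DenseMatrix{T}}
    data::M
end

isconcretetype(fieldtype(AbstractlyTyped{Float64}, :data))                   # false
isconcretetype(fieldtype(ConcretelyTyped{Float64, Matrix{Float64}}, :data))  # true
```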

Member Author

OK, it would have been helpful to have had a comment or anything noting that this was why they were there.

@codecov

codecov bot commented Feb 26, 2026

Codecov Report

❌ Patch coverage is 86.04651% with 6 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/tensors/abstracttensor.jl | 50.00% | 3 Missing ⚠️ |
| src/tensors/braidingtensor.jl | 0.00% | 2 Missing ⚠️ |
| ext/TensorKitCUDAExt/cutensormap.jl | 91.66% | 1 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
|---|---|
| ext/TensorKitCUDAExt/TensorKitCUDAExt.jl | 100.00% <ø> (ø) |
| ext/TensorKitCUDAExt/auxiliary.jl | 100.00% <100.00%> (ø) |
| src/auxiliary/auxiliary.jl | 94.64% <100.00%> (ø) |
| src/tensors/treetransformers.jl | 96.22% <ø> (ø) |
| ext/TensorKitCUDAExt/cutensormap.jl | 75.94% <91.66%> (+1.97%) ⬆️ |
| src/tensors/braidingtensor.jl | 67.46% <0.00%> (-0.83%) ⬇️ |
| src/tensors/abstracttensor.jl | 55.22% <50.00%> (+0.33%) ⬆️ |

@kshyatt kshyatt marked this pull request as draft February 27, 2026 11:14
@kshyatt
Member Author

kshyatt commented Feb 27, 2026

Let's make this a draft too to cut down on CI thrash

