Conversation
Force-pushed from 3bed38d to 8665c4a
@@ -0,0 +1,28 @@
function TensorKit._copyto!(A::StridedView{TA, 1, <:CuArray{TA}}, B::StridedView{TB, 2, <:CuArray{TB}}) where {TA, TB}
Does this make sense to include, and should this not simply fall back to the default copyto!?
This really is just a performance optimization to avoid a bunch of the Strided.jl overhead, but I would be surprised if building the index arrays like this really gives an improvement over a regular strided copyto!.
I think this entire thing should boil down to the following, which is not obvious; I should have added a comment or a fallback definition (up to some off-by-one errors):
A[A.offset:stride(A, 1):end] .= B.op.(view(B, div(B.offset, stride(B, 2)):stride(B, 1):size(B, 1), 1:stride(B, 2):size(B, 2)))
It seems to be necessary to avoid scalar indexing sadness 🤷. Happy to use the fallback, though!
Just investigated this a bit more; a couple of comments:
- My fallback is needlessly complicated, and should have just been Base.copyto!(A, B), which then dispatches to Strided.jl.
- If that fails, the fallback is copy!(sreshape(A, size(B)), B), which I think works with your last changes.
It might be reasonable to invert the logic here and go from opt-out to opt-in, i.e. define TensorKit._copyto!(A, B) = copyto!(A, B) and then only specialize it for <:Vector and <:Memory parent types.
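For concreteness, a minimal sketch of that opt-in arrangement (the method names follow the discussion, but the specialized signature and its body are illustrative, not taken from the actual PR):

```julia
# Generic fallback: defer everything to Base, which dispatches
# into Strided.jl for StridedView arguments.
TensorKit._copyto!(A::StridedView, B::StridedView) = copyto!(A, B)

# Opt in to a hand-rolled fast path only for plain host arrays,
# where the extra stride information is known to pay off.
function TensorKit._copyto!(
        A::StridedView{TA, 1, <:Vector{TA}},
        B::StridedView{TB, 2, <:Vector{TB}},
    ) where {TA, TB}
    # ... specialized implementation goes here ...
    return A
end
```

This way new array backends (e.g. CuArray) get correct behavior for free via the fallback, and specializations exist only where they are measured to help.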
TBH all this might be obviated by the fixes that have now been merged into Strided and StridedViews, right? Why don't I just nuke this and we can see if we need it?
Yes, but this function still bypasses all of the Strided machinery because in this really specific case I have a bit more information and could squeeze out a tiny bit more performance. Ultimately, though, if this turns out to be too much of a hassle, it might be reasonable to choose maintainability and simply replace this at the call sites.
const _GenericTransformerData{T, N} = Tuple{
    Matrix{T},
    DenseMatrix{T},
I think this change makes the fields below abstractly typed; do we need this?
Yes, in order to allow device-side matrices to be passed in. Otherwise you get attempts to multiply CuMatrix * Matrix outside of constructors.
Ok, but in that case we would really have to make that an additional type parameter in the GenericTreeTransformer struct: these were introduced to hyper-specialize and get maximal efficiency, so I don't think we can eat a type instability here.
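For concreteness, the parameterization being suggested might look roughly like this (the struct name matches the discussion, but the field layout is made up for illustration):

```julia
# Before: a DenseMatrix{T} field is abstractly typed, so every access
# to it is a dynamic dispatch and defeats the hyper-specialization.
#
# After: lift the concrete matrix type into a type parameter M, so that
# Matrix{T} and device-side matrices (e.g. CuMatrix{T}) each get their
# own fully concrete specialization of the struct.
struct GenericTreeTransformer{T, N, M <: DenseMatrix{T}}
    data::Tuple{Matrix{T}, M}   # illustrative; the real field layout differs
end
```

The constructor can infer M from its argument, so call sites would not need to change.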
OK, it would have been helpful to have a comment or anything explaining that this was why they were there
Force-pushed from eabfce9 to 0c903ac
Let's make this a draft too to cut down on CI thrash
Needed to get more MPSKit examples working