The default values of dimnames()$doc on a dem object are currently text1, text2, etc. For example:
toks <- tokens(cr_sample_corpus)
immig_toks <- tokens_context(x = toks, pattern = "immigr*", window = 6L)
immig_dfm <- dfm(immig_toks)
immig_dem <- dem(immig_dfm, pre_trained = cr_glove_subset,
                 transform = TRUE, transform_matrix = cr_transform, verbose = FALSE)
dimnames(immig_dem)$doc
returns
[1] "text1" "text2" "text3" "text4" "text5" "text6" "text7"
etc
I wonder if this should be something other than text, because it may give the impression that each instance is a new "text" (a separate document). But the whole point here is that one can have many instances of the same term (and thus many embeddings of it) in the same document.
Perhaps we could change it to instance (or occurrence, observation, or incidence)? I'm open to leaving it as is; I just want to avoid confusion for end users.
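To make the suggestion concrete, here is a rough sketch of what the relabelled names might look like. It uses a plain base R matrix as a stand-in for a dem object (the `emb` object and its `features` dimension name are hypothetical), and I have not checked whether the same assignment behaves identically on the actual dem class:

```r
# Stand-in for a dem object: a plain matrix whose first dimension is named "doc".
# This is an assumption for illustration; the real object comes from dem().
emb <- matrix(0, nrow = 7, ncol = 3,
              dimnames = list(doc = paste0("text", 1:7), features = NULL))

# Relabel each row as an instance rather than a text.
rownames(emb) <- sub("^text", "instance", rownames(emb))

# Inspect the new labels on the "doc" dimension.
dimnames(emb)$doc
```

Whichever label is chosen, the point is only that the prefix signals one row per occurrence of the term rather than one row per document.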