default naming convention for dimnames()$doc in dem object #13

@ArthurSpirling

Description

The default values of dimnames()$doc on a dem object are currently text1, text2, etc. For example:

library(quanteda)
library(conText)

toks <- tokens(cr_sample_corpus)
immig_toks <- tokens_context(x = toks, pattern = "immigr*", window = 6L)
immig_dfm <- dfm(immig_toks)
immig_dem <- dem(immig_dfm, pre_trained = cr_glove_subset,
                 transform = TRUE, transform_matrix = cr_transform, verbose = FALSE)
dimnames(immig_dem)$doc

returns

[1] "text1"    "text2"    "text3"    "text4"    "text5"    "text6"    "text7" 

and so on.

I wonder whether this should be something other than "text", because it may give the impression that each instance is a new "text" (a separate document). But, of course, the whole point here is that a single document can contain many instances of a term, and thus many embeddings of that same term.

Perhaps we could change it to instance (or occurrence, observation, or incidence)? I'm open to leaving it as is, but I want to avoid confusing end-users.
