Issue with ncs function #12

Hi @prodriguezsosa,

I just noticed a small issue with the ncs function: the same context can be assigned to both groups. For example, a context that appears in a Republican speech might be returned as a top Democrat context. This happens because the ncs function calculates cosine similarities between each group-specific embedding for a term and all contexts, regardless of which group's texts they come from. I guess you would want to restrict the eligible contexts for each group to those that actually appear in that group's texts? Here is a suggestion for how to change the code:

```
# Cosine similarity between every context embedding and each group-specific embedding
cos_sim <- text2vec::sim2(x = as.matrix(contexts_dem), y = as.matrix(x), method = "cosine", norm = "l2") %>% data.frame()
# Keep each context's group (assumes the grouping variable is the second docvar)
contexts_df <- data.frame(docid = quanteda::docid(contexts),
                          context = sapply(contexts, function(i) paste(i, collapse = " ")),
                          docgroup = quanteda::docvars(contexts)[, 2])
cos_sim <- cos_sim %>% dplyr::mutate(docid = rownames(cos_sim)) %>% dplyr::left_join(contexts_df, by = "docid")

result <- tidyr::pivot_longer(cos_sim, -c(docid, context, docgroup), names_to = "target") %>%
  dplyr::filter(docgroup == target) %>%  # only contexts from the group's own texts
  dplyr::group_by(target) %>%
  dplyr::slice_max(order_by = value, n = N) %>%
  dplyr::mutate(rank = 1:dplyr::n()) %>%
  dplyr::arrange(dplyr::desc(value)) %>%
  dplyr::ungroup() %>%
  dplyr::select("target", "context", "rank", "value")
```
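
To make the effect of the filter concrete, here is a small base R sketch with made-up numbers. Everything in it is hypothetical: the four contexts, their two-dimensional embeddings, the group labels D and R, and the helper `cosine` are only for illustration and are not part of the package.

```r
# Toy stand-in for contexts_dem: 4 contexts with made-up 2-d embeddings
contexts_dem <- matrix(c(1.0, 0.0,
                         0.6, 0.4,
                         0.0, 1.0,
                         0.4, 0.6),
                       ncol = 2, byrow = TRUE,
                       dimnames = list(c("text1", "text2", "text3", "text4"), NULL))

# Group of the document each context comes from (hypothetical labels)
docgroup <- c(text1 = "R", text2 = "D", text3 = "D", text4 = "R")

# Toy stand-in for x: one group-specific embedding of the target term per row
x <- matrix(c(0.95, 0.05,
              0.05, 0.95),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("D", "R"), NULL))

# Plain cosine similarity between two numeric vectors
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Similarity of every context (rows) with every group embedding (columns)
cos_sim <- sapply(rownames(x), function(g) apply(contexts_dem, 1, cosine, b = x[g, ]))

# Current behaviour: nearest context per group, whatever its origin
top_any <- apply(cos_sim, 2, function(s) names(which.max(s)))

# Suggested behaviour: only contexts from the group's own documents are eligible
top_own <- sapply(colnames(cos_sim), function(g) {
  s <- cos_sim[docgroup[rownames(cos_sim)] == g, g]
  names(which.max(s))
})

print(top_any)
print(top_own)
```

In this toy data the unrestricted lookup picks text1 for D and text3 for R, each of which comes from the other party's documents, while the restricted lookup picks text2 and text4.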
