I just realized a small issue with the ncs function. It happens that the same context is assigned to both groups. So, for example, the context appearing in a Republican speech might be assigned as a top Democrat context. This is because the ncs function calculates cosine similarities between the group-specific embedding for a term and all contexts. I guess you would want to subset the eligible contexts for each group to only those actually appearing in a group-specific text? Here is a suggestion on how to change the code:
cos_sim <- text2vec::sim2(x = as.matrix(contexts_dem), y = as.matrix(x), method = "cosine", norm = "l2") %>% data.frame()
contexts_df <- data.frame(docid = quanteda::docid(contexts), context = sapply(contexts, function(i) paste(i, collapse = " ")),
docgroup = docvars(contexts)[,2])
cos_sim <- cos_sim %>% dplyr::mutate(docid = rownames(cos_sim)) %>% dplyr::left_join(contexts_df, by = 'docid')
result <- tidyr::pivot_longer(cos_sim, -c(docid, context, docgroup), names_to = "target") %>%
dplyr::filter(docgroup==target) %>%
dplyr::group_by(target) %>%
dplyr::slice_max(order_by = value, n = N) %>%
dplyr::mutate(rank = 1:dplyr::n()) %>%
dplyr::arrange(dplyr::desc(value)) %>%
dplyr::ungroup() %>%
dplyr::select('target', 'context', 'rank', 'value')
Hi @prodriguezsosa,
I just realized a small issue with the ncs function. It happens that the same context is assigned to both groups. So, for example, the context appearing in a Republican speech might be assigned as a top Democrat context. This is because the ncs function calculates cosine similarities between the group-specific embedding for a term and all contexts. I guess you would want to subset the eligible contexts for each group to only those actually appearing in a group-specific text? Here is a suggestion on how to change the code: