
Issue with ncs function #12

@ElisaWirsching


Hi @prodriguezsosa,

I just noticed a small issue with the ncs function: the same context can be assigned to both groups. For example, a context that appears in a Republican speech might be returned as a top Democrat context. This happens because ncs computes the cosine similarity between each group-specific embedding of the term and all contexts, regardless of which group's text a context comes from. I assume you would want to restrict the eligible contexts for each group to those that actually appear in that group's texts? Here is a suggestion for how to change the code:

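```r
# cosine similarity of every context embedding (rows) with every group-specific
# term embedding (columns); check.names = FALSE keeps the group labels unmangled
cos_sim <- text2vec::sim2(x = as.matrix(contexts_dem), y = as.matrix(x), method = "cosine", norm = "l2") %>%
  data.frame(check.names = FALSE)
# keep each context's group label so it can be matched to its target embedding
# (assumes the grouping variable is the second docvar of `contexts`)
contexts_df <- data.frame(docid = quanteda::docid(contexts),
                          context = sapply(contexts, function(i) paste(i, collapse = " ")),
                          docgroup = quanteda::docvars(contexts)[, 2])
cos_sim <- cos_sim %>%
  dplyr::mutate(docid = rownames(cos_sim)) %>%
  dplyr::left_join(contexts_df, by = "docid")

result <- tidyr::pivot_longer(cos_sim, -c(docid, context, docgroup), names_to = "target") %>%
  # only rank contexts that come from the same group as the target embedding
  dplyr::filter(docgroup == target) %>%
  dplyr::group_by(target) %>%
  # with_ties = FALSE guarantees at most N contexts per group
  dplyr::slice_max(order_by = value, n = N, with_ties = FALSE) %>%
  dplyr::mutate(rank = dplyr::row_number()) %>%
  dplyr::arrange(dplyr::desc(value)) %>%
  dplyr::ungroup() %>%
  dplyr::select("target", "context", "rank", "value")
```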
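To illustrate the problem and the effect of the group filter, here is a small base-R sketch on made-up data. It is not conText code: the vectors, the group labels, and the helpers `cosine()` and `top_n()` are all hypothetical, and it swaps text2vec/dplyr for base R only so that it runs without extra packages. It mimics just the ranking step: score every context against each group's embedding, then take the top contexts per group with and without restricting to that group's own contexts.

```r
# Hypothetical toy data: two group-specific embeddings of a term and four
# context embeddings, each tagged with the group whose text it came from.
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

target_emb <- rbind(D = c(1, 0, 0.2),
                    R = c(0, 1, 0.2))
context_emb <- rbind(c1 = c(0.9, 0.1, 0.0),    # from a Democrat speech
                     c2 = c(0.8, 0.3, 0.1),    # from a Democrat speech
                     c3 = c(0.95, 0.05, 0.2),  # from a Republican speech
                     c4 = c(0.1, 0.9, 0.1))    # from a Republican speech
context_group <- c(c1 = "D", c2 = "D", c3 = "R", c4 = "R")

# cosine similarity of every context (rows) with every target embedding (columns)
sims <- sapply(rownames(target_emb), function(g)
  apply(context_emb, 1, cosine, b = target_emb[g, ]))

# top n context names per target; if `groups` is given, only contexts whose
# group label matches the target are eligible
top_n <- function(sims, groups = NULL, n = 2) {
  out <- lapply(colnames(sims), function(g) {
    s <- sims[, g]
    if (!is.null(groups)) s <- s[groups[names(s)] == g]
    names(sort(s, decreasing = TRUE))[seq_len(min(n, length(s)))]
  })
  setNames(out, colnames(sims))
}

unfiltered <- top_n(sims)
filtered <- top_n(sims, groups = context_group)
print(unfiltered)
print(filtered)
```

With these toy vectors, the unfiltered ranking returns the Republican context c3 as the top Democrat context, while the filtered ranking only returns c1 and c2 for the Democrat embedding.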
