Many-to-One mappings #186

@vincentarelbundock

One nagging problem with countrycode (e.g., #182, #180) is that the current approach to codelist strictly requires bidirectional one-to-one mappings.

This is problematic in cases where we want:

Russia -> RUS (iso)
USSR -> RUS (iso)
RUS -> Russia

I have been trying to find a solution for a long time without much success. Today, I pushed a (nearly working) branch with a potential path forward: https://github.com/vincentarelbundock/countrycode/tree/manytoone

The concept:

  1. A unique regex identifies every single geographic unit covered by any of the schemes in countrycode. This means, for example, that we need different regexes for Russia and USSR because Correlates of War treats them separately.
  2. Each destination code must be associated with one and only one regex: many-to-one
  3. Origin codes can be associated with more than one regex: many-to-one
  4. This requires that we keep separate lists of origin and destination codes. The differences between origin and destination codes are handled explicitly in a centralized location: dictionary/merge.R
  5. Instead of using codelist internally, we use codelist_map, which is a list of lists of data.frames. For example, if we want to convert from cowc to iso3c, we use codelist_map$cowc$iso3c, which is a data.frame with only two columns.
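
To make the structure in point 5 concrete, here is a minimal sketch in R of what such a nested codelist_map could look like for the Russia/USSR example above. It is not the code on the branch: the regexes, the two helper functions (convert_code, convert_name), and the table contents are all made up for illustration.

```r
# Hypothetical regexes, for illustration only (not the ones used in countrycode).
regex_russia <- "^russia"
regex_ussr   <- "soviet|ussr"

# Nested list of two-column data.frames: codelist_map[[origin]][[destination]].
# Origin and destination tables are kept separate, so two regexes can point to
# "RUS" while "RUS" maps back to a single name.
codelist_map <- list(
  country.name = list(
    iso3c = data.frame(
      country.name = c(regex_russia, regex_ussr),
      iso3c        = c("RUS", "RUS"),
      stringsAsFactors = FALSE
    )
  ),
  iso3c = list(
    country.name = data.frame(
      iso3c        = "RUS",
      country.name = "Russia",
      stringsAsFactors = FALSE
    )
  )
)

# Hypothetical helper: exact-match lookup in one two-column map.
convert_code <- function(x, origin, destination) {
  map <- codelist_map[[origin]][[destination]]
  map[[destination]][match(x, map[[origin]])]
}

# Hypothetical helper: regex lookup for country names.
# Returns NA unless exactly one regex matches the input.
convert_name <- function(x, destination) {
  map <- codelist_map[["country.name"]][[destination]]
  vapply(x, function(name) {
    hit <- which(vapply(map[["country.name"]], grepl, logical(1),
                        x = name, ignore.case = TRUE))
    if (length(hit) == 1) map[[destination]][hit] else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

convert_name(c("Russia", "USSR"), "iso3c")   # "RUS" "RUS"
convert_code("RUS", "iso3c", "country.name") # "Russia"
```

Because each origin/destination pair has its own table, the asymmetry (two names in, one name out) is stated explicitly in the data rather than hidden in a single bidirectional codelist.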

The key point, for me, is number 4 above, and right now too much still happens in the get_* functions. The get functions should just be scrapers, and users should have access to a well-documented script to see how we reconciled origin vs. destination.

Curious what @cjyetman thinks of this.
