Allow specifying ID and name separately in FromDataFrameNetwork() #180

@telenskyt

Currently, FromDataFrameNetwork() offers no way to specify a node's name and its identifier separately. This causes problems when node names are duplicated (not among siblings, but elsewhere in the tree), as the example below illustrates.

Feature request: please provide a way for FromDataFrameNetwork() to take three columns instead: the unique ID of the node, the unique ID of its parent node, and the node name. This would solve the problem cleanly.

Reasoning: here is a simple example illustrating the current problem with duplicated node names. Say I have the following data.frame:

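library(data.tree)

d <- data.frame(ID = 1:5, parentID = c(0, 1, 1, 0, 2), nodeName = c("a", "b", "d", "b", "c"),
                stringsAsFactors = FALSE)  # keep names as character (needed in R < 4.0)
# Look up each parent's name by its ID; top-level nodes (parentID 0) get the root name "/"
d$parentName <- d[match(d$parentID, d$ID), "nodeName"]
d$parentName[is.na(d$parentName)] <- "/"
d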

  ID parentID nodeName parentName
1  1        0        a          /
2  2        1        b          a
3  3        1        d          a
4  4        0        b          /
5  5        2        c          b

Some node names are duplicated, though never among siblings. That is why the simple approach below cannot be used: it adds node c to the tree twice instead of once:

x <- FromDataFrameNetwork(d[,c("nodeName","parentName")])
print(x)

      levelName
1 /            
2  ¦--a        
3  ¦   ¦--b    
4  ¦   ¦   °--c
5  ¦   °--d    
6  °--b        
7      °--c 

(Incidentally, I believe FromDataFrameNetwork() should issue a warning when it duplicates nodes like this.)
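A check along the lines of this suggestion could look like the sketch below. `warnDuplicatedChildNames()` is a hypothetical helper, not a data.tree function; it assumes, as in the call above, that the first column of the network holds the child names. On the example data it warns about the repeated name b.

```r
# Hypothetical helper, NOT part of data.tree: warn when the child column of a
# name-based network repeats a name. In that case the parent column cannot
# tell which of the same-named nodes is meant.
warnDuplicatedChildNames <- function(network) {
  children <- as.character(network[[1]])            # first column: child names
  dups <- unique(children[duplicated(children)])    # names seen more than once
  if (length(dups) > 0) {
    warning("node names used more than once: ",
            paste(dups, collapse = ", "), call. = FALSE)
  }
  invisible(dups)
}

# Same data as in the example above
d <- data.frame(ID = 1:5, parentID = c(0, 1, 1, 0, 2),
                nodeName = c("a", "b", "d", "b", "c"),
                stringsAsFactors = FALSE)
d$parentName <- d[match(d$parentID, d$ID), "nodeName"]
d$parentName[is.na(d$parentName)] <- "/"

# Emits a warning naming "b"
warnDuplicatedChildNames(d[, c("nodeName", "parentName")])
```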

Currently, the only workaround is to build the tree from the unique IDs, carry the node names along as an extra attribute, and then set the node names post hoc from that attribute:

x <- FromDataFrameNetwork(d[,c("ID", "parentID", "nodeName")])

# Now replace the ID-based node names with the real names from the nodeName attribute:

nam <- x$Get("nodeName")
nam[is.na(nam)] <- "/"  # the root has no nodeName, so name it "/"
x$Set(name = nam)

print(x)

      levelName
1 /            
2  ¦--a        
3  ¦   ¦--b    
4  ¦   ¦   °--c
5  ¦   °--d    
6  °--b 

Finally, the tree is imported as intended. However, this is not ideal for two reasons:

  1. The workaround is kludgy and results in unnecessarily complicated code.

  2. Unlike FromDataFrameNetwork(), x$Set(name = ...) does not check whether node names are duplicated among siblings. If they are, this workaround does not detect the problem and silently produces an invalid tree (see Set(name = ...) should check whether name is unique among siblings #179).
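Until #179 is addressed, one way to guard the workaround is to check sibling uniqueness on the data.frame before calling x$Set(name = ...). The sketch below groups siblings by their parent's unique ID, so names may still repeat elsewhere in the tree; `siblingNamesUnique()` is a hypothetical helper, not part of data.tree.

```r
# Hypothetical helper, NOT part of data.tree: TRUE if no two children of the
# same parent share a name. Siblings are grouped by the parent's unique ID.
siblingNamesUnique <- function(parentIDs, nodeNames) {
  # anyDuplicated() returns 0 when every (parent, name) pair is distinct
  anyDuplicated(data.frame(parentIDs, nodeNames)) == 0L
}

# The data from this issue: "b" occurs twice, but under different parents.
d <- data.frame(ID = 1:5, parentID = c(0, 1, 1, 0, 2),
                nodeName = c("a", "b", "d", "b", "c"),
                stringsAsFactors = FALSE)
print(siblingNamesUnique(d$parentID, d$nodeName))      # TRUE

# A broken variant: node 3 is renamed to "b", so node 1 has two children "b".
bad <- d
bad$nodeName[3] <- "b"
print(siblingNamesUnique(bad$parentID, bad$nodeName))  # FALSE
```

Placing `stopifnot(siblingNamesUnique(d$parentID, d$nodeName))` right before `x$Set(name = nam)` would make the workaround stop instead of silently producing an invalid tree.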

Therefore, please provide a clean, well-designed way to pass FromDataFrameNetwork() both the unique IDs and the node names.
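To illustrate the requested semantics (parents resolved by unique ID, labels taken from the name column), here is a minimal base R sketch that turns the three columns into one path string per node. `buildPathStrings()` and its `rootName` and `delimiter` parameters are hypothetical; the delimiter must not occur inside any node name, which is why the root is called "root" rather than "/" here.

```r
# Hypothetical helper, NOT part of data.tree: build a path string for every
# node by walking up the tree via unique IDs and joining the node names.
# Because parents are found by ID, repeated names elsewhere in the tree
# cannot be confused with each other.
buildPathStrings <- function(ids, parentIDs, nodeNames,
                             rootName = "root", delimiter = "/") {
  parentRow <- match(parentIDs, ids)        # NA for children of the root
  vapply(seq_along(ids), function(i) {
    path <- character(0)
    row <- i
    steps <- 0L
    while (!is.na(row)) {
      steps <- steps + 1L
      if (steps > length(ids)) stop("cycle detected at ID ", ids[i])
      path <- c(nodeNames[row], path)       # prepend this node's name
      row <- parentRow[row]                 # move up to the parent's row
    }
    paste(c(rootName, path), collapse = delimiter)
  }, character(1))
}

d <- data.frame(ID = 1:5, parentID = c(0, 1, 1, 0, 2),
                nodeName = c("a", "b", "d", "b", "c"),
                stringsAsFactors = FALSE)
d$pathString <- buildPathStrings(d$ID, d$parentID, d$nodeName)
print(d$pathString)
# "root/a" "root/a/b" "root/a/d" "root/b" "root/a/b/c"
```

Passing this data.frame to data.tree's `as.Node()` in its default table mode, which reads a `pathString` column, should then yield the intended tree with c appearing only once; that last step is an assumption and is not exercised by the sketch.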

Tested on data.tree package version 1.2.0.

Metadata

Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests