Description
Currently, the FromDataFrameNetwork() function has no way to specify a node's name separately from its identifier. This causes problems when node names are duplicated (not among siblings, but elsewhere in the tree), as illustrated in the example below.
Feature request: Please allow FromDataFrameNetwork() to accept three columns: the unique ID of the node, the unique ID of its parent node, and the node name. This would be a clean solution to the problem.
Reasoning: Here is a simple example illustrating the current problem with duplicated node names. Say I have the following data.frame:
d <- data.frame(ID = 1:5, parentID = c(0, 1, 1, 0, 2), nodeName = c("a", "b", "d", "b", "c"))
d$parentName <- d[match(d$parentID, d$ID), "nodeName"]
d$parentName[is.na(d$parentName)] <- "/"
d
  ID parentID nodeName parentName
1  1        0        a          /
2  2        1        b          a
3  3        1        d          a
4  4        0        b          /
5  5        2        c          b
Some node names are duplicated, but not among siblings. The simple approach below therefore does not work: both nodes named b are treated as the parent of c, so c is added to the tree twice instead of once:
x <- FromDataFrameNetwork(d[,c("nodeName","parentName")])
print(x)
      levelName
1 /
2  ¦--a
3  ¦   ¦--b
4  ¦   ¦   °--c
5  ¦   °--d
6  °--b
7      °--c
(By the way, I believe FromDataFrameNetwork() should issue a warning when it duplicates nodes like this.)
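A check of the kind suggested here could be sketched in base R as follows. The helper name `findAmbiguousNames` is hypothetical and not part of data.tree. It only flags names that occur on more than one row, which is the situation in which a name/parent-name network becomes ambiguous.

```r
# Hypothetical helper (not part of data.tree): return the node names that
# occur more than once, i.e. names that cannot identify a node uniquely.
findAmbiguousNames <- function(df, nameCol = "nodeName") {
  nm <- as.character(df[[nameCol]])
  unique(nm[duplicated(nm)])
}

# Same example data as above
d <- data.frame(ID = 1:5, parentID = c(0, 1, 1, 0, 2),
                nodeName = c("a", "b", "d", "b", "c"),
                stringsAsFactors = FALSE)

ambiguous <- findAmbiguousNames(d)
if (length(ambiguous) > 0) {
  message("Ambiguous node names: ", paste(ambiguous, collapse = ", "))
}
```

Running such a check before building the tree from name columns would at least make the silent duplication visible to the caller.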
So, currently the only way is to use the unique IDs, save the node names into a special attribute, and then set the node names post hoc from that attribute:
x <- FromDataFrameNetwork(d[,c("ID", "parentID", "nodeName")])
# now, put real node names to `name` instead of IDs:
nam <- x$Get("nodeName")
nam[is.na(nam)] <- "/"
x$Set(name = nam)
print(x)
      levelName
1 /
2  ¦--a
3  ¦   ¦--b
4  ¦   ¦   °--c
5  ¦   °--d
6  °--b
Finally, the tree is imported as intended. However, this is not ideal for two reasons:
- The workaround is kludgy and results in unnecessarily complicated code.
- Unlike FromDataFrameNetwork(), x$Set(name = ...) doesn't test whether there are duplicate node names among siblings. If such duplicates exist, the workaround won't detect them and will produce an invalid tree (see issue #179, "Set(name = ...) should check whether name is unique among siblings").
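Until such a check exists, the sibling-uniqueness condition can be tested on the data frame itself before renaming. The following is a minimal base R sketch. The helper `hasSiblingDuplicates` is a hypothetical name, and it assumes the parent ID and node name live in columns called `parentID` and `nodeName`, as in the example above.

```r
# Hypothetical helper (not part of data.tree): TRUE if two rows share both
# the same parent and the same node name, i.e. siblings with equal names.
hasSiblingDuplicates <- function(df, parentCol = "parentID", nameCol = "nodeName") {
  anyDuplicated(df[, c(parentCol, nameCol)]) > 0
}

d <- data.frame(ID = 1:5, parentID = c(0, 1, 1, 0, 2),
                nodeName = c("a", "b", "d", "b", "c"),
                stringsAsFactors = FALSE)

hasSiblingDuplicates(d)    # the example data is safe to rename

# Renaming node 3 to "b" gives parent 1 two children named "b"
bad <- d
bad$nodeName[3] <- "b"
hasSiblingDuplicates(bad)  # this input would produce an invalid tree
```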
Therefore, please allow for a good, cleanly designed way to let FromDataFrameNetwork() know the unique IDs as well as the node names.
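To make the request concrete, here is a rough sketch of what such an interface might do, written as a wrapper around the workaround shown earlier. The function name `FromDataFrameNetworkById` and its arguments are hypothetical and only illustrate the idea; this is not existing data.tree API. It uses only the calls already shown in this issue (FromDataFrameNetwork(), Get(), Set()).

```r
library(data.tree)

# Hypothetical wrapper (not part of data.tree): build the tree from unique
# IDs, then replace the ID-based names with the real node names.
FromDataFrameNetworkById <- function(df, idCol = "ID", parentCol = "parentID",
                                     nameCol = "nodeName", rootName = "/") {
  # Refuse input in which two siblings would end up with the same name
  if (anyDuplicated(df[, c(parentCol, nameCol)]) > 0) {
    stop("duplicate node names among siblings")
  }
  # Child ID first, parent ID second; all other columns become attributes
  other <- setdiff(names(df), c(idCol, parentCol))
  tree <- FromDataFrameNetwork(df[, c(idCol, parentCol, other)])
  # The root has no row in df, so its name attribute is NA
  nam <- tree$Get(nameCol)
  nam[is.na(nam)] <- rootName
  tree$Set(name = nam)
  tree
}

d <- data.frame(ID = 1:5, parentID = c(0, 1, 1, 0, 2),
                nodeName = c("a", "b", "d", "b", "c"),
                stringsAsFactors = FALSE)
x <- FromDataFrameNetworkById(d)
print(x)
```

A built-in solution would presumably avoid the rename step altogether, but the wrapper shows the three pieces of information the function needs: node ID, parent ID, and node name.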
Tested on data.tree package version 1.2.0.