Subfamilies as a way to resolve too large families #253
Hello CAFE5 people! To tackle this problem without losing useful data, I was wondering whether it would be advisable to further split such families into subfamilies/suborthogroups, which would reduce the number of sequences belonging to the most represented species. I was thinking of doing so either with InterProScan, hoping to find domain patterns that could help me fragment these big families, or with a reciprocal BLASTp (as in the pipeline presented in the CAFE5 tutorial) with more or less strict parameters. Personally, I find the former option more suitable, since it can actually take big families apart, while the latter seems more prone to focusing on point mutations and failing to take into account the evolutionary history of the family and its components. Maybe this could at least be a first step toward a solution? Do you think that by creating these artificial families I will fail to capture the actual evolutionary signal I'm trying to study? What do you think? Happy coding :)
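As a rough illustration of the InterProScan route, one could group each family's proteins by their ordered Pfam domain architecture and treat each architecture as a subfamily. This is only a sketch: the protein IDs and Pfam accessions below are made up, the row tuples mimic a subset of InterProScan 5 TSV columns (protein accession, analysis, signature accession, start, stop), and restricting to Pfam hits is an assumption, not something from this thread.

```python
from collections import defaultdict

def split_by_architecture(rows):
    """Group protein IDs by their ordered Pfam domain architecture.

    `rows` are tuples mimicking a subset of InterProScan 5 TSV columns:
    (protein_id, analysis, signature_accession, start, stop).
    Proteins sharing the same ordered domain string form one subfamily.
    """
    domains = defaultdict(list)  # protein_id -> [(start, accession)]
    for pid, analysis, acc, start, stop in rows:
        if analysis == "Pfam":   # assumption: use Pfam hits only
            domains[pid].append((start, acc))
    subfamilies = defaultdict(list)  # architecture string -> [protein_ids]
    for pid, hits in domains.items():
        # order domains along the sequence, then join into a signature
        arch = "-".join(acc for _, acc in sorted(hits))
        subfamilies[arch].append(pid)
    return dict(subfamilies)

# Toy family: two proteins share a two-domain architecture, one lost a domain.
rows = [
    ("sp1_g1", "Pfam", "PF00001", 10, 80),
    ("sp1_g1", "Pfam", "PF00002", 100, 180),
    ("sp2_g1", "Pfam", "PF00001", 12, 82),
    ("sp2_g1", "Pfam", "PF00002", 95, 175),
    ("sp3_g1", "Pfam", "PF00001", 9, 79),
]
print(split_by_architecture(rows))
# → {'PF00001-PF00002': ['sp1_g1', 'sp2_g1'], 'PF00001': ['sp3_g1']}
```

Each resulting subfamily could then be recounted per species and fed to CAFE5 as its own row in the family counts table.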
Replies: 1 comment
Hello,

I think this is a perfectly reasonable thing to do, with one caveat: while this will allow you to better infer ancestral states (and gains and losses per family), I worry about the consequences for inferring overall lambdas. We don't really know what would happen if you cut different trees off at different heights (which is essentially what is happening) for the estimates of lambda. It's kind of like clustering different parts of your dataset differently. It might have no effect or only a subtle effect, but I've never tested it!

matt