Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #625 +/- ##
==========================================
- Coverage 96.98% 96.98% -0.01%
==========================================
Files 8 8
Lines 1196 1226 +30
==========================================
+ Hits 1160 1189 +29
- Misses 36 37 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This basic information is printed by most other implementations.
| print(io, "Observations: ", lpad(nobs(obj), colwidth-14), sep) | ||
| println(io, "Degrees of freedom: ", lpad(dof(obj), colwidth-20)) | ||
| if wts isa ProbabilityWeights || | ||
| (wts isa AnalyticWeights && obj isa AbstractGLM && obj.rr.d isa Union{Bernoulli, Binomial}) |
There was a problem hiding this comment.
[JuliaFormatter] reported by reviewdog 🐶
| (wts isa AnalyticWeights && obj isa AbstractGLM && obj.rr.d isa Union{Bernoulli, Binomial}) | |
| (wts isa AnalyticWeights && obj isa AbstractGLM && | |
| obj.rr.d isa Union{Bernoulli,Binomial}) |
|
I'm only seeing this one now, so #629 isn't even a reaction to this one. I think it is a good idea to show more information about the fit, but as argued in #629, I don't think we should show the coefficient table until the user requests it. There are more choices to make when reporting (statistical) inference, so delegating that to a separate method like S/R does is the better choice, IMO. |
|
Hmm I would be reluctant to stop printing the coefficients table altogether by default. I know R's default printing is incredibly frustrating, it shows either too much or too little information. Other statistical packages seem to print coefficients and related inference by default. That said, I agree it's not great that we print only Wald tests, which are not the most appropriate nowadays given the computing power we have compared to when R was designed. How about using Preferences.jl to allow users to choose their default inference method? This could even be shared across modeling packages via StatsModels. |
They usually work very differently from how things work here and in R where the fit is stored in a variable and can be passed to other functions. Generally, I don't think a show method should trigger significant computations, which I consider the uncertainty calculations being. In addition, there several input parameters related to the uncertainty which aren't exposed in
I'm not in favor of these and they'd be pretty significant changes to the design of this package, so not something I think we should do so late in the process of releasing 2.0. Based on the current design of this package, I think #629 is the better solution. |
|
The drawback I see to #629 though is that printing coefficients outside of a table isn't super readable beyond very simple models. You have to scan several lines to find the coefficient you want when they don't fit on a single line, which is common. In R I find the default output to just be a nuisance. So if we don't want to print the full coefficient table, I'd rather only print very short information: just the model type and formula, and maybe fit statistics. What do you think? A related issue is consistency across the ecosystem. Currently most packages seem to print the coefficient table, if only because that's what |
I'm mostly arguing over which information is presented. Less about how it is presented. Printing it horizontally saves space but, as you mention, might not work well for large models. I'd be fine with printing the estimates in a table. Such a table just shouldn't contain standard errors, t/z-values, p-values, and confidence intervals, because the object being shown doesn't have the necessary information to show these quantities. |
|
But then we would use exactly the same space as the current output, we would just refuse to show classic Wald standard errors in that space. That would be weird IMO, especially if we expect people to call |
|
The design of the package suggests that |
|
Let's not print coefficient at all then. |
I don't follow. Why not print all the informative quantities associated with the fit? |
|
I mean, have you ever used coefficients without their inference? People are going to call If we print coefficients in julia> m = lm(@formula(SR ~ Pop15 + Pop75 + DPI + DDPI), LifeCycleSavings)
LinearModel:
SR ~ 1 + Pop15 + Pop75 + DPI + DDPI
Observations: 50 Degrees of freedom: 6
Log-likelihood: -135.10 RSS: 650.71
RSE: 3.80 F-test: 0.0008
R²: 0.3385 Adjusted R²: 0.2797
Coefficients:
─────────────────────────
Coef.
─────────────────────────
(Intercept) 28.5661
Pop15 -0.461193
Pop75 -1.6915
DPI -0.000336902
DDPI 0.409695
─────────────────────────
julia> coeftable(m)
─────────────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
─────────────────────────────────────────────────────────────────────────────────
(Intercept) 28.5661 7.35452 3.88 0.0003 13.7533 43.3788
Pop15 -0.461193 0.144642 -3.19 0.0026 -0.752518 -0.169869
Pop75 -1.6915 1.0836 -1.56 0.1255 -3.87398 0.490983
DPI -0.000336902 0.000931107 -0.36 0.7192 -0.00221225 0.00153844
DDPI 0.409695 0.196197 2.09 0.0425 0.0145336 0.804856
─────────────────────────────────────────────────────────────────────────────────Seems quite redundant to me. That's even worse when you have ten or twenty variables, which is common in my field. Even if we print several coefficients on a single line, that takes space. AFAICT the only implementation that prints coefficients separate from inference is R's |
This basic information is printed by most other implementations.
Comments welcome.
I haven't included the LR test for now since we don't have a single-model function like
ftestat the moment and it seems less common to print it than the F-test.After #624 I'll also show the shape parameter of
NegativeBinomialdistributions.Fixes #623.