Skip to content

Print model information above coefficient table in show#625

Open
nalimilan wants to merge 2 commits intomasterfrom
nl/show
Open

Print model information above coefficient table in show#625
nalimilan wants to merge 2 commits intomasterfrom
nl/show

Conversation

@nalimilan
Copy link
Copy Markdown
Member

This basic information is printed by most other implementations.

Comments welcome.

I haven't included the LR test for now since we don't have a single-model function like ftest at the moment and it seems less common to print it than the F-test.

After #624 I'll also show the shape parameter of NegativeBinomial distributions.

Fixes #623.

@codecov
Copy link
Copy Markdown

codecov bot commented Dec 26, 2025

Codecov Report

❌ Patch coverage is 96.96970% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 96.98%. Comparing base (e26c5d5) to head (0b276c5).

Files with missing lines Patch % Lines
src/linpred.jl 96.96% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #625      +/-   ##
==========================================
- Coverage   96.98%   96.98%   -0.01%     
==========================================
  Files           8        8              
  Lines        1196     1226      +30     
==========================================
+ Hits         1160     1189      +29     
- Misses         36       37       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

This basic information is printed by most other implementations.
print(io, "Observations: ", lpad(nobs(obj), colwidth-14), sep)
println(io, "Degrees of freedom: ", lpad(dof(obj), colwidth-20))
if wts isa ProbabilityWeights ||
(wts isa AnalyticWeights && obj isa AbstractGLM && obj.rr.d isa Union{Bernoulli, Binomial})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
(wts isa AnalyticWeights && obj isa AbstractGLM && obj.rr.d isa Union{Bernoulli, Binomial})
(wts isa AnalyticWeights && obj isa AbstractGLM &&
obj.rr.d isa Union{Bernoulli,Binomial})

@andreasnoack
Copy link
Copy Markdown
Member

I'm only seeing this one now, so #629 isn't even a reaction to this one. I think it is a good idea to show more information about the fit, but as argued in #629, I don't think we should show the coefficient table until the user requests it. There are more choices to make when reporting (statistical) inference, so delegating that to a separate method like S/R does is the better choice, IMO.

@nalimilan
Copy link
Copy Markdown
Member Author

Hmm I would be reluctant to stop printing the coefficients table altogether by default. I know R's default printing is incredibly frustrating, it shows either too much or too little information. Other statistical packages seem to print coefficients and related inference by default.

That said, I agree it's not great that we print only Wald tests, which are not the most appropriate nowadays given the computing power we have compared to when R was designed. How about using Preferences.jl to allow users to choose their default inference method? This could even be shared across modeling packages via StatsModels.

@andreasnoack
Copy link
Copy Markdown
Member

Other statistical packages seem to print coefficients and related inference by default.

They usually work very differently from how things work here and in R where the fit is stored in a variable and can be passed to other functions.

Generally, I don't think a show method should trigger significant computations, which I consider the uncertainty calculations being. In addition, there several input parameters related to the uncertainty which aren't exposed in fit, e.g. which distribution would you like to use for the tests, which level for the confidence intervals, and (although outside of this package) should robust standard errors be used. The natural consequence of showing all of these quantities right after the fit would be

  1. to expose these input parameters in the fit
  2. store the derived quantities in the output struct

I'm not in favor of these and they'd be pretty significant changes to the design of this package, so not something I think we should do so late in the process of releasing 2.0. Based on the current design of this package, I think #629 is the better solution.

@nalimilan
Copy link
Copy Markdown
Member Author

The drawback I see to #629 though is that printing coefficients outside of a table isn't super readable beyond very simple models. You have to scan several lines to find the coefficient you want when they don't fit on a single line, which is common. In R I find the default output to just be a nuisance.

So if we don't want to print the full coefficient table, I'd rather only print very short information: just the model type and formula, and maybe fit statistics. What do you think?

A related issue is consistency across the ecosystem. Currently most packages seem to print the coefficient table, if only because that's what TableRegressionModel does. Maybe we should reach out to package authors to check whether we can agree on a common solution?

@andreasnoack
Copy link
Copy Markdown
Member

printing coefficients outside of a table isn't super readable beyond very simple models.

I'm mostly arguing over which information is presented. Less about how it is presented. Printing it horizontally saves space but, as you mention, might not work well for large models. I'd be fine with printing the estimates in a table. Such a table just shouldn't contain standard errors, t/z-values, p-values, and confidence intervals, because the object being shown doesn't have the necessary information to show these quantities.

@nalimilan
Copy link
Copy Markdown
Member Author

But then we would use exactly the same space as the current output, we would just refuse to show classic Wald standard errors in that space. That would be weird IMO, especially if we expect people to call coeftable manually right after fitting the model to get the inference they want (just imagine how many examples would look in the docs).

@andreasnoack
Copy link
Copy Markdown
Member

The design of the package suggests that coeftable should be called on the fitted object when inference is needed. The relevant arguments for inference are exposed in the coeftable function, not in fit, so, in my view, the current behavior is weird. As explained above, if your position is that the current output should be part of the show method for the fit itself, then the struct for the model fit should store these quantities, and the fit method should expose all keyword arguments for inference. I prefer the current design, and it is too late to change for 2.0 anyway.

@nalimilan
Copy link
Copy Markdown
Member Author

Let's not print coefficient at all then.

@andreasnoack
Copy link
Copy Markdown
Member

Let's not print coefficient at all then.

I don't follow. Why not print all the informative quantities associated with the fit?

@nalimilan
Copy link
Copy Markdown
Member Author

I mean, have you ever used coefficients without their inference? People are going to call coeftable right after fitting anyway (to see Wald tests or different tests once we support that).

If we print coefficients in show and not their standard erros, the typical use case would look like this, right?

julia> m = lm(@formula(SR ~ Pop15 + Pop75 + DPI + DDPI), LifeCycleSavings)
LinearModel:
SR ~ 1 + Pop15 + Pop75 + DPI + DDPI

Observations:                        50  Degrees of freedom:                   6
Log-likelihood:                 -135.10  RSS:                             650.71
RSE:                               3.80  F-test:                          0.0008
R²:                              0.3385  Adjusted R²:                     0.2797

Coefficients:
─────────────────────────
                    Coef.
─────────────────────────
(Intercept)  28.5661     
Pop15        -0.461193  
Pop75        -1.6915    
DPI          -0.000336902
DDPI          0.409695  
─────────────────────────

julia> coeftable(m)
─────────────────────────────────────────────────────────────────────────────────
                    Coef.   Std. Error      t  Pr(>|t|)    Lower 95%    Upper 95%
─────────────────────────────────────────────────────────────────────────────────
(Intercept)  28.5661       7.35452       3.88    0.0003  13.7533      43.3788
Pop15        -0.461193     0.144642     -3.19    0.0026  -0.752518    -0.169869
Pop75        -1.6915       1.0836       -1.56    0.1255  -3.87398      0.490983
DPI          -0.000336902  0.000931107  -0.36    0.7192  -0.00221225   0.00153844
DDPI          0.409695     0.196197      2.09    0.0425   0.0145336    0.804856
─────────────────────────────────────────────────────────────────────────────────

Seems quite redundant to me. That's even worse when you have ten or twenty variables, which is common in my field. Even if we print several coefficients on a single line, that takes space.

AFAICT the only implementation that prints coefficients separate from inference is R's lm and glm. And even in R, modeling packages generally don't follow that example and do print inference by default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Print more information in show

2 participants