Skip to content

[gui] reenable fit panel testing and fix nullptr access#21882

Merged
linev merged 10 commits intoroot-project:masterfrom
ferdymercury:patch-20
Apr 23, 2026
Merged

[gui] reenable fit panel testing and fix nullptr access#21882
linev merged 10 commits intoroot-project:masterfrom
ferdymercury:patch-20

Conversation

@ferdymercury
Copy link
Copy Markdown
Collaborator

@ferdymercury ferdymercury commented Apr 10, 2026

This Pull request:

Changes or fixes:

enables old test based on Makefiles by migrating to CMake

step towards #9090

Checklist:

  • tested changes locally
  • updated the docs (if necessary)

@ferdymercury ferdymercury changed the title [asimage] prevent pointing to invalid memory [gui,asimage] prevent pointing to invalid memory Apr 10, 2026
Comment thread graf2d/asimage/src/TASImage.cxx Outdated
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 10, 2026

Test Results

    22 files      22 suites   3d 7h 50m 19s ⏱️
 3 839 tests  3 838 ✅  1 💤 0 ❌
75 777 runs  75 759 ✅ 18 💤 0 ❌

Results for commit a7898cf.

♻️ This comment has been updated with latest results.

Comment thread gui/gui/src/TGGC.cxx Outdated
@ferdymercury ferdymercury marked this pull request as ready for review April 13, 2026 08:55
@linev
Copy link
Copy Markdown
Member

linev commented Apr 13, 2026

I checkout your branch and try to run test.
There are many issues with normal GUI mode.

  1. Last two tests TestTree2D and TestTreeND do not work for me - while fit panel does not have entries like "gaus2d" or "gausND".
  2. Also running test in batch possible only with root -b --web=off UnitTesting.cxx. Produced test executable crashed much earlier because TGClient::Instance() returns nullptr
  3. When run inside root, one sees lot of errors about missing images.

Comment thread graf2d/asimage/src/TASImage.cxx Outdated
@ferdymercury
Copy link
Copy Markdown
Collaborator Author

  • Last two tests TestTree2D and TestTreeND do not work for me - while fit panel does not have entries like "gaus2d" or "gausND".

Yes, same here, I had deactivated (commented) locally those old tests for the moment and was focusing on whether it was running at all without crashing.

  • Also running test in batch possible only with root -b --web=off UnitTesting.cxx. Produced test executable crashed much earlier because TGClient::Instance() returns nullptr

Weird, where? I see a different behavior on Linux. It just crashes when deleting the TGTooltip at shutdown for me.

  • When run inside root, one sees lot of errors about missing images.

Same here, that's what I would say "expected", a lot of errors in batch mode of not being able to load icons, but no crashes.

@linev
Copy link
Copy Markdown
Member

linev commented Apr 14, 2026

Weird, where? I see a different behavior on Linux. It just crashes when deleting the TGTooltip at shutdown for me.

I mean when I run compiled executable.

  • When run inside root, one sees lot of errors about missing images.

Same here, that's what I would say "expected", a lot of errors in batch mode of not being able to load icons, but no crashes.

This is most challenging part - try to understand if there double deletion or just wrong pointer released.
Because of lot of errors I cannot exclude incomplete initialization of some GUI classes

@linev
Copy link
Copy Markdown
Member

linev commented Apr 14, 2026

@ferdymercury @bellenot

I probably found the reason why it crashes.
Seems to be it is wrongly set flag TGLabel::fHasOwnFont.
In batch it is incorrect and TGLabel destructor tries to free TGGC object which does not belong to the label.

Also side problem is that in TGGC::TGGC(GCValues_t *values) constructor ref counter not always set correctly.

Fixing this two problems makes test running for me

@linev
Copy link
Copy Markdown
Member

linev commented Apr 14, 2026

First step is #21907

@ferdymercury
Can you extract your changes in TASImage to separate PR.

After such two steps we will be able to enable this and may be other GUI tests in CI

@ferdymercury ferdymercury changed the title [gui,asimage] prevent pointing to invalid memory [gui] reenable fit panel testing and fix nullptr access Apr 14, 2026
Comment thread gui/gui/src/TGLabel.cxx Outdated
@linev
Copy link
Copy Markdown
Member

linev commented Apr 15, 2026

@ferdymercury

After #21907 is merged you can try rebase your code and see if now fit panel test can be enabled.

@linev
Copy link
Copy Markdown
Member

linev commented Apr 15, 2026

@ferdymercury

Please remove all changes in gui classes from this PR.

And at the end of UnitTesting.cxx one should have:

int UnitTesting()
{
   gROOT->SetWebDisplay("off");

   FitEditorUnitTesting fUT;

   return fUT.UnitTesting();
}

// The main function. It is VERY important that it is run using the
// TApplication.
int main(int argc, char** argv)
{
   TApplication theApp("App",&argc,argv);

   // force creation of client
   if (!TGClient::Instance())
      new TGClient();

   int ret = UnitTesting();

   theApp.Terminate();

   return ret;
}

And please check TestTree1D. It fails while reference values for fir parameters do not match.

Comment thread gui/fitpanel/test/UnitTesting.cxx Outdated
Comment thread gui/fitpanel/test/UnitTesting.cxx Outdated
@linev
Copy link
Copy Markdown
Member

linev commented Apr 15, 2026

We have several platforms failing.

For Windows probably we should force batch mode. One need to add -b flag when start executable.

But several linux and mac platforms need to be investigated deeper

Comment thread gui/fitpanel/test/UnitTesting.cxx Outdated
Comment thread gui/fitpanel/test/UnitTesting.cxx Outdated
@linev
Copy link
Copy Markdown
Member

linev commented Apr 16, 2026

@ferdymercury

Sorry, I push my commit into wrong origin.
I just experimenting with your changes on Windows and hit compilation problem.
I submit these changes as separate PR.
Please restore your commits again

@linev
Copy link
Copy Markdown
Member

linev commented Apr 16, 2026

I run on Windows fitPanelUnitTesting.exe -b and it only report TestTree1D FAILED, but does not crash.

Comment thread gui/fitpanel/test/CMakeLists.txt Outdated
Co-authored-by: Sergey Linev <S.Linev@gsi.de>
@ferdymercury
Copy link
Copy Markdown
Collaborator Author

We're still getting the free-invalid-pointer on Unix

@linev
Copy link
Copy Markdown
Member

linev commented Apr 16, 2026

We're still getting the free-invalid-pointer on Unix

There are only two platforms failing - ubuntu2404 and mac14 - which is already grate.
Nobody says that it will be easy. We never tried test gui classes in batch before

@ferdymercury
Copy link
Copy Markdown
Collaborator Author

There are only two platforms failing - ubuntu2404 and mac14 - which is already grate. Nobody says that it will be easy. We never tried test gui classes in batch before

Indeed :) It's great.
This started all with removing an old legacy Makefile and has evolved into something bigger hehe

Comment thread gui/fitpanel/test/CMakeLists.txt Outdated
ferdymercury and others added 3 commits April 16, 2026 16:53
@linev
Copy link
Copy Markdown
Member

linev commented Apr 21, 2026

@ferdymercury

Did you manage to reproduce failures?
Probably you can try to fully reproduce environment with docker - you can find some info in builder logs.

But I have other idea about ROOT gui testing.
Now we try to enable gui test in true batch mode.
But may be we can try other direction?

Since a while I using xvfb to run JSROOT tests in batch.
It is because of gl module in node.js which requires DISPLAY and some basic X11 functions.
But we also can use xvfb-run for ROOT gui tests.
It provides good X11 emulation and probably better approach for us.
Then we can also try to enable stressGUI tests in CI.
Execution can look like:

xvb-run -a -s "-ac -screen 0 1280x1024x24" ./UnitTesting

So root application will recognize display and will run in normal GUI mode.
Should we try this?

@ferdymercury
Copy link
Copy Markdown
Collaborator Author

ferdymercury commented Apr 21, 2026

Did you manage to reproduce failures?

I also see 100% reproducible stack-smashing when I run in interpreted mode rather than compiled mode. (Batch Or Non-Batch).
Always in line 2238 of TFitEditor.cxx

Valgrind says:

==720346== Process terminating with default action of signal 6 (SIGABRT)
==720346==    at 0x52CF9FC: __pthread_kill_implementation (pthread_kill.c:44)
==720346==    by 0x52CF9FC: __pthread_kill_internal (pthread_kill.c:78)
==720346==    by 0x52CF9FC: pthread_kill@@GLIBC_2.34 (pthread_kill.c:89)
==720346==    by 0x527B475: raise (raise.c:26)
==720346==    by 0x5261884: abort (abort.c:100)
==720346==    by 0x52C2676: __libc_message (libc_fatal.c:156)
==720346==    by 0x536F619: __fortify_fail (fortify_fail.c:26)
==720346==    by 0x536F5E5: __stack_chk_fail (stack_chk_fail.c:24)
==720346==    by 0x2F3E2BE5: TFitEditor::DoFit() (TFitEditor.cxx:2238)
==720346==    by 0x29297F9D: ???
==720346==    by 0x29295CDC: ???
==720346==    by 0x2929519D: ???
==720346==    by 0x29294A2D: ???
==720346==    by 0x4FE7049: ???

I seldom also see those munmap_chunk or free invalid pointer failures locally when in compiled mode. Not very often, it's a bit random. It seems that it only happens if the program is fast (goes away if I put debugging points). If I recall correctly, it was in the same place, when deleting the TGToolTip that then calls ~TGGC

With valgrind ROOT build Release Mode. I get:

Processing /opt/root_src/gui/fitpanel/test/UnitTesting.cxx+...
==849553== Syscall param msync(start) points to unaddressable byte(s)
==849553==    at 0x5107B57: msync (msync.c:25)
==849553==    by 0x6822C85: ??? (in /tmp/rbld/lib/libCling.so)
==849553==    by 0x6711171: cling_runtime_internal_throwIfInvalidPointer (in /tmp/rbld/lib/libCling.so)
==849553==    by 0x29FBA04B: ???
==849553==    by 0x678F697: ??? (in /tmp/rbld/lib/libCling.so)
==849553==    by 0x6714692: ??? (in /tmp/rbld/lib/libCling.so)
==849553==    by 0x67150A6: ??? (in /tmp/rbld/lib/libCling.so)
==849553==    by 0x6806AC1: ??? (in /tmp/rbld/lib/libCling.so)
==849553==    by 0x6815FFA: ??? (in /tmp/rbld/lib/libCling.so)
==849553==    by 0x68178E8: ??? (in /tmp/rbld/lib/libCling.so)
==849553==    by 0x67FF34F: ??? (in /tmp/rbld/lib/libCling.so)
==849553==    by 0x65EC42B: ??? (in /tmp/rbld/lib/libCling.so)
==849553==  Address 0x5475018 is 0 bytes after a block of size 72 alloc'd
==849553==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==849553==    by 0x4A999DD: TStorage::ObjectAlloc(unsigned long) (in /tmp/rbld/lib/libCore.so)
==849553==    by 0x4A325EF: _GLOBAL__sub_I_TQConnection.cxx (in /tmp/rbld/lib/libCore.so)
==849553==    by 0x400647D: call_init.part.0 (dl-init.c:70)
==849553==    by 0x4006567: call_init (dl-init.c:33)
==849553==    by 0x4006567: _dl_init (dl-init.c:117)
==849553==    by 0x40202C9: ??? (in /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
==849553==    by 0x2: ???
==849553==    by 0x1FFF0004CE: ???
==849553==    by 0x1FFF0004DD: ???
==849553==    by 0x1FFF0004E0: ???

so something might be off when initializing TQConnection ?

xvfb-run: I also use that for my application, so sounds good.
But I think we should first try a bit more with valgrind to see if we can fix the current failure.

@ferdymercury
Copy link
Copy Markdown
Collaborator Author

@linev my proposal would be to merge this 'as is' now (squashed). Disabling the 1D tree test seems to avoid the free/munmap/stack smashing issue in the fit editor. One can then later go back into this TTree issue interaction with the TFitEditor, it's marked as TODO.
At least we have enabled 60% of tests that were no longer running, so it's better than nothing :)

@linev
Copy link
Copy Markdown
Member

linev commented Apr 21, 2026

At least we have enabled 60% of tests that were no longer running, so it's better than nothing :)

Yes, it sounds reasonable.

Lets wait for 6.40 branching and will continue.
But idea with xvfb-run also can be investigated.

@ferdymercury ferdymercury added this to the 6.42.00 milestone Apr 21, 2026
@ferdymercury
Copy link
Copy Markdown
Collaborator Author

But idea with xvfb-run also can be investigated.

Yep.

xvfb-run root UnitTesting.cxx

also fails with *** stack smashing detected ***: terminated

whereas xvfb-run root UnitTesting.cxx+ seems ok.

Copy link
Copy Markdown
Member

@linev linev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ideal case we should understand why one or another test-case fails in batch.
I hope we can continue in this direction and enable/provide other gui tests

@linev linev merged commit 20705fe into root-project:master Apr 23, 2026
29 of 30 checks passed
@ferdymercury ferdymercury deleted the patch-20 branch April 23, 2026 08:03
@ferdymercury
Copy link
Copy Markdown
Collaborator Author

In ideal case we should understand why one or another test-case fails in batch. I hope we can continue in this direction and enable/provide other gui tests

Yep.
It also fails in non-batch if interpreted rather than compiled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants