[problem] Memory allocation aborted by the operating system while comparing a large number of images #9

Problem:
While comparing a large number of images, the program's memory allocation was aborted by the operating system.

---

Dear ermig1979, this is the third time I have used this project and opened an issue. I have applied it across my whole project to identify all of the project's images for the year. This time I ran into a new challenge: 3.8 million photos, about 4 terabytes in total, which is an order of magnitude more than I have ever processed before.


**Application:**
   - I have more than 1,000 folders. I used Python to build one long Linux shell command that passes every folder on the command line in the form `-id=folder1 -id=folder2`.
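
For reference, a minimal sketch of how such a command can be assembled. The `-id=` form is the one described above; the binary name `AntiDuplX`, the helper name `build_command`, and the idea that all folders sit directly under one root directory are assumptions made for illustration only.

```python
from pathlib import Path


def build_command(root, binary="AntiDuplX"):
    """Return an argument list with one -id=<dir> entry per subfolder of root.

    `binary` is a placeholder for wherever the executable actually lives.
    """
    # Only immediate subdirectories are listed; plain files are skipped.
    dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    return [binary] + [f"-id={d}" for d in dirs]
```

Passing the resulting list to `subprocess.run(cmd, check=True)` instead of joining it into one shell string avoids quoting problems with folder names that contain spaces.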

**Problem:**
   - The `Search images in xxx folder` stage runs without problems. Checking with `top` shows only about 1% CPU and memory usage. However, in the `Load progress` stage that follows, a memory allocation failed: at 11.9%, `malloc()` aborted with `invalid size (unsorted)`. I asked GPT, which listed six possible causes:
      1. Invalid parameters were passed
      2. Uninitialized variables
      3. Type conversion error
      4. Memory leak or double free
      5. Insufficient system resources
      6. Compiler or library error

   The most likely cause seems to be 5 (insufficient system resources). However, when I previously ran more than 200,000 images, I did not notice any significant memory or CPU usage, and the CPU usage rate was always very low.
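
To put the numbers above in proportion, here is a rough back-of-the-envelope calculation. It uses only figures from this report (3.8 million images, about 4 terabytes, failure at 11.9%, and the 256G of memory listed in the machine table) and assumes decimal units. It says nothing about how AntiDuplX actually uses memory; it only sizes the job.

```python
TOTAL_IMAGES = 3_800_000          # photos in the run
TOTAL_BYTES = 4 * 10**12          # about 4 TB on disk (decimal units assumed)
RAM_BYTES = 256 * 10**9           # 256G of memory (decimal units assumed)
FAILED_AT_PERMILLE = 119          # the load stage stopped at 11.9%

# Images already processed by the load stage when malloc() aborted.
images_loaded = TOTAL_IMAGES * FAILED_AT_PERMILLE // 1000

# Average size of one photo on disk.
avg_image_bytes = TOTAL_BYTES // TOTAL_IMAGES

# Memory available per image if every image had to stay resident at once.
ram_per_image = RAM_BYTES // TOTAL_IMAGES

print(f"images loaded before the failure: {images_loaded}")
print(f"average image size on disk:       {avg_image_bytes} bytes")
print(f"RAM budget per image:             {ram_per_image} bytes")
```

So the failure came after roughly 450,000 images, about twice the size of my earlier 200,000-image run, and the raw files average about 1 MB each against a budget of under 70 KB of RAM per image if everything had to be held at once.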

I have two questions:
   1. Does AntiDuplX load all of the data into memory for comparison? Can the current problem only be solved by running it on a smaller set of images?
   2. When I used the C# AntiDupl, I found that it kept a cache, and the cache file was read directly on later runs.
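
If reducing the sample size turns out to be the only option, the folder list could be split into batches and each batch run separately. The sketch below is only an illustration: `chunked` and `batch_commands` are hypothetical helper names, `AntiDuplX` stands in for the real executable path, and the batch size is arbitrary. Note the obvious limitation: duplicates that fall into different batches would not be compared with each other.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_commands(folders, batch_size, binary="AntiDuplX"):
    """Build one argument list per batch, using the -id=<dir> form."""
    return [
        [binary] + [f"-id={folder}" for folder in batch]
        for batch in chunked(folders, batch_size)
    ]
```

For example, `batch_commands(folders, 100)` on a list of 1,000 folders gives ten argument lists, each of which can be passed to `subprocess.run`.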




---
The machine specifications are as follows:

| Component | Specification                  |
| --------- | ------------------------------ |
| CPU       | E5 2650 V4 @ 2.2 GHz, 48 cores |
| Memory    | 256 GB                         |
| OS        | openEuler 22.03                |
| Hard disk | 25 TB                          |

P.S. openEuler 22.03 is essentially CentOS 8, in a version modified by Huawei.

User permission: root



![Snipaste_2024-10-25_14-46-32](https://github.com/user-attachments/assets/22088ec4-8084-495f-a5e7-e48415312ac8)

![Snipaste_2024-10-25_14-43-36](https://github.com/user-attachments/assets/6edbcd1c-4b2a-4950-9eb2-4c479c330af1)







