Conversation
r0qs left a comment
Hi, thank you for your contribution. I only have some comments :)
```js
const readFile = util.promisify(fs.readFile)
const readdir = util.promisify(fs.readdir)
```
Why not use `fs.promises` instead? E.g.:

```js
const { readdir, readFile } = require('node:fs/promises')
// ...
const files = await readdir(path)
```
```js
const pool = workerpool.pool('./update-worker.js', {
  maxWorkers: options.maxWorkerProcesses,
  // Threads would be ideal, but do not work as expected due to an issue with the swarmhash module
```
What issue? Could you provide more information about the problem when using the `thread` workerType?
```js
  pool.terminate()
}

main();
```

(No newline at end of file.)
Remove the extra semicolon. Also, please remember to run `npm run lint -- --fix`.
```diff
- main();
+ main()
```
```js
const latestRelease = parsedList
  .slice()
  .reverse()
  .filter(function (listEntry) { return listEntry.prerelease === null })
```
It seems that you changed the behavior of the `processDir` function. Some entries that do not have a prerelease now contain a `prerelease` field set to `null` in the final `list.json` instead of not having the field at all. The `prerelease` field must not be set in `parsedList` when there is no prerelease.

For example, this one:
```json
{
  "path": "soljson-v0.1.1+commit.6ff4cd6.js",
  "version": "0.1.1",
  "prerelease": null,
  "build": "commit.6ff4cd6",
  "longVersion": "0.1.1+commit.6ff4cd6",
  "keccak256": "0xd8b8c64f4e9de41e6604e6ac30274eff5b80f831f8534f0ad85ec0aff466bb25",
  "sha256": "0xfc54135edfe07d835216f87fa6c3b9049e1415c2825027b926daad018e09269f",
  "urls": [
    "bzzr://8f3c028825a1b72645f46920b67dca9432a87fc37a8940a2b2ce1dd6ddc2e29b",
    "dweb:/ipfs/QmPPGxsMtQSEUt9hn9VAtrtBTa1u9S5KF1myw78epNNFkx"
  ]
},
```
should be emitted without the `prerelease` field:
```json
{
  "path": "soljson-v0.1.1+commit.6ff4cd6.js",
  "version": "0.1.1",
  "build": "commit.6ff4cd6",
  "longVersion": "0.1.1+commit.6ff4cd6",
  "keccak256": "0xd8b8c64f4e9de41e6604e6ac30274eff5b80f831f8534f0ad85ec0aff466bb25",
  "sha256": "0xfc54135edfe07d835216f87fa6c3b9049e1415c2825027b926daad018e09269f",
  "urls": [
    "bzzr://8f3c028825a1b72645f46920b67dca9432a87fc37a8940a2b2ce1dd6ddc2e29b",
    "dweb:/ipfs/QmPPGxsMtQSEUt9hn9VAtrtBTa1u9S5KF1myw78epNNFkx"
  ]
},
```
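One way to keep the field out of the output is to attach it conditionally when building each entry. A minimal sketch; `buildListEntry` and its parameter names are illustrative, not the script's actual helper:

```javascript
// Sketch: only attach `prerelease` when the version string actually
// carries one, so entries without a prerelease keep the original
// list.json shape (no field at all, rather than null).
function buildListEntry (path, version, prerelease, build) {
  const entry = { path, version }
  if (prerelease) { // skips null, undefined, and ''
    entry.prerelease = prerelease
  }
  entry.build = build
  entry.longVersion = prerelease
    ? `${version}-${prerelease}+${build}`
    : `${version}+${build}`
  return entry
}

// A release entry: no prerelease field is emitted.
console.log(buildListEntry('soljson-v0.1.1+commit.6ff4cd6.js', '0.1.1', null, 'commit.6ff4cd6'))
```

`JSON.stringify` omits absent keys entirely, so the serialized entry matches the expected shape.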
```js
const batchPromises = []

for (let i = 0; i < values.length; i += batchSize) {
  batchPromises.push(pool.exec('workerMain', [dir, values.slice(i, i + batchSize), oldList]))
```
Did you run any profiling to see how your changes improve the performance of the current code? The execution time of your implementation does not show any significant improvement on my machine:

```
Current implementation: 702.21s user 16.80s system 106% cpu 11:15.54 total
PR implementation:      670.77s user 17.92s system 104% cpu 11:01.69 total
```
Maybe we could also move some of the I/O operations to the workers instead of only the hash operations? For example, we could have one worker per binary directory and then multiple child workers per directory, each reading and processing a subset of that directory's files. The main worker thread for each directory would then aggregate the child workers' results and write the respective `list.[json,js,txt]` files, or something like that. What do you think?
I ran a quick profiling pass, which seems to hint that your code spends most of its time waiting for file descriptors to become ready. I am not sure if this is actually the case, though:
PR code:

```
[C++]:
  ticks  total  nonlib  name
   1987  71.3%   93.9%  epoll_pwait@@GLIBC_2.6
     21   0.8%    1.0%  __write@@GLIBC_2.2.5
     14   0.5%    0.7%  fwrite@@GLIBC_2.2.5
     12   0.4%    0.6%  recvmsg@@GLIBC_2.2.5
     11   0.4%    0.5%  std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
      9   0.3%    0.4%  std::ostream::sentry::sentry(std::ostream&)
      8   0.3%    0.4%  __libc_malloc@@GLIBC_2.2.5

[Summary]:
  ticks  total  nonlib  name
     37   1.3%    1.7%  JavaScript
   2078  74.5%   98.3%  C++
     72   2.6%    3.4%  GC
    673  24.1%          Shared libraries
```
Current code:

```
[C++]:
  ticks  total  nonlib  name
  18358   2.9%    4.9%  epoll_pwait@@GLIBC_2.6
    901   0.1%    0.2%  __dynamic_cast
    793   0.1%    0.2%  __cxxabiv1::__vmi_class_type_info::__do_dyncast(long, __cxxabiv1::__class_type_info::__sub_kind, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info::__dyncast_result&) const
    711   0.1%    0.2%  __libc_malloc@@GLIBC_2.2.5

[Summary]:
   ticks  total  nonlib  name
    1923   0.3%    0.5%  JavaScript
   24553   3.9%    6.5%  C++
   12993   2.1%    3.5%  GC
  256700  40.5%          Shared libraries
  349903  55.3%          Unaccounted
```
I think it would be beneficial to have some benchmark showing the performance improvement of your implementation.
This addresses argotorg/solidity#12421

It does so by adding the `workerpool` node module, moving a bunch of code into a separate worker file, and running that code in the worker pool. It also adds a `--max-worker-processes` flag to configure the number of workers.

Edit: I initially missed a `pool.terminate()` call. I had to do a bit of refactoring to be able to add it: as the code was, there was no way to know when all of the async branches were done. This last commit makes the diff quite a bit less readable, but that was rather unavoidable. The script is a bit dated and uses a combination of callbacks and async/await.