
[snippy] Add check of duplicated instructions in histogram #473

Open
Maksim-Sebelev wants to merge 1 commit into syntacore:main from Maksim-Sebelev:fix/duplicated-instructions



@Maksim-Sebelev (Contributor) commented May 25, 2026

Hi!

This PR addresses the following issue: #470

Changes

In each iteration of the loop

for (auto &&[Opc, WeightRes] : DecodeEntry->Decoded)

we check if Result already contains Opc:

// OpcodeHistogramNormalization::denormalize
      for (auto &&[Opc, WeightRes] : DecodeEntry->Decoded) {
        // FIXME: here we run probability visitor to collect information on
        // which opcodes were already mentioned
        Result.reinitProbabilityVisitor();
        // Check the re-assignment of 'Opc' instruction
        if (Result.contains(Opc)) { // not a problem, because SmallVector
          IO.setError("Incorrect histogram: Redeclaration of " + NameInfo.Val +
                      " weight");
          return {};
        }
        Result.insertTopOpcode(Opc, WeightRes);
      }

Result is a SmallVector, so the O(n) cost of contains() on each iteration is not a problem. In any case, this is meant as a first, simple solution.

Verification

A test with a histogram containing a duplicated instruction was also added. This test and all other tests pass locally.

:)

Comment on lines +113 to +115
// FIXME: here we run probability visitor to collect information on
// which opcodes were already mentioned
Result.reinitProbabilityVisitor();
Contributor


This call runs an expensive visitor, so let's not call it inside a loop.
However, without that call, Result.contains() would not return true right after Result.insertTopOpcode(Opc, WeightRes). I am not sure how to do this better.

@Maksim-Sebelev (Contributor Author)

I've come up with a smarter way to find duplicates. Could you please review it?

@Maksim-Sebelev (Contributor Author) commented May 30, 2026

I've added the '-Wno-error=duplicated-instructions' option. Could you please review it again?
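To illustrate what an option like '-Wno-error=duplicated-instructions' is meant to do, here is a hypothetical sketch of a named diagnostic that is an error by default and is downgraded to a warning when that option is given. DiagOptions, parseOption and formatDiag are made-up names, and the "warning:" prefix is an assumption; only the "error: (duplicated-instructions)" form comes from the tests in this PR. llvm-snippy's real option handling may look quite different.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical sketch, not llvm-snippy code: remembers which named
// diagnostics were downgraded with "-Wno-error=<name>".
struct DiagOptions {
  std::set<std::string> NoErrorNames;
};

// Records the diagnostic name if Arg has the form "-Wno-error=<name>";
// any other argument is ignored.
void parseOption(DiagOptions &Opts, const std::string &Arg) {
  const std::string Prefix = "-Wno-error=";
  if (Arg.compare(0, Prefix.size(), Prefix) == 0)
    Opts.NoErrorNames.insert(Arg.substr(Prefix.size()));
}

// Formats "<severity>: (<name>) <message>"; the severity is "error" unless
// the diagnostic was downgraded, in which case it is "warning".
std::string formatDiag(const DiagOptions &Opts, const std::string &Name,
                       const std::string &Msg) {
  bool IsError = Opts.NoErrorNames.count(Name) == 0;
  return std::string(IsError ? "error" : "warning") + ": (" + Name + ") " +
         Msg;
}
```

Keeping the check keyed by the diagnostic name means other diagnostics stay errors even when this one is downgraded.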

I haven't documented this yet because I wasn't sure how and where to do it. Could you please explain?

About the 4 new similar tests: I can't combine them because the error/warning messages differ depending on the number of duplicates (there are 2 cases: exactly 1 duplicate, and more than 1 duplicate).

:)

@Maksim-Sebelev requested a review from arromanoff May 30, 2026 14:51
@Maksim-Sebelev force-pushed the fix/duplicated-instructions branch 6 times, most recently from f37c1f1 to ecc0bf9, May 31, 2026 12:51
Comment on lines +41 to +51
sections:
  - name: 1
    VMA: 0x210000
    SIZE: 0x100000
    LMA: 0x210000
    ACCESS: rx
  - name: 2
    VMA: 0x400000
    SIZE: 0x100000
    LMA: 0x400000
    ACCESS: rw
Contributor


Let's not duplicate this section; just reuse the Inputs/section.yaml file.

- [LW, 1.0]
- [SW, 1.0]

# CHECK: error: (duplicated-instructions) multiple weight definition of DUPLICATED_INSTRUCTION: weight of this instruction will be initialized to the first weight in the histogram.
Contributor


This looks weird.
Why do we say "weight of this instruction will be initialized to the first weight in the histogram." if it is an error?


By the way, currently, when an opcode appears more than once, its weights are summed; this has been the case since histogram-patterns support was added. For example:

histogram-patterns:
  - AddSub: "ADD | SUB"
histogram:
  - [pattern: AddSub, 1.0]
  - [ADD, 1.0]
  - [ADD, 1.0]
  - [SUB, 1.0]

In the example above:
AddWeight = 1.0 + 1.0 + 1 / 2 (from pattern) = 5 / 2
SubWeight = 1.0 + 1 / 2 (from pattern) = 3 / 2
TotalWeight = AddWeight + SubWeight = 8 / 2 = 4
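The arithmetic above can be reproduced with a small sketch, assuming (as the example suggests) that a pattern's weight is split evenly across the opcodes it matches and that repeated top-level entries for the same opcode are added together. Pattern and effectiveWeights are hypothetical names, not llvm-snippy APIs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pattern: the opcodes it matches and its histogram weight.
struct Pattern {
  std::vector<std::string> Opcodes;
  double Weight;
};

// Assumed semantics: repeated top-level entries are summed, and each
// pattern's weight is divided evenly among the opcodes it matches.
std::map<std::string, double>
effectiveWeights(const std::vector<std::pair<std::string, double>> &Entries,
                 const std::vector<Pattern> &Patterns) {
  std::map<std::string, double> W;
  for (const auto &[Opc, Weight] : Entries)
    W[Opc] += Weight; // duplicates accumulate instead of being rejected
  for (const auto &P : Patterns)
    for (const auto &Opc : P.Opcodes)
      W[Opc] += P.Weight / P.Opcodes.size();
  return W;
}
```

For the histogram above this yields ADD = 2.5 and SUB = 1.5, for a total weight of 4.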

And here's the question: is this patch even needed? After all, opcodes can now legitimately repeat (given patterns), and the total weight of an opcode is calculated from its weight in the top-level histogram and from patterns.
@arromanoff what do you think?

Comment on lines +426 to +435
if (DuplicatedInstructionsCont.empty()) // no duplicates
  return;
std::ostringstream DuplicatedInstructionsMsg;
auto DuplicatedInstructionsIt = DuplicatedInstructionsCont.begin();
DuplicatedInstructionsMsg << "multiple weight definition of "
                          << *DuplicatedInstructionsIt++;
std::for_each(DuplicatedInstructionsIt, DuplicatedInstructionsCont.end(),
              [&DuplicatedInstructionsMsg](std::string_view Instruction) {
                DuplicatedInstructionsMsg << ", " << Instruction;
              });
Contributor


We already have the llvm::interleaveComma function in the STLExtras.h header.
Also, we don't use standard streams; LLVM has its own raw_string_ostream.
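The manual iterator bump plus std::for_each can be collapsed into a single "join with commas" call. The sketch below uses a small standard-C++ stand-in for llvm::interleaveComma so that it compiles without LLVM headers; interleaveCommaSketch and duplicatesMessage are hypothetical names, and std::ostringstream appears only for that reason.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for llvm::interleaveComma: writes the elements of R to OS,
// separated by ", ".
template <typename Range, typename Stream>
void interleaveCommaSketch(const Range &R, Stream &OS) {
  bool First = true;
  for (const auto &Elt : R) {
    if (!First)
      OS << ", ";
    OS << Elt;
    First = false;
  }
}

// Builds the duplicate-weight message for one or more instruction names.
std::string duplicatesMessage(const std::vector<std::string> &Names) {
  std::ostringstream OS;
  OS << "multiple weight definition of ";
  interleaveCommaSketch(Names, OS);
  return OS.str();
}
```

Unlike the original snippet, the helper does not special-case an empty list, so the caller should still return early when there are no duplicates. Inside llvm-snippy, the equivalent would write into an llvm::raw_string_ostream wrapping a std::string and call llvm::interleaveComma(Names, OS), as suggested above.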

- [C_LUI_HINT, 4.26]
- [C_MV, 4.26]
- [C_MV_HINT, 4.26]
- [C_NOP, 4.26]
Contributor


Redundant change

Contributor Author


Redundant change

No, it was a duplicate, so this test failed after the changes in this PR. (Alternatively, I could pass '-Wno-error=duplicated-instructions' in the options here.)

- [C_ADDI, 4.66]
- [C_ADDIW, 4.66]
- [C_ADDI_HINT_IMM_ZERO, 4.66]
- [C_NOP, 4.66]
Contributor


redundant change?

Contributor Author


I answered it here

Comment on lines +1 to +7
# RUN: sed -e s/DUPLICATED_INSTRUCTION/ADD/ %s > %t.1.yaml
# RUN: not llvm-snippy %t.1.yaml --model-plugin=None -mtriple=riscv64 \
# RUN: |& FileCheck %t.1.yaml

# RUN: sed -e s/DUPLICATED_INSTRUCTION/ADDI/ %s > %t.2.yaml
# RUN: not llvm-snippy %t.2.yaml --model-plugin=None -mtriple=riscv64 \
# RUN: |& FileCheck %t.2.yaml
Contributor


I think these tests are excessive. Can we stop at 3-5 opcodes?

Result.reinitProbabilityVisitor();
return Result;
}
std::unordered_set<std::string_view> UniqueInstructions,
@VictorBerbenets Jun 2, 2026

Please try to use LLVM types where possible: StringSet instead of unordered_set (I think it will work well here), and StringRef instead of string_view.

return Result;
}
std::unordered_set<std::string_view> UniqueInstructions,
DuplicatedInstructions;

It's better to declare one variable per line for readability
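The two-set approach under discussion can be sketched as follows, with one declaration per line as suggested. std::set and std::string stand in for llvm::StringSet<> and StringRef so that the example builds without LLVM; like std::set, llvm::StringSet<>::insert returns a pair whose second member says whether the key was newly inserted. findDuplicates is a hypothetical name.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Collects every instruction name that appears more than once.
std::vector<std::string>
findDuplicates(const std::vector<std::string> &Histogram) {
  std::set<std::string> UniqueInstructions;
  std::set<std::string> DuplicatedInstructions;
  for (const auto &Name : Histogram)
    if (!UniqueInstructions.insert(Name).second) // already seen
      DuplicatedInstructions.insert(Name);
  return std::vector<std::string>(DuplicatedInstructions.begin(),
                                  DuplicatedInstructions.end());
}
```

Because std::set is ordered, the duplicates come back sorted by name rather than in histogram order.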

Comment thread llvm/tools/llvm-snippy/lib/Config/OpcodeHistogram.cpp Outdated

@Maksim-Sebelev (Contributor Author)

Done

Result.reinitProbabilityVisitor();
return Result;
}
StringSet<> UniqueInstructions;

The OpcodeHistogram class has a TopOpcodes field that keeps all top opcodes. Please use this field instead of creating a redundant UniqueInstructions set.

Contributor


I agree. insertTopOpcode inserts into the TopOpcodes member, and we can ask it whether it contains an opcode. This way we don't need to add UniqueInstructions and DuplicatedInstructions. Yes, we will report one instruction at a time, but so be it. After an internal discussion, we decided that there is no need to report all duplicates at once.
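The approach agreed on here, querying TopOpcodes directly and stopping at the first duplicate, can be sketched with a hypothetical miniature of the histogram class. HistogramSketch and fillHistogram are illustrative names; the real OpcodeHistogram stores more state, and its contains() may be implemented differently.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical miniature: TopOpcodes is the only storage, and contains()
// scans it directly, so no separate "seen" set is needed.
class HistogramSketch {
  std::vector<std::pair<unsigned, double>> TopOpcodes;

public:
  bool contains(unsigned Opc) const {
    return std::any_of(TopOpcodes.begin(), TopOpcodes.end(),
                       [Opc](const auto &E) { return E.first == Opc; });
  }
  void insertTopOpcode(unsigned Opc, double Weight) {
    TopOpcodes.emplace_back(Opc, Weight);
  }
};

// Inserts opcodes in order and returns the first duplicated one, or
// std::nullopt if every opcode is unique; only one duplicate is reported.
std::optional<unsigned>
fillHistogram(HistogramSketch &H,
              const std::vector<std::pair<unsigned, double>> &Decoded) {
  for (const auto &[Opc, Weight] : Decoded) {
    if (H.contains(Opc))
      return Opc; // the caller would turn this into an error
    H.insertTopOpcode(Opc, Weight);
  }
  return std::nullopt;
}
```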

Result.insertTopOpcode(Tgt.getInternalOpcode(Opc), WeightRes);
}
}
DuplicatedInstructions.begin();

Why?


@Maksim-Sebelev (Contributor Author)

Done

@Maksim-Sebelev force-pushed the fix/duplicated-instructions branch from a82c037 to 3658163, June 3, 2026 18:17
break;
}

auto &&NameRef = NameInfo.Val;

why &&?

continue;
}
}
auto Name = NameInfo.Val;

There is no need to create NameRef; you already have Name. Keep just one of them.

Comment on lines +107 to +108
// FIXME: need to do something when OpcOpt is empty std::optional ???
// now it just ignored

NameRef can be the name of a pattern. In that case, you pass OpcodeCache a name that is not an opcode name, and it returns nullopt.
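The point about pattern names can be illustrated with a hypothetical lookup that checks the opcode table first and then falls back to the pattern table, so that a nullopt from the opcode lookup is not mistaken for an unknown name. OpcodeCacheSketch, Kind and classify are assumptions for illustration, not llvm-snippy's OpcodeCache interface.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

enum class Kind { Opcode, Pattern, Unknown };

// Hypothetical stand-in for OpcodeCache: returns std::nullopt for any name
// that is not an opcode, including pattern names.
struct OpcodeCacheSketch {
  std::map<std::string, unsigned> Opcodes;
  std::optional<unsigned> code(const std::string &Name) const {
    auto It = Opcodes.find(Name);
    if (It == Opcodes.end())
      return std::nullopt;
    return It->second;
  }
};

// Distinguishes opcodes, pattern names and genuinely unknown names.
Kind classify(const OpcodeCacheSketch &Cache,
              const std::map<std::string, std::string> &Patterns,
              const std::string &Name) {
  if (Cache.code(Name))
    return Kind::Opcode;
  if (Patterns.count(Name))
    return Kind::Pattern; // expected nullopt: a pattern is not an opcode
  return Kind::Unknown;
}
```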

@Maksim-Sebelev (Contributor Author)

Fixed

@VictorBerbenets

In general LGTM

@Maksim-Sebelev force-pushed the fix/duplicated-instructions branch from 45c7d95 to 428f80f, June 5, 2026 20:52
@Maksim-Sebelev force-pushed the fix/duplicated-instructions branch from 428f80f to 32ff28f, June 5, 2026 21:00
@Maksim-Sebelev reopened this Jun 6, 2026
@Maksim-Sebelev (Contributor Author)

My bad, I broke my branch. It is fixed now; the branch is in the same state as at the last commit before this message.

@Maksim-Sebelev force-pushed the fix/duplicated-instructions branch from 97ed387 to b9aee52, June 7, 2026 05:58

3 participants