Metric related classes for PyAgent infrastructure.#5
Conversation
Signed-off-by: Keceli, Fuat <fuat.keceli@intel.com>
Signed-off-by: Keceli, Fuat <fuat.keceli@intel.com>
Signed-off-by: Keceli, Fuat <fuat.keceli@intel.com>
…or signal and metric. Signed-off-by: Keceli, Fuat <fuat.keceli@intel.com>
bgeltz
left a comment
There was a problem hiding this comment.
It would be great to break this giant file up into smaller ones as well.
| descriptor (tuple of str, int, int): Tuple of name of the GEOPM signal, a domain enumeration | ||
| from the geopmdpy.topo package and instance number. |
There was a problem hiding this comment.
| descriptor (tuple of str, int, int): Tuple of name of the GEOPM signal, a domain enumeration | |
| from the geopmdpy.topo package and instance number. | |
| descriptor (tuple of str, int, int): Tuple of name of the GEOPM signal, a domain type and a domain index | |
| from the geopmdpy.topo package. |
In all our other documentation, we refer to signals as a tuple of the signal name, domain type, and domain index. Is this what you meant here?
"Instance" is confusing. Please use domain type and domain index if that's what you meant in all these comments.
There was a problem hiding this comment.
That's exactly what I mean, I will change the terminology I use.
| CONSTANT = 0 | ||
| MONOTONE = 1 | ||
| VARIABLE = 2 | ||
| LABEL = 3 |
There was a problem hiding this comment.
This is a duplicate of the enum in IOGroup.hpp. It seems like there's a piece missing (through the cffi wrappers) that would allow you to use the enum in the IOGroup header (without having to duplicate it here), but it's not clear to me how we'd do this just yet.
Please include a comment for now that this must be kept in sync with the enum in IOGroup.hpp otherwise we're going to have problems. I'd love to find a way to avoid this duplicate.
There was a problem hiding this comment.
I absolutely agree, I just didn't want to solve this problem right now.
| else: | ||
| return_val = self._formatter.format(value) |
There was a problem hiding this comment.
What's the use case for this one?
I don't see it being used anywhere.
There was a problem hiding this comment.
In extending classes I found it useful to have alternative ways of defining how the format is going to look like for convenience. Example:
"Total time: {:.4f}secs" # Uses .format
lambda x: f'Avg core frequency: {x/1e9:.2f}GHz' # uses lambda function
| return self._controls | ||
|
|
||
|
|
||
| class Metric: |
There was a problem hiding this comment.
This class has a lot of responsibilities. I think it could be broken up a bit, and made to align better with the C++ code that's similar.
First, we've got the notion of Signals that are derived from multiple signals in 2 different forms in the C++ code: composite signals and combined signals. Here are some examples:
service/src/DerivativeSignal.hpp
service/src/DifferenceSignal.hpp
service/src/MultiplicationSignal.hpp
These composite signals derive from Signal and perform the necessary conversion for their type (e.g. difference or multiplication) in their read() and sample() methods. I think you can do something similar here and avoid putting all this logic into one class.
Combined signals (derived from CombinedSignal.cpp) can probably be ignored for the moment as they aren't being used. They provide the ability to actually combine 2 named signals into a new named signal. This may or may not be useful here.
On the topic of aggregation/accumulation, in the C++ code this is handled by the SampleAggregator and ProcessRegionAggregator. I realize that the Python runtime is a simpler implementation of what the C++ Controller does, so you can probably just focus on the SampleAggregator. The SampleAggregator maintains the state of accumulated values that will ultimately be written to the Report.
It seems like a separate class should be created here with the responsibility of maintaining the aggregated/accumulated state. Further the Reporting and Tracing should also be broken out into their own classes. I think you, Chris, and I might want to get together and talk about the approach here.
In general, this looks pretty different than how we handle things in the C++ code. It will be difficult to maintain.
There was a problem hiding this comment.
I also think we should avoid the word "metric" if we really mean CompositeSignals and lists of signals. It's confusing.
There was a problem hiding this comment.
This is something maybe we can discuss in a meeting. I use the terminology Signal for things that I sample, however Metric is for things that I keep track of (both single Signal and composite/multiple Signal's) -- the word "metric" was something Chris suggested along the same lines of thinking. Metric class provides a lot of functions but the actual code is split between itself and Accumulator. As for signal: I already created a MonotoneSignal since it was needed. Anyway, looks like we need to have a discussion.
| self._last_value = value | ||
|
|
||
|
|
||
| class Algorithm(ABC): |
There was a problem hiding this comment.
Why do we need a separate object for the control algorithm for the Agent?
This really looks like the base class Agent to me. Can you just use the runtime.py Agent class?
There was a problem hiding this comment.
This was attempt to create a container that we can also have versioning, etc. in the future (for different NN training). Maybe, it is redundant and we can just do everything in Agent as you mentioned.
| LABEL = 3 | ||
|
|
||
|
|
||
| class SignalList: |
There was a problem hiding this comment.
This seems to be a helper object for the Agents. This should go in the runtime.py if that's the case.
There was a problem hiding this comment.
But it is a container of Signal objects. Provides methods to run reset() and update on the Signal instances.
| return return_val | ||
|
|
||
|
|
||
| class MetricList: |
There was a problem hiding this comment.
It's not clear to me why this class is necessary. If it's just a matter of tracking time, can't you just add it to the SignalList?
The reporting and tracing capabilities should me moved to a different class.
There was a problem hiding this comment.
Done this way because of the distinction I created between Signal and Metric. So we should discuss this in that context.
Signed-off-by: Keceli, Fuat <fuat.keceli@intel.com>
First step to creating a toolchain for MS 226 Python Agents.