Skip to content

Metric related classes for PyAgent infrastructure.#5

Open
fkeceli wants to merge 5 commits into
cmcantalupo:issue-1886from
fkeceli:ms226-dev
Open

Metric related classes for PyAgent infrastructure.#5
fkeceli wants to merge 5 commits into
cmcantalupo:issue-1886from
fkeceli:ms226-dev

Conversation

@fkeceli
Copy link
Copy Markdown

@fkeceli fkeceli commented Oct 13, 2021

First step to creating a toolchain for MS 226 Python Agents.

Signed-off-by: Keceli, Fuat <fuat.keceli@intel.com>
Signed-off-by: Keceli, Fuat <fuat.keceli@intel.com>
Signed-off-by: Keceli, Fuat <fuat.keceli@intel.com>
…or signal and metric.

Signed-off-by: Keceli, Fuat <fuat.keceli@intel.com>
Copy link
Copy Markdown

@bgeltz bgeltz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to break this giant file up into smaller ones as well.

Comment thread geopmctlpy/geopmctl/metric.py Outdated
Comment on lines +50 to +51
descriptor (tuple of str, int, int): Tuple of name of the GEOPM signal, a domain enumeration
from the geopmdpy.topo package and instance number.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
descriptor (tuple of str, int, int): Tuple of name of the GEOPM signal, a domain enumeration
from the geopmdpy.topo package and instance number.
descriptor (tuple of str, int, int): Tuple of name of the GEOPM signal, a domain type and a domain index
from the geopmdpy.topo package.

In all our other documentation, we refer to signals as a tuple of the signal name, domain type, and domain index. Is this what you meant here?

"Instance" is confusing. Please use domain type and domain index if that's what you meant in all these comments.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly what I mean, I will change the terminology I use.

Comment on lines +76 to +79
CONSTANT = 0
MONOTONE = 1
VARIABLE = 2
LABEL = 3
Copy link
Copy Markdown

@bgeltz bgeltz Oct 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a duplicate of the enum in IOGroup.hpp. It seems like there's a piece missing (through the cffi wrappers) that would allow you to use the enum in the IOGroup header (without having to duplicate it here), but it's not clear to me how we'd do this just yet.

Please include a comment for now that this must be kept in sync with the enum in IOGroup.hpp otherwise we're going to have problems. I'd love to find a way to avoid this duplicate.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I absolutely agree, I just didn't want to solve this problem right now.

Comment on lines +668 to +669
else:
return_val = self._formatter.format(value)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use case for this one?
I don't see it being used anywhere.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In extending classes I found it useful to have alternative ways of defining how the format is going to look like for convenience. Example:

"Total time:           {:.4f}secs" # Uses .format
lambda x: f'Avg core frequency:   {x/1e9:.2f}GHz' # uses lambda function

return self._controls


class Metric:
Copy link
Copy Markdown

@bgeltz bgeltz Oct 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class has a lot of responsibilities. I think it could be broken up a bit, and made to align better with the C++ code that's similar.

First, we've got the notion of Signals that are derived from multiple signals in 2 different forms in the C++ code: composite signals and combined signals. Here are some examples:

service/src/DerivativeSignal.hpp
service/src/DifferenceSignal.hpp
service/src/MultiplicationSignal.hpp

These composite signals derive from Signal and perform the necessary conversion for their type (e.g. difference or multiplication) in their read() and sample() methods. I think you can do something similar here and avoid putting all this logic into one class.

Combined signals (derived from CombinedSignal.cpp) can probably be ignored for the moment as they aren't being used. They provide the ability to actually combine 2 named signals into a new named signal. This may or may not be useful here.

On the topic of aggregation/accumulation, in the C++ code this is handled by the SampleAggregator and ProcessRegionAggregator. I realize that the Python runtime is a simpler implementation of what the C++ Controller does, so you can probably just focus on the SampleAggregator. The SampleAggregator maintains the state of accumulated values that will ultimately be written to the Report.

It seems like a separate class should be created here with the responsibility of maintaining the aggregated/accumulated state. Further the Reporting and Tracing should also be broken out into their own classes. I think you, Chris, and I might want to get together and talk about the approach here.

In general, this looks pretty different than how we handle things in the C++ code. It will be difficult to maintain.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think we should avoid the word "metric" if we really mean CompositeSignals and lists of signals. It's confusing.

Copy link
Copy Markdown
Author

@fkeceli fkeceli Oct 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something maybe we can discuss in a meeting. I use the terminology Signal for things that I sample, however Metric is for things that I keep track of (both single Signal and composite/multiple Signal's) -- the word "metric" was something Chris suggested along the same lines of thinking. Metric class provides a lot of functions but the actual code is split between itself and Accumulator. As for signal: I already created a MonotoneSignal since it was needed. Anyway, looks like we need to have a discussion.

self._last_value = value


class Algorithm(ABC):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a separate object for the control algorithm for the Agent?

This really looks like the base class Agent to me. Can you just use the runtime.py Agent class?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was attempt to create a container that we can also have versioning, etc. in the future (for different NN training). Maybe, it is redundant and we can just do everything in Agent as you mentioned.

LABEL = 3


class SignalList:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a helper object for the Agents. This should go in the runtime.py if that's the case.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it is a container of Signal objects. Provides methods to run reset() and update on the Signal instances.

return return_val


class MetricList:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me why this class is necessary. If it's just a matter of tracking time, can't you just add it to the SignalList?

The reporting and tracing capabilities should me moved to a different class.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done this way because of the distinction I created between Signal and Metric. So we should discuss this in that context.

Signed-off-by: Keceli, Fuat <fuat.keceli@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants