49 changes: 49 additions & 0 deletions README.NotOnlyFLOPs.md
@@ -0,0 +1,49 @@
# What have we changed?

We have added the functions and variables needed to implement Task 3 and Task 4 of the Coding Challenge.

We have followed the existing structure of the profiler so that the changes are easy to integrate and to update in future versions of the code.

The plots are generated with gnuplot, following the same pattern that the current profiler uses for its own plots.

# Changes made for Task 3

## Generating heatmaps

To create the heatmaps for the different patterns, we have added several function calls at STEP 6 of the `Analyze` function in the `profiler.go` file.

- The function `GetSendDataForTask3` returns the array of patterns, where each pattern holds the bytes sent from the sender ranks to the receiver ranks. The data comes from the function `LoadCallsData` in the `counts_reader.go` file, which appends the `sendCounts` map to the array of patterns in each iteration. If there is more than one communicator, the array contains the patterns of all communicators.

- As we only want the patterns of the first communicator, we call the function `GetNumberSendDataForTask3`, which returns the number of patterns in one communicator. We then create a slice containing only the patterns we want.

- The function `GetNumberOfCalls` returns an array with, for each pattern, the proportion of calls that use the pattern over the total number of calls. To build it, the function reads the file `send-counters.job0.rank0.txt` and keeps only the strings with the proportions (e.g., 121/964).

- Finally, the call to `Task3` generates the plots using the values returned by the previous functions.
This function, located in `plot.go`, iterates over the patterns and, in every iteration, builds a matrix ready to be plotted from the original matrix of bytes sent between ranks.
At the end of every iteration, it generates the gnuplot.
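
The selection and conversion steps above can be sketched as follows. This is only an illustration, not the code in `plot.go`: the helper names `firstCommunicatorPatterns` and `toMatrix` are hypothetical, and the sketch assumes each pattern is a `map[int][]int` from sender rank to per-receiver counts, as in `SendDataForTask3`.

```go
package main

import (
	"fmt"
	"sort"
)

// firstCommunicatorPatterns (hypothetical helper) keeps only the first n
// patterns, i.e. the ones assumed to belong to the first communicator.
func firstCommunicatorPatterns(all []map[int][]int, n int) []map[int][]int {
	if n < 0 {
		n = 0
	}
	if n > len(all) {
		n = len(all)
	}
	return all[:n]
}

// toMatrix (hypothetical helper) turns a sender->counts map into a dense
// matrix whose rows are ordered by sender rank, ready to be written out
// as plot data.
func toMatrix(pattern map[int][]int) [][]int {
	senders := make([]int, 0, len(pattern))
	for s := range pattern {
		senders = append(senders, s)
	}
	sort.Ints(senders)

	matrix := make([][]int, 0, len(senders))
	for _, s := range senders {
		row := make([]int, len(pattern[s]))
		copy(row, pattern[s])
		matrix = append(matrix, row)
	}
	return matrix
}

func main() {
	all := []map[int][]int{
		{0: {0, 4}, 1: {4, 0}},
		{0: {8, 8}, 1: {8, 8}},
		{0: {1, 1}, 1: {1, 1}}, // assumed to belong to a second communicator
	}
	for i, p := range firstCommunicatorPatterns(all, 2) {
		fmt.Printf("pattern %d: %v\n", i, toMatrix(p))
	}
}
```

Sorting the sender ranks matters because Go does not guarantee map iteration order, so without it the rows of the heatmap could change between runs.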

## Visualization in the WebUI

To show the gnuplots in the WebUI, we have added a tab called Task3, with the same layout as the Calls tab.

We have only edited the files `webui.go` and `index.html`, and added the files `task3Details.html` and `task3Layout`.

Inside these files we have replicated the functions and variables used for the call and calls data. The main difference is in the function `serviceTask3HeatmapDetailsRequest`, which only reads the parameter that indicates the pattern number.
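
As a rough illustration of a handler that reads a single pattern-number parameter, consider the sketch below. It is not the code of `serviceTask3HeatmapDetailsRequest`: the query parameter name `pattern`, the `/task3` path, and the helpers `parsePatternIndex` and `patternHandler` are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// parsePatternIndex (hypothetical helper) validates the raw parameter value
// against the number of available patterns.
func parsePatternIndex(raw string, numPatterns int) (int, error) {
	idx, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid pattern number %q: %w", raw, err)
	}
	if idx < 0 || idx >= numPatterns {
		return 0, fmt.Errorf("pattern number %d out of range [0, %d)", idx, numPatterns)
	}
	return idx, nil
}

// patternHandler (hypothetical) answers requests for one pattern's heatmap.
// The parameter name "pattern" is an assumption, not taken from webui.go.
func patternHandler(numPatterns int) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		idx, err := parsePatternIndex(r.URL.Query().Get("pattern"), numPatterns)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		fmt.Fprintf(w, "heatmap for pattern %d", idx)
	}
}

func main() {
	h := patternHandler(3)
	for _, target := range []string{"/task3?pattern=1", "/task3?pattern=7"} {
		rec := httptest.NewRecorder()
		h(rec, httptest.NewRequest(http.MethodGet, target, nil))
		fmt.Println(target, "->", rec.Code)
	}
}
```

Validating the index before using it keeps an out-of-range or malformed request from indexing past the end of the pattern slice.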

The two added files `task3Details.html` and `task3Layout` are also very similar to the `callDetails.html` and `callsLayout` files.

The table listing the patterns adapts its size to the number of patterns found, up to a maximum of 10 (the 10 heaviest patterns).

# Changes made for Task 4

## Generating heatmaps

To generate the weighted sum of all patterns, we have added a call to `Task4` in the file `profiler.go`, following the `Task3` call.

The steps are almost the same as the ones for Task 3.

Inside the `Task4` function we first compute, for every cell, the sum of that cell over all patterns, and then we generate the plot.
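
The cell-wise accumulation can be sketched as below. This is not the `Task4` implementation: the function name `weightedSum` is hypothetical, and the sketch assumes that the weight of a pattern is the number of calls that used it, which this README does not spell out.

```go
package main

import "fmt"

// weightedSum (hypothetical helper) adds up all pattern matrices cell by
// cell, multiplying each pattern by its weight. The weight is assumed to be
// the number of calls that used the pattern.
func weightedSum(patterns [][][]int, weights []int) ([][]int, error) {
	if len(patterns) != len(weights) {
		return nil, fmt.Errorf("got %d patterns but %d weights", len(patterns), len(weights))
	}
	if len(patterns) == 0 {
		return nil, nil
	}

	// Allocate the result with the shape of the first pattern.
	sum := make([][]int, len(patterns[0]))
	for r := range sum {
		sum[r] = make([]int, len(patterns[0][r]))
	}

	for p, matrix := range patterns {
		if len(matrix) != len(sum) {
			return nil, fmt.Errorf("pattern %d has %d rows, expected %d", p, len(matrix), len(sum))
		}
		for r, row := range matrix {
			if len(row) != len(sum[r]) {
				return nil, fmt.Errorf("pattern %d row %d has %d cells, expected %d", p, r, len(row), len(sum[r]))
			}
			for c, v := range row {
				sum[r][c] += weights[p] * v
			}
		}
	}
	return sum, nil
}

func main() {
	patterns := [][][]int{
		{{0, 4}, {4, 0}},
		{{8, 8}, {8, 8}},
	}
	total, err := weightedSum(patterns, []int{3, 1})
	if err != nil {
		panic(err)
	}
	fmt.Println(total) // prints [[8 20] [20 8]]
}
```

Passing a weight of 1 for every pattern reduces this to the plain cell-wise sum.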

## Visualization in the WebUI

To show the generated gnuplot in the WebUI we have followed the same steps as for Task 3, although now we only show one plot in the `Task4` tab.
59 changes: 59 additions & 0 deletions tools/internal/pkg/counts/counts.go
@@ -89,6 +89,8 @@ type CallData struct {

	// RecvData is all the data from the receive counts
	RecvData Data

	SccPattern map[int][]int
}

// Stats represent the stats related to counts of a specific collective operation
@@ -696,3 +698,60 @@ func GatherStatsFromCallData(cd map[int]*CallData, sizeThreshold int) (SendRecvS

	return cs, nil
}

// GetFilePath returns the full path to the pattern file associated with a rank within a job
func GetFilePath(basedir string, jobid int, rank int) string {
	return filepath.Join(basedir, fmt.Sprintf("send-counters.job%d.rank%d.txt", jobid, rank))
}

// GetNumberOfCalls returns a slice with the proportion of calls for each pattern, e.g. 361/964.
// Note that callNum is used as the rank component of the counts file name.
func GetNumberOfCalls(dir string, jobid int, callNum int) ([]string, error) {
	var numberCallsCleaned []string
	var numberTotalCallsCleaned []string

	// Prepare the file to read
	countsOutputFile := GetFilePath(dir, jobid, callNum)
	countsFd, err := os.Open(countsOutputFile)
	if err != nil {
		return numberCallsCleaned, err
	}
	defer countsFd.Close()
	countsReader := bufio.NewReader(countsFd)

	// The very first line should be '# Raw counters'
	line, readerErr := countsReader.ReadString('\n')
	if readerErr != nil {
		return numberCallsCleaned, readerErr
	}
	if line != "# Raw counters\n" {
		return numberCallsCleaned, fmt.Errorf("wrong file format: %s", line)
	}

	// Read the file until EOF. In each iteration we only extract the XXX and YYY parts of XXX/YYY
	for {
		line, readerErr := countsReader.ReadString('\n')
		if readerErr != nil && readerErr != io.EOF {
			return numberCallsCleaned, readerErr
		}

		if strings.HasPrefix(line, "Alltoallv calls ") {
			numberTotalCalls := strings.Split(line, "-")
			if len(numberTotalCalls) < 2 {
				return numberCallsCleaned, fmt.Errorf("wrong file format: %s", line)
			}
			numberTotalCallsCleaned = append(numberTotalCallsCleaned, strings.TrimSpace(numberTotalCalls[1]))
		}

		if strings.HasPrefix(line, "Count: ") {
			numberCalls := strings.Split(line, " ")
			numberCallsCleaned = append(numberCallsCleaned, numberCalls[1])
		}

		// Check for EOF only after processing, so a final line without a trailing newline is not lost
		if readerErr == io.EOF {
			break
		}
	}

	// Every count needs a matching total, otherwise the indexing below would panic
	if len(numberTotalCallsCleaned) < len(numberCallsCleaned) {
		return numberCallsCleaned, fmt.Errorf("wrong file format: %d counts but only %d totals", len(numberCallsCleaned), len(numberTotalCallsCleaned))
	}
	for i := range numberCallsCleaned {
		total, err := strconv.Atoi(numberTotalCallsCleaned[i])
		if err != nil {
			return numberCallsCleaned, err
		}
		// The value read after '-' is incremented by one to obtain the total number of calls
		numberCallsCleaned[i] = fmt.Sprintf("%s/%d", numberCallsCleaned[i], total+1)
	}

	return numberCallsCleaned, nil
}
10 changes: 10 additions & 0 deletions tools/internal/pkg/counts/counts_reader.go
@@ -30,6 +30,10 @@ const (
	rawCountsRecvCountsPrefix = "Recv counts"
)

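// SendDataForTask3 holds the send patterns used by Task 3. The slice is indexed by pattern number;
// each map associates a sender rank with its per-receiver send counts.
var SendDataForTask3 []map[int][]int

// NumberSendDataForTask3 is the number of patterns appended during the most recent call to LoadCallsData
var NumberSendDataForTask3 int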

// AnalyzeCounts analyses the count from a count file
func AnalyzeCounts(counts []string, msgSizeThreshold int, datatypeSize int) (Stats, map[int][]int, error) {
	var stats Stats
@@ -452,6 +456,8 @@ func LoadCallsData(sendCountsFile, recvCountsFile string, rank int, msgSizeThres
	}
	defer sendFile.Close()
	reader := bufio.NewReader(sendFile)

	// Reset the Task 3 pattern counter for this load
	NumberSendDataForTask3 = 0
	for {
		cd := new(CallData)
		cd.SendData.CountsMetadata, readerErr = GetHeader(reader)
@@ -484,6 +490,10 @@ func LoadCallsData(sendCountsFile, recvCountsFile string, rank int, msgSizeThres
			cd.SendData.Counts[callID] = sendCounts
		}

		// Append the send counts just read to the patterns collected for Task 3
		NumberSendDataForTask3++
		SendDataForTask3 = append(SendDataForTask3, sendCounts)

		if readerErr == io.EOF {
			break
		}
9 changes: 9 additions & 0 deletions tools/internal/pkg/patterns/patterns.go
@@ -607,3 +607,12 @@ func WriteData(patternsFd *os.File, patternsSummaryFd *os.File, patternsData Dat

	return nil
}

// GetSendDataForTask3 returns the send counts of all patterns. The slice is indexed by pattern number
func GetSendDataForTask3() []map[int][]int {
	return counts.SendDataForTask3
}

// GetNumberSendDataForTask3 returns the number of patterns recorded by the most recent call to counts.LoadCallsData
func GetNumberSendDataForTask3() int {
	return counts.NumberSendDataForTask3
}