While writing a simple laser scanner obstacle detector for the KF Hungarian node, I experienced very slow performance of the main callback in kf_hungarian_node.py: around ~1.3 seconds per callback call. A laser scanner typically produces 360 points (obstacles), which makes the following part of the code the performance bottleneck, with O(360*360) complexity:
```python
cost = np.zeros((num_of_obstacle, num_of_detect))
for i in range(num_of_obstacle):
    for j in range(num_of_detect):
        cost[i, j] = self.obstacle_list[i].distance(detections[j])
```
This part executes in ~1 second on average on my PC. Simplifying the np.linalg.norm() heuristic to abs(delta_x) + abs(delta_y) + abs(delta_z) decreases the execution time of this code to ~0.3 seconds, which is better, but still not enough for real-time processing of laser scanners operating at e.g. 5-50 Hz.
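As a side note, the double loop could also be vectorized in NumPy before going to C++, assuming the obstacle and detection positions can be extracted into (N, 3) arrays (the array names below are hypothetical; in the node they would come from self.obstacle_list and detections):

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical (N, 3) position arrays standing in for the obstacle
# list and the detection list from the node.
obstacles = np.random.rand(360, 3)
detections = np.random.rand(360, 3)

# Full Euclidean cost matrix in one call, no Python-level loops.
cost = cdist(obstacles, detections)  # shape (360, 360)

# Equivalent pure-NumPy broadcasting version.
diff = obstacles[:, None, :] - detections[None, :, :]
cost2 = np.linalg.norm(diff, axis=2)
assert np.allclose(cost, cost2)
```

This removes the O(N^2) Python interpreter overhead, though the C++ rewrite would still help the rest of the callback.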
A simple experiment repeating the hot part of the code above in both Python and C++ is attached to this ticket as the hotcode_check.zip archive.
The measurements show a ~25x speed increase when switching to C++ for this hot part of the code:
```
[leha@leha-PC hotcode_check]$ ./a.out
dt = 0.0115969
[leha@leha-PC hotcode_check]$ ./a.out
dt = 0.0117751
[leha@leha-PC hotcode_check]$ ./a.out
dt = 0.0108645
[leha@leha-PC hotcode_check]$ python3 ./check.py
dt = 0.28881096839904785
[leha@leha-PC hotcode_check]$ python3 ./check.py
dt = 0.29091525077819824
[leha@leha-PC hotcode_check]$ python3 ./check.py
dt = 0.28969860076904297
```
Therefore, I am proposing (and am ready to try) to rewrite the KF Hungarian node in C++, which is expected to make the callbacks operate much faster.
The following table shows the migration of the base methods for the Python -> C++ KF node code:
| Python | C++ | Comment |
|---|---|---|
| `do_transform_point()` | `doTransform()` | TF2 standard function |
| `do_transform_vector3()` | `doTransform()` | TF2 standard function |
| `scipy.optimize.linear_sum_assignment()` | ??? | Could be rewritten from scratch, use an already known Hungarian algorithm implementation, or OR-Tools from Google |
| `cv2.KalmanFilter.predict()` | `const Mat& cv::KalmanFilter::predict(const Mat& control = Mat())` | OpenCV Kalman predict |
| `cv2.KalmanFilter.correct()` | `const Mat& cv::KalmanFilter::correct(const Mat& measurement)` | OpenCV Kalman update |
What do you think about this proposal?