Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
target
mwa_full_embedded_element_pattern.h5
/include/mwa_hyperbeam.h
/comparisons/
/examples/fee
/examples/fee_parallel
/examples/fee_parallel_omp
Expand Down
64 changes: 33 additions & 31 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -222,37 +222,39 @@ maturin develop --release -b pyo3 --features=python,hdf5-static --strip

## Comparing with other FEE beam codes

Below is a table comparing other implementations of the FEE beam code. All
benchmarks were done with unique azimuth and zenith angle directions, and all
on the same system. The CPU is a Ryzen 9 3900X, which has 12 cores and SMT (24
threads). The CUDA benchmarks uses an NVIDIA GeForce RTX 2070. All benchmarks
were done in serial, unless indicated by "parallel". Python times were taken
by running `time.time()` before and after the calculations. Memory usage is
measured by running `time -v` on the command (not the `time` associated with
your shell; this is usually at `/usr/bin/time`).

| Code | Number of directions | Duration | Max. memory usage |
|:-----------------|---------------------:|---------:|------------------:|
| [mwa_pb](https://github.com/MWATelescope/mwa_pb) | 500 | 98.8 ms | 134.6 MiB |
| | 100000 | 13.4 s | 5.29 GiB |
| | 1000000 | 139.8 s | 51.6 GiB |
| mwa-reduce (C++) | 500 | 115.2 ms | 48.9 MiB |
| | 10000 | 2.417 s | 6.02 GiB |
| mwa_hyperbeam | 500 | 10.0 ms | 9.75 MiB |
| | 100000 | 1.82 s | 11.3 MiB |
| | 1000000 | 18.1 s | 25.0 MiB |
| mwa_hyperbeam (parallel) | 1000000 | 1.55 s | 88.8 MiB |
| mwa_hyperbeam (via python) | 500 | 20.5 ms | 44.2 MiB |
| | 100000 | 3.70 s | 45.4 MiB |
| | 1000000 | 37.2 s | 59.0 MiB |
| mwa_hyperbeam (via python, parallel) | 1000000 | 2.49 s | 246.6 MiB |
| mwa_hyperbeam (CUDA, single precision) | 1000000 | 450 ms | 253.8 MiB |
| | 1e8 | 3.08 s | 14.26 GiB |

Not sure what's up with the C++ code. Maybe I'm calling `CalcJonesArray` wrong,
but it uses a huge amount of memory. In any case, `hyperbeam` seems to be
roughly 10x faster. If you know how to compare with `Everybeam`, please let me
know.
A high-level summary is below. Further details and info on how these results
were obtained can be found [here](./comparisons).

| Package | Config | Number of directions | Duration | Max. memory usage |
|:--------|:-------|---------------------:|---------:|------------------:|
| [mwa_pb](https://github.com/MWATelescope/mwa_pb) | serial | 1 | 14.85 ms | 153 MiB |
| | serial | 1,000 | 130.1 ms | 201 MiB |
| | serial | 300,000 | 37.94 s | 14.4 GiB |
| [pyuvdata](https://github.com/RadioAstronomySoftwareGroup/pyuvdata) | serial | 32,760 | 7.446 s | 653 MiB |
| | serial | 130,320 | 11.85 s | 1.92 GiB |
| [EveryBeam](https://git.astron.nl/RD/EveryBeam) | serial | 1 | 114 µs | 61.7 MiB |
| | serial | 1,000 | 103.9 ms | 61.9 MiB |
| | serial | 300,000 | 31.16 s | 71.1 MiB |
| mwa_hyperbeam | serial | 1 | 32.54 µs | 11.1 MiB |
| | serial | 1,000 | 29.02 ms | 13.3 MiB |
| | parallel | 1,000 | 4.598 ms | 13.5 MiB |
| | serial | 300,000 | 8.610 s | 33.9 MiB |
| | parallel | 300,000 | 596.1 ms | 34.6 MiB |
| | CUDA | 300,000 | 63.70 ms | 134 MiB |
| | CUDA | 999,999 | 164.4 ms | 195 MiB |

| | mwa_pb | pyuvdata | EveryBeam | mwa_hyperbeam |
|-------------------------------------------|:------------------:|:------------------:|:---------:|:------------------:|
| Can be run in parallel? | :x: | :x: | :x: | :white_check_mark: |
| Parallactic-angle correction? | :x: | :x: | :x: | :white_check_mark: |
| GPU (CUDA/HIP) support? | :x: | :x: | :x: | :white_check_mark: |
| Supports MWA analytic beam? | :white_check_mark: | :x: | :x: | :white_check_mark: |
| Supports per-dipole gains? | :white_check_mark: | :white_check_mark: | :x: | :white_check_mark: |
| Python interface? | :white_check_mark: | :white_check_mark: | :x:* | :white_check_mark: |
| Can be called from other languages via C? | :x: | :x: | :x: | :white_check_mark: |
| Supports MWA CRAM tile? | :x: | :x: | :x: | :white_check_mark: |

*: `EveryBeam` has a Python interface, but it does not support the MWA beam.

## Troubleshooting

Expand Down
Binary file added comparisons/1090008640_2s_40kHz.trunc.ms.tar.gz
Binary file not shown.
281 changes: 281 additions & 0 deletions comparisons/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,281 @@
# Comparisons between a few implementations of the MWA FEE beam code

## Performance summary

| Package | Config | Number of directions | Duration | Max. memory usage |
|:--------|:-------|---------------------:|---------:|------------------:|
| [mwa_pb](https://github.com/MWATelescope/mwa_pb) | serial | 1 | 14.85 ms | 153 MiB |
| | serial | 1,000 | 130.1 ms | 201 MiB |
| | serial | 300,000 | 37.94 s | 14.4 GiB |
| [pyuvdata](https://github.com/RadioAstronomySoftwareGroup/pyuvdata) | serial | 32,760 | 7.446 s | 653 MiB |
| | serial | 130,320 | 11.85 s | 1.92 GiB |
| [EveryBeam](https://git.astron.nl/RD/EveryBeam) | serial | 1 | 114 µs | 61.7 MiB |
| | serial | 1,000 | 103.9 ms | 61.9 MiB |
| | serial | 300,000 | 31.16 s | 71.1 MiB |
| mwa_hyperbeam | serial | 1 | 32.54 µs | 11.1 MiB |
| | serial | 1,000 | 29.02 ms | 13.3 MiB |
| | parallel | 1,000 | 4.598 ms | 13.5 MiB |
| | serial | 300,000 | 8.610 s | 33.9 MiB |
| | parallel | 300,000 | 596.1 ms | 34.6 MiB |
| | serial | 999,999 | 28.66 s | 87.3 MiB |
| | parallel | 999,999 | 1.986 s | 90.3 MiB |
| | CUDA | 300,000 | 63.70 ms | 134 MiB |
| | CUDA | 999,999 | 164.4 ms | 195 MiB |
| | serial, Python | 1 | 36.95 µs | 42.5 MiB |
| | serial, Python | 1,000 | 31.57 ms | 44.9 MiB |
| | parallel, Python | 1,000 | 3.890 ms | 45.5 MiB |
| | serial, Python | 300,000 | 9.393 s | 93.5 MiB |
| | parallel, Python | 300,000 | 660.3 ms | 96.3 MiB |
| | serial, Python | 999,999 | 31.05 s | 211 MiB |
| | parallel, Python | 999,999 | 2.191 s | 212 MiB |
| | CUDA, Python | 999,999 | 1.342 s | 305 MiB |

All of the durations refer to "hot cache" times. See the full print out of the
benchmarks run and the system details at the bottom of this page. You can verify
these numbers by running `run.sh`.

## Compared packages and their features

Please file an issue if this information is incorrect.

| | mwa_pb | pyuvdata | EveryBeam | mwa_hyperbeam |
|-------------------------------------------|:------------------:|:------------------:|:---------:|:------------------:|
| Can be run in parallel? | :x: | :x: | :x: | :white_check_mark: |
| Parallactic-angle correction? | :x: | :x: | :x: | :white_check_mark: |
| GPU (CUDA/HIP) support? | :x: | :x: | :x: | :white_check_mark: |
| Supports MWA analytic beam? | :white_check_mark: | :x: | :x: | :white_check_mark: |
| Supports per-dipole gains? | :white_check_mark: | :white_check_mark: | :x: | :white_check_mark: |
| Python interface? | :white_check_mark: | :white_check_mark: | :x:* | :white_check_mark: |
| Can be called from other languages via C? | :x: | :x: | :x: | :white_check_mark: |
| Supports MWA CRAM tile? | :x: | :x: | :x: | :white_check_mark: |

*: `EveryBeam` has a Python interface, but it does not support the MWA beam.

## Compared packages and methodology

To my knowledge, `EveryBeam` only takes RADec coordinates to do its simulations,
whereas others use azimuth and elevation. Also unlike others, `EveryBeam`
needs a measurement set for its work. To keep things as fair as possible, I've
provided a stripped-down measurement set and given it to `EveryBeam` with RADec
(0, -27). This corresponds to a specific AzEl which I've then used in other
packages.

### mwa_pb

Installed with `pip install mwa_pb`.

The latest pip-installable version needs a hack to work correctly with the
latest `numpy`. Change all instances of `numpy.complex` and `numpy.complex64` to
just `complex`. Also this line

```python
if not interp and "mwa_hyperbeam" in sys.modules:
```

needs to be changed to

```python
if False:
```

to actually use the pure-Python `mwa_pb`, otherwise `hyperbeam` is used
internally :)

Finally, the results here are a little misleading. All other examples use the
same azimuth and elevation for their simulations, and therefore the resulting
Jones matrices are all the same. This is fine, as all other code doesn't check
the input to de-duplicate (it's the caller's responsibility). However, `mwa_pb`
does notice this duplication. To actually get it to do some work, it is called
with different azimuths and elevations to other packages, which is a little
unfair.

In addition, `mwa_pb` has a nice interpolation feature, which reduces
the number of FEE simulations to be made by gridding coarsely and using "nearby"
simulations instead. Using interpolation makes the code quite fast, but would
also be a very unfair comparison, as other packages don't have interpolation and
using it compromises the accuracy of the results.

### EveryBeam

Version 0.5.3, alongside `casacore` 3.5.0. I had to manually install the MWA
header files, as the default `CMake` build doesn't seem to. I also used the
Python interface via `pip` (version 0.5.1) and from source (version 0.5.3);
neither supports MWA beam responses.

`EveryBeam` does not appear to have a way to do FEE calculations in parallel.
Attempting to compile with `OpenMP` and annotating a pragma on a for loop either
caused segfaults or showed HDF5 errors. Thus, `EveryBeam` is only using a single
thread in this comparison. If there is a way to make the code run in parallel,
please share it!

It also seems that the beam responses are normalised regardless of the beam
normalisation setting.

### pyuvdata

I used version 2.4.1 installed with `pip`.

`pyuvdata` includes a module `uvbeam`, which has MWA beam code. I'm not sure how
to use this code best in a fair comparison; it seems designed to generate beam
responses over a grid with adjustable resolution rather than allowing arbitrary
directions. I've opted to adjust the resolution and report the timings and
memory usage.

### mwa_hyperbeam

Also referred to as `hyperbeam`. Installed from source, and the Python interface
was also from source or via `pip`.

#### CUDA

`hyperbeam` can use CUDA-capable (or HIP-capable) GPUs to run the simulations
faster. Here, we are using the `gpu-single` feature; this means the simulations
are calculated with 32-bit floats, which benefits my desktop-grade GPU. Omitting
this feature will make the calculations use 64-bit floats, which run much
slower on my GPU. However, these floats will be crunched much faster on a
"datacenter"-grade GPU, such as those hosted by Pawsey.

## Omitted

### "Marcin's C++ code"

This code can be seen on
[this commit of `hyperbeam`](https://github.com/MWATelescope/mwa_hyperbeam/commit/72e0914).
From memory, this code isn't ready to be used as a library, so it's not easy
to do comparisons with. However, my (CHJ) testing from years ago suggested that
this code is 10x slower and uses much more memory than `hyperbeam`, while giving
extremely similar if not the same results.

### RTS FEE

The Real-Time System (RTS) has an implementation of the FEE code, but only for
CUDA. The RTS is not open sourced, so it is difficult to fairly compare this
code. However, Jack Line's testing showed that its results were consistent with
`hyperbeam` but approximately 3-4x slower.

## Benchmark print out

Obtained by running `run.sh`.

```
*** System information ***
uname -a:
Linux sirius 6.5.2-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 06 Sep 2023 21:01:01 +0000 x86_64 GNU/Linux
CPU:
AMD Ryzen 9 3900X 12-Core Processor (12 cores, 24 threads)
GPU:
NVIDIA GeForce RTX 2070
Total memory:
128735 MiB
glibc:
GNU C Library (GNU libc) stable release version 2.38.
Compilers:
GCC: g++ (GCC) 13.2.1 20230801
Rust: rustc 1.75.0 (82e1608df 2023-12-21)
nvcc: Cuda compilation tools, release 12.2, V12.2.91
Python:
Python 3.11.5
***

*** mwa_pb Python results ***
time taken to produce 1 simulation (cold cache): 0.04600405693054199s
time taken to produce 1 simulation (hot cache): 0.01485133171081543s
time taken to produce 1000 simulations (hot cache): 0.13009285926818848s
time taken to produce 300000 simulations (hot cache): 37.93662881851196s
Max memory use (kBytes): 15080124
***

*** pyuvdata Python results ***
time taken to produce 32760 simulations: 7.445570707321167s
time taken to produce 130320 simulations: 11.852437734603882s
Max memory use (kBytes): 2459428
***

*** hyperbeam Python results with 1 CPU core ***
time taken to produce 1 simulation (cold cache): 0.001977205276489258s
time taken to produce 1 simulation (hot cache): 3.695487976074219e-05s
time taken to produce 1000 simulations (hot cache): 0.031566619873046875s
time taken to produce 300000 simulations (hot cache): 9.393456220626831s
time taken to produce 999999 simulations (hot cache): 31.051132917404175s
First and last MWA beam responses:
[ 0.95149467+0.23737095j -0.16821772-0.04190302j -0.1687059 -0.04202984j
-0.95181881-0.23673898j]
[ 0.95149467+0.23737095j -0.16821772-0.04190302j -0.1687059 -0.04202984j
-0.95181881-0.23673898j]
Max memory use (kBytes): 235028

*** hyperbeam Python results with all CPU cores ***
time taken to produce 1 simulation (cold cache): 0.001982450485229492s
time taken to produce 1 simulation (hot cache): 3.6716461181640625e-05s
time taken to produce 1000 simulations (hot cache): 0.0038902759552001953s
time taken to produce 300000 simulations (hot cache): 0.6602745056152344s
time taken to produce 999999 simulations (hot cache): 2.190699577331543s
First and last MWA beam responses:
[ 0.95149467+0.23737095j -0.16821772-0.04190302j -0.1687059 -0.04202984j
-0.95181881-0.23673898j]
[ 0.95149467+0.23737095j -0.16821772-0.04190302j -0.1687059 -0.04202984j
-0.95181881-0.23673898j]

*** hyperbeam Python results with CUDA ***
time taken to produce 999999 simulations (hot cache): 1.3418140411376953s
Max memory use (kBytes): 381324
***

*** Compiling EveryBeam C++ example ***
make: Nothing to be done for 'all'.

*** EveryBeam C++ results with 1 CPU core ***
time taken to produce 1 simulation (cold cache): 0.099906s
time taken to produce 1 simulation (hot cache): 0.000114s
time taken to produce 1000 simulations: 0.103940s
time taken to produce 300000 simulations: 31.161970s

First and last MWA beam responses:
[+0.951391+0.237339i, -0.168200-0.041897i
-0.168688-0.042025i, -0.951721-0.236713i]
[+0.951391+0.237339i, -0.168200-0.041897i
-0.168688-0.042025i, -0.951721-0.236713i]
Max memory use (kBytes): 72784
***

*** Compiling hyperbeam Rust code ***

*** hyperbeam Rust results with 1 CPU core ***
time taken to produce 1 simulation (cold cache): 1.936581ms
time taken to produce 1 simulation (hot cache): 32.542µs
time taken to produce 1000 simulations (hot cache): 29.02333ms
time taken to produce 300000 simulations (hot cache): 8.609547797s
time taken to produce 999999 simulations (hot cache): 28.656200548s
First and last MWA beam responses:
[+0.951495+0.237371i, -0.168218-0.041903i
-0.168706-0.042030i, -0.951819-0.236739i]
[+0.951495+0.237371i, -0.168218-0.041903i
-0.168706-0.042030i, -0.951819-0.236739i]
Max memory use (kBytes): 91416

*** hyperbeam Rust results with all CPU cores ***
time taken to produce 1 simulation (cold cache): 1.884332ms
time taken to produce 1 simulation (hot cache): 30.046µs
time taken to produce 1000 simulations (hot cache): 4.598173ms
time taken to produce 300000 simulations (hot cache): 596.130603ms
time taken to produce 999999 simulations (hot cache): 1.985971281s
First and last MWA beam responses:
[+0.951495+0.237371i, -0.168218-0.041903i
-0.168706-0.042030i, -0.951819-0.236739i]
[+0.951495+0.237371i, -0.168218-0.041903i
-0.168706-0.042030i, -0.951819-0.236739i]
Max memory use (kBytes): 92028

*** hyperbeam Rust results with CUDA ***
time taken to produce 300000 simulations (cold cache): 1.069158203s
time taken to produce 300000 simulations (hot cache): 63.699647ms
time taken to produce 999999 simulations (hot cache): 164.427598ms
First and last MWA beam responses:
[+0.951469+0.237365i, -0.168213-0.041902i
-0.168701-0.042029i, -0.951793-0.236733i]
[+0.951469+0.237365i, -0.168213-0.041902i
-0.168701-0.042029i, -0.951793-0.236733i]
Max memory use (kBytes): 205064
***
```

8 changes: 8 additions & 0 deletions comparisons/everybeam/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
CXX = g++

all: everybeam_example

everybeam_example: everybeam_example.cpp
$(CXX) -O3 -march=native \
-leverybeam -lcasa_ms -lcasa_measures -lcasa_casa \
everybeam_example.cpp -o everybeam_example
Loading