--------- Averaging --------- Routines for averaging visibility data. Time and Channel Averaging -------------------------- The routines in this section average row-based samples by: 1. Averaging samples of consecutive **time** values into bins defined by an period of :code:`time_bin_secs` seconds. 2. Averaging channel data into equally sized bins of :code:`chan_bin_size`. In order to achieve this, a **baseline x time** ordering is established over the input data where **baseline** corresponds to the unique **(ANTENNA1, ANTENNA2)** pairs and **time** corresponds to the unique, monotonically increasing **TIME** values associated with the rows of a Measurement Set. ======== === === === === === Baseline T0 T1 T2 T3 T4 ======== === === === === === (0, 0) 0.1 0.2 0.3 0.4 0.5 (0, 1) 0.1 0.2 0.3 0.4 0.5 (0, 2) 0.1 0.2 X 0.4 0.5 (1, 1) 0.1 0.2 0.3 0.4 0.5 (1, 2) 0.1 0.2 0.3 0.4 0.5 (2, 2) 0.1 0.2 0.3 0.4 0.5 ======== === === === === === It is possible for times or baselines to be missing. In the above example, T2 is missing for baseline (0, 2). .. warning:: The above requires unique lexicographical combinations of (TIME, ANTENNA1, ANTENNA2). This can usually be achieved by suitably partitioning input data on indexing rows, DATA_DESC_ID and SCAN_NUMBER in particular. For each baseline, adjacent time's are assigned to a bin if :math:`h_c - h_e/2 - (l_c - l_e/2) <` :code:`time_bin_secs`, where :math:`h_c` and :math:`l_c` are the upper and lower time and :math:`h_e` and :math:`l_e` are the upper and lower intervals, taken from the **INTERVAL** column. Note that no distinction is made between flagged and unflagged data when establishing the endpoints in the bin. The reason for this is that the `Measurement Set v2.0 Specification `_ specifies that **TIME** and **INTERVAL** columns are defined as containing the *nominal* time and period at which the visibility was sampled. This means that their values includie valid, flagged and missing data. Thus, averaging a regular high-resolution **baseline x htime** grid should produce a regular low-resolution **baseline x ltime** grid (**htime > ltime**) in the presence of bad data By contrast, other columns such as **TIME_CENTROID** and **EXPOSURE** contain the *effective* time and period as they exclude missing and bad data. Their increased accuracy, and therefore variability means that they are unsuitable for establishing a grid over the data. To summarise, the averaged times in each bin establish a map: - from possibly unordered input rows. - to a reduced set of output rows ordered by averaged :code:`(TIME, ANTENNA1, ANTENNA2)`. Flagged Data Handling ~~~~~~~~~~~~~~~~~~~~~ Both **FLAG_ROW** and **FLAG** columns may be supplied to the averager, but they should be consistent with each other. The averager will throw an exception if this is not the case, rather than making an assumption as to which is correct. When provided with flags, the averager will output averages for bins that are completely flagged. Part of the reason for this is that the specifies that the **TIME** and **INTERVAL** columns represent the *nominal* time and interval values. This means that they should represent valid as well as flagged or missing data in their computation. By contrast, most other columns such as **TIME_CENTROID** and **EXPOSURE**, contain the *effective* values and should only include valid, unflagged data. To support this: 1. **TIME** and **INTERVAL** are averaged using both flagged and unflagged samples. 2. Other columns, such as **TIME_CENTROID** are handled as follows: 1. If the bin contains some unflagged data, only this data is used to calculate average. 2. If the bin is completely flagged, the average of all samples (which are all flagged) will be used. 3. In both cases, a completely flagged bin will have it's flag set. 4. To support the two cases, twice the memory of the output array is required to track both averages, but only one array of merged values is returned. Guarantees ~~~~~~~~~~ 1. Averaged output data will be lexicographically ordered by :code:`(TIME, ANTENNA1, ANTENNA2)` 2. **TIME** and **INTERVAL** columns always contain the *nominal* average and sum and therefore contain both and missing or unflagged data. 3. Other columns will contain the *effective* average and will contain only valid data *except* when all data in the bin is flagged. 4. Completely flagged bins will be set as flagged in both the *nominal* and *effective* case. 5. Certain columns are averaged, while others are summed, or simply assigned to the last value in the bin in the case of antenna indices. 6. **Visibility data** is averaged by multiplying and dividing by **WEIGHT_SPECTRUM** or **WEIGHT** or natural weighting, in order of priority. .. math:: \frac{\sum v_i w_i}{\sum w_i} 7. **SIGMA_SPECTRUM** is averaged by multiplying and dividing by **WEIGHT_SPECTRUM** or **WEIGHT** or natural weighting, in order of priority and availability. **SIGMA** is only averaged with **WEIGHT** or natural weighting. .. math:: \sqrt{\frac{\sum w_i^2 \sigma_i^2}{(\sum w_i)^2}} The following table summarizes the handling of each column in the main Measurement Set table: =============== ================= ============================ =========== Column Unflagged/Flagged Aggregation Method Required sample handling =============== ================= ============================ =========== TIME Nominal Mean Yes INTERVAL Nominal Sum Yes ANTENNA1 Nominal Assigned to Last Input Yes ANTENNA2 Nominal Assigned to Last Input Yes TIME_CENTROID Effective Mean No EXPOSURE Effective Sum No FLAG_ROW Effective Set if All Inputs Flagged No UVW Effective Mean No WEIGHT Effective Sum No SIGMA Effective Weighted Mean No DATA (vis) Effective Weighted Mean No FLAG Effective Set if All Inputs Flagged No WEIGHT_SPECTRUM Effective Sum No SIGMA_SPECTRUM Effective Weighted Mean No =============== ================= ============================ =========== The following SPECTRAL_WINDOW sub-table columns are averaged as follows: =============== ============================ Column Aggregation Method =============== ============================ CHAN_FREQ Mean CHAN_WIDTH Sum EFFECTIVE_BW Sum RESOLUTION Sum =============== ============================ Dask Implementation ~~~~~~~~~~~~~~~~~~~ The dask implementation chunks data up by row and channel and averages each chunk independently of values in other chunks. This should be kept in mind if one wishes to maintain a particular ordering in the output dask arrays. Typically, Measurement Set data is monotonically ordered in time. To maintain this guarantee in output dask arrays, the chunks will need to be separated by distinct time values. Practically speaking this means that the first and second chunk should not both contain value time 0.1, for example. Numpy ~~~~~ .. currentmodule:: africanus.averaging .. autosummary:: time_and_channel bda .. autofunction:: time_and_channel .. autofunction:: bda Dask ~~~~ .. currentmodule:: africanus.averaging.dask .. autosummary:: time_and_channel bda .. autofunction:: time_and_channel .. autofunction:: bda