SAMI Data Reduction
Changes between DR2 and DR3
The SAMI DR3 data reduction has a number of improvements. The specific changes are:
- Improved scattered light modelling during spectral extraction.
- Improved wavelength calibration, making use of twilight frames.
- Improved sky subtraction.
- Improved telluric absorption correction using the molecfit code.
- Improved flux calibration using secondary calibration stars for relative and absolute flux calibration.
- Improved WCS calculation and cube centring.
- Improved bad pixel rejection within cube construction.
Below we describe the data reduction. Further details of the changes, as well as various quality control checks, are discussed in the DR3 paper (Croom et al. 2020).
1. Introduction
SAMI data reduction divides neatly into two overarching stages: from raw data straight off the telescope to row-stacked spectra (RSS), and from RSS frames to individual galaxy data cubes. RSS frames are two-dimensional arrays containing one-dimensional, wavelength-calibrated spectra from all SAMI fibres for a single exposure. RSS frames represent an intermediate step in the data reduction process and are not included as part of the DR3 release. Data cubes are then formed by extracting all spectra for a given galaxy from each of the contributing RSS frames, drizzling, combining, then resampling onto a regular grid. For further details see the following papers: Allen et al. (2015), Sharp et al. (2015), Green et al. (2018), Scott et al. (2018), Croom et al. (submitted). The process is run through the sami Python package, and the latest version is available on GitHub.
2. Raw data to RSS frames
The following steps are applied by the Two-Degree Field Data Reduction (2dfDR) package to individual raw frames to produce RSS frames:
Bias, dark, and overscan subtraction: Bias and dark frames are subtracted to correct errant CCD pixels for blue arm data taken prior to mid-2014. Both CCDs were upgraded in 2014, making this step largely redundant. An overscan correction is applied, subtracting the bias level in each frame.
Flat-fielding: Each frame is divided by a detector flat generated by averaging (typically >30) fibre flats, for which the spectrograph has been defocussed so that the illumination is relatively uniform. These frames are then filtered to remove large-scale variations, leaving only smaller-scale pixel-to-pixel flat-field variations. Charge spots due to cosmic rays are removed from each individual science frame using a tuned implementation of the LaCosmic routine (see van Dokkum 2001).
Tramline map construction: Fibre locations are traced across the detector, generating a so-called tramline map giving the pixel-by-pixel [x,y] location of each fibre. This is performed using a twilight sky observation. The fibre peaks are identified and fitted approximately using a quadratic fit to the 3 pixels around each peak. Then as a second stage we implement an algorithm that assumes a Gaussian fibre profile (a good approximation to SAMI fibres in AAOmega) and fits five Gaussians (the central one and two either side) to precisely determine both the centre and width of the fibre profile.
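The first-pass peak estimate described above can be illustrated with a short sketch. It fits a parabola through the three pixels around a peak and returns the sub-pixel vertex position. This is only an illustration, not the 2dfDR implementation: the function name `refine_peak` and the synthetic profile are assumptions, and the second-stage multi-Gaussian fit is not shown.

```python
import numpy as np

def refine_peak(profile, i):
    """Sub-pixel peak position from a parabola through pixels i-1, i, i+1."""
    y0, y1, y2 = profile[i - 1], profile[i], profile[i + 1]
    denom = y0 - 2.0 * y1 + y2
    if denom == 0:
        # Flat triplet: no curvature to fit, fall back to the pixel centre.
        return float(i)
    return i + 0.5 * (y0 - y2) / denom

# Synthetic Gaussian fibre profile (illustrative centre and width).
x = np.arange(11)
true_centre = 5.3
profile = np.exp(-0.5 * ((x - true_centre) / 1.2) ** 2)

i = int(np.argmax(profile))          # brightest pixel
estimate = refine_peak(profile, i)   # approximate sub-pixel centre
```

Because a Gaussian is not exactly parabolic, the quadratic estimate is only approximate, which is consistent with the text treating it as a starting point for the Gaussian fit.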
Spectral extraction and flat-fielding: Flux from the 2D image is extracted to generate a 1D spectrum for each fibre. An optimal extraction (see Sharp & Birchall 2010) is performed to fit the flux amplitudes perpendicular to the dispersion axis. Gaussian profiles are fit, holding the centre and width constant (i.e., at the values obtained from the tramline and fibre width maps measured above) and fitting all 819 fibres simultaneously. For DR3 an improved method to model the scattered light was implemented. This involved identifying gaps between fibre blocks (one gap every 63 fibres) and fitting a cubic spline along the gap (spectral direction). A second cubic spline was then fitted in the spatial direction. An additional Lorentzian model was adopted for the scattered light around the 5577Å OI night sky line.
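With the centres and widths held fixed, fitting the Gaussian amplitudes for one detector column is a linear least-squares problem. The sketch below shows this for three synthetic fibres. It is a simplified illustration under stated assumptions: it is unweighted, ignores scattered light, and the function name `extract_amplitudes` and all numerical values are hypothetical rather than taken from 2dfDR.

```python
import numpy as np

def extract_amplitudes(column, centres, sigmas):
    """Fit Gaussian amplitudes to one detector column, with fixed centres and widths."""
    y = np.arange(column.size)[:, None]
    # Design matrix: one unit-amplitude Gaussian profile per fibre.
    basis = np.exp(-0.5 * ((y - centres[None, :]) / sigmas[None, :]) ** 2)
    # All fibres are solved for simultaneously, so overlapping wings are shared correctly.
    amps, *_ = np.linalg.lstsq(basis, column, rcond=None)
    return amps

# Synthetic, noise-free column containing three overlapping fibre profiles.
centres = np.array([5.0, 9.5, 14.0])
sigmas = np.array([1.1, 1.1, 1.1])
true_amps = np.array([100.0, 40.0, 75.0])
y = np.arange(20)[:, None]
column = (true_amps * np.exp(-0.5 * ((y - centres) / sigmas) ** 2)).sum(axis=1)

amps = extract_amplitudes(column, centres, sigmas)
```

A fuller treatment would weight each pixel by its inverse variance and include the scattered light model as additional terms in the fit.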
Following extraction, the 1D spectra are divided by an extracted and normalised 1D dome lamp flat-field spectrum, which removes residual fibre-to-fibre variations in spectral response.
Wavelength calibration: Emission lines in a CuAr arc lamp exposure are identified in the extracted 1D spectra and matched to line lists, and a 3rd-order polynomial wavelength solution is fitted for each fibre. The extracted object spectra are interpolated onto a single, fixed wavelength grid after applying a heliocentric velocity correction. For DR3 an additional step calculated fibre-to-fibre differences in wavelength calibration using twilight sky exposures.
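As a rough illustration of the per-fibre wavelength solution, the sketch below fits a 3rd-order polynomial to matched arc-line positions and resamples a spectrum onto a common grid. The line positions are synthetic (they are not real CuAr wavelengths), linear interpolation stands in for whatever resampling scheme the pipeline uses, and the heliocentric correction is omitted.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic arc lines: pixel centroids and matched wavelengths from a known cubic.
true_solution = Polynomial([3700.0, 1.05, -2.0e-5, 1.0e-9])
line_pixels = np.array([120.0, 410.0, 655.0, 980.0, 1330.0, 1710.0, 1950.0])
line_waves = true_solution(line_pixels)

# Fit a 3rd-order polynomial mapping pixel -> wavelength for this fibre.
solution = Polynomial.fit(line_pixels, line_waves, deg=3)

# Wavelength of every pixel in this fibre.
pixels = np.arange(2048)
fibre_waves = solution(pixels)

# Resample the fibre spectrum onto a single fixed grid shared by all fibres.
spectrum = np.ones(pixels.size)
common_grid = np.linspace(3750.0, 5750.0, 2048)
resampled = np.interp(common_grid, fibre_waves, spectrum)
```

`Polynomial.fit` rescales the pixel axis internally, which keeps the cubic fit well conditioned over a 2048-pixel range.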
Fibre throughput correction: Fibre throughputs were calibrated primarily from the relative strength of twilight flat-field frames. If no twilight frame was observed for a given field, a dome flat-field frame was used. These throughput values were then used for subtraction of the night sky spectrum. If the sky residuals after sky subtraction (see below) were large, the fibre throughputs were remeasured using the integrated flux in the night sky lines. If all sky lines were affected by bad pixels (typically only an issue for the blue CCD, which covers only a single sky line), then the mean fibre throughputs, derived from all other frames for this field, were adopted. The sky subtraction was then repeated with the revised throughput values.
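The fallback of remeasuring throughputs from night sky lines can be sketched as follows. The example integrates the flux in a window around one sky line in each fibre and normalises by the median across fibres. The window width, the median normalisation, the absence of continuum subtraction, and the function name `throughput_from_sky_line` are all assumptions made for illustration.

```python
import numpy as np

def throughput_from_sky_line(rss, wave, line_centre, half_width):
    """Relative fibre throughput from the integrated flux in one sky line."""
    window = np.abs(wave - line_centre) <= half_width
    line_flux = rss[:, window].sum(axis=1)
    # Normalise so that a typical fibre has a throughput of 1.
    return line_flux / np.median(line_flux)

# Synthetic RSS frame: the same sky line seen through three fibres.
wave = np.linspace(5550.0, 5600.0, 101)
line = np.exp(-0.5 * ((wave - 5577.0) / 1.0) ** 2)
true_throughput = np.array([0.8, 1.0, 1.2])
rss = true_throughput[:, None] * line

measured = throughput_from_sky_line(rss, wave, 5577.0, 5.0)
```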
Sky subtraction: The sky spectrum is measured from 26 dedicated sky fibres, taking a median spectrum after throughput correction, and subtracted from all spectra. For DR3 a further step was applied that used principal component analysis to remove residuals from scattered light around the bright 5577Å OI night sky line.
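The basic median sky subtraction can be written in a few lines. The sketch below divides each fibre by its throughput, takes the median spectrum over the sky fibres, and subtracts it from every fibre. The array shapes and the function name `subtract_sky` are illustrative assumptions, and the DR3 principal component step is not shown.

```python
import numpy as np

def subtract_sky(rss, throughput, sky_fibres):
    """Throughput-correct an RSS frame and subtract the median sky spectrum."""
    corrected = rss / throughput[:, None]
    sky = np.median(corrected[sky_fibres], axis=0)
    return corrected - sky, sky

# Synthetic frame: 3 object fibres and 3 sky fibres, 5 wavelength pixels.
sky_true = np.array([1.0, 1.0, 8.0, 1.0, 1.0])   # one bright sky line
objects = np.zeros((6, 5))
objects[0], objects[1], objects[2] = 3.0, 5.0, 2.0
throughput = np.array([0.9, 1.0, 1.1, 0.95, 1.05, 1.0])
rss = throughput[:, None] * (objects + sky_true)

sky_fibres = [3, 4, 5]
sky_subtracted, sky = subtract_sky(rss, throughput, sky_fibres)
```

Taking the median rather than the mean makes the sky estimate robust against a small number of sky fibres contaminated by cosmic rays or faint sources.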
The above steps result in RSS frames consisting of 819 wavelength-calibrated, flat-fielded, one-dimensional spectra.
3. RSS frames to cubes
The 819 fibres in each object RSS frame constitute: 12 hexabundles (61-fibre integral-field units; IFUs) targeted on SAMI Galaxy Survey galaxy targets, 1 hexabundle targeted on a secondary standard star, and 26 sky fibres (which are not used beyond the steps outlined above). Each galaxy field (i.e. set of 12 galaxy targets and one secondary standard star) is observed at least 6 (and typically 7) times. In addition to galaxy object frames, several exposures containing only a single spectrophotometric standard star centred in one hexabundle are also observed throughout the night.
The process of combining and reconstructing these RSS frames into three-dimensional data cubes of individual galaxies is accomplished using the SAMI data reduction package. In addition, the SAMI package applies a telluric correction and absolute flux calibration step to each RSS frame and a final flux calibration step to each output data cube. The individual reduction steps are implemented as follows:
Initial flux calibration: Each spectrum is corrected for the large-scale (in wavelength) extinction by the atmosphere at Siding Spring Observatory at the observed airmass.
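A standard way to express this correction is to multiply the observed flux by $10^{0.4\,k(\lambda)\,X}$, where $k(\lambda)$ is the extinction in magnitudes per unit airmass and $X$ is the airmass. The sketch below applies that formula. The extinction values are illustrative placeholders, not the Siding Spring extinction curve, and the function name `correct_extinction` is hypothetical.

```python
import numpy as np

def correct_extinction(flux, k_lambda, airmass):
    """Undo atmospheric extinction; k_lambda is in magnitudes per unit airmass."""
    return flux * 10.0 ** (0.4 * k_lambda * airmass)

wave = np.array([4000.0, 5000.0, 6000.0, 7000.0])
k_lambda = np.array([0.35, 0.18, 0.12, 0.07])   # illustrative values only
observed = np.ones(wave.size)

corrected = correct_extinction(observed, k_lambda, airmass=1.3)
```

The correction is largest in the blue, where extinction is strongest, and grows with airmass.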
Primary flux calibration: The spectrum is extracted for each spectrophotometric standard star observed (accounting for light lost between fibres) and compared to the known stellar spectrum to determine the transfer function. The model for the flux takes into account differential atmospheric refraction based on the atmospheric parameters at the time of observation. Extracted spectra were first corrected for telluric absorption using molecfit (new to DR3) and then compared to high resolution reference spectra from the Supernovae Factory. Galaxy RSS frames were multiplied by the transfer function derived from the primary spectrophotometric observation closest in time to the galaxy frame.
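The selection and application of the transfer function can be sketched as below. The transfer function is taken here as the ratio of the reference spectrum to the observed standard star spectrum, so that multiplying a galaxy frame by it yields calibrated flux. Smoothing of the ratio, the fibre-loss model, and the function names are omitted or hypothetical, and the times are arbitrary illustrative values.

```python
import numpy as np

def transfer_function(observed_std, reference_std):
    """Multiplicative correction from observed counts to reference flux."""
    return reference_std / observed_std

def nearest_in_time(galaxy_time, standard_times):
    """Index of the standard star observation closest in time to the galaxy frame."""
    return int(np.argmin(np.abs(np.asarray(standard_times) - galaxy_time)))

# Three standard star observations during the night (times in hours).
standard_times = [0.0, 2.5, 6.0]
reference = np.array([4.0, 3.0, 2.0])
observed = [np.array([2.0, 1.5, 1.0]),
            np.array([1.0, 1.0, 1.0]),
            np.array([8.0, 6.0, 4.0])]
transfers = [transfer_function(obs, reference) for obs in observed]

# Calibrate a galaxy RSS frame taken at t = 3.0 h.
galaxy_rss = np.ones((2, 3))
idx = nearest_in_time(3.0, standard_times)
calibrated = galaxy_rss * transfers[idx]
```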
Telluric correction: The spectrum is extracted for the secondary standard star (one per field), again accounting for light lost between fibres. These secondary stars are selected to be relatively featureless F-dwarfs based on the stars' colours. The atmospheric absorption in these F-stars is fit using the molecfit code (Smette et al. 2015). Use of molecfit is new to DR3. The telluric model is applied to all the galaxy spectra in the same RSS frame as the secondary standard. The correction is applied across the whole of the red arm, correcting the strong features at 6850-6960 Å and 7130-7360 Å, but also weaker telluric absorption at other wavelengths.
Secondary flux calibration: The secondary stars are then used to derive an improved flux calibration by comparing them to model spectra based on Kurucz (1992) model atmospheres, using a similar approach to Abazajian et al. (2004). This is new to DR3. Secondary star observations from all frames that are part of a given field are used to estimate an optimal template. The optimal template is estimated using pPXF to fit the secondary stars, allowing for a multiplicative polynomial to remove residual flux calibration errors, so the fit depends only on the absorption features in the spectra. The best-fit template is then normalised by comparing it to SDSS or VST/ATLAS photometry. A transfer function is then derived by taking the ratio of the secondary star and the best-fit template. The transfer function is applied to all the spectra in the RSS frame.
Examination of spectral residuals from SSP fits in the observed frame finds that small-scale residuals in flux calibration are typically less than 1 percent. A set of correction vectors for these residuals is made available, although it should be noted that these typically do not make significant differences to spectral fits. For further discussion on this see Croom et al. (2021). The correction vector is available here:
Centering: The centroids of each of the 12 galaxies in the frame are fit using a two-dimensional Gaussian and a simple empirical model describing the telescope offset and atmospheric refraction. Centering is repeated for all frames for a given galaxy field to measure the variation in centroid for each galaxy from frame to frame. Each individual galaxy is aligned across frames using the measured centroids. In cases where there are multiple bright sources in an IFU, we mask the object that is not the primary target (new to DR3).
Cube creation: Each frame is drizzled onto a regular 0.5 arc second-square spaxel grid. The flux in each output spaxel is taken to be the mean of the flux in each input fibre, weighted by the fractional spatial overlap of that fibre with the spaxel. To regain some of the spatial resolution that would otherwise be lost in convolving the 1.6 arc second fibres with 0.5 arc second spaxels, the overlaps are calculated using a fibre footprint with a diameter of only 0.8 arc seconds. The centroid of a galaxy varies as a function of wavelength due to atmospheric refraction. This effect was corrected for by re-calculating the drizzle locations when the expected shift due to atmospheric refraction exceeded 1/50$^\mathrm{th}$ of a spaxel. The derived flux cube is then multiplied by a weight cube (see below), such that the output flux cube is in units of $10^{-16} \mathrm{erg}\ \mathrm{s}^{-1} \mathrm{cm}^{-2}\mathrm{Å}^{-1}$. When combining RSS frames into cubes we check for outliers by comparing the 7 spatially nearest spectra and clipping outliers after allowing for scaling. This removes cosmic ray events that may not have been removed at an earlier stage. This clipping is new to DR3.
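The overlap-weighted mean at the heart of the drizzle step can be illustrated for a single output spaxel at a single wavelength. The sketch below approximates the overlap area between a circular 0.8 arc second drop and a 0.5 arc second square spaxel by subsampling the spaxel, then forms the weighted mean of the fibre fluxes. The subsampling approach, the function names and the fibre positions are assumptions made for illustration. An exact geometric overlap, the weight cube, the refraction-dependent offsets and the outlier clipping are not shown.

```python
import numpy as np

def overlap_area(fibre_xy, spaxel_xy, drop_diameter=0.8, spaxel_size=0.5, n_sub=40):
    """Approximate area (arcsec^2) of a spaxel covered by a circular drop."""
    # Centres of n_sub x n_sub sub-cells tiling the spaxel.
    offsets = ((np.arange(n_sub) + 0.5) / n_sub - 0.5) * spaxel_size
    gx, gy = np.meshgrid(spaxel_xy[0] + offsets, spaxel_xy[1] + offsets)
    r2 = (gx - fibre_xy[0]) ** 2 + (gy - fibre_xy[1]) ** 2
    inside = r2 <= (drop_diameter / 2.0) ** 2
    return inside.mean() * spaxel_size ** 2

def drizzle_spaxel(fibre_positions, fibre_flux, spaxel_xy):
    """Overlap-weighted mean flux in one spaxel, plus the weights used."""
    weights = np.array([overlap_area(xy, spaxel_xy) for xy in fibre_positions])
    if weights.sum() == 0:
        return np.nan, weights          # no fibre touches this spaxel
    return np.sum(weights * fibre_flux) / weights.sum(), weights

# Two fibres: one centred on the spaxel, one far away (positions in arcsec).
fibre_positions = [(0.0, 0.0), (10.0, 10.0)]
fibre_flux = np.array([2.0, 100.0])
flux, weights = drizzle_spaxel(fibre_positions, fibre_flux, spaxel_xy=(0.0, 0.0))
```

Because every drop has the same area, dividing the overlap by the drop area to obtain a fraction would rescale all weights equally and leave the weighted mean unchanged.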
The above data reduction process is applied simultaneously to both blue and red arm data, with separate blue and red data cubes produced as output. The secondary and final flux calibration scaling values are derived from the blue arm data but are applied to both arms.
4. Changes between DR1 and DR2
Several improvements were made to the data reduction pipeline between the 1st and 2nd data releases. All changes are documented and their effects quantified in Scott et al. (2018), but here we briefly summarise these changes. For DR2, the version of the SAMI Python pipeline identified by Mercurial changeset ID 17EBC0FF0A1C was used in conjunction with 2dfDR v6.65.
Spectral extraction: Twilight frames are now used to derive all tram line maps, and the preliminary scattered light model has been improved, resulting in improved extraction and reduced noise below ~4000 Å.
Fibre flat-fielding: Twilight frames are used to derive fibre throughputs, resulting in improved absolute flux calibration below ~4000 Å.
Wavelength resampling: The wavelength solution is corrected for heliocentric motion, and all galaxies are sampled onto a single, fixed, common wavelength scale, resulting in a modified wavelength range and sampling.