## Core data products

The core data included in DR2 consist of three different products: primary data cubes, binned data cubes and aperture spectra. The binned data cubes and aperture spectra are derived from the primary cubes. In this document we describe how the binned and aperture data were constructed and the file structure of each data product. The structure and construction of the red and blue files are identical. For further details see Allen et al. (2015), Sharp et al. (2015), Green et al. (2018) and Scott et al. (submitted).

**1. Cubes**

SAMI data cubes consist of four main extensions: flux (primary), variance, weight map, and covariance hypercube, as well as accompanying metadata and additional data quality extensions. Here we describe the basic format of each of these data products.

PRIMARY (Flux): The primary SAMI data product - a 2048×50×50 array, where the long axis corresponds to wavelength and the two short axes correspond to the x and y spatial directions respectively. The pixel scale on the two spatial axes is 0.5 arc seconds per pixel. For the blue cubes the wavelength axis pixel scale is 1.04 Å per pixel; for the red cubes it is 0.57 Å per pixel. The flux cubes are in units of $10^{-16}\ \mathrm{erg}\ \mathrm{s}^{-1}\ \mathrm{cm}^{-2}$, with the weight map already applied.

VARIANCE: A 2048×50×50 array, where the value of each pixel is the variance of the corresponding pixel in the flux cube. Variances are propagated throughout the reduction pipeline, from the raw frames through to the final data cubes.

WEIGHT: A 2048×50×50 array, where the value of each pixel is the effective exposure time of the corresponding pixel in the flux cube. The weight map is calculated using the drizzle algorithm. Considering each input fibre core in turn, we calculate the overlap area of the input fibre core with each element of a predefined regular grid of output pixels. The fractional area of an input fibre core covering each output pixel dictates how the flux should be redistributed to each output pixel. This fractional area provides a weight for each output pixel that represents the relative exposure of each output pixel. Output pixels that do not sit within any input fibre cores are assigned a weight of zero. Output pixels that fall on the border between multiple input cores have a weighted contribution from each core, but the total weight for such pixels will not exceed unity. To place dithered data onto the regularised grid, we simply recalculate the overlaps after perturbing the baseline position (by the known telescope offset) of the input fibre cores relative to the initial reference position. The final weights are therefore normalised to the number of contributing frames for a given object. Note that the flux cubes, by default, have the weight map pre-applied.

COVAR: An N×50×50×5×5 array, where each 5×5 sub-array contains the spatial covariance of the given pixel in the flux array, at a given wavelength slice, with all surrounding pixels. The covariance array contains the relative covariance, and each 5×5 sub-array should be multiplied by the corresponding variance value to recover the true covariance. Note that the covariance is not calculated for every individual wavelength slice but only for those slices at which the atmospheric dispersion correction has been recalculated. The total number of slices N for which the covariance has been calculated is specified by the COVAR_N keyword in the file header. See below for details on the format and reconstruction of the full covariance array.

QC: A table that records the relative transmission, PSF and sky subtraction accuracy for each individual observation contributing to the cube.

DUST: A 2048 element array containing the Milky Way extinction dust vector derived from Cardelli, Clayton & Mathis (1989). To correct the flux cube for MW dust extinction, multiply each spaxel by this vector. No foreground dust extinction correction has been applied to the default cubes.

**2. Metadata**

The FITS header of the flux cube contains key metadata describing the data, observations, and reduction process. Along with standard FITS header keywords, the cube header contains:

HGCUBING (Cubing Code Version): Mercurial changeset ID of the sami Python package used to generate the cube.

RSS_FILE n (Source RSS Frame): FITS file names of the RSS frames contributing to the data cube.

PLATEID and LABEL: Identifiers for the SAMI field.

STDNAME (Standard Star ID): SAMI ID of the secondary standard star for the field.

BUNIT: Units of the flux cube.

PSFFWHM: Full-width half-maximum of the Moffat profile fit to the standard star for the field.

PSFALPHA and PSFBETA: Parameters of the Moffat profile fit to the standard star in the field.

RESCALE (Data Rescale): Flux rescaling applied to the cube, derived from the ratio of the observed and catalogue g-band magnitudes of the secondary standard star.

IFUPROBE: The hexabundle number(s) which contributed data to the cube.
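The PSFFWHM, PSFALPHA and PSFBETA keywords are related through the Moffat profile itself. The sketch below assumes the common parameterisation $I(r) \propto [1 + (r/\alpha)^2]^{-\beta}$, for which the FWHM is $2\alpha\sqrt{2^{1/\beta} - 1}$. Whether the pipeline uses exactly this parameterisation, and the units of PSFALPHA, are assumptions that should be checked against the header values; the function names are illustrative only.

```python
import math


def moffat_profile(r, alpha, beta):
    """Moffat profile normalised to 1 at r = 0 (assumed parameterisation)."""
    return (1.0 + (r / alpha) ** 2) ** (-beta)


def moffat_fwhm(alpha, beta):
    """FWHM of the profile above, in the same units as alpha."""
    return 2.0 * alpha * math.sqrt(2.0 ** (1.0 / beta) - 1.0)


# Illustrative values only, not taken from any SAMI header.
alpha, beta = 1.5, 3.0
fwhm = moffat_fwhm(alpha, beta)
# By construction the profile falls to half its peak at r = FWHM / 2.
half_max = moffat_profile(0.5 * fwhm, alpha, beta)
```

Comparing the value computed from PSFALPHA and PSFBETA with PSFFWHM is a quick consistency check on the assumed parameterisation.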

**3. An Additional Note On Covariance**

Because each input fibre overlaps with more than one output spaxel, the flux measurements in the datacubes are covariant with nearby spaxels. This overlap is a generic issue for any data that are resampled onto a grid. A crucial consequence is that, when spectra from two or more spaxels are summed, the variance of the summed spectrum is not equal to the quadrature sum of the variances of the individual spectra. Similarly, a model fit to the data cube would need to account for the covariance between spaxels.

The spatial covariance between spaxels is stored in the covariance array for each cube. Spatial covariance introduced by the resampling process is the dominant but not exclusive source of covariance in SAMI data. We do not track the spectral covariance or any spatial covariance introduced during the fibre extraction process, but these are negligible with respect to the tracked covariance.

The covariance is stored in a condensed form to reduce the size of the output data cubes. Each covariance hypercube is stored as an N×50×50×5×5 array, where N is the number of wavelength slices for which the covariance has been calculated and stored (COVAR_N in the file header). For each wavelength slice m, the value of COVAR[i,j,k,l,m] records the relative covariance between the spaxel (i,j) and one nearby spaxel, for data that have been weighted by the relative exposure times. This value should be multiplied by the weighted variance of the (i,j)th spaxel to recover the true covariance. The central value in the k and l directions (k=2, l=2 for 0-indexed arrays) corresponds to the relative covariance of the (i,j)th spaxel with itself, which is always equal to unity. Surrounding values give the relative covariance of the (i,j)th spaxel with all locations up to two spaxels away. Covariance on larger scales is negligible. The wavelength slices at which the covariance was calculated were chosen to sample accurately the regions where the covariance changes most rapidly, which occurs where the drizzle locations were recalculated due to atmospheric dispersion. The full covariance array can be reconstructed using the reconstruct_covariance function in the sami.dr.binning module of the sami Python package.
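To make the role of the covariance concrete, the following is a minimal sketch of how the variance of a summed spectrum could be computed at a single wavelength slice, using $\mathrm{Var}(\sum_a f_a) = \sum_a \sum_b \mathrm{Cov}(f_a, f_b)$. It is not the pipeline's implementation. It assumes the relative covariance for that slice has already been extracted into an array indexed `rel_cov[i, j, k, l]`, that the offsets `(k - 2, l - 2)` apply along the same axes as `(i, j)`, and that the on-disk axis order has been checked separately; the function name is hypothetical.

```python
import numpy as np


def summed_variance(variance, rel_cov, mask):
    """Variance of the sum over masked spaxels at one wavelength slice.

    variance : (ny, nx) weighted variance of each spaxel
    rel_cov  : (ny, nx, 5, 5) relative covariance (assumed axis order)
    mask     : (ny, nx) boolean, True for spaxels included in the sum
    """
    ny, nx = variance.shape
    half_k = rel_cov.shape[2] // 2
    half_l = rel_cov.shape[3] // 2
    total = 0.0
    for i, j in zip(*np.nonzero(mask)):
        for k in range(rel_cov.shape[2]):
            for l in range(rel_cov.shape[3]):
                # Neighbouring spaxel addressed by this covariance element.
                ii, jj = i + k - half_k, j + l - half_l
                if not (0 <= ii < ny and 0 <= jj < nx and mask[ii, jj]):
                    continue
                # Relative covariance times variance gives true covariance.
                cov = rel_cov[i, j, k, l] * variance[i, j]
                if np.isfinite(cov):
                    total += cov
    return total
```

When every off-centre element of `rel_cov` is zero this reduces to the plain sum of the individual variances; positive off-centre terms increase the result, which is why ignoring them underestimates the error on a summed spectrum.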

**4. Binned cubes**

As some science applications require higher S/N than is present in the default cubes, we provide a set of binned data cubes that apply three simple binning schemes to the data. The three binning schemes are:

ADAPTIVE: An adaptive binning scheme using a Voronoi tessellation (following Cappellari & Copin 2003) with a target S/N = 10, measured on the blue continuum.

ANNULAR: Five linearly-spaced elliptical annuli are constructed based on the continuum flux distribution of the galaxy. The centre, position angle and ellipticity of each annulus are the same, and are derived from the blue continuum using find_galaxy.py (Cappellari 2002).

SECTORS: As the annular scheme, but each annulus is further subdivided into 8 sectors, where each sector within an annulus has equal area.
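The geometry of the ANNULAR and SECTORS schemes can be sketched as follows. This is an illustration under stated assumptions, not the survey code: the centre, position angle and ellipticity are taken as given (the survey derives them with find_galaxy.py), the outer radius `r_max` and the position-angle convention are hypothetical, and sector boundaries are drawn at equal angles in the circularised frame, which is one way to give sectors within an annulus equal area.

```python
import numpy as np


def annular_sector_masks(shape, centre, pa_deg, ellipticity, r_max,
                         n_annuli=5, n_sectors=8):
    """Return (annular_mask, sector_mask); 0 marks spaxels outside r_max."""
    y, x = np.indices(shape, dtype=float)
    dx, dy = x - centre[0], y - centre[1]
    theta = np.deg2rad(pa_deg)  # assumed: angle of the major axis from +x
    major = dx * np.cos(theta) + dy * np.sin(theta)
    minor = -dx * np.sin(theta) + dy * np.cos(theta)
    # Stretch the minor axis so each ellipse becomes a circle.
    minor_c = minor / (1.0 - ellipticity)
    radius = np.hypot(major, minor_c)
    inside = radius < r_max
    # Linearly spaced annuli in elliptical radius, numbered from 0.
    annulus = np.floor(radius / r_max * n_annuli).astype(int)
    # Equal angular steps in the circularised frame give equal areas.
    angle = np.arctan2(minor_c, major) % (2.0 * np.pi)
    sector = np.floor(angle / (2.0 * np.pi) * n_sectors).astype(int) % n_sectors
    annular_mask = np.where(inside, annulus + 1, 0)
    sector_mask = np.where(inside, annulus * n_sectors + sector + 1, 0)
    return annular_mask, sector_mask


annular_mask, sector_mask = annular_sector_masks(
    (50, 50), centre=(24.5, 24.5), pa_deg=0.0, ellipticity=0.0, r_max=20.0)
```

With the default arguments this yields 5 annular bins and 40 sector bins, matching the counts implied by the scheme descriptions.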

All binning schemes are derived from the blue continuum, but applied to both the blue and red cubes. Binned fluxes are derived by summing the unweighted fluxes of all spaxels within a bin, then reweighting to produce physical fluxes. Binned variances are derived similarly, with an additional weighting due to covariance between nearby spaxels. This approach improves the S/N of the binned spectra compared to a simple summation of the flux and variance in the default cube.

The binned cubes have the following extensions (units and metadata the same as the default cubes unless otherwise stated):

PRIMARY: A 2048×50×50 array, where the long axis corresponds to wavelength and the two short axes correspond to the x and y spatial directions respectively. Each pixel contains the binned flux associated with the bin it belongs to - pixels in the same bin have identical flux.

VARIANCE: A 2048×50×50 array, where the value of each pixel is the variance of the corresponding pixel in the flux array. Pixels belonging to the same bin have identical variance. NB Due to an error in the version of the SAMI Python pipeline used to produce DR2 data, the variance of large bins is underestimated by up to 5%. This error will be corrected in future releases.

BIN_MASK: A 50×50 array, where the value of each pixel indicates the bin to which it belongs. The bin mask is used to construct the binned fluxes and variances in the above two extensions from the default cubes.
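Because every pixel in a bin carries the same spectrum, one spectrum per bin can be recovered by reading a single spaxel from each bin. The sketch below assumes the cube has been loaded as an array of shape (wavelength, y, x), that the mask shares the same spatial orientation, and that a mask value of 0 marks spaxels belonging to no bin; the last point in particular is an assumption and the function name is hypothetical.

```python
import numpy as np


def spectra_per_bin(binned_flux, bin_mask):
    """Return {bin_id: spectrum} with one spectrum for each bin."""
    spectra = {}
    for bin_id in np.unique(bin_mask):
        if bin_id <= 0:  # assumption: 0 means "not in any bin"
            continue
        # Any spaxel in the bin will do, since they are all identical.
        i, j = np.argwhere(bin_mask == bin_id)[0]
        spectra[int(bin_id)] = binned_flux[:, i, j]
    return spectra


# Tiny synthetic example: two bins on a 3x3 grid with 4 wavelength pixels.
demo_mask = np.array([[0, 1, 1],
                      [2, 2, 1],
                      [0, 0, 0]])
demo_flux = np.zeros((4, 3, 3))
demo_flux[:, demo_mask == 1] = 10.0
demo_flux[:, demo_mask == 2] = 20.0
demo_spectra = spectra_per_bin(demo_flux, demo_mask)
```

The same loop applies to the VARIANCE extension, since variances are also identical within a bin.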

**5. Aperture spectra**

To facilitate comparison to existing large single-aperture galaxy surveys and to provide consistent 'global' measurements of DR2 galaxy properties, we provide a set of six aperture spectra.

The aperture spectra are constructed in the same way as the binned cubes described above, with the exception that the weighted (i.e. default) flux and variance are summed to produce the 'binned' spectra, as opposed to the unweighted values used for the binned cubes. This more closely matches the sampling of single-fibre spectra, though it is not optimal for maximising S/N. Aperture spectra are constructed by summing the fluxes of all spaxels that fall inside the designated aperture, then correcting for the difference in area between the contributing spaxels and the true aperture.
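The summation and area correction can be sketched for a circular aperture as below. This is a simplified illustration rather than the pipeline code: it assumes a spaxel is included when its centre lies inside the aperture, that the cube is an array of shape (wavelength, y, x), and it handles flux only (the variance additionally needs the covariance treatment described earlier). The function name and the centre convention are hypothetical.

```python
import numpy as np

PIXEL_SCALE = 0.5  # arc seconds per spaxel, as stated for the cubes


def circular_aperture_spectrum(flux, diameter_arcsec, centre):
    """Sum spaxels inside a circular aperture and apply an area correction.

    flux   : (n_wave, ny, nx) weighted (default) flux cube
    centre : (x, y) aperture centre in pixel coordinates
    """
    _, ny, nx = flux.shape
    y, x = np.indices((ny, nx), dtype=float)
    radius_pix = 0.5 * diameter_arcsec / PIXEL_SCALE
    # Assumed inclusion rule: spaxel centre inside the aperture.
    mask = np.hypot(x - centre[0], y - centre[1]) <= radius_pix
    n_spaxels = mask.sum()
    summed = np.nansum(flux[:, mask], axis=1)
    # Ratio of the true aperture area to the area of the summed spaxels,
    # both expressed in units of one spaxel.
    area_correction = np.pi * radius_pix ** 2 / n_spaxels
    return summed * area_correction, mask


demo_cube = np.ones((3, 50, 50))
demo_spec, demo_aperture = circular_aperture_spectrum(
    demo_cube, diameter_arcsec=4.0, centre=(24.5, 24.5))
```

For a cube of uniform unit flux the corrected spectrum equals the aperture area in spaxel units, independent of how many spaxel centres happen to fall inside the circle.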

The six aperture spectra we provide are:

3KPC_ROUND: A circular aperture centred on the galaxy with a fixed physical diameter of 3 kiloparsecs. The aperture diameter (in arc seconds) is determined using a distance derived from the galaxy redshift and assuming the standard SAMI Galaxy Survey concordance cosmology of Hinshaw et al. (2009).

RE: An elliptical aperture centred on the galaxy, with position angle, ellipticity and radius taken from the GAMA Sersic photometric fits v9 (Kelvin et al. 2012). Note, this Re measurement is different to that included in the DR2 sample table.

1_4_ARCSECOND: A 1.4 arc second diameter circular aperture centred on the galaxy.

2_ARCSECOND: A 2 arc second diameter circular aperture centred on the galaxy.

3_ARCSECOND: A 3 arc second diameter circular aperture centred on the galaxy.

4_ARCSECOND: A 4 arc second diameter circular aperture centred on the galaxy.
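For the 3KPC_ROUND aperture, the conversion from a physical size to an angular diameter can be sketched as follows. The calculation assumes a flat ΛCDM cosmology and a redshift greater than zero; the default `h0` and `omega_m` values are placeholders chosen for illustration and are not the parameters adopted by the survey, which should be taken from the cited cosmology. The function name is hypothetical.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s


def aperture_diameter_arcsec(z, size_kpc=3.0, h0=70.0, omega_m=0.3,
                             n_steps=2001):
    """Angular size (arc seconds) of size_kpc at redshift z > 0, flat LCDM."""
    zs = np.linspace(0.0, z, n_steps)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zs) ** 3 + (1.0 - omega_m))
    # Trapezoidal integration of 1/E(z) from 0 to z.
    dz = zs[1] - zs[0]
    integral = dz * (inv_e.sum() - 0.5 * (inv_e[0] + inv_e[-1]))
    comoving_mpc = C_KM_S / h0 * integral
    # Angular diameter distance in a flat universe.
    d_a_mpc = comoving_mpc / (1.0 + z)
    theta_rad = (size_kpc * 1.0e-3) / d_a_mpc
    return np.degrees(theta_rad) * 3600.0


demo_theta = aperture_diameter_arcsec(0.05)
```

With these placeholder parameters, 3 kpc at z = 0.05 subtends roughly 3 arc seconds, and the angle shrinks as the redshift increases over the low-redshift range relevant here.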

All aperture spectra files have the following extensions (units and metadata as the default cubes unless otherwise stated):

PRIMARY: A 2048 element array containing the aperture flux as a function of wavelength.

VAR: A 2048 element array containing the aperture variance as a function of wavelength. NB As for the binned cubes, the variance of large apertures may be underestimated by up to 5% for SAMI DR2 data.

MASK: A 50×50 element array containing the bin mask used to construct the aperture spectra. A 1 indicates a spaxel was included in the aperture, a 0 indicates a spaxel was not included.