An LSST.org project

Compute Resource Usage of DP0.2 production run

  • Brian Yanny
  • Nikolay Kuropatkin
  • Huan Lin
  • Jennifer Adelman-McCarthy
  • Colin Slater
  • Hsin-Fang Chiang

Abstract

We present a summary of the wallclock and CPU time, memory, and storage usage of the DP0.2 production run in the iDF. Over the course of 150 calendar days in 2021-2022, preliminary Data Release Production (DRP) pipelines ran on 20,000 exposures of simulated Rubin data, representing 10-20 full nights of images and covering 150 tracts (300 square degrees), typically to 5-year depth (50 returns to the same spot on the sky for each of 6 filters). This ’half-percent’ survey used a compute cluster of approximately 4000 cores, each with up to 14 GB of RAM available per core (less was typically used), with jobs typically taking from a few seconds to a fraction of an hour. Small subsets (a few percent) of the processing pipetasks used nodes with up to 236 GB of RAM per core and were allowed to run for up to 3 days. The input images were approximately 75 TB in size, while the outputs, counting transient datasetTypes, took up 3.4 PB in an object store. After removing transient datasetTypes (e.g. warps), the final storage used by DP0.2 was approximately 2.5 PB. Of the 150 calendar days, approximately 60 were spent running production and the remainder were used for debugging. Thus, to keep up with the expected DRP cadence during the course of the survey (6-12 days to process 10-20 nights of data), a speed-up of a factor of 5-10 (or a corresponding increase in core count, or some combination of both) is estimated to be needed.
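The closing estimate can be reproduced with simple arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, and it assumes the factor is the ratio of the roughly 60 days of actual running to the 6-12 day target window quoted in the abstract.

```python
def required_speedup(observed_run_days: float, target_days: float) -> float:
    """Return the factor by which a run must speed up to fit in target_days."""
    if target_days <= 0:
        raise ValueError("target_days must be positive")
    return observed_run_days / target_days


# Figures taken from the abstract (both are approximate).
RUN_DAYS = 60          # days of actual running, excluding debugging
TARGET_DAYS = (6, 12)  # bounds of the target processing window, in days

# One speed-up factor per bound of the target window.
speedups = [required_speedup(RUN_DAYS, t) for t in TARGET_DAYS]
print(speedups)
```

The same ratio applies whether the gain comes from faster pipelines, more cores, or a combination of the two, provided the workload scales close to linearly with core count, which is itself an assumption.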
