The DelayedArray package allows developers to store array-like data using in-memory or on-disk representations (e.g., in HDF5 files) and provides a common and familiar array-like interface for interacting with these data.

The DelayedMatrixStats package is designed to make life easier for Bioconductor developers wanting to use DelayedArray by providing a rich set of column-wise and row-wise summary functions.

We briefly demonstrate and explain these two features using a simple example. We'll simulate some (unrealistic) RNA-seq read counts data from 10,000 genes and 20 samples and store it on disk as a HDF5Array:

library(DelayedArray)
x
#> matrix of class HDF5Matrix and type "integer":

Suppose you wish to compute the standard deviation of the read counts for each gene. You might think to use apply() like in the following:

system.time(row_sds <- apply(x, 1, sd))

Or perhaps you already know that the matrixStats package provides a rowSds() function:

matrixStats::rowSds(x)
#> Error in rowVars(x, rows = rows, cols = cols, na.rm = na.rm, center = center, : Argument 'x' must be a matrix or a vector.

Unfortunately (and perhaps unsurprisingly) this doesn't work. matrixStats is designed for use on in-memory matrix objects. Well, why don't we just first realize our data in-memory and then use matrixStats?

system.time(row_sds <- matrixStats::rowSds(as.matrix(x)))

This works and is many times faster than the apply()-based approach! However, it rather defeats the purpose of using a HDF5Array for storing the data since we have to bring all the data into memory at once to compute the result.

Instead, we can use DelayedMatrixStats::rowSds(), which has the speed benefits of matrixStats::rowSds() (note 1) but without having to load the entire data into memory at once (note 2):

library(DelayedMatrixStats)
system.time(row_sds <- rowSds(x))

Note 1: In fact, it currently uses matrixStats::rowSds() under the hood.

Note 2: In this case, it loads blocks of data row-by-row. The amount of data loaded into memory at any one time is controlled by the default block size global setting; see ?DelayedArray::getAutoBlockSize for details. Notably, if the data are small enough (and the default block size is large enough) then all the data is loaded as a single block, but this approach generalizes and still works when the data are too large to be loaded into memory in one block.

Finally, by using DelayedMatrixStats we can use the same code (e.g., colMedians(x)) regardless of whether the input is an ordinary matrix or a DelayedMatrix. This is useful for packages wishing to support both types of objects, e.g., packages wanting to retain backward compatibility or during a transition period from matrix-based to DelayedMatrix-based objects.

DelayedArray::RleArray (SolidRleArraySeed): optimized
DelayedArray::RleArray (ChunkedRleArraySeed): optimized

As well as offering a familiar API, DelayedMatrixStats provides ‘seed-aware’ methods that are optimized for specific types of DelayedMatrix objects.

To illustrate this idea, we will compare two ways of computing the column sums of a DelayedMatrix object:

1. Block processing. This was developed in the DelayedArray package and is available for all methods in DelayedMatrixStats through the force_block_processing argument.
2. The ‘seed-aware’ approach. This is implemented in DelayedMatrixStats and is optimized for both speed and memory, but only for DelayedMatrix objects with certain types of seed.

We will demonstrate this by computing the column sums of matrices with 20,000 rows and 600 columns where the data have different structure and are stored in DelayedMatrix objects with different types of seed:

1. Dense data with values in \((0, 1)\) using an ordinary matrix as the seed.
2. Sparse data with values in \([0, 1)\), where \(60\%\) are zeros, using a dgCMatrix, a sparse matrix representation from the Matrix package, as the seed.
3. Dense data containing multiple runs of identical values, using a RleArraySeed from the DelayedArray package as the seed.

We use the microbenchmark package to measure running time and the profmem package to measure total memory allocations.

In each case, the ‘seed-aware’ method is many times faster and allocates less memory.

# Fast, memory-efficient column sums of DelayedMatrix with ordinary matrix seed
#> matrix of class DelayedMatrix and type "double":
total(profmem(colSums2(dm_matrix, force_block_processing = TRUE)))
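
The idea of processing data block by block, rather than loading everything at once, can be sketched in base R. This is only an illustration under stated assumptions: the helper `row_sds_blocked()` and its `block_size` argument are hypothetical names introduced here, the data live in an ordinary in-memory matrix standing in for on-disk storage, and this is not how DelayedArray or DelayedMatrixStats implement block processing internally.

```r
# Illustrative stand-in for block processing, using base R only.
# Assumption: 'counts' is a small simulated matrix, not real RNA-seq data.
set.seed(42)
counts <- matrix(rpois(10000 * 20, lambda = 30), nrow = 10000, ncol = 20)

# Hypothetical helper: row standard deviations, one block of rows at a time.
row_sds_blocked <- function(x, block_size = 1000L) {
  n <- nrow(x)
  out <- numeric(n)
  for (start in seq(1L, n, by = block_size)) {
    end <- min(start + block_size - 1L, n)
    # Only this slice of rows is materialized in each iteration.
    block <- x[start:end, , drop = FALSE]
    out[start:end] <- apply(block, 1, sd)
  }
  out
}

blocked <- row_sds_blocked(counts)
direct <- apply(counts, 1, sd)

# The blocked result should agree with the all-at-once computation.
print(all.equal(blocked, direct))
```

Peak memory in a sketch like this is bounded by the block size rather than by the full data size, which is the property that matters when the data do not fit in memory.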
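
One way to see why a method that knows about the underlying storage can beat generic block processing is to look at run-length encoded data. The base R sketch below is a conceptual illustration only, using `rle()` from base R on made-up data; it is not the code DelayedMatrixStats uses for RleArraySeed-backed objects.

```r
# Conceptual illustration with base R's rle(); the data are hypothetical.
set.seed(1)
values <- sample(1:100, 50, replace = TRUE)
run_lengths <- sample(1:400, 50, replace = TRUE)
column <- rep(values, times = run_lengths)  # dense column with long runs

runs <- rle(column)

# Generic approach: expand the runs back to a full vector, then sum.
sum_expanded <- sum(inverse.rle(runs))

# Storage-aware approach: each run contributes value * length, so the sum
# can be computed from the runs alone without expanding them.
sum_from_runs <- sum(runs$values * runs$lengths)

print(sum_expanded == sum_from_runs)
```

The second computation touches one element per run instead of one per row, so its cost scales with the number of runs rather than the length of the column.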
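
The two points made in this article (the same call works on an ordinary matrix and on a DelayedMatrix, and column sums can be computed with or without forced block processing) can be tried on a small example. This is a sketch that assumes the Bioconductor packages DelayedArray and DelayedMatrixStats are installed; the object names `m` and `dm` and the matrix dimensions are chosen here for illustration and are much smaller than the 20,000 x 600 matrices described in the text.

```r
# Assumption: DelayedArray and DelayedMatrixStats are installed.
# The guard simply skips the example when they are not available.
if (requireNamespace("DelayedArray", quietly = TRUE) &&
    requireNamespace("DelayedMatrixStats", quietly = TRUE)) {
  library(DelayedArray)
  library(DelayedMatrixStats)

  set.seed(123)
  m <- matrix(runif(200 * 10), nrow = 200, ncol = 10)  # ordinary matrix
  dm <- DelayedArray(m)                                 # same data, DelayedMatrix

  # Same code for both input types.
  med_matrix <- colMedians(m)
  med_delayed <- colMedians(dm)

  # Same column sums via the default method and via forced block processing.
  sums_default <- colSums2(dm)
  sums_blocked <- colSums2(dm, force_block_processing = TRUE)

  print(all.equal(as.numeric(med_matrix), as.numeric(med_delayed)))
  print(all.equal(as.numeric(sums_default), as.numeric(sums_blocked)))
}
```

To compare the two strategies for speed and memory, the calls to `colSums2()` could be wrapped in `microbenchmark()` and `total(profmem(...))` as the article describes; no timings are shown here because they depend on the machine and package versions.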