The latest updates, guides and stories from DataXight

Introducing PROTOplast: Scalable Machine Learning for Molecular Data Analysis
{News}
{scRNA-seq}
{PROTOplast}
3 mins read

We're excited to announce the early developer preview of PROTOplast, our new Python library designed for fast, scalable analysis of molecular data. PROTOplast addresses the unique challenges of working with large-scale molecular datasets while maintaining the flexibility needed for cutting-edge research. What is PROTOplast? PROTOplast is an open-source Python library, released under the Apache License 2.0, that bridges the gap between molecular data analysis and modern machine learning infrast

A Note on Parquet-based scRNA ML Pipelines
{Insight}
{scRNA-seq}
2 mins read

Single-cell RNA sequencing (scRNA-seq) is revolutionizing our understanding of cellular biology, but the computational challenges of processing these massive datasets continue to evolve. As datasets grow from thousands to millions of cells, the choice of data format and processing pipeline becomes critical. Parquet files, with their columnar storage and excellent compression ratios, seem like a natural fit for intermediate data storage in machine learning workflows. In a previous blog post, we

Tahoe-100M in Practice: Workflows, Pitfalls, and Pathways to Scalable scRNA Analysis
{scRNA-seq}
{Insight}
9 mins read

Single-cell transcriptomics (scRNA) studies now profile millions of cells, revealing identity, state, and tissue heterogeneity, and creating unprecedented opportunities to extract biological insights that would be invisible in smaller studies. Tahoe-100M, a groundbreaking resource hosted by Arc Institute that contains 100 million cells covering 379 distinct drugs and 50 cancer cell lines, is one such study. On the other hand, at Tahoe-100M scale, even routine queries pose significant computational ch



