The universe of supercomputing has expanded rapidly to incorporate AI, advanced data analytics and cloud computing. The era of serial data access is ending, with parallel data management replacing traditional network file systems (NFS).
This shift has coincided with the rise of AI, with investment in the technology hitting a new record in 2021. As an example, Microsoft invested $1 billion in an artificial intelligence project co-founded by Elon Musk.
The AI imperative
This shift in the boundaries of traditional computing and analytics has caused several data challenges that need to be resolved:
Data talent – there is a need to source new data science talent and to maintain up-to-date skill sets in a rapidly changing software environment.
Data sources – there is a need to ingest high-volume data from a broad range of sources, through a variety of ingest methods, at rates well beyond traditional computing requirements.
Data processing – there is a need for a different kind of data processing: large-scale GPU environments that deliver the parallelism needed for real-time training and inference.
Data governance – there is a need to label, track and manage that data (forever) and to share it across organisations under the right security policies: explainable AI at the application level and available data at the platform level.
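The parallelism point above can be made concrete. Below is a minimal Python sketch (all file names and the ingest step are hypothetical) contrasting serial ingest with concurrent ingest, the access pattern that parallel storage is built to serve:

```python
# Minimal sketch: serial vs. concurrent file ingest.
# The file names and the ingest() step are hypothetical illustrations;
# a real AI pipeline would feed decoded samples to a GPU data loader.
from concurrent.futures import ThreadPoolExecutor

def ingest(path: str) -> int:
    # Placeholder ingest step: in practice this would read, decode
    # and label one sample from shared storage.
    return len(path)

paths = [f"sample_{i}.dat" for i in range(1000)]

# Serial ingest: one request in flight at a time.
serial_total = sum(ingest(p) for p in paths)

# Concurrent ingest: many requests in flight at once, which is where
# a parallel file system outperforms a single NFS endpoint.
with ThreadPoolExecutor(max_workers=16) as pool:
    parallel_total = sum(pool.map(ingest, paths))

assert serial_total == parallel_total  # same data, different access pattern
```

The point of the sketch is the access pattern, not the arithmetic: with many requests in flight, aggregate throughput is limited by the storage system's ability to serve them in parallel.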
Data storage in the AI era
During The IT Press Tour in San Francisco, James Coomer, senior vice-president of products at DDN, explained that “data is the source code of AI, data is imperative for AI and storage is imperative for AI.
“Storage can’t be an afterthought, as it’s key for data ingestion, sourcing, management, labelling and longevity, which is critical for AI.”
Data or AI storage is necessary to solve the emerging data challenges caused by the move away from traditional computing and analytics:
- Data talent: AI storage provides data scientists with streamlined concurrent and continuous workflows.
- Data sources: AI storage helps scale projects economically with super-fast data ingest speeds.
- Data processing: AI storage provides tight AI integration with optimised performance.
- Data governance: AI storage secures a no-silo approach with advanced workload insight.
AI storage in action
Historically, DDN has focused on traditional data storage for unstructured data and big data in the enterprise, government and academic sectors.
Now it is redefining the imperatives driving it as a company, focusing on AI storage with its A³I solution, which sits at the heart of its growth strategy.
In action, over the last two years DDN has acted as the core backend storage system for NVIDIA, increasing performance, scale and flexibility to drive innovation.
NVIDIA commands “nearly 100%” of the market for training AI algorithms and has multiple AI clusters, according to Karl Freund, analyst at Cambrian AI Research.
Following this success, DDN is powering the UK’s most powerful supercomputer, Cambridge 1, which went live in 2021 and is focused on transforming AI-based healthcare research.
The AI storage vendor is also working with Recursion, the drug discovery company.
“Our at-scale data needs require fast ingest, optimised processing and reduced application run times,” said Kris Howard, Systems Engineer at Recursion.
Working with DDN, the drug discovery company cut costs by up to 20x and raised the possibilities for accelerating the drug discovery pipeline with new levels of AI capability.
It previously ran in the cloud, but now operates more efficiently, and with greater value for money, on-premises.
“DDN pioneered accelerated data-at-scale to tackle what ordinary storage cannot. We make data environments for innovators to create the future. We’re the largest AI storage provider in the world, proven in an array of different industries and customers, from financial services to life sciences,” added Coomer.
Case studies: in detail
1. Transforming cancer care with managed services: DDN as a service for precision oncology
- Seeking to conquer cancer globally through proprietary blood tests, massive data sets and advanced analytics.
- Use bioinformatics and HPC to analyse sequence genome data for liquid biopsy.
- Require performance and reliability.
- Previous systems experienced severe hardware failures.
- Parallel file system storage as a managed service.
- Benefits: evergreen upgrades, capacity and performance on demand, and an opex model.
2. Simplifying data management for a global financial services and venture firm
- A mathematics- and programming-centric financial organisation that brings a scientific approach to financial products.
- Metadata-heavy applications challenged the performance of existing NFS-based storage.
- 50+ PB dataset presented a management challenge with existing systems.
The DDN Solution:
- Efficient performance from fewer systems – easier to manage and scale in the future.
- A consultative approach and insight into the data helped the firm determine and address its specific issues.
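The “metadata-heavy” problem in the case above is easy to reproduce: workloads dominated by `stat()` and `open()` calls on large numbers of small files stress a storage system's metadata path rather than its bandwidth. A hypothetical Python sketch of such an access pattern (file names invented for illustration):

```python
# Sketch of a metadata-heavy access pattern (hypothetical file names):
# many stat() calls and tiny files, almost no sequential bandwidth.
# Over NFS, each metadata call is a round trip to a single server;
# parallel file systems spread this load across multiple servers.
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create many tiny files, as a quant-research dataset often does.
    for i in range(500):
        with open(os.path.join(root, f"tick_{i}.csv"), "w") as f:
            f.write("ts,price\n")

    # The workload: metadata operations dominate, not data transfer.
    total_bytes = 0
    for name in os.listdir(root):
        path = os.path.join(root, name)
        total_bytes += os.stat(path).st_size  # one metadata op per file

    print(total_bytes)  # 500 files x 9 bytes each
```

At 50+ PB, the same pattern scales to billions of files, which is why the number of metadata operations per second, not raw throughput, becomes the limiting factor.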
3. Transforming research data storage: from management and maintenance to universal resources
- Large California university life sciences research organisation.
- Supply storage and data management to a wide variety of projects and requirements.
- Prior system was home grown, and self-supported.
- Scale, performance and stability needs outgrew existing capabilities.
The DDN Solution:
- Simplified and reliable infrastructure with a roadmap for growth and added capability.
- Easy access for researchers, no need to change workflows.
- Plans to further accelerate research with GPU clusters aided by DDN expertise.
See also: Mining the metadata and more – Tips for good AI data storage practices