Posted on 15 January, 2021
Organisations of all sizes, use cases, and technical skill levels are looking for infrastructure solutions to accelerate their AI, ML, and DL initiatives. WekaIO and NVIDIA partnered to architect and validate a high-performance, scalable AI solution accessible to everyone. The Weka AI™ reference architecture, powered by NVIDIA DGX A100 systems and Weka's industry-leading file system WekaFS™, was developed and verified by Weka and NVIDIA.
Weka AI simplifies DL and AI deployments by combining WekaFS-based storage systems with DGX A100 systems and NVIDIA Networking to create a tightly integrated solution that minimises time to production. The WekaFS system performance has currently been verified with up to four DGX A100 systems. By adding storage nodes, the architecture can grow to support many more DGX A100 systems while maintaining linear performance scalability. With the addition of any Amazon Simple Storage Service (S3)-compliant object storage, WekaFS can expand the global namespace to support a massive data lake.
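To illustrate how the storage layer integrates with a DGX A100 node, the Weka client presents WekaFS as a standard POSIX mount. A minimal sketch, assuming a Weka backend reachable as `weka-backend-1` exporting a file system named `mldata` (both names are hypothetical; consult the Weka documentation for your environment):

```shell
# Sketch only: the backend hostname, file system name, and mount point
# are hypothetical examples, not part of the reference architecture.

# Mount the WekaFS file system "mldata" from backend "weka-backend-1"
# on the DGX client using the wekafs mount type.
sudo mkdir -p /mnt/weka
sudo mount -t wekafs weka-backend-1/mldata /mnt/weka

# Training jobs on every DGX system then read datasets from the same
# shared namespace:
ls /mnt/weka
```

Because every DGX A100 system mounts the same namespace, datasets do not need to be copied or sharded across local drives as the cluster grows.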
This gives IT organisations an architectural framework that significantly reduces the time to productivity by eliminating the integration complexity of multiple infrastructure components. Organisations can start small and easily and independently scale compute and storage resources to multi-rack configurations with predictable performance to meet any ML workload requirement. Weka AI can start as small as 50TB of capacity and scale seamlessly in a single namespace, while effortlessly managing data across the edge, the core, and the cloud.
The DGX A100 system is a universal system for AI workloads - from analytics to training, inference, and HPC applications. A DGX A100 system contains eight NVIDIA A100 Tensor Core GPUs, with each system delivering over 5 petaFLOPS of DL training performance. The eight GPUs within a DGX A100 system are interconnected in a hybrid cube-mesh topology using next-generation NVIDIA NVLink™ technology, which doubles the GPU-to-GPU direct bandwidth to 600 gigabytes per second (GB/s), and a new NVIDIA NVSwitch™ chip that is 2x faster than the previous generation. The DGX A100 system also features eight single-port Mellanox ConnectX®-6 VPI HDR InfiniBand adapters for clustering and one dual-port ConnectX-6 VPI Ethernet adapter for storage.
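The headline figures above can be cross-checked with simple arithmetic. A minimal sketch using only the numbers quoted in this section (eight GPUs, over 5 petaFLOPS per system, 600 GB/s NVLink bandwidth); the per-GPU breakdown is an inferred illustration, not an official specification:

```python
# Back-of-the-envelope check of the DGX A100 figures quoted above.
# All inputs come from the text; the per-GPU split is an illustration.

GPUS_PER_SYSTEM = 8
SYSTEM_DL_PFLOPS = 5.0          # aggregate DL training performance
NVLINK_GPU_TO_GPU_GBPS = 600    # direct GPU-to-GPU bandwidth (GB/s)

# Per-GPU share of the system's DL performance, in teraFLOPS.
per_gpu_tflops = SYSTEM_DL_PFLOPS * 1000 / GPUS_PER_SYSTEM
print(f"Per-GPU DL performance: {per_gpu_tflops:.0f} TFLOPS")

# Four DGX A100 systems (the verified WekaFS configuration) in aggregate:
systems = 4
print(f"{systems} systems: {systems * SYSTEM_DL_PFLOPS:.0f} petaFLOPS, "
      f"{systems * GPUS_PER_SYSTEM} GPUs")
```

At the verified four-system scale, this works out to 32 GPUs and roughly 20 petaFLOPS of DL training performance served from the shared WekaFS namespace.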
To read more about scaling Deep Learning performance with Weka Software and Industry Standard Servers, download the full reference architecture here.