Last update: April 29, 2021
Review the new XSEDE resources and information below prior to submitting your allocation request through the XSEDE User Portal. Also, consult the Estimated Resource Amounts Available for the current XRAC meeting on the Research allocations page.
The next Research allocation submission period is June 15, 2021 - July 15, 2021. Year-long allocations awarded in this period will begin on October 1, 2021.
First time here? Check out the Resource Info page to learn about the resources available, and then visit the Startup page to get going! Startup, Campus Champions, and Education Allocation requests may be submitted at any time throughout the year.
See the XSEDE Resources Catalog for a complete list of XSEDE compute, visualization and storage resources, and more details on the new systems.
XSEDE has just published a 2021 revision of the XSEDE allocation policies that reflects an effort to formally incorporate recommendations from an earlier review of the policies as well as de facto changes that have evolved since the last official update. In addition, we have attempted to focus this revised text on allocation policy and not the implementation of those policies in the day-to-day conduct of the allocation process. Thus, this policy revision does not repeat but complements and is explicitly supported by the online allocations documentation, the published XSEDE Allocation Practices and Procedures, and the XRAC Reviewer Manual.
We hope you find the resulting policy text to be easier to digest. At the same time, the policies themselves are not substantially different at a high level, and in practice the day-to-day experience of most users will likely not change.
XSEDE has several new resources debuting this allocation cycle:
- National Center for Supercomputing Applications (NCSA) Delta (Coming Soon)
- Johns Hopkins University's Rockfish (Coming Soon)
- University of Kentucky's KyRIC (Coming Soon)
- Open Storage Network (OSN)
- Pittsburgh Supercomputing Center's Bridges-2
- San Diego Supercomputer Center's Expanse (Now in Production!)
Johns Hopkins University, through the Maryland Advanced Research Computing Center (MARCC), will participate in the XSEDE Federation with its new flagship cluster Rockfish, funded by NSF MRI award #1920103. Rockfish integrates high-performance and data-intensive computing while developing tools for generating, analyzing and disseminating data sets of ever-increasing size. The cluster will contain compute nodes optimized for different research projects and complex, optimized workflows. Rockfish consists of 368 regular compute nodes with 192GB of memory, 10 large memory nodes with 1.5TB of memory, and 10 GPU nodes with 4 NVIDIA A100 GPUs. The nodes feature Intel Cascade Lake 6248R processors with 48 cores per node, a 3.0GHz processor base frequency, and 1TB of NVMe local storage. All compute nodes have HDR100 connectivity. In addition, the cluster has access to several GPFS file systems totaling 10PB of storage. 20% of these resources will be allocated via XSEDE.
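As a rough illustration of what the 20% share could mean for the regular compute nodes, the sketch below multiplies out the node and core counts quoted above. It assumes 100% availability over a non-leap year and that the 20% share applies uniformly to the regular nodes; neither assumption comes from the announcement, so treat the result as an upper bound only.

```python
# Node and core counts are taken from the Rockfish description above.
# Uptime and uniform application of the XSEDE share are assumptions.
REGULAR_NODES = 368
CORES_PER_NODE = 48
XSEDE_SHARE_PERCENT = 20
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year and no downtime


def xsede_core_hours_upper_bound() -> int:
    """Upper bound on regular-node core-hours per year available via XSEDE."""
    total_core_hours = REGULAR_NODES * CORES_PER_NODE * HOURS_PER_YEAR
    return total_core_hours * XSEDE_SHARE_PERCENT // 100


if __name__ == "__main__":
    print(f"{xsede_core_hours_upper_bound():,} core-hours per year (upper bound)")
```

Actual allocatable amounts are set by the resource provider and will be lower once maintenance and scheduling overhead are accounted for.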
The National Center for Supercomputing Applications (NCSA) is pleased to announce the availability of its newest resource, Delta, which is designed to deliver a highly capable GPU-focused compute environment for GPU and CPU workloads. Delta will provide three new resources for allocation, as specified below:
- Delta CPU: The Delta CPU resource will support general purpose computation across a broad range of domains able to benefit from the scalar and multi-core performance provided by the CPUs, such as appropriately scaled weather and climate, hydrodynamics, astrophysics, and engineering modeling and simulation, and other domains whose algorithms have not yet moved to the GPU. Delta also supports domains that employ data analysis, data analytics or other data-centric methods. Delta will feature a rich base of preinstalled applications, based on user demand. The system will be optimized for capacity computing, with rapid turnaround for small to modest scale jobs, and will feature support for shared-node usage. Local SSD storage on each compute node will benefit applications that have random access data patterns or that require fast access to significant amounts of compute-node local scratch space.
- Delta GPU: The Delta GPU resource will support accelerated computation across a broad range of domains such as soft-matter physics, molecular dynamics, replica-exchange molecular dynamics, machine learning, deep learning, natural language processing, textual analysis, visualization, ray tracing, and accelerated analysis of very large in-memory datasets. Delta is designed to support the transition of applications from CPU-only to using the GPU or hybrid CPU-GPU models. Delta will feature a rich base of preinstalled applications, based on user demand. The system will be optimized for capacity computing, with rapid turnaround for small to modest scale jobs, and will feature support for shared-node usage. Local SSD storage on each compute node will benefit applications that have random access data patterns or that require fast access to significant amounts of compute-node local scratch space.
- Delta Projects Storage: The Delta Storage resource provides storage allocations for allocated projects using the Delta CPU and Delta GPU resources. Unpurged storage is available for the duration of the allocation period.
KyRIC (Kentucky Research Informatics Cloud) Large Memory nodes will provide 3TB of shared memory for processing massive NLP data sets, genome sequencing, bioinformatics and memory intensive analysis of big data. Each of KyRIC's 5 large memory nodes will consist of Intel(R) Xeon(R) CPU E7-4820 v4 @ 2.00GHz processors (4 sockets, 10 cores/socket), 3TB RAM, 6TB SSD storage drives and 100G Ethernet interconnects.
PIs requesting allocations should consult the KyRIC website for additional details and the most current information.
XSEDE welcomes the Open Storage Network (OSN), a distributed data sharing and transfer service intended to facilitate exchanges of active scientific data sets between research organizations, communities and projects. OSN provides easy access and high bandwidth delivery of large data sets to researchers who leverage the data to train machine learning models, validate simulations, and perform statistical analysis of live data. The OSN is intended to serve two principal purposes: (1) enable the smooth flow of large data sets between resources such as instruments, campus data centers, national supercomputing centers, and cloud providers; and (2) facilitate access to long tail data sets by the scientific community.
OSN data is housed in storage pods, located at Big Data Hubs, interconnected by national, high-performance networks and accessed via a RESTful interface following Simple Storage Service (S3) conventions, creating well-connected, cloud-like storage with data transfer rates comparable to or exceeding those of the public cloud storage providers. Users can park data, back data up, and/or create readily accessible storage for active data sets. 5 PB of storage are currently available for allocation.
PSC's Bridges-2 platform will address the needs of rapidly evolving research by combining high-performance computing (HPC), high-performance artificial intelligence (HPAI), and high-performance data analytics (HPDA) with a user environment that prioritizes researcher productivity and ease of use.
Hardware highlights of Bridges-2 include HPC nodes with 128 cores and 256 to 512GB of RAM, scalable AI with 8 NVIDIA Tesla V100-32GB SXM2 GPUs per accelerated node and dual-rail HDR-200 InfiniBand between GPU nodes, a high-bandwidth, tiered data management system to support data-driven discovery and community data, and dedicated database and web servers to support persistent databases and domain-specific portals (science gateways).
User environment highlights include interactive access to all node types for development and data analytics; Anaconda support and optimized containers for TensorFlow, PyTorch, and other popular frameworks; and support for high-productivity languages and tools such as Jupyter notebooks, Python, R, and MATLAB, including browser-based (OnDemand) use of Jupyter, Python, and RStudio. A large collection of applications, libraries, and tools will often make it unnecessary for users to install software, and when users would like to install other applications, they can do so independently or with PSC assistance. Novices and experts alike can access compute resources ranging from 1 to 64,512 cores, up to 192 V100-32GB GPUs, and up to 4TB of shared memory.
Bridges-2 will support community datasets and associated tools, or Big Data as a Service (BDaaS), recognizing that democratizing access to data opens the door to unbiased participation in research. Similarly, Bridges-2 is available to support courses at the graduate, undergraduate, and even high school levels. It is also well-suited to interfacing to other data-intensive projects, instruments, and infrastructure.
Bridges-2 will contain three types of nodes: Regular Memory (RM), Extreme Memory (EM), and GPU (Graphics Processing Unit). These are described in turn below.
Bridges-2 Regular Memory (RM) nodes will provide extremely powerful general-purpose computing, machine learning and data analytics, AI inferencing, and pre- and post-processing. Each of Bridges-2's 504 RM nodes will consist of two AMD 7742 "Rome" CPUs (64 cores, 2.25-3.4 GHz, 3.48 Tf/s peak), 256-512 GB of RAM, 3.84 TB NVMe SSD, and one HDR-200 InfiniBand adaptor. 488 Bridges-2 RM nodes have 256 GB RAM, and 16 have 512 GB RAM for more memory-intensive applications. Bridges-2 RM nodes will be HPE Apollo 2000 Gen11 servers.
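The RM node counts above can be cross-checked against the 64,512-core maximum quoted earlier. The following is a minimal sketch using only the figures in this section; the aggregate RAM total is derived here and is not a number stated by PSC.

```python
# Figures taken from the Bridges-2 RM description above.
RM_NODES_256GB = 488
RM_NODES_512GB = 16
CPUS_PER_NODE = 2
CORES_PER_CPU = 64


def rm_totals():
    """Return (node count, total cores, aggregate RAM in GB) for the RM partition."""
    nodes = RM_NODES_256GB + RM_NODES_512GB
    cores = nodes * CPUS_PER_NODE * CORES_PER_CPU
    ram_gb = RM_NODES_256GB * 256 + RM_NODES_512GB * 512
    return nodes, cores, ram_gb


if __name__ == "__main__":
    nodes, cores, ram_gb = rm_totals()
    print(f"{nodes} nodes, {cores:,} cores, {ram_gb:,} GB RAM")
```

The node and core totals agree with the 504 nodes and 64,512 cores given in the text.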
Bridges-2 Extreme Memory (EM) nodes will provide 4TB of shared memory for genome sequence assembly, graph analytics, statistics, and other applications that need a large amount of memory and for which distributed-memory implementations are not available. Each of Bridges-2's 4 EM nodes will consist of four Intel Xeon Platinum 8260M CPUs, 4 TB of DDR4-2933 RAM, 7.68 TB NVMe SSD, and one HDR-200 InfiniBand adaptor. Bridges-2 EM nodes will be HPE ProLiant DL385 Gen10+ servers.
Bridges-2 GPU (GPU) nodes will be optimized for scalable artificial intelligence (AI). Each of Bridges-2's 24 GPU nodes will contain 8 NVIDIA Tesla V100-32GB SXM2 GPUs, providing 40,960 CUDA cores and 5,120 tensor cores. In addition, each GPU node will contain two Intel Xeon Gold 6248 CPUs, 512 GB of DDR4-2933 RAM, 7.68 TB NVMe SSD, and two HDR-200 adaptors. Their 400 Gbps connection will enhance scalability of deep learning training across up to 192 GPUs. The GPU nodes can also be used for other applications that make effective use of the V100 GPUs' tensor cores. Bridges-2 GPU nodes will be HPE Apollo 6500 Gen10 servers.
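The per-node and system-wide GPU figures above follow from simple multiplication. The sketch below reproduces them, assuming the per-GPU counts of 5,120 CUDA cores and 640 tensor cores for the V100, which are consistent with the per-node totals quoted in the text.

```python
# Node and link counts are taken from the Bridges-2 GPU description above.
GPU_NODES = 24
GPUS_PER_NODE = 8
CUDA_CORES_PER_GPU = 5120    # assumed per-GPU value; gives 40,960 per node
TENSOR_CORES_PER_GPU = 640   # assumed per-GPU value; gives 5,120 per node
HDR200_LINKS_PER_NODE = 2
GBPS_PER_LINK = 200


def gpu_partition_summary() -> dict:
    """Derive aggregate GPU-partition figures from the per-node numbers."""
    return {
        "total_gpus": GPU_NODES * GPUS_PER_NODE,
        "cuda_cores_per_node": GPUS_PER_NODE * CUDA_CORES_PER_GPU,
        "tensor_cores_per_node": GPUS_PER_NODE * TENSOR_CORES_PER_GPU,
        "node_bandwidth_gbps": HDR200_LINKS_PER_NODE * GBPS_PER_LINK,
    }


if __name__ == "__main__":
    for name, value in gpu_partition_summary().items():
        print(f"{name}: {value:,}")
```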
The Bridges-2 Ocean data management system will provide a unified, high-performance filesystem for active project data, archive, and resilience. Ocean will consist of two tiers – disk and tape – transparently managed by HPE DMF (Data Management Framework) as a single, highly usable namespace, and a third all-flash tier will accelerate AI and genomics. Ocean's disk subsystem, for active project data, is a high-performance, internally resilient Lustre parallel filesystem with 15 PB of usable capacity, configured to deliver up to 129 GB/s and 142 GB/s of read and write bandwidth, respectively. Its flash tier will provide 9M IOps and an additional 100 GB/s. The disk and flash tiers will be implemented as HPE ClusterStor E1000 systems. Ocean's tape subsystem, for archive and additional resilience, is a high-performance tape library with 7.2 PB of uncompressed capacity (estimated 8.6 PB compressed, with compression done transparently in hardware with no performance overhead), configured to deliver 50TB/hour. The tape subsystem will be an HPE StoreEver MSL6480 tape library, using LTO-8 Type M cartridges. (The tape library is modular and can be expanded, if necessary, for specific projects.)
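To put the tape figures in perspective, the sketch below estimates how long it would take to stream the full uncompressed tape capacity at the quoted rate. It assumes decimal units (1 PB = 1,000 TB) and a sustained 50 TB/hour, which real archive workloads are unlikely to hold continuously.

```python
# Capacity and rate are taken from the Ocean tape description above.
TAPE_CAPACITY_TB = 7200        # 7.2 PB uncompressed, assuming 1 PB = 1,000 TB
TAPE_RATE_TB_PER_HOUR = 50     # assumed to be sustained for the whole transfer


def hours_to_stream_full_tape() -> float:
    """Idealized time to read or write the full uncompressed tape capacity."""
    return TAPE_CAPACITY_TB / TAPE_RATE_TB_PER_HOUR


if __name__ == "__main__":
    hours = hours_to_stream_full_tape()
    print(f"{hours:.0f} hours, or {hours / 24:.0f} days")
```

Under these assumptions the full library corresponds to 144 hours (6 days) of continuous transfer.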
Bridges-2, including both its compute nodes and its Ocean data management system, is internally interconnected by HDR-200 InfiniBand in a fat tree Clos topology. Bridges-2 RM and EM nodes each have one HDR-200 link (200 Gbps), and Bridges-2 GPU nodes each have two HDR-200 links (400 Gbps) to support acceleration of deep learning training across multiple GPU nodes.
Bridges-2 will be federated with Neocortex, an innovative system also at PSC that will provide revolutionary deep learning capability that accelerates training by orders of magnitude. This will complement the GPU-enabled scalable AI available on Bridges-2 and provide transformative AI capability for data analysis and to augment simulation and modeling.
More information about the Bridges-2 resource can be found at: https://www.psc.edu/bridges-2
SDSC is pleased to announce its newest supercomputer, Expanse. Expanse will be a Dell integrated cluster, composed of compute nodes with AMD Rome processors and GPU nodes with NVIDIA V100 GPUs (with NVLINK), interconnected with Mellanox HDR InfiniBand in a hybrid fat-tree topology. The Expanse supercomputer will provide three new resources for allocation. Limits noted below are subject to change, so consult the Expanse website for the most up-to-date information.
Expanse Compute: The compute portion of Expanse features AMD Rome processors, interconnected with Mellanox HDR InfiniBand in a hybrid fat-tree topology. There are 728 compute nodes, each with two 64-core AMD EPYC 7742 (Rome) processors for a total of 93,184 cores in the full system. Each compute node features 1TB of NVMe storage, 256GB of DRAM per node, and PCIe Gen4 interfaces. Full bisection bandwidth will be available at the rack level (56 nodes) with HDR100 connectivity to each node. HDR200 switches are used at the rack level and are configured for a 3:1 over-subscription between racks. In addition, Expanse has four 2 TB large memory nodes.
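The core total above can be reproduced from the node and processor counts. The sketch below also derives how many 56-node racks the 728 compute nodes would fill; the rack count is an inference from the quoted figures, assuming all compute nodes sit in full racks, and is not a number stated by SDSC.

```python
# Figures taken from the Expanse Compute description above.
COMPUTE_NODES = 728
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 64
NODES_PER_RACK = 56  # rack size at which full bisection bandwidth is available


def expanse_compute_summary():
    """Return (cores per node, total cores, full racks, leftover nodes)."""
    cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
    total_cores = COMPUTE_NODES * cores_per_node
    full_racks, leftover_nodes = divmod(COMPUTE_NODES, NODES_PER_RACK)
    return cores_per_node, total_cores, full_racks, leftover_nodes


if __name__ == "__main__":
    per_node, total, racks, leftover = expanse_compute_summary()
    print(f"{per_node} cores/node, {total:,} cores, {racks} racks, {leftover} leftover nodes")
```

A job that fits within one rack (up to 56 nodes) stays inside the full-bisection-bandwidth domain, while larger jobs cross the 3:1 over-subscribed links between racks.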
There are two allocation request limits for the Expanse Compute resource:
- A maximum request (SU) limit of 15M SUs, except for Science Gateway requests, which may request larger amounts (up to 30M SUs)
- A limit on the maximum size of a job set at 4,096 cores, with higher core counts possible by special request
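A PI could sanity-check a draft request against the two limits above before submitting. The following is a minimal sketch: the function name and message wording are hypothetical, and the limits are hard-coded from this page even though they are subject to change.

```python
# Limits as quoted above; they are subject to change, so confirm current
# values on the Expanse website before relying on them.
MAX_SU_STANDARD = 15_000_000
MAX_SU_GATEWAY = 30_000_000
MAX_JOB_CORES = 4096
CORES_PER_NODE = 128  # two 64-core processors per Expanse compute node


def check_expanse_compute_request(su_requested, largest_job_cores,
                                  is_science_gateway=False) -> list:
    """Return a list of issues found in a draft request (empty if none)."""
    issues = []
    su_limit = MAX_SU_GATEWAY if is_science_gateway else MAX_SU_STANDARD
    if su_requested > su_limit:
        issues.append(f"Requested {su_requested:,} SUs exceeds the {su_limit:,} SU limit.")
    if largest_job_cores > MAX_JOB_CORES:
        issues.append(
            f"Largest job uses {largest_job_cores:,} cores; jobs above "
            f"{MAX_JOB_CORES:,} cores need a special request."
        )
    return issues


if __name__ == "__main__":
    for issue in check_expanse_compute_request(20_000_000, 8192):
        print(issue)
```

At 128 cores per node, the 4,096-core job limit corresponds to 32 full compute nodes.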
Expanse GPU: The GPU component of Expanse has 52 GPU nodes, each containing four NVIDIA V100s (32 GB SXM2), connected via NVLINK, and dual 20-core Intel Xeon 6248 CPUs. Each GPU node has 1.6TB of NVMe storage, 256GB of DRAM, and HDR100 connectivity.
Expanse Projects Storage: Lustre-based allocated storage will be available as part of an allocation request. The filesystem will be available on both the Expanse Compute and GPU resources. Storage resources, as with compute resources, must be requested and justified, both in the XRAS application and the proposal's main document.
Expanse will feature two new innovations: 1) scheduler-based integration with public cloud resources; and 2) composable systems, which support workflows that combine Expanse with external resources such as edge devices, data sources, and high-performance networks.
Since the Expanse AMD Rome CPUs are currently not available for benchmarking, PIs are requested to use Comet (or any comparable system) performance/scaling information in their benchmarking and scaling section. For the Expanse GPU nodes, PIs can use performance info on V100 GPUs (if available) or use 1.3X speed up over Comet P100 GPU (or comparable GPU) performance as a conservative estimate. The time requested must be in V100 GPU hours.
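The GPU guidance above amounts to dividing measured P100 GPU hours by the speed-up factor. The sketch below applies the conservative 1.3X figure; the function name is hypothetical, and a PI with actual V100 measurements would use those instead.

```python
# Conservative V100-over-P100 speed-up suggested in the text above.
V100_OVER_P100_SPEEDUP = 1.3


def estimate_v100_gpu_hours(p100_gpu_hours: float,
                            speedup: float = V100_OVER_P100_SPEEDUP) -> float:
    """Convert GPU hours measured on P100 GPUs into estimated V100 GPU hours."""
    if p100_gpu_hours < 0:
        raise ValueError("GPU hours must be non-negative")
    if speedup <= 0:
        raise ValueError("speed-up factor must be positive")
    return p100_gpu_hours / speedup


if __name__ == "__main__":
    # Example input: a workload benchmarked at 13,000 P100 GPU hours.
    print(f"{estimate_v100_gpu_hours(13_000):,.0f} V100 GPU hours")
```

For example, a workload measured at 13,000 P100 GPU hours would be requested as roughly 10,000 V100 GPU hours.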
PIs requesting allocations should consult the Expanse website (https://expanse.sdsc.edu) for additional details and the most current information.
New: See the latest webinar from XSEDE, "Code Performance and Scaling". Learn the technical aspects of research allocation proposals, including how best to gather and present scaling and code performance statistics and how to estimate SU requests.
Code Performance and Scaling
Recorded: April 1, 2020
Run time: 1hr 7mins
The retiring resources listed below are not available for new or renewal research requests.
- Bridges (PSC)
- Comet (SDSC)
- Data Oasis (SDSC), decommissioning July 2021
Continuing this submission period, access to XSEDE storage resources along with compute resources will need to be requested and justified, both in the XSEDE Resource Allocation System (XRAS) and in the body of the proposal's main document. The following XSEDE sites will be offering allocatable storage facilities:
- Open Storage Network
- Bridges-2 Ocean (PSC) New!
- Delta Storage (NCSA) New!
- Expanse Storage (SDSC) New!
- TACC Ranch - required when requesting TACC Stampede2
Storage needs have always been part of allocation requests; however, XSEDE will now be enforcing the storage awards in unison with the storage sites. Please visit XSEDE's Storage page for more info.
The XRAS developers have updated the allocations submissions interface so that Renewal submissions will now have some data fields pre-populated with values from the prior submission.
The pre-populated values include the title, project roles, fields of science, keywords, and supporting grants that have not expired. Users are strongly encouraged to review these pre-filled values for any that may need updates.
After being used for decades by XSEDE and its predecessor programs, the Field of Science hierarchy for categorizing allocation requests was long overdue for a replacement. Based on an NSF program organization from the distant past, the old hierarchy had more than 150 options, was peppered with idiosyncratic entries that were not actual fields of science, and was limited to areas within NSF's scope.
Based on an international standard from the Organisation for Economic Co-operation and Development (OECD), the new list has been added to our allocations documentation. Users can expect to see the new Fields of Science appearing in the allocation request forms starting in mid-July for Startup and other small requests and for the Research submission period starting in September.