Last update: July 19, 2021
The next Research allocation submission period is September 15, 2021 - October 15, 2021. The year-long allocation begin date will be January 1, 2022.
First time here? Check out the Resource Info page to learn about the resources available, and then visit the Startup page to get going! Startup, Campus Champions, and Education Allocation requests may be submitted at any time throughout the year.
See the XSEDE Resources Catalog for a complete list of XSEDE compute, visualization and storage resources, and more details on the new systems.
XSEDE has many new resources debuting this allocation cycle:
- Indiana University's Jetstream2 Coming Soon
- Johns Hopkins University's Rockfish
- National Center for Supercomputing Applications (NCSA) Delta
- Open Storage Network (OSN)
- Purdue University's Anvil Coming Soon
- University of Delaware's DARWIN Coming Soon
- University of Kentucky's KyRIC
Jetstream2, the latest NSF-supported hybrid-cloud platform, will expand the availability of flexible, on-demand, programmable cyberinfrastructure tools. Jetstream2 features a core cloud computing platform at Indiana University with a mix of more than 500 CPU (AMD EPYC 3rd generation), GPU (NVIDIA A100), and 1TiB large-memory nodes paired to a 14PB hybrid storage environment. The project also includes four regional partners with single-rack systems at Arizona State University, Cornell University, the University of Hawai'i, and the Texas Advanced Computing Center. Jetstream2, available to the US research community in Q4 2021, is configured to give all users access to AI capabilities through virtual GPU (vGPU) support as well as pre-built containers that include machine learning (ML) and deep learning (DL) tools. Jetstream2's prioritization of flexible user experience and programmatic interfaces will ease cross-disciplinary collaborations, cross-platform workflows, and workforce development.
Because Jetstream2 is a novel resource in the XSEDE infrastructure, presenting code performance and scaling for Jetstream2 may require a different approach. The Jetstream2 staff will consult with any PIs who have questions about what is appropriate for their code and scaling submissions. Please contact firstname.lastname@example.org if you have such questions.
Jetstream2 will consist of the following allocatable components:
The primary/default resource for Jetstream2 is CPU-only virtual machine services. There are 384 nodes with AMD 7713 (Milan) CPUs featuring 128 cores and 512GB of RAM available. Virtual machine sizes will range from a single core to the entire node with no hyperthreading to oversubscribe floating point operators. Virtual machines on Jetstream2 CPU have no time limitations outside of allocation length and can run without worry of preemption or wait times as long as the allocation is active and SUs are available.
The GPU resource of Jetstream2 will have 90 nodes of AMD 7713 (Milan) CPUs with 64 cores available, 512GB of RAM, and 4 NVIDIA A100 GPUs per node for a total of 360 GPUs. The GPU nodes will utilize NVIDIA's multi-instance GPU (MIG) feature to allow GPUs to be subdivided so that they are accessible to more researchers, educators, and students.
The large memory resource of Jetstream2 will feature 32 GPU-ready nodes with 128 AMD 7713 (Milan) cores and 1TB of available RAM. Virtual machine sizes will range from 64 cores to the entire node with no hyperthreading to oversubscribe floating point operators. Access to this resource will be limited to those who justify the need for the larger RAM quantities.
The Jetstream2 Storage resource is a Ceph-backed storage system featuring both flash and traditional disk, totaling approximately 14PB of space. Jetstream2 storage is only available with one or more of the Jetstream2 compute resources and may not be requested without a compute resource.
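The node counts quoted for the three Jetstream2 compute components can be cross-checked against the "more than 500" nodes and 360 GPUs mentioned above. The following is only a small sketch using the figures from this page; the `NodeGroup` class and its field names are illustrative and are not part of any Jetstream2 or XSEDE tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    """Illustrative record for one allocatable Jetstream2 component."""
    name: str
    nodes: int
    gpus_per_node: int = 0


# Node and GPU counts copied from the component descriptions on this page.
JETSTREAM2 = [
    NodeGroup("CPU", nodes=384),
    NodeGroup("GPU", nodes=90, gpus_per_node=4),
    NodeGroup("Large memory", nodes=32),
]

total_nodes = sum(group.nodes for group in JETSTREAM2)
total_gpus = sum(group.nodes * group.gpus_per_node for group in JETSTREAM2)

print(f"Total nodes: {total_nodes}")
print(f"Total A100 GPUs: {total_gpus}")
```

The totals come out to 506 nodes and 360 GPUs, which agrees with the summary figures in the overview paragraph.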
Johns Hopkins University, through the Maryland Advanced Research Computing Center (MARCC), will participate in the XSEDE Federation with its new flagship cluster, Rockfish, funded by NSF MRI award #1920103. Rockfish integrates high-performance and data-intensive computing while developing tools for generating, analyzing and disseminating data sets of ever-increasing size. The cluster will contain compute nodes optimized for different research projects and complex, optimized workflows. Rockfish consists of 368 regular compute nodes with 192GB of memory, 10 large memory nodes with 1.5TB of memory and 10 GPU nodes with 4 NVIDIA A100 GPUs featuring Intel Cascade Lake 6248R, 48 cores per node, 3.0GHz processor base frequency, and 1TB NVMe for local storage. All compute nodes have HDR100 connectivity. In addition, the cluster has access to several GPFS file systems totaling 10PB of storage. 20% of these resources will be allocated via XSEDE.
The National Center for Supercomputing Applications (NCSA) is pleased to announce the availability of its newest resource, Delta, which is designed to deliver a highly capable GPU-focused compute environment for GPU and CPU workloads. Delta will provide three new resources for allocation, as specified below:
The Delta CPU resource will support general purpose computation across a broad range of domains able to benefit from the scalar and multi-core performance provided by the CPUs, such as appropriately scaled weather and climate, hydrodynamics, astrophysics, and engineering modeling and simulation, and other domains whose algorithms have not yet moved to the GPU. Delta also supports domains that employ data analysis, data analytics or other data-centric methods. Delta will feature a rich base of preinstalled applications, based on user demand. The system will be optimized for capacity computing, with rapid turnaround for small to modest scale jobs, and will feature support for shared-node usage. Local SSD storage on each compute node will benefit applications that have random access data patterns or that require fast access to significant amounts of compute-node local scratch space.
The Delta GPU resource will support accelerated computation across a broad range of domains such as soft-matter physics, molecular dynamics, replica-exchange molecular dynamics, machine learning, deep learning, natural language processing, textual analysis, visualization, ray tracing, and accelerated analysis of very large in-memory datasets. Delta is designed to support the transition of applications from CPU-only to using the GPU or hybrid CPU-GPU models. Delta will feature a rich base of preinstalled applications, based on user demand. The system will be optimized for capacity computing, with rapid turnaround for small to modest scale jobs, and will feature support for shared-node usage. Local SSD storage on each compute node will benefit applications that have random access data patterns or that require fast access to significant amounts of compute-node local scratch space.
Delta Projects Storage
The Delta Storage resource provides storage allocations for allocated projects using the Delta CPU and Delta GPU resources. Unpurged storage is available for the duration of the allocation period.
XSEDE welcomes the Open Storage Network (OSN), a distributed data sharing and transfer service intended to facilitate exchanges of active scientific data sets between research organizations, communities and projects. OSN provides easy access and high bandwidth delivery of large data sets to researchers who leverage the data to train machine learning models, validate simulations, and perform statistical analysis of live data. The OSN is intended to serve two principal purposes: (1) enable the smooth flow of large data sets between resources such as instruments, campus data centers, national supercomputing centers, and cloud providers; and (2) facilitate access to long tail data sets by the scientific community.
OSN data is housed in storage pods, located at Big Data Hubs, interconnected by national, high-performance networks and accessed via a RESTful interface following Simple Storage Service (S3) conventions, creating well-connected, cloud-like storage with data transfer rates comparable to or exceeding the public cloud storage providers. Users can park data, back data up, and/or create readily accessible storage for active data sets. 5 PB of storage are currently available for allocation.
Purdue University is pleased to announce the availability of the Anvil supercomputer to the science and engineering community. Anvil, funded by the National Science Foundation (NSF) through award OAC-2005632, is a supercomputer built and operated by Purdue University through a partnership with Dell and AMD. Anvil integrates a large capacity HPC computing cluster with a comprehensive set of software services for interactive computing to aid users transitioning from familiar desktop to unfamiliar HPC environments. Anvil features the third generation AMD EPYC processors and is designed for high throughput moderate scale jobs, will provide fast turnaround for large volumes of work, and will complement leadership class XSEDE systems such as Frontera. Anvil compute nodes provide high core counts (128 cores/node), as well as improved memory bandwidth and I/O, and will support both traditional HPC and data analytics applications and enable integrative long-tail science workflows. A set of 16 GPU nodes and 32 large memory nodes will enable modern machine learning applications. In addition, a composable sub-system provides the ability to deploy science gateway components, long running processing pipelines, and data analytics applications via a developer-friendly Rancher Kubernetes cluster manager.
The CPU portion of Anvil features 3rd generation AMD EPYC processors (Milan). There are 1000 compute nodes, each with two 64-core AMD EPYC 7763 processors, for a total of 128,000 cores in the full system. Each compute node also features 256GB of DRAM and PCIe Gen4 interfaces. In addition, Anvil has 32 large memory nodes with 1 TB of memory each.
The GPU component of Anvil has 16 GPU nodes, each containing four NVIDIA A100s (40 GB HBM2) and dual 64-core AMD EPYC 7763 CPUs. Each GPU node has 256GB of DRAM and HDR100 connectivity.
All Anvil nodes, as well as the scratch storage system, will be interconnected by an oversubscribed (3:1 fat tree) HDR InfiniBand interconnect. Full bisection bandwidth will be available at the rack level (56 nodes) with HDR100 connectivity to each node. The scratch storage system is a high-performance, internally resilient Lustre parallel filesystem with 10 PB of usable capacity, configured to deliver up to 150 GB/s bandwidth. Default shared project space through Purdue's existing GPFS Research Data Depot system and archive storage space on Purdue's Fortress system will be provided to each allocation. PIs requiring additional storage may contact the Anvil team with their needs.
Since AMD Milan CPUs are currently not widely available, PIs should use performance data on Rome CPUs (available on Expanse and Bridges-2) as an estimate in their benchmarking and scaling section. The time requested should be in Rome CPU core hours.
For the Anvil GPU nodes, PIs can use performance information on A100 GPUs (if available). If not, PIs should use V100 GPU performance as an estimate. The time requested must be in GPU hours for the same GPU utilized for benchmarking.
PIs requesting allocations should consult the Anvil website for additional details and the most current information. PIs who are not able to conduct benchmarking on these recommended systems (Rome, A100 or V100) should contact the Anvil team (email@example.com).
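As an illustration of expressing a request in core hours or GPU hours, the sketch below multiplies out a benchmarked job size. The helper names `core_hours` and `gpu_hours`, and the example job sizes, are hypothetical; the only figures taken from this page are Anvil's 1000 nodes with two 64-core processors each. Actual charging rules are set by the resource provider, so treat this as a rough estimate only.

```python
# Anvil node layout as described on this page.
ANVIL_NODES = 1000
ANVIL_CORES_PER_NODE = 2 * 64  # two 64-core processors per node


def core_hours(nodes: int, cores_per_node: int, wall_hours: float, runs: int) -> float:
    """Core hours consumed by `runs` identical jobs (hypothetical helper)."""
    return nodes * cores_per_node * wall_hours * runs


def gpu_hours(gpus: int, wall_hours: float, runs: int) -> float:
    """GPU hours consumed by `runs` identical jobs (hypothetical helper)."""
    return gpus * wall_hours * runs


# Sanity check of the full-system core count quoted above.
total_cores = ANVIL_NODES * ANVIL_CORES_PER_NODE

# Hypothetical workload: 50 runs, each using 4 full nodes for 6 hours,
# with timings measured on Rome CPUs as recommended above.
cpu_request = core_hours(nodes=4, cores_per_node=ANVIL_CORES_PER_NODE,
                         wall_hours=6, runs=50)

# Hypothetical GPU workload: 40 runs, each using 4 GPUs for 2.5 hours.
gpu_request = gpu_hours(gpus=4, wall_hours=2.5, runs=40)

print(f"Full-system cores: {total_cores}")
print(f"CPU request: {cpu_request} core hours")
print(f"GPU request: {gpu_request} GPU hours")
```

With these made-up job sizes the CPU request works out to 153,600 core hours and the GPU request to 400 GPU hours; a real request would substitute measured run times from the proposal's benchmarking section.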
The University of Delaware is pleased to announce the availability of DARWIN (Delaware Advanced Research Workforce and Innovation Network). DARWIN is a big data and high performance computing system designed to catalyze research and education in the Delaware region.
The DARWIN computing system provides the user community access to an array of technologies designed to meet the growing needs within the data sciences community. The system is based on AMD EPYC™ 7502 processors with three main memory sizes to support different workload requirements (512 GiB, 1024 GiB, 2048 GiB). All compute nodes have two AMD EPYC™ 7502 processors (32 cores each). The compute portion of the cluster consists of 92 nodes:
- 48 standard nodes with 512 GiB RAM
- 32 large-memory nodes with 1024 GiB RAM
- 11 xlarge-memory nodes with 2048 GiB RAM, and
- 1 lg-swap node with 1024 GiB RAM + 2.73 TiB Intel Optane NVMe swap
Additionally, the system provides access to three GPU architectures to facilitate Artificial Intelligence (AI) research in the data sciences domains. The GPU portion of the cluster consists of 13 nodes:
- 3 nodes with two Intel® Xeon® Platinum 8260 processors (24 cores each), 768 GiB RAM, and 4 NVIDIA Tesla V100 32GB GPUs connected via NVLINK™
- 9 nodes with two AMD EPYC™ 7502 processors (32 cores each), 512 GiB RAM, and a single NVIDIA Tesla T4 GPU, and
- 1 node with two AMD EPYC™ 7502 processors (32 cores each), 512 GiB RAM, and a single AMD Radeon Instinct MI50 GPU
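The two node lists above can be tallied to confirm the stated totals of 92 compute nodes and 13 GPU nodes, and to derive the aggregate core, memory, and GPU counts. This is a minimal sketch using only the figures listed here; the dictionary and variable names are illustrative.

```python
# (node count, RAM per node in GiB), copied from the compute node list above.
COMPUTE_NODES = {
    "standard": (48, 512),
    "large-memory": (32, 1024),
    "xlarge-memory": (11, 2048),
    "lg-swap": (1, 1024),
}

# (node count, GPUs per node), copied from the GPU node list above.
GPU_NODES = {
    "V100": (3, 4),
    "T4": (9, 1),
    "MI50": (1, 1),
}

# Every compute node has two 32-core AMD EPYC 7502 processors.
CORES_PER_COMPUTE_NODE = 2 * 32

compute_node_count = sum(count for count, _ in COMPUTE_NODES.values())
compute_cores = compute_node_count * CORES_PER_COMPUTE_NODE
compute_ram_gib = sum(count * ram for count, ram in COMPUTE_NODES.values())
gpu_node_count = sum(count for count, _ in GPU_NODES.values())
gpu_count = sum(count * gpus for count, gpus in GPU_NODES.values())

print(f"Compute nodes: {compute_node_count}, cores: {compute_cores}")
print(f"Compute RAM: {compute_ram_gib} GiB ({compute_ram_gib / 1024} TiB)")
print(f"GPU nodes: {gpu_node_count}, GPUs: {gpu_count}")
```

The tallies match the stated node counts and give 5,888 compute cores, 79 TiB of aggregate compute-node RAM (excluding the Optane swap), and 22 GPUs.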
The cluster provides more than 1 PiB of usable, shared storage via a dedicated Lustre parallel file system to support large data science workloads. The Mellanox HDR 200Gbps InfiniBand network provides near-full bisectional bandwidth at 100 Gbps per node.
The XSEDE allocation of DARWIN is intended, in this order of priority, for researchers in the Delaware region, for users in EPSCoR states, and for general XSEDE users, with a maximum of 20% of the available resources awarded each year. Approximately 5,000,000 core hours and 19,000 GPU hours will be available during each allocation period. More information about DARWIN can be found at http://docs.hpc.udel.edu/.
KyRIC (Kentucky Research Informatics Cloud) Large Memory nodes will provide 3TB of shared memory for processing massive NLP data sets, genome sequencing, bioinformatics and memory intensive analysis of big data. Each of KyRIC's 5 large memory nodes will consist of Intel(R) Xeon(R) CPU E7-4820 v4 @ 2.00GHz processors (4 sockets, 10 cores/socket), 3TB RAM, 6TB SSD storage drives and 100G Ethernet interconnects.
PIs requesting allocations should consult the KyRIC website for additional details and the most current information.
NEW See the latest webinar from XSEDE, "Code Performance and Scaling". Learn the technical aspects of research allocation proposals including how best to gather and present scaling and code performance statistics and estimating SU requests.
Code Performance and Scaling
Recorded: April 1, 2020
Run time: 1hr 7mins
These retiring resources, listed below, are not available for new or renewal research requests.
- Data Oasis (SDSC) Decommissioning July, 2021
Continuing this submission period, access to XSEDE storage resources along with compute resources will need to be requested and justified, both in the XSEDE Resource Allocation System (XRAS) and in the body of the proposal's main document. The following XSEDE sites will be offering allocatable storage facilities:
- Open Storage Network
- Bridges-2 Ocean (PSC) New!
- Delta Storage (NCSA) New!
- Expanse Storage (SDSC) New!
- TACC Ranch - required when requesting TACC Stampede2
Storage needs have always been part of allocation requests; however, XSEDE will be enforcing the storage awards in unison with the storage sites. Please visit XSEDE's Storage page for more info.
XSEDE has just published a 2021 revision of the XSEDE allocation policies that reflects an effort to formally incorporate recommendations from an earlier review of the policies as well as de facto changes that have evolved since the last official update. In addition, we have attempted to focus this revised text on allocation policy and not the implementation of those policies in the day-to-day conduct of the allocation process. Thus, this policy revision does not repeat but complements and is explicitly supported by the online allocations documentation, the published XSEDE Allocation Practices and Procedures, and the XRAC Reviewer Manual.
We hope you find the resulting policy text to be easier to digest. At the same time, the policies themselves are not substantially different at a high level, and in practice the day-to-day experience of most users will likely not change.
The XRAS developers have updated the allocations submissions interface so that Renewal submissions will now have some data fields pre-populated with values from the prior submission.
The pre-populated values include the title, project roles, fields of science, keywords, and supporting grants that have not expired. Users are strongly encouraged to review these pre-filled values for any that may need updates.
After being used for decades by XSEDE and its predecessor programs, the Field of Science hierarchy for categorizing allocation requests was long overdue for a replacement. Based on an NSF program organization from the distant past, the old hierarchy had more than 150 options, was peppered with idiosyncratic entries that were not actual fields of science, and was limited to areas within NSF's scope.
Based on an international standard from the Organisation for Economic Co-operation and Development (OECD), the new list has been added to our allocations documentation. Users can expect to see the new Fields of Science appearing in the allocation request forms starting in mid-July for Startup and other small requests and for the Research submission period starting in September.