10 minutes before the first job

From HLRS Platforms

We ask all users of any server operated by HLRS: please take 10 minutes to read this article completely!


This page describes the basic rules as briefly as possible. If you want to know more about a topic, follow the corresponding link. But again, please read at least this page!


Storage (see Storage_usage_policy)

There is no backup on any filesystem. Please copy important data into the archive.

HOME: Do not run any computational (I/O-intensive) job within the HOME directory. For compute jobs, use a workspace!

Workspace: High-performance storage is an expensive resource. It is intended for active projects only. Move suspended projects into the archive. Each workspace has a lifetime; once this lifetime is exceeded, all data in the workspace is deleted automatically. It is possible to receive an email reminder. Copy important data into the archive! More information ==> Workspace_mechanism
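The lifetime rule can be illustrated with a little date arithmetic. The following is only a sketch: the function names, the 30-day lifetime and the 7-day warning window are hypothetical values chosen for illustration, not settings of the HLRS workspace mechanism. Check Workspace_mechanism for the real lifetimes and reminder options.

```python
from datetime import date, timedelta


def expiry_date(created, lifetime_days):
    """Return the day on which a workspace created on `created` expires."""
    return created + timedelta(days=lifetime_days)


def reminder_due(created, lifetime_days, today, warn_days=7):
    """True if the workspace expires within the next `warn_days` days.

    warn_days is a hypothetical warning window, not an HLRS setting.
    """
    remaining = (expiry_date(created, lifetime_days) - today).days
    return 0 <= remaining <= warn_days


# Example with made-up dates: a workspace created on 1 January with a
# 30-day lifetime.
created = date(2024, 1, 1)
print(expiry_date(created, 30))
print(reminder_due(created, 30, today=date(2024, 1, 25)))
```

A check like this is no substitute for the email reminder. It only shows why data should be copied to the archive well before the expiry date.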

Archive: Do not store small files in the archive. Please check HPSS_User_Access for more information.
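One common way to avoid many small files is to pack a whole result directory into a single tar file before copying it to the archive. The sketch below uses Python's standard tarfile module. The function names and the directory name are hypothetical, and whether tar bundles are the preferred format for the HLRS archive is an assumption, so check HPSS_User_Access first.

```python
import tarfile
from pathlib import Path


def bundle_directory(source_dir, tar_path):
    """Pack everything under source_dir into one uncompressed tar file."""
    source = Path(source_dir)
    with tarfile.open(str(tar_path), "w") as tar:
        # arcname stores paths relative to the directory name,
        # not as absolute paths.
        tar.add(str(source), arcname=source.name)
    return Path(tar_path)


def list_members(tar_path):
    """Return the sorted names of all regular files stored in the tar file."""
    with tarfile.open(str(tar_path), "r") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())
```

For example, bundle_directory("results", "results.tar") would produce one file to archive instead of one file per result. Write the tar file outside the directory being packed.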

Data transfer to and from the workspace can be done using Data_Transfer_with_GridFTP. Using scp via the frontend nodes will fail for large transfers due to the CPU time limit on those nodes.

Compute server

The frontend/login nodes are behind a firewall. Access and file transfer are only possible from registered IP addresses and only via the ssh protocol. If your IP address changes regularly or you have to work from different locations, consider using VPN.

The frontend nodes have a CPU time limit of 2 hours configured. Do not run compute-intensive jobs on frontend nodes. Frontend nodes are intended for access, batch submission, file transfer, workflow and development work.
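To see which CPU time limit applies to your own processes, you can query the process resource limits. The sketch below assumes that the 2-hour limit is enforced as a per-process limit (RLIMIT_CPU). This page does not confirm that, so treat the output as a hint only. The helper name format_limit is hypothetical.

```python
import resource


def format_limit(seconds):
    """Render a CPU time limit given in seconds in a readable form."""
    if seconds == resource.RLIM_INFINITY:
        return "unlimited"
    hours, rest = divmod(seconds, 3600)
    return f"{hours}h {rest // 60}m"


# Soft limit: the process is signalled when it is reached.
# Hard limit: the upper bound to which the soft limit may be raised.
soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
print("soft CPU limit:", format_limit(soft))
print("hard CPU limit:", format_limit(hard))
```

A limit of 7200 seconds would be shown as 2h 0m. Note that the limit counts consumed CPU time, not wall-clock time, so an idle ssh session is not affected.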

The compute resources (compute nodes) for parallel compute jobs are only available through the batch system. Please read the batch system documents for the corresponding platform.


Cray Hazel Hen: This system is NOT a cluster. Here we describe two topics that have repeatedly caused trouble:

    • To start parallel tasks, using aprun is required (NOT mpirun!).
    Starting a parallel job with the wrong mechanism may cause trouble for all users. Please consult CRAY_XC40_Using_the_Batch_System; if unsure, contact your project supervisor!
    • A task that uses a large amount of memory should be started on a compute node;
    use aprun to do so.
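The relation between the number of parallel tasks and the nodes to request can be sketched as follows. This is an illustration only: the helper name, the executable ./my_app and the figure of 24 tasks per node are assumptions. The aprun options -n (total number of tasks) and -N (tasks per node) are standard, but the authoritative usage for this system is described in CRAY_XC40_Using_the_Batch_System.

```python
import math
import shlex


def aprun_command(total_ranks, ranks_per_node, executable):
    """Return (nodes_needed, aprun command line) for a parallel launch."""
    if total_ranks < 1 or ranks_per_node < 1:
        raise ValueError("rank counts must be positive")
    # Round up: a partially filled node still has to be requested.
    nodes = math.ceil(total_ranks / ranks_per_node)
    argv = ["aprun", "-n", str(total_ranks),
            "-N", str(ranks_per_node), executable]
    return nodes, shlex.join(argv)


# Example: 48 tasks with an assumed 24 tasks per node.
nodes, command = aprun_command(48, 24, "./my_app")
print(nodes, command)
```

The resulting command line belongs inside a batch job script, not on a frontend node. The node count is what the batch job would have to request.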

Documentation

Online documentation is available for each compute platform, adapted to the HLRS/HWW site. There you can find information about how to access the system, how to use the compute resources, how to adapt and develop your application, and how to start batch jobs, with many examples, tips and features specific to the HLRS/HWW site. Please get an overview of these documents before you start working on a specific compute platform.