Task Description

An engine is required to run pipelines that need a large amount of computing resources. Though it is possible to execute pipelines and blocks directly in a Jupyter notebook kernel, real-world machine learning and deep learning tasks need scalable, distributed computing infrastructure such as Spark and Horovod (for distributed GPU training). An Admin user can add and manage computing resources from the "Infrastructure" management portal.

Role for performing this task

Admin only

Task Steps

  • Log in to RZT aiOS as an Admin, and click the Settings icon in the lower-left corner

The Settings dialog is displayed.

  • Click the "INFRASTRUCTURE" option on the left panel. The list of all provisioned engines and their current status is displayed. To add a new engine, click the add icon


  • The Provision Engine dialog box is displayed. Enter a name and description for the engine. Based on your computing requirements, select a machine type from the left panel. The cost and capacity (CPU cores, RAM, and GPU) of the selected machine are displayed on the right panel. Selecting the "Experiment mode" checkbox allows the infrastructure to be shared across multiple jobs instead of allocating a dedicated resource to a single job. Select the required options and click "PROVISION"


  • The engine is added to the list with its status shown as "Provisioning".



  • Once provisioning is completed, the status changes to "Running" and the newly provisioned engine is available to run your pipelines from the UI and Jupyter notebooks
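
If you script against provisioned engines (for example from a notebook), the last step amounts to "wait until the status flips to Running". The sketch below illustrates that wait as a generic polling loop; the `get_status` callable is purely an assumption standing in for however you query engine state, since this guide does not describe an RZT aiOS API for it.

```python
import time

def wait_for_engine_running(get_status, timeout_s=600, poll_s=5):
    """Poll until the engine reports "Running".

    get_status: a hypothetical zero-argument callable returning the
    engine's current status string (e.g. "Provisioning" or "Running").
    Returns True once "Running" is seen, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status == "Running":
            return True
        if status != "Provisioning":
            # Anything other than the two expected states is surfaced
            # to the caller rather than silently retried.
            raise RuntimeError(f"Unexpected engine status: {status}")
        time.sleep(poll_s)
    return False

# Usage with a simulated status sequence:
statuses = iter(["Provisioning", "Provisioning", "Running"])
ready = wait_for_engine_running(lambda: next(statuses), timeout_s=10, poll_s=0)
```

Injecting `get_status` keeps the loop independent of any particular client library, so the same pattern works whether the status comes from an HTTP call or a CLI.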