Task Description

As a Razorthink Platform User or Admin, you may want to access Amazon S3 (Simple Storage Service) to store and retrieve data. Once an S3 data source is configured, it can be accessed from a Jupyter notebook and from Block code.

Roles for performing this task

User, Admin

Task Steps

Configure Credentials

  • Log in to the Razorthink Platform as a User or Admin. Click the Settings icon
    in the bottom-left corner.

  • Click Data Source Credentials. A list of all configured data sources is displayed. Click the Add Data Source Credential icon.

  • The Add New Configuration dialogue box is displayed. Select S3 from the Configuration dropdown. Enter the data source name, access key, and secret key in the appropriate fields, and click Create.

  • The data source credential configuration is saved and can be used to create a data source inside a project.
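The credential configuration above comes down to three values. As a minimal sketch (a hypothetical Python model, not a Razorthink API), those fields could be represented as follows, with the secret key kept out of the printed representation so it does not leak into notebook output or logs. The class name `S3CredentialConfig` and the rule that all three fields are required are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class S3CredentialConfig:
    """Hypothetical model of the fields in the Add New Configuration dialogue box."""

    name: str                            # data source name
    access_key: str                      # AWS access key ID
    secret_key: str = field(repr=False)  # AWS secret access key, kept out of repr()

    def __post_init__(self) -> None:
        # Assumption: all three fields are required.
        for label, value in (
            ("name", self.name),
            ("access_key", self.access_key),
            ("secret_key", self.secret_key),
        ):
            if not value.strip():
                raise ValueError(f"{label} must not be empty")


cfg = S3CredentialConfig(name="my-s3", access_key="AKIAEXAMPLE", secret_key="example-secret")
print(cfg)  # S3CredentialConfig(name='my-s3', access_key='AKIAEXAMPLE')
```

Excluding the secret key from `repr()` is a small design choice that matters in notebooks, where objects are often echoed to the output cell.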


Create Datasource

  • Open the project in which you want to add the data source and click the Data icon. A list of all configured data sources is displayed. Click the Add Datasource icon.

  • In the Add New Datasource dialogue box, select S3 as the type. Enter the remaining details and click ADD.

  • The data source is created and can be accessed from a Jupyter notebook and from Block code.
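This guide does not document how notebook or Block code looks up a configured data source, so the sketch below does not use any Razorthink API. With that caveat, it shows what reading a CSV file from S3 involves in a Jupyter notebook. It assumes the `boto3` library is installed and reuses the access key and secret key entered in the credential configuration. `make_boto3_fetch` and `read_s3_csv` are hypothetical helper names; the download step is passed in as a function so that the CSV parsing can be tried without an AWS connection.

```python
import csv
import io
from typing import Callable

# A fetch function takes (bucket, key) and returns the object's raw bytes.
Fetch = Callable[[str, str], bytes]


def make_boto3_fetch(access_key: str, secret_key: str) -> Fetch:
    """Build a fetch function backed by boto3 (assumes boto3 is installed)."""
    import boto3  # imported lazily so read_s3_csv can be used without boto3

    client = boto3.client(
        "s3",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )

    def fetch(bucket: str, key: str) -> bytes:
        # get_object returns a dict whose "Body" is a stream; read() yields bytes.
        return client.get_object(Bucket=bucket, Key=key)["Body"].read()

    return fetch


def read_s3_csv(bucket_name: str, data_path: str, fetch: Fetch) -> list[dict[str, str]]:
    """Download a CSV object and return its rows as dicts keyed by the header row."""
    raw = fetch(bucket_name, data_path)
    # "utf-8-sig" also strips a leading byte-order mark if the file has one.
    text = raw.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text)))


# Example notebook usage (placeholders, not real credentials):
# fetch = make_boto3_fetch("<access key>", "<secret key>")
# rows = read_s3_csv("my-bucket", "data/sales.csv", fetch)
```

Keeping the download separate from the parsing means the same `read_s3_csv` helper works with any function that returns the file's bytes.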

Using the Connected Datasource

  • Open a Pipeline in the Pipeline Builder and add the Block called 'S3CSVReader'. The Block inputs are:
    1. 'data_source_name' - the name of the data source you just created
    2. 'bucket_name' - the name of the S3 bucket that contains your file
    3. 'data_path' - the path to your CSV file within that bucket
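To make the three inputs concrete, here is a hypothetical sketch of how they fit together. `S3CSVReaderInputs` is not part of the S3CSVReader Block. It only illustrates that `bucket_name` and `data_path` together identify a single object in S3, and it checks the basic AWS bucket naming rules before anything is read. Treating `data_path` as relative to the bucket root is an assumption.

```python
import re
from dataclasses import dataclass

# Basic AWS rules for general-purpose bucket names: 3-63 characters made of
# lowercase letters, digits, dots and hyphens, starting and ending with a
# letter or digit. Stricter rules (no "..", no IP-address form) are not checked.
BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


@dataclass(frozen=True)
class S3CSVReaderInputs:
    """Hypothetical container for the three S3CSVReader Block inputs."""

    data_source_name: str  # name of the data source created in the project
    bucket_name: str       # S3 bucket that holds the CSV file
    data_path: str         # path (object key) of the CSV file in the bucket

    def __post_init__(self) -> None:
        if not self.data_source_name.strip():
            raise ValueError("data_source_name must not be empty")
        if not BUCKET_NAME.fullmatch(self.bucket_name):
            raise ValueError(f"invalid S3 bucket name: {self.bucket_name!r}")
        if not self.key:
            raise ValueError("data_path must name a file, not the bucket root")

    @property
    def key(self) -> str:
        # Assumption: data_path is relative to the bucket root, so a leading "/" is dropped.
        return self.data_path.lstrip("/")

    @property
    def uri(self) -> str:
        # Conventional s3:// URI for the object, handy for logging or other tools.
        return f"s3://{self.bucket_name}/{self.key}"


inputs = S3CSVReaderInputs("my-s3", "my-bucket", "/data/sales.csv")
print(inputs.uri)  # s3://my-bucket/data/sales.csv
```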

The Block in the Pipeline canvas will look something like this:

That's all you need to do to use a file from S3! You can now run the Pipeline with your other Blocks.