Introduction:

Sentiment analysis is a natural language processing (NLP) task in which a piece of text is analyzed to predict the attitude it expresses, for example whether a movie review is positive or negative.
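To make "sentiment score" concrete: TextBlob, which this tutorial uses later, reports polarity as a float from -1.0 (negative) to 1.0 (positive). The sketch below is a deliberately simplified, hypothetical word-list scorer (`toy_polarity`) that produces a score on the same scale. It is not how TextBlob computes polarity, and the word lists are made up for illustration.

```python
# Toy word-list polarity scorer: an illustration only, NOT TextBlob's algorithm.
POSITIVE = {"great", "good", "funny", "brilliant", "enjoyable"}  # hypothetical lexicon
NEGATIVE = {"bad", "boring", "awful", "terrible", "dull"}        # hypothetical lexicon


def toy_polarity(text: str) -> float:
    """Return a score in [-1.0, 1.0] from counts of positive and negative words."""
    # Lowercase each word and strip common punctuation so "great," matches "great"
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0  # no opinion words found: treat as neutral
    return (pos - neg) / (pos + neg)


print(toy_polarity("A great, funny film."))    # 1.0
print(toy_polarity("Boring and awful."))       # -1.0
print(toy_polarity("Good cast, dull script.")) # 0.0
```

Real review text is far messier (negation, sarcasm, context), which is why the tutorial relies on a library such as TextBlob instead of a hand-made list.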

In this tutorial, you will build a project on RZT aiOS that scrapes the reviews from a Rotten Tomatoes movie URL and assigns a sentiment score to each review.


Here are two examples of Rotten Tomatoes movie review pages:

https://www.rottentomatoes.com/m/baby_geniuses/reviews

https://www.rottentomatoes.com/m/bucky_larson_born_to_be_a_star/reviews

The ‘target class’ is simply the HTML class that marks each review element on the web page. Refer to the screenshot below. In this case, the target class is ‘the_review’.
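The Scraper block later in this tutorial uses Beautiful Soup's `find_all` to pick out these elements. To see what a target class selects, here is a standard-library-only sketch (`html.parser`) that collects the text of every `<div>` whose class list contains a given name, roughly what `find_all('div', {'class': ...})` followed by `.text.strip()` gives you. The class name `ClassTextExtractor` and the sample HTML are hypothetical; Rotten Tomatoes' real markup may differ and may have changed. Unlike `find_all`, this sketch does not report a matching `<div>` nested inside another match separately.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of every <div> whose class attribute contains target_class."""

    def __init__(self, target_class: str):
        super().__init__()
        self.target_class = target_class
        self.depth = 0     # > 0 while inside a matching <div> (tracks nested divs)
        self.buffer = []   # text fragments of the current match
        self.results = []  # one stripped string per matching <div>

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1  # a nested <div> inside the current match
            return
        # An element can carry several classes, e.g. class="review-text the_review"
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:  # closed the matching <div> itself
                self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


sample = """
<div class="review_table">
  <div class="the_review"> A <b>brilliant</b> comedy. </div>
  <div class="review_date">June 1</div>
  <div class="review-text the_review">Painfully dull.</div>
</div>
"""

parser = ClassTextExtractor("the_review")
parser.feed(sample)
print(parser.results)  # ['A brilliant comedy.', 'Painfully dull.']
```

Text inside child tags such as `<b>` is kept, and elements with other classes (here `review_date`) are ignored.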



The following tutorial will mainly walk you through:

  • How to use third-party libraries on RZT aiOS, in this case Beautiful Soup and TextBlob
  • How to create your own blocks to scrape review data from Rotten Tomatoes and compute each review’s polarity
  • How to create a block to save your results to a file
  • And finally, how to use those blocks to create your own pipeline


How to build it on RZT aiOS:


  • Create a project from the RZT aiOS homepage; we'll name it ‘Sentiment Analysis’. You can give it any description you like.


  • Click on the project you just created. Once inside, click the “data” icon on the left side panel. You’ll see the “Open Jupyter Lab” button in the top right corner; click it to open a Jupyter Lab instance.
  • Go inside the default “__block” folder to start creating your block.



  • We have two levels of hierarchy for blocks, org and project. The blocks which are published as “org” blocks will be available for all the projects in your organisation account, and “project” blocks will be available only inside that particular project.

  

    



  • We will create a project-level block in this tutorial. Go inside the project folder and create a Python bundle (a bundle is how blocks are grouped into a category so that they keep a proper hierarchy). We will create a “SentimentBlocks” bundle for now.
  • Then go inside the bundle folder and create a Python file for the block code. Here, the file name is “sentiment_analysis.py”.


  • Then import RZT aiOS SDK libraries to create blocks along with other necessary libraries.

            

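import typing as t

import requests                # HTTP client used to download the review page
from bs4 import BeautifulSoup  # Beautiful Soup 4, used to parse the HTML

import razor.flow as rf        # RZT aiOS SDK: block decorator and series types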


  • Now we can define our block class. We’re calling this class Scraper and adding details about the block as class variables. Define all the input and output class variables, then write the scraping code with Beautiful Soup in the scrape function.

FYI: run is the method that is executed when the block runs.



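@rf.block
class Scraper:
    __publish__ = True
    __label__ = ""
    __description__ = ""
    __tags__ = []
    __category__ = 'Sentiment Blocks'

    # Inputs: the reviews page URL and the HTML class that marks each review
    url: str
    target_class: str
    # Output: one review text per item
    content: rf.SeriesOutput[str]

    def scrape(self):
        # Download the page and fail loudly on HTTP errors
        response = requests.get(self.url, timeout=30)
        response.raise_for_status()
        # Name the parser explicitly to avoid Beautiful Soup's "no parser specified" warning
        soup = BeautifulSoup(response.content, 'html.parser')
        # Every <div> carrying the target class is one review
        return soup.find_all('div', {'class': self.target_class})

    def run(self):
        self.logger.debug('loading url: %s and finding all: %s', self.url, self.target_class)
        items = self.scrape()
        for item in items:
            # Emit the review text, trimmed of surrounding whitespace
            self.content.put(item.text.strip())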



  • Next, we create the Sentiment block to get the polarity of each review. As with the previous block, we define a Sentiment class with the block details, declare its inputs and outputs, and then write our code using the TextBlob library inside the run method.


@rf.block
class Sentiment:
    __publish__ = True
    __label__ = ""
    __description__ = ""
    __tags__ = []
    __category__ = 'Sentiment Blocks'

    # Input: review texts streamed from the Scraper block
    sentences: rf.SeriesInput[str]
    # Reviews with a polarity below this value are treated as negative
    threshold: float
    # Outputs: (polarity, sentence) tuples
    positive: rf.SeriesOutput[tuple]
    negative: rf.SeriesOutput[tuple]

    def validate(self):
        # Placeholder hook kept from the original; it defines no checks
        test_cases = []
        return False

    def run(self):
        from textblob import TextBlob

        for sent in self.sentences:
            # TextBlob polarity is a float in [-1.0, 1.0]
            polarity = TextBlob(sent).sentiment.polarity
            data = (polarity, sent)
            if polarity < self.threshold:
                self.negative.put(data)
            else:
                self.positive.put(data)
            self.logger.info(f'{sent}: {polarity}')
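The positive/negative routing in run can be checked without the platform. The following is a plain-Python sketch of the same rule (`polarity < threshold` goes to negative, everything else to positive); the function `split_by_threshold` is hypothetical, and the hand-picked polarity values stand in for TextBlob output.

```python
from typing import Iterable, List, Tuple

# (polarity, sentence): the same tuple shape the Sentiment block emits
Scored = Tuple[float, str]


def split_by_threshold(
    scored: Iterable[Scored], threshold: float = 0.0
) -> Tuple[List[Scored], List[Scored]]:
    """Mirror the block's rule: polarity < threshold -> negative, otherwise positive."""
    positive: List[Scored] = []
    negative: List[Scored] = []
    for polarity, sentence in scored:
        target = negative if polarity < threshold else positive
        target.append((polarity, sentence))
    return positive, negative


reviews = [(0.8, "A brilliant comedy."), (-0.5, "Painfully dull."), (0.0, "It exists.")]
pos, neg = split_by_threshold(reviews, threshold=0.0)
print(len(pos), len(neg))  # 2 1
```

Because the comparison is strict, a neutral review (polarity exactly 0.0) lands in the positive output when the threshold is 0; raising the threshold moves borderline reviews to the negative output.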


  • Similarly, we create a block that writes the output to a file of our choice.


   

@rf.block
class FileWriter:
    __publish__ = True
    __label__ = ""
    __description__ = ""
    __tags__ = []
    __category__ = 'Sentiment Blocks'

    # Input: output file path under the Project Space
    path: str
    # Input: (polarity, sentence) tuples to write, one per line
    content: rf.SeriesInput[tuple]

    def run(self):
        from razor.api import project_space_path

        # Resolve the path inside the Project Space; mode 'w' overwrites any existing file
        path = project_space_path(self.path)
        self.logger.debug('writing into file %s', path)
        with open(path, 'w') as file:
            for line in self.content:
                # e.g. (0.5, 'Great movie') is written as "0.5, Great movie"
                file.write(', '.join(str(x) for x in line) + '\n')
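The FileWriter block joins each tuple's fields with ", ", which is easy to read but ambiguous when a review itself contains a comma. As a possible alternative (not what the block above does), Python's standard `csv` module quotes such fields. This sketch compares the two formats in memory on two made-up reviews.

```python
import csv
import io

rows = [(0.8, "Great, funny film."), (-0.5, 'Not "bad", just dull.')]

# Format used by the FileWriter block: commas inside a review are not escaped
naive = "".join(", ".join(str(x) for x in row) + "\n" for row in rows)
print(naive.splitlines()[0].split(", "))  # ['0.8', 'Great', 'funny film.']

# csv.writer quotes fields containing commas or quotes, so rows read back intact
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
buffer.seek(0)
print(list(csv.reader(buffer)))  # [['0.8', 'Great, funny film.'], ['-0.5', 'Not "bad", just dull.']]
```

When writing to a real file with `csv`, open it with `newline=''`, as the `csv` module documentation recommends.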



  • Then open the “__init__.py” file in the bundle folder and import our blocks.


# The module name must match the block file created earlier (sentiment_analysis.py)
from .sentiment_analysis import Scraper, Sentiment, FileWriter
from razor.platform.setuptools import block_setup

__metadata__ = block_setup(version="0.0.1")


  • You can publish your blocks from any notebook or Python file. In our case, we create a “Block_Publish.ipynb” notebook inside the same bundle folder. Now execute the following code to publish your blocks.

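import razor

# SentimentBlocks must refer to the bundle package created above (for example,
# brought into scope with `import SentimentBlocks`); otherwise this raises NameError
razor.platform.publish_project_blocks(bundle=SentimentBlocks, overwrite=True)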



  • Now that we have published our blocks, we can create our pipeline. Click the pipeline option on the left panel and create a pipeline named ‘Run Sentiment Analysis’.



  • Then open the “Run Sentiment Analysis” pipeline and add the blocks we just published. Connect the inputs and outputs as shown in the screenshot below.



  • Now provide values for the blocks’ input parameters: start with the Rotten Tomatoes URL and the target HTML class name, set the threshold to 0 for the Sentiment block, and then add a file path for the output. Note: the file path is a path under the ‘Project Space’.
  • To run the pipeline, click the play icon in the top right corner and select an engine that you have provisioned. Clicking the ‘Run’ button takes you to the ‘Pipeline Runs’ interface, with your latest run at the top.
  • Click the latest run row to monitor run details such as the status of each block, logs, metrics, infrastructure consumption, and other metadata.


Cheers! Now you have successfully created a pipeline.