Introduction
Blocks and pipelines are foundational to any robust design of an Intelligent System. How they are conceived and constructed has a wide-ranging impact on the accuracy, robustness and flexibility of an AI system. Specifically:
- Ability to rapidly put together complex workflows
- Ability to share code between multiple users and applications
- Ability to encapsulate complex algorithms and make them available with ease
- Ability to run multiple technology blocks in one pipeline
- Ability to control execution mode (in process, container) and transport (file, streaming, socket) at a granular level
- Ability to scale the execution of experiments and compare them
While they may seem like overkill to the beginning data scientist, you will quickly come to appreciate the elegance of building intelligent systems using blocks and pipelines in a framework like Razorthink's. Once your system grows beyond a toy implementation, this approach becomes critical.
A Block in 60 Seconds
Problem
You want to define a basic Block that performs a simple operation. For example, one that takes two inputs, a string and a delimiter, splits the string by the delimiter, and outputs an array of substrings.
Solution
The definition of a simple Block comprises the following components:
- Subclass from the Block base class
- Inputs declared using the @inputs decorator
- Outputs declared using the @outputs decorator
- Implementation of the run method
Example
A block performs a function that can take multiple inputs and provide multiple outputs. In the following example, we define a block that splits a string based on a delimiter. The block takes two inputs and provides a single output.
- Inputs: The block input is defined using the @inputs decorator. There are two types of input a block can take:
  - Atomic, where the input to the function is a single object
  - Series, where the input is a list of values which are streamed to the block
- Outputs: The block output is defined using the @outputs decorator. There are two types of output a block can provide:
  - Atomic, where the output is a single value which is streamed out of the block
  - Series, where the output is a series of values which are streamed from the block
Defining the block class
The block class contains the run method, which houses the logic for performing the required operation. The block class also takes the declared input and output parameters as attributes. Once the operation is performed, the results must be placed into the output variable/stream using its put method, as shown in the following example.
    from razor.blocks import Block, inputs, outputs


    @inputs.atomic.generic(name='text', doc='A string of text to split')
    @inputs.atomic.generic(name='delimiter', doc="A single character or a sequence of characters")
    @outputs.atomic.generic(name='data', doc='An array of text')
    class SplitString(Block):
        def run(self, text, delimiter, data):
            result = text.split(delimiter)
            data.put(result)
This block can be instantiated and used as follows:
    split_string = (
        SplitString()
        .text('91-97384-20742')
        .delimiter('-')
    )
The instantiated block can be executed as follows. Once the block is executed, the results are stored in a dictionary, and the values can be accessed as shown below.
    results = split_string.execute()
    results['data'].values()
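The logic inside run is ordinary Python, so the expected content of the data output can be checked without the SDK. A minimal sketch, using only the built-in str.split with the same sample values as above (how the SDK wraps the result inside results['data'] is not shown here):

```python
# Plain-Python check of the operation that SplitString.run performs.
text = '91-97384-20742'
delimiter = '-'

parts = text.split(delimiter)
print(parts)  # ['91', '97384', '20742']
```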
Discussion
Conceptually, blocks are akin to functions in a programming language. They take in inputs as parameters, perform some kind of operation on them and then return an output. In the case of blocks, there can be multiple outputs as well, and each input and output of a block is individually configured. Examples of blocks with multiple outputs will follow shortly.
Notice that the run method receives inputs under the same names with which they were declared using the @inputs decorator. They are passed as keyword arguments, so the order of their declaration and the order of parameters in run need not be the same.
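Why keyword passing makes parameter order irrelevant can be seen in plain Python. The sketch below is an illustration only: call_run is a hypothetical stand-in for whatever the SDK does internally, and run_a and run_b are made-up functions with the same parameters in different orders.

```python
# Hypothetical stand-in for a framework that invokes run() with keyword arguments.
def call_run(run, **values):
    # Every value is passed by name, so its position in the signature is irrelevant.
    return run(**values)


def run_a(text, delimiter):
    return text.split(delimiter)


def run_b(delimiter, text):  # same parameters, declared in a different order
    return text.split(delimiter)


result_a = call_run(run_a, text='a,b,c', delimiter=',')
result_b = call_run(run_b, text='a,b,c', delimiter=',')
print(result_a == result_b)  # True
```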
Also notice how the block is instantiated and used. The Razor SDK adopts the method-chaining pattern across most of its functionality. In this pattern, a method returns the object itself so that another method of the same object can be called right after. In the above example, input methods like .text and .delimiter are dynamically created in addition to the various other methods provided by the Block base class, such as .execute. An interface method will be created for every input and output declared during the block definition. These methods return the current value of the input if a new value is not passed as a parameter. For example:
split_string.text(), split_string.delimiter()
An interface method called with no parameters cannot be chained any further, because it returns the current value rather than the block object. So calling
split_string.text().delimiter('/')
will raise an exception: AttributeError: 'str' object has no attribute 'delimiter'.
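The getter/setter behaviour described above can be mimicked in a few lines of plain Python. This is a sketch under stated assumptions, not the Razor SDK's implementation: ChainableInput, _access and the _UNSET sentinel are hypothetical names chosen for illustration.

```python
class ChainableInput:
    """Hypothetical illustration; not the Razor SDK's Block class."""

    _UNSET = object()  # sentinel, so that None can still be a legal value

    def __init__(self):
        self._values = {}

    def _access(self, name, value):
        if value is self._UNSET:
            return self._values.get(name)  # getter: returns the stored value
        self._values[name] = value
        return self                        # setter: returns the object for chaining

    def text(self, value=_UNSET):
        return self._access('text', value)

    def delimiter(self, value=_UNSET):
        return self._access('delimiter', value)


block = ChainableInput().text('91-97384-20742').delimiter('-')
print(block.text())       # 91-97384-20742
print(block.delimiter())  # -

try:
    # text() with no argument returns a str, which has no .delimiter method
    block.text().delimiter('/')
except AttributeError as err:
    print(err)  # 'str' object has no attribute 'delimiter'
```

Returning the object only from the setter path is what makes the chain stop at a getter call.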
Although the primary purpose of blocks is to be composed together into a pipeline, they can be individually executed using the .execute method as shown above. This is equivalent to treating a block as a function.
A docstring is automatically generated for each interface method from the value of the doc parameter in @inputs. Users of your block can view it using Python's built-in help function.
    # FIXME: this isn't working
    help(split_string.delimiter)
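In plain Python, one way to give a dynamically created method a docstring is to set __doc__ on the function before attaching it to the class. The sketch below assumes nothing about how the Razor SDK does this; make_interface_method and Demo are hypothetical names used only for illustration.

```python
def make_interface_method(name, doc):
    """Hypothetical helper: build an accessor whose docstring is `doc`."""
    def accessor(self, *value):
        if not value:
            return self._values.get(name)  # no argument: return current value
        self._values[name] = value[0]
        return self                        # argument given: allow chaining
    accessor.__name__ = name
    accessor.__doc__ = doc                 # this is the text that help() displays
    return accessor


class Demo:
    def __init__(self):
        self._values = {}


# Attach the generated method to the class, as a decorator might do.
Demo.delimiter = make_interface_method(
    'delimiter', 'A single character or a sequence of characters')

demo = Demo()
print(demo.delimiter.__doc__)  # A single character or a sequence of characters
```

Calling help(demo.delimiter) on this object would display the same docstring, since help reads __doc__ from the underlying function.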