Blocks and pipelines are foundational to any robust design of an intelligent system. How they are conceived and constructed has a wide-ranging impact on the accuracy, robustness and flexibility of an AI system. Specifically, they affect the following:
Ability to rapidly put together complex workflows.
Ability to share code between multiple users and applications
Ability to encapsulate complex algorithms and make them easily available
Ability to run multiple technology blocks in one pipeline
Ability to control execution mode (in process, container) and transport (file, streaming, socket) at a granular level
Ability to scale the execution of experiments and compare them
While blocks and pipelines may seem like overkill to a beginning data scientist, you will quickly come to appreciate the elegance of building intelligent systems this way in a framework like Razorthink's. Once your system grows beyond a toy implementation, the approach becomes essential.
A Block in 60 Seconds
Suppose you want to define a basic Block that performs a simple operation: for example, one that takes two inputs, a string and a delimiter, splits the string by the delimiter and outputs an array of substrings.
A simple Block definition consists of the following components:
A subclass of the Block base class
Inputs declared using the @inputs decorator
Outputs declared using the @outputs decorator
An implementation of the run method
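To illustrate how stacked decorators can attach input and output declarations to a class, here is a toy, pure-Python stand-in. It is not the Razor SDK: the `declare` decorator factory, the `ToyBlock` base class and the `inputs`/`outputs` class attributes are hypothetical, and for simplicity the toy's `run` returns its result instead of writing to an output.

```python
# Toy sketch, NOT the Razor SDK: stacked class decorators record input and
# output declarations on the class, in the order they appear in the source.
def declare(kind, name, doc=""):
    """Hypothetical decorator factory; kind is 'inputs' or 'outputs'."""
    def decorate(cls):
        # Read only this class's own declarations (not the base class's).
        current = cls.__dict__.get(kind, ())
        # Class decorators run bottom-up, so prepend to keep source order.
        setattr(cls, kind, ((name, doc),) + current)
        return cls
    return decorate


class ToyBlock:
    """Hypothetical base class; subclasses implement run()."""
    def run(self, **kwargs):
        raise NotImplementedError


@declare("inputs", "text", "A string of text to split")
@declare("inputs", "delimiter", "A single character or a sequence of characters")
@declare("outputs", "data", "An array of text")
class ToySplit(ToyBlock):
    def run(self, text, delimiter):
        # Unlike a Razor block, this toy returns its result directly.
        return text.split(delimiter)


print([name for name, _ in ToySplit.inputs])       # ['text', 'delimiter']
print(ToySplit().run(text="a b", delimiter=" "))   # ['a', 'b']
```

Because each decorator returns the class it received, any number of declarations can be stacked above a single class definition.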
Example
Blocks perform a function that can take multiple inputs and provide multiple outputs. The following example defines a block that splits a string based on a delimiter. The block takes two inputs and provides a single output.
Inputs: A block's inputs are defined using the @inputs decorator. A block can take two types of input:
Atomic, where the input to the function is a single object
Series, where the input is a list of values which are streamed to the block
Outputs: A block's outputs are defined using the @outputs decorator. A block can produce two types of output:
Atomic, where the output is a single value which is streamed out of the block
Series, where the output is a series of values which are streamed from the block
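The difference between atomic and series values can be sketched conceptually in plain Python. This is not Razor SDK code: the functions `split_atomic` and `split_series` are hypothetical, and a generator merely stands in for a stream of values delivered one at a time.

```python
# Conceptual sketch only, not Razor SDK code: an "atomic" value is one
# object delivered once; a "series" is a sequence of values delivered
# one at a time (modelled here with a generator).
from typing import Iterable, Iterator, List


def split_atomic(text: str, delimiter: str) -> List[str]:
    # Atomic in, atomic out: one string in, one list out.
    return text.split(delimiter)


def split_series(lines: Iterable[str], delimiter: str) -> Iterator[List[str]]:
    # Series in, series out: each incoming value yields one outgoing value
    # as soon as it arrives, without waiting for the whole input.
    for line in lines:
        yield line.split(delimiter)


print(split_atomic("a,b,c", ","))                # ['a', 'b', 'c']
print(list(split_series(["a,b", "c,d"], ",")))   # [['a', 'b'], ['c', 'd']]
```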
Defining the block class
The block class contains the run method, which houses the logic for performing the operation. The run method receives each declared input and output as a parameter. Once the operation is performed, the result must be placed into the output variable/stream using its put method, as shown in the following example.
from razor.blocks import Block, inputs, outputs

@inputs.atomic.generic(name='text', doc='A string of text to split')
@inputs.atomic.generic(name='delimiter', doc="A single character or a sequence of characters")
@outputs.atomic.generic(name='data', doc='An array of text')
class SplitString(Block):  # class name chosen for illustration
    def run(self, text, delimiter, data):
        # Split the input text and place the resulting list on the output.
        result = text.split(delimiter)
        data.put(result)
This block can be instantiated and used as follows:
Conceptually, blocks are akin to functions in a programming language. They take inputs as parameters, perform some operation on them and return an output. Unlike most functions, a block can have multiple outputs, and each input and output of a block is configured individually. Examples of blocks with multiple outputs follow shortly.
Notice that the run method receives inputs under the same names with which they were declared using the @inputs decorator. They are passed as keyword arguments, so the order of the declarations and the order of the parameters of run need not match.
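Why the order does not matter can be shown with plain Python keyword unpacking. In this sketch the `SplitBlock` class and its `declared_inputs` attribute are hypothetical stand-ins, not part of the Razor SDK; the point is only that arguments matched by name ignore parameter position.

```python
# Hypothetical illustration: inputs are passed to run() by keyword,
# so the order of run()'s parameters does not matter.
class SplitBlock:
    # Declared order: text, then delimiter.
    declared_inputs = ("text", "delimiter")

    # Parameters deliberately listed in the opposite order.
    def run(self, delimiter, text):
        return text.split(delimiter)


values = {"text": "a,b,c", "delimiter": ","}
block = SplitBlock()
# Build keyword arguments from the declared names, not from positions.
result = block.run(**{name: values[name] for name in block.declared_inputs})
print(result)  # ['a', 'b', 'c']
```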
Also notice how the block is instantiated and used. The Razor SDK adopts the method-chaining pattern across most of its functionality. In this pattern, a method returns the object itself so that another method of the same object can be called right after it. In the example above, input methods such as .text and .delimiter are created dynamically, in addition to the methods provided by the Block base class such as .execute. An interface method is created for every input and output declared in the block definition. When called without a parameter, these methods return the current value of the input instead of setting a new one. For example, calling .text() with no argument returns the text currently set on the block.
An interface method called with no parameters cannot be chained any further, because it returns the current value rather than the block object. So calling .text() with no argument and then chaining .delimiter(...) onto its result will raise an exception: AttributeError: 'str' object has no attribute 'delimiter'.
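The setter/getter behaviour of interface methods, and why chaining breaks after a getter call, can be reproduced with a small mock. This is a hypothetical sketch, not the Razor SDK: `ChainableSplit`, its hand-written `text`/`delimiter` methods and its `execute` are stand-ins that only mimic the pattern described above.

```python
# Hypothetical mock of the interface-method pattern; NOT the Razor SDK.
class ChainableSplit:
    def __init__(self):
        self._values = {"text": None, "delimiter": None}

    def _interface(self, name, *value):
        # With an argument: store it and return self so calls can chain.
        # Without one: return the current value, which ends the chain.
        if value:
            self._values[name] = value[0]
            return self
        return self._values[name]

    def text(self, *value):
        return self._interface("text", *value)

    def delimiter(self, *value):
        return self._interface("delimiter", *value)

    def execute(self):
        # Run the operation on the stored inputs, like calling a function.
        return self._values["text"].split(self._values["delimiter"])


block = ChainableSplit().text("a,b,c").delimiter(",")
print(block.text())      # a,b,c
print(block.execute())   # ['a', 'b', 'c']

try:
    block.text().delimiter(",")   # text() returned a str, not the block
except AttributeError as err:
    print(err)           # 'str' object has no attribute 'delimiter'
```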
Although the primary purpose of blocks is to be composed together into a pipeline, they can also be executed individually using the .execute method, as shown above. This is equivalent to treating a block as a function.
Doc strings for the interface methods are generated automatically from the value of the doc parameter in @inputs. Users of your block can view them using Python's built-in help function.
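One way such doc strings can reach help() is by setting `__doc__` on each generated method. The sketch below is hypothetical and does not reproduce the Razor SDK internals: `make_interface_method`, `DECLARED_INPUTS` and `DocumentedSplit` are illustrative names only.

```python
# Hypothetical sketch, NOT the Razor SDK: build interface methods from
# (name, doc) declarations and attach each doc string as __doc__, so
# Python's built-in help() can display it.
def make_interface_method(name, doc):
    def method(self, *value):
        # Setter when given a value (returns self), getter otherwise.
        if value:
            self._values[name] = value[0]
            return self
        return self._values.get(name)
    method.__name__ = name
    method.__qualname__ = name
    method.__doc__ = doc
    return method


DECLARED_INPUTS = [
    ("text", "A string of text to split"),
    ("delimiter", "A single character or a sequence of characters"),
]


class DocumentedSplit:
    def __init__(self):
        self._values = {}


# Attach one generated method per declared input.
for input_name, input_doc in DECLARED_INPUTS:
    setattr(DocumentedSplit, input_name, make_interface_method(input_name, input_doc))

print(DocumentedSplit.text.__doc__)   # A string of text to split
help(DocumentedSplit.delimiter)       # shows the delimiter doc string
```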