Sept. 20, 2023

In a previous post, we introduced vantage6 version 4.0. Version 4.0 introduces several new tools for algorithms and is intended to improve the algorithm development experience, specifically for Python. Unfortunately, this also means that algorithms developed in Python for older versions of vantage6 have to be updated. This blog post is a step-by-step guide to help you update your Python algorithms to version 4.0.

New package: vantage6-algorithm-tools

In previous versions of vantage6, the tools to help you build algorithms were part of the vantage6-client package. They have now been moved into a new package, vantage6-algorithm-tools, so that algorithm developers only have to install the algorithm tools. Conversely, regular users who install the (user) Python client no longer get algorithm tools that they don’t need.

If you are starting a new algorithm, you should install the algorithm tools with:

pip install vantage6-algorithm-tools

If you are working on an existing algorithm, you should replace the dependency vantage6-client with vantage6-algorithm-tools in your setup.py and/or requirements.txt. Then, you should also update all the imports that you made. In general, you should change imports from vantage6.tools to vantage6.algorithm.tools, e.g. from vantage6.algorithm.tools.wrap import wrap_algorithm. There’s one exception: the algorithm client should now be imported as from vantage6.algorithm.client import AlgorithmClient.
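To find the places that still need updating, a small scan of your source files can help. The sketch below is only an illustration: the helper name find_old_imports and the example source text are made up for this post, and the script only reports matches. It does not rewrite them, because some module names changed as well (for example wrapper became wrap).

```python
import re

# Matches "from vantage6.tools..." or "import vantage6.tools..." anywhere in a
# line, so it also catches the CMD line of a Dockerfile.
OLD_IMPORT = re.compile(r"\b(?:from|import)\s+vantage6\.tools\b")


def find_old_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for every pre-4.0 tools import."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if OLD_IMPORT.search(line):
            hits.append((lineno, line.strip()))
    return hits


# Illustrative input only; in practice you would read your own files.
example_source = """\
import pandas as pd
from vantage6.tools.util import info
from vantage6.algorithm.tools.util import info as new_info
"""

for lineno, text in find_old_imports(example_source):
    print(f"line {lineno}: still uses the old package: {text}")
```

Each reported line then has to be updated by hand according to the rules described above.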

Note that this means that you should probably also update the import in your Dockerfile. In most cases, the last line of the Dockerfile of a Python algorithm states something like:

CMD python -c "from vantage6.tools.wrapper import csv_wrapper; csv_wrapper('${PKG_NAME}')"

This should now be replaced with:

CMD python -c "from vantage6.algorithm.tools.wrap import wrap_algorithm; wrap_algorithm()"

Note that you no longer need to choose a wrapper yourself: this is now handled automatically, which will be discussed in more detail below. You only have to call the wrap_algorithm() function to wrap your algorithm. Also, the PKG_NAME environment variable is no longer used directly in the Dockerfile but is read in the Python code instead.

Function signature and decorators

Federated algorithms often consist of central (or orchestrator/aggregator) functions and partial (or local) functions. In vantage6 version 3.x and older versions, these functions had the following fixed signatures:

# this is a central function
def some_master_name(client, data, *args, **kwargs):
  # do something
  pass

# this is a partial function. Note that these *always* started with `RPC_`, which
# stands for 'Remote Procedure Call'
def RPC_some_regular_method(data, *args, **kwargs):
  # do something
  pass

The args and kwargs in the functions above could be freely defined by the algorithm developer - this remains the same in v4.0. However, in v4.0, there is no longer a strict division between central and partial functions. Instead, every function uses decorators to request what it needs, which leads to signatures like the following:

import pandas as pd
from vantage6.algorithm.client import AlgorithmClient
from vantage6.algorithm.tools.decorators import algorithm_client, data

# if you want an algorithm client and 1 data frame
@algorithm_client
@data(1)
def my_function(client: AlgorithmClient, df1: pd.DataFrame, *args, **kwargs):
  pass

# if you want only an algorithm client
@algorithm_client
def my_second_function(client: AlgorithmClient, *args, **kwargs):
  pass

# if you want to use multiple dataframes (2 in this case, but any number is possible)
@data(2)
def my_third_function(df1: pd.DataFrame, df2: pd.DataFrame, *args, **kwargs):
  pass

Updated algorithm client

In vantage6 version 3.x and older, algorithms made use of a client class called the ContainerClient. This client contained functions like client.get_results() to get the results of subtasks. In version 3.8, a new AlgorithmClient was introduced that was modelled to be more consistent with the user client. In that client, client.get_results() is replaced by client.result.get(). From v4.0 onwards, the old ContainerClient has been removed.

Below is an overview of all the old commands and which new commands replace them:

  • client.get_results() → client.result.get() *
  • client.create_new_task() → client.task.create() **
  • client.get_task() → client.task.get()
  • client.get_organizations_in_my_collaboration() → client.organization.list()
  • client.get_algorithm_addresses() → client.vpn.get_addresses() **
  • client.get_algorithm_address_by_label(label) → client.vpn.get_addresses(label=…)


* client.result.get() only gets the result, whereas client.get_results() used to also get details on e.g. when the algorithm run started and finished. To get that information, you should now run client.run.get(). Alternatively, you can get the info from all algorithm runs in a single task using the new function client.run.from_task(task_id=...).

** Apart from client.vpn.get_addresses(), there are also utility functions client.get_parent_address() and client.get_child_addresses() to get the VPN addresses plus ports of parent and child tasks, respectively.

Note that you can view all commands from the new algorithm client and how to call them at https://docs.vantage6.ai/en/version-4.0.0/function-docs/algorithm-tools.html#vantage6.algorithm.client.__init__.AlgorithmClient.

Waiting for completed tasks

In algorithms prior to version 4.0, in central or aggregator functions there was often a code snippet that looked somewhat like the following:

info("Waiting for results")
task = client.get_task(task_id)
while not task.get("complete"):
    task = client.get_task(task_id)
    info("Waiting for results")
    time.sleep(1)

The snippet above no longer works in v4.0, as the tasks no longer have a property ‘complete’, but rather a status which can have many different values, such as ‘pending’, ‘started’, ‘finished’, and more. With the new AlgorithmClient introduced above, this snippet should be simplified to:

info("Waiting for results")
results = client.wait_for_results(task_id=task.get("id"), interval=1)
info("Results obtained!")
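To show how the new client calls fit together in a central function, here is a minimal sketch. Several things in it are assumptions made for illustration: the function names central_average and partial_average, the layout of the input dictionary, the keyword names passed to client.task.create(), and the shape of the partial results. The stub client is a hypothetical stand-in so that the example runs without a vantage6 node; in a real algorithm the client would be injected by the @algorithm_client decorator.

```python
from types import SimpleNamespace


def central_average(client, column: str) -> dict:
    """Central function sketch: start subtasks, wait for them, combine results."""
    organizations = client.organization.list()
    org_ids = [org.get("id") for org in organizations]

    # Assumption: the input dict names the partial function and its kwargs.
    task = client.task.create(
        input_={"method": "partial_average", "kwargs": {"column": column}},
        organizations=org_ids,
    )

    # Replaces the old polling loop on task.get("complete").
    results = client.wait_for_results(task_id=task.get("id"), interval=1)

    # Assumption: every partial result is a dict with a "sum" and a "count".
    total = sum(result["sum"] for result in results)
    count = sum(result["count"] for result in results)
    return {"average": total / count}


def make_stub_client(partial_results: list[dict]) -> SimpleNamespace:
    """Hypothetical stand-in for the AlgorithmClient, for offline use only."""
    return SimpleNamespace(
        organization=SimpleNamespace(list=lambda: [{"id": 1}, {"id": 2}]),
        task=SimpleNamespace(create=lambda input_, organizations: {"id": 42}),
        wait_for_results=lambda task_id, interval=1: partial_results,
    )


stub = make_stub_client([{"sum": 10.0, "count": 4}, {"sum": 20.0, "count": 6}])
print(central_average(stub, column="age"))
```

Because the central function only talks to the client through these few calls, it can be exercised with a simple stand-in like this before running it in a real collaboration.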

Serialization

In previous versions of vantage6, serialization using pickles was the default. In vantage6 version 4.0, pickles are no longer used due to the security risks associated with them. Instead, JSON serialization is used. The added benefit of using JSON is that the results (if not encrypted) can be read by clients using any programming language.

The change in serialization affects your algorithm in two places: the input field and the results. In the input, you can now only pass JSON-serializable data, so you can, for instance, no longer pass complex Python objects. Also, the result that you return to the server (in the return statement) should now be JSON serializable. For instance, if you were previously returning a pandas DataFrame df, you should now return the JSON variant of that Python object, using df.to_json().
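A quick way to check whether a result will survive the switch to JSON is to pass it through json.dumps before returning it. The sketch below uses only the standard library; the helper names to_json_safe_result and is_json_serializable are made up for this example and are not part of vantage6.

```python
import json


def to_json_safe_result(values: list[float]) -> dict:
    """Summarise values using only JSON-serializable built-in types."""
    return {"sum": float(sum(values)), "count": len(values)}


def is_json_serializable(obj) -> bool:
    """Return True if json.dumps accepts the object, False otherwise."""
    try:
        json.dumps(obj)
        return True
    except (TypeError, ValueError):
        return False


good_result = to_json_safe_result([1.5, 2.5, 3.0])
# Sets and complex numbers could be pickled before, but are not valid JSON.
bad_result = {"values": {1.5, 2.5}, "z": complex(1, 2)}

print(is_json_serializable(good_result))
print(is_json_serializable(bad_result))
```

Results built from plain dictionaries, lists, strings and numbers pass this check; anything else has to be converted first.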

Testing your algorithm

Just as the new AlgorithmClient replaces the ContainerClient, there is also a new MockAlgorithmClient, which replaces the ClientMockProtocol. The thought behind these replacements is the same: like the AlgorithmClient, the MockAlgorithmClient now uses the same interface as all the other clients. It has also been extended to cover more functionality of the regular AlgorithmClient; you can use all the same functions except the VPN functionality to get addresses of other containers. These will be mocked in a later version of vantage6.

For full details on how to use the MockAlgorithmClient, see https://docs.vantage6.ai/en/version-4.0.0/function-docs/algorithm-tools.html#vantage6-tools-mock-client or check out the new boilerplate test script at https://github.com/IKNL/v6-boilerplate-py/blob/master/v6-boilerplate-py/example.py.
