Deploying on several machines

In the first tutorials, you have been using the solve command to run your DCOPs. This command is very convenient as it handles a lot of plumbing details for you, but it only works if you want to run the whole system on a single machine. In this tutorial, you will learn how to really distribute your system, by running different agents on different machines.

Running independent agents

If you want to use several machines to run your DCOP (remember, the D stands for Distributed!), you need to use the agent and orchestrator commands.

Orchestrator

The Orchestrator is a special agent that is not part of the DCOP: its role is to bootstrap the solving process by distributing the computations on the agents. It also collects metrics for benchmarking purposes. Once the system is started (and if no metric is collected), the orchestrator could be removed. In any case, the orchestrator never participates in the coordination process, which stays fully decentralised.

The orchestrator command looks very much like the solve command; it takes a DCOP yaml file as input and supports the same --algo and --distribution options. The main difference is that the orchestrator command only launches an orchestrator, which then waits for agents to enter the system. The DCOP algorithm is only started once all required agents have been started.
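The "wait until all required agents have been started" behaviour can be pictured with a small toy model. This is only an illustrative sketch, not pyDCOP code: the StartBarrier class and its methods are hypothetical names introduced here, and the real orchestrator may track agents differently.

```python
class StartBarrier:
    """Toy model: solving may start only once every required agent has registered."""

    def __init__(self, required_agents):
        # Names of the agents that must be present before starting.
        self.required = set(required_agents)
        self.registered = set()

    def register(self, agent_name):
        """Record an agent's arrival; return True when all required agents are present."""
        if agent_name in self.required:
            self.registered.add(agent_name)
        return self.ready()

    def ready(self):
        # Ready only when the registered set covers the whole required set.
        return self.registered == self.required


if __name__ == "__main__":
    barrier = StartBarrier(["a1", "a2", "a3"])
    for name in ["a2", "a1", "a3"]:
        print(name, "registered, ready:", barrier.register(name))
```

The order in which agents arrive does not matter in this model; only the complete set does.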

For example, using this graph coloring problem definition file, you can start an orchestrator:

pydcop -v 3 orchestrator --algo mgm --algo_param stop_cycle:20 \
                         graph_coloring_3agts.yaml

Once the DCOP algorithm finishes, or when the timeout is reached, the command outputs the end results. Their content and format are the same as described in Analysing results.

All metrics-collection options can also be used with the orchestrator command and work the same way as with the solve command.

Agents

The agent command launches an agent on the local machine (it can actually launch several agents, see the detailed command documentation). Initially, this agent does not know anything about the DCOP (variables, constraints, etc.). It only knows the address of an orchestrator, which is responsible for sending DCOP information to all agents in the system:

pydcop -v 3 agent -n a1 -p 9001 --orchestrator 192.168.1.10:9000

Example

Instead of using solve, you can run the very simple DCOP used in the first tutorial on different machines. For easier setup, we reduced the number of agents to 3 in this file: graph_coloring_3agts.yaml.

First launch the orchestrator on a machine:

pydcop -v 3 orchestrator --algo mgm --algo_param stop_cycle:20 \
                         graph_coloring_3agts.yaml

Check the logs for the IP address and port the orchestrator is listening on, or set them explicitly using --address and --port.

Now launch on 3 different machines (or virtual machines) the following commands to run 3 agents that all use the orchestrator started before (make sure you give them the right IP address and port!):

# Machine 1 runs agent a1
pydcop -v 3 agent -n a1 -p 9001 --orchestrator 192.168.1.10:9000
# Machine 2 runs agent a2
pydcop -v 3 agent -n a2 -p 9001 --orchestrator 192.168.1.10:9000
# Machine 3 runs agent a3
pydcop -v 3 agent -n a3 -p 9001 --orchestrator 192.168.1.10:9000
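When the number of agents grows, typing these commands by hand becomes error prone. The following is a minimal sketch, assuming you want to generate the command lines from a list of agent names; the agent_commands helper is a hypothetical name introduced here, and it only reuses the options shown above (-v, -n, -p, --orchestrator).

```python
import shlex


def agent_commands(agent_names, orchestrator, port=9001, verbosity=3):
    """Build one `pydcop agent` command line per agent name.

    `orchestrator` is the "host:port" string passed to --orchestrator.
    """
    commands = []
    for name in agent_names:
        argv = [
            "pydcop", "-v", str(verbosity), "agent",
            "-n", name, "-p", str(port),
            "--orchestrator", orchestrator,
        ]
        # shlex.join quotes arguments where needed, so the result is shell-safe.
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    for cmd in agent_commands(["a1", "a2", "a3"], "192.168.1.10:9000"):
        print(cmd)
```

Each generated line can then be run on its target machine, for instance over ssh.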

Each agent is made responsible for one of the variables of the DCOP and runs MGM for 20 cycles. Once every agent has performed 20 cycles, the agent and orchestrator commands return.

Note

If you know in advance the IP address and port the orchestrator will use, you can launch the agents before the orchestrator. In that case, agents will periodically attempt to connect to the orchestrator, until they can reach it.
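The retry pattern described in this note can be illustrated with a short script. This is a sketch of the general idea, not pyDCOP's actual connection logic: wait_for_orchestrator is a hypothetical helper, and it assumes that the orchestrator accepts plain TCP connections on the address and port you configured.

```python
import socket
import time


def wait_for_orchestrator(host, port, retry_delay=2.0, max_attempts=30):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if every one of the `max_attempts` attempts fails.
    """
    for _ in range(max_attempts):
        try:
            # Opening and immediately closing a connection is enough to
            # check that something is listening on the port.
            with socket.create_connection((host, port), timeout=retry_delay):
                return True
        except OSError:
            # Not reachable yet: wait before the next attempt.
            time.sleep(retry_delay)
    return False


if __name__ == "__main__":
    if wait_for_orchestrator("192.168.1.10", 9000):
        print("orchestrator is reachable")
    else:
        print("orchestrator could not be reached")
```

Such a check can be handy in a deployment script, for example to report early that the address or port given to the agents is wrong.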

Provisioning pyDCOP

You may have noticed that the previous section silently assumed that pyDCOP was installed on every machine you want to use in your system. Indeed, we use the pydcop command line application, which is only available if you have installed pyDCOP!

Of course, you can simply follow the installation instructions to manually install pyDCOP on all your machines, but the process is rather tedious and error prone. Moreover, if you are working on DCOP algorithms, you will probably make changes to the pyDCOP implementation (at least to the implementation of your algorithm), which means copying the new development version to every machine and reinstalling it each time.

When running a large system, you need to automate this kind of task. To help you with this, we provide a set of Ansible playbooks that automate the installation process. See the Provisioning guide for full details.