No containers and no sudo: Building and deploying ProviewR on SlapOS

ProviewR is process automation software used in large-scale industries. SlapOS is an operation management platform for cloud and edge computing. This article explains how we adapted ProviewR to build and run on SlapOS.
  • Last Update: 2020-06-06
  • Version: 007
  • Language: en

ProviewR is a distributed control system for automating factories. It is based on its own implementation of a virtual Programmable Logic Controller (PLC), and is free and open source (repo).

One can try deploying it to the cloud to see whether existing open source cloud automation tools can make the setup and maintenance of factories easier.

For cloud machines based on Nexedi technology, it is the job of SlapOS (repo) to automate building from sources, deployment and orchestration of services.

So first Alain Takoudjou, and then I, tried to make ProviewR compile and run on SlapOS.

SlapOS overview and integration requirements

SlapOS uses Buildout to compile software from sources using a collection of build scripts defined here (repo). The build works much like Gentoo: it recursively compiles the whole dependency tree and then compiles the software itself.

Making ProviewR compile on SlapOS means writing Buildout scripts that compile ProviewR and each of its dependencies.

After the software is built, one needs to make sure that SlapOS is able to deploy it many times on the same machine and run each instance in its own folder under its own user. There are no virtual machines, containers, chroots or jails used in SlapOS.

Making ProviewR run on SlapOS means writing deployment scripts so that:

  • it can be contained in a given directory;
  • no sudo is needed;
  • multiple instances of ProviewR do not interfere with each other.

For connectivity, SlapOS gives each instance its own public IPv6 address. Software that cannot handle IPv6 receives a special IPv4 address to listen on, so that such an instance can still be addressed from outside via IPv6.

One needs to make sure that each instance listens correctly.

Finally, to have a rough indication that everything works as planned, SlapOS periodically runs simple tests called promises. They let SlapOS notice that something went wrong, so that it can nuke the instance and try to rebuild it.

One needs to write these promises and make them pass.

Simplified Overview of ProviewR

ProviewR is written mainly in C and C++ (plus a Java API). In total, sloccount reports ~860k lines of code. ProviewR has been in use for at least two decades, and it still supports VAX, AXP and the associated VMS (or at least that compatibility has not been removed from the sources).

ProviewR uses its own interactive build system written in Perl (pwre), which uses make as a backend. A shell script (build.sh) wraps pwre to make it non-interactive, and finally a Makefile wraps build.sh so that it can be called from make.

One can roughly divide the core of ProviewR into 2 parts:

  • A runtime - around 220 dedicated C source files (almost no C++), prefix rt_
  • An IDE called Workbench, 300+ source files (mainly C++), prefix wb_

Simply put, the IDE defines classes (it is an object-oriented system), creates PLC programs, compiles them and writes the files needed by the runtime.

The files the IDE needs to write include:

  • the PLC executable (ELF, NEEDS libc, libstdc++, libm, librt, libgcc);
  • flow files (.flw), for a graphical representation of a PLC program and for tracing of variables;
  • a snapshot of a database with objects (.dbs); more precisely a node volume, which is a bit complicated (see the ProviewR Designer Guide for details);
  • the runtime applications configuration (.txt), where one can disable individual runtime-related daemons so that they are not started automatically;
  • other configs (including connectivity and project info, multiple text files with the .dat extension).

The runtime consumes those files, runs the PLC, runs the other daemons from the config and makes the necessary connections with other runtimes.

One must keep in mind that ProviewR is a distributed system: different PLC programs can run on different runtimes (called nodes) and have their own sets of sensors and actuators to control. For example, different runtimes may be responsible for different bioreactors in a cellular agriculture facility.

The communication protocol between runtimes is a custom Queue Communication Protocol, Qcom for short. It is a message-queue-based system with a bus common to all clients. Instead of making point-to-point connections with each other, applications (including the PLC) write to a message bus, and the runtime handles delivery for them (via the qmon executable). So while runtimes as a whole do make temporary p2p connections with each other, individual applications inside them do not. Qcom by itself does not have authentication, but it can be run on top of other protocols that do.

All files needed by a runtime can be compressed into an archive (a so-called runtime package), and this archive can then be installed on site with the script pwr_pkg.sh. This is how one can manually deliver PLC programs to the individual machines they need to be on.

NOTE: Workbench can be accessed both graphically and from a terminal. For graphics it can use either qt4 or gtk2. Its graphical administration utility is called pwra and gives access to all Workbench functions. On the command line, the main shell is called wb_cmd.

Required Tweaks for SlapOS build

As mentioned above, one first needs to compile ProviewR and all of its dependencies automatically with the SlapOS build system.
Some dependencies were trivial to port or had already been ported, while others were trickier (more on them below).
For ProviewR itself, we used slapos.recipe.cmmi, a Buildout recipe used in SlapOS to handle "configure, make, make install" workflows.

Most changes to the ProviewR sources were implemented as a sequence of sed calls in the pre-configure hook of slapos.recipe.cmmi. The exception is the pwrp_profile script: it was copied, and Buildout uses this copy on deploy.

Here I try to describe the most significant issues that we encountered. For brevity, I leave out pure SlapOS issues that also occurred but were not related to ProviewR.

Qt4 Failing To Report

One of the dependencies that must be compiled before ProviewR can be built is Qt4.

ProviewR uses QGtkStyle for its Qt interface, and QGtkStyle uses gtk2. So whichever flavor you choose for the ProviewR interfaces (GTK or Qt), you still need GTK in place.

From the SlapOS perspective, this means that the GTK interface is preferable because there are fewer things to build in that case. However, with GTK there were some cryptic errors on the ProviewR side, so I ended up switching to Qt. This was not error-free either, but the errors were somewhat more understandable.

The configure script of Qt 4.8.7 has a bug in its error reporting mechanism: even if you build Qt4 with QGtkStyle support explicitly requested, as is the case with the default SlapOS qt4, under some circumstances the build does not fail when that support cannot be provided.

I.e. Qt will build, but the requested QGtkStyle won't be there and no warning or error will be produced. The responsible piece of configure code is this:

    if [ "$CFG_GLIB" = "yes" -a "$CFG_QGTKSTYLE" != "no" ]; then
        if [ -n "$PKG_CONFIG" ]; then
            QT_CFLAGS_QGTKSTYLE=`$PKG_CONFIG --cflags gtk+-2.0 ">=" 2.10 atk 2>/dev/null`
            QT_LIBS_QGTKSTYLE=`$PKG_CONFIG --libs gobject-2.0 2>/dev/null`
        fi
        if [ -n "$QT_CFLAGS_QGTKSTYLE" ] ; then
            CFG_QGTKSTYLE=yes
            QMakeVar set QT_CFLAGS_QGTKSTYLE "$QT_CFLAGS_QGTKSTYLE"
            QMakeVar set QT_LIBS_QGTKSTYLE "$QT_LIBS_QGTKSTYLE"
        else
            if [ "$CFG_QGTKSTYLE" = "yes" ] && [ "$CFG_CONFIGURE_EXIT_ON_ERROR" = "yes" ]; then
                echo "Gtk theme support cannot be enabled due to functionality tests!"
                echo " Turn on verbose messaging (-v) to $0 to see the final report."
                echo " If you believe this message is in error you may use the continue"
                echo " switch (-continue) to $0 to continue."
                exit 101
            else
                CFG_QGTKSTYLE=no
            fi
        fi
    elif [ "$CFG_GLIB" = "no" ]; then
        CFG_QGTKSTYLE=no
    fi

As one can see, if glib is not explicitly requested (CFG_GLIB is set to "no" during autodetection if there is no glib on the system), no error is reported even when one requests QGtkStyle and that request cannot be fulfilled.
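The control flow of that excerpt can be restated as a small decision function. The Python below is only a model of the quoted shell logic, written for illustration: the function name and the gtk_cflags_found flag (standing in for a non-empty QT_CFLAGS_QGTKSTYLE) are assumptions and are not part of Qt.

```python
def qgtkstyle_outcome(cfg_glib, cfg_qgtkstyle, gtk_cflags_found, exit_on_error=True):
    """Model of the quoted configure branch (illustrative, not Qt code).

    Returns "yes", "no", or "error" (where configure would exit with 101).
    """
    if cfg_glib == "yes" and cfg_qgtkstyle != "no":
        if gtk_cflags_found:
            return "yes"
        if cfg_qgtkstyle == "yes" and exit_on_error:
            return "error"
        return "no"
    if cfg_glib == "no":
        # Silent downgrade: an explicit request is overwritten, no message.
        return "no"
    # Neither branch applies: the value is left untouched.
    return cfg_qgtkstyle
```

In this model, only the path with cfg_glib set to "yes" can ever reach the error branch. With cfg_glib set to "no", an explicit cfg_qgtkstyle of "yes" simply comes back as "no".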

Qt4 is no longer supported, so it was too late to file a bug report about this. I just spent some time digging through the Qt4 configure script to understand what it actually needs in order to build QGtkStyle.

Attaching RPATHs to a precompiled JDK

Another dependency of the ProviewR build process is a JDK, because ProviewR has a Java API for applications that needs to be built (there is no easy way to disable it). My predecessor Alain used a precompiled distribution of OpenJDK for it; however, since his work SlapOS has enabled stricter tests on how shared libraries are linked.

Basically, for SlapOS to remain relatively distro-independent, it forbids linking against system libraries, with very few exceptions (such as the C standard library). The current standard is to use RPATHs so that binaries link against SlapOS-built libraries instead.

Recently SlapOS enabled automatic testing to check that binaries are indeed linked correctly. This naturally makes precompiled binaries hard to use; however, debugging an OpenJDK build from sources can be a rather long process and is considered low priority for now.

So what I did instead was to port patchelf and use it to add the needed RPATHs to the precompiled OpenJDK binaries on the fly. The ProviewR port was the first SlapOS software to use patchelf in this way.
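As a rough illustration of the idea (not the actual SlapOS recipe), the sketch below walks a directory tree, picks out ELF files by their magic bytes and runs patchelf --set-rpath on each. The helper names and the library directories are hypothetical. Note that --set-rpath replaces any existing value rather than appending to it.

```python
import os
import subprocess

ELF_MAGIC = b"\x7fELF"


def is_elf(path):
    """True if the file starts with the ELF magic bytes."""
    try:
        with open(path, "rb") as f:
            return f.read(4) == ELF_MAGIC
    except OSError:
        return False


def patchelf_command(binary, lib_dirs):
    """Build the patchelf invocation that sets the RPATH of `binary`."""
    return ["patchelf", "--set-rpath", ":".join(lib_dirs), binary]


def add_rpaths(root, lib_dirs, run=subprocess.check_call):
    """Walk `root` and set the RPATH on every regular ELF file found.

    `run` is injectable so that the walk can be dry-run without patchelf.
    """
    patched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not is_elf(path):
                continue
            run(patchelf_command(path, lib_dirs))
            patched.append(path)
    return patched
```

Passing run=print gives a dry run that only lists the commands, which is handy for checking which files would be touched before modifying a JDK tree.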

And it worked.

Xorg for compilation

Most SlapOS machines are headless.

The build instructions in README.md on the ProviewR GitHub page tell you to run a dummy Xorg when building headless. That is because a lot of files are generated during compilation, and this generation is done in part by the Workbench IDE.

The Workbench command-line interface is not independent from its graphics. Like many other ProviewR executables, wb_cmd comes in Qt and GTK flavors, and underneath it uses the WNavQt or WNavGtk classes. The wb_wnav* part of the code is the mechanism underlying a graphical navigation panel, and WNavQt and WNavGtk rely on Xorg being up and running.

I.e. from your terminal wb_cmd calls code that is tailored to be run as a part of a GUI.

As a result, wb_cmd (the Workbench command-line interface) needs to connect to an Xorg server in order to work.

Here we did not try to circumvent this but conformed to ProviewR tradition. The bulk of the work on Xorg and graphics in general was done by my predecessor at this task, Alain Takoudjou. He made Xorg start in the pre-build hook of slapos.recipe.cmmi and stop in the post-build hook.

I myself only fixed cases of multiple conflicting Xorg servers during compilation (which can happen in SlapOS, especially on SlapOS test nodes). For that, each Xorg has to store its locks and sockets in a place of our choosing, using a patch to Xorg that makes it honor the XORG_LOCK_DIR environment variable. This patch was already used in other SlapOS software, so I just reused it.

One more subtle challenge with Xorg was to keep it from hanging the slapos.recipe.cmmi Buildout recipe if something goes wrong during the build, which caused trouble on test nodes. To that end, putting a timeout of 2h on Xorg seems to be enough.

Note: I have no idea why, but putting any timeout on the build itself fails in very subtle ways.
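The watchdog idea behind that timeout can be sketched in Python as follows. This is not what the SlapOS recipe does (there, the timeout is applied to Xorg in the build hooks); the function names are made up for the example.

```python
import subprocess
import threading


def start_with_deadline(argv, seconds):
    """Start a helper process and kill it if it is still alive after `seconds`."""
    proc = subprocess.Popen(argv)
    timer = threading.Timer(seconds, proc.kill)
    timer.daemon = True  # do not keep the interpreter alive just for the timer
    timer.start()
    return proc, timer


def stop(proc, timer):
    """Normal shutdown path: cancel the watchdog, then terminate the helper."""
    timer.cancel()
    if proc.poll() is None:
        proc.terminate()
    return proc.wait()
```

A pre-build hook would call start_with_deadline() with the Xorg command line and a deadline of 2 * 60 * 60 seconds, and the post-build hook would call stop(). If the build dies without reaching the post-build hook, the helper still goes away once the deadline expires.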

Hidden exceptions in the build system

If one builds ProviewR with the included Makefile, the top-level make always returns 0, which SlapOS interprets as a sign that everything went fine, even when the ProviewR build has actually failed. This introduced elusive errors on the SlapOS side for some time, but calling build.sh directly solved the issue.
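Why a zero exit status hides the failure can be shown with a toy model. FAILING_STEP stands in for a failed build step, and the two wrappers are hypothetical: one drops the inner exit status, as the top-level Makefile effectively does (whatever the cause there), and the other propagates it, as a direct call to build.sh does.

```python
import subprocess
import sys

# Stand-in for a failing build step: a child process exiting with status 2.
FAILING_STEP = [sys.executable, "-c", "import sys; sys.exit(2)"]


def run_swallowing(step):
    """Run the step but drop its exit status: the caller always sees success."""
    subprocess.call(step)
    return 0


def run_strict(step):
    """Run the step and propagate its exit status to the caller."""
    return subprocess.call(step)
```

A caller that only looks at the returned status, as SlapOS does, cannot tell a swallowed failure from a real success.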

HOWTO: Debugging individual ProviewR build stages

ProviewR makes extensive use of environment variables (10+), both at build time and at run time. Setting them by hand can be unwieldy when one needs to reproduce the conditions of a failing command. If a build fails, SlapOS provides the environment it used itself in a special script in the build directory, but this is not enough because it does not include the variables set by the pwr build system.

If one wants to run a failing wb_cmd call the same way it is called during the build, one also needs to set the pwr-specific variables. The shortest way I found to achieve this is to truncate or comment out build.sh from right before it calls pwre create_all_modules until the end, and then source it.

Beware, though: after doing that in an ssh session, any press of TAB in bash exits the subshell (which can lead to a disconnect or to the terminal closing on you).
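The truncation step described above can be automated with a few lines of Python. This is a minimal sketch: it assumes the marker appears literally on one line of build.sh, and the function name is made up for the example.

```python
def env_prelude(script_text, marker="pwre create_all_modules"):
    """Return the part of a script that precedes the first line containing `marker`."""
    kept = []
    for line in script_text.splitlines(keepends=True):
        if marker in line:
            break
        kept.append(line)
    else:
        # The loop ended without finding the marker: refuse to guess.
        raise ValueError("marker not found: %r" % marker)
    return "".join(kept)
```

The returned text can be written to a separate file and sourced from a shell, which leaves the original build.sh untouched.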

Compiler standards

It turns out that for ProviewR to compile on an arbitrary SlapOS system, one needs to ensure that -std=gnu++11 is set in variables.mk. Otherwise either extensions or some language features don't work, resulting in build errors.

SlapOS Long Paths vs ProviewR strcpy()

As Nexedi's Jerome Perrin has said, 98% of problems with SlapOS come from its long (up to ~300 characters) paths.

Among all obstacles with integrating ProviewR, long paths produced the highest number of errors. This was due to the fact that ProviewR code puts strict (by SlapOS standards) limits on buffers meant for file paths.

The exact number of allowed bytes varies from place to place. Because of this, and because ProviewR rarely checks bounds when using strcpy(), there were a lot of buffer overflows, and a big part of porting ProviewR to SlapOS was coping with segfaults and other memory issues.

There is a centralized maximum file path size constant defined in the pwr.h header (which is included in most source files, directly or indirectly). However, different files and sections, such as the co_dcli module (command-line instruments), can have their own limits. These are not always direct: for example, when raising the maximum file path length, the maximum command length used by co_dcli should also be increased.

Making file path limits uniform would most probably solve the SlapOS problems. But considering the size and complexity of ProviewR, I concluded that the changes needed to make its file path limits uniform would be far too invasive and hard to test. Besides, I don't know exactly which design considerations are in place, so the chances of such a change being accepted upstream could be low.

So I took my trusty gdb and investigated the segfaults case by case. There were a lot of them, and looking for root causes was quite tiring, so eventually I enabled AddressSanitizer (ASan) in the compiler options to catch invalid reads and writes on the fly.

However, enabling ASan made the code stop at numerous, more subtle memory errors, most of which were not related to SlapOS. Basically, if one wants to use ASan, one needs to be ready to make ALL of the sources relatively memory-clean.

I did a bit of that (since problems found by ASan are usually quite easy to hotpatch), but it looked like it would take a lot of time. Since ProviewR uses parts of itself during compilation, it also meant that one would not even be able to compile until the code was clean. So before digging any further, I wanted to make sure that paths were as short as possible and that I would encounter the smallest possible set of problems.

Regarding path shortening, my predecessor Alain made a special version of the ProviewR source archive with a shortened top directory name. That was not enough, and it was not really convenient since one needed to rezip and reupload the sources each time. On the other hand, with plain GitHub/GitLab archives the top directory name can be quite long (especially if you want a specific commit from the ProviewR stable branch), and moreover it differs between scenarios (such as an archive of a branch versus an archive of a commit).

So I modified the slapos.recipe.cmmi Buildout recipe so that it can strip the top directory on demand. This was an easy job because cmmi uses slapos.recipe.build for the actual download, which already has that ability; it just needed to be exposed.
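What stripping the top directory amounts to can be sketched with Python's tarfile module. This is an illustration of the technique only, not the code of slapos.recipe.build; the function name is made up, and archives whose member names start with "./" or that contain hard links are not handled.

```python
import os
import tarfile


def extract_without_top_dir(archive_path, dest):
    """Extract a tar archive into `dest`, dropping the first path component."""
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        members = []
        for member in tar.getmembers():
            _top, _sep, rest = member.name.partition("/")
            if not rest:
                continue  # the top directory entry itself
            member.name = rest  # e.g. "proview-<commit>/src/x" -> "src/x"
            members.append(member)
        tar.extractall(dest, members=members)
    return [m.name for m in members]
```

Whatever the archive's top directory is called (a branch name, a full commit hash), the sources end up directly under dest, so the build path no longer depends on it.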

After that, segfaults during compilation subsided and I turned off ASan, perhaps to return to it later (since one should certainly not leave memory errors like that).

I was able to make it to a successful build without turning ASan back on.

Required Tweaks for Running

OK, so finally one can rejoice: it COMPILES! Running ProviewR for real, though, needs yet another set of changes, mainly to the ProviewR init scripts and the SlapOS deployment scripts.

The SlapOS directory structure means that software that was not designed to be relocatable from one directory to another may need some modifications to run on SlapOS. This includes hardcoded paths that need to be made configurable.

pwrp user and sudo

Since ProviewR is web-enabled software with some security considerations in place, it comes with its own user, pwrp, just like PostgreSQL comes with postgres.

Among other things, this user plays a role in the distribution of ProviewR runtime files. When ProviewR wants to have a package delivered to a remote node to start a runtime there, it connects over ssh as the pwrp user on the remote machine.

In SlapOS, this is not allowed. The need for pwrp was circumvented by making the SlapOS-generated user go through an init similar to that of pwrp (Alain did this part). And we do not consider doing package distribution over ssh.

Curiously enough, many ProviewR systems tolerate the absence of a user named pwrp. For example, the graphical interfaces of Workbench can launch without it, even though pwra is an alias for 'wb -p pwrp pwrp'.

Also, runtime packages contain their own unpacking script, which uses sudo to manage permissions. This was circumvented by cutting away the tail of the unpacking script of any incoming package, so that vanilla ProviewR-generated packages are still valid for use with a SlapOS-hosted runtime.

Containing an Instance in a Folder and adding Remote Access

The modifications to pwrp_profile (one of the ProviewR init scripts) made by Alain included redefining the paths in the pwr environment variables pwrb_root, pwrp_root, pwra_db, pwrp_web and their descendants so that they all point to folders inside a SlapOS-designated directory. This makes it possible to launch the pwra (ProviewR administration) interface. Alain also made SlapOS provide a Shellinabox web shell and a noVNC web VNC interface.

I made it so that one can also feed the instance runtime packages (from a predefined place for now, though this can be made configurable rather easily), and it will install and launch them automatically using pwr_pkg.sh and rt_ini.

Shared Pools for Databases

Even if ProviewR runs, not everything is over.

SlapOS needs to handle situations with many instances of ProviewR running in parallel on the same machine, or at the very least many runtimes (making graphics work is of lesser priority).

So after Xorg compile-time worries, here come runtime run-time worries.

The first of these comes from the fact that the ProviewR runtime is a multi-process system. It uses the System V interface to create memory segments that are shared between its processes, and for this it employs ftok() and shmget().

In this scheme, for two unrelated processes to find a region of shared memory to use together, they use an arbitrary file accessible via the filesystem as a beacon.

ftok() deterministically calculates a key from the file's stat info and a project id. Two processes calling ftok() on the same file with the same project id get the same shared memory key. This key is then passed to shmget(), and a process can then attach the shared memory to its address space to use it. The file itself is not changed in this process, and its contents are irrelevant.
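The key derivation can be mimicked in a few lines of Python. The sketch assumes the formula used by glibc's ftok() (low 16 bits of the inode number, low 8 bits of the device number, low 8 bits of the project id); other C libraries may combine the fields differently, and the function name is made up for the example.

```python
import os


def ftok_like(path, proj_id):
    """Compute a System V IPC key following the glibc ftok() formula.

    Assumption: glibc layout. Only stat() metadata is used, so the
    contents of the file play no role at all.
    """
    st = os.stat(path)
    return (st.st_ino & 0xFFFF) | ((st.st_dev & 0xFF) << 16) | ((proj_id & 0xFF) << 24)
```

Any two processes that stat the same file with the same project id arrive at the same key, regardless of who started them, and rewriting the file's contents does not change the key.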

The ProviewR runtime uses quite a few shared segments (10+), mainly for databases, which in ProviewR use pools for allocation.

Databases shared in this manner are gdb (global database), which contains classes and objects, and qdb, which contains message queues and other communication info for Qcom. There is also rtdb, a real-time database, which contains live values throughout a running PLC program (such as IO values). Multiple processes need access to the same memory pools where the databases are, and database locks are also shared.

The problem for SlapOS, however, is that the corresponding token files are all in /tmp/, and this path is hardcoded with #defines.

Luckily, there are ready-made global structures associated with these databases, so one can generate the needed paths from the TMPDIR environment variable when these structures are initialized, and they are then usable from anywhere.
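The actual change lives in ProviewR's C code; the Python sketch below only models the idea of deriving the token file location from TMPDIR instead of a hardcoded /tmp. The function name, the file name and the directory layout used in the example are hypothetical.

```python
import os


def token_path(name, environ=None, default_dir="/tmp"):
    """Place a token file under $TMPDIR, falling back to /tmp when it is unset."""
    environ = os.environ if environ is None else environ
    base = environ.get("TMPDIR") or default_dir
    return os.path.join(base, name)
```

Two instances started with different TMPDIR values get different token files, hence different inodes and different keys, so their shared segments no longer coincide.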

Unfortunately, this change is not fully compatible with VMS if paths are long (which they are on SlapOS). But then, SlapOS itself is not VMS-compatible, and even ProviewR's default path for some files used in this manner is already longer than the 15-byte limit of VMS's $ASCEFC. I don't know whether ProviewR still pursues VMS compatibility.

Binding runtime servers to an IP address

And now, the (hopefully) final stand: the servers that the runtime launches in order to be controllable remotely and to report its status.

These two servers are qmon and statussrv.

qmon brokers Qcom messages for a runtime, and statussrv provides basic info and simple controls, such as the ability to restart.

However, they currently cannot bind to an arbitrary IP address, an ability SlapOS needs in order to separate traffic going to different runtimes running on the same machine.

Hotpatching this ability into ProviewR and achieving mass deployment of ProviewR will be the next step of my work.
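For readers less familiar with the issue, here is what binding to one specific address looks like in general. qmon and statussrv are C programs, so this Python sketch says nothing about how they will be patched; the function name is made up for the example.

```python
import socket


def listen_on(address, port, backlog=16):
    """Return a TCP socket listening on one specific local address (IPv4 or IPv6)."""
    family, socktype, proto, _canon, sockaddr = socket.getaddrinfo(
        address, port, type=socket.SOCK_STREAM, flags=socket.AI_NUMERICHOST
    )[0]
    sock = socket.socket(family, socktype, proto)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(sockaddr)  # a specific address, not the wildcard
        sock.listen(backlog)
    except OSError:
        sock.close()
        raise
    return sock
```

A server bound to the wildcard address occupies its port on every address of the machine, whereas servers bound like this can reuse the same port number as long as each one has its own address, which is what SlapOS hands out per instance.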

Promises

As a preliminary simple test of whether the ProviewR runtime starts, SlapOS checks that statussrv is up and listening on port 18084. This promise also runs on test nodes, so regressions of the runtime are partially monitored. There is no in-depth periodic testing so far.
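The essence of such a check fits in a few lines. Real promises are written against SlapOS's own promise framework, which this sketch does not use; it only assumes that a successful TCP connection is a good enough sign of life, and the function name is made up for the example.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the runtime described here, the call would be is_listening(instance_address, 18084), where instance_address stands for whatever address the instance was given.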

Thoughts and Future Directions

Ok, this was quite a story, so what's next?

As I mentioned, next on the table is mass parallel deployment of runtimes (mainly to show it is possible). But then?

We don't really know yet what would be the best way to continue with ProviewR. To optimize future efforts, I personally would try (in no particular order):

  • to shrink ProviewR distribution on SlapOS as far as possible;
  • to enable/expand ProviewR's original tests on test nodes;
  • to complete ASan-assisted cleaning of memory access;
  • to adopt industry-standard interfaces between ProviewR components, if any exist and if possible (for example to make it possible to use the ProviewR PLC engine with other IDEs, if at all feasible);
  • ... and along the way to make the build process independent from Xorg;
  • to add support for OPC-UA;
  • to at least somewhat verify the runtime code with Frama-C.

In the next articles on the topic I will try to give more details on the inner workings of ProviewR so that readers can form their own opinion of what would be cool to do.

That was a long one. Thanks for reading.

Contact

  • Dmitry Blinov
  • dmitry (dot) blinov (at) nexedi (dot) com
  • Jean-Paul Smets
  • jp (at) rapid (dot) space
  • Jean-Paul Smets is the founder and CEO of Nexedi. After graduating in mathematics and computer science at ENS (Paris), he started his career as a civil servant at the French Ministry of Economy. He then left government to start a small company called “Nexedi” where he developed his first Free Software, an Enterprise Resource Planning (ERP) designed to manage the production of swimsuits in the not-so-warm but friendly north of France. ERP5 was born. In parallel, he led with Hartmut Pilch (FFII) the successful campaign to protect software innovation against the dangers of software patents. The campaign eventually succeeded by rallying more than 100.000 supporters and thousands of CEOs of European software companies (both open source and proprietary). The Proposed directive on the patentability of computer-implemented inventions was rejected on 6 July 2005 by the European Parliament by an overwhelming majority of 648 to 14 votes, showing how small companies can together in Europe defeat the powerful lobbying of large corporations. Since then, he has helped Nexedi to grow either organically or by investing in new ventures led by bright entrepreneurs.