VyOS Platform Blog

Building an open source network OS for the people, together.

Is VyOS CLI fast yet?

Posted 8 May, 2020 by Daniil Baturin

Many people noticed that everything was getting slower in the rolling release builds.

Boot time, commit time, and even the time it takes to enter set commands got uncomfortably slow, and for a time it was only getting worse.

We were neither blissfully unaware of the problem nor intentionally ignoring it. We knew about it, and we hated it as much as you did. In fact, we considered it a blocker for the 1.3 branch freeze.

The good news is that the worst of it is over: we’ve found a way to make everything almost as fast as in 1.2.x. If you download a rolling release image built after May 8th, the worst performance problems should be gone for most configs. If your boot time is still as bad as before, let us know! And if this change broke your config, let us know too.

If you are interested in the details, read on.

A lot of early VyOS development was essentially firefighting. There were many changes in the Linux kernel and the GNU/Linux ecosystem as a whole that were, by themselves, either positive or mixed, but for the code we inherited from Vyatta, it was as if every rug were being pulled out from under our feet.

First, we lost the rug that really tied the room together: kernel support for UnionFS. Union mounts are how config sessions work, and the config backend relies on subtle behaviour details, so it couldn’t easily be switched to OverlayFS. We switched to unionfs-fuse instead, and that was the first performance hit: a userspace implementation is inevitably slower.
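To make the idea of a union-mounted config session more concrete, here is a minimal in-memory sketch of the same semantics. It assumes a flat mapping from config paths to single values (real VyOS configs are trees, and the real sessions were directory trees under a union mount, not Python objects). The running config is the read-only lower layer, the session's changes go to a writable upper layer, and deletions are recorded as "whiteouts" that hide lower entries, much as union filesystems mark deleted files. `ConfigSession` and its methods are hypothetical names chosen for illustration.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical flat model: a config path is a tuple of node names
# mapped to a single string value. Real VyOS configs are trees.
Path = Tuple[str, ...]


class ConfigSession:
    """A union-mount-like view: a writable session layer over the running config."""

    def __init__(self, running: Dict[Path, str]) -> None:
        self.running = running              # lower layer, never modified by the session
        self.changes: Dict[Path, str] = {}  # upper layer: values set in this session
        self.whiteouts: Set[Path] = set()   # paths deleted in this session

    def set(self, path: Path, value: str) -> None:
        self.whiteouts.discard(path)        # re-setting a deleted path un-hides it
        self.changes[path] = value

    def delete(self, path: Path) -> None:
        self.changes.pop(path, None)
        self.whiteouts.add(path)            # hide any value in the lower layer

    def get(self, path: Path) -> Optional[str]:
        if path in self.whiteouts:
            return None
        if path in self.changes:
            return self.changes[path]
        return self.running.get(path)

    def merged(self) -> Dict[Path, str]:
        """The config a commit would produce: upper layer wins, whiteouts hide lower entries."""
        result = {p: v for p, v in self.running.items() if p not in self.whiteouts}
        result.update(self.changes)
        return result


running = {("system", "host-name"): "vyos", ("service", "ssh", "port"): "22"}
session = ConfigSession(running)
session.set(("system", "host-name"), "router1")
session.delete(("service", "ssh", "port"))
# The running config is untouched until the session's changes are committed.
print(session.merged(), running)
```

The code that depends on such a layer also depends on its edge cases (what a deleted-then-re-set path looks like, how lookups fall through), which hints at why swapping one union filesystem for another was not a drop-in change.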

The limitations of the old approach, with the config tree represented as a directory tree, were already evident by then. It was equally obvious that Perl, and command definitions stored as sprawling directory trees of tiny node.def files, were dead ends too.

Thus was born the long-term plan to replace the config backend with one that uses in-memory data structures for the config, corrects unfortunate design decisions, and can be used from code in many languages. The only problem is that all legacy code has to be rewritten before that is possible; otherwise we would need a compatibility layer that lets the legacy code run on the new backend, which would mean replicating all the design decisions we want to change.
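As an illustration of what "in-memory data structures for the config" can mean, here is a minimal Python sketch of a config tree held as nested dictionaries, with leaf nodes storing lists of values so that multi-value nodes such as addresses fit naturally. It is not the VyOS backend (whose prototype is written in OCaml); the representation and the `set_value`/`delete_node` names are hypothetical.

```python
import json
from typing import Any, Dict, List

# Hypothetical representation: inner nodes are dicts, leaf nodes are lists of values.
ConfigTree = Dict[str, Any]


def set_value(tree: ConfigTree, path: List[str], value: str) -> None:
    """Create missing intermediate nodes and add value to the leaf node at path."""
    node = tree
    for name in path[:-1]:
        node = node.setdefault(name, {})
        if not isinstance(node, dict):
            raise ValueError(f"{name!r} is a leaf node, not a container")
    leaf = node.setdefault(path[-1], [])
    if not isinstance(leaf, list):
        raise ValueError(f"{path[-1]!r} is a container, not a leaf node")
    if value not in leaf:  # setting an existing value again is a no-op
        leaf.append(value)


def delete_node(tree: ConfigTree, path: List[str]) -> None:
    """Remove the node at path together with its subtree; missing paths are ignored."""
    node = tree
    for name in path[:-1]:
        node = node.get(name)
        if not isinstance(node, dict):
            return
    node.pop(path[-1], None)


config: ConfigTree = {}
set_value(config, ["interfaces", "ethernet", "eth0", "address"], "192.0.2.1/24")
set_value(config, ["interfaces", "ethernet", "eth0", "address"], "2001:db8::1/64")
set_value(config, ["system", "host-name"], "vyos")
delete_node(config, ["system"])
# With the config already in memory, a dictionary or JSON export is nearly free.
print(json.dumps(config, indent=2))
```

Compared with one file per node on a union-mounted filesystem, every operation here is a few dictionary lookups, with no file system calls at all.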

Instead, we chose to identify a good subset of the old APIs and behaviour and create a compatibility layer that lets us write new code as if the new config backend were already there. When the last bit of legacy code is gone, we can swap the config backends.

That’s why we started the big rewrite. It moved command definitions into XML files that are checked against a schema at build time, so obviously malformed definitions fail the build. It also made it possible to write config scripts in Python, which brought more contributors within a year than the project had in its entire history.

That’s all great, but we are still faking it: writing code as if the new backend already exists, even though it doesn’t. In reality, the new XML and Python APIs are still a compatibility layer for the old backend.

And, frankly, a lot of that compatibility layer was written hastily as a proof of concept, because no one knew whether the idea would work. It did work, but the more code we migrated to the new model, the more limitations became apparent. Some abstractions aren't free, and initially we didn't even make a serious effort to make them cheap.

The biggest offender that made the CLI so slow? The wrapper for value validation scripts. It was a Python script, and since it is called for every set command, interpreter startup time added up quickly. The fix was to rewrite it, along with a widely used numeric validator, in OCaml: a natively compiled language without a startup time penalty. That code comes from the new backend prototype, so it’s not exactly new. The vyos.configtree library used by our migration scripts, the function for getting the config tree as a dictionary, and the JSON export for the HTTP API are also powered by code from the future backend. It just takes time to replace everything, but progress is happening.
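To give a sense of scale, the check a numeric validator performs is tiny. The sketch below is a hypothetical Python version; the function name and the `(low, high)` range format are illustrative and are not the interface of the actual VyOS validator. When a check like this runs in a freshly started interpreter for every set command, starting the process and the interpreter costs far more than the check itself, which is why moving this hot path to a natively compiled binary made such a difference.

```python
import re
from typing import Iterable, Tuple

# Hypothetical validator: accepts a plain decimal integer that falls inside
# at least one inclusive (low, high) range. Not the real VyOS interface.
_INTEGER = re.compile(r"-?[0-9]+")


def is_valid_number(value: str, ranges: Iterable[Tuple[int, int]]) -> bool:
    """Return True if value is a plain decimal integer inside any of the ranges."""
    # fullmatch rejects input that int() alone would accept, such as " 42" or "4_2".
    if not _INTEGER.fullmatch(value):
        return False
    number = int(value)
    return any(low <= number <= high for low, high in ranges)


print(is_valid_number("22", [(1, 65535)]))  # a valid port number: True
print(is_valid_number("0", [(1, 65535)]))   # outside the range: False
```

Matching with an explicit ASCII-digit pattern before converting keeps the accepted syntax strict, so the validator does not inherit the leniency of Python's `int()` parsing.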

Is that all? Of course not. There are still bottlenecks and things to speed up. But it’s a start, and we’ll get to the rest.
