Anonymous stats collection from VyOS routers: we need your opinions!
Posted 29 May, 2020 by Daniil Baturin
The T-word. Telemetry. Everyone hates it, not least because the most widely known implementations are unambiguously evil. We hate collection and sale of personal information as much as you do, so before we dive deeper into the subject, let’s set the record straight: we will never collect any personally identifiable information from VyOS devices without explicit consent, ever.
However, not collecting anything at all limits our ability to prioritize issues correctly. We are now considering including anonymous stats collection in VyOS images (rolling and LTS alike) by default, and that needs a broad and serious discussion with the community. Our preliminary plan is to enable collection of basic usage stats by default and to provide CLI options for more detailed reporting. For that, we need to know what people wouldn't hesitate to share by default, so we have included a poll in this post.
There are many questions that can only be answered with usage data on hand. For example, should we treat an issue with a particular network card model as a priority? Can we safely drop support for some old hardware? Should we put effort into improving support for something?
There are also many questions that cannot be answered that way, but usage stats provide a way to ask better questions. For example, if no one is using a certain feature, is it because no one wants it, or because something is broken there? “Why are you not using a feature?” is a better question than “Are you using it?” because it will give us the answers we need.
We can run polls, but the usual 90-9-1 rule still applies, and for VyOS it’s more like 99-0.9-0.1 because it’s a plumbing project that most people don’t interact with on a daily basis. A web browser, or any other program that people interact with frequently, has many more chances to engage its users in a conversation. Routers usually just do their job in the background, and people may not even log in to them unless there’s a problem or they need to make a configuration change.
So, data collection has to be opt-out in order to be useful. Since many image flavors don’t even have an installer, there’s no chance to ask users whether they want to opt in; they would have to find the commands to enable it in the documentation. We are sure many people would, but those are the active community members we already hear from all the time. We are after the people we never hear from.
If it’s going to be opt-out, we have to make sure there’s no chance to identify individual routers from that data. We also intend to make the complete data set public so that every community member can do their own research with it, which requires an even higher anonymity standard.
Thus the design and implementation of usage stats collection require a serious discussion with the community. We cannot just push a design without making sure it has no unfortunate implications: we need as many eyes on it as possible.
Which of the following data would you agree to anonymously share with us?
While we want to make the statistics completely anonymous, we also need a way to correlate data coming from the same VyOS installation.
We can say for certain that no private information such as addresses, hostnames, or password hashes will be transmitted.
However, for the stats to be useful, we need a way to check if a report is coming from a new or already known installation. That means we need a way to associate reports with routers without revealing anything about them. That’s the hard part.
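One commonly used way to tell new installations from known ones without deriving anything from the router itself is to generate a random identifier once per installation and send only that. The Python sketch below illustrates the idea under stated assumptions; it is not VyOS code, and the function names, the ID file location (passed in by the caller), and the report layout are all hypothetical.

```python
import json
import secrets
from pathlib import Path


def get_installation_id(id_file: Path) -> str:
    """Return a random per-installation ID, creating it on first use.

    The ID comes from a cryptographically secure random generator, not
    from MAC addresses, serial numbers, or hostnames, so it carries no
    information about the router or its owner.
    """
    if id_file.exists():
        return id_file.read_text().strip()
    new_id = secrets.token_hex(16)  # 128 random bits, 32 hex characters
    id_file.write_text(new_id)
    return new_id


def build_report(id_file: Path, stats: dict) -> str:
    """Serialize a hypothetical report: the opaque ID plus coarse stats."""
    report = {"installation_id": get_installation_id(id_file), "stats": stats}
    return json.dumps(report, sort_keys=True)
```

This only addresses part of the problem: a stable ID still links every report from the same router together, so the stats themselves must not be identifying in combination, especially if the full data set is published.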
So far we are just gauging what people think is appropriate and what isn’t. Please complete this poll to let us know!
Note that we are talking about data collection built into the VyOS image itself. When we get to the multi-router management UI, it will come with its own set of data it has to collect, and, of course, what it shares with us will be explicitly configurable.