Everything you should know about federated machine learning on risk-based authentication
Federated machine learning and its impact on risk-based authentication is an important topic, especially after the online boom we experienced in 2020. To explain what this means and how it can be translated into our everyday life, we interviewed Markus Ruppert, an experienced computer engineer who has dedicated his professional life to protecting digital identities and supporting KOBIL Systems on its mission.
1. What is federated learning?
Federated machine learning (FML) is an advanced technology based on machine learning.
Classic machine learning has some disadvantages that make it hard, or even impossible, to use under special conditions.
In general, machine learning needs a large amount of data to train a model. In addition, the data sets have to be classified one by one to teach the machine the expected results.
The artificial intelligence community calls the classification process labelling.
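As a minimal illustration of labelling (all feature values, label names, and the centroid classifier here are invented for the example), each training sample is paired with its expected answer, and a model is trained from those pairs:

```python
# Labelled data: each sample (feature vector) carries its expected answer.
labelled_data = [
    # ([feature_1, feature_2], label) -- values are purely illustrative
    ([0.9, 0.3], "cat"),
    ([0.8, 0.2], "cat"),
    ([0.2, 0.9], "dog"),
    ([0.1, 0.8], "dog"),
]

def train_centroids(data):
    """Learn one centroid (average feature vector) per label."""
    sums, counts = {}, {}
    for features, label in data:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lbl: [v / counts[lbl] for v in vec] for lbl, vec in sums.items()}

def predict(centroids, features):
    """Answer with the label whose centroid is closest to the input."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda lbl: dist(centroids[lbl]))

model = train_centroids(labelled_data)
print(predict(model, [0.85, 0.25]))  # close to the "cat" samples -> cat
```

Without the labels, the same feature vectors would tell the machine nothing about what answer is expected; that is why the labelling step is unavoidable in classic supervised learning.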
2. What should a company have in mind if it is aiming to reduce the risk of cyberattacks?
To reduce cyberattack risks, companies should first of all apply the common IT security mechanisms to the FML model and to the data and systems used to build it.
- Use only information and software from well-known and approved sources
- Protect software and configuration data against manipulation (role-based access rights, integrity protection, malware protection tools)
- Split your IT infrastructure into well-defined and independently protected parts (development, test, production, and so on).
- Protect the FML model and the data used to teach the system.
To build federated machine learning models, a large amount of data is needed to teach the system in the initial phase, so FML only fits problems that offer enough data. The good news is: the more data, the better. In information systems that produce large amounts of data, FML can be used to full advantage.
Big data is hard or even impossible to analyse with a classic algorithmic approach built from models that the human brain can fully understand. It is comparable to the finite element methods that have been used for decades in the engineering and analysis of technical products: before them, analytic mathematical models were used that are hard to understand and offer only a very limited fit to real-world problems.
To get back to the point of cyberattacks: a huge amount of data offers lots of opportunities for ‘poisoning’!
The term poisoning is used for the enrichment of learning data with incorrect information.
Let’s give a simple real-world example:
Imagine a prank is being played on a young child who is learning to speak.
An older child, who can already speak, points to a dog and says cat. Later it points to a cat and says dog. This is how the little child learns the terms.
When the little child later sees a dog and calls it a cat, and vice versa, its parents are very surprised. They may even consider taking the child to a psychologist.
The child’s brain was poisoned with wrong information.
Poisoning can be used to change the expected behaviour of FML models, and even to produce almost undetectable back doors in machine learning models.
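To make poisoning concrete, here is a toy sketch (the data, the `poison` helper, and the nearest-neighbour model are purely illustrative): flipping labels in the training data is enough to make a simple model give the wrong answer, exactly like the pranked child above:

```python
# Label-flipping poisoning: the features stay correct, but an attacker
# swaps the labels before training. All values here are invented.
clean = [([0.9, 0.3], "cat"), ([0.8, 0.2], "cat"),
         ([0.2, 0.9], "dog"), ([0.1, 0.8], "dog")]

def poison(data):
    """Swap every cat/dog label, like the older child in the example."""
    flip = {"cat": "dog", "dog": "cat"}
    return [(features, flip[label]) for features, label in data]

def nearest_label(data, query):
    """1-nearest-neighbour: answer with the label of the closest sample."""
    def dist(sample):
        return sum((a - b) ** 2 for a, b in zip(query, sample[0]))
    return min(data, key=dist)[1]

query = [0.85, 0.25]                        # looks like the cat samples
print(nearest_label(clean, query))          # -> cat
print(nearest_label(poison(clean), query))  # -> dog: taught wrong on purpose
```

A real backdoor attack is subtler: only a few carefully chosen samples are flipped, so the model behaves normally except on inputs the attacker controls, which is why such manipulation can be almost undetectable.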
For that reason, the source of information is extremely security-relevant for machine learning. Even data collected by big internet companies may be infected.
Ideally, you should produce the data yourself and have the process and the data under your own control.
FML offers advantages over classic machine learning. It counteracts poisoning because it aggregates models from different sources. An attacker would have to manipulate many sources in order to successfully attack the model.
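A minimal sketch of this aggregation idea, assuming a simple setup where each source submits a trained parameter vector and the aggregator combines them. The coordinate-wise median shown here is one well-known robust aggregation rule, not necessarily the one any specific FML product uses:

```python
import statistics

def fed_mean(client_weights):
    """Plain averaging of the parameter vectors from all clients."""
    return [statistics.mean(col) for col in zip(*client_weights)]

def fed_median(client_weights):
    """Coordinate-wise median: a simple robust aggregation rule."""
    return [statistics.median(col) for col in zip(*client_weights)]

# Illustrative parameter vectors: four honest sources and one attacker
honest = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [1.0, 2.0]]
attack = [[-5.0, 10.0]]   # a single poisoned contribution

print(fed_mean(honest + attack))    # one attacker still skews the mean
print(fed_median(honest + attack))  # the median stays near the honest values
```

The point of the sketch: with a robust aggregation rule, a single poisoned source barely moves the combined model, so an attacker would need to compromise many sources at once, which matches the argument above.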
3. Do IoT manufacturers pay enough attention to security problems? What are they doing wrong?
The great security risk of IoT products arises from their enormous quantity, which is increasing every day. Manufacturers want to produce cheaply in order to sell a lot, and customers want everything for free. But in the end, everyone has to pay dearly, sometimes even directly.
To address all relevant problems of an IT system — including IoT — a risk analysis is needed. To do this, consideration must be given to where the product is used.
That is why there are testing standards for professional products (for example, medical products must be proven safe, so they do not harm the patient).
4. Can we detect attacks before the infection stage?
Of course, FML will not act like the Oracle of Delphi. But it does offer the possibility to detect even fine-grained changes in system behaviour. Detecting such changes may lead to more attention and analysis.
Attacks aimed at immediately worthwhile targets are typically limited to parts of the infrastructure. In these cases, a comparison between the models of the different parts can reveal anomalies.
As part of a research project, KOBIL plans to develop a platform and mechanisms that will help both the development of AI models and the analysis of changes and abnormalities.
One of the major problems of AI is gaining a deeper understanding of why a model behaves the way it does, and identifying the cause of certain results it delivers.
An AI system provides answers based on learned patterns. In simple systems, these answers are essentially based on correlated probabilities.
There is a well-known example: an AI model that has learned to recognize the digits 0 to 9 in pictures will recognize a digit even in a picture that contains no number at all.
In order for the system to work more reliably, you must teach it that there are also pictures without numbers.
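One common way to achieve this, besides adding explicit “no number” pictures to the training data, is to let the model reject low-confidence answers instead of forcing a digit. A hedged sketch with invented scores (the logits and the 0.6 threshold are illustrative assumptions):

```python
import math

def softmax(logits):
    """Turn raw model scores into probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, labels, threshold=0.6):
    """Reject low-confidence answers instead of forcing a digit."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "no digit"
    return labels[best]

labels = [str(d) for d in range(10)]
print(classify([0.1] * 9 + [4.0], labels))  # one dominant class -> "9"
print(classify([0.5] * 10, labels))         # uniform scores -> "no digit"
```

Without such a rejection option (or a dedicated “no number” class), the softmax always nominates some digit as the winner, which is exactly the failure mode described above.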
When detecting anomalies, you want to know what the cause of the detection is. If you know the cause, you can easily decide whether the cause can be considered ‘normal’ or whether the cause must be classified as a potential danger.
Today, you cannot directly deduce the cause from the answers of a neural network (model). The best thing you can do is to merge the knowledge of different models, to aggregate the models into one model.
FML works in a similar way to democratic elections: everyone is allowed to contribute their opinion and their decision equally, and the vote is repeated regularly.
The fact that it is so far not possible to infer the cause directly from the answer also has a positive aspect: it makes attacks on AI models more difficult.
For example, the software of mobile phones is constantly being changed by updates. The behaviour of the device also changes depending on the apps a user installs and uses. A simple static AI model that monitors cell phone anomalies would be useless in such a dynamic environment, because it would produce a lot of false reports after a short time.
5. How can risk-based authentication help?
Risk-based authentication may help to reduce the costs of protection and to lower the overall risk.
Risk-based authentication simply enforces a more secure authentication mechanism if the risk seems to be higher. The associated paradigm is very old. In most cases, the lock on the front door is more secure than the locks on the room doors in the apartment.
In big buildings with many doors, it is sometimes hard to decide what risk a door or lock should protect against, just as it is for a device connected to the internet.
KOBIL built a system to detect indirect risks — weak walls and signs of tunnels being dug.
The system collects parameters called risk indicators.
The main problem in the past has been to infer the potential risk level from all of these parameters. Nobody knew how to interpret a certain combination of risk parameters.
KOBIL’s FML solution solved this problem for mobile devices. An FML model trained to interpret risk indicators and their correlations leads to a system that is able to distinguish between low- or normal-risk and high-risk environments.
This helps the service provider using the system to apply the adequate authentication mechanisms.
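A toy sketch of such step-up logic (the indicator names, weights, and thresholds are illustrative assumptions, not KOBIL's actual risk model, where an FML model rather than fixed weights does the interpretation):

```python
# Hypothetical risk indicators with hand-picked weights. In the real
# system described above, an FML model learns how indicators correlate.
RISK_WEIGHTS = {
    "rooted_device": 0.5,
    "new_location": 0.2,
    "tampered_app": 0.6,
    "unusual_time": 0.1,
}

def risk_score(indicators):
    """Sum the weights of all risk indicators observed for this login."""
    return sum(RISK_WEIGHTS.get(name, 0.0) for name in indicators)

def required_auth(indicators):
    """Map the risk score to an authentication mechanism (step-up)."""
    score = risk_score(indicators)
    if score >= 0.6:
        return "deny"          # too risky: block and alert
    if score >= 0.2:
        return "password+otp"  # elevated risk: demand a second factor
    return "password"          # normal risk: plain login is enough

print(required_auth([]))                                  # password
print(required_auth(["new_location"]))                    # password+otp
print(required_auth(["rooted_device", "tampered_app"]))   # deny
```

The step-up principle is the same as the front-door analogy above: the stronger (and more expensive) lock is only enforced where the assessed risk is higher.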
For more please visit: https://www.kobil.com/blog
Everything you should know about federated machine learning on risk-based authentication was originally published in KOBIL on Medium.