
Open access
Published: 14 June 2021

Apply machine learning techniques to detect malicious network traffic in cloud computing

Amirah Alshammari (ORCID: orcid.org/0000-0001-5666-7427) &
Abdulaziz Aldribi

Journal of Big Data volume 8, Article number: 90 (2021)


Computer networks are targeted by many kinds of attacks every hour of every day, and these attacks have evolved to pose significant risks. New attacks and trends keep emerging, and they target every open port available on the network. Several tools are designed for this purpose, such as network mappers and vulnerability scanners. Recently, machine learning (ML) has become a widespread technique for feeding an Intrusion Detection System (IDS) so that it can detect malicious network traffic. The detection efficiency of an ML model depends largely on the quality of the dataset used to train it. This research proposes a detection framework with an ML model that feeds an IDS to detect network traffic anomalies. The detection model uses a dataset constructed from malicious and normal traffic. The main challenge of this research is extracting features that let the ML model learn about various attacks and distinguish anomalous traffic from regular traffic. The network traffic part of the ISOT-CID dataset is used to train the ML model. We added several significant column features and showed that they support the ML model in the training phase. The traffic part of the ISOT-CID dataset contains two types of features: the first is extracted from the network traffic flow, and the second is computed over a specific time interval. We also present a novel column feature added to the dataset and show that it increases detection quality. This feature depends on the rambling of the packet payload length in the traffic flow. The results and experiments produced by this research are significant and encourage us and other researchers to expand the work in the future.

Introduction

The last two years have seen some of the most widespread and severe cybersecurity attacks recorded against networks in different industries. Security specialists expect another record-breaking year of network breaches and data security risks; companies must make themselves aware of the latest threats in circulation to ensure their security countermeasures are up to par. Nine attack types account for the highest frequency in the Security First report cited in Garg et al. [9].

In network attacks, the attacker must know the active addresses, the network topology, and the available services. Network scanners can identify open ports on a system, whether TCP or UDP, since common services are tied to specific ports, and an attacker can send packets to every port [6]. TCP fingerprinting exploits how systems react to illegal packet formats: different vendors' TCP/IP stacks answer differently to such packets. The attacker can therefore determine the operating system by sending numerous combinations of illegal packet options, initiating a connection with an RST packet, or combining other odd and illegal TCP code bits. In this way the attacker learns whether a machine is running Linux, Windows, or any other operating system. This information helps to refine the attack and to search for weaknesses in specific services and systems [25]. In a DoS attack, for example, the attacker keeps network buffers or memory resources overloaded by sending massive traffic to a system on the network, overwhelming its capability to respond to legitimate users. The attacker does this by flooding systems with ICMP and UDP packets. The most popular packet flood attack takes advantage of the weakness in TCP's three-way handshake: it exhausts a server's capacity by leaving connections half-open. To consume bandwidth instead, an attacker would require a larger connection than the victim to cut off all service [5]. The network must be protected from such attacks; a robust IDS should be deployed in front of the network routers on the company side.

Recently, ML techniques have been used to train IDSs to capture malicious network traffic. The main idea of an ML-based IDS is to find patterns in a dataset and build the IDS on them. For the IDS to detect adequately, we need a real network traffic dataset and proper feature selection so that the model learns enough. Therefore, we propose a detection framework with an ML model that detects malicious traffic by relying on a dataset of network traffic attributes to feed the IDS, as illustrated in Fig. 1. The dataset, called ISOT-CID, was created by Aldribi et al. [2] and is described in detail in the methodology. The presented model is prepared, constructed, fitted, and evaluated in Python using Sklearn, Numpy, Matplotlib, and Pandas. Our model should be constructed and fitted in memory so that it listens to the features extracted from network traffic and predicts anomalies in real time. The contribution of our study consists of five things:

Extracting calculated network features (T-IN, T-OUT, APL, PV, TBP, and the novel Rambling) that help the IDS detect better. These six features, added to the dataset, are significant for producing a high-quality dataset suitable for training a machine learning model for anomaly detection.

Proposing a lightweight ML model that can feed an IDS in real time.

Evaluating how the calculated features provide the best classification accuracy, using cross-validation and split validation.

Providing a model that can be placed on a local network or in front of the internet router on the company side.

Detecting whether traffic is anomalous or normal.

Fig. 1 Detection framework

The remainder of this paper is organized as follows. The “Related work” section presents related work and lists similar studies. The “Detection framework (Our Approach)” section describes our framework as a complete solution for anomaly detection, including the machine learning model trained on a dataset constructed from raw network traffic data. The methodology and experimental results are presented in the “Methods” and “Results and analysis” sections, respectively. Finally, the discussion and the conclusion are presented in the “Discussion” and “Conclusions and future work” sections, respectively.

Related work

Because anomaly detection is a highly interesting research issue, there have been many exploration and examination efforts in this field. We briefly describe the most significant of them as related work, categorized by the kind of proposed solution.

Supervised learning

Parul and Gurjwar [23] used a decision tree classifier to train the IDS in a layered approach, which gave good results for each layer. They also used the random forest algorithm, which gave good results for every layer except the U2R attack layer, where the classification rate was very low. The authors suggest modifying the random forest to improve the result of the U2R layer. The proposed system used the KDDcup99 dataset, which has since been significantly enhanced in its newer release, called NSL_KDD.

Peng et al. [24] presented an IDS based on the decision tree classifier algorithm. The authors compared their results against multiple methods and tested the entire dataset rather than only 10% of it. The experimental results showed that the proposed IDS was effective. However, when the detection time of each method was compared, the decision tree's time was not the best for a guaranteed level of accuracy. The authors argue that the proposed IDS can be used in fog computing environments over big data. The proposed system was not tested as a real-time application. It also used the old KDD Cup 99 dataset, although a more recent version with significant improvements exists.

The paper by Anton et al. [3] showed that some ML anomaly detection algorithms, such as SVM and Random Forest, both of which are classifiers, performed well in detecting network traffic anomalies in business networks. The dataset needed for training these models was delivered by simulators [14]. The difficulty lies in producing sound, real data that matches the business environment where an anomaly detection model is to be applied. There are many opportunities to extend the proposed methods. Data from various sources can be collected, combined, and used to increase performance. Introducing context information into the anomaly detection process was promising and helped to increase accuracy.

Additionally, using deception technologies as sensors for anomaly detection could improve visibility into anomalous behavior. One of the essential requirements is capturing data from attacks that are specific to business applications. The analysis in that work employs only network-based features, which are present in the same form in home and office devices. The only main difference was the timing pattern, which is strongly correlated with attacks.

Manna and Alkasassbeh [15] presented a recent approach that used ML algorithms such as the J48 decision tree, random forest, and REP tree. The proposed technique used SNMP-MIB data to train an IDS to detect DoS attack anomalies that may affect the network. The classifiers and attributes were applied to the IP group. The results showed that the REP tree classifier yielded the highest performance on the IP group throughout. The average performance of the three classifiers was accurate enough for use in an IDS. However, the approach has the limitation that the dataset is extensive, which makes real-time use more challenging.

Unsupervised learning

Jianliang et al. [11] proposed applying the K-means clustering algorithm as the ML technique in intrusion detection. K-means was used to detect anomalous traffic and to partition a large data space efficiently, but it has many drawbacks related to cluster dependence. The authors therefore constructed the intrusion detection model using the k-Medoid clustering algorithm with some modifications. The algorithm specifies how the initial medoids are selected, and it was verified to be better than K-means for anomaly intrusion detection. The proposed approach has notable advantages over the existing algorithm: it largely overcomes the drawbacks of dependency on the initial centroids, dependency on the number of clusters, and irrelevant clusters. The proposed algorithm still needs to be investigated for its detection rate on root attacks and in a real-time environment.

Qiu et al. [26] presented GAD, a group anomaly detection scheme that pinpoints the subgroup of samples and the subgroup of features that together identify an anomalous cluster. The system was applied in network intrusion detection to detect Botnet and peer-to-peer flow clusters. The approach is intended to capture and exploit statistical dependencies that may exist among the measured features. Experiments on real-world network traffic data showed the advantage of the proposed system.

A novel Network Data Mining approach was proposed by Kumari et al. [12]. Their approach applies the K-means clustering technique to feature datasets extracted from flow instances. The training data are divided into clusters representing periods of anomalous and regular flow. Although the data mining process is moderately complex, the resulting cluster centroids are used to detect anomalies in new live monitoring data with a small number of distance calculations. This makes the detection method suitable for real-time detection as part of an IDS. Applying the clustering technique separately to different services, identified by their transport protocol and port number, enhances detection accuracy. The authors conducted an experiment using generated and actual flows. As they state, the approach needs several improvements, such as comparing clustering results for different K to determine the optimal number of clusters, considering other features such as the average flow duration, and considering different distance metrics.

Nikiforov [20] used a cluster-based technique to detect anomalies in virtual machines within both production and testing lab environments with reasonable confidence. Some improvements are needed to obtain even better results in testing environments. The model does not consider how the VM load depends on the time of day and the day of the week. For example, the night is usually a busy time because many automated tests run overnight in the testing infrastructure, and some tests run at the same time every day. Based on this, the following improvements to the model might be made: analyze a detected outlier against the same time of day over several preceding days; check whether the load is scheduled and planned; and divide the metrics used for analysis into business days and weekends, since the load might differ.

Cloud-based techniques

Mobilio et al. [17] presented cloud-based anomaly detection as a service, which uses the as-a-service paradigm common in cloud systems to give control over the anomaly detection logic. They reported early results with lightweight detectors, which indicate a promising way to better control that logic. They also discussed how to apply the as-a-service paradigm to the anomaly detection logic, and they proposed an architecture that supports the paradigm and can work jointly with any monitoring system that stores data in time-series databases. Early experiments with the Clearwater cloud system demonstrated that the as-a-service paradigm can handle the anomaly detection logic effectively. The approach is interesting because it integrates the as-a-service technology into real-time anomaly detection.

Moustafa et al. [18] proposed a Collaborative Anomaly Detection Framework, named CADF, for handling big data in cloud computing systems. They described the technical functions of this framework and how to deploy it in these environments. The proposed approach comprises three modules: capturing and logging network data, preprocessing these data, and a new Decision Engine that uses a Gaussian Mixture Model [10] and a lower–upper Interquartile Range threshold [16] to detect attacks. The UNSW-NB15 dataset was used to evaluate the new Decision Engine and assess its reliability when the model is deployed in real cloud computing systems, and the engine was compared with three ADS techniques. An architecture for deploying this model as Software as a Service (SaaS) was produced so that it can be installed easily in cloud computing systems.

An ensemble-based multi-filter feature selection method was proposed by Osanaiye et al. [22]. This method achieves an optimum selection by integrating the output of four filter methods. The proposed approach is deployed in cloud computing and used for detecting DDoS attacks. An extensive experimental evaluation of the proposed method was carried out using the intrusion detection benchmark dataset NSL-KDD and a decision tree classifier. The results show that the proposed method efficiently reduces the number of features from 41 to 13. It also achieves a high detection rate and classification accuracy compared to other classification techniques.

Barbhuiya et al. [4] presented a real-time ADS named RADS. RADS detects anomalies using a single-class classification model and a window-based time series analysis. The authors evaluated the performance of RADS by running lab-based and real-world experiments. The lab-based experiments were performed in an OpenStack-based Cloud data center hosting two representative Cloud applications, Graph Analytics and Media Streaming, taken from the CloudSuite workload collection. The real-world experiments were carried out on workload traces collected from a Cloud data center named Bitbrains. The evaluation results demonstrated that RADS can achieve 90–95% accuracy with a low false-positive rate of 0–3% while detecting DDoS and crypto-mining attacks in real time. The results also showed that RADS experiences fewer false positives with the proposed window-based time series analysis than with entropy-based analysis. The authors further evaluated the performance of RADS when training and testing in real time in a lab-based Cloud data center hosting between 2 and 10 VMs. The evaluation results suggest that RADS can be used as a lightweight tool that consumes minimal hosting-node CPU and processing time in a Cloud data center.

Zhang [28] presented multi-view learning techniques for detecting anomalies on a cloud computing platform by implementing the extensible ML model. The problem is formulated as a real-time pair classification task, which is trained by improving the ELM model over multiple features.

The presented technique automatically fuses multiple features from different sub-systems and attains an improved classification solution by reducing the training errors. Ranked anomalies are identified by the relation between the samples and the classification boundary, and the ranked samples are weighted to retrain the classification model. Through multi-view learning and feedback regulation, the proposed model efficiently deals with different challenges in anomaly detection, such as imbalanced distribution and high-dimensional features.

Deep learning techniques

Fernandez and Xu [8] presented a case study that uses a deep learning network to detect anomalies. The authors report excellent results in supervised network intrusion detection. They also showed that using only the first three octets of IP addresses can be an efficient way of handling dynamic IP addresses, demonstrating the strength of DNNs in the presence of DHCP. The approach showed that autoencoders can be used to detect anomalies when they are trained on expected flows.

Kwon [13] surveyed Recurrent Neural Network (RNN) and Deep Neural Network (DNN) approaches alongside ML techniques related to network anomaly detection. The authors also conducted local experiments showing the feasibility of the DNN approach to network flow traffic analysis. The survey investigated the effectiveness of DNN models in network flow traffic analysis by presenting experiments conducted with their FCN model. The approach shows encouraging results, with improved accuracy in detecting anomalies compared to conventional ML techniques such as SVM, random forest, and AdaBoost.

Garg et al. [9] presented a hybrid data processing model for network anomaly detection that leverages Grey Wolf Optimization (GWO) and a Convolutional Neural Network (CNN). The GWO and CNN training approaches were enhanced with improved exploration and initial population generation capabilities and with revamped dropout functionality. These extended variants are referred to as Improved-GWO and Improved-CNN. The proposed model runs in two stages. In the first stage, Improved-GWO is used for feature selection to attain an ideal trade-off between two objectives: reducing the error rate and minimizing the feature set. In the second stage, Improved-CNN is used to classify network anomalies. The authors evaluated the proposed model's efficiency with benchmark (DARPA'98 and KDD'99) and artificial datasets. Their results indicate that the proposed cloud-based anomaly detection model was superior to the other related works on network anomaly detection in terms of accuracy, detection rate, false-positive rate, and F-score. The proposed model shows an overall improvement of 8.25%, 4.08%, and 3.62% in detection rate, false positives, and accuracy, respectively, relative to standard GWO with CNN.

Feature extraction

Umer et al. [27] proposed a flow-based IDS that takes IPFIX/NetFlow records as input. Each flow record can have several attributes. Some of these attributes are passed to the classification model for the decision, while others are used in computation. Significant attributes such as the originating IP address and the destination port play an essential part in the detection decision of the proposed approach. The authors conducted feature selection to pick the relevant attributes needed to improve decision performance. They also preprocessed the flow records to convert them into a specific format acceptable to anomaly detection algorithms.

Nisioti et al. [21] presented a survey of unsupervised models for IDSs. The features of these models are extracted from different evidence sources, such as network traffic and logs from different devices and host machines. Unsupervised techniques are considered more flexible with respect to additional features extracted from different evidence sources, and they do not need regular retraining. The authors also proposed and compared feature selection methods for IDSs. The survey finds and uses the optimum feature subset for each class to decrease computational complexity and time.

Münz et al. [19] presented an anomaly detection model for network traffic that uses the K-means clustering algorithm. The proposed model takes captured hypervisor packets and composes them into a stream of packet flows according to operating system time. The model has two feature extraction phases. The first phase computes a primary feature vector from the header of each unique packet. The second phase extracts a separate feature vector for every packet flow from the primary feature vectors of the packets included in the flow.

Aldribi et al. [2] introduced a hypervisor-based cloud IDS that includes a novel feature extraction approach based on the activities of user instances and their related behaviors in the hypervisor. The proposed model is intended to detect anomalous behavior in the cloud by tracing statistical variations using a combination of gradient descent and the E-Div algorithm. A new intrusion detection dataset, gathered in a cloud environment, was introduced and made publicly available to researchers. The dataset involves multistage attack scenarios that permit developing and evaluating threat environments that rely on cloud computing. The authors conducted an experimental evaluation using the Riemann rolling feature extraction scheme and produced promising results. The dataset contains a number of communications over encrypted channels, for instance using protocols such as SSH.

Detection framework (our approach)

As shown in Fig. 1, the network traffic dataset consists of the unlabeled network flow attributes described in Aldribi et al. [2]. The dataset is extracted from network traffic over different periods and contains frame time, source MAC, destination MAC, source IP, source port, destination IP, destination port, IP length, IP header length, TCP header length, frame length, offset, TCP segment, TCP acknowledgment, in-frequency number, and out-frequency number. These network flow attributes can characterize packets as anomalous or normal. The formulas shown in Fig. 2 calculate the in-frequency number and, similarly, the out-frequency number. Three other vital features are added to the ISOT-CID dataset: APL is the average payload packet length in a time interval, PV is the variance of the payload packet length in a time interval, and TBP is the average time between packets in the time interval [29].
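The three interval features defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Java program used in this work: the function name `interval_features`, the `(timestamp, payload_length)` input layout, the use of the population variance for PV, and the choice of 0.0 for TBP when an interval holds a single packet are all hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def interval_features(packets, interval):
    """Compute APL, PV and TBP per time interval.

    packets  -- iterable of (timestamp_seconds, payload_length) tuples (assumed layout)
    interval -- width of one time interval in seconds
    """
    # Group packets by the index of the interval their timestamp falls into.
    buckets = defaultdict(list)
    for ts, length in packets:
        buckets[int(ts // interval)].append((ts, length))

    features = {}
    for key, items in sorted(buckets.items()):
        items.sort()  # order packets by timestamp inside the interval
        times = [t for t, _ in items]
        lengths = [n for _, n in items]
        gaps = [b - a for a, b in zip(times, times[1:])]
        features[key] = {
            "APL": fmean(lengths),               # average payload length
            "PV": pvariance(lengths),            # population variance (assumption)
            "TBP": fmean(gaps) if gaps else 0.0,  # mean gap; 0.0 for one packet (assumption)
        }
    return features


# Toy packets; a 1-second interval is used here only to keep the numbers readable.
packets = [(0.0, 100), (0.5, 300), (1.5, 200), (1.75, 200), (2.0, 500)]
feats = interval_features(packets, interval=1.0)
```

With real captures the same function would be called with the interval length used to build the dataset, and the resulting values joined back onto each flow instance.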

Fig. 2 In-frequency number and out-frequency number [2]

The most significant element of our research is a novel feature that we added to the dataset. We believe this feature supports the ML model in the training process. The feature is called rambling.

Most machine learning models learn from the deviations among instance values, and closer values can support a more accurate classification process. To our knowledge, network flow traffic contains many different packet sizes because of the variety of content types. Network protocols limit the packet size according to standards from industrial corporations such as Xerox (Ethernet V2), Intel, etc.; most sizes range from 64 to 1518 bytes. Suppose we capture a group of packets that have the same destination IP address in a time interval. Let Vi be the payload of a packet at a specific time T, and let Xi be the mean of these V (0, 1, 2, …, n). The rambling feature (R) is calculated for each flow instance over the interval (t, dt) as follows.

This new feature (the Rambling feature) can reduce the packet size differences within each flow, which supports the classification process of the machine learning algorithm.

The dataset is labeled according to a list of specific normal IPs that appear in the data instances, and it is then used in the ML classification model. The classification model presented in Fig. 1, which is trained on the updated ISOT-CID dataset, is able to classify new features extracted from the network data flow as normal or anomalous in real time. Figure 3 summarizes the whole process.

Fig. 3 Detection process

The methodology of our work is illustrated in Fig. 4. It consists of three stages. Stage 1 concerns dataset preparation, and stage 2 builds the detection model. The last stage is the evaluation stage, which verifies the accuracy of our approach for anomaly detection.

Fig. 4 Flowchart showing the method of the research

Dataset preparation stage

Understanding the dataset

Cloud computing networks face the same security threats as traditional computing networks, with some differences [1]. Because the cloud relies on several protocols, services, and technologies such as virtual infrastructures, these additional threats arise at several levels of data formatting. In such an environment, protection should consider all data traffic, both inside and outside. The remaining challenge is building an ML model that trains the IDS to capture anomalies across these various data abstractions. Furthermore, extracting features from these data sources requires tools that pass the gathered raw data to the trained ML model. The extraction tools should gather recent data instances from several resources in real time.

The ISOT-CID dataset [1] was presented as an interesting work that contains several data collections about data transmission behavior and buffer data formats. The dataset has enough properties and data attributes to train an IDS for robust and comprehensive protection. The data collections of ISOT-CID consist of system call properties, network traffic, memory dumps, event logs, and resource utilization. The ISOT-CID cloud intrusion detection dataset contains terabytes of data, including regular traffic and activities as well as multiple attack scenarios. The data were gathered over several periods in a real cloud environment. The content of this dataset is considered essential for industry in developing realistic intrusion detection models for cloud computing. The data were collected from different cloud layers, including guest hosts, networks, and hypervisors, and they encompass various data formats and several data sources such as memory, CPU, system, and network traffic. The dataset includes various attack scenarios such as denial-of-service attacks, masquerade attacks, stealth attacks, attacks from inside and outside the cloud, and anomalous user behavior. ISOT-CID aims to provide researchers with a real cloud dataset so that they can develop, evaluate, and compare their work. It is intended to support the development and evaluation of diverse and comprehensive IDSs.

Furthermore, ISOT-CID is fundamentally raw data that has not been converted, altered, or manipulated. It is prepared and structured for the cloud security community. In this research, we consider only the network traffic part, as described in the Ph.D. thesis of Aldribi et al. [1].

The dataset attributes are described in Tables 1 and 2.

Preprocessing the dataset

Preprocessing the dataset means examining the data instances to remove redundancy and to handle missing values and outliers. Most ML algorithms need data organized in a way that suits their procedure, so datasets demand preparation and preprocessing before they can yield valuable patterns. Datasets usually contain data that are missing, invalid, or otherwise difficult for an algorithm to process.

If data are missing, the algorithm cannot deal with them. If data are invalid, the algorithm produces less accurate outcomes. As preprocessing, we convert the protocol, MAC source, and MAC destination columns from categorical to numeric data so that they can be fed into the machine learning algorithm. The conversion is done with Python code and related libraries.
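The categorical-to-numeric conversion can be illustrated with a small sketch. The encoding scheme is not specified in this work, so the integer coding below (codes assigned in order of first appearance) is an assumption, and the helper name `encode_column` and the sample values are hypothetical.

```python
def encode_column(values):
    """Map each distinct categorical value to an integer code.

    Codes are assigned in order of first appearance (an assumption; other
    schemes, such as sorted order or one-hot encoding, are equally possible).
    """
    mapping = {}
    codes = []
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)  # next unused integer
        codes.append(mapping[value])
    return codes, mapping


# Hypothetical protocol column taken from a few flow records.
protocols = ["TCP", "UDP", "TCP", "ICMP", "UDP"]
codes, mapping = encode_column(protocols)
```

The returned mapping should be stored with the model, because features extracted at detection time have to be encoded with the same codes that were used in training.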

The dataset we arranged consists of 416 dump files containing network traffic flows extracted from networks over several periods. At this point the dataset contains only the non-calculated attributes described in Table 1. We use Wireshark version 1.10.2 to extract these features from the dump files and save them in corresponding CSV files.

The calculated attributes are listed in Table 2 in the previous section. These attributes are computed by a Java program written for this purpose. The Java program uses 0.003 as the interval time to compute most of the attributes according to their formulas. Our contributed feature, called RAMBLING, is computed with the same interval time. The last attribute is the class label, as described in the next section. The complete dataset, which contains all attributes, consists of 89,364 instances.

Label the dataset

Labeling the dataset is a significant step in training the ML algorithm to classify new traffic as malicious or normal. After computing the attributes in Table 2 with the Java program, we extended the program to label each instance as Normal if its source or destination IP address appears in the list of normal IP addresses shown in Table 3, and as Malicious otherwise. The Java program produces only 1612 malicious instances and 87,752 normal instances. The number of anomalous instances found in the dataset is good, but it leaves the dataset imbalanced. To keep the number of normal instances large enough while increasing the number of malicious ones, we use over-sampling and under-sampling to build a balanced dataset containing 44,569 Malicious instances and 44,795 Normal instances. The total number of instances in the dataset used to train the ML model is 89,364.
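The labeling rule can be written as a short sketch. The actual list of normal addresses is the one in Table 3; the addresses below are placeholders from the reserved documentation ranges, and the names `NORMAL_IPS` and `label_instance` are hypothetical.

```python
# Placeholder addresses only (assumption); the real list is given in Table 3.
NORMAL_IPS = {"192.0.2.10", "192.0.2.11", "198.51.100.7"}


def label_instance(src_ip, dst_ip, normal_ips=NORMAL_IPS):
    """Return "Normal" if either endpoint is a known normal IP, else "Malicious"."""
    if src_ip in normal_ips or dst_ip in normal_ips:
        return "Normal"
    return "Malicious"


# Hypothetical (source IP, destination IP) pairs taken from flow records.
flows = [
    ("192.0.2.10", "203.0.113.5"),
    ("203.0.113.5", "203.0.113.9"),
    ("203.0.113.9", "198.51.100.7"),
]
labels = [label_instance(src, dst) for src, dst in flows]
```

Because the label is derived entirely from the two IP columns, a classifier could trivially memorize the rule if those columns stayed in the training data.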

We believe that over-sampling and under-sampling change the misclassification costs and the class distribution. Over-sampling is also unexpectedly effective, producing a change in performance. However, it is noteworthy that representing these changes internally by down-weighting gives the best performance overall [7]. We also experimented on our dataset before applying under-sampling and over-sampling; in cross-validation, some folds gave low accuracy while the average accuracy remained high. The two feature columns used for labeling are removed from the dataset, so the ML models are trained on the remaining columns (Time, Protocol, Length, Source Port, Destination Port, IPHdrLength, SOURCE MAC, DMAC, TCPHdr-Length, FramLength, IPOfsetNo, TCPSEQ, TCP_ACK, F_IN, F_out, Rambling, APL, PV, TBP, class).
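A minimal sketch of class rebalancing follows. The specific resampling method is not stated in this work, so plain random resampling is assumed here: the larger class is sampled without replacement and the smaller class is duplicated with replacement. The function name `rebalance`, the row layout, and the toy class sizes are hypothetical.

```python
import random
from collections import Counter


def rebalance(rows, label_of, targets, seed=0):
    """Randomly under- or over-sample each class to the requested size.

    rows     -- list of data instances
    label_of -- function returning the class label of a row
    targets  -- dict mapping each class label to its desired instance count
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)

    balanced = []
    for label, group in by_class.items():
        target = targets[label]
        if target <= len(group):
            # Under-sampling: keep a random subset, without replacement.
            balanced.extend(rng.sample(group, target))
        else:
            # Over-sampling: keep every row and add random duplicates.
            balanced.extend(group + rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Toy imbalanced data: 90 normal rows and 10 malicious rows.
rows = [("Normal", i) for i in range(90)] + [("Malicious", i) for i in range(10)]
balanced = rebalance(rows, label_of=lambda r: r[0],
                     targets={"Normal": 50, "Malicious": 50})
counts = Counter(label for label, _ in balanced)
```

Duplicating minority rows before splitting the data can place copies of the same instance in both the training and the testing part, which is one reason to compare results with and without resampling.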

Building detection model stage

Selecting the ML technique

In this task, we build the detection model with several well-known ML algorithms in order to select the most accurate classifier. These algorithms are the Decision Tree (DT), Neural Networks (NNs), K-nearest neighbor (KNN), Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF).

Extracting features

After the detection model has been built, tested, and evaluated, this task applies once the system is deployed and fitted into memory. In real time, the features are extracted one by one from the network flow traffic. Those interested in our results can use our dataset features and ML experiments to build complete software that feeds an IDS on a computer network.

Triggering the model and passing features

Features extracted in real time are passed to the detection model for classification. Once a packet is classified, the IDS can alert another device for decision-making.
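The extract, classify, and alert flow described above can be sketched as follows. This is an illustrative outline only: the packet is assumed to be an already-parsed dictionary, the three features are an arbitrary subset, and `toy_classifier` is a hypothetical stand-in for a trained model, not the detection model of this work.

```python
def extract_features(packet):
    """Pick the fields the model expects from one parsed packet (a dict)."""
    return [packet["length"], packet["src_port"], packet["dst_port"]]


def process_packet(packet, classify, alert):
    """Extract features, classify them, and alert on malicious traffic."""
    label = classify(extract_features(packet))
    if label == "Malicious":
        alert(packet, label)
    return label


# Hypothetical stand-in for a trained model: flags very large packets.
def toy_classifier(features):
    return "Malicious" if features[0] > 1500 else "Normal"


# The alert callback here only records the packet; a deployed IDS would
# notify another device instead.
alerts = []
process_packet({"length": 9000, "src_port": 4444, "dst_port": 22},
               toy_classifier, lambda pkt, lbl: alerts.append(pkt))
```

Passing the classifier and the alert handler as arguments keeps the flow independent of any particular model or notification channel.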

Evaluating stage

Cross-validation

In this task, we conduct several experiments to evaluate the accuracy of each ML algorithm. The confusion matrix is used to calculate the accuracy percentage of each algorithm.

Split-validation

Split-validation is an alternative technique for judging the accuracy of the ML model. It splits the data into a training part (90%, 80%, or 70% of the instances) and a testing part; the testing part is used to calculate each algorithm’s accuracy from the confusion matrix.
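A hold-out split of this kind can be sketched in a few lines of Python. The function name, the shuffling step, and the fixed seed are assumptions made for illustration; the text does not state whether the instances were shuffled before splitting.

```python
import random


def split_dataset(rows, train_fraction=0.9, seed=0):
    """Shuffle rows and split them into (train, test) lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# The three training fractions mentioned in the text.
for fraction in (0.9, 0.8, 0.7):
    train, test = split_dataset(range(1000), fraction)
    print(fraction, len(train), len(test))
```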

Results and analysis

The results of training the ML models on the dataset described in the previous section are presented in two parts, as follows:

Cross-validation evaluation

Cross-validation is a technique for validating an ML algorithm by dividing the dataset into folds, so that all kinds of dataset instances take part in both training and testing. This division is called K-folds, where K is the number of parts. For example, K = 5 means the dataset is split into five parts. In fold-1, part-1 is used for training and part-2 for testing. In fold-2, part-2 is used for training and part-3 for testing. In fold-3, part-3 is used for training and part-4 for testing. In fold-4, part-4 is used for training and part-5 for testing. Finally, in fold-5, part-5 is used for training and part-1 for testing. The model accuracy is the average accuracy over all five folds. This technique reveals whether the model overfits during training. Here, overfitting means that the data instances are not clearly separated: the attribute values are so close that the ML model could assign the same instance to either class.
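For reference, a minimal sketch of fold construction and score averaging is given below. It follows the common K-fold scheme, in which each part serves once as the test set while the remaining K-1 parts are used for training; this is an assumption and may differ from the rotation described above. The function names and the `fit_and_score` callback are hypothetical.

```python
def k_fold_indices(n_instances, k):
    """Yield (train_idx, test_idx) for each of k contiguous folds."""
    if not 2 <= k <= n_instances:
        raise ValueError("k must be between 2 and n_instances")
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n_instances // k + (1 if i < n_instances % k else 0)
             for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_instances))
        yield train_idx, test_idx
        start += size


def cross_validate(n_instances, k, fit_and_score):
    """Return the mean of the per-fold scores and the scores themselves.

    fit_and_score(train_idx, test_idx) is expected to train a model on
    the training indices and return its accuracy on the test indices.
    """
    scores = [fit_and_score(tr, te) for tr, te in k_fold_indices(n_instances, k)]
    return sum(scores) / len(scores), scores
```

In practice the instances would be shuffled (and usually stratified by class) before the contiguous folds are cut, otherwise a fold can end up dominated by one class.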

Evaluating ANN

Table 4 presents the results of three experiments for the ANN model: the first uses K = 5, the second K = 10, and the third K = 15. The accuracy is similar across most folds, which indicates no overfitting in the ML model, and the overall accuracy of 94% is acceptable.
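One way to make "similar across folds" concrete is to look at the spread of the per-fold accuracies alongside their mean. The sketch below is illustrative: the function name is hypothetical and the accuracy values are placeholders, not the figures from Table 4.

```python
from statistics import mean, pstdev


def fold_summary(fold_accuracies):
    """Return (mean, population std dev, max - min) of fold accuracies."""
    return (mean(fold_accuracies),
            pstdev(fold_accuracies),
            max(fold_accuracies) - min(fold_accuracies))


# Placeholder values for illustration only, not results from Table 4.
example_folds = [0.94, 0.95, 0.93, 0.94, 0.94]
avg, spread, value_range = fold_summary(example_folds)
print(f"mean={avg:.3f} std={spread:.3f} range={value_range:.3f}")
```

A small standard deviation and range indicate that no single fold behaves very differently from the others, which is the property the paragraph above relies on.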

Evaluating DTREE

Table 5 shows the results of three experiments for the DTREE model with K = 5, 10, and 15. The accuracy is unexpectedly high, at 100%.

Evaluating K-nearest Neighbor (KNN classifier)

Table 6 shows that the KNN model is also reliable for detecting anomalies with the presented dataset.

Evaluating support vector machine

Table 7 shows that the SVM model is not appropriate for detecting anomalies with the presented dataset.

Evaluating Random Forest

Table 8 shows that the Random Forest model gives the same result as the Decision Tree, which is 100%.

Evaluating Naive Bayes

Table 9 shows that the Naïve Bayes model is not reliable for detecting anomalies with the presented dataset. The model gives poor results under cross-validation in all three experiments with different numbers of folds.

Split-validation evaluation

This evaluation method sets aside part of the dataset instances for testing; the ML model is trained on the remaining part and, once fitted in memory, tested on the held-out part. In other words, the dataset is divided into two parts, one for training and the other for testing. The accuracy of the model is computed from the confusion matrix, which consists of four values:

True Positive (TP): This is the number of observations positive and predicted to be positive.

True Negative (TN): This is the number of observations negative and predicted to be negative.

False Positive (FP): This is the number of observations negative but predicted to be positive.

False Negative (FN): This is the number of observations positive but predicted to be negative.
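The four values can be counted directly from the true and predicted labels. The sketch below is a minimal illustration in which TN counts negative observations predicted as negative; treating Malicious as the positive class is an assumption, and the five labels are made-up toy data.

```python
def confusion_counts(actual, predicted, positive="Malicious"):
    """Count TP, TN, FP, FN, treating `positive` as the positive class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # positive observation, predicted positive
            else:
                fp += 1  # negative observation, predicted positive
        else:
            if a == positive:
                fn += 1  # positive observation, predicted negative
            else:
                tn += 1  # negative observation, predicted negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Toy labels for illustration only.
actual = ["Malicious", "Malicious", "Normal", "Normal", "Normal"]
predicted = ["Malicious", "Normal", "Normal", "Malicious", "Normal"]
counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
```

For the toy labels this gives one TP, two TN, one FP, and one FN, so the accuracy is 3/5.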

Tables 10 and 11 show the ANN model’s accuracy, which is 0.96 under split-validation. In this experiment, we use 90% of the data instances for training and 10% for testing. The confusion matrix in Table 10 shows that 39,130 instances from the testing part are classified as Normal, matching their labels in the dataset. The ANN classifier failed on 931 instances that are labeled Normal in the dataset but were wrongly classified as Malicious. On the other hand, the ANN correctly classified 37,834 instances as Malicious, while 2,533 instances that should have been classified as Malicious were wrongly classified as Normal. The overall accuracy of 0.96 (96%) is acceptable and reliable enough to feed the IDS for anomaly detection.
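The reported accuracy can be checked from the four counts quoted above. The sketch assumes Malicious is the positive class and uses the standard definitions of accuracy, precision, and recall; the function name is hypothetical.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Return accuracy, precision and recall from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Counts quoted in the text for the ANN model (Table 10).
# Assumption: Malicious is the positive class.
ann = metrics_from_counts(tp=37834, tn=39130, fp=931, fn=2533)
print({name: round(value, 4) for name, value in ann.items()})
```

Under these assumptions the accuracy evaluates to roughly 0.957, which is consistent with the reported 0.96 after rounding.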

Tables 12 and 13 present the DTREE model result, which is 100% when the model is trained on 90% of the dataset and tested on the remaining 10%. The confusion matrix in Table 12 shows that no instance in the testing part was misclassified (see also Table 13).

Evaluating KNN

For the K-nearest Neighbor (KNN) model, the confusion matrix in Table 14 shows that 39,572 instances are correctly classified as Normal, while 489 Normal instances are misclassified. On the other side, 362 instances that should be Malicious are wrongly classified as Normal, and 40,005 instances are correctly classified as Malicious. The classification report is presented in Table 15.

Evaluating SVM

Tables 16 and 17 show that SVM gives an accuracy of 81% when the dataset is split into 90% for training and 10% for testing.

Tables 18 and 19 show that Random Forest, like DTREE, is the most accurate model for anomaly detection in network traffic flow.

Tables 20 and 21 show that the Naïve Bayes model is not suitable for predicting anomalies, as its accuracy is low at 60%.

We obtained good results from several experiments conducted in the Python programming language on the ISOT-CID dataset, which was collected from network traffic captured over different periods. Six ML models were trained on this dataset and evaluated with two methods, cross-validation and split-validation. Four of them give highly accurate results, while the other two give unacceptable results, as follows:

Cross-validation result

Cross-validation was conducted several times on the dataset with different values of K. Table 22 shows the result of each experiment for each ML model, and Fig. 5 visualizes these results. Cross-validation gave the same results as split-validation: DTREE and Random Forest produce optimal results, with no errors found in the testing folds allocated from the dataset. This means that the DTREE and Random Forest models are the most accurate and the most suitable to feed an IDS for anomaly detection on network traffic flow.

Model accuracy comparison with cross-validation

Split-validation result

Table 23 and Fig. 6 show the results of the six ML models in the split-validation experiment on the ISOT-CID dataset. DTREE and Random Forest gave optimal results of 100%, which means no errors were found in the classification of the testing part allocated from the dataset.

Model accuracy comparison with split-validation

While the results are good overall, Random Forest and DTREE give the best results under both split-validation and cross-validation. We think that this is due to the characteristics of Random Forest and DTREE. The characteristics of Random Forest are:

There needs to be some actual signal in the dataset features, which helps the model do better. Our dataset provides this signal.

The predictions (and therefore the errors) made by the individual trees need to have low correlations.
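The role of low correlation between trees can be illustrated with an idealized calculation. The sketch assumes fully independent trees, each correct with the same probability, and a simple majority vote; real trees in a forest are only partly decorrelated, so this is the limiting case rather than a model of any experiment reported here.

```python
from math import comb


def majority_vote_accuracy(n_trees, p_correct):
    """Probability that more than half of n_trees independent voters,
    each correct with probability p_correct, are right."""
    if n_trees % 2 == 0:
        raise ValueError("use an odd number of trees to avoid ties")
    needed = n_trees // 2 + 1
    return sum(comb(n_trees, k) * p_correct ** k * (1 - p_correct) ** (n_trees - k)
               for k in range(needed, n_trees + 1))


# Accuracy of the vote grows with the number of independent trees.
for n in (1, 11, 101):
    print(n, round(majority_vote_accuracy(n, 0.7), 4))
```

With three independent trees that are each right 70% of the time, the vote is right with probability 0.784. If the trees made perfectly correlated errors, the vote would stay at 0.7, which is why low correlation matters.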

The DTREE, in turn, explores all possible outcomes of a decision, which helps create an analysis that includes all the outcomes. This makes it suitable for our comprehensive dataset.

Conclusions and future work

With the extensive use of computer networks and telecommunication devices, network security has become significant for all users of these networks. Consequently, intrusion detection has attracted the interest of both research and corporate organizations that aim to develop and deploy effective IDSs capable of protecting critical system components against intruders.

We present a reliable model that runs in real time to detect malicious traffic flows using supervised ML techniques trained on the ISOT-CID dataset, which contains network traffic features. The challenge in this research is to capture the deviations between data instances so that the data can be categorized by malicious and normal properties. Six feature columns are computed and added to the network traffic properties to help the ML model identify malicious traffic.

We present one novel feature, called rambling, which is computed over a time interval of a traffic connection. The packet payload lengths are extracted during this interval, and the deviation of the lengths about the mean length of all packets is computed. We showed that the six features added to the dataset are vital to producing a high-quality dataset suitable for training machine learning models for anomaly detection. DTREE and Random Forest both gave optimal accuracy when evaluated by cross-validation and split-validation. These two models did not misclassify any instance in the testing parts or folds of the dataset.
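A rough sketch of how a rambling-style feature could be computed is given below. The exact formula is not stated in this section, so taking the deviation about the mean to be the population standard deviation of payload lengths within fixed time windows is an assumption, as are the function names and the (timestamp, payload length) input format.

```python
from statistics import pstdev


def rambling(payload_lengths):
    """Spread of packet payload lengths around their mean in one interval.

    Assumption: 'deviation about the mean' is taken to be the population
    standard deviation; the paper's exact formula may differ.
    """
    if not payload_lengths:
        return 0.0
    return pstdev(payload_lengths)


def rambling_per_interval(packets, interval_seconds):
    """Group (timestamp, payload_length) pairs into fixed time windows
    and compute the rambling value of each window."""
    windows = {}
    for timestamp, length in packets:
        windows.setdefault(int(timestamp // interval_seconds), []).append(length)
    return {w: rambling(lengths) for w, lengths in sorted(windows.items())}
```

A flow whose packets all carry the same payload length gets a value of zero, while a flow that alternates between very short and very long payloads gets a large value.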

Despite the encouraging results of the machine learning models and the six vital features that raised their efficiency, the presented model has some limitations. An IDS for computer networks must be very fast, since it is deployed in real time to extract the characteristics of the communication traffic and respond in real time. The presented model relies on a vast dataset that can be considered big data, which affected the performance of fitting and evaluating the system. Likewise, deploying this model in real networks would reduce the required speed. Therefore, as future work, we will apply deep learning techniques using cloud computing to exploit the dataset, together with the six calculated features.

Availability of data and materials

The data used in this research are available upon request from the author Dr. Abdulaziz Aldribi. The data are also publicly available at https://www.uvic.ca/engineering/ece/isot/datasets/cloud-security/index.php .

Aldribi A, Traore I, Moa B. Data sources and datasets for cloud intrusion detection modeling and evaluation. In: Cloud computing for optimization: foundations, applications, and challenges. Cham: Springer; 2018. p. 333–66.

Aldribi A, Traoré I, Moa B, Nwamuo O. Hypervisor-based cloud intrusion detection through online multivariate statistical change tracking. Comput Secur. 2020;88:101646–101646. https://doi.org/10.1016/j.cose.2019.101646 .

Anton SD, Kanoor S, Fraunholz D, Schotten HD. Evaluation of machine learning-based anomaly detection algorithms on an industrial modbus/tcp data set. In: Proceedings of the 13th international conference on availability, reliability and security. 2018. p. 1–9.

Barbhuiya S, Papazachos Z, Kilpatrick P, Nikolopoulos DS. RADS: Real-time anomaly detection system for cloud data centres. 2018. arXiv preprint arXiv:1811.04481 .

Bellaïche M, Grégoire JC. SYN flooding attack detection by TCP handshake anomalies. Secur Commun Netw. 2012;5(7):709–24. https://doi.org/10.1002/sec.365 .

Desai MM. Hacking for beginners: a beginners guide to learn ethical hacking. 2010.

Drummond C, Holte RC. C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling. In: Workshop on learning from imbalanced datasets II, vol. 11. Washington DC: Citeseer; 2003. p. 1–8.

Fernández GC, Xu S. A case study on using deep learning for network intrusion detection. In: MILCOM 2019–2019 IEEE Military Communications Conference (MILCOM). IEEE; 2019. p. 1–6.

Garg S, Kaur K, Kumar N, Kaddoum G, Zomaya AY, Ranjan R. A hybrid deep learning-based model for anomaly detection in cloud datacenter networks. IEEE Trans Netw Serv Manage. 2019;16(3):924–35. https://doi.org/10.1109/tnsm.2019.2927886 .

Gelman D, Shvartsev B, Ein-Eli Y. Aluminum–air battery based on an ionic liquid electrolyte. J Mater Chem A. 2014;2(47):20237–42. https://doi.org/10.1039/c4ta04721d .

Jianliang M, Haikun S, Ling B. The application on intrusion detection based on k-means cluster algorithm. In: 2009 International Forum on Information Technology and Applications, vol. 1. IEEE; 2009. p. 150–2.

Kumari R, Singh MK, Jha R, Singh NK. Anomaly detection in network traffic using K-mean clustering. In: 2016 3rd International Conference on Recent Advances in Information Technology (RAIT). IEEE; 2016. p. 387–93.

Kwon D, Kim H, Kim J, Suh SC, Kim I, Kim KJ. A survey of deep learning-based network anomaly detection. Cluster Comput. 2019;22(1):949–61.

Lemay A, Fernandez JM. Providing SCADA network data sets for intrusion detection research. In: 9th Workshop on Cyber Security Experimentation and Test (CSET 16). 2016.

Manna A, Alkasassbeh M. Detecting network anomalies using machine learning and SNMP-MIB dataset with IP group. In: 2019 2nd International Conference on new Trends in Computing Sciences (ICTCS). IEEE; 2019. p. 1–5.

Peel D, McLachlan GJ. Robust mixture modelling using the t distribution. Stat Comput. 2000;10(4):339–48.

Mobilio M, Orrù M, Riganelli O, Tundo A, Mariani L. Anomaly detection as-a-service. In: 2019 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW). IEEE; 2019. p. 193–9.

Moustafa N, Creech G, Sitnikova E, Keshk M. Collaborative anomaly detection framework for handling big data of cloud computing. In: 2017 military communications and information systems conference (MilCIS). IEEE; 2017. p. 1–6.

Münz G, Li S, Carle G. Traffic anomaly detection using k-means clustering. In: GI/ITG Workshop MMBnet. 2007. p. 13–4.

Nikiforov R. Clustering-based anomaly detection for microservices. 2018. arXiv preprint arXiv:1810.02762 .

Nisioti A, Mylonas A, Yoo PD, Katos V. From intrusion detection to attacker attribution: a comprehensive survey of unsupervised methods. IEEE Commun Surv Tutor. 2018;20(4):3369–88. https://doi.org/10.1109/comst.2018.2854724 .

Osanaiye O, Cai H, Choo KKR, Dehghantanha A, Xu Z, Dlodlo M. Ensemble-based multi-filter feature selection method for DDoS detection in cloud computing. EURASIP J Wirel Commun Netw. 2016;1:130–130. https://doi.org/10.1186/s13638-016-0623-3 .

Parul B, Kumar Gurjwar R. A review on attacks classification using decision tree algorithm. Int J. 2014;2(2).

Peng K, Leung VCM, Zheng L, Wang S, Huang C, Lin T. Intrusion detection system based on decision tree over big data in fog environment. Wirel Commun Mob Comput. 2018;2018:1–10. https://doi.org/10.1155/2018/4680867 .

Pilli ES, Joshi RC, Niyogi R. Data reduction by identification and correlation of TCP/IP attack attributes for network forensics. In: Proceedings of the International Conference & Workshop on Emerging Trends in Technology. 2011. p. 276–83.

Qiu Z, Miller DJ, Kesidis G. Detecting clusters of anomalies on low-dimensional feature subsets with application to network traffic flow data. In: 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE; 2015. p. 1–6.

Umer MF, Sher M, Bi Y. Flow-based intrusion detection: Techniques and challenges. Comput Secur. 2017;70:238–54. https://doi.org/10.1016/j.cose.2017.05.009 .

Zhang J. Anomaly detecting and ranking of the cloud computing platform by multi-view learning. Multimedia Tools Appl. 2019;78:30923–42.

Zhao D, Traore I, Sayed B, Lu W, Saad S, Ghorbani A, Garant D. Botnet detection based on traffic behavior analysis and flow intervals. Comput Secur. 2013;39:2–16. https://doi.org/10.1016/j.cose.2013.04.007 .

Acknowledgements

I would like to express my special thanks and appreciation to Jouf University, which gave me the opportunity to study for my master’s degree.

The corresponding author’s master’s degree was funded by Jouf University.

Author information

Authors and affiliations.

Department of Computer Science, College of Computer, Jouf University, Al Jouf, Saudi Arabia

Amirah Alshammari

Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia

Abdulaziz Aldribi

Contributions

AAls and AAld participated in the design of the proposed method. AAls implemented and coded the method, carried out the testing, and obtained the results. As supervisor, AAld supported and guided AAls during her MSc degree with ideas and knowledge. Both authors read and approved the manuscript.

Corresponding author

Correspondence to Amirah Alshammari .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

I declare that “on behalf of all authors” I have no significant competing financial, professional, or personal interests that might have influenced the performance or presentation of the work described in this manuscript.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Alshammari, A., Aldribi, A. Apply machine learning techniques to detect malicious network traffic in cloud computing. J Big Data 8 , 90 (2021). https://doi.org/10.1186/s40537-021-00475-1

Received : 04 September 2020

Accepted : 24 May 2021

Published : 14 June 2021

DOI : https://doi.org/10.1186/s40537-021-00475-1

Network traffic
Machine learning

Zero-day attack detection: a systematic literature review

Published: 27 February 2023
Volume 56 , pages 10733–10811, ( 2023 )

Rasheed Ahmad ORCID: orcid.org/0000-0002-7154-7295 1 ,
Izzat Alsmadi 2 ,
Wasim Alhamdani 1 &
Lo’ai Tawalbeh 3

3035 Accesses

12 Citations

With the continuous increase in cyberattacks over the past few decades, the quest to develop a comprehensive, robust, and effective intrusion detection system (IDS) in the research community has gained traction. Many of the recently proposed solutions lack a holistic IDS approach due to explicitly relying on attack signature repositories, outdated datasets or the lack of considering zero-day (unknown) attacks while developing, training, or testing the machine learning (ML) or deep learning (DL)-based models. Overlooking these factors makes the proposed IDS less robust or practical in real-time environments. On the other hand, detecting zero-day attacks is a challenging subject, despite the many solutions proposed over the past many years. One of the goals of this systematic literature review (SLR) is to provide a research asset to future researchers on various methodologies, techniques, ML and DL algorithms that researchers used for the detection of zero-day attacks. The extensive literature review on the recent publications reveals exciting future research trends and challenges in this particular field. With all the advances in technology, the availability of large datasets, and the strong processing capabilities of DL algorithms, detecting a completely new or unknown attack remains an open research area. This SLR is an effort towards completing the gap in providing a single repository of finding ML and DL-based tools and techniques used by researchers for the detection of zero-day attacks.

Abdalgawad N, Sajun A, Kaddoura Y, Zualkernan IA, Aloul F (2022) Generative deep learning to detect cyberattacks for the IoT-23 dataset. IEEE Access 10:6430–6441. https://doi.org/10.1109/ACCESS.2021.3140015

Article Google Scholar

Agrawal S, Sarkar S, Aouedi O, Yenduri G, Piamrat K, Bhattacharya S, Maddikunta PKR, Gadekallu TR (2021) Federated learning for intrusion detection system: concepts, challenges and future directions. https://arxiv.org/abs/2106.09527v1

Ahmad R, Alsmadi I (2021) Machine learning approaches to IoT security: a systematic literature review. Internet Things 14:100365. https://doi.org/10.1016/j.iot.2021.100365

Alam MS, Yakopcic C, Subramanyam G, Taha TM (2020) Memristor based neuromorphic adaptive resonance theory for one-shot online learning and network intrusion detection. In: International conference on neuromorphic systems 2020, pp 1–8

Aljawarneh S, Aldwairi M, Yassein MB (2018) Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model. J Comput Sci 25:152–160. https://doi.org/10.1016/j.jocs.2017.03.006

Al-Zewairi M, Almajali S, Ayyash M (2020) Unknown security attack detection using shallow and deep ANN classifiers. Electronics 9(12):2006. https://doi.org/10.3390/electronics9122006

Andresini G, Appice A, Mauro ND, Loglisci C, Malerba D (2020) Multi-channel deep feature learning for intrusion detection. IEEE Access 8:53346–53359. https://doi.org/10.1109/ACCESS.2020.2980937

Andropov S, Guirik A, Budko M, Budko M (2017) Network anomaly detection using artificial neural networks. In: 2017 20th conference of open innovations association (FRUCT), pp 26–31. https://doi.org/10.23919/FRUCT.2017.8071288

Anindya IC, Kantarcioglu M (2018) Adversarial anomaly detection using centroid-based clustering. In: 2018 IEEE international conference on information reuse and integration (IRI). IEEE, pp 1–8

Anthi E, Williams L, Słowińska M, Theodorakopoulos G, Burnap P (2019) A supervised intrusion detection system for smart home IoT devices. IEEE Internet Things J 6(5):9042–9053. https://doi.org/10.1109/JIOT.2019.2926365

Asam M, Khan SH, Akbar A, Bibi S, Jamal T, Khan A, Ghafoor U, Bhutta MR (2022) IoT malware detection architecture using a novel channel boosted and squeezed CNN. Sci Rep 12(1):15498. https://doi.org/10.1038/s41598-022-18936-9

Ashfaq Khan M, Karim M, Kim Y (2019) A scalable and hybrid intrusion detection system based on the convolutional-LSTM network. Symmetry 11:583. https://doi.org/10.3390/sym11040583

Ashi Z, Al-Fawa’reh M, Al-Fayoumi M (2020) Fog computing: security challenges and countermeasures. Int J Comput Appl 175(15):30–36. https://doi.org/10.5120/ijca2020920648

Ashiku L, Dagli C (2021) Network intrusion detection system using deep learning. Procedia Comput Sci 185:239–247. https://doi.org/10.1016/j.procs.2021.05.025

Attenberg J, Ipeirotis P, Provost F (2015) Beat the machine: challenging humans to find a predictive model’s “unknown unknowns.” J Data Inf Qual 6(1):11–117. https://doi.org/10.1145/2700832

Attia TM (2019) Challenges and opportunities in the future applications of IoT technology. https://www.econstor.eu/handle/10419/201752

Aygun RC, Yavuz AG (2017) Network anomaly detection with stochastically improved autoencoder based models. In: 2017 IEEE 4th international conference on cyber security and cloud computing (CSCloud), pp 193–198. https://doi.org/10.1109/CSCloud.2017.39

Bayoğlu B, Soğukpınar İ (2012) Graph based signature classes for detecting polymorphic worms via content analysis. Comput Netw 56:832–844

Bendale A, Boult TE (2016) Towards open set deep networks. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 1563–1572. https://doi.org/10.1109/CVPR.2016.173

Bhargavi M, Kumar MN, Meenakshi NV, Lasya N (2019) Intrusion detection techniques used for internet of things. Internal J Applied Eng Res 14(24):5 pp. 4462–4466

Bhatia R, Benno S, Esteban J, Lakshman TV, Grogan J (2019) Unsupervised machine learning for network-centric anomaly detection in IoT. In: Proceedings of the 3rd ACM CoNEXT workshop on Big DAta, machine learning and artificial intelligence for data communication networks, pp 42–48. https://doi.org/10.1145/3359992.3366641

Bîrlog I, Borcan D, Covrig G (2020) Internet of things hardware and software. Informatica Economica 24(2):54–62. https://doi.org/10.24818/issn14531305/24.2.2020.05

Boutaba R, Salahuddin MA, Limam N, Ayoubi S, Shahriar N, Estrada-Solano F, Caicedo OM (2018) A comprehensive survey on machine learning for networking: evolution, applications and research opportunities. J Internet Serv Appl 9(1):16. https://doi.org/10.1186/s13174-018-0087-2

Brindha S, Abirami P, Arjun V, Logesh B, Mohammed S (2020) Heuristic approach to intrusion detection system. Int Res J Eng Technol 07(03):3

Google Scholar

Campos GO, Zimek A, Sander J, Campello RJGB, Micenková B, Schubert E, Assent I, Houle ME (2016) On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min Knowl Disc 30(4):891–927. https://doi.org/10.1007/s10618-015-0444-8

Article MathSciNet Google Scholar

Chaabouni N, Mosbah M, Zemmari A, Sauvignac C, Faruki P (2019) Network intrusion detection for IoT security based on learning techniques. IEEE Commun Surv Tutor 21(3):2671–2701. https://doi.org/10.1109/COMST.2019.2896380

Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):1–58

Charyyev B, Gunes MH (2020) Detecting anomalous IoT traffic flow with locality sensitive hashes. In: GLOBECOM 2020–2020 IEEE global communications conference, pp 1–6. https://doi.org/10.1109/GLOBECOM42002.2020.9322559

Chatterjee S, Hanawal MK (2021) Federated learning for intrusion detection in IoT security: a hybrid ensemble approach. https://arxiv.org/abs/2106.15349v1

Chaudhary P, Gupta BB (2019) DDoS detection framework in resource constrained internet of things domain. In: 2019 IEEE 8th global conference on consumer electronics (GCCE), pp 675–678. https://doi.org/10.1109/GCCE46687.2019.9015465

Chiba Z, Abghour N, Moussaid K, Omri AE, Rida M (2019) Newest collaborative and hybrid network intrusion detection framework based on suricata and isolation forest algorithm. In: Proceedings of the 4th international conference on smart city applications, pp 1–11. https://doi.org/10.1145/3368756.3369061

Chouhan N et al (2019) Network anomaly detection using channel boosted and residual learning based deep convolutional neural network. Appl Soft Comput 83:105612. https://doi.org/10.1016/j.asoc.2019.105612

Chung Y, Haas PJ, Upfal E, Kraska T (2019a) Learning unknown examples for ML model generalization. [Cs, Stat]. http://arxiv.org/abs/1808.08294

Chung Y, Haas PJ, Upfal E, Kraska T (2019b) Unknown examples & machine learning model generalization. [Cs, Stat]. http://arxiv.org/abs/1808.08294

Cisco (2020) Cisco annual internet report (2018–2023) white paper. Cisco. https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html

Cook DJ, Greengold NL, Ellrodt AG, Weingarten SR (1997) The relation between systematic reviews and practice guidelines. Ann Intern Med 127(3):210–216. https://doi.org/10.7326/0003-4819-127-3-199708010-00006

Cui Z, Ke R, Pu Z, Wang Y (2019) Deep bidirectional and unidirectional LSTM recurrent neural network for network-wide traffic speed prediction. [Cs]. http://arxiv.org/abs/1801.02143

Das S, Venugopal D, Shiva S, Sheldon FT (2020) Empirical evaluation of the ensemble framework for feature selection in DDoS attack, pp 56–61. https://doi.org/10.1109/CSCloud-EdgeCom49738.2020.00019

Dau HA, Ciesielski V, Song A (2014) Anomaly detection using replicator neural networks trained on examples of one class. In: Dick G, Browne WN, Whigham P, Zhang M, Bui LT, Ishibuchi H, Jin Y, Li X, Shi Y, Singh P, Tan KC, Tang K (eds) Simulated evolution and learning. Springer International Publishing, Cham, pp 311–322. https://doi.org/10.1007/978-3-319-13563-2_27

Chapter Google Scholar

De Michele R, Furini M (2019) IoT healthcare: benefits, issues, and challenges. In: Proceedings of the 5th EAI international conference on smart objects and technologies for social good, pp 160–164. https://doi.org/10.1145/3342428.3342693

Dietterich TG (2017) Steps toward robust artificial intelligence. AI Mag 38(3):3–24. https://doi.org/10.1609/aimag.v38i3.2756

Duessel P, Gehl C, Flegel U, Dietrich S, Meier M (2017) Detecting zero-day attacks using context-aware anomaly detection at the application-layer. Int J Inf Secur 16(5):475–490

Engelbrecht ER, du Preez JA (2020) Learning with an augmented (unknown) class using neural networks. Sci Afr 10:e00600. https://doi.org/10.1016/j.sciaf.2020.e00600

Fei G, Liu B (2016) Breaking the closed world assumption in text classification. In: Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, pp 506–514. https://doi.org/10.18653/v1/N16-1061

Feng F, Liu X, Yong B, Zhou R, Zhou Q (2019a) Anomaly detection in ad-hoc networks based on deep learning model: a plug and play device. Ad Hoc Netw. https://doi.org/10.1016/j.adhoc.2018.09.014

Feng Z, Xu C, Tao D (2019b) Self-supervised representation learning from multi-domain data. In: 2019b IEEE/CVF international conference on computer vision (ICCV). https://doi.org/10.1109/ICCV.2019.00334

Fernandes Silveira FA, Lima-Filho F, Dantas Silva FS, de Medeiros Brito Junior A, Silveira LF (2020) Smart detection-IoT: a DDoS sensor system for internet of things. In: 2020 international conference on systems, signals and image processing (IWSSIP), pp 343–348. https://doi.org/10.1109/IWSSIP48289.2020.9145265

Ferrag MA, Maglaras L, Ahmim A, Derdour M, Janicke H (2020) RDTIDS: rules and decision tree-based intrusion detection system for internet-of-things networks. Futur Internet 12(3):44. https://doi.org/10.3390/fi12030044

Fotiadou K, Velivassaki T-H, Voulkidis A, Skias D, Tsekeridou S, Zahariadis T (2021) Network traffic anomaly detection via deep learning. Information 12(5):215. https://doi.org/10.3390/info12050215

Garcia S, Parmisano A, Erquiaga MJ (2020) IoT-23: a labeled dataset with malicious and benign IoT network traffic. Zenodo. https://doi.org/10.5281/zenodo.4743746

García-Teodoro P, Díaz-Verdejo J, Maciá-Fernández G, Vázquez E (2009) Anomaly-based network intrusion detection: Techniques, systems and challenges. Comp Sec 28(1):18–28. https://doi.org/10.1016/j.cose.2008.08.003

Garitano I, Uribeetxeberria R, Zurutuza U (2011) A review of SCADA anomaly detection systems. In: Soft computing models in industrial and environmental applications, 6th international conference SOCO 2011. Springer, Berlin, Heidelberg, pp 357–366

Godala S, Vaddella RPV (2020) A study on intrusion detection system in wireless sensor networks. Int J Commun Netw Inf Secur 12(1):127–41

Global new malware volume (2020) Statista. http://www.statista.com/statistics/680953/global-malware-volume/. Accessed 29 July 2021

Gogoi P, Bhattacharyya DK, Borah B, Kalita JK (2011) A survey of outlier detection methods in network anomaly identification. Comput J 54(4):570–588. https://doi.org/10.1093/comjnl/bxr026

Goldstein M, Uchida S (2016) A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PLoS ONE 11(4):e0152173

Hagan Memorial Library (2020) University of the Cumberlands. https://www.ucumberlands.edu/library

Hamija AR, Günther M, Boult TE (2018) Reducing network agnostophobia. [Cs]. http://arxiv.org/abs/1811.04110

Hammad M, Hewahi N, Elmedany W (2021) T-SNERF: a novel high accuracy machine learning approach for Intrusion detection systems. IET Inf Secur 15(2):178–190. https://doi.org/10.1049/ise2.12020

Hassen M, Chan PK (2020a) Learning a neural-network-based representation for open set recognition. In: Proceedings of the 2020 SIAM international conference on data mining (SDM). Society for Industrial and Applied Mathematics, pp 154–162. https://doi.org/10.1137/1.9781611976236.18

Hassen M, Chan PK (2020b) Unsupervised open set recognition using adversarial autoencoders. In: 2020 19th IEEE international conference on machine learning and applications (ICMLA), pp 360–365. https://doi.org/10.1109/ICMLA51294.2020.00064

He S, Zhu J, He P, Lyu MR (2016) Experience report: system log analysis for anomaly detection. In: 2016 IEEE 27th international symposium on software reliability engineering (ISSRE). IEEE, pp 207–218

He Z, Rezaei A, Homayoun H, Sayadi H (2022) Deep neural network and transfer learning for accurate hardware-based zero-day malware detection. In: Proceedings of the Great Lakes Symposium on VLSI 2022, pp 27–32

Hindy H, Atkinson R, Tachtatzis C, Colin J-N, Bayne E, Bellekens X (2020) Utilising deep learning techniques for effective zero-day attack detection. Electronics 9(10):1684. https://doi.org/10.3390/electronics9101684

Hinnefeld JH, Cooman P, Mammo N, Deese R (2018) Evaluating fairness metrics in the presence of dataset bias. [Cs, LG]. http://arxiv.org/abs/1809.09245

Hong Z, Chen W, Huang H, Guo S, Zheng Z (2019) Multi-hop cooperative computation offloading for industrial IoT–edge–cloud computing environments. IEEE Trans Parallel Distrib Syst 30(12):2759–2774. https://doi.org/10.1109/TPDS.2019.2926979

Hwang R-H, Peng M-C, Nguyen V-L, Chang Y-L (2019) An LSTM-based deep learning approach for classifying malicious traffic at the packet level. Appl Sci 9(16):3414. https://doi.org/10.3390/app9163414

Hwang R-H, Peng M-C, Huang C-W, Lin P-C, Nguyen V-L (2020) An unsupervised deep learning model for early network traffic anomaly detection. IEEE Access 8:30387–30399. https://doi.org/10.1109/ACCESS.2020.2973023

InfoSec (2021) The cost of zero-day attack protection. https://2020infosec.com/the-cost-of-zero-day-attackprotection . Accessed 23 May 2021

Ioulianou P, Vasilakis V, Moscholios I, Logothetis M (2018) A signature-based intrusion detection system for the internet of things. Information and Communication Technology Form, AUT. https://eprints.whiterose.ac.uk/133312/

Jiang F, Fu Y, Gupta BB, Liang Y, Rho S, Lou F, Meng F, Tian Z (2020) Deep learning based multi-channel intelligent attack detection for data security. IEEE Trans Sustain Comput 5(2):204–212. https://doi.org/10.1109/TSUSC.2018.2793284

Jin Y (2019) Towards hardware-assisted security for IoT systems. In: 2019 IEEE computer society annual symposium on VLSI (ISVLSI), pp 632–637. https://doi.org/10.1109/ISVLSI.2019.00118

Jin D, Lu Y, Qin J, Cheng Z, Mao Z (2020) SwiftIDS: real-time intrusion detection system based on LightGBM and parallel intrusion detection mechanism. Comput Secur 97:101984. https://doi.org/10.1016/j.cose.2020.101984

Jo I, Kim J, Kang H, Kim Y-D, Choi S (2018) Open set recognition by regularising classifier with fake data generated by generative adversarial networks. In: 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 2686–2690. https://doi.org/10.1109/ICASSP.2018.8461700

Kelly C, Pitropakis N, McKeown S, Lambrinoudakis C (2020) Testing and hardening IoT devices against the Mirai botnet. In: 2020 international conference on cyber security and protection of digital services (cyber security), pp 1–8. https://doi.org/10.1109/CyberSecurity49315.2020.9138887

Khan AY, Latif R, Latif S, Tahir S, Batool G, Saba T (2020) Malicious insider attack detection in IoTs using data analytics. IEEE Access 8:11743–11753. https://doi.org/10.1109/ACCESS.2019.2959047

Khan AS, Ahmad Z, Abdullah J, Ahmad F (2021) A spectrogram image-based network anomaly detection system using deep convolutional neural network. IEEE Access 9:87079–87093. https://doi.org/10.1109/ACCESS.2021.3088149

Khare S, Totaro M (2020) Ensemble learning for detecting attacks and anomalies in IoT smart home. In: 2020 3rd international conference on data intelligence and security (ICDIS), pp 56–63. https://doi.org/10.1109/ICDIS50059.2020.00014

Khare N, Devan P, Chowdhary CL, Bhattacharya S, Singh G, Singh S, Yoon B (2020) SMO-DNN: spider monkey optimization and deep neural network hybrid classifier model for intrusion detection. Electronics 9(4):692. https://doi.org/10.3390/electronics9040692

Khraisat A, Gondal I, Vamplew P, Kamruzzaman J (2019) Survey of intrusion detection systems: techniques, datasets and challenges. Cybersecurity 2(1):20. https://doi.org/10.1186/s42400-019-0038-7

Khraisat A, Gondal I, Vamplew P, Kamruzzaman J, Alazab A (2020) Hybrid intrusion detection system based on the stacking ensemble of C5 decision tree classifier and one class support vector machine. Electronics 9(1):173. https://doi.org/10.3390/electronics9010173

Kim JY, Bu SJ, Cho SB (2018a) Zero-day malware detection using transferred generative adversarial networks based on deep autoencoders. Inf Sci 460:83–102

Kim T, Suh SC, Kim H, Kim J, Kim J (2018b) An encoding technique for CNN-based network anomaly detection. In: 2018 IEEE international conference on Big Data (Big Data), pp 2960–2965. https://doi.org/10.1109/BigData.2018.8622568

Kim S, Hwang C, Lee T (2020) Anomaly based unknown intrusion detection in endpoint environments. Electronics 9(6):1022. https://doi.org/10.3390/electronics9061022

Ko C (2000) Logic induction of valid behavior specifications for intrusion detection. In: Proceeding 2000 IEEE symposium on security and privacy. S P 2000, pp 142–153. https://doi.org/10.1109/SECPRI.2000.848452

Koroniotis N, Moustafa N, Sitnikova E, Turnbull B (2018) Towards the development of realistic botnet dataset in the internet of things for network forensic analytics: Bot-IoT dataset. [Cs]. http://arxiv.org/abs/1811.00701

Koroniotis N, Moustafa N, Sitnikova E, Turnbull B (2019) Towards the development of realistic botnet dataset in the internet of things for network forensic analytics: Bot-IoT dataset. Futur Gener Comput Syst 100:779–796. https://doi.org/10.1016/j.future.2019.05.041

Kosek AM (2016) Contextual anomaly detection for cyber-physical security in smart grids based on an artificial neural network model. In: 2016 joint workshop on cyber-physical security and resilience in smart grids (CPSR-SG). IEEE, pp 1–6

Kotani G, Sekiya Y (2018) Unsupervised scanning behavior detection based on distribution of network traffic features using robust autoencoders. In: 2018 IEEE international conference on data mining workshops (ICDMW), pp 35–38. https://doi.org/10.1109/ICDMW.2018.00013

Kumar A, Lim TJ (2019) EDIMA: early detection of IoT malware network activity using machine learning techniques. [Cs]. http://arxiv.org/abs/1906.09715

Kumar S, Spafford EH (1994) An application of pattern matching in intrusion detection. Purdue University. https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=2115&context=cstech

Lai Y, Zhou K, Lin S, Lo N (2019) Flow-based anomaly detection using multilayer perceptron in software defined networks. In: 2019 42nd international convention on information and communication technology, electronics and microelectronics (MIPRO), pp 1154–1158. https://doi.org/10.23919/MIPRO.2019.8757199

Lakkaraju H, Kamar E, Caruana R, Horvitz E (2016) Discovering unknown unknowns of predictive models, p 5. http://web.stanford.edu/~himalv/unknownunknownsws.pdf

Liang X, Znati T (2019) A long short-term memory enabled framework for DDoS detection. In: 2019 IEEE global communications conference (GLOBECOM), pp 1–6. https://doi.org/10.1109/GLOBECOM38437.2019.9013450

Liu Y, Zhou Y, Wen S, Tang C (2014) A strategy on selecting performance metrics for classifier evaluation. Int J Mob Comput Multimed Commun 6:20–35. https://doi.org/10.4018/IJMCMC.2014100102

Liu J, Liu S, Zhang S (2019) Detection of IoT botnet based on deep learning. In: 2019 Chinese control conference (CCC), pp 8381–8385. https://doi.org/10.23919/ChiCC.2019.8866088

Liu Z, Li S, Zhang Y, Yun X, Cheng Z (2020) Efficient malware originated traffic classification by using generative adversarial networks. In: 2020 IEEE symposium on computers and communications (ISCC), pp 1–7. https://doi.org/10.1109/ISCC50000.2020.9219561

Liu F, Li X, Xiong W, Jiang H, Xie G (2021a) An accuracy network anomaly detection method based on ensemble model. In: ICASSP 2021 - 2021 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 8548–8552. https://doi.org/10.1109/ICASSP39728.2021.9414675

Liu Q, Hagenmeyer V, Keller HB (2021b) A review of rule learning-based intrusion detection systems and their prospects in smart grids. IEEE Access 9:57542–57564. https://doi.org/10.1109/ACCESS.2021.3071263

Lobato AGP, Lopez MA, Sanz IJ, Cardenas AA, Duarte OCMB, Pujolle G (2018) An adaptive real-time architecture for zero-day threat detection. In: 2018 IEEE international conference on communications (ICC), pp 1–6. https://doi.org/10.1109/ICC.2018.8422622

Lu X, Liu P, Lin J (2019) Network traffic anomaly detection based on information gain and deep learning. In: Proceedings of the 2019 3rd international conference on information system and data mining—ICISDM 2019, pp 11–15. https://doi.org/10.1145/3325917.3325946

Luo Y, Xiao Y, Cheng L, Peng G, Yao D (2021) Deep learning-based anomaly detection in cyber-physical systems: progress and opportunities. ACM Comput Surv 54(5):106:1-106:36. https://doi.org/10.1145/3453155

Ma L, Chai Y, Cui L, Ma D, Fu Y, Xiao A (2020) A deep learning-based DDoS detection framework for internet of things, pp 1–6. https://doi.org/10.1109/ICC40277.2020.9148944

Maurya S, Ahmad RB (2020) Cloud of things (CoT) based smart cities. In: 2020 7th international conference on computing for sustainable global development (INDIACom), pp 94–97. https://doi.org/10.23919/INDIACom49435.2020.9083697

Meidan Y, Bohadana M, Mathov Y, Mirsky Y, Breitenbacher D, Shabtai A, Elovici Y (2018) N-BaIoT: network-based detection of IoT botnet attacks using deep autoencoders. IEEE Pervasive Comput 17(3):12–22. https://doi.org/10.1109/MPRV.2018.03367731

Meira J (2018) Comparative results with unsupervised techniques in cyber attack novelty detection. Proceedings 2(18):1191. https://doi.org/10.3390/proceedings2181191

Mergendahl S, Li J (2020) Rapid: robust and adaptive detection of distributed denial-of-service traffic from the internet of things. In: 2020 IEEE conference on communications and network security (CNS), pp 1–9. https://doi.org/10.1109/CNS48642.2020.9162278

Mohammadi M, Al-Fuqaha A, Sorour S, Guizani M (2018) Deep learning for IoT big data and streaming analytics: a survey. IEEE Commun Surv Tutor 20(4):2923–2960. https://doi.org/10.1109/COMST.2018.2844341

Mokhtari S, Abbaspour A, Yen KK, Sargolzaei A (2021) A machine learning approach for anomaly detection in industrial control systems based on measurement data. Electronics 10(4):407. https://doi.org/10.3390/electronics10040407

Mou L, Jin Z (2018) Tree-based convolutional neural networks: principles and applications. Springer, Singapore

Moussa MM, Alazzawi L (2020) Cyber attacks detection based on deep learning for cloud-dew computing in automotive IoT applications. In: 2020 IEEE international conference on smart cloud (SmartCloud), pp 55–61. https://doi.org/10.1109/SmartCloud49737.2020.00019

Moustafa N, Slay J (2015) UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: 2015 military communications and information systems conference (MilCIS). https://doi.org/10.1109/MilCIS.2015.7348942

Mu X, Ting KM, Zhou Z-H (2017) Classification under streaming emerging new classes: a solution using completely-random trees. IEEE Trans Knowl Data Eng 29(8):1605–1618. https://doi.org/10.1109/TKDE.2017.2691702

Mutombo VK, Lee Y, Kim H, Kim Y, Debska NW, Hong J (2020) Smart transportation platform for private transportation. In: Proceedings of the 35th annual ACM symposium on applied computing, pp 1920–1927. https://doi.org/10.1145/3341105.3374043

Nagisetty A, Gupta GP (2019) Framework for detection of malicious activities in IoT networks using keras deep learning library. In: 2019 3rd international conference on computing methodologies and communication (ICCMC), pp 633–637. https://doi.org/10.1109/ICCMC.2019.8819688

Narla SRK, Stowell HG (2019) Connected and automated vehicles. Inst Transport Eng ITE J 89(3):28–33

Narudin FA, Feizollah A, Anuar NB, Gani A (2016) Evaluation of machine learning classifiers for mobile malware detection. Soft Comput 20(1):343–357. https://doi.org/10.1007/s00500-014-1511-6

Naveed K, Wu H (2020) Poster: a semi-supervised framework to detect botnets in IoT devices. In: 2020 IFIP networking conference (networking), pp 649–651

Nawaratne R, Alahakoon D, De Silva D, Yu X (2020) Spatiotemporal anomaly detection using deep learning for real-time video surveillance. IEEE Trans Ind Inf 16(1):393–402. https://doi.org/10.1109/TII.2019.2938527

Neuschmied H, Winter M, Stojanović B, Hofer-Schmitz K, Božić J, Kleb U (2022) APT-attack detection based on multi-stage autoencoders. Appl Sci 12(13):6816

Ng W, Minasny B, de Sousa Mendes W, Demattê JAM (2019) Estimation of effective calibration sample size using visible near infrared spectroscopy: deep learning vs machine learning. Soil. https://doi.org/10.5194/soil-2019-48

NSL-KDD Datasets (2009) https://www.unb.ca/cic/datasets/nsl.html

Osterweil E, Stavrou A, Zhang L (2019) 20 years of DDoS: a call to action. [Cs]. http://arxiv.org/abs/1904.02739

Otoum Y, Liu D, Nayak A (2019) DL-IDS: a deep learning–based intrusion detection framework for securing IoT. Trans Emerg Telecommun Technol. https://doi.org/10.1002/ett.3803

Pan Y, An J, Fan W, Huang W (2019) Shellfier: a shellcode detection method based on dynamic binary instrumentation and convolutional neural network. In: Proceedings of the 2019 8th international conference on software and computer applications, pp 462–466. https://doi.org/10.1145/3316615.3316731

Pang G, Shen C, Cao L, Hengel AVD (2021) Deep learning for anomaly detection: a review. ACM Comput Surv 54(2):38:1-38:38. https://doi.org/10.1145/3439950

Pérez-Díaz JA, Valdovinos IA, Choo K-KR, Zhu D (2020) A flexible SDN-based architecture for identifying and mitigating low-rate DDoS attacks using machine learning. IEEE Access 8:155859–155872. https://doi.org/10.1109/ACCESS.2020.3019330

Qureshi A-U-H, Larijani H, Mtetwa N, Javed A, Ahmad J (2019) RNN-ABC: a new swarm optimization based technique for anomaly detection. Computers 8(3):59. https://doi.org/10.3390/computers8030059

Qureshi AS, Khan A, Shamim N, Durad MH (2020a) Intrusion detection using deep sparse auto-encoder and self-taught learning. Neural Comput Appl 32(8):3135–3147. https://doi.org/10.1007/s00521-019-04152-6

Qureshi A-U-H, Larijani H, Mtetwa N, Yousefi M, Javed A (2020b) An adversarial attack detection paradigm with swarm optimization. In: 2020 international joint conference on neural networks (IJCNN), pp 1–7. https://doi.org/10.1109/IJCNN48605.2020.9207627

Rafique MF, Ali M, Qureshi AS, Khan A, Mirza AM (2020) Malware classification using deep learning based feature extraction and wrapper based feature selection technique. arXiv. https://doi.org/10.48550/arXiv.1910.10958

Rahman SA, Tout H, Talhi C, Mourad A (2020) Internet of things intrusion detection: centralized, on-device, or federated learning? IEEE Netw 34(6):310–317. https://doi.org/10.1109/MNET.011.2000286

Rashid MM, Kamruzzaman J, Hassan MM, Imam T, Gordon S (2020) Cyberattacks detection in IoT-based smart city applications using machine learning techniques. Int J Environ Res Public Health 17(24):9347. https://doi.org/10.3390/ijerph17249347

Ring M, Wunderlich S, Scheuring D, Landes D, Hotho A (2019) A survey of network-based intrusion detection data sets. Comput Secur 86:147–167. https://doi.org/10.1016/j.cose.2019.06.005

Rivero J, Ribeiro B, Chen N, Leite FS (2017) A Grassmannian approach to zero-shot learning for network intrusion detection. In: Liu D, Xie S, Li Y, Zhao D, El-Alfy E-SM (eds) Neural information processing. Springer International Publishing, Cham, pp 565–575. https://doi.org/10.1007/978-3-319-70087-8_59

Rodríguez E, Valls P, Otero B, Costa JJ, Verdú J, Pajuelo MA, Canal R (2022) Transfer-learning-based intrusion detection framework in IoT networks. Sensors 22(15):5621

Roopak M, Tian GY, Chambers J (2019) Deep learning models for cyber security in IoT networks, pp 0452–0457. https://doi.org/10.1109/CCWC.2019.8666588

Roopak M, Tian GY, Chambers J (2020) An intrusion detection system against DDoS attacks in IoT networks. In: 2020 10th annual computing and communication workshop and conference (CCWC), pp 0562–0567. https://doi.org/10.1109/CCWC47524.2020.9031206

Sabeel U, Heydari SS, Elgazzar K, El-Khatib K (2021) Building an intrusion detection system to detect atypical cyberattack flows. IEEE Access 9:94352–94370. https://doi.org/10.1109/ACCESS.2021.3093830

Said Elsayed M, Le-Khac N-A, Dev S, Jurcut AD (2020) Network anomaly detection using LSTM based autoencoder. In: Proceedings of the 16th ACM symposium on QoS and security for wireless and mobile networks, pp 37–45. https://doi.org/10.1145/3416013.3426457

Sameera N, Shashi M (2020) Deep transductive transfer learning framework for zero-day attack detection. ICT Express 6(4):361–367

Samy A, Yu H, Zhang H (2020) Fog-based attack detection framework for internet of things using deep learning. IEEE Access 8:74571–74585. https://doi.org/10.1109/ACCESS.2020.2988854

Sarhan M, Layeghy S, Gallagher M, Portmann M (2021) From zero-shot machine learning to zero-day attack detection. arXiv preprint. https://arxiv.org/abs/2109.14868

Sarker IH, Shahriar B, Watters P, Ng A (2020) Cybersecurity data science: an overview from machine learning perspective. J Big Data. https://doi.org/10.1186/s40537-020-00318-5

Scheirer WJ, de Rezende Rocha A, Sapkota A, Boult TE (2013) Toward open set recognition. IEEE Trans Pattern Anal Mach Intell 35(7):1757–1772. https://doi.org/10.1109/TPAMI.2012.256

Scheirer WJ, Jain LP, Boult TE (2014) Probability models for open set recognition. IEEE Trans Pattern Anal Mach Intell 36(11):2317–2324. https://doi.org/10.1109/TPAMI.2014.2321392

Schlachter P, Liao Y, Yang B (2019) Deep one-class classification using intra-class splitting. In: 2019 IEEE data science workshop (DSW), pp 100–104. https://doi.org/10.1109/DSW.2019.8755576

Schlachter P, Liao Y, Yang B (2020) Deep open set recognition using dynamic intra-class splitting. SN Comput Sci 1(2):77. https://doi.org/10.1007/s42979-020-0086-9

Sharafaldin I, Habibi Lashkari A, Ghorbani AA (2018) Toward generating a new intrusion detection dataset and intrusion traffic characterization. In: Proceedings of the 4th international conference on information systems security and privacy, pp 108–116. https://doi.org/10.5220/0006639801080116

Sharma B, Pokharel P, Joshi B (2020) User behavior analytics for anomaly detection using LSTM autoencoder—insider threat detection. In: Proceedings of the 11th international conference on advances in information technology, pp 1–9. https://doi.org/10.1145/3406601.3406610

Singla A, Bertino E, Verma D (2019) Overcoming the lack of labeled data: training intrusion detection models using transfer learning. In: 2019 IEEE international conference on smart computing (SMARTCOMP). IEEE, pp 69–74

Smys S, Basar D, Wang D (2020) Hybrid intrusion detection system for internet of things (IoT). J ISMAC 2:190–199. https://doi.org/10.36548/jismac.2020.4.002

Soe YN, Santosa PI, Hartanto R (2019) DDoS attack detection based on simple ANN with SMOTE for IoT environment, pp 1–5. https://doi.org/10.1109/ICIC47613.2019.8985853

Stoian N-A (2020) Machine learning for anomaly detection in IoT networks: malware analysis on the IoT-23 Data set. 10. http://purl.utwente.nl/essays/81979

Strubell E, Ganesh A, McCallum A (2019) Energy and policy considerations for deep learning in NLP. [Cs]. http://arxiv.org/abs/1906.02243

Sun X, Dai J, Liu P, Singhal A, Yen J (2018) Using Bayesian networks for probabilistic identification of zero-day attack paths. IEEE Trans Inf Forensics Secur 13:2506–2521

Sung F, Yang Y, Zhang L, Xiang T, Torr PH, Hospedales TM (2018) Learning to compare: relation network for few-shot learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1199–1208

Syarif I, Prugel-Bennett A, Wills G (2012) Unsupervised clustering approach for network anomaly detection. In: International conference on networked digital technologies. Springer, Berlin, Heidelberg, pp 135–145

Takahashi Y, Shima S, Tanabe R, Yoshioka K (2020) APTGen: an approach towards generating practical dataset labelled with targeted attack sequences. In: 13th {USENIX} workshop on cyber security experimentation and test ({CSET} 20). https://www.usenix.org/conference/cset20/presentation/takahashi

Tao H, Bhuiyan MZA, Abdalla AN, Hassan MM, Zain JM, Hayajneh T (2019) Secured data collection with hardware-based ciphers for IoT-based healthcare. IEEE Internet Things J 6(1):410–420. https://doi.org/10.1109/JIOT.2018.2854714

Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the KDD CUP 99 data set, pp 1–6. https://doi.org/10.1109/CISDA.2009.5356528

Thamilarasu G, Chawla S (2019) Towards deep-learning-driven intrusion detection for the internet of things. Sensors 19(9):1977. https://doi.org/10.3390/s19091977

Toward developing a systematic approach to generate benchmark datasets for intrusion detection—ScienceDirect (n.d.) https://www.sciencedirect.com/science/article/pii/S0167404811001672 . Accessed 26 Aug 2021

Umer MA, Junejo KN, Jilani MT, Mathur AP (2022) Machine learning for intrusion detection in industrial control systems: applications, challenges, and recommendations. Int J Crit Infrastruct Prot 38. https://doi.org/10.1016/j.ijcip.2022.100516

Van CN, Phan VA, Cao VL, Nguyen KDT (2020) IoT malware detection based on latent representation. In: 2020 12th international conference on knowledge and systems engineering (KSE), pp 177–182. https://doi.org/10.1109/KSE50997.2020.9287373

Vanerio J, Casas P (2017) Ensemble-learning approaches for network security and anomaly detection. In: Proceedings of the workshop on big data analytics and machine learning for data communication networks, pp 1–6. https://doi.org/10.1145/3098593.3098594

Viegas E, Santin A, Abreu V, Oliveira LS (2018) Enabling anomaly-based intrusion detection through model generalization. In: 2018 IEEE symposium on computers and communications (ISCC), pp 00934–00939. https://doi.org/10.1109/ISCC.2018.8538524

Wang W, Zhu M, Wang J, Zeng X, Yang Z (2017a) End-to-end encrypted traffic classification with one-dimensional convolution neural networks. In: 2017 IEEE international conference on intelligence and security informatics (ISI), pp 43–48. https://doi.org/10.1109/ISI.2017.8004872

Wang W, Zhu M, Zeng X, Ye X, Sheng Y (2017b) Malware traffic classification using convolutional neural network for representation learning. In: 2017 international conference on information networking (ICOIN), pp 712–717. https://doi.org/10.1109/ICOIN.2017.7899588

Wang H, Yang J, Lu Y (2020) A logical combination based application layer intrusion detection model. In: Proceedings of the 2020 international conference on cyberspace innovation of advanced technologies, pp 310–316. https://doi.org/10.1145/3444370.3444590

Xie W, Xu S, Zou S, Xi J (2020) A system-call behavior language system for malware detection using a sensitivity-based LSTM Model. In: Proceedings of the 2020 3rd international conference on computer science and software engineering, pp 112–118. https://doi.org/10.1145/3403746.3403914

Xue B, Fu W, Zhang M (2014) Multi-objective feature selection in classification: a differential evolution approach. Simul Evol Learn. https://doi.org/10.1007/978-3-319-13563-2_44

Yang Y, Zheng K, Wu B, Yang Y, Wang X (2020) Network intrusion detection based on supervised adversarial variational auto-encoder with regularization. IEEE Access 8:42169–42184. https://doi.org/10.1109/ACCESS.2020.2977007

Yang J, Li H, Shao S, Zou F, Wu Y (2022) FS-IDS: a framework for intrusion detection based on few-shot learning. Comput Secur 122:102899

Yichao Z, Tianyang Z, Xiaoyue G, Qingxian W (2019) An improved attack path discovery algorithm through compact graph planning. IEEE Access 7:59346–59356

Yu Y, Long J, Cai Z (2017) Network intrusion detection through stacking dilated convolutional autoencoders. Secur Commun Netw 2017:e4184196. https://doi.org/10.1155/2017/4184196

Yu X, Lu H, Yang X, Chen Y, Song H, Li J, Shi W (2020) An adaptive method based on contextual anomaly detection in internet of things through wireless sensor networks. Int J Distrib Sens Netw 16(5):1550147720920478

Zahoora U, Khan A, Rajarajan M, Khan SH, Asam M, Jamal T (2022a) Ransomware detection using deep learning based unsupervised feature extraction and a cost sensitive Pareto Ensemble classifier. Sci Rep 12(1):15647. https://doi.org/10.1038/s41598-022-19443-7

Zahoora U, Rajarajan M, Pan Z, Khan A (2022b) Zero-day ransomware attack detection using deep contractive autoencoder and voting based ensemble classifier. Appl Intell 52(12):13941–13960. https://doi.org/10.1007/s10489-022-03244-6

Zavrak S, İskefiyeli M (2020) Anomaly-based intrusion detection from network flow features using variational autoencoder. IEEE Access 8:108346–108358. https://doi.org/10.1109/ACCESS.2020.3001350

Zhang Z, Liu Q, Qiu S, Zhou S, Zhang C (2020) Unknown attack detection based on zero-shot learning. IEEE Access 8:193981–193991. https://doi.org/10.1109/ACCESS.2020.3033494

Zhao J, Shetty S, Pan JW, Kamhoua C, Kwiat K (2019) Transfer learning for detecting unknown network attacks. EURASIP J Inf Secur 2019(1):1–13

Zong Y, Huang G (2019) A feature dimension reduction technology for predicting DDoS intrusion behavior in multimedia internet of things. Multimed Tools Appl. https://doi.org/10.1007/s11042-019-7591-7

Zoppi T, Ceccarelli A, Capecchi T, Bondavalli A (2021) Unsupervised anomaly detectors to detect intrusions in the current threat landscape. ACM/IMS Trans Data Sci 2(2):1–26

Zou M, Wang C, Li F, Song W (2018) Network phenotyping for network traffic classification and anomaly detection. In: 2018 IEEE international symposium on technologies for homeland security (HST), pp 1–6. https://doi.org/10.1109/THS.2018.8574178

Zou J, Zhang J, Jiang P (2019) Credit card fraud detection using autoencoder neural network. [Cs, Stat]. http://arxiv.org/abs/1908.11553

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Authors and affiliations.

University of the Cumberlands, 6178 College Station Drive, Williamsburg, KY, 40769, USA

Rasheed Ahmad & Wasim Alhamdani

Texas A&M University, San Antonio, One University Way, San Antonio, TX, 78224, USA

Izzat Alsmadi

Jordan University of Science and Technology, Irbid, 22110, Jordan

Lo’ai Tawalbeh

Corresponding author

Correspondence to Rasheed Ahmad.

Ethics declarations

Competing interests.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1: list of included and excluded studies

Table 14 presents the list of papers included in and excluded from this research review. The columns “QA1–QA6” show the quality assessment scores obtained after applying the quality criteria identified in Sect. 5.2.4. The “Results” column is the aggregate of the six QA scores. The last column marks each study as “I-Included” or “E-Excluded” based on the QA results. Studies scoring over 50% (> 3) are included in this SLR; all others are excluded. Note that any study that does not answer QA1 (i.e., Does the study address zero-day attack detection?) defeats the purpose of this SLR and is therefore excluded from further analysis.
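The inclusion rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' tooling: the function names are hypothetical, each QA score is assumed to lie between 0 and 1, and "not answering QA1" is assumed to mean a QA1 score of 0.

```python
from typing import Sequence

QA_COUNT = 6            # QA1..QA6, as listed in Table 14
INCLUDE_THRESHOLD = 3   # "over 50% (> 3)" of the maximum aggregate score


def is_included(qa_scores: Sequence[float]) -> bool:
    """Return True if a study passes the quality assessment.

    Assumption: each score is in [0, 1]; a QA1 score of 0 means the
    study does not address zero-day attack detection.
    """
    if len(qa_scores) != QA_COUNT:
        raise ValueError(f"expected {QA_COUNT} QA scores, got {len(qa_scores)}")
    # A study that fails QA1 is excluded regardless of its other scores.
    if qa_scores[0] == 0:
        return False
    # Otherwise the aggregate score must be strictly greater than 3.
    return sum(qa_scores) > INCLUDE_THRESHOLD


def decision_label(qa_scores: Sequence[float]) -> str:
    """Map the decision to the "I"/"E" labels used in the last column."""
    return "I" if is_included(qa_scores) else "E"
```

Under these assumptions, a study with scores `[1, 1, 1, 0.5, 0, 0]` aggregates to 3.5 and is labelled "I", while a study with a QA1 score of 0 is labelled "E" even if every other score is 1.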

Appendix 2: data extraction form and details

Table 15 presents the details of unknown attack detection research papers included in this study.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Ahmad, R., Alsmadi, I., Alhamdani, W. et al. Zero-day attack detection: a systematic literature review. Artif Intell Rev 56, 10733–10811 (2023). https://doi.org/10.1007/s10462-023-10437-z

Accepted: 10 February 2023

Published: 27 February 2023

Issue Date: October 2023

DOI: https://doi.org/10.1007/s10462-023-10437-z

Zero-day attacks
Unknown attacks
Anomaly detection
Intrusion detection
Closed and open set recognition

Open access
Published: 12 March 2020

Analysis of computer network attack based on the virus propagation model

Yanshan He 1 ,
Ting Wang 1 ,
Jianli Xie 1 &
Ming Zhang 1

EURASIP Journal on Wireless Communications and Networking volume 2020, Article number: 63 (2020)

Conventional methods can reasonably analyze common network attacks, but their analysis is unreliable for attacks that follow the virus propagation model. This paper proposes a method of computer network attack analysis based on the virus propagation model. Using the relationship between the framework and its daemon, the framework of the analysis model is set up, the attack analysis techniques are determined, and the construction of the analysis model is completed. The computer attack objects and the computer attack process are then analyzed to carry out the network attack analysis. A coverage test and an uncertainty test are used to measure the parameters of the reliability calculation, which are then substituted into the reliability calculation formula. The results show that the designed method is 47.15% more reliable than the conventional analysis method and is suitable for network attack analysis under the virus propagation model.

1 Introduction

Conventional computer network attack analysis uses the data end analysis method. It analyzes common network attacks reasonably well, but for network attacks under the virus propagation model its reliability is low because of the limitations of the model [ 1 , 2 ]. Therefore, an analysis of computer network attack based on the virus propagation model is proposed. Based on the framework daemon relationship, the framework analysis unit is defined, and the mutual instructions and file relationships are clarified. The framework of the analysis model is set up to determine the techniques for password attack analysis, denial of service attack analysis, buffer overflow attack analysis, data-driven attack analysis, forged information attack analysis, and address resolution protocol (ARP) attack analysis, so that the objects and the process of computer attacks can be analyzed. To verify the effectiveness of the designed method, a network attack test environment under the virus propagation model is simulated, and two different methods of computer network attack analysis are compared in a reliability simulation test. The experimental results show that the proposed method is effective.

The rest of this paper is organized as follows. Section 2 discusses the methods, followed by implementation of computer network attack analysis in Section 3. Experimental simulation is discussed in Section 4. Section 5 concludes the paper with summary and future research directions.

2.1 Building the framework of the model for computer network attack analysis

The framework of the model for computer network attack analysis analyzes network attacks in the virus propagation model by establishing the relationship function of the framework daemon. This determines the structure of the framework analysis unit and clarifies the mutual instructions and file relations of each analysis unit.

A computer network attack under the virus propagation model is initiated in client/server mode. In this mode, one host provides a service (the server side) and another host receives the service (the client). A server host usually opens a default port and listens on it. When a client sends a connection request to that port, the corresponding program on the server runs automatically to respond to the request. This program is called the daemon (originally a UNIX term that has since been carried over to Microsoft systems) [ 3 ]. After the two machines are connected successfully, one controls the other over the network: the controlled end acts as the server, and the controlling end acts as the client. The daemon relationship function is shown in formula 1.

where \( P_{ok} \) is the daemon of the program, \( \gamma_i \) is the state coefficient of the program's sub-process, \( h_i \) is the new session created in the sub-process, \( q_k \) is the reset file-creation mask, and \( K_o \) is the security factor.

A network attack under the virus propagation model usually requires the server program to be installed on the controlled computer and the client program on the master computer [ 4 ]. In general, the client program is first executed on the master computer; it then sends a connection request to the server program on the controlled computer and establishes a network connection. Through the server program, the client program can then control the controlled computer and carry out various kinds of network attacks. These mainly include modifying the system settings of the controlled computer, obtaining its process list, shutting down or restarting its operating system, starting or terminating processes on it, recording and extracting remote keyboard events, and managing its files and folders.

Therefore, the analysis unit in the framework can be divided into two parts: one is the server analysis unit, and the other is the client analysis unit. The analysis unit in the framework mainly analyzes the path information among the client, the communication side and the server side [ 5 ]. The analysis unit structure includes analysis unit instructions and analysis files.

An analysis unit instruction is a string of characters sent by the analysis unit on the client side to the server to perform some function. The return instruction, which carries acknowledgement information or an error message, is a string of characters sent back from the server to the client. An analysis file is a file transmitted between the client and the server, that is, a file uploaded or downloaded at either end. To keep instructions and files apart, we define an internal protocol. An instruction is a multi-character string: the first three positions are numeric flag digits that indicate the function of the command, and the content from the fourth position onward depends on that function and can be the path of a disk or folder, the structure of a file (file path, file name, and file size), a process name, a specific string, and so on. The instructions and files have the following relationship [ 6 ]:

where \( Z_p \) is the program instruction, \( \sigma_o \) is the string mask law, \( \gamma_i \) is the state coefficient of the program's sub-process, \( h_i \) is the new session created in the sub-process, \( q_k \) is the reset file-creation mask, and \( W_0 \) represents the attributes of the files transmitted between the client and the server.
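The instruction format described above can be illustrated with a minimal C sketch. It assumes that the three leading flag positions hold decimal digit characters and that everything after them is an opaque payload. The names `struct instruction` and `parse_instruction` are hypothetical and do not come from the paper, and the text does not specify which numeric codes map to which functions.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for one parsed instruction; the field names are
 * illustrative and do not come from the paper. */
struct instruction {
    int code;             /* value of the three leading decimal digits */
    const char *payload;  /* rest of the string: path, file info, process name, ... */
};

/* Split "NNNpayload" into its numeric label and its payload.
 * Returns 0 on success, -1 if the string lacks three leading digits. */
int parse_instruction(const char *msg, struct instruction *out)
{
    if (msg == NULL || out == NULL || strlen(msg) < 3)
        return -1;
    for (int i = 0; i < 3; i++) {
        if (!isdigit((unsigned char)msg[i]))
            return -1;
    }
    out->code = (msg[0] - '0') * 100 + (msg[1] - '0') * 10 + (msg[2] - '0');
    out->payload = msg + 3;  /* points into msg; no copy is made */
    return 0;
}
```

Under these assumptions, parsing the string "012C:\data" would yield the code 12 and the payload "C:\data". Returning a pointer into the original string avoids a second buffer, at the cost of tying the payload's lifetime to the message.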

As soon as the client is connected to the server, the server actively transfers the list of drives of the server computer to the client, so we do not define a command format for viewing the drive list [ 7 ]. For file transfer, we do not define a dedicated format either: after the corresponding instruction is received, the file is simply transmitted byte by byte, starting from the first byte. The file interfaces that we define, and the corresponding path relations between the analysis instructions and the files, are listed in Table 1 .

For an uploaded file, the file path in the file structure is the path where the client places the file on the server, and the file name is the name under which the file is created on the server side [ 8 ]. For a downloaded file, the file path in the file structure is the location on the server side of the file that the client wants to download.

2.2 Analysis of computer network attack

The computer network attacks under the virus propagation model are covered by password attack analysis, denial of service attack analysis, buffer overflow attack analysis, data-driven attack analysis, forged information attack analysis, and ARP attack analysis. Password attack analysis is the simplest and most direct of these. A password attack targets other people's passwords, and an attacker often starts an attack on a target by attacking the user's password. As long as an attacker can guess or determine the user's password, he can gain access to a machine or network and to any resource that the user can access. If the user has domain administrator or root permissions, this is extremely dangerous [ 9 ].

Password attacks mainly include dictionary generation, password interception and deception, and non-technical means. The dictionary generation attack uses a word library to generate passwords. The library contains a number of word roots, which are combined under a set of rules to generate candidate passwords. The roots are chosen according to people's habits in making passwords and are obtained from extensive statistics. The non-technical means in a password attack mainly refer to obtaining the rules by which passwords are composed through non-information means, which satisfies the following rule [ 10 ]:

where k is the non-technical means coefficient in the password attack, \( K_o \) is the security factor, \( q_j \) is the query recognition rate of the finger command of the target host, \( S_{x,k} \) is the number of bytes in the directory query service within time period k, \( R'_{kk} \) is the effective byte rate obtained by the password attack in time period k, \( b_j \) stands for the difficulty of setting the password, and \( \Delta G_j \) represents the password complexity. Password attack objects mainly include Linksys, MikroTik, NETGEAR, and TP-Link routers used in small office and home office (SOHO) environments, as well as QNAP network-attached storage (NAS) devices; devices from other network equipment vendors have not been found to be infected [ 11 ].

In a denial of service attack, the attacker aims to stop the target machine from providing service. It is one of the attacks most commonly used by hackers. Attacks that consume network bandwidth are only a small part of denial of service attacks: any attack that causes trouble to the target, suspends some of its services, or even crashes the host is a denial of service attack [ 12 ]. The problem of denial of service attacks has not been properly solved because it stems from security defects in the network protocols themselves, and the denial of service attack has therefore become the attacker's ultimate technique. A denial of service attack achieves one of two effects on the server: it either fills the server's buffer so that no new request can be received, or it uses IP spoofing to force the server to reset connections, which disrupts the connections of legitimate users.

A connectivity attack uses a large number of connection requests to overwhelm the computer, so that all available operating system resources are depleted and the computer can no longer handle the requests of legitimate users. Common attack means are SYN flood, WinNuke, Ping of death, Echl attack, ICMP/SMURF, Finger bomb, Land attack, Ping flood, Rwhod, Teardrop, TARGA3, UDP attack, OOB, and so on [ 13 ].

A buffer overflow occurs when a program writes more data into a buffer than the buffer can hold, so that the excess data overwrites legitimate data. Ideally, the program checks the length of the data and rejects input that exceeds the buffer length. Most programs, however, assume that the length of the data always matches the allocated storage space, which creates the hidden danger of buffer overflow. The buffer used by the operating system is also known as the “stack”. Between operations, instructions are temporarily stored in the stack, and the stack can overflow as well [ 14 ].

By writing content that exceeds the buffer's length into a program's buffer, the attacker overflows the buffer, corrupts the program's stack, and makes the program execute other instructions, which achieves the purpose of the attack. The cause of the buffer overflow is that the program does not carefully check the parameters entered by the user, as in the following program [ 15 ]:

#include <string.h>

void function(char *str)
{
    char buffer[16];
    strcpy(buffer, str);  /* unchecked copy: no test of the length of str */
}

The strcpy() call above copies the content of str directly into buffer. As soon as str holds 16 or more characters, the copy, which includes the terminating null byte, no longer fits; buffer overflows and the program runs incorrectly. Other standard functions with the same problem include strcat(), sprintf(), vsprintf(), gets(), scanf(), and so on.
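For contrast, a checked version of the copy can be sketched as follows. This is a minimal illustration of one common defensive pattern (verify the length before copying), not a measure proposed in the paper. The helper names `safe_copy` and `function_checked` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 16

/* Copy src into dst only if it fits, including the terminating null byte.
 * Returns 0 on success and -1 if the input would overflow dst. */
int safe_copy(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;
    size_t len = strlen(src);
    if (len >= dst_size)  /* len + 1 bytes are needed for the copy */
        return -1;
    memcpy(dst, src, len + 1);
    return 0;
}

/* Checked counterpart of the vulnerable function shown in the text. */
int function_checked(const char *str)
{
    char buffer[BUF_SIZE];
    return safe_copy(buffer, sizeof buffer, str);
}
```

With a 16-byte buffer, a 15-character string is accepted and a 16-character string is rejected, because the null terminator needs the last byte. Rejecting oversized input, rather than silently truncating it, is a design choice made here for clarity.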

Of course, filling the buffer with arbitrary data until it overflows generally causes only a segmentation fault, which does not achieve the purpose of the attack. The most common approach is to craft the overflow so that the program starts a user shell, through which other commands can be executed. If the program is owned by root and has the suid permission, the attacker obtains a shell with root permissions and can perform any operation on the system [ 16 ].

Buffer overflow attacks are a common security attack because buffer overflow vulnerabilities are widespread and easy to exploit. Moreover, buffer overflows are used for remote attacks mainly because the vulnerability gives the attacker everything he wants: the ability to implant and execute attack code. The implanted code runs with the permissions of the vulnerable program and thereby gains control of the attacked host.

Overflow vulnerabilities and attacks in buffer zone take many forms. Accordingly, defense means vary with different attack methods, including effective defense measures for each type of attack.

In a gateway spoofing attack that impersonates a user, host A imitates host B and sends a forged ARP message to the gateway, so that the gateway's ARP table records a wrong address mapping for host B and normal data messages can no longer be received correctly by host B. In an attack that deceives other hosts, host A imitates host B and sends a forged ARP message to host C, so that host C's ARP table records a wrong address mapping for host B and normal data messages can no longer be received correctly by host B [ 17 ].

An ARP attack achieves ARP spoofing by forging IP addresses and MAC addresses, and it can generate a large amount of ARP traffic that congests the network. By continuously sending forged ARP response packets, an attacker can change the IP-MAC entry in the target host's ARP cache, which results in a network interruption or a man-in-the-middle attack [ 18 ].

ARP attacks mainly occur in LANs. If a computer in the LAN is infected with an ARP Trojan, the infected system will try to intercept the communication of other computers in the network by means of “ARP spoofing” and thus cause communication failures for those computers [ 19 ].

The attacker sends a forged ARP response to computer A, telling it that the MAC address corresponding to computer B's IP address 192.168.0.2 is 00-aa-00-62-c6-03, which is the attacker's own MAC address. Computer A accepts the response and writes the mapping into its ARP cache table. From then on, data that should have been sent to computer B are sent to the attacker. Similarly, the attacker sends a forged ARP response to computer B, telling it that the MAC address corresponding to computer A's IP address is 00-aa-00-62-c6-03, and computer B will then also send its data to the attacker [ 20 ].

At this point, the attacker controls the traffic between computer A and computer B. He can choose to monitor the traffic passively, get the password and other secret information, and can also forge the data and change the communication content between computer A and computer B [ 21 , 22 ].
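The cache update behind this scenario can be sketched in C. The sketch assumes a simplified ARP cache that accepts every reply without verification, which is the behavior that ARP spoofing relies on. The types and functions (`arp_cache`, `arp_handle_reply`, `arp_lookup`) are hypothetical and are not part of any real network stack.

```c
#include <stdio.h>
#include <string.h>

#define ARP_MAX 8  /* illustrative table size */

/* One IP-to-MAC mapping, stored as text for readability. */
struct arp_entry {
    char ip[16];
    char mac[18];
};

struct arp_cache {
    struct arp_entry entries[ARP_MAX];
    int count;
};

/* Naive cache update: any reply overwrites the stored mapping without
 * verification. Returns 0 on success, -1 if the cache is full. */
int arp_handle_reply(struct arp_cache *c, const char *ip, const char *mac)
{
    for (int i = 0; i < c->count; i++) {
        if (strcmp(c->entries[i].ip, ip) == 0) {
            snprintf(c->entries[i].mac, sizeof c->entries[i].mac, "%s", mac);
            return 0;
        }
    }
    if (c->count == ARP_MAX)
        return -1;
    snprintf(c->entries[c->count].ip, sizeof c->entries[c->count].ip, "%s", ip);
    snprintf(c->entries[c->count].mac, sizeof c->entries[c->count].mac, "%s", mac);
    c->count++;
    return 0;
}

/* Return the MAC address currently cached for ip, or NULL if absent. */
const char *arp_lookup(const struct arp_cache *c, const char *ip)
{
    for (int i = 0; i < c->count; i++) {
        if (strcmp(c->entries[i].ip, ip) == 0)
            return c->entries[i].mac;
    }
    return NULL;
}
```

If computer A's cache first learns a legitimate mapping for 192.168.0.2 and then processes the forged reply carrying 00-aa-00-62-c6-03, a later lookup returns the forged address, so frames meant for computer B are addressed to the attacker.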

In order to solve the problem of ARP attack, 802.1x protocol can be configured on the switches in the network.

IEEE 802.1x is a port-based access control protocol that authenticates and authorizes users connected to switches. Once the 802.1x protocol is configured on a switch, an attacker must authenticate when connecting to the switch (using a combination of MAC address, port, account, VLAN, password, etc.), and only after successful authentication can data be sent to the network. Without authentication, the attacker cannot send forged ARP messages to the network [ 23 ].

Based on the analysis of the principles of computer network attack under the virus propagation model, the computer network attack interface is built. It covers password attack, denial of service attack, buffer overflow attack, data-driven attack, forged information attack, and ARP attack, which completes the construction of the computer network attack analysis model [ 24 ].

3 Implementation of computer network attack analysis

3.1 Analysis of computer network attack objects

Based on the computer network attack analysis model, we divide the network attack into three modules: the client module, the server module, and the communication module. Accordingly, three classes are established first: client, server, and message. By defining and instantiating the classes needed for development, the objects of the computer network attack are obtained, which mainly include client, server, message, list, disk, folder, file, process, HTTP packet, server request message, and client response message [ 25 ].

3.2 Analysis of the process of computer network attack

The intrusion process is unusually complex, so intrusions take many forms with different characteristics, and it is difficult to describe them with a single unified formal model. Finite state automata are therefore used to describe some typical intrusion processes and to identify their characteristics, with the aim of finding a formal description of the various intrusion processes.

For the automaton model \( M = (Q, \Sigma, F, S, Z) \) corresponding to different intrusion processes, the system state Q may be described with different objects: it can be made up of the states of one or several monitored hosts, or it can be described by the states of processes running within a host. The transformation condition set, also known as the set of transformation functions, is a family of functions that includes the attack function class, the communication function class, and the feature judgment function class. During an attack, each function is instantiated gradually, and the system changes from one state to another. For different intrusion processes, the specific meaning of each element in M may differ. When the model is used, all captured data packets and log data are analyzed and audited to obtain characteristic parameters, which determine whether there is a system anomaly or an intrusion behavior. In either case, a state transition diagram from the initial state to the end state is obtained, and the state of the system can be determined from the termination state.

TCP is a connection-oriented protocol. Before two network nodes communicate, they must first establish a connection through the three-way handshake. When host A wants to access the resources of server B, host A first establishes a connection with server B. Host A sends a connection request packet with the SYN flag to server B; the packet contains the initial sequence number x of host A. After receiving the SYN packet, server B changes its state to SYN_RCVD and allocates the data structures required for the connection. Server B then sends a confirmation packet with the SYN/ACK flags to host A. It contains the initial sequence number y of server B and acknowledges host A's sequence number with the ACK number x + 1; the connection is now in the so-called half-open state. Host A receives the SYN/ACK packet and sends an ACK packet with the acknowledgment number y + 1 to server B. When server B receives this packet, its state changes to ESTABLISHED and the connection is complete. Host A has thus established a connection with server B, and the two can communicate over it.
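The sequence number arithmetic of the handshake can be written out as a short C sketch. It models only the flags and the sequence and acknowledgment numbers, with x and y as the initial sequence numbers of host A and server B. The `segment` struct and the three helper functions are hypothetical simplifications, not a real TCP implementation.

```c
#include <stdint.h>

/* Minimal view of a TCP segment: flags plus sequence/acknowledgment numbers. */
struct segment {
    int syn;
    int ack;
    uint32_t seq;
    uint32_t ack_num;
};

/* Step 1: host A sends SYN carrying its initial sequence number x. */
struct segment make_syn(uint32_t x)
{
    struct segment s = { 1, 0, x, 0 };
    return s;
}

/* Step 2: server B answers with SYN/ACK: its own number y and ack = x + 1. */
struct segment make_syn_ack(struct segment syn, uint32_t y)
{
    struct segment s = { 1, 1, y, syn.seq + 1 };
    return s;
}

/* Step 3: host A confirms with ACK: seq = x + 1 and ack = y + 1. */
struct segment make_ack(struct segment syn_ack)
{
    struct segment s = { 0, 1, syn_ack.ack_num, syn_ack.seq + 1 };
    return s;
}
```

For x = 1000 and y = 5000, the SYN/ACK carries the acknowledgment number 1001 and the final ACK carries 5001. The unsigned 32-bit type wraps around on overflow, which matches the modular arithmetic of TCP sequence numbers.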

The above describes the normal establishment of a TCP connection. However, if server B sends the SYN/ACK packet to host A and host A does not respond, server B has to wait for quite a long time. If there are too many such half-open connections, they are likely to consume the resources (such as buffers) that server B uses to establish connections. Once these system resources are exhausted, normal connection requests to server B can no longer be answered.
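The resource exhaustion described above can be illustrated with a small C model of the server's queue of pending connections. This is a sketch under simplifying assumptions: a fixed capacity of four slots is chosen only for illustration, and timeouts are triggered by an explicit call. The names `syn_backlog`, `on_syn`, `on_ack_or_timeout`, and `resources_exhausted` are hypothetical.

```c
#define BACKLOG_SIZE 4  /* illustrative capacity, not a real system default */

/* Hypothetical queue of connections waiting for the final ACK on server B. */
struct syn_backlog {
    int pending;  /* number of slots currently occupied */
};

/* A SYN arrives: reserve a slot if one is free.
 * Returns 1 if the request is queued, 0 if it is dropped. */
int on_syn(struct syn_backlog *b)
{
    if (b->pending >= BACKLOG_SIZE)
        return 0;
    b->pending++;
    return 1;
}

/* The final ACK arrives, or the entry times out: release one slot. */
void on_ack_or_timeout(struct syn_backlog *b)
{
    if (b->pending > 0)
        b->pending--;
}

/* Returns 1 when no slot is left for new connection requests. */
int resources_exhausted(const struct syn_backlog *b)
{
    return b->pending >= BACKLOG_SIZE;
}
```

In this model, four connection requests that never receive their final ACK fill the queue, and a fifth request, which may come from a legitimate user, is dropped until a slot is released.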

The specific process of the network attack is as follows. The attacker forges one or more nonexistent hosts C and sends a large number of connection requests to server B. Because the forged hosts do not exist, server B waits for a period of time on each connection request without receiving any confirmation. A large number of half-open connection requests thus accumulate in a short time and quickly deplete the related system resources of server B, so that normal connection requests cannot be answered. This results in a denial of service. Next, a finite state automaton is used to describe this network attack process.

Let \( M = (Q, \Sigma, F, S, Z) \), where \( q \in Q \) and q = (intruder_status, server_status, system_status). Here intruder_status is the state of the attacker, with the value range {listen, faked, SYN_SENT, ACK_SENT, failed, established}; server_status is the state of server B, with the value range {listen, SYN_RCVD, SYN_ACK_SENT, ACK_RCVD, blocked, established}; and system_status indicates whether a system intrusion occurs, with the value range {false, true}, where system_status = true indicates that an intrusion occurs. Σ is the set of transformation functions, including the attack function, the communication function, and the test function. Specifically,

E0 : fake()

E1 : Communication(Res_host, Des_host, SYN-ISN, 0)

E2 : Communication(Res_host, Des_host, SYN-ISN, ACK-ISN)

E3 : Tcp_resource_used_out()

where function E0 forges a nonexistent host at random. Function E1 issues a SYN request packet to the server, and SYN-ISN is the sequence number sent. Function E2 sends the SYN-ACK reply packet to the host that sent the connection request, and SYN-ISN and ACK-ISN are the sequence number sent and the acknowledgment number. Function E3 determines whether the TCP connection resources on the server are used up: it returns “true” if they are and “false” otherwise. The states are as follows:

S0 = (listen, listen, false), S1 = (faked, listen, false), S2 = (SYN_SENT, SYN_RCVD, false), S3 = (failed, SYN_ACK_SENT, false), S4 = (listen, blocked, true).

S0 is the initial state of the model. By forging a nonexistent host, the attacker enters state S1 , in which the attacker status is faked. The forged host then tries to establish a connection with server B, and the model enters state S2 . The server responds to the connection request, and the model enters state S3 ; the attacker status becomes failed because the SYN-ACK packet, addressed to a nonexistent host, never reaches the attacker. Finally, the model determines whether the TCP connection resources of the system are depleted. If they are not, the model returns to the initial state; otherwise, it enters the termination state S4 , which completes the analysis of the attack process.
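The transitions described above can be sketched as a small transition function in C. The enum and function names are hypothetical. The sketch also assumes that an event which does not apply in the current state leaves the state unchanged, which the text does not specify, and it splits E3 into two events according to its boolean result.

```c
/* States S0..S4 as listed in the text. */
enum state { S0, S1, S2, S3, S4 };

/* Events corresponding to the transformation functions E0..E3;
 * E3 is split by its boolean result. */
enum event { E0_FAKE, E1_SYN, E2_SYN_ACK, E3_EXHAUSTED, E3_NOT_EXHAUSTED };

/* Transition function: returns the next state, or the current state
 * unchanged when the event does not apply (an assumption of this sketch). */
enum state next_state(enum state s, enum event e)
{
    switch (s) {
    case S0: return e == E0_FAKE ? S1 : s;
    case S1: return e == E1_SYN ? S2 : s;
    case S2: return e == E2_SYN_ACK ? S3 : s;
    case S3:
        if (e == E3_EXHAUSTED) return S4;
        if (e == E3_NOT_EXHAUSTED) return S0;
        return s;
    default: return s;  /* S4 is the termination state */
    }
}

/* system_status is true only in the termination state S4. */
int intrusion_detected(enum state s)
{
    return s == S4;
}
```

Feeding the events E0, E1, E2 and then an exhausted result of E3 drives the model from S0 to S4, where the intrusion is reported. If E3 reports that resources are not exhausted, the model returns to S0 and the cycle can repeat.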

4 Experimental simulation

To verify the effectiveness of the computer network attack analysis based on the virus propagation model, a simulation analysis is carried out. In the experiment, network attacks under different virus propagation models are taken as the test objects, and different virus types and attack modes are simulated in a reliability simulation test of computer network attack analysis. To ensure the validity of the experiment, the conventional method of computer network attack analysis is used for comparison. The results of the two methods are presented in the same charts, and the conclusions are obtained through the reliability calculation.

4.1 Preparation of the experimental data

To ensure the accuracy of the simulation test, the test parameters are set first. Network attacks under different virus propagation models are used as the test objects, the reliability simulation test is carried out with the two different methods of computer network attack analysis, and the simulation test results are analyzed. Because the two methods produce different analysis results, the test environment parameters must be kept consistent throughout the test. The test data settings used in this article are shown in Table 2 .

4.2 Design of the test process

In the virus propagation model, the client of the network attack uses the active port, and the server uses the passive port. When a connection is to be established, the server opens a default port and enters the listening state, and the client sends connection requests to that port on the server. The server regularly reads requests over the HTTP protocol and actively connects to them. The client's listening port is generally port 80, which is dedicated to the HTTP protocol. To avoid detection by the server's firewall, a port number greater than 1024 can also be used.

Because of the limitations of our testing environment and software, the method cannot be tested widely on the Internet or in a military network; instead, several computers are connected through a hub to form a LAN for testing. On this LAN, a fully shared folder (in which files can be read, written, and deleted) is first created on an FTP server, which acts as a springboard computer.

Then, when the client is started, the program obtains the active IP address of the client and an open port number and creates a file in the shared folder of the FTP server. The file is named ip, its type is .txt, and its content has the following format:

IP:10.131.1.130

Finally, after the server is started, it first connects to the springboard computer and reads the shared folder on the FTP server. It keeps looking for the file ip.txt in this folder. If the file exists, the server reads its content to obtain the client's IP address and port number; if it does not exist, the server searches again periodically at a fixed interval until the file is found or the server is closed.
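Reading the address line from ip.txt can be sketched in C as follows. The sketch covers only the "IP:a.b.c.d" line shown above, since the text does not show how the port number is stored. The function name `parse_ip_line` is hypothetical, and the validation rules (four decimal fields, each at most 255) are assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Parse a line of the form "IP:a.b.c.d", optionally ending in a newline.
 * On success, writes the dotted address into out and returns 0;
 * returns -1 if the prefix is missing or the address is malformed. */
int parse_ip_line(const char *line, char *out, size_t out_size)
{
    unsigned int a, b, c, d;
    char extra = '\0';

    if (line == NULL || out == NULL || strncmp(line, "IP:", 3) != 0)
        return -1;
    /* The trailing %c detects characters after the fourth field. */
    int n = sscanf(line + 3, "%3u.%3u.%3u.%3u%c", &a, &b, &c, &d, &extra);
    if (n < 4 || (n == 5 && extra != '\n' && extra != '\r'))
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;
    if (snprintf(out, out_size, "%u.%u.%u.%u", a, b, c, d) >= (int)out_size)
        return -1;  /* output buffer too small */
    return 0;
}
```

A polling loop on the server side could call this function on the first line of ip.txt each time the file is found and keep waiting when it returns -1.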

When the server reads the IP and port numbers of the client, the server initializes and then connects the client according to the read IP and port numbers.

For a communication link between two network nodes, link encryption provides security for the data transmitted on the Internet and can also intercept the commands and data transmitted on the network through the firewall. With link encryption, all messages are encrypted before being transmitted, and the received messages are decrypted at the destination node. The commonly used DES encryption algorithm is adopted.

The client is started on the client machine. The client program obtains the active IP address (10.131.1.130) and an available port number (5656), writes the local IP address and port number into the ip.txt file, and uploads ip.txt to the FTP server acting as the springboard. The server first connects to that FTP server, reads the shared folder, and downloads ip.txt to the server's computer. The server program then opens and reads ip.txt to obtain the IP address and port number of the client's computer. With the obtained IP address and port number, the server and client programs successfully establish a connection.

Password attack, denial of service attack, buffer overflow attack, data-driven attack, forged information attack, and ARP attack are selected as the attack methods, and the analysis coverage and analysis uncertainty are obtained for each of the two methods of computer network attack analysis. The reliability of the two methods is then determined by calculation.

4.3 Analysis of coverage test results

In the experiment, two different methods of computer network attack analysis are used in the simulation environment to analyze the changes of computer network attacks. Because the two methods differ, their analysis results cannot be compared directly. Third-party analysis and recording software is therefore used to record and analyze the test process and results, which are displayed as comparison curves. The software also eliminates the uncertainty introduced by operator actions and computer equipment, so that the analysis coverage test depends only on the network attacks under different virus propagation models and on the method of computer network attack analysis. The comparison curves of the analysis coverage test are shown in Fig. 1 .

Fig. 1 Comparison curves of the analysis coverage test. Curve a represents the conventional method of computer network attack analysis and curve b represents the proposed method, as recorded by the third-party software. The average analysis coverage of the conventional method is 86.42%, and that of the method designed in this paper is 99.89%.

4.4 Analysis of the test results of uncertainty

The uncertainty simulation test is carried out in the same way, using different network propagation models and the two different computer network attack analysis methods. The comparison curves of the test results are shown in Fig. 2 .

Fig. 2 Comparison curves of the uncertainty test. Curve a represents the conventional method of computer network attack analysis and curve b represents the designed method, as recorded by the third-party software. The average analysis uncertainty of the conventional method is 15.12%, and that of the method in this paper is 2.70%.

4.5 Reliability analysis

From the coverage and uncertainty obtained in the tests above, the reliability is calculated by formula 4.

where ξ is the coverage of the analysis, λ is the uncertainty of the analysis, μ is the test influence parameter, and θ is the influence of attack effectiveness. Let \( S_1 \) be the reliability of the method proposed in this paper and \( S_2 \) that of the conventional method. A positive \( \Delta S = S_1 - S_2 \) represents a reliability improvement, and a negative \( \Delta S \) represents a reliability reduction. Substituting the measured values into formula 4 gives \( \Delta S \):

It can be seen that the proposed method of computer network attack analysis is better than the conventional method, and the reliability is improved by 47.15%. It is suitable for the analysis of network attacks under the virus propagation model.

5 Conclusion

In this paper, an analysis of computer network attack based on the virus propagation model is proposed. Based on the construction of the computer network attack analysis model, the object and process of computer network attack are analyzed, and the research in this paper is completed. The experimental data show that the method designed in this paper is very effective. It is hoped that the research in this paper can provide a theoretical basis for computer network attack analysis method under the virus propagation model.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

ARP: Address resolution protocol

Acknowledgements

National Natural Science Foundation of China, Research on Joint algorithm of Network selection and Cognitive Spectrum allocation in High-Speed Rail Communication Environment, 61661026.

About the authors

Yanshan He (1983-), female. She graduated from Lanzhou University in China in 2010 as a Master of Computer Science. She is currently a lecturer in the school of Electronic and Information Engineering, Lanzhou JiaoTong University. Her research interests include data mining and the technology of Internet of Things.

Ting Wang (1981-), female. She graduated from Lanzhou Jiaotong University in China in 2008 as a Master of Computer Science. She is currently an associate professor in the school of Electronic and Information Engineering, Lanzhou JiaoTong University. Her research interests include computer networking and the technology of Internet of Things.

Jianli Xie (1972-), male. He got his PhD of Intelligence Traffic from the Lanzhou Jiaotong University in China in 2014. He is currently a professor in the school of Electronic and Information Engineering, Lanzhou JiaoTong University. His research interests include communication engineering, intelligence traffic, and the technology of Internet of Things.

Ming Zhang (1982-), male. He got his PhD of Engineering from the Pukyong National University in Korea in 2015. He is currently an associate professor in the school of Electronic and Information Engineering, Lanzhou JiaoTong University. His research interests include machine intelligence and the technology of Internet of Things.

Author information

Authors and affiliations.

Electronics and Information Engineering, Lanzhou Jiaotong University, Lanzhou, 730070, China

Yanshan He, Ting Wang, Jianli Xie & Ming Zhang

Contributions

YH wrote the entire article. TW was responsible for data preprocessing. JX was responsible for the simulation part of the experiment. MZ was responsible for the analysis of the results. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Yanshan He.

Ethics declarations

Ethics approval and consent to participate.

This article does not contain any studies with human participants or animals performed by any of the authors.

All authors agree to submit this version and claim that no part of this manuscript has been published or submitted elsewhere.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article.

He, Y., Wang, T., Xie, J. et al. Analysis of computer network attack based on the virus propagation model. J Wireless Com Network 2020 , 63 (2020). https://doi.org/10.1186/s13638-020-1660-5

Received: 20 November 2019

Accepted: 30 January 2020

Published: 12 March 2020

DOI: https://doi.org/10.1186/s13638-020-1660-5

Keywords

Virus propagation
Network attack
Interface construction
Analysis technology
ARP analysis

network security Recently Published Documents

A Survey on Ransomware Malware and Ransomware Detection Techniques

Abstract: Ransomware is a kind of malicious software (malware) that threatens to publish data or blocks access to data or a computer system, usually by encrypting it, until the victim pays a ransom to the attacker. In many cases, the ransom demand comes with a deadline. If the victim does not pay on time, the data is lost permanently or the ransom increases. Attackers now employ new techniques to make their attacks more effective. In this paper, we focus on ransomware network attacks and survey detection techniques for ransomware attacks. Various detection techniques and approaches are available for identifying ransomware attacks. Keywords: Network Security, Malware, Ransomware, Ransomware Detection Techniques

Analysis and Evaluation of Wireless Network Security with the Penetration Testing Execution Standard (PTES)

Computer networks are used in an institution to facilitate communication and data transfer between devices. Such a network can use wireless media or LAN cable. At SMP XYZ, most of the computers still use wireless networks. Based on the findings in the field, it was found that there was no user management problem. Therefore, an analysis and audit of the network security system is needed to ensure that the network security system at SMP XYZ is safe and running well. This analysis requires a benchmark for determining the security of the wireless network. The benchmark used is the Penetration Testing Execution Standard (PTES), a standard for analyzing or auditing network security systems, applied here to the wireless network security system. The PTES-based analysis shows that the SMP XYZ wireless network still has many security holes that allow outsiders to gain illegal access and exploit vulnerabilities, including WPA2 cracking, DoS, wireless router password cracking, and access point isolation, so network security at SMP XYZ cannot yet be considered safe.

A Sensing Method of Network Security Situation Based on Markov Game Model

The sensing of network security situation (NSS) has become a hot issue. This paper first describes the basic principle of Markov model and then the necessary and sufficient conditions for the application of Markov game model. And finally, taking fuzzy comprehensive evaluation model as the theoretical basis, this paper analyzes the application fields of the sensing method of NSS with Markov game model from the aspects of network randomness, non-cooperative and dynamic evolution. Evaluation results show that the sensing method of NSS with Markov game model is best for financial field, followed by educational field. In addition, the model can also be used in the applicability evaluation of the sensing methods of different industries’ network security situation. Certainly, in different categories, and under the premise of different sensing methods of network security situation, the proportions of various influencing factors are different, and once the proportion is unreasonable, it will cause false calculation process and thus affect the results.

The Compound Prediction Analysis of Information Network Security Situation based on Support Vector Combined with BP Neural Network Learning Algorithm

To address the low security of data in network transmission and the inaccurate prediction of the future security situation, an improved neural network learning algorithm is proposed in this paper. The algorithm makes up for the shortcomings of the standard neural network learning algorithm, eliminates redundant data using support vectors, and realizes effective clustering of information data. In addition, the improved algorithm uses the order of the data to optimize the "end" data in the standard neural network learning algorithm, so as to improve the accuracy and computational efficiency of network security situation prediction. MATLAB simulation results show that the data processing capacity of the support vector combined with BP neural network is consistent with the actual security situation data requirements, with a consistency of up to 98%. The consistency of the security situation results can reach 99%, the composite prediction time of the whole security situation is less than 25 s, the line segment slope change can reach 2.3%, and the slope change range can reach 1.2%, which is better than the BP neural network algorithm.

Network intrusion detection using oversampling technique and machine learning algorithms

The expeditious growth of the World Wide Web and the rampant flow of network traffic have resulted in a continuous increase of network security threats. Cyber attackers seek to exploit vulnerabilities in network architecture to steal valuable information or disrupt computer resources. Network Intrusion Detection System (NIDS) is used to effectively detect various attacks, thus providing timely protection to network resources from these attacks. To implement NIDS, a stream of supervised and unsupervised machine learning approaches is applied to detect irregularities in network traffic and to address network security issues. Such NIDSs are trained using various datasets that include attack traces. However, due to the advancement in modern-day attacks, these systems are unable to detect the emerging threats. Therefore, NIDS needs to be trained and developed with a modern comprehensive dataset which contains contemporary common and attack activities. This paper presents a framework in which different machine learning classification schemes are employed to detect various types of network attack categories. Five machine learning algorithms: Random Forest, Decision Tree, Logistic Regression, K-Nearest Neighbors and Artificial Neural Networks, are used for attack detection. This study uses a dataset published by the University of New South Wales (UNSW-NB15), a relatively new dataset that contains a large amount of network traffic data with nine categories of network attacks. The results show that the classification models achieved the highest accuracy of 89.29% by applying the Random Forest algorithm. Further improvement in the accuracy of classification models is observed when Synthetic Minority Oversampling Technique (SMOTE) is applied to address the class imbalance problem. After applying the SMOTE, the Random Forest classifier showed an accuracy of 95.1% with 24 selected features from the Principal Component Analysis method.
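The oversampling idea mentioned in this abstract can be illustrated with a minimal sketch of the interpolation step behind SMOTE, written with the Python standard library only. This is not the study's implementation: the function name `smote_sample`, the parameters `k` and `n_new`, and the toy minority points are assumptions made for illustration.

```python
import random


def smote_sample(minority, k=2, n_new=4, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (the core interpolation idea of SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class feature vectors (two features each).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_points = smote_sample(minority)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the range already covered by the minority class, which is why the technique rebalances classes without inventing outliers.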

Cyber Attacks Visualization and Prediction in Complex Multi-Stage Network

In network security, various protocols exist, but these cannot be said to be secure. Moreover, it is not easy to train end-users, and this process is time-consuming as well. Put another way, it takes a long time for an individual to become a good cybersecurity professional. Many hackers and illegal agents try to take advantage of vulnerabilities through various incremental penetrations that can compromise critical systems. The conventional tools available for this purpose are not sufficient. Risks are always present, and with dynamically evolving networks, they are very likely to lead to serious incidents. This research work proposes a model to visualize and predict cyber-attacks in complex, multilayered networks. The calculation corresponds to the cyber software vulnerabilities in the networks within the specific domain. All the available network security conditions and the possible places where an attacker can exploit the system are summarized.

Network Security Policy Automation

Network security policy automation enables enterprise security teams to keep pace with increasingly dynamic changes in on-premises and public/hybrid cloud environments. This chapter discusses the most common use cases for policy automation in the enterprise, and new automation methodologies to address them by taking the reader step-by-step through sample use cases. It also looks into how emerging automation solutions are using big data, artificial intelligence, and machine learning technologies to further accelerate network security policy automation and improve application and network security in the process.

Rule-Based Anomaly Detection Model with Stateful Correlation Enhancing Mobile Network Security

Research on network security technology of industrial control system.

The relationship between industrial control system and Internet is becoming closer and closer, and its network security has attracted much attention. Penetration testing is an active network intrusion detection technology, which plays an indispensable role in protecting the security of the system. This paper mainly introduces the principle of penetration testing, summarizes the current cutting-edge penetration testing technology, and looks forward to its development.

Detection and Prevention of Malicious Activities in Vulnerable Network Security Using Deep Learning

Sensors (Basel)
PMC10346235

CICIoT2023: A Real-Time Dataset and Benchmark for Large-Scale Attacks in IoT Environment

Associated data.

https://www.unb.ca/cic/datasets/iotdataset-2023.html, accessed on 19 June 2023.

Nowadays, the Internet of Things (IoT) concept plays a pivotal role in society and brings new capabilities to different industries. The number of IoT solutions in areas such as transportation and healthcare is increasing and new services are under development. In the last decade, society has experienced a drastic increase in IoT connections. In fact, IoT connections will increase in the next few years across different areas. Conversely, several challenges still need to be faced to enable efficient and secure operations (e.g., interoperability, security, and standards). Furthermore, although efforts have been made to produce datasets composed of attacks against IoT devices, several possible attacks are not considered. Most existing efforts do not consider an extensive network topology with real IoT devices. The main goal of this research is to propose a novel and extensive IoT attack dataset to foster the development of security analytics applications in real IoT operations. To accomplish this, 33 attacks are executed in an IoT topology composed of 105 devices. These attacks are classified into seven categories, namely DDoS, DoS, Recon, Web-based, brute force, spoofing, and Mirai. Finally, all attacks are executed by malicious IoT devices targeting other IoT devices. The dataset is available on the CIC Dataset website.

1. Introduction

Nowadays, the Internet of Things (IoT) plays a pivotal role in society and brings new capabilities to different industries [ 1 , 2 , 3 ]. IoT projects in areas such as transportation and healthcare are becoming increasingly popular, and new applications are under development [ 4 , 5 ]. This new paradigm relies on an extensively connected network of sensors and actuators, with multiple devices producing network traffic [ 6 , 7 , 8 ]. Research and industrial communities have been evolving this concept for years, and these devices are becoming more present in our daily lives [ 9 , 10 , 11 ].

Several areas have been transformed by this technology. For example, in healthcare applications, patients can be regularly monitored using IoT technology [ 12 , 13 , 14 ]. In transportation, IoT devices have been used to detect and prevent accidents [ 15 , 16 , 17 ]. Industrial IoT (IIoT) has also brought different solutions, such as high reliability and low latency automated monitoring and collaborative control [ 18 ]. IoT applications have also been developed for areas such as education [ 19 ], aviation [ 20 ], and forestry [ 21 ]. In the last decade, society has experienced a drastic increase in IoT connections [ 22 ]. In fact, IoT connections will increase in the next few years across different areas [ 23 ]. This motivates the creation and development of business ideas and new concepts that rely on a highly distributed infrastructure. In addition, various strategies have been proposed to solve potential problems in IoT operations, i.e., the deployment of new services is leveraged by the scientific findings achieved in the past few years.

Despite these benefits, several challenges still need to be faced to enable efficient and secure operations (e.g., interoperability, security, standards, and server technologies) [ 24 , 25 , 26 , 27 ]. The development of new applications may also bring new requirements to the systems [ 28 , 29 ]. For example, the Internet of Vehicles (IoV) may require more restrictive response times than common IoT applications. Furthermore, detecting and mitigating attacks performed against IoT devices is challenging due to several factors. For example, distributed connections and lightweight devices without security mechanisms make attacks harder to detect and mitigate [ 30 , 31 , 32 , 33 ].

Furthermore, although efforts have been made to produce datasets composed of attacks against IoT devices, several possible attacks are not considered. In addition, most efforts do not consider an extensive network topology with real IoT devices. Finally, the attacks performed against IoT devices are executed by computer systems (i.e., non-IoT devices), highlighting the need for a dataset composed of attacks performed by malicious IoT devices. To enable the development of security analytics solutions for intrusion detection in real-world scenarios, the data produced need to (i) include a variety of attacks that can harm IoT operations, (ii) be collected from an extensive topology with real IoT devices of different types and brands, and (iii) include attacks performed by malicious IoT devices.

The main goal of this research is to propose a novel and extensive IoT attack dataset to foster the development of security analytics applications in real IoT operations. To accomplish this, 33 attacks are executed in an IoT topology composed of 105 devices. These attacks are classified into seven categories, namely DDoS, DoS, Recon, Web-based, brute force, spoofing, and Mirai. In addition, all attacks are executed by malicious IoT devices targeting other IoT devices. This dataset includes multiple attacks not available in other IoT datasets and enables IoT professionals to develop new security analytics solutions. Furthermore, the data are available in different formats, allowing researchers to use features extracted in our evaluation or engineer new features.

The main contributions of this research are:

We design a new realistic IoT attack dataset, CICIoT2023, using an extensive topology composed of several real IoT devices acting as either attackers or victims;
We perform, document, and collect data from 33 attacks divided into 7 classes against IoT devices and demonstrate how they can be reproduced;
We evaluate the performance of machine and deep learning algorithms using the CICIoT2023 dataset to classify and detect IoT network traffic as malicious or benign.
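The malicious-versus-benign framing in the last contribution can be sketched as a simple label mapping over the seven attack categories named above. This is an illustrative sketch only: the label strings, the `Benign` class name, and the function `to_binary_label` are assumptions and not necessarily the identifiers used in the released dataset files.

```python
# The seven attack categories named in the text; the exact strings are
# hypothetical and may differ from the labels in the dataset files.
ATTACK_CATEGORIES = (
    "DDoS", "DoS", "Recon", "Web-based", "Brute force", "Spoofing", "Mirai",
)
BENIGN = "Benign"  # assumed name for the non-attack class


def to_binary_label(category: str) -> str:
    """Collapse a category label into 'malicious' or 'benign'."""
    if category == BENIGN:
        return "benign"
    if category in ATTACK_CATEGORIES:
        return "malicious"
    raise ValueError(f"unknown category: {category!r}")


labels = ["DDoS", "Benign", "Mirai", "Recon"]
binary = [to_binary_label(c) for c in labels]
```

Keeping the category tuple separate from the binary mapping lets the same records be used for either a two-class or a per-category classification task.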

This paper is organized as follows: Section 2 presents an extensive comparison of the contributions of this research with other works present in the literature. Section 3 introduces the CICIoT2023 dataset and presents the steps involved in the data collection. Section 4 presents the feature extraction process and describes the data. Section 5 presents the machine learning (ML) evaluation in the classification of different attacks using the CICIoT2023 dataset. Finally, Section 6 presents the conclusion of this research.

2. Related Works

In the past few years, different contributions have been published regarding IoT security datasets. These data have been produced with different goals and using different methods and resources. To better understand the characteristics of existing datasets, we review several initiatives present in the literature and compare them with the proposed CICIoT2023. The authors in [ 34 ] propose a novel network-based dataset for detecting botnet attacks in the IoT environment called N-BaIoT (2018). Mirai and BASHLITE botnets were used to attack nine commercial IoT devices. Multiple features were extracted from the network traffic and used by a deep-learning autoencoder for attack detection. In [ 35 ], the authors introduce a host-based IoT dataset composed of data from real IoT devices. This dataset, called IoTHIDS (2018), is produced based on experiments considering a topology of three devices infected by Mirai, Hajime, Adira, BASHLITE, Doflo, Tsunami, and Wroba malware botnets.

IoT-SH (2019) [ 36 ] is a dataset composed of captures of twelve attacks (categorized into four classes) against eight different smart home devices. A three-layer Intrusion Detection System (IDS) is used considering various combinations of rule-based and machine learning approaches to classify the attacks. BoT-IoT (2019) is introduced in [ 37 ] as a realistic traffic dataset, produced considering heterogeneous network profiles. Multiple attacks are performed (e.g., DDoS, DoS, data theft, and scan) against five devices. In the evaluation process, a set of new features is selected and used based on correlation coefficient and joint entropy techniques. Various machine and deep learning models are trained to evaluate the attack detection accuracy.
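The two quantities named for feature selection, the correlation coefficient and joint entropy, can be computed as in the minimal sketch below. It only illustrates the measures themselves and is not the selection procedure of [ 37 ]; the function names and the toy feature columns are assumptions.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def joint_entropy(xs, ys):
    """Joint Shannon entropy H(X, Y) in bits for two discrete columns."""
    n = len(xs)
    counts = Counter(zip(xs, ys))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical feature columns: packet counts and byte counts per flow.
packet_counts = [1, 2, 3, 4]
byte_counts = [2, 4, 6, 8]
r = pearson(packet_counts, byte_counts)

# Two independent binary features with four equally likely combinations.
h = joint_entropy([0, 0, 1, 1], [0, 1, 0, 1])
```

A correlation close to 1 or -1 between two features suggests that one of them is redundant, while a high joint entropy indicates that the pair carries more distinct information together.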

The authors in [ 38 ] introduce the Kitsune (2019) dataset, which is composed of four different categories of attacks executed against nine IoT devices. In the experiments conducted, a security camera was infected by a real Mirai botnet sample. This dataset is intended to support the development of plug-and-play Network Intrusion Detection Systems (NIDS) to detect normal and malicious traffic. Similarly, IoTNIDS (2019) [ 39 ] represents an initiative focused on collecting data from a real-world IoT networking environment based on the interaction between two IoT devices (speaker and camera). Multiple attacks are analyzed in this effort, e.g., Mirai, MITM, DoS, and scanning. MedBIoT (2020) [ 40 ] is an IoT network architecture dataset based on using real and emulated devices. The authors evaluated multiple machine learning techniques using 100 statistical features extracted from the IoT network traffic. In [ 41 ], the authors propose the IoT-23 (2020) dataset. This contribution is a botnet dataset composed of captures of benign and malicious traffic from a real network environment.

IoTIDs (2020) [ 42 ] is proposed as a dataset composed of IoT-related flow-based features, selected and ranked by the correlation coefficients technique and the Shapiro–Wilk algorithm, respectively. In the experiments, the authors performed four different attacks against two IoT devices (speaker and camera) and recorded the data. Multiple machine learning methods were used in the evaluation process (e.g., SVM, G-NB, LDA, and LR) focusing on attack detection and classification. The authors in [ 43 ] present the MQTT (2020) dataset with the primary goal of providing realistic data that include a protocol dedicated to IoT network scenarios. Furthermore, eight IoT devices were connected to the MQTT broker and a set of 33 different features were extracted and provided to various machine learning algorithms. Similarly, MQTT-IoT-IDS (2020) [ 44 ] is another contribution focused on producing a dataset using a lightweight protocol, i.e., MQTT, which is used in IoT networks. The authors focus on replicating a realistic IoT network by using a camera feed, twelve MQTT sensors, and a broker. Five scenarios are considered based on the variation in the attacks performed. Several packet-based, uni-, and bi-flow features are used alongside six different machine learning algorithms in the evaluation phase.
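The distinction between uni-flow and bi-flow features mentioned for MQTT-IoT-IDS can be sketched by grouping packets under two different flow keys. This is an illustrative sketch, not the extraction code of [ 44 ]: the packet dictionaries, addresses, and function names are assumptions; port 1883 is the standard unencrypted MQTT port.

```python
from collections import defaultdict


def uniflow_key(pkt):
    """Direction-sensitive key: each direction is its own flow."""
    return (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"], pkt["proto"])


def biflow_key(pkt):
    """Direction-insensitive key: both directions share one flow."""
    first, second = sorted(
        [(pkt["src"], pkt["sport"]), (pkt["dst"], pkt["dport"])]
    )
    return (first, second, pkt["proto"])


def group(packets, key_fn):
    """Group packets into flows according to the given key function."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[key_fn(pkt)].append(pkt)
    return dict(flows)


# Hypothetical packets exchanged between a sensor and an MQTT broker.
packets = [
    {"src": "10.0.0.2", "sport": 50000, "dst": "10.0.0.9", "dport": 1883,
     "proto": "tcp", "size": 60},
    {"src": "10.0.0.9", "sport": 1883, "dst": "10.0.0.2", "dport": 50000,
     "proto": "tcp", "size": 54},
    {"src": "10.0.0.2", "sport": 50000, "dst": "10.0.0.9", "dport": 1883,
     "proto": "tcp", "size": 120},
]

uni_flows = group(packets, uniflow_key)
bi_flows = group(packets, biflow_key)
bi_total_bytes = sum(p["size"] for p in next(iter(bi_flows.values())))
```

Here the three packets form two uni-directional flows but a single bi-directional flow, so statistics such as total bytes are computed over different packet sets depending on the key.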

In [ 45 ], the authors proposed a new telemetry-based data-driven IoT/IIoT dataset called TON-IoT (2020). This heterogeneous dataset comprises both normal and attack samples captured in different scenarios. Targeting the development of a realistic dataset, the authors include attack sub-categories, data recorded from operating system logs, and network traffic. Several machine learning and deep learning algorithms are used in the evaluation phase and the achieved results are reported in detail. Finally, the Edge-IIoTSet (2022) dataset is introduced as a realistic cybersecurity resource for IoT and IIoT applications to enable the development of Intrusion Detection Systems (IDS) in centralized and distributed applications [ 46 ]. Throughout the paper, an in-depth description of the testbed used is presented. In addition, the authors also describe the dataset generation framework. Regarding the machine learning evaluation process, centralized and federated learning considerations are presented.

3. The Proposed CICIoT2023

This section introduces the CICIoT2023 dataset. We aim to present an in-depth description of all steps and resources involved in producing this dataset. First, we describe the CIC IoT Lab. Then, we focus on the IoT topology, listing all IoT and network devices used and how they are connected. Next, we discuss all attacks that have been executed. Finally, we provide insights into how the data were collected for benign and malicious scenarios.

3.1. IoT Lab

The production of IoT security data that can be used to support real applications is challenging for several reasons. One of the main problems is having an extensive network composed of several real IoT devices, similar to topologies of real IoT applications. Many works adopt simulated or very few IoT devices due to costs, network equipment required (e.g., switches, routers, and network tap), and personnel dedicated to maintaining such an infrastructure.

The Canadian Institute for Cybersecurity (CIC) has a distinguished presence in the cybersecurity ecosystem and a history of high-impact contributions to industry and academia. Examples are datasets used to develop new cybersecurity applications and several partnerships with industry to improve cybersecurity practice and develop new solutions. This success enabled CIC to establish an IoT lab with a dedicated network to foster the development of IoT security solutions. By sharing the data collected from this extensive topology, we intend to foster the advancement of IoT security research and support several initiatives in different IoT security aspects.

Figure 1 shows the IoT lab at the CIC and its devices. The IoT devices are distributed across the lab: some are placed on the table, others on the floor, and some on the walls. We adopt a local network topology, and several power plugs are available in the lab. Additionally, there are racks and storage rooms to organize the IoT and network devices.

Figure 1. CIC IoT Lab.

3.2. IoT Topology

The IoT topology deployed to produce the CICIoT2023 dataset is illustrated in Figure 2 and comprises 105 IoT devices. A total of 67 IoT devices were directly involved in the attacks, and another 38 Zigbee and Z-Wave devices were connected to five hubs.

Figure 2. IoT network topology used in the experiments.

This topology mimics a real-world deployment of IoT products and services in a smart home environment. The devices list includes smart home devices, cameras, sensors, and micro-controllers which are connected and configured to enable the execution of several attacks and capture the corresponding attack traffic. The lab is also equipped with various tools and software, which enable us to perform several attacks and capture both benign and malicious attack traffic.

This topology is divided into two parts. In the first part, an ASUS router connects the network to the Internet and a Windows 10 Desktop computer shares this connectivity. In addition, a Cisco switch is placed between this computer and a VeraPlus access point connecting 7 Raspberry Pi devices. These devices are responsible for executing the attacks and malicious activities in the experiments. Using IoT devices as malicious agents is a CICIoT2023 characteristic not found in other efforts. Then, the Cisco switch is connected to the second part through a Gigamon Network Tap. This network device collects all the IoT traffic and sends it to two network monitors, which are responsible for storing the traffic using Wireshark [ 47 ]. A network tap is a hardware device that allows for monitoring and analyzing network traffic by connecting to a network cable and providing a copy of the traffic to other monitoring and security tools. Network taps are connected so as not to affect normal operation: they provide full-duplex, non-intrusive, and passive access to network traffic without introducing latency or affecting the performance of the network. This device has two network ports and two monitoring ports and is placed between the attacking and legitimate devices, connecting one port to the attackers and the other to the victim networks. Using the monitor ports, we are able to capture the traffic to and from the IoT network.

In the second part, a Netgear Unmanaged Switch is connected to five gateways and base stations to enable communication with IoT devices with protocols such as Zigbee and Z-Wave. Furthermore, another VeraPlus controller is connected to the switch. This controller is also connected to two other Zigbee/Z-Wave hubs and to several devices considered victims in the attacks performed. The list of all IoT devices used in this dataset is presented in Table 1 . Note that Zigbee and Z-Wave devices do not have a MAC address and are labeled as “Not Applicable” (N/A) for that particular column.

Table 1. List of IoT devices used to produce the dataset.

3.3. Data Collection of Benign and Malicious Scenarios

As described in Section 3.2 , a network tap and two traffic monitors are dedicated to monitoring the network traffic. Every packet sent through the network is stored in separate computers. In fact, the network has two different interfaces, which are associated with two other monitoring ports that send incoming packets to these computers. Hence, the network traffic is monitored using Wireshark [ 47 ] and stored in pcap format. Since two data streams are stored, mergecap [ 48 ] is used to unify pcap files for each experiment.
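As a minimal sketch of the merging step, the snippet below builds a `mergecap` invocation that combines the captures of the two monitors into one file per experiment. The file names are hypothetical, and the source does not state which options were actually used; only the standard `-w <outfile>` form is assumed here.

```python
def mergecap_command(monitor_a: str, monitor_b: str, merged: str) -> list[str]:
    """Build a mergecap call that merges two capture files into one.

    With `-w`, mergecap writes a single output file; by default the
    packets of all inputs are merged in timestamp order.
    """
    return ["mergecap", "-w", merged, monitor_a, monitor_b]


# Hypothetical file names for one experiment.
cmd = mergecap_command("monitor1.pcap", "monitor2.pcap", "experiment.pcap")
print(" ".join(cmd))

# To run it (requires Wireshark's mergecap on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the argument list separately from running it keeps the sketch testable on machines without Wireshark installed.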

For each attack, a different experiment is performed targeting all applicable devices. In all scenarios, the attacks are performed by malicious IoT devices targeting vulnerable IoT devices. For example, DDoS attacks are executed against all devices, whereas web-based attacks target devices that support web applications. Table 2 depicts the tools used to perform all attacks alongside the number of rows generated. In addition, Figure 3 and Figure 4 illustrate the instances count for each attack and category. The values are also presented in Table 3 .

Figure 3. Number of rows for each scenario.

Figure 4. Number of rows for each category.

Table 2. CICIoT2023: tools and frameworks used to execute attacks.

Table 3. Number of rows for each attack and category.

3.3.1. Benign Data Generation

The benign data represent the legitimate use of the IoT network. In this sense, the main goal of the data-capturing procedure is to gather IoT traffic in idle states and with human interactions (e.g., sensor data, Echo Dot requests, and accessing video feeds from smart cameras).

In terms of hardware for capturing, we relied on a network tap combined with two network monitors. In terms of software used, we adopted Wireshark to capture the entire traffic. Furthermore, all IoT devices are configured with default parameters and without malicious or attacking scripts. In this sense, benign data traffic gathering happens when there are no attacks. This process was conducted over a period of 16 h.

3.3.2. Executing DoS and DDoS Attacks

These attacks refer to flooding threats to compromise the availability of IoT operations. In the case of Denial-of-Service (DoS) attacks, one Raspberry Pi is responsible for flooding IoT devices. Furthermore, multiple Raspberry Pis are used to execute Distributed Denial-of-Service (DDoS) attacks through an SSH-based master-client configuration. The attacks executed are:

ACK Fragmentation: a relatively small number of maximum-sized packets is used to compromise the network operation. In many cases, these fragmented packets are successfully sent and handled by routers, firewalls, and intrusion prevention systems, given that reassembly of fragmented packets is not performed [ 62 ];
Slowloris: relies on using partial HTTP requests via open connections to a targeted Web server focusing on the application layer [ 63 ];
ICMP/HTTP/UDP/TCP Flood: based on overwhelming a targeted device with different packet types [ 64 , 65 , 66 ];
RST-FIN Flood: degrades networking capabilities by continuously sending RST-FIN packets towards a specific target [ 67 ];
PSH-ACK Flood: degrades server operation by flooding using PUSH and ACK requests [ 68 ];
UDP Fragmentation: refers to a special UDP flood that consumes more bandwidth while reducing the number of packets [ 69 ];
ICMP Fragmentation: relies on the use of identical fragmented IP packets containing a portion of a fragmented ICMP message [ 70 ];
SYN Flood: is a specific type of TCP flood that targets the initial handshake of the TCP connection. The SYN flood sends a large number of SYN (synchronize) packets to the targeted server, but it never completes the handshake by sending the final ACK (acknowledge) packet [ 71 ];
Synonymous IP Flood: an extensive number of manipulated TCP-SYN packets with source and destination addresses as the targeted address, which leads the server to use its resources to process the incoming traffic [ 72 ].
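To make the SYN flood description concrete, the following sketch counts handshakes that were opened with a SYN but never completed with a final ACK, grouped by target. It is an illustrative heuristic only, not part of the dataset tooling; the `(src, dst, flags)` tuple format and the host names are assumptions made for this example.

```python
from collections import Counter


def half_open_counts(packets):
    """Count handshakes opened by a SYN and never completed, per target.

    packets: iterable of (src, dst, flags), where flags is a set of TCP
    flag names such as {"SYN"} or {"ACK"} (hypothetical format).
    """
    pending = Counter()
    for src, dst, flags in packets:
        if flags == {"SYN"}:
            pending[(src, dst)] += 1   # client starts a handshake
        elif flags == {"ACK"} and pending[(src, dst)] > 0:
            pending[(src, dst)] -= 1   # client completes a handshake
    per_target = Counter()
    for (_, dst), count in pending.items():
        per_target[dst] += count
    return per_target


traffic = [
    ("attacker", "camera", {"SYN"}),
    ("attacker", "camera", {"SYN"}),
    ("attacker", "camera", {"SYN"}),
    ("phone", "camera", {"SYN"}),
    ("phone", "camera", {"ACK"}),
]
print(half_open_counts(traffic)["camera"])
```

A real detector would also track ports, sequence numbers, and timeouts; the sketch ignores them to keep the half-open idea visible.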

3.3.3. Gathering Information from the IoT Topology

These attacks gather all possible information about the target. In addition, an attacker can use a reconnaissance (i.e., scan) attack as a preparation step for other attacks. There are multiple ways to perform these attacks, and some of the most popular and threatening variations are:

Ping Sweep: A ping sweep attack, also known as a ping scan, is a type of reconnaissance attack used to identify active hosts on a network. It involves sending a series of ICMP (Internet Control Message Protocol) Echo Request (ping) packets to a range of IP addresses on a network, and then analyzing the ICMP Echo Reply (pong) packets that are returned to identify which hosts are active and responding [ 73 ];
OS Scan: An OS (operating system) scan attack, also known as an operating system fingerprinting attack, is a type of reconnaissance attack that is used to identify the type and version of an operating system running on a targeted host. The attacker uses various techniques to gather information about the targeted host, such as analyzing the responses to network packets, or examining the behavior of open ports and services, in order to determine the type and version of the operating system [ 74 ];
Vulnerability Scan: A vulnerability scan attack is a type of network security assessment that involves automated tools to identify potential vulnerabilities in a computer system or network. The goal of a vulnerability scan is to identify security weaknesses that could be exploited by an attacker to gain unauthorized access to a system or steal sensitive information [ 75 ];
Port Scan: A port scan attack is a type of reconnaissance attack that is used to identify open and active ports on a targeted host. The attacker sends a series of packets to various ports on the targeted host, attempting to establish a connection. The responses to these packets are then analyzed to determine which ports are open, closed, or filtered [ 76 ];
Host Discovery: A host discovery attack, also known as a host identification or host enumeration attack, is a type of reconnaissance attack that is used to identify active hosts on a network. It involves using various techniques to identify the IP addresses of devices that are connected to a network, and it is the first step in many cyber-attacks [ 77 ].
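The port scan pattern described above can be illustrated with a small sketch that counts how many distinct destination ports each source probes on each target. The `(src, dst, dport)` tuple format, the addresses, and the threshold of 20 ports are assumptions chosen for illustration, not values taken from the dataset.

```python
from collections import defaultdict


def scan_suspects(packets, port_threshold=20):
    """Return (src, dst) pairs where src touched many distinct ports on dst.

    packets: iterable of (src, dst, dport) tuples (hypothetical format).
    """
    ports = defaultdict(set)
    for src, dst, dport in packets:
        ports[(src, dst)].add(dport)
    return {pair for pair, seen in ports.items() if len(seen) >= port_threshold}


# A scanner probing ports 1-100 versus a client reusing port 443.
packets = [("10.0.0.5", "10.0.0.9", port) for port in range(1, 101)]
packets += [("10.0.0.7", "10.0.0.9", 443)] * 100
print(scan_suspects(packets))
```

Counting distinct ports rather than packets separates a scanner from a busy but legitimate client, which sends many packets to very few ports.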

3.3.4. Exploiting Web-Based Vulnerabilities

When executing these attacks, web services running on IoT devices were targeted. Web-based attacks are concerned with targeting web services in several ways. These attack types include injection, hijacking, poisoning, spoofing, and DoS [ 78 ]. The web-based attacks executed in this research are:

SQL Injection: an attack that targets web applications by injecting malicious SQL code into the application’s input fields. The goal of an SQL injection attack is to gain unauthorized access to a database, steal sensitive information, or execute arbitrary commands on the database server [ 79 ];
Command Injection: an attack that targets web applications by injecting malicious commands into an input field with the ultimate goal of gaining unauthorized access to a system, stealing sensitive information, or executing arbitrary commands on the targeted system [ 80 ];
Backdoor Malware: involves installing malware on a targeted system that allows the attacker to gain unauthorized access to the system at a later time. The malware, known as a “backdoor,” creates a hidden entry point into the system that can be used to bypass security measures and gain access to sensitive information or perform malicious actions [ 81 ];
Uploading Attack: targets a web application by exploiting vulnerabilities in the application’s file upload functionality. The goal of an uploading attack is to upload malicious files, such as malware, to a targeted system and use them to gain unauthorized access or execute arbitrary code on the targeted system;
Cross-Site Scripting (XSS): allows an attacker to inject malicious code (e.g., a script) into a web page. The injected script can then be executed by the web browser of any user with access to the page, allowing the attacker to steal sensitive information (e.g., cookies, session tokens, and personal data) or to perform other malicious activities (e.g., traffic redirection) [ 82 ];
Browser Hijacking: a type of cyber attack in which an attacker modifies a web browser’s settings, such as the home page, default search engine, or bookmarks in order to redirect the user to a different website or display unwanted ads. The goal of a browser hijacking attack is to generate revenue through advertising or to steal personal information [ 83 ].

3.3.5. Spoofing Communication

Spoofing attacks enable malicious actors to operate under the identity of a victim system and gain illegitimate access to the network traffic. The main focus of such a procedure includes gaining access to systems, stealing data, and spreading malware [ 84 ]. Two of the most popular spoofing attacks are:

ARP spoofing: relies on the transmission of manipulated ARP (Address Resolution Protocol) messages to associate the MAC address of the malicious device with the IP address of some other legitimate device in the network. This enables attackers to intercept, modify, or block network traffic [ 85 ];
DNS spoofing: relies on the alteration of DNS entries in a DNS server’s cache, redirecting users to manipulated or malicious websites. This enables attackers to steal sensitive information, spread malware, and perform other malicious actions [ 86 ].
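As a rough illustration of the ARP spoofing mechanism, the sketch below flags IP addresses that are claimed by more than one MAC address in observed ARP replies. The `(ip, mac)` input format and the addresses are hypothetical, and legitimate events such as device replacement can also trigger this check.

```python
from collections import defaultdict


def conflicting_arp_bindings(replies):
    """Map each IP address to its claimed MACs when more than one is seen.

    replies: iterable of (ip, mac) pairs taken from ARP replies
    (hypothetical format).
    """
    macs = defaultdict(set)
    for ip, mac in replies:
        macs[ip].add(mac.lower())
    return {ip: sorted(seen) for ip, seen in macs.items() if len(seen) > 1}


replies = [
    ("192.168.1.1", "AA:AA:AA:AA:AA:01"),   # legitimate gateway
    ("192.168.1.20", "AA:AA:AA:AA:AA:20"),
    ("192.168.1.1", "BB:BB:BB:BB:BB:66"),   # second device claims the gateway IP
]
print(conflicting_arp_bindings(replies))
```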

3.3.6. Brute-Force Threats

Brute-force attacks consist of the submission of data (e.g., passwords or passphrases) to eventually gain access to systems [ 87 ]. Among the several procedures that can be executed, a dictionary brute-force attack is a type of attack that attempts to guess a password or passphrase by repeatedly trying words from a pre-defined list of words obtained from various sources. The goal of the attack is to find the correct password by trying all the words in the dictionary [ 88 ].

3.3.7. Mirai as an IoT Threat

The Mirai attack is a large-scale DDoS that can target IoT devices. In this paper, we conduct different variations of Mirai attacks using five different Raspberry Pi devices, as illustrated in Figure 5 , alongside the connections considered in the different IoT network layers. In order to connect to the Internet, a gateway uses a Windows 10 instance to provide and monitor Internet access. This access is possible through a Netgear unmanaged switch that connects attackers and general IoT devices. Several tools are used to perform the attacks and a special Mirai configuration is also adopted. An online IoT supervisor coordinates the operation of the multiple IoT devices in the topology (e.g., sensors, cameras, and smart speakers). Finally, some other works do not consider Mirai in their attack set. We focus on several attacks that can be executed against IoT devices, and we consider the analysis and execution of new IoT attacks in the future directions of this research (e.g., attacks using future protocols).

Figure 5. Basic attack framework for the dataset.

This attack infects devices to form a botnet that can flood targeted victims. This threat can cause disruption in different contexts and some of its most popular variations are:

GREIP: Within the GRE packet, this attack floods the target system with encapsulated packets. The internal data comprise random IPs and ports, whereas the external layer contains actual IPs [ 89 ];
GREETH: This attack follows a procedure similar to GREIP. However, the packet encapsulation approach is based on the Ethernet header [ 89 ];
UDP Plain: This threat focuses on flooding targeted victim systems with UDP packets considering a repeated packet segment. However, the payload sent is different for each packet [ 89 ].

4. Feature Extraction and Data Description

The CICIoT2023 dataset is available in two different file formats: pcap and csv. Pcap files comprise the original data generated and collected in the CIC IoT network in different scenarios. These files contain all packets sent and can be used to extract and engineer other features. Furthermore, csv files present a simpler way of loading and using the data. Those files are composed of features extracted from the original pcap files summarized by a fixed-size packet window. In other words, the features are extracted from a sequence of packets carrying information between two hosts.

The method adopted to produce the dataset is illustrated in Figure 6 . Firstly, the data are generated (i.e., captured), extracted, and labeled. This refers to the initial step, in which the actual attacks are executed against IoT devices. Then, the data are processed in a way to enable researchers to access the data generated easily. Finally, we conduct a machine learning (ML) evaluation to show how classification capabilities can be leveraged by the proposed dataset.

Figure 6. Method adopted to produce the dataset.

Figure 7 illustrates how the data generation, extraction, and labeling are conducted for each attack scenario (and benign scenario). The first phase relies on the use of different tools presented in Table 2 to execute attacks against IoT devices in the network. After that, the network traffic is captured in pcap format using Wireshark. Finally, for each attack executed, the entire traffic captured is labeled as belonging to that particular attack.


Regarding the data processing step, illustrated in Figure 8 , the network traffic data composed of captures of all attacks alongside benign traffic are used. As it represents about 548 GB worth of traffic data, we split it into smaller chunks of 10 MB to perform the conversion in parallel. This process is conducted using TCPDUMP [ 90 ]. After that, a parallel procedure is executed to extract several features using the DPKT package [ 91 ] and store them in separate csv files. These features are described in Table 4 . In this process, DPKT is used to enable a flexible feature extraction procedure considering important attributes of the IoT operation highlighted in previous works. Alternatively, other tools can also be used to extract features, e.g., CICFlowMeter [ 92 ] and Nfstream [ 93 ]. In this stage, we also perform data cleaning by removing incomplete packets (i.e., packets that present null features). In our experiments, we only remove the timestamp from the feature list since it does not illustrate the network behavior; instead, it is used for sorting. All other features are directly used to evaluate how different ML models perform in such circumstances.
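The cleaning rules in this paragraph can be sketched as follows: rows with any null feature are dropped, the remaining rows are ordered by timestamp, and the timestamp is then removed from the feature set. This is a minimal stdlib sketch rather than the scripts used in this research; the column names `ts`, `rate`, and `ttl` are placeholders assumed for the example.

```python
def clean_rows(rows, timestamp_key="ts"):
    """Drop incomplete rows, sort by timestamp, then remove the timestamp.

    rows: list of dicts mapping feature name -> value, where None marks
    a null feature (hypothetical in-memory representation).
    """
    complete = [row for row in rows
                if all(value is not None for value in row.values())]
    complete.sort(key=lambda row: row[timestamp_key])
    return [{name: value for name, value in row.items() if name != timestamp_key}
            for row in complete]


rows = [
    {"ts": 2.0, "rate": 5.0, "ttl": 64},
    {"ts": 1.0, "rate": None, "ttl": 64},   # incomplete packet, removed
    {"ts": 0.5, "rate": 3.0, "ttl": 128},
]
print(clean_rows(rows))
```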

Figure 8. Data processing: converting pcap files to csv.

Table 4. Features extracted from the network traffic.

These features are extracted based on proposals present in the literature regarding IoT security [ 8 , 46 ]. In fact, although these features have been used and validated in other efforts, our main goal is to present a flexible approach to training ML models with multiple features. Thus, several other features can be extracted or engineered based on the scripts used in this research as well as the raw network traffic (i.e., pcap files).

With the extracted features, we group the values captured in windows of 10 packets (Backdoor Malware, Benign Traffic, Browser Hijacking, Command Injection, Dictionary brute force, DNS spoofing, MITM ARP spoofing, Host Discovery, OS Scan, Ping Sweep, Port Scan, SQL Injection, Uploading Attack, Vulnerability Scan, and XSS) and 100 packets (DDoS ACK Fragmentation, DDoS HTTP Flood, DDoS ICMP Flood, DDoS ICMP Fragmentation, DDoS PSHACK Flood, DDoS RSTFIN Flood, DDoS SlowLoris, DDoS SYN Flood, DDoS SynonymousIP Flood, DDoS TCP Flood, DDoS UDP Flood, DDoS UDP Fragmentation, DoS HTTP Flood, DoS SYN Flood, DoS TCP Flood, DoS UDP Flood, Mirai GREIP Flood, Mirai Greeth Flood, and Mirai UDPPlain) to mitigate data size discrepancy (e.g., DDoS and CommandInjection) and calculate their mean values using Pandas [ 94 ] and Numpy [ 95 ]. Finally, we combine all subfiles into a processed csv dataset using Pandas. The resulting csv datasets thus represent the combination of features of each data chunk.
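The windowing step can be sketched in plain Python as below: consecutive packets are grouped into fixed-size windows and each feature is replaced by its mean over the window. The research itself uses Pandas and Numpy; this stdlib version, and the choice to discard a trailing incomplete window, are assumptions made for illustration.

```python
from statistics import fmean


def window_means(rows, window):
    """Average each feature over consecutive, non-overlapping windows.

    rows: list of equal-length numeric feature vectors for one scenario.
    A trailing window with fewer than `window` rows is discarded
    (an assumption; the source does not say how it is handled).
    """
    summaries = []
    for start in range(0, len(rows) - window + 1, window):
        block = rows[start:start + window]
        summaries.append([fmean(column) for column in zip(*block)])
    return summaries


# 25 packets with two features; a window of 10 yields two summary rows.
rows = [[float(i), float(2 * i)] for i in range(25)]
print(window_means(rows, window=10))
```

With this scheme a scenario of N packets contributes roughly N / 10 or N / 100 rows, which is how the larger window reduces the size gap between high-volume floods and low-volume attacks.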

Moreover, each attack conducted in this research presents different characteristics. For example, the network traffic generated by a DDoS attack tends to be larger than the network traffic generated by a spoofing attack. These differences can also be observed in other features of the dataset. Table 4 lists all features provided in the dataset, while Table 5 presents the characteristics of these features. For each feature in the entire dataset, we present the mean, standard deviation (std), minimum (min), 25th percentile (25%), median (50%), 75th percentile (75%), and maximum (max) values.

Table 5. Dataset description.

5. Machine Learning (ML) Evaluation

In order to demonstrate how the CICIoT2023 dataset can be used to train machine learning (ML)-based attack detection and classification methods, Figure 9 illustrates the ML evaluation pipeline adopted in this research. Firstly, we combine all datasets produced following the procedure presented in Figure 8 . In this sense, malicious and benign traffic are combined and shuffled into a single dataset (i.e., blended dataset) using PySpark [ 96 ]. Once the data are integrated, we evaluate ML performance from three different perspectives: (i) multiclass classification, focussing on classifying 33 individual attacks; (ii) grouped classification, considering 7 attack groups (e.g., DDoS and DoS); and (iii) binary classification (i.e., malicious and benign traffic classification). In each case, the dataset is divided into training (80%) and test (20%) sets, which are normalized using the StandardScaler method [ 97 ] before the actual training process. Finally, the results obtained are summarized as integrated results.
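The split and normalization steps can be sketched without external libraries as follows. The standardization mirrors what StandardScaler computes (zero mean and unit population standard deviation per feature). Fitting the statistics on the training set only is a common practice assumed here; the source does not state how the scaler was fitted, and the seed and toy data are hypothetical.

```python
import random
from statistics import fmean, pstdev


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


def fit_scaler(train):
    """Per-feature mean and population standard deviation."""
    columns = list(zip(*train))
    means = [fmean(column) for column in columns]
    stds = [pstdev(column) or 1.0 for column in columns]  # constant column: divide by 1
    return means, stds


def transform(rows, means, stds):
    """Standardize every feature: (value - mean) / std."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


data = [[float(i), float(i * i)] for i in range(10)]
train, test = train_test_split(data)
means, stds = fit_scaler(train)
train_scaled = transform(train, means, stds)
test_scaled = transform(test, means, stds)   # reuse the training statistics
print(len(train), len(test))
```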

Figure 9. Machine learning (ML) evaluation pipeline adopted in this research.

5.1. Metrics

The evaluation of different ML models and configurations is conducted based on evaluation metrics. Given that TP represents the True Positives, TN the True Negatives, FP the False Positives, and FN the False Negatives, the metrics used in this research are [ 98 ]:

Accuracy: evaluates classification models by depicting the proportion of correct predictions in a given dataset: $\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$ (1)
Recall: the ratio of correctly identified labels to the total number of occurrences of that particular label: $\mathrm{Rec} = \frac{TP}{TP + FN}$ (2)
Precision: the ratio of correctly identified labels to the total number of positive classifications: $\mathrm{Pre} = \frac{TP}{TP + FP}$ (3)
F1-Score: harmonic mean of precision and recall: $F_1 = 2 \times \frac{\mathrm{Pre} \times \mathrm{Rec}}{\mathrm{Pre} + \mathrm{Rec}}$ (4)
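A direct transcription of Equations (1) to (4) is sketched below. The confusion counts in the example are hypothetical and chosen to show how accuracy can remain high while the F1-score of a rare class stays low; the functions do not guard against zero denominators.

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)      # Equation (1)


def recall(tp, fn):
    return tp / (tp + fn)                       # Equation (2)


def precision(tp, fp):
    return tp / (tp + fp)                       # Equation (3)


def f1_score(tp, fp, fn):
    pre, rec = precision(tp, fp), recall(tp, fn)
    return 2 * pre * rec / (pre + rec)          # Equation (4)


# Hypothetical counts for a rare positive class (20 positives in 1000 samples).
tp, tn, fp, fn = 10, 975, 5, 10
print(f"Acc={accuracy(tp, tn, fp, fn):.3f}  "
      f"Rec={recall(tp, fn):.3f}  "
      f"Pre={precision(tp, fp):.3f}  "
      f"F1={f1_score(tp, fp, fn):.3f}")
```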

5.2. Evaluation

In the evaluation process, we adopted five ML methods that have been successfully used in different applications, including cybersecurity: Logistic Regression [ 99 ], Perceptron [ 100 ], Adaboost [ 101 , 102 , 103 ], Random Forest [ 104 ], and Deep Neural Network [ 105 ]. Figure 10 illustrates the performance of all methods when framing the classification problem as binary (i.e., malicious and benign), multiclass with 8 classes (i.e., benign and attack categories), and multiclass with 34 classes (i.e., benign and all individual attacks). These results are also depicted in Table 6 .
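The three framings (34, 8, and 2 classes) differ only in how labels are mapped. The sketch below derives the 8-class and binary labels from the individual attack names, following the categories described in Section 3.3. The label strings are taken from the text of this paper and are an assumption; the strings stored in the published csv files may be spelled differently.

```python
# Category of each attack that is not identified by a name prefix.
CATEGORY_BY_ATTACK = {
    "Host Discovery": "Recon", "OS Scan": "Recon", "Ping Sweep": "Recon",
    "Port Scan": "Recon", "Vulnerability Scan": "Recon",
    "SQL Injection": "Web-based", "Command Injection": "Web-based",
    "Backdoor Malware": "Web-based", "Uploading Attack": "Web-based",
    "XSS": "Web-based", "Browser Hijacking": "Web-based",
    "DNS spoofing": "Spoofing", "MITM ARP spoofing": "Spoofing",
    "Dictionary brute force": "Brute force",
}


def to_category(label):
    """Map an individual label (34 classes) to its group (8 classes)."""
    if label == "Benign Traffic":
        return "Benign"
    for prefix in ("DDoS", "DoS", "Mirai"):
        if label.startswith(prefix + " "):
            return prefix
    return CATEGORY_BY_ATTACK[label]


def to_binary(label):
    """Map an individual label to the binary framing (2 classes)."""
    return "Benign" if label == "Benign Traffic" else "Malicious"


for name in ("DDoS SYN Flood", "DoS UDP Flood", "Mirai GREIP Flood", "XSS", "Benign Traffic"):
    print(name, "->", to_category(name), "/", to_binary(name))
```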

Figure 10. Results obtained in the classification process conducted using different machine learning models.

Table 6. Results obtained in the classification process conducted using different machine learning models (illustrated in Figure 10 ).

For the binary classification, the results show that all methods present high performance: every method reaches an accuracy above 98%, whereas the F1-score highlights the differences among the approaches. For example, Perceptron achieves an F1-score of 81%, showing that it suffers since the minority class (i.e., benign) is misclassified more often. In the classification of attack groups (i.e., eight classes), the overall performance is degraded since the classification task becomes more challenging. The Logistic Regression, Perceptron, and Adaboost methods show a significant decrease in accuracy. This impact is even more perceptible for F1-score. However, both Random Forest and Deep Neural Network are able to maintain high accuracy and F1-score. These methods also present a decrease in performance but are capable of achieving F1-scores of 70%.

Finally, the most challenging classification task is represented by a multiclass classification of individual attacks (i.e., 34 classes). In this scenario, both Random Forest and Deep Neural Network could maintain high accuracy with very similar results. The same applies to F1-score since a slight reduction was perceived (around 1%) compared to the eight-class challenge. Furthermore, this case study shows that the Logistic Regression, Perceptron, and Adaboost methods are not able to categorize attacks as efficiently, given that the average accuracy is below 80% and F1-score is less than 50% in all cases.

These results show how ML methods can be used to classify attacks against IoT operations. This is a starting point that can be considered in any ML-based cybersecurity solution for IoT operations. This effort not only highlights that the use of other ML methods is possible (e.g., optimized methods), but also enables the adoption of similar strategies to solve IoT-specific problems. Finally, although we are focussing on 33 different attacks, future directions could also be tailored to address issues related to individual attacks or categories.

5.3. Discussion

To illustrate how these models are performing for each class, Table 7 and Table 8 show the confusion matrix for Random Forest and Deep Neural Networks in the case of multiclass classification (eight classes).

Table 7. Confusion matrix for Deep Neural Network in the case of multiclass classification (8 classes).

Table 8. Confusion matrix for Random Forest in the case of multiclass classification (8 classes).

In both cases, it is possible to observe that some classes are very well classified, mainly those with a large number of occurrences in the dataset. For example, the misclassification rates for DDoS, DoS, and Mirai are very small, followed by Recon and spoofing.

However, these models face challenges in classifying other attacks. For example, web-based attacks are usually classified as benign, Recon, or spoofing. The same occurs in the brute force classification. Although the similarities in the data patterns lead the models to make these mistakes, the classification is successful in most cases, leading to the results depicted in Figure 10 . The results show that the multiclass classification performance degrades for three classes (Benign, Recon, and spoofing). The underlying traffic for those scenarios can be similar, and we intend to explore this phenomenon further in future work.
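One way to read a confusion matrix class by class is to compute the recall of every row, as in the sketch below. The three-class matrix is hypothetical and is not taken from Table 7 or Table 8; rows are assumed to hold the true class and columns the predicted class.

```python
def per_class_recall(matrix, labels):
    """Recall of each class: diagonal entry divided by its row total.

    matrix[i][j] counts samples of true class i predicted as class j
    (assumed orientation).
    """
    result = {}
    for index, (label, row) in enumerate(zip(labels, matrix)):
        total = sum(row)
        result[label] = row[index] / total if total else 0.0
    return result


labels = ["DDoS", "Benign", "Web-based"]
matrix = [
    [990, 8, 2],    # hypothetical: DDoS is rarely confused
    [5, 180, 15],   # hypothetical: Benign is sometimes confused
    [0, 30, 20],    # hypothetical: Web-based is often predicted as Benign
]
print(per_class_recall(matrix, labels))
```

Because each recall is normalized by its own row, a rare class with many errors stands out even when overall accuracy is dominated by the large classes.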

Finally, Table 9 and Table 10 compare all datasets reviewed with the proposed CICIoT2023 dataset. These tables focus on presenting an analysis of attacks executed in this research as well as its main contributions, i.e., these datasets may include attacks other than those shown in these tables.

Table 9. Comparison of CICIoT2023 with existing IoT security datasets.

Table 10. Comparison of CICIoT2023 contributions with existing IoT security datasets.

6. Conclusions

Nowadays, IoT is becoming increasingly important for society. In this context, the development of security solutions is pivotal to enabling efficient, secure, and dependable IoT operations. This research introduced a novel and extensive IoT attack dataset to foster the development of security analytics applications in real IoT operations. In this process, 33 attacks are executed in an IoT topology composed of 105 devices. These attacks are classified into seven categories (i.e., DDoS, DoS, Recon, Web-based, brute force, spoofing, and Mirai) and all attacks are executed by malicious IoT devices targeting other IoT devices. Furthermore, this dataset includes multiple attacks not available in other IoT datasets and enables IoT professionals to develop new security analytics solutions using data in different formats. The dataset is available through the CIC Dataset website ( https://www.unb.ca/cic/datasets/index.html , accessed on 19 June 2023).

Compared to the state-of-the-art publications, the CICIoT2023 dataset extends existing IoT security insights by using an extensive topology with a variety of IoT devices, executing several attacks never present in a single IoT security dataset, and analyzing how widely-used machine learning (ML) methods perform in different classification scenarios.

Finally, this work enables the development of several future works, e.g., the optimization of ML models, the analysis of features and how they influence different ML models, the interpretation of classifications, and the analysis of transferability based on the comparison to other datasets.

Acknowledgments

The authors graciously acknowledge the support from the Canadian Institute for Cybersecurity (CIC), the funding support from the Canada Research Chair, and Mastercard.

Funding Statement

This research received no external funding.

Author Contributions

Conceptualization, E.C.P.N., S.D., R.F., A.Z., R.L. and A.A.G.; methodology, E.C.P.N., S.D., R.F., A.Z., R.L. and A.A.G.; software, E.C.P.N., S.D., R.F. and A.Z.; validation, E.C.P.N., S.D., R.F., A.Z., R.L. and A.A.G.; formal analysis, E.C.P.N., S.D., R.F. and A.Z.; investigation, E.C.P.N., S.D., R.F., A.Z., R.L. and A.A.G.; resources, E.C.P.N., S.D., R.F. and A.Z.; data curation, E.C.P.N., S.D., R.F. and A.Z.; writing—original draft preparation, E.C.P.N., S.D., R.F. and A.Z.; writing—review and editing, E.C.P.N., S.D., R.F., A.Z., R.L. and A.A.G.; visualization, E.C.P.N., S.D., R.F. and A.Z.; supervision, R.L. and A.A.G.; project administration, S.D., R.L. and A.A.G.; funding acquisition, R.L. and A.A.G. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Data availability statement, conflicts of interest.

The authors declare no conflict of interest.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Research on Network Attack Detection Technology based on Reverse Detection and Protocol Analysis


Network Threats: A Step-by-Step Attack Demonstration

Follow this real-life network attack simulation, covering 6 steps from Initial Access to Data Exfiltration. See how attackers remain undetected with the simplest tools and why you need multiple choke points in your defense strategy.

Surprisingly, most network attacks are not exceptionally sophisticated, technologically advanced, or reliant on zero-day tools that exploit edge-case vulnerabilities. Instead, they often use commonly available tools and exploit multiple vulnerability points. By simulating a real-world network attack, security teams can test their detection systems, ensure they have multiple choke points in place, and demonstrate the value of network security to leadership.

In this article, we demonstrate a real-life attack that could easily occur in many systems. The attack simulation was developed based on the MITRE ATT&CK framework, Atomic Red Team, Cato Networks' experience in the field, and public threat intel. In the end, we explain why a holistic security approach is key for network security.

The Importance of Simulating a Real-life Network Attack

There are three advantages to simulating a real attack on your network:

You can test your detections and make sure they identify and thwart attacks. This is important for dealing with run-of-the-mill attacks, which are the most common types of attacks.
Real attacks help you demonstrate that defense relies on multiple choke points. An attack is almost never the result of a single point of failure, and therefore, a single detection mechanism isn't enough.
Real attacks help you demonstrate the importance of network monitoring to your leadership. They show how real visibility into the network provides insights into breaches, allowing for effective mitigation, remediation, and incident response.

The Attack Flow

The attack flow demonstrated below is based on six steps:

Initial Access
Ingress Tool Transfer
Discovery
Credential Dumping
Lateral Movement and Persistence
Data Exfiltration

These steps were chosen since they exemplify common techniques that are ubiquitous in attacks.

Now, let's dive into each step.

1. Initial Access

The attack begins with spear-phishing, which establishes initial entry into the network: for example, an email sent to an employee with a lucrative job offer and an attached file. Behind the scenes, the malicious attachment runs a macro and exploits a remote code execution vulnerability in Microsoft Office using Hoaxshell, an open-source reverse shell.

According to Dolev Attiya, Staff Security Engineer for Threats at Cato Networks, "A defense-in-depth strategy could have been useful as early as this initial access vector. The phishing email and the Hoaxshell could have been caught through an antivirus engine scanning the email gateway, an antivirus on the endpoint or through visibility into the network and catching command and control of the network artifact generated by the malicious document. Multiple controls increase the chance of catching the attack."

2. Ingress Tool Transfer

Once access is gained, the attacker transfers various tools into the system to assist with further stages of the attack. These include PowerShell, Mimikatz, PSX, WMI, and additional tools that live off the land.

Attiya adds, "Many of these tools are already inside the Microsoft Windows framework. Usually, they are used by admins to control the system, but attackers can use them as well for similar, albeit malicious, purposes."

3. Discovery

Now, the attacker explores the network to identify valuable resources, like services, systems, workstations, domain controllers, ports, additional credentials, active IPs, and more.

According to Attiya, "Think of this step as if the attacker is a tourist visiting a large city for the first time. They are asking people how to get to places, looking up buildings, checking street signs, and learning to orient themselves. This is what the attacker is doing."

4. Credential Dumping

Once valuable resources are identified, the previously added tools are used to extract the credentials of multiple users on compromised systems. This helps the attacker prepare for lateral movement.

5. Lateral Movement and Persistence

With the credentials, the attacker moves laterally across the network, accessing other systems. The attacker's goal is to expand their foothold by getting to as many users and devices as possible and with as high privileges as possible. This enables them to hunt for sensitive files they can exfiltrate. If the attacker obtains the administrator's credentials, for example, they can obtain access to large parts of the network. In many cases, the attacker might proceed slowly and schedule tasks for a later period of time to avoid being detected. This allows attackers to advance in the network for months without causing suspicion and being identified.

Etay Maor, Sr. Director of Security Strategy, says "I can't emphasize enough how common Mimikatz is. It's extremely effective for extracting passwords, and breaking them is easy and can take mere seconds. Everyone uses Mimikatz, even nation-state actors."

6. Data Exfiltration

Finally, valuable data is identified. It can be extracted from the network to a file-sharing system in the cloud, encrypted for ransomware, and more.

How to Protect Against Network Attacks

Effectively protecting against attackers requires multiple layers of detection. Each layer of security in the kill chain must be strategically managed and holistically orchestrated to prevent attackers from successfully executing their plans. This approach helps anticipate every possible move of an attacker for a stronger security posture.
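The claim that multiple controls increase the chance of catching an attack can be illustrated with simple arithmetic. The sketch below assumes that each layer detects the attack independently of the others, which is a simplification, and the per-layer detection rates are hypothetical numbers chosen only for illustration.

```python
from math import prod


def chance_of_detection(layer_rates):
    """Probability that at least one layer flags the attack.

    Assumes the layers detect independently of each other (a simplification).
    """
    return 1.0 - prod(1.0 - rate for rate in layer_rates)


# Hypothetical detection rates for three choke points:
# email gateway scanning, endpoint antivirus, network monitoring.
layers = [0.6, 0.5, 0.7]

print(f"best single layer:   {max(layers):.2f}")
print(f"all layers combined: {chance_of_detection(layers):.2f}")
```

Under these assumed numbers, the combined chance of detection is higher than that of any single layer, which is the point of a defense-in-depth strategy.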




Title: Byzantine Attacks Exploiting Penalties in Ethereum PoS

Abstract: In May 2023, the Ethereum blockchain experienced its first inactivity leak, a mechanism designed to reinstate chain finalization amid persistent network disruptions. This mechanism aims to reduce the voting power of validators who are unreachable within the network, reallocating this power to active validators. This paper investigates the implications of the inactivity leak on safety within the Ethereum blockchain. Our theoretical analysis reveals scenarios where actions by Byzantine validators expedite the finalization of two conflicting branches, and instances where Byzantine validators reach a voting power exceeding the critical safety threshold of one-third. Additionally, we revisit the probabilistic bouncing attack, illustrating how the inactivity leak can result in a probabilistic breach of safety, potentially allowing Byzantine validators to exceed the one-third safety threshold. Our findings uncover how penalizing inactive nodes can compromise blockchain properties, particularly in the presence of Byzantine validators capable of coordinating actions.
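The one-third threshold mentioned in the abstract can be made concrete with a small worked example. This is only an illustrative sketch and not the paper's model: it assumes that Byzantine validators stay active and keep their full stake, and it treats the fraction of stake that inactive honest validators still hold as a free parameter (`remaining_fraction`, a hypothetical name) rather than deriving it from the protocol's penalty schedule.

```python
def byzantine_share(byzantine, honest_active, honest_inactive, remaining_fraction):
    """Byzantine share of total stake after inactive honest stake has shrunk.

    remaining_fraction is the portion of the inactive honest stake that is left
    (1.0 = no leak yet, 0.0 = fully drained). Hypothetical simplification.
    """
    total = byzantine + honest_active + honest_inactive * remaining_fraction
    return byzantine / total


# Hypothetical starting stakes: 30% Byzantine, 40% honest active, 30% honest inactive.
for remaining in (1.0, 0.5, 0.0):
    share = byzantine_share(0.30, 0.40, 0.30, remaining)
    print(f"remaining={remaining:.1f} share={share:.3f} above one-third={share > 1 / 3}")
```

In this toy setting, a Byzantine group that starts below one-third of the stake ends up above it once enough inactive honest stake has been drained.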



Published on 25.4.2024 in Vol 26 (2024)

Digital Therapeutic (Mika) Targeting Distress in Patients With Cancer: Results From a Nationwide Waitlist Randomized Controlled Trial


Original Paper

Franziska Springer 1 * , MSc ;
Ayline Maier 2 * , PhD ;
Michael Friedrich 1 , PhD ;
Jan Simon Raue 2 , PhD ;
Gandolf Finke 2 , PhD ;
Florian Lordick 3, 4 , Prof Dr ;
Guy Montgomery 5 , Prof Dr ;
Peter Esser 1 , PhD ;
Hannah Brock 1 , MSc ;
Anja Mehnert-Theuerkauf 1 , Prof Dr

1 Department of Medical Psychology and Medical Sociology, Comprehensive Cancer Center Central Germany, University Medical Center Leipzig, Leipzig, Germany

2 Fosanis GmbH, Berlin, Germany

3 Department of Medicine II, University Medical Center Leipzig, Leipzig, Germany

4 University Cancer Center Leipzig, Comprehensive Cancer Center Central Germany, Leipzig, Germany

5 Center for Behavioral Oncology, Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, United States

*these authors contributed equally

Corresponding Author:

Anja Mehnert-Theuerkauf, Prof Dr

Department of Medical Psychology and Medical Sociology, Comprehensive Cancer Center Central Germany, University Medical Center Leipzig

Philipp-Rosenthal-Str. 55, Haus W

Leipzig, 04103

Phone: 49 341 97 18800

Email: [email protected]

Background: Distress is highly prevalent among patients with cancer, but supportive care needs often go unmet. Digital therapeutics hold the potential to overcome barriers in cancer care and improve health outcomes.

Objective: This study conducted a randomized controlled trial to investigate the efficacy of Mika, an app-based digital therapeutic designed to reduce distress across the cancer trajectory.

Methods: This nationwide waitlist randomized controlled trial in Germany enrolled patients with cancer across all tumor entities diagnosed within the last 5 years. Participants were randomized into the intervention (Mika plus usual care) and control (usual care alone) groups. The participants completed web-based assessments at baseline and at 2, 6, and 12 weeks. The primary outcome was the change in distress from baseline to week 12, as measured by the National Comprehensive Cancer Network Distress Thermometer. Secondary outcomes included depression, anxiety (Hospital Anxiety and Depression Scale), fatigue (Functional Assessment of Chronic Illness Therapy-Fatigue), and quality of life (Clinical Global Impression-Improvement Scale). Intention-to-treat and per-protocol analyses were performed. Analyses of covariance were used to test for outcome changes over time between the groups, controlling for baseline.

Results: A total of 218 patients (intervention: n=99 and control: n=119) were included in the intention-to-treat analysis. Compared with the control group, the intervention group reported greater reductions in distress (P=.03; ηp²=0.02), depression (P<.001; ηp²=0.07), anxiety (P=.03; ηp²=0.02), and fatigue (P=.04; ηp²=0.02). Per-protocol analyses revealed more pronounced treatment effects, with the exception of fatigue. No group difference was found for quality of life.

Conclusions: Mika effectively diminished distress in patients with cancer. As a digital therapeutic solution, Mika offers accessible, tailored psychosocial and self-management support to address the unmet needs in cancer care.

Trial Registration: German Clinical Trials Register (DRKS) DRKS00026038; https://drks.de/search/en/trial/DRKS00026038

Introduction

In addition to somatic symptoms such as pain [ 1 ], patients with cancer report elevated levels of distress, anxiety, and depression [ 2 , 3 ]. Epidemiological data show that the prevalence of clinically substantial psychological distress typically ranges from 30% to 60% among patients with cancer [ 2 , 4 ]. Psychological distress can persist long after the end of treatment and is associated with reduced quality of life (QoL), lower cancer treatment adherence, and lower survival rates [ 5 ].

Supportive care interventions to prevent and manage the adverse psychological and physical effects of cancer across the cancer trajectory effectively improve outcomes such as emotional distress, QoL, and fatigue [ 6 ]. Optimal supportive care is holistic and patient centered, that is, based on the needs of each individual patient [ 7 ]. However, access to supportive care is often limited by a lack of specialist staff, organizational deficiencies, and barriers that cause patients to avoid or delay their treatment [ 8 - 10 ]. Thus, emerging or persistent supportive care needs across the cancer trajectory often go unmet, with detrimental psychosocial and emotional impacts on patients with cancer [ 11 ]. Moreover, the number of patients living with cancer has increased rapidly in recent years [ 12 ] due to improved early detection, diagnosis, and oncological treatments, posing a growing challenge to health systems worldwide to ensure adequate and long-term care for all patients with cancer [ 13 ].

The increasing use of digital health has ushered in a new era of patient-centered cancer care due to its potential for cancer care delivery [ 14 ]. Digital health interventions provide multiple benefits: they facilitate easy and low-threshold access to care, can overcome barriers to care (eg, location, time, and health status), may enhance symptom management through real-time symptom assessment, are scalable, and provide cost-effective and efficient information sharing [ 14 ]. Growing literature suggests that digital therapeutics, a subset of digital health interventions providing evidence-based treatments driven by software, play a useful role in addressing the unmet needs of patients with cancer [ 15 ]. For instance, various mobile apps have proven to be effective in catering to specific needs of patients with cancer, such as pain, anxiety, or QoL, by using different types of interventions, such as psychoeducation, physical exercises, or coping skills training (eg, [ 16 - 19 ]). Moreover, large analyses such as systematic reviews and meta-analyses evaluating the efficacy of app-based interventions for patients with cancer show positive effects on patient-relevant outcomes, such as distress, QoL, anxiety, depression, pain, and fatigue [ 20 - 23 ].

Existing app-based supportive care interventions provide various intervention modules, such as symptom monitoring, psychoeducation, mindfulness exercises, physical exercises, and cognitive behavioral therapy (CBT) techniques [ 24 ]. However, most of these apps are limited in their scope, targeting only specific symptoms (eg, fatigue) [ 25 ] and health behaviors (eg, physical activity) [ 26 ], or provide only a single function (eg, mindfulness training or symptom tracking) [ 27 - 29 ]. Furthermore, some of these apps were originally developed for non–oncology patient populations and have only been slightly adapted for patients with cancer [ 30 ]. Only a few apps offer a broader range of intervention modules [ 25 , 31 ], but they target specific subgroups of patients with cancer (eg, patients with 1 tumor entity or with specific symptoms).

Despite the evident need, there is yet no digital therapeutic that comprehensively addresses the problems faced by all patients with cancer and simultaneously offers tailored support for each individual patient. Therefore, we investigated the efficacy of Mika (developed by Fosanis GmbH), an app-based digital therapeutic that addresses all patients with cancer transdiagnostically and provides a holistic supportive care intervention. The app incorporates evidence-based supportive care elements, such as distress and symptom monitoring [ 32 ], CBT-based coping skills training [ 33 ], mindfulness-based stress reduction (MBSR) [ 34 , 35 ], strength and flexibility training [ 36 ], and patient education [ 37 ], thus targeting different aspects of psychological distress. An artificial intelligence algorithm individually tailors the content of the app to patients’ needs, considering cancer type, cancer treatment stage, and use behavior. A previously conducted pilot study of 70 patients with gynecological cancer indicated Mika’s feasibility and potential efficacy [ 38 ]. Considering the significant prevalence and impact of psychological distress among patients with cancer, this condition was selected as the primary end point of our study. This is underscored by the app’s integrated features for distress tracking and management alongside the widespread recommendation for distress screening in routine clinical care. Distress is recognized as a crucial clinical marker for assessing the efficacy of interventions across various tumor types and catering to the immediate and long-term supportive care needs of this patient group.

The primary aim of this waitlist randomized controlled trial (RCT) was to examine the efficacy of the Mika app for general distress in patients with cancer. The secondary aim was to assess the efficacy of the Mika app on anxiety, depression, fatigue, and QoL. We hypothesized that participants receiving access to the Mika app plus usual care (UC) for 12 weeks would report greater reductions in distress, anxiety, depression, and fatigue and greater improvements in QoL compared to participants receiving UC only.

Study Design

This nationwide unblinded 2-arm waitlist RCT evaluated the efficacy of the app-based digital therapeutic Mika in reducing distress in patients with cancer and was conducted fully decentralized in Germany, that is, participant recruitment, delivery of the study intervention, and outcome data collection were conducted without involving in-person contact between the study team and the participants. In this RCT, participants were assigned to either (1) access to the Mika app plus UC (intervention group [IG]), or (2) UC alone (control group [CG]). Participants were assessed at baseline (t0), 2 weeks (t1), 6 weeks (t2), and 12 weeks (t3) using self-report questionnaires. Once the participants in the CG completed the 12-week questionnaire, they also received access to the Mika app.

Ethical Considerations

The trial was approved by the Ethics Committee of the Medical Faculty of Leipzig University (404/21-ek) and was registered at the German Clinical Trials Register (DRKS00026038) in October 2021. All participants provided written informed consent prior to their participation in the study and retained the autonomy to withdraw from the study at any time. All personal data collected and used for this study underwent deidentification to safeguard the anonymity of participants. Monetary compensation was not provided to participants for their involvement in the study.

Participants

Textbox 1 shows the inclusion and exclusion criteria for this study. We only included patients who had been diagnosed with cancer or relapse within the last 5 years as they are likely to feel burdened by the physical and psychological effects of the disease and its treatment and therefore require supportive care. Epidemiological data indicate that supportive care needs typically decline in the years of long-term survivorship (cancer or relapse diagnosis ≥5 years ago) [ 39 ]. Participants were required to confirm their cancer diagnosis during the course of the study by submitting a letter from their treating physician. The study team enrolled patients after they had provided written informed consent, which had to be completed at home and submitted by email or mail.

Inclusion criteria

Age≥18 years
Cancer diagnosis or relapse diagnosis within the last 5 years (10th revision of the International Statistical Classification of Diseases and Related Health Problems: C00-C97)
Access to a smartphone or tablet
Ability to provide informed consent

Exclusion criteria

Insufficient German language skills
Inability to use a smartphone or tablet
Prior use of the investigated digital therapeutic

Random Assignment

Participants were randomly assigned (1:1) to either the IG or CG using permuted block randomization with blocks of 4 based on an a priori created randomization list. The allocation sequence was concealed from the study investigators until assignment. Due to the nature of the intervention, it was not feasible to blind participants or the study team to the group assignment.
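Permuted block randomization with blocks of 4 can be sketched as follows. This is a minimal illustration and not the study's actual procedure: the function name, the seed, and the use of Python's `random` module are assumptions. Each block contains the two arms equally often, so group sizes never differ by more than 2 at any point in the list.

```python
import random


def permuted_block_list(n_participants, block_size=4, arms=("IG", "CG"), seed=None):
    """Build an a priori allocation list from shuffled, balanced blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # seed is a hypothetical choice for reproducibility
    per_arm = block_size // len(arms)
    allocation = []
    while len(allocation) < n_participants:
        block = list(arms) * per_arm  # e.g. IG, CG, IG, CG
        rng.shuffle(block)            # random order within the block
        allocation.extend(block)
    return allocation[:n_participants]


allocation = permuted_block_list(248, seed=2021)
print(allocation[:8])
print(allocation.count("IG"), allocation.count("CG"))
```

Because 248 is a multiple of 4, this list assigns exactly 124 participants to each arm. Group sizes in an analysis population can still differ when participants are excluded after randomization.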

Recruitment and Procedure

Between September and November 2021, patients were recruited via social media advertising campaigns (Facebook and Instagram, Meta Inc) and informational emails to cancer support groups that directed patients to the trial website with a contact form for study registration. In addition, patients were recruited from a participant pool consisting of participants from previous independent studies at the University Medical Center Leipzig. Patients from the participant pool were approached directly by the study team via phone.

All interested patients were screened by phone to determine eligibility. To identify patients who were already users of the digital therapeutic, the study team asked participants about their use of digital support, but without referring to the publicly available digital therapeutic by name, to prevent CG patients from accessing the digital therapeutic before their enrollment in the study. Eligible patients received study information in the form of a video and text via email. Patients were informed that they were required to submit a physician’s letter confirming their cancer diagnosis via a secure cloud data-sharing service (TeamDrive, Crunchbase) during the course of their study participation.

After providing informed consent, the participants were randomized into the IG or CG and completed the baseline questionnaires. Participants were informed about their group assignment following a completed baseline assessment. IG participants received a study access code to activate the app after downloading it from the app stores for either Android or iOS smartphones, allowing free use. The questionnaire battery was administered electronically using LimeSurvey (LimeSurvey GmbH). All participants received email invitations and reminders at 2, 6, and 12 weeks to complete the questionnaire. This RCT focused on changes in outcomes from baseline (t0) to week 12 (t3). The 2 assessments in between (t1 and t2) were not part of the analysis; an analysis of the trajectory of the symptoms is planned for the future. Once the CG participants completed the 12-week questionnaire, they also received a study access code that could be used to activate the app. All the participants received information about the app’s content and technical application via a standardized telephone introduction to the app. All participants were contacted for an exploratively structured telephone interview after completing the 12-week questionnaire. During this interview, the use of psychotherapeutic support during study participation was assessed. Data collection ended in March 2022.

Data monitoring was performed via standardized phone calls following questionnaire completion of each participant across all measurement time points to ensure data validity. These phone calls served to ask participants to provide missing questionnaire data, to allow participants to clarify difficulties in understanding single questionnaire items, and to provide assistance with limited app functionality. Missing questionnaire data were entered directly into the database by the study team, with a study team member reading the unanswered questions and associated response options to participants verbatim, prompting them to select their response option.

Self-reported adverse reactions and side effects of the investigated digital therapeutic were assessed at each measurement time point as part of the web-based questionnaire battery.

Intervention

Mika is an app-based digital therapeutic that provides a personalized supportive intervention aiming to reduce distress associated with cancer and its medical treatment, thus improving patients’ QoL. Mika comprises 3 modules: Check-Up, Discover, and Journeys. The Check-Up module allows for the monitoring of distress and symptoms with electronic patient-reported outcomes that can be shared and discussed with the attending physician. The Discover module delivers coaching via articles and videos on cancer types and medical treatments, psychological well-being, physical activity, diet, and social and financial issues, which are based on scientific evidence and presented in a clear and understandable manner for patients. The Journeys module provides users with evidence-based, resource-activating training courses combining psychoeducation and exercises to help patients cope with the mental and physical effects of cancer, for example, coping with stress and fatigue, making decisions, or living with immunotherapy (for more details on the app modules, refer to Table 1 and Figure 1).

An artificial intelligence algorithm within the app customizes the content for each patient. This includes personalized recommendations based on cancer type, cancer treatment stage, and, crucially, the nature and severity of reported symptoms, ensuring personalized support for each individual. This customization process not only accounts for general patient information but also actively incorporates real-time symptom tracking data and user reading behavior using an attentional factorization machine that predicts a patient’s likelihood of engaging with specific content. This approach focuses on important feature interactions related to content consumption [ 40 ], ensuring that recommendations are dynamically adjusted as patients report changes in symptoms and interact with the content. In addition, the algorithm uses a Dirichlet loss function to estimate the uncertainty in predictions [ 41 ], allowing the content to be ranked and presented based on the estimated read probability. The model is updated monthly using historical data, with hyperparameter tuning evaluated by 7-fold time series cross-validation.

It is hypothesized that the digital therapeutic empowers patients with cancer by improving their health literacy and self-management along the cancer trajectory using evidence-based methods, such as symptom monitoring, patient education, MBSR, strength and flexibility training, acceptance and commitment therapy, and CBT-based coping skills training.

The Mika app was developed by Fosanis GmbH in collaboration with leading research institutions, such as the Charité University Hospital Berlin, University Hospital Leipzig, and the National Center for Tumor Diseases Heidelberg. All content of the app was carefully reviewed by experts (eg, oncologists, psychotherapists, nutritionists, and physiotherapists) before publication. The feasibility and preliminary efficacy of Mika were investigated in a previously conducted randomized pilot study involving 70 patients with gynecological cancer [ 38 ]. Mika is available for download free of charge in German and the United Kingdom app stores for Android and iOS smartphones.

IG participants could freely choose the modules to work on. Regular app use was recommended, and participants were instructed to use the app at least 3 times a week.


UC Condition

UC consisted of all health care that patients in Germany usually receive. There were no restrictions on health care use.

Outcome Assessment

Primary Outcome

The primary outcome was the change in psychological distress from baseline to 12 weeks, measured using the validated German version of the National Comprehensive Cancer Network Distress Thermometer [ 42 ]. The Distress Thermometer is a well-established single-item self-report measure that assesses the global level of distress on a Likert scale from 0 (no distress) to 10 (extreme distress). It shows excellent psychometric properties across various cancer populations worldwide and is recommended as a clinical tool for routine clinical care [ 43 ]. A score ≥5 indicates clinically significant levels of distress.

Secondary Outcomes

The secondary outcomes included changes in anxiety and depression symptoms, fatigue from baseline to 12 weeks, and QoL at 12 weeks. Anxiety and depression symptoms were measured using the Hospital Anxiety and Depression Scale [ 44 ], a 14-item self-report measure of anxiety and depression, with 7 items measuring each subscale. Scores for each subscale range from 0 to 21, with a higher score indicating higher levels of anxiety or depression and a cutoff score of ≥8 for each subscale. Fatigue was assessed using the Functional Assessment of Chronic Illness Therapy-Fatigue [ 45 ], a 13-item measure that assesses self-reported tiredness, weakness, and difficulty in performing usual activities due to fatigue. The Functional Assessment of Chronic Illness Therapy-Fatigue score ranges from 0 to 52, with higher scores representing less fatigue. Self-reported QoL was measured using an adapted version of the Clinical Global Impression-Improvement Scale [ 46 ], a single-item 7-point measure that assesses the overall improvement of a patient’s disease relative to a baseline state at the beginning of the intervention. In this trial, the Clinical Global Impression-Improvement Scale measured improvement in QoL relative to the beginning of the study, with a value of 4 indicating no change, <4 indicating improvement, and >4 indicating deterioration in QoL.
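The cutoffs described for the outcome measures can be summarized in a short scoring sketch. The function names are hypothetical, and the code covers only the thresholds stated in the text (Distress Thermometer ≥5, Hospital Anxiety and Depression Scale subscale ≥8, and the 7-point improvement rating centered on 4). It is not a validated scoring tool.

```python
def distress_is_clinical(thermometer_score):
    """Distress Thermometer: 0 (no distress) to 10 (extreme distress), cutoff >=5."""
    if not 0 <= thermometer_score <= 10:
        raise ValueError("Distress Thermometer score must be between 0 and 10")
    return thermometer_score >= 5


def hads_subscale(item_scores):
    """Sum one 7-item subscale (items scored 0-3, total 0-21) and apply the >=8 cutoff."""
    if len(item_scores) != 7 or any(not 0 <= s <= 3 for s in item_scores):
        raise ValueError("expected 7 item scores, each between 0 and 3")
    total = sum(item_scores)
    return total, total >= 8


def quality_of_life_change(rating):
    """Interpret the adapted 7-point improvement rating (4 = no change)."""
    if rating not in range(1, 8):
        raise ValueError("rating must be an integer from 1 to 7")
    if rating < 4:
        return "improvement"
    if rating == 4:
        return "no change"
    return "deterioration"


print(distress_is_clinical(6))
print(hads_subscale([1, 2, 1, 0, 2, 1, 1]))
print(quality_of_life_change(3))
```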

Intervention Safety

The safety of the digital therapeutic was assessed by the number and type of self-reported adverse reactions and side effects during the trial duration.

Intervention Adherence and Engagement

Adherence to the intervention was assessed by tracking app activities. IG participants were considered active once they activated the app using the study access code and consented to the Mika app’s privacy terms. Subsequently, their pseudonymized in-app activities were automatically recorded as log data. These log data facilitated the evaluation of intervention adherence, defined as the number of days with ≥1 app activity during each of the three 4-week periods (0-4, 5-8, and 9-12 weeks) within the 12-week intervention. Such an approach enabled us to capture the frequency and diversity of app engagement, thus embodying a comprehensive definition of adherence. In addition, engagement across the app’s 3 modules—Check-Up, Discover, and Journeys—was analyzed.
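The adherence definition, days with at least 1 app activity in each of the three 4-week periods, can be sketched from log data as follows. The function name and the log format are hypothetical, and the sketch assumes that each period is a 28-day window counted from the day the app was activated.

```python
from datetime import date, timedelta


def active_days_per_period(activation_day, activity_days, period_days=28, n_periods=3):
    """Count distinct days with >=1 app activity in each consecutive period."""
    counts = [0] * n_periods
    for day in set(activity_days):  # several activities on one day count once
        offset = (day - activation_day).days
        if 0 <= offset < period_days * n_periods:
            counts[offset // period_days] += 1
    return counts


start = date(2021, 10, 1)
log = [
    start, start,                  # two activities on the first day
    start + timedelta(days=3),
    start + timedelta(days=30),    # second period
    start + timedelta(days=60),    # third period
    start + timedelta(days=90),    # after the 12-week window, ignored
]
print(active_days_per_period(start, log))
```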

Statistical Analysis

Given an estimated dropout rate of 20% (50/250), a priori sample calculations showed that a sample of 2×125 (N=250) at baseline was needed to detect a change of 1 scale point (SD 2; α=.05; 1−β=.8) in the primary outcome.

Primary analyses were performed using the intention-to-treat (ITT) principle, which included all randomized participants with a confirmed cancer diagnosis by a physician’s letter. Analyses were also performed per-protocol (PP), which was restricted to participants who (1) completed the self-report questionnaire at all measurement time points, (2) did not receive psychotherapeutic support during study participation, (3) did not use the investigated digital therapeutic before receiving access during study participation, and (4) used the investigated digital therapeutic at least 1 time per period up to the 5- to 8-week period of the 12-week intervention period (only IG).

Analysis of covariance was used to examine changes in distress, depression, anxiety, and fatigue outcomes between the trial arms from baseline to 12 weeks, controlling for baseline scores. Exploratory regression analyses were conducted to investigate potential variables influencing the primary outcome. These analyses focused exclusively on sociodemographic and clinical factors that showed differences between the IG and CG in the initial group comparison. Partial eta-squared (ηp²) was reported as the effect size for all analyses of covariance, with effect sizes interpreted as small, medium, and large at ≥0.01, ≥0.06, and ≥0.14, respectively [ 47 ]. Differences in QoL between trial arms at follow-up (12 weeks) were analyzed with a 2-tailed 2-sample t test, using Hedges g as a measure of effect size (≥0.2=small effect, ≥0.5=medium effect, and ≥0.8=large effect [ 47 ]).
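The two effect sizes and their interpretation thresholds can be illustrated with a short sketch. It uses the standard definitions: partial eta-squared as the effect sum of squares divided by the sum of the effect and error sums of squares, and Hedges g as the pooled-SD standardized mean difference with the usual small-sample correction. The function names are hypothetical, and the analyses in the study were run in R, not with this code.

```python
from statistics import mean, variance


def partial_eta_squared(ss_effect, ss_error):
    """Share of variance attributable to the effect, excluding other effects."""
    return ss_effect / (ss_effect + ss_error)


def hedges_g(group_a, group_b):
    """Standardized mean difference with the approximate small-sample correction."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    cohens_d = (mean(group_a) - mean(group_b)) / pooled_var ** 0.5
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)
    return cohens_d * correction


def effect_label(value, small, medium, large):
    """Map an effect size to the verbal labels used in the text."""
    magnitude = abs(value)
    if magnitude >= large:
        return "large"
    if magnitude >= medium:
        return "medium"
    if magnitude >= small:
        return "small"
    return "negligible"


# Thresholds for partial eta-squared as stated in the text: 0.01, 0.06, 0.14.
print(effect_label(0.02, 0.01, 0.06, 0.14))
print(effect_label(0.07, 0.01, 0.06, 0.14))
```

With these thresholds, a partial eta-squared of 0.02 is labeled small and a value of 0.07 is labeled medium.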

Missing outcome data at random were imputed using the expectation-maximization algorithm. For dropouts, the last observation carried forward was used. For deceased participants, the worst possible values were assumed. Dropouts were participants who failed to complete the baseline or follow-up questionnaires or failed to provide a physician’s letter confirming their cancer diagnosis. A dropout analysis was performed to compare the variables of age, sex, and baseline distress between study noncompleters (dropouts) and study completers using chi-square and t tests. Furthermore, to model the robustness of the primary efficacy analysis under different assumptions for missing data mechanisms, an explorative sensitivity analysis using reference-based multiple imputation (jump-to-reference) [ 48 ] was performed in the extended ITT population (all randomized participants). For this purpose, monotone missing values were replaced using the jump-to-reference approach, whereas sporadic missing values were replaced under the assumption of missing at random. For jump-to-control and jump-to-reference imputation, 50 data sets were generated to minimize the loss of statistical power. The results were then aggregated across the imputed data sets [ 49 ].
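The last observation carried forward rule used for dropouts can be sketched in a few lines. This is a generic illustration with a hypothetical function name, where `None` marks a missing assessment. It does not cover the expectation-maximization or reference-based multiple imputation steps described above.

```python
def last_observation_carried_forward(scores):
    """Replace each missing value (None) with the most recent observed value."""
    filled = []
    last_seen = None
    for score in scores:
        if score is not None:
            last_seen = score
        filled.append(last_seen)  # stays None if nothing was observed yet
    return filled


# Hypothetical distress scores at t0, t1, t2, t3 for a participant who dropped out after t1.
print(last_observation_carried_forward([6, 5, None, None]))
```

A consequence of this rule is that a dropout is treated as unchanged after the last completed assessment.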

All statistical tests were 2-tailed, with a significance level of 5%. Analyses were performed using R (version 4.1.0; R Foundation for Statistical Computing) [ 50 ].

Study Sample

Over the 3-month recruitment period, 517 persons were screened for eligibility and 321 were determined eligible. Of the 321 participants, 248 (77.3%) gave informed consent and were randomly assigned to the IG and the CG ( Figure 2 ). Of the 248 participants, 37 (14.9%) were considered dropouts because they did not complete baseline or follow-up assessments (n=7), failed to confirm their cancer diagnosis by submission of a physician’s letter (n=7), or both (n=23). Age and sex of study dropouts and study completers did not differ (age: P =.89; sex: P =.23), but participants who dropped out showed higher distress levels at baseline compared to study completers ( P =.02). Participants without a verified cancer diagnosis (30/248, 12.1%) were excluded from the ITT analysis, resulting in an ITT population of 218 participants (n=99, 45.4% IG and n=119, 54.6% CG). Of the 218 participants, 173 (79%) were recruited via social media advertisements and cancer support groups and 45 (21%) were recruited using the participant pool of prior studies.
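The participant-flow figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only counts stated in the text; the variable names and the helper `pct` are introduced here for illustration.

```python
# Counts taken from the paragraph above.
eligible, randomized = 321, 248
dropout_reasons = {"assessments_only": 7, "no_letter_only": 7, "both": 23}

dropouts = sum(dropout_reasons.values())
# Unverified diagnosis = "no letter only" plus "both".
unverified = dropout_reasons["no_letter_only"] + dropout_reasons["both"]
itt = randomized - unverified
itt_arms = {"IG": 99, "CG": 119}


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    print("consented:", pct(randomized, eligible))
    print("dropouts:", dropouts, pct(dropouts, randomized))
    print("unverified:", unverified, pct(unverified, randomized))
    print("ITT:", itt, {arm: pct(n, itt) for arm, n in itt_arms.items()})
```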

Baseline characteristics were balanced between the groups ( Table 2 ), but participants in the IG were younger compared with those in the CG ( P =.02). No baseline differences in the primary and secondary outcome parameters were observed between the groups, with P values as follows: P =.99 (distress), P =.25 (depression), P =.47 (anxiety), and P =.21 (fatigue). On average, participants were 56 (SD 11) years old, and 60.6% (132/218) of the participants were female and had been diagnosed with cancer 25 (SD 17) months earlier. The most frequently reported cancer types were breast cancer (74/218, 33.9%) and hematological cancer (61/218, 28%), with 8.7% (19/218) of participants reporting a diagnosis of relapsed cancer. The PP population comprised 124 participants, following the exclusion of 94 participants. The primary reasons for exclusion were psychotherapeutic support during study participation and prior use of the investigated digital therapeutic.

a Intervention=12-week access to digital therapeutic app intervention+usual care.

b Control=usual care.

c NCCN Distress Thermometer: National Comprehensive Cancer Network Distress Thermometer (at baseline, clinically significant level of distress≥5).

d HADS-A: Hospital Anxiety and Depression Scale, anxiety subscale (German version, at baseline, cutoff score ≥8).

e HADS-D: Hospital Anxiety and Depression Scale, depression subscale (German version, at baseline, cutoff score ≥8).

f Multiple reasons are possible within 1 patient, and cases do not add up to the total number.

After 12 weeks, participants in the IG reported a reduced level of distress compared to participants in the CG in the ITT population ( F(1,215)=4.7; P =.03; ηp²=0.02; Table 3 ). The observed treatment effect was more pronounced in the PP population ( F(1,121)=6.9; P =.01; ηp²=0.05). The analysis revealed that higher levels of baseline distress predicted a greater change in distress after 12 weeks in the IG. An exploratory regression analysis yielded no predictive effect of age on the change in distress. The exploratory sensitivity analysis among all randomized participants (n=248) showed comparable treatment effects (jump-to-control: F(1,19375.1)=5.3; P =.02; ηp²=0.02 and jump-to-intervention: F(1,15314.8)=5.9; P =.02; ηp²=0.02).
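For the ITT and PP analyses, the reported partial eta-squared values are consistent with the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The Python sketch below applies this identity to the F statistics quoted above; the function name is an assumption, and whether the authors computed ηp² this way or from sums of squares is not stated. The identity is not applied to the sensitivity analysis, whose denominator degrees of freedom are adjusted for multiple imputation.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # (label, F, df_effect, df_error) as reported for the primary outcome
    reported = [("ITT distress", 4.7, 1, 215), ("PP distress", 6.9, 1, 121)]
    for label, f_value, df_effect, df_error in reported:
        print(label, round(partial_eta_squared(f_value, df_effect, df_error), 2))
```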

a An analysis of covariance was used to test for differences in change in distress levels between groups from baseline to follow-up (12 weeks), controlling for baseline. The partial eta–squared is the reported standardized effect size for the mean difference. The effect sizes can be interpreted as small, medium, or large at ≥0.01, ≥0.06, and ≥0.14, respectively. The results of the intention-to-treat and per-protocol analysis are reported.

b Intervention=12-week access to digital therapeutic app intervention+usual care.

c Control=usual care.

d N/A: not applicable.

In the ITT population, symptoms of anxiety ( F(1,215)=4.8; P =.03; ηp²=0.02), depression ( F(1,215)=15.5; P <.001; ηp²=0.07), and fatigue ( F(1,215)=4.4; P =.04; ηp²=0.02) improved in participants in the IG from baseline to 12 weeks compared to participants in the CG ( Table 4 ). The observed treatment effects on anxiety and depression were more pronounced in the PP population (anxiety: F(1,121)=7.2; P =.01; ηp²=0.06 and depression: F(1,121)=14.9; P <.001; ηp²=0.11). A trend toward a significant treatment effect was observed for fatigue symptoms in the PP population ( F(1,121)=3.8; P =.05; ηp²=0.03). QoL did not differ significantly between the groups at 12 weeks (ITT: t(216)=0.88; P =.38; g=0.12 and PP: t(122)=1.63; P =.11; g=0.30).
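The ITT value of Hedges g for QoL can be approximated from the reported t statistic and the ITT arm sizes (99 IG, 119 CG). The sketch below converts t to Cohen d for two independent groups and applies the common small-sample correction J = 1 − 3/(4·df − 1). This is an assumption about the computation, not the authors' documented method, and the function name is hypothetical. The PP value is not reproduced because the per-arm PP sample sizes are not given in this section.

```python
import math


def hedges_g_from_t(t_value, n1, n2):
    """Approximate Hedges g from an independent-samples t statistic."""
    cohen_d = t_value * math.sqrt(1 / n1 + 1 / n2)
    df = n1 + n2 - 2
    correction = 1 - 3 / (4 * df - 1)  # small-sample bias correction
    return correction * cohen_d


if __name__ == "__main__":
    # ITT population: t=0.88 with 99 (IG) and 119 (CG) participants
    print(round(hedges_g_from_t(0.88, 99, 119), 2))
```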

c HADS-A: Hospital Anxiety and Depression Scale, anxiety subscale (German version, at baseline, cutoff score ≥8).

d ITT: intention-to-treat.

e N/A: not applicable.

f Italicized values are significant at P <.05.

g PP: per-protocol.

h HADS-D: Hospital Anxiety and Depression Scale, depression subscale (German version, at baseline, cutoff score ≥8).

i FACIT-F: Functional Assessment of Chronic Illness Therapy–Fatigue.

j CGI-I: Clinical Global Impression Improvement.

Safety Outcomes

IG participants reported no adverse reactions or side effects of the digital therapeutic during the study.

Of the 99 participants in the IG (ITT), 98 (99%), 78 (79%), and 67 (68%) used the digital therapeutic intervention at 0- to 4-, 5- to 8-, and 9- to 12-week periods of the 12-week intervention, respectively, demonstrating good initial adherence to the intervention, which decreased moderately over time. App use (module use and days spent on the app) decreased over time ( Table 5 ). IG participants accessed content from various categories at different frequencies. The most accessed categories were cancer therapy, symptoms and side effects, and nutrition in cancer, with 80% (79/99), 83% (82/99), and 80% (79/99) of users accessing the content in these categories, respectively. Conversely, partnership and family, relaxation, and recipes were accessed less, with 29% (28/99), 34% (33/99), and 32% (31/99) of users, respectively.

Principal Findings

This nationwide waitlist RCT examined the efficacy of Mika, an app-based digital therapeutic that provides a personalized supportive intervention for patients with cancer. Participants who had access to the Mika app for 12 weeks showed significant improvements in perceived distress (ie, the primary outcome) and symptoms of anxiety, depression, and fatigue (ie, the secondary outcomes) compared to participants who received UC. The observed treatment effects were similar in the ITT and PP populations but more pronounced in the PP population, indicating the overall robustness of the findings. We observed no group difference in the QoL after 12 weeks. Intervention adherence was good, and no adverse reactions or side effects of the investigated digital therapeutic were reported.

Comparison With Prior Work

While a growing body of research shows evidence of the efficacy of app-based interventions for oncological populations on distress, fatigue, anxiety, and depression [ 20 , 25 , 27 , 31 , 51 ], this is the first study to examine the efficacy of a single holistic app-based digital therapeutic based on multiple intervention modules on these patient-relevant outcomes. Although the improvement in the primary outcome was modest, it reflects the nuanced nature of psycho-oncological interventions, where even modest changes can have significant clinical relevance. Furthermore, we conducted comprehensive testing of the effects of the investigated digital therapeutic on patients with cancer across all tumor entities, using a larger sample size compared to most previous studies [ 25 , 27 , 31 , 51 ].

In contrast to the findings of this study, however, other studies found an effect of app-based supportive interventions on QoL [ 23 , 27 , 31 ]. This difference in findings could be due to differences in the operationalization and measurement of QoL. In this study, participants’ global QoL was assessed using a single-item questionnaire after a 12-week intervention period. However, global QoL has been shown to be less affected in patients with cancer compared to specific components of QoL, such as social or cognitive functioning and symptom burden from fatigue or insomnia [ 52 ]. Further research using different QoL assessment tools could provide more insights into the efficacy of the investigated digital therapeutic on specific aspects of QoL.

A significant level of intervention adherence and engagement with the digital therapeutic, with varying degrees of interaction across the different app modules, indicates good acceptability and perceived subjective benefit of the investigated digital therapeutic and allows for reliable conclusions about its efficacy in oncological settings. The broad range of engagement, as illustrated by the IQRs, underscores the personalized nature of app use, catering to diverse participant needs and preferences. The variability in engagement levels across different app modules highlights the importance of personalizing digital therapeutics to increase adherence and maximize therapeutic effects.

As we evaluated the app intervention holistically, future studies should examine the impact of the app’s individual components.

While the dropout rate in the IG was slightly higher than that in the CG, the dropout rate in the IG as well as the overall dropout rate was low compared to other app-based supportive interventions [ 25 , 30 ]. Considering that patients with cancer have been found to have a positive attitude toward digital health [ 53 , 54 ], the findings of this study add to the notion that digital health interventions have the potential to overcome barriers associated with access to supportive care in oncological populations [ 55 ].

We found a positive effect of the investigated digital therapeutic on general psychological distress and a broad range of specific distress-associated parameters. Importantly, improvements in psychological symptoms, that is, depression and anxiety, can also have a positive tertiary preventive effect on cancer progression [ 5 ]. The effect sizes in this study ranged from small (ηp²=0.02) to medium (ηp²=0.07) in the ITT population and were more pronounced in the PP population (ηp²=0.05-0.11). The primary outcome improvement, while subtle, aligns with the expected outcomes in psycho-oncological interventions, highlighting the importance of considering the broad spectrum of therapeutic impacts. The small to medium effects observed in secondary end points, together with the primary outcome, illustrate the broad therapeutic impact and highlight the digital therapeutic’s capacity to significantly improve key aspects of psychological well-being in patients with cancer. Small-to-medium effect sizes are common in in-person supportive care interventions [ 6 ]. Our results also compare well with other app-based supportive care interventions, such as small effect sizes reported for a CBT and psychoeducation self-management app on fatigue [ 25 ] or small to medium effects of a web-based mindfulness-based intervention on anxiety and depression [ 56 ]. This is further supported by the results of several systematic reviews [ 20 , 21 ]. The fact that such effect sizes can be achieved with minimal cost and personnel effort via a digital approach further supports the significant potential for accessibility, reach, and impact of digital therapeutics.

Clinical Implications

The multifaceted intervention modules of the investigated digital therapeutic aim to support patients holistically. The investigated digital therapeutic hereby translates widely used evidence-based intervention methods within supportive care, such as symptom monitoring; patient education; modules of CBT, MBSR, and acceptance and commitment therapy; and strength and flexibility training, into a digital format. The intervention modules of the app are designed to help patients learn about their disease and prepare for discussions with clinicians in an informed decision-making process. This may reduce anxiety and insecurities across the cancer trajectory, while empowering patients and strengthening their self-efficacy.

While it is acknowledged that digital therapeutic interventions might not fully replicate the “in-person” experience, the scope and utility of these tools in the realm of oncology are substantial. For instance, a study evaluating a mobile app designed for tracking patient-reported daily activities found that when supervised by a physician, the data collected were more accurate than when used without guidance [ 57 ]. Conversely, a music app was equally effective in alleviating pain and anxiety in emergency department patients irrespective of supervision [ 58 ]. This suggests that certain interventions, such as symptom tracking, might be more prone to inaccuracies without proper guidance than passive activities, such as listening to music. In addition, CBT, which is traditionally the most effective in face-to-face settings, has generated interest in the digital domain. A study on the digital adaptation of mindfulness-based cognitive therapy for patients with cancer experiencing distress found the therapeutic connection between therapist and patient to be as potent as in in-person sessions [ 59 ]. This underlines the evolving role of digital therapeutics and its potential to reshape therapeutic avenues in oncology, thus paving the way for enhanced patient care.

Furthermore, considering the increasing number of patients with cancer experiencing psychosocial distress and the limited availability of health care professionals, digital therapeutics could present scalable and cost-effective solutions. These solutions can address symptoms and bolster the quality and accessibility of supportive care [ 55 , 60 , 61 ]. Recognizing patients’ diverse needs, tools such as the Mika app leverage artificial intelligence to deliver real-time, tailored support. This has the potential to benefit a broad spectrum of patients with cancer globally while also reducing the pressure on health care infrastructure and professionals. Therefore, digital therapeutics offer a patient-focused approach that is adaptable to specific clinical and lifestyle challenges such as disease management, emotional support, and health-related determinants. They might also further enhance medication adherence, tolerance to chemotherapy, and overall survival rate in the cancer care continuum [ 15 ]. Incorporating these digital tools into routine oncological supportive care can augment patient-centric care and enrich patient experience, safety, and interactions with clinicians [ 15 , 61 ]. However, while there is a consensus among medical professionals and stakeholders regarding the revolutionary potential of digital health in addressing cancer treatment challenges, the path to universal adoption remains intricate. Future studies should delve into the assimilation of digital therapeutics, such as Mika, into standard care across varied clinical environments and evaluate hurdles such as digital literacy and the acceptance of digital tools by both patients and health care professionals [ 62 - 64 ].

Strengths and Limitations

The main strength of this study was the app itself. It addresses the overarching problem areas faced by all patients with cancer while providing tailored support for population-specific areas of burden (ie, cancer type, treatment status, and use behavior). Its flexible and easily accessible use allows for seamless integration into patients’ daily lives and continuity of supportive treatment. In addition, the low overall dropout rate and data monitoring led to very little missing data. Similar findings in the ITT, PP, and extended ITT populations suggest overall robustness of the results.

This study has several limitations. First, the web-based recruitment procedure may have led to study registration from patients with cancer who were particularly motivated, digitally literate, and highly functioning in seeking support during their cancer journey, which may limit the generalizability of the study. However, the use of additional recruitment pathways (support groups and participant pool) likely resulted in the recruitment of a more heterogeneous sample, possibly compensating for potential selection bias. Future studies might investigate the impact of various recruitment channels on the efficacy of digital therapeutics, and thus, which population may be particularly responsive to digital interventions. Second, the higher number of dropouts in the IG compared to the CG may reflect treatment dissatisfaction or a loss of interest in the treatment among some participants, potentially confounding the study’s results. Dropouts, who are more likely to show elevated levels of distress, may have been made aware of the increased need for support through the intervention modules. Patients with clinically significant levels of distress or mental disorders might have accessed support services with more guidance from a health care professional, such as psychotherapy or psycho-oncological counseling. However, no side effects or adverse events were reported in the IG, and the overall robust pattern of results in the ITT, PP, and extended ITT populations suggests a low risk of attrition bias. The fact that participants who dropped out of the study showed higher baseline distress levels may have led to an underestimation of the intervention effect as higher baseline distress levels predicted a greater change in outcome after treatment. Third, due to the nature of the intervention, the group allocation could not be blinded. While experimenter bias was reduced due to a predefined, standardized monitoring procedure and statistical analysis plan, IG participants may have anticipated potential effects. Fourth, the intervention, along with its adherence, was assessed as a whole, which requires the evaluation of specific modules and any potential dose-response relationship in the future. In addition, there was no specific measure to evaluate the subjective usefulness or satisfaction with the digital therapeutic under investigation. Incorporating such a measure could have provided targeted insights into the participants’ perceptions and experiences with the app. However, the observed use behavior, characterized by participants repeatedly accessing the app and actively engaging with its content, may serve as an indirect indicator of the app’s value to the participants. Future studies should aim to validate this interpretation. Finally, the study sample included participants with a wide variety of cancer diagnoses, which did not allow for the examination of diagnosis-specific intervention effects. However, the sample composition is consistent with the target population of the investigated digital therapeutic, which includes patients with cancer of all entities, and strengthens the study’s generalizability and clinical utility. Moreover, a large body of data shows that while variables such as cancer type, treatment status, disease progression, and sex may influence the magnitude of treatment response to supportive therapy, the beneficial effects of supportive therapy are present across various cancer subpopulations [ 65 - 67 ]. In addition, there is a consensus that psychosocial support needs to be integrated into routine cancer care for all cancer types [ 68 , 69 ].

Conclusions

In summary, this RCT demonstrated that Mika, an app-based digital therapeutic that provides a personalized supportive care intervention, can effectively reduce psychological distress and further alleviate symptoms of anxiety, depression, and fatigue in patients with cancer. Digital therapeutics, such as Mika, deliver easily accessible, patient-centered, and effective psychosocial and self-management support for patients with cancer across the course of the disease. Digital therapeutics may present scalable solutions to support patients with cancer worldwide and thus help fill the supportive care gap. Further research is needed to explore the integration of Mika into routine cancer care and its efficacy in diverse clinical settings.

Acknowledgments

The clinical trial was funded by Fosanis GmbH, Berlin.

Data Availability

The data set generated and analyzed during this study, including individual participant data that underlie the results reported in this article after deidentification (text, tables, and figures), clinical study report, informed consent form, and analytic code, are available from AMT beginning 3 months and ending 5 years following article publication. Access to the data will be granted to investigators whose proposed use of the data has been approved by an independent review committee identified for this purpose, for individual patient data meta-analysis. Proposals for accessing the data may be submitted up to 36 months following article publication.

Authors' Contributions

JSR and GF provided financial support. FS, AM, and HB provided administrative support, with FS also contributing to the collection and assembly of data. MF contributed to data curation. FS, AM, and MF contributed to data analysis and interpretation. FS and AM equally contributed to writing the original draft. All authors contributed to reviewing and editing the draft and provided final approval of the manuscript. AMT, FS, and JSR contributed to the conception and design.

Conflicts of Interest

FS, MF, and HB received research funding for this trial from Fosanis GmbH, which was paid to their institution. AM is an employee at the company Fosanis GmbH. JSR and GF work for the company Fosanis GmbH. They are the managing directors and board members of Fosanis GmbH and own shares of Fosanis GmbH. All other authors declare no other conflicts of interest.

CONSORT-eHEALTH checklist (V 1.6.1).

Pachman DR, Barton DL, Swetz KM, Loprinzi CL. Troublesome symptoms in cancer survivors: fatigue, insomnia, neuropathy, and pain. J Clin Oncol. Oct 20, 2012;30(30):3687-3696.
Mehnert A, Hartung TJ, Friedrich M, Vehling S, Brähler E, Härter M, et al. One in two cancer patients is significantly distressed: prevalence and indicators of distress. Psychooncology. Jan 16, 2018;27(1):75-82.
Mitchell AJ, Chan M, Bhatti H, Halton M, Grassi L, Johansen C, et al. Prevalence of depression, anxiety, and adjustment disorder in oncological, haematological, and palliative-care settings: a meta-analysis of 94 interview-based studies. Lancet Oncol. Feb 2011;12(2):160-174.
Meggiolaro E, Berardi MA, Andritsch E, Nanni MG, Sirgo A, Samorì E, et al. Cancer patients' emotional distress, coping styles and perception of doctor-patient interaction in European cancer settings. Pall Supp Care. Jul 09, 2015;14(3):204-211.
Brown KW, Levy AR, Rosberger Z, Edgar L. Psychological distress and cancer survival: a follow-up 10 years after diagnosis. Psychosom Med. 2003;65(4):636-643.
Faller H, Schuler M, Richard M, Heckl U, Weis J, Küffner R. Effects of psycho-oncologic interventions on emotional distress and quality of life in adult patients with cancer: systematic review and meta-analysis. J Clin Oncol. Feb 20, 2013;31(6):782-793.
Epstein RM, Street RL. Patient-centered communication in cancer care: promoting healing and reducing suffering. National Cancer Institute. 2007. URL: https://cancercontrol.cancer.gov/sites/default/files/2020-06/pcc_monograph.pdf [accessed 2024-03-05]
Alcalde Castro M, Chavarri Guerra Y, Ramos-Lopez WA, Covarrubias-Gómez A, Sanchez S, Quiroz P, et al. Patient-reported barriers for accessing supportive care among patients with metastatic cancer treated at a public cancer center in Mexico. J Clin Oncol. Dec 01, 2018;36(34_suppl):124.
Carrieri D, Peccatori FA, Boniolo G. Supporting supportive care in cancer: the ethical importance of promoting a holistic conception of quality of life. Crit Rev Oncol Hematol. Nov 2018;131:90-95.
Kumar P, Casarett D, Corcoran A, Desai K, Li Q, Chen J, et al. Utilization of supportive and palliative care services among oncology outpatients at one academic cancer center: determinants of use and barriers to access. J Palliat Med. Aug 2012;15(8):923-930.
Bellas O, Kemp E, Edney L, Oster C, Roseleur J. The impacts of unmet supportive care needs of cancer survivors in Australia: a qualitative systematic review. Eur J Cancer Care (Engl). Nov 12, 2022;31(6):e13726.
Siegel RL, Miller KD, Jemal A. Cancer statistics, 2019. CA Cancer J Clin. Jan 08, 2019;69(1):7-34.
Atun R, Cavalli F. The global fight against cancer: challenges and opportunities. Lancet. Feb 2018;391(10119):412-413.
Penedo FJ, Oswald LB, Kronenfeld JP, Garcia SF, Cella D, Yanez B. The increasing value of eHealth in the delivery of patient-centred cancer care. Lancet Oncol. May 2020;21(5):e240-e251.
Gussoni G, Ravot E, Zecchina M, Recchia G, Santoro E, Ascione R, et al. Digital therapeutics in oncology: findings, barriers and prospects. A narrative review. Ann Res Oncol. Feb 2022;02(01):55.
Yang J, Weng L, Chen Z, Cai H, Lin X, Hu Z, et al. Development and testing of a mobile app for pain management among cancer patients discharged from hospital treatment: randomized controlled trial. JMIR Mhealth Uhealth. May 29, 2019;7(5):e12542.
Ghanbari E, Yektatalab S, Mehrabi M. Effects of psychoeducational interventions using mobile apps and mobile-based online group discussions on anxiety and self-esteem in women with breast cancer: randomized controlled trial. JMIR Mhealth Uhealth. May 18, 2021;9(5):e19262.
Hou IC, Lin HY, Shen SH, Chang KJ, Tai HC, Tsai AJ, et al. Quality of life of women after a first diagnosis of breast cancer using a self-management support mHealth app in Taiwan: randomized controlled trial. JMIR Mhealth Uhealth. Mar 04, 2020;8(3):e17084.
Keum J, Chung MJ, Kim Y, Ko H, Sung MJ, Jo JH, et al. Usefulness of smartphone apps for improving nutritional status of pancreatic cancer patients: randomized controlled trial. JMIR Mhealth Uhealth. Aug 31, 2021;9(8):e21088.
Hernandez Silva E, Lawler S, Langbecker D. The effectiveness of mHealth for self-management in improving pain, psychological distress, fatigue, and sleep in cancer survivors: a systematic review. J Cancer Surviv. Feb 2019;13(1):97-107.
Matis J, Svetlak M, Slezackova A, Svoboda M, Šumec R. Mindfulness-based programs for patients with cancer via eHealth and mobile health: systematic review and synthesis of quantitative research. J Med Internet Res. Nov 16, 2020;22(11):e20709.
Adriaans DJ, Dierick-van Daele AT, van Bakel MJ, Nieuwenhuijzen GA, Teijink JA, Heesakkers FF, et al. Digital self-management support tools in the care plan of patients with cancer: review of randomized controlled trials. J Med Internet Res. Jun 29, 2021;23(6):e20861.
Qin M, Chen B, Sun S, Liu X. Effect of mobile phone app-based interventions on quality of life and psychological symptoms among adult cancer survivors: systematic review and meta-analysis of randomized controlled trials. J Med Internet Res. Dec 19, 2022;24(12):e39799.
Springer F, Mehnert-Theuerkauf A. Content features and its implementation in novel app-based psycho-oncological interventions for cancer survivors: a narrative review. Curr Opin Oncol. Jul 01, 2022;34(4):313-319.
Spahrkäs SS, Looijmans A, Sanderman R, Hagedoorn M. Beating cancer-related fatigue with the Untire mobile app: results from a waiting-list randomized controlled trial. Psychooncology. Nov 2020;29(11):1823-1834.
Mayer DK, Landucci G, Awoyinka L, Atwood AK, Carmack CL, Demark-Wahnefried W, et al. SurvivorCHESS to increase physical activity in colon cancer survivors: can we get them moving? J Cancer Surviv. Feb 9, 2018;12(1):82-94.
Lengacher CA, Reich RR, Ramesar S, Alinat CB, Moscoso M, Cousin L, et al. Feasibility of the mobile mindfulness-based stress reduction for breast cancer (mMBSR(BC)) program for symptom improvement among breast cancer survivors. Psychooncology. Feb 2018;27(2):524-531.
Mikolasek M, Witt CM, Barth J. Effects and implementation of a mindfulness and relaxation app for patients with cancer: mixed methods feasibility study. JMIR Cancer. Jan 13, 2021;7(1):e16785.
Gustafson DH, DuBenske LL, Atwood AK, Chih MY, Johnson RA, McTavish F, et al. Reducing symptom distress in patients with advanced cancer using an e-alert system for caregivers: pooled analysis of two randomized clinical trials. J Med Internet Res. Nov 14, 2017;19(11):e354.
Chung IY, Jung M, Park YR, Cho D, Chung H, Min YH, et al. Exercise promotion and distress reduction using a mobile app-based community in breast cancer survivors. Front Oncol. 2019;9:1505.
Greer JA, Jacobs J, Pensak N, MacDonald JJ, Fuh C, Perez GK, et al. Randomized trial of a tailored cognitive-behavioral therapy mobile application for anxiety in patients with incurable cancer. Oncologist. Aug 25, 2019;24(8):1111-1120.
Basch E, Deal AM, Kris MG, Scher HI, Hudis CA, Sabbatini P, et al. Symptom monitoring with patient-reported outcomes during routine cancer treatment: a randomized controlled trial. J Clin Oncol. Feb 20, 2016;34(6):557-565.
Greer JA, Park ER, Prigerson HG, Safren SA. Tailoring cognitive-behavioral therapy to treat anxiety comorbid with advanced cancer. J Cogn Psychother. Jan 01, 2010;24(4):294-313.
Carlson LE, Speca M, Faris P, Patel KD. One year pre-post intervention follow-up of psychological, immune, endocrine and blood pressure outcomes of mindfulness-based stress reduction (MBSR) in breast and prostate cancer outpatients. Brain Behav Immun. Nov 2007;21(8):1038-1049.
Speca M, Carlson LE, Goodey E, Angen M. A randomized, wait-list controlled clinical trial: the effect of a mindfulness meditation-based stress reduction program on mood and symptoms of stress in cancer outpatients. Psychosom Med. 2000;62(5):613-622.
Winters-Stone KM, Dobek J, Nail L, Bennett JA, Leo MC, Naik A, et al. Strength training stops bone loss and builds muscle in postmenopausal breast cancer survivors: a randomized, controlled trial. Breast Cancer Res Treat. Jun 19, 2011;127(2):447-456.
Adam R, Bond C, Murchie P. Educational interventions for cancer pain. A systematic review of systematic reviews with nested narrative review of randomized controlled trials. Patient Educ Couns. Mar 2015;98(3):269-282.
Wolff J, Stupin J, Olschewski J, Pirmorady Sehouli A, Maier A, Fofana M, et al. Digital therapeutic to improve cancer-related well-being: a pilot randomized controlled trial. Int J Gynecol Cancer. Jul 03, 2023;33(7):1118-1124.
Harrison SE, Watson EK, Ward AM, Khan NF, Turner D, Adams E, et al. Primary health and supportive care needs of long-term cancer survivors: a questionnaire survey. J Clin Oncol. May 20, 2011;29(15):2091-2098.
Xiao J, Ye H, He X, Zhang H, Wu F, Chua TS. Attentional factorization machines: learning the weight of feature interactions via attention networks. arXiv. Preprint posted online August 15, 2017. 2017.
Sensoy M, Kaplan L, Kandemir M. Evidential deep learning to quantify classification uncertainty. In: Proceedings of the 32nd Conference on Neural Information Processing Systems. 2018. Presented at: NeurIPS '18; December 3-8, 2018; Montreal, QC. URL: https://www.proceedings.com/48413.html
Mehnert A, Müller D, Lehmann C, Koch U. Die Deutsche version des NCCN distress-thermometers. Z Für Psychiatr Psychol Psychother. Jan 2006;54(3):213-223.
National Comprehensive Cancer Network. Distress management. Clinical practice guidelines. J Natl Compr Canc Netw. Jul 01, 2003;1(3):344-374.
Petermann F. Hospital anxiety and depression scale, Deutsche version (HADS-D). Z Für Psychiatr Psychol Psychother. Jul 2011;59(3):251-253.
Lai J, Cella D, Chang C, Bode RK, Heinemann AW. Item banking to improve, shorten and computerize self-reported fatigue: an illustration of steps to create a core item bank from the FACIT-Fatigue Scale. Qual Life Res. Aug 2003;12(5):485-501.
Guy W. Clinical global impressions scale. Psychiatry. 1976.
Cohen J. Statistical Power Analysis for the Behavioral Sciences. Oxfordshire, UK. Routledge; 2013.
Carpenter JR, Roger JH, Kenward MG. Analysis of longitudinal trials with protocol deviation: a framework for relevant, accessible assumptions, and inference via multiple imputation. J Biopharm Stat. Oct 18, 2013;23(6):1352-1371.
Enders CK. Applied Missing Data Analysis. New York, NY. The Guilford Press; 2022.
R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing. 2021. URL: https://www.R-project.org/ [accessed 2024-03-05]
Ham K, Chin S, Suh YJ, Rhee M, Yu ES, Lee HJ, et al. Preliminary results from a randomized controlled study for an app-based cognitive behavioral therapy program for depression and anxiety in cancer patients. Front Psychol. Jul 25, 2019;10:1592.
Hinz A, Mehnert A, Dégi C, Reissmann DR, Schotte D, Schulte T. The relationship between global and specific components of quality of life, assessed with the EORTC QLQ-C30 in a sample of 2019 cancer patients. Eur J Cancer Care (Engl). Mar 16, 2017;26(2):e12416. [ CrossRef ] [ Medline ]
Jansen F, van Uden-Kraan CF, van Zwieten V, Witte BI, Verdonck-de Leeuw IM. Cancer survivors' perceived need for supportive care and their attitude towards self-management and eHealth. Support Care Cancer. Jun 26, 2015;23(6):1679-1688. [ CrossRef ] [ Medline ]
Kessel KA, Vogel MM, Kessel C, Bier H, Biedermann T, Friess H, et al. Mobile health in oncology: a patient survey about app-assisted cancer care. JMIR Mhealth Uhealth. Jun 14, 2017;5(6):e81. [ FREE Full text ] [ CrossRef ] [ Medline ]
Marthick M, McGregor D, Alison J, Cheema B, Dhillon H, Shaw T. Supportive care interventions for people with cancer assisted by digital technology: systematic review. J Med Internet Res. Oct 29, 2021;23(10):e24722. [ FREE Full text ] [ CrossRef ] [ Medline ]
Nissen ER, O'Connor M, Kaldo V, Højris I, Borre M, Zachariae R, et al. Internet-delivered mindfulness-based cognitive therapy for anxiety and depression in cancer survivors: a randomized controlled trial. Psychooncology. Jan 18, 2020;29(1):68-75. [ FREE Full text ] [ CrossRef ] [ Medline ]
Egbring M, Far E, Roos M, Dietrich M, Brauchbar M, Kullak-Ublick GA, et al. A mobile app to stabilize daily functional activity of breast cancer patients in collaboration with the physician: a randomized controlled clinical trial. J Med Internet Res. Sep 06, 2016;18(9):e238. [ FREE Full text ] [ CrossRef ] [ Medline ]
Chai PR, Schwartz E, Hasdianda MA, Azizoddin DR, Kikut A, Jambaulikar GD, et al. A brief music app to address pain in the emergency department: prospective study. J Med Internet Res. May 20, 2020;22(5):e18537. [ FREE Full text ] [ CrossRef ] [ Medline ]
Bisseling E, Cillessen L, Spinhoven P, Schellekens M, Compen F, van der Lee M, et al. Development of the therapeutic alliance and its association with internet-based mindfulness-based cognitive therapy for distressed cancer patients: secondary analysis of a multicenter randomized controlled trial. J Med Internet Res. Oct 18, 2019;21(10):e14065. [ FREE Full text ] [ CrossRef ] [ Medline ]
Parikh RB, Basen-Enquist KM, Bradley C, Estrin D, Levy M, Lichtenfeld JL, et al. Digital health applications in oncology: an opportunity to seize. J Natl Cancer Inst. Oct 06, 2022;114(10):1338-1339. [ FREE Full text ] [ CrossRef ] [ Medline ]
Aapro M, Bossi P, Dasari A, Fallowfield L, Gascón P, Geller M, et al. Digital health for optimal supportive care in oncology: benefits, limits, and future perspectives. Support Care Cancer. Oct 12, 2020;28(10):4589-4612. [ FREE Full text ] [ CrossRef ] [ Medline ]
Rankin NM, Butow PN, Thein T, Robinson T, Shaw JM, Price MA, et al. Everybody wants it done but nobody wants to do it: an exploration of the barrier and enablers of critical components towards creating a clinical pathway for anxiety and depression in cancer. BMC Health Serv Res. Jan 22, 2015;15(1):28. [ FREE Full text ] [ CrossRef ] [ Medline ]
Leader AE, Capparella LM, Waldman LB, Cammy RB, Petok AR, Dean R, et al. Digital literacy at an urban cancer center: implications for technology use and vulnerable patients. JCO Clinical Cancer Informatics. Dec 2021;(5):872-880. [ CrossRef ]
den Bakker CM, Schaafsma FG, Huirne JA, Consten EC, Stockmann HB, Rodenburg CJ, et al. Cancer survivors' needs during various treatment phases after multimodal treatment for colon cancer - is there a role for eHealth? BMC Cancer. Dec 04, 2018;18(1):1207. [ FREE Full text ] [ CrossRef ] [ Medline ]
Cillessen L, Johannsen M, Speckens AE, Zachariae R. Mindfulness-based interventions for psychological and physical health outcomes in cancer patients and survivors: a systematic review and meta-analysis of randomized controlled trials. Psychooncology. Dec 11, 2019;28(12):2257-2269. [ FREE Full text ] [ CrossRef ] [ Medline ]
Tauber NM, O'Toole MS, Dinkel A, Galica J, Humphris G, Lebel S, et al. Effect of psychological intervention on fear of cancer recurrence: a systematic review and meta-analysis. J Clin Oncol. Nov 01, 2019;37(31):2899-2915. [ FREE Full text ] [ CrossRef ] [ Medline ]
Merluzzi TV, Pustejovsky JE, Philip EJ, Sohl SJ, Berendsen M, Salsman JM. Interventions to enhance self-efficacy in cancer patients: a meta-analysis of randomized controlled trials. Psychooncology. Sep 09, 2019;28(9):1781-1790. [ FREE Full text ] [ CrossRef ] [ Medline ]
Jacobsen PB, Wagner LI. A new quality standard: the integration of psychosocial care into routine cancer care. J Clin Oncol. Apr 10, 2012;30(11):1154-1159. [ CrossRef ] [ Medline ]
Fann JR, Ell K, Sharpe M. Integrating psychosocial care into cancer services. J Clin Oncol. Apr 10, 2012;30(11):1178-1186. [ CrossRef ] [ Medline ]

Edited by YH Lin; submitted 21.08.23; peer-reviewed by F Denis, A Haussmann, N Schaeffeler, P Chow; comments to author 24.01.24; accepted 23.02.24; published 25.04.24.

©Franziska Springer, Ayline Maier, Michael Friedrich, Jan Simon Raue, Gandolf Finke, Florian Lordick, Guy Montgomery, Peter Esser, Hannah Brock, Anja Mehnert-Theuerkauf. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 25.04.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

COMMENTS

Intelligent Techniques for Detecting Network Attacks: Review and
The first section provides an introduction and background to the research area. A brief overview of network attacks is presented in Section 2. Section 3 discusses intelligent network attack mitigation techniques where all the reviewed research papers, the network attacks they address using ML and DL techniques, and their findings are presented ...
Research paper A comprehensive review study of cyber-attacks and cyber
In addition, five scenarios can be considered for cyber warfare: (1) Government-sponsored cyber espionage to gather information to plan future cyber-attacks, (2) a cyber-attack aimed at laying the groundwork for any unrest and popular uprising, (3) Cyber-attack aimed at disabling equipment and facilitating physical aggression, (4) Cyber-attack as a complement to physical aggression, and (5 ...
(PDF) ADVANCES IN NETWORK SECURITY: A COMPREHENSIVE ...
The report proposes new research directions to advance research. This paper discusses network ... is a review of papers with keywords network security, network attacks and threats and network ...
Network Attacks and Prevention techniques
There has been an increasing connectivity to the internet and with this increased connectivity, there is also a significant increase in security-related incidents. The growing features of the internet also call for implementing robust security measures to secure private information and the whole network. With industries inspiring Bring Your Own Device strategy, the threat spectrum has ...
A survey of distributed denial-of-service attack, prevention, and
The identification of the attack source as well as the network path of the attack traffic is known as IP traceback. In this section, we will introduce some popular methods of attack source identification. ... Research papers covered Key features Advantages Limitations; Signature-based detection: 88-93: 1. Works based on already defined attack ...
Prevention and Detection of Network Attacks: A Comprehensive Study
Research conducted on network attacks focuses on an attack called a "wormhole," which is challenging to safeguard. In the paper, they explained that even if the attacker has not compromised any hosts and even if every communication is valid and secret, the wormhole attack is still feasible.
Network Attacks and Their Detection Mechanisms: A Review
The increasing occurrence of network attacks is an important problem to network services. In this paper, we present a network based Intrusion Detection and Prevention System (IDPS), which can ...
Apply machine learning techniques to detect malicious network traffic
Computer networks target several kinds of attacks every hour and day; they evolved to make significant risks. They pass new attacks and trends; these attacks target every open port available on the network. Several tools are designed for this purpose, such as mapping networks and vulnerabilities scanning. Recently, machine learning (ML) is a widespread technique offered to feed the Intrusion ...
Full article: Cybersecurity Deep: Approaches, Attacks Dataset, and
The remaining paper is structured as follows. Section 2 outlines the relevant ... This section defines the contemporaries of real-world network attack records, including infiltration, brute force, DDoS, and SQL injection records. ... The research should highlight building a real-time setup to validate deep learning approaches so they can ...
Detection of Network Attacks using Machine Learning and Deep Learning
Abstract: Anomaly-based network intrusion detection systems are highly significant in detecting network attacks. Robust machine learning and deep learning models for identifying network intrusion and attack types are proposed in this paper. Proposed models have experimented with the UNSW-NB15 dataset of 49 features for nine different attack samples.
(PDF) Network Security: Cyber-attacks & Strategies to ...
Mitigate Risks and Threads. Priyanka Dedakia. Department of Computing and Information, Bournemouth University, Bournemouth, U.K. Abstract: Network security is a set ...
Security in Wireless Networks: Analysis of Wi-Fi Security and Attack
The research methods of this paper are case study and report. First, a correct understanding of the enormous impact of this threat was formed by studying actual cases of wireless network attacks. Then, through the analysis of the attacker's means of attack and protocol security vulnerabilities, design defects, etc., will guard against wireless ...
A Review of Attacks, Vulnerabilities, and Defenses in Industry 4.0 with
Network attacks are commonly designed to impact a network's performance. ... This paper reviews recent research efforts on attacks, vulnerabilities, and defenses in Industry 4.0 implementations, and highlights security-related topics and challenges that seem to be surging in this area. The contributions of this paper can be divided in the ...
Ransomware: Recent advances, analysis, challenges and future research
2.1. Malware analysis. Malware analysis is a standard approach to understand the components and behaviour of malware, ransomware included. This analysis is useful to detect malware attacks and prevent similar attacks in the future. Malware analysis is broadly categorized into static and dynamic analysis.
JSAN
In recent times, distributed denial of service (DDoS) has been one of the most prevalent security threats in internet-enabled networks, with many internet of things (IoT) devices having been exploited to carry out attacks. Due to their inherent security flaws, the attacks seek to deplete the resources of the target network by flooding it with numerous spoofed requests from a distributed system.
The Emerging Threat of Ai-driven Cyber Attacks: A Review
Hence, this study investigates the emerging threat of AI-driven attacks and reviews the negative impacts of this sophisticated cyber weaponry in cyberspace. The paper is divided into five parts. The mechanism for offering the review process is presented in the next section. Section 3 contains the results.
Zero-day attack detection: a systematic literature review
This section presents the results analysis of zero-day attack detection research papers included in this study (see Appendix 2) and the answers to the research questions identified in Sect. 5.1. 6.1 Data analysis. Tables 8, 9, 10, and 11 present the summaries extracted from the papers included in this SLR. The tables are separated based on ...
A Survey of Network Attacks on Cyber-Physical Systems
A cyber-physical system (CPS) typically consists of the plant, sensors, actuators, the controller and a communication network. The communication network connects the individual components to achieve the computing and communication in the CPS. It also makes the CPS vulnerable to network attacks. How to deal with the network attacks in CPSs has become a research hotspot. This paper surveys the ...
Analysis of computer network attack based on the virus propagation
The conventional method can make a reasonable analysis of common network attacks, but the reliability of the analysis is low under the virus propagation model. This paper proposes a new research method of computer network attack analysis based on the virus propagation model. Based on the relationship between the framework and daemon, the framework of the model for computer network attack ...
network security Latest Research Papers
This paper presents a framework in which different machine learning classification schemes are employed to detect various types of network attack categories. Five machine learning algorithms: Random Forest, Decision Tree, Logistic Regression, K-Nearest Neighbors and Artificial Neural Networks, are used for attack detection.
Probabilistic models for evaluating network edge's resistance against
In the actual network attack, sophisticated attackers often combine various tools (Masscan, Nmap etc.) and botnet to accelerate the scanning process. In addition, the process where an attacker uses the constructed weapon to implement attacks is often very short. Therefore, we assume that the time spent by the attacker in the stage R and D is ...
CICIoT2023: A Real-Time Dataset and Benchmark for Large-Scale Attacks
Most existing efforts do not consider an extensive network topology with real IoT devices. The main goal of this research is to propose a novel and extensive IoT attack dataset to foster the development of security analytics applications in real IoT operations. To accomplish this, 33 attacks are executed in an IoT topology composed of 105 devices.
Research on Network Attack Detection Technology based on Reverse
There are also many methods to detect network attacks. This paper introduces the research on network attack detection technology based on reverse detection and protocol analysis. By monitoring the network attack packets in the big data environment of power system, the attack packets are restored and analyzed to quickly diagnose and locate the ...
Network Threats: A Step-by-Step Attack Demonstration
Discovery. Credential Dumping. Lateral Movement and Persistence. Data Exfiltration. These steps were chosen since they exemplify common techniques that are ubiquitous in attacks. Now, let's dive into each step. 1. Initial Access. The attack begins with spear-phishing, which establishes initial entry into the network.
Byzantine Attacks Exploiting Penalties in Ethereum PoS
In May 2023, the Ethereum blockchain experienced its first inactivity leak, a mechanism designed to reinstate chain finalization amid persistent network disruptions. This mechanism aims to reduce the voting power of validators who are unreachable within the network, reallocating this power to active validators. This paper investigates the implications of the inactivity leak on safety within ...
(PDF) An Experimental Study on DoS Attack
The DoS attack is the most popular attack in the network security with the development of network and internet. In this paper, the DoS attack principle is discussed and some DoS attack methods are ...
Symmetry
Convolutional neural networks (CNNs) need to replicate feature detectors when modeling spatial information, which reduces their efficiency. The number of replicated feature detectors or labeled training data required for such methods grows exponentially with the dimensionality of the data being used. On the other hand, space-insensitive methods are difficult to encode and express effectively ...
Journal of Medical Internet Research
Background: Distress is highly prevalent among patients with cancer, but supportive care needs often go unmet. Digital therapeutics hold the potential to overcome barriers in cancer care and improve health outcomes. Objective: This study conducted a randomized controlled trial to investigate the efficacy of Mika, an app-based digital therapeutic designed to reduce distress across the cancer ...

Apply machine learning techniques to detect malicious network traffic in cloud computing

Introduction

Related work

Supervised learning

Unsupervised learning

Cloud-based techniques

Deep learning techniques

Detection framework (our approach)

Dataset preparation stage

Preprocessing the dataset

Label the dataset

Building detection model stage

Extracting features

Trigger the model and passing features

Evaluating stage

Split-validation

Results and analysis

Cross-validation evaluation

Evaluating ANN

Evaluating DTREE

Evaluating K-nearest Neighbor (KNN classifier)

Evaluating support vector machine

Evaluating Random Forest

Evaluating Naive Bayes

Split-validation evaluation

Evaluating KNN

Evaluating SVM

Cross-validation result

Split-validation result

Conclusions and future work

Availability of data and materials

Acknowledgements

Author information

Contributions

Corresponding author

Ethics declarations

Consent for publication

Additional information

Rights and permissions

About this article

Zero-day attack detection: a systematic literature review

Author information

Corresponding author

Ethics declarations

Additional information

Appendix 1: list of included and excluded studies

Appendix 2: data extraction form and details

Rights and permissions

About this article

Analysis of computer network attack based on the virus propagation model

1 Introduction

2.1 To build the framework of the model for computer network attack analysis

2.2 Analysis of computer network attack

3 Implementation of computer network attack analysis

3.2 Analysis of the process of computer network attack

4 Experimental simulation

4.1 Preparation of the experimental data

4.2 Design of the test process

4.3 Analysis of coverage test results

4.4 Analysis of the test results of uncertainty

4.5 Analysis of reliability analysis

5 Conclusion

Availability of data and materials

Abbreviations

Acknowledgements

About the authors

Author information

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information