Network protocol reverse engineering is the basis for many security applications. A common class of protocol reverse engineering methods is based on the analysis of network message traces. After performing message field identification by segmenting messages into multiple fields, a key task is to infer the semantics of the fields. One limitation of existing field semantics inference methods is that they usually infer semantics for only a few fields and often require considerable manual effort. In this paper, we propose an automated field semantics inference method for binary protocol reverse engineering (FSIBP). FSIBP aims to automatically learn semantics inference knowledge from known protocols and use it to infer the semantics of any field of an unknown protocol. To achieve this goal, we design a feature extraction method that extracts features of both the field itself and the field context. We also propose a semantic category aggregation method that abstracts the fine-grained semantics of all fields of known protocols into aggregated semantic categories. Moreover, FSIBP infers semantics based on the similarity of fields to semantic categories. This design enables FSIBP to utilize the semantic knowledge of all fields of known protocols and to infer the semantics of any field of an unknown protocol. The whole process of FSIBP does not require any expert knowledge or manual parameter setting. We conduct extensive experiments to demonstrate the effectiveness of FSIBP. Moreover, we find a further use for FSIBP beyond field semantics inference: its output can help detect mis-segmented fields generated during message field identification.
We study the construction of optimal conflict-avoiding codes (CAC) from a number theoretical point of view. The determination of the size of optimal CAC of prime length $p$ and weight 3 is formulated in terms of the solvability of certain twisted Fermat equations of the form $g^2X^\ell + gY^\ell + 1 = 0$ over the finite field $\mathbb{F}_p$ for some primitive root $g$ modulo $p$. We treat the problem of solving the twisted Fermat equations in a more general situation by allowing the base field to be any finite extension field $\mathbb{F}_q$ of $\mathbb{F}_p$. We show that for $q$ greater than a lower bound of the order of magnitude $O(\ell^2)$ there exists a generator $g$ of $\mathbb{F}_q^\times$ such that the equation in question is solvable over $\mathbb{F}_q$. Using our results we are able to contribute new results to the construction of optimal CAC of prime lengths and weight 3.
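The solvability condition in this abstract can be explored by brute force for small primes. The sketch below is only an illustration, not the paper's method: the function names are hypothetical, it covers only the prime field case, and it assumes that X and Y may range over all of F_p, including zero.

```python
def prime_factors(n):
    """Return the set of prime factors of n by trial division."""
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def is_primitive_root(g, p):
    """True if g generates the multiplicative group modulo the prime p."""
    return all(pow(g, (p - 1) // q, p) != 1 for q in prime_factors(p - 1))


def solvable(g, ell, p):
    """True if g^2 * X^ell + g * Y^ell + 1 = 0 has a solution modulo p.

    Assumption: X and Y may take any value in F_p, zero included.
    """
    ell_powers = {pow(x, ell, p) for x in range(p)}
    inv_g = pow(g, -1, p)  # modular inverse, Python 3.8+
    for a in ell_powers:
        # Solve g^2 * a + g * b + 1 = 0 for b, then check b is an ell-th power.
        b = (-(g * g * a + 1) * inv_g) % p
        if b in ell_powers:
            return True
    return False


def generators_with_solution(ell, p):
    """List the primitive roots g modulo p for which the equation is solvable."""
    return [g for g in range(1, p)
            if is_primitive_root(g, p) and solvable(g, ell, p)]
```

For example, with p = 7 and ℓ = 3 the choice g = 3, X = 1, Y = 3 gives 9 + 81 + 1 = 91, which is divisible by 7.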
With the rapid development of the Internet, a large number of private protocols have emerged on the network. However, some of them are constructed by attackers to avoid being analyzed, posing a threat to computer network security. The blockchain uses the P2P protocol to implement various functions across the network. Furthermore, the P2P protocol format of a blockchain may differ from the standard format specification, so sniffing tools such as Wireshark and Fiddler may fail to recognize it. Therefore, the ability to distinguish different types of unknown network protocols is vital for network security. In this paper, we propose an unsupervised clustering algorithm based on maximum frequent sequences for binary protocols, which can distinguish various unknown protocols and thus support the analysis of unknown protocol formats. We mine the maximum frequent sequences of protocol message sets at the byte level. We then calculate, based on fuzzy set theory, the fuzzy membership of each protocol message in each maximum frequent sequence, and construct a fuzzy membership vector for each protocol message. Finally, we adopt K-means++ to split different types of protocol messages into several clusters and evaluate the performance by calculating homogeneity, integrity, and the Fowlkes and Mallows Index (FMI). In addition, clustering algorithms based on Needleman–Wunsch and on the fixed-length prefix are compared with the algorithm presented in this paper. Compared with these traditional clustering methods, our method shows some improvement in clustering performance.
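One of the evaluation metrics named in this abstract, the Fowlkes and Mallows Index, can be computed from pair counts. The following is a minimal sketch of the standard definition, FMI = TP / sqrt((TP + FP)(TP + FN)); the function name is hypothetical, and the abstract does not say how the authors computed the metric.

```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(labels_true, labels_pred):
    """Fowlkes-Mallows index of a predicted clustering against true labels."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    tp = fp = fn = 0
    # Classify every unordered pair of messages by whether the two
    # labelings put the pair in the same group.
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))
```

A value of 1.0 means the predicted clusters match the true protocol types exactly; label names themselves do not matter, only which messages are grouped together.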
To solve the clustering problem of unknown binary protocols, an improved k-means clustering method for unknown binary protocols is proposed, which determines the initial clustering centers and improves the clustering distance. First, the k value is determined and the clustering centers are extracted using the DCBP (Determine the initial clustering center of binary Protocol) algorithm and the rate of change of the squared error. The data are then clustered by a k-means algorithm with an improved distance function, which divides the unknown binary protocol bit stream into different subsets of binary protocols. Using the Pearson distance in the improved k-means algorithm raises the clustering accuracy for binary protocols from 96% to 98.9%. The DCBP algorithm helps determine the k value accurately. The k value determined in this paper is 5, for which the clustering accuracy is 98.9%; the accuracy is 80% when k is 4 and 92.2% when k is 6. The improved k-means algorithm also runs faster than the AGNES algorithm. The algorithm is better adapted to the clustering of unknown binary protocols and improves both clustering accuracy and speed.
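The abstract does not define its Pearson distance. A common choice, assumed here, is one minus the Pearson correlation coefficient. The sketch below shows that distance together with the assignment step of k-means; the function names are hypothetical, and the DCBP initialization is not reproduced.

```python
import math


def pearson_distance(a, b):
    """Return 1 - r, where r is the Pearson correlation of vectors a and b.

    Assumption: this is the usual correlation distance; the source does
    not state the exact formula it uses.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Correlation is undefined for a constant vector; treat as uncorrelated.
        return 1.0
    return 1.0 - cov / math.sqrt(var_a * var_b)


def assign_to_centers(points, centers):
    """Return, for each point, the index of the nearest center."""
    return [
        min(range(len(centers)), key=lambda i: pearson_distance(p, centers[i]))
        for p in points
    ]
```

Under this definition the distance ranges from 0 (perfectly correlated) to 2 (perfectly anti-correlated), so two messages whose byte values rise and fall together are close even if their absolute values differ.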
Message clustering is one of the main steps of protocol reverse engineering. For private binary protocol packets, current message clustering methods suffer from feature redundancy in message vectorization, and traditional clustering methods have difficulty determining the cluster centers and the number of clusters. Following the idea of n-gram serialization, the sequence item-location matrix of the message is constructed, frequent items are mined, and the message feature vector is constructed, which effectively removes the sequence noise in message vectorization. The silhouette coefficient is used to guide divisive hierarchical clustering, which avoids having to choose the initial number of clusters and the cluster centers, and so realizes the clustering of private binary protocol messages under unsupervised conditions. Testing is carried out on a data set of 7 types of messages from 4 protocols: AIS, DNS, ICMP and ARP. The t-SNE visual interface is used to observe the distri
Protocol reverse engineering can be applied to various security applications, including fuzzing, malware analysis, and intrusion detection. It aims to acquire an unknown protocol's format, semantic, and behavior specifications, where format extraction is the primary task. One subset of the mainstream research utilizes network traffic for the reverse analysis. These approaches leverage various algorithms, such as multiple sequence alignment, frequent itemset mining, and information entropy, to extract format information from messages. However, they are primarily intended to locate keyword fields and have limitations in extracting contextual features or dealing with large data sets. This paper presents ProsegDL, a deep learning-based format extraction tool for binary protocols, with a specially designed method of generating training data sets. ProsegDL leverages image semantic segmentation and siamese network techniques, focusing on extracting the features of fields and identifying field boundaries for fixed-format protocols. The tool is evaluated on six popular protocols. The results show that it achieves up to 13% higher precision and 23% higher recall than the comparison methods when inferring with a small data set, and up to 18% higher precision and 28% higher recall when inferring with a large number of messages.
The Global Positioning System (GPS) and the Inertial Navigation System (INS) are currently the two main types of navigation system. They complement each other very well, which gives a GPS/INS integrated navigation system excellent navigation performance. INS is a time integration system, and its error accumulates with time, so it needs to be synchronized with GPS to correct the error. Software synchronization cannot be performed in real time because of operating system delay, which may cause fluctuations. Compared with software synchronization, hardware synchronization is more accurate. In this paper, an embedded integrated navigation hardware platform based on ARM and DSP is presented. The ARM core is mainly used for control of the entire system and for data acquisition, and the DSP core is responsible for the data fusion navigation algorithm. The platform uses hardware to synchronize the INS clock. The experimental results show that the received navigation data are both comprehensive and accurate. The platform meets the data and clock synchronization requirements of integrated navigation.
Type Length Value (TLV) is one of the main structures commonly used in network protocols. A large number of proprietary protocols, whose specifications are unknown to the public, run on the current Internet as well as in domain-specific Internet of Things (IoT) applications. It is critical to infer the TLV fields within a packet because this information can help network administrators quickly identify abnormal traffic and potential attacks. Inferring TLV fields belongs to the general task of protocol reverse engineering and is particularly challenging for binary protocols, where the boundaries of TLV fields have many possible positions. Existing methods for reverse engineering binary protocols involve many parameters and only work for protocols that strictly follow the conventional TLV format. We extend the concept of TLV to accommodate a broader category of structural patterns in various binary protocols, such as TCP, IP, ModBus, and MQTT. We then design algorithms to automatically extract the extended-TLV fields from packets. Via a series of experiments over several protocols, we demonstrate that our algorithms can accurately and quickly identify the extended-TLV fields in all the tested protocols. Our approach can thus be deployed as a general method for automatically reverse engineering binary protocol formats.
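For readers unfamiliar with the conventional TLV layout that this abstract starts from, a minimal parser might look like the sketch below. It assumes a one-byte type followed by a one-byte length, which is only one of many possible layouts, and the function name is hypothetical. It does not implement the extended-TLV inference the abstract describes, which has to work without knowing the field boundaries in advance.

```python
def parse_tlv(data: bytes):
    """Parse a byte string as a sequence of (type, value) TLV fields.

    Assumed layout: 1-byte type, 1-byte length, then `length` value bytes.
    """
    fields = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated TLV header")
        field_type = data[offset]
        length = data[offset + 1]
        start = offset + 2
        end = start + length
        if end > len(data):
            raise ValueError("TLV value runs past the end of the data")
        fields.append((field_type, data[start:end]))
        offset = end
    return fields
```

With a known layout the parse is a simple linear scan. The difficulty in reverse engineering is that the positions and widths of the type and length bytes are exactly what has to be inferred.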
Many kinds of unknown binary protocols run on the network and are not easy to manage. To ensure the safe and orderly operation of the network, it is necessary to classify the traffic in the network. In this paper, a binary protocol classification method based on a class of classification and a one-dimensional CNN (convolutional neural network) is proposed, which is trained with labels for the protocol data obtained by clustering. The binary protocol message is used directly as the input of the one-dimensional convolutional neural network, and the classification model is trained to classify protocols automatically. The resulting binary protocol classifier can automatically learn the nonlinear relationship between the original input and the expected output. As far as we know, this is the first time that information entropy and CNNs have been applied to binary protocol classification. The experimental results show that the protocol recognition rate reaches up to 98% and that classification takes less time than the clustering method, which indicates that the method is effective.
The Global Positioning System (GPS) provides high-precision positioning information. GPS information from multi-antenna satellite data is acquired for attitude measurement. An embedded multi-antenna satellite data acquisition system is designed in this paper. The system adopts the binary protocol of the GPS receiver. Compared with the NMEA protocol, the binary protocol can acquire more comprehensive raw data for satellite navigation, such as pseudorange and signal-to-noise ratio. The hardware of the system is based on the AM335x ARM Cortex-A8 microcontroller and consists of four GPS receiver units (NV08C-CSM). The software of the system is based on the embedded Linux operating system, for which multi-channel data acquisition programs are designed. The experiments were repeated many times. The results show that the system properly provides GPS raw data for attitude measurement and meets the precision requirements. The hardware platform is small (10×10 cm) and easy to apply in unmanned aerial vehicles (UAV), unmanned surface vehicles (USV), etc.