Data Collection and Data Analysis in Honeypots and Honeynets

Some authors [14,15] add data analysis to the above-mentioned core elements. Data analysis is the ability of a honeynet to analyse the data collected from it; it is used for “understanding, analysing, and tracking the captured probes, attacks or some other malicious activities” [1]. An example of this core element is a combination of security devices, such as a firewall (IPtables), an intrusion prevention system (Cisco IPS) and an intrusion detection system (Snort, Suricata), which can analyse network traffic in detail and return the result of the analysis in a visible way. In this paper we focus on data analysis. Deployment and usage of honeypots and honeynets brings many benefits, e.g. the possibility of discovering new forms of attacks. On the other hand, it also brings problems. The primary motivation for this paper is that several problems exist in the field of data analysis: there are many honeypot implementations that collect data, but in most cases they store it in different formats, or the collected data themselves differ, which makes it difficult to analyse an attack across various types of honeypots. Another problem is transferring the collected data securely from the honeypots to the analysis itself.
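As a minimal illustration of the format problem, the sketch below maps events from two hypothetical honeypot log formats into one common schema before analysis. The field names and record formats are assumptions for illustration, not those of any particular honeypot.

```python
import json
from datetime import datetime, timezone

# Hypothetical raw events: one honeypot logs JSON, another logs delimited text.
json_event = '{"ts": 1700000000, "src": "203.0.113.7", "dport": 445}'
text_event = "2023-11-14T22:13:20Z;198.51.100.9;22;ssh-bruteforce"

def normalize_json(line):
    e = json.loads(line)
    return {
        "timestamp": datetime.fromtimestamp(e["ts"], tz=timezone.utc).isoformat(),
        "source_ip": e["src"],
        "dest_port": e["dport"],
        "label": "unknown",
    }

def normalize_text(line):
    ts, src, port, label = line.split(";")
    return {"timestamp": ts, "source_ip": src, "dest_port": int(port), "label": label}

for event in (normalize_json(json_event), normalize_text(text_event)):
    print(event)  # one schema, regardless of the originating honeypot
```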

Comparison of Empirical Data from Two Honeynets and a Distributed Honeypot Network

…as long as the partner organization is willing to sign a non-disclosure agreement, is willing to share the findings with all the other partners and can provide four IP addresses. Even though four IP addresses are required, only one physical host is used. A RedHat operating system is installed on the host, and three virtual hosts are created using honeyd [12] and assigned an IP address each. The three hosts emulate Windows NT, Windows 98 and Linux RedHat 7.3 operating systems; the fourth IP address is assigned to the physical host itself. All of the data collected from each honeypot is flushed to a centralized data collection database, which is then available for analysis to all of the participating partners. Since the virtual hosts are created using honeyd, Leurre.com can be described as a low-interaction network: the virtual hosts can be configured to run various services and appear to be running various operating systems, but they are not real hosts and their capabilities are limited. However, a worldwide distributed architecture has advantages. For example, one can analyse how attacks are distributed across many different locations in the world at a given time.
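For context, a honeyd configuration along the lines described might look like the sketch below. The template names and addresses are hypothetical, and the personality strings must match entries in the Nmap fingerprint file that honeyd uses; this is an assumed illustration, not the Leurre.com deployment's actual configuration.

```
# Hypothetical honeyd configuration: three virtual hosts on one physical box.
create winnt
set winnt personality "Microsoft Windows NT 4.0 Server SP5-SP6"
add winnt tcp port 80 open
add winnt tcp port 139 open
bind 192.0.2.10 winnt

create win98
set win98 personality "Microsoft Windows 98"
add win98 tcp port 139 open
bind 192.0.2.11 win98

create redhat
set redhat personality "Linux 2.4.7 (X86)"
add redhat tcp port 22 open
add redhat tcp port 80 open
bind 192.0.2.12 redhat
```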

NEMESYS: Enhanced Network Security for Seamless Service Provisioning in the Smart Mobile Ecosystem

Figure 1 shows the system architecture that will be developed within the NEMESYS project. The core of the NEMESYS architecture is a data collection infrastructure (DCI) that incorporates a high-interaction honeyclient and interfaces with virtualized mobile honeypots (VMHs) in order to gather data regarding mobile attacks. The honeyclient and VMHs collect mobile attack traces and provide them to the DCI, where they are enriched by analysis of the data and by accessing related data from the mobile core network and external sources. For example, TCP/IP stack fingerprinting to identify the remote machine’s operating system, and clustering of the traces, are passive methods of data enrichment. DNS reverse name lookup, route tracing, autonomous system identification, and geo-localization are methods to improve the characterization of remote servers which may require access to external sources, possibly in real time. The enriched mobile attack data is made available to the anomaly detection module and to the visualization and analysis module (VAM) running at the mobile network operator site.
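A minimal sketch of one of the enrichment steps mentioned, reverse DNS lookup, using only the Python standard library; the trace dictionary format is an assumption, and route tracing or geo-localization would require external services.

```python
import socket

def enrich(trace):
    """Add a reverse-DNS name to an attack trace (hypothetical dict format)."""
    enriched = dict(trace)
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(trace["remote_ip"])
        enriched["remote_host"] = hostname
    except socket.herror:
        enriched["remote_host"] = None  # no PTR record for this address
    return enriched

print(enrich({"remote_ip": "8.8.8.8", "dest_port": 443}))
```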

Time Series Analysis and Forecast of GDP in Ethiopia: Evidence from Ethiopian Data

The analysis continued with the unit root test of the differenced series. Since the null hypothesis could not be rejected in levels, the same tests were applied to the differences in order to determine the order of integration of the non-stationary time series. The order of integration is the number of times a series must be differenced to become stationary. After second differencing, the tests showed that all variables were stationary. The critical values for the tests were -2.95 and -3.94 at the 5% significance level. The results in Table 2 indicate that the null hypothesis is rejected for the second differences of the five time series variables (GDP, consumption, investment, government expenditure and net export), given that the p-values are less than the 5% level of significance with intercept and trend in the ADF test. This implies that the five time series variables are integrated of order two, I(2). Therefore, the ADF test shows that all series are non-stationary in levels and stationary in the second difference.
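For illustration, an ADF test with intercept and trend on a series and on its second difference might be run as below. The statsmodels call is standard; the series is simulated, not the Ethiopian data.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(42)
y = np.cumsum(np.cumsum(rng.normal(size=200)))  # simulated I(2) series

for name, series in [("levels", y), ("2nd diff", np.diff(y, n=2))]:
    stat, pvalue, *_ = adfuller(series, regression="ct")  # intercept + trend
    print(f"{name:8s} ADF stat = {stat:6.2f}, p-value = {pvalue:.3f}")
```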

Development of software for qualitative and comparative performance analysis between some data ordination algorithms

Insertion Sort had the worst performance of the four algorithms for all vector sizes. Because it is a simple sorting algorithm, this result was expected by the group; despite its slowness, the method remains useful when there are few data to sort, owing to its easy implementation. Merge Sort did not perform well in the tests: its divide-and-recurse approach results in a large number of comparisons, so its performance on fully random vectors was poor, placing it third in the ranking, ahead only of Insertion Sort. Comb Sort, although considered a simple algorithm, obtained a very encouraging result for the vectors used and was the second best in the group's analysis: for the vector of size 40,000 it was approximately 2.7 times slower than the fastest of the evaluated algorithms, and for the vector of 10,000 positions its relative performance improved to only about 1.49 times slower. To the authors' surprise, it was even faster than Quick Sort when sorting the vector of 20,000 indexes. Quick Sort had the best performance for all vector sizes tested and was ranked first; although this result was expected by the team, the margin was still a positive surprise. To give a sense of the stark difference between the slowest and fastest algorithms tested by the group, Insertion Sort was almost 52 times slower than the fastest of the evaluated methods for the vector of size 40,000, and approximately 68 times slower for the vector of size 10,000. The results were satisfactory and very consistent, with a special highlight for Comb Sort, which, despite being a simple sorting algorithm, came close to the sorting time of Quick Sort.
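The paper's own benchmark code is not reproduced here; the sketch below re-creates the shape of the experiment in Python, running the four algorithms on randomly filled vectors of the three sizes mentioned and timing each. Implementations and timings are illustrative only (pure-Python Insertion Sort on 40,000 elements is slow).

```python
import random
import time

def insertion_sort(a):                      # in-place, O(n^2)
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def comb_sort(a):                           # in-place, gap shrinks by 1.3
    gap, swapped = len(a), True
    while gap > 1 or swapped:
        gap = max(1, int(gap / 1.3))
        swapped = False
        for i in range(len(a) - gap):
            if a[i] > a[i + gap]:
                a[i], a[i + gap] = a[i + gap], a[i]
                swapped = True

def merge_sort(a):                          # returns a new sorted list
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def quick_sort(a):                          # returns a new sorted list
    if len(a) <= 1:
        return a
    pivot = a[len(a) // 2]
    return (quick_sort([x for x in a if x < pivot])
            + [x for x in a if x == pivot]
            + quick_sort([x for x in a if x > pivot]))

for n in (10_000, 20_000, 40_000):
    base = [random.randint(0, n) for _ in range(n)]
    for name, fn in [("insertion", insertion_sort), ("comb", comb_sort),
                     ("merge", merge_sort), ("quick", quick_sort)]:
        data = list(base)
        t0 = time.perf_counter()
        fn(data)
        print(f"n={n:>6} {name:<9} {time.perf_counter() - t0:.3f}s")
```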

Why only data mining? A pilot study on inadequacy and domination of data mining technology

In the past, the question was: what happened? With data mining, we can discover what will happen and why (Nakasima et al., 2018 [14]). Data mining integrates various technologies such as statistics, machine learning and databases (Irshad et al., 2018 [15]). It has applications in different disciplines such as medicine, finance, defence, intelligence and so on (Sohail et al., 2019 [8]). The tools of data mining include clustering, classification, association and detection (Muhammad et al., 2017 [16]). Over the decades, data mining has developed in many ways regarding techniques, including extracting associations, neural networks, logic programming, rough sets and decision trees (Zhu et al., 2018 [17]). Furthermore, data mining has gone beyond relational databases to text mining and multimedia data (Ristoski et al., 2018 [18]); it is also involved in information security and detection (Santis et al., 2018 [19]). After so many developments, companies still face challenges such as scalability (Sohail et al., 2017 [8]), but so far data mining works on massive datasets and is moving toward terabyte sizes (Najjar et al., 2018 [20]). With the enormous growth of data in different disciplines, the question arises: can this technology fulfil the needs of extracting petabyte-size data? This falls under the limitations and domination of data mining technology, on which this paper focuses. As data mining involves algorithms, it is important to understand the limitations of data mining algorithms and tools with respect to time and space complexity. For example: can these algorithms complete in reasonable time, and if a problem is decidable, what is its complexity? For future predictions, we need to learn more about the complexity of markets and business (Alonso et al., 2018 [11]) and how data mining is shining on financial platforms.

Spatial characterization and interpolation of precipitation data

The area with the most uncertainty in hydrologic models employed in operational river stage forecasting is the quantitative precipitation forecast (Krzysztofowicz, 1998; Seo et al. 2000). Geostatistics offers the best approach to characterize and interpolate precipitation data, but most studies have been limited to only one or two algorithms with little justification for the choice (Bras and Rodriguez-Iturbe, 1985; Seo et al. 2000; Kyriakidis et al. 2001; Kyriakidis et al. 2004). Past research indicates that the assumptions of spatial correlation and regionalization in hydrological studies are justified even with moderate deviations, because regional analysis still yields more accurate quantile estimates than at-site analysis (Lettenmaier and Potter, 1985; Lettenmaier et al. 1987; Hosking and Willis 1988; Potter and Lettenmaier, 1990; Kyriakidis et al. 2004; Gonzales and Valdes, 2008). Thus, geostatistics is an acceptable methodology for the characterization and interpolation of precipitation data (Delhomme 1978; Creutin and Obled, 1982; Lebel et al. 1987; Azimi-Zonooz et al. 1989; Barancourt et al. 1992; Bacchi and Kottegoda 1995; Goovaerts, 2000; Germann and Joss, 2001; Bernes et al. 2004; Bernes et al. 2009). In past research, kriging methods such as ordinary kriging, kriging with a trend, universal kriging and others have been used to incorporate the heterogeneity and spatial correlation of climate variables into the estimation of climate data values (Delhomme, 1978; Tabios and Salas, 1985; Phillips et al. 1992; Bardosy, 1993; Hammond and Yarie, 1996; Holdaway, 1996; Martinez-Cob, 1996; Ashraf et al. 1997; Nalder and Wein, 1998; Goovaerts, 2000; Llyod, 2005; Bargaoui and Chebbi, 2009).
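As an illustration of the ordinary kriging approach named above, a minimal sketch using the third-party pykrige package is shown below; the station coordinates, rainfall values and variogram model are hypothetical assumptions.

```python
import numpy as np
from pykrige.ok import OrdinaryKriging

# Hypothetical rain-gauge coordinates (degrees) and rainfall totals (mm).
lon = np.array([36.1, 36.4, 36.8, 37.0, 37.3])
lat = np.array([7.2, 7.5, 7.1, 7.8, 7.4])
rain = np.array([120.0, 95.0, 140.0, 88.0, 110.0])

ok = OrdinaryKriging(lon, lat, rain, variogram_model="spherical")
gridx = np.arange(36.0, 37.5, 0.1)
gridy = np.arange(7.0, 8.0, 0.1)
z, ss = ok.execute("grid", gridx, gridy)  # estimates and kriging variance
print(z.shape, ss.shape)
```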

A review of research process, data collection and analysis

The analysis is an important part of research. The analysis of the data depends upon the types of variables and their nature [3]. The first step in data analysis is to describe the characteristics of the variables. The analysis can be outlined as follows. Summarizing data: data are a collection of values of one or more variables. A variable is a characteristic of samples that takes different values for different subjects; values can be numeric, counts, or categories. Continuous variables are those whose numeric values have a meaning and a unit of measurement and may be fractional, such as height, weight, blood pressure or monthly income. Discrete variables, by contrast, are based on a counting process, such as the number of students in different classes or the number of patients visiting the OPD each day [4].
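A small sketch of the variable types just described and a first numeric summary, using pandas with hypothetical sample values:

```python
import pandas as pd

# Hypothetical sample: one continuous and one discrete variable.
df = pd.DataFrame({
    "height_cm": [162.5, 175.0, 168.2, 181.3],  # continuous, has a unit, fractional
    "opd_visits": [12, 7, 9, 15],               # discrete, from a counting process
})
print(df.describe())  # mean, std, quartiles for each variable
```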

Analysis of erroneous data entries in paper based and electronic data collection

…that EDC is cost effective, at least in large studies [2, 10]. In conclusion, we found the greatest source of data error to be data omissions, specifically among categorical variables. Data omissions appeared to follow a MNAR pattern, and this needs to be addressed in a twofold approach: a well-designed EDC system that does not permit blank entries can address omission of recorded data, while extensive training of data collection staff attentive to the socio-cultural context is likely to improve the quality of data collection. Since direct electronic data collection is unlikely to be performed in duplicate, a system that performs real-time logic checks would be highly desirable. EDC, however, may only be suitable if data can be synchronized in real time and accessed from multiple locations, which requires a fairly complex preparatory phase and may not be cost effective for small studies.
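A minimal sketch of the kind of real-time logic check described, rejecting blank or out-of-range categorical entries before a record is saved; field names and allowed values are hypothetical.

```python
REQUIRED_CATEGORICAL = {
    "sex": {"male", "female"},
    "visit_type": {"initial", "follow-up"},
}

def validate_entry(record):
    """Return a list of errors; an empty list means the entry may be saved."""
    errors = []
    for field, allowed in REQUIRED_CATEGORICAL.items():
        value = record.get(field, "").strip().lower()
        if not value:
            errors.append(f"{field}: blank entry not permitted")
        elif value not in allowed:
            errors.append(f"{field}: '{value}' not in {sorted(allowed)}")
    return errors

print(validate_entry({"sex": "", "visit_type": "Initial"}))  # flags the blank field
```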

Data quality assessment and spatial analysis for conservation and sustainable use of Arabica coffee (Coffea arabica L.) genetic resources in Ethiopia

Case 3: Both latitude and longitude errors: a total of 100 consecutive accessions (8833 to 8932) were found in this error group. These accessions are the ones which fell within the geographical region of Nigeria in the visualization map (Figure 3). The problem in this group was complicated, since no accession could be brought to the correct geographical region of Ethiopia (per the passport locality information) by correcting either longitude or latitude or both. Therefore, the collection sheets of each accession were scrutinized for the possible error. The result showed that six-digit numbers had been recorded in the latitude and longitude fields without any units or other spatial data such as datum and UTM zone. Moreover, these numbers were later converted to degrees, minutes and seconds when the data were entered into the database, without knowing the original units, so the final result ended up with such erroneous points. For this group, the data turned out to have been collected in Universal Transverse Mercator (UTM) units, though this was not indicated in the collection sheets. When these UTM coordinates are converted to decimal degrees and projected on the map, they correspond exactly with the passport locality information (Figures 4 and 5).
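A minimal sketch of the UTM-to-decimal-degrees conversion described, using pyproj. It assumes the points were recorded in UTM zone 37N (EPSG:32637), one of the zones covering Ethiopia; the zone choice and the coordinate values are assumptions, since the actual zone was missing from the collection sheets.

```python
from pyproj import Transformer

# UTM zone 37N (EPSG:32637) -> WGS84 decimal degrees (EPSG:4326).
to_wgs84 = Transformer.from_crs("EPSG:32637", "EPSG:4326", always_xy=True)

easting, northing = 358245.0, 818330.0  # hypothetical six-digit UTM values
lon, lat = to_wgs84.transform(easting, northing)
print(f"lat={lat:.5f}, lon={lon:.5f}")  # should now fall inside Ethiopia
```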

Data analysis considerations

Rochette (2011) proposed that four or more air samples should be taken during deployment, to adequately assess the quality of the calculated flux (detection of outliers and technical problems during handling and analysis of samples), and to account for the increase in non-linear rates of gas concentration with deployment time. In this chapter, we reinforce this recommendation, but also acknowledge that a less intensive chamber headspace sampling may be acceptable for certain situations. Any consideration around reducing headspace sampling intensity should be based on minimising the overall uncertainty of the flux estimate. For example, when flux spatial variability is exceptionally high, it may be preferable to deploy a greater number of less-intensively sampled chambers (two or three samples) to improve plot-level flux estimates, even if this comes at the cost of increased uncertainty in individual chamber estimates (see Section on Spatial Variability).
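As a sketch of the flux calculation these recommendations concern, a linear fit over four headspace samples might look like the following; the times and concentrations are hypothetical, and with four points one can at least screen for curvature and outliers.

```python
import numpy as np

# Four headspace samples over a 40-minute chamber deployment (hypothetical).
t = np.array([0.0, 13.3, 26.7, 40.0])        # minutes since chamber closure
c = np.array([0.385, 0.412, 0.436, 0.455])   # gas concentration (ppm)

slope, intercept = np.polyfit(t, c, 1)       # dC/dt in ppm/min
r = np.corrcoef(t, c)[0, 1]
print(f"dC/dt = {slope:.5f} ppm/min, r^2 = {r**2:.3f}")
# Comparing this linear fit against a quadratic one is a simple check for
# the non-linearity in concentration rise that the text warns about.
```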

Analysis of CPI to PPI on Chinese Data

On the basis of the price transmission mechanism, some scholars use correlation coefficient matrices or linear regression to study the relationship between CPI and PPI; these methods are concentrated in the early relationship analyses. CPI and PPI are generally considered non-stationary series, so a model cannot be established on the levels directly. Some scholars therefore consider the long-term and short-term relationship between CPI and PPI using co-integration analysis and error correction models. Mahdavi and Zhou's study shows that there is a long-term equilibrium relationship between CPI and PPI; using a co-integration test, they found that a co-integration relationship exists (Mahdavi and Zhou, 1997). Considering the causal relationship between CPI and PPI, some scholars use the Granger causality test to study whether causality exists between them. Huang Zhilin selected the monthly CPI and PPI data from January 2005 to December 2013 in Shenzhen and studied their relationship using a non-linear Granger causality test. Silver and Wallace selected monthly CPI and PPI data for America and used the Granger causality test for empirical analysis; the results were the same as Chen Yu's (Lew and Wallace, 1980). T. E. Clark selected CPI and PPI data for America to establish a vector auto-regressive model and found that the price transmission mechanism from PPI to CPI is not significant; there is no positive conduction relationship from PPI to CPI (Todd E. Clark, 1995). Chen Jian and Mei Mei analyzed the relationship between CPI and PPI; the results showed that CPI and PPI Granger-cause each other and can predict each other's future trend (Chen Jian and Mei Mei, 2009). Xiao Songhua and Wu Xu's research showed that PPI Granger-causes CPI, which indicates that the trend of CPI can be predicted by PPI (Xiao Songhua and Wu Xu, 2009). Song Jinqi and Shu Xiaohui established a VAR model and analyzed the relationship between CPI and PPI; the results showed that the positive conduction relationship from PPI to CPI is not significant in the short term but significant in the long term (Wang Guirong, 2009).
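For illustration, a Granger causality test of the kind these studies run might look like the sketch below with statsmodels; the series are simulated (with PPI changes leading CPI changes by one month), not actual Chinese data, and the series are differenced first because CPI and PPI are generally non-stationary in levels.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n = 120
d_ppi = rng.normal(0.2, 1.0, n)                               # monthly PPI changes
d_cpi = 0.5 * np.concatenate(([0.0], d_ppi[:-1])) + rng.normal(0.1, 0.5, n)

df = pd.DataFrame({"d_cpi": d_cpi, "d_ppi": d_ppi})
# Tests whether lagged values of the second column help predict the first.
grangercausalitytests(df[["d_cpi", "d_ppi"]], maxlag=4)
```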

Seismic Data Collection with Shakebox and Analysis Using MapReduce

In the past decade, the MapReduce programming model [10] has emerged as a popular framework for large-scale data analysis. The key idea of MapReduce is to divide the data into chunks, which are processed in parallel. Several open-source MapReduce frameworks have been developed in recent years. In particular, Hadoop [11], the most prevalent implementation of MapReduce, has been used extensively by companies and research communities on a very large scale. In this paper, we adopt Hadoop and MapReduce for data processing and present and analyze our experimental results. Specifically, we design Map and Reduce functions suited to the application of seismic big data.
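The paper's actual Map and Reduce functions are not shown in this excerpt; as a generic sketch of the Hadoop Streaming style they might follow, consider a job that computes the maximum peak acceleration per station. The record format and field names are assumptions.

```python
#!/usr/bin/env python3
# --- mapper.py ---
# Hypothetical input record per line: "station_id,timestamp,peak_acceleration".
import sys

for line in sys.stdin:
    try:
        station, _ts, accel = line.strip().split(",")
        print(f"{station}\t{accel}")           # emit key<TAB>value
    except ValueError:
        continue                               # skip malformed records

# --- reducer.py ---
# Hadoop sorts mapper output by key, so records for a station arrive together.
import sys

current, best = None, float("-inf")
for line in sys.stdin:
    station, accel = line.strip().split("\t")
    accel = float(accel)
    if station != current:
        if current is not None:
            print(f"{current}\t{best}")
        current, best = station, accel
    else:
        best = max(best, accel)
if current is not None:
    print(f"{current}\t{best}")
```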

Qualitative Data Collection and Analysis Methods: The INSTINCT Trial

Qualitative research exists on a spectrum of scientific investigations: at one end, qualitative research seeks to explore meanings and opinions and to generate hypotheses regarding decision-making and behavior by using inductive reasoning; at the other end, quantitative studies test hypotheses by using objective, measurable processes and deductive logic. The INSTINCT study provides a conceptual framework and strategy for constructing a qualitative data collection and analysis plan to examine barriers, attitudes, and beliefs toward the adoption of a challenging therapy in EM. It then uses these results to guide a subsequent educational intervention, which is in turn tested for efficacy using traditional quantitative methods. By pairing such processes, qualitative research may offer an important adjunctive tool for helping translate knowledge from clinical trials and other scientific inquiry into broadly accepted clinical practice.

Big Data Analysis on WSN for Risk Analysis on Different Data

The term big data refers to data sets so large or complex that traditional data processing applications are inadequate. According to IBM, by 2012, 90% of the data in the world had been generated in the preceding two years. Big data techniques are therefore used to gather, process and analyze large data sets. A major source of big data is wireless sensor networks, which are collections of many hundreds or thousands of sensor nodes. Because of the sensor nodes' small size, flexibility, low cost and other characteristics, wireless sensor networks are used for monitoring the environment, industry, the military and other domains. Wireless sensor networks also have limitations and challenges: time criticality, the large geographical areas to be covered, the absence of fixed routing paths, and limited bandwidth. A large amount of data can be gathered using a real-time big data gathering algorithm (RTBDG), which achieves high performance in energy consumption and network lifetime when gathering big data in real time. We propose a new energy-efficient big data aggregation protocol for wireless sensor networks. The key idea of this algorithm is to recursively divide the sensor network into partitions that are symmetrical about a centroid node. Furthermore, a set of cluster heads in the middle of each partition is defined in order to aggregate data from cluster members and transmit them to cluster heads at the next hierarchical level. The new algorithm adopts hierarchical clustering, which prevents cluster heads from sending their data over long distances, so the energy consumption of the sensor nodes is significantly reduced. In this work, better efficiency is achieved than in existing work, because the existing work depends only on the physical structure whereas our work also implements a logical structure; the overall output is 70%.
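A minimal sketch of the recursive-partitioning idea described above, under the assumption of a 2-D node layout: split the node set around its centroid and pick, for each partition, the node nearest the partition centre as cluster head. The function names and the split rule are illustrative assumptions, not the paper's algorithm.

```python
import math
from statistics import mean

def centroid(nodes):
    return (mean(x for x, _ in nodes), mean(y for _, y in nodes))

def head(nodes):
    """Cluster head: the node nearest the centre of its partition."""
    c = centroid(nodes)
    return min(nodes, key=lambda n: math.dist(n, c))

def cluster_heads(nodes, depth):
    """Recursively partition around the centroid; one head per partition."""
    if depth == 0 or len(nodes) <= 2:
        return [head(nodes)]
    cx, _ = centroid(nodes)
    left = [n for n in nodes if n[0] <= cx]
    right = [n for n in nodes if n[0] > cx]
    heads = []
    for part in (left, right):
        if part:
            heads += cluster_heads(part, depth - 1)
    return heads

nodes = [(1, 2), (2, 1), (8, 9), (9, 8), (5, 5), (7, 2)]
print(cluster_heads(nodes, depth=2))
```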

Data Correlation and Economic Analysis of Wind Regimes of Manga Hill in Nyamira County, Kenya

…energy sources in the world. The Kenyan government has a new energy policy that directs its state-owned energy company, KenGen, as well as the country's independent power producers, to eliminate fossil-fuel-powered generation. The country's energy plan outlines how the majority of its electricity will come from renewable sources at utility, commercial and industrial scale, and in off-grid connections. An understanding of the characteristics of the wind is critical to all aspects of wind energy generation, from the identification of suitable sites, to predictions of the economic viability of wind farm projects, to the design of the wind turbines themselves. In this study, data from the Kisii meteorological station were compared and correlated with data obtained from the Manga Hill site, to determine how the data collected at the two sites vary; Pearson's correlation coefficient was found to be 0.85. In addition, an economic evaluation of the wind regimes of Manga Hill was carried out by analysing three different wind turbines chosen to simulate performance using their specific power curves. The Bergey Excel-10 turbine model showed the best performance, with a high energy yield at minimum cost with increasing hub height. To utilize the wind energy, installation of a 10 kW rated horizontal-axis wind turbine, with a rotor diameter between 1 m and 7 m, at 30 m hub height would be economically viable at the Manga Hill site for domestic power generation: lighting and small household electrical applications supplementing the grid-connected electricity in the region.
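For illustration, the site-to-site correlation could be computed as in the sketch below; the wind-speed values are hypothetical placeholders, not the study's measurements.

```python
import numpy as np

# Hypothetical monthly mean wind speeds (m/s) at the two sites.
kisii = np.array([3.1, 3.4, 2.9, 3.8, 4.0, 3.5, 3.2, 3.6, 3.9, 3.3, 3.0, 3.7])
manga = np.array([3.6, 3.9, 3.2, 4.4, 4.6, 4.0, 3.5, 4.1, 4.5, 3.8, 3.4, 4.2])

r = np.corrcoef(kisii, manga)[0, 1]  # Pearson's correlation coefficient
print(f"r = {r:.2f}")
```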

Collection and Analysis of data for Inter-domain Traffic Engineering

Network operators are not always aware of the traffic distribution that would occur in a network after failure events. By using the data described in Section 2, operators can study the impact of failures of single hardware elements. If the ISP has reliable knowledge of groups of hardware that could fail simultaneously (Section 2.4), a more complete analysis of network failures can be performed [Kiese et al. 2009]. For outbound traffic, network operators might need to simulate the BGP decision process. For inbound traffic, an AS should also consider the relationships and policies of external ASes to improve its traffic estimations. Since simulating the network under different failures is challenging, ISPs should use historical data to assess the quality of their estimations. Operators could obtain this data, for instance, by storing routing and traffic data while the network is under maintenance.

Fast Data Collection for High Dimensional Data in Data Mining

…the class, discounted by a term that takes the mutual dependencies into account. Another approach is hierarchical clustering for feature selection. Hierarchical algorithms generate clusters that are placed in a cluster tree, commonly known as a dendrogram. Clusterings are obtained by extracting those clusters that are situated at a given height in this tree. It has been shown that good classifiers can be built using a small number of attributes located at the centers of the clusters identified in the dendrogram. This type of data compression can be achieved with little or no penalty in terms of the accuracy of the classifier produced, and it highlights the relative importance of attributes.
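A minimal sketch of dendrogram-based feature selection along these lines, using scipy: cluster the features by correlation distance, cut the tree at a given height, and keep one representative attribute per cluster. The data, distance definition and cut height are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic data: features 1 and 5 nearly duplicate features 0 and 4.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
X[:, 1] = X[:, 0] + rng.normal(0, 0.1, 200)
X[:, 5] = X[:, 4] + rng.normal(0, 0.1, 200)

# Distance between features: 1 - |correlation|, in condensed form for linkage.
dist = 1 - np.abs(np.corrcoef(X, rowvar=False))
Z = linkage(dist[np.triu_indices(8, k=1)], method="average")

labels = fcluster(Z, t=0.5, criterion="distance")  # cut dendrogram at height 0.5
print(labels)  # features sharing a label form one cluster; keep one per cluster
```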

Triangulation in data collection

…the police, with another four stating that they had been kept on hold or that the police had taken a long time to answer their call (n = 70). The large majority of the sample noted […]


Workflow Solutions Data Collection, Data Review and Data Management

MGC Diagnostics believes that the data belongs to the facility and should not be controlled in any way by any manufacturer. An open SQL database architecture allows authorized individuals to develop and run queries. This truly open SQL database ensures that data will always be accessible to the user: data is not encrypted by MGC Diagnostics or made dependent on it in any way.

