unsupervised machine learning

Top PDF unsupervised machine learning:

An Improved Unsupervised Machine Learning Technique for Tweet Summarization

Unsupervised machine learning lets a machine learn much as a human learns from past experience. The data set is not labelled, so the machine searches for patterns to work out what kind of data it is dealing with. Commonly used unsupervised machine learning algorithms include clustering, anomaly detection, and associative clustering. As mentioned in the related work, text summarization, page ranking, text ranking, and LexRank are the methods used for summarization. In this paper we use the LexRank algorithm to summarize the tweets posted by a particular person on a particular topic during a particular time period.
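
The excerpt names LexRank but gives no implementation details, so the following is only a minimal sketch of the general idea: build a graph whose nodes are tweets, link tweets whose TF-IDF cosine similarity exceeds a threshold, and rank them with a PageRank-style iteration. The function names, the threshold of 0.1, the damping factor of 0.85, and the omission of self-links are all assumptions made for illustration, not the authors' settings.

```python
import math
from collections import Counter

PUNCT = ".,!?;:\"'()"

def tokenize(text):
    # Lowercase and strip surrounding punctuation; drop empty tokens.
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]

def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def lexrank(sentences, threshold=0.1, damping=0.85, iterations=50):
    """Return one centrality score per sentence (simplified LexRank)."""
    n = len(sentences)
    bags = [Counter(tokenize(s)) for s in sentences]
    # Document frequency and a smoothed IDF (assumed weighting scheme).
    df = Counter(t for bag in bags for t in bag)
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = [{t: c * idf[t] for t, c in bag.items()} for bag in bags]
    # Unweighted similarity graph, without self-links.
    adj = [[1.0 if i != j and cosine(vecs[i], vecs[j]) >= threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(adj[j][i] * scores[j] / degree[j]
                            for j in range(n) if degree[j] > 0)
            for i in range(n)
        ]
    return scores

tweets = [
    "The match tonight was great and the team played well",
    "Great match tonight the team played really well",
    "The team played a great match",
    "I had pasta for lunch",
]
scores = lexrank(tweets)
# The highest-scoring tweets would be kept as the summary.
ranked = sorted(range(len(tweets)), key=lambda i: scores[i], reverse=True)
```

In this toy input the off-topic tweet has no neighbours in the graph, so it receives the lowest score; a summary would be formed from the top of `ranked`.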

Unsupervised Machine Learning for Networking:Techniques, Applications and Research Challenges

VI. CONCLUSIONS We have provided a comprehensive survey of machine learning tasks and latest unsupervised learning techniques and trends along with a detailed discussion of the applications of these techniques in networking related tasks. Despite the recent wave of success of unsupervised learning, there is a scarcity of unsupervised learning literature for computer networking applications, which this survey aims to address. The few previously published survey papers differ from our work in their focus, scope, and breadth; we have written this paper in a manner that carefully synthesizes the insights from these survey papers while also providing contemporary coverage of recent advances. Due to the versatility and evolving nature of computer networks, it was impossible to cover each and every application; however, an attempt has been made to cover all the major networking applications of unsupervised learning and the relevant techniques. We have also presented concise future work and open research areas in the field of networking, which are related to unsupervised learning, coupled with a brief discussion of significant pitfalls and challenges in using unsupervised machine learning in networks.

Unsupervised Machine Learning for Networking:Techniques, Applications and Research Challenges

is designed with the current network’s condition through their monitoring sources. Operators who manage these requirements by wrestling with complexity manually will definitely welcome any respite that they can get from (semi-)automated unsupervised machine learning. As highlighted by [303], for ML to become pervasive in networking, the “semantic gap” (the key challenge of transferring ML results into actionable insights and reports for the network operator) must be overcome. This can facilitate a shift from a reactive interaction style for network management, where the network manager is expected to check maps and graphs when things go wrong, to a proactive one, where automated reports and notifications are created for different services and network regions. Ideally, this would be abstract yet informative, in the manner of Google Maps Directions, e.g. “there is heavier traffic than usual on your route”, together with suggestions about possible actions. This could be coupled with an automated correlation of different reports coming from different parts of the network. This will require a move beyond mere notifications and visualizations to more substantial synthesis through which potential sources of problems can be identified. Another example relates to making measurements more user-oriented. Most users would be more interested in QoE than in QoS, i.e., in how the current condition of the network affects their applications and services rather than in raw QoS metrics alone. The development of measurement objectives should be from a business-eyeball perspective, and not only through presenting statistics gathered through various tools and protocols such as traceroute, ping, BGP, etc., with the burden of putting the various pieces of knowledge together being on the user.

Credit Card Fraud Detection using Unsupervised Machine Learning

These challenges can be addressed with unsupervised machine learning. Researchers are still working on new techniques to identify and prevent such fraud, and such methods are required to detect it correctly and effectively [1]. Our aim here is to identify fraudulent transactions while reducing the number of transactions incorrectly classified as fraud. Credit card fraud identification is a typical standard sample of variety.

Multivariate Unsupervised Machine Learning for Anomaly Detection in Enterprise Applications

of Technology, robertl@kth.se. Abstract: Existing application performance management (APM) solutions lack robust anomaly detection capabilities and root cause analysis techniques that do not require manual effort and domain knowledge. In this paper, we develop a density-based unsupervised machine learning model to detect anomalies within an enterprise application, based upon data from multiple APM systems. The research was conducted in collaboration with a European automotive company, using two months of live application data. We show that our model detects abnormal system behavior more reliably than a commonly used outlier detection technique and provides information for detecting root causes.
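
The abstract does not say which density-based model is used, so the sketch below only illustrates the general principle behind density-based anomaly detection: a point whose nearest neighbours are far away lies in a sparse region and is scored as more anomalous. The function name `knn_density_scores`, the choice of mean k-nearest-neighbour distance as the score, and the toy data are assumptions for illustration, not the paper's method.

```python
import math

def knn_density_scores(points, k=3):
    """Score each point by the mean distance to its k nearest neighbours.

    A larger score means a sparser neighbourhood (lower density).
    Requires at least two points.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores

# Hypothetical two-metric measurements (e.g. response time, CPU load).
metrics = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_density_scores(metrics, k=2)
most_anomalous = scores.index(max(scores))
```

With multivariate data the features would normally be scaled first, since Euclidean distance is sensitive to units.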

An automatic taxonomy of galaxy morphology using unsupervised machine learning

2 Centre for Astrophysics Research, School of Physics, Astronomy & Mathematics, University of Hertfordshire, Hatfield AL10 9AB, UK. Accepted 2017 September 8. Received 2017 September 5; in original form 2015 July 3. ABSTRACT We present an unsupervised machine learning technique that automatically segments and labels galaxies in astronomical imaging surveys using only pixel data. Distinct from previous unsupervised machine learning approaches used in astronomy we use no pre-selection or pre-filtering of target galaxy type to identify galaxies that are similar. We demonstrate the technique on the Hubble Space Telescope (HST) Frontier Fields. By training the algorithm using galaxies from one field (Abell 2744) and applying the result to another (MACS 0416.1−2403), we show how the algorithm can cleanly separate early and late type galaxies without any form of pre-directed training for what an ‘early’ or ‘late’ type galaxy is. We then apply the technique to the HST Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) fields, creating a catalogue of approximately 60 000 classifications. We show how the automatic classification groups galaxies of similar morphological (and photometric) type and make the classifications public via a catalogue, a visual catalogue and galaxy similarity search. We compare the CANDELS machine-based classifications to human-classifications from the Galaxy Zoo: CANDELS project. Although there is not a direct mapping between Galaxy Zoo and our hierarchical labelling, we demonstrate a good level of concordance between human and machine classifications. Finally, we show how the technique can be used to identify rarer objects and present lensed galaxy candidates from the CANDELS imaging.

Unsupervised Machine Learning Approach for Tigrigna Word Sense Disambiguation

All human languages have words that can mean different things in different contexts. Word sense disambiguation (WSD) is an open problem in natural language processing concerned with identifying which sense (i.e. meaning) of a word is used in a sentence when the word has multiple meanings (polysemy). We use unsupervised machine learning techniques to address the problem of automatically deciding the correct sense of an ambiguous word in Tigrigna texts based on its surrounding context. Because of the lack of sufficient training data, we report experiments on four selected ambiguous Tigrigna words: መደብ, read as “medeb”, has three different meanings (program, traditional bed, and grouping); ሓለፈ, read as “halefe”, has four different meanings (pass, promote, boss, and pass away); ሃደመ, read as “hademe”, has two different meanings (running and building a house); and ከበረ, read as “kebere”, has two different meanings (respecting and expensive). Finally, we tested five clustering algorithms (simple k-means; hierarchical agglomerative clustering with single, average, and complete linkage; and expectation maximization) using the existing implementations in the Weka 3.8.1 package. The “Use training set” evaluation mode was selected to train the selected algorithms on the preprocessed dataset. We evaluated the algorithms on the four ambiguous words; the best accuracy, in the range of 67 to 83.3, was achieved by EM, which is an encouraging result.

Segmenting accelerometer data from daily life with unsupervised machine learning

Further, the data can also be explored using automated methods such as machine learning. Machine learning methods that use labelled data, referred to as supervised machine learning, have previously been used for activity type classification and energy expenditure estimation [10–13]. Although such methods have shown potential for physical activity intensity assessment, they have disadvantages similar to the cut-points approach in that the trained classifier may overfit to the specific experimental conditions under which it was trained. Unsupervised machine learning on the other hand has received less attention in relation to physical activity intensity assessment. These methods are data-driven, allow identification of the characteristic states in the data, and can be applied to free-living data directly. Note that they are called states rather than categories, because they are defined by a Markov model rather than by absolute thresholds. As a result, they do not require time consuming and expensive calibration studies including a year of work to plan and conduct the study, they do not require costs related to exercise laboratory usage, and they may avoid arbitrary decisions in the design of the cut-point approach.

Customer clustering in the health insurance industry by means of unsupervised machine learning

To ensure competitiveness and relevancy in today’s highly digitised world, companies need to ensure that their focus is continuously on the client and on the experience they provide, without negatively affecting the organisation’s bottom line. A crucial step to achieving this is to get to know one’s customer base. With the vast amount of data available to it, a health insurance company can leverage unsupervised machine learning techniques to segment its customers. This enables organisations to take a more tailored approach to their customers, identify market growth opportunities and gain competitive advantage.

Unsupervised machine learning applied to scanning precession electron diffraction data

Abstract: Scanning precession electron diffraction involves the acquisition of a two-dimensional precession electron diffraction pattern at every probe position in a two-dimensional scan. The data typically comprise many more diffraction patterns than the number of distinct microstructural volume elements (e.g. crystals) in the region sampled. A dimensionality reduction, ideally to one representative diffraction pattern per distinct element, may then be sought. Further, some diffraction patterns will contain contributions from multiple crystals sampled along the beam path, which may be unmixed by harnessing this oversampling. Here, we report on the application of unsupervised machine learning methods to achieve both dimensionality reduction and signal unmixing. Potential artefacts are discussed and precession electron diffraction is demonstrated to improve results by reducing the impact of bending and dynamical diffraction so that the data better approximate the case in which each crystal yields a given diffraction pattern.
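
The abstract does not name the methods applied. As one illustration of how a single factorization can give both dimensionality reduction and signal unmixing, the sketch below uses non-negative matrix factorization (NMF) with multiplicative updates; NMF is an assumption chosen here for illustration, and the toy matrix, function names, rank, and iteration count are hypothetical. Each row of `v` plays the role of one flattened pattern, each row of `h` a recovered component pattern, and each row of `w` the mixing weights of those components at one probe position.

```python
import random

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    # Squared Frobenius norm of the residual v - w @ h.
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix v (m x n) into w (m x rank) @ h (rank x n)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # Multiplicative update for h: h *= (w^T v) / (w^T w h).
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)]
             for a in range(rank)]
        # Multiplicative update for w: w *= (v h^T) / (w h h^T).
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(m)]
    return w, h

# Four toy "patterns": two pure components, their sum, and a scaled copy.
v = [[1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 1, 1, 1],
     [2, 0, 2, 0]]
w, h = nmf(v, rank=2)
```

Because the factors are constrained to be non-negative, a mixed pattern such as the third row can only be explained as an additive combination of components, which is what makes this family of methods a natural fit for unmixing overlapping signals.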

Identifying strong lenses with unsupervised machine learning using convolutional autoencoder

In this paper, we develop a new unsupervised machine learning technique comprised of a feature extractor, a convolutional autoencoder, and a clustering algorithm consisting of a Bayesian Gaussian mixture model. We apply this technique to visual band space-based simulated imaging data from the Euclid Space Telescope using data from the strong gravitational lenses finding challenge. Our technique promisingly captures a variety of lensing features such as Einstein rings with different radii, distorted arc structures, etc., without using predefined labels. After the clustering process, we obtain several classification clusters separated by different visual features which are seen in the images. Our method successfully picks up ∼63 per cent of lensing images from all lenses in the training set. With the assumed probability proposed in this study, this technique reaches an accuracy of 77.25 ± 0.48 per cent in binary classification using the training set. Additionally, our unsupervised clustering process can be used as the preliminary classification for future surveys of lenses to efficiently select targets and to speed up the labelling process. As the starting point of the astronomical application using this technique, we not only explore the application to gravitationally lensed systems, but also discuss the limitations and potential future uses of this technique.

An unsupervised machine learning method for assessing quality of tandem mass spectra

of spectra [10-15]. Based on defined features, these methods assess the quality of tandem mass spectra using supervised machine learning, which requires labelled training datasets to train a classifier. The trained classifier is then used to classify spectra as high-quality or poor-quality. Ideally, the training set should be validated by some peptide identification algorithms or manual checking, i.e., the set should be correctly labelled, with no or very few falsely labelled spectra. However, this information is hard to obtain prior to peptide identification for a new dataset. Even worse, tandem mass spectrometers may produce different spectra for the same peptide under different experimental conditions. A classifier trained on one dataset may not be effective on another. Therefore, unsupervised machine learning methods are appealing for assessing the quality of tandem mass spectra. In [16], we applied weighted k-means to classify tandem mass spectra into a high-quality cluster and a poor-quality cluster, based on the features defined in [6].

Analyzing Arizona OSHA Injury Reports Using Unsupervised Machine Learning

As construction continues to be a leading industry in the number of injuries and fatalities annually, several organizations and agencies are working to minimize the number of injuries and fatalities. The Occupational Safety and Health Administration (OSHA) is one such agency; it works to assure safe and healthful working conditions for working men and women by setting and enforcing standards and by providing training, outreach, education and assistance. Given the large databases of OSHA historical events and reports, a manual analysis of the content of fatality and catastrophe investigations is a time-consuming and expensive process. This paper aims to evaluate the strength of unsupervised machine learning and Natural Language Processing (NLP) in supporting safety inspections and reorganizing the accident database at the state level. After collecting construction accident reports from the OSHA Arizona office, the methodology consists of preprocessing the accident reports and weighting terms in order to apply a data-driven, unsupervised K-Means-based clustering approach. The proposed method groups the collected reports into four clusters, each corresponding to a type of accident. The results show that construction accidents in the state of Arizona are caused by falls (42.9%), being struck by objects (34.3%), electrocutions (12.5%), and trench collapses (10.3%). The findings of this research provide state and local agencies with a customized presentation of the accidents that fits their regulations and weather conditions. What is applicable to one climate might not be suitable for another; therefore, such a rearrangement of the accident database at the state level is a necessary prerequisite for enhancing local safety applications and standards.

Unsupervised Machine Learning on Encrypted Data

Angela Jäschke and Frederik Armknecht, University of Mannheim, Germany. Abstract. In the context of Fully Homomorphic Encryption, which allows computations on encrypted data, Machine Learning has been one of the most popular applications in the recent past. All of these works, however, have focused on supervised learning, where there is a labeled training set that is used to configure the model. In this work, we take the first step into the realm of unsupervised learning, which is an important area in Machine Learning and has many real-world applications, by addressing the clustering problem. To this end, we show how to implement the K-Means-Algorithm. This algorithm poses several challenges in the FHE context, including a division, which we tackle by using a natural encoding that allows division and may be of independent interest. While this theoretically solves the problem, performance in practice is not optimal, so we then propose some changes to the clustering algorithm to make it executable under more conventional encodings. We show that our new algorithm achieves a clustering accuracy comparable to the original K-Means-Algorithm, but has less than 5% of its runtime.
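
To see where the division mentioned in the abstract comes from, it may help to look at ordinary K-Means on unencrypted data. The sketch below is plain Lloyd's iteration with fixed initial centroids; it is not the paper's encrypted variant, and the function name, toy points, and iteration count are assumptions for illustration. The marked line is the centroid update, where a sum of coordinates is divided by the cluster size.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain (unencrypted) Lloyd's K-Means with caller-supplied initial centroids."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for old, members in zip(centroids, clusters):
            if not members:
                updated.append(old)  # keep an empty cluster's centroid in place
                continue
            # This is the division: coordinate sum / cluster size.
            updated.append(tuple(sum(m[d] for m in members) / len(members)
                                 for d in range(len(old))))
        centroids = updated
    return centroids, clusters

points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
```

The assignment step also relies on comparisons (choosing the minimum distance), another operation that is straightforward on plaintext values.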

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS

Machine learning is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases. A learner can take advantage of examples (data) to capture characteristics of interest of their unknown underlying probability distribution. Data can be seen as examples that illustrate relations between observed variables.

Unsupervised Machine Learning based Documents Clustering in Urdu

amongst information objects. In this kind of algorithm, the user cannot specify the number of clusters as an input, as one can in k-means. However, like k-medoids, it finds "exemplars", members of the input set that are representative of clusters [25]. Various strategies have been adopted to capture semantic correlations amongst documents [26]. A well-known tool, WordNet, has been used to improve the semantic association amongst words, such as synonyms [27]. Additional ontology-based research, which focuses on semantic relationships between words, is also included [28, 29]. A Chinese news clustering approach has been proposed that uses a neural network language model [30]. K-nearest neighbour, k-means and support vector machines are employed for Marathi news clustering [31]. Agglomerative hierarchical clustering has been proposed for Urdu ligature recognition, with Naïve Bayes, decision tree, K-nearest neighbour and linear discriminant analysis also used for classification [32]. A detailed study on Urdu document images has been conducted using various clustering algorithms such as the self-organizing map, K-means and hierarchical clustering [33]. Urdu ligature organization has been accomplished using a deep neural network, with a corpus of 2430 ligatures and an accuracy of 73.13% [34]. Table 1 shows the most closely related work on Urdu document clustering.

Anomaly Detection in Sensor Data Using Unsupervised Machine Learning

ABSTRACT: Pervasive sensing is one of the most prominent technologies being adopted by the current process industry. Process plants are heavily equipped with wireless sensors for process monitoring in locations where human intervention needs to be limited. Thus, a major challenge with these numerous sensors is storing and analyzing the large volume of sensor data streams. This paper focuses on sensor data analysis and anomaly detection specific to the process sector, because the placement of these sensors and the nature of the data they generate follow a specific pattern during process flow. This data is more structured than other types of big data, which tend to be unstructured. There is no assurance that any single algorithm can produce optimized results, so this paper presents a generic framework with an ensemble of methods such as probability and statistics, neural networks, and clustering. Here the neural net is a supervised learning model that predicts new data based on training data, but neural nets predict unseen data poorly. For that reason, clustering is used as an unsupervised learning model to efficiently handle concept drift in the sensor data stream. These solutions are applied to various data scenarios with practical means to improve the prediction and anomaly detection accuracy for equipment as well as process flows. To the best of our knowledge, no single framework is available to fully analyse sensor data streams in terms of independent, correlation-based, and group-wise analysis with respect to process flow segmentation and process and sub-process hierarchy analysis.

Network Attack Detection Using an Unsupervised Machine Learning Algorithm

Mukkamala et al. [22] apply an artificial intelligence technique that involves the Artificial Neural Network (ANN) and the Support Vector Machine (SVM) algorithms to detect network traffic attacks. Both the SVM and the ANN achieved accuracies better than 99%. The SVM had slightly higher performance, although the difference was not statistically significant. However, the SVM was significantly faster than the ANN. For training, the SVM took 52 to 211 seconds, whereas the ANN required 30 to 38 minutes. For testing, the SVM took 1 to 16 seconds, while the ANN again took over 30 minutes. In addition to comparing the performance of the SVMs and ANNs, they ranked the input features by applying a feature selection approach.

Using unsupervised machine learning for fault identification in virtual machines

There are varying degrees of autonomy within self-healing systems. This is largely dependent upon the type of computing environment, management style, and learning algorithms or primitives used. The latter topic can broadly be summarised as the difference between reactive versus proactive (i.e. supervised and unsupervised) strategies, respectively. Reactive solutions are constrained to resolving faults only after they have been previously observed, a fortiori. In order to realise fully self-healing systems, a shift must occur from supervised to unsupervised learning strategies. Unsupervised strategies allow for this shift by anticipating faults in circumstances that have not been previously observed and, principally, offer the highest potential degree of reduction in human intervention. However, the use of such techniques comes with costs, including potentially higher rates of error and a lack of scrutability for some types of errors.

Opinion Mining using Supervised and Unsupervised Machine Learning Approaches

Accordingly, in this study we attempt to identify a straightforward yet functional methodology for sentiment analysis on Twitter. The study aims to investigate machine learning techniques for the analysis of movie reviews on Twitter. Various machine learning methods have been used, some of them supervised and others unsupervised. Large organizations these days invest in analyzing these opinions in order to assess their products or services by learning the public's feedback toward their business. The process of determining customers' feelings toward particular products or services, whether positive or negative, is called sentiment analysis. Most of these approaches use machine learning techniques. Machine learning techniques are diverse and differ in performance.