Top PDF Cluster Analysis of Business Data

Cluster Analysis of Business Data

The paper is organized as follows. Section 2 presents the models of AHCA used in the present work. Subsection 2.1 is devoted to the clustering of variables based on the basic affinity coefficient in a classical context. Subsection 2.2 presents an extension of the basic affinity coefficient to the case of a classical three-way data table, which corresponds to a particular case of the weighted generalized affinity coefficient. Subsection 2.3 is dedicated to the clustering of symbolic objects described by modal variables, based on the weighted generalized affinity coefficient, in the field of Symbolic Data Analysis. In Section 3 we present the best partitions obtained by applying AHCA to a dataset from the business area. There, in order to cluster groups of individuals, we apply two different strategies based on the extensions of the basic affinity coefficient referred to in Subsections 2.2 and 2.3, respectively. Finally, Section 4 contains some concluding remarks about the work and its results.
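As a rough illustration of the similarity measure named above, the sketch below computes an affinity coefficient between two columns of non-negative counts. It assumes the commonly used definition in which each column is first converted to a profile (relative frequencies) and the coefficient is the sum of the square roots of the products of matching profile entries. The excerpt does not state the formula, so this definition, the function name `affinity` and the sample counts are all assumptions.

```python
import math

def affinity(col_a, col_b):
    """Affinity between two non-negative columns (assumed definition:
    sum over rows of sqrt(profile_a[i] * profile_b[i]))."""
    total_a, total_b = sum(col_a), sum(col_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each column needs a positive total")
    return sum(
        math.sqrt((a / total_a) * (b / total_b))
        for a, b in zip(col_a, col_b)
    )

# Hypothetical count data for three variables observed on four individuals
counts_x = [10, 20, 30, 40]
counts_y = [20, 40, 60, 80]   # same profile as counts_x
counts_z = [40, 0, 0, 0]      # mass concentrated on the first individual

print(affinity(counts_x, counts_y))  # close to 1: identical profiles
print(affinity(counts_x, counts_z))  # noticeably smaller
```

Under this definition the coefficient lies between 0 and 1, and it equals 1 exactly when the two profiles coincide, which is why it can serve as a similarity measure for hierarchical clustering.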

Performance Analysis of Tree Cluster Based Data Gathering for WSNs

Abstract: In wireless sensor networks (WSNs), many sensor devices send their data to the base station for processing, which is called direct delivery. This causes heavy traffic in the network and therefore shortens the network lifetime. WSNs have been widely applied in various industrial applications that involve collecting a massive amount of heterogeneous sensory data. However, a number of data gathering strategies for WSNs cannot avoid the hotspot problem in a local area or in the whole deployment area. The hotspot problem affects network connectivity and decreases the network lifetime. WSNs also suffer from hurdles such as small memory, low computational capability, and limited energy resources, so data gathering techniques have been introduced to improve the network lifetime, and a large number of protocols have been proposed to improve performance. Previous researchers have used cluster-based, chain-based and tree-based approaches to build energy-efficient routing protocols. In this paper, we propose an improved version, a tree-cluster data gathering technique (TCDGT), which uses both cluster-based and tree-based protocols. Simulation and comparison with other techniques show that our TCDGT can significantly balance the load of the whole network, reduce energy consumption, alleviate the hotspot problem and prolong the network lifetime.

Design and Analysis of a Quantum Circuit to Cluster a Set of Data Points

In the paper “Quantum Clustering Algorithms”, Esma Aïmeur, Gilles Brassard and Sébastien Gambs explain a way to implement the K-medians algorithm using black-box implementations of the following three quantum algorithms: quant_find_max, quant_cluster_median, and an algorithm to find the c smallest values. Unsupervised learning is the part of machine learning whose purpose is to give machines the ability to find structure hidden within data. Given a training data set, the goal of a clustering algorithm is to group similar data points in the same cluster while putting dissimilar data points in different clusters. Possible applications of clustering algorithms include discovering sociological groups within a population, automatically grouping molecules according to their structures, clustering stars according to their galaxies, and gathering news or papers according to their topic. This part seeks to speed up some classical clustering algorithms by drawing on QIP techniques. Here the concept of Grover’s black-box model (oracle) is used for querying the distance between points of the training data set. In the classical black-box setting, a query corresponds to asking for the distance between two points x_i and x_j by providing the indexes i and j to the black box.
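The classical black-box setting described above can be sketched as follows: an oracle hides the points, answers distance queries by index, and counts how many queries are made. The class name `DistanceOracle`, the helper `farthest_pair` and the sample points are hypothetical, and the quantum routines named in the excerpt are not reproduced here. The example only shows the classical query cost that such routines aim to reduce.

```python
import itertools
import math

class DistanceOracle:
    """Hypothetical classical black box: answers distance queries by index."""

    def __init__(self, points):
        self._points = points
        self.queries = 0  # number of distance queries asked so far

    def distance(self, i, j):
        self.queries += 1
        return math.dist(self._points[i], self._points[j])

def farthest_pair(oracle, n):
    """Classical search for the most distant pair: one query per pair of indexes."""
    return max(
        itertools.combinations(range(n), 2),
        key=lambda pair: oracle.distance(*pair),
    )

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
oracle = DistanceOracle(points)
pair = farthest_pair(oracle, len(points))
print(pair, oracle.queries)  # the exhaustive search asks n(n-1)/2 queries
```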

Model based cluster analysis of microarray gene expression data

Pneumococcal otitis media is one of the most common diseases in children. Almost every child in the United States experiences at least one episode of acute otitis media by the age of 5 years. To understand the pathogenesis of otitis media, it is important to identify genes involved in response to pneumococcal middle-ear infection and to study their roles in otitis media. A study was recently carried out at the University of Minnesota, applying radioactively labeled cDNA microarrays [14] to the mRNA analysis of 1,176 genes in middle-ear mucosa of rats with and without subacute pneumococcal middle-ear infection. It consisted of six experiments: two cDNA microarrays were run with controls while four were run with pneumococcal middle-ear infection. We first take a natural logarithm transformation for all the observed gene-expression levels so that they are more likely to have a normal distribution, which will reduce the number of clusters found in a model-based clustering. The histograms of gene-expression levels before and after log-transformation for the first experiment are shown in Figure 1. It can be seen that the log-transformation reduces the skewness of the distribution of gene-expression levels. After taking log-transformation, for each experiment we then standardize the transformed gene-expression levels by

Expanding Self-organizing Map for Data Visualization and Cluster Analysis

There are clearly two sequences of peaks in the ESOM’s U-matrix, which indicate three clusters in the first data set. The layer structure in the data set is clearly illustrated. In contrast, the boundaries in the SOM’s U-matrix are not so clear because some high peaks blur them. The scatter plots shown in Fig. 6 illustrate data clusters on the grid. In each scatter plot, a marker represents the mapping of a data item, and the shape of the marker indicates its cluster label. The marker is placed on the winning neuron of the data item. To avoid overlapping, the marker is plotted with a small offset determined by the data item’s Euclidean distance from the winning neuron. The ESOM maps data items in well-organized layers. We can easily find the three clusters in its scatter plot, which is quite similar to the original data set shown in Fig. 3. However, the SOM cannot map the data items very well. The outer cluster in Fig. 3 is even separated into three subclusters (indicated by ‘+’).

Fuzzy cluster analysis of high-field functional MRI data

like FCA can serve as a very powerful data mining and first-pass data analysis tool in fMRI and provides information unavailable to standard evaluation methods that inherently require prior knowledge. The new information obtained with this data-driven approach can help to identify technical or physiological artifacts and may also help to improve the fMRI technique. In this paper, we have demonstrated the potential of FCA for exploratory analysis of both synthetic and in vivo fMRI data. FCA performs at least as well as correlation analysis in reproducing the searched activation (i.e. TPs) in cases of properly known stimulus response functions. This may improve the answer to the most commonly posed questions about brain activation: where did it occur (shown by the cluster membership map) and what are its temporal characteristics (depicted by the cluster centroid). As fuzzy cluster analysis is a paradigm-independent approach to fMRI data analysis, FCA is able to identify anticipated as well as unexpected haemodynamic responses and artifact-related temporal patterns. The signal processing strategy used in FCA helps to extract the full amount of information without the distortion introduced by model bias. Data presented here were restricted to oxygenation-contrast fMRI only; however, other potent MR applications may include dynamic perfusion imaging [32] or classification of spectra

Cluster analysis applied to regional geochemical data: Problems and possibilities

In multi-element analysis of geological materials one usually deals with elements occurring in very different concentrations. In rock geochemistry, the chemical elements are divided into "major", "minor" and "trace" elements. Major elements are measured in % or tens of %, minor elements are measured in about 1 % amounts, and trace elements are measured in ppm, or even ppb. This may become a problem in multivariate techniques that consider all variables simultaneously because the variable with the greatest variance will have the greatest influence on the outcome. Variance is obviously related to absolute magnitude. As one consequence, one should not mix variables quoted in different units in one and the same multivariate analysis (Rock, 1988). Transferring all elements to just one unit (e.g. mg/kg) is not an easy solution to this problem, as the major elements occur in much greater amounts than the trace elements. To enter geochemical raw data, including major, minor and trace elements into cluster analysis does not make sense because it can be predicted that the minor and trace elements would have almost no influence on the result. The same even applies to the major elements: if C (or LOI as a proxy for "organic content") is entered together with the other major elements it will completely govern the clustering just because of its much greater concentrations. The data matrix will thus need to be "prepared" for cluster analysis using appropriate data transformation and standardisation techniques.
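The effect of magnitude on distance-based clustering can be illustrated with a small numerical sketch. It compares how much a trace element contributes to the squared Euclidean distance between two samples before and after standardisation. The concentrations are invented, and z-scoring is only one possible choice among the "appropriate data transformation and standardisation techniques" the text mentions; it is used here as an assumption, not as the method recommended by the authors.

```python
import statistics

def zscore(values):
    """Centre a variable on its mean and scale it to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def trace_share(major, trace, i, j):
    """Fraction of the squared Euclidean distance between samples i and j
    that is contributed by the trace element."""
    d_major = (major[i] - major[j]) ** 2
    d_trace = (trace[i] - trace[j]) ** 2
    return d_trace / (d_major + d_trace)

# Hypothetical concentrations for four samples, both expressed in mg/kg
major_mg_kg = [600000.0, 620000.0, 610000.0, 450000.0]  # a major element (45 to 62 %)
trace_mg_kg = [2.0, 2.1, 9.0, 2.2]                       # a trace element

raw_share = trace_share(major_mg_kg, trace_mg_kg, 0, 2)
std_share = trace_share(zscore(major_mg_kg), zscore(trace_mg_kg), 0, 2)
print(f"raw: {raw_share:.2e}, standardised: {std_share:.3f}")
```

With the raw values, the trace element is practically invisible in the distance between samples 0 and 2, even though sample 2 has an unusually high trace concentration. After standardisation that difference dominates the distance.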

Data Analysis & Monitoring Business Activity

In the context of business-process orientation, where organizations concentrate on monitoring and optimizing their business processes, we believe that activity monitoring is a very important business requirement. Our experiment also revealed other challenges: the BAM repository collects processes or subprocesses at different intervals. Data in the activity warehouse grows very quickly, so further analysis is needed to improve performance. Merging data warehouse and activity warehouse capabilities makes it possible to monitor streaming data from operational systems and detect business events, such as production-line issues, spikes in customer complaints, and decreasing stock on a retailer's shelf.

Application of cluster analysis and multidimensional scaling on medical schemes data

The plots produced by the various MDS methods were compared to the various body system rules. There were some differences between the displays produced by these MDS methods. This is not of serious concern as it was expected that different displays would be produced, but it makes overall interpretation more difficult. The chronic diseases mentioned in the second (CMY, IHD, DYS and HYP), fourth (CSD and IBD) and the fifth (BMD and SCZ) body system rules seem to be strongly related in most MDS plots. The chronic diseases mentioned in the first (COP, AST and BCE) and sixth (MSS, BMD and EPL) body system rules seem to be reasonably related. Points representing COP and AST tend to be located close to each other, but the point representing BCE was usually located further away. It seems therefore that COP and AST have a strong relation with each other, but not with BCE. A similar observation was made with regard to the sixth body system rule. It seemed that BMD and EPL have a reasonably strong relation with each other, but MSS did not show a strong relation with BMD and EPL. The chronic diseases mentioned in the seventh (SLE and RHA) and the third (HYP and CRF) body system rules seem to be only weakly related. Some evidence suggested that there might be some relation between SLE and RHA and some relation between HYP and CRF, but these relations do not seem to be strong. The last body system rule, involving the Diabetes Mellitus diseases DM1 and DM2, was the only rule that was actually applied to the Medical Scheme 55-59 data set used in this study. The Medical Scheme 55-59 data set shows no co-occurrence between these two diseases. It is therefore expected that these two diseases should not be located very close to each other in the MDS plots. This was indeed the case. It was found that the points representing DM1 and CRF tend to be located close to each other. This means that there seems to be a strong similarity between DM1 and CRF.

Comparisons of Different Methods of Cluster Analysis with Application to Rainfall Data

ABSTRACT: Cluster analysis is a statistical method for partitioning a sample into homogeneous groups or classes of high similarity, called clusters. In this study we classify monthly rainfall data of the Marathwada region using cluster analysis techniques. The region has been affected by low rainfall over the last ten years. We use monthly rainfall data from 36 meteorological stations with 40 years of observations (1975 to 2014). Several cluster analysis techniques are available for classifying data. We use seven of them: Ward’s, complete, single, centroid, average, median and McQuitty. The results show that two techniques, single linkage and centroid linkage, are the most useful for classifying the rainfall data of the Marathwada region, with the highest similarity level and minimum distance. Single linkage classified the 36 rainfall stations into three clusters with a similarity level greater than 73% and minimum distance. Centroid linkage classified them into two clusters with a similarity level greater than 74% and minimum distance. This result is useful for identifying spatial rainfall patterns, which is an essential task for hydrologists, climatologists, and regional and local planners.
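To illustrate why the choice of linkage matters, the sketch below runs a naive agglomerative clustering with two of the seven methods named above, single and complete linkage, on the same data. The one-dimensional values are invented stand-ins for station summaries, not the Marathwada data, and the function `agglomerate` is a hypothetical helper rather than the software used in the study.

```python
import math

def agglomerate(points, n_clusters, linkage):
    """Naive agglomerative clustering.

    `linkage` reduces the pairwise distances between two clusters to one
    number: min gives single linkage, max gives complete linkage.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

# Hypothetical station summaries with steadily widening gaps
stations = [(0.0,), (1.0,), (2.1,), (3.3,), (4.6,), (6.0,), (8.0,)]

single = agglomerate(stations, 2, min)
complete = agglomerate(stations, 2, max)
print("single  :", single)
print("complete:", complete)
```

On this chain-like data, single linkage keeps merging neighbours and isolates only the last point, while complete linkage splits the chain into two compact groups. The two methods can therefore yield different partitions from identical input.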

Expanding Self-Organizing Map for data visualization and cluster analysis

Fig. 4 illustrates the quantization and topological errors of both algorithms during a typical run on the first data set. It is clearly seen that the quantization error decreases gradually as the learning process continues. The quantization error of the trained ESOM is 0.017, which is a bit smaller than that of the trained SOM, 0.018. During learning, decreasing, increasing and converging stages can be observed in the topological error curve. At the very beginning of the training process, the neurons’ weights are fairly different, and some of them even contain remnants of random initial values, so higher topological errors are obtained. After several iterations, the topological error decreases dramatically. Because the learning rate and the neighborhood function are large, neurons adjacent on the grid may move much closer to the input data item together. At this stage, the ESOM can learn the ordering topology of data items quickly. As shown in Fig. 4, the topological error of the ESOM is much smaller than that of the SOM. The final topological errors of the ESOM and the SOM are 0.238 and 0.304 respectively, so the ESOM gains about a 20% improvement. Thus, the ESOM can generate better topology-preserving maps than the SOM.
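The two quantities discussed above can be made concrete with a short sketch. It assumes the common definition of quantization error as the mean Euclidean distance between each data item and its winning (nearest) neuron, which may differ in normalisation from the one used in the paper. The data and weights are invented; only the final error values 0.304 and 0.238 come from the text.

```python
import math

def quantization_error(data, weights):
    """Mean Euclidean distance from each data item to its winning (nearest) neuron."""
    return sum(min(math.dist(x, w) for w in weights) for x in data) / len(data)

def relative_improvement(baseline, improved):
    """Relative reduction of an error measure with respect to the baseline."""
    return (baseline - improved) / baseline

# Hypothetical data items and neuron weight vectors
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
weights = [(0.0, 0.0), (1.0, 1.0)]
qe = quantization_error(data, weights)

# Final topological errors reported in the text: SOM 0.304, ESOM 0.238
topo_gain = relative_improvement(0.304, 0.238)
print(f"quantization error: {qe}, topological error reduction: {topo_gain:.1%}")
```

The relative reduction computed from the reported topological errors is a little under 22%, consistent with the "about 20%" stated in the text.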

Application of Factor and Cluster Analysis for an evaluation of Business Practices Models of Foreign Banks

Graph 4.5.1 shows the dendrogram. In Greek, the word ‘dendro’ means tree. Here the cases in the 3 clusters are presented in a tree shape, called a dendrogram. The branching nature of the dendrogram allows the researcher to trace backward or forward to any individual case or cluster at any level. In addition, it gives an idea of how great the distance was between the cases or groups that are clustered in a particular step, using a 0 to 25 scale along the top of the chart. While it is difficult to interpret distance in the early clustering phases (the extreme left of the graph), relative distances become more apparent as you move to the right. The bigger the distance before two clusters are joined, the bigger the differences between these clusters. To find the members of a particular cluster, simply trace backwards down the branches to the names.

Virtual Cluster Management for Analysis of Geographically Distributed and Immovable Data

Algorithm 3 shows how resource, application and data specifications are utilized in the resource allocation process. For data-intensive or I/O-intensive workloads, only applications that are sensitive to data locality and network bandwidth are considered. Network-latency-sensitive applications are not considered in this algorithm. First, GetCandidateList loops through the set of candidate resources (RS) based on the resource requirements and application constraints (AS). The candidate resource list is then sorted based on the α value. If the element values in α are equal, no list sorting is performed. However, if the dominant characteristic is compute intensive, the list is sorted in descending order of the computing power of each physical cluster using the SortByCP procedure. Otherwise, if the dominant characteristic is data intensive, the resources listed in DS.source are selected first, followed by the AppendList procedure, which adds resources with higher bandwidth to any of the resources in DS.source. If the dominant characteristic is I/O intensive, the resources are sorted by higher inbound or outbound network bandwidth. The allocation plan is then executed using the AllocationPlan procedure, which provisions resources from the weighted candidates according to the data distribution provided by DS.source.
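The ordering rules described above might be sketched as follows. This is not the authors' Algorithm 3: the `Resource` type, its fields, the `order_candidates` function and the string labels for the dominant characteristic are all hypothetical. Bandwidth is simplified to a single number per resource, whereas the text distinguishes inbound and outbound bandwidth and bandwidth towards specific data sources.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Resource:
    name: str
    compute_power: float  # hypothetical unit of computing power
    bandwidth: float      # simplified: one bandwidth figure per resource

def order_candidates(candidates, dominant, data_sources=frozenset()):
    """Order candidate resources by the dominant workload characteristic.

    dominant is None when all weights in alpha are equal (no sorting),
    otherwise one of "compute", "data" or "io".
    """
    if dominant is None:
        return list(candidates)
    if dominant == "compute":
        # Descending computing power, in the spirit of SortByCP
        return sorted(candidates, key=lambda r: r.compute_power, reverse=True)
    if dominant == "data":
        # Resources holding the data come first, then the rest by bandwidth
        at_source = [r for r in candidates if r.name in data_sources]
        others = sorted(
            (r for r in candidates if r.name not in data_sources),
            key=lambda r: r.bandwidth,
            reverse=True,
        )
        return at_source + others
    if dominant == "io":
        return sorted(candidates, key=lambda r: r.bandwidth, reverse=True)
    raise ValueError(f"unknown dominant characteristic: {dominant!r}")

candidates = [
    Resource("A", compute_power=10, bandwidth=1.0),
    Resource("B", compute_power=40, bandwidth=5.0),
    Resource("C", compute_power=20, bandwidth=10.0),
]
by_compute = [r.name for r in order_candidates(candidates, "compute")]
by_data = [r.name for r in order_candidates(candidates, "data", frozenset({"A"}))]
by_io = [r.name for r in order_candidates(candidates, "io")]
unsorted = [r.name for r in order_candidates(candidates, None)]
print(by_compute, by_data, by_io, unsorted)
```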

Intelligent Pattern Mining and Data Clustering for Pattern Cluster Analysis using Cancer Data

Data mining techniques are used for knowledge discovery in large data sets. Clustering techniques are used to group related data. Hierarchical and partitional clustering techniques are used for the clustering process. Clustering is a complex task with a high processing time. The pattern extraction scheme is applied to find frequent item sets. Association rule mining techniques are applied to carry out the pattern extraction process. The pattern extraction scheme and the clustering scheme are integrated in the simultaneous pattern extraction and clustering scheme. The clustering process is improved with pattern comparison and a transaction transfer process. The simultaneous clustering scheme is implemented to analyze cancer patient diagnosis reports. The system is implemented as four major modules: data set management, pattern extraction, clustering process and performance analysis. The data sets are preprocessed before the pattern extraction process. The patterns are used in the simultaneous clustering process. The performance analysis compares the data clustering scheme with the pattern clustering schemes. Processing time and memory are used as factors in the performance analysis. Cluster accuracy is represented using fitness values. The system is enhanced with the K-means clustering algorithm

Business Management and Administration Cluster Exam

Financial. Business analysis is the process of investigating and evaluating a business issue, problem, process, or approach. Analysis helps a business determine whether it is achieving its goals or needs to take corrective action so that it can achieve them. Financial analysis involves evaluating the business's financial health. Because the business needs to make a reasonable profit so it can thrive in the marketplace, it is important to continuously analyze the business's financial status. The operations business function involves the day-to-day activities that the business performs, that is, the activities required for continued business functioning. Information management is the process of coordinating the resources pertaining to business knowledge, facts, or data. Marketing is the process of creating, communicating, and delivering value to customers and managing customer relationships in ways that benefit the organization and its stakeholders.

BUSINESS MANAGEMENT AND ADMINISTRATION CLUSTER CORE

Automated oversight. Risk managers can use automated oversight technology to assign specific risk limits to individual divisions, departments, or employees. If focused on individual employees, the automated oversight technology tracks each employee's activity to determine if and when s/he exceeds the preset risk limit. If the employee exceeds his/her limit, the software notifies management immediately. Management can then take the necessary actions to reduce the risk exposure. Financial analysts use stress testing to determine a particular financial instrument's stability in different extreme events. Data aggregation involves pulling together data from several disparate systems into one central repository or database. Corporate governance is the system by which directors handle their responsibility toward shareholders.

Innovative Business Model of the Cluster as an Ecosystem

In conclusion, the parameters of the cluster as an innovation ecosystem vary from ecosystem to ecosystem, but in every case the metrics are based on measuring performance in several aspects: 1) participants – financial analysis and organizational characteristics, roles and functions in the cluster, business models of economic entities, their strategic and tactical behavior, capabilities and development potential of companies, input and output material flows, production capacity and output, sales volumes, and the development history of successful firms; 2) structure – “types” of companies and their dynamics, information and value channels, business platforms, ways of interaction and forms of cooperation, institutional aspects of economic practice; 3) competitiveness in comparison with other ecosystems, including by products, services, innovations, technologies, personnel, and brands; 4) business activity in terms of interaction, network formation, transactionality, and trade turnover between partners; 5) strategic vision in terms of opportunities, risk, and development. Summarizing the parameters of the cluster as an innovation ecosystem and the metrics for measuring its effectiveness, four basic principles of the construction and organization of this kind of system can be traced: complexity, self-organization, coevolution, and adaptation. Generally, these principles are properties of any complex socio-economic system and are the result of an interdisciplinary synthesis of the following scientific directions: systems theory, synergetics and technology. The theory of clusters also appeals to them, which means that ecosystems and clusters have more in common than they have differences. The formation and functioning of clusters represent a stage in the development of the innovation ecosystem, and the cluster itself is a transitional form of ecosystem in terms of the evolution of the economic space.
As clusters accumulate critical mass and capital, they become full-fledged business ecosystems.

Cluster Analysis of Temporal Data using Maximum Likelihood Estimation

Due to the rapid growth of technologies, a large amount of data is generated. The need arises to handle this data in order to retrieve and analyze useful information. Clustering of temporal data has been explored using evolutionary clustering. However, the time dimension associated with each record has not been considered. Traditional clustering algorithms usually focus on grouping data objects based on a similarity function. If the temporal dimension is incorporated, it becomes possible to perform cluster analysis of evolving patterns. Temporal data clustering extends traditional clustering mechanisms and provides underpinning solutions for discovering condensed information over a period of time. This paper proposes a methodology for clustering records based on time frame. The proposed methodology first clusters the records based on time frame. When a new query record arrives, maximum likelihood estimation is used to identify its true representative cluster. The assignment of the query record to a particular cluster is based on a distance measure.
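A minimal sketch of the two steps described above is given below, under stated assumptions. Records are taken to be (timestamp, feature vector) pairs, time frames are consecutive windows of fixed length, each frame is summarised by its centroid, and a query is assigned to the nearest centroid by Euclidean distance. Nearest-centroid assignment coincides with a maximum likelihood choice only if every cluster is modelled as a spherical Gaussian with the same variance. The excerpt does not specify the model, so that is an assumption, as are all names and values in the code.

```python
import math
from collections import defaultdict

def cluster_by_time_frame(records, frame_length):
    """Group (timestamp, features) records into consecutive time frames
    and return the centroid of each frame, keyed by frame index."""
    frames = defaultdict(list)
    for timestamp, features in records:
        frames[int(timestamp // frame_length)].append(features)
    return {
        frame: tuple(sum(column) / len(column) for column in zip(*vectors))
        for frame, vectors in frames.items()
    }

def assign(query, centroids):
    """Return the frame whose centroid is closest to the query record."""
    return min(centroids, key=lambda frame: math.dist(query, centroids[frame]))

# Hypothetical records: (timestamp, feature vector)
records = [
    (0, (1.0, 1.0)),
    (5, (1.2, 0.8)),
    (12, (5.0, 5.0)),
    (18, (5.2, 4.8)),
]
centroids = cluster_by_time_frame(records, frame_length=10)
label = assign((4.5, 5.0), centroids)
print(centroids, label)
```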

Data mining in an educational setting: a cluster analysis of browsing behaviour

Starting from 1995, when Amazon started its online business (Thomas, 2015) and the commercialization of the internet began, the need for online marketing arose. Online marketing gave marketers the option to target advertisements specifically to certain people, which was previously not possible through the mass media of television and radio. In 2017 the total online advertising revenues for the US were $88 billion, a steep increase of 21.4% compared to 2016 (IAB 2017). Global spending on digital advertising in 2017 was around $232 billion according to eMarketer (2018). This shows that online advertising has become important for companies and organizations. The IAB identifies several categories of companies that engage in online advertising, ranging from retail to media-related companies. Online advertising is used not only by companies, but also by organisations to voice themselves and by universities to attract students. Online advertisements can take many different forms, from banner displays and videos to audio advertisements used by music streaming portals (IAB 2017). To improve the effectiveness of online advertising, behavioural targeting practices are used, which according to Lu et al. (2015) yield a conversion rate twice as high as that of untargeted advertisements. The conversion rate is the percentage of people who interact with an advertisement in the desired way, for example by registering at a website or buying a certain product. In addition to targeted advertisements being more effective, Goldfarb & Tucker (2011) found that traditional advertisements tend to be ignored. According to Ryan & Jones (2012), it is becoming easier for online users to filter out irrelevant information, which further stresses the need for meaningful advertisements. Online behavioural targeting was described by Mathews-Hunt (2016) as the collection of online browsing data and the assignment of that data to interest categories.
The data is collected through cookies that are installed on the user's browsing device (Jaworska & Sydow, 2008). To analyse the large amounts of data, algorithms are used (Wang et al., 2017). A frequently used data analysis technique for user segmentation is clustering, where the data entries are grouped into clusters according to their browsing behaviour (Cho et al., 2005). This enables marketers to identify high-potential leads, that is, certain groups of people who have a chance of becoming customers.

Analyzing and Improving the Efficiency of Hadoop-Cluster for Big Data Analysis

There is considerable enthusiasm for the MapReduce (MR) paradigm for large-scale data analysis [10]. Although the basic control flow of this framework has existed in parallel DBMSs for over 20 years, some have regarded MR as a radically new computing model [2][11]. In [11], the two paradigms are described and compared, and both kinds of systems are evaluated with respect to performance and development complexity. To this end, a benchmark is defined, consisting of a collection of tasks that were run on an open source version of MR and on two parallel DBMSs. For each task, each system's performance is measured for various degrees of parallelism on a cluster of 100 nodes. The results revealed some interesting tradeoffs. Although the process of loading data into the parallel DBMSs and tuning their execution took much longer than for the MR system, the observed performance of these DBMSs was markedly better. The authors speculate about the reasons for the dramatic performance difference and consider implementation concepts that future systems should take from both kinds of architectures.