Available online: http://edupediapublications.org/journals/index.php/IJR/ P a g e | 836
Mining Competitors from Large Unstructured Datasets
Gaddale Supriya
1,
A.D. Sivarama Kumar
21P.G. Scholar, 2Assistant Professor
1,2 Branch: Computer Science Engineering
1,2SVR Engineering college
Email : :[email protected], [email protected]
Abstract
In any focused business, achievement depends on the capacity to make a thing more engaging clients than the opposition. Various inquiries emerge with regards to this assignment: how would I formalize and evaluate the intensity between two things? Who are the principle contenders of a given thing? What are the highlights of a thing that most influence its intensity? Despite the effect and significance of this issue to numerous spaces, just a restricted measure of work has been given toward a viable arrangement. In this undertaking, I present a formal definition of the intensity between two things, in view of the market portions that they can both cover. Our assessment of aggressiveness uses client surveys, a plentiful wellspring of data that is accessible in an extensive variety of spaces. I present
productive techniques for assessing
aggressiveness in expansive survey datasets and address the normal issue of finding the best k contenders of a given thing. At last, I assess the nature of our outcomes and the adaptability of our methodology utilizing various datasets from various spaces.
Index Terms—Data mining, Web mining,
Information Search and Retrieval,
Electronic commerce
INTRODUCTION
Clients frequently experience issues in communicating their web look needs; they may not know the catchphrases that can
recover the data they require. Watchword
proposal (otherwise called question
recommendation), which has turned out to be a standout amongst the most basic highlights of business web indexes, helps toward this path. Subsequent to presenting a catchphrase question, the client may not be happy with the outcomes, so the watchword proposal module of the internet searcher suggests a lot of m catchphrase inquiries that are destined to refine the client's pursuit
the correct way. Viable watchword
recommendation techniques depend on snap data from inquiry logs and question session information, or question point models . New catchphrase proposals can be resolved by their semantic importance to the first
watchword inquiry. The semantic
importance between two catchphrase
inquiries can be resolved (I) in light of the cover of their clicked URLs in a question log, (ii) by their vicinity in a bipartite chart that interfaces watchword inquiries and their clicked URLs in the inquiry log, (iii) as per their co events in question sessions, and (iv) in view of their similitude in the subject dispersion space.
Available online: http://edupediapublications.org/journals/index.php/IJR/ P a g e | 837
through that takes a client area and client provided watchword question as contentions and returns protests that are spatially close
and textually\ significant to these
contentions. Google handled an every day normal of 4.7 billion inquiries in 2011, a
considerable part of which have
neighborhood purpose and target spatial web objects (i.e., focal points with a web nearness having areas and additionally message portrayals) or geo-records (i.e., archives related with geo-areas). Moreover, 53% of Bing's portable hunts in 2011 were found to have a nearby purpose. To fill this hole, we propose a Location-mindful
Keyword question Suggestion (LKS)
structure. I outline the advantage of LKS utilizing a toy model. An area mindful recommendation is \lobster", which can recover close-by records d4 and d5 that are likewise important to the client's unique inquiry aim. Past watchword inquiry proposal models (e.g.) disregard the client area and would", which again neglects to recover close-by significant archives.
Note that LKS has an alternate objective and in this way contrasts from other area mindful suggestion techniques (e.g.,
auto-consummation/moment look, label
proposal).
The primary test of our LKS system is the way to successfully quantify watchword inquiry closeness while catching the spatial separation factor. In agreement to past question recommendation approaches, LKS builds and uses a watchword record bipartite diagram (KD-chart for short), which associates the catchphrase inquiries with their important reports as appeared in Figure 1(c). Distinctive to all past methodologies which disregard areas, LKS changes the loads on edges in the KD-chart to catch the semantic pertinence between watchword inquiries, as well as the spatial separation between the report areas and the
inquiry guarantor's area _q. I apply an irregular stroll with restart (RWR) process on the KD-chart, beginning from the client provided question kq, to locate the arrangement of m catchphrase inquiries with the most noteworthy semantic importance to kq and spatial nearness to the client area. RWR on a KD-chart has been viewed as better than elective methodologies and has been a standard system utilized in different (area autonomous) watchword proposal thinks about.
The second test is to register the proposals proficiently on a substantial unique chart. Performing keyword\ proposal in a split second is critical for the relevance of LKS practically speaking. Be that as it may, RWR look has a high computational expense on vast diagrams. Past work on scaling up RWR look require
pre-calculation and additionally diagram
division part of the required RWR scores are emerged under the presumption that the change probabilities between hubs (i.e., the edge loads) are known already. What's more, RWR seek calculations
that don't depend on pre-calculation quicken the calculation by pruning hubs dependent on their lower or upper bound scores and
furthermore require the full change
Available online: http://edupediapublications.org/journals/index.php/IJR/ P a g e | 838
conventional procedures for RWR seek, symmetrical to LKS, consequently they can be connected to accelerate RWR look in other huge diagrams. In synopsis, the commitments of this venture are we plan a
Location-mindful Keyword inquiry
Suggestion (LKS) system, which gives proposals that are significant to the client's data needs and can recover pertinent reports near the question guarantor's area.
EXISTING SYSTEM
To the best of our insight, our work is the first to address the assessment of aggressiveness through the examination of vast unstructured datasets, without the
requirement for direct near proof.
Regardless, our work has connections to past work from different areas.
Administrative Competitor Identification:
The administration writing is rich with works that emphasis on how chiefs can physically recognize contenders.
Finding Competitive Products:
The initial phase in these methodologies is the meaning of a strength work that speaks to the estimation of an item. Horizon calculation: Our work use ideas and strategies from the broad writing on horizon calculation.
DISADVANTAGES OF EXISTING
SYSTEM:
Dependency on transactional data
PROPOSED SYSTEM
In our proposed system to identify the competitiveness between the items, in each items having the number of features .I use Product Reviews as well as Hotel Reviews for Implementation. Of Which Hotel Domain Sentiment classification can be extended to give Service recommendation to users based on their requirements.
The user should be adding the reviews of the item based on their intension. Then I collect the data in unstructured datasets over the multiple domains and apply the
Nlp(Natural Language Processing) to
identify the similar kinds of reviews on the products. Then apply the collaborative filtering technique for identify the best items in the various domains based on the user reviews and sort the items.
A user-based CF algorithm is adopted to generate appropriate recommendations. It aims at calculating a personalized rating of each candidate service for a user, and then
presenting a personalized service
recommendation list and recommending the most appropriate services to him/her. Then I
need to find the percentage of
competitiveness between the products that can be calculated based on no of users reviews products/total no of users.
ADVANTAGES OF PROPOSED
SYSTEM:
Efficient and applicable to domains with very large populations of items.
SYSTEM REQUIREMENTS AND
MODULES
FUNCTIONAL REQUIREMENTS: NON-FUNCTIONAL
REQUIREMENTS:
Increased admin security: The PC should be highly secured and accessible only by the administrator to avoid the misuse of the application.
Portability: The GUIs of this application is user-friendly so it is very easy for the user to understand and respond to the same.
Reliability: This system has high
probability to deliver us the required queries and the functionalities available in the application.
Available online: http://edupediapublications.org/journals/index.php/IJR/ P a g e | 839
Scalability: The system can be extended to integrate the modifications done in the present application to improve the quality of the product. This is meant for the future works that is to be done on the application.
Robustness: The application is fault tolerant with respect to illegal user/receiver inputs. Error checking has been built in the system to prevent system failure.
File Uploading Protocol:
This protocol aims at allowing clients to upload files via the auditor. Specifically, the file uploading protocol includes three phases:
• Phase 1 (cloud client → cloud server):
client performs the duplicate check with the cloud server to confirm if such a file is stored in cloud storage or not before uploading a file. If there is a duplicate, another protocol called Proof of Ownership will be run between the client and the cloud storage server. Otherwise, the following protocols (including phase 2 and phase 3) are run between these two entities.
• Phase 2 (cloud client → auditor): client uploads files to the auditor, and receives a receipt from auditor.
• Phase 3 (auditor → cloud server):
auditor helps generate a set of tags for the uploading file, and send them along with this file to cloud server.
Modules:
Admin Module:
In this module, admin can upload details about items i.e. Hotels, Malls, Camera, Restaurants, and Recipes. After that, admin can check all uploaded items details, customer queries and interests. Finally, top-k competitors are identified from given item based on CMiner.
User Module:
In the Second module, we develop the User based features. In this module, the User can give queries for anyone item, i.e. Camera, Hotels, Restaurants and recipes. At first creating the data set for cameras,
Hotels, restaurant, recipes. Collect the user requirement from user page.
C Miner Algorithm Module:
Next, we present C Miner, an exact algorithm for finding the top-k competitors of a given item. Our algorithm makes use of the skyline pyramid in order to reduce the number of items that need to be considered. Given that we only care about the top-k competitors, we can incrementally compute the score of each candidate and stop when it is guaranteed that the top-k has emerged.
2.3.4 Skyline Operator Module:
In this module, skyline operator is performed. The skyline is a well-studied concept that represents the subset of points in a population that are not dominated by any other point. We refer to the skyline of a set of items i as Sky(I). The concept of the skyline leads to the following lemma:
2.3.5 Lemma1:
Given the skyline Sky(I) of a set of items I and an item i ∈ I, let Y contain the k items from Sky(I) that are most competitive with i. Then, an item j ∈ I can only be in the top-k competitors of i, if j ∈ Y or if j is dominated by one of the items in Y.
Fig 4.2 Process Model
SYSTEM ARCHITECTURE Architecture Flow:
Below architecture diagram
Available online: http://edupediapublications.org/journals/index.php/IJR/ P a g e | 840
tiers separately using three layers called presentation layer, business layer, data link layer. This project was developed using 3-tier architecture.
3-Tier Architecture:
The three-tier software architecture (a three layer architecture) emerged in the 1990s to overcome the limitations of the two-tier architecture. The third tier (middle tier server) is between the user interface (client) and the data management (server) components. This middle tier provides process management where business logic
and rules are executed and can
accommodate hundreds of users (as
compared to only 100 users with the two tier architecture) by providing functions such as queuing, application execution, and database staging.
The three tier architecture is used when an effective distributed client/server design is needed that provides (when compared to the two tier) increased performance, flexibility, maintainability, reusability, and scalability, while hiding the complexity of distributed processing from the user. These characteristics have made three layer architectures a popular choice for Internet applications and net-centric information systems
Advantages of Three-Tier:
Separates functionality from
presentation.
Clear separation - better
understanding.
Changes limited to well define
components.
Effective network performance.
Fig: 4.3 System Architecture
ARCHITECTURE DIAGRAM
Fig 4.4 System Design
SCREEN SHOTS
Experimental result analysis is a procedure of analyzing the output of experiments carried on the system.
Available online: http://edupediapublications.org/journals/index.php/IJR/ P a g e | 841
WELCOME ADMIN LOGIN
WELCOME ADMIN MAIN
ALL USERS LIST
ADD HOTELS
Available online: http://edupediapublications.org/journals/index.php/IJR/ P a g e | 842
HOTEL NAMES
ALL MALLS LIST BASED ON
COMPETITORS
MALL NAME
ALL HOTELS BOOKING DETAILS
HOTEL NAME
Available online: http://edupediapublications.org/journals/index.php/IJR/ P a g e | 843
HOTELS AND MALLS
COMPETITING RESULTS
FINDING TOP RESULT
USER REGISTRATION FORM
USER LOGIN
USER MAIN
PROFILE DETAILS
Available online: http://edupediapublications.org/journals/index.php/IJR/ P a g e | 844
ENTER ACCOUNT DETAILS
VIEW ACCOUNT DETAILS
SEARCH MALLS/HOTELS
SEARCH RESULTS
HOTEL DETAILS
Available online: http://edupediapublications.org/journals/index.php/IJR/ P a g e | 845
HOTEL BOOKING
ADDRESS
SEARCH MALLS/HOTELS
SEARCH RESULT
MALL DETAILS
VIEW TOP KEYWORDS SEARCHED
THER TESTING METHODOLOGIES
1.1.1.User Acceptance Testing
Available online: http://edupediapublications.org/journals/index.php/IJR/ P a g e | 846
easily be understood even by a person who is new to the system.
1.1.2.8.3.2Output Testing
After performing the validation testing, the next step is output testing of the proposed system, since no system could be useful if it does not produce the required output in the specified format. Asking the users about the format required by them tests the outputs generated or displayed by the system under consideration. Hence the output format is considered in 2 ways – one is on screen and another in printed format.
1.1.3.8.3.3Validation Checking
Validation checks are performed on the following fields.
1.1.4.8.3.4Text Field
The text field can contain only the number of characters lesser than or equal to its size. The text fields are alphanumeric in some tables and alphabetic in other tables. Incorrect entry always flashes and error message.
1.1.5.8.3.5 Numeric Field
The numeric field can contain only numbers from 0 to 9. An entry of any character flashes an error messages. The
individual modules are checked for
accuracy and what it has to perform. Each module is subjected to test run along with sample data. The individually tested modules are integrated into a single system. Testing involves executing the real data information is used in the program the existence of any program defect is inferred from the output. The testing should be planned so that all the requirements are individually tested.
A successful test is one that gives out the defects for the inappropriate data and produces and output revealing the errors in the system.
1.1.6.8.3.6 Preparation of Test Data
Taking various kinds of test data does the above testing. Preparation of test data plays a vital role in the system testing. After preparing the test data the system under study is tested using that test data. While testing the system by using test data errors are again uncovered and corrected by using above testing steps and corrections are also noted for future use.
1.1.7.8.3.7 Using Live Test Data
Live test data are those that are actually extracted from organization files. After a system is partially constructed, programmers or analysts often ask users to key in a set of data from their normal activities. Then, the systems person uses this data as a way to partially test the system. In other instances, programmers or analysts extract a set of live data from the files and have them entered themselves.
It is difficult to obtain live data in sufficient amounts to conduct extensive testing. And, although it is realistic data that will show how the system will perform for the typical processing requirement, assuming that the live data entered are in fact typical, such data generally will not test all combinations or formats that can enter the system. This bias toward typical values then does not provide a true systems test and in fact ignores the cases most likely to cause system failure.
1.1.8.8.3.8Using Artificial Test Data
Artificial test data are created solely for test purposes, since they can be generated to test all combinations of formats and values. In other words, the artificial data, which can quickly be prepared by a data generating utility program in the information systems department, make possible the testing of all login and control paths through the program.
Available online: http://edupediapublications.org/journals/index.php/IJR/ P a g e | 847
Often, an independent team of testers formulates a testing plan, using the systems specifications.
The package “Virtual Private Network” has satisfied all the requirements specified as per software requirement specification and was accepted.
1.1.9.8.3.9User Training
Whenever a new system is
developed, user training is required to educate them about the working of the system so that it can be put to efficient use by those for whom the system has been primarily designed. For this purpose, the normal working of the project was demonstrated to the prospective users. Its working is easily understandable and since the expected users are people who have good knowledge of computers, the use of this system is very easy.
1.1.10.8.4Maintenance
This covers a wide range of activities including correcting code and design errors. To reduce the need for maintenance in the long run, we have more accurately defined the user’s requirements during the process of system development. Depending on the requirements, this system has been developed to satisfy the needs to
the largest possible extent. With
development in technology, it may be possible to add many more features based on the requirements in future. The coding and designing is simple and easy to understand which will make maintenance easier.
CONCLUSION
I introduced a formal dentition of intensity between two things, which I approved both
quantitatively and subjectively. Our
formalization is pertinent crosswise over spaces, defeating the deficiencies of past
methodologies. I consider various
components that have been to a great extent ignored before, for example, the situation of
the things in the multi-dimensional element space and the inclinations and assessments of the clients. Our work acquaints san end-with end technique for mining such data from vast datasets of client audits. In light of our aggressiveness dentition, I tended to the computationally difficult issue of finding the best k contenders of a given thing. The proposed structure is proficient and material to spaces with substantial populaces of things. The proficiency of our philosophy was verified by means of an exploratory assessment on genuine datasets from
various spaces. Our tests likewise
uncovered that just few surveys is adequate to confidently appraise the distinctive sorts of clients in a given market, also the quantity of clients that have a place with each kind.
BIBILOGRAPGHY
[1] M. E. Porter, Competitive Strategy: Techniques for Analyzing Industries and Competitors. Free Press, 1980. [2] R. Deshpand and H. Gatingon,
“Competitive analysis,” Marketing Letters, 1994.
[3] B. H. Clark and D. B. Montgomery,
“Managerial Identification of
Competitors,” Journal of Marketing, 1999.
[4] W. T. Few, “Managerial competitor
identification: Integrating the
categorization, economic and
organizational identity perspectives, “Doctoral Dissertation, 2007.
[5] M. Bergen and M. A. Peteraf,
“Competitor identification and
competitor analysis: a broad-based managerial approach,” Managerial and Decision Economics, 2002.
[6] J. F. Porac and H. Thomas, “Taxonomic
mental models in
competitordefinition,” The Academy of Management Review, 2008.
Available online: http://edupediapublications.org/journals/index.php/IJR/ P a g e | 848
integration,” Academy of
Management Review, 1996.
[8] R. Li, S. Bao, J. Wang, Y. Yu, and Y.
Cao, “Cominer: An effective
algorithm for mining competitors from the web,” in ICDM, 2006.
[9] Z. Ma, G. Pant, and O. R. L. Sheng, “Mining competitor relationships from online news: A network-based approach,” Electronic Commerce Research and Applications, 2011. [10] R. Li, S. Bao, J. Wang, Y. Liu, and Y.
Yu, “Web scale competitor discovery using mutual information,” in ADMA, 2006.
[11] S. Bao, R. Li, Y. Yu, and Y. Cao, “Competitor mining with the web,” IEEE Trans. Knowl. Data Eng., 2008. [12] G. Pant and O. R. L. Sheng, “Avoiding
the blind spots: Competitor
identification using web text and linkage structure,” in ICIS, 2009. [13] D. Zelenko and O. Semin, “Automatic
competitor identification from public information sources,” International Journal of ComputationalIntelligence and Applications, 2002.
[14] R. Decker and M. Trusov, “Estimating aggregate consumer preferences from online product reviews,” International Journal of Research in Marketing, vol. 27, no. 4, pp. 293–307, 2010. [15] C. W.-K. Leung, S. C.-F. Chan, F.-L.
Chung, and G. Ngai, “A probabilistic rating inference framework for mining
user preferences from reviews,”
World Wide Web, vol. 14, no. 2, pp. 187–215, 2011.
[16] K. Lerman, S. Blair-Goldensohn, and
R. McDonald, “Sentiment
summarization: evaluating and
learning user preferences,” in ACL, 2009, pp. 514–522.
[17] E. Marrese-Taylor, J. D. Vel´asquez, F.
Bravo-Marquez, and Y.
Matsuo,“Identifying customer
preferences about tourism products
using an aspect-based opinion mining
approach,” Procedia Computer
Science, vol. 22, pp. 182–191, 2013. [18] C.-T. Ho, R. Agrawal, N. Megiddo,
and R. Srikant, “Range queries in olap data cubes,” in SIGMOD, 1997, pp. 73–88.
[19] Y.-L.Wu, D. Agrawal, and A. El
Abbadi, “Using wavelet
decomposition to support progressive and approximate range-sum queries over data cubes,” in CIKM, ser. CIKM ’00, 2000, pp. 414–421.
[20] D. Gunopulos, G. Kollios, V. J.
Tsotras, and C. Domeniconi,
“Approximating multi-dimensional
aggregate range queries over real attributes,” in SIGMOD, 2000, pp. 463–474.
[21] M. Muralikrishna and D. J. DeWitt, “Equi-depth histograms for estimating
selectivity factors for
multi-dimensional queries,” in SIGMOD, 1988, pp. 28–36.
[22] N. Thaper, S. Guha, P. Indyk, and N. Koudas, “Dynamic multidimensional histograms,” in SIGMOD, 2002, pp. 428–439.
[23] K.-H. Lee, Y.-J. Lee, H. Choi, Y. D. Chung, and B. Moon, “Parallel data
processing with mapreduce: a
survey,” AcM sIGMoD Record,vol. 40, no. 4, pp. 11–20, 2012.
[24] S. B¨orzs¨onyi, D. Kossmann, and K. Stocker, “The skyline operator,”in
ICDE, 2001.
[25] D. Papadias, Y. Tao, G. Fu, and B. Seeger, “An optimal and progressive algorithm for skyline queries,” ser. SIGMOD ’03.
[26] G. Valkanas, A. N. Papadopoulos, and D. Gunopulos, “Skyline ranking `a la IR,” in ExploreDB, 2014, pp. 182– 187.
Available online: http://edupediapublications.org/journals/index.php/IJR/ P a g e | 849
set of vectors and applications,” J. ACM, 1978.
[28] X. Ding, B. Liu, and P. S. Yu, “A
holistic lexicon-based approach