This booklet provides participants, educators and event partners with a preparation question guide for TEXATA, the Official 2014 Big Data Analytics World Championships.
TEXATA is a fun, independent and challenging business education competition for Big Data Analytics. Its mission is to improve well-rounded technical skills, awareness and understanding of the Big Data Analytics disciplines in business. We seek to celebrate the world’s best organizations, business leaders and community partners. We hope to give students and young professionals the courage to pursue exciting career paths within Big Data, Data Science and Business Analytics, and to collaborate with event partners.
The competition involves two Online Qualification Rounds, followed by a Live World Finals event in Austin, Texas, USA. This preparation booklet outlines core concepts to be tested during the TEXATA 2014 World Championships. Testing will examine a diverse range of practical, technical and business themes at the heart of Big Data Analytics, including Sentiment Analysis, Machine Learning, Statistical Methods, Predictive Modeling and Analytics Insights. Round 1 and 2 Qualification questions will combine multiple-choice, short-answer and real-world business implementation case studies. The World Finals is an advanced business case study challenge, with in-depth interviews and face-to-face presentations before leading global judging authorities and industry leaders.
We hope you enjoy Round 1. Good luck!
Competition Structure
Qualification Round 1 (Online, 4 hours)
Round 1 involves ~100 multiple choice and short-answer case studies.
Questions will be both theoretical and practical (“perform this analysis on this dataset: which is the correct answer?”). Datasets will be open data. We will additionally use a big data set supplied by Thomson Reuters for participants.
Samples of the big data sets will be available to competitors in the days leading up to Round 1.
Final scores for Round 1 will be determined by the weighted proportion of correct answers, with additional marks available for prompt submission (earlier is better) and creative insight.
Participants will receive a full breakdown of marking criteria 48 hours prior to Round 1.
Depending on performances, the Top 20%-50% of participants in TEXATA Round 1 will progress to Round 2.
Qualification Round 2 (Online, 4 hours)
Round 2 follows a similar format to Round 1, with greater focus on case studies and real-world practical questions.
Theoretical questions involve detailed treatment of technical concepts covered previously in Round 1.
Machine Learning principles and algorithms will also be explored more deeply in Round 2.
Practical questions involve competitors implementing predictive models on very large structured and unstructured datasets. Big data sets will be provided by Thomson Reuters. Competitors are welcome to include any other public domain data they feel may improve their model.
Round 2 scores will be determined by a combination of multiple choice answer correctness, free text answer assessment against a rubric, predictive modeling score, and submission time (earlier is better). The mark weightings and section allocations will be made available to all registered participants prior to Round 2.
World Finals (Austin, Texas, 6 hours)
The TEXATA 2014 World Finals will be held in Austin, Texas on the weekend of 22nd/23rd November 2014.
As part of technical presentations, Finalists will perform a complete data analysis workflow (i.e. beginning at user interviews and ending with a results presentation).
As part of business presentations, Finalists will be interviewed by a variety of industry leaders and judging panelists on their proposed creative Big Data Analytics solution and real-world business challenges.
Finalists will have access to real business data to solve the issues identified.
Finalists are responsible for their problem definition, scope, execution, and communication of business insights.
TEXATA 2014 winners will be decided by a panel of judges and the question design team (including the problem owner). Evaluation will be based on a rubric covering problem identification and decomposition, approach to solution, implementation effectiveness and clarity of results communication.
Technical Requirements
Programming Capabilities
Competitors will be required to perform coding to compete in TEXATA 2014.
Competitors are free to use any languages and frameworks with which they are familiar and comfortable.
Competitors will need to be comfortable in performing numerical computations over data (e.g. “What is the mean of value X in this dataset?”), data processing such as aggregating and normalizing data, and performing text analytics such as word counts and document classification.
More advanced machine learning and predictive modeling skills will be applicable in Round 2 and World Finals.
Whilst we are not focused on code quality or style in Rounds 1 or 2, judges may request a code review as part of their overall assessment during the judging panel interviews and presentations at the Live World Finals in Texas.
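As a rough illustration of the baseline skills described above, the following minimal Python sketch computes a mean, aggregates by group, normalizes values and counts words. The records, field names and sample text are invented for illustration and are not taken from any competition dataset.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical records; real competition data will differ.
records = [
    {"sector": "finance", "value": 10.0},
    {"sector": "finance", "value": 30.0},
    {"sector": "retail", "value": 20.0},
]

# Numerical computation: mean of "value" across all records.
overall_mean = mean(r["value"] for r in records)

# Aggregation: total "value" per sector.
totals = defaultdict(float)
for r in records:
    totals[r["sector"]] += r["value"]

# Normalization: min-max scale the values to the range [0, 1].
values = [r["value"] for r in records]
lowest, highest = min(values), max(values)
normalized = [(v - lowest) / (highest - lowest) for v in values]

# Text analytics: case-insensitive word counts.
text = "Big data needs big ideas"
word_counts = Counter(text.lower().split())
```

The same operations are available in any language or framework; the standard library is used here only to keep the sketch self-contained.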
Business Results
TEXATA 2014 explores the commercial impact and real-world business insights of Big Data Analytics.
TEXATA 2014 is focused on applications in business industries (e.g. financial services, e-commerce and mobility).
Round 1 and Round 2 performances will assess objective, fact-driven results and business insights.
Amazon Web Services
Big Data sets used in the TEXATA 2014 Online Rounds will be hosted by Amazon Web Services (see footnote 2).
Competitors should be comfortable accessing and/or processing data stored in Amazon S3.
Competitors are welcome to download the data from S3 to their preferred storage solution.
Access details for the datasets will be provided in the days prior to the competition.
TEXATA will not provide technical support for accessing the datasets beyond basic connection details.
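For competitors who have not worked with S3 before, the sketch below shows one possible way to copy a single object to local storage from Python. It assumes the third-party boto3 SDK is installed and that AWS credentials are already configured. The function name `download_dataset` and the bucket and key in the usage comment are hypothetical placeholders, since the actual access details will only be provided before the competition.

```python
def download_dataset(bucket, key, dest_path, client=None):
    """Copy one S3 object to a local file and return the local path.

    `client` may be any object exposing download_file(bucket, key, filename);
    when omitted, a boto3 S3 client is created (an assumption of this sketch).
    """
    if client is None:
        import boto3  # third-party AWS SDK for Python, imported only when needed
        client = boto3.client("s3")
    client.download_file(bucket, key, dest_path)
    return dest_path

# Hypothetical usage; the bucket and key names are placeholders:
# download_dataset("example-bucket", "round1/sample.csv", "sample.csv")
```

Passing the client as a parameter keeps the function easy to try out without network access. The AWS command line tools or any other S3-compatible client would serve equally well.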
Competition Interface
TEXATA Rounds 1 and 2 will be conducted through a web browser.
Participants will have 4 hours (240 minutes) to complete each of Rounds 1 and 2.
Participants are expected to have access to a computer with internet access and their preferred big data analytic environment over this time.
The competition is independent and product agnostic – every participant can use any technological tool, methodology and process to submit their competition solution.
Competitors will enter their multiple choice answers and written case study answers via the HackerRank technology competition platform (see footnote 3).
2 AWS: http://www.aws.amazon.com
3 HackerRank: https://www.hackerrank.com
Skill & Expertise
Competitors preparing to enter TEXATA should review the following topics and skill areas. This list is neither exhaustive nor definitive. TEXATA has a strong industry focus, so don’t be too concerned if you have limited experience with matrix algebra. As long as you have the technical skills to implement big data analytics, and the business understanding to apply them effectively, you will be a strong competitor.
Statistics
Probability theory
Probability distributions
Precision, recall, accuracy measures
A/B(/n) testing experiment design & interpretation
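As a quick refresher on the precision, recall and accuracy measures listed above, here is a minimal Python sketch for a binary classifier. The function name and the example labels are made up for illustration, and returning 0.0 when a denominator is zero is a convention chosen for this sketch rather than a competition rule.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, accuracy) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of true positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: share of all predictions that are correct.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy

# Invented example: 2 true positives, 1 false positive,
# 1 false negative and 4 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, accuracy = classification_metrics(y_true, y_pred)
```

With these labels, precision and recall are both 2/3 while accuracy is 0.75, which shows how accuracy can look better than the other two measures when negatives dominate the data.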
Computer Science
Algorithm description & identification
Linear algebra
Database fundamentals
Map/Reduce program design
Big data system design
Linux command line tools
Machine Learning
Text analytics
Social network analysis
Mobile data analysis
Business Skills
Big data industry awareness
Stakeholder engagement
Communication of results