2. Chapter 2: SQL Learnability
2.5 SQL Content Review
2.5.3 Augmented Use of One Query Language and Database Structure
Data-model/query language is one of the factors that the performance of a database user is influenced by [4, 92, 93]. Some examples, as shown in Table 2.5, that Augmented between one query language and database base structure (ER, network, relational, or hierarchical) model. One of these examples is explored in detail, Chan’s study [92]. The reason is that Chan’s study looked at the cognitive activities that this research focused on, as discussed in section 2.3.
Table 2.5: Summary of the Research Conducted to Augment the Use of Query Language and Database Structure.
Chan’s study
The aim of this study was to conduct an experiment that measures query performance at both the query translation stage and the query writing stage which was discussed by Ogden [59]
Subjects:
20 first year undergraduate students participated in this study.Researcher Experimental nature Results Lochovsky &
Tsichritzis [94]
Query witting with ER or
relational models No difference in the number of sematic errors. ER user were faster to complete the task
Chan [93] Query witting with ER model and KOL or relational model with SQL
Users in ER\KOL perform better(time and query correctness), more confident than in ER\SQL user
Leitheiser &
March [95] Query evaluation & witting with ER or relational models Query learning and using is easier with relational than ER Chan [92] query performance with the
relational model and SQL was measured at two query stages: the query translation and query writing stages
SQL query difficulties (which are all based on the query writing stage). Exploratory analysis of query difficulties show surprises. For example, operations generally perceived to be difficult (such as joins, group count and repeated relations) are not difficult at the query translation stage, i.e. the difficulties are not because of the relational model, but because of SQL.
76
Teaching / Training:
Students were trained by an administrator before theytook the query test. A training manual was used during the study to provide a brief overview of both relational data model and the query language. To improve learning, feedback on query accuracy was given before proceeding to the next example. The period of training was about one hour.
Evaluation:
The task in this study is set at two stages. Query translation is thestage that tests subject’s understanding of the data value representation of the relational model. The second stage requires the users to write down the query syntax. At this stage, the researchers test whether participants can specify the query operations with SQL syntax. Both stages cover the same query questions. Each subject performed seven queries for both stages. The queries covered a comprehensive range from the very simple to the very difficult. The seven chosen queries covered the following semantic specifications: single entity, two entities (of different types) connected by a relationship, attribute condition, two instances of the same type, counting of relationships, quantifiers for WHERE, EXIST and not EXIST.
Results
: The study result showed that:- It is possible to understand the relational model for many operations, but it is difficult to express these operations in SQL.
- Confirm the findings in the literature about applying SQL operation difficulties.
- Before using relational database systems, it is recommended that one need more training on the particular difficulties of the query language, and also the operations that are even difficult at the model level (e.g. sub query with not exist).
- Knowing more about the difficulties in expressing operations in SQL allows educators focus on these aspects of SQL that cause problems for users and thus allow a more focused training for SQL users.
77
Chan’s study helped this research to investigate more about the learner’s ability to perform tasks in the three cognitive activities that were identified by Ogden [59] as was discussed in section 2.3.2. In addition, it motivates this research to investigate the SQL misconception or the most difficult concepts in SQL. Many studies reported that students experience many problems when learning SQL as was discussed earlier. Some of these problems arise from misconceptions in the student's understanding of the elements of SQL and the relational data model in general. For example, students find that join conditions, grouping and restricting grouping are the hardest concepts to understand [96]. The difference between aggregate and scalar functions is another common source of confusion [3]. In addition, Lu et al. [96] carried out a survey on the kinds of SQL statements used by 149 SQL writers from 41 companies. They found an even spread from very simple to very complex queries, with just over a quarter of queries involving 5 or more conditions, and 20% of the queries being classified as complex (involving 5 or more relations, or having more than 6 attributes or more than 5 conditions. Moreover, Lu et al. cite research [57, 97] which shows that the rate of incorrect SQL queries ranges around 75% mark, which is as astonishing as it is unacceptable.
Clearly a failure to develop SQL skills is not merely an academic issue: it has wide-ranging effects and there is a need to find a better way of helping students to really grasp the nuances of SQL. Chapter 5 investigates the factors that might affect SQL learnability from both learner and educators perspectives. It also examines the learners’ skills in solving query problems. The next section gives a summary of the reviewed studies.