Performance Evaluation - Supporting Uncertainty in Standard Database Management Systems

Strictly speaking, we cannot compare the performances between our UDBMS and the conventional DBMSs as the relational data models they are based on are different. But as our UDBMS provides supports for the conventional databases managements, experiments can be conducted to study the overhead (if any) posed by our UDBMS to the conventional queries. More particularly, test databases TDB_A, TDB_B, TDB_C and the test query 1 are chosen for experiments.

Uncertainty associated for the three test databases are removed (and the new test databases are TDB_A‟, TDB_B‟, TDB_C‟) and the test query 1 is modified to a conventional query, called the test query 0, as follows:

SELECT L_EXTENDEDPRICE, L_DISCOUNT, L_ORDERKEY,

L_SHIPDATE

FROM CUSTOMER, ORDERS, LINEITEM, NATION, REGION

WHERE O_ORDERKEY = L_ORDERKEY AND R_NAME =”Asia” AND

R_REGIONKEY = N_REGIONKEY AND C_NATIONKEY = N_NATIONKEY AND C_CUSTKEY = O_CUSTKEY AND

O_ORDERDATE >= DATE „1995/08/01” AND O_ORDERDATE <= DATE „1995/12/31‟;

Table 3 and Figure 26 below show the experiment results of evaluating the test query 0 on the 3 test databases in both the UDBMS and the MySQL DBMS. Although our

UDBMS does the additional query validation, parsing and analysis, little overhead

(about 5%) is posed compared to the conventional DBMSs for the conventional queries. TDB_A‟ TDB_ B‟ TDB_C‟ Number of result tuples 77184 151911 240400 Evaluation Time

(in sec) in UDBMS

12.97 36.52 91.34 Evaluation Time

(in sec) in MySQL

11.73 33.47 82.88

Table 3. Evaluating conventional query in UDBMS/ MySQL DBMS

Figure 26. Evaluating conventional query in UDBMS/ MySQL DBMS

0 10 20 30 40 50 60 70 80 90 100 0.5 1.5 2.5 3.5 4.5 Ev al u ation Ti m e ( in se c)

Database Size (in GB)

UDBMS MySQL DBMS

in the naïve approach (discussed in Section 5.3.1), exclusive queries, nested queries and queries with the “WITH RE”, “(MAX|MIN) RE”, “ORDER BY RE”, “RE (<|<=|=|>=|>) <value>” keywords are therefore cannot be evaluated in the naïve

UDBMS (UDBMS that is constructed by following the naïve approach).

As for our test queries set, only the 1st test query can be evaluated and the experiment results of evaluating the test query 1 in the naïve UDBMS are shown in Table 4.

TDB_A TDB_B TDB_C TDB_D TDB_E TDB_F Evaluation Time (in

sec) on the original test databases

245.28 632.12 2145.28 273.07 692.53 2535.85 Table 4. Evaluating test query 1 in the naïve UDBMS

In order to compare the performance differences between the naïve UDBMS and the improved UDBMS (UDBMS that is constructed by following the improved approach), we present the experiment results of running the test query 1 in the improved UDBMS in Table 5. We also present the experiment results of running the test query 1 on the 6 test databases when the information sources associated with the test databases are fully reliable sources (with reliability 1.0) in Table 5.

TDB_A TDB_B TDB_C TDB_D TDB_E TDB_F Evaluation Time (in sec)

on the original test databases

19.63 41.56 102.36 29.51 56.75 115.45 Evaluation Time (in sec)

on test databases with fully reliable sources

18.39 43.87 102.85 31.12 55.64 116.59 Table 5. Evaluating test query 1 in the improved UDBMS

Figure 27. Performance differences between the naïve and the improved UDBMSs Studies of Figure 27, which graphically shows the performance differences between the naïve and the improved UDBMSs, tell us that the performance of the UDBMSs do not rely on the reliability values of the information sources and the performance of the improved UDBMS is far better than the naïve UDBMS in two ways:

1. The evaluation time required by the improved UDBMS is less than the evaluation time required by the naïve UDBMS for the same test query and test database. For instance, the evaluation time (245.28 seconds) of the test query 1 on the test database TDB_A in the naïve UDBMS is almost 10 times over the evaluation time (25.56 seconds) of the test query 1 on the test database TDB_A in the improved UDBMS;

2. The evaluation time required by the improved UDBMS for the same test query increases more slowly than the naïve UDBMS when the size of the database and the number of the associated information sources increase. For instance, while the evaluation time for a test query on a 3.6GB test database is 9 times over the evaluation time for the same test query on a 1GB test database in the

0 500 1000 1500 2000 2500 0.5 1.5 2.5 3.5 Ev al u ation Ti m e ( in se c)

Database Size (in GB)

Naïve approach(S2) Improved approach(S2) Naïve approach(S10) improved approach(S10) improved approach(S2- 1.0) improved approach(S10-1.0)

naïve UDBMS, the time only increases 6 times over the 1GB test database‟s case in the improved UDBMS;

Reasons for these huge performance differences between the naïve and the improved

UDBMS, as we have discussed in Section 5.3, lie in the fact the time-consuming

“ISTtuple_group” operation is not longer required by the improved UDBMS for the HashSet structure that is used to internally represent the data sets guarantees duplicate-free collection of the elements it keeps.

In what follows, we study the performance of the improved UDBMS in a more detailed level.

If we regard the experiment results of evaluating the test query 0 as the basis for comparison, we can then study the performance of the source vectors‟ calculations by evaluating the test query 1 in the system, we can study the performance of the reliability calculations by evaluating the test query 2 in the system, we can study the performance of the “ISTtuple_sort” operation by evaluating the test query 3 in the system, and we can study the largest overhead that may be posed by our improved

UDBMS to the USQL queries by evaluating the test query 4 in the system.

Table 6 below shows the experiment results of running the test queries 1 to 4 in the improved UDBMS (we list the experiment results of running test query 0 in Table 5 again for comparison purpose).

Query No. TDB_A TDB_B TDB_C TDB_D TDB_E TDB_F 0 11.73 33.47 82.88 11.73 33.47 82.88 1 15.14 41.56 102.36 29.51 56.75 115.45 2 19.63 47.79 108.21 42.41 76.66 120.47 3 21.34 50.25 116.54 49.45 85.05 122.82 4 23.54 53.74 120.36 54.84 88.92 131.61 Table 6. Experiment results of running test queries 1 to 4 in the improved UDBMS Graphically comparing the experiment results of running test queries 0 to 4 in the improved UDBMS on the test databases with 2 information sources and 10 information sources, we have:

Figure 28. Experiment results of running test queries 0 to 4 in the improved UDBMS on test databases with 2 information sources

0 20 40 60 80 100 120 140 0.5 1 1.5 2 2.5 3 3.5 4 Ev al u ati on Ti m e (i n sec )

Database Size (in GB)

TQ0 TQ1 TQ2 TQ3 TQ4

Figure 29. Experiment results of running test queries 0 to 4 in the improved UDBMS on test databases with 10 information sources

Analysis of the Figures 28 and 29 tell us that:

1. As we have to calculate the source vectors associated with the result tuples to calculate their reliability values and we have to sort the result tuples by their reliability values to find the data tuples with the largest reliability value, more time is required to evaluate the test query 4 than the test queries 3, 2, 1, and 0. But the calculations of the source vectors associated with the result tuples, the calculations of the reliability values for the result tuples, and the “ISTtuple_sort” operation all reach high performance so that the time required to evaluate the test query 4 only increases by 50% over the time required to evaluate the test query 0 in the test database with 3.6 GB size and 10 information sources;

2. More time is required to evaluate the queries on the test databases with more information sources, but the increment of the time is not so obvious, for

0 20 40 60 80 100 120 140 0.5 1 1.5 2 2.5 3 3.5 4 Ev al u ati on Ti m e (i n sec )

Database Size (in GB)

TQ0 TQ1 TQ2 TQ3 TQ4

instance, the time required to evaluate the test query 4 in the improved

UDBMS on the test databases TDB_F (test database with 10 information

sources) is only 10% more over the time required on the test databases TDB_C;

Experiment results of running test queries 5, 6 and 7 in the improved UDBMS are shown in Table 7 as below.

Query No. TDB_A TDB_B TDB_C TDB_D TDB_E TDB_F 5 10.76 23.49 35.53 17.93 29.58 43.71 6 6.89 74.05 143.19 9.66 102.54 173.83 7 45.28 93.06 143.89 68.39 109.89 159.01 Table 7. Experiment results of running test queries 5 to 7 in the improved UDBMS Although the time required by the UDBMS to evaluate these queries is longer than the time required by the UDBMS to evaluate the simple queries on average, supports for the exclusive and the nested queries are quite good in the improved UDBMS. From what we can see in Figure 30, the time required to evaluate the nested and the exclusive queries grows linearly with respect to the number of the tuples and information sources.

Figure 30. Experiment results of running test queries 5 to 7 in the improved UDBMS Finally, efforts are paid to study the scalability of our UDBMS. As large amount of disk I/Os are required to read data from the databases when the they cannot be loaded completely into the main memory and the “add” operation for the HashSet becomes extremely expensive when the operating system could not cache the hash codes for all the elements of the HashSet in the main memory, we believe that our proposed

UDBMS can only be used to answer queries on the databases (relations) when they

can be loaded completely to the main memory.

Experiments are conducted accordingly to study the performance of the improved

UDBMS when the test databases cannot be loaded completely into the main memory.

Test database TDB_G is chosen, and the test queries are evaluated in the improved

UDBMS on TDB_G. Unfortunately we fail to measure the time required for the

improved UDBMS to evaluate the test queries as our system terminates when the operating system runs out of memory.

0 20 40 60 80 100 120 140 160 180 200 0.5 1 1.5 2 2.5 3 3.5 4 Ev al u ati on Ti m e (i n sec )

Database Size (in GB)

TQ5(S2) TQ5(S10) TQ6(S2) TQ6(S10) TQ7(S2) TQ7(S10)

To sum up, we list the experiment results of running the 7 test queries in the improved

UDBMS on the test databases in Table 7 and Figure 31 and we present our

observations as follows.

TDB_A TDB_B TDB_C TDB_D TDB_E TDB_F TDB_G SUM

(in sec) 142.58 383.94 770.08 272.19 549.39 866.90 UN-KNOWN Table 8. Evaluating all test queries in the improved UDBMS

Figure 31. Evaluating all test queries in the improved UDBMS

Our observation is less than conclusive, but these experiment results convinced that although more time is required to evaluate the queries when there are more data tuples and information sources to consider, the time required to evaluate the queries increases linearly as long as the databases (relations) can be loaded completely to the main memory.

Chapter 7 Conclusion and Future Research

Our motivation in this work laid in the lack of efficient uncertain database management systems in the database and the artificial intelligence fields. Our goal was to primarily develop an efficient database management system for manipulation with uncertain data.

To reach our goal, we classified the two existing approaches to build DBMS for uncertain data, we explained their differences and we justified why we choose the “light weight” way to construct the new system. We reviewed the IST formulism proposed by Sadri, and we extended SQL to USQL to define the transaction rules in the new UDBMS. New query parsing, analyzing and evaluation techniques were introduced, and for the purpose of further enhancing the evaluation efficiency, different data storing mechanisms and algorithms were proposed.

Two UDBMS prototypes were built and a bunch of experiments were conducted to study the performance of our proposed techniques. Our experiment results show that the improved UDBMS yields faster evaluation than the naïve UDBMS. And although more time is required by the improved UDBMS to evaluate the queries on bigger databases and databases with more information sources, the time required increases linearly.

Despite the facts that our UDBMS is based on the IST formulism and it uses MySQL DBMS as its back-end DBMS, the techniques and approaches we have studied in this thesis can be easily modified and adapted to build new uncertain database management systems that are based on other uncertain theories or use other conventional DBMSs as the back-end DBMS.

Note that since the proposed UDBMS is an in-memory, single-user system, supports for large size databases are inadequate. As databases can get really huge and usually they cannot be loaded completely in the main memory, future researches can be dedicated to study the problem of efficient evaluating queries on large size databases. Another research direction is to provide full supports for the SQL features in the

USQL. Until now, our UDBMS only provides portions of the supports for the

uncertain data, thus the future works can be done to provide full supports for the “LIKE” operator, the “index” mechanisms in the USQL language.

References

[Adar 07] Eytan Adar and Christopher Re. “Managing Uncertainty in Social Networks”. In: IEEE Computer Society Technical Committee on Data

Engineering (TCDE), 2007.

[Agra 06] P. Agrawal, O. Benjelloun, A. Das Sarma, C. Hayworth, S. Nabar, T. Sugihara, and J. Widom. “TRIO: A System for Data, Uncertainty and Lineage”. In: Proc. 32nd Intl. Conf. on Very Large Data Bases (VLDB), pages 1151-1154, 2006.

[Agar 09] Charu C. Agarwal, and Philip S. Yu. “A survey of Uncertain Data Algorithms and Applications”. In: IEEE Transactions on Knowledge and

Data Engineering (TKDE), Vol. 21, No. 5, 2009.

[Antlr] “ANTLR Homepage”. http://www.antlr.org.

[Anto 07a] L. Antova, C. Koch, and D. Olteanu. “From Complete to Incomplete Information and Back”. In: Proc. of ACM SIGMOD Int’l. conf. on

Management of data, 2007.

[Anto 07b] L. Antova, C. Koch, and D. Olteanu. “10^(10^6) Words and Beyond: Efficient Representation and Processing of Incomplete Information”. In:

Proc. 23rd IEEE Int’l Conf. Data Eng. (ICDE), 2007.

Relational Processing of Uncertain Data”. In: Proc. 24th IEEE Int’l Conf.

Data Eng. (ICDE), 2008.

[Bell 99] S. Bell. “A Beginner's Guide to Uncertainty of Measurement”. In:

Measurement Good Practice Guide, No. 11, Issue 2, 2001.

[Benj 06] O. Benjelloun, A. Das Sarma, A. Halevy, and J. Widom. “ULDBs: Databases with Uncertainty and Lineage”. In: Proc. 32nd Int’l Conf.

Very Large Data Bases (VLDB), 2006.

[Benj 08] O. Benjelloun, A. Das Sarma, A. Halevy, M. Theobald, and J. Widom. “Databases with Uncertainty and Lineage”, In: Int’l Conf. on Very Large

Data Bases (VLDB), Vol. 17, Issue 2, 2008.

[Bos 02] Kees van den Bos and E. Allan Lind. “Uncertainty Management by Means of Fairness Judgments”. In: Advances in Experimental Social

Psychology, Vol.34, pp. 1-60, 2002.

[Chen 03] R. Cheng, Dmitri V. Kalashnikov, and S. Prabhakar. “Evaluating Probabilistic Queries over Imprecise Data”. In: Proc. of the 2003 ACM

SIGMOD Int’l Conf. on Management of Data, 2003.

[Das 92] Amar K. Das, Samson W. Tu, Gretchen P. Purcell, and Mark A. Musen. “An Extended SQL for Temporal Data Management in Clinical Decision-Support Systems”. In: Proc. of the Annual Symposium on

[Dey 97] D. Dey, and S. Sarkar. “Extended SQL Support for Uncertain Data”, In:

Proc. of the 16th Int’l Conf. on Conceptual Modeling (ER’97), 1997.

[Egeh 94] Max. J. Egehofer. “Spatial SQL: A Query and Presentation Language”. In: IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol. 6, 1994.

[Gare 79] M. R. Garey, and D. S. Johnson. “Computers and Intractability, A Guide to the Theory of NP-Completeness”. Computers and Intractability; A

Guide to the Theory of NP-Completeness, W. H. Freeman & Co.,

ISBN:0716710455, 1979.

[Ge 09] Tingjian Ge. “Query Processing on Uncertain Data”, Ph.D. Thesis, Brown University, 2009.

[Gotz 09] M. Gotz, and C. Koch. “A Compositional Framework for Complex Queries over Uncertain Data”. In: '09 Proceedings of the 12th Int’l Conf.

on Database Theory (ICDT), pp. 149-161, 2009.

[Heuv 07] Gerard Heuvelink. “Error-aware GIS at work: real-world applications of the Data Uncertainty Engine”. In: Int’l Workshop on Uncertainty in

Environmental Modeling, 2007.

[Kian 05] A. Kiani, and N. Shiri. “A Framework for Information Integration with Uncertainty”. In: Lecture Notes in Computer Science, Vol. 3563/2005, 194-206, DOI: 10.1007/11533962_17, 2005.

[KieB 02] W. KieBling, and G. Kostler. “Preference SQL – Design, Implementation, Experiences”. In: Proc. of the 28th Int’l Conf. Very

Large Data Bases (VLDB), 2002.

[Laks 01] Laks V.S. Lakshmanan and Nematollaah Shiri, "Logic Programming and Deductive Databases with Uncertainty: A Survey". In: Encyclopedia of

Computer Science and Technology, Vol. 45, pp 155-176, Marcel Dekker,

Inc., 2001.

[Lore 97] Nikos A. Lorentzos, and Yannis G. Mitsopoulos. “SQL Extension for Interval Data”. In: IEEE Transactions on Knowledge and Data

Engineering (TKDE), Vol. 9, No. 3, 1997.

[Mano 01] I. Manolescu, D. Florescu, and D. Kossmann. “Answering XML Queries over Heterogeneous Data Sources”. In: Proc. of the 27th Int’l Conf. Very

Large Data Bases (VLDB), 2001.

[Motr 94] A. Motro. “Management of uncertainty in database systems”. Modern

database systems: The Object Model, Interoperability, and Beyond,

1994.

[Motw 03] R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar, G. Manku, C. Olston, J. Rosenstein, and R. Varma. “Query Processing, Resource Management, and Approximation in a Data Stream Management System”. In: Proc. of the Conf. on Innovative Data Systems

Research (CIDR), 2003.

[Orion] “The Orion 2.0 Documentation”. http://orion.cs.purdue.edu/doc.html. [Pars 96] S. Parsons. “Current Approaches to Handling Imperfect Information in

Data and Knowledge Bases”. In: IEEE Transactions on Knowledge and

Data Engineering (TKDE), 8(3):353--372, 1996.

[Pira 92] H. Pirahesh, Joseph M. Hellerstein, and W. Hasan. “Extensible/Rule Based Query Rewrite Optimization in Starburst”. In: Proc. of ACM

SIGMOD Int’l. conf. on Management of data, 1992.

[Riez 03] S. Riezler, and Yi Liu. “Query Rewriting using Monolingual Statistical Machine Translation”. In: Journal of Computational Linguistics, Vol. 36, No. 3, 2003.

[Sadr 91a] F. Sadri. “Reliablity of Answers to Queries in Relational Database”. In:

IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol. 3,

No.2, pp. 245-251, 1991.

[Sadr 91b] F. Sadri. “Modeling Uncertainty in Databases”. In: 7th Int’l Conf. on

Data Engineering (ICDE), pp. 122-131, 1991.

[Sadr 94] F. Sadri. “Aggregate Opertaions in the Information Source Tacking Method”. In: Theoretical Computer Science. Vol. 133, Issue 2, pp. 421-442, 1994.

[Sadr 95] F. Sadri. “Integrity Constraints in the Information Source Tracking Method”. In: IEEE Transactions on Knowledge and Data Engineering

(TKDE), Vol. 7, No. 1, 1995.

[Sard 90] Nandlal L. Sarda. “Extensions to SQL for Historical Databases”. In:

IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol. 2,

No. 2, 1990.

[Sarm 06] A. Das Sarma, O. Benjelloun, A. Halevy, and J. Widom. “Working Model for Uncertain Data”. In: Proc of 22nd IEEE Int’l Conf. on Data

Engineering (ICDE), 2006.

[Sarm 09] Anish Das Sarma. “Managing Uncertainty Data”. Ph.D. Thesis, Stanford University, 2009.

[Shir 04] N. Shiri and Z.H. Zheng. “Challenges in Fixpoint Computation with Multisets". In Proc. of 3rd Int’l Symposium on Foundations of

Information and Knowledge Systems (FoIKS), pp. 273-290, LNCS 2942,

2004.

[Sing 08] Sarvjeet Singh, Chris Mayfield, Sagar Mittal, Sunil Prabhakar, Susanne Hambrusch, Rahul Shah. “Orion 2.0: Native Support for Uncertain Data”. In: Proc. of ACM SIGMOD Int’l. conf. on Management of data, 2008.

a Temporal Extension of SQL”. In: 19th Int’l Conf. on Data Engineering

(ICDE), ISBN: 0-7803-7665-X, 2003.

[Yan 01] Qi Yang, Weining Zhang, Jing Wu, H. Nakajima, and Naphtali D. Rishe. “Efficient Processing of Nested Fuzzy SQL queries in a Fuzzy Database”. In: IEEE Transactions on Knowledge and Data Engineering

(TKDE), Vol. 13, No. 6, 2001.

[Yao 03] Yong Yao, and J. Gehrke. “Query Processing for Sensor Networks”, In

Proc. of the Conf. on Innovative Data Systems Research (CIDR), 2003.

[Yu 04] Cong Yu, and L. Popa. “Constraint-Based XML Query Rewriting for Data Integration”. In: Proc. of the ACM SIGMOD Int’l Conf. on

Management of Data, 2004.

[Zade 65] L. Zadeh. “Fuzzy sets”. In: Information and Control, Vol. 8, pp. 338–353, 1965.

[Zhan 08] Wenjie Zhang, Xuemin Lin, Jian Pei, and Ying Zhang. “Managing Uncertain Data: Probabilistic Approaches”. In Proc. of the 9th Int’l Conf. on Web-Age Information Management, 2008.

[Zhen 04] Zhi hong Zheng, “On Efficient Fixpoint Computation of Deductive Databases with Uncertainty”. Master’s Thesis, Concordia University, 2004.

Appendix

The language grammars used for validating the input USQL commands are shown as follow:

createtable

: 'CREATE' 'TABLE' IDENTIFIER

'(' IDENTIFIER type (notnullconstraint | defaultconstraint)? (auto_increasement)? (',' IDENTIFIER type (notnullconstraint | defaultconstraint)?

(id11=auto_increasement)? )* ',' constraint (',' constraint)* ')' ';'

;

insert

: 'INSERT' 'INTO' IDENTIFIER

('(' IDENTIFIER (',' IDENTIFIER)* ')')? 'VALUES' '(' expression (',' expression)* ')' ('WITH' 'SV' '(' expression (',' expression)* ')')? ';' ;

update

: 'UPDATE' IDENTIFIER

'SET' IDENTIFIER '=' expression (',' IDENTIFIER '=' expression)*

('WHERE' expression)? ';' | 'UPDATE' IDENTIFIER

'SET' 'SV' '=' '\"' expression (',' expression )* '\"' ('WHERE' expression)? ';'

;

alter

: 'ALTER' 'TABLE' IDENTIFIER ('RENAME' IDENTIFIER

| 'MODIFY' 'COLUMN' IDENTIFIER type | 'CHANGE' IDENTIFIER IDENTIFIER type

| 'ADD' 'COLUMN' IDENTIFIER type ('AFTER' IDENTIFIER | 'FIRST') | 'ADD' 'COLUMN' '(' (','IDENTIFIER type)* ')'

| 'ADD' constraint

; constraint : primarykeyconstraint | uniqueconstraint | foreignkeyconstraint | checkconstraint

| 'CONSTRAINT' IDENTIFIER primarykeycombo | 'CONSTRAINT' IDENTIFIER uniquecombo | 'CONSTRAINT' IDENTIFIER checkcombo ;

primarykeyconstraint

: 'PRIMARY' 'KEY' '(' IDENTIFIER ')' ;

uniqueconstraint

: 'UNIQUE' '(' IDENTIFIER (',' IDENTIFIER)* ')' ;

foreignkeyconstraint

: 'FOREIGN' 'KEY' '(' IDENTIFIER ')'

'REFERENCE' IDENTIFIER '(' IDENTIFIER ')' ;

checkconstraint

;

primarykeycombo

: 'PRIMARY' 'KEY' '(' IDENTIFIER (',' IDENTIFIER)* ')' ;

uniquecombo

: 'UNIQUE' '(' IDENTIFIER (',' IDENTIFIER)* ')' ;

checkcombo

: 'CHECK' '(' expression (',' expression)* ')' ;

selection

: query_clause

(('UNION' ('ALL')? | 'INTERSECT' | 'MINUS') query_clause)* ';'

;

query_clause

: 'SELECT' expression (with_clause)? 'FROM' expression

('WHERE' expression)? ('ORDER BY RE')?

| 'DATE' '\'' I_NUMBER '/' I_NUMBER '/' I_NUMBER '\'' ;

unary

: ('-'| '+' )* term ;

mult

: unary (('*'| '/'| 'MOD') unary)* ;

: mult (('+'| '-') mult)* ; relation : add ( '=' add | '<>' add | '>' add | '>=' add | '<' add | '<=' add)* ; logic : relation ( 'AND' relation | 'OR' relation | 'IN' relation

| 'NOT IN' relation)* ;

expression

: logic ('.' (IDENTIFIER| STAR))? ('AS' IDENTIFIER)? (',' logic ('.' (IDENTIFIER | STAR))? ('AS' IDENTIFIER)?)* ;

STAR: '*';

IDENTIFIER: ('_'|'a'..'z'|'A'..'Z')+ (I_NUMBER)?; WS: (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;}; I_NUMBER: '0'..'9'+;

In document Supporting Uncertainty in Standard Database Management Systems (Page 86-111)