Formula Methods in Excel
Optimising calculations in Excel workbooks
This Excel formula manual is suitable for Excel users of all levels. Rather than focusing only on individual functions and formula methods, this course takes a deeper look at how Excel evaluates formulae, and concentrates on the most efficient methods available.
Jon von der Heyden 3/23/2011
Formula Methods in Excel © Jon von der Heyden 2011 Page 1
System Requirements
At the time of writing, the latest version of Microsoft Excel for Windows is Excel 2010 (Office version 14). This document is written specifically for Office versions for Windows PCs.
About Excel Design Solutions
There has been much debate amongst the professionals who frequent the Excel forums about what makes a true Excel modeller/developer. Some suggest that a thorough knowledge of Excel’s rich features and functionality is unnecessary, favouring business skills and experience. Some even suggest that having little or no VBA programming experience is acceptable too. Others have suggested that all one requires is technical skill and experience, and that it is down to the client to communicate the requirements.
We at Excel Design Solutions believe that a true professional Excel modeller/developer must have both exceptional technical Excel knowledge and exceptional business acumen. That is why each of our consultants participates in the various Excel web forums, on an endless quest to improve our knowledge by addressing other users’ and developers’ challenges. Each of our consultants has worked in business for many years and is established as a business expert in their chosen field. In fact, forum participation and a background in business are requirements for any individual seeking opportunities within Excel Design Solutions.
We don’t have a large employee base. Whilst we do work directly on projects, we also approach known Excel and business experts to collaborate on our assignments on a per-project basis. Being so directly involved in the forums and the Excel community, we have established relationships with the best in the field, and we collaborate with these individuals on an as-needed basis.
For more information on what Excel Design Solutions can do for you, or to get in touch with someone at Excel Design Solutions, visit the website: www.exceldesignsolutions.com
About Jon von der Heyden (The Author)
Jon is one of the co-founders of Excel Design Solutions, founded in 2007. He has over ten years’ experience in finance analysis and commercial management positions. His speciality is management accounting and he relishes complex financial modelling assignments. Jon initially pursued a career in IT, having studied web-design and E-commerce, but was later ‘nudged’ toward finance when working for a large UK telecoms company back in 2000. Although not a qualified management accountant, this subject interests Jon most and he has spent much time tutoring many CIMA graduates by teaching the practical applications of the many management accounting methodologies using Excel.
Jon has spent many of his years working on reorganisation projects as a senior analyst. He specialises in cost analysis, activity-based costing and cost improvement. Achieving cost improvement has often led Jon into the various business operations, giving him valuable insight into the business functions. Process improvement and automation have been the key to Jon’s successes. Jon has also been involved in plenty of other projects, including outsourcing, supply chain management and revenue-generating projects.
Jon’s most recent experience as a company employee was working in shared services for an international multi-conglomerate, where he acquired five years’ international experience controlling cost opportunity projects and playing an integral role in the implementation of the shared services global product catalogue and in efficiencies in service delivery and financial planning.
Table of Contents
System Requirements
About Excel Design Solutions
About Jon von der Heyden (The Author)
Index of Tables
Introduction
1. Back to Basics
Basic Anatomy of an Expression
Translating an Expression into an Excel Formula
Statistical Notations and Worksheet Functions
(Capital) Sigma, Σ
SUM and SUMPRODUCT
X bar, X̄
AVERAGE
Introduction to Excel Formula
Basic Anatomy of an Excel Formula
Operators
Calculation Order and Operator Precedence
Cell Referencing
3-D References
Union References
Intersecting Ranges
Reference Notation
Defined Names
Array (CSE) formulae
Array Constants
2. How the Excel Recalculation Engine Works
Dependency Trees
Volatile Functions
Events that Trigger Recalculation
Calculation Methods
3. Data Types, Interpretation and Precision
Data Types
Numbers
Errors
Text
Floating Point-Precision
Loss of Precision When Using Very Large Numbers
Loss of Precision When Using Very Small Numbers
Boolean Logic
Coercion
AND Logic
OR Logic
Date and Time Values
4. Introducing Worksheet Functions
Data Type Conformity
Nested Worksheet Functions
Optional Arguments
Logical and Information Functions
AND()
OR()
NOT()
ISBLANK()
ISNA()
IF()
Lookup Functions
LOOKUP()
MATCH()
VLOOKUP()
HLOOKUP()
INDEX()
CHOOSE()
Further Lookup Tips
Binary Search versus Linear Search
Math and Statistical Functions
ROUND()
MROUND()
ROUNDUP()
ROUNDDOWN()
FLOOR()
INT()
MOD()
MAX()
MIN()
LARGE()
SMALL()
SUMPRODUCT()
COUNTIF()
SUMIF()
COUNTIFS()
SUMIFS()
Text Functions
TRIM()
LEN()
REPLACE()
SUBSTITUTE()
MID()
LEFT()
RIGHT()
FIND()
SEARCH()
EXACT()
Date Functions
DATE()
EDATE()
EOMONTH()
DATEDIF()
WEEKNUM()
NETWORKDAYS()
WORKDAY()
Database Functions
DSUM()
DAVERAGE()
DCOUNT()
DGET()
DMAX()
DMIN()
Database Function Examples
5. Dynamic Named Ranges
When to Use Dynamic Named Ranges
One-Dimensional Dynamic Range
Dynamic Ranges – Numbers Only
Dynamic Ranges – Text Only
Multi-Dimensional Dynamic Ranges
6. Using Tables
7. Auditing Formula
8. Funky formulae
Get the Month Number of a Financial Year
Get the Week Number of a Financial Year
Repeat Each Item in a Table n Times
Repeat a Table n Times
Get the nth Element from a String based on a given Delimiter
3-Dimensional SUMIF
Multi-Criteria Lookups
Vlookup returning Multiple Results
Variable Discounting using Differential Rates
Extract Numbers from an Alpha-numeric String
Extract a Date from a Text String
Calculate the Last Used Row in a Column (useful for Dynamic Ranges)
Locate a Break-Even Point
9. Shortcuts
Control Keys
Function Keys
Index of Tables
Table 1-1 Summing the X and Y values separately
Table 1-2 Summing the XY products
Table 1-3 Summing the X and Y values separately using SUM
Table 1-4 Summing the XY products using SUMPRODUCT
Table 1-5 Arithmetic Operators
Table 1-6 Comparison Operators
Table 1-7 Text Operators
Table 1-8 Reference Operators
Table 1-9 Wildcard Operators
Table 1-10 Operator Precedence
Table 1-11 Using parentheses to change calculation order
Table 1-12 Aggregating unioned references
Table 1-13 Aggregating intersecting references
Table 1-14 R1C1 Notation
Table 1-15 Demonstrating name scope recognition
Table 1-16 Aggregating an Inline Array Constant
Table 1-17 Aggregating an Array
Table 1-18 An Array Entered Formula
Table 2-1 List of Strictly Volatile Functions
Table 2-2 Recalculation Event Triggers
Table 3-1 List of error types
Table 3-2 Example loss of precision when using very large numbers
Table 3-3 Example loss of precision when using very small numbers
Table 3-4 Coercing boolean values to digital values
Table 3-5 Coercing an array of boolean values to an array of digital values
Table 3-6 AND Logic Truth Table
Table 3-7 OR Logic Truth Table
Table 4-1 Basic anatomy of a worksheet function
Table 4-2 Demonstrating the distinct advantage of using SUM over a classic addition expression
Table 4-3 VLOOKUP, exact match and approximate match syntax
Table 4-4 Demonstrating nested worksheet functions within a formula
Table 4-5 Boolean logic, multiplying logical tests to avoid function calls and evaluation steps
Table 4-6 Performing a right-to-left lookup with INDEX and MATCH
Table 4-7 Yielding an intersecting range using INDEX
Table 4-8 Yielding a range using INDEX to return a range operand
Table 4-9 Handling lookup error values
Table 4-10 Rounding to the nearest desired multiple using ROUND
Table 4-11 Rounding up to the nearest desired multiple using CEILING
Table 4-12 Extracting a date from a date and time stamp
Table 4-13 Extracting the time from a date and time stamp
Table 4-14 Summing the nth item in an array using MOD; a stepped approach
Table 4-15 Using MIN and MAX to avoid IF function calls
Table 4-16 Summing the top n values in an array using SUM and LARGE
Table 4-17 Sum or Count a range using multiple criteria with SUMPRODUCT
Table 4-19 Sum values in a range based on multiple criteria in the same criteria range
Table 4-20 Summing values that correspond to empty cells using SUMIF
Table 4-21 Summing cells that correspond to non-empty cells using SUMIF
Table 4-22 Sum values between two dates using SUMIF
Table 4-23 Offsetting the sum range in SUMIF
Table 4-24 Dropping leading characters with MID and REPLACE
Table 4-25 Return a serial date exactly n months before or after a specified date
Table 4-26 Return the 1st and last day of the month of a given date
Table 4-27 DATEDIF interval values
Table 4-28 Aggregating results with D Functions with a single criterion
Table 4-29 Aggregating results with D Functions using multiple criteria (OR logic)
Table 4-30 Aggregating results with D Functions using multiple criteria (AND logic)
Table 5-1 Dynamic Table of Holiday Dates
Introduction
This material really is intended for anybody. Even the more advanced users are unlikely to know 60% of this material.
There are only two mandatory criteria for candidates:
1. He or she must want to learn Excel.
2. He or she must really want to learn Excel!
This material focuses on formula methods exclusively. Why? Because this is where 90% (or more) of models go wrong! Formulae are probably the single most powerful feature Excel offers, and the one on which outputs depend most heavily.
And let’s face it…Excel is huge! You could spend 2 hours a day studying Excel for a year and you still won’t scratch the surface.
All studying Excel has ever done for me is reveal how much more there is to explore, and give me a hunger to learn more.
This material starts with a gentle stroll as we explore some of the basics of formulae and understand how Excel interprets formulae and computes the results. By the end we will be exploring complex expressions, nesting functions, using array formulae, names, dynamic ranges, tables and all sorts of other exciting stuff! For now, let us just assume EXCEL CAN DO ANYTHING (except make toast!).
1. Back to Basics
Let us start by asking, what is a formula? A formula, in Excel, is an expression entered into a range or name that is recognised by Excel such that it can be processed by its calculation engine to produce a result.
Basic Anatomy of an Expression
Example: 3X² - 4X + 5XY + 3X
Term: There are four terms in the given expression. They are, respectively, 3X²; -4X; 5XY; 3X.
Sign: The sign of a term is whether it is positive or negative. Only the second of these four terms is negative. When we write a positive term on its own we don’t bother to write the ‘+’ sign before it.
Term Type: This refers only to the part of the term that is written in letters. Thus, the first term of this expression is an ‘X-squared’ term, the second term is an ‘X’ term, the third an ‘XY’ term and the fourth and last is an ‘X’ term.
Coefficients: The coefficient of a term is the number at the front of it. The coefficient tells us how many of each term type there are.
Like Term: When term types are the same they are known as ‘like terms’. In this example ‘-4X’ and ‘3X’ are ‘like terms’. The phrase ‘collecting like terms’ refers to the process of combining like terms into a single term. For example, collecting ‘-4X’ and ‘3X’ gives the single term ‘-X’ (note that the coefficient is not written; an omitted coefficient is always assumed to be 1).
Translating an Expression into an Excel Formula
Example: = 3*A1^2 - 4*A1 + 5*A1*B1 + 3*A1
In this example we have replaced the letter ‘X’ with cell reference A1, and the letter ‘Y’ with cell reference B1. The only way to tell Excel that an entry in a cell is an expression, and that it is to be passed to its calculation engine for processing, is to prefix the expression with an equals symbol (=) or a unary symbol (+ or -). The former is more commonly used and recommended.
Excel demands that we be much more explicit when describing an expression. For instance, we know from the previous example that the term ‘3X²’ means that there are 3 ‘X-squared’ terms. In Excel, we need to state both the multiplication and the exponentiation explicitly, hence ‘3*A1^2’.
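As a rough cross-check outside Excel, the translated formula can be written as a short Python sketch. The function names below are hypothetical, x stands in for cell A1 and y for cell B1, and Python writes exponentiation as ** where Excel uses ^. The second function is the same expression after collecting the like terms, so both should agree for any inputs.

```python
def expression(x, y):
    # = 3*A1^2 - 4*A1 + 5*A1*B1 + 3*A1, with x standing in for A1 and y for B1
    return 3 * x**2 - 4 * x + 5 * x * y + 3 * x


def collected(x, y):
    # The same expression after collecting the like terms -4X and 3X into -X
    return 3 * x**2 - x + 5 * x * y


# Both forms give the same result for every pair of sample values.
for x, y in [(0, 0), (1, 2), (-3, 4), (10, -7)]:
    assert expression(x, y) == collected(x, y)

print(expression(1, 2))  # 12
```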
Statistical Notations and Worksheet Functions
The use of the term ‘notation’ in the following context needs clarification. In statistics, notations might refer to symbols used to represent an instruction on how to process a term. Let us explore two common notations used in statistics.
(Capital) Sigma, Σ
The first most common symbol in expressions is the Greek letter capital sigma, written as ‘Σ’. This is not to be confused with the lower case Greek letter sigma ‘σ’, which denotes a measure of spread called the ‘standard deviation’. The sigma we refer to, Σ, is an instruction to add a set of numbers together. So, ΣX means to ‘add together all of the X values’. Similarly, ΣXY means ‘add together all of the XY products’. For example:
X    Y
0    -4
1    1
2    1
3    3
4    2
ΣX = 0 + 1 + 2 + 3 + 4 = 10
ΣY = -4 + 1 + 1 + 3 + 2 = 3
Table 1-1 Summing the X and Y values separately
To find ΣXY, it is necessary to calculate all of the five separate products of X times Y and then add them together, thus;
X    Y    XY
0    -4   0
1    1    1
2    1    2
3    3    9
4    2    8
ΣXY = 0 + 1 + 2 + 9 + 8 = 20
Table 1-2 Summing the XY products
SUM and SUMPRODUCT
The notations used in expressions are not available to us in Excel formulae; that is, Excel cannot interpret these symbols and the anatomy of these expressions. Instead, we pass instructions to Excel using Worksheet Functions.
The instruction ΣX, meaning ‘add together all of the X values’, is passed to Excel using the ‘SUM’ worksheet function. For example:
     A                   B
1    X                   Y
2    0                   -4
3    1                   1
4    2                   1
5    3                   3
6    4                   2
7    ΣX = SUM(A2:A6)     ΣY = SUM(B2:B6)
8    = 10                = 3
Table 1-3 Summing the X and Y values separately using SUM
The instruction ΣXY, meaning ‘add together all of the XY products’, is passed to Excel using the ‘SUMPRODUCT’ worksheet function, thus;
     A    B    C
1    X    Y    XY
2    0    -4   0
3    1    1    1
4    2    1    2
5    3    3    9
6    4    2    8
7    ΣXY = SUMPRODUCT(A2:A6,B2:B6)
8    = 20
Table 1-4 Summing the XY products using SUMPRODUCT
Note that we do not need to make any reference to column C.
X bar, X̄
Perhaps the second most common symbol in expressions is the ‘X bar’, represented by the symbol ‘X̄’. This refers to the mean of the X values. A ‘mean’ is the most common form of average, where one adds up the X values and divides the total by the count of the X values; thus it can also be represented by the following expression, where n is the count of the X values:
X̄ = ΣX / n
AVERAGE
Again, Excel is not able to interpret the X bar symbol in an expression. Instead we need to pass the instruction to Excel using the ‘AVERAGE’ worksheet function. Using the preceding examples, the instruction to calculate the mean of the X values can be passed using the following expression:
=AVERAGE(A2:A6)
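For readers who like to verify worksheet results independently, the three functions met so far can be mirrored in a few lines of Python. This is only an illustrative sketch using the X and Y values from the tables above; the sumproduct helper is a hypothetical stand-in written for this example, not a library function.

```python
# X and Y values from the worked tables (cells A2:A6 and B2:B6).
xs = [0, 1, 2, 3, 4]
ys = [-4, 1, 1, 3, 2]


def sumproduct(a, b):
    """Multiply two equally sized lists pairwise and add up the products."""
    if len(a) != len(b):
        raise ValueError("both arrays must be the same size")
    return sum(x * y for x, y in zip(a, b))


sum_x = sum(xs)              # mirrors =SUM(A2:A6)
sum_y = sum(ys)              # mirrors =SUM(B2:B6)
sum_xy = sumproduct(xs, ys)  # mirrors =SUMPRODUCT(A2:A6,B2:B6)
mean_x = sum(xs) / len(xs)   # mirrors =AVERAGE(A2:A6)

print(sum_x, sum_y, sum_xy, mean_x)  # 10 3 20 2.0
```

As in the worksheet version, the products never need to be stored in a helper column; they are computed and summed in one step.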
Introduction to Excel Formula
In the previous section we looked at expressions and how one would translate these expressions into syntax that Excel can interpret. We also introduced a few worksheet functions. Let us now explore the anatomy of a typical Excel formula, with an embedded worksheet function, using the appropriate Excel terminology.
Basic Anatomy of an Excel Formula
A formula can contain any or all of the following: [worksheet] functions, references, operators and constants.
Example: = ROUND(A1+A2,2) = ROUND(TotalSales,2)
Function: ROUND is a function used to round a number to n decimal places, in this example 2.
References: References include cell addresses and names. In the given formula A1, A2 and TotalSales are all examples of references, with the latter being a name.
Operators: There are five categories of operators: arithmetic, comparison, text concatenation, reference and wildcard. In the given formula the + (plus) is an example of an arithmetic operator, and the 2nd = (equals) is an example of a comparison operator.
Constants: A constant is a value that is not calculated. Any value resulting from an expression is not a constant. In the given formula the number 2 is an example of a constant.
Operators
Operators specify the type of calculation that you want to perform on the elements of a formula. There is a default order in which calculations occur, generally following mathematical rules, but that can be changed using parentheses.
ARITHMETIC OPERATOR MEANING EXAMPLE
+ (plus) Addition = 3+3
- (minus) Subtraction = 5-4
* (asterisk) Multiplication = 10*10
/ (forward slash) Division = 10/2
% (percent) Percent = 50%
^ (caret) Exponentiation = 2^2
Table 1-5 Arithmetic Operators
COMPARISON OPERATOR MEANING EXAMPLE
= Equal to = A1=B1
> Greater than = A1>B1
< Less than = A1<B1
>= Greater than or equal to = A1>=B1
<= Less than or equal to = A1<=B1
<> Not equal to = A1<>B1
Table 1-6 Comparison Operators
Comparison operators always yield a logical data type result (i.e. TRUE or FALSE).
TEXT OPERATOR MEANING EXAMPLE
& (ampersand) Concatenates two operands = A1 & B1
Table 1-7 Text Operators
The text operator always yields a string data type result, even if the operands are numerical values.
REFERENCE OPERATOR MEANING EXAMPLE
: (colon) Range operator, producing a single reference of all cells contained within each given reference. = A1:B10
, (comma) Union operator, combining multiple references into a single reference. = A1:A10,C1:C10
(space) Intersection operator, producing a reference of cells common to two given references. = B7:D7 C6:C8
Table 1-8 Reference Operators
Reference operators always yield a range data type result, specifically a range object.
WILDCARD OPERATOR MEANING EXAMPLE
* (asterisk) Matches any number of characters. =COUNTIF(A1,"*XYZ*")
? (question mark) Matches any single character. =COUNTIF(A1,"?" & "XYZ")
~ (tilde) Matches the character that follows it literally. =COUNTIF(A1,"~*")
Table 1-9 Wildcard Operators
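To make the wildcard rules concrete, the sketch below translates an Excel-style pattern into a Python regular expression. It is an approximation under stated assumptions rather than Excel's actual matching code: the function names are hypothetical, matching is treated as case-insensitive and applied to the whole cell value (as COUNTIF does), and a tilde that is not followed by *, ? or ~ is simply treated as a literal tilde.

```python
import re


def excel_wildcard_to_regex(pattern):
    """Translate an Excel-style wildcard pattern into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        # A tilde makes the wildcard character that follows it literal.
        if ch == "~" and i + 1 < len(pattern) and pattern[i + 1] in "*?~":
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "*":
            parts.append(".*")   # any number of characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def matches(text, pattern):
    """Return True if the whole text matches the wildcard pattern."""
    return excel_wildcard_to_regex(pattern).fullmatch(text) is not None


print(matches("abcXYZdef", "*XYZ*"))  # True
print(matches("XYZ", "?XYZ"))         # False, ? needs exactly one character
print(matches("*", "~*"))             # True, the asterisk is taken literally
```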
Calculation Order and Operator Precedence
It probably comes as no surprise to learn that Excel calculates formulae in a very specific order. A formula in Excel always begins with an equal sign (=). Following the equal sign are the elements (operands) to be calculated, such as constants or references. These are separated by calculation operators. Excel calculates the formula from left to right, according to a specific order for each operator in the formula.
RANK  OPERATOR                          DESCRIPTION
1     : (colon)  (space)  , (comma)     Reference operators
2     -                                 Negation (e.g. -1)
3     %                                 Percent
4     ^                                 Exponentiation
5     * and /                           Multiplication and division
6     + and -                           Addition and subtraction
7     &                                 Concatenation
8     =  <  >  <=  >=  <>               Comparison
Table 1-10 Operator Precedence
To change the order of calculation, enclose the part of the formula to be calculated first in parentheses.
EXAMPLE EXPRESSION
= 5+5*2        = (5+5)*2
= 5+(5*2)      = 10*2
= 5+10         = 20
= 15
Table 1-11 Using parentheses to change calculation order
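The two evaluations in Table 1-11 can be reproduced in Python, which follows the same rules for multiplication and addition. One difference is worth flagging: Table 1-10 ranks negation above exponentiation, so Excel evaluates =-2^2 as (-2)^2 and returns 4, whereas Python ranks its ** operator above unary minus. The sketch below is illustrative only and the variable names are arbitrary.

```python
# Multiplication ranks above addition, so it is evaluated first.
without_parentheses = 5 + 5 * 2   # same steps as = 5+5*2
with_parentheses = (5 + 5) * 2    # same steps as = (5+5)*2

# Excel applies negation before exponentiation, so =-2^2 behaves like this:
excel_style_negation = (-2) ** 2

# Python applies ** first, so the unparenthesised form differs from Excel:
python_style_negation = -2 ** 2

print(without_parentheses, with_parentheses)        # 15 20
print(excel_style_negation, python_style_negation)  # 4 -4
```

When porting a formula between Excel and another language, adding explicit parentheses around negated bases avoids relying on either set of rules.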
Cell Referencing
Relative References: A relative reference in a formula, such as A1, is based on the relative position of the cell that contains the formula and the cell that the reference refers to. If the position of the formula cell changes then the cell referenced by the formula changes by the same offset.
Absolute References: An absolute reference in a formula, such as $A$1, always refers to a cell in a specific location. If the cell position of the formula changes then the cell referenced by the formula will not change. In A1 notation column and row references are flagged as absolute by prefixing the column and row with the $ (dollar) symbol, also referred to as an anchor.
Mixed References: A mixed reference has either an absolute column and a relative row, or an absolute row and a relative column. What this means, essentially, is that either only a column or a row is anchored. $A1 tells us that the column reference, A, will not change when this formula cell changes in position. The row reference however will change relative to the position. Conversely, A$1, tells us that the row reference, 1, will not change when this formula cell changes in position. The column, however, has not been anchored and will change relatively.
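The anchoring rules for relative, absolute and mixed references can be sketched as a small Python function that works out what a single-cell A1 reference becomes when its formula is copied a given number of rows down and columns across. This is a simplified model for illustration: the function names are hypothetical, and ranges, sheet names and sheet size limits are not handled.

```python
import re

# Optional $ anchors, column letters, then the row number, e.g. $A1 or B$12.
_REF = re.compile(r"^(\$?)([A-Z]+)(\$?)(\d+)$")


def col_to_number(letters):
    """Convert column letters to a 1-based number (A -> 1, AA -> 27)."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def number_to_col(n):
    """Convert a 1-based column number back to letters (27 -> AA)."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def shift_reference(ref, rows, cols):
    """Return the reference as it would read after copying the formula."""
    m = _REF.match(ref)
    if m is None:
        raise ValueError(f"not a single-cell A1 reference: {ref!r}")
    col_anchor, col, row_anchor, row = m.groups()
    col_n, row_n = col_to_number(col), int(row)
    if not col_anchor:      # unanchored column moves with the formula
        col_n += cols
    if not row_anchor:      # unanchored row moves with the formula
        row_n += rows
    if col_n < 1 or row_n < 1:
        raise ValueError("reference would fall off the sheet")
    return f"{col_anchor}{number_to_col(col_n)}{row_anchor}{row_n}"


# Copy a formula one row down and one column to the right.
for ref in ("A1", "$A$1", "$A1", "A$1"):
    print(ref, "->", shift_reference(ref, 1, 1))
```

The loop prints B2, $A$1, $A2 and B$1 respectively, matching the behaviour described for each reference type.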
3-D References
Referring to multiple sheets in a calculation introduces a 3rd dimension. Use a 3-D reference if you wish to analyse the same cell, or range of cells, on multiple worksheets in a workbook.
Example: =SUM(Sheet1:Sheet5!A1:A10)
In this example values housed in A1:A10, within all sheets positioned between and including Sheet1 and Sheet5, are summed to yield a result.
Union References
Union references, i.e. cell references separated with the comma (,) separator, allow us to create references to non-contiguous ranges.
     A    B    C
1    1    2    3
2    2    3    4
3    3    4    5
4    =SUM(A1,B2,C3)
Table 1-12 Aggregating unioned references
Intersecting Ranges
You can aggregate values from an intersection of two range references. In other words, only the cells that fall within both range references are taken into account.
     A               B       C       D
1                    NWE     SWE     NEE
2    Sales           2800    1400    1800
3    COGS            1100    750     950
4    Gross Margin    1700    650     850
5
6    =NWE COGS
Table 1-13 Aggregating intersecting references
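The intersection in the example above can be pictured as a set operation. The Python sketch below assumes, purely for illustration, that the name NWE refers to B2:B4 and the name COGS refers to B3:D3; the source table does not state these definitions explicitly. Each range is modelled as a set of cell addresses and the space operator as set intersection.

```python
# Cell values from the worked example, keyed by (column, row).
cells = {
    ("B", 2): 2800, ("C", 2): 1400, ("D", 2): 1800,  # Sales
    ("B", 3): 1100, ("C", 3): 750, ("D", 3): 950,    # COGS
    ("B", 4): 1700, ("C", 4): 650, ("D", 4): 850,    # Gross Margin
}

# Assumed name definitions: NWE is B2:B4 and COGS is B3:D3.
nwe = {("B", row) for row in (2, 3, 4)}
cogs = {(col, 3) for col in "BCD"}

# The space (intersection) operator keeps only cells common to both ranges.
common = nwe & cogs
result = sum(cells[address] for address in common)

print(sorted(common), result)  # [('B', 3)] 1100
```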
Reference Notation
In Excel, references conform to one of two notations, namely A1 reference style or R1C1 reference style. The former is the default but either is acceptable.
A1 Notation: In A1 reference style columns are represented by letters A:IV (Excel 2003 and earlier versions) or A:XFD (Excel 2007 and subsequent versions). Rows are numbered 1:65536 (Excel 2003 and earlier versions) or 1:1048576 (Excel 2007 and subsequent versions). In this reference style columns and rows are anchored by prefixing the column or row reference with a $ (dollar symbol).
R1C1 Notation: In R1C1 reference style both columns and rows are numbered. Cell references are displayed in terms of their relationship to the cell that contains the formula rather than their actual position on the grid. Cells are referred to by relative notation. Relative references have numbers in square brackets.
REFERENCE MEANING
R[-2]C A relative reference to the cell two rows up and in the same column.
RC[-2] A relative reference to the cell in the same row and two columns to the left.
R[2]C[2] A relative reference to the cell two rows down and two columns to the right.
R2C2 An absolute reference to a cell in the 2nd row and 2nd column (i.e. B2).
R[-1] A relative reference to the entire row above the active cell.
C[-1] A relative reference to the entire column to the left of the active cell.
R An absolute reference to the current row.
C An absolute reference to the current column.
RC An absolute reference to the active cell.
Table 1-14 R1C1 Notation
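One way to get comfortable with R1C1 notation is to convert single-cell references back to A1 style for a known formula cell. The Python sketch below does this under simplifying assumptions: the function names are hypothetical, only single-cell references are supported (whole-row or whole-column forms such as R[-1] are rejected), and bracketed numbers are treated as offsets from the formula cell while unbracketed numbers are treated as anchored positions.

```python
import re

# Row part then column part; each is either [offset], a fixed number, or empty.
_R1C1 = re.compile(r"^R(?:\[(-?\d+)\]|(\d+))?C(?:\[(-?\d+)\]|(\d+))?$")


def column_letters(n):
    """Convert a 1-based column number to letters (1 -> A, 27 -> AA)."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def r1c1_to_a1(ref, formula_row, formula_col):
    """Convert a single-cell R1C1 reference to A1 style for a formula cell."""
    m = _R1C1.match(ref)
    if m is None:
        raise ValueError(f"unsupported reference: {ref!r}")
    row_rel, row_abs, col_rel, col_abs = m.groups()
    row = int(row_abs) if row_abs is not None else formula_row + int(row_rel or 0)
    col = int(col_abs) if col_abs is not None else formula_col + int(col_rel or 0)
    if row < 1 or col < 1:
        raise ValueError("reference would fall off the sheet")
    row_text = ("$" if row_abs is not None else "") + str(row)
    col_text = ("$" if col_abs is not None else "") + column_letters(col)
    return col_text + row_text


# References as seen from a formula in cell C5 (row 5, column 3).
for ref in ("R[-2]C", "RC[-2]", "R[2]C[2]", "R2C2"):
    print(ref, "->", r1c1_to_a1(ref, 5, 3))
```

From cell C5 the four references resolve to C3, A5, E7 and $B$2.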
Defined Names
You can create names to represent cells, ranges of cells, formulae, constants, array constants or Excel tables. A name is a meaningful shorthand that makes it easier to understand the purpose of a reference in a formula.
When to use names:
To represent cells, or ranges of cells, that will be frequently referenced in formulae, pivot tables and charts.
To house constants that will be frequently referenced in formulae.
To facilitate dynamic range references to be used in formulae, pivot tables and charts. Dynamic ranges are generated using formulae.
All names have a scope, either to a specific worksheet (referred to as local scope) or to the entire workbook (referred to as global scope). The scope of a name is the location within which the name is recognised without qualification. For example, if you have a name such as Budget_FY11, and its scope is Sheet1, that name, if not qualified, is recognised only in Sheet1, and not in other sheets.
NAME REFERS TO SCOPE
Test ="Sheet" Sheet1
Test ="Workbook" Workbook
FORMULA LOCATION RESULT
= Test Sheet1 Sheet
= Sheet1!Test Sheet1 Sheet
= Test Sheet2 Workbook
= Sheet1!Test Sheet2 Sheet
Table 1-15 Demonstrating name scope recognition
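The lookup order shown in Table 1-15 can be modelled as a dictionary lookup that tries the local (sheet) scope first and then falls back to the global (workbook) scope. This Python sketch is an illustration of that table only; the data structure and the resolve function are hypothetical, and the case of a qualifier pointing at a sheet with no local name is deliberately not modelled.

```python
# Names keyed by (scope, name); a scope of None stands for the workbook.
names = {
    ("Sheet1", "Test"): "Sheet",    # local scope: Sheet1
    (None, "Test"): "Workbook",     # global scope: workbook
}


def resolve(name, formula_sheet, qualifier=None):
    """Return the value a name resolves to for a formula on formula_sheet."""
    sheet = qualifier if qualifier is not None else formula_sheet
    local = names.get((sheet, name))
    if local is not None:
        return local                   # a local name takes priority
    if qualifier is None:
        return names[(None, name)]     # fall back to the workbook-level name
    raise LookupError("qualified lookup without a local name is not modelled")


print(resolve("Test", "Sheet1"))            # Sheet
print(resolve("Test", "Sheet1", "Sheet1"))  # Sheet
print(resolve("Test", "Sheet2"))            # Workbook
print(resolve("Test", "Sheet2", "Sheet1"))  # Sheet
```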
Array (CSE) formulae
An array formula can perform multiple calculations and then return either a single result or multiple results. Array formulae act on two or more sets of data known as array arguments. One creates an array formula in the same manner as a normal formula, but the instruction to process the formula as an array formula is given by confirming the formula entry with Control+Shift+Enter. If done properly Excel encapsulates the formula in curly brackets {}. Do not attempt to manually type in the curly brackets. This form of formula is also commonly referred to as a ‘CSE’ formula because of the need to commit it with Control+Shift+Enter.
The first type of array formula, i.e. the type used to yield a single result, offers us endless possibilities, but unfortunately such formulae are also known to add significant overhead to the calculation process. This is not always true, and in fact array formulae have received bad publicity, as in some uses they can actually reduce the overhead in the calculation process. Best practice suggests that we use array formulae in moderation and consider adopting a stepped approach as an alternative (i.e. using helper cells, columns and rows). But for the budding formula guru, I suggest experimenting with both array formulae and classic methods using a stepped approach, noting the changes in calculation times, and drawing your own conclusions on when it is acceptable, or not, to use array formulae. Sometimes practicality must prevail over efficiency, provided that the methods used are not grossly inefficient.
When we create a single result array formula we pass it an array of variable values or an array of constant values. The array on its own serves little purpose. Instead we have to pass an instruction to Excel on how to aggregate the array, typically using SUM, AVERAGE or COUNT.
FORMULA RESULT COMMENT
{={1;2;3;4;5;6;7;8;9;10}} 1 If you were to enter this formula in cell A1, and commit with CSE, Excel will yield a result of 1 (the first array item). To aggregate a result one must pass an instruction to Excel telling it what form of aggregation to apply to the items in the array.
{=SUM({1;2;3;4;5;6;7;8;9;10})} 55 Here the result is 55 because Excel has received an instruction to SUM each item in the array.
Table 1-16 Aggregating an Inline Array Constant
FORMULA RESULT COMMENT
{=ROW(1:10)} 1 In this example Excel is told to yield an array of values associated with the given row numbers. Again this is rather pointless, unless the array is used for some form of aggregation.
{=SUM(ROW(1:10))} 55 Excel yields a result of 55, the SUM of each item in the array.
Table 1-17 Aggregating an Array
The exhibit in Table 1-16 demonstrates the syntax of an array formula with an inline array constant. When passing inline array constants Excel automatically recognises that it should treat the formula as an array formula. Therefore it is not necessary to explicitly pass instruction to Excel using CSE. Thus;
=SUM({1;2;3;4;5;6;7;8;9;10})
will yield the same result as;
{=SUM({1;2;3;4;5;6;7;8;9;10})}
The exhibit in Table 1-17 demonstrates the syntax of an array formula calling a variable array. This form of an array formula does require that we explicitly pass Excel an instruction to treat the formula as an array formula. However, the SUMPRODUCT function aggregates its results using the array formula method, and thus we are not explicitly required to instruct Excel to treat SUMPRODUCT like an array formula. When passing a single array of values to SUMPRODUCT, SUMPRODUCT can only yield a summation of those values. Thus;
{=SUM(ROW(1:10))}
will yield the same result as;
=SUMPRODUCT(ROW(1:10))
The use of SUMPRODUCT in this context is recommended because it avoids someone inadvertently recommitting the formula without CSE. The LOOKUP and FREQUENCY functions are also capable of processing arrays without CSE. An exception to this is when the TRANSPOSE function is used within an array formula argument.
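The difference between yielding an array and aggregating it can be seen in a tiny Python sketch. Here a list stands in for the array produced by ROW(1:10); showing only the first item mimics what a single cell displays, and sum mimics the aggregation performed by SUM or SUMPRODUCT. The variable names are arbitrary and this is an analogy rather than a description of Excel's internals.

```python
# ROW(1:10) yields the array {1;2;3;4;5;6;7;8;9;10}.
rows = list(range(1, 11))

# A single cell can only display one item, so {=ROW(1:10)} shows the first.
first_item = rows[0]

# {=SUM(ROW(1:10))} and =SUMPRODUCT(ROW(1:10)) both add up every item.
total = sum(rows)

print(first_item, total)  # 1 55
```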
The latter form of an array formula mentioned is the type that yields multiple results. This form is commonly referred to as an ‘array entered formula’. A typical example would be to explore the TRANSPOSE worksheet function.
TRANSPOSE is used to copy an array of values and yield a result of opposite orientation or dimension.
     A    B    C
1    X    Y    Z
2
3    X    {=TRANSPOSE(A1:C1)}
4    Y
5    Z
Table 1-18 An Array Entered Formula
In this example one would first select range A3:A5, then type the formula, and then commit with CSE. It is not necessary to anchor any of the references as none will move relatively. Excel knows to handle the range as an array of values. There are two effects of an array entered formula that one needs to be aware of:
1. One cannot change a single element of the array (in this example A3:A5). The array needs to be handled as a single entity, thus if changes are required one needs to select the entire range, enter the revised formula, and commit with CSE.
2. As a result of (1) above, one cannot delete a row or column that intersects an array entered formula range. In the above example one could delete column A because the entire array range is contained within that column. One cannot however delete row 3, 4 or 5 because each intersects with the array entered formula range. Deleting all rows 3:5 (in one hit) is permissible for the same reason that one can delete column A.
Array Constants
Array constants, which were briefly mentioned in the section above, are merely arrays that remain constant. Array constants can contain text, numbers, logical values or error values. Numbers, logical values and errors can be typed in as is. Text values must be enclosed in speech marks.
When you enter array constants make sure you:
1. Enclose them in curly brackets {}.
2. Denote column partitions with a comma (,).
3. Denote row partitions with a semi-colon (;).
Example: {1,2;3,4}
This example demonstrates an array comprising two rows and two columns.
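The partition rules above can be illustrated outside Excel. The following Python sketch splits a numeric array constant into rows and columns; the helper name parse_array_constant is hypothetical, and the sketch assumes numeric items only (no quoted text, logical values or errors) and comma/semi-colon separators.

```python
def parse_array_constant(text):
    """Split a numeric array constant such as '{1,2;3,4}' into a list of rows."""
    inner = text.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError("array constants must be enclosed in curly brackets")
    rows = inner[1:-1].split(";")  # semi-colons denote row partitions
    # commas denote column partitions
    table = [[float(item) for item in row.split(",")] for row in rows]
    if len({len(row) for row in table}) != 1:
        raise ValueError("every row must have the same number of columns")
    return table


grid = parse_array_constant("{1,2;3,4}")
print(len(grid), "rows x", len(grid[0]), "columns")
```

A single-column constant such as {3;5} parses to two rows of one item each, which matches the orientation Excel would use for a vertical array.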
Array constants can be entered in names or directly within a formula. When entered directly into a formula they are referred to as inline array constants. Inline arrays and named arrays need to be treated as two separate animals:
NAMED ARRAY (not CSE entered) NAMED ARRAY (CSE entered) INLINE ARRAY CONSTANT
=SUM(myarray) =8 {=SUM(myarray)} =8 =SUM({3;5}) =8
=SUM(myarray)+1 =9 {=SUM(myarray)+1} =9 =SUM({3;5})+1 =9
=SUM(myarray+1) =4 {=SUM(myarray+1)} =10 =SUM({3;5}+1) =10
One is not required to CSE commit an array formula with an inline array constant; it is a given. But one must be cautious when referring to named arrays because the behaviour does not appear to be consistent. On first review it appears as though it is not necessary to CSE commit formulae with named array references. However, look at the 3rd exhibit under ‘NAMED ARRAY (not CSE entered)’: it yields 4 rather than 10, so this rendition does need to be CSE committed. Of course in this example the entire issue can be overcome by using SUMPRODUCT, but that’s not the point. The same issue would apply using other aggregate functions, such as AVERAGE. The recommendation here is: when in doubt, use CSE to commit the formula.
2. How the Excel Recalculation Engine Works
Excel uses a complex algorithm for choosing the fastest route and the minimum number of cells required to calculate a formula result. Excel’s recalculation engine normally optimises calculation time by tracking changes and only recalculating:
Cells, formulae, values or names that have changed since the last calculation.
Cells dependent on other cells, formulae, names or values that need recalculation.
The exceptions to the statements above are:
Volatile functions are always calculated.
Full calculation (Control+Alt+F9) will force calculation of all formulae.
Having more than 65536 dependencies causes full calculation to be invoked.
Names that are not called anywhere in a worksheet are never calculated.
Names are calculated each time they are referenced by a formula that is recalculated.
Dependency Trees
Excel tracks changes since the last recalculation and builds dependency trees in an attempt to reduce calculation time. These prompt Excel to recalculate only:
Formulae that have changed.
Names that have changed.
Volatile functions.
Formulae dependent on changed or volatile formulae, names or cells.
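The idea of flagging only changed cells and their dependents can be sketched in Python. This is a simplified model and not Excel's actual algorithm: the function name cells_to_recalculate and the sample sheet are hypothetical, and the sketch flags every downstream cell, whereas the real engine is more selective.

```python
from collections import deque


def cells_to_recalculate(dependents, changed, volatile=()):
    """Return every cell reachable from a changed or volatile cell.

    dependents maps a cell to the cells whose formulae refer to it.
    """
    dirty = set(changed) | set(volatile)
    queue = deque(dirty)
    while queue:
        cell = queue.popleft()
        for dep in dependents.get(cell, ()):
            if dep not in dirty:
                dirty.add(dep)
                queue.append(dep)
    return dirty


# Hypothetical sheet: B1 = A1*2, C1 = B1+1, D1 = NOW(), E1 = A2+1
dependents = {"A1": ["B1"], "B1": ["C1"], "A2": ["E1"]}
flagged = cells_to_recalculate(dependents, changed={"A1"}, volatile={"D1"})
print(sorted(flagged))
```

In this example E1 is left alone because nothing it depends on has changed, which is the saving the dependency trees are intended to deliver.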
Dependency trees are immediately updated whenever a formula is entered or changed. In Excel 2002 and later you can force Excel to rebuild the dependency trees by hitting Control+Alt+Shift+F9.
In complex formula-based models, Excel may spend considerable time and memory building and evaluating the dependency trees. In versions prior to Excel 2007 dependency trees will only store up to 65536 dependencies to unique references. Where complex formula-based models near that limit it is not unusual to find full calculation faster than recalculation.
How do you know when you are exceeding the dependency tree limit?
The word ‘calculate’ persists in the status bar despite invoking recalculation. Note, ‘calculate’ will also display in the status bar when:
o Calculation option has been set to manual and the workbook contains uncalculated formulae.
o The iteration option is turned on and the workbook contains circular references.
o You are using Excel 2007 or later and have set Workbook ForceFullCalculation to True.
Changing a cell and tabbing to another cell takes a long time.
Dependency trees are categorised as follows:
Within Sheet Dependency Trees
Inter Sheet Dependency Trees
Inter Workbook Dependency Trees
Formulae with references to other sheets are known to take longer to calculate. Formulae with references to other workbooks are also known to take longer to calculate, sometimes quite significantly. One should always consider strongly whether or not to link to other workbooks, and perhaps favour storing the external data directly within the same workbook (e.g. by using a query table).
Volatile Functions
A volatile function is a worksheet function that Excel has determined must be recalculated at each recalculation, regardless of whether or not any of its precedents have changed.
A function is not always strictly volatile or non-volatile. Some functions behave in a volatile manner depending on the manner in which they are used. There are however a number of functions that are strictly volatile, namely:
FUNCTION    COMMENT
RAND    Generates a new random number each time recalculation is invoked.
NOW    Returns the current date and time (from the system date and time) each time recalculation is invoked.
TODAY    Returns the current date (from the system date) each time recalculation is invoked.
OFFSET    Returns a reference offset from a given reference.
CELL    Returns information about the formatting, location, or contents of a cell.
INDIRECT    Returns a reference indicated by a text value.
INFO    Returns information about the current operating environment.
Table 2-1 List of Strictly Volatile Functions
The SUMIF function can also behave in a volatile manner depending on the manner in which it is used.
VOLATILE NON-VOLATILE
The differences between the two formulae referred to might not be so obvious. The volatile method does not explicitly reference the column B range, whilst the non-volatile method does.
Direct dependents of volatile functions are always recalculated. Indirect dependents of volatile functions are not always recalculated.
So when is it ok to call volatile functions? The basic rule is to avoid using volatile functions wherever possible. Use volatile functions:
In moderation… Using a couple of formulae that call volatile functions is not going to slow your calculation time considerably.
When there is no alternative; or the alternative will add significant overhead to the calculation.
Events that Trigger Recalculation
For the most part, calculation is invoked when you change the value in a cell that has a dependent (assuming you are working in automatic calculation mode), or when you hit F9. There are however a number of other triggers that you need to be aware of. The following table lists some of these triggers.
TRIGGER    COMMENT
Autofilter    Selecting any filter criteria will flag all of the formulae in the autofilter range as uncalculated.
Clicking row or column divider    Clicking a row or column divider will trigger recalculation. Manually changing the span of a row or column however will not trigger a recalculation.
Inserting or deleting rows, columns or cells    Any formulae that refer to other worksheets, and any formulae containing names that refer to other worksheets or to the current worksheet, will become flagged as uncalculated. Any formulae that are referred to by formulae in other worksheets will also become flagged as uncalculated.
Renaming, deleting and moving worksheets    Renaming worksheets, deleting worksheets and changing the position of a worksheet in a workbook will trigger recalculation.
Table 2-2 Recalculation Event Triggers
Calculation Methods
Normally Excel invokes recalculation when you change a cell value that has dependents. The calculation method this uses is recalculation.
Shortcuts for invoking calculation:
Full Calculation: Control+Alt+F9
Recalculation: F9
Selected Sheet(s) Only: Shift+F9
Calculating an individual formula, array formula, or part thereof: Select the formula in the formula bar, or only the portion you want to evaluate, and hit F9. The formula or part of the formula is replaced by the result. For an array formula you will see an array of the results, which is a great way of debugging an array formula.
3. Data Types, Interpretation and Precision
Whenever you type something into a cell, Excel needs to interpret that value so that (1) it knows how to process the value when it is called in a formula, and (2) so that it knows how much memory to allocate for the storage of that value. Data types not only apply to values typed into cells; any value yielded by a formula will be of a certain data type, even the values in names will be of a certain data type.
Data Types
There are a variety of different data types but we are going to group all of the various types into four categories: numbers, text, booleans (also referred to as logicals) and errors.
Data types define how the bytes of memory are used to hold the data, and what kind of data can be stored. Generally Excel determines the data type of a value, but we are given a certain amount of control over this. For instance, if you type 12345 into a cell, Excel clearly knows to treat this as a numeric value and thus assigns it a number data type. However, if the cell is formatted as text, or you prefix the entry with an apostrophe, Excel will treat this as a text data type.
Numbers
Each number held in Excel is stored in eight bytes. It is the data type that also tells us that the number range at our disposal is finite. In addition to numbers that are obviously number data types, date and time values, although often represented textually, are also numbers. Unless specifically formatted otherwise, all number values will appear right aligned in a cell. It is suggested that you do not change the alignment of numbers in cells because it is a very good visual guide informing you whether a value is recognised as a number or as a text value.
Booleans
A logical or boolean expression is one that evaluates to TRUE or FALSE. You can also type boolean values directly into a cell, name or formula argument. Unless specifically formatted otherwise, all boolean values will appear centre aligned in a cell, and appear in uppercase.
Errors
Error values inform us when something has gone wrong! Although typically the result of a formula, error values can also be typed directly into a cell. Unless specifically formatted otherwise, all error values will appear centre aligned, appear in uppercase and be prefixed with the hash symbol (#).
ERROR    MEANS
#N/A    Excel cannot find a lookup value within a specified lookup table. It is likely that:
    The lookup value does not exist within the lookup table.
    The data type of the lookup value is not consistent with the entry in the lookup table.
    Your lookup value does not match the value in the lookup table. Check for leading and trailing spaces.
#VALUE!    Occurs when the wrong type of argument or operand is used. The error is most commonly yielded when attempting an arithmetical calculation using a text value.
#NAME?    A function or name is not recognised. Usually the result of a typo.
#DIV/0!    Result of an attempt to divide a number by zero.
#NULL!    Occurs when you specify an intersecting range which in fact does not intersect.
#REF!    Result of an invalid reference in your formula. Occurs usually when you delete the physical reference, meaning that the reference in the formula has nothing to point to.
Table 3-1 List of error types
Text
Text is generally a catchall for all other values not identified as belonging to one of the already mentioned data types. Unless specifically formatted otherwise, all text values will appear left aligned. Text values are actually ordered values, in that a text value can be equal to, less than or greater than another text value. For instance, the comparative expression ="A">"Z" will yield FALSE, and ="Z">"A" will yield TRUE.
Floating-Point Precision
Excel was designed in accordance with the IEEE 754 standard for binary floating-point arithmetic. This standard defines how floating-point numbers are stored and calculated. The advantage of using floating-point representation over fixed-point representation is that it can support a wider range of values. For example, a fixed-point representation that has seven decimal digits with two decimal places can represent the numbers 12345.67, 123.45, 1.23 and so on. Floating-point representation with seven decimal digits, however, can in addition represent 1.234567, 123456.7, 0.00001234567, 1234567000000000 and so on. The number of digits of precision limits the accuracy of numbers. For example, the number 1234567890123456 cannot be exactly represented if 15 digits of precision are used. Excel uses 15 digits of precision.
Loss of Precision When Using Very Large Numbers
    A
1   1.2E+200
2   1E+100
3   = SUM(A1:A2)
4   = 1.2E+200
Table 3-2 Example loss of precision when using very large numbers
The resulting value in A3 is 1.2E+200, the same number as in A1. At least 100 digits of precision would be required to accurately compute the result.
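The same effect can be reproduced outside Excel. Python floats are also eight-byte IEEE 754 values, so the following sketch (an illustration only, not a model of Excel itself) shows the smaller operand disappearing when it is added to a far larger one.

```python
big = 1.2e200
small = 1e100

total = big + small

# The smaller operand lies far beyond the available digits of precision,
# so the sum is indistinguishable from the larger operand.
print(total == big)  # True
```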
Loss of Precision When Using Very Small Numbers
    A
1   0.000123456789012345
2   1
3   = SUM(A1:A2)
4   = 1.00012345678901
Table 3-3 Example loss of precision when using very small numbers
The resulting value in A3 is 1.00012345678901 instead of 1.000123456789012345. At least 19 digits of precision would be required to accurately compute the result.
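A short Python sketch of the same calculation follows. It assumes that formatting to 15 significant digits is a fair stand-in for what Excel displays; the variable names are illustrative.

```python
total = 0.000123456789012345 + 1

# Keep 15 significant digits, mirroring the precision Excel works to
shown = format(total, ".15g")
print(shown)  # 1.00012345678901
```

The trailing digits 2345 of the small operand are lost once it shares a number with the much larger value 1.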
Boolean Logic
Many users are already aware that boolean values can be represented with digital values. In Excel we can pass numerical values to logical function arguments and we can pass boolean values in expressions to be computed as digital values. The process Excel undergoes to convert boolean values to digital values, and vice versa, is referred to as coercion.
Coercion
In Excel, we can use numerical values in formulae to represent boolean values. Excel will recognise zero as FALSE and any non-zero number as TRUE. There is no explicit instruction needed to tell Excel to coerce zero to FALSE and a non-zero number to TRUE; it is a given when any such number is passed to a logical argument. Coercing a boolean to a digital value will represent FALSE as zero and TRUE as one. We do however need to be explicit when coercing a boolean to a digital value. A boolean is coerced to a digital value when it is used as an operand in an arithmetical expression. To yield the representative digital value we use an expression that will not change the numeric value of the digital value equivalent.
A B C D E F G
1 DIGITAL VALUE EXPRESSION RESULT BOOLEAN VALUE EXPRESSION RESULT
2 1 =--A2 1 TRUE =--E2 1
3 0 =A3+0 0 FALSE =E3+0 0
4 1 =A4-0 1 TRUE =E4-0 1
5 0 =A5*1 0 FALSE =E5*1 0
6 1 =A6/1 1 TRUE =E6/1 1
7 0 =A7^1 0 FALSE =E7^1 0
Table 3-4 Coercing boolean values to digital values
It is widely believed that using double negation (--) is the most optimised coercion method, because double negation appears first in the order of evaluation.
This method can also be used to coerce an entire array of values. For instance, assume you have a comparative expression over an array of values, as the next table illustrates.
    A        B                    C
1   VALUES   EXPRESSION           RESULT
2   A        =SUMPRODUCT(--(A2:A7="a"))    2
3   B        Step 0.1             =SUMPRODUCT(--({"a";"b";"c";"a";"b";"c"}="a"))
4   C        Step 1.0             =SUMPRODUCT(--({TRUE;FALSE;FALSE;TRUE;FALSE;FALSE}))
5   A        Step 1.1             =SUMPRODUCT(-({-1;0;0;-1;0;0}))
6   B        Step 2.0             =SUMPRODUCT({1;0;0;1;0;0})
7   C        Step 3.0 (RESULT)    =2
Table 3-5 Coercing an array of boolean values to an array of digital values
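The evaluation steps in the table can be traced in Python, where negating a boolean also coerces it to a number. This is a sketch of the idea only; the list names are illustrative and the lower-casing stands in for Excel's case-insensitive text comparison.

```python
values = ["A", "B", "C", "A", "B", "C"]  # A2:A7

# Step 1.0: the comparison yields an array of booleans
flags = [v.lower() == "a" for v in values]

# Step 1.1: the first negation coerces each boolean to a number
negated = [-f for f in flags]

# Step 2.0: the second negation restores the sign
digits = [-n for n in negated]

# Step 3.0: summing the single array gives the count of matches
print(sum(digits))  # 2
```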
AND Logic
AND logic yields TRUE when all comparative statements evaluate to TRUE. If any comparison evaluates to FALSE then AND logic dictates that the result must be FALSE. Multiplying comparative results with each other also serves as AND logic.
BOOLEAN VALUES DIGITAL VALUES
CONDITION A CONDITION B A AND B CONDITION A CONDITION B A x B
FALSE FALSE FALSE 0 0 0
FALSE TRUE FALSE 0 1 0
TRUE FALSE FALSE 1 0 0
TRUE TRUE TRUE 1 1 1
Table 3-6 AND Logic Truth Table
OR Logic
OR logic yields TRUE when any one comparative statement of many yields TRUE. Adding comparative results with each other also serves as OR logic.
BOOLEAN VALUES DIGITAL VALUES
CONDITION A CONDITION B A OR B CONDITION A CONDITION B A + B
FALSE FALSE FALSE 0 0 0
FALSE TRUE TRUE 0 1 1
TRUE FALSE TRUE 1 0 1
TRUE TRUE TRUE 1 1 2
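Both truth tables can be generated with a few lines of Python, which coerces booleans in arithmetic in much the same way. The sketch below is illustrative; note that the OR column reaches 2 when both conditions hold, which still reads as TRUE because it is non-zero.

```python
from itertools import product

# Each row: condition A, condition B, A x B, A + B
rows = [(a, b, a * b, a + b) for a, b in product([False, True], repeat=2)]

for a, b, times, plus in rows:
    # A non-zero result reads as TRUE, zero as FALSE
    print(a, b, bool(times), bool(plus))
```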
Date and Time Values
Excel stores dates as a number representing the number of days since 0 January 1900, and times as a fraction of a 24 hour day. These are referred to as serial dates and times. It is cell formatting that provides textual representation, but essentially dates are whole numbers and times are decimal values. Knowing that dates are numeric values allows us to handle date and time values constructively in formulae. For instance, the 4th of April 2010 has a numeric value of 40272, because Excel counts 40272 days as having elapsed since 0 January 1900. This result is actually overstated by one day because Excel interprets the year 1900 as a leap year (29 days in February), which it was not. For this reason, Excel allows us to switch to a different base, the 1904 date system. Here dates commence on 1 January 1904, which has a serial value of 0. Whilst this system is theoretically more accurate, it is best to avoid using it. The 1900 date system allows greater compatibility with other systems.
The time value 18H42 has a numeric value of 0.779166666666667. This can be validated using the following equation:
(18 + 42 / 60) / 24 = 0.779166666666667
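The serial values quoted above can be checked with a short Python sketch. The function names are hypothetical, and the choice of 30 December 1899 as day zero is an assumption made here so that the extra day Excel counts for February 1900 is absorbed; it holds only for dates after February 1900.

```python
from datetime import date

# Assumption: day zero is taken as 30 December 1899, which absorbs the
# non-existent 29 February 1900 for any date after February 1900.
EPOCH_1900 = date(1899, 12, 30)


def serial_date(d):
    """Whole days elapsed, as the 1900 date system would count them."""
    return (d - EPOCH_1900).days


def serial_time(hours, minutes):
    """Fraction of a 24 hour day."""
    return (hours * 60 + minutes) / (24 * 60)


print(serial_date(date(2010, 4, 4)))        # 40272
print(format(serial_time(18, 42), ".15g"))  # 0.779166666666667
```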
4. Introducing Worksheet Functions
Worksheet functions allow us to pass instruction to Excel on how to evaluate terms, and as such, a strict convention applies.
1. Firstly Excel needs to determine whether or not an entry into a range, or name, is an expression. This is assumed to be true:
1.1. When the entry / expression is prefixed with an equals ‘=’ symbol or a unary symbol such as plus ‘+’ or minus ‘-’.
AND;
1.2. If entered into a range and the range is not text formatted.
2. Excel splits the expression into the individual terms. It then analyses each term for a worksheet function by cross-referencing each whole word in the term against its function library. Note it does not assess words encapsulated in speech marks.
3. Most worksheet functions take arguments (parameters, or inputs if you like). These arguments are contained within parentheses. Therefore Excel always expects a worksheet function name to be suffixed with parentheses. If the worksheet function takes arguments then these inputs must be contained within the parentheses. The parentheses must still be present even if the worksheet function does not take any arguments. If the parentheses are missing Excel will assume the component to be a name.
4. Where a worksheet function takes more than one argument (within the parentheses), the arguments must be separated by a comma delimiter (note the actual delimiter depends on regional settings – it is common to find arguments semi-colon delimited on the European continent).
= SUMPRODUCT ( A2:A6 , B2:B6 )
=    Excel knows to send this expression to the calculation engine because it is prefixed with an equals symbol.
SUMPRODUCT    Excel recognises this worksheet function because it appears in the function library.
(    Opening parenthesis. The arguments are contained within parentheses.
A2:A6    The first argument; namely the X values.
,    A comma separates the arguments.
B2:B6    The second argument; namely the Y values.
)    Closing parenthesis.
Table 4-1 Basic anatomy of a worksheet function
Data Type Conformity
All worksheet functions are configured to yield a result conforming to a certain data type. Those that don’t are said to yield a variant data type. Similarly the values passed to the function arguments are also expected to conform to a predefined data type.
Taking this further, it comes as no surprise that the data type yielded by the SUM function is a number data type. It will also come as no surprise that the values SUM expects within its arguments should also be of a number data type. But now bear in mind that certain worksheet functions are capable of processing arrays. An array, simply put, is a series of values. For example:
    A
1   X
2   0
3   1
4   2
5   3
6   4
7   = SUM(A2:A6)
8   = 10
Here the instruction to Excel is to sum each value within the range A2:A6. In reality all that is happening in the background is that Excel is using this range to load values into an array. We can actually pass an array directly to the SUM function argument. For example:
=SUM({1;2;3;4;5;6})
Here an array is qualified because the values are entered within curly brackets; specifically, this is an inline array constant. In the example of SUM, we have already mentioned that Excel worksheet functions expect the function arguments to conform to a predefined data type, and that the SUM function expects us to pass numerical values. So, when passing an array we should try to ensure that each array item (i.e. each value) conforms to the expected data type. This same rule applies to values contained within a range, where that range is passed to the function argument.
In actual fact, the SUM function is very forgiving. If we include a text value within the array that it evaluates, SUM merely treats the text value as zero. This gives SUM a distinct advantage over using a classic addition expression.
    A               B
1   X
2   0
3   1
4   Y
5   3
6   4
7   = SUM(A2:A6)    = A2 + A3 + A4 + A5 + A6
8   = 8             = #VALUE!
Table 4-2 Demonstrating the distinct advantage of using SUM over a classic addition expression
The formula entered in B7 in Table 4-2 yields an error result. The #VALUE! error in this instance indicates the presence of a non-numerical value. Excel cannot add the text value in A4 to the sum of A2 and A3, hence each evaluation step beyond this point yields an error value.
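The contrast between the two formulae can be modelled in Python. This is a simplified sketch: the function names are hypothetical, only non-numeric text is considered, and the string "#VALUE!" merely stands in for Excel's error value.

```python
cells = [0, 1, "Y", 3, 4]  # A2:A6


def excel_like_sum(values):
    """Add the numeric items and skip text, as SUM does for a cell range."""
    return sum(
        v for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    )


def chained_addition(values):
    """Add every item in turn; a text operand spoils the whole result."""
    total = 0
    for v in values:
        if isinstance(v, str):
            return "#VALUE!"
        total += v
    return total


print(excel_like_sum(cells))    # 8
print(chained_addition(cells))  # #VALUE!
```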
So far we have only briefly explored the SUM and SUMPRODUCT functions. Currently, in Excel 2010, there are 331 common worksheet functions. This does not take into account additional worksheet functions at your disposal through add-ins and other external sources. To explore argument data type conformity we need to choose a different function. Let us explore a common favourite, VLOOKUP:
    A                                B
1   X                                Y
2   A                                10
3   B                                100
4   C                                1000
5
6   A                                D
7
8   = VLOOKUP(A6,A2:B4,2,FALSE)      = VLOOKUP(B6,A2:B4,2,TRUE)
9   = 10                             = 1000
Table 4-3 VLOOKUP, exact match and approximate match syntax
We won’t explore the VLOOKUP function in much depth now; that comes later. What is demonstrated here is data type conformity in the function arguments.
The first argument expects a lookup value, i.e. the value sought in the table. In this example we are looking for the value “a” in the lookup table. In the case of VLOOKUP, the lookup value can be numeric, text or a logical value (essentially a variant data type). It would be rather futile to pass an array or inline array constant to this first argument because VLOOKUP expects a single value, and only the first array item will be taken into account.
The second argument represents the table that the lookup value is sought within, and that the return value is contained within. VLOOKUP searches for the lookup value (i.e. the 1st argument) within the left-most column of the table. In our example our table is contained within a range, but it need not be. We could represent the table using an inline array constant, for instance:
=VLOOKUP("a",{"a",10;"b",100;"c",1000},2,FALSE)
Notice that the inline array constant contains both comma and semi-colon separators. The comma represents a column partition and the semi-colon represents a row partition. So we can conclude that this inline array contains two columns and three rows, just as range A2:B4 is made up of two columns and three rows. Again this argument can take a variant data type, however VLOOKUP will always yield an error unless this argument is either a range or an array.
The third argument indicates which column index to yield a value from, assuming the lookup value is found in the left-most column of the table. In this example the #2 refers to column B of the table. This argument can only accept an integer value. Excel wouldn’t know how to interpret a text string.
VLOOKUP’s fourth and final argument is used to instruct Excel whether it should seek an exact match or an approximate match. This can only ever be TRUE or FALSE, in other words a boolean value. So what happens if we pass anything other than a boolean? If you enter a text value you can expect to receive a #VALUE! error, because Excel doesn’t have a mechanism for coercing a text value to a boolean value. You can however pass a numeric value. It is not uncommon to see this argument expressed as 1 or 0 (zero). Excel will resolve the number to a boolean, meaning that a strict data type is still applied. The number zero can be used to represent FALSE, and any non-zero number can be used to represent TRUE.
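The four arguments can be tied together with a rough Python model of VLOOKUP. It is a sketch under stated assumptions rather than Excel's implementation: the function vlookup is hypothetical, it handles text keys only, the string "#N/A" stands in for the error value, and approximate matching assumes the keys are sorted in ascending order.

```python
from bisect import bisect_right


def vlookup(lookup_value, table, col_index, range_lookup=True):
    """Rough model of VLOOKUP for text keys."""
    keys = [str(row[0]).lower() for row in table]  # text matching ignores case
    target = str(lookup_value).lower()
    if range_lookup:
        # Approximate match: last key less than or equal to the target
        pos = bisect_right(keys, target) - 1
    else:
        # Exact match: first key equal to the target
        pos = keys.index(target) if target in keys else -1
    return table[pos][col_index - 1] if pos >= 0 else "#N/A"


table = [("A", 10), ("B", 100), ("C", 1000)]  # A2:B4
print(vlookup("A", table, 2, False))  # 10
print(vlookup("D", table, 2, True))   # 1000
print(vlookup("D", table, 2, 0))      # #N/A, because zero is read as FALSE
```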
Nested Worksheet Functions
Next we address the topic of nested functions. Although worksheet function arguments need to conform to specific data types, this does not mean that we are restricted only to constant inputs or reference inputs. It is perfectly acceptable to nest a worksheet function, or any formula, within a function argument, provided the result of that nested function conforms to the expected data type. Let us explore this in a little more depth:
     A                  B           C           D
1                       Period 1    Period 2    Period 3
2    Susan              0           0           350
3    Bob                0           252         125
4    Mary               600         600         600
5    James              125         0           250
6
7    Employee:          Bob
8    Period:            Period 2
9
10   Expense Claimed:   = INDEX(B2:D5, MATCH(B7,A2:A5,0), MATCH(B8,B1:D1,0))
11                      = 252
Table 4-4 Demonstrating nested worksheet functions within a formula
The INDEX worksheet function takes 3 arguments.
We pass a table or array to the first argument. In this example the table is in a range, specifically B2:D5, the expense values only.
The second argument tells Excel which Y coordinate, or row index, we want to return a value from.
The third argument tells Excel which X coordinate, or column index, we want to return a value from.
The intersection of the Y and X coordinates is the result of the INDEX formula.
In the table above, we use the MATCH worksheet function to yield the Y coordinate, or the position of ‘Bob’ in A2:A5. We use the MATCH worksheet function again to yield the X coordinate, or the position of ‘Period 2’ in B1:D1. The data type of the MATCH result can only be an integer or an error type (i.e. if no match is found then MATCH will yield #N/A).
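The nesting can be mirrored in Python, with the result of one function feeding the argument of another. The helper names match_exact and index are hypothetical stand-ins for MATCH (exact match) and INDEX, and positions are 1-based as they are in Excel.

```python
employees = ["Susan", "Bob", "Mary", "James"]   # A2:A5
periods = ["Period 1", "Period 2", "Period 3"]  # B1:D1
expenses = [                                    # B2:D5
    [0, 0, 350],
    [0, 252, 125],
    [600, 600, 600],
    [125, 0, 250],
]


def match_exact(value, array):
    """1-based position of value in array, or '#N/A' when absent."""
    return array.index(value) + 1 if value in array else "#N/A"


def index(table, row_num, col_num):
    """Value at the 1-based row and column of the table."""
    return table[row_num - 1][col_num - 1]


row = match_exact("Bob", employees)     # 2
col = match_exact("Period 2", periods)  # 2
print(index(expenses, row, col))        # 252
```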
Optional Arguments
As previously stated, not all worksheet functions take arguments. The TODAY worksheet function is a classic example. TODAY will always yield today’s date by collecting the result from the system date. Of the functions that do rely on arguments, some have arguments that are optional. An example of this can be observed with the VLOOKUP we explored earlier. The last argument, indicating whether an exact or approximate match is required, is optional. Where this is the case, and the argument is omitted in the formula, Excel will assume a default value. For example:
=VLOOKUP("d",{"a",10;"b",100;"c",1000},2)
In this context Excel will assume that the omitted argument is TRUE, so Excel is instructed to perform an approximate match. However:
=VLOOKUP("d",{"a",10;"b",100;"c",1000},2,)
In this context Excel will assume that the argument is FALSE, so Excel is instructed not to perform an approximate match. It might not be obvious, but the only difference between the former and the latter is that the latter contains a comma after the column index argument, meaning that the fourth argument has not actually been omitted but that Excel has not been explicitly told what the argument value is.
It is considered best practice to be as explicit as possible when constructing your formulae. Being explicit does not add any overhead to Excel’s calculations since omitted arguments will always revert to a default value. In fact, it has been suggested by some that being explicit reduces the overhead since Excel does not have to reference its library to establish the default value. Whether or not this is true, the effects are so slight that they are difficult to substantiate.
Logical and Information Functions
Logical functions introduce decision making in Excel. They either yield TRUE or FALSE, or instruct Excel on how to arrive at a result if a condition is either TRUE or FALSE. Information functions answer specific questions and are usually prefixed with ‘IS’. In the context of this lesson we will only explore information functions that yield a TRUE or FALSE result.
AND()
Returns TRUE if all of its arguments are TRUE, otherwise yields FALSE. AND supports up to 30 logical arguments in Excel version 2003 and earlier, but up to 255 in later versions.
Syntax: AND(logical1, logical2, …)
OR()
Returns TRUE if any of its arguments are TRUE; returns FALSE if all of its arguments are FALSE. OR supports up to 30 logical arguments in Excel version 2003 and earlier, but up to 255 in later versions.
Syntax: OR(logical1, logical2, …)
Use arrays when analysing a single cell: When using OR to test only one cell value, an inline array constant can offer a touch of micro-optimisation. For instance:
    A                             B
1   Bob
2
3   =OR(A1="Mary",A1="Bob")       This rendition involves three evaluation steps.
4   =OR(A1={"Mary";"Bob"})        This rendition involves only two evaluation steps.
NOT()
Reverses the logic of its argument. Use NOT when you want to make sure a value is not equal to one particular value.
ISBLANK()
Returns TRUE if the value is blank, otherwise returns FALSE. This function can mislead users. ISBLANK will yield FALSE when a value contains a null string, such as a formula configured to yield “”. You can also use the LEN function to determine if a value is empty, or contains a null string.
Syntax: ISBLANK(value)
ISNA()
Returns TRUE if a value is a #N/A error, otherwise returns FALSE. Use ISERROR() to test if a value is of any error type.
Syntax: ISNA(value)
IF()
Specifies a logical test to perform. Instructs Excel to yield a specific value if the 1st argument is TRUE, or another value if the 1st argument is FALSE.
Syntax: IF(logical_test, value_if_true,value_if_false)
When using IF, do not test if a comparative statement is TRUE or FALSE: How often do you see:
IF((value1 > value2)=TRUE, do_this, do_that)? The statement ‘something > something_else’ is a comparison statement and can only yield a TRUE or FALSE. Thus asking Excel to confirm that it is TRUE is an extra and entirely unnecessary evaluation step.
When using IF, do not explicitly ask Excel whether a value is zero or not: Because Excel recognises zero as FALSE, and any non-zero numeric value as TRUE, it is entirely unnecessary to pass this sort of comparison statement in IF. For instance, IF(value<>0, do_this, do_that) can simply be expressed as IF(value, do_this, do_that), saving an evaluation step.
Avoid IF in logical numerical tests: The table below illustrates using boolean logic to avoid function calls and reduce the evaluation steps needed to yield a result.
    A                B          C      D
1   Pay 20% bonus on Revenue over 20K only where GP >= 30%
2
3   Profit Centre    Revenue    GP%    Bonus
4   001589           26000      44%    1200
5   001523           19100      28%    0
6   001596           22000      28%    0
7   001508           11200      86%    0
Table 4-5 Boolean logic, multiplying logical tests to avoid function calls and evaluation steps.
The bonus in D4 can be calculated using a combination of IF() and AND():
IF(AND(B4>20000,C4>=0.3),(B4-20000)*0.2,0)
This formula involves two function calls and 6 evaluation steps. The same result can be achieved using:
(B4>20000)*(B4-20000)*(C4>=0.3)*0.2
This method involves no function calls, with the same number of evaluation steps.
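The equivalence of the two renditions can be checked for every row of the table with a short Python sketch. The function names are illustrative; Python coerces booleans in multiplication much as Excel does, so a failed test multiplies the result by zero.

```python
def bonus_if(revenue, gp):
    """IF(AND(...)) rendition."""
    if revenue > 20000 and gp >= 0.3:
        return (revenue - 20000) * 0.2
    return 0


def bonus_boolean(revenue, gp):
    """Boolean rendition: each failed test multiplies the result by zero."""
    return (revenue > 20000) * (revenue - 20000) * (gp >= 0.3) * 0.2


rows = [(26000, 0.44), (19100, 0.28), (22000, 0.28), (11200, 0.86)]
for revenue, gp in rows:
    print(revenue, gp, bonus_if(revenue, gp), bonus_boolean(revenue, gp))
```

Only the first profit centre satisfies both tests, so it is the only row with a non-zero bonus under either rendition.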
Lookup Functions
Lookups are among the most frequently used functions in the Excel function library. Unfortunately, they are also one of the most likely causes of slow calculations. Fortunately, there are a number of ways to improve lookup calculation times.
LOOKUP()
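The LOOKUP function takes two forms, vector and array. The vector form searches a single row or column for a specific item, and returns the value from the same position in a second single row or column. The array form searches the first row or column of an array for a specific item, and returns a value from the same position in the last column or row of the array. If multiple matches exist, LOOKUP returns the last match. The values being searched must be sorted in ascending order. Error values in the searched values are ignored.
Syntax (vector): LOOKUP(lookup_val, lookup_vector, result_vector)
Syntax (array): LOOKUP(lookup_val, array)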
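Look up the LAST item in a table: Typical lookup functions match the first occurrence of an item in a table, whereas LOOKUP returns the last match. Say you have a table of values in A1:A10 and you wish to yield the adjacent value from B1:B10 but, should more than one occurrence exist, grab the last match. In the formula below, the expression 1/(A1:A10="lookup_value") yields 1 for each match and a #DIV/0! error otherwise; LOOKUP ignores the errors and returns the value alongside the last 1:
LOOKUP(1,1/(A1:A10="lookup_value"),B1:B10)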
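The mechanics of the last-match formula can be modelled in a few lines of Python. This is only an illustrative sketch, not Excel's implementation: the function name lookup_last and the sample data are hypothetical, and None stands in for the #DIV/0! error that Excel produces for non-matching rows.

```python
def lookup_last(lookup_value, keys, results):
    """Model of LOOKUP(1,1/(keys=lookup_value),results)."""
    # 1/(A1:A10="x") gives 1 where the key matches and #DIV/0! elsewhere;
    # None is used here as a stand-in for the error value.
    flags = [1 if key == lookup_value else None for key in keys]
    # LOOKUP skips the errors and settles on the last 1 in the vector.
    positions = [i for i, flag in enumerate(flags) if flag == 1]
    if not positions:
        return "#N/A"  # every entry was an error, so nothing matched
    return results[positions[-1]]


# Hypothetical data standing in for A1:A6 and B1:B6.
keys = ["apple", "pear", "apple", "plum", "apple", "pear"]
values = [10, 20, 30, 40, 50, 60]
print(lookup_last("apple", keys, values))
```

Here "apple" appears three times, and the model returns the value paired with its final occurrence.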
MATCH()
The MATCH function searches for a specific item in a 1-dimensional array of values (e.g. range), and then returns the relative position of that item in the array.
Syntax: MATCH(lookup_value, lookup_array, match_type)
Match_type = 1 returns the position of the largest value less than or equal to the lookup value. The lookup array must be sorted in ascending order.
Match_type = 0 requests an exact match. The lookup array need not be sorted.
Match_type = -1 returns the position of the smallest value greater than or equal to the lookup value. The lookup array must be sorted in descending order.
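The three match types can be compared side by side with a small Python model. This is a sketch under stated assumptions, not Excel's actual algorithm: the function name match is hypothetical, None stands in for #N/A, and the arrays are assumed to contain no duplicate values (Excel's handling of duplicates is not modelled).

```python
from bisect import bisect_right


def match(lookup_value, lookup_array, match_type=1):
    """Return a 1-based position, or None where Excel would give #N/A."""
    if match_type == 0:
        # Exact match: first occurrence, no sort order required.
        for pos, item in enumerate(lookup_array, start=1):
            if item == lookup_value:
                return pos
        return None
    if match_type == 1:
        # Ascending data: position of the largest value <= lookup_value.
        pos = bisect_right(lookup_array, lookup_value)
        return pos or None
    if match_type == -1:
        # Descending data: position of the smallest value >= lookup_value.
        pos = 0
        for i, item in enumerate(lookup_array, start=1):
            if item >= lookup_value:
                pos = i
            else:
                break
        return pos or None
    raise ValueError("match_type must be 1, 0 or -1")


ascending = [10, 20, 30, 40]
descending = [40, 30, 20, 10]
print(match(25, ascending, 1), match(30, ascending, 0), match(25, descending, -1))
```

Note that the result is always a position within the array, never the matched value itself, which is why MATCH is usually paired with another function to retrieve a value.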
VLOOKUP()
The VLOOKUP function searches for a specific item in the left-most column of an array of values, and then returns a value from the same row from the desired column in the array.
Syntax: VLOOKUP(lookup_value, lookup_array, col_index, match_type)
Match_type = TRUE (or any non-zero number) matches the largest value less than or equal to the lookup value. The left-most column of the array must be sorted in ascending order.
Match_type = FALSE (or zero) requests an exact match.
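The difference between the two VLOOKUP match types can be illustrated with a short Python model. This is a hedged sketch only: the function vlookup, the BANDS table and its labels are hypothetical, None stands in for #N/A, and the binary search via bisect is an assumption made for illustration rather than a description of Excel's internals.

```python
from bisect import bisect_right


def vlookup(lookup_value, table, col_index, match_type=True):
    """Model of VLOOKUP over a list of rows; col_index is 1-based."""
    first_column = [row[0] for row in table]
    if match_type:
        # Approximate match: largest left-column value <= lookup_value.
        # Requires the left-most column to be sorted ascending.
        pos = bisect_right(first_column, lookup_value)
        if pos == 0:
            return None  # lookup_value is below the smallest entry
        return table[pos - 1][col_index - 1]
    # Exact match: first row whose left-most value equals lookup_value.
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None


# Hypothetical revenue bands, sorted ascending on the first column.
BANDS = [
    (0, "No bonus"),
    (20000, "Standard"),
    (50000, "Enhanced"),
]
print(vlookup(26000, BANDS, 2, True))
print(vlookup(26000, BANDS, 2, False))
```

With an approximate match, a revenue of 26000 falls into the band that starts at 20000; with an exact match the same lookup finds nothing, because 26000 does not appear in the first column.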
HLOOKUP()
The HLOOKUP function searches for a specific item in the top-most row of an array of values, and then returns a value from the same column from the desired row in the array.