
True False

The isinstance(obj, class) function

The built-in isinstance() function checks the relationship between an object and a class. It returns True if the first argument, obj, is an instance of the second argument, class, or of one of its subclasses. Consider the following example.

Example

class Calculation1:
    def Summation(self, a, b):
        return a + b

class Calculation2:
    def Multiplication(self, a, b):
        return a * b

class Derived(Calculation1, Calculation2):
    def Divide(self, a, b):
        return a / b

d = Derived()
print(isinstance(d, Derived))

Output:

True

Regular Expression

1.Introduction

A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions are widely used in the UNIX world.

The re module provides full support for Perl-like regular expressions in Python. It raises the exception re.error if an error occurs while compiling or using a regular expression. We will cover two important functions that are used to handle regular expressions. But a small thing first: various characters have a special meaning when they are used in a regular expression. To avoid confusion when dealing with regular expressions, we will write patterns as raw strings, in the form r'expression'.
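The point about raw strings and re.error can be illustrated with a minimal sketch. The helper name compiles() is hypothetical and exists only for this illustration.

```python
import re

# In a normal string literal "\b" is a single backspace character.
# In a raw string it stays as backslash + "b", which the regex engine
# interprets as a word boundary.
plain = "\bcat\b"
raw = r"\bcat\b"

print(len(plain))  # 5
print(len(raw))    # 7

found_raw = re.search(raw, "a cat sat") is not None      # word-boundary match
found_plain = re.search(plain, "a cat sat") is not None  # looks for backspaces
print(found_raw, found_plain)  # True False


def compiles(pattern):
    """Hypothetical helper: report whether a pattern compiles."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised for malformed patterns such as an unclosed group.
        return False


print(compiles(r"\d+"))       # True
print(compiles("(unclosed"))  # False
```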

2.Match Object

2.1.class re.MatchObject

Match objects always have a boolean value of True.

Since match() and search() return None when there is no match, you can test whether there was a match with a simple if statement:

match = re.search(pattern, string)
if match:
    process(match)

Match objects support the following methods and attributes:

2.2.expand(template)

Return the string obtained by doing backslash substitution on the template string template, as done by the sub() method. Escapes such as \n are converted to the appropriate characters, and numeric backreferences (\1, \2) and named backreferences (\g<1>, \g<name>) are replaced by the contents of the corresponding group.
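As a small sketch of expand(), the following reorders two captured names. The pattern, the group names first and last, and the input string are chosen only for illustration.

```python
import re

m = re.match(r"(?P<first>\w+) (?P<last>\w+)", "Isaac Newton")

# A named backreference (\g<last>) and a numeric one (\1) in one template.
swapped = m.expand(r"\g<last>, \1")
print(swapped)  # Newton, Isaac

# The two characters backslash + n in the template become a real newline.
two_lines = m.expand(r"\1\n\2")
print(two_lines.splitlines())  # ['Isaac', 'Newton']
```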

2.3.group([group1, ...])

Returns one or more subgroups of the match. If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Without arguments, group1 defaults to zero (the whole match is returned). If a groupN argument is zero, the corresponding return value is the entire matching string; if it is in the inclusive range [1..99], it is the string matching the corresponding parenthesized group. If a group number is negative or larger than the number of groups defined in the pattern, an IndexError exception is raised. If a group is contained in a part of the pattern that did not match, the corresponding result is None. If a group is contained in a part of the pattern that matched multiple times, the last match is returned.

>>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
>>> m.group(0)       # The entire match
'Isaac Newton'
>>> m.group(1)       # The first parenthesized subgroup.
'Isaac'
>>> m.group(2)       # The second parenthesized subgroup.
'Newton'
>>> m.group(1, 2)    # Multiple arguments give us a tuple.
('Isaac', 'Newton')

If the regular expression uses the (?P<name>...) syntax, the groupN arguments may also be strings identifying groups by their group name. If a string argument is not used as a group name in the pattern, an IndexError exception is raised.

A moderately complicated example:

>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
>>> m.group('first_name')
'Malcolm'
>>> m.group('last_name')
'Reynolds'

Named groups can also be referred to by their index:

>>> m.group(1)
'Malcolm'
>>> m.group(2)
'Reynolds'

If a group matches multiple times, only the last match is accessible:

>>> m = re.match(r"(..)+", "a1b2c3")  # Matches 3 times.
>>> m.group(1)                        # Returns only the last match.
'c3'
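The two edge cases described for group(), a group that did not take part in the match and a group that does not exist, can be sketched as follows. The pattern and input are illustrative only.

```python
import re

# Group 2 sits inside an optional part of the pattern that does not match here.
m = re.match(r"(\w+)(?: (\d+))?", "Newton")

first = m.group(1)
second = m.group(2)
print(first)   # Newton
print(second)  # None

# A group number larger than the number of groups raises IndexError.
try:
    m.group(3)
    bad_index_raised = False
except IndexError:
    bad_index_raised = True

# So does a name that is not defined in the pattern.
try:
    m.group("missing")
    bad_name_raised = False
except IndexError:
    bad_name_raised = True

print(bad_index_raised, bad_name_raised)  # True True
```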

2.4.groups([default])

Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern. The default argument is used for groups that did not participate in the match; it defaults to None. (Incompatibility note: in the original Python 1.5 release, if the tuple was one element long, a string would be returned instead. In later versions (from 1.5.1 on), a singleton tuple is returned in such cases.)

For example:

>>> m = re.match(r"(\d+)\.(\d+)", "24.1632")
>>> m.groups()
('24', '1632')

If we make the decimal place and everything after it optional, not all groups might participate in the match. These groups will default to None unless the default argument is given:

>>> m = re.match(r"(\d+)\.?(\d+)?", "24")
>>> m.groups()       # Second group defaults to None.
('24', None)
>>> m.groups('0')    # Now, the second group defaults to '0'.
('24', '0')

2.5.groupdict([default])

Return a dictionary containing all the named subgroups of the match, keyed by the subgroup name. The default argument is used for groups that did not participate in the match; it defaults to None. For example:

>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
>>> m.groupdict()
{'first_name': 'Malcolm', 'last_name': 'Reynolds'}

start([group])

end([group])

Return the indices of the start and end of the substring matched by group; group defaults to zero (meaning the whole matched substring). Return -1 if group exists but did not contribute to the match. For a match object m, and a group g that did contribute to the match, the substring matched by group g (equivalent to m.group(g)) is

m.string[m.start(g):m.end(g)]

Note that m.start(group) will equal m.end(group) if group matched a null string. For example, after m = re.search('b(c?)', 'cba'), m.start(0) is 1, m.end(0) is 2, m.start(1) and m.end(1) are both 2, and m.start(2) raises an IndexError exception.

An example that will remove remove_this from email addresses:

>>> email = "tony@tiremove_thisger.net"
>>> m = re.search("remove_this", email)
>>> email[:m.start()] + email[m.end():]
'tony@tiger.net'
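The statements above about start() and end() can be checked with a short sketch. The third pattern, (a)|(b), is an assumption added here to show the -1 case.

```python
import re

# The example from the text: group 1 matches an empty string at index 2.
m = re.search(r"b(c?)", "cba")
whole = (m.start(0), m.end(0))
empty_group = (m.start(1), m.end(1))
print(whole)        # (1, 2)
print(empty_group)  # (2, 2)

# Slicing m.string with start/end gives the same text as group().
m2 = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
slices_agree = all(
    m2.string[m2.start(g):m2.end(g)] == m2.group(g) for g in (0, 1, 2)
)
print(slices_agree)  # True

# A group that exists but did not contribute to the match reports -1.
m3 = re.match(r"(a)|(b)", "a")
unused_start = m3.start(2)
print(unused_start)  # -1
```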

2.6.span([group])

For MatchObject m, return the 2-tuple (m.start(group), m.end(group)). Note that if group did not contribute to the match, this is (-1, -1). group defaults to zero, the entire match.

2.7.pos

The value of pos which was passed to the search() or match() method of the RegexObject. This is the index into the string at which the RE engine started looking for a match.

2.8.endpos

The value of endpos which was passed to the search() or match() method of the RegexObject. This is the index into the string beyond which the RE engine will not go.
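A minimal sketch of pos, endpos and span() together, assuming a compiled pattern whose search() is given explicit start and end positions. The text and the positions 6 and 12 are arbitrary.

```python
import re

pattern = re.compile(r"\d+")
text = "ab 123 cd 4567"

# Only the slice text[6:12] (" cd 45") is scanned, so "123" is skipped
# and the second number is cut off after two digits.
m = pattern.search(text, 6, 12)

print(m.group())  # 45
print(m.span())   # (10, 12)
print(m.pos)      # 6
print(m.endpos)   # 12
```

Note that the indices reported by span() are relative to the whole string, not to the scanned slice.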

2.9.lastindex

The integer index of the last matched capturing group, or None if no group was matched at all. For example, the expressions (a)b, ((a)(b)), and ((ab)) will have lastindex == 1 if applied to the string 'ab', while the expression (a)(b) will have lastindex == 2, if applied to the same string.

2.10.lastgroup

The name of the last matched capturing group, or None if the group didn’t have a name, or if no group was matched at all.
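The lastindex values listed above, and the behavior of lastgroup, can be reproduced with a short sketch. The named pattern with groups word and num is an assumption made for the illustration.

```python
import re

# The four expressions from the lastindex description, applied to 'ab'.
last_indexes = {
    p: re.match(p, "ab").lastindex
    for p in (r"(a)b", r"((a)(b))", r"((ab))", r"(a)(b)")
}
print(last_indexes)

# lastgroup is the name of the last matched group, if it has one.
named = re.match(r"(?P<word>[a-z]+) (?P<num>\d+)", "abc 12")
unnamed = re.match(r"([a-z]+)", "abc")
no_groups = re.match(r"[a-z]+", "abc")

print(named.lastgroup)      # num
print(unnamed.lastgroup)    # None
print(no_groups.lastindex)  # None
```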

3. Quantifiers

A quantifier has the form {m,n} where m and n are the minimum and maximum number of times the expression to which the quantifier applies must match. For example, both e{1,1}e{1,1} and e{2,2} match feel, but neither matches felt.

Writing a quantifier after every expression would soon become tedious, and is certainly difficult to read. Fortunately, the regex language supports several convenient shorthands. If only one number is given in the quantifier, it's taken to be both the minimum and the maximum, so e{2} is the same as e{2,2}. As noted in the preceding section, if no quantifier is explicitly given, it's assumed to be 1 (that is, {1,1} or {1}); therefore, ee is the same as e{1,1}e{1,1} and e{1}e{1}, so both e{2} and ee match feel but not felt.

Having a different minimum and maximum is often convenient. For example, to match travelled and traveled (both legitimate spellings), we could use either travel{1,2}ed or travell{0,1}ed. The {0,1} quantification is used so often that it has its own shorthand form, ?, so another way of writing the regex (and the one most likely to be used in practice) is travell?ed.
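The equivalence of these three spellings of the regex can be checked with a minimal sketch. The misspelling travellled is added here only as a negative case.

```python
import re

words = ["traveled", "travelled", "travellled"]
patterns = [r"travel{1,2}ed", r"travell{0,1}ed", r"travell?ed"]

# fullmatch requires the whole word to match, not just a part of it.
results = {
    p: [w for w in words if re.fullmatch(p, w)]
    for p in patterns
}
for p, matched in results.items():
    print(p, matched)  # each pattern accepts only the two legitimate spellings

# e{2} behaves like ee: it is found in feel but not in felt.
double_e = [w for w in ("feel", "felt") if re.search(r"e{2}", w)]
print(double_e)  # ['feel']
```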

Two other quantification shorthands are provided: A plus sign (+) stands for {1,n} ("at least one") and an asterisk (*) stands for {0,n} ("any number of"). In both cases, n is the maximum possible number allowed for a quantifier, usually at least 32767. Table 2 shows all the quantifiers.

The + quantifier is very useful. For example, to match integers, we could use \d+ to match one or more digits. This regex could match in two places in the string 4588.91: first the 4588 before the decimal point, and then the 91 after it. Sometimes typos are the result of pressing a key too long. We could use the regex bevel+ed to match the legitimate beveled and bevelled, and the incorrect bevellled. If we wanted to standardize on the single-l spelling, and match only occurrences that had two or more l's, we could use bevell+ed to find them.
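A short sketch of the + examples, using re.findall() to list every match and re.fullmatch() to test whole words:

```python
import re

# \d+ finds two separate runs of digits in 4588.91.
numbers = re.findall(r"\d+", "4588.91")
print(numbers)  # ['4588', '91']

spellings = ["beveled", "bevelled", "bevellled"]

# bevel+ed accepts one or more l's.
any_l = [w for w in spellings if re.fullmatch(r"bevel+ed", w)]
print(any_l)  # ['beveled', 'bevelled', 'bevellled']

# bevell+ed accepts only two or more l's.
two_plus = [w for w in spellings if re.fullmatch(r"bevell+ed", w)]
print(two_plus)  # ['bevelled', 'bevellled']
```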

The * quantifier is less useful, simply because it can lead so often to unexpected results. For example, supposing that we want to find lines that contain comments in Python files, we might try searching for #*. But this regex will match any line whatsoever, including blank lines, because the meaning is "match any number of pound signs", and that includes none. As a rule for those new to regexes, avoid using * at all, and if you do use it (or if you use ?), make sure that at least one other expression in the regex has a nonzero quantifier, since both * and ? can match their expression zero times.
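The pitfall with #* can be demonstrated with a minimal sketch. The sample lines are invented for the illustration.

```python
import re

lines = ["x = 1", "", "# comment", "y = 2  # trailing"]

# #* can match zero pound signs, so it "finds" something in every line,
# including the blank one.
star_hits = [line for line in lines if re.search(r"#*", line)]
print(len(star_hits) == len(lines))  # True

# #+ needs at least one pound sign, so only the comment lines remain.
plus_hits = [line for line in lines if re.search(r"#+", line)]
print(plus_hits)  # ['# comment', 'y = 2  # trailing']
```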

Often it's possible to convert * uses to + uses and vice versa. For example, we could match "tasselled" with at least one l using tassell*ed or tassel+ed, and match those with two or more l's using tasselll*ed or tassell+ed.

If we use the regex \d+ on the string 136, it will match all of 136. But why does it match all the digits, rather than just the first one? By default, all quantifiers are greedy: they match as many characters as they can. We can make any quantifier nongreedy (also called minimal) by following it with a question mark (?) symbol. (The question mark has two different meanings: on its own it's a shorthand for the {0,1} quantifier, and when it follows a quantifier it tells the quantifier to be nongreedy.) For example, \d+? can match the string 136 in three different places: the 1, the 3, and the 6. Here's another example: \d?? matches zero or one digits, but prefers to match none since it's nongreedy; on its own it suffers the same problem as * in that it will match nothing, and therefore matches any text at all.

Table 2 Regular Expression Quantifiers

Syntax            Meaning
e? or e{0,1}      Greedily match zero occurrences or one occurrence of expression e.
e?? or e{0,1}?    Nongreedily match zero occurrences or one occurrence of expression e.
e+ or e{1,}       Greedily match one or more occurrences of expression e.
e+? or e{1,}?     Nongreedily match one or more occurrences of expression e.
e* or e{0,}       Greedily match zero or more occurrences of expression e.
e*? or e{0,}?     Nongreedily match zero or more occurrences of expression e.
e{m}              Match exactly m occurrences of expression e.
e{m,}             Greedily match at least m occurrences of expression e.
e{m,}?            Nongreedily match at least m occurrences of expression e.
e{,n}             Greedily match at most n occurrences of expression e.
e{,n}?            Nongreedily match at most n occurrences of expression e.
e{m,n}            Greedily match at least m and at most n occurrences of expression e.
e{m,n}?           Nongreedily match at least m and at most n occurrences of expression e.
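The difference between the greedy and nongreedy forms in the table can be seen in a short sketch. The HTML-like string is an assumption added as a second illustration.

```python
import re

# Greedy: one match that takes all the digits.
greedy = re.findall(r"\d+", "136")
print(greedy)  # ['136']

# Nongreedy: each match stops as soon as it can, giving three matches.
lazy = re.findall(r"\d+?", "136")
print(lazy)  # ['1', '3', '6']

# \d?? prefers to match nothing, so the first match is an empty string.
empty = re.search(r"\d??", "136").group()
print(repr(empty))  # ''

# The same contrast on tag-like text.
greedy_tag = re.search(r"<.+>", "<b>bold</b>").group()
lazy_tag = re.search(r"<.+?>", "<b>bold</b>").group()
print(greedy_tag)  # <b>bold</b>
print(lazy_tag)    # <b>
```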

4.Splitting Strings

re.split(regex, subject) returns a list of strings. The list contains the parts of subject between all the regex matches in the subject. Adjacent regex matches will cause empty strings to appear in the list. The regex matches themselves are not included in the list. If the regex contains capturing groups, then the text matched by the capturing groups is included in the list. The capturing groups are inserted between the substrings that appeared to the left and right of the regex match. If you don't want the capturing groups in the list, convert them into non-capturing groups. The re.split() function does not offer an option to suppress capturing groups.

You can specify an optional third parameter, maxsplit, to limit the number of times the subject string is split. Note that this limit controls the number of splits, not the number of strings that will end up in the list. The unsplit remainder of the subject is added as the final string to the list. If there are no capturing groups, the list will contain at most limit+1 items.
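The behaviors described above can be sketched as follows. The comma-separated strings are invented for the illustration, and the limit is passed by keyword as maxsplit.

```python
import re

# Adjacent matches produce an empty string between them.
adjacent = re.split(r",", "a,,b")
print(adjacent)  # ['a', '', 'b']

# A capturing group puts the matched separator into the result.
with_group = re.split(r"(,)", "a,b")
print(with_group)  # ['a', ',', 'b']

# A non-capturing group leaves it out again.
without_group = re.split(r"(?:,)", "a,b")
print(without_group)  # ['a', 'b']

# maxsplit limits the number of splits; the remainder stays in one piece.
limited = re.split(r",", "a,b,c,d", maxsplit=2)
print(limited)  # ['a', 'b', 'c,d']
```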

The behavior of re.split() has changed between Python versions when the regular expression can find zero-length matches. In Python 3.4 and prior, re.split() ignores zero-length matches. In Python 3.5 and 3.6, re.split() issues a FutureWarning when it encounters a zero-length match. This warning signals the change in Python 3.7, where re.split() also splits on zero-length matches.
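A minimal sketch of splitting on a zero-length match, assuming Python 3.7 or later. The lookahead pattern and the camel-case input are chosen only for illustration.

```python
import re
import sys

# (?=[A-Z]) matches the empty position before each capital letter.
# It consumes no characters, so every match is zero-length.
if sys.version_info >= (3, 7):
    parts = re.split(r"(?=[A-Z])", "splitCamelCase")
else:
    # Assumption: older interpreters do not split on empty matches,
    # so this sketch simply leaves the string in one piece there.
    parts = ["splitCamelCase"]

print(parts)  # ['split', 'Camel', 'Case'] on Python 3.7+
```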
