
Lecture 2, Introduction to Python. Python Programming Language


Academic year: 2021


Lecture 2, Introduction to Python

Young-Rae Cho

Associate Professor

Department of Computer Science

Baylor University

BINF 3360, Introduction to Computational Biology

Python Programming Language

Script Language

• General-purpose script language

• Broad applications (web, bioinformatics, network programming, graphics, software engineering)

Features

• Object-oriented

• Extension with modules

• Database integration

• Embeddable


Getting Started

Download & Installation

http://www.python.org/download/ (the most recent version when these slides were written: Python 3.3)

Edit & Run

• Create a file named test.py

• Edit the code

    # This is a test.
    dna = 'ATCGATGA'
    print(dna, '\n')

• Run the code

    > python test.py

Primitives

Primitive Data Types

• Numbers or Strings

    num = 1234
    st = '1234'
    num_1 = num + int(st)    # 2468
    st_1 = str(num) + st     # '12341234'

Substring

    dna1 = 'ACGTGAACT'
    dna2 = dna1[0:4]         # 'ACGT'
    length = len(dna2)       # 4

Reversing

    dna1 = 'ACGTGAACT'
    dna2 = dna1[::-1]        # 'TCAAGTGCA'


Lists

List Variables

• A list of comma-separated values

    lst1 = ['A', 'C', 'G']
    lst2 = ['T']
    lst1 = lst1 + lst2

• Variable-length list

Insert, Delete, Append, Reverse, and Sort

Using list methods:

    lst = ['A', 'T', 'G']
    lst.insert(1, 'C')
    del lst[2]
    lst.append('T')
    lst.extend(['A', 'C'])
    lst.reverse()
    lst.sort()

Using slices:

    lst = ['A', 'T', 'G']
    lst[1:2] = 'C'                         # replace the element at index 1
    lst[1:1] = 'T'                         # insert at index 1
    lst[2:3] = ''                          # delete the element at index 2
    lst[len(lst):len(lst)] = 'T'           # append
    lst[len(lst):len(lst)] = ['A', 'C']    # extend
    lst[::-1]                              # reversed copy (lst itself is unchanged)
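As a quick check on the list operations above, the following sketch applies the same edits once with method calls and once with slice assignments, then compares the results. The names by_method and by_slice are illustrative and do not come from the lecture.

```python
# Method calls on a list of bases.
by_method = ['A', 'T', 'G']
by_method.insert(1, 'C')         # ['A', 'C', 'T', 'G']
del by_method[2]                 # ['A', 'C', 'G']
by_method.append('T')            # ['A', 'C', 'G', 'T']
by_method.extend(['A', 'C'])     # ['A', 'C', 'G', 'T', 'A', 'C']

# The same edits written as slice assignments.
by_slice = ['A', 'T', 'G']
by_slice[1:1] = ['C']                    # insert at index 1
by_slice[2:3] = []                       # delete the element at index 2
by_slice[len(by_slice):] = ['T']         # append
by_slice[len(by_slice):] = ['A', 'C']    # extend

same = by_method == by_slice

# reverse() and sort() change the list in place;
# [::-1] and sorted() return new lists and leave the original alone.
reversed_copy = by_method[::-1]
sorted_copy = sorted(by_method)
```

The method calls are usually easier to read; the slice forms are worth knowing mainly because they show that a list is edited by position.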

Sets

Set Variables

    DNAbases = {'A', 'C', 'G', 'T'}
    RNAbases = {'A', 'C', 'G', 'U'}
    DNAbases | RNAbases    # union
    DNAbases & RNAbases    # intersection
    DNAbases - RNAbases    # difference

Add and Remove

    bases = {'A', 'D', 'G'}
    bases.add('T')
    bases.remove('D')
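The set operations above can be tried out as follows. This is a minimal sketch; the lowercase variable names are illustrative, and discard() is shown only as a contrast to the remove() call used in the lecture.

```python
dna_bases = {'A', 'C', 'G', 'T'}
rna_bases = {'A', 'C', 'G', 'U'}

union = dna_bases | rna_bases       # bases found in either alphabet
shared = dna_bases & rna_bases      # bases found in both
dna_only = dna_bases - rna_bases    # bases found only in DNA

# Sets are unordered, so sort before printing for a stable display.
print(sorted(union))       # ['A', 'C', 'G', 'T', 'U']
print(sorted(shared))      # ['A', 'C', 'G']
print(sorted(dna_only))    # ['T']

# remove() raises KeyError for a missing element; discard() does not.
bases = {'A', 'D', 'G'}
bases.discard('X')         # no error, set unchanged
bases.remove('D')
bases.add('T')
```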


Dictionaries

Initialization

    d = dict()
    d['key1'] = 'value1'
    k2, v2 = 'key2', 'value2'
    d[k2] = v2
    d = {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}

Mapping

    d['key1']
    d.get('key1')
    d.keys()
    d.values()

Delete

    del d['key1']
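One common use of a dictionary in sequence analysis is tallying characters. The sketch below uses d.get with a default value so that a base seen for the first time starts from zero. The function name count_bases is hypothetical, not part of the lecture.

```python
def count_bases(seq):
    """Return a dict mapping each character in seq to its number of occurrences."""
    counts = {}
    for base in seq:
        # get() returns 0 when the base has not been seen yet.
        counts[base] = counts.get(base, 0) + 1
    return counts


counts = count_bases('ATCGATGA')
for base, n in sorted(counts.items()):
    print(base, n)
```

Indexing with counts[base] would raise KeyError for an unseen base, which is why get() with a default is used here.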

Input / Output

Standard Input

    import sys
    data = sys.stdin.readline().replace('\n', ' ')

Reading Files

    name = 'myfilename.txt'
    with open(name) as file:
        data = file.read()

    import sys
    name = sys.stdin.readline().strip()    # strip the trailing newline from the file name
    with open(name) as file:
        data = file.read()

    import sys
    name = sys.argv[1]
    with open(name) as file:
        data = file.read()

Writing Files

    name = 'output.txt'
    with open(name, 'w') as file:
        file.write(data)    # body cut off in the source; writing data is the assumed intent
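The reading and writing patterns above can be combined into a round trip. This is a sketch under stated assumptions: the helper names write_text and read_text are hypothetical, and a temporary directory is used so that the example leaves no files behind.

```python
import os
import tempfile


def write_text(path, text):
    """Write text to path, replacing any existing content."""
    with open(path, 'w') as f:
        f.write(text)


def read_text(path):
    """Return the whole content of the file at path as one string."""
    with open(path) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'output.txt')
    write_text(path, 'ATCGATGA\n')
    data = read_text(path)

sequence = data.strip()    # drop the trailing newline
```

The with statement closes the file automatically, even if an error occurs inside the block.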


Functions

Types

• Built-in system functions

• User-defined functions

Defining Function

    def function_name(parameter_list):
        statement
        statement
        return value

Function Call

Iteration

Iterative Process

    def find_max(lst):
        max_so_far = lst[0]
        for item in lst[1:]:
            if item > max_so_far:
                max_so_far = item
        return max_so_far

    lst1 = [3, 5, 10, 4, 6]
    maximum = find_max(lst1)


Recursion

Recursive Call

    def print_tree(tree, level):
        print(' ' * 4 * level, tree[0])
        for subtree in tree[1:]:
            print_tree(subtree, level + 1)

    t1 = ['A', ['T', ['A'], ['T']], ['G', ['G'], ['C']]]
    print_tree(t1, 0)
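The same recursive pattern (handle the root, then recurse into each subtree) also works when a value must be returned rather than printed. The sketch below assumes the nested-list layout used above, [label, subtree, subtree, ...]; the function names count_nodes and height are illustrative.

```python
def count_nodes(tree):
    """Count the nodes of a tree stored as [label, subtree, subtree, ...]."""
    return 1 + sum(count_nodes(subtree) for subtree in tree[1:])


def height(tree):
    """Return the number of levels in the tree; a single node has height 1."""
    # default=0 covers a leaf, which has no subtrees to take the max over.
    return 1 + max((height(subtree) for subtree in tree[1:]), default=0)


t1 = ['A', ['T', ['A'], ['T']], ['G', ['G'], ['C']]]
print(count_nodes(t1))    # 7
print(height(t1))         # 3
```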

Modules

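Module

• A collection of functions

• Python module (.py) files are stored in a library directory

Module Call

    import random
    seq = 'ATCGATAGCTA'
    random_base = seq[random.randint(0, len(seq) - 1)]

    from random import *
    seq = 'ATCGATAGCTA'
    random_base = seq[randint(0, len(seq) - 1)]    # line cut off in the source; restored by analogy with the example above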
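As a further illustration of the random module, the sketch below builds a random DNA sequence. The function random_sequence and its seed parameter are assumptions made for this example; random.Random and random.choice are standard library calls.

```python
import random


def random_sequence(length, alphabet='ACGT', seed=None):
    """Return a random string of the given length drawn from alphabet."""
    rng = random.Random(seed)    # private generator; a seed makes the result repeatable
    return ''.join(rng.choice(alphabet) for _ in range(length))


seq = random_sequence(12, seed=1)

# random.choice picks one element directly, without computing an index first.
random_base = random.choice(seq)
```

Passing the same seed twice gives the same sequence, which is convenient when a simulation has to be reproduced.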


Regular Expressions

Special Languages

• Metacharacters

• Quantifiers

• Alternatives

• Character Set

Usage

• Largely the same as the regular expressions in Perl

    import re
    # Spaces inside a pattern are matched literally, so the pattern contains none.
    if re.match('TATA.*AA', seq):
        print('It matched!')

    import re
    matches = re.findall('TATA.*AA', seq)
    print(matches)
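Two details of the calls above are easy to miss: re.match only tests the beginning of the string, and .* is greedy. The sketch below shows both on a made-up sequence chosen for illustration.

```python
import re

seq = 'GGTATACCAAGGAATT'    # illustrative sequence, not from the lecture

at_start = re.match('TATA.*AA', seq)      # None: seq does not start with TATA
anywhere = re.search('TATA.*AA', seq)     # finds the motif inside seq

greedy = re.findall('TATA.*AA', seq)      # .* runs to the last 'AA'
lazy = re.findall('TATA.*?AA', seq)       # .*? stops at the first 'AA'

print(greedy)    # ['TATACCAAGGAA']
print(lazy)      # ['TATACCAA']
```

For motif scanning, re.search or re.findall is usually what is wanted, since a motif rarely sits at position 0.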

Biological Applications

Parsing Sequences

Base Frequency Counting

Motif (Substring) Search

Sequence Transformation

• DNA Replication

• Transcription from DNA to RNA

• Translating RNA into Protein


Parsing Sequences (1)

Single Sequence in FASTA Format

    >gi|5524211|gb|AAD44166.1| cytochrome b
    LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIP
    YIGTNLVEWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDK
    IPFHPYYTIKDFLGLLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRS
    VPNKLGGVLALFLSIVILGLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYP
    YTIIGQMASILYFSIILAFLPIAGXIENY

Parsing

• Make a function to return the sequence from the FASTA format

    def read_FASTA_seq(filename):
        with open(filename) as f:
            return f.read().partition('\n')[2].replace('\n', '')

Parsing Sequences (2)

Multiple Sequences in FASTA Format

    >SEQUENCE_1
    MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
    LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHKIP
    QFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTLMGQFY
    VMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
    >SEQUENCE_2
    SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQIATIGE
    NLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH

Parsing ?
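One possible answer to the multi-sequence case is sketched below. It is not the lecture's solution: the function names read_fasta_records and read_fasta_file are hypothetical, the result is assumed to be a dictionary keyed by header line, and the example records are shortened from the two sequences above.

```python
def read_fasta_records(lines):
    """Parse FASTA text lines into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith('>'):
            header = line[1:]             # a new record starts here
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence line of the current record
    return {name: ''.join(parts) for name, parts in records.items()}


def read_fasta_file(filename):
    """Parse a FASTA file; a file object yields its lines when iterated."""
    with open(filename) as f:
        return read_fasta_records(f)


example = """>SEQUENCE_1
MTEITAAMVKEL
RESTGAGMMDCK
>SEQUENCE_2
SATVSEINSETD
""".splitlines()
seqs = read_fasta_records(example)
```

Collecting the lines of each record in a list and joining once at the end avoids repeated string concatenation for long sequences.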


Frequency Counting

DNA Sequence Validation

    def validate_dna(base_sequence):
        seq = base_sequence.upper()
        return len(seq) == (seq.count('T') + seq.count('C') +
                            seq.count('A') + seq.count('G'))

    def validate_dna(base_sequence):
        seq = base_sequence.upper()
        for base in seq:
            if base not in 'ACGT':
                return False
        return True

Counting Base Frequency

• Make a function to calculate the percent of 'C' and 'G' in a DNA sequence

    def percent_of_GC(base_sequence):
        seq = base_sequence.upper()
        # Scaled by 100 so that the result is a percent rather than a fraction.
        return 100 * (seq.count('G') + seq.count('C')) / len(seq)
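Validation and counting can be combined so that a bad or empty sequence is rejected before any division takes place. The sketch below uses collections.Counter from the standard library; the function name base_percentages and the choice to raise ValueError are assumptions for illustration.

```python
from collections import Counter


def base_percentages(base_sequence):
    """Return {base: percent} for a DNA sequence; raise ValueError on bad input."""
    seq = base_sequence.upper()
    if not seq or set(seq) - set('ACGT'):
        raise ValueError('expected a non-empty sequence of A, C, G, T')
    counts = Counter(seq)    # a missing base counts as 0
    return {base: 100 * counts[base] / len(seq) for base in 'ACGT'}


freqs = base_percentages('atcgatga')
gc_percent = freqs['G'] + freqs['C']
print(freqs)         # {'A': 37.5, 'C': 12.5, 'G': 25.0, 'T': 25.0}
print(gc_percent)    # 37.5
```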

Motif Search

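Searching Substring

• Make a function to take a sequence and a motif and return the position(s) of matching in the sequence

    def motif_search(seq, motif):
        return seq.find(motif)    # index of the first match, or -1 if there is none

    def all_motif_search(seq, motif):
        pos = []
        idx = seq.find(motif)
        if idx < 0:                          # no match at all
            return pos
        pos.append(idx)
        seq = seq.partition(motif)[2]        # keep only the text after the match
        while seq.find(motif) >= 0:
            idx += seq.find(motif) + len(motif)
            pos.append(idx)
            seq = seq.partition(motif)[2]    # loop tail and return were cut off in the source; reconstructed
        return pos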
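The partition-based search above skips past each whole match, so it cannot report overlapping occurrences. An alternative sketch, using the optional start argument of str.find, is shown below. The function name find_all is hypothetical.

```python
def find_all(seq, motif):
    """Return the start index of every occurrence of motif, overlaps included."""
    positions = []
    if not motif:
        return positions    # an empty motif would match everywhere; treat as no match
    idx = seq.find(motif)
    while idx >= 0:
        positions.append(idx)
        idx = seq.find(motif, idx + 1)    # resume one character past the last hit
    return positions


print(find_all('ACGTGAACT', 'AC'))    # [0, 6]
print(find_all('ATATAT', 'ATA'))      # [0, 2]
```

Because the original string is never sliced, the indices returned by find() are already absolute positions.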


Transcription

Simulating Transcription

• Make a function to transcribe a DNA into an RNA

    def transcription(dna):
        return dna.replace('T', 'U')
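Replacing T with U treats the input as the coding strand. If the input is instead the template strand, the RNA is the reverse complement of the input with T replaced by U. The sketch below covers that case; the names COMPLEMENT, reverse_complement, and transcribe_template are illustrative, and both strands are assumed to be written 5' to 3'.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans('ACGT', 'TGCA')


def reverse_complement(dna):
    """Return the opposite strand of dna, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def transcribe_template(template):
    """Transcribe a template strand: complement it, reverse it, then T -> U."""
    return reverse_complement(template).replace('T', 'U')


print(reverse_complement('ATCGATGA'))     # TCATCGAT
print(transcribe_template('ATCGATGA'))    # UCAUCGAU
```

The reverse_complement function also serves as a simple model of DNA replication, since it produces the strand that pairs with the input.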

Translation (1)

Making Genetic Code

• Make a function to translate a codon to an amino acid

    def codon2aa(codon):
        genetic_code = { 'UUU': 'F', 'UUC': 'F', 'UUA': 'L', …… }
        if codon in genetic_code:
            return genetic_code[codon]
        else:
            return ''    # branch body cut off in the source; an empty string is assumed for unknown codons
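Typing all 64 entries by hand is error-prone. One compact alternative, sketched here, builds the standard genetic code from a 64-letter string in which the codons are ordered with U, C, A, G at each position and '*' marks a stop codon. The names BASES, AMINO_ACIDS, GENETIC_CODE, and codon_to_aa are illustrative and are not the lecture's.

```python
from itertools import product

BASES = 'UCAG'
# Amino acids of the standard genetic code for UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = ('FFLLSSSSYY**CC*W'
               'LLLLPPPPHHQQRRRR'
               'IIIMTTTTNNKKSSRR'
               'VVVVAAAADDEEGGGG')

# product() varies the last position fastest, matching the string's order.
GENETIC_CODE = {
    ''.join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_to_aa(codon):
    """Return the one-letter amino acid, '*' for a stop codon, '' if unknown."""
    return GENETIC_CODE.get(codon.upper(), '')


print(codon_to_aa('AUG'))    # M
print(codon_to_aa('UAA'))    # *
```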


Translation (2)

Simulating Translation

• Make a function to translate an RNA into a protein sequence

    def translation(rna):
        protein = ''
        for n in range(0, len(rna), 3):
            protein += codon2aa(rna[n:n+3])
        return protein

Translation (3)

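Simulating Translation (cont'd)

• Make a generator
- an object that returns values from a series it computes

    def aa_generator(rna):
        return (codon2aa(rna[n:n+3]) for n in range(0, len(rna), 3))

    def translation(rna):
        gen = aa_generator(rna)
        protein = ''
        aa = next(gen, '')        # '' when the generator is exhausted
        while aa:
            protein += aa         # loop body and return were cut off in the source; reconstructed
            aa = next(gen, '')
        return protein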
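A generator can also be written as a function containing yield, which makes it easy to stop early. The sketch below stops at the first stop codon or unknown codon and ignores an incomplete codon at the end. The function name amino_acids, the code parameter, and the four-entry table are assumptions made to keep the example self-contained.

```python
def amino_acids(rna, code):
    """Yield one amino acid per complete codon until a stop ('*') or unknown codon."""
    # len(rna) - 2 as the upper bound skips a trailing partial codon.
    for n in range(0, len(rna) - 2, 3):
        aa = code.get(rna[n:n + 3], '')
        if aa in ('', '*'):
            return            # ends the generator
        yield aa


# A deliberately tiny table for illustration; the full code has 64 entries.
tiny_code = {'AUG': 'M', 'UUU': 'F', 'UUA': 'L', 'UAA': '*'}

protein = ''.join(amino_acids('AUGUUUUUAUAAUUU', tiny_code))
print(protein)    # MFL
```

Passing the generator straight to ''.join avoids the explicit next() loop and the repeated string concatenation.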


Mutation

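Simulating Mutation

• Make a function to simulate single point mutations in a DNA sequence

    import random

    def mutation(dna):
        position = random.randint(0, len(dna) - 1)
        bases = 'ACGT'
        new_base = bases[random.randint(0, 3)]
        # Strings are immutable, so build a new string instead of assigning to a slice.
        return dna[:position] + new_base + dna[position+1:]

To guarantee that the base actually changes, exclude the current base before choosing:

        bases = bases.replace(dna[position], '')
        new_base = bases[random.randint(0, 2)]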
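A point mutation function can be checked by counting how many positions differ between the original and the mutant. The sketch below is one way to do that; the names point_mutation and hamming_distance and the optional rng parameter are assumptions, added so that a seeded random.Random can make the result repeatable.

```python
import random


def point_mutation(dna, rng=random):
    """Return a copy of dna with exactly one base replaced by a different base."""
    position = rng.randrange(len(dna))
    choices = 'ACGT'.replace(dna[position], '')    # the three other bases
    return dna[:position] + rng.choice(choices) + dna[position + 1:]


def hamming_distance(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


original = 'ACGTGAACT'
mutant = point_mutation(original, random.Random(0))
print(hamming_distance(original, mutant))    # 1
```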

Questions?

• Lecture slides are found on the course website, web.ecs.baylor.edu/faculty/cho/3360
