ML I,II: Tutorial

Today's Agenda

  1. Review of some Python basics
  2. Python standard library
  3. Third-party libraries
    • numpy
    • matplotlib
  4. Basic data analysis and regression

Python Basics

  • Collection Data Structures
    • list
    • tuple
    • dict
    • set
  • Functions
  • Python Memory Model

Collections: list

In [1]:
l = [1, 2, 3, 4, 5, "six"]
print(l)
print(type(l))
[1, 2, 3, 4, 5, 'six']
<class 'list'>
In [2]:
len(l)
Out[2]:
6
In [3]:
print(4 in l, 7 in l)
True False

Indexing lists:

In [4]:
print(l)
[1, 2, 3, 4, 5, 'six']
In [5]:
l
Out[5]:
[1, 2, 3, 4, 5, 'six']
In [6]:
l[0]
Out[6]:
1
In [7]:
l[-1]
Out[7]:
'six'
In [8]:
print(l)
[1, 2, 3, 4, 5, 'six']
In [9]:
l[2:4] ## slicing
Out[9]:
[3, 4]
In [10]:
l[3:]
Out[10]:
[4, 5, 'six']

Modifying lists:

In [11]:
l[-1] = 'seven'
l[2:4] = [7, 8]
l
Out[11]:
[1, 2, 7, 8, 5, 'seven']
In [12]:
l.pop()
Out[12]:
'seven'
In [13]:
l
Out[13]:
[1, 2, 7, 8, 5]
In [14]:
l.append("eight")
l
Out[14]:
[1, 2, 7, 8, 5, 'eight']
In [15]:
l.reverse()
l
Out[15]:
['eight', 5, 8, 7, 2, 1]

Concatenation and repetition:

In [16]:
l = [7, 1, 2]
r = [9, 6, 8]
l + r*2
Out[16]:
[7, 1, 2, 9, 6, 8, 9, 6, 8]

List comprehension:

In [20]:
l = [1, 2, 3, 4]

[[x**2, 1] for x in l]
Out[20]:
[[1, 1], [4, 1], [9, 1], [16, 1]]
In [21]:
[x**2 for x in l if x % 2 == 0]
Out[21]:
[4, 16]
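As a rough sketch of what the filtered comprehension above does, here is the same computation written as an explicit loop. The name `squares_of_evens` is made up for this illustration.

```python
l = [1, 2, 3, 4]

# Explicit-loop equivalent of: [x**2 for x in l if x % 2 == 0]
squares_of_evens = []
for x in l:
    if x % 2 == 0:                     # the "if" part filters the elements
        squares_of_evens.append(x**2)  # the leading expression builds each entry

print(squares_of_evens)  # [4, 16]
```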

Collections: dict

In [22]:
d = {'a': 2, 'b': 2, 3: 2, 'b': 5}  ### duplicate key 'b': the last value wins

d['a']  ### lookup
Out[22]:
2
In [23]:
d.keys()
Out[23]:
dict_keys(['a', 'b', 3])
In [24]:
d.values()
Out[24]:
dict_values([2, 5, 2])
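A short sketch of a few more dict operations that follow from the cells above: a repeated key keeps only its last value, `get` supplies a fallback for missing keys, and `items` yields key/value pairs. The key `'zzz'` and the variable `pairs` are illustrative only.

```python
d = {'a': 2, 'b': 2, 3: 2, 'b': 5}  # 'b' is given twice; the later value wins

print(len(d))           # 3
print(d['b'])           # 5
print(d.get('zzz', 0))  # 0, because 'zzz' is not a key

# Iterate over key/value pairs (insertion order is preserved)
pairs = list(d.items())
print(pairs)            # [('a', 2), ('b', 5), (3, 2)]
```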

Collections: set

In [25]:
s = {1, 2, 3, 1, 3, 1, 4}
s
Out[25]:
{1, 2, 3, 4}
In [26]:
s.intersection([3, 4, 5])
Out[26]:
{3, 4}
In [27]:
s.union([7])
Out[27]:
{1, 2, 3, 4, 7}
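The set methods above also have operator forms. A minimal sketch, with the names `t`, `common`, `either`, and `only_s` chosen for this example; note that the operators need a set on both sides, while the methods accept any iterable such as a list.

```python
s = {1, 2, 3, 4}
t = {3, 4, 5}

common = s & t   # same as s.intersection(t)
either = s | t   # same as s.union(t)
only_s = s - t   # same as s.difference(t)

print(common, either, only_s)  # {3, 4} {1, 2, 3, 4, 5} {1, 2}
```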

Lists don't work as dict keys or set elements

In [28]:
pairdict = {[1, 2]: 3}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-28-4ab7d016ea9e> in <module>
----> 1 pairdict = {[1, 2]: 3}

TypeError: unhashable type: 'list'
In [29]:
pairset = {[1,2], [3,4]}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-29-6b3ef5633f01> in <module>
----> 1 pairset = {[1,2], [3,4]}

TypeError: unhashable type: 'list'

Hashes

In [30]:
v1 = "abcd" * 10
v2 = "abcd" * 10
v3 = v2 + 'x'
print(v1, v2, v3)

hash(v1), hash(v2), hash(v3)
abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd abcdabcdabcdabcdabcdabcdabcdabcdabcdabcdx
Out[30]:
(4809842418838812611, 4809842418838812611, -4505499860472041403)
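A small sketch of the property that dicts and sets rely on: objects that compare equal have equal hashes, and mutable lists refuse to be hashed at all. The concrete hash numbers for strings usually differ from one Python process to the next, so the values shown above will not reproduce exactly. The variable `message` is just for illustration.

```python
v1 = "abcd" * 10
v2 = "abcd" * 10

# Equal values have equal hashes (within one Python process)
print(v1 == v2, hash(v1) == hash(v2))  # True True

# Tuples of hashable elements are hashable, lists are not
print(hash((1, 2)) == hash((1, 2)))    # True
try:
    hash([1, 2])
except TypeError as err:
    message = str(err)                 # the same error as in the dict/set examples
    print(message)
```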

Collections: tuple

In [31]:
t = (1, 2, 'three')
In [32]:
len(t)
Out[32]:
3
In [33]:
t[1]
Out[33]:
2
In [34]:
t[1] = 4  ## cannot be modified
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-34-12c5f1992471> in <module>
----> 1 t[1] = 4  ## cannot be modified

TypeError: 'tuple' object does not support item assignment
In [35]:
pairdict = {(1,2): 3}
pairset = {(1,2), (3,4)}
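If the data starts out as a list, one possible workaround is to convert it to a tuple before using it as a key. The sketch below also uses the built-in `frozenset`, which is not covered in this tutorial, as an immutable stand-in for a set. All variable names are illustrative.

```python
coords = [1, 2]                  # a list cannot be a key ...
pairdict = {tuple(coords): 3}    # ... but its tuple version can
pairdict[(3, 4)] = 7

looked_up = pairdict[(1, 2)]
print(looked_up)                 # 3

# frozenset: an immutable, hashable set; element order does not matter
groups = {frozenset({1, 2}): 'a'}
same = groups[frozenset({2, 1})]
print(same)                      # a
```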

Summary of Collections

  • list: [v1, v2, v3]
  • tuple: (v1, v2, v3)
    • similar to list, but immutable; can be used as a dict key or set element (if its elements are hashable)
  • set: {v1, v2, v3}
    • similar to list, but no duplicate elements and no stable ordering
  • dictionary: {k1: v1, k2: v2, k3: v3}
    • maps hashable keys to values

Functions

In [36]:
def square_sum(x, y):
    v = x*x + y*y
    return v

print(square_sum(2, 2))
8

Default values and keyword arguments

In [37]:
def square_sum2(x, y=1):
    return x**2 + y**2

square_sum2(2)
Out[37]:
5
In [38]:
square_sum2(x=1, y=3)
Out[38]:
10
In [39]:
square_sum2(1, 3)
Out[39]:
10
In [40]:
def square_sum3(x, y=1, z=2):
    return x**2 + y**2 + z

square_sum3(2)
Out[40]:
7
In [41]:
square_sum3(2, z=3)
Out[41]:
8
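To make the argument-matching rules concrete, here is a sketch that calls `square_sum3` in three ways: purely positional, positional plus keyword, and keyword-only in a different order. The result names `a`, `b`, and `c` are illustrative.

```python
def square_sum3(x, y=1, z=2):
    return x**2 + y**2 + z

a = square_sum3(2, 3)            # x=2, y=3, z keeps its default: 4 + 9 + 2
b = square_sum3(2, z=3)          # y keeps its default:           4 + 1 + 3
c = square_sum3(z=0, x=1, y=2)   # keyword order does not matter: 1 + 4 + 0

print(a, b, c)  # 15 8 5
```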

Lambda expressions and higher-order functions

In [42]:
def apply(function, argument):
    return function(argument)

apply(lambda x: x + 1, 3)
# general form of a lambda:  lambda arg1, arg2, ...: expression
Out[42]:
4
In [43]:
#                     m > n ? m : n
my_max = lambda m, n: m if m > n else n

print(my_max(10, 3))
10
In [44]:
l = [1, 2, 3]
#apply(square_sum2, 2)
[square_sum2(x) for x in l]
Out[44]:
[2, 5, 10]
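The list comprehension above can also be written with the built-in higher-order function `map`, which takes a function as its first argument. A minimal sketch; `via_map`, `via_comprehension`, and `evens` are names chosen for this example.

```python
def square_sum2(x, y=1):
    return x**2 + y**2

l = [1, 2, 3]

via_map = list(map(square_sum2, l))            # map returns a lazy iterator
via_comprehension = [square_sum2(x) for x in l]
evens = list(filter(lambda x: x % 2 == 0, l))  # filter keeps elements where the function is true

print(via_map, via_comprehension, evens)  # [2, 5, 10] [2, 5, 10] [2]
```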

Useful built-in functions

In [45]:
things = ["cat", "apple", "boat"]
sorted(things) # alphabetically; upper-case letters sort before lower-case ones
Out[45]:
['apple', 'boat', 'cat']
In [46]:
sorted(things, key=lambda x: len(x))
Out[46]:
['cat', 'boat', 'apple']
In [47]:
enumerate(things)
Out[47]:
<enumerate at 0x7f5884527b40>
In [48]:
list(enumerate(things, 2))  ## the second argument is the first index
Out[48]:
[(2, 'cat'), (3, 'apple'), (4, 'boat')]
In [49]:
### sum over collections
numbers = [14, 13, 15]
sum(numbers) ### given a start value, works with non-numbers, too
Out[49]:
42
In [50]:
sum([['foo', 'bar'], ['baz'], ['abced', 'efgh']], [])
Out[50]:
['foo', 'bar', 'baz', 'abced', 'efgh']
In [51]:
things
Out[51]:
['cat', 'apple', 'boat']
In [52]:
list(zip(things, reversed(things), numbers))
Out[52]:
[('cat', 'boat', 14), ('apple', 'apple', 13), ('boat', 'cat', 15)]
In [53]:
[str(x) + ' ' + y + 's' for x, y in zip(numbers, things)]
Out[53]:
['14 cats', '13 apples', '15 boats']
In [54]:
## By the way: there are better ways to interpolate strings:
"{count:.2f} {thing}s".format(thing="zebra", count=3.14159)
Out[54]:
'3.14 zebras'
In [55]:
## or even this:
x = 1
y = 2
f'{x} + {y} equals {x+y}'
Out[55]:
'1 + 2 equals 3'
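As a sketch of how these built-ins combine, the example below numbers the zipped pairs with `enumerate` and formats each one with an f-string. The variable `lines` is illustrative.

```python
things = ["cat", "apple", "boat"]
numbers = [14, 13, 15]

# enumerate(..., 1) starts counting at 1; zip pairs up the two lists
lines = [f"{i}: {n} {t}s" for i, (n, t) in enumerate(zip(numbers, things), 1)]

for line in lines:
    print(line)
# 1: 14 cats
# 2: 13 apples
# 3: 15 boats
```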

Range Objects

In [56]:
range(12)
Out[56]:
range(0, 12)
In [57]:
list(range(12))  ## endpoint is not included
Out[57]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
In [58]:
sum(range(12))
Out[58]:
66
In [59]:
list(range(32, 7, -3))
Out[59]:
[32, 29, 26, 23, 20, 17, 14, 11, 8]
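A range object does not store its elements; it computes them on demand, yet it still supports `len`, indexing, and membership tests. A small sketch using the range from the last cell (the name `r` is illustrative):

```python
r = range(32, 7, -3)

print(len(r))        # 9
print(r[0], r[-1])   # 32 8
print(17 in r)       # True
print(18 in r)       # False, 18 is skipped by the step of -3
```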

Python memory model

What is printed here?

some_guy = 'Fred'

names = []
names.append(some_guy)

names2 = names
names2.append('George')
some_guy = 'Bill'

print(some_guy, names, names2)
In [60]:
some_guy = 'Fred'

names = []
names.append(some_guy)

names2 = names
names2.append('George')
some_guy = 'Bill'

print(some_guy, names, names2)
Bill ['Fred', 'George'] ['Fred', 'George']
In [61]:
import copy
some_guy = 'Fred'

names = []
names.append(some_guy)

names2 = copy.deepcopy(names)
names2.append('George')
some_guy = 'Bill'

print(some_guy, names, names2)
Bill ['Fred'] ['Fred', 'George']
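With nested lists, the difference between an alias, a shallow copy, and a deep copy becomes visible. The following sketch uses made-up names (`alias`, `shallow`, `deep`) and the standard `copy` module.

```python
import copy

names = [['Fred'], ['George']]
alias = names                  # same object, no copy at all
shallow = copy.copy(names)     # new outer list, shared inner lists
deep = copy.deepcopy(names)    # new outer list and new inner lists

names[0].append('Bill')        # mutate an inner list
names.append(['Ann'])          # mutate the outer list

print(alias)    # [['Fred', 'Bill'], ['George'], ['Ann']]
print(shallow)  # [['Fred', 'Bill'], ['George']]
print(deep)     # [['Fred'], ['George']]
```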

Understand how Python works "under the hood": http://www.pythontutor.com/visualize.html

The Python Standard Library

Now: A few examples

Standard Library: Math

In [62]:
import math

math.log2(1024)
Out[62]:
10.0
In [63]:
math.log(math.e)
Out[63]:
1.0
In [64]:
math.cos(math.pi)
Out[64]:
-1.0

Standard Library: Itertools

In [65]:
import itertools

perms = itertools.permutations([1, 2, 3], r=2)
# r-length tuples, all possible orderings, no repeated elements
# default r: length of the iterable

for p in perms:
    print(p)
(1, 2)
(1, 3)
(2, 1)
(2, 3)
(3, 1)
(3, 2)
In [66]:
combs = itertools.combinations([1, 2, 3], r=2)
# r-length tuples, in sorted order, no repeated elements

print(list(combs))
[(1, 2), (1, 3), (2, 3)]
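As a sanity check on the two outputs above, the number of results can be compared with the usual counting formulas, n!/(n-r)! for permutations and n!/(r!(n-r)!) for combinations. A sketch with illustrative names:

```python
import itertools
import math

items = [1, 2, 3]
n, r = len(items), 2

perms = list(itertools.permutations(items, r=r))
combs = list(itertools.combinations(items, r=r))

n_perms = math.factorial(n) // math.factorial(n - r)
n_combs = n_perms // math.factorial(r)

print(len(perms), n_perms)  # 6 6
print(len(combs), n_combs)  # 3 3
```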

Standard Library: Random

In [67]:
import random as rnd  ## you can re-name imported modules

rnd.randint(1, 6)  ## Here, the end points are both included
Out[67]:
2
In [68]:
print(things)
rnd.choice(things)
['cat', 'apple', 'boat']
Out[68]:
'boat'
In [69]:
rnd.sample(range(1000), 5)
Out[69]:
[472, 14, 128, 624, 913]
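The outputs above change on every run. If reproducible results are needed, the generator can be seeded. A minimal sketch (the seed value 0 and the names `first` and `second` are arbitrary choices for this example):

```python
import random as rnd

rnd.seed(0)                                     # fix the generator state
first = [rnd.randint(1, 6) for _ in range(5)]

rnd.seed(0)                                     # same seed, same sequence
second = [rnd.randint(1, 6) for _ in range(5)]

print(first == second)  # True
```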

Standard Library: Urllib

In [70]:
## Modules can have sub-modules
import urllib.request as rq

response = rq.urlopen("http://en.wikipedia.org/wiki/Python")

print(response.read(151).decode('utf8'))
<!DOCTYPE html>
<html class="client-nojs" lang="en" dir="ltr">
<head>
<meta charset="UTF-8"/>
<title>Python - Wikipedia</title>
<script>document.docume

The Numpy library

  • Fast numerical computations
  • Linear algebra operations
  • "Vectorized" versions of standard operators and functions
  • Not part of the standard library

Numpy installation

Linux:

  • sudo apt-get install python3-numpy (or equivalent for your distro)
  • pip install numpy

Windows and Mac:

In [71]:
import numpy as np

np.version.full_version
Out[71]:
'1.17.2'

Numpy's arrays

In [72]:
a = np.array([1, 2, 3])
type(a)
Out[72]:
numpy.ndarray
In [73]:
a.dtype
print(128 ** 128)
print(np.int64(128 ** 128))   
528294531135665246352339784916516606518847326036121522127960709026673902556724859474417255887657187894674394993257128678882347559502685537250538978462939576908386683999005084168731517676426441053024232908211188404148028292751561738838396898767036476489538580897737998336
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-73-d91fae5e563b> in <module>
      1 a.dtype
      2 print(128 ** 128)
----> 3 print(np.int64(128 ** 128))

OverflowError: Python int too large to convert to C long
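The error above occurs because numpy integers have a fixed width, unlike Python's arbitrary-precision `int`. The sketch below inspects the limits of `int64` with `np.iinfo` and shows that array arithmetic silently wraps around at the limit. The names `info` and `wrapped` are illustrative.

```python
import numpy as np

info = np.iinfo(np.int64)
print(info.max)              # 9223372036854775807, i.e. 2**63 - 1
print(128 ** 128 > info.max) # True: too large for int64

# Array arithmetic wraps around instead of raising an error
wrapped = np.array([info.max], dtype=np.int64) + 1
print(wrapped[0] == info.min)  # True
```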
In [74]:
a + 1
Out[74]:
array([2, 3, 4])
In [75]:
a * 1.25   
Out[75]:
array([1.25, 2.5 , 3.75])
In [76]:
a ** 3
Out[76]:
array([ 1,  8, 27])
In [77]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
a + b
Out[77]:
array([5, 7, 9])
In [78]:
a * b
Out[78]:
array([ 4, 10, 18])
In [79]:
a.dot(b)  ## dot product; only the last expression of a cell is displayed
a[0] * b[0] + a[1] * b[1] + a[2] * b[2]  ## the same value, written out
Out[79]:
32

$a \cdot b = a_1 b_1 + a_2 b_2 + a_3 b_3$
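The formula can be checked against several equivalent spellings of the dot product in numpy. A small sketch, where `explicit` is a name chosen for this example:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

explicit = sum(x * y for x, y in zip(a, b))  # a_1 b_1 + a_2 b_2 + a_3 b_3

print(explicit, a.dot(b), np.dot(a, b), a @ b)  # 32 32 32 32
```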

Size and Shape of Arrays

In [80]:
a
Out[80]:
array([1, 2, 3])
In [81]:
len(a)
Out[81]:
3
In [82]:
a.size
Out[82]:
3
In [83]:
a.shape
Out[83]:
(3,)

Arrays can be multidimensional

In [84]:
v = np.array([[1, 2, 3]])
v
Out[84]:
array([[1, 2, 3]])
In [85]:
v.shape
Out[85]:
(1, 3)
In [89]:
m = np.array([[1, 2, 3], [4, 5, 6]])
In [90]:
m
Out[90]:
array([[1, 2, 3],
       [4, 5, 6]])
In [91]:
m.shape
Out[91]:
(2, 3)
In [92]:
m2 = np.array([[3, 2], [4, 5], [6, 7]])
m2
Out[92]:
array([[3, 2],
       [4, 5],
       [6, 7]])
In [93]:
print('v:', v.shape, 'm:', m.shape, ' m2:', m2.shape)
v: (1, 3) m: (2, 3)  m2: (3, 2)

Matrix multiplication

m = array([[1, 2, 3],  ;  m2 = array([[3, 2],
           [4, 5, 6]])                [4, 5],
                                      [6, 7]])
In [94]:
m.dot(m2)
Out[94]:
array([[29, 33],
       [68, 75]])
In [95]:
m2.dot(m)
Out[95]:
array([[11, 16, 21],
       [24, 33, 42],
       [34, 47, 60]])
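The shape rule behind these two products is that an (n, k) matrix times a (k, m) matrix gives an (n, m) matrix, and the inner dimensions must agree. A sketch using the `@` operator, which computes the same product as `dot` for 2-D arrays; `p`, `q`, and `mismatch_error` are illustrative names.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])      # shape (2, 3)
m2 = np.array([[3, 2], [4, 5], [6, 7]])   # shape (3, 2)

p = m @ m2    # (2, 3) times (3, 2) gives (2, 2)
q = m2 @ m    # (3, 2) times (2, 3) gives (3, 3)
print(p.shape, q.shape)  # (2, 2) (3, 3)

# Inner dimensions that do not match raise a ValueError
try:
    m @ m     # (2, 3) times (2, 3): 3 != 2
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
print(type(mismatch_error).__name__)  # ValueError
```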

Array initializers

In [97]:
np.arange(0, 10, 2)   ## start, stop, stepsize
# np.array(range(0, 10, 2))  ## equivalent
Out[97]:
array([0, 2, 4, 6, 8])
In [98]:
np.linspace(0, 1, 3)  ## start, stop, count
Out[98]:
array([0. , 0.5, 1. ])
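One difference between the two initializers is easy to miss: `arange` excludes the stop value (like `range`), while `linspace` includes it by default. A short sketch with illustrative names:

```python
import numpy as np

by_step = np.arange(0, 1, 0.5)                     # stop is excluded
by_count = np.linspace(0, 1, 3)                    # stop is included
half_open = np.linspace(0, 1, 2, endpoint=False)   # stop excluded on request

print(by_step)    # [0.  0.5]
print(by_count)   # [0.  0.5 1. ]
print(half_open)  # [0.  0.5]
```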
In [99]:
a = np.linspace(0, np.pi, 4)
a
Out[99]:
array([0.        , 1.04719755, 2.0943951 , 3.14159265])
In [100]:
np.cos(a)
Out[100]:
array([ 1. ,  0.5, -0.5, -1. ])
In [101]:
np.zeros(9)
Out[101]:
array([0., 0., 0., 0., 0., 0., 0., 0., 0.])
In [102]:
np.zeros([3, 4])
Out[102]:
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
In [103]:
np.ones([2, 2])
Out[103]:
array([[1., 1.],
       [1., 1.]])
In [104]:
m
Out[104]:
array([[1, 2, 3],
       [4, 5, 6]])
In [105]:
np.ones_like(m)
Out[105]:
array([[1, 1, 1],
       [1, 1, 1]])
In [106]:
np.eye(5) ## identity matrix
Out[106]:
array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])
In [107]:
np.eye(5, 7, -1)  ## rows, columns, diagonal offset.
Out[107]:
array([[0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0.]])
In [108]:
x = np.random.rand(4, 3)  ### i.i.d. uniform in [0, 1);
                      ### randn() for Gaussian
2*x + 1    
Out[108]:
array([[2.00957291, 2.99538573, 1.48395068],
       [2.00777067, 1.22702353, 1.05240387],
       [2.94676883, 1.78339226, 2.88969613],
       [1.89841747, 2.48200282, 2.14871356]])

Numpy: basic operations

In [109]:
m
Out[109]:
array([[1, 2, 3],
       [4, 5, 6]])
In [110]:
m.T  ## Transpose
Out[110]:
array([[1, 4],
       [2, 5],
       [3, 6]])
In [111]:
m.flatten()
Out[111]:
array([1, 2, 3, 4, 5, 6])
In [112]:
m
Out[112]:
array([[1, 2, 3],
       [4, 5, 6]])
In [113]:
m.sum()
Out[113]:
21
In [114]:
m.shape
Out[114]:
(2, 3)
  axis=1
 -------->

 [[1, 2, 3],    |    axis=0
  [4, 5, 6]]    \/
In [115]:
m.sum(axis=0)
Out[115]:
array([5, 7, 9])
In [116]:
m.sum(axis=1)
Out[116]:
array([ 6, 15])
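One way to read the `axis` argument: it names the dimension that is summed away. The sketch below reproduces both results with explicit column and row slices; `manual_cols` and `manual_rows` are illustrative names.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = m.sum(axis=0)   # collapses the 2 rows    -> shape (3,)
row_sums = m.sum(axis=1)   # collapses the 3 columns -> shape (2,)

# The same sums, computed slice by slice
manual_cols = [int(m[:, j].sum()) for j in range(m.shape[1])]
manual_rows = [int(m[i, :].sum()) for i in range(m.shape[0])]

print(col_sums, manual_cols)  # [5 7 9] [5, 7, 9]
print(row_sums, manual_rows)  # [ 6 15] [6, 15]
```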
In [117]:
m3 = np.array([[0, 1],
               [2, 3]])

m3_inv = np.linalg.inv(m3)  ## compute the inverse

m3.dot(m3_inv)
Out[117]:
array([[1., 0.],
       [0., 1.]])
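Because the inverse is computed in floating point, a product with the inverse is in general only approximately the identity, so a comparison with `np.allclose` is safer than exact equality. The sketch below also solves a linear system with `np.linalg.solve`, which avoids forming the inverse explicitly. The right-hand side `rhs` is an arbitrary example vector.

```python
import numpy as np

m3 = np.array([[0, 1],
               [2, 3]])
m3_inv = np.linalg.inv(m3)

# Compare with a tolerance instead of ==
print(np.allclose(m3 @ m3_inv, np.eye(2)))  # True

# Solve m3 @ x = rhs without computing the inverse
rhs = np.array([1, 2])
x = np.linalg.solve(m3, rhs)
print(np.allclose(m3 @ x, rhs))             # True
```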

Drawing things: the matplotlib library

For best results, put this somewhere early in your notebooks:
In [118]:
%matplotlib inline
In [119]:
import matplotlib.pyplot as plt
In [120]:
plt.plot([0, 1, 2], [1, 0, -0.5], 'o--b', linewidth=5);
# o: symbol for points
# --: use dashes as line
# b: color is blue

Controlling aspects of the plot

In [121]:
xvals = np.linspace(0, 2*np.pi, 100)
yvals = np.sin(xvals)

plt.plot(xvals, yvals)
plt.xticks(np.linspace(0, 2*np.pi, 7))
plt.title("Sine curve, one period")
plt.xlabel("x"); plt.ylabel("y")
plt.grid();

Different kinds of plots

Scatter plot

In [122]:
x = np.random.randn(1000)
y = np.random.randn(1000)

plt.scatter(x, y);

Bar chart

In [123]:
y = np.random.rand(20)
x = np.arange(20)

plt.bar(x, y, facecolor='green');

Histogram

In [124]:
y = np.random.randn(100000) * 2 + 5   ### mean 5, std.dev. 2

plt.hist(y, bins=500, facecolor='lightgray');