# ML I,II: Tutorial

## Today's Agenda

1. Review of some Python basics
2. Python standard library
3. Third-party libraries
   • numpy
   • matplotlib
4. Basic data analysis and regression

# Python Basics

• Collection Data Structures
  • list
  • tuple
  • dict
  • set
• Functions
• Python Memory Model

## Collections: list

In [1]:
l = [1, 2, 3, 4, 5, "six"]
print(l)
print(type(l))

[1, 2, 3, 4, 5, 'six']
<class 'list'>

In [2]:
len(l)

Out[2]:
6
In [3]:
print(4 in l, 7 in l)

True False


Indexing lists:

In [4]:
print(l)

[1, 2, 3, 4, 5, 'six']

In [5]:
l

Out[5]:
[1, 2, 3, 4, 5, 'six']
In [6]:
l[0]

Out[6]:
1
In [7]:
l[-1]

Out[7]:
'six'
In [8]:
print(l)

[1, 2, 3, 4, 5, 'six']

In [9]:
l[2:4] ## slicing

Out[9]:
[3, 4]
In [10]:
l[3:]

Out[10]:
[4, 5, 'six']

Modifying lists:

In [11]:
l[-1] = 'seven'
l[2:4] = [7, 8]
l

Out[11]:
[1, 2, 7, 8, 5, 'seven']
In [12]:
l.pop()

Out[12]:
'seven'
In [13]:
l

Out[13]:
[1, 2, 7, 8, 5]
In [14]:
l.append("eight")
l

Out[14]:
[1, 2, 7, 8, 5, 'eight']
In [15]:
l.reverse()
l

Out[15]:
['eight', 5, 8, 7, 2, 1]

Concatenation and repetition:

In [16]:
l = [7, 1, 2]
r = [9, 6, 8]
l + r*2

Out[16]:
[7, 1, 2, 9, 6, 8, 9, 6, 8]

List comprehension:

In [20]:
l = [1, 2, 3, 4]

[[x**2, 1] for x in l]

Out[20]:
[[1, 1], [4, 1], [9, 1], [16, 1]]
In [21]:
[x**2 for x in l if x % 2 == 0]

Out[21]:
[4, 16]
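
A comprehension with a condition is shorthand for a loop that appends to a list. The sketch below spells out the loop for the filtered example above; the name `squares_of_evens` is made up for illustration.

```python
l = [1, 2, 3, 4]

# Explicit loop equivalent of [x**2 for x in l if x % 2 == 0]
squares_of_evens = []
for x in l:
    if x % 2 == 0:          # keep only even elements
        squares_of_evens.append(x**2)

print(squares_of_evens)
print(squares_of_evens == [x**2 for x in l if x % 2 == 0])
```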

## Collections: dict

In [22]:
d = {'a': 2, 'b': 2, 3: 2, 'b': 5}

d['a']  ### lookup

Out[22]:
2
In [23]:
d.keys()

Out[23]:
dict_keys(['a', 'b', 3])
In [24]:
d.values()

Out[24]:
dict_values([2, 5, 2])
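
The literal above lists the key 'b' twice, and the outputs show that only the later value survives. A short sketch of that behaviour, together with two common access patterns (`get` with a default, and iteration over `items()`); the key 'z' is an arbitrary missing key chosen for illustration.

```python
d = {'a': 2, 'b': 2, 3: 2, 'b': 5}  # the later 'b' overwrites the earlier one

print(len(d))          # number of distinct keys
print(d['b'])          # value from the last 'b' entry
print(d.get('z', 0))   # lookup with a default instead of raising KeyError

# Iterate over key/value pairs in insertion order
for key, value in d.items():
    print(key, value)
```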

## Collections: set

In [25]:
s = {1, 2, 3, 1, 3, 1, 4}
s

Out[25]:
{1, 2, 3, 4}
In [26]:
s.intersection([3, 4, 5])

Out[26]:
{3, 4}
In [27]:
s.union([7])

Out[27]:
{1, 2, 3, 4, 7}

## Lists don't work as keys

In [28]:
pairdict = {[1, 2]: 3}

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-28-4ab7d016ea9e> in <module>
----> 1 pairdict = {[1, 2]: 3}

TypeError: unhashable type: 'list'
In [29]:
pairset = {[1,2], [3,4]}

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-29-6b3ef5633f01> in <module>
----> 1 pairset = {[1,2], [3,4]}

TypeError: unhashable type: 'list'

## Hashes

In [30]:
v1 = "abcd" * 10
v2 = "abcd" * 10
v3 = v2 + 'x'
print(v1, v2, v3)

hash(v1), hash(v2), hash(v3)

abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd abcdabcdabcdabcdabcdabcdabcdabcdabcdabcdx

Out[30]:
(4809842418838812611, 4809842418838812611, -4505499860472041403)
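
Dicts and sets locate entries by hash value, which is why keys must be hashable. The sketch below contrasts a tuple with a list; the flag `list_is_hashable` is an illustrative name. Note that string hashes are randomized per Python process by default, so the numbers printed above will usually differ between runs.

```python
# Equal immutable values produce equal hashes.
print(hash((1, 2)) == hash((1, 2)))

# Lists are mutable and define no hash, which causes the TypeError seen above.
try:
    hash([1, 2])
    list_is_hashable = True
except TypeError as err:
    list_is_hashable = False
    print("TypeError:", err)
```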

## Collections: tuple

In [31]:
t = (1, 2, 'three')

In [32]:
len(t)

Out[32]:
3
In [33]:
t[1]

Out[33]:
2
In [34]:
t[1] = 4  ## cannot be modified

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-34-12c5f1992471> in <module>
----> 1 t[1] = 4  ## cannot be modified

TypeError: 'tuple' object does not support item assignment
In [35]:
pairdict = {(1,2): 3}
pairset = {(1,2), (3,4)}


## Summary of Collections

• list: [v1, v2, v3]
• tuple: (v1, v2, v3)
  • similar to list, but immutable; can be used as a dict key or set element
• set: {v1, v2, v3}
  • similar to list, but no duplicate elements and no stable ordering
• dictionary: {k1: v1, k2: v2, k3: v3}
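
As a rough illustration of the summary, the same elements can be converted between collection types. All variable names here are made up for the example.

```python
items = [3, 1, 3, 2]

as_tuple = tuple(items)            # immutable copy, usable as a dict key
as_set = set(items)                # duplicates removed, order not guaranteed
lookup = {as_tuple: len(as_set)}   # tuple as key, number of unique items as value

print(as_tuple)
print(as_set)
print(lookup)
```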

## Functions

In [36]:
def square_sum(x, y):
    v = x*x + y*y
    return v

print(square_sum(2, 2))

8


## Default values and keyword arguments

In [37]:
def square_sum2(x, y=1):
    return x**2 + y**2

square_sum2(2)

Out[37]:
5
In [38]:
square_sum2(x=1, y=3)

Out[38]:
10
In [39]:
square_sum2(1, 3)

Out[39]:
10
In [40]:
def square_sum3(x, y=1, z=2):
    return x**2 + y**2 + z

square_sum3(2)

Out[40]:
7
In [41]:
square_sum3(2, z=3)

Out[41]:
8

## Lambda expressions and higher-order functions

In [42]:
def apply(function, argument):
    return function(argument)

apply(lambda x: x + 1, 3)
# general form of a lambda: lambda arg1, arg2, ...: expression

Out[42]:
4
In [43]:
#                     m > n ? m : n
my_max = lambda m, n: m if m > n else n

print(my_max(10, 3))

10

In [44]:
l = [1, 2, 3]
#apply(square_sum2, 2)
[square_sum2(x) for x in l]

Out[44]:
[2, 5, 10]
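
Python also ships built-in higher-order functions. The sketch below uses `map` and `filter` as alternatives to the comprehension above; `square_sum2` is redefined so the block runs on its own, and `mapped` and `odd_only` are illustrative names.

```python
def square_sum2(x, y=1):
    return x**2 + y**2

l = [1, 2, 3]

# map applies a function to every element; same as [square_sum2(x) for x in l]
mapped = list(map(square_sum2, l))

# filter keeps the elements for which the function returns a true value
odd_only = list(filter(lambda x: x % 2 == 1, l))

print(mapped)
print(odd_only)
```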

## Useful built-in functions

In [45]:
things = ["cat", "apple", "boat"]
sorted(things) # alphabetically, upper case first

Out[45]:
['apple', 'boat', 'cat']
In [46]:
sorted(things, key=lambda x: len(x))

Out[46]:
['cat', 'boat', 'apple']
In [47]:
enumerate(things)

Out[47]:
<enumerate at 0x7f5884527b40>
In [48]:
list(enumerate(things, 2))  ## the second argument is the start index

Out[48]:
[(2, 'cat'), (3, 'apple'), (4, 'boat')]
In [49]:
### sum over collections
numbers = [14, 13, 15]
sum(numbers) ### also works with non-numbers if a start value is given

Out[49]:
42
In [50]:
sum([['foo', 'bar'], ['baz'], ['abced', 'efgh']], [])

Out[50]:
['foo', 'bar', 'baz', 'abced', 'efgh']
In [51]:
things

Out[51]:
['cat', 'apple', 'boat']
In [52]:
list(zip(things, reversed(things), numbers))

Out[52]:
[('cat', 'boat', 14), ('apple', 'apple', 13), ('boat', 'cat', 15)]
In [53]:
[str(x) + ' ' + y + 's' for x, y in zip(numbers, things)]

Out[53]:
['14 cats', '13 apples', '15 boats']
In [54]:
## By the way: there are better ways to interpolate strings:
"{count:.2f} {thing}s".format(thing="zebra", count=3.14159)

Out[54]:
'3.14 zebras'
In [55]:
## or even this:
x = 1
y = 2
f'{x} + {y} equals {x+y}'

Out[55]:
'1 + 2 equals 3'

### Range Objects

In [56]:
range(12)

Out[56]:
range(0, 12)
In [57]:
list(range(12))  ## endpoint is not included

Out[57]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
In [58]:
sum(range(12))

Out[58]:
66
In [59]:
list(range(32, 7, -3))

Out[59]:
[32, 29, 26, 23, 20, 17, 14, 11, 8]

## Python memory model

### What is printed here?

some_guy = 'Fred'

names = []
names.append(some_guy)

names2 = names
names2.append('George')
some_guy = 'Bill'

print(some_guy, names, names2)

In [60]:
some_guy = 'Fred'

names = []
names.append(some_guy)

names2 = names
names2.append('George')
some_guy = 'Bill'

print(some_guy, names, names2)

Bill ['Fred', 'George'] ['Fred', 'George']

In [61]:
import copy
some_guy = 'Fred'

names = []
names.append(some_guy)

names2 = copy.deepcopy(names)
names2.append('George')
some_guy = 'Bill'

print(some_guy, names, names2)

Bill ['Fred'] ['Fred', 'George']


Understand how Python works "under the hood": http://www.pythontutor.com/visualize.html
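
The two cells above differ only in whether `names2` is a second name for the same list or an independent copy. The following sketch makes that visible with the `is` operator and also contrasts a shallow copy with a deep copy on a nested list; the nested `groups` example is an addition for illustration.

```python
import copy

names = ['Fred']
alias = names                  # second name for the same list object
clone = copy.deepcopy(names)   # independent list with equal contents

print(alias is names)   # same object
print(clone is names)   # different object
print(clone == names)   # equal contents

# With nested lists, a shallow copy still shares the inner lists.
groups = [['Fred'], ['George']]
shallow = copy.copy(groups)
deep = copy.deepcopy(groups)
groups[0].append('Bill')

print(shallow[0])   # inner list is shared with groups
print(deep[0])      # inner list was copied
```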

# The Python Standard Library

## Standard Library: Math

In [62]:
import math

math.log2(1024)

Out[62]:
10.0
In [63]:
math.log(math.e)

Out[63]:
1.0
In [64]:
math.cos(math.pi)

Out[64]:
-1.0

## Standard Library: Itertools

In [65]:
import itertools

perms = itertools.permutations([1, 2, 3], r=2)
# r-length tuples, all possible orderings, no repeated elements
# default r: length of the iterable

for p in perms:
    print(p)

(1, 2)
(1, 3)
(2, 1)
(2, 3)
(3, 1)
(3, 2)

In [66]:
combs = itertools.combinations([1, 2, 3], r=2)
# r-length tuples, in sorted order, no repeated elements

print(list(combs))

[(1, 2), (1, 3), (2, 3)]
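
The number of results can be checked against the usual counting formulas: n!/(n-r)! ordered selections and n!/(r!(n-r)!) unordered ones. A small sketch with a four-element list chosen for illustration:

```python
import itertools
import math

items = [1, 2, 3, 4]
n, r = len(items), 2

n_perms = len(list(itertools.permutations(items, r)))
n_combs = len(list(itertools.combinations(items, r)))

# Closed-form counts for comparison
expected_perms = math.factorial(n) // math.factorial(n - r)
expected_combs = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(n_perms, expected_perms)
print(n_combs, expected_combs)
```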


## Standard Library: Random

In [67]:
import random as rnd  ## you can re-name imported modules

rnd.randint(1, 6)  ## Here, the end points are both included

Out[67]:
2
In [68]:
print(things)
rnd.choice(things)

['cat', 'apple', 'boat']

Out[68]:
'boat'
In [69]:
rnd.sample(range(1000), 5)

Out[69]:
[472, 14, 128, 624, 913]
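
The outputs above change on every run. When reproducible results are needed, the generator can be seeded first; the seed value 0 below is arbitrary.

```python
import random as rnd

rnd.seed(0)   # fix the generator state
first = [rnd.randint(1, 6) for _ in range(5)]

rnd.seed(0)   # same seed, so the same sequence follows
second = [rnd.randint(1, 6) for _ in range(5)]

print(first)
print(first == second)
```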

## Standard Library: Urllib

In [70]:
## Modules can have sub-modules
import urllib.request as rq

response = rq.urlopen("http://en.wikipedia.org/wiki/Python")


<!DOCTYPE html>
<html class="client-nojs" lang="en" dir="ltr">
<meta charset="UTF-8"/>
<title>Python - Wikipedia</title>
<script>document.docume


# The Numpy library

• Fast numerical computations
• Linear algebra operations
• "Vectorized" versions of standard operators and functions
• Not part of the standard library

### Numpy installation

Linux:

• sudo apt-get install python3-numpy (or equivalent for your distro)
• pip install numpy

Windows and Mac:

In [71]:
import numpy as np

np.version.full_version

Out[71]:
'1.17.2'

## Numpy's arrays

In [72]:
a = np.array([1, 2, 3])
type(a)

Out[72]:
numpy.ndarray
In [73]:
a.dtype
print(128 ** 128)
print(np.int64(128 ** 128))

528294531135665246352339784916516606518847326036121522127960709026673902556724859474417255887657187894674394993257128678882347559502685537250538978462939576908386683999005084168731517676426441053024232908211188404148028292751561738838396898767036476489538580897737998336

---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-73-d91fae5e563b> in <module>
1 a.dtype
2 print(128 ** 128)
----> 3 print(np.int64(128 ** 128))

OverflowError: Python int too large to convert to C long
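
The error arises because Python integers have arbitrary precision, while a numpy `int64` is a fixed-width 64-bit value. A minimal sketch that inspects the representable range with `np.iinfo`:

```python
import numpy as np

info = np.iinfo(np.int64)      # limits of the fixed-width integer type
print(info.min, info.max)

print(info.max == 2**63 - 1)   # largest signed 64-bit value
print(128**128 > info.max)     # far outside the int64 range
```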
In [74]:
a + 1

Out[74]:
array([2, 3, 4])
In [75]:
a * 1.25

Out[75]:
array([1.25, 2.5 , 3.75])
In [76]:
a ** 3

Out[76]:
array([ 1,  8, 27])
In [77]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
a + b

Out[77]:
array([5, 7, 9])
In [78]:
a * b

Out[78]:
array([ 4, 10, 18])
In [79]:
a.dot(b)
a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

Out[79]:
32

$a \cdot b = a_1 b_1 + a_2 b_2 + a_3 b_3$
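
The formula can be checked against several equivalent spellings. This sketch compares the method form used above with `np.dot`, the `@` operator, and an explicit elementwise product followed by a sum.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a.dot(b))        # method form
print(np.dot(a, b))    # function form
print(a @ b)           # matrix-multiplication operator
print((a * b).sum())   # elementwise product, then sum
```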

### Size and Shape of Arrays

In [80]:
a

Out[80]:
array([1, 2, 3])
In [81]:
len(a)

Out[81]:
3
In [82]:
a.size

Out[82]:
3
In [83]:
a.shape

Out[83]:
(3,)

## Arrays can be multidimensional

In [84]:
v = np.array([[1, 2, 3]])
v

Out[84]:
array([[1, 2, 3]])
In [85]:
v.shape

Out[85]:
(1, 3)
In [89]:
m = np.array([[1, 2, 3], [4, 5, 6]])

In [90]:
m

Out[90]:
array([[1, 2, 3],
       [4, 5, 6]])
In [91]:
m.shape

Out[91]:
(2, 3)
In [92]:
m2 = np.array([[3, 2], [4, 5], [6, 7]])
m2

Out[92]:
array([[3, 2],
       [4, 5],
       [6, 7]])
In [93]:
print('v:', v.shape, 'm:', m.shape, ' m2:', m2.shape)

v: (1, 3) m: (2, 3)  m2: (3, 2)


### Matrix multiplication

m = array([[1, 2, 3],    ;  m2 = array([[3, 2],
           [4, 5, 6]])                  [4, 5],
                                        [6, 7]])
In [94]:
m.dot(m2)

Out[94]:
array([[29, 33],
       [68, 75]])
In [95]:
m2.dot(m)

Out[95]:
array([[11, 16, 21],
       [24, 33, 42],
       [34, 47, 60]])
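
The two products have different shapes because an (n, k) matrix times a (k, p) matrix gives an (n, p) result, and the inner dimensions must match. A sketch of that rule using the same `m` and `m2`; the flag `shapes_compatible` is an illustrative name.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])      # shape (2, 3)
m2 = np.array([[3, 2], [4, 5], [6, 7]])   # shape (3, 2)

print(m.dot(m2).shape)   # (2, 3) times (3, 2)
print(m2.dot(m).shape)   # (3, 2) times (2, 3)

# Inner dimensions must agree; (2, 3) times (2, 3) does not work.
try:
    m.dot(m)
    shapes_compatible = True
except ValueError as err:
    shapes_compatible = False
    print("ValueError:", err)
```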

## Array initializers

In [97]:
np.arange(0, 10, 2)   ## start, stop, stepsize
# np.array(range(0, 10, 2))  ## equivalent

Out[97]:
array([0, 2, 4, 6, 8])
In [98]:
np.linspace(0, 1, 3)  ## start, stop, count

Out[98]:
array([0. , 0.5, 1. ])
In [99]:
a = np.linspace(0, np.pi, 4)
a

Out[99]:
array([0.        , 1.04719755, 2.0943951 , 3.14159265])
In [100]:
np.cos(a)

Out[100]:
array([ 1. ,  0.5, -0.5, -1. ])
In [101]:
np.zeros(9)

Out[101]:
array([0., 0., 0., 0., 0., 0., 0., 0., 0.])
In [102]:
np.zeros([3, 4])

Out[102]:
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
In [103]:
np.ones([2, 2])

Out[103]:
array([[1., 1.],
       [1., 1.]])
In [104]:
m

Out[104]:
array([[1, 2, 3],
       [4, 5, 6]])
In [105]:
np.ones_like(m)

Out[105]:
array([[1, 1, 1],
       [1, 1, 1]])
In [106]:
np.eye(5) ## identity matrix

Out[106]:
array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])
In [107]:
np.eye(5, 7, -1)  ## rows, columns, diagonal offset.

Out[107]:
array([[0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0.]])
In [108]:
x = np.random.rand(4, 3)  ### i.i.d. uniform samples in [0, 1);
                          ### use randn() for standard Gaussian samples
2*x + 1

Out[108]:
array([[2.00957291, 2.99538573, 1.48395068],
       [2.00777067, 1.22702353, 1.05240387],
       [2.94676883, 1.78339226, 2.88969613],
       [1.89841747, 2.48200282, 2.14871356]])

## Numpy: basic operations

In [109]:
m

Out[109]:
array([[1, 2, 3],
       [4, 5, 6]])
In [110]:
m.T  ## Transpose

Out[110]:
array([[1, 4],
       [2, 5],
       [3, 6]])
In [111]:
m.flatten()

Out[111]:
array([1, 2, 3, 4, 5, 6])
In [112]:
m

Out[112]:
array([[1, 2, 3],
       [4, 5, 6]])
In [113]:
m.sum()

Out[113]:
21
In [114]:
m.shape

Out[114]:
(2, 3)
      axis=1
    -------->

[[1, 2, 3],    |    axis=0
 [4, 5, 6]]    \/
In [115]:
m.sum(axis=0)

Out[115]:
array([5, 7, 9])
In [116]:
m.sum(axis=1)

Out[116]:
array([ 6, 15])
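
The `axis` argument works the same way for other reductions: the named axis is the one that gets collapsed. A brief sketch with `mean`, `max`, and the `keepdims` option; the variable names are illustrative.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

col_means = m.mean(axis=0)                # collapse rows: one value per column
row_max = m.max(axis=1)                   # collapse columns: one value per row
row_sums = m.sum(axis=1, keepdims=True)   # keep the reduced axis with length 1

print(col_means)
print(row_max)
print(row_sums.shape)
```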
In [117]:
m3 = np.array([[0, 1],
               [2, 3]])

m3_inv = np.linalg.inv(m3)  ## compute the inverse

m3.dot(m3_inv)

Out[117]:
array([[1., 0.],
       [0., 1.]])
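
Results of floating-point linear algebra are usually compared with a tolerance instead of `==`. The sketch below checks the inverse with `np.allclose` and, for solving a linear system, uses `np.linalg.solve` in place of an explicit inverse; the right-hand side `b` is made up for illustration.

```python
import numpy as np

m3 = np.array([[0, 1],
               [2, 3]])
m3_inv = np.linalg.inv(m3)

# Compare against the identity matrix up to rounding error
print(np.allclose(m3.dot(m3_inv), np.eye(2)))

# Solve m3 x = b directly, without forming the inverse
b = np.array([1, 2])
x = np.linalg.solve(m3, b)
print(x)
print(np.allclose(m3.dot(x), b))
```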

# Drawing things: the matplotlib library

##### For best results, put this somewhere early in your notebooks:
In [118]:
%matplotlib inline

In [119]:
import matplotlib.pyplot as plt

In [120]:
plt.plot([0, 1, 2], [1, 0, -0.5], 'o--b', linewidth=5);
# o: symbol for points
# --: use dashes as line
# b: color is blue


### Controlling aspects of the plot

In [121]:
xvals = np.linspace(0, 2*np.pi, 100)
yvals = np.sin(xvals)

plt.plot(xvals, yvals)
plt.xticks(np.linspace(0, 2*np.pi, 7))
plt.title("Sine curve, one period")
plt.xlabel("x"); plt.ylabel("y")
plt.grid();


## Different kinds of plots

### Scatter plot

In [122]:
x = np.random.randn(1000)
y = np.random.randn(1000)

plt.scatter(x, y);


### Bar chart

In [123]:
y = np.random.rand(20)
x = np.arange(20)

plt.bar(x, y, facecolor='green');


### Histogram

In [124]:
y = np.random.randn(100000) * 2 + 5   ### mean 5, std.dev. 2

plt.hist(y, bins=500, facecolor='lightgray');
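
The scaling and shift in the cell above can be verified numerically without plotting. This sketch draws from a seeded `np.random.RandomState` (seed 0 is arbitrary) so the run is repeatable, and uses `np.histogram` to obtain the bin counts that `plt.hist` would draw.

```python
import numpy as np

rng = np.random.RandomState(0)      # seeded generator for repeatable samples
y = rng.randn(100000) * 2 + 5       # scale by 2, shift by 5

print(y.mean(), y.std())            # should be close to 5 and 2

counts, edges = np.histogram(y, bins=500)
print(counts.sum(), len(edges))     # every sample falls in a bin; 500 bins have 501 edges
```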