October 25, 2022

1 Introduction to Deep Learning

Key Terms:

1 Machine Learning ≈ Looking for a Function


2 Different Types of Functions


3 How to Find a Function

Three steps: (1) write down a model with unknown parameters, (2) define a loss from training data, (3) optimize the parameters to minimize the loss. Section 4 walks through each step.

4 ML Framework


Step 1. Model

How to choose a model: it depends on domain knowledge.

Linear Models

Linear models have model bias (a limitation of the model class: they can only represent linear relations).

y = b + \sum_{j=1}^{n}w_jx_j
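
A minimal NumPy sketch of this linear model (the names w, b, x are just for illustration):

```python
import numpy as np

def linear_model(x, w, b):
    """y = b + sum_j w_j * x_j for one example x."""
    return b + np.dot(w, x)

# Toy usage: 3 input features.
w = np.array([0.5, -1.0, 2.0])
b = 0.1
x = np.array([1.0, 2.0, 3.0])
print(linear_model(x, w, b))  # 0.1 + 0.5 - 2.0 + 6.0 = 4.6
```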

Sigmoid Function

Hard Sigmoid

A piecewise-linear approximation of the sigmoid: constant, then linear, then constant.

Soft Sigmoid

y = b + \sum_{i} c_i \,\mathrm{sigmoid}(b_i + w_i x_i)

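A quick sketch of the sigmoid-sum model above, assuming a scalar input x (all names illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_sum_model(x, b, c, b_i, w_i):
    """y = b + sum_i c_i * sigmoid(b_i + w_i * x), scalar input x."""
    return b + np.sum(c * sigmoid(b_i + w_i * x))

# Toy usage: a sum of three sigmoids.
c   = np.array([1.0, -0.5, 2.0])   # c_i: heights
b_i = np.array([0.0,  1.0, -2.0])  # b_i: horizontal shifts
w_i = np.array([1.0,  2.0,  0.5])  # w_i: slopes
print(sigmoid_sum_model(0.3, b=0.1, c=c, b_i=b_i, w_i=w_i))
```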

Changing the Parameters

Changing w_i changes the slope, b_i shifts the curve left or right, and c_i changes the height; summing enough sigmoids can approximate any continuous curve.

New Model: More Features

With multiple features, each sigmoid takes a weighted sum of all inputs:

y = b + \sum_{i} c_i \,\mathrm{sigmoid}\left(b_i + \sum_{j} w_{ij} x_j\right)

Hyperparameters: values chosen by hand rather than learned from data, e.g., the number of sigmoids.

The whole model can be seen as matrix operations, y = b + \mathbf{c}^{T}\,\mathrm{sigmoid}(\mathbf{b} + W\mathbf{x}), which is why GPUs speed up training.
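
A sketch of the matrix form; a single matrix-vector product plus an element-wise nonlinearity is exactly the kind of work a GPU parallelizes well (names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model_matrix_form(x, W, b_vec, c, b):
    """y = b + c^T sigmoid(b_vec + W @ x): one matrix-vector product
    plus an element-wise sigmoid."""
    return b + c @ sigmoid(b_vec + W @ x)

# Toy usage: 3 features, 4 sigmoids.
rng = np.random.default_rng(0)
x     = rng.normal(size=3)
W     = rng.normal(size=(4, 3))
b_vec = rng.normal(size=4)
c     = rng.normal(size=4)
print(model_matrix_form(x, W, b_vec, c, b=0.1))
```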

ReLU

y = b + \sum_{i} c_i \max(0,\, b_i + w_i x_i)

Two ReLUs can be summed to form one hard sigmoid.
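
A small sketch showing how two ReLUs sum into a hard sigmoid (the lo/hi breakpoints are illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def hard_sigmoid(x, lo=-1.0, hi=1.0):
    """0 for x <= lo, 1 for x >= hi, linear in between:
    the difference of two ReLUs with shifted breakpoints."""
    slope = 1.0 / (hi - lo)
    return relu(slope * (x - lo)) - relu(slope * (x - hi))

xs = np.array([-2.0, 0.0, 2.0])
print(hard_sigmoid(xs))  # [0.0, 0.5, 1.0]
```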

Deeper Model


Fully Connected Feedforward Network

The network itself is a function: it takes an input vector and produces an output vector.

Given a network structure, we define a function set; training picks out one function from that set.

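A minimal sketch of such a network as a plain function: given the structure (layer sizes) and one choice of weights, it maps an input vector to an output vector (names illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Fully connected feedforward pass: each layer is (W, b);
    a = sigmoid(W @ a + b), applied layer by layer."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

# Toy usage: a 3 -> 4 -> 2 network (one function from the function set).
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 3)), rng.normal(size=4)),
          (rng.normal(size=(2, 4)), rng.normal(size=2))]
print(forward(rng.normal(size=3), layers))  # output vector of length 2
```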

Step 2. Loss


Loss: L=\frac{1}{N}\sum_ne_n, where e_n is the error on training example n (e.g., absolute error, squared error, or cross-entropy).

Regularization

If noise corrupts the input x_i at test time, a smoother function is affected less.


The regularized loss is L' = L + \lambda\sum_i (w_i)^2. The bias b is usually left out of the penalty because it does not affect the smoothness of the function.
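
A small sketch of the regularized loss under these definitions, using squared error for e_n (names and the choice of MSE are illustrative):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """L = (1/N) * sum_n e_n with squared error e_n."""
    return np.mean((y_pred - y_true) ** 2)

def regularized_loss(y_pred, y_true, w, lam):
    """L' = L + lambda * sum_i w_i^2; the bias b is not penalized."""
    return mse_loss(y_pred, y_true) + lam * np.sum(w ** 2)

# Toy usage.
y_pred = np.array([1.0, 2.0, 3.0])
y_true = np.array([1.5, 2.0, 2.0])
w = np.array([0.5, -1.0])
print(regularized_loss(y_pred, y_true, w, lam=0.01))
```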


Step 3. Optimization

Optimization of New Model

Gradient descent: start from an initial \theta^0 and repeatedly update \theta \leftarrow \theta - \eta\frac{\partial L}{\partial \theta}, where \eta is the learning rate.
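
A minimal gradient-descent sketch; grad_fn stands in for whatever computes \partial L/\partial\theta (all names illustrative):

```python
def gradient_descent(grad_fn, theta0, lr=0.01, steps=1000):
    """theta <- theta - lr * grad L(theta), repeated for a fixed budget."""
    theta = theta0
    for _ in range(steps):
        theta = theta - lr * grad_fn(theta)
    return theta

# Toy usage: minimize L(theta) = (theta - 3)^2; gradient is 2*(theta - 3).
print(gradient_descent(lambda t: 2 * (t - 3), theta0=0.0))  # -> close to 3
```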

Backpropagation


Backpropagation - Backward Pass

Consider a neuron with pre-activation z whose output feeds two next-layer neurons with pre-activations z' and z'' through weights w_3 and w_4:

\frac{\partial{C}}{\partial{z}}=\sigma'(z)[w_3\frac{\partial{C}}{\partial{z'}}+w_4\frac{\partial{C}}{\partial{z''}}]


Compute \frac{\partial{C}}{\partial{z}} recursively, until we reach the output layer.

Chain Rule

Case 1: y = g(x),\; z = h(y) \Rightarrow \frac{dz}{dx} = \frac{dz}{dy}\frac{dy}{dx}

Case 2: x = g(s),\; y = h(s),\; z = k(x, y) \Rightarrow \frac{dz}{ds} = \frac{\partial z}{\partial x}\frac{dx}{ds} + \frac{\partial z}{\partial y}\frac{dy}{ds}

Batch & Epoch

Split the training data into batches; each parameter update uses one batch. One epoch = seeing every batch once; the data is usually reshuffled after each epoch.
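
A sketch of the batch/epoch training loop under these definitions (names, the toy problem, and hyperparameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def train(X, Y, theta, grad_fn, lr=0.5, batch_size=32, epochs=10):
    """One parameter update per batch; one epoch = every batch seen once."""
    N = len(X)
    for _ in range(epochs):
        idx = rng.permutation(N)              # reshuffle each epoch
        for start in range(0, N, batch_size):
            b = idx[start:start + batch_size]
            theta = theta - lr * grad_fn(theta, X[b], Y[b])
    return theta

# Toy usage: fit y = 2x with one scalar parameter and an MSE gradient.
X = np.linspace(-1.0, 1.0, 200)
Y = 2.0 * X
mse_grad = lambda th, xb, yb: np.mean(2.0 * (th * xb - yb) * xb)
print(train(X, Y, theta=0.0, grad_fn=mse_grad))  # -> close to 2.0
```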

Backpropagation

An efficient way to compute \frac{\partial L}{\partial w} for every weight in a neural network.
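
A minimal sketch of the backward-pass recursion for a network with one hidden layer, using a squared-error cost (the cost choice and all names are illustrative); the numerical check at the end compares one backprop gradient against a finite difference:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(x, t, W1, b1, W2, b2):
    """One forward + backward pass for a 1-hidden-layer network with
    cost C = 0.5 * ||y - t||^2.  The backward pass applies
    dC/dz = sigma'(z) * (W^T dC/dz') layer by layer, starting from
    the output layer."""
    # Forward pass: keep z and a at every layer.
    z1 = W1 @ x + b1
    a1 = sigmoid(z1)
    y  = W2 @ a1 + b2                       # linear output layer

    # Backward pass.
    dC_dz2 = y - t                          # output layer
    dC_dz1 = sigmoid(z1) * (1 - sigmoid(z1)) * (W2.T @ dC_dz2)

    # dC/dw for every weight: outer product of dC/dz and the layer input.
    return {"W2": np.outer(dC_dz2, a1), "b2": dC_dz2,
            "W1": np.outer(dC_dz1, x),  "b1": dC_dz1}

# Toy usage plus a numerical check on one weight.
rng = np.random.default_rng(0)
x, t = rng.normal(size=3), rng.normal(size=2)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
g = backprop(x, t, W1, b1, W2, b2)

def cost(W1):
    y = W2 @ sigmoid(W1 @ x + b1) + b2
    return 0.5 * np.sum((y - t) ** 2)

eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
print(g["W1"][0, 0], (cost(W1p) - cost(W1)) / eps)  # should roughly match
```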

5 Deep Learning

Why don’t we go deeper?

Overfitting: Better on training data, worse on unseen data.

Figure: loss for multiple hidden layers.

5.1 History of Deep Learning

