Machine Learning -> the ability of AI systems to acquire their own knowledge
that ability -> extracting patterns from raw data
The relationship between logistic regression and the softmax function
https://ratsgo.github.io/machine%20learning/2017/04/02/logistic/
Since the model outputs a probability for each class, I think it naturally takes the softmax form.
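A minimal sketch (my own example, not from the linked post) showing the relationship concretely: softmax over two classes with logits (s, 0) reduces to the logistic sigmoid of s.

    import numpy as np

    def softmax(z):
        z = z - np.max(z)          # subtract the max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    def sigmoid(s):
        return 1.0 / (1.0 + np.exp(-s))

    # Two-class softmax with logits (s, 0):
    # p1 = e^s / (e^s + e^0) = 1 / (1 + e^-s) = sigmoid(s)
    s = 1.7
    print(softmax(np.array([s, 0.0]))[0], sigmoid(s))  # both ~0.8455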
A field that grew out of the observation that the representation of data matters -> representation learning
A representative example is the autoencoder: the encoder converts the input to a new representation, and the decoder converts it back to the original representation.
The aim is to produce many useful new features without losing the features of the original data.
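As a sketch of this encoder/decoder idea (my own example; with linear maps and squared error the optimal autoencoder is closely related to PCA), here is a minimal linear autoencoder built from the SVD:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 10))   # 500 samples, 10 features
    X = X - X.mean(axis=0)           # center the data

    k = 3                            # dimension of the learned code
    # Top-k right singular vectors give the optimal linear encoder/decoder
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    W = Vt[:k]                       # encoder weights, shape (k, 10)

    encode = lambda x: W @ x         # change to the new representation
    decode = lambda z: W.T @ z       # map back toward the original representation

    x = X[0]
    print("reconstruction error:", np.linalg.norm(x - decode(encode(x))))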
factors of variation -> a factor = a separate source of influence
The multilayer perceptron overcomes the limitations of linear models such as linear regression (e.g., the XOR problem; see the sketch below).
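To make this concrete, here is the standard worked XOR example (the weights follow the one in Chapter 6 of the book): a fixed two-layer ReLU network computes XOR exactly, which no single linear layer can do.

    import numpy as np

    relu = lambda z: np.maximum(0, z)

    # Hidden layer: h = relu(x W + c); output: y = h w
    W = np.array([[1., 1.], [1., 1.]])
    c = np.array([0., -1.])
    w = np.array([1., -2.])

    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    print(relu(X @ W + c) @ w)       # [0. 1. 1. 0.] -- XOR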
Historical Trends
Most neural networks today are based on a model neuron called the rectified linear unit (ReLU).
1. Kernel machines 2. Bayesian statistics -> topics to look up later
It is worth noting that the effort to understand how the brain works on
an algorithmic level is alive and well. This endeavor is primarily known as
“computational neuroscience” and is a separate field of study from deep learning.
It is common for researchers to move back and forth between both fields. The
field of deep learning is primarily concerned with how to build computer systems
that are able to successfully solve tasks requiring intelligence, while the field of
computational neuroscience is primarily concerned with building more accurate
models of how the brain actually works.
SVM reference: https://bskyvision.com/163
Ch.2 Linear Algebra
The span of a set of vectors is the set of all points obtainable by linear combination
of the original vectors.
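In symbols (the standard definition, added for reference):

    \mathrm{span}\big(\{v^{(1)}, \dots, v^{(n)}\}\big) = \Big\{ \sum_{i} c_i v^{(i)} \;:\; c_i \in \mathbb{R} \Big\}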
Sometimes we need to measure the size of a vector. In machine learning, we usually
measure the size of vectors using a function called a norm.
The squared L2 norm is more convenient to work with mathematically and
computationally than the L2 norm itself. For example, the derivatives of the
squared L2 norm with respect to each element of x each depend only on the
corresponding element of x, while all of the derivatives of the L2 norm depend
on the entire vector. In many contexts, the squared L2 norm may be undesirable
because it increases very slowly near the origin. In several machine learning
applications, it is important to discriminate between elements that are exactly
zero and elements that are small but nonzero.
In these cases, we turn to a function
that grows at the same rate in all locations, but retains mathematical simplicity:
the L1 norm.
The L1 norm is commonly used in machine learning when the difference between
zero and nonzero elements is very important.
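For reference, the standard definitions behind this passage (not copied from the book):

    \|x\|_p = \Big( \sum_i |x_i|^p \Big)^{1/p}, \qquad
    \|x\|_1 = \sum_i |x_i|, \qquad
    \|x\|_2^2 = x^\top x

    \frac{\partial}{\partial x_i} \|x\|_2^2 = 2 x_i
    \qquad \text{vs.} \qquad
    \frac{\partial}{\partial x_i} \|x\|_2 = \frac{x_i}{\|x\|_2}

The first derivative depends only on x_i, while the second depends on the entire vector through ||x||_2, which is exactly the convenience point made above.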
We write diag(v) to denote a square
diagonal matrix whose diagonal entries are given by the entries of the vector v.
Diagonal matrices are of interest in part because multiplying by a diagonal matrix
is very computationally efficient.
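A quick numpy check (my own example) of why this is cheap: multiplying by diag(v) is just elementwise scaling, O(n) work instead of the O(n^2) of a general matrix-vector product.

    import numpy as np

    v = np.array([2., 3., 5.])
    x = np.array([1., 4., 6.])
    print(np.diag(v) @ x)   # [ 2. 12. 30.] via a full matrix-vector product
    print(v * x)            # same result via O(n) elementwise scaling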
Symmetric matrices often arise when the entries are generated by some function of
two arguments that does not depend on the order of the arguments. For example,
if A is a matrix of distance measurements, with A_{i,j} giving the distance from point
i to point j, then A_{i,j} = A_{j,i} because distance functions are symmetric.
A vector x and a vector y are orthogonal to each other if x^T y = 0.
An orthogonal matrix is a square matrix whose rows are mutually orthonormal
and whose columns are mutually orthonormal:
A^T A = A A^T = I, which implies A^{-1} = A^T.
Eigendecomposition, SVD, PCA -> MIT OpenCourseWare Linear Algebra
Av = λv.
A = V diag(λ) V^{-1}.
Specifically, every real symmetric
matrix can be decomposed into an expression using only real-valued eigenvectors
and eigenvalues:
A = Q Λ Q^T,
where Q is an orthogonal matrix composed of eigenvectors of A, and Λ is a
diagonal matrix.
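A sketch verifying the symmetric case with numpy (my own example; np.linalg.eigh is the routine for symmetric matrices and returns real eigenvalues with an orthogonal Q):

    import numpy as np

    rng = np.random.default_rng(0)
    B = rng.normal(size=(4, 4))
    A = (B + B.T) / 2                  # make A symmetric

    lam, Q = np.linalg.eigh(A)         # A = Q diag(lam) Q^T
    print(np.allclose(A, Q @ np.diag(lam) @ Q.T))   # True
    print(np.allclose(Q.T @ Q, np.eye(4)))          # Q is orthogonal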
The Moore-Penrose Pseudoinverse
When A has more columns than rows,
using A^+ = V D^+ U^T yields the solution x = A^+ y with the minimal Euclidean norm ||x||_2
among all possible solutions.
When A has more rows than columns, it is possible for there to be no solution.
In this case, using the pseudoinverse gives us the x for which Ax is as close as
possible to y in terms of the Euclidean norm ||Ax − y||_2.
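A sketch of both cases with numpy's pinv (my own example):

    import numpy as np

    rng = np.random.default_rng(1)

    # Wide A (more columns than rows): many solutions; pinv picks the
    # minimum-norm x
    A = rng.normal(size=(3, 5)); y = rng.normal(size=3)
    x = np.linalg.pinv(A) @ y
    print(np.allclose(A @ x, y))       # True: an exact solution exists

    # Tall A (more rows than columns): possibly no solution; pinv gives
    # the least-squares x minimizing ||Ax - y||_2
    A = rng.normal(size=(5, 3)); y = rng.normal(size=5)
    x = np.linalg.pinv(A) @ y
    print(np.linalg.norm(A @ x - y))   # residual of the best fit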
The Determinant
The determinant is equal to the product of all the
eigenvalues of the matrix. The absolute value of the determinant can be thought
of as a measure of how much multiplication by the matrix expands or contracts
space. If the determinant is 0, then space is contracted completely along at least
one dimension, causing it to lose all of its volume. If the determinant is 1, then
the transformation preserves volume.
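A quick numpy check (my own example) that the determinant equals the product of the eigenvalues:

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.normal(size=(3, 3))
    eigvals = np.linalg.eigvals(A)     # may be complex for a general real A,
                                       # but their product is real
    print(np.allclose(np.prod(eigvals).real, np.linalg.det(A)))  # True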
Revisit the PCA section after going through the Linear Algebra course.
Ch.3 Probability and Information Theory
We use probability
theory in two major ways. First, the laws of probability tell us how AI systems
should reason, so we design our algorithms to compute or approximate various
expressions derived using probability theory. Second, we can use probability and
statistics to theoretically analyze the behavior of proposed AI systems.
While probability theory allows us to make uncertain statements and reason in
the presence of uncertainty, information theory allows us to quantify the amount
of uncertainty in a probability distribution.