Posted 2021-03-05 | Updated 2021-03-05 | book | 32 minutes read (About 4780 words)

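Stage 1: Computing Derivatives Automatically

Step 1: Variables as Boxes

1.1 What Is a Variable?

A variable is like a box that stores data.

1.2 Implementing the Variable Class

The concept of a variable used in DeZero is implemented as a class named Variable.
By convention, class names start with a capital letter.
The coding conventions recommended for Python are defined in PEP 8.
- PEP 8: a Python Enhancement Proposal that serves as the style guide for how Python code should be written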

# steps/step01.py
class Variable:
    def __init__(self, data):
        self.data = data

The argument passed to __init__ is assigned to the instance variable data.
The value is then held in the Variable's data attribute.

# steps/step01.py
import numpy as np

data = np.array(1.0)
x = Variable(data)
print(x.data)

1.0

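In this example, a NumPy multidimensional array is the data placed in the box.
x is a Variable instance, and the actual data is held inside x.
x is not the data itself but the box that holds the data, that is, a variable.
Machine learning systems use multidimensional arrays as their basic data structure.
DeZero's Variable class handles only NumPy multidimensional arrays.
A NumPy array can be created with the np.array function.
An instance of numpy.ndarray is referred to simply as an ndarray instance.

# steps/step01.py
x.data = np.array(2.0)  # assign new data to x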
print(x.data)

2.0

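1.3 NumPy Multidimensional Arrays

A multidimensional array is a data structure in which elements such as numbers are arranged in a regular layout.
The elements are ordered along directions, and each direction is called a dimension or an axis.
0-dimensional, 1-dimensional, and 2-dimensional arrays are called a scalar, a vector, and a matrix, respectively.
- A scalar is just a single number.
- A vector has numbers lined up along one axis.
- A matrix has two axes.
Multidimensional arrays are also called tensors: a 0-dimensional tensor, a 1-dimensional tensor, a 2-dimensional tensor, and so on.

# ndim stands for 'number of dimensions' and gives the number of dimensions of a multidimensional array
import numpy as np
x = np.array(1)
x.ndim

0

x = np.array([1, 2, 3])
x.ndim

1

x = np.array([[1, 2, 3],
              [4, 5, 6]])
x.ndim

2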

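Step 2: Functions That Produce Variables

2.1 What Is a Function?

A function defines a correspondence from one variable to another.

2.2 Implementing the Function Class

The Function class takes a Variable instance as input and returns a Variable instance as output.
The actual data of a Variable instance lives in its data instance variable.

class Function:
    def __call__(self, input):
        x = input.data  # take the data out
        y = x ** 2  # the actual computation
        output = Variable(y)  # wrap the result back into a Variable
        return output

__call__ is one of Python's special methods.
Writing f = Function() assigns an instance of the function to the variable f.
After that, calling f(...) invokes the __call__ method.

2.3 Using the Function Class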

x = Variable(np.array(10))
f = Function()
y = f(x)
print(type(y))
print(y.data)

<class '__main__.Variable'>
100

The Function class is a base class that implements the functionality common to all functions.
Concrete functions are implemented in classes that inherit from Function.

# steps/step02.py
class Function:
    def __call__(self, input):
        x = input.data
        y = self.forward(x)  # the concrete computation is done in forward
        output = Variable(y)
        return output

    def forward(self, x):
        raise NotImplementedError()

Raising NotImplementedError signals that this method must be implemented in a subclass.

# steps/step02.py
# A class that squares its input
class Square(Function):
    def forward(self, x):
        return x ** 2

x = Variable(np.array(10))
f = Square()
y = f(x)
print(type(y))
print(y.data)

<class '__main__.Variable'>
100

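Step 3: Connecting Functions

3.1 Implementing the Exp Function

Implement y = e^x, where e is Euler's number (also known as Napier's constant).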

# steps/step03.py
class Exp(Function):
    def forward(self, x):
        return np.exp(x)

3.2 Connecting Functions

# steps/step03.py
A = Square()
B = Exp()
C = Square()

x = Variable(np.array(0.5))
a = A(x)
b = B(a)
y = C(b)
print(y.data)

1.648721270700128

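A function built by chaining several functions is called a composite function.

Step 4: Numerical Differentiation

4.1 What Is a Derivative?

A derivative is a rate of change.
It is the amount of change over an infinitesimally short interval (an instant).
Derivative function: given a function f(x), the function that maps each x in the domain of f to the derivative at that point.

4.2 Implementing Numerical Differentiation

A computer cannot handle limits directly.
Instead, h is replaced by a very small value such as h = 0.0001 (= 1e-4).
Computing the change of a function from such a tiny difference is called numerical differentiation.
Numerical differentiation approximates the true derivative using a small value,
so the result inevitably contains some error.
The central difference is used to reduce this approximation error.
- Instead of the difference between f(x) and f(x+h), the central difference uses the difference between f(x-h) and f(x+h).
A Taylor series expansion shows that the central difference is closer to the true derivative than the forward difference.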
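The accuracy claim above can be checked numerically. The following is a minimal sketch, not code from the book: the helper names forward_diff and central_diff and the choice of exp as the test function are assumptions made for illustration. By Taylor expansion, the forward-difference error shrinks in proportion to h, while the central-difference error shrinks in proportion to h squared.

```python
import math

def forward_diff(f, x, h=1e-4):
    # (f(x + h) - f(x)) / h : truncation error is proportional to h
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h=1e-4):
    # (f(x + h) - f(x - h)) / (2h) : truncation error is proportional to h**2
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.0
true_value = math.exp(x)  # the derivative of exp(x) is exp(x) itself

forward_error = abs(forward_diff(math.exp, x) - true_value)
central_error = abs(central_diff(math.exp, x) - true_value)

print(forward_error)  # error of the forward difference
print(central_error)  # error of the central difference (much smaller)
```

With the same h, the central difference should come out several orders of magnitude closer to the true value.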

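We implement numerical_diff(f, x, eps=1e-4), a function that computes the numerical derivative using the central difference.
f is the function to differentiate (a Function instance), x is the Variable instance at which the derivative is evaluated, and eps is a small value.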

# steps/step04.py
def numerical_diff(f, x, eps=1e-4):
    x0 = Variable(x.data - eps)
    x1 = Variable(x.data + eps)
    y0 = f(x0)
    y1 = f(x1)
    return (y1.data - y0.data) / (2 * eps)

# steps/step04.py
# Differentiate the Square class
f = Square()
x = Variable(np.array(2.0))
dy = numerical_diff(f, x)
print(dy)

4.000000000004

4.3 Differentiating a Composite Function

Let's differentiate a composite function.

# steps/step04.py
def f(x):
    A = Square()
    B = Exp()
    C = Square()
    return C(B(A(x)))


x = Variable(np.array(0.5))
dy = numerical_diff(f, x)
print(dy)

3.2974426293330694

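4.4 Problems with Numerical Differentiation

The result of numerical differentiation contains error.
In most cases the error is very small, but it can grow depending on the computation.
The main reason is loss of significant digits: subtracting two nearly equal values leaves few meaningful digits.
High computational cost is also a serious problem.
This is where backpropagation comes in.
Backpropagation is a complex algorithm, so bugs creep in easily when implementing it.
Using the result of numerical differentiation to verify that backpropagation is implemented correctly is called gradient checking.
Gradient checking simply compares the numerical derivative with the result of backpropagation.

Step 5: Backpropagation Theory

Backpropagation computes derivatives efficiently and with less error in the result.

5.1 The Chain Rule

The key to understanding backpropagation is the chain rule.
According to the chain rule, the derivative of a composite function equals the product of the derivatives of its constituent functions.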
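As a worked instance of the chain rule, the sketch below uses the composite function from steps 3 and 4, y = (e^(x^2))^2 at x = 0.5. It uses plain Python floats rather than the DeZero classes, and the variable names dy_db, db_da, and da_dx are chosen here for illustration only.

```python
import math

x = 0.5

# forward pass through the three functions
a = x ** 2        # a = x^2
b = math.exp(a)   # b = e^a
y = b ** 2        # y = b^2

# local derivative of each constituent function
dy_db = 2 * b
db_da = math.exp(a)
da_dx = 2 * x

# chain rule: the overall derivative is the product of the local derivatives
dy_dx = dy_db * db_da * da_dx
print(dy_dx)  # approximately 3.2974
```

The product agrees with the closed form 4x * e^(2x^2) and with the numerical derivative computed in section 4.3.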

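5.2 Deriving the Principle of Backpropagation

Machine learning typically takes a large number of parameters as input and produces its output through a loss function at the end.
The output of the loss function is a single scalar value, and this value is the quantity of interest.
Propagating derivatives from the output toward the input computes the derivatives with respect to all parameters in a single pass.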
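To make the cost argument concrete, here is a rough sketch under stated assumptions: the toy loss (a sum of squares), the CountingLoss wrapper, and numerical_grad are all hypothetical helpers written for this illustration, not part of DeZero. Numerical differentiation has to evaluate the loss twice per parameter, whereas one backward pass yields the gradient for every parameter at once.

```python
import numpy as np

class CountingLoss:
    # toy scalar loss (sum of squares) that counts how often it is evaluated
    def __init__(self):
        self.calls = 0

    def __call__(self, w):
        self.calls += 1
        return float(np.sum(w ** 2))

def numerical_grad(f, w, eps=1e-4):
    # central difference applied to one parameter at a time
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus = w.copy()
        w_minus = w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        grad[i] = (f(w_plus) - f(w_minus)) / (2 * eps)
    return grad

loss = CountingLoss()
w = np.linspace(0.1, 1.0, 1000)  # 1000 parameters

num_grad = numerical_grad(loss, w)
analytic_grad = 2 * w  # what a single backward pass would give for this loss

print(loss.calls)  # two loss evaluations per parameter
print(np.allclose(num_grad, analytic_grad))
```

For 1000 parameters the numerical approach needs 2000 loss evaluations, and the count grows linearly with the number of parameters.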

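5.3 Viewing It as a Computational Graph

A variable has an ordinary value and a derivative value.
A function has an ordinary computation (forward propagation) and a computation for obtaining derivatives (backpropagation).
Backpropagation needs the data used during forward propagation. To implement it, the forward pass must run first, and each function must remember the values of its input variables.

Figure 5-5

Step 6: Manual Backpropagation

This step shows how backpropagation works in practice.
The Variable and Function classes are extended to compute derivatives with backpropagation.

6.1 Extending the Variable Class

Variable is extended to store the corresponding derivative (grad) alongside the ordinary value (data).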

# steps/step06.py
class Variable:
    def __init__(self, data):
        self.data = data
        self.grad = None

6.2 Extending the Function Class

Add a backward method that computes the derivative (backpropagation).
Keep the Variable instance that was passed in when forward was called.

# steps/step06.py
class Function:
    def __call__(self, input):
        x = input.data
        y = self.forward(x)
        output = Variable(y)
        self.input = input  # remember (keep) the input variable
        return output

    def forward(self, x):
        raise NotImplementedError()

    def backward(self, gy):
        raise NotImplementedError()

The __call__ method stores its input in the instance variable self.input.
When the backward method needs the variable that was passed to the function, it reads it from self.input.

6.3 Extending the Square and Exp Classes

# steps/step06.py
class Square(Function):
    def forward(self, x):
        y = x ** 2
        return y

    def backward(self, gy):
        x = self.input.data
        gx = 2 * x * gy
        return gx

# steps/step06.py
class Exp(Function):
    def forward(self, x):
        y = np.exp(x)
        return y

    def backward(self, gy):
        x = self.input.data
        gx = np.exp(x) * gy
        return gx

Forward propagation code

# steps/step06.py
A = Square()
B = Exp()
C = Square()

x = Variable(np.array(0.5))
a = A(x)
b = B(a)
y = C(b)

Backpropagation code

# steps/step06.py
y.grad = np.array(1.0)
b.grad = C.backward(y.grad)
a.grad = B.backward(b.grad)
x.grad = A.backward(a.grad)
print(x.grad)

3.297442541400256

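Step 7: Automating Backpropagation

The goal is a mechanism in which a single forward pass is enough for backpropagation to run automatically, whatever the computation.
Define-by-Run means that the computations performed in deep learning are connected to one another at the moment they are executed; this is also called a dynamic computational graph.

7.1 Getting Started with Automated Backpropagation

Automating backpropagation starts from understanding the relationship between variables and functions.
From a function's point of view, variables serve as its inputs and outputs.
From a variable's point of view, a function is its creator, or parent.
This relationship is recorded at the moment the ordinary forward computation takes place.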

# steps/step07.py
class Variable:
    def __init__(self, data):
        self.data = data
        self.grad = None
        self.creator = None  # new instance variable

    def set_creator(self, func):  # sets the creator
        self.creator = func

An instance variable named creator is added, along with a set_creator method that sets it.

# steps/step07.py
class Function:
    def __call__(self, input):
        x = input.data
        y = self.forward(x)
        output = Variable(y)
        output.set_creator(self)  # Set parent(function)
        self.input = input
        self.output = output  # Set output
        return output

Running the forward computation creates a Variable instance named output.
The output remembers its creator.

A = Square()
B = Exp()
C = Square()

x = Variable(np.array(0.5))
a = A(x)
b = B(a)
y = C(b)

assert y.creator == C
assert y.creator.input == b
assert y.creator.input.creator == B
assert y.creator.input.creator.input == a
assert y.creator.input.creator.input.creator == A
assert y.creator.input.creator.input.creator.input == x

The assert statement checks whether a condition holds and raises an AssertionError when it does not.

7.2 Trying Backpropagation

1. Get the function.
2. Get the function's input.
3. Call the function's backward method.

y.grad = np.array(1.0)

C = y.creator  # 1. Get the function.
b = C.input  # 2. Get the function's input.
b.grad = C.backward(y.grad)  # 3. Call the function's backward method.

B = b.creator  # 1. Get the function.
a = B.input  # 2. Get the function's input.
a.grad = B.backward(b.grad)  # 3. Call the function's backward method.

A = a.creator  # 1. Get the function.
x = A.input  # 2. Get the function's input.
x.grad = A.backward(a.grad)  # 3. Call the function's backward method.
print(x.grad)

3.297442541400256

7.3 Adding a backward Method

A backward method is added to the Variable class to automate the repetitive steps above.

# steps/step07.py
class Variable:
    def __init__(self, data):
        self.data = data
        self.grad = None
        self.creator = None

    def set_creator(self, func):
        self.creator = func

    def backward(self):
        f = self.creator  # 1. Get a function
        if f is not None:
            x = f.input  # 2. Get the function's input
            x.grad = f.backward(self.grad)  # 3. Call the function's backward
            x.backward()

The backward method calls itself recursively, which automates the whole process.

A = Square()
B = Exp()
C = Square()

x = Variable(np.array(0.5))
a = A(x)
b = B(a)
y = C(b)

# backward
y.grad = np.array(1.0)
y.backward()
print(x.grad)

3.297442541400256

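Step 8: From Recursion to Loops

The implementation of the backward method is changed to improve efficiency and to prepare for later extensions.

8.2 Implementation Using a Loop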

# steps/step08.py
class Variable:
    def __init__(self, data):
        self.data = data
        self.grad = None
        self.creator = None

    def set_creator(self, func):
        self.creator = func

    def backward(self):
        funcs = [self.creator]
        while funcs:
            f = funcs.pop()  # 1. Get a function
            x, y = f.input, f.output  # 2. Get the function's input/output
            x.grad = f.backward(y.grad)  # 3. Call the function's backward

            if x.creator is not None:
                funcs.append(x.creator)

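The functions to be processed are added to a list named funcs, one after another.
Inside the while block, funcs.pop() takes out the function f to process.
f.input and f.output give the input and output variables of f.
f.backward is then called with the output's gradient y.grad as its argument, and its return value is stored in x.grad.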

# steps/step08.py
A = Square()
B = Exp()
C = Square()

x = Variable(np.array(0.5))
a = A(x)
b = B(a)
y = C(b)

# backward
y.grad = np.array(1.0)
y.backward()
print(x.grad)

3.297442541400256

Recursion keeps intermediate results in memory for every nested call.
In general, the loop-based approach is more efficient.
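One concrete consequence of the nested calls can be sketched as follows. This is an illustration only: Node is a hypothetical stand-in for a Variable that merely links to the previous node, and the two visit functions imitate the recursive style of step 7 and the loop style of step 8. On a chain deeper than the interpreter's recursion limit, the recursive version is expected to fail while the loop finishes.

```python
import sys

class Node:
    # hypothetical stand-in for a Variable: it only links to the previous node
    def __init__(self, prev=None):
        self.prev = prev
        self.visited = False

def visit_recursive(node):
    # step 7 style: handle this node, then recurse into the previous one
    node.visited = True
    if node.prev is not None:
        visit_recursive(node.prev)

def visit_loop(node):
    # step 8 style: a while loop with no nested calls
    while node is not None:
        node.visited = True
        node = node.prev

# build a chain deeper than the interpreter's recursion limit
depth = sys.getrecursionlimit() + 100
first = Node()
last = first
for _ in range(depth):
    last = Node(last)

try:
    visit_recursive(last)
    recursive_ok = True
except RecursionError:
    recursive_ok = False

visit_loop(last)
print(recursive_ok, first.visited)
```

The loop reaches the first node because it never stacks up calls, whatever the length of the chain.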

Step 9: Making Functions Easier to Use

9.1 Using Them as Python Functions

def square(x):
    return Square()(x)

def exp(x):
    return Exp()(x)

x = Variable(np.array(0.5))
a = square(x)
b = exp(a)
y = square(b)

y.grad = np.array(1.0)
y.backward()
print(x.grad)

3.297442541400256

# Apply the functions in succession
x = Variable(np.array(0.5))
y = square(exp(square(x)))
y.grad = np.array(1.0)
y.backward()
print(x.grad)

3.297442541400256

9.2 Simplifying the backward Method

# Modify backward in Variable so that y.grad = np.array(1.0) can be omitted
class Variable:
    def __init__(self, data):
        self.data = data
        self.grad = None
        self.creator = None

    def set_creator(self, func):
        self.creator = func

    def backward(self):
        if self.grad is None:  # create the gradient when grad is None
            self.grad = np.ones_like(self.data)

        funcs = [self.creator]
        while funcs:
            f = funcs.pop()
            x, y = f.input, f.output
            x.grad = f.backward(y.grad)

            if x.creator is not None:
                funcs.append(x.creator)

x = Variable(np.array(0.5))
y = square(exp(square(x)))
y.backward()
print(x.grad)

3.297442541400256

9.3 Handling Only ndarray

If anything other than an ndarray instance is put into a Variable, an error is raised immediately.

# steps/step09.py
class Variable:
    def __init__(self, data):
        if data is not None:
            if not isinstance(data, np.ndarray):
                raise TypeError('{} is not supported'.format(type(data)))

        self.data = data
        self.grad = None
        self.creator = None

    def set_creator(self, func):
        self.creator = func

    def backward(self):
        if self.grad is None:  # create the gradient when grad is None
            self.grad = np.ones_like(self.data)

        funcs = [self.creator]
        while funcs:
            f = funcs.pop()
            x, y = f.input, f.output
            x.grad = f.backward(y.grad)

            if x.creator is not None:
                funcs.append(x.creator)

If data is not None and is not an ndarray instance, a TypeError is raised.

# steps/step09.py
x = Variable(np.array(1.0))  # OK
x = Variable(None)  # OK
x = Variable(1.0)  # NG

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-53-33bce631aee3> in <module>
      2 x = Variable(np.array(1.0))  # OK
      3 x = Variable(None)  # OK
----> 4 x = Variable(1.0)  # NG


<ipython-input-50-830a6874675c> in __init__(self, data)
      3         if data is not None:
      4             if not isinstance(data, np.ndarray):
----> 5                 raise TypeError('{} is not supported'.format(type(data)))
      6 
      7         self.data = data


TypeError: <class 'float'> is not supported

An ndarray or None causes no problem, but any other data type raises an exception.
This makes it immediately obvious when the wrong data type has been used.

x = np.array([1.0])
y = x ** 2
print(type(x), x.ndim)
print(type(y))

<class 'numpy.ndarray'> 1
<class 'numpy.ndarray'>

x is a 1-dimensional ndarray.
y is also an ndarray.

x = np.array(1.0)  # 0-dimensional ndarray
y = x ** 2
print(type(x), x.ndim)
print(type(y))

<class 'numpy.ndarray'> 0
<class 'numpy.float64'>

x is a 0-dimensional ndarray, but squaring it (x ** 2) turns the result into np.float64.
Variable assumes that its data is always an ndarray instance, so this case has to be handled.

# steps/step09.py
def as_array(x):
    if np.isscalar(x):
        return np.array(x)
    return x

np.isscalar checks whether its input is a scalar type such as numpy.float64.

import numpy as np
np.isscalar(np.float64(1.0))

True

np.isscalar(2.0)

True

np.isscalar(np.array(1.0))

False

np.isscalar(np.array([1, 2, 3]))

False

As shown, it is easy to check whether x is a scalar type.
The as_array function converts a scalar input into an ndarray instance.
With the as_array helper in place, the Function class is updated to use it.

# steps/step09.py
class Function:
    def __call__(self, input):
        x = input.data
        y = self.forward(x)
        output = Variable(as_array(y))  # wrap with the as_array helper
        output.set_creator(self)
        self.input = input
        self.output = output
        return output

    def forward(self, x):
        raise NotImplementedError()

    def backward(self, gy):
        raise NotImplementedError()

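as_array() is applied when wrapping y, the result of forward propagation, in a Variable.
This guarantees that the output is always an ndarray instance.
Computations on 0-dimensional ndarray instances now keep all data as ndarray instances.

Step 10: Testing

Let's write some simple tests.

10.1 Unit Testing in Python

For testing in Python, the unittest module from the standard library is convenient.

# Test the square function implemented in the previous step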
# steps/step10.py
import unittest

class SquareTest(unittest.TestCase):
    def test_forward(self):
        x = Variable(np.array(2.0))
        y = square(x)
        expected = np.array(4.0)
        self.assertEqual(y.data, expected)

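We import unittest and implement a SquareTest class that inherits from unittest.TestCase.
The test checks that the output of the square function matches the expected value.
self.assertEqual checks whether the two given objects are equal.

$ python -m unittest steps/step10.py

Passing -m unittest to the python command runs the Python file in test mode.
Adding the following code at the end of the file runs the tests with just 'python steps/step10.py'.

unittest.main()

10.2 Testing Backpropagation of the square Function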

# steps/step10.py
class SquareTest(unittest.TestCase):
    def test_forward(self):
        x = Variable(np.array(2.0))
        y = square(x)
        expected = np.array(4.0)
        self.assertEqual(y.data, expected)

    def test_backward(self):
        x = Variable(np.array(3.0))
        y = square(x)
        y.backward()
        expected = np.array(6.0)
        self.assertEqual(x.grad, expected)

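The test_backward method is added.
y.backward() computes the derivative, and the test checks that it matches the expected value.

10.3 Automated Testing with Gradient Checking

Gradient checking is a verification technique that compares the result of numerical differentiation with the result of backpropagation; a large difference indicates a problem in the backpropagation implementation.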

# steps/step10.py
def numerical_diff(f, x, eps=1e-4):
    x0 = Variable(x.data - eps)
    x1 = Variable(x.data + eps)
    y0 = f(x0)
    y1 = f(x1)
    return (y1.data - y0.data) / (2 * eps)


class SquareTest(unittest.TestCase):
    def test_forward(self):
        x = Variable(np.array(2.0))
        y = square(x)
        expected = np.array(4.0)
        self.assertEqual(y.data, expected)

    def test_backward(self):
        x = Variable(np.array(3.0))
        y = square(x)
        y.backward()
        expected = np.array(6.0)
        self.assertEqual(x.grad, expected)

    def test_gradient_check(self):
        x = Variable(np.random.rand(1))
        y = square(x)
        y.backward()
        num_grad = numerical_diff(square, x)
        flg = np.allclose(x.grad, num_grad)
        self.assertTrue(flg)

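Inside test_gradient_check, the gradient-checking test, a single random input value is generated.
The derivative is computed by backpropagation and also by numerical differentiation with numerical_diff.
The test then checks that the values obtained by the two methods are nearly identical.
np.allclose(a, b) checks whether the values of the ndarray instances a and b are close.
How close counts as close is set by the rtol and atol arguments, as in np.allclose(a, b, rtol=1e-05, atol=1e-08).
It returns True if every element of a and b satisfies the following condition:

|a - b| <= (atol + rtol * |b|)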
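The condition can be evaluated by hand and compared with np.allclose. The arrays below are made-up values chosen so that the last element fails the check; the rest is a minimal sketch of the formula above.

```python
import numpy as np

a = np.array([1.0, 100.0, 1e-9, 1.0])
b = np.array([1.000001, 100.0001, 0.0, 1.001])

rtol, atol = 1e-05, 1e-08

# element-wise version of |a - b| <= (atol + rtol * |b|)
manual = np.abs(a - b) <= (atol + rtol * np.abs(b))

print(manual)
print(np.allclose(a, b, rtol=rtol, atol=atol))  # True only if every element passes
```

Note that the tolerance scales with |b| only, so swapping a and b can change the outcome for values near the boundary.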

Posted 2021-03-05 | Updated 2021-03-05 | 4 minutes read (About 665 words)

Pandas Pivot

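Pivoting builds a new data frame by choosing which columns of an existing data frame supply the index, the columns, and the values.
df.pivot(index, columns, values)
- Run pivot after a groupby, because pivot requires each (index, columns) pair to be unique
df.pivot_table(values, index, columns, aggfunc)
- Duplicate (index, columns) pairs are aggregated with aggfunc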
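The difference between the two calls can be sketched on a tiny made-up data set (the five rows below are hypothetical, not the Titanic data). Keyword arguments are used because recent pandas versions no longer accept positional arguments for pivot.

```python
import pandas as pd

# small hypothetical data set: one row per passenger
df = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male"],
    "Pclass": [1, 2, 1, 1, 2],
})

# pivot needs unique (index, columns) pairs, so count with groupby first
counts = df.groupby(["Sex", "Pclass"]).size().reset_index(name="counts")
by_pivot = counts.pivot(index="Sex", columns="Pclass", values="counts")

# pivot_table aggregates duplicate pairs itself
df["counts"] = 1
by_pivot_table = df.pivot_table(values="counts", index="Sex", columns="Pclass", aggfunc="sum")

print(by_pivot)
print(by_pivot_table)
```

Both tables should hold the same counts; pivot_table simply folds the groupby step into the call.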

pandas io

Saving and loading data frames

# load
titanic = pd.read_csv("datas/train.csv")
titanic.tail(2)

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
889	890	1	1	Behr, Mr. Karl Howell	male	26.0	0	0	111369	30.00	C148	C
890	891	0	3	Dooley, Mr. Patrick	male	32.0	0	0	370376	7.75	NaN	Q

# save
titanic.to_csv("datas/titanic.tsv", sep="\t", index=False)

# load : encoding
df = pd.read_csv("datas/2014_p.csv", encoding="euc_kr")
df.tail(2)

	ID	RCTRCK	RACE_DE	RACE_NO	PARTCPT_NO	RANK	RCHOSE_NM	HRSMN	RCORD	ARVL_DFFRNC	EACH_SCTN_PASAGE_RANK	A_WIN_SYTM_EXPECT_ALOT	WIN_STA_EXPECT_ALOT
27216	27217	제주	2014-11-29	9	7	6.0	미주여행	김경휴	0:01:31.1	13	2 - - - 2 - 3 - 6	6.2	9.4
27217	27218	제주	2014-11-29	9	6	1.0	철옹성	장우성	0:01:26.6	NaN	1 - - - 1 - 1 - 1	3.9	2.9

kaggle

A service for competing on data analysis and modeling
https://www.kaggle.com/

1. Number of records by sex and passenger class

df1 = titanic.groupby(["Sex", "Pclass"]).size().reset_index(name="counts")
df1

	Sex	Pclass	counts
0	female	1	94
1	female	2	76
2	female	3	144
3	male	1	122
4	male	2	108
5	male	3	347

# pivot (keyword arguments; positional arguments are rejected by recent pandas versions)
result = df1.pivot(index="Sex", columns="Pclass", values="counts")
result

Pclass	1	2	3
Sex
female	94	76	144
male	122	108	347

# using pivot_table
titanic["counts"] = 1
titanic.tail(2)

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked	counts
889	890	1	1	Behr, Mr. Karl Howell	male	26.0	0	0	111369	30.00	C148	C	1
890	891	0	3	Dooley, Mr. Patrick	male	32.0	0	0	370376	7.75	NaN	Q	1

result = titanic.pivot_table(values="counts", index=["Pclass"], columns=["Survived"], aggfunc="sum")
result

Survived	0	1
Pclass
1	80	136
2	97	87
3	372	119

result["total"] = result[0] + result[1]
result

Survived	0	1	total
Pclass
1	80	136	216
2	97	87	184
3	372	119	491

result.loc["total"] = result.loc[1] + result.loc[2] + result.loc[3]
result

Survived	0	1	total
Pclass
1	80	136	216
2	97	87	184
3	372	119	491
total	549	342	891

df1 = pd.read_csv("datas/2014_p.csv", encoding="euc-kr")
df1.tail(2)

	ID	RCTRCK	RACE_DE	RACE_NO	PARTCPT_NO	RANK	RCHOSE_NM	HRSMN	RCORD	ARVL_DFFRNC	EACH_SCTN_PASAGE_RANK	A_WIN_SYTM_EXPECT_ALOT	WIN_STA_EXPECT_ALOT
27216	27217	제주	2014-11-29	9	7	6.0	미주여행	김경휴	0:01:31.1	13	2 - - - 2 - 3 - 6	6.2	9.4
27217	27218	제주	2014-11-29	9	6	1.0	철옹성	장우성	0:01:26.6	NaN	1 - - - 1 - 1 - 1	3.9	2.9

df2 = pd.read_csv("datas/2014_s.csv", encoding="euc-kr")
df2.tail(2)

	ID	RCTRCK	RACE_DE	PRDCTN_NATION_NM	SEX	AGE	BND_WT	TRNER	RCHOSE_OWNR_NM	RCHOSE_BDWGH
27216	27217	제주	2014-11-29	한	거	NaN	53.0	강대은	김기준	281
27217	27218	제주	2014-11-29	한	거	NaN	57.5	박병진	강상우	314

Posted 2021-03-05 | Updated 2021-03-05 | 7 minutes read (About 1105 words)

summary

pandas
- Data analysis: the data preprocessing part
- A Python library for processing tabular data
- Series, DataFrame
- Series
  - How to create, select, and modify
- DataFrame
  - Creation method 1: a dictionary of lists: each list becomes a column
  - Creation method 2: a list of dictionaries: each dictionary becomes a row
  - Selecting a row: df.loc[idx]
  - Selecting a column: df[column name]
  - Selecting a row and column: df.loc[idx, column]
  - Functions
    - apply, append, concat
    - groupby, merge

import makedata

makedata.get_age(), makedata.get_name()

(21, 'Billy')

makedata.make_data()

[{'Age': 32, 'Name': 'Alvin'},
 {'Age': 26, 'Name': 'Alan'},
 {'Age': 25, 'Name': 'Anthony'},
 {'Age': 40, 'Name': 'Anthony'},
 {'Age': 35, 'Name': 'Billy'},
 {'Age': 39, 'Name': 'Anthony'},
 {'Age': 30, 'Name': 'Andrew'},
 {'Age': 24, 'Name': 'Andrew'},
 {'Age': 31, 'Name': 'Anthony'},
 {'Age': 40, 'Name': 'Andrew'}]

quiz

Build a data frame using the makedata module
user_df
- Create a data frame containing data for 8 people.
- UserID : 1 ~ 8
- Name : makedata.get_name()
- Age : makedata.get_age()
- No duplicate Name values

# Dictionary of lists : UserID, Name, Age
datas = {}
datas["UserID"] = list(range(1, 9))
datas["Age"] = [makedata.get_age() for _ in range(8)]
names = []
while True:
    name = makedata.get_name()
    if name not in names:
        names.append(name)
    if len(names) >= 8:
        break
datas["Name"] = names

user_df = pd.DataFrame(datas)
user_df

	UserID	Age	Name
0	1	22	Anchal
1	2	35	Andrew
2	3	29	Anthony
3	4	21	Billy
4	5	35	Arnold
5	6	32	Alan
6	7	34	Alvin
7	8	22	Adam

# Add dictionary rows to the data frame one at a time : UserID, Name, Age
user_df = pd.DataFrame(columns=["UserID", "Name", "Age"])

for idx in range(1, 9):

    name = makedata.get_name()
    while name in list(user_df["Name"]):
        name = makedata.get_name()

    data = {"Name": name, "UserID": idx, "Age": makedata.get_age()}

    user_df.loc[len(user_df)] = data

user_df

	UserID	Name	Age
0	1	Billy	23
1	2	Adam	26
2	3	Anchal	23
3	4	Alan	39
4	5	Alvin	20
5	6	Andrew	32
6	7	Arnold	22
7	8	Alex	30

quiz

Build money_df
- 15 rows
- ID : random integers from 1 to 8
- Money : amounts from 1,000 to 20,000 won in steps of 1,000

# Add dictionary rows to the data frame one at a time
money_df = pd.DataFrame(columns=["ID", "Money"])
# np.random.randint(1, 9)
for _ in range(15):
    money_df.loc[len(money_df)] = {
        "ID": np.random.randint(1, 9),
        "Money": np.random.randint(1, 21) * 1000,
    }
    
# Check the unique values in a column
ids = money_df["ID"].unique()
ids.sort()
len(ids), ids

(6, array([1, 2, 3, 5, 6, 7], dtype=object))

money_df

	ID	Money
0	5	20000
1	6	5000
2	2	9000
3	7	4000
4	3	13000
5	2	14000
6	1	3000
7	1	16000
8	2	6000
9	6	13000
10	7	9000
11	1	16000
12	1	10000
13	2	15000
14	7	14000

user_df

	UserID	Name	Age
0	1	Billy	23
1	2	Adam	26
2	3	Anchal	23
3	4	Alan	39
4	5	Alvin	20
5	6	Andrew	32
6	7	Arnold	22
7	8	Alex	30

1. merge

user_df.merge(money_df, left_on="UserID", right_on="ID")

	UserID	Name	Age	ID	Money
0	1	Billy	23	1	3000
1	1	Billy	23	1	16000
2	1	Billy	23	1	16000
3	1	Billy	23	1	10000
4	2	Adam	26	2	9000
5	2	Adam	26	2	14000
6	2	Adam	26	2	6000
7	2	Adam	26	2	15000
8	3	Anchal	23	3	13000
9	5	Alvin	20	5	20000
10	6	Andrew	32	6	5000
11	6	Andrew	32	6	13000
12	7	Arnold	22	7	4000
13	7	Arnold	22	7	9000
14	7	Arnold	22	7	14000

# rename a column
user_df.rename(columns={"UserID":"ID"}, inplace=True)
user_df.tail(1)

	ID	Name	Age
7	8	Alex	30

user_df.merge(money_df).tail(2)

	ID	Name	Age	Money
13	7	Arnold	22	9000
14	7	Arnold	22	14000

result_df = pd.merge(money_df, user_df)
result_df.tail()

	ID	Money	Name	Age
10	3	13000	Anchal	23
11	1	3000	Billy	23
12	1	16000	Billy	23
13	1	16000	Billy	23
14	1	10000	Billy	23

# groupby : sum, size, min .. functions : Series
money_list = result_df.groupby("Name").sum()["Money"].reset_index()
money_list

	Name	Money
0	Adam	44000
1	Alvin	20000
2	Anchal	13000
3	Andrew	18000
4	Arnold	27000
5	Billy	45000

# groupby : agg("sum"), agg("mean") .. : DataFrame
money_list = result_df.groupby("Name").agg("sum").reset_index()[["Name", "Money"]]
money_list

	Name	Money
0	Adam	44000
1	Alvin	20000
2	Anchal	13000
3	Andrew	18000
4	Arnold	27000
5	Billy	45000

# merge : money_list, user_df : outer

result = pd.merge(user_df, money_list, how="outer")
result

	ID	Name	Age	Money
0	1	Billy	23	45000.0
1	2	Adam	26	44000.0
2	3	Anchal	23	13000.0
3	4	Alan	39	NaN
4	5	Alvin	20	20000.0
5	6	Andrew	32	18000.0
6	7	Arnold	22	27000.0
7	8	Alex	30	NaN

# fillna : fills NaN with a given value

result.fillna(value=0, inplace=True)
result

	ID	Name	Age	Money
0	1	Billy	23	45000.0
1	2	Adam	26	44000.0
2	3	Anchal	23	13000.0
3	4	Alan	39	0.0
4	5	Alvin	20	20000.0
5	6	Andrew	32	18000.0
6	7	Arnold	22	27000.0
7	8	Alex	30	0.0

# change the data type of the Money column to integer
result.dtypes

ID         int64
Name      object
Age        int64
Money    float64
dtype: object

result["Money"] = result["Money"].astype("int")
result

	ID	Name	Age	Money
0	1	Billy	23	45000
1	2	Adam	26	44000
2	3	Anchal	23	13000
3	4	Alan	39	0
4	5	Alvin	20	20000
5	6	Andrew	32	18000
6	7	Arnold	22	27000
7	8	Alex	30	0

result.sort_values("Money", ascending=False)

	ID	Name	Age	Money
0	1	Billy	23	45000
1	2	Adam	26	44000
6	7	Arnold	22	27000
4	5	Alvin	20	20000
5	6	Andrew	32	18000
2	3	Anchal	23	13000
3	4	Alan	39	0
7	8	Alex	30	0

np.average(result.sort_values("Money", ascending=False)[:3]["Age"])

23.666666666666668

Posted 2021-03-05 | Updated 2021-03-05 | 12 minutes read (About 1860 words)

Pandas

An easy-to-use, high-performance open-source Python library for data analysis
Pandas compared with R
- Pandas is easier to learn than R.
- Pandas performs better than R.
- Python can be applied to more fields than R.
It mainly uses two data types.
- Series : a data type made up of an index and values
- DataFrame : a data type made up of an index, columns, and values

import numpy as np
import pandas as pd

1. Series

A Series holds values of a single data type.

# Series : if only the values are given, the index is set automatically starting from 0.
data = pd.Series(np.random.randint(10, size=5))
data

0    3
1    4
2    5
3    0
4    3
dtype: int32

# set the index
data = pd.Series(np.random.randint(10, size=5), index=list('ABCDE'))
data

A    0
B    7
C    1
D    8
E    2
dtype: int32

data.index, data.values

(Index(['A', 'B', 'C', 'D', 'E'], dtype='object'), array([0, 7, 1, 8, 2]))

data["B"], data.B

(7, 7)

data["C"] = 10
data

A     0
B     7
C    10
D     8
E     2
dtype: int32

# broadcasting
data * 10

A      0
B     70
C    100
D     80
E     20
dtype: int32

data[["B","C"]]

B     7
C    10
dtype: int32

# offset index
data[2::2]

C    10
E     2
dtype: int32

data[::-1]

E     2
D     8
C    10
B     7
A     0
dtype: int32

# Series operations

data

A     0
B     7
C    10
D     8
E     2
dtype: int32

data2 = pd.Series({"D":3, "E":5, "F":7})
data2

D    3
E    5
F    7
dtype: int64

result = data + data2
result  # labels present in only one Series become NaN

A     NaN
B     NaN
C     NaN
D    11.0
E     7.0
F     NaN
dtype: float64

result.isnull()

A     True
B     True
C     True
D    False
E    False
F     True
dtype: bool

result[result.isnull()] = data
result

A     0.0
B     7.0
C    10.0
D    11.0
E     7.0
F     NaN
dtype: float64

result[result.isnull()] = data2
result

A     0.0
B     7.0
C    10.0
D    11.0
E     7.0
F     7.0
dtype: float64
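The two-step NaN filling above can also be expressed in a single call. The sketch below is an alternative rather than the approach used in the post: it rebuilds the same two Series with fixed values and relies on Series.add with fill_value, which treats a label missing from one side as 0.

```python
import pandas as pd

data = pd.Series([0, 7, 10, 8, 2], index=list("ABCDE"))
data2 = pd.Series({"D": 3, "E": 5, "F": 7})

# a label missing from one Series is treated as 0 instead of producing NaN
result = data.add(data2, fill_value=0)
print(result)
```

The result covers the union of both indexes, A through F, with no NaN left to patch afterwards.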

2. DataFrame

A data frame is made up of multiple Series.
Values in the same column share the same data type.

# Create a data frame 1 : dictionary of lists

datas = {
    "name":["dss", "fcamp"],
    "Email":["dss@gmail.com","fcamp@daum.net"]
}
datas

{'name': ['dss', 'fcamp'], 'Email': ['dss@gmail.com', 'fcamp@daum.net']}

df = pd.DataFrame(datas)
df

	name	Email
0	dss	dss@gmail.com
1	fcamp	fcamp@daum.net

# Create a data frame 2 : list of dictionaries

datas = [
    {"name":"dss", "email":"dss@gmail.com"},
    {"name":"fcamp", "email":"fcamp@daum.net"},
]
datas

[{'name': 'dss', 'email': 'dss@gmail.com'},
 {'name': 'fcamp', 'email': 'fcamp@daum.net'}]

df = pd.DataFrame(datas)
df

	name	email
0	dss	dss@gmail.com
1	fcamp	fcamp@daum.net

# how to set the index
df = pd.DataFrame(datas, index = ["one", "two"])
df

	name	email
one	dss	dss@gmail.com
two	fcamp	fcamp@daum.net

df.index

Index(['one', 'two'], dtype='object')

df.columns

Index(['name', 'email'], dtype='object')

df.values

array([['dss', 'dss@gmail.com'],
       ['fcamp', 'fcamp@daum.net']], dtype=object)

# Selecting data from a data frame : row, column, (row, column)

# select a row : loc
df = pd.DataFrame(datas)
df

	name	email
0	dss	dss@gmail.com
1	fcamp	fcamp@daum.net

df.loc[1]["email"]

'fcamp@daum.net'

# updates the row if the index exists, adds a new row otherwise
df.loc[2] = {"name":"andy", "email":"andy@naver.com"}
df

	name	email
0	dss	dss@gmail.com
1	fcamp	fcamp@daum.net
2	andy	andy@naver.com

# select a column

df["name"]

0      dss
1    fcamp
2     andy
Name: name, dtype: object

df["id"] = ""
df

	name	email
0	dss	dss@gmail.com
1	fcamp	fcamp@daum.net
2	andy	andy@naver.com

df["id"] = range(1, 4)  # or np.arange(1, 4)
df

	name	email	id
0	dss	dss@gmail.com	1
1	fcamp	fcamp@daum.net	2
2	andy	andy@naver.com	3

df.dtypes

name     object
email    object
id        int32
dtype: object

# select rows and columns

df.loc[[0,2], ["email", "id"]]

	email	id
0	dss@gmail.com	1
2	andy@naver.com	3

# set the column order

df[["id", "name", "email"]]

	id	name	email
0	1	dss	dss@gmail.com
1	2	fcamp	fcamp@daum.net
2	3	andy	andy@naver.com

# head, tail

df.head(2)

	name	email	id
0	dss	dss@gmail.com	1
1	fcamp	fcamp@daum.net	2

df.tail(2)

	name	email	id
1	fcamp	fcamp@daum.net	2
2	andy	andy@naver.com	3

3. apply function

Similar to the map function

# Extract only the mail domain from the email column and create a new domain column
df

	name	email	id
0	dss	dss@gmail.com	1
1	fcamp	fcamp@daum.net	2
2	andy	andy@naver.com	3

def domain(email):
    return email.split("@")[1].split(".")[0]

domain(df.loc[0]["email"])

'gmail'

df["domain"] = df["email"].apply(domain)

df

	name	email	id	domain
0	dss	dss@gmail.com	1	gmail
1	fcamp	fcamp@daum.net	2	daum
2	andy	andy@naver.com	3	naver

df["domain"] = df["email"].apply(lambda email: email.split("@")[1].split(".")[0])
df

	name	email	id	domain
0	dss	dss@gmail.com	1	gmail
1	fcamp	fcamp@daum.net	2	daum
2	andy	andy@naver.com	3	naver
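For string columns, the same extraction can be written without apply by chaining the .str accessor. This is a sketch of an alternative, not the method used in the post; the three rows are copied from the example data frame above.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["dss", "fcamp", "andy"],
    "email": ["dss@gmail.com", "fcamp@daum.net", "andy@naver.com"],
})

# take the part after "@", then the part before the first "."
df["domain"] = df["email"].str.split("@").str[1].str.split(".").str[0]
print(df["domain"].tolist())
```

apply remains the more general tool, since it accepts any Python function, while the .str accessor covers common string operations.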

from makedata import *

get_name()

'Alvin'

get_age()

df1= pd.DataFrame(make_data(5))
df2= pd.DataFrame(make_data(5))
df2

	Age	Name
0	40	Billy
1	32	Anchal
2	35	Alvin
3	22	Andrew
4	27	Andrew

4. append

# append : combine data frames
df3 = df1.append(df2)
df3[2:7]

	Age	Name
2	34	Alvin
3	29	Billy
4	25	Anchal
0	40	Billy
1	32	Anchal

# reset_index : rebuild the index
df3.reset_index(drop=True, inplace=True)
df3

	Age	Name
0	21	Alan
1	27	Alan
2	34	Alvin
3	29	Billy
4	25	Anchal
5	40	Billy
6	32	Anchal
7	35	Alvin
8	22	Andrew
9	27	Andrew

df3 = df1.append(df2, ignore_index=True)
df3

	Age	Name
0	21	Alan
1	27	Alan
2	34	Alvin
3	29	Billy
4	25	Anchal
5	40	Billy
6	32	Anchal
7	35	Alvin
8	22	Andrew
9	27	Andrew
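DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the append calls above only run on older versions. A minimal sketch of the equivalent with pd.concat follows; the two small frames are made-up stand-ins for df1 and df2.

```python
import pandas as pd

df1 = pd.DataFrame({"Age": [21, 27], "Name": ["Alan", "Alan"]})
df2 = pd.DataFrame({"Age": [40, 32], "Name": ["Billy", "Anchal"]})

# equivalent of df1.append(df2, ignore_index=True)
df3 = pd.concat([df1, df2], ignore_index=True)
print(df3)
```

ignore_index=True renumbers the rows from 0, which replaces the separate reset_index(drop=True) step.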

5. concat

Used to combine data frames along rows or columns

df3 = pd.concat([df1, df2]).reset_index(drop=True)
df3

	Age	Name
0	21	Alan
1	27	Alan
2	34	Alvin
3	29	Billy
4	25	Anchal
5	40	Billy
6	32	Anchal
7	35	Alvin
8	22	Andrew
9	27	Andrew

pd.concat([df3, df1], axis=1, join="inner")

	Age	Name	Age	Name
0	21	Alan	21	Alan
1	27	Alan	27	Alan
2	34	Alvin	34	Alvin
3	29	Billy	29	Billy
4	25	Anchal	25	Anchal

6. group by

Combines rows that share the same value in a given column into a new data frame

df = pd.DataFrame(make_data())
df

	Age	Name
0	35	Alvin
1	26	Arnold
2	26	Jin
3	23	Anchal
4	30	Adam
5	21	Arnold
6	33	Adam
7	21	Adam
8	24	Alvin
9	32	Andrew

# size
result_df = df.groupby("Name").size().reset_index(name="count")
result_df

	Name	count
0	Adam	3
1	Alvin	2
2	Anchal	1
3	Andrew	1
4	Arnold	2
5	Jin	1

# sort_values : sorts the data frame by the given column

result_df.sort_values(["count"], ascending = False, inplace = True)
result_df.reset_index(drop=True, inplace=True)
result_df

	Name	count
0	Adam	3
1	Alvin	2
2	Arnold	2
3	Anchal	1
4	Andrew	1
5	Jin	1

# agg
# size(), min(), max(), mean()

df.groupby("Name").agg("min").reset_index()

	Name	Age
0	Adam	21
1	Alvin	24
2	Anchal	23
3	Andrew	32
4	Arnold	21
5	Jin	26

# a function that summarizes the data
df.describe()

	Age
count	10.000000
mean	27.100000
std	5.087021
min	21.000000
25%	23.250000
50%	26.000000
75%	31.500000
max	35.000000

7. Merge = sql(join)

Combines two or more data frames and returns the result, like a SQL join
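The SQL analogy can be sketched with two small made-up tables; the names users and money and their contents are assumptions for this example. The how argument selects the join type.

```python
import pandas as pd

users = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Billy", "Adam", "Anchal"]})
money = pd.DataFrame({"ID": [1, 1, 2], "Money": [3000, 16000, 9000]})

inner = pd.merge(users, money, on="ID", how="inner")  # like SQL INNER JOIN
left = pd.merge(users, money, on="ID", how="left")    # like SQL LEFT OUTER JOIN

print(inner)
print(left)
```

The inner join drops the user with no matching money rows, while the left join keeps that user and fills Money with NaN.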

Posted 2021-03-05 | Updated 2021-03-05 | 2 minutes read (About 336 words)

summary

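numpy : a package that performs linear algebra operations quickly
Creating arrays 1 : ndarray, np.array(iterable)
Creating arrays 2 : ones, zeros
Selecting array data : array[x, y, z]
Modifying array data
- Select the array data
- =, > (value: scalar, vector, matrix)
- The concept of broadcasting
arange : the counterpart of range for lists; the result is an ndarray

### quiz
- Create an 8*8 matrix of random integers from 100 up to (but not including) 130,
- then replace multiples of 3 with the string fiz, multiples of 5 with buz, and multiples of both 3 and 5 with fbz
- Random matrix data

datas = np.random.randint(100, 130, size=(8, 8))

- Converting the data type from integer to string : ndarray.astype()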

import numpy as np
datas = np.random.randint(100, 130, size=(8, 8))
datas

array([[102, 102, 108, 128, 102, 114, 121, 111],
       [125, 118, 112, 105, 110, 119, 111, 114],
       [127, 109, 127, 101, 117, 113, 123, 109],
       [110, 123, 128, 102, 124, 127, 103, 109],
       [104, 106, 123, 115, 118, 117, 106, 110],
       [104, 120, 109, 108, 120, 126, 109, 101],
       [111, 107, 100, 118, 118, 118, 108, 104],
       [118, 111, 102, 126, 126, 120, 108, 115]])

data1 = np.array([1,2,3])
data2 = [True, False, True]
data1[data2]

array([1, 3])

# Build True/False matrices marking the positions of multiples of 3, 5, and 15
idx_3 = datas % 3 == 0
idx_5 = datas % 5 == 0
idx_15 = datas % 15 == 0

# Convert the data type to str
datas.dtype

dtype('int32')

result = datas.astype("str")
result

array([['113', '123', '107', '126', '102', '116', '109', '102'],
       ['103', '106', '103', '129', '109', '109', '115', '104'],
       ['125', '120', '103', '114', '102', '103', '129', '102'],
       ['114', '107', '120', '107', '118', '103', '110', '121'],
       ['101', '113', '114', '124', '101', '126', '115', '109'],
       ['125', '121', '101', '116', '124', '121', '108', '108'],
       ['129', '119', '119', '129', '108', '122', '114', '108'],
       ['101', '103', '126', '120', '127', '109', '127', '105']],
      dtype='<U11')

# Use the True/False matrices to select the data matching each condition, then assign the value by broadcasting
result[idx_3] = "fiz"
result[idx_5] = "buz"
result[idx_15] = "fbz"

result

array([['113', 'fiz', '107', 'fiz', 'fiz', '116', '109', 'fiz'],
       ['103', '106', '103', 'fiz', '109', '109', 'buz', '104'],
       ['buz', 'fbz', '103', 'fiz', 'fiz', '103', 'fiz', 'fiz'],
       ['fiz', '107', 'fbz', '107', '118', '103', 'buz', '121'],
       ['101', '113', 'fiz', '124', '101', 'fiz', 'buz', '109'],
       ['buz', '121', '101', '116', '124', '121', 'fiz', 'fiz'],
       ['fiz', '119', '119', 'fiz', 'fiz', '122', 'fiz', 'fiz'],
       ['101', '103', 'fiz', 'fbz', '127', '109', '127', 'fbz']],
      dtype='<U11')
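
The whole quiz can be condensed into a short sketch. A small fixed array is used here instead of random data (an assumption, so that the result is predictable):

```python
import numpy as np

# Fixed sample values instead of np.random.randint, so the result is reproducible
datas = np.array([[101, 102, 105],
                  [110, 120, 113]])

# Convert to strings first so that text labels can be stored in the array
result = datas.astype("str")

# Order matters: the 15 mask is applied last and overwrites fiz/buz
result[datas % 3 == 0] = "fiz"
result[datas % 5 == 0] = "buz"
result[datas % 15 == 0] = "fbz"

print(result)
```

The masks are computed from the integer array `datas`, not from `result`, because the modulo operator is not defined for the string array.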

Quiz

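Create a 5*5 matrix of random numbers from 1 to 20
Write code that replaces the maximum value with the string MAX and the minimum value with MIN
np.min(ndarray), np.max(ndarray)

datas = np.random.randint(1, 20, (5, 5))  # the upper bound 20 is exclusive
datas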

array([[ 9,  6, 10, 19,  4],
       [14,  8,  6,  6,  6],
       [15, 14,  6, 17, 12],
       [ 5,  9,  6, 13,  8],
       [16,  3,  9, 10, 13]])

min_num, max_num = np.min(datas), np.max(datas)
min_num, max_num

(3, 19)

idx_min = datas == min_num
idx_max = datas == max_num

idx_min

array([[False, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False],
       [False,  True, False, False, False]])

idx_max

array([[False, False, False,  True, False],
       [False, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False]])

result = datas.astype("str")
result

array([['9', '6', '10', '19', '4'],
       ['14', '8', '6', '6', '6'],
       ['15', '14', '6', '17', '12'],
       ['5', '9', '6', '13', '8'],
       ['16', '3', '9', '10', '13']], dtype='<U11')

result[idx_min] = "MIN"
result[idx_max] = "MAX"
result

array([['9', '6', '10', 'MAX', '4'],
       ['14', '8', '6', '6', '6'],
       ['15', '14', '6', '17', '12'],
       ['5', '9', '6', '13', '8'],
       ['16', 'MIN', '9', '10', '13']], dtype='<U11')

1. linspace, logspace 함수

linspace : returns values at positions that divide the given range linearly
logspace : returns values at positions that divide the given range on a log scale

# linspace
np.linspace(0, 100, 5)

array([  0.,  25.,  50.,  75., 100.])

# logspace
# Log10(X1)=2, log10(X2)=3, log10(X3)=4
np.logspace(2, 4, 3)

array([  100.,  1000., 10000.])

# Salary is $100000 at age 30 and $1000000 at age 60
# Print the salaries at ages 40 and 50 when salary grows linearly and when it grows exponentially

age_30 = 100000
age_60 = 1000000

np.linspace(age_30, age_60, 4)

array([ 100000.,  400000.,  700000., 1000000.])

np.logspace(np.log10(age_30), np.log10(age_60), 4)

array([ 100000.        ,  215443.46900319,  464158.88336128,
       1000000.        ])
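
A small check of what "linear" and "log" division mean in the salary example: linspace keeps the difference between neighbours constant, logspace keeps the ratio constant. The variable names `linear`, `geometric`, and `ratios` are illustrative.

```python
import numpy as np

age_30, age_60 = 100_000, 1_000_000

# Equal differences between consecutive values
linear = np.linspace(age_30, age_60, 4)

# Equal ratios between consecutive values (endpoints are given as exponents of 10)
geometric = np.logspace(np.log10(age_30), np.log10(age_60), 4)

steps = np.diff(linear)                  # constant step
ratios = geometric[1:] / geometric[:-1]  # constant growth factor

print(steps)
print(ratios)
```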

2. numpy random

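seed : sets the seed that determines the random values
rand : generates random values from a uniform distribution
randn : generates random numbers from a normal distribution
randint : generates integers from a uniform distribution
shuffle : shuffles array data
choice : selects data with the given probabilities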

# seed
np.random.seed(1)
result1 = np.random.randint(10, 100, 10)

np.random.seed(1)
result2 = np.random.randint(10, 100, 10)

np.random.seed(2)
result3 = np.random.randint(10, 100, 10)

result1, result2, result3

(array([47, 22, 82, 19, 85, 15, 89, 74, 26, 11]),
 array([47, 22, 82, 19, 85, 15, 89, 74, 26, 11]),
 array([50, 25, 82, 32, 53, 92, 85, 17, 44, 59]))
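
The seed behaviour shown above can be stated as a simple check: the same seed gives the same sequence. A minimal sketch (the names `first` and `second` are illustrative):

```python
import numpy as np

np.random.seed(1)
first = np.random.randint(10, 100, 10)

# Resetting the same seed restarts the same pseudo-random sequence
np.random.seed(1)
second = np.random.randint(10, 100, 10)

print(np.array_equal(first, second))
```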

np.random.rand(10)

array([0.20464863, 0.61927097, 0.29965467, 0.26682728, 0.62113383,
       0.52914209, 0.13457995, 0.51357812, 0.18443987, 0.78533515])

# shuffle
r = np.random.randint(1, 10, (3, 4))
r

array([[2, 3, 5, 8],
       [4, 5, 1, 2],
       [1, 9, 6, 9]])

np.random.shuffle(r)
r

array([[1, 9, 6, 9],
       [4, 5, 1, 2],
       [2, 3, 5, 8]])

# choice
# p needs one probability per candidate: a=5 means 5 candidates (0..4) but p has only 4 entries, so this call fails
np.random.choice(5, 10, p=[0.1, 0.4, 0.2, 0.3])

ValueError: 'a' and 'p' must have same size
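
The ValueError above arises because `a=5` defines five candidates while `p` lists only four probabilities. One possible fix, assuming five probabilities that sum to 1 (the specific values here are made up for illustration):

```python
import numpy as np

np.random.seed(1)

# One probability for each of the 5 candidates 0..4; the values must sum to 1
picks = np.random.choice(5, 10, p=[0.1, 0.4, 0.2, 0.2, 0.1])

print(picks)
```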

# unique
number, counts = np.unique(r, return_counts=True)
print(number)
print(counts)

[1 2 3 4 5 6 8 9]
[2 2 1 1 2 1 1 2]

3. Combining array data

concatenate

na1 = np.random.randint(10, size=(2,3))
na2 = np.random.randint(10, size=(3,2))
na3 = np.random.randint(10, size=(3,3))

# Combine arrays
na1

array([[9, 3, 0],
       [3, 2, 6]])

na2

array([[8, 1],
       [8, 8],
       [2, 9]])

na3

array([[2, 0, 0],
       [2, 7, 7],
       [2, 3, 7]])

np.concatenate((na1, na3))

array([[9, 3, 0],
       [3, 2, 6],
       [2, 0, 0],
       [2, 7, 7],
       [2, 3, 7]])

np.concatenate((na2, na3), axis=1)

array([[8, 1, 2, 0, 0],
       [8, 8, 2, 7, 7],
       [2, 9, 2, 3, 7]])

# c_, r_
np.c_[np.array([1,2,3]), np.array([4,5,6])]

array([[1, 4],
       [2, 5],
       [3, 6]])

np.r_[np.array([1,2,3]), np.array([4,5,6])]

array([1, 2, 3, 4, 5, 6])
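
A short sketch of the shape rule behind these calls: when concatenating, every axis except the one being joined must match. The arrays `a`, `b`, and `c` are illustrative stand-ins for na1, na3, and na2.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # like na1: shape (2, 3)
b = np.arange(9).reshape(3, 3)   # like na3: shape (3, 3)
c = np.arange(6).reshape(3, 2)   # like na2: shape (3, 2)

rows = np.concatenate((a, b))          # join along axis 0: column counts must match
cols = np.concatenate((c, b), axis=1)  # join along axis 1: row counts must match

paired = np.c_[np.array([1, 2, 3]), np.array([4, 5, 6])]  # 1-D arrays become columns
flat = np.r_[np.array([1, 2, 3]), np.array([4, 5, 6])]    # 1-D arrays joined end to end

print(rows.shape, cols.shape, paired.shape, flat.shape)
```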

# split, var, std, mean ...

Posted 2021-03-05 | Updated 2021-03-05 | 2 minutes read (About 265 words)

Numpy

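Data is represented as matrices
Matrix data has to be computed quickly.
A package that quickly handles creating, modifying, and computing matrix data
Features
- Written in C, C++, and Fortran
- Fast linear algebra operations
  - scalar, vector, matrix

import numpy as np

# Create array data
# ndarray : can hold values of only one data type

arrary = np.array([1, 2, 3])
type(arrary), arrary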

(numpy.ndarray, array([1, 2, 3]))

arrary2 = np.array(
    [[1, 2, 3],
    [4, 5, 6]],
)
arrary2, arrary2.ndim, arrary2.shape

(array([[1, 2, 3],
        [4, 5, 6]]),
 2,
 (2, 3))

# Change the shape of the array
arrary2.reshape(3,2)

array([[1, 2],
       [3, 4],
       [5, 6]])

# Selecting array data : offset index : masking

arrary2[1][::-1]

array([6, 5, 4])

arrary2[1, 2]  # same element as arrary2[1][2]

# Modify data
ls = [1,2,3]
ls[1] = 5
ls

[1, 5, 3]

arrary2[1][2] = 10
arrary2

array([[ 1,  2,  3],
       [ 4,  5, 10]])

# Broadcasting
arrary2[0] = 0
arrary2

array([[ 0,  0,  0],
       [ 4,  5, 10]])

arrary2[0] = [7, 8, 9]
arrary2

array([[ 7,  8,  9],
       [ 4,  5, 10]])

# Select by condition
idx = arrary2 > 7
idx

array([[False,  True,  True],
       [False, False,  True]])

arrary2[idx]

array([ 8,  9, 10])

arrary2[idx] = 100
arrary2

array([[  7, 100, 100],
       [  4,   5, 100]])
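
The select-then-assign pattern above in one self-contained sketch. Note that selecting with a mask returns a copy, so the selected values are unaffected by the later assignment (`arr`, `mask`, and `selected` are illustrative names):

```python
import numpy as np

arr = np.array([[7, 8, 9],
                [4, 5, 10]])

mask = arr > 7        # True/False array with the same shape as arr
selected = arr[mask]  # 1-D copy of the values where the mask is True

arr[mask] = 100       # the scalar is broadcast to every selected position

print(selected)
print(arr)
```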

# Creating array data 2
data = np.zeros((2,3))
data

array([[0., 0., 0.],
       [0., 0., 0.]])

data.dtype

dtype('float64')

data2 = data.astype("int")
data2.dtype

dtype('int32')

data = np.ones((2, 3, 2))
data

array([[[1., 1.],
        [1., 1.],
        [1., 1.]],

       [[1., 1.],
        [1., 1.],
        [1., 1.]]])

# arange
np.arange(5)

array([0, 1, 2, 3, 4])

np.arange(5, 10)

array([5, 6, 7, 8, 9])

np.arange(5, 10, 2)

array([5, 7, 9])
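
A small comparison of arange with the built-in range, as a sketch: the arguments mean the same thing (start, stop, step), but arange returns an ndarray and also accepts a float step. The names `ints` and `quarters` are illustrative.

```python
import numpy as np

ints = np.arange(5, 10, 2)         # same values as range(5, 10, 2), but as an ndarray
quarters = np.arange(0, 1, 0.25)   # float steps are allowed, unlike range

print(type(ints), ints)
print(quarters)
```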

Posted 2021-02-22 | Updated 2021-03-05 | programming | 2 minutes read (About 321 words)

Jupyter notebook

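mode
- Command mode (esc) : used to edit the cells themselves
- Edit mode (enter) : used to edit the contents of a cell
style
- markdown (command mode + m) : used to write descriptions in a cell
- code (command mode + y) : used to write Python code
Shortcuts
- Run cell : shift + enter
- Delete cell : (command mode) x
- Undo : (command mode) z
- Create cell : (command mode) a (above), b (below)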

1+2

Magic Command

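Commands that behave specially inside a cell
% : runs a single-line magic command
%% : runs a cell-level magic command
Main magic commands
- pwd : path of the current Jupyter notebook file
- ls : list of files in the current directory
- whos : prints the variables currently declared
- reset : deletes the variables currently declared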

%pwd

'C:\\Code\\01_python'

%ls

 Volume in drive C is Windows10
 Volume Serial Number is E625-BBFB

 Directory of C:\Code\01_python

2021-01-21  PM 10:03    <DIR>          .
2021-01-21  PM 10:03    <DIR>          ..
2021-01-21  PM 09:53    <DIR>          .ipynb_checkpoints
2021-01-21  PM 10:03             1,905 01_jupyter_notebook.ipynb
               1 File(s)          1,905 bytes
               3 Dir(s)  67,438,399,488 bytes free

%whos

Interactive namespace is empty.

%reset

Once deleted, variables cannot be recovered. Proceed (y/[n])? y

%whos

Interactive namespace is empty.

Shell Command

Use commands of the shell environment that runs Jupyter notebook
Run a command by prefixing it with !
Main commands
- ls, cat, echo …

!echo python

python

!ls

01_jupyter_notebook.ipynb

Posted 2021-02-22 | Updated 2021-03-05 | Database | 6 minutes read (About 893 words)

AWS server setting

Server Setting

OTP setup
Create EC2
FTP service : install cyberduck
pyenv setup
Install jupyter notebook
Install and configure mysql

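1. OTP setup

In the AWS Console, go to the My Security Credentials menu
Select Multi-factor authentication (MFA)
Select the Activate MFA button
Select Virtual MFA device
Download Authy and sign up
- https://authy.com/download/
- Email and phone verification are required
- Can be installed as a mobile, desktop, or Chrome browser app
- Run the Authy app
In the Authy app, click the + button under Tokens
On the AWS page, click the Show secret key button and enter the displayed string into Enter Code given by the website in the Authy app
Set an Account Name, then Save
Enter the two consecutive key values into MFA Key1 and MFA Key2 on the AWS page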

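2. Create EC2

In the AWS Console, type EC2 to open the service page
Click Launch Instance in the Instances menu
Select Ubuntu Server 18.04 LTS
Select t2.micro
Click Review and Launch
Create and download a key pair
Click the Launch button

Connecting

Move the dss.pem key file to the ~/.ssh directory
Change the permissions of the dss.pem file
- $ chmod 400 ~/.ssh/dss.pem
Connect to the server
- $ ssh -i ~/.ssh/dss.pem ubuntu@<public ip address>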

3. FTP service

cyberduck
- https://cyberduck.io/download/
filezilla
- https://filezilla-project.org/download.php
Server connection settings
- Select SFTP
- Server : set to the public ip
- Username : ubuntu
- SSH Private Key : select the dss.pem file

4. pyenv setup

Download the pyenv.sh file from Google Drive
Move the file to the server using cyberduck
Run the pyenv.sh file

$ source pyenv.sh

5. Install and configure jupyter notebook

Install the ipython and jupyter packages

$ pip install ipython jupyter

Create the config file

$ jupyter notebook --generate-config

Generate a password

$ ipython
In [1]: from notebook.auth import passwd
In [2]: passwd()
Enter password: dss
Verify password: dss
sha1:6600c5733ef3:b683d6afba16b3403fdf9a75ac38b7d8e7f733bb

Open the config file

$ sudo vi /home/ubuntu/.jupyter/jupyter_notebook_config.py

Edit the config file

c.NotebookApp.ip = '172.31.26.225' # internal (private) IP address
c.NotebookApp.open_browser = False
c.NotebookApp.password = 'sha1:6600c5733ef3:b683d6afba16b3403fdf9a75ac38b7d8e7f733bb'

Open port 8888 on the server
Run jupyter notebook on the server
Connect from a browser
- http://<public ip>:8888

6. Install and configure Mysql

Install mysql-server, mysql-client
- $ sudo apt install mysql-server

mysql security settings (answer n-y-n-y-y in order)

$ sudo mysql_secure_installation


- Would you like to setup VALIDATE PASSWORD plugin? Press y|Y for Yes, any other key for No: n
- Set the password : dss
- Remove anonymous users? (Press y|Y for Yes, any other key for No) : y
- Disallow root login remotely? (Press y|Y for Yes, any other key for No) : n
- Remove test database and access to it? (Press y|Y for Yes, any other key for No) : y
- Reload privilege tables now? (Press y|Y for Yes, any other key for No) : y

Initial password setup

$ sudo mysql
mysql> SELECT user,authentication_string,plugin,host FROM mysql.user;
mysql> ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'dss';
mysql> FLUSH PRIVILEGES;
mysql> SELECT user,authentication_string,plugin,host FROM mysql.user;
mysql> exit

Connecting

$ mysql -u root -p
Enter password: dss

Allow external access
- Set bind-address = 0.0.0.0 in the mysql config file
  - $ sudo vi /etc/mysql/mysql.conf.d/mysqld.cnf
    bind-address = 0.0.0.0
  - Configure mysql to allow external access
    mysql> grant all privileges on *.* to 'root'@'%' identified by 'dss';
  - Restart to apply the settings
    $ sudo systemctl restart mysql.service
  - Open port 3306 on the server

database management application

windows
- heidiSQL
- https://www.heidisql.com
mac
- Sequel Pro
- https://www.sequelpro.com/

Save Sample Data

https://dev.mysql.com/doc/index-other.html
Download the world database zip file
Unzip it and move the world.sql file to the server
Create the world database using the database management app
Saving the data, method 1

$ mysql -u root -p world < world.sql

Saving the data, method 2

sql> create database world;
sql> use world;
sql> source world.sql

Posted 2021-02-22 | Updated 2021-03-05 | programming | 10 minutes read (About 1557 words)

Jupyter notebook

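Basic Python syntax

Covers variable declaration, identifiers, data types, type conversion, and operators

1. Comments and output (print)

# Comment : a line prefixed with # is not executed as code.
# Used to explain code or to temporarily disable code you do not want to run
# Shortcut : ctrl(cmd) + /
# Block selection : shift + arrow keys

# Code that prints 1, 2, 3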
print(1)
#print(2)
print(3)

1
3

# Output : the print function
# Used to check the value stored in a variable in the middle of the code

a = 1
b = 2
print(b)
c = 3
b = 4
print(b)

2
4

# Options of the print function
# docstring : description of a function : shortcut (shift + tab)
# Autocomplete : tab
print(1,2, sep="-", end="\t")
print(3)

1-2    3

python_data_science = 1

python_data_science

2. Variable declaration

The act of assigning a value to a storage location in RAM

a = 1
b = 2
c = a + b

d, e = 3, 4
f = g = 5

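3. Identifiers

The names of variables, functions, classes, modules, and so on are called identifiers.
Identifier rules
- Use lowercase letters, uppercase letters, digits, and underscores (_).
- A digit cannot be the first character
- Reserved words cannot be used : def, class, try, except …
- Conventions
  - snake case : fast_campus : variables, functions
  - camel case : FastCampus, fastCampus : classes

4. Data types

The type of the storage is set so that RAM is used efficiently
Dynamic typing
- The data type is set automatically from the value stored when the variable is declared
Basic data types : int, float, bool, str
Collection data types : list, tuple, dict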

a = 1
# int a = 1 
b = "python"
type(a), type(b)

(int, str)

# Basic data types : int, float, bool, str
a = 1
b = 1.2
c = True
d = "data"
type(a),type(b),type(c),type(d)

(int, float, bool, str)

a + b

2.2

a + d

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-22-4fbab87c839c> in <module>
----> 1 a + d


TypeError: unsupported operand type(s) for +: 'int' and 'str'

# Methods of a data type : strings
# upper : convert to uppercase
e = d.upper()

d, e

('data', 'DATA')

f = "Fast Campus"

# lower : convert to lowercase
f.lower()

'fast campus'

# strip : remove leading and trailing whitespace
f.strip()

'Fast Campus'

# replace : replace a given substring
f.replace("Fast", "Slow")

'Slow Campus'

f.replace(" ", "")

'FastCampus'

dir(f)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']

# Offset index : mask, masking : []
# A string is an ordered collection of characters
g = "abcdefg"

g[2], g[-2], g[2:5], g[:2], g[3:], g[-2:], g[::2], g[::-1]

('c', 'f', 'cde', 'ab', 'defg', 'fg', 'aceg', 'gfedcba')

numbers = "123456789" # goal : print 97531

result = numbers[::2]
result[::-1]

'97531'

numbers[::2][::-1]

'97531'

numbers[::-2]

'97531'
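
The slicing forms used above all follow the pattern [start:stop:step]. A compact sketch that checks each form against the outputs shown in this section:

```python
g = "abcdefg"
numbers = "123456789"

# [start:stop] takes characters from start up to, but not including, stop
middle = g[2:5]

# A step of 2 takes every second character; a negative step walks backwards
every_second = g[::2]
backwards = g[::-1]

# Two ways to get the odd digits in reverse order
two_steps = numbers[::2][::-1]
one_step = numbers[::-2]

print(middle, every_second, backwards, two_steps, one_step)
```

The one-step form works here because the string has an odd length, so walking backwards by two from the last character lands on the same positions.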

Collection data types : list, tuple, dict

list [] : an ordered, mutable data type
tuple () : an ordered, immutable data type
dict {} : a data type made of key:value pairs, accessed by key rather than by position

# list
ls = [1,2,3,"four", [5,6], True, 1.2]
type(ls), ls

(list, [1, 2, 3, 'four', [5, 6], True, 1.2])

# offset indexing works
ls[3], ls[1:3], ls[::-1]

('four', [2, 3], [1.2, True, [5, 6], 'four', 3, 2, 1])

# list methods
ls = [1, 5, 2, 4]

# append : add a value at the end
ls.append(3)
ls

[1, 5, 2, 4, 3]

# sort : sort in ascending order
ls.sort()
ls[::-1]

[5, 4, 3, 2, 1]

# pop : returns the last item and removes it from the list
# ctrl + enter : runs the current cell again in place (the output below is after running it three times)
num = ls.pop()
num, ls

(3, [1, 2])

# Copying lists

ls1 = [1, 2, 3]
ls2 = ls1 # reference copy : ls2 points to the same list object, so this is not a real copy
ls1, ls2

([1, 2, 3], [1, 2, 3])

ls1[2] = 5
ls1, ls2

([1, 2, 5], [1, 2, 5])

ls3 = ls1.copy() # a new list object with the same elements
ls1, ls3

([1, 2, 5], [1, 2, 5])

ls1[2] = 10

ls1, ls3

([1, 2, 10], [1, 2, 5])
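
To make the difference between assignment, `copy()`, and a deep copy concrete, here is a small sketch with a nested list. The nested list and the names `alias`, `shallow`, and `deep` are illustrative additions, not part of the original notes.

```python
import copy

ls1 = [1, [2, 3]]

alias = ls1                # same object: every change to ls1 is visible here
shallow = ls1.copy()       # new outer list, but the inner list is still shared
deep = copy.deepcopy(ls1)  # fully independent copy, including the inner list

ls1[0] = 10                # replaces an element of the outer list
ls1[1].append(4)           # mutates the shared inner list

print(alias)
print(shallow)
print(deep)
```

`list.copy()` is enough for flat lists like the ones above; `copy.deepcopy` only matters when the list contains other mutable objects.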

Tuple

Like a list, but an immutable data type
A tuple uses less memory than a list holding the same data


tp1 = 1, 2, 3
tp2 = (4, 5, 6)
type(tp1), type(tp2), tp1, tp2

(tuple, tuple, (1, 2, 3), (4, 5, 6))

a, b = 1, 2
a, b

(1, 2)

# offset indexing works
tp1[1], tp1[::-1]

(2, (3, 2, 1))

# Compare the memory used by a list and a tuple
import sys

ls = [1, 2, 3]
tp = (1, 2, 3)

print(sys.getsizeof(ls), sys.getsizeof(tp))

80 64

dict {}

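A data type made of {key:value} pairs, accessed by key rather than by position

# Declaration : keys must be hashable (for example int, str, tuple)
# Keys are used instead of indexes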
dic = {
    1: "one",
    "two": 2,
    "three" : [1, 2, 3],
}
type(dic), dic

(dict, {1: 'one', 'two': 2, 'three': [1, 2, 3]})

dic[1], dic["three"]

('one', [1, 2, 3])

dic["two"] = 123
dic

{1: 'one', 'two': 123, 'three': [1, 2, 3]}

# Declare the data below as lists and as a dict
# Cities : seoul, busan, daegu
# Population : 9,700,000, 3,400,000, 2,400,000

# list
city = ["seoul", "busan", "daegu"]
population = [9700000, 3400000, 2400000]

# dictionary
data = {
    "seoul" : 9700000,
    "busan" : 3400000,
    "daegu" : 2400000,
}

sum(population)

15500000

sum(data.values())

15500000

5. Type conversion

How to convert between data types
int, float, bool, str, list, tuple, dict

a = 1
b = "2"
a + int(b)

str(a) + b

'12'

list(data.values())

[9700000, 3400000, 2400000]

city, population

(['seoul', 'busan', 'daegu'], [9700000, 3400000, 2400000])

# zip : pairs up the items at the same index
list(zip(city, population))

[('seoul', 9700000), ('busan', 3400000), ('daegu', 2400000)]

result = dict(zip(city, population))

data1 = list(result.keys())
data2 = list(result.values())
data1, data2

(['seoul', 'busan', 'daegu'], [9700000, 3400000, 2400000])
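
The round trip between two lists and a dict can be sketched as follows. The `zip(*...)` unpacking and the `max` lookup are small illustrative additions on top of what the section shows.

```python
city = ["seoul", "busan", "daegu"]
population = [9700000, 3400000, 2400000]

# Two parallel lists -> dict
data = dict(zip(city, population))

# dict -> two parallel tuples again (zip with * "unzips" the pairs)
names, counts = zip(*data.items())

# Key with the largest value
largest = max(data, key=data.get)

print(data)
print(names, counts)
print(largest)
```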

string = "python"
int(string)

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-113-3eb1982ee741> in <module>
      1 string = "python"
----> 2 int(string)


ValueError: invalid literal for int() with base 10: 'python'
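
When the input may not be a number, the ValueError above can be handled instead of stopping the program. A minimal sketch; the helper name `to_int` and its `default` parameter are hypothetical, not a built-in:

```python
def to_int(text, default=None):
    """Convert text to int, returning default when the text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError for strings such as "python" or "1.5"
        return default


print(to_int("2"))
print(to_int("python"))
print(to_int("python", default=0))
```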

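6. Operators

Arithmetic operators : +, -, *, /, //, %, **
Assignment operators : compute and accumulate into a variable : +=, //=, **=
Comparison operators : <, >, ==, !=, <=, >= : the result is True or False
Membership operators : check whether a value is present : not in, in
Logical operators : operate on True and False : or, and, not

# Arithmetic operators
1 + 4 / 2 ** 2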

2.0

# Assignment operators
a = 10
a += 10
a += 10
a

# Comparison operators
b = 2
print(a, b)
a < b, a == b, a != b

30 2





(False, False, True)

# Logical operators
True and False, True or False, not True or False

(False, True, False)

# Membership operators
ls = ['jin', 'andy', 'john']

'andy' in ls, 'anchel' in ls

(True, False)
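
The results above depend on operator precedence: ** binds tighter than /, which binds tighter than +, and not binds tighter than and/or. A small sketch that makes the grouping explicit (the variable names are illustrative):

```python
# ** first, then /, then +  ->  1 + (4 / (2 ** 2))
total = 1 + 4 / 2 ** 2

# Floor division and remainder
quotient, remainder = 7 // 2, 7 % 2

# not applies only to the value right after it unless parentheses are used
without_parens = not False or True    # (not False) or True
with_parens = not (False or True)

# Assignment operators accumulate into the same variable
counter = 10
counter += 10
counter **= 2
counter //= 3

print(total, quotient, remainder, without_parens, with_parens, counter)
```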

### random functions
import random

random.randint(1, 10)

# input function
data = input("insert string: ")
data

insert string: 안녕하세요





'안녕하세요'

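# Book of Answers : a book that gives an answer when you ask it a question

# Write the answers as a list
# Read a question as input
# Generate a random integer index that fits the number of answers
# Print the item of the answer list at that index

# Write the answers as a list
solutions = [
    "Whatever you do, it will not go well.",
    "Something good will happen unexpectedly.",
    "Whatever you imagine, it is more than that."
]

# Read a question as input
input("Enter your question: ")

# Generate a random integer index that fits the number of answers
idx = random.randint(0, len(solutions) - 1)

# Print the item of the answer list at that index
solutions[idx]

Enter your question: d

'Whatever you do, it will not go well.'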
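
The same idea can be wrapped in a function so it can be reused and tested without typing input. This is only a sketch: the function name `pick_answer`, its `rng` parameter, and the English answer strings are assumptions made for illustration.

```python
import random


def pick_answer(question, answers, rng=random):
    """Return a random answer; the question text itself does not affect the result."""
    # randint includes both ends, so the last valid index is len(answers) - 1
    idx = rng.randint(0, len(answers) - 1)
    return answers[idx]


answers = [
    "Whatever you do, it will not go well.",
    "Something good will happen unexpectedly.",
    "Whatever you imagine, it is more than that.",
]

# A seeded generator makes the pick repeatable
picked = pick_answer("Will it rain?", answers, rng=random.Random(0))
print(picked)
```

`random.choice(answers)` would do the index step in one call; the version above keeps the explicit index to mirror the steps listed in the notes.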
