9. A core pandas feature: 'Group by'

jinny95 2023. 4. 26. 20:28

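What is group by?

- It splits the rows of a dataset into groups so that each group can be examined on its own.

- It turns unpivoted data (rows recorded by no consistent criterion) into a meaningful summary.

- To use groupby we have to specify three things: by (the column that defines the groups), data (the column(s) we want to look at per group), and an aggregation function (a function that reduces each group's data to a single value).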

Credit Amount	Type of apartment
0	1049	1
1	2799	1
2	841	1
3	2122	1
4	2171	2

△ Example DataFrame
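The three parts listed above (by, data, aggregation function) can be sketched on just the five rows shown in the example DataFrame. This is only a minimal illustration: the name `df` is an assumption made for this sketch, and the real dataset in this post has many more rows, so the numbers will not match the output further down.

```python
import pandas as pd

# The five rows from the example DataFrame above (only a tiny slice of the real data)
df = pd.DataFrame({
    'Credit Amount': [1049, 2799, 841, 2122, 2171],
    'Type of apartment': [1, 1, 1, 1, 2],
})

# by: 'Type of apartment' / data: 'Credit Amount' / aggregation function: mean
mean_by_type = df.groupby('Type of apartment')['Credit Amount'].mean()
print(mean_by_type)
```

Each distinct value of 'Type of apartment' becomes one row of the result, and the mean is computed separately inside each group.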

1. Grouping by Type of apartment and taking the mean

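# Group the 'Credit Amount' values by the matching 'Type of apartment' values
german_grouped = german['Credit Amount'].groupby(german['Type of apartment'])
# Mean credit amount for each apartment type
german_grouped.mean()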

Output>
Type of apartment
1    3122.553073
2    3067.257703
3    4881.205607
Name: Credit Amount, dtype: float64
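As a side note, the same grouping can also be written by column name, and `agg()` can apply several aggregation functions at once. The sketch below uses a small made-up stand-in for `german` (the values are hypothetical, not taken from the real dataset), so only the shape of the result is meaningful.

```python
import pandas as pd

# Hypothetical stand-in for the post's `german` DataFrame (made-up values)
german = pd.DataFrame({
    'Credit Amount': [1000, 3000, 2000, 4000, 6000],
    'Type of apartment': [1, 1, 2, 2, 3],
})

# Grouping by column name ...
by_name = german.groupby('Type of apartment')['Credit Amount'].mean()
# ... and grouping by a Series, as in the code above, produce the same means
by_series = german['Credit Amount'].groupby(german['Type of apartment']).mean()
print((by_name == by_series).all())

# agg() applies several aggregation functions to each group in one call
summary = german.groupby('Type of apartment')['Credit Amount'].agg(['mean', 'count', 'max'])
print(summary)
```

Which spelling to use is mostly a matter of taste; grouping by column name tends to be shorter when the key and the data live in the same DataFrame.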

2. Groups within groups

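# Keep only the columns needed for this example
german = german_sample[['Credit Amount', 'Type of apartment', 'Purpose']]
# Passing a list of keys groups by 'Purpose' first, then by 'Type of apartment'
german_group2 = german['Credit Amount'].groupby(
    [german['Purpose'], german['Type of apartment']]
)
german_group2.mean()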

Output>
Purpose  Type of apartment
0        1                    2597.225000
         2                    2811.024242
         3                    5138.689655
1        1                    5037.086957
         2                    4915.222222
         3                    6609.923077
2        1                    2727.354167
         2                    3107.450820
         3                    4100.181818
3        1                    2199.763158
         2                    2540.533040
         3                    2417.333333
4        1                    1255.500000
         2                    1546.500000
5        1                    1522.000000
         2                    2866.000000
         3                    2750.666667
6        1                    3156.444444
         2                    2492.423077
         3                    4387.266667
8        1                     902.000000
         2                    1243.875000
9        1                    5614.125000
         2                    3800.592105
         3                    4931.800000
10       2                    8576.111111
         3                    7109.000000
Name: Credit Amount, dtype: float64

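Purpose is now the outermost group, and inside each Purpose the rows are grouped once more by Type of apartment. Combinations that have no rows (for example Purpose 4 with Type of apartment 3) simply do not appear in the result.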
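A result grouped on two keys like this has a two-level index (a MultiIndex). Here is a rough sketch of how such a result could be read back, using made-up values rather than the real dataset; the variable names `nested` and `wide` are just for illustration.

```python
import pandas as pd

# Hypothetical stand-in data (made-up values); Purpose 1 has no Type of apartment 2
german = pd.DataFrame({
    'Credit Amount': [1000, 3000, 2000, 4000, 6000, 8000],
    'Type of apartment': [1, 2, 1, 1, 1, 1],
    'Purpose': [0, 0, 1, 1, 1, 0],
})

nested = german.groupby(['Purpose', 'Type of apartment'])['Credit Amount'].mean()

# Look up a single (Purpose, Type of apartment) combination with a tuple
print(nested.loc[(0, 1)])

# Look up everything under one outer group (Purpose 1)
print(nested.loc[1])

# unstack() moves the inner level into columns; missing combinations become NaN
wide = nested.unstack()
print(wide)
```

The unstacked form is often easier to scan by eye, since each Purpose becomes one row and each Type of apartment one column.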

3. groupby in a for loop - each group is printed in full, so the repeated top-level group value is written out on every row

<Example table>
german = german_sample[['Type of apartment',
                       'Sex & Marital Status',
                       'Credit Amount']]

german.head()  # check that the columns were selected correctly


Type of apartment	Sex & Marital Status	Credit Amount
0	1	2	1049
1	1	3	2799
2	1	2	841
3	1	3	2122
4	2	3	2171

△ Example table

<groupby with a for loop>
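# Iterating over a groupby yields (group key, sub-DataFrame) pairs
# (the loop variable is named apt_type so it does not shadow the built-in `type`)
for apt_type, g in german.groupby('Type of apartment'):
    print(g.head(n=3))
    print('--------------------------------------------------------------')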

Output>
   Type of apartment  Sex & Marital Status  Credit Amount
0                  1                     2           1049
1                  1                     3           2799
2                  1                     2            841
--------------------------------------------------------------
   Type of apartment  Sex & Marital Status  Credit Amount
4                  2                     3           2171
6                  2                     3           3398
7                  2                     3           1361
--------------------------------------------------------------
    Type of apartment  Sex & Marital Status  Credit Amount
29                  3                     3           4796
44                  3                     3           1239
69                  3                     3           2032
--------------------------------------------------------------
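The loop above only prints each group, but the group key is available in the loop as well, and a single group can be fetched without looping at all. The sketch below assumes a tiny DataFrame built from a few of the rows printed above (the row index is different from the real data), and the names `grouped`, `sizes` and `type2` are only for illustration.

```python
import pandas as pd

# A few rows copied from the output above; the original row index is not preserved
german = pd.DataFrame({
    'Type of apartment': [1, 1, 2, 2, 3],
    'Sex & Marital Status': [2, 3, 3, 3, 3],
    'Credit Amount': [1049, 2799, 2171, 3398, 4796],
})

grouped = german.groupby('Type of apartment')

# Use the group key inside the loop, here to record how many rows each group has
sizes = {}
for apt_type, g in grouped:
    sizes[int(apt_type)] = len(g)
print(sizes)

# get_group() returns the sub-DataFrame for one key without a loop
type2 = grouped.get_group(2)
print(type2)
```

Looping is handy for inspecting groups one by one; for actual statistics, an aggregation such as `mean()` on the grouped object is usually simpler.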

JINHEE's lab