Pandas (3) Column Naming

In [3]:

import pandas as pd
import numpy as np
pd.options.display.max_columns = 40
 
movie = pd.read_csv(r'C:\Users\user\jupyterpractice\EDA\Pandas-Cookbook-master\data\movie.csv')
 
 

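Arranging column names in a clear order

  • Guidelines
      1. Classify each column as continuous or discrete.
      2. Within the continuous and the discrete columns, group columns that share a common theme.
      3. Within each group, put the most important columns first, and place categorical columns before continuous ones.
  • Further reading (paper): Tidy Data (http://bit.ly/2v1hvH5)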
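
The first guideline can be started mechanically from the dtypes. The following is a minimal sketch, assuming a small hypothetical DataFrame (`toy`, with made-up values) in place of movie.csv. `select_dtypes` only gives a first pass: a numeric column such as `title_year` arguably behaves like a discrete value and still has to be moved by hand.

```python
import pandas as pd

# Hypothetical toy frame standing in for movie.csv (values are illustrative only)
toy = pd.DataFrame({
    "gross": [100.0, 250.5],
    "movie_title": ["A", "B"],
    "title_year": [2009.0, 2007.0],
    "color": ["Color", "Color"],
    "imdb_score": [7.9, 7.1],
})

# First pass: treat non-numeric columns as discrete and numeric ones as continuous
discrete = toy.select_dtypes(exclude="number").columns.tolist()
continuous = toy.select_dtypes(include="number").columns.tolist()

# Manual adjustment: a year is stored as a number but is used like a category
continuous.remove("title_year")
discrete.append("title_year")

print(discrete)
print(continuous)
```

The grouping by theme (guideline 2) and the ordering by importance (guideline 3) remain judgment calls, so they are written out by hand in the cells that follow.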
In [25]:
movie = pd.read_csv(r'C:\Users\user\jupyterpractice\EDA\Pandas-Cookbook-master\data\movie.csv')
In [26]:
movie.head(2)
Out[26]:
  color director_name num_critic_for_reviews duration director_facebook_likes actor_3_facebook_likes actor_2_name actor_1_facebook_likes gross genres actor_1_name movie_title num_voted_users cast_total_facebook_likes actor_3_name facenumber_in_poster plot_keywords movie_imdb_link num_user_for_reviews language country content_rating budget title_year actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes
0 Color James Cameron 723.0 178.0 0.0 855.0 Joel David Moore 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi CCH Pounder Avatar 886204 4834 Wes Studi 0.0 avatar|future|marine|native|paraplegic http://www.imdb.com/title/tt0499549/?ref_=fn_t... 3054.0 English USA PG-13 237000000.0 2009.0 936.0 7.9 1.78 33000
1 Color Gore Verbinski 302.0 169.0 563.0 1000.0 Orlando Bloom 40000.0 309404152.0 Action|Adventure|Fantasy Johnny Depp Pirates of the Caribbean: At World's End 471220 48350 Jack Davenport 0.0 goddess|marriage ceremony|marriage proposal|pi... http://www.imdb.com/title/tt0449088/?ref_=fn_t... 1238.0 English USA PG-13 300000000.0 2007.0 5000.0 7.1 2.35 0
In [27]:
movie.columns
Out[27]:
Index(['color', 'director_name', 'num_critic_for_reviews', 'duration',
       'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',
       'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',
       'movie_title', 'num_voted_users', 'cast_total_facebook_likes',
       'actor_3_name', 'facenumber_in_poster', 'plot_keywords',
       'movie_imdb_link', 'num_user_for_reviews', 'language', 'country',
       'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes',
       'imdb_score', 'aspect_ratio', 'movie_facebook_likes'],
      dtype='object')
In [28]:
# Discrete column groups
disc_core = ['movie_title','title_year', 'content_rating','genres']
disc_people = ['director_name','actor_1_name', 'actor_2_name','actor_3_name']
disc_other = ['color','country','language','plot_keywords','movie_imdb_link']

# Continuous column groups
cont_fb = ['director_facebook_likes','actor_1_facebook_likes','actor_2_facebook_likes',
           'actor_3_facebook_likes', 'cast_total_facebook_likes', 'movie_facebook_likes']
cont_finance = ['budget','gross']
cont_num_reviews = ['num_voted_users','num_user_for_reviews', 'num_critic_for_reviews']
cont_other = ['imdb_score','duration', 'aspect_ratio', 'facenumber_in_poster']
In [29]:
new_col_order = (disc_core + disc_people + disc_other +
                 cont_fb + cont_finance + cont_num_reviews + cont_other)

# Python sets are unordered, so comparing two sets for equality checks
# whether they contain exactly the same elements, regardless of order.
# Use this to confirm that no column was left out of the new order.
set(movie.columns) == set(new_col_order)
Out[29]:
True
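
One limitation of the set comparison is that it cannot detect a column listed twice, because duplicates collapse inside a set. The following is a rough sketch of a stricter check. `check_column_order` is a hypothetical helper written for this illustration (it is not a pandas function), and the column lists are made up.

```python
def check_column_order(original, proposed):
    """Return (missing, extra, duplicated) column names for a proposed order."""
    original, proposed = list(original), list(proposed)
    # Columns present in the data but absent from the proposed order
    missing = [c for c in original if c not in proposed]
    # Names in the proposed order that do not exist in the data
    extra = [c for c in proposed if c not in original]
    # Names that appear more than once in the proposed order
    seen = set()
    duplicated = []
    for c in proposed:
        if c in seen and c not in duplicated:
            duplicated.append(c)
        seen.add(c)
    return missing, extra, duplicated


# Hypothetical column lists: 'gross' is listed twice by mistake
original_cols = ["color", "gross", "budget"]
proposed_cols = ["budget", "gross", "color", "gross"]

print(set(original_cols) == set(proposed_cols))  # the set check alone passes
missing, extra, duplicated = check_column_order(original_cols, proposed_cols)
print(missing, extra, duplicated)
```

With the real data, the same helper could be called as `check_column_order(movie.columns, new_col_order)`.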
In [30]:
movie2 = movie[new_col_order]
movie2.head()
Out[30]:
  movie_title title_year content_rating genres director_name actor_1_name actor_2_name actor_3_name color country language plot_keywords movie_imdb_link director_facebook_likes actor_1_facebook_likes actor_2_facebook_likes actor_3_facebook_likes cast_total_facebook_likes movie_facebook_likes budget gross num_voted_users num_user_for_reviews num_critic_for_reviews imdb_score duration aspect_ratio facenumber_in_poster
0 Avatar 2009.0 PG-13 Action|Adventure|Fantasy|Sci-Fi James Cameron CCH Pounder Joel David Moore Wes Studi Color USA English avatar|future|marine|native|paraplegic http://www.imdb.com/title/tt0499549/?ref_=fn_t... 0.0 1000.0 936.0 855.0 4834 33000 237000000.0 760505847.0 886204 3054.0 723.0 7.9 178.0 1.78 0.0
1 Pirates of the Caribbean: At World's End 2007.0 PG-13 Action|Adventure|Fantasy Gore Verbinski Johnny Depp Orlando Bloom Jack Davenport Color USA English goddess|marriage ceremony|marriage proposal|pi... http://www.imdb.com/title/tt0449088/?ref_=fn_t... 563.0 40000.0 5000.0 1000.0 48350 0 300000000.0 309404152.0 471220 1238.0 302.0 7.1 169.0 2.35 0.0
2 Spectre 2015.0 PG-13 Action|Adventure|Thriller Sam Mendes Christoph Waltz Rory Kinnear Stephanie Sigman Color UK English bomb|espionage|sequel|spy|terrorist http://www.imdb.com/title/tt2379713/?ref_=fn_t... 0.0 11000.0 393.0 161.0 11700 85000 245000000.0 200074175.0 275868 994.0 602.0 6.8 148.0 2.35 1.0
3 The Dark Knight Rises 2012.0 PG-13 Action|Thriller Christopher Nolan Tom Hardy Christian Bale Joseph Gordon-Levitt Color USA English deception|imprisonment|lawlessness|police offi... http://www.imdb.com/title/tt1345836/?ref_=fn_t... 22000.0 27000.0 23000.0 23000.0 106759 164000 250000000.0 448130642.0 1144337 2701.0 813.0 8.5 164.0 2.35 0.0
4 Star Wars: Episode VII - The Force Awakens NaN NaN Documentary Doug Walker Doug Walker Rob Walker NaN NaN NaN NaN NaN http://www.imdb.com/title/tt5289954/?ref_=fn_t... 131.0 131.0 12.0 NaN 143 0 NaN NaN 8 NaN NaN 7.1 NaN NaN 0.0
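
Selecting with a list of names returns a new DataFrame and leaves the original column order untouched. The sketch below illustrates this on a hypothetical toy frame (made-up values), and also contrasts it with `reindex(columns=...)`, which is an alternative not used in this post: `[]` indexing raises a `KeyError` for a misspelled name, while `reindex` quietly creates an all-NaN column instead.

```python
import pandas as pd

# Hypothetical toy frame (values are illustrative only)
toy = pd.DataFrame({"gross": [1.0], "movie_title": ["A"], "budget": [2.0]})
order = ["movie_title", "budget", "gross"]

# New DataFrame in the requested order; toy itself keeps its original order
reordered = toy[order]
print(list(toy.columns))
print(list(reordered.columns))

# A misspelled name ('budgett' is deliberately wrong) raises KeyError with [] ...
try:
    toy[["movie_title", "budgett"]]
except KeyError as err:
    print("KeyError:", err)

# ... whereas reindex silently adds an all-NaN column under that name
filled = toy.reindex(columns=["movie_title", "budgett"])
print(filled.isna().all().to_dict())
```

For a reordering step, the stricter `[]` behaviour is arguably the safer default, since a typo fails immediately rather than producing an empty column.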
 
