Pandas (1) Inspecting Data Information

In [1]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
 
In [2]:
movie = pd.read_csv(r'C:\Users\user\jupyterpractice\EDA\Pandas-Cookbook-master\data\movie.csv')
 
In [3]:
movie.head(3)
 
Out[3]:
  color director_name num_critic_for_reviews duration director_facebook_likes actor_3_facebook_likes actor_2_name actor_1_facebook_likes gross genres ... num_user_for_reviews language country content_rating budget title_year actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes
0 Color James Cameron 723.0 178.0 0.0 855.0 Joel David Moore 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi ... 3054.0 English USA PG-13 237000000.0 2009.0 936.0 7.9 1.78 33000
1 Color Gore Verbinski 302.0 169.0 563.0 1000.0 Orlando Bloom 40000.0 309404152.0 Action|Adventure|Fantasy ... 1238.0 English USA PG-13 300000000.0 2007.0 5000.0 7.1 2.35 0
2 Color Sam Mendes 602.0 148.0 0.0 161.0 Rory Kinnear 11000.0 200074175.0 Action|Adventure|Thriller ... 994.0 English UK PG-13 245000000.0 2015.0 393.0 6.8 2.35 85000

3 rows × 28 columns

 
In [13]:
movie.tail(3)
 
Out[13]:
  color director_name num_critic_for_reviews duration director_facebook_likes actor_3_facebook_likes actor_2_name actor_1_facebook_likes gross genres ... num_user_for_reviews language country content_rating budget title_year actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes
4913 Color Benjamin Roberds 13.0 76.0 0.0 0.0 Maxwell Moody 0.0 NaN Drama|Horror|Thriller ... 3.0 English USA NaN 1400.0 2013.0 0.0 6.3 NaN 16
4914 Color Daniel Hsia 14.0 100.0 0.0 489.0 Daniel Henney 946.0 10443.0 Comedy|Drama|Romance ... 9.0 English USA PG-13 NaN 2012.0 719.0 6.3 2.35 660
4915 Color Jon Gunn 43.0 90.0 16.0 16.0 Brian Herzlinger 86.0 85222.0 Documentary ... 84.0 English USA PG 1100.0 2004.0 23.0 6.6 1.85 456

3 rows × 28 columns
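
The head/tail cells above can be tried without movie.csv. The following is a minimal sketch on a small hypothetical frame (the `title` and `score` columns and their values are made up for illustration, not taken from the dataset). It shows that `head(n)` and `tail(n)` return the first and last `n` rows and that `shape` reports the full size.

```python
import pandas as pd

# Hypothetical stand-in for movie.csv: 6 rows, 2 columns.
df = pd.DataFrame({
    "title": ["a", "b", "c", "d", "e", "f"],
    "score": [7.9, 7.1, 6.8, 8.5, 7.1, 6.6],
})

first = df.head(3)   # first three rows
last = df.tail(3)    # last three rows

print(df.shape)      # (rows, columns) of the full frame
print(first.index.tolist(), last.index.tolist())
```

Both methods default to five rows when `n` is omitted.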

 
In [4]:
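columns = movie.columns  # column labels (an Index)
index = movie.index      # row labels (a RangeIndex by default)
data = movie.values      # underlying values as a NumPy array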
 
In [5]:
columns
 
Out[5]:
Index(['color', 'director_name', 'num_critic_for_reviews', 'duration',
       'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',
       'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',
       'movie_title', 'num_voted_users', 'cast_total_facebook_likes',
       'actor_3_name', 'facenumber_in_poster', 'plot_keywords',
       'movie_imdb_link', 'num_user_for_reviews', 'language', 'country',
       'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes',
       'imdb_score', 'aspect_ratio', 'movie_facebook_likes'],
      dtype='object')
 
In [6]:
index
 
Out[6]:
RangeIndex(start=0, stop=4916, step=1)
 
In [7]:
data
 
Out[7]:
array([['Color', 'James Cameron', 723.0, ..., 7.9, 1.78, 33000],
       ['Color', 'Gore Verbinski', 302.0, ..., 7.1, 2.35, 0],
       ['Color', 'Sam Mendes', 602.0, ..., 6.8, 2.35, 85000],
       ...,
       ['Color', 'Benjamin Roberds', 13.0, ..., 6.3, nan, 16],
       ['Color', 'Daniel Hsia', 14.0, ..., 6.3, 2.35, 660],
       ['Color', 'Jon Gunn', 43.0, ..., 6.6, 1.85, 456]], dtype=object)
 
In [9]:
print(type(index))
print()
print(type(columns))
print()
print(type(data))
 
<class 'pandas.core.indexes.range.RangeIndex'>

<class 'pandas.core.indexes.base.Index'>

<class 'numpy.ndarray'>
 
In [14]:
issubclass(pd.RangeIndex, pd.Index)
 
Out[14]:
True
 
In [16]:
issubclass(pd.RangeIndex, np.ndarray)
 
Out[16]:
False
 
In [10]:
index.values
 
Out[10]:
array([   0,    1,    2, ..., 4913, 4914, 4915], dtype=int64)
 
In [11]:
columns.values
 
Out[11]:
array(['color', 'director_name', 'num_critic_for_reviews', 'duration',
       'director_facebook_likes', 'actor_3_facebook_likes',
       'actor_2_name', 'actor_1_facebook_likes', 'gross', 'genres',
       'actor_1_name', 'movie_title', 'num_voted_users',
       'cast_total_facebook_likes', 'actor_3_name',
       'facenumber_in_poster', 'plot_keywords', 'movie_imdb_link',
       'num_user_for_reviews', 'language', 'country', 'content_rating',
       'budget', 'title_year', 'actor_2_facebook_likes', 'imdb_score',
       'aspect_ratio', 'movie_facebook_likes'], dtype=object)
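
The type checks above can be summarized in one place. This is a sketch on a hypothetical two-column frame, not the movie dataset: the default row index is a `RangeIndex`, which is a kind of `Index`; an `Index` is not itself an `ndarray`, but it can be converted to one. `to_numpy()` is used here as the conversion; the `.values` attribute used above returns a comparable array.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame; not the movie dataset.
df = pd.DataFrame({"title": ["a", "b", "c"], "score": [7.9, 7.1, 6.8]})

index, columns = df.index, df.columns

# A default index is a RangeIndex, which is a subclass of Index.
is_range = isinstance(index, pd.RangeIndex)
is_index = isinstance(index, pd.Index)

# Index objects are not ndarrays themselves, but they convert to one.
index_array = index.to_numpy()
column_array = columns.to_numpy()

print(type(index_array), index_array.tolist(), column_array.tolist())
```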
 
In [12]:
movie.dtypes
 
Out[12]:
color                         object
director_name                 object
num_critic_for_reviews       float64
duration                     float64
director_facebook_likes      float64
actor_3_facebook_likes       float64
actor_2_name                  object
actor_1_facebook_likes       float64
gross                        float64
genres                        object
actor_1_name                  object
movie_title                   object
num_voted_users                int64
cast_total_facebook_likes      int64
actor_3_name                  object
facenumber_in_poster         float64
plot_keywords                 object
movie_imdb_link               object
num_user_for_reviews         float64
language                      object
country                       object
content_rating                object
budget                       float64
title_year                   float64
actor_2_facebook_likes       float64
imdb_score                   float64
aspect_ratio                 float64
movie_facebook_likes           int64
dtype: object
 
In [17]:
movie.dtypes.value_counts()
 
Out[17]:
float64    13
object     12
int64       3
dtype: int64
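
Integer-like columns such as title_year show up as float64 above because NaN cannot be stored in a NumPy integer column. The sketch below uses a hypothetical three-column frame with explicitly chosen dtypes (column names borrowed loosely from the dataset, values invented) to show the dtype count together with two related checks, `select_dtypes` and a per-column NaN count.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with explicit dtypes; not the movie dataset.
df = pd.DataFrame({
    "votes": np.array([10, 20, 30], dtype="int64"),
    "score": np.array([7.9, 7.1, np.nan], dtype="float64"),
    "gross": np.array([1.5, np.nan, 3.0], dtype="float64"),
})

dtype_counts = df.dtypes.value_counts()       # number of columns per dtype
numeric = df.select_dtypes(include="number")  # keep numeric columns only
missing = df.isna().sum()                     # NaN count per column

print(dtype_counts)
print(missing)
```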
 
In [18]:
movie['director_name']
 
Out[18]:
0           James Cameron
1          Gore Verbinski
2              Sam Mendes
3       Christopher Nolan
4             Doug Walker
              ...        
4911          Scott Smith
4912                  NaN
4913     Benjamin Roberds
4914          Daniel Hsia
4915             Jon Gunn
Name: director_name, Length: 4916, dtype: object
 
In [19]:
movie.director_name
 
Out[19]:
0           James Cameron
1          Gore Verbinski
2              Sam Mendes
3       Christopher Nolan
4             Doug Walker
              ...        
4911          Scott Smith
4912                  NaN
4913     Benjamin Roberds
4914          Daniel Hsia
4915             Jon Gunn
Name: director_name, Length: 4916, dtype: object
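
Bracket access and dot access return the same Series here, but they are not interchangeable in general. A minimal sketch, assuming a hypothetical frame in which one column is named `count` (colliding with the `DataFrame.count` method) and another contains a space:

```python
import pandas as pd

# Hypothetical frame; "count" deliberately collides with DataFrame.count.
df = pd.DataFrame({
    "director_name": ["A", "B"],
    "count": [1, 2],
    "title year": [2009, 2007],
})

# Both spellings return the same Series when the name is a plain identifier.
same = df["director_name"].equals(df.director_name)

# On a name collision, dot access resolves to the method, not the column.
dot_is_method = callable(df.count)
bracket_is_series = isinstance(df["count"], pd.Series)

# Names containing spaces are only reachable through brackets.
years = df["title year"].tolist()

print(same, dot_is_method, bracket_is_series, years)
```

For this reason bracket access is usually the safer default in scripts.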
 
In [20]:
type(movie['director_name'])
 
Out[20]:
pandas.core.series.Series
 
In [21]:
director = movie['director_name']
director.name
 
Out[21]:
'director_name'
 
In [22]:
director.to_frame().head()
 
Out[22]:
       director_name
0      James Cameron
1     Gore Verbinski
2         Sam Mendes
3  Christopher Nolan
4        Doug Walker
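
The link between a Series name and a DataFrame column label can be checked in both directions. This sketch uses a hypothetical three-element Series rather than the real column: `to_frame()` takes the column label from `.name`, the `name` argument overrides it, and selecting the single column gives a Series back.

```python
import pandas as pd

# Hypothetical Series; its name plays the role of the column label.
director = pd.Series(["A", "B", "C"], name="director_name")

frame = director.to_frame()                   # column label comes from .name
renamed = director.to_frame(name="director")  # override the label
back = frame["director_name"]                 # one column selected -> Series again

print(frame.columns.tolist(), renamed.columns.tolist(), back.name)
```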
 
In [24]:
s_attr_methods = set(dir(pd.Series))
len(s_attr_methods)
 
Out[24]:
433
 
In [27]:
df_attr_methods = set(dir(pd.DataFrame))
len(df_attr_methods)
 
Out[27]:
430
 
In [28]:
len(s_attr_methods & df_attr_methods)
 
Out[28]:
377
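
The same set comparison can also show which names are not shared. A minimal sketch, restricted to public names (those not starting with an underscore, a filter the cells above do not apply); the exact counts depend on the installed pandas version, so none are stated here.

```python
import pandas as pd

# Public names only; the counts depend on the installed pandas version.
s_names = {n for n in dir(pd.Series) if not n.startswith("_")}
df_names = {n for n in dir(pd.DataFrame) if not n.startswith("_")}

shared = s_names & df_names        # available on both, e.g. head
series_only = s_names - df_names   # e.g. to_frame
frame_only = df_names - s_names    # e.g. merge

print(len(shared), len(series_only), len(frame_only))
print(sorted(series_only)[:5])
```

The large overlap is why most of what works on a Series also works on a DataFrame.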
