Reading-Notes

Pandas

pandas is a fast, powerful, and easy-to-use tool for data analysis and manipulation.

import pandas as pd

Why use pandas

Object creation

Series

A Series is a one-dimensional labeled array that can hold any data type. To create a Series with pandas:

`s = pd.Series(data, index=index)`

d = {"b": 1, "a": 0, "c": 2}
pd.Series(d)
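As a minimal sketch of both forms, the snippet below builds one Series from a list with an explicit index and one from a dict. The values and the names `s_list` and `s_dict` are illustrative assumptions, not part of the original notes.

```python
import pandas as pd

# From a list, with labels supplied through `index`
s_list = pd.Series([1.5, 2.5, 3.5], index=["a", "b", "c"])

# From a dict: the keys become the index, in insertion order
d = {"b": 1, "a": 0, "c": 2}
s_dict = pd.Series(d)

print(s_list)
print(s_dict)
```

Values can then be looked up by label, for example `s_list["b"]` or `s_dict["a"]`.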

DataFrame

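The DataFrame is the main data structure of pandas: a two-dimensional labeled table whose columns can each hold a different data type.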
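One common way to build a DataFrame is from a dict whose keys become the column names. The sketch below is an assumption about how a small mixed-type frame might be created; the name `df2` and all values are illustrative.

```python
import numpy as np
import pandas as pd

# Each dict key becomes a column; scalar values are repeated for every row
df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)

print(df2)
print(df2.dtypes)  # every column keeps its own dtype
```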

Output:

```
     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo
```

Viewing data

To view the top and bottom rows of the frame:
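A short sketch of this, assuming a hypothetical frame `df` with six dated rows (the data is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 6 dated rows, 4 integer columns
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

top = df.head()      # first 5 rows by default
bottom = df.tail(3)  # last 3 rows

print(top)
print(bottom)
```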

`DataFrame.to_numpy()` gives a NumPy representation of the underlying data:
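A minimal sketch with made-up data. Note that the index and column labels are not part of the resulting array, and that a frame with mixed column types falls back to the `object` dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})
arr = df.to_numpy()  # plain 2-D ndarray; labels are dropped

# Mixed column types cannot share a numeric dtype, so the result is object
mixed = pd.DataFrame({"A": [1.0], "B": ["x"]}).to_numpy()

print(arr, arr.dtype)
print(mixed, mixed.dtype)
```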

Selection

Getting
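As a sketch of basic getting with `[]`, assuming a hypothetical date-indexed frame `df` (names and values are illustrative):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

col_a = df["A"]                      # a single column, returned as a Series
first_three = df[0:3]                # row slice by position, end excluded
by_date = df["20130102":"20130104"]  # row slice by label, both ends included

print(col_a)
print(first_three)
print(by_date)
```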

.loc is strict when you present slicers that are not compatible (or convertible) with the index type.
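To illustrate the strictness, the sketch below slices a date-indexed frame with integers, which `.loc` rejects with a `TypeError`, and then with date strings, which are convertible to the index type. The frame `dfl` and the flag `raised` are hypothetical names used only for this example.

```python
import numpy as np
import pandas as pd

dfl = pd.DataFrame(
    np.zeros((5, 4)),
    columns=list("ABCD"),
    index=pd.date_range("20130101", periods=5),
)

# Integer slicers are not compatible with a DatetimeIndex
try:
    dfl.loc[2:3]
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Date strings can be converted to timestamps, so this slice works
ok = dfl.loc["20130102":"20130104"]
print(ok)
```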

Selection by label
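Label-based selection goes through `.loc` (and `.at` for a single scalar). A minimal sketch on a hypothetical frame `df`; the data is illustrative:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

row = df.loc[dates[0]]                              # one row, as a Series
cols = df.loc[:, ["A", "B"]]                        # all rows, two columns
sub = df.loc["20130102":"20130104", ["A", "B"]]     # label slices include both ends
scalar = df.at[dates[0], "A"]                       # fast access to one value

print(row)
print(sub)
print(scalar)
```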

Selection by position
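Position-based selection uses `.iloc` (and `.iat` for a single scalar), with the same end-exclusive slicing as Python lists. A minimal sketch on a hypothetical frame `df`:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

fourth_row = df.iloc[3]     # row at integer position 3
block = df.iloc[3:5, 0:2]   # rows 3-4, columns 0-1 (end excluded)
scalar = df.iat[1, 1]       # fast access to one value by position

print(fourth_row)
print(block)
print(scalar)
```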

Boolean indexing
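Boolean indexing filters rows with a mask of `True`/`False` values. A minimal sketch, assuming a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [-1.0, 2.0, -3.0, 4.0], "E": ["one", "two", "three", "four"]}
)

positive = df[df["A"] > 0]                   # rows where column A is positive
picked = df[df["E"].isin(["two", "four"])]   # rows whose E value is in the list

print(positive)
print(picked)
```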

Setting

Setting a new column automatically aligns the data by index.
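A sketch of that alignment, assuming a hypothetical Series `s1` whose dates are shifted by one day relative to the frame. Rows with no matching label get `NaN`, and labels that exist only in the Series are ignored.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=4)
df = pd.DataFrame({"A": [1, 2, 3, 4]}, index=dates)

# s1 starts one day later than df
s1 = pd.Series([10, 20, 30, 40], index=pd.date_range("20130102", periods=4))

df["F"] = s1  # values are matched by index label, not by position

print(df)
```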

Missing data

pandas primarily uses the value np.nan to represent missing data.

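To drop any rows that have missing data:

df1.dropna(how="any")

To fill missing data:

df1.fillna(value=5)

To get a boolean mask of where values are missing:

pd.isna(df1)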
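The three calls can be tried on a small frame. The contents of `df1` below are made up for illustration; none of the calls modifies `df1` in place.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})

dropped = df1.dropna(how="any")  # keeps only rows with no missing values
filled = df1.fillna(value=5)     # replaces every NaN with 5
mask = pd.isna(df1)              # True where a value is missing

print(dropped)
print(filled)
print(mask)
```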

Operations

Stats

Operations in general exclude missing data.

Performing a descriptive statistic:

df.mean()
 
A   -0.004474
B   -0.383981
C   -0.687758
D    5.000000
F    3.000000
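To see that missing data is excluded, the sketch below computes means on a small made-up frame containing one `NaN`. Passing `skipna=False` is shown only as a contrast.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [2.0, 4.0, 6.0]})

means = df.mean()               # column A: (1 + 3) / 2, the NaN is skipped
strict = df.mean(skipna=False)  # column A becomes NaN
row_means = df.mean(axis=1)     # one mean per row instead of per column

print(means)
print(strict)
print(row_means)
```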

more pandas operations: link
