Pandas Examples


Introduction

Following is an example of many features of the Pandas library for the Python language. You may think of it as a cheat sheet of sorts but I'm reluctant to call it that because to be useful as such a cheat sheet should be much smaller. It was developed from personal notes accumulated over many years but just released publicaly now for the benefit of my grandson.

You can reach me using the contact from from one of my other sites: Birdland Contact.
This is also available as a: Markdown file.

Bill Wetzel
24-Apr-2025


Preparatory

Imports

import numpy as np import pandas as pd from pandas import DataFrame, Series import matplotlib import matplotlib.pyplot as plt

Series

Make a Series

s = Series( np.random.rand( 10 ))
0 0.026916 1 0.899089 2 0.820587 3 0.379184 4 0.454606 5 0.629850 6 0.444346 7 0.243404 8 0.125773 9 0.825495 dtype: float64

Show Prior Series

s
0 0.026916 1 0.899089 2 0.820587 3 0.379184 4 0.454606 5 0.629850 6 0.444346 7 0.243404 8 0.125773 9 0.825495 dtype: float64

Make a Series with Named Index

s = Series( np.random.rand( 10 ), ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"] )
a 0.206310 b 0.504339 c 0.680141 d 0.010279 e 0.365725 f 0.597950 g 0.130547 h 0.952285 i 0.683855 j 0.810067 dtype: float64

Index for Series

s.index
Index(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'], dtype='object')

Distinct values (rounded) in Series

s.round( 1 ).value_counts()
0.7 2 0.2 1 0.5 1 0.0 1 0.4 1 0.6 1 0.1 1 1.0 1 0.8 1 Name: count, dtype: int64

Dataframes

Make a Dataframe

df = DataFrame( np.random.rand( 6, 4 ))
0 1 2 3 0 0.722405 0.222979 0.455354 0.728915 1 0.262425 0.625404 0.899462 0.289853 2 0.398877 0.898858 0.480237 0.278750 3 0.686958 0.374376 0.225608 0.708383 4 0.152364 0.652888 0.512709 0.698512 5 0.818334 0.465584 0.550335 0.581760

Columns in DataFrame

df.columns
[0, 1, 2, 3]

Rows (index) of DataFrame

df.index
[0, 1, 2, 3, 4, 5]

Transpose of Dataframe

df.T
0 1 2 3 4 5 0 0.722405 0.262425 0.398877 0.686958 0.152364 0.818334 1 0.222979 0.625404 0.898858 0.374376 0.652888 0.465584 2 0.455354 0.899462 0.480237 0.225608 0.512709 0.550335 3 0.728915 0.289853 0.278750 0.708383 0.698512 0.581760

Description of Dataframe

df.describe()
0 1 2 3 count 6.000000 6.000000 6.000000 6.000000 mean 0.506894 0.540015 0.520617 0.547695 std 0.273125 0.237608 0.217894 0.210419 min 0.152364 0.222979 0.225608 0.278750 25% 0.296538 0.397178 0.461575 0.362829 50% 0.542918 0.545494 0.496473 0.640136 75% 0.713544 0.646017 0.540928 0.705915 max 0.818334 0.898858 0.899462 0.728915

Information about Dataframe

df.info()
RangeIndex: 6 entries, 0 to 5 Data columns (total 4 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 0 6 non-null float64 1 1 6 non-null float64 2 2 6 non-null float64 3 3 6 non-null float64 dtypes: float64(4) memory usage: 324.0 bytes

Shape of Dataframe

df.shape
(6, 4)

Rounded Dataframe

df.round(1)
0 1 2 3 0 0.7 0.2 0.5 0.7 1 0.3 0.6 0.9 0.3 2 0.4 0.9 0.5 0.3 3 0.7 0.4 0.2 0.7 4 0.2 0.7 0.5 0.7 5 0.8 0.5 0.6 0.6

Value Counts of Rounded Dataframe

df.round(1).value_counts()
0 1 2 3 0.2 0.7 0.5 0.7 1 0.3 0.6 0.9 0.3 1 0.4 0.9 0.5 0.3 1 0.7 0.2 0.5 0.7 1 0.4 0.2 0.7 1 0.8 0.5 0.6 0.6 1 Name: count, dtype: int64

Unique Value Counts of Rounded Dataframe

df.round(1).nunique()
0 5 1 6 2 4 3 3 dtype: int64

Rounded Selected Column

df[2].round(1)
0 0.5 1 0.9 2 0.5 3 0.2 4 0.5 5 0.6 Name: 2, dtype: float64

Value Counts of Rounded Selected Column

df[2].round(1).value_counts()
2 0.5 3 0.9 1 0.2 1 0.6 1 Name: count, dtype: int64

Unique Value Counts of Rounded Selected Column

df[2].round(1).nunique()
4

Get Array from DataFrame for Faster Iteration

df.values
[[0.72240536 0.22297917 0.45535418 0.72891476] [0.26242474 0.62540412 0.8994617 0.28985255] [0.39887738 0.89885752 0.48023714 0.27875013] [0.68695812 0.37437587 0.22560782 0.70838271] [0.15236431 0.65288795 0.51270861 0.69851235] [0.81833438 0.46558359 0.55033471 0.58176026]]

Assign Dataframe Index to Variable

idx = df.index
[0, 1, 2, 3, 4, 5]

The following two examples are explained in detail in a later section.

Drop Rows Meeting Condition in Specified Column

df.drop( df[ df[2] < .5].index, axis=0 )
0 1 2 3 1 0.262425 0.625404 0.899462 0.289853 4 0.152364 0.652888 0.512709 0.698512 5 0.818334 0.465584 0.550335 0.581760

Drop Columns Meeting Condition in Specified Row

df.drop( df.loc[ :, df.iloc[ 2, : ] < .5 ].columns, axis=1 )
1 0 0.222979 1 0.625404 2 0.898858 3 0.374376 4 0.652888 5 0.465584

A Few Properties of DataFrame Index

idx.is_monotonic_increasing
True

idx.is_monotonic_decreasing
False

idx.has_duplicates
False

idx.nlevels
1

idx.values
[0 1 2 3 4 5]

idx.tolist()
[0, 1, 2, 3, 4, 5]

idx.min()
0

idx.max()
5

Make a DataFrame from Argument List with Labeled Rows and Columns

df = DataFrame( [[ 1, 2, 3 ], [4, 5, 6], [7, 8, 9]], index = [ 'R1', 'R2', 'R3' ], columns = ['C1', 'C2', 'C3'] )
C1 C2 C3 R1 1 2 3 R2 4 5 6 R3 7 8 9

Assign CSV Data to a Variable

csv_data = """Color, Code, Alt-Code R1, Red, #ff0000, 255/0/0 R2, Green, #00ff00, 0/255/0 R3, Blue, #0000ff, 0/0/255"""

Make a DataFrame from CSV Data with Labeled Rows and Columns

df = pd.read_csv( StringIO( csv_data ), header = 0, index_col = 0, skipinitialspace = True )
Color Code Alt-Code R1 Red #ff0000 255/0/0 R2 Green #00ff00 0/255/0 R3 Blue #0000ff 0/0/255

Save DataFrame for Later Reference

df_colors = df
Color Code Alt-Code R1 Red #ff0000 255/0/0 R2 Green #00ff00 0/255/0 R3 Blue #0000ff 0/0/255

Assign Dictionary to a Variable

dict_data = { 'c0' : [ 1, 2, 3, 4, 5 ], 'c1' : [ 2, 3, 4, 5, 6 ], 'c2' : [ 3, 4, 5, 6, 7 ] }

Make a DataFrame from Dictionary with Labeled Columns

df = DataFrame( dict_data )
c0 c1 c2 0 1 2 3 1 2 3 4 2 3 4 5 3 4 5 6 4 5 6 7

Assign Dictionary to a Variable

dict_data = { 'r0' : [ 1, 2, 3, 4, 5 ], 'r1' : [ 2, 3, 4, 5, 6 ], 'r2' : [ 3, 4, 5, 6, 7 ] }

Make a DataFrame from Dictionary with Labeled Rows

df = DataFrame.from_dict( dict_data, orient='index' )
0 1 2 3 4 r0 1 2 3 4 5 r1 2 3 4 5 6 r2 3 4 5 6 7

Assign Dictionary to a Variable

# global dict_data dict_data = { 'r0' : { 'c0' : 1, 'c1' : 2, 'c2' : 3, 'c3' : 4, 'c4' : 5 }, 'r1' : { 'c0' : 2, 'c1' : 3, 'c2' : 4, 'c3' : 5, 'c4' : 6 }, 'r2' : { 'c0' : 3, 'c1' : 4, 'c2' : 5, 'c3' : 6, 'c4' : 7 } }

Make a DataFrame from Dictionary with Labeled Rows and Columns

df = DataFrame.from_dict( dict_data, orient='index' )
c0 c1 c2 c3 c4 r0 1 2 3 4 5 r1 2 3 4 5 6 r2 3 4 5 6 7

Convert DataFrame to String

df.to_string()
c0 c1 c2 c3 c4 r0 1 2 3 4 5 r1 2 3 4 5 6 r2 3 4 5 6 7

Iterate Over Data

Make a DataFrame

df = DataFrame( [[ 1, 2 ], [3, 4]], index = [ 'R1', 'R2' ], columns = ['C1', 'C2'] )
C1 C2 R1 1 2 R2 3 4

Iterate on Columns

for n, s in df.items(): # df.iteritems() deprecated print("n:", n) print("s:") print(s) print()
n: C1 s: R1 1 R2 3 Name: C1, dtype: int64 n: C2 s: R1 2 R2 4 Name: C2, dtype: int64

Iterate on Rows

for n, s in df.iterrows(): print("n:", n) print("s:") print(s) print()
n: R1 s: C1 1 C2 2 Name: R1, dtype: int64 n: R2 s: C1 3 C2 4 Name: R2, dtype: int64

Iterate on .values for Fast Access

for row in df.values: print("row:", row) for v in row: print(" v:", v)
row: [1 2] v: 1 v: 2 row: [3 4] v: 3 v: 4

Series Indexing

Make a Series

s = Series( np.random.rand( 5 ), index = ['x0', 'x1', 'x2', 'x3', 'x4' ] )
x0 0.909917 x1 0.909049 x2 0.627205 x3 0.061164 x4 0.574769 dtype: float64

Index by Label

s[ 'x2' ]
0.6272046954527306

Index by Attribute

s.x2
0.6272046954527306

Index by Label with .loc

s.loc[ 'x2' ]
0.6272046954527306

Index by Integer with .iloc

s.iloc[ 2 ]
0.6272046954527306

DataFrame Indexing

Fetch Saved DataFrame

df = df_colors
Color Code Alt-Code R1 Red #ff0000 255/0/0 R2 Green #00ff00 0/255/0 R3 Blue #0000ff 0/0/255

Index Column by Label

df[ 'Color' ]
R1 Red R2 Green R3 Blue Name: Color, dtype: object

Index Column by Attribute

df.Color
R1 Red R2 Green R3 Blue Name: Color, dtype: object

Index Column by Integer with .iloc

df.iloc[ :, 2 ]
R1 255/0/0 R2 0/255/0 R3 0/0/255 Name: Alt-Code, dtype: object

Index Row by Label with .loc

df.loc[ 'R1' ]
Color Red Code #ff0000 Alt-Code 255/0/0 Name: R1, dtype: object

Index Row by Integer with .iloc

df.iloc[ 2 ]
Color Blue Code #0000ff Alt-Code 0/0/255 Name: R3, dtype: object

Index Row/Col by Labels with .loc

df.loc[ 'R1', 'Color' ]
Red

Index Row/Col by Integers with .iloc

df.iloc[ 1, 1 ]
#00ff00

Index by Contents of String

df.query( 'Color == "Green"' )
Color Code Alt-Code R2 Green #00ff00 0/255/0

Boolean Indexing

Make a Dataframe

df = DataFrame( np.random.rand( 6, 4 ))
0 1 2 3 0 0.189668 0.005349 0.512844 0.247548 1 0.533455 0.262488 0.751297 0.399315 2 0.169302 0.351072 0.083203 0.037990 3 0.125548 0.468077 0.056051 0.152374 4 0.171021 0.557780 0.603718 0.810708 5 0.606764 0.345117 0.337621 0.250499

Make Boolean Series on Rows

df[2] < .5
0 False 1 False 2 True 3 True 4 False 5 True Name: 2, dtype: bool

Index Rows by Boolean Series

df[ df[2] < .5 ]
0 1 2 3 2 0.169302 0.351072 0.083203 0.037990 3 0.125548 0.468077 0.056051 0.152374 5 0.606764 0.345117 0.337621 0.250499

Index Rows by Boolean Series

df[ df[2] > .5 ]
0 1 2 3 0 0.189668 0.005349 0.512844 0.247548 1 0.533455 0.262488 0.751297 0.399315 4 0.171021 0.557780 0.603718 0.810708

Make Boolean Series on Columns

df.iloc[:,2] < .5
0 False 1 False 2 True 3 True 4 False 5 True Name: 2, dtype: bool

Index Columns by Boolean Series

df[ df.iloc[:,2] < .5 ]
0 1 2 3 2 0.169302 0.351072 0.083203 0.037990 3 0.125548 0.468077 0.056051 0.152374 5 0.606764 0.345117 0.337621 0.250499

Index Columns by Boolean Series

df[ df.iloc[:,2] > .5 ]
0 1 2 3 0 0.189668 0.005349 0.512844 0.247548 1 0.533455 0.262488 0.751297 0.399315 4 0.171021 0.557780 0.603718 0.810708

Operations on Series / Dataframes

Get Series from Saved Dataframe

s = df_colors[ 'Color' ]
R1 Red R2 Green R3 Blue Name: Color, dtype: object

map() - execute function on each element of series

s.map( lambda x: print( x ))
Red Green Blue

apply() - execute function on each column of dataframe

df_colors.apply( lambda x: print( x ), axis=0 )
R1 Red R2 Green R3 Blue Name: Color, dtype: object R1 #ff0000 R2 #00ff00 R3 #0000ff Name: Code, dtype: object R1 255/0/0 R2 0/255/0 R3 0/0/255 Name: Alt-Code, dtype: object

apply() - execute function on each row of dataframe

df_colors.apply( lambda x: print( x ), axis=1 )
Color Red Code #ff0000 Alt-Code 255/0/0 Name: R1, dtype: object Color Green Code #00ff00 Alt-Code 0/255/0 Name: R2, dtype: object Color Blue Code #0000ff Alt-Code 0/0/255 Name: R3, dtype: object

map() - execute function on each element of dataframe

df_colors.map( lambda x: print( x ))
Red Green Blue #ff0000 #00ff00 #0000ff 255/0/0 0/255/0 0/0/255

Sorting Series

Make Series

s = Series( np.random.rand( 5 ), index=[ 'Ix', 'Ie', 'Ia', 'It', 'Io'])
Ix 0.413698 Ie 0.571940 Ia 0.670323 It 0.909708 Io 0.307946 dtype: float64

Sort by Index

s.sort_index()
Ia 0.670323 Ie 0.571940 Io 0.307946 It 0.909708 Ix 0.413698 dtype: float64

Sort by Value

s.sort_values()
Io 0.307946 Ix 0.413698 Ie 0.571940 Ia 0.670323 It 0.909708 dtype: float64

Sorting Dataframe

Make Dataframe

df = DataFrame( np.random.rand( 6, 4 ), index=['Rc', 'Ra', 'Rx', 'Rj', 'Rw', 'Rn' ], columns = ['Ce', 'Cr', 'Ci', 'Cq' ] )
Ce Cr Ci Cq Rc 0.275871 0.886893 0.729471 0.323904 Ra 0.086272 0.128443 0.756100 0.417623 Rx 0.052068 0.286391 0.072795 0.913555 Rj 0.916973 0.653368 0.643975 0.902830 Rw 0.411599 0.386706 0.118761 0.317172 Rn 0.975480 0.375994 0.468117 0.317176

Sort by Row Index

df.sort_index()
Ce Cr Ci Cq Ra 0.086272 0.128443 0.756100 0.417623 Rc 0.275871 0.886893 0.729471 0.323904 Rj 0.916973 0.653368 0.643975 0.902830 Rn 0.975480 0.375994 0.468117 0.317176 Rw 0.411599 0.386706 0.118761 0.317172 Rx 0.052068 0.286391 0.072795 0.913555

Sort by Column Index

df.sort_index( axis=1 )
Ce Ci Cq Cr Rc 0.275871 0.729471 0.323904 0.886893 Ra 0.086272 0.756100 0.417623 0.128443 Rx 0.052068 0.072795 0.913555 0.286391 Rj 0.916973 0.643975 0.902830 0.653368 Rw 0.411599 0.118761 0.317172 0.386706 Rn 0.975480 0.468117 0.317176 0.375994

Sort by Column Value

df.sort_values( by='Cr' )
Ce Cr Ci Cq Ra 0.086272 0.128443 0.756100 0.417623 Rx 0.052068 0.286391 0.072795 0.913555 Rn 0.975480 0.375994 0.468117 0.317176 Rw 0.411599 0.386706 0.118761 0.317172 Rj 0.916973 0.653368 0.643975 0.902830 Rc 0.275871 0.886893 0.729471 0.323904

Sort by Row Value

df.sort_values( by='Rj', axis=1 )
Ci Cr Cq Ce Rc 0.729471 0.886893 0.323904 0.275871 Ra 0.756100 0.128443 0.417623 0.086272 Rx 0.072795 0.286391 0.913555 0.052068 Rj 0.643975 0.653368 0.902830 0.916973 Rw 0.118761 0.386706 0.317172 0.411599 Rn 0.468117 0.375994 0.317176 0.975480

Summarize Functions on Dataframe

Make Dataframe

df = DataFrame( np.random.rand( 6, 4 ))
0 1 2 3 0 0.269477 0.094294 0.495846 0.815584 1 0.338004 0.827806 0.834389 0.421820 2 0.265565 0.832254 0.895707 0.674183 3 0.895362 0.279115 0.000134 0.884850 4 0.299228 0.042413 0.903832 0.588768 5 0.822910 0.451138 0.526698 0.213542

Sum

df.sum()
0 2.890545 1 2.527020 2 3.656606 3 3.598748 dtype: float64

Count

df.count()
0 6 1 6 2 6 3 6 dtype: int64

Median

df.median()
0 0.318616 1 0.365126 2 0.680543 3 0.631475 dtype: float64

Quantile

df.quantile([0.25, 0.5, 0.75])
0 1 2 3 0.25 0.276915 0.140499 0.503559 0.463557 0.50 0.318616 0.365126 0.680543 0.631475 0.75 0.701684 0.733639 0.880377 0.780234

Min

df.min()
0 0.265565 1 0.042413 2 0.000134 3 0.213542 dtype: float64

Max

df.max()
0 0.895362 1 0.832254 2 0.903832 3 0.884850 dtype: float64

Mean

df.mean()
0 0.481758 1 0.421170 2 0.609434 3 0.599791 dtype: float64

Var

df.var()
0 0.086648 1 0.121145 2 0.122044 3 0.062869 dtype: float64

Std

df.std()
0 0.294360 1 0.348059 2 0.349348 3 0.250736 dtype: float64

Variations of Summarize Functions Shown for sum()

Sum Columns

df.sum()
0 2.890545 1 2.527020 2 3.656606 3 3.598748 dtype: float64

Sum Rows

df.sum( axis=1 )
0 1.675201 1 2.422020 2 2.667709 3 2.059461 4 1.834240 5 2.014288 dtype: float64

Sum Entire Dataframe

df.sum( axis=1 ).sum()
12.6729190770787

Histograms

Make Dataframe

df = DataFrame( np.random.normal( loc = 5, scale = 2, size = [1000, 4] ), dtype = float ).round()
0 1 2 3 0 9.0 7.0 3.0 8.0 1 3.0 6.0 2.0 8.0 2 6.0 4.0 4.0 2.0 3 6.0 2.0 9.0 3.0 4 4.0 7.0 6.0 11.0 .. ... ... ... ... 995 5.0 6.0 4.0 1.0 996 1.0 5.0 4.0 5.0 997 5.0 1.0 4.0 3.0 998 4.0 6.0 6.0 2.0 999 7.0 5.0 5.0 4.0 [1000 rows x 4 columns]

Histogram using pd.value_counts

s = df.apply( lambda x: x.value_counts()).fillna(0)
0 1 2 3 -3.0 0.0 1.0 0.0 0.0 -2.0 1.0 0.0 2.0 0.0 -1.0 6.0 1.0 2.0 6.0 -0.0 8.0 8.0 8.0 5.0 1.0 28.0 25.0 33.0 26.0 2.0 65.0 70.0 67.0 71.0 3.0 128.0 111.0 123.0 112.0 4.0 158.0 158.0 177.0 177.0 5.0 176.0 201.0 202.0 218.0 6.0 174.0 182.0 166.0 157.0 7.0 140.0 121.0 120.0 116.0 8.0 77.0 91.0 65.0 71.0 9.0 32.0 24.0 23.0 30.0 10.0 5.0 5.0 11.0 8.0 11.0 2.0 2.0 1.0 3.0
Embedded Image

Histogram Using Pandas .hist()

h = df.hist( sharex=True, sharey=True, linewidth=1, figsize=(Fig_X, Fig_Y) )
[[ ] [ ]]
Embedded Image

Get Series from Column 0 of Dataframe

s = df.iloc[ :, 0].values
[ 9. 3. 6. 6. 4. 2. 6. 3. 2. 7. 6. 9. 6. 5. 4. 9. 8. 7. 4. 2. 3. 5. 8. 6. 3. 6. 3. 6. 5. 2. 6. 4. 6. 2. 7. 4. 6. 6. 7. 8. 8. 4. 8. 9. 4. 5. 3. 5. 8. 5. -1. 5. 7. 5. 3. 6. 4. 7. 5. 3. 4. 5. 5. 4. 7. 4. 4. 2. 4. 2. 1. 8. 7. 5. 4. 5. 7. 4. 3. 2. 4. 8. 6. 4. 4. 8. 8. 7. 7. 5. 3. 2. 6. 5. 7. 2. 3. 6. 7. 4. 3. 4. 6. 8. -0. 6. 8. 6. 3. 4. 9. 8. 7. 5. 6. 8. 6. 6. 2. 2. 3. 4. 4. 6. 7. 7. 2. 7. 2. 4. 7. 2. 6. 9. 4. 6. 6. 7. -0. 4. 6. 7. 6. 5. 5. 3. 6. 5. 7. 5. 6. 6. 3. 7. 3. 5. 3. 4. 4. 8. 4. 9. 7. 1. 3. 7. 2. 1. 2. 6. 3. 5. 6. 6. 4. 3. 7. 4. 6. 4. 6. 5. 6. 6. 5. 8. 7. 5. 6. 3. -1. 6. 5. 5. 6. 6. 7. 6. 6. 6. 8. 7. 6. 6. 7. 2. 4. 7. 1. 5. 8. 1. 3. 7. 5. 3. 2. 3. 2. 5. 5. 5. 3. 5. 8. 8. 7. 4. 7. 7. 3. 5. 10. 9. 8. 7. 6. 9. 7. 5. 5. 6. 4. 6. 6. 4. 5. -1. 6. 4. 5. 4. 3. 4. 1. 3. 4. 7. 3. 9. 5. 6. 5. 6. 3. 3. 4. 7. 5. 4. 5. 4. 4. 7. 5. 5. 6. 6. 7. 5. 5. 6. 5. 4. 6. 6. 8. 1. 1. 9. 4. 2. 4. 7. 6. 6. 5. 6. 6. 5. 6. 5. 4. 5. 2. 3. 2. 8. 6. 5. 6. 6. 9. 0. 4. 5. 6. 3. 6. 5. 7. 9. 3. 2. 3. 6. 5. 4. 7. 4. 7. 7. 6. 4. 9. 5. 5. 6. 8. 5. 8. 5. 11. 8. 5. 7. 2. 5. 5. 6. 5. 2. 3. 9. 8. 7. 3. 4. 3. 3. 6. 3. 3. 5. 7. 5. 5. 3. 7. 2. 7. 7. 6. 2. 7. 6. 5. 3. 5. 4. 3. 7. 5. 3. 5. 8. 5. 3. 4. 6. 6. 5. 4. 4. 3. 7. 7. 4. 4. 7. 7. 5. 8. 3. 8. 4. 3. 4. 2. 7. 5. 6. 5. 4. 7. 5. -1. 8. 9. 8. 8. 7. 4. 3. 8. 5. 6. 4. 4. 3. 4. 7. 4. 4. 7. 6. 8. 4. 5. 3. 4. 7. 5. 4. 4. 8. 2. 5. 5. 7. 8. 3. 3. 8. 6. 4. 3. 9. 4. 5. 4. 4. 3. 7. 9. 2. 4. 7. 6. 6. 6. 3. 5. 2. 4. 4. 5. 6. 8. 8. 0. 3. -1. 4. 2. 3. 5. 7. 9. 7. 8. 4. 6. 4. 6. 4. 3. 1. 3. 4. 4. 5. 4. 5. 5. 3. 5. 3. 3. 8. 3. 5. 3. 5. 5. 7. 1. 7. 7. 3. 5. 3. 7. 5. 4. 5. 3. 6. 4. 6. 6. 6. 3. 6. 10. 5. 7. 5. 6. 3. 6. 4. 1. 6. 4. 7. 2. 6. 6. 6. 4. 4. 6. 3. 5. 6. 1. 3. 5. 5. 3. 7. 8. 2. 4. 8. 7. 4. 4. 5. 4. 3. 5. 6. 6. 6. 5. 7. 3. 7. 3. 2. 5. 8. -1. 4. 6. 4. 5. 3. 6. 7. 4. 4. 6. 8. 4. 2. 8. 4. 7. 3. 2. 4. 7. 3. 6. 2. 7. 5. 6. 6. 7. 5. 5. 3. 7. 5. 4. 5. 4. 3. 7. -0. 7. 5. 5. 4. 8. 2. 2. 4. 2. 3. 4. 6. 4. 2. 5. 7. 6. 7. 8. 8. 6. 7. 5. 8. 4. 3. 2. 5. 9. 8. 6. 7. 5. 8. 5. 3. 2. 4. 5. 3. 5. 2. 7. 5. 5. 2. 7. 6. 6. 7. 3. 7. 5. 4. 8. 4. 8. 9. 9. 10. 6. 1. 6. 5. 0. 5. 7. 3. 4. 3. 8. 5. 4. 4. 4. 3. 10. 3. 4. 4. 11. 7. 2. 5. 7. 5. 7. 4. 1. 3. 5. 3. 6. 3. 9. 2. 5. 3. 6. 7. 6. 8. 8. 6. 2. 6. 3. 2. 4. 5. 7. 0. 4. 8. 5. 2. 6. 9. 7. 7. 2. 4. 2. 9. 3. 4. 6. 6. 3. 5. 2. 8. -2. 7. 4. 6. 5. 4. 7. 3. 3. 5. 7. 7. 5. 5. 6. 6. 2. 6. 2. 6. 3. 8. 7. 8. 1. 7. 3. 4. 7. 8. 6. 4. 3. 1. 7. 7. 3. 7. 6. 5. 7. 3. 5. 6. 3. 9. 3. 2. 9. 8. 4. 3. 4. 6. 7. 3. 3. 3. 6. 1. 9. 5. 6. 3. 5. 5. 7. 3. 4. 3. 7. 4. 7. 5. 5. 6. 5. 6. 4. 4. 3. 9. 2. 5. 6. 7. 8. 6. 10. 7. 5. 5. 6. 7. 6. 3. 6. 4. 7. 9. 6. 5. 1. 5. 6. 6. 4. 4. 3. 3. 7. 2. 5. 9. 4. 7. 8. 4. 7. 5. 6. 6. 1. 5. 5. 3. 6. 4. 3. 4. 7. 7. 4. 5. 6. 2. 7. 2. 7. 6. 6. 7. 6. 6. 6. 4. 8. 6. 1. 4. 7. 4. 7. 7. 4. 4. 5. 6. 7. 4. 4. 2. 5. 5. 8. 8. 6. 3. 5. 0. 6. 8. 6. 1. 5. 5. 3. 5. 1. 4. 7. 7. 4. 6. 5. 7. 5. 5. 1. 6. 5. 7. 6. 8. 7. 4. 6. 6. 1. 8. 6. 8. 2. 6. 5. 5. 6. 6. 3. 5. 2. 1. 3. 9. 6. 5. 4. 5. 4. 6. 5. 5. 1. 4. 4. 6. 8. 4. 3. 8. 3. 3. 8. 7. 5. 5. 1. 8. 6. 5. 1. 5. 4. 7.]
Embedded Image

Histogram using np.histogram()

h = np.histogram( s )
(array([ 7, 8, 28, 193, 158, 176, 314, 77, 32, 7]), array([-2. , -0.7, 0.6, 1.9, 3.2, 4.5, 5.8, 7.1, 8.4, 9.7, 11. ]))
Embedded Image

Miscellaneous

Fetch Saved DataFrame

df = df_colors
Color Code Alt-Code R1 Red #ff0000 255/0/0 R2 Green #00ff00 0/255/0 R3 Blue #0000ff 0/0/255

Get JSON from Dataframe

df.to_json( orient='records', lines=True )
{"Color":"Red","Code":"#ff0000","Alt-Code":"255\/0\/0"} {"Color":"Green","Code":"#00ff00","Alt-Code":"0\/255\/0"} {"Color":"Blue","Code":"#0000ff","Alt-Code":"0\/0\/255"}

Series with MultiIndex

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']] tuples = list(zip(*arrays)) index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second']) s = pd.Series(np.random.randn(8), index=index) print(s)
first second bar one 0.082317 two -0.291566 baz one -0.849155 two -0.374036 foo one 3.025427 two -0.671678 qux one -0.698915 two 0.380036 dtype: float64

Dataframe with MultiIndex

df = DataFrame( np.arange(12).reshape((4,3)), index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2 ]], columns = [['NJ', 'Ohio', 'Colorado' ]] )
NJ Ohio Colorado a 1 0 1 2 2 3 4 5 b 1 6 7 8 2 9 10 11

Show Dataframe with MultiIndex

for x, ndf in df.groupby(level=0): print(ndf) print("x:", x)
NJ Ohio Colorado a 1 0 1 2 2 3 4 5 x: a NJ Ohio Colorado b 1 6 7 8 2 9 10 11 x: b

Make a Series

s = pd.Series( { 'I1': '1234', 'I2': '2345', 'I3': '3456' } )
I1 1234 I2 2345 I3 3456 dtype: object

Series to DataFrame

df = s.to_frame()
0 I1 1234 I2 2345 I3 3456

DataFrame Transposed

df.T
I1 I2 I3 0 1234 2345 3456

Pivot Tables

Make a Dataframe with Named Columns

df = pd.DataFrame( [ ["Alice", "North", 200 ], ["Alice", "South", 150 ], ["Bob", "East", 100 ], ["Bob", "West", 100 ], ["Charlie", "North", 120 ], ["Charlie", "West", 200 ], ], columns = [ "Agent", "Region", "Sales" ] )
Agent Region Sales 0 Alice North 200 1 Alice South 150 2 Bob East 100 3 Bob West 100 4 Charlie North 120 5 Charlie West 200

Pivot

df.pivot(index='Agent', columns='Region', values='Sales')
Region East North South West Agent Alice NaN 200.0 150.0 NaN Bob 100.0 NaN NaN 100.0 Charlie NaN 120.0 NaN 200.0

Pivot

df.pivot_table(index='Agent', columns='Region', values='Sales')
Region East North South West Agent Alice NaN 200.0 150.0 NaN Bob 100.0 NaN NaN 100.0 Charlie NaN 120.0 NaN 200.0

Concatenation

Make Dataframes

df1 = DataFrame( np.random.rand( 2, 3 ))
0 1 2 0 0.440904 0.256521 0.587847 1 0.306891 0.165160 0.803135
df2 = DataFrame( np.random.rand( 4, 3 ))
0 1 2 0 0.130194 0.769418 0.654715 1 0.911438 0.554571 0.296708 2 0.862839 0.067075 0.131169 3 0.836283 0.491386 0.153753
df3 = DataFrame( np.random.rand( 2, 4 ))
0 1 2 3 0 0.494061 0.619880 0.435683 0.047196 1 0.083486 0.893137 0.880059 0.356199

Concatenation Dataframes Vertically

pd.concat( [df1, df2] )
0 1 2 0 0.440904 0.256521 0.587847 1 0.306891 0.165160 0.803135 0 0.130194 0.769418 0.654715 1 0.911438 0.554571 0.296708 2 0.862839 0.067075 0.131169 3 0.836283 0.491386 0.153753

Concatenation Dataframes Horizontally

pd.concat( [df1, df3], axis=1 )
0 1 2 0 1 2 3 0 0.440904 0.256521 0.587847 0.494061 0.619880 0.435683 0.047196 1 0.306891 0.165160 0.803135 0.083486 0.893137 0.880059 0.356199

Make a Dataframe

df = DataFrame( np.random.rand( 2, 3 ))
0 1 2 0 0.576904 0.881436 0.335952 1 0.026668 0.814627 0.499610

Make a Series

s = Series( np.random.rand( 3 ))
0 0.821551 1 0.306187 2 0.408363 dtype: float64

Concat Dataframe and Series

pd.concat( [df, s.to_frame().T ] )
0 1 2 0 0.576904 0.881436 0.335952 1 0.026668 0.814627 0.499610 0 0.821551 0.306187 0.408363

Incremental, Detailed Development of Selected Concepts

With some overlap of other examples and in excruciating detail we show here the development of selectors for drop and change. All with integer indexing.


Incremental Development of Row Selection

Make a Dataframe

df = DataFrame( np.random.rand( 6, 4 ))
0 1 2 3 0 0.589165 0.094179 0.057888 0.285678 1 0.397765 0.302189 0.626236 0.032388 2 0.097558 0.984696 0.597267 0.897242 3 0.985481 0.093285 0.102098 0.879075 4 0.543510 0.030003 0.685510 0.420352 5 0.846434 0.647452 0.306411 0.429671

Make Explicit Boolean Row Index

bi = [True, False, True, False, True, False]
[True, False, True, False, True, False]

Dataframe Matching Boolean Row Index

df[ bi ]
0 1 2 3 0 0.589165 0.094179 0.057888 0.285678 2 0.097558 0.984696 0.597267 0.897242 4 0.543510 0.030003 0.685510 0.420352

Drop Test Column

df[2]
0 0.057888 1 0.626236 2 0.597267 3 0.102098 4 0.685510 5 0.306411 Name: 2, dtype: float64

Drop Row Condition

df[2] < .5
0 True 1 False 2 False 3 True 4 False 5 True Name: 2, dtype: bool

Dataframe Matching Drop Condition

df[ df[2] < .5 ]
0 1 2 3 0 0.589165 0.094179 0.057888 0.285678 3 0.985481 0.093285 0.102098 0.879075 5 0.846434 0.647452 0.306411 0.429671

Dataframe Matching Drop Condition Index

df[ df[2] < .5 ].index
Index([0, 3, 5], dtype='int64')

Dataframe Matching Drop Condition Index with List Expansion

df[ df[2] < .5 ].index
[0, 3, 5]

Drop Rows Meeting Column Condition

df.drop( df[ df[2] < .5].index, axis=0 )
0 1 2 3 1 0.397765 0.302189 0.626236 0.032388 2 0.097558 0.984696 0.597267 0.897242 4 0.543510 0.030003 0.685510 0.420352

Change Column Values Meeting Condition

df.iloc[ df[2] < .5, 2 ] = 99999

Show Updated Dataframe

df
0 1 2 3 0 0.589165 0.094179 99999.000000 0.285678 1 0.397765 0.302189 0.626236 0.032388 2 0.097558 0.984696 0.597267 0.897242 3 0.985481 0.093285 99999.000000 0.879075 4 0.543510 0.030003 0.685510 0.420352 5 0.846434 0.647452 99999.000000 0.429671

Incremental Development of Column Selection

Make a Dataframe

df = DataFrame( np.random.rand( 6, 4 ))
0 1 2 3 0 0.025985 0.756718 0.706298 0.717380 1 0.540794 0.097740 0.087657 0.529131 2 0.316994 0.539271 0.035169 0.362495 3 0.247725 0.168063 0.758273 0.332343 4 0.127355 0.355506 0.956690 0.128359 5 0.189782 0.626763 0.342883 0.288338

Make Explicit Boolean Column Index

bi = [True, False, True, False ]
[True, False, True, False]

Dataframe Matching Explicit Boolean Column Index

df.iloc[ :, bi ]
0 2 0 0.025985 0.706298 1 0.540794 0.087657 2 0.316994 0.035169 3 0.247725 0.758273 4 0.127355 0.956690 5 0.189782 0.342883

Drop Test Row

df.iloc[2]
0 0.316994 1 0.539271 2 0.035169 3 0.362495 Name: 2, dtype: float64

Drop Column Condition

df.iloc[ 2, : ] < .5
0 True 1 False 2 True 3 True Name: 2, dtype: bool

Dataframe Matching Drop Condition

df.loc[ :, df.iloc[ 2, : ] < .5 ]
0 2 3 0 0.025985 0.706298 0.717380 1 0.540794 0.087657 0.529131 2 0.316994 0.035169 0.362495 3 0.247725 0.758273 0.332343 4 0.127355 0.956690 0.128359 5 0.189782 0.342883 0.288338

Dataframe Matching Drop Condition Column Index

df.loc[ :, df.iloc[ 2, : ] < .5 ].columns
Index([0, 2, 3], dtype='int64')

Dataframe Matching Drop Condition Column Index with List Expansion

df.loc[ :, df.iloc[ 2, : ] < .5 ].columns
[0, 2, 3]

Drop Columns Meeting Condition

df.drop( df.loc[ :, df.iloc[ 2, : ] < .5 ].columns, axis=1 )
1 0 0.756718 1 0.097740 2 0.539271 3 0.168063 4 0.355506 5 0.626763

Change Column Values Meeting Condition

df.loc[ :, df.iloc[ 2, : ] < .5 ] = 99999

Show Updated Dataframe

df
0 1 2 3 0 99999.0 0.756718 99999.0 99999.0 1 99999.0 0.097740 99999.0 99999.0 2 99999.0 0.539271 99999.0 99999.0 3 99999.0 0.168063 99999.0 99999.0 4 99999.0 0.355506 99999.0 99999.0 5 99999.0 0.626763 99999.0 99999.0

Arithmetic on Dataframes

Make a Dataframe

df = DataFrame( np.random.rand( 6, 4 ))
0 1 2 3 0 0.028382 0.921065 0.317498 0.942125 1 0.038418 0.089952 0.807904 0.532124 2 0.575442 0.596120 0.466168 0.299828 3 0.525795 0.534371 0.000020 0.763198 4 0.481748 0.024781 0.599480 0.223577 5 0.077146 0.818834 0.655083 0.170712

Make a Series

s = Series( np.random.rand( 4 ))
0 0.754740 1 0.244048 2 0.573210 3 0.644877 dtype: float64

Add a Constant to Every Element of Dataframe

df + 3
0 1 2 3 0 3.028382 3.921065 3.317498 3.942125 1 3.038418 3.089952 3.807904 3.532124 2 3.575442 3.596120 3.466168 3.299828 3 3.525795 3.534371 3.000020 3.763198 4 3.481748 3.024781 3.599480 3.223577 5 3.077146 3.818834 3.655083 3.170712

Add a Series to Every Row of Dataframe

df.add( s, axis='columns' )
0 1 2 3 0 0.783122 1.165113 0.890708 1.587002 1 0.793158 0.334000 1.381114 1.177001 2 1.330182 0.840168 1.039378 0.944705 3 1.280535 0.778419 0.573230 1.408075 4 1.236488 0.268829 1.172690 0.868454 5 0.831886 1.062883 1.228293 0.815589

Add a Series to Every Row of Dataframe

df + s
0 1 2 3 0 0.783122 1.165113 0.890708 1.587002 1 0.793158 0.334000 1.381114 1.177001 2 1.330182 0.840168 1.039378 0.944705 3 1.280535 0.778419 0.573230 1.408075 4 1.236488 0.268829 1.172690 0.868454 5 0.831886 1.062883 1.228293 0.815589

Add a Series to Every Column of Dataframe

df.add( s, axis='index' )
0 1 2 3 0 0.783122 1.675805 1.072238 1.696865 1 0.282467 0.334000 1.051952 0.776172 2 1.148652 1.169330 1.039378 0.873038 3 1.170672 1.179248 0.644897 1.408075 4 NaN NaN NaN NaN 5 NaN NaN NaN NaN

Add two Dataframes

df + df
0 1 2 3 0 0.056763 1.842130 0.634996 1.884250 1 0.076837 0.179903 1.615808 1.064248 2 1.150884 1.192240 0.932336 0.599656 3 1.051590 1.068742 0.000041 1.526397 4 0.963495 0.049562 1.198960 0.447154 5 0.154291 1.637669 1.310166 0.341424

Missing Data

Make a Dataframe

df = DataFrame( np.random.rand( 6, 4 ))
0 1 2 3 0 0.080150 0.082561 0.835291 0.020266 1 0.059493 0.217728 0.871861 0.909545 2 0.795321 0.711423 0.560041 0.984722 3 0.462916 0.819765 0.917924 0.137840 4 0.467936 0.623530 0.952643 0.583052 5 0.331254 0.313015 0.729224 0.525719

Add Missing Data

df.iloc[3,2] = np.nan
df
0 1 2 3 0 0.080150 0.082561 0.835291 0.020266 1 0.059493 0.217728 0.871861 0.909545 2 0.795321 0.711423 0.560041 0.984722 3 0.462916 0.819765 NaN 0.137840 4 0.467936 0.623530 0.952643 0.583052 5 0.331254 0.313015 0.729224 0.525719

Drop Rows Having Missing Data

df.dropna()
0 1 2 3 0 0.080150 0.082561 0.835291 0.020266 1 0.059493 0.217728 0.871861 0.909545 2 0.795321 0.711423 0.560041 0.984722 4 0.467936 0.623530 0.952643 0.583052 5 0.331254 0.313015 0.729224 0.525719

Drop Columns Having Missing Data

df.dropna( axis=1 )
0 1 3 0 0.080150 0.082561 0.020266 1 0.059493 0.217728 0.909545 2 0.795321 0.711423 0.984722 3 0.462916 0.819765 0.137840 4 0.467936 0.623530 0.583052 5 0.331254 0.313015 0.525719

Fill Missing Data

df.fillna( 99 )
0 1 2 3 0 0.080150 0.082561 0.835291 0.020266 1 0.059493 0.217728 0.871861 0.909545 2 0.795321 0.711423 0.560041 0.984722 3 0.462916 0.819765 99.000000 0.137840 4 0.467936 0.623530 0.952643 0.583052 5 0.331254 0.313015 0.729224 0.525719

Replace Values

Make Dataframe

df = DataFrame( np.random.rand( 6, 4 )).round(1)
0 1 2 3 0 0.5 0.5 0.8 0.8 1 0.6 0.7 0.1 0.2 2 0.9 0.1 0.2 0.7 3 1.0 0.1 0.6 0.5 4 1.0 0.7 0.6 1.0 5 0.4 1.0 0.1 0.4

Replace Single Value

df.replace( .3, 99 )
0 1 2 3 0 0.5 0.5 0.8 0.8 1 0.6 0.7 0.1 0.2 2 0.9 0.1 0.2 0.7 3 1.0 0.1 0.6 0.5 4 1.0 0.7 0.6 1.0 5 0.4 1.0 0.1 0.4

Replace Values in List

df.replace( [.2, .3, .4], [99, 98, 97] )
0 1 2 3 0 0.5 0.5 0.8 0.8 1 0.6 0.7 0.1 99.0 2 0.9 0.1 99.0 0.7 3 1.0 0.1 0.6 0.5 4 1.0 0.7 0.6 1.0 5 97.0 1.0 0.1 97.0

Replace Values in Dict

df.replace( {.2 : 97, .3 : 98, .4: 99} )
0 1 2 3 0 0.5 0.5 0.8 0.8 1 0.6 0.7 0.1 97.0 2 0.9 0.1 97.0 0.7 3 1.0 0.1 0.6 0.5 4 1.0 0.7 0.6 1.0 5 99.0 1.0 0.1 99.0

Replace Values with Where

df.where( df < .5, 99 )
0 1 2 3 0 99.0 99.0 99.0 99.0 1 99.0 99.0 0.1 0.2 2 99.0 0.1 0.2 99.0 3 99.0 0.1 99.0 99.0 4 99.0 99.0 99.0 99.0 5 0.4 99.0 0.1 0.4

Adding Columns

Make Dataframe

df = DataFrame( np.random.rand( 6, 4 )).round(1)
0 1 2 3 0 0.5 0.2 0.6 0.4 1 0.2 1.0 0.4 0.2 2 0.0 0.9 0.9 0.9 3 0.8 0.8 0.4 0.8 4 0.1 0.3 0.3 0.9 5 0.6 0.2 0.1 0.8

Add Column with Assign

df.assign( Max = df.max( axis=1 ))
0 1 2 3 Max 0 0.5 0.2 0.6 0.4 0.6 1 0.2 1.0 0.4 0.2 1.0 2 0.0 0.9 0.9 0.9 0.9 3 0.8 0.8 0.4 0.8 0.8 4 0.1 0.3 0.3 0.9 0.9 5 0.6 0.2 0.1 0.8 0.8

Add Column with Explicit Assignment

df['Min'] = df.min( axis=1 ) print(df)
0 1 2 3 Min 0 0.5 0.2 0.6 0.4 0.2 1 0.2 1.0 0.4 0.2 0.2 2 0.0 0.9 0.9 0.9 0.0 3 0.8 0.8 0.4 0.8 0.4 4 0.1 0.3 0.3 0.9 0.1 5 0.6 0.2 0.1 0.8 0.1

Merging Dataframes

Make Dataframes with Labeled Rows and Columns

df1 = DataFrame( [[ 1, 2, 3 ], [4, 5, 6], [7, 8, 9]], index = [ 'R1', 'R2', 'R3' ], columns = ['C1', 'C2', 'C3'] ) print( df1 )
C1 C2 C3 R1 1 2 3 R2 4 5 6 R3 7 8 9
df2 = DataFrame( [[ 10, 2, 3 ], [4, 5, 6], [7, 8, 9]], index = [ 'R1', 'R2', 'R4' ], columns = ['C1', 'C4', 'C5'] ) print( df2 )
C1 C4 C5 R1 10 2 3 R2 4 5 6 R4 7 8 9

Join Using Keys from Left Frame

pd.merge( df1, df2, how='left', on='C1', indicator=True)
C1 C2 C3 C4 C5 _merge 0 1 2 3 NaN NaN left_only 1 4 5 6 5.0 6.0 both 2 7 8 9 8.0 9.0 both

Join Using Keys from Right Frame

pd.merge( df1, df2, how='right', on='C1', indicator=True)
C1 C2 C3 C4 C5 _merge 0 10 NaN NaN 2 3 right_only 1 4 5.0 6.0 5 6 both 2 7 8.0 9.0 8 9 both

Join Using Union of Keys from Both Frames

pd.merge( df1, df2, how='outer', on='C1', indicator=True)
C1 C2 C3 C4 C5 _merge 0 1 2.0 3.0 NaN NaN left_only 1 4 5.0 6.0 5.0 6.0 both 2 7 8.0 9.0 8.0 9.0 both 3 10 NaN NaN 2.0 3.0 right_only

Join Using Intersection of Keys from Both Frames

pd.merge( df1, df2, how='inner', on='C1', indicator=True)
C1 C2 C3 C4 C5 _merge 0 4 5 6 5 6 both 1 7 8 9 8 9 both

Appendix


Version Information

Python

platform.python_version()
3.14.4

Pandas

pd.__version__
2.3.3

Matplotlib

matplotlib.__version__
3.10.9

Internals for the Examples

The examples here use the Python eval() function for expressions and exec() for statements. Expressions have a return value, which is shown following the code example. Statements do not have a return value and show results with calls to print(). Both are executed in the global context. Here are simple examples of eval() and exec().

Assign value to variable with eval()

var1 = 123
123

Show variable

var1
123

Assign value to variable with exec()

var2 = 234 print( var2 )
234

Index

Pandas Examples
Introduction
Preparatory
Series
Dataframes
Iterate Over Data
Series Indexing
DataFrame Indexing
Boolean Indexing
Operations on Series / Dataframes
Sorting Series
Sorting Dataframe
Summarize Functions on Dataframe
Variations of Summarize Functions Shown for sum()
Histograms
Miscellaneous
Pivot Tables
Concatenation
Incremental, Detailed Development of Selected Concepts
Incremental Development of Row Selection
Incremental Development of Column Selection
Arithmetic on Dataframes
Missing Data
Replace Values
Adding Columns
Merging Dataframes
Appendix
Version Information
Internals for the Examples