Series

Series#

Section Title: Series

A Series is a one-dimensional data structure in the Pandas library, distinguished by a labeled axis for rows.

It resembles a column in a table, with row labels referred to as the index.
A Series can contain data of any type.
It supports vectorized operations, allowing efficient computations across rows or columns, and facilitates data visualization.
The Pandas library is typically imported with the abbreviation pd for brevity in code:

import pandas as pd.

Create Series#

You can use a tuple, list, or numpy array to create a Series.

Use pd.Series() to create a series.
When using pd.Series(), you can provide row labels. If they are not provided, default values \(0, 1, 2, \ldots\) are used.
The name parameter allows you to assign a name to the Series.

import pandas as pd

state_tuple = ('NJ', 'NY', 'TX', 'AZ', 'MO')
pd.Series(state_tuple)

  NJ
  NY
  TX
  AZ
  MO
dtype: object

state_list = ['NJ', 'NY', 'TX', 'AZ', 'MO']
pd.Series(state_list)

  NJ
  NY
  TX
  AZ
  MO
dtype: object

import numpy as np
state_array = np.array(['NJ', 'NY', 'TX', 'AZ', 'MO'])
pd.Series(state_array)

  NJ
  NY
  TX
  AZ
  MO
dtype: object

You can provide row labels using the index parameter.

pd.Series( state_list  , index=['S-1', 'S-2', 'S-3', 'S-4', 'S-5'])

S-1    NJ
S-2    NY
S-3    TX
S-4    AZ
S-5    MO
dtype: object

You can assign a name to the Series using the name parameter.

pd.Series( state_list , index=['S-1', 'S-2', 'S-3', 'S-4', 'S-5'], name='States')

S-1    NJ
S-2    NY
S-3    TX
S-4    AZ
S-5    MO
Name: States, dtype: object

Accessing values#

Square brackets, along with the loc and iloc operators, can be used to access values in a Series. - For row labels: Use [row label]. - For row labels: Use loc[row label]. - For row indexes: Use iloc[row index].

We will use the following Series.

state_series = pd.Series( state_list , index=['S-1', 'S-2', 'S-3', 'S-4', 'S-5'], name='States')
state_series

S-1    NJ
S-2    NY
S-3    TX
S-4    AZ
S-5    MO
Name: States, dtype: object

The square brackets can be used to access single or multiple rows using row labels.

# row with label 'S-3'
state_series['S-3']

'TX'

To handle multiple rows, you can use a list.

# rows with labels 'S-3' and 'S-4'
state_series[['S-3','S-4']]

S-3    TX
S-4    AZ
Name: States, dtype: object

The loc operator also can be used to access single or multiple rows using row labels.

# row with label 'S-3'
state_series.loc['S-3']

'TX'

# rows with labels 'S-3' and 'S-4'
state_series.loc[['S-3','S-4']]

S-3    TX
S-4    AZ
Name: States, dtype: object

state_series.iloc[3]

'AZ'

The iloc operator can be used to access single or multiple rows using row indexes.

# row with index 2
state_series.iloc[2]

'TX'

# rows with indexes 2 and 3
state_series.iloc[[2,3]]

S-3    TX
S-4    AZ
Name: States, dtype: object

Slicing#

Square brackets, along with the loc and iloc operators, and the colon (:), can be used to access a slice in a Series.

WARNING: When using square brackets or the loc operator, the value corresponding to the end label is included. However, with the iloc operator, the value corresponding to the end index is not included.

state_series

S-1    NJ
S-2    NY
S-3    TX
S-4    AZ
S-5    MO
Name: States, dtype: object

# rows with labels: 'S-2', 'S-3', 'S-4'
state_series['S-2':'S-4']

S-2    NY
S-3    TX
S-4    AZ
Name: States, dtype: object

# rows with indexes: 1, 2
state_series.iloc[1:3]

S-2    NY
S-3    TX
Name: States, dtype: object

Adding a New Row#

To add a new row, use either of the following methods:

state_series[‘new_row_label’] = new_value
state_series.loc[‘new_row_label’] = new_value

state_series

S-1    NJ
S-2    NY
S-3    TX
S-4    AZ
S-5    MO
Name: States, dtype: object

state_series['S-6'] = 'FL'
state_series

S-1    NJ
S-2    NY
S-3    TX
S-4    AZ
S-5    MO
S-6    FL
Name: States, dtype: object

state_series.loc['S-7'] = 'CA'
state_series

S-1    NJ
S-2    NY
S-3    TX
S-4    AZ
S-5    MO
S-6    FL
S-7    CA
Name: States, dtype: object

Changing a Value#

To change a value in a Series, use any of the following methods:

state_series[‘row_label’] = new_value
state_series.loc[‘row_label’] = new_value
state_series.iloc[row_index] = new_value

The following code changes the value at the index ‘S-7’ to ‘MA’.

state_series.loc['S-7'] = 'MA'
state_series

S-1    NJ
S-2    NY
S-3    TX
S-4    AZ
S-5    MO
S-6    FL
S-7    MA
Name: States, dtype: object

Change the value at the index ‘S-7’ to ‘TN’.

state_series.loc['S-7'] = 'TN'
state_series

S-1    NJ
S-2    NY
S-3    TX
S-4    AZ
S-5    MO
S-6    FL
S-7    TN
Name: States, dtype: object

Change the value at the index 6 to ‘ID’.

state_series.iloc[6] =  'ID'
state_series

S-1    NJ
S-2    NY
S-3    TX
S-4    AZ
S-5    MO
S-6    FL
S-7    ID
Name: States, dtype: object

Changing Row Labels#

The row labels of a Series can be modified by assigning new_row_labels to state_series.index.

The following labels will be the new row labels.

new_labels = ['St-'+str(i) for i in range(1,8)]
new_labels

['St-1', 'St-2', 'St-3', 'St-4', 'St-5', 'St-6', 'St-7']

state_series.index = new_labels
state_series

St-1    NJ
St-2    NY
St-3    TX
St-4    AZ
St-5    MO
St-6    FL
St-7    ID
Name: States, dtype: object

Vectorized Operations#

Vectorized operations can be performed using the values of a Series or between two or more Series.

We will use the following two test grades:

test1_grades = pd.Series( [90, 85, 70, 80, 60, 60],   index = ['Jack', 'Joe', 'Amy', 'Ted', 'Mia', 'Ben'], name='Test-1')
test1_grades 

Jack    90
Joe     85
Amy     70
Ted     80
Mia     60
Ben     60
Name: Test-1, dtype: int64

test2_grades = pd.Series( [50, 70, 30, 90, 95, 75],   index = ['Jack', 'Joe', 'Amy', 'Ted', 'Mia', 'Ben'], name='Test-2')
test2_grades 

Jack    50
Joe     70
Amy     30
Ted     90
Mia     95
Ben     75
Name: Test-2, dtype: int64

Multiply each Test-1 value by 2

2*test1_grades

Jack    180
Joe     170
Amy     140
Ted     160
Mia     120
Ben     120
Name: Test-1, dtype: int64

Divide each Test-1 value by 100

test1_grades/100

Jack    0.90
Joe     0.85
Amy     0.70
Ted     0.80
Mia     0.60
Ben     0.60
Name: Test-1, dtype: float64

Sum of Test-1 and Test-2 grades

test1_grades + test2_grades

Jack    140
Joe     155
Amy     100
Ted     170
Mia     155
Ben     135
dtype: int64

Difference of Test-1 and Test-2 grades

test1_grades - test2_grades

Jack    40
Joe     15
Amy     40
Ted    -10
Mia    -35
Ben    -15
dtype: int64

Weighted average of Test-1 and Test-2 grades.

0.3*test1_grades + 0.7*test2_grades

Jack    62.0
Joe     74.5
Amy     42.0
Ted     87.0
Mia     84.5
Ben     70.5
dtype: float64

Comparison Operations#

Comparison operators can be applied to the values of a Series, returning boolean values for each row.

test1_grades

Jack    90
Joe     85
Amy     70
Ted     80
Mia     60
Ben     60
Name: Test-1, dtype: int64

# grades less than 75 
test1_grades < 75

Jack    False
Joe     False
Amy      True
Ted     False
Mia      True
Ben      True
Name: Test-1, dtype: bool

Filtering#

Comparison operations can be used to filter rows based on the boolean values returned by comparison operators.

# the rows with test1_grades < 75
test1_grades[test1_grades < 75]

Amy    70
Mia    60
Ben    60
Name: Test-1, dtype: int64

Attributes#

Attributes provide information about the properties and characteristics of objects.

shape and len()#

shape returns the tuple (number of rows, ).
len() returns the number of rows.

state_series

St-1    NJ
St-2    NY
St-3    TX
St-4    AZ
St-5    MO
St-6    FL
St-7    ID
Name: States, dtype: object

state_series.shape

(7,)

len(state_series)

index#

index returns row labels.

state_series

St-1    NJ
St-2    NY
St-3    TX
St-4    AZ
St-5    MO
St-6    FL
St-7    ID
Name: States, dtype: object

state_series.index

Index(['St-1', 'St-2', 'St-3', 'St-4', 'St-5', 'St-6', 'St-7'], dtype='object')

values#

The values attribute removes labels and returns an array.

state_series

St-1    NJ
St-2    NY
St-3    TX
St-4    AZ
St-5    MO
St-6    FL
St-7    ID
Name: States, dtype: object

state_series.values

array(['NJ', 'NY', 'TX', 'AZ', 'MO', 'FL', 'ID'], dtype=object)

Methods#

Methods can manipulate the data and perform actions on it.

head() and tail()#

They are primarily used for large Series to display only a specific number of rows.

head(n) displays the first n rows.
tail(n) displays the last n rows.
The default value of \(n\) is 5.

# first 5 rows
state_series.head()

St-1    NJ
St-2    NY
St-3    TX
St-4    AZ
St-5    MO
Name: States, dtype: object

# first 2 rows
state_series.head(2)

St-1    NJ
St-2    NY
Name: States, dtype: object

# last 5 rows
state_series.tail()

St-3    TX
St-4    AZ
St-5    MO
St-6    FL
St-7    ID
Name: States, dtype: object

# last 2 rows
state_series.tail(2)

St-6    FL
St-7    ID
Name: States, dtype: object

info()#

The info() method provides basic information about the Series, including:

row labels: Jack to Ben
number of rows: 5 entries
Non-null (non-missing) values: 5 non-null
dtype: int64
memory usage: 252.0+ bytes

test1_grades

Jack    90
Joe     85
Amy     70
Ted     80
Mia     60
Ben     60
Name: Test-1, dtype: int64

test1_grades.info()

<class 'pandas.core.series.Series'>
Index: 6 entries, Jack to Ben
Series name: Test-1
Non-Null Count  Dtype
--------------  -----
6 non-null      int64
dtypes: int64(1)
memory usage: 268.0+ bytes

describe()#

It returns the descriptive statistics of the values of the Series, including:

Count
Mean
Standard deviation (std)
Minimum (min)
25th percentile
50th percentile (median)
75th percentile
Maximum (max)

test1_grades.describe()

count     6.000000
mean     74.166667
std      12.812754
min      60.000000
25%      62.500000
50%      75.000000
75%      83.750000
max      90.000000
Name: Test-1, dtype: float64

value_counts()#

It returns the number of occurrences for each value of the column.

state_series

St-1    NJ
St-2    NY
St-3    TX
St-4    AZ
St-5    MO
St-6    FL
St-7    ID
Name: States, dtype: object

# one from each value
state_series.value_counts()

States
NJ    1
NY    1
TX    1
AZ    1
MO    1
FL    1
ID    1
Name: count, dtype: int64

# add 'AZ' as a new value
state_series['S-8'] = 'AZ'
state_series

St-1    NJ
St-2    NY
St-3    TX
St-4    AZ
St-5    MO
St-6    FL
St-7    ID
S-8     AZ
Name: States, dtype: object

# there are 2 AZ values
state_series.value_counts()

States
AZ    2
NJ    1
NY    1
TX    1
MO    1
FL    1
ID    1
Name: count, dtype: int64

copy()#

The copy() method returns a copy of the Series.

state_series_copy = state_series.copy()
state_series_copy

St-1    NJ
St-2    NY
St-3    TX
St-4    AZ
St-5    MO
St-6    FL
St-7    ID
S-8     AZ
Name: States, dtype: object

drop()#

The drop() method is used to remove row(s) from the Series.

state_series

St-1    NJ
St-2    NY
St-3    TX
St-4    AZ
St-5    MO
St-6    FL
St-7    ID
S-8     AZ
Name: States, dtype: object

state_series.drop('S-8')

St-1    NJ
St-2    NY
St-3    TX
St-4    AZ
St-5    MO
St-6    FL
St-7    ID
Name: States, dtype: object

By default, the drop() method does not change the Series.
It just displays how the Series would look if the dropping were performed.

# no change on state_series
state_series

St-1    NJ
St-2    NY
St-3    TX
St-4    AZ
St-5    MO
St-6    FL
St-7    ID
S-8     AZ
Name: States, dtype: object

To make a change to the Series, the parameter inplace should be set to True, which is False by default.

state_series.drop('S-8', inplace=True)

# no 'S-8' row any more
state_series

St-1    NJ
St-2    NY
St-3    TX
St-4    AZ
St-5    MO
St-6    FL
St-7    ID
Name: States, dtype: object

reset_index()#

The reset_index() method is used to reset the row labels of a Series to default values: \(0, 1, 2, ...\)

It also adds the initial row labels as a new column with label index.
It returns a DataFrame (Similar to a Series, but it can have more than one column).

By default, the drop() method does not change the Series.
It just displays how the Series would look if the dropping were performed.

state_series_copy.reset_index()

	index	States
0	St-1	NJ
1	St-2	NY
2	St-3	TX
3	St-4	AZ
4	St-5	MO
5	St-6	FL
6	St-7	ID
7	S-8	AZ

If you do not want to keep the initial index values, you can set the drop parameter to True.

state_series_copy.reset_index(drop=True)

  NJ
  NY
  TX
  AZ
  MO
  FL
  ID
  AZ
Name: States, dtype: object

# no change on state_series_copy
state_series_copy

St-1    NJ
St-2    NY
St-3    TX
St-4    AZ
St-5    MO
St-6    FL
St-7    ID
S-8     AZ
Name: States, dtype: object

reset_index() does not change the Series unless inplace=True.

state_series_copy.reset_index(drop=True, inplace=True)

# indexes updated
state_series_copy

  NJ
  NY
  TX
  AZ
  MO
  FL
  ID
  AZ
Name: States, dtype: object

set_axis()#

The set_axis() method is used to assign new labels to the Series’ row indixes.

It does not have an inplace parameter to change the Series.
It returns a new Series.

new_labels = ['State-'+str(i) for i in range(7)]
new_labels

['State-0', 'State-1', 'State-2', 'State-3', 'State-4', 'State-5', 'State-6']

state_series.set_axis(new_labels)

State-0    NJ
State-1    NY
State-2    TX
State-3    AZ
State-4    MO
State-5    FL
State-6    ID
Name: States, dtype: object

# no change on state_series
state_series

St-1    NJ
St-2    NY
St-3    TX
St-4    AZ
St-5    MO
St-6    FL
St-7    ID
Name: States, dtype: object

sum(), mean(), median(), std()#

These operations can be applied directly to the values of the Series.

test1_grades

Jack    90
Joe     85
Amy     70
Ted     80
Mia     60
Ben     60
Name: Test-1, dtype: int64

test1_grades.sum()

test1_grades.mean()

74.16666666666667

test1_grades.median()

75.0

test1_grades.std()

12.812754062521714

sort_values()#

It sorts the series based on the values in ascending or descending order.

By default, sorting is done in ascending order.
To sort in descending order, set the ascending parameter to False.

test1_grades.sort_values()

Mia     60
Ben     60
Amy     70
Ted     80
Joe     85
Jack    90
Name: Test-1, dtype: int64

test1_grades.sort_values(ascending=False)

Jack    90
Joe     85
Ted     80
Amy     70
Mia     60
Ben     60
Name: Test-1, dtype: int64

To make a change to the Series, the parameter inplace should be set to True, which is False by default.

test1_grades.sort_values(inplace=True)

test1_grades

Mia     60
Ben     60
Amy     70
Ted     80
Joe     85
Jack    90
Name: Test-1, dtype: int64

sort_index()#

It sorts the series based on the indexes in ascending or descending order.

By default, sorting is done in ascending order.
To sort in descending order, set the ascending parameter to False.

# alphabetical order
test1_grades.sort_index()

Amy     70
Ben     60
Jack    90
Joe     85
Mia     60
Ted     80
Name: Test-1, dtype: int64

test1_grades.sort_index(ascending=False)

Ted     80
Mia     60
Joe     85
Jack    90
Ben     60
Amy     70
Name: Test-1, dtype: int64

To make a change to the Series, the parameter inplace should be set to True, which is False by default.

test1_grades.sort_index(inplace=True)

test1_grades

Amy     70
Ben     60
Jack    90
Joe     85
Mia     60
Ted     80
Name: Test-1, dtype: int64

duplicated()#

It returns a Series with boolean values showing if a row is a repeat or not.

state_series

St-1    NJ
St-2    NY
St-3    TX
St-4    AZ
St-5    MO
St-6    FL
St-7    ID
Name: States, dtype: object

# no repetition
state_series.duplicated()

St-1    False
St-2    False
St-3    False
St-4    False
St-5    False
St-6    False
St-7    False
Name: States, dtype: bool

# add a repeatition
state_series['St-8'] = 'AZ'
state_series

St-1    NJ
St-2    NY
St-3    TX
St-4    AZ
St-5    MO
St-6    FL
St-7    ID
St-8    AZ
Name: States, dtype: object

# second 'AZ' is repeated
state_series.duplicated()

St-1    False
St-2    False
St-3    False
St-4    False
St-5    False
St-6    False
St-7    False
St-8     True
Name: States, dtype: bool

drop_duplicates()#

It returns a Series with duplicate rows removed.

state_series

St-1    NJ
St-2    NY
St-3    TX
St-4    AZ
St-5    MO
St-6    FL
St-7    ID
St-8    AZ
Name: States, dtype: object

# The last row with the value 'AZ' has been removed.
state_series.drop_duplicates()

St-1    NJ
St-2    NY
St-3    TX
St-4    AZ
St-5    MO
St-6    FL
St-7    ID
Name: States, dtype: object

# no change
state_series

St-1    NJ
St-2    NY
St-3    TX
St-4    AZ
St-5    MO
St-6    FL
St-7    ID
St-8    AZ
Name: States, dtype: object

To make a change to the Series, the parameter inplace should be set to True, which is False by default.

state_series.drop_duplicates(inplace=True)

# The last row with the value 'AZ' has been removed.
state_series

St-1    NJ
St-2    NY
St-3    TX
St-4    AZ
St-5    MO
St-6    FL
St-7    ID
Name: States, dtype: object

apply()#

The apply() function is used to apply a specific function to the values in a Series.

A lambda function is often used to define the function inline.

We will use a Series that represents the radius values of circles.

radius = pd.Series([2,5,6,8,9])
radius

  2
  5
  6
  8
  9
dtype: int64

To calculate the area of a circle (disc), you can define a simple function.

import math

def area(r):
    return math.pi*r**2

Pass function name as an argument to the apply() function.

radius.apply(area)

   12.566371
   78.539816
  113.097336
  201.061930
  254.469005
dtype: float64

Alternatively, with the lambda function version, there is no need to define a separate function name.

radius.apply(lambda r: math.pi*r**2)

   12.566371
   78.539816
  113.097336
  201.061930
  254.469005
dtype: float64

Iterations and Series#

Series can be iterated through using their row labels or indexes. Here’s how you can access them:

Use state_series.index for row labels.
Use range(len(state_series)) or range(state_series.shape[0]) for row indexes.

# row labels
for row in state_series.index:
    print(row)

St-1
St-2
St-3
St-4
St-5
St-6
St-7

# access all values using the row labels
for row in state_series.index:
    print(state_series.loc[row])

NJ
NY
TX
AZ
MO
FL
ID

# access all values using the row indexes
for i in range(state_series.shape[0]):    # you can also use len(state_series)
    print(state_series.iloc[i])

NJ
NY
TX
AZ
MO
FL
ID

Visualization#

Line Plot#

test1_grades.plot();

_images/dac6c42ac7dffc099893638c8281711423479d2bcc27e9c96c8a5ed43e43d94e.png

Scatter Plot#

test1_grades.plot(style='.');

_images/5d41347ed9b820b8940f9ee9323d3957101dccc087b3ac30b97482fbb663794b.png

Histogram#

Distribution of values.

grade_series = pd.Series([1,2,2,2,2,3,3,3,4,5,5,5,5,5])

grade_series.hist();

_images/3cf50f61711b2bddb5eadf55c77939425108f74d28376206d05b51d1d80cd7d8.png

Matplotlib and Series#

import matplotlib.pyplot as plt

Scatter Plot#

x-coordinates should be provided.

test1_grades

Amy     70
Ben     60
Jack    90
Joe     85
Mia     60
Ted     80
Name: Test-1, dtype: int64

plt.scatter(test1_grades.index, test1_grades);

_images/fbfca46f06d523c2e0f60611b04d51074c2234ccbbba33fba84f08404c225128.png

Line Plot#

The default x values are the indexes of the Series.

plt.plot(test1_grades);

Histogram#

plt.hist(test1_grades);

_images/1664b8371377fa7365e164364bf212b923c62a743d20ab528525664fa69f781e.png

Series

Contents

Series#

Create Series#

Accessing values#

Slicing#

Adding a New Row#

Changing a Value#

Changing Row Labels#

Vectorized Operations#

Comparison Operations#

Filtering#

Attributes#

shape and len()#

index#

values#

Methods#

head() and tail()#

info()#

describe()#

value_counts()#

copy()#

drop()#

reset_index()#

set_axis()#

sum(), mean(), median(), std()#

sort_values()#

sort_index()#

duplicated()#

drop_duplicates()#

apply()#

Iterations and Series#

Visualization#

Line Plot#

Scatter Plot#

Histogram#

Matplotlib and Series#

Scatter Plot#

Line Plot#

Histogram#