Read values other than NaN as NaN
Not every missing value is stored as NaN. Some datasets use placeholders such as '_' or '?' instead. The na_values parameter of read_csv tells pandas to treat such placeholders as NaN.
import pandas as pd
d = {'id':[0, 1, 2], 'Name':['Ben', 'Max', '?']}
df = pd.DataFrame(d)
df.head()
|   | id | Name |
|---|---|---|
| 0 | 0 | Ben |
| 1 | 1 | Max |
| 2 | 2 | ? |
df.isnull().sum().any()
False
Here '?' is an ordinary string, so pandas does not count it as a null value.
df.to_csv('test.csv') # Also writes the index, which is read back as the 'Unnamed: 0' column
df_real = pd.read_csv('test.csv', na_values=['?']) # Reads '?' as NaN value
df_real.head()
|   | Unnamed: 0 | id | Name |
|---|---|---|---|
| 0 | 0 | 0 | Ben |
| 1 | 1 | 1 | Max |
| 2 | 2 | 2 | NaN |
df_real.isnull().sum().any()
True
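na_values is not limited to a single placeholder. As a minimal sketch, the example below reads a small in-memory CSV (the csv_text contents and the City column are made up for illustration) and passes na_values first as a list, which applies to every column, and then as a dict, which restricts each placeholder to the named column.

```python
import io

import pandas as pd

# Hypothetical CSV text in which '?' and '_' stand in for missing values.
csv_text = "id,Name,City\n0,Ben,_\n1,Max,Paris\n2,?,Rome\n"

# A list applies the same placeholders to every column.
df_all = pd.read_csv(io.StringIO(csv_text), na_values=["?", "_"])

# A dict applies each placeholder only to the column it is listed under,
# so '_' in City is kept as an ordinary string here.
df_by_col = pd.read_csv(io.StringIO(csv_text), na_values={"Name": ["?"]})

print(df_all.isnull().sum())
print(df_by_col.isnull().sum())
```

With the list form both Name and City end up with one null each. With the dict form only Name does, which helps when a symbol such as '_' is a legitimate value in some columns.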