Read Values other than NaN as NaN
Not every missing value is stored as NaN; some datasets use placeholders such as '_' or '?'. To handle these, read_csv provides the na_values parameter, which lists additional strings to treat as NaN.
import pandas as pd
d = {'id':[0, 1, 2], 'Name':['Ben', 'Max', '?']}
df = pd.DataFrame(d)
df.head()

| | id | Name |
|---|---|---|
| 0 | 0 | Ben |
| 1 | 1 | Max |
| 2 | 2 | ? |
df.isnull().sum().any()
False
Here '?' is treated as an ordinary string, not as a null value.
df.to_csv('test.csv') # Also writes the row index, which is read back as 'Unnamed: 0'
df_real = pd.read_csv('test.csv', na_values=['?']) # Reads '?' as NaN value
df_real.head()

| | Unnamed: 0 | id | Name |
|---|---|---|---|
| 0 | 0 | 0 | Ben |
| 1 | 1 | 1 | Max |
| 2 | 2 | 2 | NaN |
df_real.isnull().sum().any()
True
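
na_values is not limited to a single list. As a minimal sketch, assuming a small in-memory CSV (the City column and its '_' placeholder are made up for illustration), it can also take a dict that maps each column name to its own placeholders. Writing with index=False should also avoid the extra Unnamed: 0 column seen above.

```python
import io
import pandas as pd

# Hypothetical sample data: '?' marks a missing name, '_' a missing city.
df = pd.DataFrame({'id': [0, 1, 2],
                   'Name': ['Ben', 'Max', '?'],
                   'City': ['_', 'Oslo', 'Rome']})

# index=False keeps the row index out of the CSV text,
# so no 'Unnamed: 0' column appears when it is read back.
csv_text = df.to_csv(index=False)

# A dict applies each placeholder only to the named column.
df_clean = pd.read_csv(io.StringIO(csv_text),
                       na_values={'Name': ['?'], 'City': ['_']})

# Per-column count of missing values.
print(df_clean.isnull().sum())
```

A per-column dict is useful when a placeholder such as '_' is a legitimate value in one column but means "missing" in another.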