# read_input_csv

CSV and Pickle Data Reader Utilities

This module provides utility functions for reading and processing data from CSV and pickle files. It supports various data formats, including NumPy arrays and pandas DataFrames, and handles data type conversions for ensemble modeling and data assimilation workflows.

Main functions:

- `read_data_df`: Reads data from CSV/pickle files, returned as NumPy arrays or dictionaries
- `read_var_df`: Reads variance data from CSV/pickle files
- `read_data_csv`: Legacy CSV reading function with data flattening
- `read_var_csv`: Legacy variance CSV reading function
- `convert_to_array`: Converts string representations to NumPy arrays
- `to_array_if_sequence`: Converts various data types to NumPy array format

Typical use cases:

- Loading observational data for data assimilation
- Reading ensemble data with various data types
- Processing CSV files with mixed data types and array-like strings
- Handling variance/uncertainty data alongside measurements

Last Modified: February 2026
## convert_to_array(array_str)

Convert space-separated string representations of numbers to NumPy arrays.

This function handles strings with space-separated numeric values and converts them back to NumPy arrays. It removes brackets and whitespace before parsing.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `array_str` | `str` | String containing space-separated numbers, optionally with brackets, e.g. `"[1.0 2.0 3.0]"` or `"1.0 2.0 3.0"`. | required |

Returns:

| Type | Description |
|---|---|
| `ndarray` or `str` | NumPy array of floats if conversion is successful; otherwise the original string, unchanged. |

Examples:
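The documented behavior can be sketched with a minimal re-implementation (illustrative only, not the module's actual source):

```python
import numpy as np

def convert_to_array(array_str):
    """Sketch of the documented behavior: parse a space-separated
    numeric string, with optional brackets, into a 1D float array;
    return the input unchanged if parsing fails."""
    stripped = array_str.strip().strip("[]")
    try:
        return np.array([float(tok) for tok in stripped.split()])
    except ValueError:
        return array_str  # not numeric: pass through unchanged

convert_to_array("[1.0 2.0 3.0]")   # -> array([1., 2., 3.])
convert_to_array("not a number")    # -> "not a number"
```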
## read_data_csv(filename, datatype, truedataindex)

Read observational data from CSV files (legacy function).

This is a legacy function for reading CSV files with flexible header configurations. It supports files with column headers, row headers, both, or neither, and replaces missing values with `'n/a'`.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `filename` | `str` | Path to the CSV file. | required |
| `datatype` | `list of str` | Column names (or positional column identifiers) for the data types to extract. | required |
| `truedataindex` | `list` | Row identifiers where observational data was recorded (e.g., time stamps, observation indices). Used to select specific rows from the CSV. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `imported_data` | `list of list` | 2D list where each sublist represents a row of extracted data. Each element is either a float (numeric data) or a string (text/missing data). Missing numeric values are replaced with `'n/a'`. |
Notes

- If the first column is `'header_both'`, the CSV is assumed to have both row and column headers.
- If the row count matches `len(truedataindex)`, the file is assumed to have column headers.
- If the row count is `len(truedataindex) + 1`, the first row is assumed to have been misinterpreted as a header and is re-read as data.
- NaN values in numeric columns are replaced with `'n/a'` strings.
See Also

`read_data_df`: Modern version using pandas DataFrames with more flexible output.
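The missing-value handling described above can be illustrated with a short sketch (not the module's code; only the float-parsing and `'n/a'` replacement are shown):

```python
import csv
import io

def parse_csv_rows(csv_text):
    """Sketch: parse each cell as a float where possible; empty cells
    (missing numeric values) are replaced with the string 'n/a'."""
    out = []
    for row in csv.reader(io.StringIO(csv_text)):
        parsed = []
        for cell in row:
            try:
                parsed.append(float(cell))
            except ValueError:
                # Empty cell = missing value; other text passes through.
                parsed.append('n/a' if cell == '' else cell)
        out.append(parsed)
    return out

parse_csv_rows("1.0,,3.0\nWOPR,2.5,")
# -> [[1.0, 'n/a', 3.0], ['WOPR', 2.5, 'n/a']]
```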
## read_data_df(filename, datatype=None, truedataindex=None, outtype='np.array', return_data_info=True)

Read observational data from CSV or pickle files with flexible output formats.

This function reads data files (CSV or pickle) containing observational data, processes array-like string representations, and returns the data in the requested format. It supports filtering by data types and row indices.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `filename` | `str` | Path to the data file. Must end with `'.csv'` or `'.pkl'`. | required |
| `datatype` | `list of str` | Column names to extract. If None, all columns are used. | `None` |
| `truedataindex` | `list of int` | Row indices to extract (0-based). If None, all rows are used. | `None` |
| `outtype` | `str` | Output format: `'np.array'` returns a flattened NumPy array; `'list'` returns a list of dictionaries. | `'np.array'` |
| `return_data_info` | `bool` | If True, also return metadata (column names and row indices). | `True` |
Returns:

| Name | Type | Description |
|---|---|---|
| `flat_array` | `ndarray` | Flattened 1D array of all data (when `outtype='np.array'`). |
| `data` | `list of dict` | List where each element is a dictionary with column names as keys (when `outtype='list'`). |
| `datatype` | `list of str` | Column names used (only if `return_data_info=True`). |
| `indices` | `list` | Row indices/labels used (only if `return_data_info=True`). |
Notes
- String representations of arrays (e.g., "[1.0 2.0 3.0]") are automatically converted to NumPy arrays.
- When outtype='np.array', arrays from multiple columns and rows are concatenated into a single flat array.
- The first column in CSV files is used as the index.
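The flattening described in the notes can be sketched as follows (illustrative only: the string-to-array conversion is inlined, the DataFrame is built in memory rather than read from disk, and the column names are made up):

```python
import numpy as np
import pandas as pd

# Toy frame where some cells hold array-like strings, as in the CSVs
# this module targets (assumed layout, for illustration only).
df = pd.DataFrame(
    {"WOPR": ["[1.0 2.0]", "[3.0 4.0]"], "WWCT": [0.1, 0.2]},
    index=[10, 20],
)

def cell_to_array(val):
    # Inline sketch of the string-to-array conversion.
    if isinstance(val, str):
        return np.array([float(tok) for tok in val.strip("[]").split()])
    return np.atleast_1d(np.asarray(val, dtype=float))

# Concatenate every cell, row by row, into one flat 1D array.
flat_array = np.concatenate(
    [cell_to_array(v) for _, row in df.iterrows() for v in row]
)
# flat_array -> [1.0, 2.0, 0.1, 3.0, 4.0, 0.2]
```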
## read_var_csv(filename, datatype, truedataindex)

Read variance/uncertainty data from CSV files (legacy function).

This is a legacy function for reading CSV files containing variance or standard deviation data. It assumes that variance data is stored in alternating columns: a data type identifier (string) followed by a variance value (numeric).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `filename` | `str` | Path to the CSV file containing variance data. | required |
| `datatype` | `list of str` | Column names (or positional identifiers) for data types. The function expects variance values in adjacent columns (`datatype_col + 1`). | required |
| `truedataindex` | `list` | Row identifiers where variance data was recorded. Used to select specific rows from the CSV. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `imported_var` | `list of list` | 2D list where each sublist contains alternating data type identifiers (strings, converted to lowercase) and variance values (floats). Format: `[type1, var1, type2, var2, ...]` for each row. |
Notes

- The function expects variance data in alternating columns with the structure `[type_name, variance_value, type_name, variance_value, ...]`.
- Data type names are automatically converted to lowercase.
- The same header configurations as `read_data_csv` are supported: both headers, column headers only, row headers only, or no headers. In particular, if the first column is `'header_both'`, both row and column headers are assumed to exist.
See Also

`read_var_df`: Modern version using pandas DataFrames.
`read_data_csv`: Companion function for reading observational data.
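The alternating `[type, value]` convention can be sketched for a single row as follows (an illustrative helper, not part of the module):

```python
def parse_var_row(cells):
    """Sketch of the alternating-column convention: pairs of
    (type identifier, variance value) per row, with type names
    lower-cased and values parsed to float."""
    parsed = []
    for name, value in zip(cells[0::2], cells[1::2]):
        parsed.append(name.lower())
        parsed.append(float(value))
    return parsed

parse_var_row(["ABS", "0.04", "REL", "0.10"])
# -> ['abs', 0.04, 'rel', 0.1]
```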
## read_var_df(filename, datatype=None, truedataindex=None, outtype='list')

Read variance/uncertainty data from CSV or pickle files.

This function is designed to read variance or standard deviation data that corresponds to observational data. It returns the data as a list of dictionaries, with special handling for datatype columns that may contain tuple representations.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `filename` | `str` | Path to the variance file. Must end with `'.csv'` or `'.pkl'`. | required |
| `datatype` | `list of str` | Column names to extract. Supports tuple-like string representations (e.g., `"('OPR', 'WWCT')"`), which are parsed using `ast.literal_eval`. If None, all columns are used. | `None` |
| `truedataindex` | `list of str or int` | Row indices/labels to extract. If None, all rows are used. | `None` |
| `outtype` | `str` | Output format. Currently only `'list'` is supported. | `'list'` |
Returns:

| Name | Type | Description |
|---|---|---|
| `var` | `list of dict` | List where each element is a dictionary with column names as keys and variance/uncertainty values as values. Each dictionary corresponds to one row. |
Notes
- CSV file indices are converted to strings for consistent lookup.
- The datatype parameter attempts to evaluate string representations of tuples, which is useful when column names are composite keys.
- This function is typically used alongside read_data_df to load both observations and their uncertainties.
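The tuple-like column-name handling described in the notes can be sketched as follows (illustrative; a plain name that fails `ast.literal_eval` is left as-is):

```python
import ast

def normalize_datatype(name):
    """Sketch: parse composite column names such as "('OPR', 'WWCT')"
    into real tuples, leaving plain names untouched."""
    try:
        parsed = ast.literal_eval(name)
        return parsed if isinstance(parsed, tuple) else name
    except (ValueError, SyntaxError):
        return name

normalize_datatype("('OPR', 'WWCT')")  # -> ('OPR', 'WWCT')
normalize_datatype("WWCT")             # -> 'WWCT'
```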
## to_array_if_sequence(val)

Convert various data types to NumPy array or sequence format.

Handles conversion of different input types (scalars, lists, strings, arrays) into a consistent array-like format for data processing.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `val` | various | Input value to convert. Can be `np.ndarray`, `int`, `float`, `list`, `str`, or other. | required |
Returns:

| Type | Description |
|---|---|
| `ndarray` or `list` | The input converted to a NumPy array where possible (numeric scalars, bracketed strings, array-like sequences); otherwise the value in its original sequence form (see Notes). |
Notes
String inputs are only parsed if they are enclosed in brackets (e.g., "[1 2 3]"). All numeric scalars are wrapped into 1D arrays.
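The dispatch described above can be sketched with a minimal re-implementation (illustrative only, not the module's source):

```python
import numpy as np

def to_array_if_sequence(val):
    """Sketch of the documented dispatch: pass arrays through, wrap
    numeric scalars into 1D arrays, parse bracketed strings, and
    convert lists/tuples; anything else is returned unchanged."""
    if isinstance(val, np.ndarray):
        return val
    if isinstance(val, (int, float)):
        return np.array([float(val)])  # scalar -> 1D array
    if isinstance(val, str) and val.strip().startswith("["):
        return np.array([float(tok) for tok in val.strip().strip("[]").split()])
    if isinstance(val, (list, tuple)):
        return np.asarray(val)
    return val  # e.g. an unbracketed string passes through

to_array_if_sequence(2.5)        # -> array([2.5])
to_array_if_sequence("[1 2 3]")  # -> array([1., 2., 3.])
```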