
read_input_csv⚓︎

CSV and Pickle Data Reader Utilities

This module provides utility functions for reading and processing data from CSV and pickle files. It supports various data formats including NumPy arrays, pandas DataFrames, and handles data type conversions for ensemble modeling and data assimilation workflows.

Main Functions:
  • read_data_df: Reads data from CSV/pickle files, returns NumPy arrays or dictionaries
  • read_var_df: Reads variance data from CSV/pickle files
  • read_data_csv: Legacy CSV reading function with data flattening
  • read_var_csv: Legacy variance CSV reading function
  • convert_to_array: Converts string representations to NumPy arrays
  • to_array_if_sequence: Converts various data types to NumPy array format

Typical use cases:
  • Loading observational data for data assimilation
  • Reading ensemble data with various data types
  • Processing CSV files with mixed data types and array-like strings
  • Handling variance/uncertainty data alongside measurements

Last Modified: February 2026

convert_to_array(array_str) ⚓︎

Convert space-separated string representations of numbers to NumPy arrays.

This function handles strings with space-separated numeric values and converts them back to NumPy arrays. It removes brackets and whitespace before parsing.

Parameters:

  • array_str (str), required:
    String containing space-separated numbers, optionally with brackets. Example: "[1.0 2.0 3.0]" or "1.0 2.0 3.0".

Returns:

  • ndarray or str:
    NumPy array of floats if conversion is successful; otherwise the original string is returned unchanged.

Examples:

>>> convert_to_array("1.0 2.0 3.0")
array([1., 2., 3.])
>>> convert_to_array("[1.0 2.0 3.0]")
array([1., 2., 3.])
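The behaviour described above can be sketched as follows. This is a minimal re-implementation for illustration, not the module's actual code:

```python
import numpy as np

def convert_to_array(array_str):
    """Parse a space-separated numeric string into a float array.

    Falls back to returning the input unchanged when parsing fails,
    mirroring the documented behaviour.
    """
    try:
        # Remove surrounding brackets and whitespace before parsing.
        cleaned = array_str.strip().strip("[]")
        return np.array([float(tok) for tok in cleaned.split()])
    except (ValueError, AttributeError):
        # Non-numeric content: hand the original string back untouched.
        return array_str

print(convert_to_array("[1.0 2.0 3.0]"))
print(convert_to_array("not numeric"))
```

The try/except fallback keeps mixed-type columns intact: numeric cells become arrays while genuinely textual cells pass through unchanged.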

read_data_csv(filename, datatype, truedataindex) ⚓︎

Read observational data from CSV files (legacy function).

This is a legacy function for reading CSV files with flexible header configurations. Supports files with column headers, row headers, both, or neither. Handles missing values by replacing them with 'n/a'.

Parameters:

  • filename (str), required:
    Path to the CSV file.
  • datatype (list of str), required:
    Column names (or positional column identifiers) for data types to extract.
  • truedataindex (list), required:
    Row identifiers where observational data was recorded (e.g., time stamps, observation indices). Used to select specific rows from the CSV.

Returns:

  • imported_data (list of list):
    2D list where each sublist represents a row of extracted data. Each element is either a float (numeric data) or a string (text/missing data). Missing numeric values are replaced with 'n/a'.

Notes
  • If the first column is 'header_both', the CSV is assumed to have both row and column headers.
  • If row count matches len(truedataindex), assumes column headers exist.
  • If row count is len(truedataindex)+1, assumes first row was misinterpreted as header and re-reads it as data.
  • NaN values in numeric columns are replaced with 'n/a' strings.
See Also

read_data_df : Modern version using pandas DataFrames with more flexible output.
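The core extraction step can be sketched as below. This is a hypothetical helper (`read_data_csv_sketch`) that assumes a CSV with column headers and a first index column; the legacy function's header auto-detection is omitted:

```python
import io
import pandas as pd

def read_data_csv_sketch(csv_text, datatype, truedataindex):
    """Select rows by truedataindex and columns by datatype,
    replacing missing numeric values with 'n/a' as documented."""
    df = pd.read_csv(io.StringIO(csv_text), index_col=0)
    rows = []
    for idx in truedataindex:
        row = []
        for col in datatype:
            val = df.loc[idx, col]
            # NaN (missing cell) becomes the string 'n/a'.
            row.append('n/a' if pd.isna(val) else float(val))
        rows.append(row)
    return rows

csv_text = "time,opr,wwct\n1,10.5,\n2,11.0,0.3\n"
print(read_data_csv_sketch(csv_text, ['opr', 'wwct'], [1, 2]))
```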

read_data_df(filename, datatype=None, truedataindex=None, outtype='np.array', return_data_info=True) ⚓︎

Read observational data from CSV or pickle files with flexible output formats.

This function reads data files (CSV or pickle) containing observational data, processes array-like string representations, and returns the data in the requested format. Supports filtering by data types and row indices.

Parameters:

  • filename (str), required:
    Path to the data file. Must end with '.csv' or '.pkl'.
  • datatype (list of str), default None:
    Column names to extract. If None, all columns are used.
  • truedataindex (list of int), default None:
    Row indices to extract (0-based). If None, all rows are used.
  • outtype (str), default 'np.array':
    Output format. 'np.array' returns a flattened NumPy array; 'list' returns a list of dictionaries.
  • return_data_info (bool), default True:
    If True, also returns metadata (column names and row indices).

Returns:

  • flat_array (ndarray):
    Flattened 1D array of all data (if outtype='np.array').
  • data (list of dict):
    List where each element is a dictionary with column names as keys (if outtype='list').
  • datatype (list of str):
    Column names used (only if return_data_info=True).
  • indices (list):
    Row indices/labels used (only if return_data_info=True).

Notes
  • String representations of arrays (e.g., "[1.0 2.0 3.0]") are automatically converted to NumPy arrays.
  • When outtype='np.array', arrays from multiple columns and rows are concatenated into a single flat array.
  • The first column in CSV files is used as the index.
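The flattening behaviour for outtype='np.array' can be sketched as below. `read_data_df_sketch` is a hypothetical helper, assuming (as noted above) that the first CSV column is the index and that bracketed strings encode arrays:

```python
import io
import numpy as np
import pandas as pd

def read_data_df_sketch(csv_text, datatype=None, truedataindex=None):
    """Read a CSV, expand bracketed array strings, and return
    all values concatenated into one flat 1D array."""
    df = pd.read_csv(io.StringIO(csv_text), index_col=0)
    if datatype is not None:
        df = df[datatype]
    if truedataindex is not None:
        df = df.loc[truedataindex]
    pieces = []
    for _, row in df.iterrows():
        for val in row:
            if isinstance(val, str) and val.startswith('['):
                # "[1.0 2.0]" -> array([1.0, 2.0])
                val = np.array([float(t) for t in val.strip('[]').split()])
            pieces.append(np.atleast_1d(np.asarray(val, dtype=float)))
    return np.concatenate(pieces)

csv_text = "time,opr\n1,[1.0 2.0]\n2,3.5\n"
print(read_data_df_sketch(csv_text))
```

Note that array-valued cells and scalar cells end up interleaved in one flat vector, so the caller needs the returned metadata to map entries back to columns and rows.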

read_var_csv(filename, datatype, truedataindex) ⚓︎

Read variance/uncertainty data from CSV files (legacy function).

This is a legacy function for reading CSV files containing variance or standard deviation data. Assumes that variance data is stored in alternating columns: data type identifier (string) followed by variance value (numeric).

Parameters:

  • filename (str), required:
    Path to the CSV file containing variance data.
  • datatype (list of str), required:
    Column names (or positional identifiers) for data types. The function expects variance values in adjacent columns (datatype_col + 1).
  • truedataindex (list), required:
    Row identifiers where variance data was recorded. Used to select specific rows from the CSV.

Returns:

  • imported_var (list of list):
    2D list where each sublist contains alternating data type identifiers (strings, converted to lowercase) and variance values (floats). Format: [type1, var1, type2, var2, ...] for each row.

Notes
  • The function expects variance data in alternating columns with the structure: [type_name, variance_value, type_name, variance_value, ...]
  • Data type names are automatically converted to lowercase.
  • Supports the same header configurations as read_data_csv: both headers, column headers only, row headers only, or no headers.
  • If first column is 'header_both', assumes both row and column headers exist.
See Also

read_var_df : Modern version using pandas DataFrames.
read_data_csv : Companion function for reading observational data.
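The alternating-column layout described in the notes can be illustrated with a hypothetical helper that parses a single row:

```python
def parse_var_row(cells):
    """Parse one row laid out as
    [type_name, variance_value, type_name, variance_value, ...],
    lowercasing type names and casting variances to float."""
    out = []
    for i in range(0, len(cells), 2):
        out.append(str(cells[i]).lower())
        out.append(float(cells[i + 1]))
    return out

print(parse_var_row(['ABS', 0.1, 'REL', 0.05]))
```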

read_var_df(filename, datatype=None, truedataindex=None, outtype='list') ⚓︎

Read variance/uncertainty data from CSV or pickle files.

This function is designed to read variance or standard deviation data that corresponds to observational data. It returns the data as a list of dictionaries, with special handling for datatype columns that may contain tuple representations.

Parameters:

  • filename (str), required:
    Path to the variance file. Must end with '.csv' or '.pkl'.
  • datatype (list of str), default None:
    Column names to extract. Supports tuple-like string representations (e.g., "('OPR', 'WWCT')") which are parsed using ast.literal_eval. If None, all columns are used.
  • truedataindex (list of str or int), default None:
    Row indices/labels to extract. If None, all rows are used.
  • outtype (str), default 'list':
    Output format. Currently only 'list' is supported.

Returns:

  • var (list of dict):
    List where each element is a dictionary with column names as keys and variance/uncertainty values as values. Each dictionary corresponds to one row.

Notes
  • CSV file indices are converted to strings for consistent lookup.
  • The datatype parameter attempts to evaluate string representations of tuples, which is useful when column names are composite keys.
  • This function is typically used alongside read_data_df to load both observations and their uncertainties.
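The composite-key handling via ast.literal_eval can be sketched as follows (`parse_datatype_keys` is a hypothetical name for illustration):

```python
import ast

def parse_datatype_keys(datatype):
    """Turn tuple-like strings such as "('OPR', 'WWCT')" into real
    tuples; plain column names pass through unchanged."""
    keys = []
    for name in datatype:
        try:
            keys.append(ast.literal_eval(name))
        except (ValueError, SyntaxError):
            # Not a literal (e.g. a bare column name): keep as-is.
            keys.append(name)
    return keys

print(parse_datatype_keys(["('OPR', 'WWCT')", 'opr']))
```

Using ast.literal_eval (rather than eval) restricts parsing to Python literals, which is the safe choice when column names come from a file.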

to_array_if_sequence(val) ⚓︎

Convert various data types to NumPy array or sequence format.

Handles conversion of different input types (scalars, lists, strings, arrays) into a consistent array-like format for data processing.

Parameters:

  • val (various), required:
    Input value to convert. Can be np.ndarray, int, float, list, str, or other.

Returns:

  • ndarray or list:
    NumPy array if the input is an ndarray, numeric scalar, list, or parseable string; a list containing the value if the input is of any other type.
Notes

String inputs are only parsed if they are enclosed in brackets (e.g., "[1 2 3]"). All numeric scalars are wrapped into 1D arrays.
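The dispatch described above can be sketched as follows (an illustrative re-implementation, not the module's actual code):

```python
import numpy as np

def to_array_if_sequence(val):
    """Normalise assorted inputs to an array-like form."""
    if isinstance(val, np.ndarray):
        return val                                   # already an array
    if isinstance(val, (int, float)):
        return np.atleast_1d(np.array(val, dtype=float))  # scalar -> 1D array
    if isinstance(val, list):
        return np.asarray(val)
    if isinstance(val, str) and val.startswith('[') and val.endswith(']'):
        # Only bracketed strings are parsed, per the note above.
        return np.array([float(t) for t in val.strip('[]').split()])
    return [val]                                     # anything else: wrap in a list

print(to_array_if_sequence(2.5))
print(to_array_if_sequence("[1 2 3]"))
print(to_array_if_sequence("abc"))
```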