`pdstable` Module

class pdstable.PdsTable(label_file, *, label_contents=None, times=None, columns=None, nostrip=None, callbacks=None, ascii=False, replacements=None, invalid=None, valid_ranges=None, table_callback=None, merge_masks=False, filename_keylen=0, row_range=None, table_file=None, label_method='strict')[source]

Bases: object

The PdsTable class holds the contents of a PDS-labeled table.

It is represented by a list of NumPy arrays, one for each column.

Current limitations for PDS3:

ASCII tables only, no binary formats.
Detached PDS labels only.
Only one data file per label.
No row or record offsets in the label’s pointer to the table file.
STRUCTURE fields in the label are not supported.
Columns containing multiple items are not loaded.
Time fields are represented as character strings unless explicitly listed for conversion.

__init__(label_file, *, label_contents=None, times=None, columns=None, nostrip=None, callbacks=None, ascii=False, replacements=None, invalid=None, valid_ranges=None, table_callback=None, merge_masks=False, filename_keylen=0, row_range=None, table_file=None, label_method='strict')[source]

Constructor for a PdsTable object.

Parameters:

label_file (str or Path or FCPath) – The path to the PDS label of the table file. Must be supplied to get proper relative path resolution.
label_contents (list or Pds3Label, optional) – The contents of the label as a list of strings, for use when the label should not be read from the file. Alternatively, a Pds3Label object to avoid label parsing entirely. Note: this parameter is for PDS3 labels only; it is ignored for PDS4.
columns (list, optional) – An optional list of the names of the columns to return. If the list is empty, then every column is returned.
times (list, optional) – An optional list of the names of time columns to be stored as floats in units of seconds TAI rather than as strings.
nostrip (list, optional) – An optional list of the names of string columns that are not to be stripped of surrounding whitespace.
callbacks (dict, optional) – An optional dictionary that returns a callback function given the name of a column. If a callback is provided for any column, then the function is called on the string value of that column before it is parsed. This can be used to correct known syntax errors in a particular table.
ascii (bool, optional) – True to interpret the callbacks as translating ASCII byte strings; False to interpret them as translating the default str type (Unicode).
replacements (dict, optional) – An optional dictionary that returns a replacement dictionary given the name of a column. If a replacement dictionary is provided for any column, then any value in that column (as a string or as its native value) that matches a key in the dictionary is replaced by the value resulting from the dictionary lookup.
invalid (dict, optional) – An optional dictionary keyed by column name. The returned value must be a list or set of values that are to be treated as invalid, missing, or unknown. An optional entry keyed by “default” can be a list or set of values that are invalid by default; these are used for any column whose name does not appear as a key in the dictionary.
valid_ranges (dict, optional) – An optional dictionary keyed by column name. The returned value must be a tuple or list containing the minimum and maximum numeric values in that column.
table_callback (callable, optional) – An optional function to be called after reading the data table contents but before processing them. Note that this callback must handle bytestrings.
merge_masks (bool, optional) – True to return a single mask value for each column, regardless of how many items might be in that column. False to return a separate mask value for each value in a column.
filename_keylen (int, optional) – Number of characters in the filename to use as the key of the index if this table is to be indexed by filename. Zero to use the entire file basename after stripping off the extension.
row_range (tuple or list, optional) – A tuple or list of integers containing the index of the first row to read and the first row to omit. If not specified, then all the rows are read.
table_file (str or int, optional) – A table file name to be read, or a 1-based integer giving the position of the table in the label file. If the provided table name doesn’t exist in the label or the integer is out of range, an error will be raised. Only relevant for PDS4 labels.
label_method (str, optional) – The method to use to parse the label. Valid values are ‘strict’ (default) or ‘fast’. The ‘fast’ method is faster but may not be as accurate. Only relevant for PDS3 labels.
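
As a rough illustration of how these keyword arguments fit together, the sketch below assembles a set of constructor options. The column names (FILE_NAME, START_TIME, EXPOSURE_DURATION), the sentinel values, and the label path in the final comment are hypothetical and chosen only for illustration; the commented-out call assumes the pdstable package is installed and that such a label exists.

```python
# Hypothetical keyword arguments for PdsTable. Column names and values
# are invented for illustration and will differ for a real table.
def build_pdstable_kwargs():
    return {
        # Load only these columns (an empty list would load every column).
        "columns": ["FILE_NAME", "START_TIME", "EXPOSURE_DURATION"],
        # Store this time column as seconds TAI instead of as strings.
        "times": ["START_TIME"],
        # Repair a (hypothetical) syntax problem before the value is parsed.
        "callbacks": {"START_TIME": lambda s: s.rstrip("Z")},
        # Values treated as invalid, per column, plus a default set.
        "invalid": {"EXPOSURE_DURATION": {-999.0}, "default": {-1e32}},
        # Minimum and maximum numeric values for a column.
        "valid_ranges": {"EXPOSURE_DURATION": (0.0, 3600.0)},
        # First row to read and first row to omit.
        "row_range": (0, 100),
    }


kwargs = build_pdstable_kwargs()
# With pdstable installed and a real label available, one might then write:
# table = pdstable.PdsTable("path/to/index.lbl", **kwargs)
```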

Notes

If both a replacement and a callback are provided for the same column, the callback is applied first. The invalid and valid_ranges parameters are applied afterward.

Note that performance will be slightly faster if ascii=True.
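
The processing order described in these notes can be sketched in plain Python for a single field. This is not the library's implementation: process_field and its parameters are hypothetical names, and the sketch assumes that a value outside its valid range is masked in the same way as a value listed as invalid.

```python
def process_field(raw, *, callback=None, replacements=None, parse=float,
                  invalid=(), valid_range=None):
    """Return (value, mask) for one field; mask is True if the value is invalid.

    Hypothetical helper illustrating the documented order:
    callback, then replacement, then invalid / valid-range checks.
    """
    replacements = replacements or {}
    # 1. The callback sees the raw string before any parsing.
    text = callback(raw) if callback else raw
    # 2. A replacement may match the string form...
    text = replacements.get(text, text)
    value = parse(text) if isinstance(text, str) else text
    # ...or the parsed (native) value.
    value = replacements.get(value, value)
    # 3. Invalid markers and the valid range are checked last.
    mask = value in invalid
    if not mask and valid_range is not None:
        low, high = valid_range
        mask = not (low <= value <= high)
    return value, mask


# A callback that repairs a decimal comma before parsing.
fixed = process_field("1,5", callback=lambda s: s.replace(",", "."))
# A string replacement followed by an invalid-value check.
missing = process_field("N/A", replacements={"N/A": "-1"}, invalid={-1.0})
```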

property pdslabel: The label of the table as a Pds3Label for PDS3 or dict for PDS4.

property label_file_name: The name of the label file (without the path).

property label_file_path: The local path to the label file.

property table_file_name: The name of the table file (without the path).

property table_file_path: The local path to the table file.

property is_pds4: True if the read label was a PDS4 label, False otherwise.

property rows: The number of rows that were read.

property first: The index of the first row that was read (0-based).

property columns: The number of columns in the table (possibly as restricted by the columns parameter).

property all_columns: The number of columns in the table (possibly as restricted by the columns parameter).

property column_values: The values of the columns that were read as a dict indexed by column name.

property column_masks: The masks of the columns that were read as a dict indexed by column name.

property column_info_list

The list of PdsColumnInfo objects for the columns in the table.

This list includes ALL of the columns, not just the ones restricted by the columns parameter.

property column_info_dict

The dict of PdsColumnInfo objects for the columns in the table, keyed by the column name.

This dict includes ALL of the columns, not just the ones restricted by the columns parameter.

property header_bytes: The number of bytes in the header of the table.

property encoding: The encoding of the table file (e.g., ‘utf-8’ or ‘latin-1’).

property fixed_length_row: True if the table has fixed-length rows.

property field_delimiter: The field delimiter for the table.

property row_bytes: The number of bytes in a single row of the table.

property dtype0

The dtype dictionary for the table, keyed by the column name.

Each value is a tuple of (dtype_string, start_byte) where dtype_string is the string representation of the dtype used to isolate the column (e.g., ‘S10’ for a 10-character string) and start_byte is the starting byte position of the column in a row.
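
To make the (dtype_string, start_byte) pairs concrete, the sketch below uses such a dictionary to cut one fixed-length row into raw column fields. It is only an illustration: slice_row, the sample row, and the column names are hypothetical, and the sketch assumes that start_byte in this dictionary is a zero-based offset and that every dtype string has the form 'S' followed by a width.

```python
def slice_row(row, dtype0):
    """Split one fixed-length row (bytes) into raw, unparsed column fields."""
    fields = {}
    for name, (dtype_string, start_byte) in dtype0.items():
        width = int(dtype_string[1:])  # e.g. 'S10' gives a width of 10
        fields[name] = row[start_byte:start_byte + width]
    return fields


# Hypothetical row and layout: a quoted file name and a numeric field.
row = b'"FILE_0001",  12.50\r\n'
layout = {"FILE_NAME": ("S9", 1), "EXPOSURE": ("S7", 12)}
fields = slice_row(row, layout)
```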

property info

The Pds3/4TableInfo object that holds the attributes of the table.

DEPRECATED.

dicts_by_row(lowercase=(False, False))[source]

Returns a list of dictionaries, one for each row in the table.

Each dictionary contains all of the column values in that particular row. The dictionary keys are the column names; append “_mask” to the key to get the mask value, which is True if the column value is invalid; False otherwise.

Parameters: lowercase (tuple or bool) – A tuple of two booleans. If the first is True, then the dictionary is also keyed by column names converted to lower case. If the second is True, then keys with “_lower” appended return values converted to lower case. If a single boolean is provided, it will be duplicated for both parameters.
Returns: A list of dictionaries, one for each row in the table.
Return type: list
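
The shape of the returned dictionaries can be sketched from per-column values and masks, such as those exposed by column_values and column_masks. The helper rows_from_columns and the sample data are hypothetical; the lowercase options are not covered here.

```python
def rows_from_columns(column_values, column_masks):
    """Build one dict per row, with a '<name>_mask' entry for each column."""
    names = list(column_values)
    n_rows = len(column_values[names[0]]) if names else 0
    rows = []
    for i in range(n_rows):
        row = {}
        for name in names:
            row[name] = column_values[name][i]
            # True means the value in this row is invalid.
            row[name + "_mask"] = column_masks[name][i]
        rows.append(row)
    return rows


# Hypothetical two-row table with one masked exposure value.
values = {"FILE_NAME": ["a.img", "b.img"], "EXPOSURE": [1.5, -999.0]}
masks = {"FILE_NAME": [False, False], "EXPOSURE": [False, True]}
rows = rows_from_columns(values, masks)
```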

get_column(name)[source]

Return the values in the specified column as a list.

Parameters: name (str) – The name of the column to retrieve.
Returns: The values in the specified column.
Return type: list

get_column_mask(name)[source]

Return the masks for the specified column as a list.

Parameters: name (str) – The name of the column to retrieve masks for.
Returns: The masks for the specified column.
Return type: list

get_keys()[source]

Get the list of column names that were actually loaded.

Returns: A list of column names.
Return type: list

find_row_indices(lowercase=(False, False), *, limit=None, substrings=None, **params)[source]

Find indices of rows where each named parameter equals the specified value.

Parameters:

lowercase (tuple or bool) – Whether to enable testing of the column name and value converted to lower case. This is a tuple of two booleans. If the first is True, then we also allow testing of an entry in params with a _lower suffix. If the second boolean is True, then such a column also converts the value to match lower case. If a single boolean is provided, it will be duplicated for both parameters.
limit (int, optional) – If not zero or None, this is the maximum number of matching rows that are returned.
substrings (list, optional) – A list of column names for which a match occurs if the given parameter value is embedded within the string; an exact match is not required.
**params – Named parameters where each parameter name corresponds to a column name and the value is what to search for in that column.

Returns:

A list of row indices that match the search criteria.

Return type:

list
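
The matching rules above (exact match by default, substring match for the listed columns, and an optional limit) can be sketched over a list of row dictionaries. The function match_row_indices and the sample rows are hypothetical, and the lowercase options are omitted for brevity.

```python
def match_row_indices(rows, *, limit=None, substrings=None, **params):
    """Return indices of rows in which every named parameter matches."""
    substrings = set(substrings or [])
    matches = []
    for index, row in enumerate(rows):
        for name, wanted in params.items():
            actual = row[name]
            if name in substrings:
                ok = wanted in actual      # embedded match is enough
            else:
                ok = actual == wanted      # exact match required
            if not ok:
                break
        else:
            matches.append(index)
            if limit and len(matches) >= limit:
                break
    return matches


# Hypothetical rows for illustration.
sample = [
    {"TARGET": "SATURN", "FILTER": "CL1"},
    {"TARGET": "SATURN RINGS", "FILTER": "CL1"},
    {"TARGET": "TITAN", "FILTER": "CL2"},
]
exact = match_row_indices(sample, TARGET="SATURN")
embedded = match_row_indices(sample, substrings=["TARGET"], TARGET="SATURN")
```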

find_row_index(lowercase=(False, False), *, substrings=None, **params)[source]

Find the first row where each named parameter equals the specified value.

Parameters:

lowercase (tuple or bool) – Whether to enable testing of the column name and value converted to lower case. This is a tuple of two booleans. If the first is True, then we also allow testing of an entry in params with a _lower suffix. If the second boolean is True, then such a column also converts the value to match lower case. If a single boolean is provided, it will be duplicated for both parameters.
substrings (list, optional) – A list of column names for which a match occurs if the given parameter value is embedded within the string; an exact match is not required.
**params – Named parameters where each parameter name corresponds to a column name and the value is what to search for in that column.

Returns:

The index of the first matching row.

Return type:

int

Raises:

ValueError – If no matching row is found.

find_rows(lowercase=(False, False), **params)[source]

Return a list of dicts representing rows where each named parameter equals the specified value.

Parameters:

lowercase (tuple or bool) – Whether to enable testing of the column name and value converted to lower case. This is a tuple of two booleans. If the first is True, then we also allow testing of an entry in params with a _lower suffix. If the second boolean is True, then such a column also converts the value to match lower case. If a single boolean is provided, it will be duplicated for both parameters.
**params – Named parameters where each parameter name corresponds to a column name and the value is what to search for in that column.

Returns:

A list of dictionaries representing the matching rows. Each dictionary is keyed by column name.

Return type:

list

find_row(lowercase=(False, False), **params)[source]

Return a dict representing the first row where each named parameter equals the specified value.

Parameters:

lowercase (tuple or bool) – Whether to enable testing of the column name and value converted to lower case. This is a tuple of two booleans. If the first is True, then we also allow testing of an entry in params with a _lower suffix. If the second boolean is True, then such a column also converts the value to match lower case. If a single boolean is provided, it will be duplicated for both parameters.
**params – Named parameters where each parameter name corresponds to a column name and the value is what to search for in that column.

Returns:

A dictionary representing the first matching row. The dictionary is keyed by column name.

Return type:

dict

Raises:

ValueError – If no matching row is found.

filename_key(filename)[source]

Convert a filename to a key for indexing the rows.

The key is the basename with the extension removed.

Parameters: filename (str) – The filename to convert to a key.
Returns: The filename key for indexing.
Return type: str

bundle_column_index()[source]

Get the index of the column containing volume IDs or bundle names.

This is an alias for the volume_column_index() method.

Returns: The index of the column containing volume IDs or bundle names, or -1 if none.
Return type: int

volume_column_index()[source]

Get the index of the column containing volume IDs or bundle names.

Returns: The index of the column containing volume IDs or bundle names, or -1 if none.
Return type: int

filespec_column_index()[source]

Get the index of the column containing the file specification name.

For PDS3 tables, this is a column with a name like “file_specification_name”. PDS4 tables do not have a standard name, so we look for some possible names.

Returns: The index of the column containing the file specification name, or -1 if none.
Return type: int

find_row_indices_by_bundle_filespec(bundle_name, filespec=None, *, limit=None, substring=False)[source]

Find the row indices of the table with the specified bundle_name and file_specification_name.

This is an alias for the find_row_indices_by_volume_filespec() method.

The search is case-insensitive.

If the table does not contain the bundle name or if the given value of bundle_name is blank or not supplied, the search is performed on the filespec alone, ignoring the bundle name. Also, if only one argument is specified, it is treated as the filespec.

The search ignores the extension of filespec so it does not matter whether the column contains paths to labels or data files. It also works in tables that contain columns of file names without directory paths.

Parameters:

bundle_name (str) – The bundle name to search for.
filespec (str, optional) – The file specification name to search for. If None, bundle_name is treated as the filespec.
limit (int, optional) – Maximum number of matching rows to return.
substring (bool, optional) – If True, a match occurs whenever the given filespec appears inside what is tabulated in the file, so a complete match is not required.

Returns:

A list of row indices that match the search criteria.

Return type:

list

find_row_indices_by_volume_filespec(volume_id, filespec=None, *, limit=None, substring=False)[source]

Find the row indices of the table with the specified volume_id and file_specification_name.

The search is case-insensitive.

If the table does not contain the volume ID or if the given value of volume_id is blank or not supplied, the search is performed on the filespec alone, ignoring the volume ID. Also, if only one argument is specified, it is treated as the filespec.

The search ignores the extension of filespec so it does not matter whether the column contains paths to labels or data files. It also works in tables that contain columns of file names without directory paths.

Parameters:

volume_id (str) – The volume ID to search for.
filespec (str, optional) – The file specification name to search for. If None, volume_id is treated as the filespec.
limit (int, optional) – Maximum number of matching rows to return.
substring (bool, optional) – If True, a match occurs whenever the given filespec appears inside what is tabulated in the file, so a complete match is not required.

Returns:

A list of row indices that match the search criteria.

Return type:

list
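
The comparison rules described for this search can be sketched for a single row as follows. This is an assumption-laden illustration rather than the library's code: filespec_matches is a hypothetical helper, paths are assumed to use forward slashes, and comparing only the base name when the tabulated value has no directory part is a guess at how tables of bare file names are handled.

```python
import posixpath


def _strip_ext(path):
    """Lower-case a path and drop its extension."""
    return posixpath.splitext(path)[0].lower()


def filespec_matches(row_volume, row_filespec, volume_id, filespec=None, *,
                     substring=False):
    """Return True if one row matches the given volume ID and filespec."""
    if filespec is None:
        # A single argument is treated as the filespec.
        volume_id, filespec = "", volume_id
    # The volume is compared only if both sides actually provide one.
    if volume_id and row_volume and volume_id.lower() != row_volume.lower():
        return False
    wanted = _strip_ext(filespec)
    actual = _strip_ext(row_filespec)
    if "/" not in actual:
        # Assumed handling of columns holding bare file names.
        wanted = posixpath.basename(wanted)
    return wanted in actual if substring else wanted == actual


# Hypothetical volume IDs and paths for illustration.
same_product = filespec_matches("COISS_2001", "data/N1234.LBL",
                                "coiss_2001", "DATA/n1234.img")
```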

find_row_index_by_bundle_filespec(bundle_name, filespec=None, *, substring=False)[source]

Find the first row index with the specified bundle_name and file_specification_name.

This is an alias for the find_row_index_by_volume_filespec() method.

The search is case-insensitive.

If the table does not contain the bundle name or if the given value of bundle_name is blank, the search is performed on the filespec alone, ignoring the bundle name. Also, if only one argument is specified, it is treated as the filespec.

The search ignores the extension of filespec so it does not matter whether the column contains paths to labels or data files. It also works in tables that contain columns of file names without directory paths.

Parameters:

bundle_name (str) – The bundle name to search for.
filespec (str, optional) – The file specification name to search for. If None, bundle_name is treated as the filespec.
substring (bool, optional) – If True, a match occurs whenever the given filespec appears inside what is tabulated in the file, so a complete match is not required.

Returns:

The index of the first matching row.

Return type:

int

Raises:

ValueError – If no matching row is found.

find_row_index_by_volume_filespec(volume_id, filespec=None, substring=False)[source]

Find the first row index with the specified volume_id and file_specification_name.

The search is case-insensitive.

If the table does not contain the volume ID or if the given value of volume_id is blank, the search is performed on the filespec alone, ignoring the volume ID. Also, if only one argument is specified, it is treated as the filespec.

The search ignores the extension of filespec so it does not matter whether the column contains paths to labels or data files. It also works in tables that contain columns of file names without directory paths.

Parameters:

volume_id (str) – The volume ID to search for.
filespec (str, optional) – The file specification name to search for. If None, volume_id is treated as the filespec.
substring (bool, optional) – If True, a match occurs whenever the given filespec appears inside what is tabulated in the file, so a complete match is not required.

Returns:

The index of the first matching row.

Return type:

int

Raises:

ValueError – If no matching row is found.

find_rows_by_bundle_filespec(bundle_name, filespec=None, *, limit=None, substring=False)[source]

Find the rows of the table with the specified bundle_name and file_specification_name.

This is an alias for the find_rows_by_volume_filespec() method.

The search is case-insensitive.

If the table does not contain the bundle name or if the given value of bundle_name is blank or not supplied, the search is performed on the filespec alone, ignoring the bundle name. Also, if only one argument is specified, it is treated as the filespec.

The search ignores the extension of filespec so it does not matter whether the column contains paths to labels or data files. It also works in tables that contain columns of file names without directory paths.

If input parameter substring is True, then a match occurs whenever the given filespec appears inside what is tabulated in the file, so a complete match is not required.

Parameters:

bundle_name (str) – The bundle name to search for.
filespec (str, optional) – The file specification name to search for. If None, bundle_name is treated as the filespec.
limit (int, optional) – Maximum number of matching rows to return.
substring (bool, optional) – If True, a match occurs whenever the given filespec appears inside what is tabulated in the file.

Returns:

A list of dictionaries representing the matching rows.

Return type:

list

find_rows_by_volume_filespec(volume_id, filespec=None, *, limit=None, substring=False)[source]

Find the rows of the table with the specified volume_id and file_specification_name.

The search is case-insensitive.

If the table does not contain the volume ID or if the given value of volume_id is blank, the search is performed on the filespec alone, ignoring the volume ID. Also, if only one argument is specified, it is treated as the filespec.

The search ignores the extension of filespec so it does not matter whether the column contains paths to labels or data files. It also works in tables that contain columns of file names without directory paths.

If input parameter substring is True, then a match occurs whenever the given filespec appears inside what is tabulated in the file, so a complete match is not required.

Parameters:

volume_id (str) – The volume ID to search for.
filespec (str, optional) – The file specification name to search for. If None, volume_id is treated as the filespec.
limit (int, optional) – Maximum number of matching rows to return.
substring (bool, optional) – If True, a match occurs whenever the given filespec appears inside what is tabulated in the file.

Returns:

A list of dictionaries representing the matching rows.

Return type:

list

find_row_by_bundle_filespec(bundle_name, filespec=None, substring=False)[source]: See find_row_by_volume_filespec.

find_row_by_volume_filespec(volume_id, filespec=None, *, substring=False)[source]

Find the first row of the table with the specified volume_id and file_specification_name.

The search is case-insensitive.

If the table does not contain the volume ID or if the given value of volume_id is blank, the search is performed on the filespec alone, ignoring the volume ID. Also, if only one argument is specified, it is treated as the filespec.

The search ignores the extension of filespec so it does not matter whether the column contains paths to labels or data files. It also works in tables that contain columns of file names without directory paths.

Parameters:

volume_id (str) – The volume ID to search for.
filespec (str, optional) – The file specification name to search for. If None, volume_id is treated as the filespec.
substring (bool, optional) – If True, a match occurs whenever the given filespec appears inside what is tabulated in the file, so a complete match is not required.

Returns:

A dictionary representing the first matching row.

Return type:

dict

Raises:

ValueError – If no matching row is found.

index_rows_by_filename_key()[source]

Create a dictionary of row indices keyed by the file basename associated with the row.

The key has the file extension stripped away and is converted to lower case. The result is available in the filename_keys attribute.
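
A minimal sketch of this kind of index is shown below, under stated assumptions. The functions make_key and index_rows are hypothetical, paths are assumed to use forward slashes, and the keylen argument mirrors the filename_keylen constructor option on the assumption that truncation is applied after the extension is removed.

```python
import posixpath
from collections import defaultdict


def make_key(filename, keylen=0):
    """Lower-cased base name without extension, optionally truncated."""
    stem = posixpath.splitext(posixpath.basename(filename))[0].lower()
    return stem[:keylen] if keylen else stem


def index_rows(filenames, keylen=0):
    """Map each filename key to the list of row indices that share it."""
    index = defaultdict(list)
    for row_index, name in enumerate(filenames):
        index[make_key(name, keylen)].append(row_index)
    return dict(index)


# Hypothetical file names; the first two share a five-character key.
names = ["data/N1234_1.IMG", "data/N1234_2.IMG", "data/W5678.IMG"]
by_key = index_rows(names, keylen=5)
```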

property filename_keys

The list of filename keys for the table.

Returns: A list of filename keys.
Return type: list

row_indices_by_filename_key(key)[source]

Quick lookup of the row indices associated with a filename key.

Parameters: key (str) – The filename key to look up.
Returns: A list of row indices associated with the filename key.
Return type: list

rows_by_filename_key(key)[source]

Quick lookup of the rows associated with a filename key.

Parameters: key (str) – The filename key to look up.
Returns: A list of dictionaries representing the rows associated with the filename key.
Return type: list

class pdstable.PdsTableInfo(label_file_path)[source]

Bases: object

Class to hold the attributes of a PDS-labeled table.

Direct access to this class’s attributes by the end user is deprecated and only supported for backwards compatibility. Use the properties of PdsTable instead.

__init__(label_file_path)[source]

property column_info_dict: The dict of PdsColumnInfo objects for the columns in the table, keyed by the column name.

property column_info_list: The list of PdsColumnInfo objects for the columns in the table.

property columns: The number of columns in the table.

property dtype0

The dtype dictionary for the table, keyed by the column name.

Each value is a tuple of (dtype_string, start_byte) where dtype_string is the string representation of the dtype (e.g., ‘S10’ for a 10-character string) and start_byte is the starting byte position of the column in a row.

property field_delimiter: The field delimiter for the table.

property fixed_length_row: True if the table has fixed-length rows.

property header_bytes: The number of bytes in the header of the table.

property label: The label of the table as a Pds3Label for PDS3 or dict for PDS4.

property label_file_name: The name of the label file (without the path).

property label_file_path: The local path to the label file.

property row_bytes: The number of bytes in a single row of the table.

property rows: The number of rows in the table.

property table_file_name: The name of the table file (without the path).

property table_file_path: The local path to the table file.

class pdstable.PdsColumnInfo[source]

Bases: object

Class to hold the attributes of one column in a PDS-labeled table.

Direct access to this class’s attributes by the end user is not generally necessary, but is permitted if you want the inner details of each column.

__init__()[source]

property bytes: The number of bytes in the column.

property colno: The index number of the column, starting at zero.

property data_type

The data type of the column.

Possible values are ‘int’, ‘float’, ‘time’, and ‘string’.

property dtype0

The dtype of the entire column as a string.

Each value is a tuple of (dtype_string, start_byte) where dtype_string is the string representation of the dtype (e.g., ‘S10’ for a 10-character string) and start_byte is the starting byte position of the column in a row.

property dtype1

The dtype of a multiple-item column.

If items == 0, this value is None. Otherwise, it is a dict keyed by ‘item_0’, ‘item_1’, etc. with the value being a tuple of the string representation of the dtype used to isolate each item as a string (Snnn) and the starting byte relative to the beginning of the column for that item.

property dtype2

The dtype of the column with the actual data type.

Possible values are ‘int’, ‘float’, ‘S’, and ‘U’.

property invalid_values

The set of invalid value markers for the column.

If the column’s value equals one of these markers, the column value is considered invalid.

property item_bytes: The number of bytes in an item of the column (PDS3 only).

property item_offset: The incremental offset of each item within the column.

property items: The number of items in the column (PDS3 only).

property name: The name of the column.

property scalar_func: The scalar function used to convert the column’s string value to its data value.

property start_byte: The starting byte of the column in the row (1-based).

property valid_range: The valid range of the column as a tuple (lower, upper) or None.
