API Reference

Processing Functions

bom_analyzer.caller.run_sentence_transform(csv_path: str | None = None, device: str = 'cpu', load_path: str | None = None, archive_path: str | None = None) → ndarray

Performs sentence-level semantic similarity analysis on data from a CSV file.

Parameters:

csv_path (Optional[str]) – Path to the CSV file containing the data. Required headers: ‘SERNUM’, ‘PCA’, ‘CPN_1’, ‘DateCode_1’, ‘LOTCODE_1’, ‘MPN_1’, ‘RD_1’, ‘HWRMA’.
device (str) – Device to use for sentence transformation. Defaults to ‘cpu’.
load_path (Optional[str]) – Path to a NumPy file containing archived data to load instead of preprocessing.
archive_path (Optional[str]) – Path to a NumPy file where the transformed data will be archived.

Raises:

FileNotFoundError – If the CSV file or load path is not found, or if the directory for the archive path does not exist.
PermissionError – If there is no write access to the directory for the archive path.
ValueError – If an invalid device is used for sentence transformation or the CSV file does not contain the required headers.

Returns:

NumPy array containing the transformed sentence embeddings.

Return type:

np.ndarray
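
The required-header rule for csv_path can be checked before calling the function. The following is a minimal sketch using only the standard library; the helper name missing_headers is hypothetical and is not part of bom_analyzer.

```python
import csv
import io

# Headers listed as required for csv_path in the parameter description.
REQUIRED_HEADERS = [
    "SERNUM", "PCA", "CPN_1", "DateCode_1",
    "LOTCODE_1", "MPN_1", "RD_1", "HWRMA",
]


def missing_headers(csv_file):
    """Return the required headers that are absent from the first row."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # empty file -> no headers at all
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_HEADERS if name not in present]


# In-memory sample that lacks the 'HWRMA' column.
sample = io.StringIO(
    "SERNUM,PCA,CPN_1,DateCode_1,LOTCODE_1,MPN_1,RD_1\n"
    "A1,P1,C1,2301,L1,M1,R1\n"
)
missing = missing_headers(sample)
if missing:
    print(f"Missing required headers: {missing}")
```

A non-empty result corresponds to the case in which the reference says a ValueError is raised.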

Performs dimensionality reduction on sentence embeddings and appends the reduced dimensions to a table.

Parameters:

table (Union[pd.DataFrame, str]) – Either a Pandas DataFrame containing the data or a string representing the path to a CSV file containing the data.
st_embeddings (Union[np.ndarray, str]) – Either a NumPy array of sentence embeddings or a string representing the path to a NumPy file containing the embeddings.
param_dict (Dict[str, Union[int, float]]) – A dictionary containing the parameters for the dimension reduction algorithm.
seed (Optional[int]) – Random seed for reproducibility. Defaults to 42.
archive_path (Optional[str]) – Path to a CSV file where the resulting table will be archived.

Returns:

The original table with two additional columns, ‘DATA_X’ and ‘DATA_Y’, containing the reduced dimensions.

Return type:

pd.DataFrame

Raises:

ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the archive path directory does not exist.
PermissionError – If there is no write access to the archive path directory.
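
Most functions in this reference raise the same two errors for archive_path. The sketch below shows one way such a check could look, under the assumption that only the parent directory is inspected; check_archive_path is a hypothetical helper, not a bom_analyzer function.

```python
import os
import tempfile


def check_archive_path(archive_path):
    """Raise if the directory that would hold archive_path is unusable."""
    if archive_path is None:
        return  # nothing will be archived, so nothing to check
    directory = os.path.dirname(os.path.abspath(archive_path))
    if not os.path.isdir(directory):
        raise FileNotFoundError(f"Directory does not exist: {directory}")
    if not os.access(directory, os.W_OK):
        raise PermissionError(f"No write access to: {directory}")


with tempfile.TemporaryDirectory() as tmp:
    # Passes silently: the temporary directory exists and is writable.
    check_archive_path(os.path.join(tmp, "reduced.csv"))
    try:
        check_archive_path(os.path.join(tmp, "no_such_dir", "reduced.csv"))
    except FileNotFoundError as err:
        error_name = type(err).__name__
```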

bom_analyzer.caller.run_clustering(table: DataFrame | str, param_dict: str | Dict[str, int | float], archive_path: str | None = None) → DataFrame

Performs clustering on dimensionally reduced data and appends the cluster labels to a table.

Parameters:

table (Union[pd.DataFrame, str]) – Either a Pandas DataFrame containing the data or a string representing the path to a CSV file containing the data.
param_dict (Union[str, Dict[str, Union[int, float]]]) – Either a dictionary containing the parameters for the clustering algorithm or a string representing the path to a file containing that dictionary.
archive_path (Optional[str]) – Path to a CSV file where the resulting table will be archived.

Returns:

The original table with an additional column ‘CLUSTERS’ containing the assigned cluster labels.

Return type:

pd.DataFrame

Raises:

ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the archive path directory does not exist.
PermissionError – If there is no write access to the archive path directory.
IOError – If the required columns are not present in the table.

bom_analyzer.caller.run_optimizer(st_data: ndarray | str, seed: int = 42, trials: int = 50, archive_path: str | None = None) → Dict[str, int | float]

Performs hyperparameter optimization for a model using sentence embeddings.

Parameters:

st_data (Union[np.ndarray, str]) – Either a NumPy array of sentence embeddings or a string representing the path to a NumPy file containing the embeddings.
seed (int) – Random seed for reproducibility. Defaults to 42.
trials (int) – Number of optimization trials to run. Defaults to 50.
archive_path (Optional[str]) – Path to a NumPy file where the optimized parameters will be archived.

Returns:

A dictionary containing the optimized hyperparameters.

Return type:

Dict[str, Union[int, float]]

Raises:

ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the archive path directory does not exist.
PermissionError – If there is no write access to the archive path directory.

Analysis Functions

bom_analyzer.caller.label_outliers(table: DataFrame | str, archive_path: str | None = None) → DataFrame

Calculates outlier density for each cluster in a table and appends it as a new column.

Parameters:

table (Union[pd.DataFrame, str]) – Pandas DataFrame, or a string representing the path to a CSV file, containing cluster labels in a column named ‘CLUSTERS’ and outlier indicators in a column named ‘HWRMA’.
archive_path (Optional[str]) – Path to a CSV file where the resulting table will be archived.

Returns:

The original table with an additional column ‘OUTLIER_DENSITY’ containing the calculated outlier density for each cluster.

Return type:

pd.DataFrame

Raises:

ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the archive path directory does not exist.
PermissionError – If there is no write access to the archive path directory.
IOError – If the ‘CLUSTERS’ column is not present in the table.
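
The reference does not define how outlier density is computed. As an illustration only, the sketch below assumes it is the fraction of rows in each cluster whose ‘HWRMA’ flag is True; the function name outlier_density_sketch and the sample data are hypothetical.

```python
import pandas as pd


def outlier_density_sketch(table: pd.DataFrame) -> pd.DataFrame:
    """Append an 'OUTLIER_DENSITY' column (assumed: share of HWRMA rows per cluster)."""
    for column in ("CLUSTERS", "HWRMA"):
        if column not in table.columns:
            raise IOError(f"Required column missing: {column}")
    result = table.copy()
    # Mean of a 0/1 flag within a group equals the fraction of flagged rows.
    flags = result["HWRMA"].astype(float)
    result["OUTLIER_DENSITY"] = flags.groupby(result["CLUSTERS"]).transform("mean")
    return result


table = pd.DataFrame(
    {
        "SERNUM": ["A", "B", "C", "D"],
        "CLUSTERS": [0, 0, 1, 1],
        "HWRMA": [True, False, False, False],
    }
)
labeled = outlier_density_sketch(table)
print(labeled)
```

Under this assumption every row in a cluster carries the same density value, which is what lets later steps filter rows by a single threshold.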

bom_analyzer.caller.report_outliers(table: DataFrame | str, threshold: float, archive_path: str | None = None) → DataFrame

Filters outliers based on a specified outlier density threshold.

Parameters:

table (Union[pd.DataFrame, str]) – Either a Pandas DataFrame containing the data or a string representing the path to a CSV file containing the data.
threshold (float) – The outlier density threshold, between 0 and 1, above which outliers will be reported.
archive_path (Optional[str]) – Path to a CSV file where the filtered DataFrame will be archived.

Returns:

A DataFrame containing outliers that exceed the specified outlier density threshold.

Return type:

pd.DataFrame

Raises:

ValueError – If the inputs are not of the expected types or threshold is not between 0 and 1.
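
A minimal sketch of threshold filtering, assuming the table already carries an ‘OUTLIER_DENSITY’ column and that "exceed" means strictly greater than the threshold. The name report_outliers_sketch is hypothetical, and the real function may apply further criteria.

```python
import pandas as pd


def report_outliers_sketch(table: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Keep rows whose cluster outlier density is strictly above threshold."""
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    return table[table["OUTLIER_DENSITY"] > threshold]


table = pd.DataFrame(
    {
        "SERNUM": ["A", "B", "C"],
        "OUTLIER_DENSITY": [0.5, 0.5, 0.0],
    }
)
reported = report_outliers_sketch(table, threshold=0.25)
print(reported)
```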

bom_analyzer.caller.report_suspect_components(table: DataFrame | str, num_clusters: int, archive_path: str | None = None) → DataFrame

Identifies potential component suspects based on cluster analysis and outlier density.

Parameters:

table (Union[pd.DataFrame, str]) – Pandas DataFrame (or path to it) containing: a column named ‘CLUSTERS’ with cluster labels; a column named ‘OUTLIER_DENSITY’ as calculated by label_outliers; and any additional component information used for grouping by group_components.
num_clusters (int) – Maximum number of clusters to consider as potential sources of suspects.
archive_path (Optional[str]) – Path to a CSV file where the identified suspects will be archived.

Returns:

A DataFrame containing potential suspects, identified as components that appear in clusters with high outlier density and are not present in clusters with lower density.

Return type:

pd.DataFrame

Raises:

ValueError – If num_clusters is less than 1 or greater than the number of unique clusters, or if the inputs are not of the expected types.
FileNotFoundError – If the archive path directory does not exist.
PermissionError – If there is no write access to the archive path directory.
IOError – If required columns (‘CLUSTERS’, ‘OUTLIER_DENSITY’) are missing in the table.
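
The selection rule in the Returns description can be read as a set difference. The sketch below illustrates that reading with plain Python sets; the input layout (cluster label mapped to a density and a set of component identifiers) and the name suspect_components_sketch are assumptions made for the example, not the library's data model.

```python
def suspect_components_sketch(clusters, num_clusters):
    """clusters maps a label to (outlier_density, set_of_component_ids)."""
    if not 1 <= num_clusters <= len(clusters):
        raise ValueError("num_clusters must be between 1 and the cluster count")
    # Rank cluster labels from highest to lowest outlier density.
    ranked = sorted(clusters, key=lambda label: clusters[label][0], reverse=True)
    high, low = ranked[:num_clusters], ranked[num_clusters:]
    high_parts = set().union(*(clusters[label][1] for label in high))
    low_parts = set().union(*(clusters[label][1] for label in low))
    # Suspects appear in a high-density cluster and in no lower-density one.
    return high_parts - low_parts


clusters = {
    0: (0.6, {"CPN-A", "CPN-B"}),
    1: (0.1, {"CPN-B", "CPN-C"}),
    2: (0.0, {"CPN-C"}),
}
suspects = suspect_components_sketch(clusters, num_clusters=1)
print(suspects)
```

Here "CPN-B" is excluded because it also occurs in a lower-density cluster.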

bom_analyzer.caller.report_suspect_units(suspect_components: DataFrame | str, bom: DataFrame | str, archive_path: str | None = None) → DataFrame

Filters the input table to include only the units containing the suspect components.

Parameters:

suspect_components (Union[pd.DataFrame, str]) – DataFrame containing suspect components or the file path to the suspect components table.
bom (Union[pd.DataFrame, str]) – DataFrame containing the Bill of Materials (BoM) or the file path to the BoM table.
archive_path (Optional[str]) – Path to a CSV file where the filtered units will be archived.

Returns:

Filtered units DataFrame containing only the units containing the suspect components.

Return type:

pd.DataFrame

bom_analyzer.caller.find_sernum(sernum_values: List, table: DataFrame | str, archive_path: str | None = None) → DataFrame

Filters input table to include only the units containing the specified sernum values in the SERNUM column.

Parameters:

sernum_values (List) – The list of sernum values to filter by.
table (Union[pd.DataFrame, str]) – The DataFrame or file path to the DataFrame to filter.
archive_path (Optional[str]) – Path to a CSV file where the filtered DataFrame will be archived.

Returns:

Filtered DataFrame containing only the units containing the specified sernum value(s).

Return type:

pd.DataFrame

Raises:

ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the archive path directory.

bom_analyzer.caller.find_cluster(cluster_values: List[int], table: DataFrame | str, archive_path: str | None = None) → DataFrame

Finds all rows in the table with a specified cluster label.

Parameters:

cluster_values (List[int]) – The cluster label(s) to filter by.
table (Union[pd.DataFrame, str]) – The DataFrame to search within, or a file path to that DataFrame.
archive_path (Optional[str]) – Path to a CSV file where the DataFrame of all rows in the specified cluster will be archived.

Returns:

A DataFrame containing all rows with the matching cluster label.

Return type:

pd.DataFrame

Raises:

ValueError – If the inputs are not of the expected types.

bom_analyzer.caller.find_cluster_by_sernum(sernums: List, table: DataFrame | str, archive_path: str | None = None) → DataFrame

Finds all rows in the table that belong to the same cluster as the specified serial number(s).

Parameters:

sernums (List) – The serial number(s) to identify the cluster.
table (Union[pd.DataFrame, str]) – The DataFrame to search within, or a file path to that DataFrame.
archive_path (Optional[str]) – Path to a CSV file where the DataFrame of all rows in the same cluster as the specified serial number will be archived.

Returns:

A DataFrame containing all rows belonging to the same cluster as the specified serial number.

Return type:

pd.DataFrame

Raises:

ValueError – If the inputs are not of the expected types.
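
A minimal pandas sketch of the lookup described above: find the cluster labels of the given serial numbers, then return every row carrying one of those labels. The helper name same_cluster_rows and the sample data are hypothetical.

```python
import pandas as pd


def same_cluster_rows(sernums, table: pd.DataFrame) -> pd.DataFrame:
    """Return all rows sharing a cluster with any of the given serial numbers."""
    # Cluster labels that the requested serial numbers belong to.
    labels = table.loc[table["SERNUM"].isin(sernums), "CLUSTERS"].unique()
    return table[table["CLUSTERS"].isin(labels)]


table = pd.DataFrame(
    {
        "SERNUM": ["A", "B", "C", "D"],
        "CLUSTERS": [0, 0, 1, 2],
    }
)
matches = same_cluster_rows(["A"], table)
print(matches)
```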

bom_analyzer.caller.find_neighbors(sernums: List, table: DataFrame | str, n_neighbors: str | int, archive_path: str | None = None) → DataFrame

Finds the n closest neighbors to the specified serial number(s) in the dimension-reduced space.

Parameters:

sernums (List) – The serial number(s) to find neighbors for.
table (Union[pd.DataFrame, str]) – The DataFrame containing dimensionally reduced data, or a file path to that DataFrame.
n_neighbors (Union[str, int]) – The number of neighbors to retrieve.
archive_path (Optional[str]) – Path to a CSV file where the DataFrame of closest neighbors will be stored.

Returns:

A DataFrame containing the n_neighbors closest neighbors to the specified serial number(s).

Return type:

pd.DataFrame

Raises:

ValueError – If the inputs are not of the expected types or n_neighbors is negative.
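
The reference does not state which distance metric is used. The sketch below assumes Euclidean distance over the ‘DATA_X’ and ‘DATA_Y’ columns and handles a single serial number; nearest_neighbors_sketch is a hypothetical name.

```python
import numpy as np
import pandas as pd


def nearest_neighbors_sketch(sernum, table: pd.DataFrame, n_neighbors: int) -> pd.DataFrame:
    """Return the n_neighbors rows closest to sernum in the reduced 2-D space."""
    if n_neighbors < 0:
        raise ValueError("n_neighbors must not be negative")
    target = table.loc[table["SERNUM"] == sernum, ["DATA_X", "DATA_Y"]].iloc[0]
    others = table[table["SERNUM"] != sernum]
    # Euclidean distance from every other row to the target point.
    distance = np.hypot(
        others["DATA_X"] - target["DATA_X"],
        others["DATA_Y"] - target["DATA_Y"],
    )
    return others.loc[distance.nsmallest(n_neighbors).index]


table = pd.DataFrame(
    {
        "SERNUM": ["A", "B", "C", "D"],
        "DATA_X": [0.0, 1.0, 5.0, 0.0],
        "DATA_Y": [0.0, 0.0, 5.0, 2.0],
    }
)
neighbors = nearest_neighbors_sketch("A", table, n_neighbors=2)
print(neighbors)
```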

bom_analyzer.caller.find_differences(table: DataFrame | str, sernum_values: List | None = None, archive_path: str | None = None) → DataFrame

Reduces a list of parts to the differences between them by removing identical columns.

Parameters:

table (Union[pd.DataFrame, str]) – The DataFrame to examine, or a file path to that DataFrame.
sernum_values (Optional[List]) – The set of serial number(s) from the table that will be checked. If None, the whole table will be checked.
archive_path (Optional[str]) – Path to a CSV file where the DataFrame of part differences will be archived.

Returns:

A DataFrame describing the differences between the entries in the set.

Return type:

pd.DataFrame

Raises:

ValueError – If the inputs are not of the expected types or if the column_filter contains non-existent columns.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the archive path directory.
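
One way to "remove identical columns" is to keep only the columns that hold more than one distinct value among the selected rows. The sketch below illustrates that idea with pandas; differing_columns is a hypothetical helper and may not match the library's exact behavior.

```python
import pandas as pd


def differing_columns(table: pd.DataFrame, sernum_values=None) -> pd.DataFrame:
    """Keep only the columns whose values differ between the selected rows."""
    if sernum_values is None:
        subset = table  # check the whole table
    else:
        subset = table[table["SERNUM"].isin(sernum_values)]
    # A column varies if it has more than one distinct value (NaN counted).
    varying = subset.nunique(dropna=False) > 1
    return subset.loc[:, varying]


table = pd.DataFrame(
    {
        "SERNUM": ["A", "B", "C"],
        "PCA": ["P1", "P1", "P1"],
        "CPN_1": ["C1", "C2", "C1"],
    }
)
diff_all = differing_columns(table)
diff_pair = differing_columns(table, ["A", "C"])
print(diff_all)
```

In the sample, ‘PCA’ is dropped because every row shares the same value; restricting the check to "A" and "C" also drops ‘CPN_1’.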

Filter Functions

bom_analyzer.caller.filter_for_HWRMA(table: DataFrame | str, archive_path: str | None = None) → DataFrame

Filters a dataset to include only rows marked as anomalies.

Parameters:

table (Union[pd.DataFrame, str]) – Either a Pandas DataFrame containing the data or a string representing the path to a CSV file containing the data.
archive_path (Optional[str]) – Path to a CSV file where the DataFrame of anomalies will be archived.

Returns:

A DataFrame containing only the rows where the ‘HWRMA’ column is True, indicating known anomalies.

Return type:

pd.DataFrame

Raises:

ValueError – If the input is the wrong type or missing necessary columns.

bom_analyzer.caller.filter_by_column_header(column_filter: List[str], table: DataFrame | str, archive_path: str | None = None) → DataFrame

Filters the input table to list only the specified properties for each part.

Parameters:

column_filter (List[str]) – The columns that must persist after culling.
table (Union[pd.DataFrame, str]) – The DataFrame to filter.
archive_path (Optional[str]) – Path to a CSV file where the filtered DataFrame will be archived.

Returns:

A DataFrame containing filtered part data for each part in the input set.

Return type:

pd.DataFrame

Raises:

ValueError – If the inputs are not of the expected types or if the column_filter contains non-existent columns.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the archive path directory.

bom_analyzer.caller.filter_by_PCA(pca_values: List, table: DataFrame | str, archive_path: str | None = None) → DataFrame

Filters input table to include only the units containing the specified PCA value in the PCA column.

Parameters:

pca_values (List) – The list of PCA values to filter by.
table (Union[pd.DataFrame, str]) – The DataFrame or file path to the DataFrame to filter.
archive_path (Optional[str]) – Path to a CSV file where the filtered DataFrame will be archived.

Returns:

Filtered DataFrame containing only the units containing the specified PCA value(s).

Return type:

pd.DataFrame

Raises:

ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the archive path directory.

bom_analyzer.caller.filter_by_CPN(cpn_values: List, table: DataFrame | str, archive_path: str | None = None) → DataFrame

Filters input table to include only the units containing the specified CPN value in any CPN_i column.

Parameters:

cpn_values (List) – The list of CPN values to filter by.
table (Union[pd.DataFrame, str]) – The DataFrame or file path to the DataFrame to filter.
archive_path (Optional[str]) – Path to a CSV file where the filtered DataFrame will be archived.

Returns:

Filtered DataFrame containing only the units containing the specified CPN value(s).

Return type:

pd.DataFrame

Raises:

ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the archive path directory.
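
The "any CPN_i column" rule can be sketched as a match across every column whose name is a prefix followed by an underscore and a number. The helper filter_by_prefix below is hypothetical; the same pattern would apply to the DateCode, LOTCODE, MPN, and RD filters if they follow the same column naming.

```python
import re

import pandas as pd


def filter_by_prefix(prefix: str, values, table: pd.DataFrame) -> pd.DataFrame:
    """Keep rows where any '<prefix>_<n>' column holds one of the values."""
    pattern = rf"^{re.escape(prefix)}_\d+$"
    columns = table.filter(regex=pattern).columns
    # True for a row if at least one matching column contains a wanted value.
    mask = table[columns].isin(values).any(axis=1)
    return table[mask]


table = pd.DataFrame(
    {
        "SERNUM": ["A", "B", "C"],
        "CPN_1": ["C1", "C2", "C3"],
        "CPN_2": ["C9", "C1", "C4"],
        "MPN_1": ["M1", "M2", "C1"],  # not a CPN column, so it is ignored
    }
)
units = filter_by_prefix("CPN", ["C1"], table)
print(units)
```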

bom_analyzer.caller.filter_by_DateCode(datecode_values: List, table: DataFrame | str, archive_path: str | None = None) → DataFrame

Filters input table to include only the units containing the specified DateCode value in any DateCode_i column.

Parameters:

datecode_values (List) – The list of DateCode values to filter by.
table (Union[pd.DataFrame, str]) – The DataFrame or file path to the DataFrame to filter.
archive_path (Optional[str]) – Path to a CSV file where the filtered DataFrame will be archived.

Returns:

Filtered DataFrame containing only the units containing the specified DateCode value(s).

Return type:

pd.DataFrame

Raises:

ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the archive path directory.

bom_analyzer.caller.filter_by_LOTCODE(lotcode_values: List, table: DataFrame | str, archive_path: str | None = None) → DataFrame

Filters input table to include only the units containing the specified LOTCODE value in any LOTCODE_i column.

Parameters:

lotcode_values (List) – The list of LOTCODE values to filter by.
table (Union[pd.DataFrame, str]) – The DataFrame or file path to the DataFrame to filter.
archive_path (Optional[str]) – Path to a CSV file where the filtered DataFrame will be archived.

Returns:

Filtered DataFrame containing only the units containing the specified LOTCODE value(s).

Return type:

pd.DataFrame

Raises:

ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the archive path directory.

bom_analyzer.caller.filter_by_MPN(mpn_values: List, table: DataFrame | str, archive_path: str | None = None) → DataFrame

Filters input table to include only the units containing the specified MPN value in any MPN_i column.

Parameters:

mpn_values (List) – The list of MPN values to filter by.
table (Union[pd.DataFrame, str]) – The DataFrame or file path to the DataFrame to filter.
archive_path (Optional[str]) – Path to a CSV file where the filtered DataFrame will be archived.

Returns:

Filtered DataFrame containing only the units containing the specified MPN value(s).

Return type:

pd.DataFrame

Raises:

ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the archive path directory.

bom_analyzer.caller.filter_by_RD(rd_values: List, table: DataFrame | str, archive_path: str | None = None) → DataFrame

Filters input table to include only the units containing the specified RD value in any RD_i column.

Parameters:

rd_values (List) – The list of RD values to filter by.
table (Union[pd.DataFrame, str]) – The DataFrame or file path to the DataFrame to filter.
archive_path (Optional[str]) – Path to a CSV file where the filtered DataFrame will be archived.

Returns:

Filtered DataFrame containing only the units containing the specified RD value(s).

Return type:

pd.DataFrame

Raises:

ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the archive path directory.

bom_analyzer.caller.filter_by_Util(header: str, values: List, table: DataFrame | str, archive_path: str | None = None) → DataFrame

Filters input table to include only the units containing the specified values in any column specified by the ‘header’ input.

Parameters:

header (str) – The columns whose values will be checked.
values (List) – The list of values to check for.
table (Union[pd.DataFrame, str]) – The DataFrame or file path to the DataFrame to filter.
archive_path (Optional[str]) – Path to a CSV file where the filtered DataFrame will be archived.

Returns:

Filtered DataFrame containing only the units containing the specified value(s).

Return type:

pd.DataFrame

Raises:

ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the archive path directory.

Graphing Functions

bom_analyzer.caller.plot_clusters(table: DataFrame | str, archive_path: str | None = None) → None

Generates a plot of data points colored by their cluster labels.

Parameters:

table (Union[pd.DataFrame, str]) – A DataFrame, or a file path to one, containing columns named ‘DATA_X’, ‘DATA_Y’, and ‘CLUSTERS’, representing the dimensionally reduced data and cluster assignments.
archive_path (str, optional) – Path to a file where an image of the plot will be archived.

Raises:

ValueError – If the input is the wrong type.
IOError – If the required columns are not present in the table.

bom_analyzer.caller.plot_hwrma(table: DataFrame | str, archive_path: str | None = None) → None

Generates a plot of data points colored by their HWRMA (anomaly) status.

Parameters:

table (Union[pd.DataFrame, str]) – A DataFrame, or a file path to one, containing columns named ‘DATA_X’, ‘DATA_Y’, and ‘HWRMA’, representing the dimensionally reduced data and HWRMA labels.
archive_path (str, optional) – Path to a file where an image of the plot will be archived.

Raises:

ValueError – If the input is the wrong type.
IOError – If the required columns are not present in the table.

Util Functions

bom_analyzer.caller.to_dataframe(pd_data: str | DataFrame) → DataFrame

Ensures that the input is a pandas DataFrame, either by loading it from a CSV file or directly using the provided DataFrame.

Parameters: pd_data (Union[str, pd.DataFrame]) – A pandas DataFrame or a string representing the path to a CSV file.
Returns: The pandas DataFrame.
Return type: pd.DataFrame
Raises: ValueError – If the input is not a pandas DataFrame or a string representing a file path.
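
The "DataFrame or path" convention used throughout this reference amounts to a small type dispatch. A minimal sketch, assuming CSV input is read with pandas.read_csv and default options; to_dataframe_sketch is a hypothetical stand-in, not the library function.

```python
import os
import tempfile

import pandas as pd


def to_dataframe_sketch(pd_data):
    """Return pd_data as a DataFrame, loading it from CSV if it is a path."""
    if isinstance(pd_data, pd.DataFrame):
        return pd_data  # already a DataFrame, use it directly
    if isinstance(pd_data, str):
        return pd.read_csv(pd_data)
    raise ValueError("Expected a pandas DataFrame or a path string")


frame = pd.DataFrame({"SERNUM": ["A1", "B1"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bom.csv")
    frame.to_csv(path, index=False)
    loaded = to_dataframe_sketch(path)  # loaded from the CSV path

same = to_dataframe_sketch(frame)  # passed through unchanged
```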

bom_analyzer.caller.to_ndarray(np_data: str | ndarray) → ndarray

Ensures that the input is a NumPy array, either by loading it from a file or directly using the provided array.

Parameters: np_data (Union[str, np.ndarray]) – A NumPy array or a string representing the path to a NumPy array file.
Returns: The NumPy array.
Return type: np.ndarray
Raises: ValueError – If the input is not a NumPy array or a string representing a file path.

bom_analyzer.caller.to_dict(dict_data: str | Dict) → Dict

Ensures that the input is a dictionary, either by loading it from a JSON file or directly using the provided dictionary.

Parameters: dict_data (Union[str, Dict]) – A dictionary or a string representing the path to a JSON file.
Returns: The dictionary.
Return type: Dict
Raises: ValueError – If the input is not a dictionary or a string representing a file path.

bom_analyzer.caller.combine_boms(bom_path_1: str, bom_path_2: str, archive_path: str | None) → DataFrame

Combines two CSV files containing bills of materials (BOMs) into a single DataFrame.

Parameters:

bom_path_1 (str) – The path to the first BOM CSV file.
bom_path_2 (str) – The path to the second BOM CSV file.
archive_path (Optional[str]) – The path to save the combined BOM data. Defaults to None.

Returns:

A pandas DataFrame containing the combined BOM data.

Return type:

pd.DataFrame

Raises:

FileNotFoundError – If either of the specified CSV files does not exist.
ValueError – If either of the CSV files does not contain the required headers.
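
The reference does not say how the two files are combined. The sketch below assumes a row-wise concatenation of files that share the same headers, with in-memory buffers standing in for bom_path_1 and bom_path_2; the sample rows are made up for illustration.

```python
import io

import pandas as pd

# In-memory stand-ins for the two BOM CSV files.
bom_1 = io.StringIO("SERNUM,PCA\nA1,P1\n")
bom_2 = io.StringIO("SERNUM,PCA\nB1,P2\n")

frames = [pd.read_csv(source) for source in (bom_1, bom_2)]

# Stack the rows of both BOMs and renumber the index from zero.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```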