API Reference
Processing Functions
- bom_analyzer.caller.run_sentence_transform(csv_path: str | None = None, device: str = 'cpu', load_path: str | None = None, archive_path: str | None = None) → ndarray
Performs sentence-level semantic similarity analysis on data from a CSV file.
- Parameters:
csv_path (str) – Path to the CSV file containing the data. Required headers: ‘SERNUM’, ‘PCA’, ‘CPN_1’, ‘DateCode_1’, ‘LOTCODE_1’, ‘MPN_1’, ‘RD_1’, ‘HWRMA’.
device (str) – Device to use for sentence transformation. Defaults to ‘cpu’.
load_path (Optional[str]) – Path to a NumPy file containing archived data to load instead of preprocessing.
archive_path (Optional[str]) – Path to a NumPy file where the transformed data will be archived.
- Raises:
FileNotFoundError – If the CSV file or load path is not found, or if the directory for the archive path does not exist.
PermissionError – If there is no write access to the directory for the archive path.
ValueError – If an invalid device is used for sentence transformation or the CSV file does not contain the required headers.
- Returns:
NumPy array containing the transformed sentence embeddings.
- Return type:
np.ndarray
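The required-header rule above can be checked before calling the function. The following is a minimal sketch using only the standard library; the helper name `missing_headers` and the sample data are illustrative assumptions, not part of bom_analyzer.

```python
import csv
import io

# Required headers as listed for csv_path above.
REQUIRED_HEADERS = [
    "SERNUM", "PCA", "CPN_1", "DateCode_1",
    "LOTCODE_1", "MPN_1", "RD_1", "HWRMA",
]

def missing_headers(csv_file):
    """Return the required headers absent from the first row of a CSV file object.

    Hypothetical helper: bom_analyzer performs its own validation internally.
    """
    reader = csv.reader(csv_file)
    header = next(reader, [])
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_HEADERS if name not in present]

# Made-up sample that lacks the HWRMA column.
sample = io.StringIO(
    "SERNUM,PCA,CPN_1,DateCode_1,LOTCODE_1,MPN_1,RD_1\n"
    "A1,P1,C1,2301,L1,M1,R1\n"
)
missing = missing_headers(sample)
if missing:
    print("missing headers:", missing)
```

A file that fails this check would be expected to trigger the documented ValueError.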
- bom_analyzer.caller.run_dimension_reduction(table: DataFrame | str, st_embeddings: ndarray | str, param_dict: str | Dict[str, int | float], seed: int = 42, archive_path: str | None = None) → DataFrame
Performs dimensionality reduction on sentence embeddings and appends the reduced dimensions to a table.
- Parameters:
table (Union[pd.DataFrame, str]) – Either a Pandas DataFrame containing the data or a string representing the path to a CSV file containing the data.
st_embeddings (Union[np.ndarray, str]) – Either a NumPy array of sentence embeddings or a string representing the path to a NumPy file containing the embeddings.
param_dict (Union[str, Dict[str, Union[int, float]]]) – A dictionary containing the parameters for the dimension reduction algorithm, or a string representing the path to a file containing that dictionary.
seed (Optional[int]) – Random seed for reproducibility. Defaults to 42.
archive_path (Optional[str]) – Path to a CSV file where the resulting table will be archived.
- Returns:
The original table with two additional columns, ‘DATA_X’ and ‘DATA_Y’, containing the reduced dimensions.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the archive path directory does not exist.
PermissionError – If there is no write access to the archive path directory.
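Most functions in this reference document the same pair of archive_path errors. The sketch below shows one way such a pre-check could look. The helper `check_archive_path` is hypothetical and the library's actual validation may differ.

```python
import os
import tempfile

def check_archive_path(archive_path):
    """Hypothetical pre-check mirroring the documented archive_path errors."""
    if archive_path is None:
        return  # archiving is optional
    directory = os.path.dirname(os.path.abspath(archive_path))
    if not os.path.isdir(directory):
        raise FileNotFoundError(f"archive directory does not exist: {directory}")
    if not os.access(directory, os.W_OK):
        raise PermissionError(f"no write access to archive directory: {directory}")

with tempfile.TemporaryDirectory() as tmp:
    # An existing, writable directory passes silently.
    check_archive_path(os.path.join(tmp, "reduced.csv"))
    try:
        # The parent directory "missing_dir" was never created.
        check_archive_path(os.path.join(tmp, "missing_dir", "reduced.csv"))
    except FileNotFoundError as err:
        print("rejected:", err)
```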
- bom_analyzer.caller.run_clustering(table: DataFrame | str, param_dict: str | Dict[str, int | float], archive_path: str | None = None) → DataFrame
Performs clustering on dimensionally reduced data and appends the cluster labels to a table.
- Parameters:
table (Union[pd.DataFrame, str]) – Either a Pandas DataFrame containing the data or a string representing the path to a CSV file containing the data.
param_dict (Union[str, Dict[str, Union[int, float]]]) – A dictionary containing the parameters for the clustering algorithm, or a string representing the path to a file containing that dictionary.
archive_path (Optional[str]) – Path to a CSV file where the resulting table will be archived.
- Returns:
The original table with an additional column ‘CLUSTERS’ containing the assigned cluster labels.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the archive path directory does not exist.
PermissionError – If there is no write access to the archive path directory.
IOError – If the required columns are not present in the table.
- bom_analyzer.caller.run_optimizer(st_data: ndarray | str, seed: int = 42, trials: int = 50, archive_path: str | None = None) → Dict[str, int | float]
Performs hyperparameter optimization for a model using sentence embeddings.
- Parameters:
st_data (Union[np.ndarray, str]) – Either a NumPy array of sentence embeddings or a string representing the path to a NumPy file containing the embeddings.
seed (Optional[int]) – Random seed for reproducibility. Defaults to 42.
trials (Optional[int]) – Number of optimization trials to run. Defaults to 50.
archive_path (Optional[str]) – Path to a NumPy file where the optimized parameters will be archived.
- Returns:
A dictionary containing the optimized hyperparameters.
- Return type:
Dict[str, Union[int, float]]
- Raises:
ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the archive path directory does not exist.
PermissionError – If there is no write access to the archive path directory.
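The four processing functions above appear designed to be chained. The following sketch shows one plausible ordering based only on the documented signatures. It assumes that the dictionary returned by run_optimizer can be passed as param_dict to both run_dimension_reduction and run_clustering, which this reference does not state explicitly. The wrapper name `run_processing_pipeline` is hypothetical.

```python
def run_processing_pipeline(caller, csv_path, trials=50, seed=42):
    """Chain the processing functions in one plausible order.

    `caller` is expected to be the bom_analyzer.caller module (or any object
    exposing the same four functions). Assumption: the optimizer output is a
    valid param_dict for both later steps.
    """
    # 1. Embed the CSV rows.
    embeddings = caller.run_sentence_transform(csv_path=csv_path, device="cpu")
    # 2. Search for hyperparameters using the embeddings.
    params = caller.run_optimizer(embeddings, seed=seed, trials=trials)
    # 3. Reduce the embeddings and append DATA_X / DATA_Y to the table.
    reduced = caller.run_dimension_reduction(csv_path, embeddings, params, seed=seed)
    # 4. Cluster the reduced data and append CLUSTERS.
    return caller.run_clustering(reduced, params)

# Intended usage (not executed here; "units.csv" is a placeholder path):
#   from bom_analyzer import caller
#   clustered = run_processing_pipeline(caller, "units.csv")
```

Passing the module in as an argument keeps the sketch independent of any installed version of the package.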
Analysis Functions
- bom_analyzer.caller.label_outliers(table: DataFrame | str, archive_path: str | None = None) → DataFrame
Calculates outlier density for each cluster in a table and appends it as a new column.
- Parameters:
table (Union[pd.DataFrame, str]) – Pandas DataFrame, or a string representing the path to a CSV file, containing cluster labels in a column named ‘CLUSTERS’ and outlier indicators in a column named ‘HWRMA’.
archive_path (Optional[str]) – Path to a CSV file where the resulting table will be archived.
- Returns:
The original table with an additional column ‘OUTLIER_DENSITY’ containing the calculated outlier density for each cluster.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the archive path directory does not exist.
PermissionError – If there is no write access to the archive path directory.
IOError – If the ‘CLUSTERS’ column is not present in the table.
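The reference does not define outlier density precisely. The sketch below assumes it is the fraction of rows in a cluster whose ‘HWRMA’ flag is true, and uses plain dictionaries in place of a DataFrame. The function name `outlier_density` and the sample rows are illustrative only.

```python
from collections import defaultdict

def outlier_density(rows):
    """Add OUTLIER_DENSITY to each row.

    Assumed definition: HWRMA rows in the cluster divided by all rows in
    the cluster. The library's actual formula may differ.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for row in rows:
        totals[row["CLUSTERS"]] += 1
        flagged[row["CLUSTERS"]] += 1 if row["HWRMA"] else 0
    return [
        dict(row, OUTLIER_DENSITY=flagged[row["CLUSTERS"]] / totals[row["CLUSTERS"]])
        for row in rows
    ]

# Made-up rows: cluster 0 has one flagged unit out of two, cluster 1 has none.
rows = [
    {"SERNUM": "A", "CLUSTERS": 0, "HWRMA": True},
    {"SERNUM": "B", "CLUSTERS": 0, "HWRMA": False},
    {"SERNUM": "C", "CLUSTERS": 1, "HWRMA": False},
]
labeled = outlier_density(rows)
for row in labeled:
    print(row["SERNUM"], row["OUTLIER_DENSITY"])
```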
- bom_analyzer.caller.report_outliers(table: DataFrame | str, threshold: float, archive_path: str | None = None) → DataFrame
Filters outliers based on a specified outlier density threshold.
- Parameters:
table (Union[pd.DataFrame, str]) – Either a Pandas DataFrame containing the data or a string representing the path to a CSV file containing the data.
threshold (float) – The threshold above which outliers will be reported.
archive_path (Optional[str]) – Path to a CSV file where the filtered DataFrame will be archived.
- Returns:
A DataFrame containing outliers that exceed the specified outlier density threshold.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the inputs are not of the expected types or threshold is not between 0 and 1.
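As a rough illustration of the threshold rule, the sketch below keeps rows whose ‘OUTLIER_DENSITY’ is strictly greater than the threshold and rejects thresholds outside the range 0 to 1. Whether the library uses a strict comparison, or also filters on ‘HWRMA’, is not stated here, so treat both as assumptions. The function name is hypothetical.

```python
def rows_above_threshold(rows, threshold):
    """Keep rows whose OUTLIER_DENSITY exceeds threshold (assumed strict '>')."""
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    return [row for row in rows if row["OUTLIER_DENSITY"] > threshold]

# Made-up rows with precomputed densities.
rows = [
    {"SERNUM": "A", "OUTLIER_DENSITY": 0.5},
    {"SERNUM": "B", "OUTLIER_DENSITY": 0.5},
    {"SERNUM": "C", "OUTLIER_DENSITY": 0.0},
]
reported = rows_above_threshold(rows, 0.25)
print([row["SERNUM"] for row in reported])
```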
- bom_analyzer.caller.report_suspect_components(table: DataFrame | str, num_clusters: int, archive_path: str | None = None) → DataFrame
Identifies potential component suspects based on cluster analysis and outlier density.
- Parameters:
table (Union[pd.DataFrame, str]) – Pandas DataFrame (or path to a CSV file containing it) with: a column named ‘CLUSTERS’ with cluster labels; a column named ‘OUTLIER_DENSITY’ calculated by label_outliers; and any additional component information used for grouping by group_components.
num_clusters (int) – Maximum number of clusters to consider as potential sources of suspects.
archive_path (Optional[str]) – Path to a CSV file where the identified suspects will be archived.
- Returns:
A DataFrame containing potential suspects, identified as components in clusters with high outlier density and not present in clusters with lower density.
- Return type:
pd.DataFrame
- Raises:
ValueError – If num_clusters is less than 1 or greater than the number of unique clusters, or if the inputs are not of the expected types.
FileNotFoundError – If the archive path directory does not exist.
PermissionError – If there is no write access to the archive path directory.
IOError – If required columns (‘CLUSTERS’, ‘OUTLIER_DENSITY’) are missing in the table.
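The selection rule described under Returns can be read as a set difference: components seen in the num_clusters densest clusters, minus components that also appear in any other cluster. The sketch below implements that reading on plain dictionaries. Using a single ‘CPN_1’ column as the component identifier is an assumption made for brevity, and `suspect_components` is a hypothetical name.

```python
def suspect_components(rows, num_clusters, component_key="CPN_1"):
    """Components found only in the num_clusters clusters with highest density.

    Illustrative reading of the documented behavior, not the library's code.
    """
    density = {row["CLUSTERS"]: row["OUTLIER_DENSITY"] for row in rows}
    if not 1 <= num_clusters <= len(density):
        raise ValueError("num_clusters must be between 1 and the number of unique clusters")
    # Rank cluster labels from highest to lowest outlier density.
    ranked = sorted(density, key=density.get, reverse=True)
    top = set(ranked[:num_clusters])
    in_top = {row[component_key] for row in rows if row["CLUSTERS"] in top}
    elsewhere = {row[component_key] for row in rows if row["CLUSTERS"] not in top}
    return sorted(in_top - elsewhere)

# Made-up data: C-200 appears in both clusters, so only C-100 is suspect.
rows = [
    {"CLUSTERS": 0, "OUTLIER_DENSITY": 0.8, "CPN_1": "C-100"},
    {"CLUSTERS": 0, "OUTLIER_DENSITY": 0.8, "CPN_1": "C-200"},
    {"CLUSTERS": 1, "OUTLIER_DENSITY": 0.1, "CPN_1": "C-200"},
    {"CLUSTERS": 1, "OUTLIER_DENSITY": 0.1, "CPN_1": "C-300"},
]
suspects = suspect_components(rows, num_clusters=1)
print(suspects)
```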
- bom_analyzer.caller.report_suspect_units(suspect_components: DataFrame | str, bom: DataFrame | str, archive_path: str | None = None) → DataFrame
Filters the input table to include only the units containing the suspect components.
- Parameters:
suspect_components (Union[pd.DataFrame, str]) – DataFrame containing suspect components or the file path to the suspect components table.
bom (Union[pd.DataFrame, str]) – DataFrame containing the Bill of Materials (BoM) or the file path to the BoM table.
archive_path (Optional[str]) – Path to a CSV file where the filtered units will be archived.
- Returns:
Filtered units DataFrame containing only the units containing the suspect components.
- Return type:
pd.DataFrame
- bom_analyzer.caller.find_sernum(sernum_values: List, table: DataFrame | str, archive_path: str | None = None) → DataFrame
Filters input table to include only the units containing the specified sernum values in the SERNUM column.
- Parameters:
sernum_values (List) – The list of sernum values to filter by.
table (Union[pd.DataFrame, str]) – The DataFrame or file path to the DataFrame to filter.
archive_path (Optional[str]) – Path to a CSV file where the filtered DataFrame will be archived.
- Returns:
Filtered DataFrame containing only the units containing the specified sernum value(s).
- Return type:
pd.DataFrame
- Raises:
ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the archive path directory.
- bom_analyzer.caller.find_cluster(cluster_values: List[int], table: DataFrame | str, archive_path: str | None = None) → DataFrame
Finds all rows in the table with a specified cluster label.
- Parameters:
cluster_values (List[int]) – The cluster label(s) to filter by.
table (Union[pd.DataFrame, str]) – The DataFrame to search within, or a file path to that DataFrame.
archive_path (Optional[str]) – Path to a CSV file where the DataFrame of all rows in the specified cluster will be archived.
- Returns:
A DataFrame containing all rows with the matching cluster label.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the inputs are not of the expected types.
- bom_analyzer.caller.find_cluster_by_sernum(sernums: List, table: DataFrame | str, archive_path: str | None = None) → DataFrame
Finds all rows in the table that belong to the same cluster as the specified serial number(s).
- Parameters:
sernums (List) – The serial number(s) to identify the cluster.
table (Union[pd.DataFrame, str]) – The DataFrame to search within, or a file path to that DataFrame.
archive_path (Optional[str]) – Path to a CSV file where the DataFrame of all rows in the same cluster as the specified serial number will be archived.
- Returns:
A DataFrame containing all rows belonging to the same cluster as the specified serial number.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the inputs are not of the expected types.
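The lookup described above is a two-step filter: find the cluster label(s) of the given serial numbers, then return every row carrying one of those labels. A minimal sketch on plain dictionaries follows. The function name `rows_in_same_cluster` and the data are illustrative.

```python
def rows_in_same_cluster(sernums, rows):
    """Return all rows sharing a cluster with any of the given serial numbers."""
    wanted = set(sernums)
    # Step 1: cluster labels that the requested serial numbers belong to.
    labels = {row["CLUSTERS"] for row in rows if row["SERNUM"] in wanted}
    # Step 2: every row carrying one of those labels.
    return [row for row in rows if row["CLUSTERS"] in labels]

# Made-up table: A and B share cluster 0, C sits alone in cluster 1.
rows = [
    {"SERNUM": "A", "CLUSTERS": 0},
    {"SERNUM": "B", "CLUSTERS": 0},
    {"SERNUM": "C", "CLUSTERS": 1},
]
same_cluster = rows_in_same_cluster(["A"], rows)
print([row["SERNUM"] for row in same_cluster])
```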
- bom_analyzer.caller.find_neighbors(sernums: List, table: DataFrame | str, n_neighbors: str | int, archive_path: str | None = None) → DataFrame
Finds the n closest neighbors to the specified serial number(s) in the dimension-reduced space.
- Parameters:
sernums (List) – The serial number(s) to find neighbors for.
table (Union[pd.DataFrame, str]) – The DataFrame containing dimensionally reduced data, or a file path to that DataFrame.
n_neighbors (int) – The number of neighbors to retrieve.
archive_path (Optional[str]) – Path to a CSV file where the DataFrame of closest neighbors will be stored.
- Returns:
A DataFrame containing the n_neighbors closest neighbors to the specified serial number(s).
- Return type:
pd.DataFrame
- Raises:
ValueError – If the inputs are not of the expected types or if n_neighbors is less than 0.
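The reference does not say which distance metric is used. The sketch below assumes Euclidean distance over the ‘DATA_X’ and ‘DATA_Y’ columns and handles a single serial number for brevity, whereas the documented function accepts a list. The name `nearest_neighbors` is hypothetical.

```python
import math

def nearest_neighbors(sernum, rows, n_neighbors):
    """Return the n_neighbors rows closest to `sernum` in (DATA_X, DATA_Y).

    Assumes Euclidean distance; the library's metric is not documented here.
    """
    if n_neighbors < 0:
        raise ValueError("n_neighbors must not be negative")
    target = next(row for row in rows if row["SERNUM"] == sernum)
    origin = (target["DATA_X"], target["DATA_Y"])
    others = [row for row in rows if row["SERNUM"] != sernum]
    # Sort the remaining rows by their distance to the target point.
    others.sort(key=lambda row: math.dist((row["DATA_X"], row["DATA_Y"]), origin))
    return others[:n_neighbors]

# Made-up points: distances from A are 1 (B), 3 (C) and about 7.07 (D).
rows = [
    {"SERNUM": "A", "DATA_X": 0.0, "DATA_Y": 0.0},
    {"SERNUM": "B", "DATA_X": 1.0, "DATA_Y": 0.0},
    {"SERNUM": "C", "DATA_X": 0.0, "DATA_Y": 3.0},
    {"SERNUM": "D", "DATA_X": 5.0, "DATA_Y": 5.0},
]
neighbors = nearest_neighbors("A", rows, 2)
print([row["SERNUM"] for row in neighbors])
```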
- bom_analyzer.caller.find_differences(table: DataFrame | str, sernum_values: List | None = None, archive_path: str | None = None) → DataFrame
Reduces a list of parts to the differences between them by removing identical columns.
- Parameters:
table (Union[pd.DataFrame, str]) – The DataFrame to examine, or a file path to that DataFrame.
sernum_values (Optional[List]) – The set of serial number(s) from the table that will be checked. If None, the whole table will be checked.
archive_path (Optional[str]) – Path to a CSV file where the DataFrame of part differences will be archived.
- Returns:
A DataFrame describing the differences between the entries in the set.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the archive path directory.
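Removing identical columns amounts to keeping only the columns that take more than one distinct value across the selected rows. The sketch below shows that idea on plain dictionaries. It is an illustration of the described behavior rather than the library's implementation, and `differing_columns` is a hypothetical name.

```python
def differing_columns(rows):
    """Keep only the columns whose values differ between at least two rows."""
    if not rows:
        return []
    varying = [
        key for key in rows[0]
        if len({row[key] for row in rows}) > 1
    ]
    return [{key: row[key] for key in varying} for row in rows]

# Made-up units: both share the same PCA, so that column is dropped.
rows = [
    {"SERNUM": "A", "PCA": "P1", "CPN_1": "C-100"},
    {"SERNUM": "B", "PCA": "P1", "CPN_1": "C-200"},
]
differences = differing_columns(rows)
print(differences)
```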
Filter Functions
- bom_analyzer.caller.filter_for_HWRMA(table: DataFrame | str, archive_path: str | None = None) → DataFrame
Filters a dataset to include only rows marked as anomalies.
- Parameters:
table (Union[pd.DataFrame, str]) – Either a Pandas DataFrame containing the data or a string representing the path to a CSV file containing the data.
archive_path (Optional[str]) – Path to a CSV file where the DataFrame of anomalies will be archived.
- Returns:
A DataFrame containing only the rows where the ‘HWRMA’ column is True, indicating known anomalies.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the input is the wrong type or missing necessary columns.
- bom_analyzer.caller.filter_by_column_header(column_filter: List[str], table: DataFrame | str, archive_path: str | None = None) → DataFrame
Filters the input table to list only the specified properties for each part.
- Parameters:
column_filter (List[str]) – The columns that must persist after culling.
table (Union[pd.DataFrame, str]) – The DataFrame to filter.
archive_path (Optional[str]) – Path to a CSV file where the filtered DataFrame will be archived.
- Returns:
A DataFrame containing filtered part data for each part in the input set.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the inputs are not of the expected types or if the column_filter contains non-existent columns.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the archive path directory.
- bom_analyzer.caller.filter_by_PCA(pca_values: List, table: DataFrame | str, archive_path: str | None = None) → DataFrame
Filters input table to include only the units containing the specified PCA value in the PCA column.
- Parameters:
pca_values (List) – The list of PCA values to filter by.
table (Union[pd.DataFrame, str]) – The DataFrame or file path to the DataFrame to filter.
archive_path (Optional[str]) – Path to a CSV file where the filtered DataFrame will be archived.
- Returns:
Filtered DataFrame containing only the units containing the specified PCA value(s).
- Return type:
pd.DataFrame
- Raises:
ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the archive path directory.
- bom_analyzer.caller.filter_by_CPN(cpn_values: List, table: DataFrame | str, archive_path: str | None = None) → DataFrame
Filters input table to include only the units containing the specified CPN value in any CPN_i column.
- Parameters:
cpn_values (List) – The list of CPN values to filter by.
table (Union[pd.DataFrame, str]) – The DataFrame or file path to the DataFrame to filter.
archive_path (Optional[str]) – Path to a CSV file where the filtered DataFrame will be archived.
- Returns:
Filtered DataFrame containing only the units containing the specified CPN value(s).
- Return type:
pd.DataFrame
- Raises:
ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the archive path directory.
- bom_analyzer.caller.filter_by_DateCode(datecode_values: List, table: DataFrame | str, archive_path: str | None = None) → DataFrame
Filters input table to include only the units containing the specified DateCode value in any DateCode_i column.
- Parameters:
datecode_values (List) – The list of DateCode values to filter by.
table (Union[pd.DataFrame, str]) – The DataFrame or file path to the DataFrame to filter.
archive_path (Optional[str]) – Path to a CSV file where the filtered DataFrame will be archived.
- Returns:
Filtered DataFrame containing only the units containing the specified DateCode value(s).
- Return type:
pd.DataFrame
- Raises:
ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the archive path directory.
- bom_analyzer.caller.filter_by_LOTCODE(lotcode_values: List, table: DataFrame | str, archive_path: str | None = None) → DataFrame
Filters input table to include only the units containing the specified LOTCODE value in any LOTCODE_i column.
- Parameters:
lotcode_values (List) – The list of LOTCODE values to filter by.
table (Union[pd.DataFrame, str]) – The DataFrame or file path to the DataFrame to filter.
archive_path (Optional[str]) – Path to a CSV file where the filtered DataFrame will be archived.
- Returns:
Filtered DataFrame containing only the units containing the specified LOTCODE value(s).
- Return type:
pd.DataFrame
- Raises:
ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the archive path directory.
- bom_analyzer.caller.filter_by_MPN(mpn_values: List, table: DataFrame | str, archive_path: str | None = None) → DataFrame
Filters input table to include only the units containing the specified MPN value in any MPN_i column.
- Parameters:
mpn_values (List) – The list of MPN values to filter by.
table (Union[pd.DataFrame, str]) – The DataFrame or file path to the DataFrame to filter.
archive_path (Optional[str]) – Path to a CSV file where the filtered DataFrame will be archived.
- Returns:
Filtered DataFrame containing only the units containing the specified MPN value(s).
- Return type:
pd.DataFrame
- Raises:
ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the archive path directory.
- bom_analyzer.caller.filter_by_RD(rd_values: List, table: DataFrame | str, archive_path: str | None = None) → DataFrame
Filters input table to include only the units containing the specified RD value in any RD_i column.
- Parameters:
rd_values (List) – The list of RD values to filter by.
table (Union[pd.DataFrame, str]) – The DataFrame or file path to the DataFrame to filter.
archive_path (Optional[str]) – Path to a CSV file where the filtered DataFrame will be archived.
- Returns:
Filtered DataFrame containing only the units containing the specified RD value(s).
- Return type:
pd.DataFrame
- Raises:
ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the archive path directory.
- bom_analyzer.caller.filter_by_Util(header: str, values: List, table: DataFrame | str, archive_path: str | None = None) → DataFrame
Filters input table to include only the units containing the specified values in any column specified by the ‘header’ input.
- Parameters:
header (str) – The columns whose values will be checked.
values (List) – The list of values to check for.
table (Union[pd.DataFrame, str]) – The DataFrame or file path to the DataFrame to filter.
archive_path (Optional[str]) – Path to a CSV file where the filtered DataFrame will be archived.
- Returns:
Filtered DataFrame containing only the units containing the specified value(s).
- Return type:
pd.DataFrame
- Raises:
ValueError – If the inputs are not of the expected types.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the archive path directory.
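The per-field filters above (CPN, DateCode, LOTCODE, MPN, RD) all describe matching against any column of the form header_i, and filter_by_Util generalizes that pattern. The sketch below illustrates the idea on plain dictionaries. It assumes that columns are named with the header followed by an underscore and an integer, and `filter_by_header` is a hypothetical name.

```python
import re

def filter_by_header(header, values, rows):
    """Keep rows where any column named '<header>_<i>' holds one of `values`.

    Assumed column naming: header, underscore, integer index (e.g. CPN_1).
    """
    pattern = re.compile(rf"^{re.escape(header)}_\d+$")
    wanted = set(values)
    return [
        row for row in rows
        if any(pattern.match(col) and val in wanted for col, val in row.items())
    ]

# Made-up units with two CPN slots each.
rows = [
    {"SERNUM": "A", "CPN_1": "C-100", "CPN_2": "C-200"},
    {"SERNUM": "B", "CPN_1": "C-300", "CPN_2": "C-100"},
    {"SERNUM": "C", "CPN_1": "C-300", "CPN_2": "C-400"},
]
matches = filter_by_header("CPN", ["C-100"], rows)
print([row["SERNUM"] for row in matches])
```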
Graphing Functions
- bom_analyzer.caller.plot_clusters(table: DataFrame | str, archive_path: str | None = None) → None
Generates a plot of data points colored by their cluster labels.
- Parameters:
table (Union[pd.DataFrame, str]) – A DataFrame, or a file path to a DataFrame, containing columns named ‘DATA_X’, ‘DATA_Y’, and ‘CLUSTERS’, representing the dimensionally reduced data and cluster assignments.
archive_path (str, optional) – Path to a file where an image of the plot will be archived.
- Raises:
ValueError – If the input is the wrong type.
IOError – If the required columns are not present in the table.
- bom_analyzer.caller.plot_hwrma(table: DataFrame | str, archive_path: str | None = None) → None
Generates a plot of data points colored by their HWRMA (anomaly) status.
- Parameters:
table (Union[pd.DataFrame, str]) – A DataFrame, or a file path to a DataFrame, containing columns named ‘DATA_X’, ‘DATA_Y’, and ‘HWRMA’, representing the dimensionally reduced data and HWRMA labels.
archive_path (str, optional) – Path to a file where an image of the plot will be archived.
- Raises:
ValueError – If the input is the wrong type.
IOError – If the required columns are not present in the table.
Util Functions
- bom_analyzer.caller.to_dataframe(pd_data: str | DataFrame) → DataFrame
Ensures that the input is a pandas DataFrame, either by loading it from a CSV file or directly using the provided DataFrame.
- Parameters:
pd_data (Union[str, pd.DataFrame]) – A pandas DataFrame or a string representing the path to a CSV file.
- Returns:
The pandas DataFrame.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the input is not a pandas DataFrame or a string representing a file path.
- bom_analyzer.caller.to_ndarray(np_data: str | ndarray) → ndarray
Ensures that the input is a NumPy array, either by loading it from a file or directly using the provided array.
- Parameters:
np_data (Union[str, np.ndarray]) – A NumPy array or a string representing the path to a NumPy array file.
- Returns:
The NumPy array.
- Return type:
np.ndarray
- Raises:
ValueError – If the input is not a NumPy array or a string representing a file path.
- bom_analyzer.caller.to_dict(dict_data: str | Dict) → Dict
Ensures that the input is a dictionary, either by loading it from a JSON file or directly using the provided dictionary.
- Parameters:
dict_data (Union[str, Dict]) – A dictionary or a string representing the path to a JSON file.
- Returns:
The dictionary.
- Return type:
Dict
- Raises:
ValueError – If the input is not a dictionary or a string representing a file path.
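The three to_* helpers share one pattern: return the object unchanged if it already has the target type, load it from disk if it is a path string, and raise ValueError otherwise. The sketch below shows that pattern for the dictionary case using the standard json module. The name `ensure_dict`, the file name, and the parameter keys are illustrative assumptions.

```python
import json
import os
import tempfile

def ensure_dict(dict_data):
    """Return dict_data as a dictionary, loading it from a JSON file if it is a path."""
    if isinstance(dict_data, dict):
        return dict_data
    if isinstance(dict_data, str):
        with open(dict_data, "r", encoding="utf-8") as handle:
            return json.load(handle)
    raise ValueError("expected a dictionary or a path to a JSON file")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "params.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as handle:
        # Made-up parameter names, used only to have something to round-trip.
        json.dump({"n_neighbors": 15, "min_dist": 0.1}, handle)
    loaded = ensure_dict(path)

print(loaded)
```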
- bom_analyzer.caller.combine_boms(bom_path_1: str, bom_path_2: str, archive_path: str | None) → DataFrame
Combines two CSV files containing bills of materials (BOMs) into a single DataFrame.
- Parameters:
bom_path_1 (str) – The path to the first BOM CSV file.
bom_path_2 (str) – The path to the second BOM CSV file.
archive_path (Optional[str]) – The path to save the combined BOM data. Defaults to None.
- Returns:
A pandas DataFrame containing the combined BOM data.
- Return type:
pd.DataFrame
- Raises:
FileNotFoundError – If either of the specified CSV files does not exist.
ValueError – If either of the CSV files does not contain the required headers.