Helper Functions

Note

These helper functions are called internally by the API. They are not intended to be called directly by users.

Analysis Functions

bom_analyzer.analysis.cluster.clustering(umap_data: ndarray, param_dict: dict) → ndarray

Performs clustering on a 2D NumPy array using HDBSCAN.

Parameters:

umap_data (np.ndarray) – The 2D NumPy array containing the data points to cluster.
param_dict (dict) – A dictionary containing hyperparameters for HDBSCAN, including:
- min_cluster_size: The minimum number of points a group must contain to count as a cluster.
- min_samples: The number of points in a neighborhood required for a point to be treated as a core point.
- alpha: A distance scaling parameter used by HDBSCAN in robust single linkage.

Returns:

A NumPy array containing cluster labels for each data point.

Return type:

np.ndarray

bom_analyzer.analysis.cluster.dimension_reduction(st_data: ndarray, param_dict: dict, seed: int) → ndarray

Reduces the dimensionality of a NumPy array containing sentence embeddings using UMAP.

Parameters:

st_data (np.ndarray) – The NumPy array containing the sentence embeddings (assumed to have higher dimensionality).
param_dict (dict) – A dictionary containing hyperparameters for UMAP, including:
- n_neighbors: The number of neighbors to consider for each data point.
- min_dist: The minimum distance between embedded points.
seed (int) – The random seed for UMAP (for reproducibility).

Returns:

The reduced-dimensionality NumPy array representing the data in 2D space.

Return type:

np.ndarray

bom_analyzer.analysis.optimization.objective_function(trial: Trial, data: ndarray, seed: int) → float

Objective function used for hyperparameter optimization in optimize_hyperparameters.

Parameters:

trial (optuna.Trial) – The Optuna trial object used for suggesting hyperparameters.
data (np.ndarray) – The NumPy array containing the data to use for evaluation.
seed (int) – The random seed for UMAP (for reproducibility).

Returns:

The DBCV score of the clustering results using the suggested hyperparameters.

Return type:

float

bom_analyzer.analysis.optimization.optimize_hyperparameters(data: ndarray, seed: int, trials: int = 50) → Dict[str, int | float]

Optimizes hyperparameters for UMAP and HDBSCAN using Optuna and the DBCV score as the objective function.

Parameters:

data (np.ndarray) – The NumPy array containing the data to use for optimization.
seed (int) – The random seed for Optuna (for reproducibility).
trials (int, optional) – The number of hyperparameter configurations to try. Defaults to 50.

Returns:

The dictionary containing the best hyperparameter values found during optimization.

Return type:

Dict[str, Union[int, float]]

bom_analyzer.analysis.outlier_detection.group_components(table: DataFrame, labels: ndarray) → DataFrame

Groups components from a DataFrame based on specified cluster labels and extracts relevant data.

Parameters:

table (pd.DataFrame) – The input DataFrame containing product data.
labels (np.ndarray) – A NumPy array containing cluster labels to group components by.

Returns:

A new DataFrame containing the grouped components with columns:

CPN: Component part number
DateCode: Manufacturing date code
LOTCODE: Lot code
MPN: Manufacturer part number
RD: Revision date

Return type:

pd.DataFrame

bom_analyzer.analysis.outlier_detection.parse_columns(table: DataFrame) → List[int]

Finds the indices of the columns whose names start with “CPN”, together with the index of the “HWRMA” column, in a DataFrame. Helper function used by group_components.

Parameters:

table (pd.DataFrame) – The input DataFrame.

Returns:

A list of column indices, including those starting with “CPN” and the “HWRMA” column.

Return type:

list
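
One plausible way to perform this column lookup is sketched below. The function name `find_cpn_and_hwrma` and the example column names are hypothetical, and the order in which the real helper returns the indices is not stated in this reference; the sketch simply returns them in column order.

```python
from typing import List

import pandas as pd


def find_cpn_and_hwrma(table: pd.DataFrame) -> List[int]:
    """Return positions of columns named "CPN..." plus the "HWRMA" column."""
    return [
        index
        for index, name in enumerate(table.columns)
        if str(name).startswith("CPN") or str(name) == "HWRMA"
    ]


# Hypothetical column names, for illustration only.
table = pd.DataFrame(columns=["Serial", "HWRMA", "CPN_1", "DateCode_1", "CPN_2"])
indices = find_cpn_and_hwrma(table)  # [1, 2, 4]
```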

Data Functions

bom_analyzer.data.archive.archive_dict(archive_path: str, dict_data: Dict) → None

Saves a dictionary to a specified archive file in JSON format.

Parameters:

archive_path (str) – The path to the archive file.
dict_data (Dict) – The dictionary to save.

Raises:

ValueError – If the input archive_path is not a string.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the directory for the archive file.
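
The documented behavior, including the three error conditions, can be sketched with the standard library alone. The name `save_dict_as_json` is hypothetical, and the exact checks and their order in the real function are assumptions.

```python
import json
import os
import tempfile
from typing import Dict


def save_dict_as_json(archive_path: str, dict_data: Dict) -> None:
    """Write dict_data to archive_path as JSON, checking the path first."""
    if not isinstance(archive_path, str):
        raise ValueError("archive_path must be a string")
    # An empty dirname means the file goes in the current directory.
    directory = os.path.dirname(archive_path) or "."
    if not os.path.isdir(directory):
        raise FileNotFoundError(f"directory does not exist: {directory}")
    if not os.access(directory, os.W_OK):
        raise PermissionError(f"no write access to: {directory}")
    with open(archive_path, "w", encoding="utf-8") as handle:
        json.dump(dict_data, handle, indent=2)


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "best_params.json")
    save_dict_as_json(path, {"n_neighbors": 15, "min_dist": 0.1})
    with open(path, encoding="utf-8") as handle:
        restored = json.load(handle)
```

Validating the path before opening the file lets the caller distinguish a bad argument from a missing or read-only directory.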

bom_analyzer.data.archive.archive_err_check(archive_path: str) → None

Checks for errors related to the specified archive path.

Parameters:

archive_path (str) – The path to the archive file.

Raises:

ValueError – If the input archive_path is not a string.
PermissionError – If there is no write access to the directory for the archive file.

bom_analyzer.data.archive.archive_np_data(archive_path: str, np_data: ndarray) → None

Saves a NumPy array to a specified archive file.

Parameters:

archive_path (str) – The path to the archive file.
np_data (np.ndarray) – The NumPy array to save.

Raises:

ValueError – If the input archive_path is not a string.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the directory for the archive file.

bom_analyzer.data.archive.archive_pd_data(archive_path: str, pd_data: DataFrame) → None

Saves a pandas DataFrame to a specified archive file in CSV format.

Parameters:

archive_path (str) – The path to the archive file.
pd_data (pd.DataFrame) – The pandas DataFrame to save.

Raises:

ValueError – If the input archive_path is not a string.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the directory for the archive file.

bom_analyzer.data.preprocess.preprocess(csv_path: str) → ndarray

Preprocesses a CSV file containing a bill of materials (BOM) for sentence transformation.

Parameters:

csv_path (str) – The path to the CSV file.

Returns:

A NumPy array containing the preprocessed BOM data, ready for sentence transformation.

Return type:

np.ndarray

bom_analyzer.data.preprocess.sentence_transform(data: ndarray, device: str) → ndarray

Encodes a NumPy array of product strings using a sentence transformer model.

Parameters:

data (np.ndarray) – A NumPy array of product strings to encode.
device (str) – The device to use for model computation (e.g., ‘cpu’ or ‘cuda’).

Returns:

A NumPy array containing the encoded sentence embeddings.

Return type:

np.ndarray

Visualization Functions

bom_analyzer.visualization.graph.plot_data(arr, heading='_')

Plots a 2D scatter plot of the given NumPy array with a specified heading.

Parameters:

arr (np.ndarray) – The NumPy array containing the data to plot (assumed to have two columns).
heading (str, optional) – The title for the plot. Defaults to “_”.

bom_analyzer.visualization.graph.plot_labeled_data(arr, labels, heading='', archive_path=None)

Plots a 2D scatter plot of the given NumPy array with color-coded labels and a specified heading.

Parameters:

arr (np.ndarray) – The NumPy array containing the data to plot (assumed to have two columns).
labels (list) – A list of labels corresponding to each data point.
heading (str, optional) – The title for the plot. Defaults to “”.
archive_path (str, optional) – The path to save the plot as an image. Defaults to None.
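
A labeled scatter plot of this kind could be produced along the following lines. This is a minimal sketch assuming Matplotlib; the helper name `scatter_with_labels`, the colormap, and the marker size are hypothetical choices, not taken from the package.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the example runs headless
import matplotlib.pyplot as plt
import numpy as np


def scatter_with_labels(arr, labels, heading="", archive_path=None):
    """Scatter-plot the two columns of arr, colored by label."""
    fig, ax = plt.subplots()
    ax.scatter(arr[:, 0], arr[:, 1], c=labels, cmap="tab10", s=10)
    ax.set_title(heading)
    if archive_path is not None:
        fig.savefig(archive_path)  # write the image only when a path is given
    plt.close(fig)  # release the figure's memory


# Four illustrative points in two groups.
arr = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
labels = [0, 0, 1, 1]
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "clusters.png")
    scatter_with_labels(arr, labels, heading="Clusters", archive_path=out_path)
    saved = os.path.exists(out_path)
```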