Helper Functions

Note

These helper functions are called internally by the API. They are not intended to be called directly by users.

Analysis Functions

bom_analyzer.analysis.cluster.clustering(umap_data: ndarray, param_dict: dict) → ndarray

Performs clustering on a 2D NumPy array using HDBSCAN.

Parameters:
  • umap_data (np.ndarray) – The 2D NumPy array containing the data points to cluster.

  • param_dict (dict) – A dictionary containing hyperparameters for HDBSCAN, including:
      - min_cluster_size: The minimum number of points a group must contain to count as a cluster.
      - min_samples: The number of points in a neighborhood required for a point to be treated as a core point.
      - alpha: The distance scaling parameter used in robust single linkage.

Returns:

A NumPy array containing cluster labels for each data point.

Return type:

np.ndarray

bom_analyzer.analysis.cluster.dimension_reduction(st_data: ndarray, param_dict: dict, seed: int) → ndarray

Reduces the dimensionality of a NumPy array containing sentence embeddings using UMAP.

Parameters:
  • st_data (np.ndarray) – The NumPy array containing the sentence embeddings (assumed to have higher dimensionality).

  • param_dict (dict) – A dictionary containing hyperparameters for UMAP, including:
      - n_neighbors: The number of neighbors to consider for each data point.
      - min_dist: The minimum distance between embedded points.

  • seed (int) – The random seed for UMAP (for reproducibility).

Returns:

The reduced-dimensionality NumPy array representing the data in 2D space.

Return type:

np.ndarray

bom_analyzer.analysis.optimization.objective_function(trial: Trial, data: ndarray, seed: int) → float

Objective function used for hyperparameter optimization in optimize_hyperparameters.

Parameters:
  • trial (optuna.Trial) – The Optuna trial object used for suggesting hyperparameters.

  • data (np.ndarray) – The NumPy array containing the data to use for evaluation.

  • seed (int) – The random seed for UMAP (for reproducibility).

Returns:

The DBCV score of the clustering results using the suggested hyperparameters.

Return type:

float

bom_analyzer.analysis.optimization.optimize_hyperparameters(data: ndarray, seed: int, trials: int = 50) → Dict[str, int | float]

Optimizes hyperparameters for UMAP and HDBSCAN using Optuna and the DBCV score as the objective function.

Parameters:
  • data (np.ndarray) – The NumPy array containing the data to use for optimization.

  • seed (int) – The random seed for Optuna (for reproducibility).

  • trials (int, optional) – The number of hyperparameter configurations to try. Defaults to 50.

Returns:

The dictionary containing the best hyperparameter values found during optimization.

Return type:

Dict[str, Union[int, float]]
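The search loop described above (try a fixed number of configurations, score each one, return the best as a dictionary) can be illustrated without Optuna. The sketch below is not the library's implementation: it swaps in plain seeded random search for Optuna's sampler and a toy scoring function for the DBCV score. The names random_search and toy_score and the search ranges are hypothetical.

```python
import random
from typing import Callable, Dict, Union

Params = Dict[str, Union[int, float]]


def random_search(score: Callable[[Params], float], seed: int, trials: int = 50) -> Params:
    """Return the highest-scoring of `trials` randomly drawn configurations."""
    rng = random.Random(seed)  # a fixed seed makes the search reproducible
    best_params: Params = {}
    best_score = float("-inf")
    for _ in range(trials):
        # Hypothetical search ranges; the real ranges are not documented here.
        params: Params = {
            "n_neighbors": rng.randint(5, 50),
            "min_dist": rng.uniform(0.0, 0.5),
            "min_cluster_size": rng.randint(5, 100),
        }
        value = score(params)
        if value > best_score:
            best_params, best_score = params, value
    return best_params


def toy_score(params: Params) -> float:
    """Stand-in for the DBCV score: highest near n_neighbors=15, min_dist=0.1."""
    return -abs(params["n_neighbors"] - 15) - abs(params["min_dist"] - 0.1)


best = random_search(toy_score, seed=42, trials=50)
```

Calling random_search again with the same seed returns the same dictionary, which is the reproducibility property the seed parameter is documented to provide.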

bom_analyzer.analysis.outlier_detection.group_components(table: DataFrame, labels: ndarray) → DataFrame

Groups components from a DataFrame based on specified cluster labels and extracts relevant data.

Parameters:
  • table (pd.DataFrame) – The input DataFrame containing product data.

  • labels (np.ndarray) – A NumPy array containing cluster labels to group components by.

Returns:

A new DataFrame containing the grouped components with columns:
  • CPN: Component part number

  • DateCode: Manufacturing date code

  • LOTCODE: Lot code

  • MPN: Manufacturer part number

  • RD: Revision date

Return type:

pd.DataFrame

bom_analyzer.analysis.outlier_detection.parse_columns(table: DataFrame) → List[int]

Finds the indices of columns starting with “CPN” and the “HWRMA” column in a DataFrame. Helper function used by ‘group_components’.

Parameters:

table (pd.DataFrame) – The input DataFrame.

Returns:

A list of column indices, including those starting with “CPN” and the “HWRMA” column.

Return type:

List[int]
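The column selection described above can be sketched on a plain list of column names. This is an illustration under assumptions, not the library's code: parse_column_indices is a hypothetical name, and it takes the column names directly rather than a DataFrame (with pandas, one would pass table.columns). Only the “CPN” prefix and the “HWRMA” name come from the description; the other column names are made up.

```python
from typing import List, Sequence


def parse_column_indices(columns: Sequence[str]) -> List[int]:
    """Return positions of columns that start with "CPN" or are named "HWRMA"."""
    return [
        i
        for i, name in enumerate(columns)
        if name.startswith("CPN") or name == "HWRMA"
    ]


# Hypothetical header row for illustration.
columns = ["PRODUCT", "CPN_1", "MPN_1", "CPN_2", "HWRMA"]
indices = parse_column_indices(columns)  # [1, 3, 4]
```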

Data Functions

bom_analyzer.data.archive.archive_dict(archive_path: str, dict_data: Dict) → None

Saves a dictionary to a specified archive file in JSON format.

Parameters:
  • archive_path (str) – The path to the archive file.

  • dict_data (Dict) – The dictionary to save.

Raises:
  • ValueError – If the input archive_path is not a string.

  • FileNotFoundError – If the directory for the archive file does not exist.

  • PermissionError – If there is no write access to the directory for the archive file.
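A minimal sketch of how the documented behavior might look, using only the standard library. The function name archive_dict_sketch, the order of the checks, and the error messages are assumptions; the actual implementation may differ.

```python
import json
import os
from typing import Dict


def archive_dict_sketch(archive_path: str, dict_data: Dict) -> None:
    """Write dict_data to archive_path as JSON, raising the documented errors."""
    if not isinstance(archive_path, str):
        raise ValueError("archive_path must be a string")
    # An empty dirname means the file goes in the current directory.
    directory = os.path.dirname(archive_path) or "."
    if not os.path.isdir(directory):
        raise FileNotFoundError(f"directory does not exist: {directory}")
    if not os.access(directory, os.W_OK):
        raise PermissionError(f"no write access to: {directory}")
    with open(archive_path, "w", encoding="utf-8") as fh:
        json.dump(dict_data, fh, indent=2)
```

Validating the path before opening the file means a bad argument fails with one of the three documented exception types rather than with whatever open() happens to raise.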

bom_analyzer.data.archive.archive_err_check(archive_path: str) → None

Checks for errors related to the specified archive path.

Parameters:

archive_path (str) – The path to the archive file.

Raises:
  • ValueError – If the input archive_path is not a string.

  • PermissionError – If there is no write access to the directory for the archive file.

bom_analyzer.data.archive.archive_np_data(archive_path: str, np_data: ndarray) → None

Saves a NumPy array to a specified archive file.

Parameters:
  • archive_path (str) – The path to the archive file.

  • np_data (np.ndarray) – The NumPy array to save.

Raises:
  • ValueError – If the input archive_path is not a string.

  • FileNotFoundError – If the directory for the archive file does not exist.

  • PermissionError – If there is no write access to the directory for the archive file.

bom_analyzer.data.archive.archive_pd_data(archive_path: str, pd_data: DataFrame) → None

Saves a pandas DataFrame to a specified archive file in CSV format.

Parameters:
  • archive_path (str) – The path to the archive file.

  • pd_data (pd.DataFrame) – The pandas DataFrame to save.

Raises:
  • ValueError – If the input archive_path is not a string.

  • FileNotFoundError – If the directory for the archive file does not exist.

  • PermissionError – If there is no write access to the directory for the archive file.

bom_analyzer.data.preprocess.preprocess(csv_path: str) → ndarray

Preprocesses a CSV file containing a bill of materials (BOM) for sentence transformation.

Parameters:

csv_path (str) – The path to the CSV file.

Returns:

A NumPy array containing the preprocessed BOM data, ready for sentence transformation.

Return type:

np.ndarray
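The exact preprocessing steps are not documented here. As a rough illustration of what "ready for sentence transformation" could mean, the sketch below turns each CSV row into one string. The function name rows_to_strings, the header handling, and the space-joined format are all assumptions, and it returns a plain list rather than a NumPy array.

```python
import csv
from typing import List


def rows_to_strings(csv_path: str) -> List[str]:
    """Join the non-empty cells of each data row into one space-separated string."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        next(reader, None)  # skip the first row, assumed to be a header
        return [
            " ".join(cell.strip() for cell in row if cell.strip())
            for row in reader
        ]
```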

bom_analyzer.data.preprocess.sentence_transform(data: ndarray, device: str) → ndarray

Encodes a NumPy array of product strings using a sentence transformer model.

Parameters:
  • data (np.ndarray) – A NumPy array of product strings to encode.

  • device (str) – The device to use for model computation (e.g., ‘cpu’ or ‘cuda’).

Returns:

A NumPy array containing the encoded sentence embeddings.

Return type:

np.ndarray

Visualization Functions

bom_analyzer.visualization.graph.plot_data(arr, heading='_')

Plots a 2D scatter plot of the given NumPy array with a specified heading.

Parameters:
  • arr (np.ndarray) – The NumPy array containing the data to plot (assumed to have two columns).

  • heading (str, optional) – The title for the plot. Defaults to “_”.

bom_analyzer.visualization.graph.plot_labeled_data(arr, labels, heading='', archive_path=None)

Plots a 2D scatter plot of the given NumPy array with color-coded labels and a specified heading.

Parameters:
  • arr (np.ndarray) – The NumPy array containing the data to plot (assumed to have two columns).

  • labels (list) – A list of labels corresponding to each data point.

  • heading (str, optional) – The title for the plot. Defaults to “”.

  • archive_path (str, optional) – The path to save the plot as an image. Defaults to None.