Helper Functions
Note
These helper functions are called internally by the API. They are not user-callable.
Analysis Functions
- bom_analyzer.analysis.cluster.clustering(umap_data: ndarray, param_dict: dict) → ndarray
Performs clustering on a 2D NumPy array using HDBSCAN.
- Parameters:
umap_data (np.ndarray) – The 2D NumPy array containing the data points to cluster.
param_dict (dict) – A dictionary containing hyperparameters for HDBSCAN, including:
- min_cluster_size: The minimum size of clusters.
- min_samples: The number of samples in a neighborhood for a point to be considered a core point.
- alpha: A distance scaling parameter, as used in robust single linkage.
- Returns:
A NumPy array containing cluster labels for each data point.
- Return type:
np.ndarray
- bom_analyzer.analysis.cluster.dimension_reduction(st_data: ndarray, param_dict: dict, seed: int) → ndarray
Reduces the dimensionality of a NumPy array containing sentence embeddings using UMAP.
- Parameters:
st_data (np.ndarray) – The NumPy array containing the sentence embeddings (assumed to have higher dimensionality).
param_dict (dict) – A dictionary containing hyperparameters for UMAP, including:
- n_neighbors: The number of neighbors to consider for each data point.
- min_dist: The minimum distance between embedded points.
seed (int) – The random seed for UMAP (for reproducibility).
- Returns:
The reduced-dimensionality NumPy array representing the data in 2D space.
- Return type:
np.ndarray
- bom_analyzer.analysis.optimization.objective_function(trial: Trial, data: ndarray, seed: int) → float
Objective function used for hyperparameter optimization in optimize_hyperparameters.
- Parameters:
trial (optuna.Trial) – The Optuna trial object used for suggesting hyperparameters.
data (np.ndarray) – The NumPy array containing the data to use for evaluation.
seed (int) – The random seed for UMAP (for reproducibility).
- Returns:
The DBCV score of the clustering results using the suggested hyperparameters.
- Return type:
float
- bom_analyzer.analysis.optimization.optimize_hyperparameters(data: ndarray, seed: int, trials: int = 50) → Dict[str, int | float]
Optimizes hyperparameters for UMAP and HDBSCAN using Optuna and the DBCV score as the objective function.
- Parameters:
data (np.ndarray) – The NumPy array containing the data to use for optimization.
seed (int) – The random seed for Optuna (for reproducibility).
trials (int, optional) – The number of hyperparameter configurations to try. Defaults to 50.
- Returns:
The dictionary containing the best hyperparameter values found during optimization.
- Return type:
Dict[str, Union[int, float]]
- bom_analyzer.analysis.outlier_detection.group_components(table: DataFrame, labels: ndarray) → DataFrame
Groups components from a DataFrame based on specified cluster labels and extracts relevant data.
- Parameters:
table (pd.DataFrame) – The input DataFrame containing product data.
labels (np.ndarray) – A NumPy array containing cluster labels to group components by.
- Returns:
A new DataFrame containing the grouped components, with columns:
CPN: Component part number
DateCode: Manufacturing date code
LOTCODE: Lot code
MPN: Manufacturer part number
RD: Revision date
- Return type:
pd.DataFrame
- bom_analyzer.analysis.outlier_detection.parse_columns(table: DataFrame) → List[int]
Finds the indices of columns starting with “CPN” and the “HWRMA” column in a DataFrame. Helper function used by ‘group_components’.
- Parameters:
table (pd.DataFrame) – The input DataFrame.
- Returns:
A list of column indices, including those starting with “CPN” and the “HWRMA” column.
- Return type:
List[int]
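The column selection described above can be sketched without pandas by working on the column names alone. With a DataFrame, one would pass `table.columns`. The function name `column_indices`, the sample column names, and the choice to return indices in column order are all assumptions made for illustration.

```python
from typing import Iterable, List


def column_indices(columns: Iterable[str]) -> List[int]:
    """Return positions of names that start with "CPN" or equal "HWRMA"."""
    return [
        i
        for i, name in enumerate(columns)
        if str(name).startswith("CPN") or str(name) == "HWRMA"
    ]


# Hypothetical column layout, for illustration only.
example_columns = ["SerialNumber", "CPN1", "CPN1_DateCode", "HWRMA", "Notes"]
indices = column_indices(example_columns)
print(indices)  # [1, 2, 3]
```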
Data Functions
- bom_analyzer.data.archive.archive_dict(archive_path: str, dict_data: Dict) → None
Saves a dictionary to a specified archive file in JSON format.
- Parameters:
archive_path (str) – The path to the archive file.
dict_data (Dict) – The dictionary to save.
- Raises:
ValueError – If the input archive_path is not a string.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the directory for the archive file.
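The documented behavior (validate the path, then write JSON) could look roughly like the sketch below. It is a minimal illustration, not the package's code. The names `check_archive_path` and `save_dict_sketch` are hypothetical, and performing all three checks in a single helper is an assumption.

```python
import json
import os


def check_archive_path(archive_path) -> None:
    """Raise the documented errors for an unusable archive path."""
    if not isinstance(archive_path, str):
        raise ValueError("archive_path must be a string")
    directory = os.path.dirname(os.path.abspath(archive_path))
    if not os.path.isdir(directory):
        raise FileNotFoundError(f"directory does not exist: {directory}")
    if not os.access(directory, os.W_OK):
        raise PermissionError(f"no write access to: {directory}")


def save_dict_sketch(archive_path: str, dict_data: dict) -> None:
    """Validate the path, then write the dictionary as JSON."""
    check_archive_path(archive_path)
    with open(archive_path, "w", encoding="utf-8") as handle:
        json.dump(dict_data, handle, indent=2)
```

Checking the path before opening the file means the caller gets one of the three documented exceptions and no partially written file is left behind.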
- bom_analyzer.data.archive.archive_err_check(archive_path: str) → None
Checks for errors related to the specified archive path.
- Parameters:
archive_path (str) – The path to the archive file.
- Raises:
ValueError – If the input archive_path is not a string.
PermissionError – If there is no write access to the directory for the archive file.
- bom_analyzer.data.archive.archive_np_data(archive_path: str, np_data: ndarray) → None
Saves a NumPy array to a specified archive file.
- Parameters:
archive_path (str) – The path to the archive file.
np_data (np.ndarray) – The NumPy array to save.
- Raises:
ValueError – If the input archive_path is not a string.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the directory for the archive file.
- bom_analyzer.data.archive.archive_pd_data(archive_path: str, pd_data: DataFrame) → None
Saves a pandas DataFrame to a specified archive file in CSV format.
- Parameters:
archive_path (str) – The path to the archive file.
pd_data (pd.DataFrame) – The pandas DataFrame to save.
- Raises:
ValueError – If the input archive_path is not a string.
FileNotFoundError – If the directory for the archive file does not exist.
PermissionError – If there is no write access to the directory for the archive file.
- bom_analyzer.data.preprocess.preprocess(csv_path: str) → ndarray
Preprocesses a CSV file containing a bill of materials (BOM) for sentence transformation.
- Parameters:
csv_path (str) – The path to the CSV file.
- Returns:
A NumPy array containing the preprocessed BOM data, ready for sentence transformation.
- Return type:
np.ndarray
- bom_analyzer.data.preprocess.sentence_transform(data: ndarray, device: str) → ndarray
Encodes a NumPy array of product strings using a sentence transformer model.
- Parameters:
data (np.ndarray) – A NumPy array of product strings to encode.
device (str) – The device to use for model computation (e.g., ‘cpu’ or ‘cuda’).
- Returns:
A NumPy array containing the encoded sentence embeddings.
- Return type:
np.ndarray
Visualization Functions
- bom_analyzer.visualization.graph.plot_data(arr, heading='_')
Plots a 2D scatter plot of the given NumPy array with a specified heading.
- Parameters:
arr (np.ndarray) – The NumPy array containing the data to plot (assumed to have two columns).
heading (str, optional) – The title for the plot. Defaults to “_”.
- bom_analyzer.visualization.graph.plot_labeled_data(arr, labels, heading='', archive_path=None)
Plots a 2D scatter plot of the given NumPy array with color-coded labels and a specified heading.
- Parameters:
arr (np.ndarray) – The NumPy array containing the data to plot (assumed to have two columns).
labels (list) – A list of labels corresponding to each data point.
heading (str, optional) – The title for the plot. Defaults to “”.
archive_path (str, optional) – The path to save the plot as an image. Defaults to None.
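A labeled scatter plot of this kind might be produced with Matplotlib along the lines below. This is a sketch that assumes Matplotlib is the plotting backend, which the source does not state. The function name `plot_labeled_sketch`, the colormap, and the sample data are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_labeled_sketch(arr, labels, heading="", archive_path=None):
    """Scatter the two columns of arr, colored by label; optionally save to disk."""
    fig, ax = plt.subplots()
    scatter = ax.scatter(arr[:, 0], arr[:, 1], c=labels, cmap="tab10", s=10)
    ax.set_title(heading)
    fig.colorbar(scatter, ax=ax, label="cluster label")
    if archive_path is not None:
        fig.savefig(archive_path)
    return fig


# Hypothetical data: 40 random 2D points split across two labels.
rng = np.random.default_rng(0)
arr = rng.normal(size=(40, 2))
labels = np.repeat([0, 1], 20)
fig = plot_labeled_sketch(arr, labels, heading="Example clusters")
```

Passing a file path as `archive_path` writes the figure to disk, in a format Matplotlib infers from the file extension.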