HMB Documentation

Version: 0.2.0. See the full changelog and the contribution guidelines (CONTRIBUTING.md) for details.

Welcome to the HMB Helpers Package documentation!

The HMB package provides a comprehensive suite of helper modules for image processing, text generation, PDF handling, performance metrics, and more. It is designed to accelerate research and development workflows in machine learning, computer vision, and natural language processing.

Note

This documentation covers all modules and utilities included in the HMB package.

Warning

This package is under active development. APIs, example scripts and behaviors may change between releases. Example scripts included in the repository are provided for demonstration purposes and may require additional datasets, environment configuration, or optional dependencies to run successfully. Use them as reference; exercise caution before running any example in production environments.

Features

  • Image comparison, normalization, and segmentation metrics.

  • Text generation and embedding utilities.

  • PDF processing helpers.

  • PyTorch segmentation losses.

  • Performance metrics and initializations.

  • Utilities for working with whole slide images (WSI).

Installation

Install the package as described in the Installation guide.

Quick pip install:

pip install hmb-helpers

Quickstart

Prerequisites

Before running examples, ensure you have:

  • Python 3.8+ installed

  • Required dependencies: pip install -r requirements.txt (see Installation for the minimum core requirements)

  • Optional: GPU support for PyTorch/TensorFlow examples (see Installation)

Tip

For CPU-only environments, install PyTorch with:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

Note

The "all" extra intentionally excludes heavy frameworks (PyTorch/TensorFlow/Keras/tensorboard) so that you can install the framework builds that match your platform and device. Use the pytorch or tensorflow extras, or the official framework installers, to get platform-specific wheels.
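Because the heavy frameworks are installed separately, it can help to check which ones are importable before running an example. The sketch below uses only the standard library; find_installed_frameworks is a hypothetical helper written for this page, not part of HMB.

```python
import importlib.util


def find_installed_frameworks(candidates=("torch", "tensorflow", "keras", "tensorboard")):
    """Hypothetical helper (not part of HMB): map each top-level module name
    to True if Python can locate it in the current environment."""
    return {name: importlib.util.find_spec(name) is not None for name in candidates}


if __name__ == "__main__":
    for name, available in find_installed_frameworks().items():
        print(f"{name}: {'installed' if available else 'missing'}")
```

The check only locates the modules and does not import them, so it stays fast even when large frameworks are present.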

Basic Examples

Image Helper: Calculate empty region percentage

from PIL import Image
import numpy as np
from HMB.ImagesHelper import GetEmptyPercentage

# Load and preprocess image
img = np.array(Image.open("path/to/image.png").convert("RGB"))

# Calculate empty region ratio (default shape: 256x256).
emptyRatio = GetEmptyPercentage(img, shape=(256, 256))
print(f"Empty region: {emptyRatio:.2%}")
# Expected output:
# Empty region: 12.34%
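This page does not define what GetEmptyPercentage treats as "empty". Purely as an illustration, the sketch below assumes that empty means near-white background pixels and uses a hypothetical threshold of 240; empty_fraction is not the HMB implementation and may not match its results.

```python
import numpy as np


def empty_fraction(image, white_threshold=240):
    """Illustrative only: fraction of pixels whose channels are all >= white_threshold."""
    image = np.asarray(image)
    # A pixel counts as empty only if every channel is near white.
    is_empty = np.all(image >= white_threshold, axis=-1)
    return float(is_empty.mean())


# Synthetic 4x4 RGB image: top two rows white, bottom two rows black.
demo = np.zeros((4, 4, 3), dtype=np.uint8)
demo[:2] = 255
print(f"Empty region: {empty_fraction(demo):.2%}")
# Empty region: 50.00%
```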

PDF Helper: Extract full text

from HMB.PDFHelper import ReadFullPDF

# Extract all text from PDF.
text = ReadFullPDF("path/to/document.pdf")
print(text[:200])  # Preview first 200 characters.
# Expected output:
# This is the beginning of the extracted PDF content...

Text Helper: Clean and normalize text

from HMB.TextHelper import CleanText

raw = "  Hello!!!  This is a SAMPLE text...  "
cleaned = CleanText(
  raw,
  lowercase=True,
  removeSpecialChars=True,
  normalizeWhitespace=True
)
print(f"Original: '{raw}'")
print(f"Cleaned:  '{cleaned}'")
# Expected output:
# Original: '  Hello!!!  This is a SAMPLE text...  '
# Cleaned:  'hello this is a sample text'
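To make the three options concrete, here is a minimal standard-library sketch that reproduces the cleaned string above. It assumes that "special characters" means anything other than ASCII letters, digits, and whitespace; clean_text_sketch is a hypothetical stand-in, and CleanText itself may behave differently.

```python
import re


def clean_text_sketch(text, lowercase=True, remove_special_chars=True,
                      normalize_whitespace=True):
    """Hypothetical stand-in for illustration; not the HMB implementation."""
    if lowercase:
        text = text.lower()
    if remove_special_chars:
        # Keep ASCII letters, digits, and whitespace only.
        text = re.sub(r"[^A-Za-z0-9\s]", "", text)
    if normalize_whitespace:
        # Collapse runs of whitespace and trim both ends.
        text = " ".join(text.split())
    return text


print(clean_text_sketch("  Hello!!!  This is a SAMPLE text...  "))
# hello this is a sample text
```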

Advanced Examples

PyTorch Model: Quick inference with device auto-detection

import torch
from HMB.PyTorchHelper import CreateTimmModel, LoadModel

# Auto-detect device.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Create and load model.
model = CreateTimmModel("resnet18", numClasses=10, pretrained=True)
model = LoadModel(model, filename="model.pth", device=device)
model.eval()

# Run inference on dummy input.
dummyInput = torch.randn(1, 3, 224, 224).to(device)
with torch.no_grad():
  output = model(dummyInput)
  predictions = torch.softmax(output, dim=1)
  print(f"Top-3 classes: {torch.topk(predictions, 3)[1].cpu().numpy()}")

Segmentation Metrics: Evaluate predictions

import numpy as np
from HMB.ImageSegmentationMetrics import ComputeIoU, ComputeDice

# Dummy binary masks (batch=1, channels=1, H=256, W=256).
preds = np.random.rand(1, 1, 256, 256) > 0.5
targets = np.random.randint(0, 2, size=(1, 1, 256, 256))

# Compute metrics.
iou = ComputeIoU(preds.astype(float), targets.astype(float))
dice = ComputeDice(preds.astype(float), targets.astype(float))

print(f"IoU: {iou:.4f}, Dice: {dice:.4f}")
# Expected output (values will vary; under the standard definitions,
# independent random masks give IoU near 0.33 and Dice near 0.50):
# IoU: 0.3337, Dice: 0.5004
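For reference, the standard definitions for binary masks are IoU = |A ∩ B| / |A ∪ B| and Dice = 2|A ∩ B| / (|A| + |B|). The NumPy sketch below implements those formulas directly; the eps smoothing term and the function name iou_and_dice are assumptions made for this example, and ComputeIoU and ComputeDice may handle smoothing or batching differently.

```python
import numpy as np


def iou_and_dice(pred, target, eps=1e-7):
    """Illustrative IoU and Dice for two binary masks of the same shape."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    # eps avoids division by zero when both masks are empty.
    iou = (intersection + eps) / (union + eps)
    dice = (2 * intersection + eps) / (total + eps)
    return float(iou), float(dice)


# One overlapping pixel, three pixels in the union, four foreground pixels in total.
pred = np.array([[1, 1, 0, 0]])
target = np.array([[1, 0, 1, 0]])
iou, dice = iou_and_dice(pred, target)
print(f"IoU: {iou:.4f}, Dice: {dice:.4f}")
# IoU: 0.3333, Dice: 0.5000
```

The two metrics are related by Dice = 2 * IoU / (1 + IoU), which is a quick sanity check when both are reported for the same masks.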

Common Pitfalls & Tips

Warning

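File paths: Prefer absolute paths, or build paths with pathlib.Path from a known base directory, to avoid working-directory issues.

from pathlib import Path
# Anchor the path to this script's directory instead of the current working directory.
imgPath = Path(__file__).resolve().parent / "data" / "images" / "sample.png"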

Note

Memory management: For large images or WSIs, process in chunks or use ExtractRandomTilesFromImages to avoid OOM errors.
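Chunked processing can be sketched as a generator that walks an array tile by tile. This example assumes the image already fits in memory as a NumPy array; iter_tiles is a hypothetical helper, not an HMB function, and slides too large to load would need a reader that fetches regions from disk on demand.

```python
import numpy as np


def iter_tiles(image, tile_size=256):
    """Yield (row, col, tile) over an array, left to right, top to bottom.
    Edge tiles are smaller when the size is not a multiple of tile_size."""
    height, width = image.shape[:2]
    for row in range(0, height, tile_size):
        for col in range(0, width, tile_size):
            # Basic slicing returns a view, so no pixel data is copied here.
            yield row, col, image[row:row + tile_size, col:col + tile_size]


image = np.zeros((600, 500, 3), dtype=np.uint8)
tiles = [(row, col, tile.shape) for row, col, tile in iter_tiles(image)]
print(len(tiles))
# 6
```

Processing one tile at a time keeps peak memory proportional to the tile size plus whatever per-tile result you keep.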

Tip

Reproducibility: Seed all random sources for consistent results:

from HMB.Initializations import SeedEverything
SeedEverything(seed=42)
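What SeedEverything does internally is not described on this page. As a rough guide, seeding "all random sources" usually covers at least the generators below; seed_everything_sketch is a hypothetical equivalent that treats PyTorch as optional.

```python
import os
import random

import numpy as np


def seed_everything_sketch(seed=42):
    """Hypothetical equivalent for illustration; not the HMB implementation."""
    random.seed(seed)
    np.random.seed(seed)
    # Only affects hash randomization in Python processes started afterwards.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import torch
        # Seeds the random number generators on all devices.
        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch is optional for this sketch.


seed_everything_sketch(42)
first = (random.random(), np.random.rand())
seed_everything_sketch(42)
second = (random.random(), np.random.rand())
print(first == second)
# True
```

Seeding alone does not guarantee bit-identical results on GPU, since some operations are nondeterministic unless the framework's deterministic modes are also enabled.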

Next Steps

  • Explore more examples in HMB/Examples/

  • Read module-specific guides: AgentsHelper, PyTorchHelper

  • Run the test suite: python tests/run_tests.py

  • View the full API reference: modules.html

Getting Help

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

FAQ

See the FAQ for common questions and troubleshooting tips.

License

This project is licensed under the MIT License. See the LICENSE file or the official MIT terms at the Open Source Initiative.