iCLUTO

About iCLUTO

iCLUTO stands for improved CLUstering TOolkit. It provides functions that help with data analysis in break junction experiments, and it clusters conductance traces (i.e. vectors).

iCLUTO uses both main machine learning approaches:

  • unsupervised machine learning
  • supervised machine learning

The supervised approach requires a labeled dataset, which is usually very hard to obtain. The unsupervised approach, on the other hand, requires only conductance traces. See Clustering Algorithms for more on both approaches.

Installing and Running iCLUTO

As of now, you can run unsupervised clustering with icluto cluster.

How to install

Dependencies (Fedora/Linux)

On Fedora Workstation, install the "Development Tools" group and the required development packages using dnf:

dnf group install development-tools
dnf install g++ libstdc++-devel python3-devel

iCLUTO runs on Python 3.11 or newer.

  1. Download the .whl package HERE.
  2. Download the install.sh script from the repository.
  3. Run the installation script:
chmod +x install.sh
./install.sh PATH/TO/icluto-*.whl

This script will create a virtual environment in ~/.icluto and a symlink in ~/.local/bin/icluto.
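
To confirm that the script left the layout described above, the two paths can be checked from Python. This is a minimal sketch and not part of iCLUTO: the function name check_install_layout is hypothetical, and the only facts it relies on are the ~/.icluto directory and the ~/.local/bin/icluto symlink mentioned in this README.

```python
from pathlib import Path


def check_install_layout(home: Path) -> dict[str, bool]:
    """Report whether the paths described in the README exist under `home`."""
    venv_dir = home / ".icluto"                    # virtual environment created by install.sh
    launcher = home / ".local" / "bin" / "icluto"  # symlink created by install.sh
    return {
        "venv_dir_exists": venv_dir.is_dir(),
        "launcher_exists": launcher.exists(),      # False for a dangling symlink
        "launcher_is_symlink": launcher.is_symlink(),
    }


if __name__ == "__main__":
    for check, ok in check_install_layout(Path.home()).items():
        print(f"{check}: {'ok' if ok else 'missing'}")
```

If the launcher exists but the icluto command is still not found, ~/.local/bin is probably not on your PATH.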

Manual Installation (Alternative)

If you prefer manual steps:

  1. Create a virtual environment venv:
python3 -m venv venv
  2. Activate it:
source venv/bin/activate
  3. Install the package using pip:
pip install PATH/TO/icluto-*.whl

To verify that iCLUTO installed successfully, run:

icluto --help
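
The same check can be scripted, for example in a setup test. The sketch below is an illustration only: the function name cli_responds is hypothetical, and it assumes that icluto --help exits with status 0, which is the usual convention but is not stated in this README.

```python
import shutil
import subprocess


def cli_responds(command: str = "icluto") -> bool:
    """Return True if `command` is on PATH and `command --help` exits with status 0."""
    executable = shutil.which(command)
    if executable is None:
        # Not on PATH: the virtual environment may not be activated.
        return False
    result = subprocess.run(
        [executable, "--help"],
        capture_output=True,
        text=True,
        check=False,  # inspect the return code instead of raising
    )
    return result.returncode == 0  # assumption: --help succeeds with exit status 0
```

Calling cli_responds() with no arguments checks the icluto command itself.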

Tested on:

  • Fedora, Python 3.13.9
  • Ubuntu, Python 3.12.3

iCLUTO installation for developers

We use Poetry for dependency and package management. Get Poetry by running:

curl -sSL https://install.python-poetry.org | python3 -

Get iCLUTO from GitLab

git clone https://gitlab.fel.cvut.cz/klimtoli/icluto-cli.git

Go to iCLUTO's directory

cd icluto-cli

and run Poetry install

poetry install

Update iCLUTO

To update iCLUTO to a newer version, use the update command:

icluto update PATH/TO/NEW/icluto-*.whl

Cluster Installation (SLURM)

For instructions on how to install and run iCLUTO on a high-performance computing cluster using SLURM, see the SLURM Guide.

How to run

Make sure your virtual environment is activated.

Run unsupervised clustering

Run with

icluto cluster -cfg config_file.yaml [--no-plot]

Use --no-plot to disable plotting of histograms.
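
When several configurations need to be run one after another, the command above can be driven from a script. The following is a minimal sketch and not part of iCLUTO: the helper names build_cluster_command and run_all and the idea of a directory of *.yaml files are hypothetical. Only the icluto cluster -cfg config_file.yaml [--no-plot] form is taken from this README.

```python
import subprocess
from pathlib import Path


def build_cluster_command(config_file: str, plot: bool = True) -> list[str]:
    """Build the argument list for `icluto cluster -cfg <config_file> [--no-plot]`."""
    command = ["icluto", "cluster", "-cfg", config_file]
    if not plot:
        command.append("--no-plot")  # disables plotting of histograms
    return command


def run_all(config_dir: str, plot: bool = False) -> None:
    """Run clustering once for every *.yaml file in `config_dir` (hypothetical layout)."""
    for config in sorted(Path(config_dir).glob("*.yaml")):
        # check=True stops the batch at the first failing run.
        subprocess.run(build_cluster_command(str(config), plot=plot), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.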

Interactive Inspection

After clustering, you can interactively inspect the results:

icluto inspect <output_folder>

See Interactive Inspection for more details.

Data Conversion

Convert between different trace formats:

icluto convert <input_files>... <output_file>

See Data Conversion for more details.

Plotting

Generate 2D and 1D histograms:

icluto plot <traces_file> [output_dir]

See Plotting for more details.

Merging Files

Combine multiple trace files:

icluto merge <input_files>... <output_file>

See Merge Command for more details.

An example config file in YAML

# Comments begin with #; everything after # is ignored
# Example on how a config file is structured.
traces:
  # If loading .txt files, one can list them using dashes
  # - /Users/oliverklimt/uni/BJM/data_raw/2204IVC.txt
  # - /Users/oliverklimt/uni/BJM/data_raw/2204IVC4.txt
  # Files .npy, .bin can be loaded directly
  - /Users/oliverklimt/uni/BJM/icluto-cli/experiment/new_iter/2204_tmp_filtered.npy
output folder: ./out

# Performs tunneling current analysis
analyze: True

# Saving only labels can save space
save:
  traces: False
  labels: True
  filtered out traces: True
  good traces: True
  format: "npy" # "bin", "txt" or "npy"

plot: True
plots:
  size x: 9 # in inches
  size y: 9 # in inches
  formats: ["png", "svg"] # List of formats to save plots in (e.g., png, svg, pdf, jpg)
  unify after limit: True # Conductance values after LIMIT (typically 1e-6) are replaced by the limit value.
  histograms:
    number of bins x: 30
    number of bins y: 30

features:
  - type: histogram
    histogram vector length: 350
    PCA dim: 32
  - type: traces
    max length: "auto"
    PCA dim: 64

k-means:
  run: True
  k-min: 2
  k-max: 5
  n-init: 10 # number of initializations of k means algorithm

BIRCH:
  run: False
  # TODO: BIRCH params

dbscan:
  run: True
  sweep epsilon start: 0.01
  sweep epsilon stop: 2
  sweep number of points: 3 # has to be an integer not float!
  min cluster size: 35

hdbscan:
  run: False
  min cluster size: 40
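
A few of the constraints noted in the comments above can be checked before starting a long run. The sketch below is an illustration, not iCLUTO's own validation: the function name validate_config is hypothetical, it works on a plain dictionary (such as the one a YAML parser would return for the file above), and the rule that k-min must not exceed k-max is an assumption. The allowed save formats and the integer requirement for "sweep number of points" come from the comments in the example.

```python
def validate_config(config: dict) -> list[str]:
    """Return a list of human-readable problems found in a parsed config dictionary."""
    problems = []

    # "format" is documented in the example as "bin", "txt" or "npy".
    save_format = (config.get("save") or {}).get("format")
    if save_format is not None and save_format not in {"bin", "txt", "npy"}:
        problems.append(f"save.format must be 'bin', 'txt' or 'npy', got {save_format!r}")

    # The example states that this value has to be an integer, not a float.
    points = (config.get("dbscan") or {}).get("sweep number of points")
    if points is not None and (isinstance(points, bool) or not isinstance(points, int)):
        problems.append(f"dbscan 'sweep number of points' must be an integer, got {points!r}")

    # Assumption (not stated in the README): k-min should not exceed k-max.
    kmeans = config.get("k-means") or {}
    k_min, k_max = kmeans.get("k-min"), kmeans.get("k-max")
    if k_min is not None and k_max is not None and k_min > k_max:
        problems.append(f"k-means k-min ({k_min}) is larger than k-max ({k_max})")

    return problems


if __name__ == "__main__":
    example = {
        "save": {"format": "npy"},
        "k-means": {"run": True, "k-min": 2, "k-max": 5, "n-init": 10},
        "dbscan": {"run": True, "sweep number of points": 3.0},  # float on purpose
    }
    for problem in validate_config(example):
        print(problem)
```

Returning a list of problems rather than raising on the first one lets all issues in a config file be reported at once.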