iCLUTO
About iCLUTO
iCLUTO stands for improved CLUstering TOolkit. It provides functions that help
with data analysis in break-junction experiments.
Its main task is clustering conductance traces (i.e. vectors).
iCLUTO uses both main machine learning approaches:
- unsupervised machine learning
- supervised machine learning
The supervised approach requires a labeled dataset, which is usually very hard to obtain. The unsupervised approach, on the other hand, requires only the conductance traces. More on both approaches in Algorithms.
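The difference in input requirements can be illustrated with a minimal sketch. It is not iCLUTO's implementation: the function names, the toy traces and the labels "plateau" and "tunneling" are all hypothetical, and a nearest-centroid classifier and plain k-means stand in for whatever algorithms iCLUTO actually uses.

```python
from math import dist


def mean(vectors):
    """Component-wise mean of equally long vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fit_supervised(traces, labels):
    """Nearest-centroid classifier: needs one label per trace."""
    groups = {}
    for trace, label in zip(traces, labels):
        groups.setdefault(label, []).append(trace)
    return {label: mean(members) for label, members in groups.items()}


def predict(centroids, trace):
    """Return the label whose centroid is closest to the trace."""
    return min(centroids, key=lambda label: dist(centroids[label], trace))


def cluster_unsupervised(traces, k, iterations=10):
    """Plain k-means: needs only the traces; cluster ids are arbitrary."""
    centroids = [list(t) for t in traces[:k]]  # naive init: first k traces
    assignment = []
    for _ in range(iterations):
        # Assign every trace to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: dist(centroids[c], t)) for t in traces
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [t for t, a in zip(traces, assignment) if a == c]
            if members:
                centroids[c] = mean(members)
    return assignment


# Toy "traces" (hypothetical values, three samples each).
traces = [[1.0, 0.9, 0.1], [1.0, 0.8, 0.0], [0.2, 0.1, 0.0], [0.3, 0.1, 0.1]]
labels = ["plateau", "plateau", "tunneling", "tunneling"]  # hypothetical labels

centroids = fit_supervised(traces, labels)      # labels are mandatory here
assignment = cluster_unsupervised(traces, k=2)  # no labels involved
```

The supervised path returns meaningful label names but cannot start without `labels`; the unsupervised path groups the same traces from the vectors alone, and the resulting cluster ids still have to be interpreted by a person.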
Installing and Running iCLUTO
As of now, you can run unsupervised clustering with the icluto cluster command.
How to install
Fedora
On Fedora Workstation, install the "Development Tools" group using dnf:
sudo dnf group install development-tools
This group does not include g++ and python3-devel, so another step is required:
sudo dnf install g++ libstdc++-devel python3-devel
For users
iCLUTO runs on Python 3.9 or newer.
Download the .whl package HERE.
Create a virtual environment venv.
python3 -m venv venv
Activate
source venv/bin/activate
Install the package using pip
pip3 install PATH/TO/icluto-*.whl
Verify that iCLUTO installed successfully with
icluto --help
Tested on:
- Fedora, Python 3.13.9
- Ubuntu, Python 3.12.3
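If the installation fails, it can help to confirm that the interpreter inside the virtual environment meets the Python 3.9 requirement. The following is a small sketch; the helper name `is_supported` is made up for illustration and is not part of iCLUTO.

```python
import sys

MIN_VERSION = (3, 9)  # minimum version stated in this README


def is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if not is_supported():
    print("Python", sys.version.split()[0], "is too old for iCLUTO")
```

Run it with the same `python3` that created the virtual environment.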
iCLUTO installation for developers
We use Poetry for dependency and package management. Get Poetry by running
curl -sSL https://install.python-poetry.org | python3 -
Get iCLUTO from GitLab
git clone https://gitlab.fel.cvut.cz/klimtoli/icluto-cli.git
Go to iCLUTO's directory
cd icluto-cli
and run Poetry install
poetry install
How to run
Make sure your virtual environment is activated.
Run unsupervised clustering
Run with
icluto cluster -cfg config_file.yaml [--no-plot]
Use --no-plot to disable plotting of histograms.
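When clustering is launched from a script (for example, to process several config files in a row), the same command line can be assembled in Python. This is a sketch that only wraps the CLI call shown above; the helper names `build_cluster_command` and `run_clustering` are hypothetical and not part of iCLUTO.

```python
import subprocess


def build_cluster_command(config_path, plot=True):
    """Build the argument list for `icluto cluster`."""
    command = ["icluto", "cluster", "-cfg", str(config_path)]
    if not plot:
        command.append("--no-plot")  # disables plotting of histograms
    return command


def run_clustering(config_path, plot=True):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_cluster_command(config_path, plot), check=True)


# Usage (requires icluto on PATH, i.e. an activated virtual environment):
# run_clustering("config_file.yaml", plot=False)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.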
An example config file in YAML:
# Comments begin with # and everything after # is ignored
# Example on how a config file is structured.
traces:
  # If loading .txt files, one can list them using dashes
  # - /Users/oliverklimt/uni/BJM/data_raw/2204IVC.txt
  # - /Users/oliverklimt/uni/BJM/data_raw/2204IVC4.txt
  # Files .npy, .bin can be loaded directly
  - /Users/oliverklimt/uni/BJM/icluto-cli/experiment/new_iter/2204_tmp_filtered.npy
output folder: ./out
# Performs tunneling current analysis
analyze: True
# Saving only labels can save space
save:
  traces: False
  labels: True
  filtered out traces: True
  good traces: True
  format: "npy" # "bin", "txt" or "npy"
plot: True
plots:
  size x: 9 # in inches
  size y: 9 # in inches
  unify after limit: True # Conductance values after LIMIT (typically 1e-6) are replaced by the limit value.
  histograms:
    number of bins x: 30
    number of bins y: 30
features:
  - type: histogram
    histogram vector length: 350
    PCA dim: 32
  - type: traces
    max length: "auto"
    PCA dim: 64
k-means:
  run: True
  k-min: 2
  k-max: 5
  n-init: 10 # number of initializations of k means algorithm
BIRCH:
  run: False
  # TODO: BIRCH params
dbscan:
  run: True
  sweep epsilon start: 0.01
  sweep epsilon stop: 2
  sweep number of points: 3 # has to be an integer not float!
  min cluster size: 35
hdbscan:
  run: False
  min cluster size: 40
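To make the k-means and dbscan sweep parameters more concrete, the sketch below expands them into the values that would be tried. It rests on two assumptions that this README does not confirm: that k-max is inclusive, and that the epsilon sweep is linearly spaced with both endpoints included. The function names are hypothetical; iCLUTO may compute these differently.

```python
def k_values(k_min, k_max):
    """Candidate cluster counts, ASSUMING k-max is inclusive."""
    if k_min < 1 or k_min > k_max:
        raise ValueError("expected 1 <= k-min <= k-max")
    return list(range(k_min, k_max + 1))


def epsilon_sweep(start, stop, num_points):
    """Epsilon values, ASSUMING linear spacing including both endpoints."""
    # The config comment requires an integer here, so floats are rejected.
    if isinstance(num_points, bool) or not isinstance(num_points, int):
        raise TypeError("sweep number of points must be an integer")
    if num_points < 1:
        raise ValueError("sweep number of points must be at least 1")
    if num_points == 1:
        return [float(start)]
    step = (stop - start) / (num_points - 1)
    return [start + i * step for i in range(num_points)]


# Values taken from the example config above.
ks = k_values(2, 5)               # k-min: 2, k-max: 5
eps = epsilon_sweep(0.01, 2, 3)   # start: 0.01, stop: 2, points: 3
```

Under these assumptions the example config would try four values of k and three values of epsilon, so widening either range increases the run time accordingly.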