Development

iCluto Core

This part of iCluto handles the main clustering logic. It receives datasets in a form that the individual clustering algorithms can process.

flowchart TD
    bindata[Binary data] --> dataParser
    txtdata[Text files] --> dataParser
    dataParser{Data parser} -->|Matrix form| PCA
    PCA -->|Matrix form| clustAlgs
    clustAlgs("Clustering algorithms (DBSCAN, AF, K-Means)") -->|Array| Labels
    Labels --> hists2d(2D Histograms)
    Labels --> hists1d(1D Histograms)
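The flow above can be sketched in a few lines of NumPy. This is a minimal illustration, not iCluto's code: it assumes the traces are already in matrix form (one row per trace), uses an SVD-based PCA, and implements only K-Means (plain Lloyd iterations with a deterministic farthest-point start) as a stand-in for the clustering step; DBSCAN and AF are not shown. Building, per label, a 1D histogram of values and a 2D position-versus-value histogram is an assumed reading of the last two boxes, and all function names are hypothetical.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project the rows of X onto their first `n_components` principal axes."""
    Xc = X - X.mean(axis=0)  # center every column
    # Rows of Vt are the principal axes, sorted by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(X: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Lloyd's K-Means with farthest-point initialisation; one label per row."""
    centers = [X[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign every point to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points (keep it if the cluster is empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def histograms_per_label(traces: np.ndarray, labels: np.ndarray, bins: int = 50) -> dict:
    """For each cluster: a 1D histogram of values and a 2D (position, value) histogram."""
    n_traces, n_points = traces.shape
    positions = np.tile(np.arange(n_points), (n_traces, 1))  # position index of every sample
    result = {}
    for label in np.unique(labels):
        sel = labels == label
        h1d, _ = np.histogram(traces[sel].ravel(), bins=bins)
        h2d, _, _ = np.histogram2d(positions[sel].ravel(), traces[sel].ravel(), bins=bins)
        result[int(label)] = (h1d, h2d)
    return result


# Usage on synthetic data: two groups of 20 traces, 100 points each
rng = np.random.default_rng(1)
traces = np.vstack([rng.normal(0.0, 0.1, (20, 100)), rng.normal(5.0, 0.1, (20, 100))])
labels = kmeans(pca(traces, n_components=2), k=2)
hists = histograms_per_label(traces, labels, bins=20)
```

Swapping `kmeans` for another algorithm only changes the step that produces `labels`; the PCA before it and the histograms after it stay the same.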

Data loading

iCluto supports several data-loading methods; the formats covered below are text files, bin files and NumPy .npy files.

Text files

This is the main output format of the break junction apparatus, which generates one or more files such as:

183, 0, 2.222747E-8
183, 1, 2.217464E-8
183, 2, 2.400587E-8
183, 3, 2.287278E-8

This is a CSV-like format: the first column holds the trace ID, the second the position within the trace (positions are always in ascending order), and the third the measured value. Each row therefore has the form:

trace_ID, position, value
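A minimal sketch of a parser for this format, using only the Python standard library. The function names `parse_trace_text` and `to_matrix` are hypothetical, and the "matrix" here is a plain list of rows rather than whatever structure iCluto's data parser actually produces.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict


def parse_trace_text(text: str) -> dict[int, list[float]]:
    """Group `trace_ID, position, value` rows into one list of values per trace."""
    points = defaultdict(list)
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:  # skip blank lines
            continue
        trace_id, position, value = int(record[0]), int(record[1]), float(record[2])
        points[trace_id].append((position, value))
    # Positions should already be ascending; sorting guards against shuffled rows
    return {tid: [v for _, v in sorted(pts)] for tid, pts in points.items()}


def to_matrix(traces: dict[int, list[float]]) -> tuple[list[int], list[list[float]]]:
    """Stack equal-length traces into rows, ordered by trace ID."""
    ids = sorted(traces)
    lengths = {len(traces[t]) for t in ids}
    if len(lengths) > 1:
        raise ValueError(f"traces have different lengths: {sorted(lengths)}")
    return ids, [traces[t] for t in ids]


sample = """183, 0, 2.222747E-8
183, 1, 2.217464E-8
183, 2, 2.400587E-8
183, 3, 2.287278E-8
"""
ids, matrix = to_matrix(parse_trace_text(sample))
```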

Traces saved in the raw text format are typically 12,500 points long, but only about 2,500 points are needed. A method for locating the snapback is therefore applied, and only traces that are 2,500 points long are stored.
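The snapback-locating method itself is not described here, so the following is a hypothetical sketch only: it treats the snapback as the first sample that falls below a user-chosen `threshold`, cuts a window of `length` points (default 2500) starting `pre` points before it, and discards the trace if no full window is available. The criterion, the function name and the parameter names are all assumptions.

```python
from __future__ import annotations


def crop_after_snapback(trace: list[float], threshold: float,
                        length: int = 2500, pre: int = 0) -> list[float] | None:
    """Cut a fixed-length window at the snapback, or return None to discard.

    HYPOTHETICAL criterion: the snapback is taken to be the first sample whose
    value falls below `threshold`; iCluto's actual method may differ.
    """
    idx = next((i for i, v in enumerate(trace) if v < threshold), None)
    if idx is None:
        return None  # no snapback found
    start = max(idx - pre, 0)  # optionally keep `pre` points before the drop
    window = trace[start:start + length]
    return window if len(window) == length else None  # keep full windows only


# A synthetic 12,500-point trace that drops from 1.0 to 0.01 at index 100
trace = [1.0] * 100 + [0.01] * 12400
window = crop_after_snapback(trace, threshold=0.5, pre=10)
```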

Bin files

Preprocessed traces are stored in a MATLAB-style matrix format, and iCluto can both load and save traces in this format.
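The exact layout of these files is not specified here. As a hedged sketch, assume a headerless file of little-endian float64 values written the way MATLAB's `fwrite` writes an `n_points × n_traces` matrix (column-major, so every trace is contiguous). Under that assumption NumPy can read and write the same layout; `save_bin` and `load_bin` are hypothetical names, and the trace length must be passed in because nothing in the file records it.

```python
import numpy as np


def save_bin(path: str, traces: np.ndarray) -> None:
    """Write an (n_traces, n_points) matrix as raw little-endian float64.

    Rows are written one after another, which matches MATLAB's column-major
    fwrite of the transposed (n_points, n_traces) matrix.
    """
    np.ascontiguousarray(traces, dtype="<f8").tofile(path)


def load_bin(path: str, n_points: int) -> np.ndarray:
    """Read a headerless float64 file back into an (n_traces, n_points) matrix."""
    flat = np.fromfile(path, dtype="<f8")
    if flat.size % n_points:
        raise ValueError("file size is not a whole number of traces")
    return flat.reshape(-1, n_points)  # one trace per row
```

For 2,500-point traces the call would be `load_bin(path, n_points=2500)`.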

NumPy .npy files

TODO: NumPy's default format for storing arrays.
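Since this section is still a TODO, here is a minimal sketch of what .npy storage could look like, using NumPy's own `np.save` and `np.load`. The file name `traces.npy` and the (10, 2500) example shape are assumptions. Unlike a headerless binary file, an .npy file records the array's dtype and shape in its header, so the trace length does not have to be known in advance.

```python
import os
import tempfile

import numpy as np

# Hypothetical example data: 10 traces of 2,500 points each
traces = np.random.default_rng(0).normal(size=(10, 2500))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "traces.npy")
    np.save(path, traces)   # the .npy header records dtype and shape
    loaded = np.load(path)  # restored as a (10, 2500) float64 array
```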