sasnets package

Submodules

sasnets.analysis module

Module for analysing SASNets networks using various techniques, including dendrograms and confusion matrices.

sasnets.analysis.cpredict(model, x, l=71, pl=5000)[source]

Runs a Keras model to create a confusion matrix.

Parameters:
  • model – Model to use.
  • x – A list of x values to predict on.
  • l – The number of input models.
  • pl – The number of data iterations per model.
Returns:

A confusion matrix of percentages
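
A confusion matrix of percentages can be illustrated with a minimal sketch. This is not the sasnets implementation: confusion_percentages and its arguments are hypothetical names, and the predictions are assumed to be already reduced to class indices.

```python
def confusion_percentages(true_idx, pred_idx, n_classes):
    """Return an n_classes x n_classes matrix of row percentages.

    Row i holds the distribution of predictions for the samples whose
    true class is i, so every non-empty row sums to 100.
    """
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_idx, pred_idx):
        counts[t][p] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no samples stay at zero instead of dividing by zero.
        matrix.append([100.0 * c / total if total else 0.0 for c in row])
    return matrix


# Illustrative labels only; they do not come from a real SASNets run.
true_labels = [0, 0, 0, 0, 1, 1, 2, 2]
predictions = [0, 0, 1, 0, 1, 1, 2, 0]
for row in confusion_percentages(true_labels, predictions, 3):
    print(["%5.1f" % value for value in row])
```

Normalising each row by the number of samples in that class keeps classes comparable even when they contain different numbers of samples.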

sasnets.analysis.dcluster(model, x, names)[source]

Displays a dendrogram clustering based on the confusion matrix.

Parameters:
  • model – The model to predict on.
  • x – A list of x values to predict on.
  • names – List of all model names.
Returns:

The dendrogram object.

sasnets.analysis.fit(mn, q, iq)[source]

Fit resulting data using bumps server. Currently unimplemented.

Parameters:
  • mn – Model name.
  • q – List of q values.
  • iq – List of I(q) values.
Returns:

Bumps fit.

sasnets.analysis.load_from(path)[source]

Loads a model from the specified path.

Parameters:
  • path – Relative or absolute path to the .h5 model file.
Returns:

The loaded model.

sasnets.analysis.main(args)[source]

Main method. Called from command line; uses argparse.

Parameters:
  • args – Arguments from command line.
Returns:

None.

sasnets.analysis.predict(model, x, names, num=5)[source]

Runs a Keras model to predict based on input.

Parameters:
  • model – The model to use.
  • x – The x inputs to predict from.
  • names – A list of all model names.
  • num – The top num probabilities and models will be printed.
Returns:

None
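
The role of num can be sketched as follows, assuming the model output for one input is a list of per-class probabilities aligned with names. top_predictions is a hypothetical helper, not a sasnets function, and the model names and probabilities are illustrative only.

```python
def top_predictions(probabilities, names, num=5):
    """Return the num most probable (name, probability) pairs, best first."""
    ranked = sorted(
        zip(names, probabilities),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:num]


# Hypothetical classifier output for a single input curve.
names = ["sphere", "cylinder", "ellipsoid", "lamellar"]
probs = [0.10, 0.65, 0.20, 0.05]

for name, probability in top_predictions(probs, names, num=3):
    print("%-10s %.2f" % (name, probability))
```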

sasnets.analysis.predict_and_val(model, x, y, names)[source]

Runs the model on the input datasets and compares the results with the correct labels provided from y.

Parameters:
  • model – The model to evaluate.
  • x – List of x values to predict on.
  • y – List of correct labels for x, used to check the predictions.
  • names – A list of all possible model names.
Returns:

Two lists, il and nl, which are the indices of the model and its proper name respectively.

sasnets.analysis.rpredict(model, x, names)[source]

Same as predict, but outputs names only.

Parameters:
  • model – The model to use.
  • x – List of x to predict on.
  • names – List of all model names.
Returns:

List of predicted names.

sasnets.analysis.tcluster(model, x, names)[source]

Displays a t-SNE cluster coloured by the model's predicted labels.

Parameters:
  • model – Model to use.
  • x – List of x values to predict on.
  • names – List of all model names.
Returns:

The t-SNE object that was plotted.

sasnets.hyp module

sasnets.hyp.data()[source]
sasnets.hyp.model(xtrain, ytrain, xtest, ytest)[source]

sasnets.sas_io module

Collection of utility IO functions used in SASNet. Contains the read from disk functions as well as the SQL generator.

sasnets.sas_io.read_h(l)[source]

Read helper for parallel read.

Parameters:
  • l – A list of filenames to read from.
Returns:

Three lists, Q, IQ, and Y, corresponding to Q data, I(Q) data, and model labels respectively.

sasnets.sas_io.read_parallel_1d(path, pattern='_eval_')[source]

Reads all files in the folder path whose names match the regex pattern and returns lists of Q, I(Q), and ID. path can be relative or absolute. Uses Pool and map to speed up IO.

This function is a work in progress and currently uses an excessive amount of memory. On systems with less than 16 GiB of memory, use the sequential reader instead.

Timings: reading 69 files of 150k lines each in parallel, running a garbage collection, and then reading 69 files of 5k lines each in parallel takes around 70 seconds. Reading both sets sequentially without a garbage collection takes around 562 seconds. Parallel reading peaks at more than 15 GB of memory with two file-reading threads; sequential reading peaks at around 7 to 10 GB. Use at your own risk, and be prepared to kill the threads or press the reset button.

Assumes files contain 1D data.
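
The pattern of filtering filenames with a regex and mapping a reader over them with a pool can be sketched as below. This is an illustration under stated assumptions, not the sasnets code: read_lines and read_matching are hypothetical helpers, the per-file format (plain text lines) is invented for the example, and a thread-based pool is swapped in for a process pool so that the sketch has no pickling requirements.

```python
import os
import re
from multiprocessing.pool import ThreadPool


def read_lines(filename):
    """Hypothetical per-file reader: return the file's non-empty lines."""
    with open(filename) as handle:
        return [line.strip() for line in handle if line.strip()]


def read_matching(path, pattern="_eval_", workers=2):
    """Read every file in path whose name matches the regex pattern.

    Returns a list of (filename, lines) pairs in sorted filename order.
    """
    regex = re.compile(pattern)
    files = sorted(
        os.path.join(path, name)
        for name in os.listdir(path)
        if regex.search(name)
    )
    # ThreadPool offers the same map() interface as multiprocessing.Pool.
    with ThreadPool(workers) as pool:
        contents = pool.map(read_lines, files)
    return list(zip(files, contents))
```

Because map() returns its results in input order, each filename stays paired with its own contents. Every file's contents are held in memory at once, which is consistent with the memory warning above.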

Parameters:
  • path – Path to the directory of files to read from.
  • pattern – A regex. Only files matching this regex are opened.

sasnets.sas_io.read_seq_1d(path, pattern='_eval_', typef='aggr', verbosity=False)[source]

Reads all files in the folder path whose names match the regex pattern and returns lists of Q, I(Q), and ID. path can be relative or absolute. Uses a single thread only. read_parallel_1d() is faster when enough memory is available, but it cannot be used in hyperopt, where map() is broken.

typef is one of ‘json’ or ‘aggr’. In json mode, only the JSON files in the folder specified by path are read. In aggr mode, aggregated data files are read. See sasmodels/generate_sets.py for more about these formats.

Assumes files contain 1D data.

Parameters:
  • path – Path to the directory of files to read from.
  • pattern – A regex. Only files matching this regex are opened.
  • typef – Type of file to read (aggregate data or json data).
  • verbosity – Controls the verbosity of output.

sasnets.sas_io.sql_dat_gen(dname, mname, dbname='sas_data', host='127.0.0.1', user='sasnets', encoder=None)[source]

A generator that reads its data from a PostgreSQL database. Yields a (iq, diq) list and a label list.

Parameters:
  • dname – The data table name to connect to.
  • mname – The metadata table name to connect to.
  • dbname – The database name.
  • host – The database host.
  • user – The username to connect as.
  • encoder – LabelEncoder for transforming labels to categorical ints.
Returns:

None
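
The idea of a generator that yields data and labels from a database, rather than loading everything into memory, can be sketched with the standard library. This is not the sasnets implementation: SQLite is swapped in for PostgreSQL so the example is self-contained, and batch_generator, the table name train_data, and the columns iq, diq, and label are all hypothetical.

```python
import sqlite3


def batch_generator(conn, table, batch_size=2):
    """Yield ([(iq, diq), ...], [label, ...]) batches from a table.

    The column names (iq, diq, label) are assumptions made for this sketch.
    """
    # Table names cannot be bound as SQL parameters, so accept identifiers only.
    if not table.isidentifier():
        raise ValueError("invalid table name: %r" % table)
    cursor = conn.execute(
        "SELECT iq, diq, label FROM %s ORDER BY rowid" % table
    )
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        features = [(iq, diq) for iq, diq, _ in rows]
        labels = [label for _, _, label in rows]
        yield features, labels


# Illustrative in-memory data; not real scattering data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE train_data (iq REAL, diq REAL, label TEXT)")
conn.executemany(
    "INSERT INTO train_data VALUES (?, ?, ?)",
    [(1.0, 0.1, "sphere"), (2.0, 0.2, "cylinder"), (3.0, 0.3, "sphere")],
)

for features, labels in batch_generator(conn, "train_data"):
    print(features, labels)
```

Only batch_size rows are fetched at a time, so memory use does not grow with the size of the table. This sketch stops once the rows are exhausted.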

sasnets.sasnet module

SASNets main file. Contains the main neural network code used for training networks.

SASNets uses Keras and TensorFlow for the networks. You can change the backend to Theano or CNTK through the Keras config file.
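
In multi-backend Keras releases, the config file is typically ~/.keras/keras.json, and the backend is selected by its "backend" entry. The sketch below assumes that layout and updates the entry while keeping the other settings. set_keras_backend and config_path are hypothetical names, not part of Keras or sasnets.

```python
import json
import os


def set_keras_backend(backend, config_path=None):
    """Set the "backend" entry of a Keras JSON config and return the config."""
    if config_path is None:
        # Assumed default location for multi-backend Keras.
        config_path = os.path.join(
            os.path.expanduser("~"), ".keras", "keras.json"
        )
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as handle:
            config = json.load(handle)
    config["backend"] = backend

    parent = os.path.dirname(config_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(config_path, "w") as handle:
        json.dump(config, handle, indent=4)
    return config


# Example call (commented out because it edits the user's config file):
# set_keras_backend("theano")
```

The change takes effect the next time Keras is imported. If the KERAS_BACKEND environment variable is set, it takes precedence over the file.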

sasnets.sasnet.main(args)[source]

Main method. Takes in arguments from command line and runs a model.

Parameters:
  • args – Command line args.
Returns:

None.

sasnets.sasnet.oned_convnet(x, y, xevl=None, yevl=None, random_s=235, verbosity=False, save_path=None)[source]

Runs a 1D convolutional classification neural net on the input data x and y.

Parameters:
  • x – List of training data x.
  • y – List of corresponding categories for each vector in x.
  • xevl – List of evaluation data.
  • yevl – List of corresponding categories for each vector in xevl.
  • random_s – Random seed. Defaults to 235 for reproducibility purposes, but should be set randomly in an actual run.
  • verbosity – Either True or False. Controls the level of output.
  • save_path – The path to save the model to. If it points to a directory, the model is written to a file named after the current Unix time. If it points to a file, the file is overwritten.
Returns:

None.
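
The save_path rule described above can be sketched as a small helper. resolve_save_path and its now argument are hypothetical and exist only for illustration. Whether sasnets adds a file extension to the timestamped name is not stated here, so none is added.

```python
import os
import time


def resolve_save_path(save_path, now=None):
    """Return the file path a model would be written to.

    A directory yields <directory>/<unix time>; any other path is
    returned unchanged, so an existing file there would be overwritten.
    """
    if os.path.isdir(save_path):
        # Whole seconds since the epoch; now can be fixed for testing.
        stamp = int(time.time() if now is None else now)
        return os.path.join(save_path, str(stamp))
    return save_path


print(resolve_save_path(os.getcwd()))
print(resolve_save_path("model.h5"))
```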

sasnets.sasnet.sql_net(dn, mn, verbosity=False, save_path=None, encoder=None, xval=None, yval=None)[source]

A 1D convnet that uses a generator reading from a Postgres database instead of loading all files into memory at once.

Parameters:
  • dn – The data table name.
  • mn – The metadata table name.
  • verbosity – The verbosity level.
  • save_path – The path to save model output and weights to.
  • encoder – A LabelEncoder for encoding labels.
  • xval – A list of validation data, x version.
  • yval – A list of validation data, label version.
Returns:

None

sasnets.sasnet.trad_nn(x, y, xevl=None, yevl=None, random_s=235)[source]

Runs a traditional MLP categorisation neural net on the input data x and y.

Parameters:
  • x – List of training data x.
  • y – List of corresponding categories for each vector in x.
  • xevl – List of evaluation data.
  • yevl – List of corresponding categories for each vector in xevl.
  • random_s – Random seed. Defaults to 235 for reproducibility purposes, but should be set randomly in an actual run.
Returns:

None

Module contents