Command Reference


Below, we list the commands that are available in SOS.  These commands may be entered on the command line instead of accessed via the GUI.  For first-time users of SOS, it is probably beneficial to use the GUI to create your first optimization and record the output that the GUI produces on the console, which lists each of these commands.  The commands can then be stored and edited in a text editor.  

Required arguments generally must be listed in the order in which they appear below, unless they are explicitly listed as key/value pairs.  Note that many user arguments -- particularly optional ones -- are entered as key-value pairs.  In this case, the 'key' (name of option being set) is enclosed in single-quotes.  The 'value' is also enclosed in single-quotes if it is a string, but it is written directly if it is numeric or logical.

Additionally, most variables have a 'name' parameter which, by convention, should be the same as the actual name of the variable being created (e.g., myPop = population('name', 'myPop')).  This allows the GUI and script files to both display the same variable name.
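
As a brief illustration of these conventions, the sketch below uses only the population() and sample() commands documented later in this reference.  The file names and variable names are placeholders chosen for the example, not files shipped with SOS.

```matlab
% Required argument first, followed by optional key/value pairs.
% String values are quoted; logical and numeric values are written directly.
myPop = population('myPop.txt', 'isHeader', true, 'name', 'myPop');

% By convention, the 'name' value matches the variable being created.
mySample = sample(20, 'outFile', 'mySample.txt', 'name', 'mySample');
```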

Finally, this list of commands is intended for end users.  Additional 'public' commands which should generally not be used by end users but which may be of use to programmers or individuals extending SOS are also available and are documented in detail within the source code.  The source code also provides additional detailed documentation related to the commands below.


LAUNCH GUI

ENVIRONMENT INITIALIZATION

(CREATE) POPULATION

SAMPLE

SOS OPTIMIZATION OBJECT

EXPANNEAL

DATAFRAME


Launch GUI:


    sos_gui()
         
        Launches the SOS GUI.
        <no args>


Environment Initialization:


    setSeed()

        REQUIRED:
            seed - number used to seed the random number generator.  A positive
                number is used directly as the seed.  A negative number instead
                sets the random number generator to a novel random state that is
                not linked to that number, and can be used to forcibly generate
                a different random sequence each time the algorithm is run.

        EXAMPLES:
            setSeed(123);
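
The two behaviours of setSeed() described above can be contrasted in a short sketch.  The specific values are arbitrary and are chosen only for illustration.

```matlab
% Positive value: that particular value is used as the seed.
setSeed(123);

% Negative value: the generator is set to a novel random state that is not
% linked to the number supplied, so each run produces a different sequence.
setSeed(-1);
```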


(Create) Population:


    population()

        REQUIRED:
            fileName - source file for the population.  The file must follow the
                SOS dataFrame format specifications.
            
        OPTIONAL:
            isHeader/logical - param/logical-value pair indicating if the source file
                has a header.  Defaults to false.
            isFormatting/logical - param/logical-value pair indicating if the source
                file has formatting.  Defaults to false.
            outFile/string - param/string-value pair indicating the name (including
                path, if other than current directory is desired) of file to save
                the residual population in after optimization has been completed.
                outFile is not validated until write.  Defaults to 'null'.
            name/string - param/string-value pair indicating string name to
                associate with the population variable.
   
        EXAMPLES:
            p1 = population('p1.txt', 'isHeader', true, 'isFormatting', true, ...
                'name', 'p1');
            p2 = population('p2.txt');
           

    writeData()
        Writes the data from the object to its specified outFile.
        <no args>

        EXAMPLES:    
            p1.writeData();

Sample:


    sample()


        REQUIRED:
            n - target number of observations for the sample.
        
        OPTIONAL:
            fileName/string - param/string-value pair naming a file of items to be
                included in the sample.  The file must follow the SOS input data
                specifications.
            isHeader/logical - param/logical-value pair indicating if the source file
                has a header.  Defaults to false.
            isFormatting/logical - param/logical-value pair indicating if the source
                file has formatting.  Defaults to false.
            outFile/string - param/string-value pair indicating the name (including
                path, if other than current directory is desired) of file to save
                the sample in after optimization has been completed.  outFile is
                not validated until write.  Defaults to 'null'.
            name/string - param/string-value pair indicating string name to associate
                with the variable.

        EXAMPLES:
            s1 = sample(5, 'isHeader', true, 'outFile', 's1.txt', 'name', 's1');
            s2 = sample(100);
            s3 = sample(50, 'fileName', 'itemsToInclude.txt', 'outFile', 's3.txt');

    setPop()
       
        REQUIRED:
            population object - the population object with which to link the sample.

        EXAMPLES:
            s1.setPop(p1);


    lockAll()
        Locks all of the observations in the sample so that they cannot be changed
            during the optimization.
        <no args>

        EXAMPLES:
            s1.lockAll();

    unlockAll()
        Unlocks all of the observations in the sample so that they can be changed
            during the optimization.
        <no args>

        EXAMPLES:
            s1.unlockAll();
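
One way the sample commands above might be combined is sketched below: items are read into a sample from a file, the sample is linked to a population, and the items are then locked so that they stay fixed during the optimization.  The file names are placeholders, and whether locking pre-selected items suits a given design is left to the user.

```matlab
% Placeholder file names; substitute your own data files.
p1 = population('p1.txt', 'isHeader', true, 'name', 'p1');

% Read pre-selected items into a sample of 50 and link it to the population.
s3 = sample(50, 'fileName', 'itemsToInclude.txt', 'isHeader', true, ...
    'name', 's3');
s3.setPop(p1);

% Keep these observations fixed during the optimization.
% s3.unlockAll() reverses this.
s3.lockAll();
```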

    writeData()
        Writes the data from the object to its specified outFile.
        <no args>

        EXAMPLES:
            s1.writeData();

SOS Optimization Object:


    sos()
        Creates the SOS optimization object.

        OPTIONAL:
            'maxIt'/integer - maximum number of iterations to run the optimizer
            'pSwapFunction'/string - name of pSwapFunction to use.
            'targSampleCandSelectMethod'/string - name of target sample
                candidate selection method
            'feederdfCandSelectMethod'/string - name of feeder dataframe
                candidate selection method (see manuscript for options)
            'reportInterval'/int - number of iterations between general cost reports
            'stopFreezeIt'/int - number of consecutive iterations over which the cost
                value must remain the same for the state to be considered 'frozen'
            'statInterval'/int - number of iterations between stat reports
            'statTestReportStyle'/string - style of stat reports ('short' or 'full')
            'blockSize'/int - number of iterations in a block; used to determine length
                of deltaCostLog

        EXAMPLES:
            mySOS = sos('maxIt', 100000, 'reportInterval', 500);
            newSOS = sos('maxIt', 1000000, 'feederdfCandSelectMethod', ...
                'randomPopulationAndSample', 'reportInterval', 1000, 'stopFreezeIt', ...
                5000, 'statInterval', 5000, 'statTestReportStyle', 'full');
            mySOS2 = sos('blockSize', 10000);

    addConstraint()
       
        REQUIRED:
            <variable>
       
        OPTIONAL:
            <variable>

        The input to the constraint creation method varies widely depending on the
        type of constraint that is being created.  Users are advised to create an
        instance of the desired type of constraint via the GUI and use that as a
        template for using this command.  Alternatively, users may identify the
        specific constraints that are available by following the source code
        execution starting in the 'createConstraint' method that is part of the
        genericConstraint class (see source code).  

        EXAMPLES:
            Examples of each type of constraint, with explanations of the
            arguments required for each, are provided in the SOS documentation.

    addkstest()

        Adds a Kolmogorov-Smirnov test to assess whether a sample distribution is
        uniform. This statistical test is to be used in conjunction with entropy
        constraints.

        REQUIRED:
            name - name identifying the statistical test
            type - currently, only 'matchUniform' for K-S tests
            sample1 - name of the sample over which the test will be computed
            s1ColName - name of the column in the sample that contains the data
                for the statistical test

        OPTIONAL:
            pdSpread - determines whether entropy measure is calculated over the
                range of values present in the sample or population ('sample' or
                'allItems').
            nbin - the number of bins into which the data have been placed.
                Defaults to the number of items in the sample.
            desiredpvalCondition - the comparison operator relevant for the
                statistical test.  Can be '<=' (less than or equal to), '=>' (greater
                than or equal to), or 'N/A' (when no criterion is desired to end the
                optimization but the user still wants the statistical test to be
                performed).  Defaults to 'N/A'.
            desiredpval - the relevant p-value criterion.  Defaults to 0.05.
            tail - tail of test.  Can be 'left' (sample1 < sample2), 'right' (sample2 <
                sample1), or 'both' (two-tailed).  Defaults to 'both'.

        EXAMPLES:
            mySOS.addkstest('name', 'ksTest1', 'type', 'matchUniform', 'sample1', ...
                mySample1, 's1ColName', 'frequency');
            newSOS.addkstest('name', 'newTest', 'type', 'matchUniform', 'sample1', ...
                stimSample1, 's1ColName', 'numLetters', 'pdSpread', 'allItems', ...
                'nbin', 10, 'desiredpvalCondition', '=>', 'desiredpval', 0.5, ...
                'tail', 'both');
       
    addttest()
        Adds a t-test to assess whether two samples are significantly different
        from each other on some dimension.  Can be a one-sample, independent
        samples, or paired samples t-test.  The result of the t-test when it is run
        can be "FAIL", "PASS", or "PTHRESH", the last denoting that the
        statistical test failed but the difference between the conditions fell
        within a pre-specified threshold (this is particularly useful in paired
        samples tests, wherein both the mean and variance are minimized by a
        pairwise minimization constraint).

        REQUIRED:
            name - name identifying the statistical test
            type - type of t-test to perform.  Can be 'single' (one-sample t-test),
                'independent' (independent samples t-test), or 'paired' (paired
                samples t-test).
            sample1 - name of the first sample relevant for the t-test.
            sample2 - name of the second sample relevant for the t-test.  This
                parameter is not required for one-sample t-tests.
            s1ColName - name of the column in the first sample that contains the
                data for the t-test
            s2ColName - name of the column in the second sample that contains
                the data for the t-test.  This parameter is not required for
                one-sample t-tests.
       
        OPTIONAL:
            targValue - used for one-sample t-tests only.  Specifies the value to
                which the data should be compared.
            desiredpvalCondition - the comparison operator relevant for the
                statistical test.  Can be '<=' (less than or equal to), '=>' (greater
                than or equal to), or 'N/A' (when no criterion is desired to end the
                optimization but the user still wants the statistical test to be
                performed).  Defaults to 'N/A'.
            desiredpval - the relevant p-value criterion.  Defaults to 0.05.
            tail - tail of test.  Can be 'left' (sample1 < sample2), 'right' (sample2 <
                sample1), or 'both' (two-tailed).  Defaults to 'both'.
            thresh - threshold value such that the test will be passed if the
                difference between the conditions is within this range even if the
                statistical test fails.
   
        EXAMPLES:
            mySOS.addttest('name', 'ttest1', 'type', 'single', 'sample1', mySample1, ...
                's1ColName', 'frequency', 'targValue', 0, 'desiredpvalCondition', '<=', ...
                'desiredpval', 0.05, 'tail', 'right');
            newSOS.addttest('name', 'pairedTest1', 'type', 'paired', 'sample1', ...
                stimSample1, 'sample2', stimSample2, 's1ColName', 'frequency', ...
                's2ColName', 'frequency', 'desiredpvalCondition', '=>', ...
                'desiredpval', .5);
            newSOS.addttest('name', 'pairedTest2', 'type', 'paired', 'sample1', ...
                stimSample1, 'sample2', stimSample2, 's1ColName', 'frequency', ...
                's2ColName', 'frequency', 'desiredpvalCondition', '=>', ...
                'desiredpval', .5, 'thresh', 5);


    addztest()
        Adds a z-test to determine whether or not a correlation between samples
        matches a specified value.

        REQUIRED:
            name - name identifying the statistical test.
            type - currently, only 'matchCorrel' for z-tests.
            sample1 - name of the first sample relevant for the z-test.
            sample2 - name of the second sample relevant for the z-test.
            s1ColName - name of the column in the first sample that contains the
                data for the z-test
            s2ColName - name of the column in the second sample that contains
                the data for the z-test.

        OPTIONAL:
            targVal - the value to which the correlation between the two samples
                should be compared (e.g., 0 = no correlation between samples and 1
                = perfect positive correlation between samples).  Must be in range
                (-1,1).
            desiredpvalCondition - the comparison operator relevant for the
                statistical test.  Can be '<=' (less than or equal to), '=>' (greater
                than or equal to), or 'N/A' (when no criterion is desired to end the
                optimization but the user still wants the statistical test to be
                performed).  Defaults to 'N/A'.
            desiredpval - the relevant p-value criterion.  Defaults to 0.05.
            tail - tail of test.  Can be 'left' (sample1 < sample2), 'right' (sample2 <
                sample1), or 'both' (two-tailed).  Defaults to 'both'.

        EXAMPLES:
            mySOS.addztest('name', 'corrTest1', 'type', 'matchCorrel', 'sample1', ...
                mySample1, 'sample2', mySample2, 's1ColName', 'frequency', ...
                's2ColName', 'frequency', 'targVal', 0.0, 'desiredpvalCondition', '=>', ...
                'desiredpval', 0.5);
            newSOS.addztest('name', 'zTest1', 'type', 'matchCorrel', 'sample1', ...
                stimSample1, 'sample2', stimSample2, 's1ColName', 'letters', ...
                's2ColName', 'letters', 'targVal', -.99999);

    createHistory()
        Records the detailed history of the optimization.  Can be saved later and
            is used to produce GUI output.
        <no args>
   
        EXAMPLES:
            mySOS.createHistory();

    createPlots()

        REQUIRED:
            dispIt - the number of iterations to show on the plot screen.

        EXAMPLES:
            mySOS.createPlots(10000);

    deltaCostPercentiles()
        Displays the breakdown of deltaCost values per decile and other important
            percentiles
        <no args>

        EXAMPLES:
            mySOS.deltaCostPercentiles();

    disableBufferedHistoryWrite()
        Disables buffering the history information to a file after each new entry
        <no args>

        EXAMPLES:
            mySOS.disableBufferedHistoryWrite();

    dispCost()

        Displays the current cost for each constraint, and total cost
        <no args>

        EXAMPLES:
            mySOS.dispCost();

    enableBufferedHistoryWrite()
        Enables buffering the history information to a file after each new entry
        <no args>

        EXAMPLES:
            mySOS.enableBufferedHistoryWrite();

    initCost()
        (Re)initializes cost
        <no args>

        EXAMPLES:
            mySOS.initCost();

    initFillSamples()
        Fills each sample with observations taken at random from the population.
            Does not replace items that were "read in" to the sample initially.
        <no args>

        EXAMPLES:
            mySOS.initFillSamples();

    normalizeData()
        Normalizes the data prior to optimization.  This is called internally before
            each optimization and so is generally not required.

    optimize()
        Runs the optimization.  Returns 1 if the optimization ends because the
            statistical criteria were passed, 0 otherwise.

        OPTIONAL:
            numIt - number of iterations to run (otherwise uses the default or the
                value set when the object was created)

        EXAMPLES:
            success = mySOS.optimize();
            mySOS.optimize();
            mySOS.optimize(100000);
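
The return value of optimize() can be used to decide what to do next.  The sketch below assumes that mySOS is an sos object that has already been fully configured; the follow-up actions are one possible choice, not a prescribed workflow.

```matlab
% mySOS is assumed to be an existing, fully configured sos object.
success = mySOS.optimize(100000);

if success == 1
    % The statistical criteria were passed: save the samples.
    mySOS.writeSamples();
else
    % The criteria were not met: inspect the current cost per constraint.
    mySOS.dispCost();
end
```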

    setAnnealSchedule()
        Sets the anneal schedule for the SOS object.

        OPTIONAL:
            'schedule'/scheduleName - param/value pair indicating the name of the
                anneal schedule to use: 'greedy' (the default) or 'exp' for
                exponentially decaying annealing.
            Additional param/value pairs as required by the specific schedule
                being created.  See its constructor or the GUI for details.

        EXAMPLES:
            % greedy annealing, i.e., temperature = 0
            mySOS.setAnnealSchedule();
            % greedy annealing, i.e., temperature = 0
            mySOS.setAnnealSchedule('schedule', 'greedy');
            % exponentially decaying annealing with pDecrease of .4
            mySOS.setAnnealSchedule('schedule', 'exp', 'pDecrease', .40);

    setBufferedHistoryOutfile()

        Writes the history on-line, one update at a time, to outfile.  If outfile
            exists, it will be overwritten.
   
        REQUIRED:
            outfile - a single argument specifying the file to which the history will be
                written

        EXAMPLES:
            mySOS.setBufferedHistoryOutfile('myHistory.txt');

    setFeederdfCandidateSelectionMethod()
        Determines how candidate replacement items will be selected from the
            population.
   
        REQUIRED:
            methodName - string name of method to use; either 'randomPopulation'
                or 'randomPopulationAndSample'

        EXAMPLES:
            mySOS.setFeederdfCandidateSelectionMethod('methodName', ...
                'randomPopulation');

    writeAll()
        Writes all samples to their specified output files.
        <no args>

        EXAMPLES:
            mySOS.writeAll();

    writePopulations()
        Writes all populations to their specified output files.
        <no args>
   
        EXAMPLES:
            mySOS.writePopulations();

    writeSamples()
        Writes all samples to their specified output files.
        <no args>

        EXAMPLES:
            mySOS.writeSamples();

    writeHistory()
        Writes entire stored history to specified output file.
        <no args>

        EXAMPLES:
            mySOS.writeHistory();


expAnneal (Exponentially decaying temperature annealing):


    maxpDecrease()
        Calculates the maximum pDecrease value for exponential annealing based
            on the equations listed in the manuscript and supplemental materials.
            See the source code documentation for additional details.
   
        REQUIRED:
            initDeltaCost - initial value of deltaCost.
            finalDeltaCost - should be smaller than initDeltaCost; if not see source
                code documentation.
            nStep - should be at least 3 steps.

        EXAMPLES:
            expAnneal.maxpDecrease(100, 5, 10);
           

    numSteps()
        Calculates the number of steps to go from initDeltaCost to finalDeltaCost
            based on a specified pDecrease.
   
        REQUIRED:
            initDeltaCost - initial value of deltaCost.
            finalDeltaCost - should be smaller than initDeltaCost; if not see source
                code documentation.
            pDecrease - must be greater than 0 and less than 1.
   
        EXAMPLES:
            expAnneal.numSteps(100, 5, 0.5);
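
For intuition only, the relationship between pDecrease and the number of steps can be sketched under a simple assumption: that each step multiplies the current value by (1 - pDecrease).  This is an illustrative assumption and not the SOS implementation; the equations actually used are those in the manuscript and source code, so the result of expAnneal.numSteps() may differ.  The name stepsNeeded is a hypothetical helper introduced for this example.

```matlab
% Hypothetical helper (NOT part of SOS): smallest whole number of steps n
% such that initDeltaCost * (1 - pDecrease)^n <= finalDeltaCost, assuming
% a purely geometric decay.
stepsNeeded = @(initDeltaCost, finalDeltaCost, pDecrease) ...
    ceil(log(finalDeltaCost / initDeltaCost) / log(1 - pDecrease));

% Same argument values as the numSteps() example above.
n = stepsNeeded(100, 5, 0.5);
disp(n);
```

Under this assumption, a larger pDecrease reaches finalDeltaCost in fewer steps, which is the trade-off that maxpDecrease() and numSteps() help to explore.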


dataFrame (parent of population and sample):


    overlap()
        Calculates and returns the percent of overlap between two dataframes
            (usually two samples).

        REQUIRED:
            df1 - first dataframe.
            df2 - second dataframe.

        EXAMPLES:
            dataFrame.overlap(df1, df2)
               (note here, dataFrame refers to the class name as this is a static
                method)
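
Because overlap() returns a value, the result can be stored and inspected.  The sketch below assumes that s1 and s2 are existing sample objects; pctOverlap is a variable name chosen for the example.

```matlab
% s1 and s2 are assumed to be existing sample objects.
% overlap() is static, so it is called on the class name, not on an instance.
pctOverlap = dataFrame.overlap(s1, s2);
disp(pctOverlap);
```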


Blair Armstrong, Christine Watson, David Plaut, 2011-2012