- version 1.0 (16 March 2006). First public release under the terms of the Lesser-GPL.

- version 0.7.4 (3 February 2005):
    - added option -a to sd2gram to allow the use of Perret labels with a Dirac bond kernel on bond perretLabels.
    - added gistOptimize2eval.rb, corrected bug in gistLine2eval.rb

- version 0.7.3 (2 February 2005): Versions 0.7.0 and 0.7.1 should not be used to detect rings.
    - corrected detectSSSR to allow the correct detection of all cycles in cycles with connecting branches.
    - corrected an error preventing the correct creation of rings when ring information is read from a V2000JLP mol / sd file.
    - the int descriptor sssrSize.integer, containing the number of rings in the SSSR, is now saved in sd files.

- version 0.7.2 (1 February 2005):
    - in MoleculeUtils::moleculeKernel, limited the recursive call to rlk to a depth of 50.

- version 0.7.1 (31 January 2005):
    - Fixed a loop with iterators: changing the content of a vector<> inside a loop bounded by iterators caused problems on altix1.
    - Replaced most return( NULL ) by throw( CError ) in Atom::nextUnvisitedAtom, Atom::getBondWithTarget and MoleculeSet::findFirstMoleculeWithName.
    - Fixed small pending memory leaks.

- version 0.7.0 (27 January 2005): this version has many improvements. The main addition is the ring detection algorithm taken from Figueras, J. Chem Inf Comput Sci 1996, 36, 986-991. The resulting rings are used to set perretLabels for atoms and bonds. The ring information is saved in files written by chemcpp as an addition to the bond information (non-standard V2000JLP; a future improvement will produce and read MDL V3000 files). If this information is present in a file read by chemcpp, the rings are not recomputed. Ring detection is done automatically when a molecule is read, but not when the molecule is modified after loading.
    - added option -d to sd2fragmentsd, which prevents duplicate removal.
    - added -i flag to sdsort to sort in descending order instead of ascending order
    - modified sdqualify to allow setting of activity status
    - modified sdbinclassifyfromdescriptor: added option -k. By default, molecules without an activity descriptor are removed from the classified file. Option -k keeps them.
    - added sd2dots: produces dots files (to be used with ezgraphs) from a sd file. Optionally, a Morgan index is added to the atomic symbol.
    - molecule kernels can now take 2 int parameters
    - added the possibility to restrict the minimum path length when using the fused graph kernel computation or the Kashima method.
    - added the Ring class and Molecule::detectSSSR, which is automatically called after loading a molecule (through Molecule::compute). It detects the smallest set of smallest rings using a fast BFS algorithm. The resulting rings are stored in Molecule and in the member Atom and Bond objects.
    - added the automatic computation of Perret labels for atoms and bonds (should be computed after ring detection; done automatically in Molecule::compute()). Carbons joining 2 rings (aromatic or not) get a J appended to the atomic symbol. Carbons joining 3 rings (aromatic or not) get a K appended to the atomic symbol.
    - added atom kernels atomKernelPerretLabel and atomKernelPerretLabelExternalatrix, and bond kernel bondKernelPerretLabel, in MoleculeUtils. These kernels make use of the perretLabels. bondKernelPerretLabel gives a weight to mismatching bonds. It makes use of the cycle membership to define 1) cyclic bonds 2) aliphatic bonds. Mismatching rings in this respect get kernel value 0. Within cyclic bonds, an aromatic bond is considered very close to a single cyclic bond (0.99). Other values are smaller.
    - modified sd2gram so that atomKernelPerretLabel and bondKernelPerretLabel can be used (option -p). -x also allows restricting the paths to a minimum length (-x 2 starts with paths of length 2).
    - modified the mol file format written.
      The bond entries in the CTAB now contain not only the usual information in V2000 mol files, but also the number of rings this bond is a member of (3 characters) followed by the ids of each ring (3 characters each). This is not a standard mol file, therefore I print V2000JLP in the header block instead of V2000. It would be better to write V3000 files, which can store additional information! Will be done in a future version.

- version 0.6.2
    - debugged sdsplitfromdescriptor
    - added sdbinclassifyfromdescriptor to set binary activity according to a descriptor value and a threshold.
    - corrected sd2gistActivity because activity in molecules is not a regular descriptor anymore.
    - corrected sdsort, which was not working correctly.
    - added sd2gramMismatch (now contains a copy of the main of sd2gramSpectrum).

- version 0.6.1
    - added a step in MoleculeSet::removeDuplicates to accelerate the procedure (sum the order of bonds in the two compounds)
    - changed the threshold value for molecular weight similarity in MoleculeSet::removeDuplicates and moved it to constant.h

- version 0.6.0
    - reviewed the way activity is stored in class Molecule. The activity should be stored with the special function setActivity and read with getActivity. Changed sd import/export functions. In sd files, the last descriptor named either activity, activity.integer, activity.float or activity.string sets the molecule activity.
    - since Molecule::getActivity now throws a CError if the molecule has no activity, all calls to this function should handle the error. Corrected toString for that.
    - corrected mol2gram to allow the use of MoleculeUtils::atomKernelMorganLabel when using the original Kashima kernel
    - changed the short help of sd2gram and mol2gram
    - added script molStandardize.rb
    - in Atom::hideHydrogens(): changed implementation because the map.erase function did not work properly on SGI Origin with gcc.
    - corrected molsd2gram so that the computation with the Kashima method uses Morgan indices by default.
    - corrected bug in JLPIOUtils::readDirectory, which failed to return the correct list of files if an extension was specified. For SGI, the macro SGI must be defined during compilation. Adapted installation procedure. Please READ INSTALL
    - added launchJobsInParalell.rb in bin. This is a ruby script to run separate processes (stored in a file) in parallel using a specified number of processors (uses library.jobLauncher.rb; check it if you want to run jobs from a ruby program)
    - added gistFilter.rb in the bin directory (filters a file produced by gist-classify, for example to remove examples used in the training set)
    - added the possibility to compute the gram matrix for a subset of a test set with sd2gram, mol2gram and molsd2gram
    - corrected bug in MoleculeSet::gramCompute: when using a test set, the Kashima parameters were not set for the training set! THIS BUG AFFECTS THE GRAM MATRIX WITH TEST SETS! (this problem did not affect molsd2gram, only sd2gram and mol2gram).

- version 0.5.8 (6 December 2004)
    - modified gram2gistCrossValidation.rb. The size of the test set is adapted so that all objects appear once in a test set. The previous version was unable to complete the required number of test sets in some cases.

- version 0.5.7
    - added ruby script for cross-validation with gist (gram2gistCrossValidation.rb)
    - added ruby script for cross-validation with replacements with gist (gram2gistRandomSet.rb)
    - added ruby libraries for manipulating gram matrices, gist files and other objects in the bin folder. Ruby tools in the bin folder need the environment variable CHEMCPPPATH to point to the chemcpp folder.
    - modified installation explanations (INSTALL file)

- version 0.5.6
    - added "location" attribute to class Molecule. Stores the location where the molecule is stored (set when loading a mol file, not yet set when loading from other sources like sd files).
    - sd2gram: replaced default atom kernel by atomKernelMorganLabel (except when -k is specified).

- version 0.5.5 (29 October 2004)
    - sd2gramSpectrum and sd2gramSpectrumNoTottering: added optional restrictions on the paths considered for the kernel calculation.
    - replaced all occurrences of aBondPointer->second->getTarget() by aBondPointer->first in sd2gramSpectrum and sd2gramSpectrumNoTottering.

- version 0.5.4 (28 October 2004)
    - completed the atom matrices in the data folder up to Meitnerium (atomic number: 109)
    - added option -z to sd2gramSpectrum and sd2gramSpectrumNoTottering to allow input of KCF files and SD files with generic labels.
    - Molecule::getMW() now takes an optional boolean to allow silent errors if the molecular weight cannot be computed.
    - sd2describe now works for kcf and "generic atom labels" sd files
    - sd2subset now works for kcf and "generic labels" sd files
    - corrected bug in MoleculeUtils::writeKCF preventing the correct writing of kcf files
    - tools called sd* should be renamed to set*, since many of them work not only on pure sd files but also on sd files containing generic atom labels and on kcf files; for the moment I keep the names unchanged

- version 0.5.3 (26 October 2004)
    - added function MoleculeUtils::bondKernelRotable
    - added option 'd' in sd2gram to allow using a bond kernel based on bond rotatability instead of bond type.

- version 0.5.2 (25 October 2004)
    - corrected minor bug in sd2gramSpectrum and sd2gramSpectrumNoTottering with respect to arguments passed to writeGramMatrix.
    - completed autoTest 7 and added autoTest 8 for sd2gramSpectrum
    - added autoTest 9 for sd2gramSpectrumNoTottering

- version 0.5.1 (22 October 2004)
    - added sd2gramSpectrumNoTottering (the corresponding autotest7 is incomplete)
    - modified sd2gramSpectrum
    - added script gist2svlist in bin directory. This script extracts support vectors from a gist file
    - added sdStandardize.rb in bin directory.
      This script uses openbabel to detect aromatic cycles, remove hydrogens and remove salts and solvents (by keeping the largest connected graph). Warnings are stored in a file if several graphs of similar size are present in one compound entry.
    - added tool sdexclude, which removes molecules from a sdfile if they also occur in another sdfile. Based on successive filters: number of atoms / molecular weight / kernel value.

- version 0.5.0 (12 October 2004)
    - changed the way start, stop and transition probabilities are stored from float to double.
    - changed the way kernels are computed to make use of double instead of float variables.
    - sd2gramSpectrum now uses the Morgan label instead of the atomic number and allows specifying a Morgan order for atom labelling.
    - made some cosmetic changes to the code; more could be done.

- version 0.4.19 (12 October 2004)
    - included the functions developed by Pierre Mahe for sd2gramSpectrum.

- version 0.4.18 (8 October 2004)
    - MoleculeSet::kernelCompute: changed the warning messages when two molecules are orthogonal. A warning message is now only emitted when two orthogonal compounds have identical biological activity.
    - corrected a bug in the instantiation of elements and in the Atom constructor. This bug led to miscalculation of kernel values using the fused graph approach.
    - added the autoTest directory, which contains test procedures to verify basic functions of the chemcpp gram library. Currently contains 5 tests on 5 different ways to compute the gram matrix for chemical compounds (starting from a sd file).

- version 0.4.17 (27 September 2004)
    - sdadddescriptor: can now add gist activity file information to a sd file
    - sd2sets.rb: the descriptor name containing the activity of compounds can be specified using the -d option
    - had some trouble with moleculeset.cpp in kdevelop. Had to rescue it from an old file. It seems it is not a good idea to split the arguments of a function over multiple lines.
      The problematic file is temporarily left in the project as moleculeset.cpp.qnmp

- version 0.4.16 (24 September 2004)
    - Changed some options in sd2gistactivity and sd2descriptors
    - added the ruby script sd2sets.rb to the bin directory
    - Atom is now derived from class node. The different methods to load molecules in MoleculeUtils shall set the member label of node. This label can then be used with atomKernelLabel. This makes it possible to load molecules independently of the definition of all atoms in elements. Pass true as second argument to MoleculeSet::addSD if you DON'T want to use the elements definition (you can then only use atomKernelLabel to compare atoms). sd2gram now uses these changes and should really be renamed to set2gram. mol2gram does not yet take advantage of these changes.

- version 0.4.15 (21 September 2004)
    - Corrected bug in all tools producing a gram matrix: when using the fused graph approach, the number of iterations to compute the graph kernel was wrongly passed to the kernel function.
    - in all tools producing a gram matrix, the morganLabels are now set to order 0 by default.

- version 0.4.14 (15 September 2004)
    - Corrected bug in MoleculeSet::diversityBaryMean. This changes the returned value slightly.

- version 0.4.13
    - added tool sdaddgistclassify. Adds the information obtained by classifying compounds with gist into the sdfile.
    - added tool sdsplitfromgistclassify. Uses the class column in a file produced by gist-classify to split a sdfile into two separate sdfiles. The entries may be sorted by molecular weight.

- version 0.4.12 (13 August 2004)
    - DataContainer::deleteDescriptor now only deletes the first occurrence of the named descriptor among Int, Float and String Descriptors.
    - Corrected a bug in mol2fragmentsd and sd2fragmentsd which caused a path error when the output directory was specified on the command line.
    - modified all tools producing a gram matrix (sd2gram, mol2gram, molsd2gram, mutag2gram) to allow choosing between 4 different ways to compute the graph kernel: Kashima original kernel, powerKernelConverge, powerKernelUntilN, powerKernelOnlyN.

- version 0.4.11 (13 August 2004)
    - added functions to cut molecules into fragments. See MoleculeSet::addFragmentsToSet and tools mol2fragmentsd and sd2fragmentsd.
    - finished Molecule copy constructor. Now also copies bonds and descriptors (fastPQ, fastPS and fastPT are not copied).

- version 0.4.10 (9 August 2004)
    - added gramraw2self
    - changed installation procedure so that no editing of constant.h is necessary any more.
    - corrected a bug in MoleculeUtils::readMDLNSDBlock.

- version 0.4.9 (6 August 2004)
    - modified MoleculeUtils::readMDLNSDBlock. The function now sets the name of the molecule to the value defined between () on lines starting with > in the NSData block of MDL files.
    - corrected bug in molsd2gram which produced badly formatted gram matrices
    - corrected bug in molsd2gram which prevented use of any external atom kernel!!! (August 5 2004, 18h (Japan time))

- version 0.4.8 (5 August 2004)
    - two new tools (added a few functions in MoleculeSet and Molecule):
        - sdfiltermw to extract molecules within a given molecular weight range from a sdfile (this tool is now used in the script sdStandardize.rb)
        - sdremovesubset to eliminate entries with selected names from a sd file
    - cosmetic changes in the source code

- version 0.4.7
    - corrected bug in set___Descriptor of sdqualify
    - corrected bug in StringUtils::split (reading out of array boundaries)
    - completely changed the configure / make / install procedure

- version 0.4.6
    - molsd2gram: changed the separator character for the gram matrices to tab (in accordance with the requirements of GIST).
    - Minor changes to unused class JLEdge to allow compilation with the SGI compiler
    - changed function DataContainer::set___Descriptor: added a boolean argument to specify whether the function should automatically add missing descriptors. Reviewed the whole code to take this new argument into account.

- version 0.4.5
    - corrected bug in Molecule destructor
    - set absolute paths in install_local.sh
    - minor changes in JLPIOUtils::readDirectory to allow compilation on Solaris systems
    - corrected duplicate ; in some source files, which generated compilation errors on Solaris systems

- version 0.4.4
    - minor changes in the installation scripts

- version 0.4.3
    - added #include to all main.cpp in tools.

- version 0.4.2
    - included automatic installation scripts install_local.sh and install_global.sh; instructions for using these scripts can be found in INSTALL
    - changed the way pq and ps are stored in Atom: they were of type Descriptor and are now stored with the native float type
    - corrected a bug in the Atom constructor: when the non-default constructor is used, the non-default constructor of DataContainer must be called instead of the default one (was leading to a memory leak)
    - corrected the same bug in the Molecule constructor: when the non-default constructor is used, the non-default constructor of DataContainer must be called instead of the default one (was leading to a memory leak)

- version 0.4.1
    - added report of orthogonal molecules in MoleculeSet::gramCompute
    - changed the way the power of the transition matrix is computed for the computation of the graph kernel using the product graph approach

- version 0.4
    - main is again only for test.
    - the tools directory of the chemcpp distribution now contains several binary tools to work with mol, sd and tabular files.
    - added computation of the kernel using the product graph approach

- version 0.3: 17 Jan 2003
    - added Molecule::readMolfile() to read Molfiles (atoms with coordinates and bonds)
    - main() does something useful: calculates the Kashima gram matrix for the mutag dataset. Arguments can be specified on the command line; try chemcpp -h

- version 0.1 and 0.2
    - first versions, not intended to do anything useful.
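
Note on the V2000JLP bond lines (see version 0.7.0): the entry states that each bond line carries, after the usual V2000 fields, the number of rings the bond is a member of (3 characters) followed by one 3-character id per ring. The following is a minimal C++ sketch of how such a line could be read. It is not the chemcpp implementation. The helper names parseField and ringIdsFromBondLine are hypothetical, and the offset of 21 characters is an assumption: it supposes that the seven standard 3-character V2000 bond fields precede the ring data, which may not match what chemcpp actually writes.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Width of every fixed-width field in a V2000 bond line.
const std::size_t kFieldWidth = 3;

// ASSUMPTION: ring data starts right after the seven standard V2000 bond
// fields (7 x 3 = 21 characters). chemcpp may use a different offset.
const std::size_t kAssumedRingOffset = 21;

// Read one right-justified 3-character integer field starting at pos.
// A blank field is read as 0; a field running past the line end is an error.
int parseField(const std::string& line, std::size_t pos) {
    if (pos + kFieldWidth > line.size()) {
        throw std::runtime_error("bond line too short");
    }
    const std::string field = line.substr(pos, kFieldWidth);
    const std::size_t first = field.find_first_not_of(' ');
    if (first == std::string::npos) {
        return 0;
    }
    return std::stoi(field.substr(first));
}

// Hypothetical helper (not part of chemcpp): return the ring ids stored on a
// V2000JLP-style bond line, or an empty vector for a plain V2000 line.
std::vector<int> ringIdsFromBondLine(const std::string& line,
                                     std::size_t ringOffset = kAssumedRingOffset) {
    std::vector<int> ids;
    if (line.size() <= ringOffset) {
        return ids;  // no ring fields present
    }
    const int count = parseField(line, ringOffset);  // number of rings
    for (int i = 0; i < count; ++i) {
        // Ring ids follow the count, one 3-character field each.
        ids.push_back(parseField(line, ringOffset + kFieldWidth * (i + 1)));
    }
    return ids;
}
```

Under these assumptions, the line "  1  2  4  0  0  0  0  2  1  3" would yield the ring ids 1 and 3, and a plain 21-character V2000 bond line would yield an empty list.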