StochHMM  v0.34
Flexible Hidden Markov Model C++ Library and Application
StochHMM::emm Class Reference

#include <emm.h>

Public Member Functions

 emm ()
 Create an emission.
 emm (std::string &)
 Constructs an emission from a string.
 ~emm ()
 Destroy an emission.
bool parse (std::string &, tracks &, weights *, StateFuncs *)
bool parse (std::string &txt, track *trk)
void setRealNumber ()
 Set the emission to a Real Number.
void setComplement ()
 Set the emission to be the complement (1-P) of the given value.
void setLexicalFunction (emissionFunc *)
bool isReal ()
bool isComplement ()
 Check to see if emission will return the complement (1-P) value of emission.
double get_emission (sequences &, size_t)
double get_emission (sequence &, size_t)
emissionFuncParam * getExtFunction ()
void print ()
 Print the string representation of the emission to stdout.
std::string stringify ()
lexicalTable * getTables ()
bool isSimple ()
bool isComplex ()

Private Member Functions

bool _processTags (std::string &, tracks &, weights *, StateFuncs *)
 Parses the Emission Function Tag information from the text model definition.

Private Attributes

bool real_number
bool continuous
bool multi_continuous
bool complement
track * realTrack
lexicalTable scores
bool function
emissionFuncParam * lexFunc
pdfFunc * pdf
std::string pdfName
std::vector< double > * dist_parameters
multiPdfFunc * multiPdf
std::string multiPdfName
size_t number_of_tracks
std::vector< track * > * trcks
std::vector< size_t > * track_indices
std::vector< double > * pass_values
emissionFuncParam * tagFunc

Friends

class state
class model

Detailed Description

Emissions for a model. An emm object contains the emission definition: the probability table, the log-probability table (log(P(x))), and the counts. Counts are used to calculate lower-order emissions from higher-order ones; this applies only at the beginning of the sequence. Each emission depends on one or more tracks or on a function. An emission associated with multiple tracks outputs a single character from each of those tracks. Tracks can hold either alphabetic symbols or real-number values. Emissions can also call an external, user-defined function. If ambiguity is defined in the alphabet, the emission score for an ambiguous character is computed as specified in the emission definition; if ambiguity is not defined, the returned value is -INFINITY.
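The ambiguity options can be pictured with a small standalone sketch. It is not StochHMM code: the enum and function below are hypothetical, and averaging the candidate symbols in probability space (rather than in log space) is an assumption about what AVG means. It only illustrates the outcomes named above: average, maximum, minimum, or a user-defined constant, and -INFINITY when ambiguity is not defined.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical mirror of the AMBIGUOUS options: AVG, MAX, MIN, a constant
// given as P(X) or LOG (DEFINED), or no ambiguity handling at all (NONE).
enum class AmbigScoring { NONE, AVERAGE, HIGHEST, LOWEST, DEFINED };

// Score an ambiguous character from the log-probabilities of the symbols it
// could stand for. Averaging in probability space is an assumption.
double ambiguousLogScore(const std::vector<double>& candidateLogProbs,
                         AmbigScoring type, double definedLogScore = 0.0) {
    switch (type) {
        case AmbigScoring::HIGHEST:
            assert(!candidateLogProbs.empty());
            return *std::max_element(candidateLogProbs.begin(), candidateLogProbs.end());
        case AmbigScoring::LOWEST:
            assert(!candidateLogProbs.empty());
            return *std::min_element(candidateLogProbs.begin(), candidateLogProbs.end());
        case AmbigScoring::AVERAGE: {
            assert(!candidateLogProbs.empty());
            double sum = 0.0;
            for (double lp : candidateLogProbs) sum += std::exp(lp);
            return std::log(sum / static_cast<double>(candidateLogProbs.size()));
        }
        case AmbigScoring::DEFINED:
            return definedLogScore;  // constant from the model file, in log space
        case AmbigScoring::NONE:
        default:
            return -std::numeric_limits<double>::infinity();
    }
}
```

For an ambiguous DNA character such as N with candidate probabilities 0.1, 0.2, 0.3, and 0.4, AVG under this assumption gives log(0.25), MAX gives log(0.4), and MIN gives log(0.1).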

Definition at line 58 of file emm.h.


Constructor & Destructor Documentation

StochHMM::emm::emm ( )

Create an emission.

Definition at line 31 of file emm.cpp.

References complement, continuous, dist_parameters, lexFunc, multi_continuous, multiPdf, number_of_tracks, pass_values, pdf, real_number, realTrack, tagFunc, track_indices, and trcks.

{
real_number = false;
complement = false;
continuous = false;
multi_continuous = false;
realTrack = NULL;
function = false;
lexFunc = NULL;
pdf = NULL;
dist_parameters = NULL;
multiPdf = NULL;
number_of_tracks = 0;
trcks = NULL;
pass_values = NULL;
track_indices = NULL;
tagFunc = NULL;
}
StochHMM::emm::emm ( std::string &  )

Constructs an emission from a string.

StochHMM::emm::~emm ( )

Destroy an emission.

Definition at line 55 of file emm.cpp.

References lexFunc, and tagFunc.

{
delete lexFunc;
delete tagFunc;
function = false;
lexFunc = NULL;
tagFunc = NULL;
}

Member Function Documentation

bool StochHMM::emm::_processTags ( std::string &  txt,
tracks & trks,
weights * wts,
StateFuncs * funcs 
)
private

Parses the Emission Function Tag information from the text model definition.

Definition at line 640 of file emm.cpp.

References StochHMM::extractTag(), StochHMM::emissionFuncParam::getTrackName(), StochHMM::tracks::isTrackDefined(), StochHMM::emissionFuncParam::parse(), StochHMM::stringList::size(), StochHMM::stringList::stringify(), and tagFunc.

Referenced by parse().

{
stringList lst = extractTag(txt);
if (lst.size() == 0){
return true;
}
tagFunc=new(std::nothrow) emissionFuncParam();
if (tagFunc==NULL){
std::cerr << "OUT OF MEMORY\nFile: " << __FILE__ << "\tLine: " << __LINE__ << std::endl;
exit(1);
}
if (!tagFunc->parse(lst, trks, wts,funcs)){
std::cerr << "Couldn't parse Emission Tag: " << lst.stringify() << std::endl;
return false;
}
std::string trackName = tagFunc->getTrackName();
//Don't need if we check in the parsing of emissionFuncParam....
if (!trks.isTrackDefined(trackName)){
std::cerr << "No Track defined with name:\t" << trackName << "\nEmission Tag:\n" << txt << std::endl;
return false;
}
return true;
}
double StochHMM::emm::get_emission ( sequences & seqs,
size_t  pos 
)

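Calculate the emission value at a given position in the sequences. If the emission is a real number, it returns the value from the real-number track. If it is a function, it returns the value computed by the lexical function. If it is continuous or multivariate continuous, it evaluates the PDF on the real-number track value(s). Otherwise, it looks up the value in the lexical emission table. For real-number and continuous emissions with COMPLEMENT set, log(1-P) is returned instead of log(P). If a tag function is defined, its value is added to the result.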

Parameters:
seqs Sequences to use
pos Position within the sequences
Returns:
double log(prob) value of emission

Definition at line 681 of file emm.cpp.

References complement, continuous, dist_parameters, StochHMM::emissionFuncParam::evaluate(), StochHMM::track::getIndex(), StochHMM::lexicalTable::getValue(), lexFunc, multi_continuous, number_of_tracks, pass_values, real_number, realTrack, StochHMM::sequences::realValue(), scores, tagFunc, and track_indices.

Referenced by StochHMM::matrixPosition::getEmissionValue().

{
double final_emission(-INFINITY);
if (real_number){
final_emission=seqs.realValue(realTrack->getIndex(),pos);
if (complement){
final_emission=log(1-exp(final_emission));
}
}
else if (function){
final_emission=lexFunc->evaluate(seqs, pos);
}
else if (multi_continuous){
//Get all values from the tracks
for (size_t i = 0; i < number_of_tracks ; ++i){
(*pass_values)[i] = seqs.realValue((*track_indices)[i], pos);
}
final_emission = (*multiPdf)(pass_values, dist_parameters);
if (complement){
final_emission=log(1-exp(final_emission));
}
}
else if (continuous){
final_emission = (*pdf)(seqs.realValue(realTrack->getIndex(),pos),dist_parameters);
if (complement){
final_emission=log(1-exp(final_emission));
}
}
else{
final_emission=scores.getValue(seqs, pos);
}
if (tagFunc!=NULL){
final_emission+=tagFunc->evaluate(seqs, pos);
}
return final_emission;
}
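For real-number and continuous emissions with COMPLEMENT set, get_emission converts a log-probability x into log(1 - exp(x)). The sketch below reproduces that formula and, for comparison, a commonly used numerically careful variant built on std::log1p and std::expm1. The variant is a suggestion, not what emm.cpp does, and the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Direct form used in get_emission(): log(1 - exp(x)), where x = log P.
double complementDirect(double logP) {
    assert(logP <= 0.0);  // x must be a log-probability
    return std::log(1.0 - std::exp(logP));
}

// Numerically careful variant (the usual "log1mexp" trick): use
// log(-expm1(x)) when P > 0.5 and log1p(-exp(x)) otherwise.
double complementStable(double logP) {
    assert(logP <= 0.0);
    const double kLog2 = std::log(2.0);
    return (logP > -kLog2) ? std::log(-std::expm1(logP))
                           : std::log1p(-std::exp(logP));
}
```

With x = -50 (P about 2e-22), the direct form rounds 1 - P to exactly 1 and returns 0, while the log1p form keeps a small negative value. At x = 0 (P = 1) both forms return -INFINITY.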
double StochHMM::emm::get_emission ( sequence & seq,
size_t  pos 
)

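Calculate the emission value at a given position in a single sequence. The logic mirrors the sequences overload, except that all real values are read from the one sequence passed in; in the multivariate continuous case every element of pass_values therefore receives the same value. If a tag function is defined, its value is added to the result.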

Parameters:
seq Sequence to use
pos Position within the sequence
Returns:
double log(prob) value of emission

Definition at line 733 of file emm.cpp.

References complement, continuous, dist_parameters, StochHMM::emissionFuncParam::evaluate(), StochHMM::lexicalTable::getValue(), lexFunc, multi_continuous, number_of_tracks, pass_values, real_number, StochHMM::sequence::realValue(), scores, and tagFunc.

{
double final_emission(-INFINITY);
if (real_number){
final_emission=seq.realValue(pos);
if (complement){
final_emission=log(1-exp(final_emission));
}
}
else if (function){
final_emission=lexFunc->evaluate(seq, pos);
}
else if (multi_continuous){
//Get all values from the tracks
for (size_t i = 0; i < number_of_tracks ; ++i){
(*pass_values)[i] = seq.realValue(pos);
}
final_emission = (*multiPdf)(pass_values, dist_parameters);
if (complement){
final_emission=log(1-exp(final_emission));
}
}
else if (continuous){
final_emission = (*pdf)(seq.realValue(pos),dist_parameters);
if (complement){
final_emission=log(1-exp(final_emission));
}
}
else{
final_emission=scores.getValue(seq, pos);
}
if (tagFunc!=NULL){
final_emission+=tagFunc->evaluate(seq, pos);
}
return final_emission;
}
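The continuous branches call user-registered PDFs with the shapes (*pdf)(value, dist_parameters) and (*multiPdf)(pass_values, dist_parameters), and get_emission treats the result as a log value (the COMPLEMENT branch applies log(1 - exp(x)) to it). The sketch below shows callbacks of that shape. The aliases UniPdf and MultiPdf, the normal and independent-normals densities, and the {mean, sd, ...} parameter layout are all assumptions for illustration; the real pdfFunc and multiPdfFunc typedefs are not documented on this page.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical aliases matching how get_emission() invokes the callbacks.
using UniPdf   = std::function<double(double, const std::vector<double>*)>;
using MultiPdf = std::function<double(const std::vector<double>*, const std::vector<double>*)>;

const double kPi = 3.14159265358979323846;

// Log-density of a normal distribution; params = {mean, sd} (assumed layout).
double normalLogPdf(double x, const std::vector<double>* params) {
    assert(params != nullptr && params->size() >= 2);
    const double mean = (*params)[0], sd = (*params)[1];
    assert(sd > 0.0);
    const double z = (x - mean) / sd;
    return -0.5 * z * z - std::log(sd) - 0.5 * std::log(2.0 * kPi);
}

// Log-density of independent normals, one per track;
// params = {mean_1, sd_1, mean_2, sd_2, ...} (assumed layout).
double independentNormalsLogPdf(const std::vector<double>* values,
                                const std::vector<double>* params) {
    assert(values != nullptr && params != nullptr);
    assert(params->size() >= 2 * values->size());
    double total = 0.0;
    for (std::size_t i = 0; i < values->size(); ++i) {
        const std::vector<double> p = {(*params)[2 * i], (*params)[2 * i + 1]};
        total += normalLogPdf((*values)[i], &p);  // log of a product = sum of logs
    }
    return total;
}
```

Under this assumed layout, PARAMETERS: 0,1 in the model file would describe a standard normal for a CONTINUOUS emission.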
emissionFuncParam* StochHMM::emm::getExtFunction ( )
inline

Get the tag function (external function) defined for the emission.

Returns:
emissionFuncParam* pointer to the tag function, or NULL if none is defined

Definition at line 92 of file emm.h.

References tagFunc.

{return tagFunc;};
lexicalTable* StochHMM::emm::getTables ( )
inline

Definition at line 99 of file emm.h.

References scores.

{return &scores;};
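getTables() exposes the lexicalTable behind a lexical emission. The sketch below gives one way to reason about its size and layout. It assumes POWER[order][alphaSize - 1] equals alphaSize^order and that rows enumerate the preceding symbols in base alphaSize; neither is confirmed by this page. expectedShape reproduces the row and column counts that parse() checks, and lexicalCell is a hypothetical lookup for a single track.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Integer power; stands in for the POWER lookup table used in parse(),
// assumed here to satisfy POWER[order][alphaSize - 1] == alphaSize^order.
std::size_t ipow(std::size_t base, std::size_t exponent) {
    std::size_t result = 1;
    while (exponent-- > 0) result *= base;
    return result;
}

struct TrackSpec { std::size_t alphaSize; std::size_t order; };

// Rows and columns parse() expects in the emission table:
// columns = product of alphabet sizes, rows = product of alphaSize^order.
std::pair<std::size_t, std::size_t> expectedShape(const std::vector<TrackSpec>& tracks) {
    std::size_t rows = 1, columns = 1;
    for (const TrackSpec& t : tracks) {
        columns *= t.alphaSize;
        rows *= ipow(t.alphaSize, t.order);
    }
    return {rows, columns};
}

struct TableCell { std::size_t row; std::size_t column; };

// Hypothetical single-track lookup: the row encodes the `order` preceding
// symbols in base alphaSize (oldest symbol most significant) and the column
// is the emitted symbol. Requires pos >= order (no lower-order fallback).
TableCell lexicalCell(const std::vector<int>& digits, std::size_t pos,
                      std::size_t order, std::size_t alphaSize) {
    assert(pos < digits.size() && pos >= order);
    std::size_t row = 0;
    for (std::size_t i = pos - order; i < pos; ++i) {
        row = row * alphaSize + static_cast<std::size_t>(digits[i]);
    }
    return {row, static_cast<std::size_t>(digits[pos])};
}
```

For a DNA track (4 symbols) at ORDER 2, parse() would expect 16 rows of 4 values, and the context CG followed by T would fall in row 6, column 3 under the assumed A=0, C=1, G=2, T=3 encoding.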
bool StochHMM::emm::isComplement ( )
inline

Check to see if emission will return the complement (1-P) value of emission.

Definition at line 85 of file emm.h.

References complement.

{return complement;};
bool StochHMM::emm::isComplex ( )
inline

Definition at line 105 of file emm.h.

References tagFunc.

{
if (function || tagFunc){return true;}
return false;
}
bool StochHMM::emm::isReal ( )

Check whether the emission is from a real-number track.

Returns:
true if track is real number track
false if track is alphanumerical track

Definition at line 783 of file emm.cpp.

References StochHMM::track::getAlphaType(), StochHMM::REAL, real_number, and realTrack.

{
if (real_number || (realTrack != NULL && realTrack->getAlphaType() == REAL)){
return true;
}
else{
return false;
}
}
bool StochHMM::emm::isSimple ( )
inline

Definition at line 100 of file emm.h.

References tagFunc.

{
if (!function && tagFunc==NULL){return true;}
return false;
}
bool StochHMM::emm::parse ( std::string &  txt,
tracks & trks,
weights * wts,
StateFuncs * funcs 
)

Parse an emission from a text model file.

Parameters:
txt String representation of the emission
trks Tracks used by the model
wts Weights used by the model
funcs State functions used by the model

Definition at line 69 of file emm.cpp.

References _processTags(), StochHMM::lexicalTable::addTrack(), StochHMM::AVERAGE_SCORE, complement, StochHMM::stringList::contains(), continuous, StochHMM::COUNTS, StochHMM::DEFINED_SCORE, dist_parameters, StochHMM::expVector(), StochHMM::lexicalTable::getAlphaSize(), StochHMM::lexicalTable::getCountsTable(), StochHMM::lexicalTable::getLogProbabilityTable(), StochHMM::StateFuncs::getMultivariatePdfFunction(), StochHMM::lexicalTable::getNumberOfAlphabets(), StochHMM::lexicalTable::getOrder(), StochHMM::StateFuncs::getPDFFunction(), StochHMM::lexicalTable::getProbabilityTable(), StochHMM::tracks::getTrack(), StochHMM::HIGHEST_SCORE, StochHMM::stringList::indexOf(), StochHMM::lexicalTable::initialize_emission_table(), StochHMM::int_to_string(), lexFunc, StochHMM::LOG_PROB, StochHMM::logVector(), StochHMM::LOWEST_SCORE, multi_continuous, multiPdf, multiPdfName, number_of_tracks, pass_values, pdf, pdfName, StochHMM::stringList::pop_ith(), StochHMM::POWER, StochHMM::PROBABILITY, StochHMM::probVector(), StochHMM::stringList::push_back(), real_number, realTrack, scores, StochHMM::lexicalTable::setUnkScore(), StochHMM::lexicalTable::setUnkScoreType(), StochHMM::stringList::size(), SIZE_MAX, StochHMM::stringList::splitString(), StochHMM::stringToDouble(), StochHMM::stringToInt(), StochHMM::stringList::toVecDouble(), track_indices, and trcks.

Referenced by StochHMM::PWM::_parseBackground(), StochHMM::state::_parseEmission(), and StochHMM::matrixPosition::parse().

{
if (!_processTags(txt,trks, wts, funcs)){
return false;
}
stringList ln;
ln.splitString(txt,"\n");
size_t idx;
if (ln.contains("EMISSION")){
idx = ln.indexOf("EMISSION");
}
else{
std::cerr << "Missing EMISSION tag from emission. Please check the formatting. This is what was handed to the emission class:\n " << txt << std::endl;
return false;
}
stringList line;
line.splitString(ln[idx], "\t,: ");
size_t typeBegin(0);
valueType valtyp;
//Determine Emission Type and set appropriate flags
if (line.contains("P(X)")){
typeBegin = line.indexOf("P(X)");
valtyp=PROBABILITY;
}
else if (line.contains("LOG")){
typeBegin = line.indexOf("LOG");
valtyp=LOG_PROB;
}
else if (line.contains("COUNTS")){
typeBegin = line.indexOf("COUNTS");
valtyp=COUNTS ;
}
else if (line.contains("REAL_NUMBER")){
typeBegin = line.indexOf("REAL_NUMBER");
real_number = true;
if (line.contains("COMPLEMENT") || line.contains("1-P(X)")) {
complement=true;
}
}
else if (line.contains("MULTI_CONTINUOUS")){
typeBegin = line.indexOf("MULTI_CONTINUOUS");
multi_continuous=true;
if (line.contains("COMPLEMENT") || line.contains("1-P(X)")) {
complement=true;
}
}
else if (line.contains("CONTINUOUS")){
typeBegin = line.indexOf("CONTINUOUS");
continuous=true;
if (line.contains("COMPLEMENT") || line.contains("1-P(X)")) {
complement=true;
}
}
else if (line.contains("FUNCTION")){
typeBegin = line.indexOf("FUNCTION");
function=true;
}
else {
std::string info = "Couldn't parse Value type in the Emission: " + txt + " Please check the formatting. The allowed types are: P(X), LOG, COUNTS, REAL_NUMBER, CONTINUOUS, MULTI_CONTINUOUS, or FUNCTION. \n";
std::cerr << info << std::endl;
//errorInfo(sCantParseLine, info.c_str());
return false;
}
//remaining tracks and Orders then set Track
std::vector<track*> tempTracks;
for(size_t i=1;i<typeBegin;i++){
track* tk = trks.getTrack(line[i]);
if (tk==NULL){
std::cerr << "Emissions tried to add a track named: " << line[i] << ". However, there isn't a matching track in the model. Please check the model formatting.\n";
return false;
}
else{
tempTracks.push_back(tk);
}
}
//Real Number Emissions
if (real_number){
if (tempTracks.size()>1){
std::cerr << "Multiple tracks listed under Real Track Emission Definition\n";
return false;
}
realTrack = tempTracks[0];
return true;
}
//Multivariate Continuous PDF emission
else if (multi_continuous){
if (tempTracks.size()==1){
std::cerr << "Only a single track listed under MULTI_CONTINUOUS\n\
Use CONTINUOUS instead of MULTI_CONTINUOUS\n";
return false;
}
//Assign track information
number_of_tracks = tempTracks.size();
track_indices = new std::vector<size_t>;
trcks = new std::vector<track*> (tempTracks);
pass_values = new std::vector<double> (number_of_tracks,-INFINITY);
for(size_t i = 0; i < number_of_tracks ; ++i){
track_indices->push_back((*trcks)[i]->getIndex());
}
idx = ln.indexOf("PDF");
line.splitString(ln[idx],"\t:, ");
size_t function_idx = line.indexOf("PDF") + 1;
multiPdfName = line[function_idx];
multiPdf = funcs->getMultivariatePdfFunction(multiPdfName);
size_t parameter_idx = line.indexOf("PARAMETERS");
dist_parameters = new(std::nothrow) std::vector<double>;
if(parameter_idx != SIZE_MAX){
for(size_t i = parameter_idx+1 ; i< line.size() ; i++){
double value;
stringToDouble(line[i], value);
dist_parameters->push_back(value);
}
}
return true;
}
//Univariate Continuous PDF emission
else if (continuous){
if (tempTracks.size()>1){
std::cerr << "Multiple tracks listed under CONTINUOUS Track Emission Definition\n\
Must use MULTI_CONTINUOUS for multivariate emissions\n";
return false;
}
realTrack = tempTracks[0];
idx = ln.indexOf("PDF");
line.splitString(ln[idx],"\t:, ");
size_t function_idx = line.indexOf("PDF") + 1;
pdfName = line[function_idx];
size_t parameter_idx = line.indexOf("PARAMETERS");
dist_parameters = new(std::nothrow) std::vector<double>;
if(parameter_idx != SIZE_MAX){
for(size_t i = parameter_idx+1 ; i< line.size() ; i++){
double value;
stringToDouble(line[i], value);
dist_parameters->push_back(value);
}
}
pdf = funcs->getPDFFunction(pdfName);
return true;
}
else if (function){
//Get function name
std::string& functionName = line[typeBegin+1];
//Set parameters for function
lexFunc = new(std::nothrow) emissionFuncParam(functionName,funcs,tempTracks[0]);
if (lexFunc==NULL){
std::cerr << "OUT OF MEMORY\nFile: " << __FILE__ << "\tLine: " << __LINE__ << std::endl;
exit(1);
}
return true;
}
else{ //Traditional Lexical Emission
if (ln.contains("ORDER")){
idx=ln.indexOf("ORDER");
}
else{
std::cerr << "Couldn't find ORDER in non-Real_Number emission. Please check the formatting" << std::endl;
return false;
//errorInfo(sCantParseLine, "Couldn't find ORDER in non-Real_Number emission. Please check the formatting\n");
}
std::vector<int> tempOrder;
line.splitString(ln[idx],"\t:, ");
size_t orderIdx = line.indexOf("ORDER");
orderIdx++;
size_t ambIdx;
bool containsAmbig = line.contains("AMBIGUOUS");
if (containsAmbig){ambIdx=line.indexOf("AMBIGUOUS");}
else{ ambIdx= line.size();}
for(size_t i=orderIdx;i<ambIdx;i++){
int tempValue;
if (!stringToInt(line[i], tempValue)){
std::cerr << "Emission Order not numeric" << std::endl;
return false;
}
if (tempValue>32){
std::cerr << "Emission order is greater than 32. Must be 32 or less" << std::endl;
return false;
}
tempOrder.push_back(tempValue);
}
if (tempOrder.size() == tempTracks.size()){
for(size_t i=0;i<tempOrder.size();i++){
scores.addTrack(tempTracks[i], tempOrder[i]);
}
}
else{
std::cerr << "Different number of tracks and orders parsed in Emission: " << txt << " Check the formatting of the Emission" << std::endl;
return false;
}
//Parse Ambiguous Tag Info
if (containsAmbig){
ambIdx++;
if (line.size()<=ambIdx){
std::cerr << "No scoring type after AMBIGUOUS label\nAssuming AVG\n";
scores.setUnkScoreType(AVERAGE_SCORE);
}
else if (line[ambIdx].compare("AVG")==0){scores.setUnkScoreType(AVERAGE_SCORE);}
else if (line[ambIdx].compare("MAX")==0){scores.setUnkScoreType(HIGHEST_SCORE);}
else if (line[ambIdx].compare("MIN")==0){scores.setUnkScoreType(LOWEST_SCORE);}
//Constant values assigned for Ambiguous characters
//Can either be passed as P(X) or LOG
else if (line[ambIdx].compare("P(X)")==0)
{
ambIdx++;
if (ambIdx>=line.size()){
std::cerr << "Missing Ambiguous Value" << std::endl;
return false;
}
double tempValue;
if (!stringToDouble(line[ambIdx], tempValue)){
std::cerr << "Ambiguous Value couldn't be parsed: "<< line[ambIdx] << std::endl;
return false;
}
scores.setUnkScoreType(DEFINED_SCORE);
scores.setUnkScore(log(tempValue));
}
else if (line[ambIdx].compare("LOG")==0){
ambIdx++;
if (ambIdx>=line.size()){
std::cerr << "Missing Ambiguous Value" << std::endl;
return false;
}
double tempValue;
if (!stringToDouble(line[ambIdx], tempValue)){
std::cerr << "Ambiguous Value couldn't be parsed: "<< line[ambIdx] << std::endl;
return false;
}
scores.setUnkScoreType(DEFINED_SCORE);
scores.setUnkScore(tempValue);
}
}
//Get Emission Tables
size_t expectedColumns(1);
size_t expectedRows(1);
for(size_t i = 0; i<scores.getNumberOfAlphabets(); i++){
expectedColumns*=scores.getAlphaSize(i);
expectedRows*=POWER[scores.getOrder(i)][scores.getAlphaSize(i)-1];
}
std::vector<std::vector<double> >* log_prob = scores.getLogProbabilityTable();
std::vector<std::vector<double> >* prob = scores.getProbabilityTable();
std::vector<std::vector<double> >* counts = scores.getCountsTable();
for (size_t iter = 2; iter< ln.size();iter++){
//If it's the first line, check for a '@' indicating that the column header is present
if (iter==2 && ln[iter][0]=='@'){
continue;
}
line.splitString(ln[iter],"\t ");
//Check for Row header
if (line[0][0]=='@'){
line.pop_ith(0);
}
std::vector<double> temp = line.toVecDouble();
if (temp.size() != expectedColumns){
std::string info = "The following line couldn't be parsed into the required number of columns. Expected Columns: " + int_to_string(expectedColumns) + "\n The line appears as: " + ln[iter] ;
std::cerr << info << std::endl;
return false;
//errorInfo(sCantParseLine, info.c_str());
}
else{
if (valtyp == PROBABILITY){
prob->push_back(temp);
logVector(temp);
log_prob->push_back(temp);
}
else if (valtyp == LOG_PROB){
log_prob->push_back(temp);
expVector(temp);
prob->push_back(temp);
}
else if (valtyp == COUNTS){
counts->push_back(temp);
probVector(temp);
prob->push_back(temp);
logVector(temp);
log_prob->push_back(temp);
}
}
}
if (log_prob->size() != expectedRows){
std::cerr << " The Emission table doesn't contain the expected number of rows. Expected Rows: " << expectedRows << " \n Please check the Emission Table and formatting for " << txt << std::endl;
return false;
}
scores.initialize_emission_table();
}
return true;
}
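The first step of parse() splits the EMISSION line on tab, comma, colon, and space, looks for the first recognized type keyword, and treats the tokens between EMISSION and that keyword as track names. The standalone sketch below restates that logic with the standard library; splitAny and parseHeader are hypothetical helpers, and exact whole-token matching is assumed to mirror stringList::contains.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split on any character in delims, dropping empty tokens.
std::vector<std::string> splitAny(const std::string& text, const std::string& delims) {
    assert(!delims.empty());  // with no delimiters the whole line would be one token
    std::vector<std::string> tokens;
    std::string current;
    for (char c : text) {
        if (delims.find(c) != std::string::npos) {
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        } else {
            current += c;
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

struct EmissionHeader {
    std::string type;                 // first keyword found, empty if none
    std::vector<std::string> tracks;  // tokens between "EMISSION" and the keyword
};

// Keywords are tested in the same order as in parse().
EmissionHeader parseHeader(const std::string& line) {
    static const std::vector<std::string> kTypes = {
        "P(X)", "LOG", "COUNTS", "REAL_NUMBER", "MULTI_CONTINUOUS", "CONTINUOUS", "FUNCTION"};
    const std::vector<std::string> tokens = splitAny(line, "\t,: ");
    EmissionHeader header;
    for (const std::string& type : kTypes) {
        auto it = std::find(tokens.begin(), tokens.end(), type);
        if (it == tokens.end()) continue;
        header.type = type;
        const std::size_t typeBegin = static_cast<std::size_t>(it - tokens.begin());
        for (std::size_t i = 1; i < typeBegin; ++i) header.tracks.push_back(tokens[i]);
        break;
    }
    return header;
}
```

For the line EMISSION:\tSIG1,SIG2:\tMULTI_CONTINUOUS this yields the type MULTI_CONTINUOUS and the tracks SIG1 and SIG2. Because matching is by whole token, MULTI_CONTINUOUS is not mistaken for CONTINUOUS.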
bool StochHMM::emm::parse ( std::string &  txt,
track * trk 
)

Parse an emission from text, using a single track supplied by the caller. Only the lexical value types P(X), LOG, and COUNTS are supported.

Parameters:
txt String representation of the emission
trk Track the emission is defined on

Definition at line 427 of file emm.cpp.

References StochHMM::lexicalTable::addTrack(), StochHMM::AVERAGE_SCORE, StochHMM::stringList::contains(), StochHMM::COUNTS, StochHMM::DEFINED_SCORE, StochHMM::expVector(), StochHMM::lexicalTable::getAlphaSize(), StochHMM::lexicalTable::getCountsTable(), StochHMM::lexicalTable::getLogProbabilityTable(), StochHMM::lexicalTable::getNumberOfAlphabets(), StochHMM::lexicalTable::getOrder(), StochHMM::lexicalTable::getProbabilityTable(), StochHMM::HIGHEST_SCORE, StochHMM::stringList::indexOf(), StochHMM::lexicalTable::initialize_emission_table(), StochHMM::int_to_string(), StochHMM::LOG_PROB, StochHMM::logVector(), StochHMM::LOWEST_SCORE, StochHMM::stringList::pop_ith(), StochHMM::POWER, StochHMM::PROBABILITY, StochHMM::probVector(), StochHMM::stringList::push_back(), scores, StochHMM::lexicalTable::setUnkScore(), StochHMM::lexicalTable::setUnkScoreType(), StochHMM::stringList::size(), StochHMM::stringList::splitString(), StochHMM::stringToDouble(), StochHMM::stringToInt(), tagFunc, and StochHMM::stringList::toVecDouble().

{
stringList ln;
ln.splitString(txt,"\n");
size_t idx;
if (ln.contains("EMISSION")){
idx = ln.indexOf("EMISSION");
}
else{
std::cerr << "Missing EMISSION tag from emission. Please check the formatting. This is what was handed to the emission class:\n " << txt << std::endl;
return false;
}
stringList line;
line.splitString(ln[idx], "\t,: ");
//size_t typeBegin(0);
valueType valtyp;
if (line.contains("P(X)")){
//typeBegin = line.indexOf("P(X)");
valtyp=PROBABILITY;
}
else if (line.contains("LOG")){
//typeBegin = line.indexOf("LOG");
valtyp=LOG_PROB;
}
else if (line.contains("COUNTS")){
//typeBegin = line.indexOf("COUNTS");
valtyp=COUNTS ;
}
else {
std::string info = "Couldn't parse Value type in the Emission: " + txt + " Please check the formatting. The allowed types are: P(X), LOG, or COUNTS. \n";
std::cerr << info << std::endl;
//errorInfo(sCantParseLine, info.c_str());
return false;
}
//remaining tracks and Orders then set Track
std::vector<track*> temp_tracks;
temp_tracks.push_back(trk);
if (ln.contains("ORDER")){
idx=ln.indexOf("ORDER");
}
else{
std::cerr << "Couldn't find ORDER in non-Real_Number emission. Please check the formatting" << std::endl;
return false;
//errorInfo(sCantParseLine, "Couldn't find ORDER in non-Real_Number emission. Please check the formatting\n");
}
std::vector<int> temp_order;
line.splitString(ln[idx],"\t:,");
size_t orderIdx = line.indexOf("ORDER");
orderIdx++;
size_t ambIdx;
bool containsAmbig = line.contains("AMBIGUOUS");
if (containsAmbig){ambIdx=line.indexOf("AMBIGUOUS");}
else{ ambIdx= line.size();}
for(size_t i=orderIdx;i<ambIdx;i++){
int temp_value;
if (!stringToInt(line[i], temp_value)){
std::cerr << "Emission Order not numeric" << std::endl;
return false;
}
if (temp_value>32){
std::cerr << "Emission order is greater than 32. Must be 32 or less" << std::endl;
return false;
}
temp_order.push_back(temp_value);
}
if (temp_order.size() == temp_tracks.size()){
for(size_t i=0;i<temp_order.size();i++){
scores.addTrack(temp_tracks[i], temp_order[i]);
}
}
else{
std::cerr << "Different number of tracks and orders parsed in Emission: " << txt << " Check the formatting of the Emission" << std::endl;
return false;
}
//Parse Ambiguous Tag Info
if (containsAmbig){
ambIdx++;
if (line.size()<=ambIdx){
std::cerr << "No scoring type after AMBIGUOUS label\nAssuming AVG\n";
scores.setUnkScoreType(AVERAGE_SCORE);
}
else if (line[ambIdx].compare("AVG")==0){scores.setUnkScoreType(AVERAGE_SCORE);}
else if (line[ambIdx].compare("MAX")==0){scores.setUnkScoreType(HIGHEST_SCORE);}
else if (line[ambIdx].compare("MIN")==0){scores.setUnkScoreType(LOWEST_SCORE);}
else if (line[ambIdx].compare("P(X)")==0)
{
ambIdx++;
if (ambIdx>=line.size()){
std::cerr << "Missing Ambiguous Value" << std::endl;
return false;
}
double tempValue;
if (!stringToDouble(line[ambIdx], tempValue)){
std::cerr << "Ambiguous Value couldn't be parsed: "<< line[ambIdx] << std::endl;
return false;
}
scores.setUnkScoreType(DEFINED_SCORE);
scores.setUnkScore(log(tempValue));
}
else if (line[ambIdx].compare("LOG")==0){
ambIdx++;
if (ambIdx>=line.size()){
std::cerr << "Missing Ambiguous Value" << std::endl;
return false;
}
double tempValue;
if (!stringToDouble(line[ambIdx], tempValue)){
std::cerr << "Ambiguous Value couldn't be parsed: "<< line[ambIdx] << std::endl;
return false;
}
scores.setUnkScoreType(DEFINED_SCORE);
scores.setUnkScore(tempValue);
}
}
//Get Tables
size_t expectedColumns(1);
size_t expectedRows(1);
for(size_t i = 0; i<scores.getNumberOfAlphabets(); i++){
expectedColumns*=scores.getAlphaSize(i);
expectedRows*=POWER[scores.getOrder(i)][scores.getAlphaSize(i)-1];
}
std::vector<std::vector<double> >* log_prob = scores.getLogProbabilityTable();
std::vector<std::vector<double> >* prob = scores.getProbabilityTable();
std::vector<std::vector<double> >* counts = scores.getCountsTable();
for (size_t iter = 2; iter< ln.size();iter++){
//If it's the first line, check for a '@' indicating that the column header is present
if (iter==2 && ln[iter][0]=='@'){
continue;
}
line.splitString(ln[iter],"\t ");
//Check for Row header
if (line[0][0]=='@'){
line.pop_ith(0);
}
std::vector<double> temp = line.toVecDouble();
if (temp.size() != expectedColumns){
std::string info = "The following line couldn't be parsed into the required number of columns. Expected Columns: " + int_to_string(expectedColumns) + "\n The line appears as: " + ln[iter] ;
std::cerr << info << std::endl;
return false;
//errorInfo(sCantParseLine, info.c_str());
}
else{
if (valtyp == PROBABILITY){
prob->push_back(temp);
logVector(temp);
log_prob->push_back(temp);
}
else if (valtyp == LOG_PROB){
log_prob->push_back(temp);
expVector(temp);
prob->push_back(temp);
}
else if (valtyp == COUNTS){
counts->push_back(temp);
probVector(temp);
prob->push_back(temp);
logVector(temp);
log_prob->push_back(temp);
}
}
}
if (log_prob->size() != expectedRows){
std::cerr << " The Emission table doesn't contain the expected number of rows. Expected Rows: " << expectedRows << " \n Please check the Emission Table and formatting for " << txt << std::endl;
return false;
}
scores.initialize_emission_table();
if (tagFunc != NULL){
std::cerr << "Not NULL" << std::endl;
}
return true;
}
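Both parse() overloads store every table row in several forms. Depending on the declared value type, a row is read as probabilities (P(X)), log-probabilities (LOG), or counts (COUNTS), and the other forms are derived from it. The sketch below shows that derivation on plain vectors; toLog, toExp, and countsToProb are hypothetical stand-ins for logVector, expVector, and probVector, and normalizing counts by the row sum is an assumption about probVector.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-ins for logVector/expVector/probVector; each works on
// one table row in place.
void toLog(std::vector<double>& row) { for (double& v : row) v = std::log(v); }
void toExp(std::vector<double>& row) { for (double& v : row) v = std::exp(v); }
void countsToProb(std::vector<double>& row) {
    double total = 0.0;
    for (double v : row) total += v;
    assert(total > 0.0);  // an all-zero count row cannot be normalized
    for (double& v : row) v /= total;
}

enum class ValueType { PROBABILITY, LOG_PROB, COUNTS };

struct RowForms { std::vector<double> counts, prob, logProb; };

// Derive the stored forms of a row from the form given in the model file,
// following the same order of conversions as parse().
RowForms convertRow(std::vector<double> row, ValueType type) {
    RowForms out;
    switch (type) {
        case ValueType::PROBABILITY:
            out.prob = row; toLog(row); out.logProb = row;
            break;
        case ValueType::LOG_PROB:
            out.logProb = row; toExp(row); out.prob = row;
            break;
        case ValueType::COUNTS:
            out.counts = row; countsToProb(row); out.prob = row;
            toLog(row); out.logProb = row;
            break;
    }
    return out;
}
```

A COUNTS row of 1, 1, 2, 4 becomes the probability row 0.125, 0.125, 0.25, 0.5 and the matching log row; the counts form is kept only when COUNTS was declared.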
void StochHMM::emm::print ( )
inline

Print the string representation of the emission to stdout.

Definition at line 95 of file emm.h.

References stringify().

{std::cout << stringify()<<std::endl;};
void StochHMM::emm::setComplement ( )
inline

Set the emission to be the complement (1-P) of the given value.

Definition at line 76 of file emm.h.

References complement.

{complement=true;};
void StochHMM::emm::setLexicalFunction ( emissionFunc * )
void StochHMM::emm::setRealNumber ( )
inline

Set the emission to a Real Number.

Definition at line 73 of file emm.h.

References real_number.

{real_number=true;};
std::string StochHMM::emm::stringify ( )

Definition at line 795 of file emm.cpp.

References StochHMM::AVERAGE_SCORE, complement, continuous, dist_parameters, StochHMM::double_to_string(), StochHMM::lexicalTable::getAmbDefinedScore(), StochHMM::lexicalTable::getAmbScoringType(), StochHMM::track::getName(), StochHMM::emissionFuncParam::getName(), StochHMM::lexicalTable::getOrder(), StochHMM::lexicalTable::getTrack(), StochHMM::emissionFuncParam::getTrack(), StochHMM::HIGHEST_SCORE, StochHMM::int_to_string(), StochHMM::join(), lexFunc, StochHMM::LOWEST_SCORE, multi_continuous, StochHMM::NO_SCORE, number_of_tracks, pdfName, real_number, realTrack, scores, StochHMM::lexicalTable::stringify(), StochHMM::emissionFuncParam::stringify(), tagFunc, and StochHMM::lexicalTable::trackSize().

Referenced by print(), StochHMM::PWM::stringify(), and StochHMM::matrixPosition::stringify().

{
std::string emissionString("EMISSION:\t");
if (real_number){
emissionString+=realTrack->getName();
emissionString+=":\t";
emissionString+="REAL_NUMBER";
if (complement){
emissionString+=":\tCOMPLEMENT\t";
}
else{
emissionString+="\t";
}
if (tagFunc){
emissionString+=tagFunc->stringify();
}
emissionString+="\n";
}
//Univariate Continuous PDF emission
else if (continuous){
emissionString+=realTrack->getName();
emissionString+=":\tCONTINUOUS";
if (tagFunc){
emissionString+=tagFunc->stringify();
}
emissionString+="\n\t";
emissionString+="PDF:\t";
emissionString+=pdfName + "\tPARAMETERS:\t";
emissionString+=join(*dist_parameters, ',');
emissionString+= "\n";
}
else if (multi_continuous){
for (size_t i=0; i < number_of_tracks; i++) {
if (i>0){
emissionString+=",";
}
emissionString+=(*trcks)[i]->getName();
}
emissionString+=":\tMULTI_CONTINUOUS";
if (tagFunc){
emissionString+=tagFunc->stringify();
}
emissionString+="\n\t";
emissionString+="PDF:\t";
emissionString+=multiPdfName + "\tPARAMETERS:\t";
emissionString+=join(*dist_parameters, ',');
emissionString+= "\n";
}
else if (function){
emissionString+=lexFunc->getTrack()->getName();
emissionString+=":\tFUNCTION:\t";
emissionString+=lexFunc->getName();
emissionString+="\t";
if (tagFunc){
emissionString += tagFunc->stringify();
}
emissionString+="\n";
}
else{
for(size_t i=0;i<scores.trackSize();i++){
if (i>0){
emissionString+=",";
}
emissionString+=scores.getTrack(i)->getName();
}
emissionString+=":\t";
emissionString+="LOG";
if (tagFunc){
emissionString += "\t";
emissionString += tagFunc->stringify();
}
emissionString+="\n\tORDER:\t";
for(size_t i=0;i<scores.trackSize();i++){
if (i>0){
emissionString+=",";
}
emissionString+=int_to_string(scores.getOrder(i));
}
unknownCharScoringType ambTemp = scores.getAmbScoringType();
if (ambTemp!=NO_SCORE){
emissionString+="\tAMBIGUOUS:\t";
emissionString+=(ambTemp==HIGHEST_SCORE)? "MAX":
(ambTemp==LOWEST_SCORE)? "MIN":
(ambTemp==AVERAGE_SCORE)? "AVG": "LOG:" + double_to_string(scores.getAmbDefinedScore());
}
emissionString+="\n";
emissionString+=scores.stringify();
//scores.stringifyAmbig();
}
emissionString+="\n";
return emissionString;
}

Friends And Related Function Documentation

friend class model
friend

Definition at line 66 of file emm.h.

friend class state
friend

Definition at line 65 of file emm.h.


Member Data Documentation

bool StochHMM::emm::complement
private

Definition at line 116 of file emm.h.

Referenced by emm(), get_emission(), isComplement(), parse(), setComplement(), and stringify().

bool StochHMM::emm::continuous
private

Definition at line 114 of file emm.h.

Referenced by emm(), get_emission(), parse(), and stringify().

std::vector<double>* StochHMM::emm::dist_parameters
private

Definition at line 132 of file emm.h.

Referenced by emm(), get_emission(), parse(), and stringify().

bool StochHMM::emm::function
private

Definition at line 124 of file emm.h.

emissionFuncParam* StochHMM::emm::lexFunc
private

Definition at line 125 of file emm.h.

Referenced by emm(), get_emission(), parse(), stringify(), and ~emm().

bool StochHMM::emm::multi_continuous
private

Definition at line 115 of file emm.h.

Referenced by emm(), get_emission(), parse(), and stringify().

multiPdfFunc* StochHMM::emm::multiPdf
private

Definition at line 137 of file emm.h.

Referenced by emm(), and parse().

std::string StochHMM::emm::multiPdfName
private

Definition at line 138 of file emm.h.

Referenced by parse().

size_t StochHMM::emm::number_of_tracks
private

Definition at line 139 of file emm.h.

Referenced by emm(), get_emission(), parse(), and stringify().

std::vector<double>* StochHMM::emm::pass_values
private

Definition at line 142 of file emm.h.

Referenced by emm(), get_emission(), and parse().

pdfFunc* StochHMM::emm::pdf
private

Definition at line 128 of file emm.h.

Referenced by emm(), and parse().

std::string StochHMM::emm::pdfName
private

Definition at line 129 of file emm.h.

Referenced by parse(), and stringify().

bool StochHMM::emm::real_number
private

Definition at line 113 of file emm.h.

Referenced by emm(), get_emission(), isReal(), parse(), setRealNumber(), and stringify().

track* StochHMM::emm::realTrack
private

Definition at line 118 of file emm.h.

Referenced by emm(), get_emission(), isReal(), parse(), and stringify().

lexicalTable StochHMM::emm::scores
private

Definition at line 121 of file emm.h.

Referenced by get_emission(), getTables(), parse(), and stringify().

emissionFuncParam* StochHMM::emm::tagFunc
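private

Referenced by _processTags(), emm(), get_emission(), getExtFunction(), isComplex(), isSimple(), parse(), stringify(), and ~emm().
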
std::vector<size_t>* StochHMM::emm::track_indices
private

Definition at line 141 of file emm.h.

Referenced by emm(), get_emission(), and parse().

std::vector<track*>* StochHMM::emm::trcks
private

Definition at line 140 of file emm.h.

Referenced by emm(), and parse().


The documentation for this class was generated from the following files: