
How can I sort different elements based on keywords?

I'm trying to sort sentences from a text file according to the part of speech of the word marked in each sentence. For example, given "the big [house]" and "the {red} flower", I want to create two dictionaries, such as dict1

{house: ["the big house", "substantive"]}

and dict2

{red: ["the red flower", "adjective"]}

The idea is to merge them later into a single dictionary in which each key is the marked word from a sentence and each value is a list holding the sentence it came from and its part of speech.
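As a minimal sketch of the merge step described above, assuming each value is a two-item list of sentence and part of speech (the name `merged` is only illustrative):

```python
dict1 = {"house": ["the big house", "substantive"]}
dict2 = {"red": ["the red flower", "adjective"]}

# Unpack both dictionaries into a new one; if the same keyword
# appeared in both, the entry from dict2 would win.
merged = {**dict1, **dict2}
print(merged)
```

If the same word could be marked in more than one sentence, a list of entries per keyword would probably be a safer value type than a single pair.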

I've tried several approaches, but they always end up mixing everything up with almost no order. This is my latest attempt and, though I know it could be better formatted and isn't the cleanest solution, it's the closest I've got to a working version so far.

This is a sample of the sentences that I'm working with:

Es (duftete) nach Erde und Pilze
die [Wände] waren mit Moos überzogen.
Ihr zerrissenes [Gewand] war wieder wie neu
Er saß da wie verzaubert und schaute sie an und konnte seine Augen nicht {mehr} von ihr abwenden
Da sie durchscheinend waren, sahen sie aus wie aus rosa [Glas], das von innen erleuchtet ist.

and this is what I wrote to sort them:

def getWordsSelected (sentence):
    #the parameter sentence gets a list with the previous sentence sample showed
    global WordsDictionary
    WordsDictionary = {}
    verbDict = {}
    subsDict = {}
    adjDict = {}
    for wordSentenceToSearch in sentence :
        #SUBSTANTIVE
        startSubstantive = wordSentenceToSearch.find("[")
        endSubstantive = wordSentenceToSearch.find("]")
        substringSubstantive = wordSentenceToSearch[startSubstantive:endSubstantive]
        wordToSearchSubstantive = substringSubstantive.strip("[]")
        subsDict [wordToSearchSubstantive] = [wordSentenceToSearch]
        subsDict.setdefault(wordToSearchSubstantive, []).append("substantive")
    for wordSentenceToSearch in sentence :
        #VERB
        startVerb = wordSentenceToSearch.find("(")
        endVerb = wordSentenceToSearch.find(")")
        substringVerb = wordSentenceToSearch[startVerb:endVerb]
        wordToSearchVerb = substringVerb.strip("()")
        verbDict [wordToSearchVerb] = [wordSentenceToSearch]
        verbDict.setdefault(wordToSearchVerb, []).append("Verb")
    for wordSentenceToSearch in sentence :
        #ADJ
        startADJ = wordSentenceToSearch.find("{")
        endADJ = wordSentenceToSearch.find("}")
        substringADJ = wordSentenceToSearch[startADJ:endADJ]
        wordToSearchADJ = substringADJ.strip(r"{}")
        adjDict [wordToSearchADJ] = [wordSentenceToSearch]
        adjDict.setdefault(wordToSearchADJ, []).append("ADJ")
    print(subsDict)
    print(verbDict)
    print(adjDict)

This almost works; however, this is the result:

{'': ['Er saß da wie verzaubert und schaute sie an und konnte seine Augen nicht {mehr} von ihr abwenden', 'substantive'], 'Wände': ['die [Wände] waren mit Moos überzogen.', 'substantive'], 'Gewand': ['Ihr zerrissenes [Gewand] war wieder wie neu', 'substantive'], 'Glas': ['Da sie durchscheinend waren, sahen sie aus wie aus rosa [Glas], das von innen erleuchtet ist.', 'substantive']}

The dictionary above should contain only substantives, and it almost does, except for the first element, where it adds the sentence containing the marked word "mehr", which is not a substantive. (That is why the keyword is empty: nothing in that sentence qualifies as a substantive. The sentence DOES get added anyway, though, for some reason.)
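One thing that may be worth checking here is what `str.find` returns when the marker is missing. The following is only a small sketch that isolates the substantive step for a sentence without square brackets; the variable names are illustrative and not taken from the code above:

```python
sentence = "Er saß da wie verzaubert und schaute sie an und konnte seine Augen nicht {mehr} von ihr abwenden"

start = sentence.find("[")  # -1, because "[" does not occur in the sentence
end = sentence.find("]")    # -1 as well

# sentence[-1:-1] is an empty slice, so the keyword becomes ""
keyword = sentence[start:end].strip("[]")

subs = {}
subs[keyword] = [sentence]  # the sentence is stored under the key ""
subs.setdefault(keyword, []).append("substantive")
```

Because every sentence without the marker maps to the same empty key, each one overwrites the previous one, which would explain why only the last such sentence shows up under `''`.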

{'duftete': ['Es (duftete) nach Erde und Pilze', 'Verb'], '': ['Da sie durchscheinend waren, sahen sie aus wie aus rosa [Glas], das von innen erleuchtet ist.', 'Verb']}

Here is the verb dictionary. It gets "duftete" right (the only marked verb in the sample), but again it crams in another sentence without rhyme or reason.

{'': ['Da sie durchscheinend waren, sahen sie aus wie aus rosa [Glas], das von innen erleuchtet ist.', 'ADJ'], 'mehr': ['Er saß da wie verzaubert und schaute sie an und konnte seine Augen nicht {mehr} von ihr abwenden', 'ADJ']}

Finally, the adjective and adverb category (they must share one dictionary) also adds the sentence for "Glas", which is a substantive and shouldn't be there, since nothing in that sentence matches (or should match) the adjective markers.

So, what is happening here? Why does it add sentences without any apparent logical explanation? And most importantly, what can I do to fix this so that the sentences are sorted appropriately?
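One possible direction, sketched under assumptions rather than offered as a definitive fix: only add an entry when the marker is actually found in the sentence. The sketch below swaps the `find`-and-slice approach for regular expressions, and the names `PATTERNS` and `get_words_selected` are hypothetical. The brackets and labels are the ones used in the question.

```python
import re

# One pattern per marker pair; each captures the text between the brackets.
PATTERNS = {
    "substantive": re.compile(r"\[([^\]]+)\]"),
    "Verb": re.compile(r"\(([^)]+)\)"),
    "ADJ": re.compile(r"\{([^}]+)\}"),
}

def get_words_selected(sentences):
    """Map each marked keyword to [sentence, part of speech]."""
    words = {}
    for sentence in sentences:
        for label, pattern in PATTERNS.items():
            match = pattern.search(sentence)
            if match:  # skip sentences that lack this kind of marker
                words[match.group(1)] = [sentence, label]
    return words

sample = [
    "Es (duftete) nach Erde und Pilze",
    "die [Wände] waren mit Moos überzogen.",
    "Er saß da wie verzaubert und schaute sie an und konnte seine Augen nicht {mehr} von ihr abwenden",
]
result = get_words_selected(sample)
```

This builds the merged dictionary directly, so the three separate dictionaries are not needed. It keeps only the first marker of each kind per sentence, and a keyword that appears in two sentences would keep only the later one.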
