I'm trying to sort sentences from a text file according to the part of speech of the marked word in each sentence. For example, given "the big [house]" and "the {red} flower", I want to create two dictionaries such as dict1 = {house: ["the big house", "substantive"]} and dict2 = {red: ["the red flower", "adjective"]}.
The idea is to merge them later into a single dictionary whose keys are the marked words and whose values are lists holding the sentence each word came from and its part of speech.
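To make the target concrete, here is a small sketch of the structure I have in mind. The name merged is only for illustration, and the merge via dictionary unpacking is just one way this could be done:

```python
# Hypothetical target structure: keyword -> [sentence, part of speech]
dict1 = {"house": ["the big house", "substantive"]}
dict2 = {"red": ["the red flower", "adjective"]}

# Merge the per-category dictionaries into a single one.
# If both dictionaries shared a keyword, the entry from dict2 would win.
merged = {**dict1, **dict2}
print(merged)
```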
I've tried multiple approaches, but the result always ends up mixed together with almost no order. This is my latest attempt; I know it could be better formatted and it's not the cleanest solution, but it's the closest I've gotten to something that works.
This is a sample of the sentences I'm working with:
Es (duftete) nach Erde und Pilze
die [Wände] waren mit Moos überzogen.
Ihr zerrissenes [Gewand] war wieder wie neu
Er saß da wie verzaubert und schaute sie an und konnte seine Augen nicht {mehr} von ihr abwenden
Da sie durchscheinend waren, sahen sie aus wie aus rosa [Glas], das von innen erleuchtet ist.
and this is what I wrote to sort it:
def getWordsSelected(sentence): #the parameter sentence gets a list with the previous sentence sample showed
    global WordsDictionary
    WordsDictionary = {}
    verbDict = {}
    subsDict = {}
    adjDict = {}
    for wordSentenceToSearch in sentence: #SUBSTANTIVE
        startSubstantive = wordSentenceToSearch.find("[")
        endSubstantive = wordSentenceToSearch.find("]")
        substringSubstantive = wordSentenceToSearch[startSubstantive:endSubstantive]
        wordToSearchSubstantive = substringSubstantive.strip("[]")
        subsDict[wordToSearchSubstantive] = [wordSentenceToSearch]
        subsDict.setdefault(wordToSearchSubstantive, []).append("substantive")
    for wordSentenceToSearch in sentence: #VERB
        startVerb = wordSentenceToSearch.find("(")
        endVerb = wordSentenceToSearch.find(")")
        substringVerb = wordSentenceToSearch[startVerb:endVerb]
        wordToSearchVerb = substringVerb.strip("()")
        verbDict[wordToSearchVerb] = [wordSentenceToSearch]
        verbDict.setdefault(wordToSearchVerb, []).append("Verb")
    for wordSentenceToSearch in sentence: #ADJ
        startADJ = wordSentenceToSearch.find("{")
        endADJ = wordSentenceToSearch.find("}")
        substringADJ = wordSentenceToSearch[startADJ:endADJ]
        wordToSearchADJ = substringADJ.strip(r"{}")
        adjDict[wordToSearchADJ] = [wordSentenceToSearch]
        adjDict.setdefault(wordToSearchADJ, []).append("ADJ")
    print(subsDict)
    print(verbDict)
    print(adjDict)
This almost works; however, this is the result:
{'': ['Er saß da wie verzaubert und schaute sie an und konnte seine Augen nicht {mehr} von ihr abwenden', 'substantive'], 'Wände': ['die [Wände] waren mit Moos überzogen.', 'substantive'], 'Gewand': ['Ihr zerrissenes [Gewand] war wieder wie neu', 'substantive'], 'Glas': ['Da sie durchscheinend waren, sahen sie aus wie aus rosa [Glas], das von innen erleuchtet ist.', 'substantive']}
The dictionary above should contain only substantives, and it almost does, except for the first element: it adds the sentence containing the highlighted word "mehr", which is not a substantive. (That is why the key is empty: nothing in that sentence qualifies as a substantive, yet the sentence still ends up in the dictionary for some reason.)
{'duftete': ['Es (duftete) nach Erde und Pilze', 'Verb'], '': ['Da sie durchscheinend waren, sahen sie aus wie aus rosa [Glas], das von innen erleuchtet ist.', 'Verb']}
Here is the verb dictionary. It gets duftete right (the only verb in the sample), but again it crams in another sentence without any rhyme or reason.
{'': ['Da sie durchscheinend waren, sahen sie aus wie aus rosa [Glas], das von innen erleuchtet ist.', 'ADJ'], 'mehr': ['Er saß da wie verzaubert und schaute sie an und konnte seine Augen nicht {mehr} von ihr abwenden', 'ADJ']}
And finally, the adjective and adverb category (they must be in the same list) also adds the sentence for Glas, which is a substantive and shouldn't be there, since nothing in that sentence matches (or should match) the markers for this category.
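One thing that may be related: str.find returns -1 when the character is not found, and slicing with [-1:-1] yields an empty string. The small check below, run on a sample sentence that has no square brackets, seems to reproduce the empty '' key seen in the outputs above:

```python
sentence = "Es (duftete) nach Erde und Pilze"  # contains no [ ] marker

start = sentence.find("[")  # -1, because "[" does not occur in the sentence
end = sentence.find("]")    # -1 as well
word = sentence[start:end].strip("[]")  # sentence[-1:-1] is the empty string

print(start, end, repr(word))  # -1 -1 ''
```

If that is the cause, every sentence without a marker for the category would be stored under the key '', each one overwriting the previous, which would match only the last such sentence showing up in each dictionary.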
So, what is happening here? Why does it add sentences that contain no marker for the category in question? And most importantly, how can I fix this so that the sentences are sorted appropriately?
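For reference, one direction that might work is to store a sentence only when the marker is actually found. This is a minimal sketch, not a definitive solution: the names PATTERNS and sort_sentences are made up for illustration, it uses regular expressions instead of find, and it assumes at most one marker per category in each sentence.

```python
import re

# One pattern per category; the labels mirror the ones used in the code above.
PATTERNS = {
    "substantive": re.compile(r"\[([^\]]+)\]"),
    "Verb": re.compile(r"\(([^)]+)\)"),
    "ADJ": re.compile(r"\{([^}]+)\}"),
}

def sort_sentences(sentences):
    """Return {keyword: [sentence, category]}, skipping sentences without a marker."""
    result = {}
    for sentence in sentences:
        for category, pattern in PATTERNS.items():
            match = pattern.search(sentence)
            if match:  # store the sentence only when the marker is present
                result[match.group(1)] = [sentence, category]
    return result

sample = [
    "Es (duftete) nach Erde und Pilze",
    "die [Wände] waren mit Moos überzogen.",
    "Er saß da wie verzaubert und schaute sie an und konnte seine Augen nicht {mehr} von ihr abwenden",
]
for keyword, entry in sort_sentences(sample).items():
    print(keyword, entry)
```

Note that in this sketch a keyword that appears in more than one sentence would keep only the last sentence, since the keyword is the dictionary key.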