How to parse a keyword in a sentence using NLTK?

@premsheth wrote:

Hi friends,

I am trying to parse CVs from PDF files.
My steps are as follows:
1) Convert the PDF to text and split it into a list of sentences.
2) Extract the work-experience segment using a list of experience-segment keywords.
3) Extract the company name, position title and duration.
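Step 2 can be sketched in plain Python, assuming the PDF text has already been split into lines and that each section heading sits on its own line. `extract_segment`, `EXPERIENCE_HEADINGS` and `OTHER_HEADINGS` are hypothetical names, and the heading lists are placeholders for your own keyword lists:

```python
# Hypothetical heading lists; replace them with your own keyword lists.
EXPERIENCE_HEADINGS = ["Work Experience", "Experience", "Employment History"]
OTHER_HEADINGS = ["Education", "Skills", "Projects", "Certifications"]


def extract_segment(lines, start_headings, stop_headings):
    """Return the lines from the first start heading up to the next stop heading."""
    start = {h.lower() for h in start_headings}
    stop = {h.lower() for h in stop_headings}
    segment = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines left by the PDF-to-text step
        key = text.lower().rstrip(":")  # "Education:" counts as "education"
        if not segment:
            if key in start:
                segment.append(text)  # keep the heading, as work_segment does
        elif key in stop:
            break
        else:
            segment.append(text)
    return segment
```

Headings are compared as whole lines (case-insensitively), so a heading that shares its line with other text would not be recognised by this sketch.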

Now I am having trouble extracting the company name and position title from a segment like this one:
work_segment = ['Work Experience', 'Software Engineer', 'Digital, Data and Technology Services (DDTS), Department for Environment, Food & Rural Affairs (Defra)', 'January 2009 to Present']

I want to extract the company name from the work_segment list. I also have a list of company names. Here, Digital, Data and Technology Services (DDTS) is the company name, and it is included in my list of company names.

I tried using the Stanford library so that it would tag words as Organisation, Location or Person, but it did not work.

I also tried the following code. It works for some PDFs, but sometimes it does not work even though the company name and position title are included in my lists.
import re

import nltk

# nltk.word_tokenize and nltk.pos_tag need NLTK's tokenizer and POS-tagger
# data; download them once with nltk.download().

# Matches dates such as "January 2009", "Jan. 2009" or "Mar 15, 2012".
DATE_RE = re.compile(r'((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*(?:-|\.|\s|,)\s?\d{,2}[a-z]*(?:-|,|\s)?\s?\d{2,4})')


def extract_work_experience(work_segment, key_words, company_names):
    """Extract company, position title and duration from one experience segment.

    work_segment  -- list of lines from the work-experience section
    key_words     -- list of known position titles
    company_names -- list of known company names
    """
    noun_strings = []  # per line: its NN/NNP words joined by spaces
    titles = []
    companies = []
    dates = []

    # Keep only the singular nouns and proper nouns of every line.
    for text in work_segment:
        tags = nltk.pos_tag(nltk.word_tokenize(text))
        nouns = [word for word, pos in tags if pos in ("NN", "NNP")]
        noun_strings.append(" ".join(nouns))

    # Compare each noun string with the known position titles.
    for nouns in noun_strings:
        for tit in key_words:
            if nouns.lower() == tit.lower():
                titles.append(nouns)
            elif tit.lower() in nouns.lower():
                titles.append(tit)

    # Compare each noun string with the known company names
    # (both sides lower-cased; elif avoids adding the same name twice).
    for nouns in noun_strings:
        for co in company_names:
            if co.lower() == nouns.lower():
                companies.append(nouns)
            elif co.lower() in nouns.lower():
                companies.append(co)

    # One date is treated as an ongoing job; two dates give start and end.
    for text in work_segment:
        matches = DATE_RE.findall(text)
        if len(matches) == 1:
            dates.append({'start_year': matches[0], 'end_year': "Present"})
        elif matches:
            dates.append({'start_year': matches[0], 'end_year': matches[1]})

    # Build employer1, employer2, ... with a plain dict instead of vars();
    # zip() stops at the shortest list, so a missing company, title or
    # date no longer raises IndexError.
    work_experience = {}
    for i, (company, title, duration) in enumerate(zip(companies, titles, dates), start=1):
        work_experience["employer" + str(i)] = {
            'company': company,
            'Position_title': title,
            'Time duration': duration,
        }

    return work_experience
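For the duration step, a single regular expression can capture both ends of a range directly, instead of assuming that one date means "Present". This is a minimal sketch; `extract_duration`, the accepted separators (`-`, an en dash, `to`) and the open-ended end words (`Present`, `Current`, `Now`) are assumptions to adjust for your CVs:

```python
import re

# Month name or abbreviation, optionally followed by a dot: "Jan", "January", "Sept."
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?"
DATE = rf"{MONTH}\s+\d{{4}}"  # e.g. "January 2009"
# Start date, a separator (-, en dash or "to"), then an end date or open-ended word.
RANGE_RE = re.compile(
    rf"(?P<start>{DATE})\s*(?:-|\u2013|to)\s*(?P<end>{DATE}|Present|Current|Now)",
    re.IGNORECASE,
)


def extract_duration(text):
    """Return {'start_year': ..., 'end_year': ...} for the first date range in text."""
    m = RANGE_RE.search(text)
    if m:
        return {"start_year": m.group("start"), "end_year": m.group("end")}
    single = re.search(DATE, text, re.IGNORECASE)
    if single:
        # Only one date: the end is unknown rather than assumed to be "Present".
        return {"start_year": single.group(0), "end_year": None}
    return None
```

Lines without any "Month Year" date return `None`, so the caller can skip them rather than appending a placeholder.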
Questions:

How can I extract the company name when it is included in my list of company names?
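One option is to skip POS tagging for this step and search the raw line for each known name. A minimal sketch, assuming `company_names` is a plain list of strings; `find_known_names` is a hypothetical helper. It tries longer names first, ignores a shorter name that overlaps one already found, and uses lookarounds so that "Defra" does not match inside a longer word such as "Defragment":

```python
import re


def find_known_names(text, known_names):
    """Return the known names that occur in text, in order of appearance."""
    claimed = []  # (start, end) character spans already matched
    hits = []     # (start, name) pairs
    # Longest names first, so "... Services (DDTS)" wins over "DDTS".
    for name in sorted(known_names, key=len, reverse=True):
        if not name.strip():
            continue  # an empty name would match everywhere
        # (?<!\w) and (?!\w) act like word boundaries but also work for
        # names that start or end with punctuation such as ")".
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            overlaps = any(s < m.end() and m.start() < e for s, e in claimed)
            if not overlaps:
                claimed.append((m.start(), m.end()))
                hits.append((m.start(), name))
    return [name for _, name in sorted(hits)]
```

Calling `find_known_names(work_segment[2], company_names)` would return the listed spelling of every company found on that line; the same helper could be run with `key_words` to look for position titles.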

I used NLTK to tag the words and tried to extract all words tagged 'NN' and 'NNP'. Now, how can I get only a specific group of consecutive words, rather than every word with those tags?
For example, I tagged the words in these sentences:
[('Digital', 'NNP'), (',', ','), ('Data', 'NNP'), ('and', 'CC'), ('Technology', 'NNP'), ('Services', 'NNP'), ('(', '('), ('DDTS', 'NNP'), (')', ')'), (',', ','), ('Department', 'NNP'), ('for', 'IN'), ('Environment', 'NNP'), (',', ','), ('Food', 'NNP'), ('&', 'CC'), ('Rural', 'NNP'), ('Affairs', 'NNPS'), ('(', '('), ('Defra', 'NNP'), (')', ')')] [('January', 'NNP'), ('2009', 'CD'), ('to', 'TO'), ('Present', 'VB')]
I only want to extract [('Digital', 'NNP'), (',', ','), ('Data', 'NNP'), ('and', 'CC'), ('Technology', 'NNP'), ('Services', 'NNP')], not all the 'NNP' tags.
How can I do that?
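Because the tokens are already tagged, the wanted span can be picked out by grouping adjacent tokens: a run of proper nouns, optionally joined by commas or conjunctions, ends at any other tag such as '(' or 'IN'. The sketch below is plain Python; `noun_chunks` and its default tag sets are assumptions (add 'NN' to `noun_tags` if you also want common nouns). NLTK's `RegexpParser`, which chunks tagged tokens with a tag-pattern grammar, is another way to express the same rule.

```python
def noun_chunks(tagged, noun_tags=("NNP", "NNPS"), glue_tags=(",", "CC")):
    """Split a POS-tagged sentence into runs of proper nouns.

    A chunk is a run of noun_tags tokens, possibly joined by glue_tags
    tokens (commas, "and", "&"). Any other tag closes the current chunk,
    and glue left dangling at the end of a chunk is dropped.
    """
    chunks = []
    current = []

    def close_chunk():
        while current and current[-1][1] not in noun_tags:
            current.pop()  # drop trailing glue such as a final ","
        if current:
            chunks.append(list(current))
        current.clear()

    for word, tag in tagged:
        if tag in noun_tags:
            current.append((word, tag))
        elif tag in glue_tags and current:
            current.append((word, tag))  # may link two noun runs
        else:
            close_chunk()
    close_chunk()
    return chunks
```

With the first tagged list above, the first chunk is exactly the Digital ... Services span, followed by chunks for DDTS, Department, Environment ... Affairs and Defra; `" ".join(w for w, _ in chunk)` turns a chunk back into text.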

If anyone has any ideas or answers, I would appreciate it.
Thanks in advance.
