Channel: Data Science, Analytics and Big Data discussions - Latest topics

For loop in Python


@itijain wrote:

Hello,
I am learning to code, and my question might seem silly to most of you, but please help me out. I would be really thankful.
I have a dataframe of 3 lakh (300,000) rows and multiple columns.
Suppose my data looks like this:

ID        PAN            DOB
1           x              a
2           y              b
3           z              c
2           x              d

I want to find every row whose ID also appears in another row with a different PAN or DOB, and then store that ID and PAN/DOB in a separate dataframe.

I coded the following in Python:

import pandas as pd

clt = pd.read_csv("Sample client1.csv")

Id = clt.loc[:, "ID"]
dob = clt.loc[:, "DOB"]
pan = clt.loc[:, "PAN"]

errors = []   # Empty list to collect erroneous rows

def IDmatch(Id, dob, pan, i, j):
    # Same ID but a different PAN or DOB counts as an error
    if Id[i] == Id[j]:
        if dob[i] != dob[j] or pan[i] != pan[j]:
            errors.append({"Error ID": Id[i],
                           "Error type": "Same ID, different PAN or DOB",
                           "Error value": pan[i]})

n = len(clt)   # 300000 for the full file
for i in range(0, n):
    for j in range(i + 1, n):
        IDmatch(Id, dob, pan, i, j)

# Dataframe holding the erroneous data
dataerror = pd.DataFrame(errors, columns=["Error ID", "Error type", "Error value"])

The above code runs fine on a sample of 10 rows, but it never finishes on 3 lakh rows. Please suggest what I can do about this.
It would be of great help.
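
For context on the slowdown: comparing every pair of 300,000 rows means about 4.5 × 10^10 calls to IDmatch, which is why the nested loop does not finish. One possible way around it, sketched here and not verified against the real file, is to let pandas group the rows by ID and count the distinct PAN and DOB values in each group. The sketch assumes the column names ID, PAN and DOB from the sample table and uses that table as stand-in data. The names pan_counts, dob_counts and conflict are illustrative only. Note that nunique ignores missing values by default, so blank PAN or DOB cells would not be flagged.

```python
import pandas as pd

# Stand-in data copied from the sample table in the post;
# for the real file this would be clt = pd.read_csv("Sample client1.csv")
clt = pd.DataFrame({
    "ID":  [1, 2, 3, 2],
    "PAN": ["x", "y", "z", "x"],
    "DOB": ["a", "b", "c", "d"],
})

# For every row, count how many distinct PAN / DOB values share its ID
pan_counts = clt.groupby("ID")["PAN"].transform("nunique")
dob_counts = clt.groupby("ID")["DOB"].transform("nunique")

# A row is flagged when its ID maps to more than one PAN or more than one DOB
conflict = (pan_counts > 1) | (dob_counts > 1)

# Keep the flagged rows in a separate dataframe
dataerror = clt.loc[conflict, ["ID", "PAN", "DOB"]].reset_index(drop=True)
print(dataerror)
```

On the sample table this keeps the two rows with ID 2, since that ID appears with two different PAN values and two different DOB values. Unlike the pairwise loop, each conflicting row is listed once instead of once per mismatching pair.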

Posts: 3

Participants: 2
