@itijain wrote:
Hello,
I am learning to code, and my question may seem silly to most of you, but please help me out. I would be really thankful.
I have a dataframe of 3 lakh (300,000) rows and multiple columns.
Suppose my data looks like this:

    ID  PAN  DOB
    1   x    a
    2   y    b
    3   z    c
    2   x    d
I want to find every row whose ID also appears in another row with a different PAN or DOB, and store that ID and its PAN/DOB in a separate dataframe.
I wrote the following in Python:

    import pandas as pd

    clt = pd.read_csv("Sample client1.csv")
    Id = clt.loc[:, "ID"]
    dob = clt.loc[:, "DOB"]
    pan = clt.loc[:, "PAN"]
    # Empty dataframe to store erroneous data
    dataerror = pd.DataFrame(columns=["Error ID", "Error type", "Error value"])

    def IDmatch(Id, dob, pan, i, j):
        # Record row i when rows i and j share an ID but differ in PAN or DOB
        if Id[i] == Id[j]:
            if dob[i] != dob[j] or pan[i] != pan[j]:
                # Add one row at the end of dataerror
                dataerror.loc[len(dataerror)] = [
                    Id[i], "Same ID, different PAN or DOB", clt.PAN[i]]

    n = len(clt)  # about 300,000 rows in the full file
    # Compare every pair of rows
    for i in range(0, n):
        for j in range(i + 1, n):
            IDmatch(Id, dob, pan, i, j)
The above code runs fine on a sample of 10 rows, but it never finishes on the full 3 lakh rows. Please suggest what I can do about this.
It would be of great help.
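
For context on the slowdown: comparing every pair of 300,000 rows means 300,000 × 299,999 / 2, roughly 4.5 × 10^10 calls to the comparison function, which is why the loop does not finish. A possible way around it, offered only as a sketch, is to let pandas group the rows by ID and count the distinct PAN and DOB values per group. The version below assumes the column names ID, PAN and DOB from the post, builds the four-row sample inline instead of reading the CSV, and assumes that keeping every row of a conflicting ID is acceptable (the original loop stores only the first row of each conflicting pair). Note that `nunique` ignores missing values by default.

```python
import pandas as pd

# Sample data from the post; the real data would come from pd.read_csv(...)
clt = pd.DataFrame({
    "ID":  [1, 2, 3, 2],
    "PAN": ["x", "y", "z", "x"],
    "DOB": ["a", "b", "c", "d"],
})

# Count the distinct PAN and DOB values for each ID in a single pass
distinct = clt.groupby("ID")[["PAN", "DOB"]].nunique()

# IDs that map to more than one PAN or more than one DOB
bad_ids = distinct.index[(distinct["PAN"] > 1) | (distinct["DOB"] > 1)]

# All rows belonging to those IDs, stored in a separate dataframe
dataerror = clt[clt["ID"].isin(bad_ids)].reset_index(drop=True)
print(dataerror)
```

On the sample data this keeps the two rows with ID 2, since that ID appears with two different PAN values and two different DOB values. The grouping avoids the pairwise comparison entirely, so the same code should scale to the full file far better than the nested loop.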