@mrinmayk wrote:
Please help me solve this basic data science question, asked in a screening interview at Integrate.AI (at least one part):
Suppose we have a data set with two variables: type of injury (categorical) and description (string). We want to predict the type of injury for new data given only the description. We get a new description that contains the word “swelling”. Our model, built from a very large training sample, tells us that the only two types of injuries that can produce the word “swelling” are “Burn”, which occurs in 1 out of 10 observations, and “Bruise”, which occurs in 1 out of 100 observations. A “Bruise” observation has a 30% chance of generating the word “swelling”, while “Burn” has only a 5% chance of generating the word “swelling”.
Q1. Without any other information, is the new observation with the word “swelling” more likely to be a burn or a bruise? What is the probability of either?
Q2. What is the probability of at least 2 bruises, given that 6 observations have descriptions that contain the word “swelling”?
I solved it as follows -
GIVEN-
Burn - 1 out of 10 observations
Bruise - 1 out of 100 observations
Bruise - 30% chance of generating the word “swelling”
Burn - 5% chance of generating the word “swelling”
CALCULATION 1 -
So,
0.01 * 0.3 = 0.003 = P(bruise)
0.1 * 0.05 = 0.005 = P(burn)
Now, 6 observations have the word “swelling”.
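As a sanity check on Calculation 1, here is a minimal Python sketch of Bayes' rule using only the numbers given in the question. Note that 0.003 and 0.005 are joint probabilities (injury and “swelling” together), so they have to be divided by their sum to get the probability of each injury given “swelling”. The variable names are my own, and the sketch assumes that burn and bruise are the only injuries that can produce the word, as the question states.

```python
# Priors and likelihoods exactly as stated in the question.
prior = {"burn": 0.10, "bruise": 0.01}        # P(injury)
likelihood = {"burn": 0.05, "bruise": 0.30}   # P("swelling" | injury)

# Joint probabilities P(injury and "swelling"): 0.005 and 0.003.
joint = {k: prior[k] * likelihood[k] for k in prior}

# Normalise over the only two injuries that can produce "swelling".
evidence = sum(joint.values())                 # P("swelling") = 0.008
posterior = {k: joint[k] / evidence for k in joint}

for injury, p in posterior.items():
    print(f"P({injury} | swelling) = {p:.3f}")
# burn comes out at 0.625 and bruise at 0.375
```

So under these assumptions a burn is the more likely of the two, even though a bruise is more likely to generate the word.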
CALCULATION 2 -
P(at least 2 bruises) = P(2 bruises, 4 burns) + P(3 bruises, 3 burns) + P(4 bruises, 2 burns) + P(5 bruises, 1 burn) + P(6 bruises, 0 burns)
= (9 * 625 + 27 * 125 + 81 * 25 + 243 * 5 + 729) / (1000 * 1000)
Is it correct? I believe nCr has to be used.
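For Calculation 2, one possible way to bring in nCr is to treat the number of bruises among the 6 “swelling” observations as binomial. The sketch below assumes the 6 observations are independent (the question does not say so explicitly) and uses P(bruise | swelling) = 0.003 / (0.003 + 0.005) = 0.375 for each one. The helper `binom_pmf` is my own name, not from any library.

```python
from math import comb

# Assumption: each of the 6 observations is independently a bruise
# with probability P(bruise | "swelling") = 0.003 / 0.008 = 0.375.
p_bruise = 0.003 / (0.003 + 0.005)
n = 6

def binom_pmf(k, n, p):
    """Probability of exactly k bruises among n observations."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# P(at least 2) = 1 - P(0 bruises) - P(1 bruise)
p_at_least_2 = 1 - binom_pmf(0, n, p_bruise) - binom_pmf(1, n, p_bruise)

# Equivalent direct sum over k = 2..6, each term weighted by nCr.
p_direct = sum(binom_pmf(k, n, p_bruise) for k in range(2, n + 1))

print(round(p_at_least_2, 4))  # 0.7258
```

Using the complement keeps the arithmetic to two terms, and the direct sum is there only to show where the nCr factors would enter the expression above.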
Please help.
Thanks & Best