@Shaik.1290 wrote:
Hello Guys,
I have loaded the Forbes data from the URL "https://www.forbes.com/powerful-brands/list/#tab:rank".
I have followed the data-cleaning steps and am now trying to visualize the data with ggplot2. This is the code I have written:
p <- ggplot(BrandsNew1, aes(x = BrandsNew1$Company.Advertising, y = BrandsNew1$Brand.Revenue)) +
  geom_point(aes(col = BrandsNew1$Industry, size = BrandsNew1$Brand.Value))
You can see the output by running the code in R (I haven't uploaded it because the system allows only 1 picture).
However, I am trying to get the output shown below.
I know my code is probably quite wrong; I am still in the early stages of learning R.
In the 1st output you can see that BrandsNew1$Brand.Value is displayed for all factor levels, but I need only the Technology level, with the brand name shown next to each point, as in the 2nd output.
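To make the target concrete, here is a rough sketch of the kind of plot I mean. It is only an illustration: the toy data frame and its values are made up, and the `Brand` column name is an assumption (I am assuming the real table has a column holding the brand names). It has not been run against the Forbes data.

```r
library(ggplot2)

# Hypothetical toy data standing in for BrandsNew1; all values are made up.
brands <- data.frame(
  Brand = c("Alpha", "Beta", "Gamma", "Delta"),
  Industry = c("Technology", "Technology", "Automotive", "Technology"),
  Company.Advertising = c(1.8, 3.2, 4.3, 0.9),
  Brand.Revenue = c(170, 80, 150, 60),
  Brand.Value = c(150, 70, 40, 30)
)

# Keep only the Technology rows before plotting.
tech <- subset(brands, Industry == "Technology")

# One point per brand, sized by brand value, with the brand name as a label.
p_tech <- ggplot(tech, aes(x = Company.Advertising, y = Brand.Revenue)) +
  geom_point(aes(size = Brand.Value), colour = "steelblue") +
  geom_text(aes(label = Brand), vjust = -1) +
  labs(title = "Technology", x = "Company Advertising", y = "Brand Revenue")
print(p_tech)
```

Filtering the data frame first, rather than inside `aes()`, keeps the legend and axes limited to the rows that are actually drawn.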
Could you please help me get this output? You are welcome to use the code below yourselves.
library(XML)
library(RCurl)
# Forward slashes avoid invalid "\P"-style escape sequences in the Windows path
u2 <- paste(readLines("D:/Praxis/Work/R work/forbes1.html"), collapse = "\n")
class(u2)
str(u2)
brand1 <- readHTMLTable(u2)
str(brand1)
# Convert to a data.frame
Brands <- as.data.frame(brand1$the_list)
View(Brands)
# Remove the first column
Brands1 <- Brands[, -1]
str(Brands1)
# Remove rows that contain NA
omit_brand <- na.omit(Brands1)
write.csv(omit_brand,"BrandsNew.csv",row.names = F)
# Test row.names = TRUE vs FALSE: with TRUE, write.csv adds an extra first column
# that holds the row names (here, the observation numbers)
write.csv(omit_brand,"BrandsNewtest.csv",row.names = T)
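As a small illustration of the row.names difference noted above, this sketch writes a made-up two-row data frame to the console instead of a file and compares the header lines. The names `df`, `with_names` and `without_names` are invented for the example.

```r
# Made-up example data; not part of the Forbes table.
df <- data.frame(Brand = c("A", "B"), Value = c(1, 2))

# With no file argument, write.csv() prints to the console,
# so capture.output() can collect the lines it would have written.
with_names <- capture.output(write.csv(df, row.names = TRUE))
without_names <- capture.output(write.csv(df, row.names = FALSE))

cat(with_names[1], "\n")     # header starts with an empty "" column for the row names
cat(without_names[1], "\n")  # header has only the real columns
```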
# Load the BrandsNew data and change "-" values to NA
BrandsNew <- read.csv("BrandsNew.csv", na.strings = c("-"))
str(BrandsNew)
# Remove "#" from the Rank vector
BrandsNew$Rank <- gsub("#","",BrandsNew$Rank)
str(BrandsNew$Rank)
unique(BrandsNew$Rank)
# Rank is left as it is; I haven't converted it to numeric
# Remove "$" and "B" from Brand.Value
BrandsNew$Brand.Value <- gsub("[$]","",BrandsNew$Brand.Value)
BrandsNew$Brand.Value <- gsub("B","",BrandsNew$Brand.Value)
BrandsNew$Brand.Value <- as.numeric(BrandsNew$Brand.Value)
class(BrandsNew$Brand.Value)
str(BrandsNew)
View(BrandsNew)
# Check for missing values (NA)
is.na(BrandsNew$Company.Advertising)
sum(is.na(BrandsNew$Company.Advertising))  # count of NAs (na.rm is an argument, not a function)
# Remove NA observations using na.omit
BrandsNewOmit <- as.data.frame(na.omit(BrandsNew$Company.Advertising))
str(BrandsNewOmit)
# Remove NA rows; created 2 datasets for reference
BrandsNew1 <- na.omit(BrandsNew)
BrandsNew2 <- na.omit(BrandsNew)
View(BrandsNew1)
View(BrandsNew2)
# Try to convert millions to billions in the Company.Advertising column
# grepl() flags the elements that contain "M"; which() returns their positions
index = which(grepl("M",BrandsNew1$Company.Advertising))
# Same check for "B" (billions)
indextest = which(grepl("B",BrandsNew1$Company.Advertising))
# Remove M, B and $ using gsub()
BrandsNew1$Company.Advertising = gsub("M","",BrandsNew1$Company.Advertising)
BrandsNew1$Company.Advertising = gsub("B","",BrandsNew1$Company.Advertising)
BrandsNew1$Company.Advertising = gsub("[$]","",BrandsNew1$Company.Advertising)
class(BrandsNew1$Company.Advertising)
# Convert to numeric
BrandsNew1$Company.Advertising <- as.numeric(BrandsNew1$Company.Advertising)
class(BrandsNew1$Company.Advertising)
# Now convert millions to billions
BrandsNew1$Company.Advertising[index] = BrandsNew1$Company.Advertising[index]/1000
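The strip-then-rescale steps above could perhaps be wrapped in one helper so the same logic can be reused for other money columns. This is only a sketch: `to_billions` is a hypothetical name, the sample strings are made up, and it assumes every value looks like "$3.2 B" or "$500 M".

```r
# Hypothetical helper (not from the script above): turn strings such as
# "$3.2 B" or "$500 M" into a numeric amount expressed in billions.
to_billions <- function(x) {
  x <- as.character(x)                                # also handles factor columns
  is_million <- grepl("M", x, fixed = TRUE)           # remember which values were millions
  value <- as.numeric(gsub("[$MB[:space:]]", "", x))  # strip $, M, B and spaces
  ifelse(is_million, value / 1000, value)             # rescale millions to billions
}

adv <- c("$3.2 B", "$500 M", NA, "$1.8 B")  # made-up sample values
to_billions(adv)
```

Recording which values carry an "M" before stripping the suffix is the important part; once the letters are removed, millions and billions can no longer be told apart.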
# Check for million ("M") values in Brand.Revenue before stripping the suffix
indexM = which(grepl("M",BrandsNew1$Brand.Revenue))
# Remove M, B and $ using gsub(), then convert to numeric
BrandsNew1$Brand.Revenue = gsub("M","",BrandsNew1$Brand.Revenue)
BrandsNew1$Brand.Revenue = gsub("B","",BrandsNew1$Brand.Revenue)
BrandsNew1$Brand.Revenue = gsub("[$]","",BrandsNew1$Brand.Revenue)
BrandsNew1$Brand.Revenue <- as.numeric(BrandsNew1$Brand.Revenue)
# Convert any million values to billions (same as for Company.Advertising)
BrandsNew1$Brand.Revenue[indexM] = BrandsNew1$Brand.Revenue[indexM]/1000
View(BrandsNew1)
# Visualization
library(ggplot2)
plot(x=BrandsNew1$Company.Advertising,y=BrandsNew1$Brand.Revenue)
# geom_point is used for scatter plots
p <- ggplot(BrandsNew1, aes(x = Company.Advertising, y = Brand.Revenue)) +
  geom_point(aes(col = Industry, size = Brand.Value)) +
  # both axes are numeric, so use continuous scales with explicit breaks
  scale_x_continuous(breaks = seq(0.8, 5.4, 0.1)) +
  scale_y_continuous(breaks = seq(40, 170, 10))
plot(p)
Thanks & Regards,
Shaik