Channel: Data Science, Analytics and Big Data discussions - Latest topics

Data Visualization Using ggplot2


@Shaik.1290 wrote:

Hello Guys,

I have loaded the Forbes data from the URL "https://www.forbes.com/powerful-brands/list/#tab:rank".
I have followed the data cleaning steps and am now trying to visualize the data using ggplot2.

I have written the code below:

p<-ggplot(BrandsNew1,aes(x=Company.Advertising, y=Brand.Revenue))+
geom_point(aes(col=Industry,size = Brand.Value))

You can check the output by running the code in R (I haven't uploaded it because the system allows only one picture).

But I am trying to get the output shown below.

I know that my code is completely wrong, as I am still in the early stages of learning R.

In the first output you can see that BrandsNew1$Brand.Value shows all of its factor levels, but I need only the Technology level, with the brand name next to each point of the scatter plot, as in the second output.

Could you please help me get this output? You may also use the code below for your own use.

library(XML)
library(RCurl)
u2 <- paste(readLines("D:/Praxis/Work/R work/forbes1.html"), collapse = "\n")
class(u2)
str(u2)

brand1<- readHTMLTable(u2)
str(brand1)

# Convert to a data.frame

Brands <- as.data.frame(brand1$the_list)
View(Brands)

# Remove the first column

Brands1<-Brands[,-1]
str(Brands1)

# Remove rows that contain NA

omit_brand <- na.omit(Brands1)

write.csv(omit_brand,"BrandsNew.csv",row.names = FALSE)

# Test of row.names = TRUE vs FALSE: with TRUE, write.csv adds an extra column that numbers all the observations

write.csv(omit_brand,"BrandsNewtest.csv",row.names = TRUE)
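
The effect of row.names described above can be checked on a small made-up data frame. This is only a sketch: the data frame `demo` and the temporary files are illustrative assumptions and have nothing to do with the Forbes data.

```r
# Toy data frame, purely illustrative (not Forbes data)
demo <- data.frame(Brand = c("A", "B", "C"), Value = c(1.5, 2.0, 3.25))

# Temporary files so nothing is written to the working directory
f_without <- tempfile(fileext = ".csv")
f_with <- tempfile(fileext = ".csv")

write.csv(demo, f_without, row.names = FALSE)
write.csv(demo, f_with, row.names = TRUE)

# Compare the column names after reading both files back
names(read.csv(f_without))  # only the original columns
names(read.csv(f_with))     # one extra leading column holding the row numbers
```

Writing with row.names = FALSE avoids carrying that extra column into the next read.csv step.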

# Load the BrandsNew data and read "-" values as NA

BrandsNew <- read.csv("BrandsNew.csv", na.strings = c("-"))
str(BrandsNew)

# Remove # from the Rank vector

BrandsNew$Rank <- gsub("#","",BrandsNew$Rank)
str(BrandsNew$Rank)
unique(BrandsNew$Rank)

# Since BrandsNew$Rank is only a rank, I haven't converted it to numeric

# Remove $ and B from Brand.Value

BrandsNew$Brand.Value <- gsub("[$]","",BrandsNew$Brand.Value)

BrandsNew$Brand.Value <- gsub("B","",BrandsNew$Brand.Value)

BrandsNew$Brand.Value <- as.numeric(BrandsNew$Brand.Value)
class(BrandsNew$Brand.Value)

str(BrandsNew)
View(BrandsNew)

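# Check for missing values (NA)

is.na(BrandsNew$Company.Advertising)

# na.rm is an argument of functions such as sum() and mean(), not a function itself; count the NAs instead

sum(is.na(BrandsNew$Company.Advertising))

# Remove NA observations using na.omit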

BrandsNewOmit <- as.data.frame(na.omit(BrandsNew$Company.Advertising))
str(BrandsNewOmit)

# Remove rows with NA; two copies of the dataset are created for reference

BrandsNew1 <- na.omit(BrandsNew)
BrandsNew2 <- na.omit(BrandsNew)

View(BrandsNew1)
View(BrandsNew2)

# Convert millions to billions in the Company.Advertising column

# which(grepl(...)) gives the positions of the values that contain M (millions)

index = which(grepl("M",BrandsNew1$Company.Advertising))

# Same check for B (billions)

indextest = which(grepl("B",BrandsNew1$Company.Advertising))

# Remove M, B, $ using gsub

BrandsNew1$Company.Advertising = gsub("M","",BrandsNew1$Company.Advertising)
BrandsNew1$Company.Advertising = gsub("B","",BrandsNew1$Company.Advertising)
BrandsNew1$Company.Advertising = gsub("[$]","",BrandsNew1$Company.Advertising)

class(BrandsNew1$Company.Advertising)

# Convert it to numeric

BrandsNew1$Company.Advertising <- as.numeric(BrandsNew1$Company.Advertising)
class(BrandsNew1$Company.Advertising)

# Now convert millions to billions

BrandsNew1$Company.Advertising[index] = BrandsNew1$Company.Advertising[index]/1000
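
The strip-and-rescale steps above could also be wrapped in one small function, so the same logic can be reused for other money columns. This is a minimal sketch: `to_billions` is a hypothetical helper name, and the input strings are made-up examples that assume values look like "$1.2 B" or "$800 M".

```r
# Hypothetical helper: convert strings such as "$1.2 B" or "$800 M"
# to plain numbers expressed in billions.
to_billions <- function(x) {
  x <- as.character(x)
  # Remember which entries are in millions BEFORE the unit letter is removed
  is_million <- grepl("M", x, fixed = TRUE)
  # Drop $, the unit letters, commas and spaces, then convert to numeric
  value <- as.numeric(gsub("[$MB, ]", "", x))
  # Millions are divided by 1000; billions are left unchanged
  ifelse(is_million, value / 1000, value)
}

# Made-up example values, not taken from the Forbes table
to_billions(c("$1.2 B", "$800 M", "$3 B"))
```

The important detail is the order: the positions of the M values have to be recorded before gsub removes the letter, otherwise there is nothing left to search for.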

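# Brand.Revenue: record which values are in millions before stripping the unit letters

indexM = which(grepl("M",BrandsNew1$Brand.Revenue))

# Remove M, B, $ using gsub

BrandsNew1$Brand.Revenue = gsub("M","",BrandsNew1$Brand.Revenue)
BrandsNew1$Brand.Revenue = gsub("B","",BrandsNew1$Brand.Revenue)
BrandsNew1$Brand.Revenue = gsub("[$]","",BrandsNew1$Brand.Revenue)

# Convert to numeric, then millions to billions

BrandsNew1$Brand.Revenue <- as.numeric(BrandsNew1$Brand.Revenue)
BrandsNew1$Brand.Revenue[indexM] = BrandsNew1$Brand.Revenue[indexM]/1000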

View(BrandsNew1)

# Visualization

library(ggplot2)

plot(x=BrandsNew1$Company.Advertising,y=BrandsNew1$Brand.Revenue)

# geom_point is used for scatter plots
# Company.Advertising and Brand.Revenue must both be numeric for the continuous scales below

p<-ggplot(BrandsNew1,aes(x=Company.Advertising, y=Brand.Revenue))+
geom_point(aes(col=Industry,size = Brand.Value))+
scale_x_continuous(breaks = seq(0.8,5.4,0.1))+
scale_y_continuous(breaks = seq(40,170,10))

plot(p)
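
One possible way to get a Technology-only scatter plot with a brand name next to each point is to filter the rows first and then add a text layer. This is a rough sketch under stated assumptions: the rows in `brands` are made up for illustration, and the `Brand` column name is an assumption about what the cleaned Forbes table contains.

```r
library(ggplot2)

# Made-up rows for illustration only; these are not real Forbes figures.
# The column name Brand is an assumption about the cleaned table.
brands <- data.frame(
  Brand = c("A", "B", "C", "D"),
  Industry = c("Technology", "Technology", "Automotive", "Technology"),
  Company.Advertising = c(1.2, 3.4, 2.0, 0.9),
  Brand.Revenue = c(60, 150, 90, 45),
  Brand.Value = c(30, 120, 25, 15)
)

# Keep only the Technology rows
tech <- subset(brands, Industry == "Technology")

# Bare column names inside aes(); ggplot looks them up in the data argument
p_tech <- ggplot(tech, aes(x = Company.Advertising, y = Brand.Revenue)) +
  geom_point(aes(size = Brand.Value), colour = "steelblue") +
  geom_text(aes(label = Brand), vjust = -1) +  # brand name just above each point
  labs(title = "Technology", x = "Company Advertising",
       y = "Brand Revenue", size = "Brand Value")

print(p_tech)
```

With the real data, `brands` would be replaced by BrandsNew1, provided its advertising, revenue and value columns have already been converted to numeric.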


Thanks & Regards,
Shaik

Posts: 1

Participants: 1
