MaxMind IP lookup using Redis

Like many others, I have benefited immensely from Stack Overflow. Recently I posted a question there about using Redis with MaxMind data and received many good responses. I’ve hacked up a working example in Python, reproduced below.


import redis
import pymysql

# Load the ISP ranges from MySQL, ordered by the end of each range
conn = pymysql.connect(host='localhost', user='user', passwd='password', db='foo')
cur = conn.cursor()
cur.execute('select startipnum,endipnum,isp from wiki.ipisp order by endipnum;')
result = cur.fetchall()
conn.close()

# decode_responses=True returns str instead of bytes under Python 3
r = redis.StrictRedis(host='localhost', port=6379, db=0, decode_responses=True)
for ispctr, row in enumerate(result, start=1):
    tempDict = {'ispname': row[2], 'fromval': row[0], 'toval': row[1]}
    namefield = str(ispctr)
    # One hash per range, plus a sorted-set member scored by the range's end IP
    # (redis-py 3.x signatures: hset(..., mapping=...) and zadd(name, {member: score}))
    r.hset(namefield, mapping=tempDict)
    r.zadd('ispmaxindex', {namefield: row[1]})

ipstotest = ['23.23.23.23','24.96.185.10','203.59.91.235','188.66.105.50','99.98.163.93']
for ip in ipstotest:
    ipvalsList = [int(ipoct) for ipoct in ip.split('.')]
    ipnum = (16777216*ipvalsList[0]) + (65536*ipvalsList[1]) + (256*ipvalsList[2]) + ipvalsList[3]
    # First range whose end IP is >= ipnum
    tempval1 = r.zrangebyscore('ispmaxindex', ipnum, float('Inf'), 0, 1)
    tempval2 = r.hgetall(tempval1[0])
    print(tempval2['ispname'])
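
The lookup above rests on one idea: if the ranges are indexed by their end address, the first entry whose end is at or above the queried IP is the only range that can contain it. Below is a minimal sketch of that idea in plain Python, without Redis. The sample ranges and ISP names are made up for illustration, and the helper names ip_to_int and lookup_isp are hypothetical. Unlike the script above, it also checks the start of the range, so an address that falls in a gap between ranges returns None.

```python
import bisect
import ipaddress


def ip_to_int(ip):
    """Convert a dotted-quad IPv4 string to its integer value."""
    return int(ipaddress.IPv4Address(ip))


# Hypothetical sample ranges (start, end, ISP name), sorted by end address.
# Real rows would come from the MaxMind table.
RANGES = [
    (ip_to_int("10.0.0.0"), ip_to_int("10.0.0.255"), "Example ISP A"),
    (ip_to_int("10.0.2.0"), ip_to_int("10.0.2.255"), "Example ISP B"),
    (ip_to_int("192.168.0.0"), ip_to_int("192.168.255.255"), "Example ISP C"),
]
END_KEYS = [end for _, end, _ in RANGES]


def lookup_isp(ip):
    """Return the ISP name whose range contains ip, or None if there is none."""
    ipnum = ip_to_int(ip)
    # Index of the first range whose end address is >= ipnum
    # (the same role zrangebyscore plays in the Redis version).
    idx = bisect.bisect_left(END_KEYS, ipnum)
    if idx == len(RANGES):
        return None
    start, _end, name = RANGES[idx]
    # Guard against gaps between ranges.
    return name if start <= ipnum else None


for candidate in ["10.0.0.17", "10.0.1.5", "192.168.3.4"]:
    print(candidate, lookup_isp(candidate))
```

The start-of-range check is a design choice worth carrying over to the Redis version if the table does not cover the whole address space.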

 

Path from root to leaf node in mvpart

I was recently asked by an R user how to extract the “rules” from a classification/regression tree. The requirement was to trace the path from the root node to each leaf node and collect all of these paths, or “rules”.

The path.rpart() function in the mvpart package provides this convenience:


library(mvpart)

# Create a regression tree (Ozone is a numeric response)
ozone <- mvpart(Ozone ~ ., data=airquality)
print(ozone) # Gives you the various splits in the tree

# Issue the two commands below to see the graphical representation
plot(ozone)
text(ozone)

# To obtain the summary of the created tree
summary(ozone)

# To obtain the path to the leaf nodes
leafnodeRows <- grepl("leaf",ozone$frame$var)
nodevals <- as.numeric(rownames(ozone$frame)[leafnodeRows])
rules <- path.rpart(ozone,nodevals)
rulesdf <- do.call("rbind",lapply(rules,function(x)paste(x,collapse = " -AND- ")))
rulesdf <- data.frame(nodeNumber=rownames(rulesdf),rule=rulesdf[,1],stringsAsFactors=FALSE)
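
The node numbers used above follow a fixed scheme in rpart-style trees, as far as I know: the root is node 1, and the children of node n are numbered 2n and 2n+1. The sketch below, in Python rather than R, uses that scheme to rebuild a root-to-leaf rule from a leaf's node number. The split labels are invented for illustration and the function names are hypothetical; path.rpart() remains the way to get the real rules from a fitted tree.

```python
def ancestors(node):
    """Return the node numbers on the path from the root (1) down to node."""
    path = []
    while node >= 1:
        path.append(node)
        node //= 2  # the parent of node n is n // 2
    return path[::-1]


# Hypothetical split labels keyed by node number (made up for illustration).
SPLITS = {
    1: "root",
    2: "Temp< 82.5",
    3: "Temp>=82.5",
    4: "Wind>=7.15",
    5: "Wind< 7.15",
}


def rule_for(node):
    """Join the split labels along the path to node into one rule string."""
    return " -AND- ".join(SPLITS[n] for n in ancestors(node))


print(ancestors(13))
print(rule_for(5))
```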

GUI for sending email in R (using sendEmail)

After writing the last post on using sendEmail to send email from R I decided to create a simple GUI to enable this functionality.
A snapshot image of the GUI is shown above.
To use this GUI, you will need to install the following packages in R:

More information on installing gWidgets can be found in the package vignette found here

Once you have downloaded the Windows GTK Bundle, unzip it and add the included bin folder to your PATH. The README file advises against using WinZip to unzip the GTK Bundle, so you could use WinRAR instead.

I had to restart R a few times for the installations and PATH changes to take effect.

The R script that creates the GUI is hosted as a download on code.google.com

You can download it from this link

Alternately, issuing the command below will also work.

source("http://sendemail-gui-r.googlecode.com/files/GUI_sendEmail.R")

The GUI is very simplistic for now and I will continue to develop it and post updates.
I’m trying to familiarize myself with Git so that I can publish new versions and manage the code more effectively in the near future.

Sending Email from R (using sendEmail)

Like a lot of other R users, I’ve felt the need to send email from R. I haven’t surveyed CRAN for such a package, but I looked into sending email from the command line in Windows.

I found a nice application called sendEmail, which can be found here

Below are code snippets in R that will allow you to make use of this application.

First, download the application from the link provided above. Get the TLS-supported variant. Unzip it and add the location of the sendEmail executable file to your PATH variable.

Create a text file with some sample content, place it in C:/ and name it “filetest.txt”. This is the file used as a sample attachment in the code below.

I am assuming you’re using Gmail and have a username: xyz@gmail.com with password: xyz.
Please make appropriate changes to test for your credentials.

In the sendEmail folder, README.txt documents the command line parameters (see section ‘Usage Overview’). The code snippet below simply builds the relevant parameter-value pairs and uses system() to issue a system call.

# Create the required command line parameters for sendEmail to work
paramsList <- list()
paramsList$fromAddress <- c("-f",'xyz@gmail.com')
paramsList$toAddress <- c("-t",'xyz@gmail.com')
paramsList$emailSubject <- c("-u","Test Email from R")
paramsList$listemailMessage <- c("-m",paste("Sent at",format(Sys.time(),"%Y-%m-%d:%H-%M-%S"),sep=" "))
paramsList$serverAndPort <- c("-s","smtp.gmail.com:587")
paramsList$fileAttachPath <- c("-a","C:/filetest.txt")
paramsList$accUsername <- c("-xu","xyz@gmail.com")
paramsList$accPassword <- c("-xp","xyz")

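# Add double quotes to the parameter values
# (shQuote gives plain ASCII quotes; dQuote may emit curly quotes that CMD cannot parse)
paramsList1 <- lapply(paramsList,function(x){x[2] <- shQuote(x[2],type="cmd");paste(x,collapse = " ")})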

# Combine to create one single function call
suffixCall <- paste(do.call("c",paramsList1),collapse = " ")
commandCall <- paste("sendEmail",suffixCall,sep = " ")
# Issue the command via system(), sending it to CMD
returnVal <- system(commandCall,intern=TRUE,wait=TRUE)
print(returnVal)

You could use the R2HTML package in R to send a model summary report in HTML, as below:

# ---------
library(R2HTML)
data(iris)
# From the help files of R2HTML package
tmpfic <- HTMLInitFile(outdir="C:/",filename="regressionTest")
HTML(as.title("Regression Example"),file=tmpfic)
HTML(summary(lm(Sepal.Length~Sepal.Width+ Petal.Length + Petal.Width,data=iris)),file=tmpfic)
HTMLEndFile(tmpfic)
browseURL(tmpfic) # View the file in your local browser

# Update the paramsList appropriately
paramsList$fileAttachPath <- c("-a",tmpfic)

# Recreate the new command and execute
paramsList1 <- lapply(paramsList,function(x){x[2] <- shQuote(x[2],type="cmd");paste(x,collapse = " ")})
suffixCall <- paste(do.call("c",paramsList1),collapse = " ")
commandCall <- paste("sendEmail",suffixCall,sep = " ")
returnVal <- system(commandCall,intern=T,wait=T)
print(returnVal)

The solution presented here is not a pure R implementation and could be considered more of a hack.
Thanks to Brandon Zehm for creating sendEmail. More information about Brandon Zehm can be found here

Finding functions in R

When looking for functions whose exact name is unknown, help.search() is a good starting point:

# Functions related to "shrinkage" methods
help.search("shrinkage")

The sos package does a great job of finding functions:

install.packages("sos")
library(sos)
shrinkageResults <- findFn("shrinkage", maxPages = 1)
shrinkageResults
# This opens a webpage in your browser with the results

The table in the webpage created above has sortable columns.

# Sort on creation date of resultant packages
summaryObj <- summary(shrinkageResults)
pkgTable <- summaryObj$PackageSummary
pkgTable <- pkgTable[order(pkgTable$Date,decreasing=T),]

Many more R specific search options here

Missing values and column types when reading data into R

Reading data into R often means dealing with column types and with values that need to be treated as NA.

Below are code snippets that introduce a few arguments of the read.csv function in R:

# Create sample data
strVals <- do.call("c",lapply(1:1000,function(x)paste(sample(letters,sample(5:20,1)),collapse="")))
miscVals <- sample(c("","999","—-","MISS"),100,replace=T)
numVals <- rnorm(1000)

# Scenario 1 : Pure numeric and strings
dataTemp<-data.frame(numericVals = numVals, stringVals = strVals)
write.csv(dataTemp,file="inputData.csv",quote=F,row.names=F)
inData <- read.csv("inputData.csv",header=T)
sapply(inData,class)
# Col: stringVals is type factor (in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE)

# Using the function argument stringsAsFactors = FALSE prevents character columns
# from being turned into factors
inData <- read.csv("inputData.csv",header=T,stringsAsFactors=FALSE)
sapply(inData,class)

# Using function argument colClasses
# predefine the column types in the input file
inData <- read.csv("inputData.csv",header=T,colClasses = c("numeric","character"))
sapply(inData,class)

# If you have data values that need to be considered as NA
# Add values from miscVals ( "","999","—-","MISS" ) to numVals and strVals
numMiscVals <- sample(c(numVals,miscVals),1000)
strMiscVals <- sample(c(strVals,miscVals),1000)

dataTemp<-data.frame(numericVals = numMiscVals, stringVals = strMiscVals)
write.csv(dataTemp,file="inputData.csv",quote=F,row.names=F)
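# Without na.strings the placeholder values survive and numericVals is read as character
inData <- read.csv("inputData.csv",header=T,stringsAsFactors=FALSE)
sapply(inData,class)
sum(c("","999","—-","MISS") %in% inData$numericVals) > 0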

# Use na.strings argument
inData <- read.csv("inputData.csv",header=T,stringsAsFactors=FALSE,na.strings = c("","999","—-","MISS"))
sapply(inData,class)
# The columns have the right type numericVals is numeric and stringVals is character
sum(c("","999","—-","MISS") %in% inData$numericVals)
# should return 0

Setting up AWS Cluster to use snow in R

Setting up AWS Cluster

I wanted to set up an AWS cluster to take a shot at a Kaggle contest, the DunnHumby Challenge:

http://www.kaggle.com/c/dunnhumbychallenge

For this, I found StarCluster to be of great help. It lets you set up AWS nodes in a few lines of code and does much more, such as choosing AMIs and cluster configurations.

http://web.mit.edu/stardev/cluster/

Make sure you use the Bioconductor AMI which comes bundled with R and a host of installed packages.

http://www.bioconductor.org/help/bioconductor-cloud-ami/

I used the package “snowfall” for parallel processing.

Relevant Stack Overflow questions I asked:

http://stackoverflow.com/questions/7241244/using-aws-for-parallel-processing-with-r

http://stackoverflow.com/questions/7333801/using-snow-and-snowfall-with-aws-for-parallel-processing-in-r
