Writing an IP Address Information Webservice in Ruby
or Writing an XML-RPC Webservice using Ruby and MySQL that can be used to determine Country Information from IP Addresses and to Impress the Opposite Sex, Along the Way Learning How RIPE Assigns Blocks of Addresses, How to Access Webservices from Javascript and More Three Letter Acronyms Than You Ever Cared to Know
In case the slightly baroque title hasn't given you a clue about the content of this article, I describe the implementation of an XML-RPC webservice programmed in Ruby, which provides information about IP addresses.
You might find this interesting if you're trying to find information about determining the country of origin of a specific IP address, or if you're looking for an example of how to implement XML-RPC webservices in Ruby.
The webservice itself could be useful to determine where visitors to your website are coming from by looking up the IP addresses in your log files. I also explain how you can use this webservice directly from Javascript applications on your site by using the jsRPC library.
The service exists as described and is located on this
(www.kuriositaet.de) server at
/ip/ip_ws.rb. The XML-RPC method name for the service is getIPInfo
and it returns the following struct:
registry => where this IP is registered (e.g. ARIN, RIPE ...)
country => two letter ISO 3166 country code
status => one of "ASSIGNED" or "ALLOCATED"
In case any piece of information is unknown, a "?" is returned in its
place. In case of invalid IP addresses, a fault is generated.
Where to get the information?
Strangely enough, finding out where to obtain the definitive information about IP space allocation was nearly the most difficult part of the whole project.
How IP Addresses are Allocated
IANA (the Internet Assigned Numbers Authority) allocates IP Addresses to Regional Internet Registries (RIRs), who in turn assign addresses to ISPs (Internet Service Providers) acting as LIRs (Local Internet Registries) which assign the addresses to their customers. How IANA divvied up the IPv4 address space is described in this document: ftp://ftp.iana.org/assignments/ipv4-address-space.
Presently, there are five RIRs:
AfriNIC based in Mauritius and responsible for Africa.
APNIC, located in Brisbane, Australia and responsible for the Asia Pacific region.
ARIN (American Registry for Internet Numbers) in Virginia, responsible for North America.
LACNIC located in Uruguay, responsible for Latin America.
RIPE NCC (Réseaux IP Européens) located in the Netherlands and responsible for Europe, the Middle East and Central Asia.
These five regional registries have formed the NRO (Number Resource Organization) to coordinate their efforts.
How Allocation Data is Published
Information about address allocation is published in the RIR Statistics
Exchange Format. In a nutshell, each registry publishes a
file containing their allocations:
Each file is called delegated-<registry>-yyyymmdd
The <registry> value follows the internal record format and is
one of the specified strings from the set:
{apnic,arin,iana,lacnic,ripencc};
(...)
The most recent file will also be available under a name of
the form delegated-<registry>-latest. This can be a symbolic
or hard link to the delegated-<registry>-yyyymmdd file but must
be pointed at the most recent file when it is updated.
Each RIR will make its files available in a standard ftp
directory, defined as /stats/<registry>/*.
Each RIR also mirrors the data from all the other registries, so it's only necessary to connect to a single server.
Downloading the Files
With the above information, it's easy to download the files using Ruby.
FTP downloading is implemented in net/ftp, which is part of Ruby's
standard library.
First, we cobble together a list of files to download:
file_names=[
"afrinic",
"apnic",
"arin",
"lacnic",
"ripencc"
]
file_names.map! {|file| "/pub/stats/#{file}/delegated-#{file}-latest"}
All that remains to be done is to pick a server to download
the files from and provide a local directory to copy the files to. I'm
in Europe, so I'll download from RIPE:
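require "net/ftp"

url = "ftp.ripe.net"
localdir = "tmp"
Net::FTP.open(url) { |ftp|
  ftp.login   # anonymous login
  file_names.each { |file|
    # local name: everything after the last slash of the remote path
    ftp.get(file, localdir+"/"+file.slice(/[^\/]*$/))
  }
}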
For the sake of clarity, the example leaves out all checks to make sure the local directory exists, error handling, and all the other stuff that programming is actually about.
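For the curious, here is a rough sketch of what some of that missing "other stuff" might look like. The helper names download_stats and local_name are made up for this illustration, and the sketch only deals with a missing local directory and with FTP-level errors for individual files. Connection problems are simply allowed to propagate.

```ruby
require "fileutils"

# Local file name for a remote path: everything after the last slash.
def local_name(remote_path)
  File.basename(remote_path)
end

# Hypothetical helper: downloads each remote path into localdir and
# returns the list of remote paths that could not be fetched.
def download_stats(host, remote_paths, localdir)
  require "net/ftp"              # loaded only when a download is attempted
  FileUtils.mkdir_p(localdir)    # create the target directory if needed
  failed = []
  Net::FTP.open(host) do |ftp|
    ftp.login                    # anonymous login
    remote_paths.each do |remote|
      begin
        ftp.get(remote, File.join(localdir, local_name(remote)))
      rescue Net::FTPError => e
        # the server refused this file; note it and carry on
        warn "could not fetch #{remote}: #{e.message}"
        failed << remote
      end
    end
  end
  failed
end

puts local_name("/pub/stats/arin/delegated-arin-latest")
```

With the variables from the example above, the call would be download_stats(url, file_names, localdir).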
File Format
Now that we've downloaded all necessary files, let's look at their
format. Thankfully, the files are CSV formatted using a
pipe "|" (ASCII 0x7c) as field separator. The only other
special feature is line commenting using a hash (#).
The file starts out with some headers which we're not interested in. The format of the main records is:
registry|cc|type|start|value|date|status[|extensions...]
registry contains the name of the registry this IP is assigned to,
one of the fields the webservice will return. cc is the ISO 3166 two letter country
code (e.g. US for, well, the US, or DE for Germany). We need this information
as well.
type can be one of {asn,ipv4,ipv6} depending on whether this record
is an Autonomous System Number, IP version 4, or IP version 6
entry. Since we're not interested in routing we'll ignore all the ASN
entries and since no one uses IPv6, we'll ignore all records for ipv6
as well.
start is the IP address this block starts at, and value is the
number of hosts comprising this block. date gives information about
when this block was first assigned by the RIR. Finally status provides
information about whether the block is assigned or allocated. In
short, blocks are "assigned" to the final instance using the
block. "Allocation" is basically delegation to LIRs who will split up
the block to assign or allocate the pieces.
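Since only the ipv4 records matter here, a small filter helps when reading the files. The following is a sketch under assumptions: the header and summary lines in the sample are made up for illustration (only the last line is a real record), and the rule "at least seven fields with ipv4 in the third one" is my reading of the format, not something the format guarantees.

```ruby
# True for lines that describe an IPv4 block; false for comments,
# blank lines, header/summary lines and asn/ipv6 records.
def ipv4_record?(line)
  line = line.strip
  return false if line.empty? || line.start_with?("#")
  fields = line.split("|")
  fields.length >= 7 && fields[2] == "ipv4"
end

# Illustrative sample; only the final line is an actual record.
sample = <<~EOS
  # a comment line
  2|arin|20060120|3|19700101|20060120|-0500
  arin|*|ipv4|*|1|summary
  arin|US|asn|1|1|20010920|assigned
  arin|US|ipv4|3.0.0.0|16777216|19880223|assigned
EOS

puts sample.lines.select { |line| ipv4_record?(line) }
```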
Let's have a look at an individual record then. This is the first ipv4
record in the current (2006-01-20) delegated-arin-latest as of my
writing this:
arin|US|ipv4|3.0.0.0|16777216|19880223|assigned
Using our newly gained knowledge, we can immediately see the block of ipv4
addresses described in this record has
been assigned by arin (to GE, coincidentally). The first address of
this block is 3.0.0.0 and the block contains 16777216 addresses in
total, 3.0.0.0 included.
Why provide information about the number of hosts when there are always 16,777,216 addresses in a Class A network? The answer is simply that RIRs allocate CIDR blocks as well, so you can't rely on the class of network to determine the number of hosts.
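To connect the host count to CIDR notation: a block of 2^n addresses corresponds to a prefix length of 32 - n. The helper below is a small sketch of that arithmetic. Its name is made up, and it returns nil for counts that aren't a power of two, on the assumption that such a record doesn't describe a single CIDR block.

```ruby
# Prefix length for a block of `count` IPv4 addresses, or nil if
# count is not a power of two.
def prefix_length(count)
  return nil unless count > 0 && (count & (count - 1)).zero?
  32 - (count.bit_length - 1)
end

puts prefix_length(16_777_216)   # 8, i.e. 3.0.0.0/8
puts prefix_length(8192)         # 19
p prefix_length(3000)            # nil
```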
From CSV into the Database
In order for the webservice to be snappy (and to be hip with the crowd) we'll
parse the downloaded files and load them into a MySQL database.
Personally, I'd prefer a PostgreSQL database, but MySQL is
what comes with my host. The database table is completely straightforward:
CREATE TABLE ip_ranges (
registry VARCHAR (10), -- max length is ripencc, afrinic, each 7
cc CHAR (2),
ip_type CHAR (4), -- only ipv4 for now
ip_from INTEGER UNSIGNED,
ip_to INTEGER UNSIGNED,
first_date DATE,
status CHAR (1), -- L = aLlocated, S= aSsigned,
import_status CHAR (1)
-- import status is used for import. Typically set to null, all
-- existing values in the table are set to "1" before new values
-- are imported. New values are inserted with import_status=2
-- If the import is successful, all rows with
-- import_status==1 are deleted and import_status==2 are set to
-- null. If the import fails, all rows with import_status=2 are deleted
-- and rows with import_status==1 are reset to null.
);
CREATE INDEX idx_ip_from ON ip_ranges (ip_from);
CREATE INDEX idx_ip_to ON ip_ranges (ip_to);
With a few exceptions which I'll discuss below, the table is just a
one-to-one mapping of the fields from the records in the RIR files.
registry, cc and first_date correspond to the RIR file, as does
status, though I decided to save a little space by mapping ASSIGNED
and ALLOCATED to S and L.
The import_status field is to ensure that we don't mess up everything
in case an import fails. Details are in the code comments, in case
you're interested.
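In Ruby, the bookkeeping described in those comments might be sketched roughly as follows. The method name guarded_import is made up, and db is assumed to be any object that responds to query(sql). The RecordingDb class is only a stand-in that records statements, so the sketch can be run without a database.

```ruby
# Runs the given block (which is expected to insert rows with
# import_status = '2') and cleans up according to its outcome.
def guarded_import(db)
  # mark everything currently in the table as "old"
  db.query("UPDATE ip_ranges SET import_status = '1'")
  begin
    yield db
  rescue StandardError
    # import failed: drop the partial import, restore the old rows
    db.query("DELETE FROM ip_ranges WHERE import_status = '2'")
    db.query("UPDATE ip_ranges SET import_status = NULL WHERE import_status = '1'")
    raise
  end
  # import succeeded: drop the old rows, release the new ones
  db.query("DELETE FROM ip_ranges WHERE import_status = '1'")
  db.query("UPDATE ip_ranges SET import_status = NULL WHERE import_status = '2'")
end

# Stand-in for a database connection; it only records the SQL.
class RecordingDb
  attr_reader :statements

  def initialize
    @statements = []
  end

  def query(sql)
    @statements << sql
  end
end

db = RecordingDb.new
guarded_import(db) { |conn| conn.query("-- bulk import goes here") }
puts db.statements
```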
ip_from is the start IP address converted to a number. IP addresses
are converted to numbers because that makes them easier to work with,
for example when determining which block a given IP falls into. An IP
address is just a series of four bytes. Take 192.0.34.166, currently
the IP of www.example.com, in decimal, hex and bits:
| dotted quad | 192 | 0 | 34 | 166 |
| hex | C0 | 00 | 22 | A6 |
| bits | 11000000 | 00000000 | 00100010 | 10100110 |
If we treat: 11000000 00000000 00100010 10100110 like a number, we
get: 3221234342. This is much easier to work with than 192.0.34.166.
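To check the arithmetic, here is the conversion done twice: once by hand, shifting each byte into place, and once using IPAddr from the standard library. The helper name quad_to_i is just for illustration.

```ruby
require "ipaddr"

# Manual conversion: shift each byte into place, most significant first.
def quad_to_i(quad)
  quad.split(".")
      .map { |part| Integer(part, 10) }
      .inject(0) { |acc, byte| (acc << 8) | byte }
end

puts quad_to_i("192.0.34.166")         # 3221234342
puts IPAddr.new("192.0.34.166").to_i   # 3221234342
```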
Only ip_to is still missing. It's the value of the last IP address in
this block, obtained by adding the value field from the RIR records to
the start address and subtracting one. That was the last bit we needed to
know in order to parse the RIR files:
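require "ipaddr"

# for reference:
# arin|US|ipv4|3.0.0.0|16777216|19880223|assigned
arr = record_line.split("|")   # record_line: one record from an RIR file
registry = arr[0]
cc = arr[1]
ip_type = arr[2]
start_ip = IPAddr.new(arr[3])
ip_from = start_ip.to_i        # start address as a number
number = arr[4].to_i           # number of addresses in the block
ip_to = ip_from + number - 1   # last address of the block
first_date = arr[5]
status = arr[6] == "allocated" ? "L" : "S"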
With all the information extracted and nicely laid out in aptly named variables, all that remains to be done is pack the data into the database. For a quick one-off job, we could do this:
insert = "INSERT INTO ip_ranges (registry, cc, ip_type, ip_from," +
         " ip_to, first_date, status, import_status)" +
         " VALUES (?,?,?,?,?,?,?,'2')"
db = get_mysql()            # magic !
stmt = db.prepare(insert)   # compile the statement once
stmt.execute(registry, cc, ip_type, ip_from, ip_to, first_date, status)
get_mysql() is defined elsewhere and retrieves a Ruby MySQL
driver. I'm using a prepared statement for the insert
because it saves the hassle of quoting and such. The call to prepare
compiles the insert statement and execute() executes the statement with
all the values previously parsed from the record.
Unfortunately, actually inserting every single record like this takes
a while. Any database worth its salt has some sort of bulk import
tool that's faster than plain inserting. Even MySQL has one! In MySQL
the bulk import command is LOAD DATA LOCAL INFILE. All we
need is a file with one line per row, fields separated by tabs, and one
field for each column of the table, in column order.
tmplt = "%s\t%s\t%s\t%s\t%s\t%s\t%s\t2"
line = tmplt % [registry, cc, ip_type, ip_from, ip_to, first_date, status]
Once the import file is complete and saved in, say tmp/import.file,
it can be imported into the database like this:
load = "LOAD DATA LOCAL INFILE 'tmp/import.file' INTO TABLE ip_ranges"
db.query(load)
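Putting the parsing and the line template together, generating the import file might look roughly like this. The method names import_line and write_import_file are made up for this sketch, and the input is assumed to consist of ipv4 records only.

```ruby
require "ipaddr"

# Turns one RIR record into a tab-separated line matching the column
# order of ip_ranges; import_status is fixed to 2.
def import_line(record)
  registry, cc, ip_type, start, value, date, status = record.strip.split("|")
  ip_from = IPAddr.new(start).to_i
  ip_to = ip_from + value.to_i - 1
  [registry, cc, ip_type, ip_from, ip_to, date,
   status == "allocated" ? "L" : "S", 2].join("\t")
end

# Writes one import line per record to the file at `path`.
def write_import_file(path, records)
  File.open(path, "w") do |out|
    records.each { |record| out.puts import_line(record) }
  end
end

puts import_line("arin|US|ipv4|3.0.0.0|16777216|19880223|assigned")
```

For the record from the ARIN file, this prints the fields arin, US, ipv4, 50331648, 67108863, 19880223, S and 2, separated by tabs.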
Almost there.
Now that all the data is loaded into the database, we should be able to
get information about addresses by converting the IP to a number, and
issuing a select statement. Since we previously calculated the number value of
192.0.34.166 (www.example.com) to be 3221234342, we'll use that:
SELECT * FROM ip_ranges
WHERE 3221234342 BETWEEN ip_from AND ip_to
Unfortunately though, that select doesn't return anything because
192.0.34.166 belongs to IANA and isn't assigned by the RIRs.
Therefore it isn't contained in any of the files we imported. Which
means we stumbled across a little bug, eh, limitation. Try again with
www.google.com. Google has a bunch of addresses, I'll pick one at
random: 66.249.93.99. Most likely you can't transform that into a
number in your head, but I can! It's: 1123638627, so typing:
SELECT * FROM ip_ranges
WHERE 1123638627 BETWEEN ip_from AND ip_to
yields:
| registry | cc | ip_type | ip_from | ip_to | first_date | status | import_status |
| arin | US | ipv4 | 1123631104 | 1123639295 | 2004-03-05 | S | 0 |
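If you don't trust my mental arithmetic, the numbers are easy to check with IPAddr, which also converts the other way when told that the number is an IPv4 address. The helper name i_to_quad is just for illustration.

```ruby
require "ipaddr"
require "socket"

# Number back to dotted quad notation.
def i_to_quad(number)
  IPAddr.new(number, Socket::AF_INET).to_s
end

puts IPAddr.new("66.249.93.99").to_i   # 1123638627
puts i_to_quad(1123631104)             # 66.249.64.0   (ip_from)
puts i_to_quad(1123639295)             # 66.249.95.255 (ip_to)
```

So the row describes the block from 66.249.64.0 to 66.249.95.255, which does contain 66.249.93.99.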
Creating the Ruby Webservice
Now we can spend our nights checking IP addresses, so long as we have access to the MySQL database. That's not very useful, though, so we'll provide some wrapper methods to access the data. As promised, we'll write an XML-RPC webservice in Ruby. Lucky for us, Ruby provides XML-RPC functionality as part of its standard library.
require "xmlrpc/server"
require "ipaddr"
The first two lines load the standard libraries we're using. Both
xmlrpc/server and ipaddr should have come installed with your Ruby
distribution if it's moderately fresh. First off, I'll define a generic
function to return a Ruby hash representation of the XML-RPC
struct we defined at the beginning. In case you've forgotten, the
webservice is supposed to return the registry, country and
assignment status of the provided IP address.
def get_ip_information(ip)
  addr = IPAddr.new(ip)   # raises an error for invalid addresses
  # defaults, in case the address isn't in the database
  h = {
    "registry" => "?",
    "country" => "?",
    "status" => "?"
  }
  # addr.to_i is an integer, so interpolating it into the SQL is safe
  stmt = "SELECT registry, cc, status "+
         "FROM ip_ranges "+
         "WHERE #{addr.to_i} BETWEEN ip_from AND ip_to"
  get_mysql { |db|
    db.query(stmt) { |res|
      res.each { |row|
        h["registry"] = row[0]
        h["country"] = row[1]
        h["status"] = row[2]=="S" ? "ASSIGNED" : "ALLOCATED"
      }
    }
  }
  h
end
The code first instantiates an IPAddr object that we'll use to check
the IP for validity, and to convert it to a number. The return value is
prepared in the variable h to contain ? values in case we run into a
"limitation" like the www.example.com fiasco. get_mysql prepares the
database driver, selects the registration information and fills in our
result.
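The validity check can also be tried on its own. Below is a sketch of a stricter variant that accepts only plain IPv4 addresses. The name parse_ipv4 is made up, and rejecting IPv6 addresses and "address/mask" notation is my assumption about what the service should accept, since the table holds only ipv4 rows.

```ruby
require "ipaddr"

# Returns an IPAddr for a plain IPv4 address, nil for anything else.
def parse_ipv4(text)
  text = text.to_s.strip
  # IPAddr would read "1.2.3.4/24" as a network, so refuse it up front
  return nil if text.empty? || text.include?("/")
  addr = IPAddr.new(text)
  addr.ipv4? ? addr : nil
rescue ArgumentError
  # IPAddr signals invalid input with a subclass of ArgumentError
  nil
end

p parse_ipv4("192.0.34.166").to_i   # 3221234342
p parse_ipv4("999.1.1.1")           # nil
p parse_ipv4("::1")                 # nil
```

In a handler, a nil result could then be turned into an XML-RPC fault, for example by raising XMLRPC::FaultException with an error code of your choosing.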
Next, we need to instantiate an XML-RPC server object and connect the webservice functionality to it:
server = XMLRPC::CGIServer.new
server.add_handler("getIPInfo") { |ip|
get_ip_information ip
}
The add_handler function attaches the functionality for a named
XML-RPC method to the server. In the example above, the server is
instructed to perform the code block behind add_handler whenever it
receives an XML-RPC call to getIPInfo. The value of the parameter in
the XML-RPC method call (the IP address in our case) is passed through
by way of the variable ip.
To keep things simple, the code block doesn't do much. It merely passes the IP
address on to the get_ip_information function.
The value returned by the code block is the Ruby hash generated by the
get_ip_information function which the XML-RPC server automatically
converts to the proper XML-RPC struct type.
Finally, we'll get fancy and define a second handler for the getIPInfo
method which doesn't require you to pass any parameter but automatically
returns the IP information for the caller's address. We need to define a
second handler, because the XMLRPC implementation checks the
arity of the code block and would throw a METHOD_MISSING
fault if it encounters an XML-RPC request containing the incorrect
number of parameters.
# added to the same server object as the first handler
server.add_handler("getIPInfo") {
  get_ip_information ENV["REMOTE_ADDR"]
}
Alternatively, Ruby lets a code block accept a variable number of
parameters, sort of like variable argument lists in C, by
prefixing the parameter name with an asterisk "*". If you do so, all
arguments (possibly none) get passed to the code block as a single array.
server.add_handler("getIPInfo") { |*ip|
  if ip.length == 0 || ip[0].strip == ""
    ip = ENV["REMOTE_ADDR"]   # no address given: use the caller's
  else
    ip = ip[0]
  end
  get_ip_information ip
}
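The asterisk can be tried without any server. In this sketch a lambda stands in for the handler block and the string "caller" stands in for ENV["REMOTE_ADDR"]. Both are placeholders for illustration only.

```ruby
# Stand-in for the handler block: accepts zero or more arguments.
PICK_IP = lambda do |*ip|
  if ip.empty? || ip[0].to_s.strip.empty?
    "caller"   # placeholder for ENV["REMOTE_ADDR"]
  else
    ip[0]
  end
end

puts PICK_IP.call                   # caller
puts PICK_IP.call("66.249.93.99")   # 66.249.93.99
puts PICK_IP.arity                  # -1
```

The arity of -1 is Ruby's way of saying "any number of arguments", which is why a single handler is enough in this case.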
Just for fun, I'll add one final method: getIPAddress to determine the IP
of the client making the RPC call.
server.add_handler("getIPAddress") {
  ENV["REMOTE_ADDR"]
}
Try it!
Everything is set up and ready to go. In order to try out the
webservice, just point your XML-RPC client to
http://www.kuriositaet.de/ip/ip_ws.rb and make calls to getIPInfo.
Or you can try the service right from this page. I'm using my jsRPC library in order to access webservices directly from within this page. For example, you can press on this button to get the information about your IP address:
jsRPC makes it very easy to integrate the service in Javascript. First, you need to include the library:
<script src="/js/all_scripts.js" type="text/javascript"></script>
Apart from that, all you need to know about the library is that it
contains an object named XmlRpc which can create proxy objects
that connect to webservices. For example, in order to create a proxy for our
webservice, do this:
var rpc = XmlRpc.getObject("/ip/ip_ws.rb", ["getIPInfo", "getIPAddress"])
The URL of the service and an array of method names are passed to the
getObject function of XmlRpc, and the call returns a Javascript
object which responds to those functions.
All that's left to do now is plain old Javascript:
// create an "onclick" function for the button
function alertIPInfo1 () {
// call to the webservice
var info = rpc.getIPInfo()
// call to another method of the webservice
var ip = rpc.getIPAddress()
//assemble results and alert()
var str = "Your address is: "+ip+"\n"
str += info.status + " by '" + info.registry + "' in " + info.country
alert (str)
}
And finally a tiny bit of HTML for the button:
<!-- connect the button to the function -->
<input type="button" value="look up my ip" onclick="alertIPInfo1()">
In case you'd like to try an IP address other than your own, here's a final example:
A quick peek at the code reveals it's similar to the previous example,
though we can leave out all the initialization. First we create a
function that retrieves the entered IP address to hook up to the
onclick event of the button. Since the rpc object is already
initialized, it can be reused.
function alertIPInfo2 () {
var ip = document.getElementById("ip_field").value
var str = "Please enter a valid IP"
try {
var info = rpc.getIPInfo(ip) //reuse the rpc object here
str = "Information for: "+ip+"\n"
str += info.status + " by '" + info.registry + "' in " + info.country
} catch (e) {
// worry about this some other time :(
}
alert (str)
}
The error handling isn't really pretty. We don't differentiate between
faults generated by the webservice indicating invalid addresses and
network errors, but it's good enough for a start. Now all we need is a
test entry field and a button to hook up to the code.
<input type="text" id="ip_field">
<input type="button" value="look up IP" onclick="alertIPInfo2()">