Intro to IPv6:
ipv6.l.google.com has IPv6 address 2001:4860:0:2001::68
2001:4860:0:2001::68 is short notation for:
2001:4860:0000:2001:0000:0000:0000:0068
IPv6 is 128 bit. There are 2^128 IPv6 IPs. That’s 340 undecillion.
I have a whole lot more detail on what IPv6 is, saving that for a later presentation, let’s jump to something fun. This is what I need your help with. What are the considerations you have to make when developing your applications for IPv6? Here’s one:
Imagine you want to store a list of IPv6′s in a MySQL table. Maybe this is a list of IPs allowed to connect to your application, or maybe it’s a list of IP to server assignments.
IPv6 is 128 bit. This is larger than will fit in any single MySQL numeric data type. (Note: postgres has ip/netmask data types and functions to do calculations on them. It will also listen on the IPv6 network stack, MySQL won’t. But, that’s another story. The challenge for now is, storing IPv6 IPs in MySQL)
One method is to store the IPv6 addresses as a string that represents the IP in short notation:
CREATE TABLE `ipshort` (
`id` bigint(22) unsigned NOT NULL auto_increment,
`bitmask` tinyint(3) unsigned NOT NULL default ’128′,
`ip` varchar(39) NOT NULL,
PRIMARY KEY (`id`),
KEY `bitmask` (`bitmask`),
KEY `ip` (`ip`),
UNIQUE KEY `bitmaskip` (`bitmask`, `ip`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Notes:
- The length of “ip” will be up to 39+1. Average length depends on how compressed the IPs are that you insert.
- Not good if you want to test an IP against the table. In other words, if you want to test if 2001:1::1/128 is covered by any ip/bitmask already stored in the table, you’ll have to select each one out and do a bunch of math on each one.
Next attempt, store IPs as their binary string:
CREATE TABLE `ipbinary` (
`id` bigint(22) unsigned NOT NULL auto_increment,
`binstr` varchar(128) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `binstr` (`binstr`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Notes:
- The bitmask is assumed by the length of the binstr string.
- Not space efficient. For example, storing 2001::1 in binary will require all 128 bytes.
- Maybe you don’t care how much disk space it consumes. You’re only going to store 1 million entries, and 100 to 150 MB is not a concern to you for disk or memory (you want the indexes to fit in memory).
- This is much better if you want to test an IP against the table. In other words, if you want to test if 2001:1::1/128 is covered by any ip/bitmask already stored in the table, you could do it like this:
- convert compare_ip to string that represents the binary conversion
- compare_bitmask = compare_bitmask specified ? compare_bitmask : 128
- $sql = “SELECT * FROM ipbinary WHERE 1!=1 “;
for ( $j=1; $j<=$compare_bitmask; $j++)
{
$sql .= ” OR binstr=SUBSTR(‘$binip’, 1, $j)”;
}
NOTE: yes, creates a big query, but limited to 128 conditions max. Hard for human comprehension, but easy for MySQL to handle. This is a fast query, even on a huge table, as long as your indexes are loaded in memory. Key point, does not require a full table scan to find the matching rows, it’s index friendly. - If you get a row from the query, it’s in the table.
Finally, I created a hybrid, that stores each section of the hex-based IP as a decimal:
CREATE TABLE `ipsmallint` (
`id` bigint(22) unsigned NOT NULL auto_increment,
`bitmask` enum(’16′,’32′,’48′,’64′,’80′,’96′,’112′,’128′) NOT NULL default ’128′,
`block1` smallint(5) unsigned NOT NULL default ’0′,
`block2` smallint(5) unsigned NOT NULL default ’0′,
`block3` smallint(5) unsigned NOT NULL default ’0′,
`block4` smallint(5) unsigned NOT NULL default ’0′,
`block5` smallint(5) unsigned NOT NULL default ’0′,
`block6` smallint(5) unsigned NOT NULL default ’0′,
`block7` smallint(5) unsigned NOT NULL default ’0′,
`block8` smallint(5) unsigned NOT NULL default ’0′,
PRIMARY KEY (`id`),
UNIQUE KEY `block` (`block1`,`block2`,`block3`,`block4`,`block5`,`block6`,`block7`,`block8`),
KEY `bitmask` (`bitmask`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Notes:
- Only use this method if it’s acceptable to limit your bitmasks to ’16′,’32′,’48′,’64′,’80′,’96′,’112′,’128′
- A hex address is 8 blocks of hex numbers, like this: FFFF:AAAA:0000:0000:BBBB:BBBB:CCCC:55555. The max value for each block is FFFF or 65535.
- In MySQL, a smallint is 2 bytes, unsigned, holds a max size of 65535. Magic.
- To store an IP in this table, you take each block, convert to decimal, and store in the appropriate block. If you want to store FFFF::/16, you put 65535 in block1, the rest get 0.
- This is a fixed table size. Every row is 8 bytes bigint + 1 byte for the enum + 2 bytes for each smallint. 25 bytes per row. This is the most disk space efficient method I can think of. The ip part of the table is 8*2bytes = 16 bytes. That’s exactly how many bytes are required to store a 128-bit number.
- You can a similar technique if you want to test an IP against the table. In other words, if you want to test if 2001:1::1/128 is covered by any ip/bitmask already stored in the table, you could do it like this:
- Uncompress compare_ip and explode on the you’ll have 8 parts.
- compare_bitmask = compare_bitmask specified ? compare_bitmask : 128
- $sql = “SELECT * FROM ipsmallint WHERE 1!=1″;
for ( $j=1; $j<=$bitmask/16; $j++)
{
$sql .= ” OR (bitmask = ‘”.($j*16).”‘”;
for ( $k=1; $k<=$j; $k++)
$sql .= ” AND block$k = “.hexdec($parts[$k-1]);
$sql .= “)”;
}
NOTE: smaller query, limited to 8 conditions. Again, fast query, even on a big table.- If you get a row from the query, it’s in the table.
- Limitation to all 3 tables: BIGINT is 8 bytes, or 64-bits. You’re limited to 2^64 entries, half of the IP space available in IPv6. You’ll have serious problems with MySQL long before you get to 2^64 rows anyway, so not much of a concern, just worth a note.
In my experiments, I injected ~800 thousand random IPs into ipsmallint and only ~60 thousand rows into ipbinary. Compare IP time was ~1000 lookups per second against ipsmallint, and about 200 per second against ipbinary. ipbinary consumed 3 times as much disk space per row (51975 rows consumed 4707568 bytes. ipsmallint was 801171 rows consuming 20830446 bytes). In conclusion, I think the long, up to 128 conditions, lookup query really does have a heavy factor. I did an EXPLAIN on one of the long ipbinary queries, it showed:
+—-+————-+———-+——-+—————+——–+———+——+——+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+———-+——-+—————+——–+———+——+——+————-+
| 1 | SIMPLE | ipbinary | range | binstr | binstr | 130 | NULL | 98 | Using where |
+—-+————-+———-+——-+—————+——–+———+——+——+————-+
Doesn’t seem like it should take so long. Maybe I made a mistake, it is late, and I’m tired. OR, maybe the theory about indexes being more efficient in fixed length tables, such as ipsmallint, is true, and more so than I originally thought!
One other interesting topic, AAAA records in a MyDNS system. Is ipshort or ipsmallint method better for disk space efficiency? ipshort is up to 40 bytes (39 + 1 byte to store the length). ipsmallint method is always exactly 16 bytes. Let’s see, the google example 20 bytes + 1 to store the length. Let’s just say that’s the average, 21. We get a 23.8% savings in disk space by using ipsmallint. Another way to express is it requires 31.25% more disk space to use the ipshort method. Of course, there is a trade-off to using ipsmallint – the DNS system now must convert the 8 blocks to hex, assemble the string, and convert to short notation. This requires processing power, and could reduce the efficiency/scalability of the DNS application servers. The question is, by how much, and is it worth the trade off in disk space savings?
DaveK
Yes, definitely a single query with 128 “=” conditions will run a lot slow, than just 8 conditions with the ipsmallint method. Index lookups are fast, but they still add up.
There is another method though: use DECIMAL and store the entire IP as an integer. IPv6 IPs can range from 0 to 340282366920938463463374607431768211455 when expressed as an integer. You can store that in a DECIMAL(39) column. Mask lookups will still involve using 128 lookups (or 8 if you want to limit yourself). However, the advantage of DECIMAL(39) is that you can do range lookups.
mysql> show create table address;
+———+———————————————————————————————————————————————————————————————+
| Table | Create Table |
+———+———————————————————————————————————————————————————————————————+
| address | CREATE TABLE `address` (
`id` int(10) unsigned NOT NULL auto_increment,
`ip` decimal(39,0) default NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=10 DEFAULT CHARSET=latin1 |
+———+———————————————————————————————————————————————————————————————+
1 row in set (0.00 sec)
mysql> Insert into address set ip= CONV(‘ffff’, 16, 10) * POW(65535, 7) +
-> CONV(‘ffff’, 16, 10) * POW(65535, 6) +
-> CONV(‘ffff’, 16, 10) * POW(65535, 5) +
-> CONV(‘ffff’, 16, 10) * POW(65535, 4) +
-> CONV(‘ffff’, 16, 10) * POW(65535, 3) +
-> CONV(‘ffff’, 16, 10) * POW(65535, 2) +
-> CONV(‘ffff’, 16, 10) * POW(65535, 1) +
-> CONV(‘ffff’, 16, 10) * POW(65535, 0);
Query OK, 1 row affected, 1 warning (0.07 sec)
mysql> Insert into address set ip= CONV(‘ffff’, 16, 10) * POW(65535, 7) +
-> CONV(‘ffff’, 16, 10) * POW(65535, 6) +
-> CONV(‘ffff’, 16, 10) * POW(65535, 5) +
-> CONV(‘ffff’, 16, 10) * POW(65535, 4) +
-> CONV(‘ffff’, 16, 10) * POW(65535, 3) +
-> CONV(‘ffff’, 16, 10) * POW(65535, 2) +
-> CONV(‘ffff’, 16, 10) * POW(65535, 1) +
->
-> CONV(‘fffe’, 16, 10) * POW(65535, 0);
Query OK, 1 row affected, 1 warning (0.00 sec)
mysql> select * from address;
+—-+—————————————–+
| id | ip |
+—-+—————————————–+
| 10 | 340246022585899900000000000000000000000 |
| 11 | 340246022585899900000000000000000000000 |
+—-+—————————————–+
2 rows in set (0.00 sec)
Question:
What’s wrong with this picture?
Answer:
The calculated values are identical, but they are not from the same IP. In fact the values inserted are not the correct decimal representation. This is due to MySQL losing precision in the POW calculation.
This is why:
mysql> select POW(65535, 7);
+——————–+
| POW(65535, 7) |
+——————–+
| 5.191742286784e+33 |
+——————–+
1 row in set (0.00 sec)
So, DECIMAL(39) will work great for storing, and even does good on math such as where ip > value1 and ip < value2. Just have to do the IPv6 to decimal mathematics outside of MySQL.
If someone can show a reliable method for converting an IPv6 IP to a decimal value using MySQL, that would be awesome!
What about
mysql> SELECT CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,7) AS DECIMAL(39,0)) +
-> CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,6) AS DECIMAL(39,0)) +
-> CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,5) AS DECIMAL(39,0)) +
-> CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,4) AS DECIMAL(39,0)) +
-> CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,3) AS DECIMAL(39,0)) +
-> CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,2) AS DECIMAL(39,0)) +
-> CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,1) AS DECIMAL(39,0)) +
-> CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,0) AS DECIMAL(39,0));
+—————————————————————————————————————————————————————————————————————————————————————–+
| CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,7) AS DECIMAL(39,0)) +
CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,6) AS DECIMAL(39,0)) +
CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,5) AS DECIMAL(39,0)) +
CAST(CONV |
+—————————————————————————————————————————————————————————————————————————————————————–+
| 340209681578175878893739963388222212325 |
+—————————————————————————————————————————————————————————————————————————————————————–+
1 row in set (0.00 sec)
mysql> SELECT CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,7) AS DECIMAL(39,0)) +
-> CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,6) AS DECIMAL(39,0)) +
-> CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,5) AS DECIMAL(39,0)) +
-> CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,4) AS DECIMAL(39,0)) +
-> CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,3) AS DECIMAL(39,0)) +
-> CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,2) AS DECIMAL(39,0)) +
-> CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,1) AS DECIMAL(39,0)) +
-> CAST(CONV(‘fffe’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,0) AS DECIMAL(39,0));
+—————————————————————————————————————————————————————————————————————————————————————–+
| CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,7) AS DECIMAL(39,0)) +
CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,6) AS DECIMAL(39,0)) +
CAST(CONV(‘ffff’,16,10) AS DECIMAL(39,0)) * CAST(POW(65534,5) AS DECIMAL(39,0)) +
CAST(CONV |
+—————————————————————————————————————————————————————————————————————————————————————–+
| 340209681578175878893739963388222212324 |
+—————————————————————————————————————————————————————————————————————————————————————–+
1 row in set (0.00 sec)
Of course a typo has ocurred:
POW(65534… must be changed to POW(65535…
Why do you folks insist on calling it “an IP?” No, it is “an address” for starters…or if you want to abbreviate it, “addr.” The most bizzarre thing I see above is “IPv6 IPs”…gee…how is that so different than “IPv6 addr?”
Anyhow…you are also confusing storing an address with storing a network specification. A network specification is what has a bitmask length. If, as you say, you’re looking for specific hosts in a table, it’s either there, or it’s not; or it is between two values where you can find the minimum and maximum address values given an address and a bitmask, and query for (valu >= min and valu <=max). In other words, you should be doing the masking BEFORE any query.
If a single integer data type isn't big enough, your best bet for lookup/indexing efficiency is probably normalization of all the addresses, filling in the missing zeroes from the compressed notations.
An index on the addresses will presumably help. Yes, it probably makes sense also to store the bitmask lengths as well with each address, just that forming any aggregation of the two (such as a unique key or index on both the address and the mask) is rather unlikely methinks to be useful. It also seems rather less than useful above to use bitmask length as a key.
Heck…I'd think the best example of IPv6 storage and lookup efficiency could be found in the Linux or *BSD kernel sources where the routing tables are maintained and accessed. In essence, that code must make decisions on which table entry to use thousands of times every second in some cases.