Module talk:IP

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

Plan[edit]

A discussion at WP:Lua (permalink) led to the creation of this module with a plan for a general-purpose IP address handler. I pointed out that Module:IPblock (which implements {{blockcalc}}) includes functions that can extract IPv4 and IPv6 IPs from text arguments, and convert IPs for text display, and can do various calculations including determining the best available CIDR block summary of given IPs.

Apart from the advantage of having a general-purpose IP handler, the immediate requirement was for a module that could generate the list at {{Sensitive IP addresses}} and output identical information for use by MediaWiki:Group-sysop.js.

I am contemplating how functions from Module:IPblock might be used here, and these are my preliminary thoughts. General purpose handling of CIDR blocks is tricky and I built in a restriction to make implementation easier. The module accepts individual IPs (IPv4 and IPv6), and CIDR blocks (IPv4 only, and only of form IP/n where n is 16 to 32 inclusive—that is, 65,536 addresses or fewer). The module puts the IPs in two lists, one for v4 and one for v6, then processes each list. Each list is sorted and duplicate IPs are removed (that's helpful for the user and necessary for the method used to create summaries). Then summaries are generated and the results displayed in a table.

A key point is that each CIDR block is expanded to individual IPs: for example, 1.2.3.224/27 would be expanded to 32 individual addresses (1.2.3.4.224 to 1.2.3.4.255 inclusive). That is done because the method used to create summaries needs a list of individual IPs.

It would be possible to work with CIDR blocks without expanding them, but I suspect the code complexity would blow out, and it's already pretty fierce.

The outline in Module:IP is great, and I initially contemplated something like that, but pragmatism overtook me and I did what was needed for {{blockcalc}}. A more general module would be good, but some of the proposed methods may be difficult to implement. I'm thinking of IPBlock removeIPs—if the input is 1.2.3.224/27 and 1.2.3.243 is removed, the result is 31 IPs which can be summarized as four CIDR blocks and an IP, as shown by blockcalc (this is the result of passing the 31 IPs to {{blockcalc}}):

Sorted 31 IPv4 addresses:

1.2.3.224 – 1.2.3.242
1.2.3.244 – 1.2.3.255
Total
affected
Affected
addresses
Given
addresses
Range Contribs
32 32 31 1.2.3.224/27 contribs
31 16 16 1.2.3.224/28 contribs
2 2 1.2.3.240/31 contribs
1 1 1.2.3.242 contribs
4 4 1.2.3.244/30 contribs
8 8 1.2.3.248/29 contribs

The module would either have to keep a list of each individual IP (what Module:IPblock does), or maintain a list of blocks—possible but awkward.

Before starting the implementation, can we discuss the plan. Are addIPs and removeIPs wanted? For any number of IPs?

My pragmatic side also wonders whether the implementation effort would be worthwhile—would it actually be used? The requirement for sensitive IPs probably does not need the generality, and it might not need a module that can process IP addresses at all (although I haven't looked at what is needed for the js output). Johnuniq (talk) 11:03, 14 July 2016 (UTC)[reply]

@Johnuniq: Thank you for taking the time to look at this. In terms of functionality for the JavaScript side of things, we need to know the following:
  • Whether the IP an admin is about to block is in a sensitive range
  • Whether the IP range that an admin is about to block overlaps with a sensitive range
  • A description of the sensitive range
  • A list of CIDR blocks in the sensitive range
Only the first two need to involve the library, I think - we would need to know whether an IP is in a given range (that is IPBlock's obj:containsIP), and whether two IP blocks overlap with each other. The rest can be done by Module:Sensitive IP addresses.

Thank you also for the demonstration of {{blockcalc}}'s functionality. That is even more impressive than I first thought! I am in two minds about the direction we should go in now that I know more about what would be involved in moving functionality from Module:IPblock to this library.

  • Option 1: We keep the library small and only implement the functionality that is shared between the different modules. The IPBlock class does not store individual IPs, but only the prefix and bit length.
    • Pros: The code will be relatively simple, and can be tested and deployed quickly.
    • Cons: Not that much functionality is shared - possibly only the ability to parse IPv4 and IPv6 strings. Integrating the library into Module:IPblock may be more trouble than it's worth.
  • Option 2: We integrate more functionality from Module:IPblock into the library. Instead of one IPBlock class, we have two classes: IPBlock and Subnet. The IPBlock class stores individual IPs which can be added and removed, and can take a maximum of 65,536 addresses. The Subnet class is like the IPBlock class from option 1 - it only stores prefix and bit length.
    • Pros: It will be easier to integrate into Module:IPblock, and it wins a lot of geek credit.
    • Cons: The codebase will be much more complicated. Although much of the code exists in Module:IPblock already, moving it may take some serious development effort, and testing/deployment may be slow.
After typing all of that out, I think the way forward has become clearer. At first I think we should implement the IPAddress and Subnet classes, after which I can write Module:Sensitive IP addresses, and then later if we deem it is necessary we can implement the IPBlock class. (Perhaps if someone wants to write another module that needs similar functionality.) Perhaps we should choose different names, as well - something like IPCollection may be better than IPBlock at indicating that objects will hold individual IPs and can be divided up into smaller blocks. I'll have a think about all of this and sketch out another outline for the interface maybe tomorrow. — Mr. Stradivarius ♪ talk ♪ 14:27, 14 July 2016 (UTC)[reply]

I added some functions to make IPAddress nearly complete. I just plonked them in with minimal changes to work with the outline—feel free to massage them for style. The following would work.

IPAddress = require('Module:IP').IPAddress
ip1 = IPAddress.new(' 1.2.3.4 ')
ip2 = IPAddress.new('12.34.56.78')
ip3 = IPAddress.new(' 01:002:3:0004:: ')
ip1 < ip2       -- true
tostring(ip1)   -- '1.2.3.4'
tostring(ip3)   -- '1:2:3:4::'

I'll probably be able to do most of the other stuff in the next two or three days. Please see the "notes" section near the top. Johnuniq (talk) 12:36, 15 July 2016 (UTC)[reply]


@Mr. Stradivarius: I changed getIP() to use the new tostring code. That means it returns the normalized string representation of the IP address which may be different from the text that the caller provided. I suspect that is a good idea, but the reason I did it was to make it easier to implement getNextIP and getPreviousIP. It's easier because a new IP can now be constructed using a given table of numbers. It is only possible for methods to do that—a client program does not have access to the IPAddress uniqueId variable and so cannot use the trick employed by getNextIP. Please let me know if there is a better way of integrating the new code.

Please see the TODO items. Fix the style now if wanted, or do it later. What I would like is some thoughts on the first TODO. The points there need to be fixed, and isInSubnet needs to be implemented when Subnet is fixed. Then IPAddress is complete...I think.

I saw Module:IP/testcases but I also created Module:Sandbox/Johnuniq/ip for temporary testing. See the results at Module talk:Sandbox/Johnuniq/ip. Johnuniq (talk) 07:41, 16 July 2016 (UTC)[reply]

Groan, I need watching. Obviously clients do have access to IPAddress! I changed it to uniqueId which is definitely private. I haven't uploaded that yet, and there may be some clever way to avoid the kludge. Johnuniq (talk) 03:16, 17 July 2016 (UTC)[reply]
Sorry for not replying sooner - it's been a busy couple of days. First off, I agree that getIP() should return the normalized string representation of the IP. I expect that keeping the output in a predictable format will be useful for reusers of the library, and if they need their original input string they can always keep it in a variable somewhere. As for getIPParts(), that is a bit ugly, yes. Ideally we shouldn't have to make implementation details like that public. I will have a think whether there is any better way of doing it, but I can't think of any right now. Moving the metatable definition inside IPAddress.new seems like it would be a good idea, but it has the drawback that mt.__eq would no longer work - __eq requires that both objects being compared use the exact same metatable. Let me ponder this one a bit longer. I'll also have a go at making the style more uniform. — Mr. Stradivarius ♪ talk ♪ 12:15, 17 July 2016 (UTC)[reply]
Actually, I see I have that wrong. __eq requires that the function be the same function object for both objects being compared - the metatables themselves can be different. Let me think some more about this. — Mr. Stradivarius ♪ talk ♪ 13:17, 17 July 2016 (UTC)[reply]
Ok, I found a way that works. If we use a unique key and set an __index metamethod inside IPAddress.new, then we can allow access to the data upvalue for any code that knows the unique key. Code that can't access the unique key can't get to the data table, even with pairs or getmetatable. Untrusted code could break the metamethods by setting a new metatable, but that still doesn't allow access to the data. — Mr. Stradivarius ♪ talk ♪ 13:55, 17 July 2016 (UTC)[reply]
The new dataKey is very nice, although it will be a couple of hours before I can do serious thinking and work out exactly how it is being used. Have you finished working on the module for the moment? It is missing uniqueId (which some of the code still uses—possibly when I think about it I'll see what you planned). Also, getIPParts is still in the code and the fact it is no longer defined is probably the reason for the error currently showing at Module talk:Sandbox/Johnuniq/ip. Johnuniq (talk) 23:56, 17 July 2016 (UTC)[reply]

Subnet[edit]

I'm wondering about addIPs(...) and addIPsFromString(s). Are those names from the original idea? I can't see how they relate to my understanding of Subnet. Also, what is "options" in Subnet.new(options)? I would have thought that Subnet.new(ip) was wanted, where ip is a CIDR string like "1.2.3.0/24". And addIPs and addIPsFromString would be removed? If what I've suggested is all that is needed for Subnet, I can probably get it working in a day or so. At some later time we could look at addIPsFromString and others. Johnuniq (talk) 09:47, 16 July 2016 (UTC)[reply]

The idea behind addIPs(...) was that you could do something like addIPs('1.2.3.0', '1.2.3.1', '1.2.3.255'), and that the prefix and bit length would be recalculated to incorporate the new IPs if they were not in the subnet already. addIPsFromString(s) was going to be the same thing, but from a string, like addIPsFromString('1.2.3.0 1.2.3.1 1.2.3.255'). addIPs(...) would also accept IPAddress objects as well as strings. Perhaps we don't really need this functionality, but then again perhaps implementing it wouldn't be all that much hassle. I'm in two minds about it. The "options" in Subnet.new(options) is a table of named arguments. My original idea was to allow Subnet.new('1.2.3.0/24'), Subnet.new('1.2.3.0', '1.2.3.1', '1.2.3.255') and Subnet.new('1.2.3.0 1.2.3.1 1.2.3.255'). If you allow the latter, though, the module has to know whether you want IPv4 addresses or IPv6 addresses, in order to parse something like Subnet.new('1.2.3.0 01:002:3:0004:: 1.2.3.255'). So I thought that maybe we needed something like Subnet.new{ IPs = '1.2.3.0 01:002:3:0004:: 1.2.3.255', version = 'IPv4' }. Perhaps I am trying to make things too complicated, however. — Mr. Stradivarius ♪ talk ♪ 12:36, 17 July 2016 (UTC)[reply]
I just updated Module:IP and Module:Sandbox/Johnuniq/ip (with its results at Module talk:Sandbox/Johnuniq/ip). It's mostly done, but very rushed and I'll be checking/refactoring it. Thanks for the replies but I have to go and can't contemplate them now. Back later. Johnuniq (talk) 13:02, 17 July 2016 (UTC)[reply]

If the functions are needed, they can readily be implemented using the code from Module:IPblock which calculates the number of common prefix bits shared by a list of IPs. One issue is that these functions mean a subnet is mutable—I guess that's ok, but care is needed in the implementation to ensure that all required fields are set. Another possibility would be to have different constructors (don't know how to do that in a pure way!)—a subnet could be constructed from a CIDR string, or constructed from a list of IP strings, or constructed from the union of two subnets. Johnuniq (talk) 08:11, 18 July 2016 (UTC)[reply]

Internal IP representation[edit]

Something I've been wondering about the internal representation of the IPs - why did you make the bit length of each part 16 bits rather than 32? Lua's double precision floats should be able to precisely represent integers between about -2^53 and 2^53 (I forget the exact cutoff), so an IPv4 address should fit inside one Lua number, and an IPv6 address should fit inside a table of 4 Lua numbers. — Mr. Stradivarius ♪ talk ♪ 14:31, 17 July 2016 (UTC)[reply]

I think side-by-side comparisons would be needed to see what was best in practice. However, I chose 16 bits partly for convenience because it makes working with IPv6 easier—a bunch of calculations would be needed to convert the internal representation to-and-from text if 32 bits were used. A second point which weighed more heavily was a hunch that the code emulating bit-wise operations is more efficient working 16 bits at a time. See copyPrefix and setHostBits. Also, the function common_length from Module:IPblock is used frequently to determine the number of prefix bits that two IPs have in common, and it is able to step towards the answer using 16 bits at a time—if the answer is, say 18 bits, it does a very simple calculation for the first 16 bits, then needs only to determine that the next 2 bits are common. I suspect it would be a little less efficient if it were working 32 bits at a time. Johnuniq (talk) 23:39, 17 July 2016 (UTC)[reply]
Thanks for the explanation - that makes a lot of sense. If we really need to we could do some side-by-side comparisons, but my guess is that we won't need to. — Mr. Stradivarius ♪ talk ♪ 02:28, 18 July 2016 (UTC)[reply]

Memory efficiency[edit]

I've been testing how many IPAddress objects can fit into memory. Perhaps unsurprisingly, it turns out that you can fit a lot less objects in memory when you put the methods directly inside the constructor. In the version with the methods inside the constructor, I managed to fit 19,456 objects into memory before getting the Scribunto "not enough memory" error. When I moved the methods out of the constructor but still kept the data table private using __index, I managed to increase that number to 39,168. And when I moved the metatable out of the constructor and made the data public (with underscores), I managed to fit 65,536 objects in memory. I know that premature optimisation is the root of all evil, so I'm going to resist the temptation to make the data public and use underscores. However, I can't think of any major disadvantages of switching IPAddress to the second style. What do you think? — Mr. Stradivarius ♪ talk ♪ 02:21, 18 July 2016 (UTC)[reply]

Actually, I've just thought of one - with the methods in the constructor we can use checkSelf to check whether callers are using the colon syntax correctly, but that isn't possible unless the object is available as an upvalue in the methods. — Mr. Stradivarius ♪ talk ♪ 02:32, 18 July 2016 (UTC)[reply]
I'll need some time to digest the various trials you mentioned. It is a shame that upvalues apparently consume 36 bytes each. It looks like it is not possible to have purity + Lua + Scribunto memory limits. I am not recommending that you look at Module:Date but as a matter of interest it uses what seemed a reasonable idea, although it has very little purity. The Date function returns a simple date table with a metatable. I wanted a date to be immutable because it's hard to support changing a date with all its weird internals, yet it's easy to construct a date (if wanted, by varying another date). Therefore I used a trick to make the date object read-only (search for "readonly" to see it—the date information is in the newdate table). I re-used a sandbox module (Module:Sandbox/Johnuniq/testpre) to construct a table of 36,000 dates. Its Lua memory usage is 47.59 MB/50 MB (the results are on its talk).
Module:Date has an is_date function which it uses instead of checkSelf. A client is supposed to do something like the first two of the following lines. The third line produces the error shown.
local date = Date('2016-07-18')
local s = date:text()

local s = date.text()
date:text: need a date (use "date:text()" with a colon)
Johnuniq (talk) 06:00, 18 July 2016 (UTC)[reply]

IPAddress's getHighestIP and getPrefix[edit]

I think we are breaking the IP address metaphor by having getHighestIP and getPrefix methods in IPAddress. An IP address doesn't have a highest IP, for example - it is only one IP. If you wanted to find the highest IP in a subnet from a given IP address, then from an interface point of view ip:getSubnet(bitLength):getHighestIP() would make more sense than ip:getHighestIP(bitLength). Looking at how you have put together the subnet class, though, I can see that just removing those methods is going to create problems. It looks like we need some way to make a new IPAddress object directly from the IP parts, rather than only through IPAddress.new. I am thinking that we need a newIPAddress function that is called by IPAddress.new and is accessible from Subnet, but is not accessible to callers. Will think some more about this. — Mr. Stradivarius ♪ talk ♪ 03:22, 18 July 2016 (UTC)[reply]

Yes, makeSubnet needs a way to construct an IP from the IP's parts. I had better leave doing more edits until you have had a chance to decide on the design. If you wanted to take a break, I would have a look at what newIPAddress might look like, but I would need a couple of days in order to ensure some quiet thinking time. While what you say is correct, from the point of view of makeSubnet, it has an IP object and it needs to construct another IP based on the first. In summary:
local base = IPAddress.new(lhs)         -- the IP entered in "IP/n"
local prefix = base:getPrefix(n)        -- IP from clearing host bits in base
data.highestIP = base:getHighestIP(n)   -- IP from setting host bits in base
Johnuniq (talk) 08:04, 18 July 2016 (UTC)[reply]

Subnet dataKey[edit]

It seems Subnet now has no need for dataKey. Instead, its metatable could be used to verify that an argument is a Subnet object. Following is an outline of what might be done.

-- Subnet class
-- Instead of
    local dataKey = {}
-- could have
    local mt = {
        __eq = function (self, obj)
            return self:getCIDR() == obj:getCIDR()
        end,
        __tostring = function (self)
            return self:getCIDR()
        end,
    }
-- and change isSubnetObject to be
    local function isSubnetObject(val)
        return type(val) == 'table' and getmetatable(val) == mt
    end
-- and use mt
    return setmetatable(obj, mt)

Just a thought! Johnuniq (talk) 10:43, 21 July 2016 (UTC)[reply]

Good idea! I'll do that then. In fact, we don't even need to check type(val) == 'table', because getmetatable will just return nil for non-tables. — Mr. Stradivarius ♪ talk ♪ 13:53, 21 July 2016 (UTC)[reply]
Hmm. I just thought of a drawback with this scheme - if untrusted code sets a new metatable on a Subnet object then they could affect the function of isSubnetObject and maybe in turn affect the internals of Subnet and IPAddress. Maybe there's something we can do with __metatable to fix that, but maybe a unique upvalue is better. — Mr. Stradivarius ♪ talk ♪ 14:19, 21 July 2016 (UTC)[reply]
Interesting...I have read that getmetatable stuff in the reference manual but it never occurred to me that it meant what it said, namely that getmetatable(x) is always valid and returns nil unless x is a table with a metatable.

I wouldn't worry so much about what an untrusted client might do—it's not as if it can elevate privileges and use Module:IP to do something it couldn't already do. However, I agree that if a simple procedure to guarantee the object is a Subnet is available, it should be used. To subvert the code I pasted above, a client would have to get a Subnet object, then use getmetatable to get mt, then set that on a new table, then pass it to a Subnet method in the hope of crashing Module:IP. A Lua script would generally not care about such things. You might consider getting rid of the mt variable as it is not needed with your uniqueKey approach.

Anyway, the current module is beautiful, and congratulations on the excellent work in getting clean classes! Let me know if there are any IP or subnet features needed at the moment. Johnuniq (talk) 01:07, 22 July 2016 (UTC)[reply]

(edit conflict) Thank you for the kind words. You're probably right that we shouldn't worry about untrusted clients too much, but setting __metatable turned out to be an easy fix, so I don't think there will be any problems with just using that. We do need to keep either the mt variable or a function that tests for subnet equality as an upvalue, though - the __eq function needs to be the exact same one for both objects being tested, not just an equivalent anonymous function. I've had a few ideas for new methods that we might implement, but I'm not sure how useful they will be. In no particular order, maybe we could have methods for:
  • Getting the number of possible hosts in a subnet
  • Getting the number of host bits
  • Determining whether two subnets are adjacent
  • Iterating over smaller subnets of a subnet
  • Getting the next subnet of the same bit length
  • Creating a subnet object from two or more IP addresses (a constructor, not a method)
  • Expanding a subnet given one or more IP addresses
  • Expanding a subnet given one or more subnets
We should probably only implement any of those if they might actually be used somewhere though. If you have any other ideas, I'm all ears! — Mr. Stradivarius ♪ talk ♪ 12:08, 22 July 2016 (UTC)[reply]

See function mischief() in Module:Sandbox/Johnuniq/ip for a demo showing that validateSubnet can be abused. The result is at the bottom of its talk. I wouldn't worry about making it bulletproof. Johnuniq (talk) 10:55, 22 July 2016 (UTC)[reply]

RawIP.newFromIPv6 bug[edit]

The test cases brought up a bug in RawIP.newFromIPv6 - it turns out that IPAddress.new(':::') is being treated as valid. (Remind me to somehow get a better message from ScribuntoUnit than "failed to assert that true is false", by the way.) It's getting too late here for debugging complex string parsing - could you take a look at it if you have a chance? — Mr. Stradivarius ♪ talk ♪ 15:15, 22 July 2016 (UTC)[reply]

I think I anticipated that, and RawIP.newFromIPv6 accepts ":::" as part of a "be liberal in what you accept" strategy. It will not accept anything that is ambiguous and if three colons means anything, it means the same as two colons.
When the input has exactly seven colons there is no ambiguity and the code accepts strings which an IPv6 authority would say were invalid. The following shows some inputs and the result of tostring(IPAddress.new(s)).
These inputs have exactly 7 colons.
Input             Result
0::2::4:5::7      ::2:0:4:5:0:7
::1:2:3:4::       ::1:2:3:4:0:0
::1:2:3:4:5:      ::1:2:3:4:5:0

These inputs have less than 7 colons.
Input             Result
::1:2:3:4:        ::1:2:3:4:0
::1:2:3::         '::1:2:3::' is an invalid IP address
The original code from Module:IPblock ensured that text parsed for an IP address did not contain any whitespace, and I see that RawIP.newFromIPv6 is too liberal in that it ignores whitespace around hex numbers (":: 1 : 2 : 3" is accepted for IP ::1:2:3).
By contrast, RawIP.newFromIPv4 checks that each component matches '^%d+$' because tonumber accepts things like '-1.5' and '1e3' and '0x1' for decimal numbers.
Perhaps newFromIPv6 should check for strict validity, and should reject all the inputs shown above? That just requires a couple of extra lines. Johnuniq (talk) 00:28, 23 July 2016 (UTC)[reply]
Thanks for the explanation - that does make a lot of sense. I think that whether we are strict about validating IPs depends on what other modules might use the library for. If modules are going to use IPAddress.new to test whether a string is a valid IP address, then allowing things like ":::" is going to cause problems for them. However, if modules are going to use the library to parse IPs hand-typed by end users, then being more liberal in what we accept may be an advantage. I think the former will probably happen more than the latter, so I would opt for strict validation, but then again my foresight isn't perfect. Or instead of choosing just one parsing mode we could enable some sort of "fuzzy input" option which could change the strictness of the parsing. — Mr. Stradivarius ♪ talk ♪ 01:08, 23 July 2016 (UTC)[reply]
I'll think about that, and will do something in the next few hours...actually may not have an opportunity for 30 hours.
I should add some more stuff from Module:IPblock, starting with the ability to extract IPs from arbitrary text. For IPv6, IPblock has this comment:
    -- Want to accept all valid IPv6 despite the fact that contributors will
    -- not have an address starting with ':'.
    -- Also want to be able to parse arbitrary wikitext which might use colons
    -- for indenting. To achieve that, if an address at the start of a line
    -- is valid, use it; otherwise strip any leading colons and try again.
I guess the same should be done in Module:IP?
You might add a skeleton function for extracting IPs from text, and I could fill in the implementation in the next couple of days. There is a problem about what it should return. IPblock parses the text twice and returns two lists, one with IPv4 and one with IPv6. This module could return one list and the client can work out which is which with getVersion. The list may also have a mixture of IPs and subnets—might be useful to have a property to distinguish between them. Johnuniq (talk) 01:25, 23 July 2016 (UTC)[reply]
It sounds like the function you describe should return an IPCollection object, which I was speculating about in the #Plan section above. Or maybe we could have separate IPv4Collection and IPv6Collection objects. Whichever we go with, IPCollection should store an arbitrary number of IP addresses and subnets, and should be able to do at least the following things:
  • Get the highest and lowest IP addresses in the collection
  • Find the smallest single subnet that all of the IP addresses and subnets fit in
  • Find the smallest possible set of IP addresses and subnets that fit the items in the collection exactly
  • Iterate over all the IP addresses in the collection
Given that none of these make much sense if IPv4 and IPv6 are mixed together, I'd be tempted to make different classes for each, perhaps with the underlying logic housed in a new internal RawIPCollection class. So instead of a function parsing a string twice and returning two separate lists, clients would parse the same string twice with IPv4Collection.newFromString(str) and IPv6Collection.newFromString(str) if they needed both IP versions. Or perhaps we can have just one IPCollection class and throw an error if clients try to use two different IP versions in the same object. — Mr. Stradivarius ♪ talk ♪ 08:55, 23 July 2016 (UTC)[reply]
Also useful for Module:Sensitive IP addresses might be to take a minimum and maximum IP, and return the smallest possible set of subnets that are contained between them. The reverse might be handy as well - take a group of subnets, and return a list of maximum and minimum IPs, taking into account whether any of the subnets are adjacent to any of the others and whether any of the subnets overlap. — Mr. Stradivarius ♪ talk ♪ 09:46, 23 July 2016 (UTC)[reply]
I haven't given it much thought, but why not IPCollection.newFromString(str, version) where version defaults to "IPv4" but may be "IPv6". It's not useful to parse arbitrary text and throw an error if it contains both v4 and v6. Re your first and last points in the four bullet points, I suppose you mean for the given IPs, rather than for all possible IPs that would fit in points 2 or 3. I know complex options rather contradict a clean class, but options that clients may want would be to have the list (collection) sorted, and to remove duplicates. Removing duplicates is a bit tricky (inefficent) if a list contains many subnets because each IP would have to be checked against each subnet, unless some heroic indexing scheme were devised. Rather than options, an IPCollection could have methods to sort it or clean it (remove duplicates). I should be able to implement the above over the next few days, or probably longer for point 3, particularly as I'll be busy next week. Point 3 is difficult—as discussed, Module:IPblock expands all subnets to individual IPs, then sorts and removes duplicates, then does weird things to get the results. I've been thinking that the essence of the algorithm that it uses could be expanded to work with a list of IPs and subnets, but the code would be very ugly compared with the elegance of the current Module:IP. Another point about Module:IPblock: when parsing text, it ignores items that it thinks are not intended as IPs, and it only throws an error if a CIDR block is invalid (host bits are non-zero). However, it shows a list of warnings about items which might have been intended as IPs. For example, "1.2..4" would be in the warnings list, but "1..3.4" would be ignored as it looks for <number><dot><number><more> for IPv4. The points in your second comment are more adventurous but no doubt achievable. Johnuniq (talk) 10:02, 23 July 2016 (UTC)[reply]

IPCollection[edit]

I'm posting to let you know I haven't been asleep. Things have got hectic here and will continue like that for a few days. I have been trying to get my head around a Lua class based on class.new(). I'm familiar with the concepts and read the Lua docs which use class:new() (colon—helps with inheritance) long ago. Anyway, I've now started but won't get around to posting code for a while.

Current plans:

  • Move Collection out of RawIP and make it consistent with class.new().
  • Make classes IPv4Collection and IPv6Collection derived from IPCollection.
  • IPCollection has collections, not is a collection.
  • Only IPv4Collection and IPv6Collection would be exported, but I don't plan to protect what they contain—I can't see the point of bloating the code to stop a client from breaking the collections.
-- Example.
local modcode = require('Module:IP')
IPv4Collection = modcode.IPv4Collection
IPv6Collection = modcode.IPv6Collection

local v4 = IPv4Collection.new()
local v6 = IPv6Collection.new()
v4:addFromString('any text 1.2.3.255 1.2.0.0/16 11.22.33.44 11.140.0.0/12')
v4:addFromString('more text with IP addresses')

-- Not sure about the following.
-- Currently, there are three lists (each a Collection):
--   .addresses = IPAddress objects
--   .subnets   = Subnet objects
--   .omitted   = strings of text that were close to valid but were not
for _, item in ipairs(v4.addresses) do
	print(item:getIP())  -- print = do something with the result
end
for _, item in ipairs(v4.subnets) do
	print(item:getCIDR())
end
-- Following would show 11.140.0.0/12 because it is not a valid subnet.
-- This seems better than throwing an error if the arbitrary text input
-- contains an error.
print('omitted: ' .. v4.omitted:join(' '))

I'm pretty sure I will replace .addresses and .subnets with one list (.items?) so it can be sorted. I'll need to add a .type property to IPAddress and Subnet, or I suppose :getType(). It would give 'IPAddress' or 'Subnet' so items in the one list can be easily identified. Also, will need to be able to compare IPAddress and Subnet objects for the sort.

Then there would be a method to determine if a given IPAddress or Subnet overlaps anything in the collection. Then more. Johnuniq (talk) 00:02, 27 July 2016 (UTC)[reply]

I updated Module:IP although I will refactor it because I'm sure one list of IPs/subnets would be better than two separate lists. I added function check_IPCollection to Module:Sandbox/Johnuniq/ip to exercise the new code. The results are at the bottom of Module talk:Sandbox/Johnuniq/ip. Johnuniq (talk) 12:30, 27 July 2016 (UTC)[reply]
Thank you for your work on this. I think you may be right that one list would be better than two separate lists. Having two separate lists has certainly increased the volume of code necessary to perform the checks in Module:Sensitive IP addresses. Maybe we can use just one class and allow clients to pass in a version parameter for methods that would otherwise be ambiguous, as with your suggestion of IPCollection.newFromString(str, version) above. So instead of IPv4Collection:getHighestIP(), we could have IPCollection:getHighestIP(version). — Mr. Stradivarius ♪ talk ♪ 13:44, 31 July 2016 (UTC)[reply]

Module:Sensitive IP addresses[edit]

You will see that I dumped some code in Module:Sensitive IP addresses despite the fact that I do not know how that module is intended to work. Will it be used interactively by an admin who is planning to block an IP or an IP range? In broad terms, how? Or, will the module provide the data to be used by some JavaScript for that interaction?

If the former, my code does a lot of the work. If the latter, I don't see why Module:IP needs to have clever methods to determine if an IP is covered by a collection of IPs. At any rate, I wanted to test Module:IP from the point of view of a client. Following are some thoughts from that.

By the way, I used sensitiveList = mw.loadData(modname) but require would be better because loadData has an overhead but would provide no benefit for this application.

I think Module:IP should have some generic features:

  • A function to parse a string and return a suitable object or nil if the string is invalid. The string could represent an IPv4 or IPv6 address, or an IPv4 or IPv6 subnet. See getObject in my code dump.
  • An IPCollection that holds both IPv4 and IPv6 collections. A client could add an IPv4/IPv6 address or subnet and the object would be put in the appropriate sublist. Also, a client could pass a string with a mix of IPv4 and IPv6.
  • I'm not sure that the IPv4Collection and IPv6Collection classes are needed. Instead, the client could use a generic form of IPCollection as in the previous point.
  • IPAddress and Subnet objects should have a method to check what they are. I'm thinking that obj:isSubnet() would be all that is needed.

Johnuniq (talk) 00:33, 29 July 2016 (UTC)[reply]

The scope of Module:Sensitive IP addresses is to serve as a list of sensitive IPs and IP ranges and the reasons why they are sensitive. A different module is necessary to "pass on" its content to Template:Sensitive IP addresses and the MediaWiki:Group-sysop.js. That is all I know, seeing as I am not a Lua person.Jo-Jo Eumerus (talk, contributions) 08:28, 29 July 2016 (UTC)[reply]
Thanks, I did not know about that js page, and it's obviously the target. Mr.S. seems to be busy elsewhere—I hope he is ok and returns soon, and he will clarify what each module should do. Johnuniq (talk) 11:45, 29 July 2016 (UTC)[reply]
Yep, I'm ok, but I've been a little busy over the last few days. Jo-Jo Eumerus is correct - the plan is to use data from Module:Sensitive IP addresses in MediaWiki:Group-sysop.js. That is activated on Special:Block, and pops up a notification if the IP or range that an admin is about to block is sensitive. The idea is to use the module to store the data for this script and for Template:Sensitive IP addresses (and for the table at WP:SIP), rather than having to maintain different forks of the data. The reason I was thinking of possibly adding the more complex functionality was so that we might be able to display ranges in x.x.x.x-y.y.y.y format, where x.x.x.x and y.y.y.y might not fit neatly into one CIDR block. (See the Government of Canada entry in Module:Sensitive IP addresses/list for an example of where that might be useful.) But this feature is not necessary for the module to work - it's just a "nice to have". And because we are able to massage the data into blocks that don't overlap each other and are in ascending order, it's possible to cheat and just check if the highest IP in one block is one lower than the lowest IP in the next block in order to build a x.x.x.x-y.y.y.y range. — Mr. Stradivarius ♪ talk ♪ 15:31, 29 July 2016 (UTC)[reply]

IP ranges[edit]

I added a Ranges class to Module:IP and a demonstration to Module:Sensitive IP addresses (the ranges are displayed at the top of its talk). It's tricky code but I think it will always show the minimal IP ranges that are equivalent to the collection. I'll probably take a break from this and wait to see if anything further is actually needed. Some of the points I mentioned above should probably be implemented, but there may not be much benefit as I'm not sure the code would be used. Johnuniq (talk) 07:52, 31 July 2016 (UTC)[reply]

That's pretty cool - thanks! I'll definitely use that when making the table for Template:Sensitive IP addresses. — Mr. Stradivarius ♪ talk ♪ 13:18, 31 July 2016 (UTC)[reply]

Add static Util class?[edit]

We have Module:IPAddress that can be used to check whether a given string is an IP address or CIDR, but I don't see much reason to prepare a separate module when we have Module:IP and the same can be done by its internal functions (Module:IPAddress is calling Module:IP in p._isIpOrRange). How about adding a static Util class to this module, with methods like Util.isIPAddress, Util.isIPv4Address, and Util.isIPv6Address, as in my draft in Module:IP/sandbox? I'll also leave a documentation here:

Util

The Util class provides (static) utility methods to work with single IP addresses and subnetworks.

removeDirMarkers

Util.removeDirMarkers(str)

Removes LEFT-TO-RIGHT MARK (U+200E), LEFT-TO-RIGHT EMBEDDING (U+202A), and POP DIRECTIONAL FORMATTING (U+202C) from a given string and returns a new string. For instance, the LEFT-TO-RIGHT MARK (U+200E) is a special character represented by a red dot in the wikitext editor. Template parameters can occasionally involve these special characters, and methods in this module throw an error if a string involving such characters is passed to them. removeDirMarkers prevents this issue. The methods in the Util class, listed below, all call this method internally at the very first step of their procedures.

isIPAddress

Util.isIPAddress(str, allowCidr, cidrOnly)

Returns true if a given string is a valid IP address, or false if not. If a CIDR string should be allowed, pass true to allowCidr, and if a CIDR string should only be allowed, pass true to cidrOnly. (Note that the value of allowCidr is ignored if cidrOnly is true.)

Examples:

Util.isIPAddress('1.2.3.4') -- true
Util.isIPAddress('1.2.3.0/24') -- false
Util.isIPAddress('1.2.3.0/24', true) -- true
Util.isIPAddress('1.2.3.4', nil, true) -- false
Util.isIPAddress('1.2.3.0/24', nil, true) -- true

In more detail, Util.isIPAddress returns 3 values: boolean, string, string/nil.

local isIp, input, corrected = Util.isIPAddress('1.2.3.4') -- true, "1.2.3.4", nil
local isCidr, input, corrected = Util.isIPAddress('1.2.3.0/24') -- false, "1.2.3.0/24", nil
local isCidr, input, corrected = Util.isIPAddress('1.2.3.0/24', true) -- true, "1.2.3.0/24", nil
local isCidr, input, corrected = Util.isIPAddress('1.2.3.4/24', true) -- true, "1.2.3.4/24", "1.2.3.0/24"

v[1] is the result of whether the input string is a valid IP address or CIDR, v[2] is the input string, and v[3] is a corrected CIDR string or nil. In the fourth example above, "1.2.3.0/24" is a CIDR for "1.2.3.0 - 1.2.3.255"; hence "1.2.3.4/24" is an inappropriate subnet and evaluated as a non-CIDR (for this reason Subnet.new('1.2.3.4/24') throws an error). Beware that Util.isIPAddress attempts to correct an inappropriate subnet to an appropriate one and then evaluates whether the input string is a CIDR, if either allowCidr or cidrOnly is true. v[3] is always nil if v[1] is false, and can have a string value only if v[1] is true. It is also important that if v[3] has a value, the input string in itself is not a valid CIDR.

-- Strict CIDR evaluation
local isCidr, _, corrected = Util.isIPAddress('1.2.3.4/24', true) -- true, "1.2.3.4/24", "1.2.3.0/24"
if corrected ~= nil then
    isCidr = false
end

isIPv4Address

Util.isIPv4Address(str, allowCidr, cidrOnly)

Returns true if a given string is a valid IPv4 address, or false if not. If a CIDR string should be allowed, pass true to allowCidr, and if a CIDR string should only be allowed, pass true to cidrOnly. (Note that the value of allowCidr is ignored if cidrOnly is true.) As well as Util.isIPAddress, Util.isIPv4Address returns 3 values.

isIPv6Address

Util.isIPv6Address(str, allowCidr, cidrOnly)

Returns true if a given string is a valid IPv6 address, or false if not. If a CIDR string should be allowed, pass true to allowCidr, and if a CIDR string should only be allowed, pass true to cidrOnly. (Note that the value of allowCidr is ignored if cidrOnly is true.) As well as Util.isIPAddress, Util.isIPv6Address returns 3 values.

I believe this is worth considering. Thanks. (The demo doc is trancluded from Module talk:IP/doc 20230319. Feel free to edit it if you want.) Dragoniez (talk) 23:00, 18 March 2023 (UTC)[reply]

I haven't done any testing but a reason for keeping Module:IPAddress is that it is very simple and has a very low overhead. By contrast, Module:IP does a lot of tricky stuff. It might not make any difference but it would be good to find where Module:IPAddress is used and see if there would be much difference. Would onlookers please suppress the desire to link to "we don't care about performance". That's true but why import an elephant when something small will do? Johnuniq (talk) 00:38, 19 March 2023 (UTC)[reply]
I'd say that Module:IP is a module for modules and Module:IPAddress is a module for templates because the first one doesn't have a package object but the second one does. I'm not saying we don't need Module:IPAddress but that it doesn't seem quite reasonable to me to require it from another module because that would involve sequential loading of multiple modules like caller => Module:IPAddress => Module:IP. It appears like there are 3 modules that call Module:IPAddress, and they all (only) use _isIpOrRange. Module:IPAddress is a simple and good module but I believe there are benefits for preparing the same functionalities in this Module:IP. People can grapple with the 3 return values of Util.isIPAddress but in most cases just local isIp = Util.isIPAddress(ip) will do, as simple as _isIpOrRange. Modules that already exist don't need to be modified because of this proposed update. I just think it'll be useful (and reasonable in terms of what I said above) if we had utility functions in this module, especially for any module that may be written in the future. Dragoniez (talk) 01:19, 19 March 2023 (UTC)[reply]
It's a long time since I thought about this module (I wrote most of the IP processing stuff years ago). It probably would benefit from a more helpful interface. I would rename removeDirMarkers to something like clean so it can be more generic and trim whitespace and anything else that should be removed. Someone with more energy than me should give serious thought to isIPAddress. Aren't there already similar functions with confusingly different procedures? It would be great to unify that or at least make it a little consistent (which functions regard a CIDR range as an IP address?). Rather than having parameters such as nil and true, it might be better to either pass strings where the text is a single word identifying the wanted operation, or pass an optional table (default, an empty table) with named parameters in the table. Johnuniq (talk) 03:56, 19 March 2023 (UTC)[reply]
clean is a nice idea. Concerning isIPAddress, I'd go for a fixed string for the second parameter (and maybe also for boolean, table for the return values). I used the IPAddress and Subnet classes for isIPAddress, but we should use more primitive functions for it, you say, right? That's not an easy task and I don't know if I'll be able to spare time for it... at least I think I'll have a closer look, though. Dragoniez (talk) 07:45, 19 March 2023 (UTC)[reply]