src: get rid of fp arithmetic in ParseIPv4Host #46326

Merged
src/node_url.cc: 18 changes (7 additions, 11 deletions)
@@ -6,7 +6,6 @@
 #include "util-inl.h"
 
 #include <algorithm>
-#include <cmath>
 #include <cstdio>
 #include <numeric>
 #include <string>
@@ -477,18 +476,18 @@ void URLHost::ParseIPv4Host(const char* input, size_t length) {
   const char* pointer = input;
   const char* mark = input;
   const char* end = pointer + length;
-  int parts = 0;
+  unsigned int parts = 0;
   uint32_t val = 0;
   uint64_t numbers[4];
-  int tooBigNumbers = 0;
+  unsigned int tooBigNumbers = 0;
Member

Out of curiosity, how does someone choose between unsigned int and uint32_t?

Member

Taste, basically. There's no difference unless you're targeting Watcom C++ for DOS, where sizeof(int) == 2.

Of course Real Programmers(TM) don't care for repeating themselves and just write unsigned without the int.


This method is really quite something. Tobias cleans it up, but it still looks super complicated. The code below is untested and off the cuff, but I think it is what ParseIPv4Host's logic reduces to. The only thing I'm not completely sure about is whether, e.g., http://00000000000000000000/ (over 19 characters) is considered a valid input.

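#include <cstdint>
#include <cstdio>
#include <cstring>

// ParseIPv4Sketch is a hypothetical name for wrapping the fragment.
// Returns false on failure; on success stores the address in *out.
// Caveats: %u only accepts decimal (no 0x or leading-0 octal parts),
// skips whitespace before each number, accepts a minus sign, ignores
// trailing characters, and an out-of-range value is undefined behavior
// for sscanf.
bool ParseIPv4Sketch(const char* input, size_t length, uint32_t* out) {
  if (length > 19) return false;  // max in octal or hexadecimal
  unsigned a, b, c, d, v, ndots = 0;
  char s[20];
  memcpy(s, input, length);
  s[length] = '\0';
  // Count the dots to decide how many parts to expect.
  for (char* p = s; (p = strchr(p, '.')) != nullptr; p++, ndots++) {
  }
  switch (ndots) {
    default:
      return false;
    case 0:
      if (1 != sscanf(s, "%u", &v)) return false;
      break;
    case 1:
      if (2 != sscanf(s, "%u.%u", &a, &b)) return false;
      if (a > 255 || b > 0xFFFFFF) return false;
      v = a << 24 | b;
      break;
    case 2:
      if (3 != sscanf(s, "%u.%u.%u", &a, &b, &c)) return false;
      if (a > 255 || b > 255 || c > 0xFFFF) return false;
      v = a << 24 | b << 16 | c;
      break;
    case 3:
      if (4 != sscanf(s, "%u.%u.%u.%u", &a, &b, &c, &d)) return false;
      if (a > 255 || b > 255 || c > 255 || d > 255) return false;
      v = a << 24 | b << 16 | c << 8 | d;
      break;
  }
  *out = v;  // parse okay, address in |v|
  return true;
}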

Member Author

tl;dr: personally, I read unsigned int as "some smallish non-negative integer", whereas uint32_t makes me question why this variable has to be exactly 32 bits.


Ad-hoc thought process:

Let's assume that there is no obvious type with better semantics for this use case (e.g., size_t). Otherwise, that type should likely be used instead.

❌ If the variable must have the same size across all platforms, uint32_t is the logical choice (even though ILP32, LLP64, and LP64 always use 32-bit unsigned int anyway, and only more exotic architectures such as ILP64 or SILP64 deviate).

❌ If the algorithm requires or benefits from a specific size, then uint32_t is again the logical choice. This is particularly important if the algorithm relies on unsigned wraparound, which is well-defined (modulo 2^32 for uint32_t), unlike signed overflow, which is undefined behavior.
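A minimal sketch of the distinction, using hypothetical helper names: incrementing UINT32_MAX is well-defined and yields 0, and the octet-packing helper depends on a type wide enough for a 24-bit shift (with a 16-bit unsigned int, `<< 24` would be undefined). The signed counterpart, INT_MAX + 1, is undefined behavior, so it appears only in a comment.

```cpp
#include <cstdint>

// Unsigned arithmetic is defined modulo 2^N, so WrapAround(UINT32_MAX)
// yields 0. (The signed equivalent, INT_MAX + 1, is undefined behavior
// and is deliberately not written as code.)
uint32_t WrapAround(uint32_t x) {
  return x + 1u;
}

// Hypothetical helper: packs four octets into one 32-bit value, with the
// first octet in bits 24..31. A fixed 32-bit type keeps every shift in range.
uint32_t PackOctets(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
  return static_cast<uint32_t>(a) << 24 | static_cast<uint32_t>(b) << 16 |
         static_cast<uint32_t>(c) << 8 | static_cast<uint32_t>(d);
}
```

For example, PackOctets(192, 168, 0, 1) yields 0xC0A80001.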

❌ If the use case requires a fixed minimum size, then uint_fast32_t can be useful. However, the implementation might end up quietly relying on a specific type underlying uint_fast32_t.

  • 64-bit architectures are likely to use 64-bit types for uint_fast32_t. The implementation might store values exceeding 32 bits in this variable, which will break if anyone tries to use the code on an architecture that uses a 32-bit type for uint_fast32_t.
  • Conversely, if an architecture uses a 32-bit type for uint_fast32_t, then assigning a uint_fast32_t value to a 32-bit variable works, but it will break if anyone tries to use the code on an architecture that uses a 64-bit type for uint_fast32_t.
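A sketch of the first pitfall, with hypothetical function names: the same increment produces 2^32 where uint_fast32_t is a 64-bit type (as with glibc on x86-64) and silently wraps to 0 where it is a 32-bit type. Both results are well-defined; the problem is only that they differ between platforms.

```cpp
#include <cstdint>
#include <limits>

// How many value bits this platform gives uint_fast32_t. The standard only
// promises at least 32.
constexpr int FastBits() {
  return std::numeric_limits<uint_fast32_t>::digits;
}

// Code that (accidentally) pushes a uint_fast32_t past UINT32_MAX.
// Returns 2^32 on a 64-bit uint_fast32_t, 0 on a 32-bit one.
uint64_t StoreAboveUint32Max() {
  uint_fast32_t v = UINT32_MAX;
  v += 1;  // well-defined either way, but the result is platform-dependent
  return v;
}
```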

❌ If the variable is mostly used to interact with some library, the type should match whatever the library uses. This is often true for OpenSSL, which for historic reasons frequently uses signed int values instead of semantically more appropriate types.

✔️ If none of that is true, and if the 16-bit minimum size of unsigned int is clearly sufficient, then unsigned int works just fine. Within Node.js, we can even rely on 32-bit unsigned int, but that's not necessarily true in general, e.g., on AVR.
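One way to make such a platform assumption explicit, sketched with hypothetical names: the first static_assert restates what the standard already guarantees, and the second documents a project-level reliance on a 32-bit unsigned int, turning a silent portability bug on a target such as AVR (16-bit unsigned int) into a compile-time error.

```cpp
#include <limits>

// Guaranteed on every conforming implementation: at least 16 value bits.
static_assert(std::numeric_limits<unsigned int>::digits >= 16,
              "guaranteed by the standard");

// A project-level assumption, stated so that it fails loudly where untrue.
static_assert(std::numeric_limits<unsigned int>::digits >= 32,
              "this code assumes unsigned int has at least 32 bits");

// A "smallish non-negative integer" use: a plain unsigned int counter.
unsigned int CountDots(const char* s) {
  unsigned int dots = 0;
  for (; *s != '\0'; ++s) {
    if (*s == '.') ++dots;
  }
  return dots;
}
```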

Of course, following this argument, unsigned short would work just as well, since it is also guaranteed to have a minimum size of 16 bits. However, aside from the (potentially) smaller size when allocating large arrays (e.g., unsigned short[16 * 1024] may be smaller than unsigned int[16 * 1024]), unsigned short has no benefit over unsigned int. It may even be slower, since modern CPUs prefer registers of sizeof(unsigned int) or sizeof(unsigned long) and may have to mask the upper parts of those registers for computations on unsigned short. In fact, uint_fast16_t is usually the same as either uint32_t or uint64_t.
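At the language level, this cost shows up as integral promotion followed by narrowing. A minimal sketch, assuming a typical platform where int is wider than unsigned short (the static_assert checks this); IncrementShort is a hypothetical name.

```cpp
#include <type_traits>

// Integral promotion: when int can represent every unsigned short value,
// arithmetic on unsigned short operands is carried out in int.
static_assert(std::is_same<decltype(static_cast<unsigned short>(1) +
                                    static_cast<unsigned short>(1)),
                           int>::value,
              "assumes int is wider than unsigned short");

// Storing the int result back into unsigned short reduces it modulo 2^N,
// which corresponds to the masking step described above.
unsigned short IncrementShort(unsigned short s) {
  s += 1;  // computed as int, then converted back to unsigned short
  return s;
}
```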


Regardless, what is important to me is the signedness of these variables. I know that this is not a widespread opinion, but to me, "signedness correctness" is almost as important as const correctness.
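The patch's change to the arraysize() comparison is a concrete case. The sketch below uses ArraySize as a hypothetical stand-in for Node's arraysize() helper, assumed to return a size_t. The third function spells out the conversion the compiler would apply if the cast were simply dropped from the signed version.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for Node's arraysize(); assumed to return size_t.
template <typename T, size_t N>
constexpr size_t ArraySize(const T (&)[N]) {
  return N;
}

// Old shape: a signed counter needs the bound cast to int.
bool TooManyPartsSigned(int parts, const uint64_t (&numbers)[4]) {
  return parts > static_cast<int>(ArraySize(numbers));
}

// New shape: an unsigned counter compares against size_t directly.
bool TooManyPartsUnsigned(unsigned int parts, const uint64_t (&numbers)[4]) {
  return parts > ArraySize(numbers);
}

// What `parts > ArraySize(numbers)` would do with an int counter: the usual
// arithmetic conversions turn a negative int into a huge size_t.
bool TooManyPartsImplicitConversion(int parts, const uint64_t (&numbers)[4]) {
  return static_cast<size_t>(parts) > ArraySize(numbers);
}
```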

   if (length == 0)
     return;
 
   while (pointer <= end) {
     const char ch = pointer < end ? pointer[0] : kEOL;
     int64_t remaining = end - pointer - 1;
     if (ch == '.' || ch == kEOL) {
-      if (++parts > static_cast<int>(arraysize(numbers))) return;
+      if (++parts > arraysize(numbers)) return;
       if (pointer == mark)
         return;
       int64_t n = ParseIPv4Number(mark, pointer);
@@ -510,18 +509,15 @@ void URLHost::ParseIPv4Host(const char* input, size_t length) {
   // If any but the last item in numbers is greater than 255, return failure.
   // If the last item in numbers is greater than or equal to
   // 256^(5 - the number of items in numbers), return failure.
-  if (tooBigNumbers > 1 ||
-      (tooBigNumbers == 1 && numbers[parts - 1] <= 255) ||
-      numbers[parts - 1] >= pow(256, static_cast<double>(5 - parts))) {
+  if (tooBigNumbers > 1 || (tooBigNumbers == 1 && numbers[parts - 1] <= 255) ||
+      numbers[parts - 1] >= UINT64_C(1) << (8 * (5 - parts))) {
     return;
   }
 
   type_ = HostType::H_IPV4;
   val = static_cast<uint32_t>(numbers[parts - 1]);
-  for (int n = 0; n < parts - 1; n++) {
-    double b = 3 - n;
-    val +=
-        static_cast<uint32_t>(numbers[n]) * static_cast<uint32_t>(pow(256, b));
+  for (unsigned int n = 0; n < parts - 1; n++) {
+    val += static_cast<uint32_t>(numbers[n]) << (8 * (3 - n));
   }
 
   value_.ipv4 = val;
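A quick way to check that the integer rewrite matches the old floating-point code is to compare it against an exact integer power. The helpers below are hypothetical and only mirror the patched expressions; Pow256 is an integer reference for pow(256, k), relying on 256 == 2^8 so that 256^k == 1 << (8 * k).

```cpp
#include <cstdint>

// Exact integer reference for pow(256, k).
uint64_t Pow256(unsigned int k) {
  uint64_t result = 1;
  for (unsigned int i = 0; i < k; i++) result *= 256;
  return result;
}

// Mirrors the new bound: 256^(5 - parts) as a shift. Valid for parts in
// [1, 4], i.e. shifts of 8 to 32 bits on a 64-bit operand.
uint64_t LastPartLimit(unsigned int parts) {
  return UINT64_C(1) << (8 * (5 - parts));
}

// Mirrors the new combining loop: every part but the last is shifted into
// its octet position; the last part fills the remaining low bits.
uint32_t CombineParts(const uint64_t* numbers, unsigned int parts) {
  uint32_t val = static_cast<uint32_t>(numbers[parts - 1]);
  for (unsigned int n = 0; n < parts - 1; n++) {
    val += static_cast<uint32_t>(numbers[n]) << (8 * (3 - n));
  }
  return val;
}
```

For instance, the two-part input 127.1 combines to 0x7F000001, the same address as 127.0.0.1.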