Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dhcp-relay: Format of the option82 field #5

Open
tohojo opened this issue Aug 18, 2021 · 29 comments
Open

dhcp-relay: Format of the option82 field #5

tohojo opened this issue Aug 18, 2021 · 29 comments

Comments

@tohojo
Copy link
Member

tohojo commented Aug 18, 2021

Quote:

Anyway, I think it is time to consult the RFC (
https://tools.ietf.org/rfc/rfc3046.txt), more specifically section 3.1 and
3.2, which defines the content of the sub options in Option 82.

In my view, Agent Circuit ID should contain the following, as it defines
the circuit:

  • Interface name
  • Outer VLAN
  • Inner VLAN

Preferably in ASCII format to ensure compatibility with as many DHCP
servers as possible.

For inspiration, our ZTE BNGs use the following format:

eth 0/0/5:30.3611 0/0/0/0/0/0

where 0/0/5 is the interface name, 30 is the outer VLAN and 3611 is the
inner VLAN. I don't know why they append " 0/0/0/0/0/0".

Agent Remote ID should not contain VLAN tags as this sub option is used to
identify the remote host. It could contain the MAC address of the client,
but that would be a bit redundant. In our current setup, we don't use Agent
Remote ID at all.

I also think GIADDR should be set - preferably as a configurable option.

@yoelcaspersen
Copy link
Contributor

From RFC3046 section 2.1:

A DHCP relay agent adding a Relay Agent Information field SHALL add it as the last option (but before 'End Option' 255, if present) in the DHCP options field of any recognized BOOTP or DHCP packet forwarded from a client to a server.

In the current example, DHCP option 82 is inserted as the first DHCP option. Adding it as the last option will require parsing of all options - is that feasible in XDP?

@tohojo
Copy link
Member Author

tohojo commented Oct 5, 2021 via email

@yoelcaspersen
Copy link
Contributor

I figure out how to loop through the DHCP options in 09bc685.

As for the format of Option 82: I think both sub-options should contain ASCII only to ensure compatibility with most DHCP servers out there (writing non-ASCII characters in configuration files is going to be an issue for most sysadmins).

That begs the question: How can we parse ASCII text in an efficient manner with eBPF?

@tohojo
Copy link
Member Author

tohojo commented Oct 13, 2021 via email

@yoelcaspersen
Copy link
Contributor

Efficient? Sure, just look at the bytes. Convenient? Not so much, there are no string handling functions available...

I suppose we could make a small reverse lookup table, so we can translate ASCII to actual byte values.

Let's say we receive an ASCII text like:

ens6f0.83.200

If we split by the dot character (0x2E), we are left with the interface name, outer and inner VLAN. Can we find the ifindex from interface name in XDP - or do we need to maintain a map for that?

@tohojo
Copy link
Member Author

tohojo commented Oct 13, 2021 via email

@yoelcaspersen
Copy link
Contributor

I considered that possibility as well, but wouldn't that just give us the ifindex of the VLAN interface, when what we really need is the ifindex of the physical interface?

I am starting to like the idea of a stateful DHCP relay...

@tohojo
Copy link
Member Author

tohojo commented Oct 13, 2021 via email

@yoelcaspersen
Copy link
Contributor

Well, you can put anything you want into the map, so just include the physical interface? :)

That is under the assumption that we actually have a map. In the current state, the DHCP relay works without knowing in advance which VLANs to expect or allow - as long as there are 2 VLAN tags, the relay will process the requests.

Why?

Because it would make things really simple. When a DHCP request arrives, we save the following information in a map:

  • Client MAC (key)
  • Client VLANs
  • ifindex
  • Timestamp

When the response arrives from the server, we check the map and find all the stuff we need to forward the packet to the client - no extra VLAN config needed.

However, making the relay stateful would require regular cleanup of the map from user space (one of the reasons I would prefer a stateless DHCP relay in the first place).

@tohojo
Copy link
Member Author

tohojo commented Oct 14, 2021 via email

@yoelcaspersen
Copy link
Contributor

You could use an LRU type map where old entries are automatically evicted when the map runs out of space

Could you point to an example of that?

@tohojo
Copy link
Member Author

tohojo commented Oct 15, 2021 via email

@yoelcaspersen
Copy link
Contributor

yoelcaspersen commented Oct 17, 2021

Sure - I use the same approach in this example to avoid having to garbage collect: https://github.com/xdp-project/bpf-examples/blob/master/preserve-dscp/preserve_dscp_kern.c

Thanks, @tohojo. I have added state to the latest commit (ac9373c).

Next issue: Building a proper Option 82 Circuit ID string including interface name and VLAN tags, e.g.:

eth0.80.25

I get the eth0 part from the user space program through a map, but I need to convert the VLAN tags to ASCII.

In normal C, I would use sprintf:

char buf[IF_NAMESIZE];
sprintf(buf, "%s.%d.%d", device_name, outer_vlan, inner_vlan);

but that doesn't fly with eBPF. Do you have an idea how we can convert the VLAN tags to ASCII in a somewhat efficient manner?

@yoelcaspersen
Copy link
Contributor

Do you have an idea how we can convert the VLAN tags to ASCII in a somewhat efficient manner?

I found a way to do it using the modulo operator - code is committed in a9003fa.

It is not efficient, but it will allow us to move forward with the return path using real replies from DHCP servers. Let me know if you come up with a better approach than mine :)

@tohojo
Copy link
Member Author

tohojo commented Oct 18, 2021

As a general point, if you want code review, please open pull requests. It's possible to comment on commits on github, but such discussions tend to disappear into the void afterwards, so it's not a very good place to do so.

A few comments on that commit:

 * Device name is 16 characters long and prepended with dash, e.g.:
 * ----ens6f0.83.20

Why the dashes? It's text, so just make it shorter? Also, is the interface name really the best identifier here? That means you'll have to update the DHCP server config if the BNG interface configuration changes. Why not put in a logical identifier instead (say "pop-XXX" or whatever makes sense)?

	for (c = VLAN_ASCII_MAX; c > 0; c--) {
		buf[i--] = (inner_vlan % 10) + '0';
		inner_vlan /= 10;
		if (inner_vlan == 0) {
			break;
		}
	}

This is an obvious contender for a small helper function (since you're repeating it twice).

for (c = IF_NAMESIZE - 1; c >= 0; c--) {

Here you're looping up to IF_NAMESIZE, but that's the size of the buffer, which you already used up some of; so this is liable to write beyond the beginning of the buffer.

@yoelcaspersen
Copy link
Contributor

Why the dashes? It's text, so just make it shorter?

Well, that was my first approach, but I ran into a few issues with that.

I haven't found an elegant way to copy a variable number of bytes in XDP - the built-in memcpy takes a const length. Unless we make a workaround for writing a variable number of bytes, we'll need to keep option 82 length static.

Also, is the interface name really the best identifier here? That means you'll have to update the DHCP server config if the BNG interface configuration changes.

That's also an issue with our ZTE setup. But you are right, configuring a logical identifier probably would make better sense.

Here you're looping up to IF_NAMESIZE, but that's the size of the buffer, which you already used up some of; so this is liable to write beyond the beginning of the buffer.

Hence the check:

if (i < 0) { break; }

I am aware that both buffers are IF_NAMESIZE (16 bytes) long - under normal circumstances, the physical interface name should be no longer than 6 characters to allow VLAN IDs to be appended (each VLAN takes up to 4 ASCII characters + 1 character for a separating dot). Strictly speaking, there is no guarantee which naming scheme the administrator will use, so your idea to use a configurable logical identifier is probably better.

We still end up in the same place, though: Without a way to copy a variable number of bytes, option 82 circuit ID will in most cases be longer than strictly necessary.

@tohojo
Copy link
Member Author

tohojo commented Oct 18, 2021

I haven't found an elegant way to copy a variable number of bytes in XDP - the built-in memcpy takes a const length. Unless we make a workaround for writing a variable number of bytes, we'll need to keep option 82 length static.

Well, you basically implemented one - the code that copies the interface name into the buffer is a variable-length memcpy() :)

@yoelcaspersen
Copy link
Contributor

Well, you basically implemented one - the code that copies the interface name into the buffer is a variable-length memcpy() :)

Sure, but I suppose memcpy() derivatives perform significantly better than copying one byte at a time in a loop - right?

@tohojo
Copy link
Member Author

tohojo commented Oct 18, 2021

"slightly" being the operative word, there... you're copying a few dozen bytes at most, so I doubt you'd notice the difference.

You could optimise it to copy the data in 8-byte chunks (since instructions are 64 bits wide). This would look something like:

void memcpy_var(void *to, void *from, __u16 len)
{
__u64 *t64 = to, *f64 = from;
__u8 *t8, *f8;
int i;

for (i = 0; i < 100 && len >= 8; i++, len -= 8)
  *t64++ = *f64++;

t8 = (__u8*)t64;
f8 = (__u8*)f64;
for (i = 0; i < 7 && len > 0; i++, len--)
  *t8++ = *f8++
}

This is basically what the static version does (except it elides the loop checks and just inlines the copy instructions because it knows the size at compile time):
https://github.com/xdp-project/bpf-examples/blob/master/include/bpf/builtins.h#L154

@yoelcaspersen
Copy link
Contributor

yoelcaspersen commented Oct 21, 2021

I tried your approach, but the compiler chokes on the following lines:

  *t64++ = *f64++;

and:

  *t8++ = *f8++;

A call to built-in function 'memcpy' is not supported.

After adding compiler flags -fno-builtin the code compiles.

However, when I try to use memcpy_var() within the write_dhcp_option82() function:

if (sizeof (option.circuit_id.val) == sizeof (buf)) {
	memcpy_var(option.circuit_id.val, buf + i, sizeof (buf) - i);
}

I get a nasty error from the verifier:

misaligned stack access off (0x0; 0x0)+-17+0 size 8

Close... but not quite there yet.

@tohojo
Copy link
Member Author

tohojo commented Oct 21, 2021 via email

@yoelcaspersen
Copy link
Contributor

Simpler to just skip the first bit that does u64-based copy and copy the data byte-by-byte...

I modified your function to:

void memcpy_var(void *to, void *from, __u16 len) {
	__u8 *t8 = to, *f8 = from;
	int i;

	for (i = 0; i < len; i++, len--)
		*t8++ = *f8++;
}

Same error - misaligned stack access.

@tohojo
Copy link
Member Author

tohojo commented Oct 21, 2021 via email

@yoelcaspersen
Copy link
Contributor

Hmm, that sounds like the compiler is being "helpful" and optimising the access. Could you please post the full verifier log?

Sure, you'll find it here:

2021-10-21 Verifier log - BNG router.txt

@tohojo
Copy link
Member Author

tohojo commented Oct 21, 2021 via email

@yoelcaspersen
Copy link
Contributor

that __it_set() call looks like it's coming from __bpf_memzero(), so I think maybe it's not due to the memcpy_var()?

It could be the memset() that follows the copy operation:

	if (sizeof (option.circuit_id.val) == sizeof (buf)) {
		memcpy_var(option.circuit_id.val, buf, sizeof (buf));
	}

	/* Initialize remote ID */
	memset(option.remote_id.val, 0, sizeof (option.remote_id.val));

When I disable the memset() statement, the verifier accepts memcpy_var(), but memcpy_var() copies only the first 8 bytes of buf.

When I enable it, the verifier complains about misaligned stack access - unless I replace memcpy_var() with memcpy(). Then it works as expected.

@yoelcaspersen
Copy link
Contributor

but memcpy_var() copies only the first 8 bytes of buf.

I fixed this by removing len-- from the loop in memcpy_var().

And now I can copy part of the buffer:

memcpy_var(option.circuit_id.val, buf + i, sizeof (buf) - i);

But I still cannot initialize the option.remove_id.val with memset - the verifier complains about misaligned stack access:

memset(option.remote_id.val, 0, sizeof (option.remote_id.val));

If I remove the memset statement, the XDP program works, but I guess option.remote_id.val should be initialized - otherwise it will contain random stack memory, right?

@tohojo
Copy link
Member Author

tohojo commented Oct 21, 2021

if (sizeof (option.circuit_id.val) == sizeof (buf)) {
memcpy_var(option.circuit_id.val, buf, sizeof (buf));
}

This is length is still constant at compile time, so you don't actually need memcpy_var() for it, and also you're not cutting off the unused part of the buffer (which was the point of this whole exercise, wasn't it? :)).

If I remove the memset statement, the XDP program works, but I guess option.remote_id.val should be initialized - otherwise it will contain random stack memory, right?

Could you please open a pull request with the entire code tidbit? It's a bit hard to tell what's what with these small snippets...

@yoelcaspersen
Copy link
Contributor

This is length is still constant at compile time, so you don't actually need memcpy_var() for it, and also you're not cutting off the unused part of the buffer (which was the point of this whole exercise, wasn't it? :)).

My point was to show that the same code that works with memcpy() fails with memcpy_var() - although it should be doing exactly the same.

Could you please open a pull request with the entire code tidbit?

Done. I found out that I could build a custom memset_var() function that passes the verifier checks. I suppose it is less efficient, but it seems to work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants