-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add functionality to notify systemd on olad startup and config reload #1444
base: master
Are you sure you want to change the base?
Conversation
This helps users who run olad under the management of systemd to order their services after olad startup, (i.e. starting up a daemon that depends on the olad RPC socket being available), avoiding the need for a manual delay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, some comments.
Also should we add MAINPID https://www.freedesktop.org/software/systemd/man/sd_notify.html#MAINPID=%E2%80%A6 like the extended example here https://www.freedesktop.org/software/systemd/man/sd_notify.html#Examples
Finally, should we add STOPPING when triggered via the web UI? https://github.com/OpenLightingProject/ola/blob/master/olad/OlaServer.h#L126
configure.ac
Outdated
@@ -387,6 +387,13 @@ esac | |||
have_libftd2xx="no" | |||
AM_CONDITIONAL([HAVE_LIBFTD2XX], [test "x$have_libftd2xx" = xyes]) | |||
|
|||
# libsystemd | |||
AC_SEARCH_LIBS([sd_notify], [systemd], [have_libsystemd="yes"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you see have_libftd2xx above, you should really initialise it to no before you use it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem. I'll follow the examples from the above location.
What do you think about making linking against libsystemd
a command line option to the build system? I'm thinking build systems that have libsystemd
may silently cause built binaries to link against libsystemd
, and this could lead to a bit of a dependency problem, unless they're aware of this new change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it may make sense, see e.g. https://github.com/OpenLightingProject/ola/blob/master/configure.ac#L494-L518 .
I was wondering, but at least it's only libsystemd, it doesn't require systemd itself.
olad/OlaServer.cpp
Outdated
#ifdef HAVE_LIBSYSTEMD | ||
// Return value is intentionally not checked for both calls. | ||
// See return value section under sd_notify(3). | ||
sd_notify(0, "RELOADING=1\nSTATUS=Reloading plugins\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having read the man page, I understand why it's not checked, but for user sanity, we should add some logging. Probably both to signify sd_notify has been run/the return code and particularly a warning if it failed.
I'd probably suggest OLA_WARN if we get <0 return code, and then maybe OLA_INFO for 0 and OLA_DEBUG for >0, but that may be a bit excessive. Likewise others may argue those log levels should all be reduced by one...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the system is too old to support RELOADING? I assume they just get discarded as they're plain text?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the system is too old to support RELOADING? I assume they just get discarded
as they're plain text?
sd_notify(3)
mentions this: The string may contain any kind of variable assignments, but the following shall be considered well-known
.
I think that does indeed imply that if the service manager doesn't recognize the variable being assigned to, it'll either log that somewhere or ignore that assignment, so we shouldn't have a problem with old systems.
I understand why it's not checked, but for user sanity, we should add some logging. Probably
both to signify sd_notify has been run/the return code and particularly a warning if it failed.
If that's the case, would a message at startup do? libsystemd
provides capabilities for determining whether a binary has been setup under the control of systemd
, and that could be used to indicate that notification messages are being used. As for the failures, though, we could log those as well, but I think conditionally - only when being run under systemd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scratch that, there's no library function to determine whether the service has been run under systemd, but the call shouldn't return a negative value if the socket was not passed, anyway. I'll log failures and log from main to indicate whether notifications are being passed.
olad/OlaServer.cpp
Outdated
OLA_INFO << "Reloading plugins"; | ||
StopPlugins(); | ||
m_plugin_manager->LoadAll(); | ||
#ifdef HAVE_LIBSYSTEMD | ||
sd_notify(0, "READY=1\nSTATUS=Plugin reload complete\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, some logging to help the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
olad/Olad.cpp
Outdated
#if HAVE_LIBSYSTEMD | ||
// Return value is intentionally not checked. See return value section | ||
// under sd_notify(3). | ||
sd_notify(0, "READY=1\nSTATUS=Startup complete\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, a bit of logging please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.. I don't think we need it for this one, though. It could be useful to tell the user that we're not running under systemd
, or that we have failed to access the notification socket due to a configuration error.
However, if olad
was launched as a systemd service of type notify
(which is the intended use case): then systemd
will report that the start of the service failed, anyway, which should alert the user to the error. There also shouldn't be an error accesing the notification socket, because systemd
implicitly allows services with type=notify
to access the notification socket.
A failure looks like this:
● olad.service - ola daemon
Loaded: loaded (/home/shenghao/.config/systemd/user/olad.service; static; vendor preset: enabled)
Active: failed (Result: timeout) since Sat 2018-07-14 11:43:42 +08; 9s ago
Process: 25379 ExecStart=/home/shenghao/Projects/ola/olad/olad (code=exited, status=0/SUCCESS)
Main PID: 25379 (code=exited, status=0/SUCCESS)
Jul 14 11:42:13 shenghao-crbook15 olad[25379]: *** WARNING *** Please fix your application to use the native API of Avahi!
Jul 14 11:42:13 shenghao-crbook15 olad[25379]: *** WARNING *** For more information see <http://0pointer.de/blog/projects/avahi-compat.html>
Jul 14 11:42:13 shenghao-crbook15 lt-olad[25379]: *** WARNING *** For more information see <http://0pointer.de/blog/projects/avahi-compat.html>
Jul 14 11:42:13 shenghao-crbook15 olad[25379]: common/io/IOUtils.cpp:39: open(/dev/dmx0): No such file or directory
Jul 14 11:42:13 shenghao-crbook15 olad[25379]: plugins/opendmx/OpenDmxPlugin.cpp:81: Could not open /dev/dmx0 No such file or directory
Jul 14 11:42:13 shenghao-crbook15 olad[25379]: common/io/IOUtils.cpp:39: open(/dev/kldmx0): No such file or directory
Jul 14 11:42:13 shenghao-crbook15 olad[25379]: plugins/karate/KaratePlugin.cpp:80: Could not open /dev/kldmx0 No such file or directory
Jul 14 11:43:42 shenghao-crbook15 systemd[3896]: olad.service: Start operation timed out. Terminating.
Jul 14 11:43:42 shenghao-crbook15 systemd[3896]: olad.service: Failed with result 'timeout'.
Jul 14 11:43:42 shenghao-crbook15 systemd[3896]: Failed to start ola daemon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's those edge cases that can be frustrating to troubleshoot though. If I forget to put type=notify, I want olad to tell me the socket isn't available, so I know to tweak my systemd config. Likewise in that timeout case, I'd like to know olad did/didn't send the message so I can troubleshoot ifdef type stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I've added this in commit a5e57fe. A success/fail message everytime may clog up the logs if we want to use notify() a little more, though....
If
Hmm... this could be useful, if you do foresee problems terminating the server through the web UI. The user can be notified that we're stopping - and if we get stuck - he'll probably have a better idea why. But I think this place is a bit too low in the call chain, though, since the |
@peternewman is there a way to add some CFLAGS when compiling the sources that need to be linked with On a side note, I could stick a systemd unit file along as well... |
Changes from v1: - Systemd support is now switched in via a flag to the configure script. - Failures to notify systemd are logged, but only when the notification socket is actually passed. - If the socket is not passed, a warning is displayed. - Added a utility library for the sd_notify calls.
- License error in both files, LGPL -> GPL - Wrong usage of @code in Systemd.h, should be @c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, a few more comments
olad/NotifySystemd.cpp
Outdated
|
||
namespace ola { | ||
|
||
int notify_systemd(int unset_environment, const char *state) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We go CamelCase for our method names.
olad/NotifySystemd.cpp
Outdated
return rtn; | ||
} | ||
|
||
bool notify_available() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again
olad/NotifySystemd.cpp
Outdated
if (rtn < 0) { | ||
char buf[1024]; | ||
OLA_WARN << "Error sending notification to systemd: " << | ||
strerror_r(-rtn, buf, sizeof(buf)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you'll only print the error code from strerror_r currently. Looking at https://linux.die.net/man/3/strerror_r it seems you should be able to just do << strerror(-rtn)
.
Can you also move the second << to the next line and align with the one above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I think you should add a configure check for strerror_r (see https://github.com/OpenLightingProject/ola/blob/master/configure.ac#L185 and e.g. https://github.com/OpenLightingProject/ola/blob/master/common/base/Credentials.cpp#L342 ), it's the first time it's being used in the codebase and we run on a variety of weird platforms. See also some similar fun and discussion with readdir_r #1055 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you'll only print the error code from strerror_r currently. Looking at https://linux.die.net/man/3/strerror_r it seems you should be able to just do << strerror(-rtn).
I saw that OLA was compiling with GNU extensions - in that case strerror_r
returns a pointer to either an impelemtation-allocated array or a pointer to the buffer that's passed in. We always have to use the returned pointer to reference the error message. I could mangle the defines a little bit and get the POSIX compatible one, but I'd like to confirm the usage of GNU extensions - I dug through the configure file and found loads of references to std=gnu++XX
instead of std=c++XX
.
Yeah, I'll be able to do a simple strerror()
call, but I'm a little concerned that the other threads could be calling into it at the same time, because (I think) some of the plugins do spawn threads for their own use - I see a bunch of threads under htop
for olad
. Unless they don't ever call strerror()
, then it's unsafe to call strerror()
at this site.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, in which case can we just have a comment for our future selves if we switch away from the GNU extensions in future.
Well currently multiple plugins call strerror (and some do indeed spawn their own threads, at least one call is in a spawned thread), so we may have that issue already.
We should probably be moving to strerror_r throughout, but equally I wouldn't want this one usage to break the build on machines that don't have it, or by adding it to configure stop it building on those OSes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, in which case can we just have a comment for our future selves if we switch away from the GNU extensions in future.
Sure, alright.
We should probably be moving to strerror_r throughout, but equally I wouldn't want this one usage to break the build on machines that don't have it, or by adding it to configure stop it building on those OSes.
Any modern system with systemd
should have this - deliberately using strerror()
feels pretty icky. Maybe we can take a page from nginx
: they call strerror()
in a section where it is known that no other thread can call strerror()
and caches that for use.
Have a look at their implementation: https://github.com/nginx/nginx/blob/f8a9d528df92c7634088e575e5c3d63a1d4ab8ea/src/os/unix/ngx_errno.c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 469361d
configure.ac
Outdated
AC_CHECK_HEADER([systemd/sd-daemon.h], [], [have_libsystemd="no"])]) | ||
|
||
AS_IF([test "x$have_libsystemd" = xyes], | ||
[AC_DEFINE([HAVE_LIBSYSTEMD], [1], [define if systemd support wanted])]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'd probably say this is if systemd support present (or present and wanted).
configure.ac
Outdated
# systemd support | ||
AC_ARG_ENABLE( | ||
[systemd], | ||
[AS_HELP_STRING([--enable-systemd], [Enable systemd support])]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want to negate this, so it's turned on by default where available (or no-one will realise it exists/use it). See e.g. https://github.com/OpenLightingProject/ola/blob/master/configure.ac#L573-L589 and don't forget to update the AS_IF below too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eek.. an automatic dependency feels a little weird to me, unless we put it out for people to know. If we put the existence of this option in NEWS
(which is probably where it will end up in) and you suspect:
no-one will realise it exists/use it
then, if we put in a warning that libsystemd
is going to become an automatic dependency, I kinda also think that
no-one will realise it exists/use it
until bug reports come in regarding libsystemd
not being found on non-build systems...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, we have the best of both worlds with our plugins, a literal auto mode, so by default if the bits are there, you get your plugin and if they aren't you don't, but you can also force it on (so it complains if dependencies are missing) or force it off (so it doesn't use it even if present).
You may be right about bug reports, I guess I'd tend towards the nudge theory, if we turn it on, then people who read the docs will turn it off easily, those who don't will ideally search or file a bug/ask on the mailing list and there's a workaround.
If we leave it off, we should at least enable it in our Debian build config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, alright. Let's default it to enabled. We're going to be playing devil's advocate (to some people out there) and silently promote systemd
adoption...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in commit 83807d4
olad/Makefile.mk
Outdated
@@ -42,6 +42,11 @@ ola_server_sources += olad/HttpServerActions.cpp \ | |||
ola_server_additional_libs += common/http/libolahttp.la | |||
endif | |||
|
|||
if HAVE_LIBSYSTEMD | |||
ola_server_sources += olad/NotifySystemd.cpp olad/Systemd.h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally match .h and .cpp filenames.
olad/Systemd.h
Outdated
/* | ||
* @brief Notify systemd about daemon state changes. | ||
* | ||
* This function logs on failures to queue notifications, but only if the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SPaG "logs failures to queue" or "logs on failure to queue"
olad/OlaServer.cpp
Outdated
@@ -472,9 +476,15 @@ bool OlaServer::InternalNewConnection( | |||
} | |||
|
|||
void OlaServer::ReloadPluginsInternal() { | |||
#ifdef HAVE_LIBSYSTEMD | |||
ola::notify_systemd(0, "RELOADING=1\nSTATUS=Reloading plugins\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the socket have gone away by the time this gets fired though? Potentially minutes or hours/days later?
My previous comment was more along the lines of if I'm trying to troubleshoot why my reload isn't getting to systemd, I'd want to see (at info/debug level) the positive return code from calling sd_notify, so I know it's the systemd end I need to look at, and not OLA, or vice versa.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the socket have gone away by the time this gets fired though? Potentially minutes or hours/days later?
Well, I'm not sure that this could even happen, unless someone removes the environment variable. The socket is actually passed as a socket address (path name or name in abstract namespace, a-la unix domain sockets), and sd_notify
establishes a new socket sending data to that address, closing it on every return.
If systemd
did somehow... fail, I think the whole system would be going up in flames, by then. But if it did happen and nobody is listening on the socket address, then the sendmsg()
call in sd_notify()
will fail and the error will be logged through OLA_WARN
.
The only way for the notify_systemd
call to fail silently is the removal of the NOTIFY_SOCKET
environment variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, fair enough.
- Scenarios in which HAVE_LIBSYSTEMD is defined is made clearer. - NotifySystemd.cpp and Systemd.h modified to match existing code style. - NotifySystemd.cpp renamed so header and implementation filenames match. - Call sites for functions declared in Systemd.h modified to use new names.
- Added error messages to be output whenever dependent functionality / files are not present, to avoid silent failures. - Added additional check for strerror_r's presence to guard against ancient systems.
olad/Systemd.h
Outdated
* @brief Tests whether the systemd notification socket is available. | ||
* @return @c true if the socket is available, @c false if not. | ||
*/ | ||
bool NotifyAvailable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be clearer if this had Systemd in it's name, perhaps SystemdNotifyAvailable for this function and then rename the other one to SystemdNotify. It's then a closer match to the sd_notify it's wrapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in commit b7afd38
- NotifySystemd() and NotifyAvailable() modified to better match the naming style of the functions provided by libsystemd. - Call sites modified to use new names. - Log messages emitted to inform the user about the presence of the notification socket has been changed to make it clear that systemd only provides the application with the address of the notification socket. Systemd does not have the process inherit a file descriptor to use when sending notifications.
- Most system functions return an integer value indicating success or failure, and the usage of strerror_r may seem that we are writing that value out to the log. - However, olad is compiled using GNU extensions, which means that the return value from strerror_r is a pointer to the description for the particular error value passed in. - This is a little bit unconventional, and the comment was added to clarify that this is only the case because of a non-standard GNU extension.
- This is to ensure that compilation will not be dependent on GNU extensions being enabled. - A new source file has to be added, because one cannot simply undef _GNU_SOURCE in Systemd.cpp - it will cause the a failure to compile - something in the logging framework depends on GNU extensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments
* @param unset_environment whether to unset the notification socket environment | ||
* variable so child processes cannot utilize it. | ||
* @param state state block to pass to systemd. | ||
* @return value returned from @ref sd_notify() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is actually the right Doxygen command, but I can't see a specific one either:
https://www.stack.nl/~dimitri/doxygen/manual/commands.html#cmdref
olad/Systemd.cpp
Outdated
if (rtn < 0) { | ||
char buf[1024]; | ||
OLA_WARN << "Error sending notification to systemd: "; | ||
if (ola::Strerror_r(-rtn, buf, sizeof(buf))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this if should all be within the function, rather than repeating it multiple times in each bit of code.
Also I think because you're doing multiple OLA_WARNs and it's a macro, I believe this would log multiple lines. Whereas a function returning a string would avoid those issues.
olad/Strerror_r.h
Outdated
/* | ||
* @brief XSI-compliant version of @ref strerror_r() | ||
|
||
* See strerror(3) for more details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@see
is supposed to link to some symbol, though, AFAIK. Not sure what doxygen will do when it tries to link to some man page...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It says "Starts a paragraph where one or more cross-references to classes, functions, methods, variables, files or URL may be specified." So an URL isn't a symbol, although I don't know how it decides if something is a URL or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I guess I could put a link to a manual page in man7.
olad/Olad.cpp
Outdated
@@ -171,6 +175,18 @@ int main(int argc, char *argv[]) { | |||
ola::NewCallback(olad->GetOlaServer(), &ola::OlaServer::ReloadPlugins)); | |||
#endif // _WIN32 | |||
|
|||
#if HAVE_LIBSYSTEMD | |||
if (ola::SystemdNotifyAvailable()) { | |||
OLA_INFO << "Systemd notification socket address present. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably wants to be a comma not a full stop (then adjust the case on the line below.
olad/Olad.cpp
Outdated
OLA_INFO << "Systemd notification socket address present. " | ||
<< "Sending notifications."; | ||
} else { | ||
OLA_WARN << "Systemd notification socket address not present. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again
olad/Strerror_r.cpp
Outdated
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. | ||
* | ||
* Strerror_r.cpp | ||
* Definition of strerror_r that is XSI-compliant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm generally slightly wary about the future maintainability of this, but I guess a pile of ifdef's to switch behaviour based on the detected version might not be any better.
Where did this come from, is there a link for credit/future reference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The macros came from strerror(3).
olad/Strerror_r.cpp
Outdated
@@ -0,0 +1,49 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if these should be in include/ola and common/ola for future use by other bits of the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could do that, but I'd have to add strerror_r
to the main list of functions checked. Probably not a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't remember, are they mandatory functions, or still optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well AC_CHECK_FUNCS
wouldn't terminate if something is not found, so it's alright.
- Strerorr_r wrapper could be useful to elements outside of olad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple more comments
include/ola/base/Strerror_r.h
Outdated
|
||
* See strerror(3) for more details. | ||
*/ | ||
int Strerror_r(int errnum, char* buf, size_t buflen); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call this StrError_R or something, to match ola::strings::StrNCopy
and others.
olad/Systemd.cpp
Outdated
if (rtn < 0) { | ||
char buf[1024]; | ||
if (ola::Strerror_r(-rtn, buf, sizeof(buf))) { | ||
OLA_WARN << "Error sending notification to systemd: errno = " << -rtn; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've now just had to duplicate the "Error sending notification to systemd" text. I still feel the whole return a string or return "errno" and a value should be in the function, as we'll always want that functionality. Or if you want to do a direct wrap of strerror_r, we should also ship a helper function which does what we actually want each time (and then calls the wrapped strerror_r).
Otherwise we have to keep writing this code:
char buf[1024];
if (ola::Strerror_r(-rtn, buf, sizeof(buf))) {
OLA_WARN << "Error sending notification to systemd: errno = " << -rtn;
} else {
OLA_WARN << "Error sending notification to systemd: " << buf;
}
Whereas we should just be able to write something like this (or whatever the function gets called) and get the same output as the above:
OLA_WARN << "Error sending notification to systemd: " << ola::StrError_rHelper(-rtn);
- New wrapper function returns a C++ standard library string, that can be easily passed to one of the logging streams for output, as opposed to requiring the caller to discriminate, based on the return value, what to pass to the logging stream. - Added a constant that exposes the internal buffer size of the wrapper, in case this is required. - Naming for the wrapper functions and their source and header files have been changed to reflect convention.
- In StrError_R(): Assume the happy path where the buffer is sufficient to hold the error message, first, instead of preemptively filling the buffer with the fallback numerical code. We rely on the fallback behaviour only when the buffer has been deemed to be too small. - In StrError_R(): Properly handle negative return values from snprintf(), instead of casting it away without considering the possibility of the value being negative. - Doxygen: generation of documentation produced warnings due to linking against strerror_r(), a symbol that wasn't found by doxygen because string.h wasn't included in StrError_R.h used to generate doc output. Including it wasn't necessary, so the references were dropped and fixed-width text is used to refer to the function instead.
e2d5179
to
1cd1202
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few more comments, I think it's very nearly there now thanks.
common/base/StrError_R.cpp
Outdated
// of digits in the position of the most significant base 10 digit. | ||
// (i.e. 0-2 representable in an 8-bit unsigned integer at the | ||
// hundredth's place instead of 0-9 because an u8 is clamped to | ||
// [0, 255]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just add a note on the end of this line to explicitly clarify that digits10 will therefore only return 2 for uint8. It was only when I read the docs that your comment made sense.
common/base/StrError_R.cpp
Outdated
out.append("<error>"); | ||
// Ugly cast required to avoid compiler warning when comparing | ||
// numbers of different signedness. | ||
} else if (static_cast<unsigned int>(r) >= sizeof(errs)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would casting the sizeof to an int be better? Although there are downsides either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... that could be done, but r is known to be non-negative from this if onwards, but I guess this would be moot if we were to shift to using an ostringstream
.
common/base/StrError_R.cpp
Outdated
// - 1 additional unit of storage just in case a negative errnum is | ||
// passed, and somehow StrError_R_XSI() failed on that. | ||
char errs[std::numeric_limits<int>::digits10 + 3]; | ||
int r(snprintf(errs, sizeof(errs), "%d", errnum)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our style is int r = snprintf(...);
common/base/StrError_R.cpp
Outdated
} else if (static_cast<unsigned int>(r) >= sizeof(errs)) { | ||
out.append("<truncated>"); | ||
} else { | ||
out.append(errs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this have a resize too?
configure.ac
Outdated
AC_MSG_WARN([disabling systemd support: can't find systemd/sd-daemon.h])]) | ||
AS_IF([test "x$ac_cv_func_strerror_r" = "xno"], | ||
[have_libsystemd="no" | ||
AC_MSG_WARN([disabling systemd support: can't find strerror_r])])]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The WARNs mean configure will continue. It might be nice to have a final bit of logic where if "x$enable_systemd" != "xyes"
and "x$have_libsystemd"=="xno"
it throws an error. So if you've explicitly added --enable-systemd to your configure, it bombs out, but if you've done nothing, it just warns and carries on.
include/ola/base/StrError_R.h
Outdated
/** | ||
* @brief XSI-compliant version of \c strerror_r() | ||
* | ||
* See https://linux.die.net/man/3/strerror for more details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned, Doxygen explicitly mentions URLs with @see, so I think doing that and dropping the superflous words around it would make better use of their special styling.
include/ola/base/StrError_R.h
Outdated
* | ||
* @param errnum error value to generate the textual description for. | ||
* @return Textual description of the error value. If the description is | ||
* trucnated due to an insufficient buffer length, the description will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SPaG truncated
common/base/StrError_R.cpp
Outdated
// Pre-allocate to prevent re-allocation. | ||
std::string out(StrError_R_BufSize, '\0'); | ||
if (StrError_R_XSI(errnum, &(out[0]), out.size())) { | ||
out.assign("errno = "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming a stringstream doesn't allow access to a buffer to use it throughout this code? However what's the hit, aside from a bit of memory, with using an ostringstream in the error case, and then appending that to out, see e.g. https://github.com/OpenLightingProject/ola/blob/master/common/base/Version.cpp#L43-L47 . Which would completely avoid all the edge case handling you've had to write.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be done, just didn't feel like dragging the whole setup associated with the stream into something as trivial as representing an integer as text - really looking forward to the availability of to_string
as we (hopefully) move towards C++11
in the future.
* of this buffer minus one, then the output of @ref StrError_R() will only | ||
* include the numerical error value provided. | ||
*/ | ||
extern const int StrError_R_BufSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually static const
, is there a reason you need extern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we static const
here - then every compilation unit that includes this header file would have their own copy of StrError_R_BufSize
. extern
here just lets me allocate storage for it in that one unit and have the other units referencing this variable link to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could #define
stuff.... but I'd want to avoid that route - maybe future refactoring to constexpr
or inline constexpr
in future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One for @nomis52 then as to why we've gone for static const over extern const?
common/base/StrError_R.cpp
Outdated
|
||
namespace ola { | ||
|
||
const int StrError_R_BufSize(1024); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again our style is ...BufSize = 1024;
- Formatting, style, and spelling: - Fixes for wrong spelling in StrError_R.h - Documentation special command used to refer to manual page for strerror_r() instead of merely mentioning the URL for the manual page. - Style changes to initializers for variables defined in StrError_R.cpp, in order to match conventions. - Refactored StrError_R: - Switched to using ostringstream to convert error value to a human-readable number, avoiding complex error handling at the cost of longer execution time, but this path should not be taken for the most part, so any effect should be minimal.
- Configure script will now check if systemd support has been explicitly enabled when it is unable to enable it. - Provides user feedback when his explicit choice fails, instead of failing silently with a warning message that is probably hidden in the mountain of messages the configure script generates.
Got most of them done for now - when ola moves over to C++11 we can clean up a lot more. |
common/base/StrError_R.cpp
Outdated
// Pre-allocate to prevent re-allocation. | ||
std::string out(StrError_R_BufSize, '\0'); | ||
if (StrError_R_XSI(errnum, &(out[0]), out.size())) { | ||
std::ostringstream errs("errno = "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I realised after I said this, I should just get you to use our existing function to do this ola::strings::IntToString :
http://docs.openlighting.org/ola/doc/latest/namespaceola_1_1strings.html#a1c6637c4498627e405cf8c95cef677c7
Yikes. Wish I'd found that out earlier. Will push a commit in a tick.
|
Sorry @shenghaoyang I didn't think of it when I was commenting; TBH I'm used to more grown-up languages where you don't have to hand code such functions! |
Well... |
I was impressed to see the + string concatenation was in C++98! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a couple of minor comments.
Would you mind adding an example systemd file that uses this functionality too please?
@@ -234,6 +234,9 @@ AM_CONDITIONAL(HAVE_EPOLL, test "${ax_cv_have_epoll}" = "yes") | |||
AC_CHECK_FUNCS([kqueue]) | |||
AM_CONDITIONAL(HAVE_KQUEUE, test "${ac_cv_func_kqueue}" = "yes") | |||
|
|||
# strerror_r | |||
AM_CONDITIONAL([HAVE_STRERROR_R], [test "x$ac_cv_func_strerror_r" = "xyes"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can drop the x's and square brackets here, looking at HAVE_KQUEUE just above, which will have had broader OS testing.
* of this buffer minus one, then the output of @ref StrError_R() will only | ||
* include the numerical error value provided. | ||
*/ | ||
extern const int StrError_R_BufSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One for @nomis52 then as to why we've gone for static const over extern const?
Sure. I can use the unit file used for testing. What sort of systemd version are we targeting? Arch is around |
I'd probably aim at Debian Stretch, as that's where our deb package will go first, which is on 232: I guess you could add the latest stuff in another file somewhere or an issue or something (or comments in the file)? Assuming the features aren't backwards compatible/fail safe. But there are Arch packages too, so I don't know how best we offer it to people like them too? |
Okay, Debian probably has some document on packaging items with systemd unit files. I'll go take a look.
I was intending to use
IIRC the Arch package is on the AUR... Maybe we could selectively install the unit file based on the version? I was actually curious about the uaccess mechanisms I saw in a couple udev scripts. How does olad used this to allow itself access to devices? When I wrote the unit file for testing I had systemd add the dynamically created user to a group that had access to those devices. On a side note, writing a unit file opens up a whole can of worms. If you really want to go full ham on the sandboxing and "container-like" options that systemd offers it's probably going to spawn a long debate. I'll just enable a few simple options and users that are concerned could probably drop in an override file and use that to add options instead. |
Indeed it will, e.g. https://wiki.debian.org/Teams/pkg-systemd/Packaging There's probably something in Debian Policy too. You shouldn't need any real privileges, aside from some permissions on the devices for USB dongles etc. That sounds about right, I don't know much about Arch/AUR at all TBH. The udev uaccess stuff came in here: But that's about all I really know about that bit too... I think there's enough of a can of worms between init and systemd anyway, despite different bits of systemd. I don't think the systemd file needs to be completely gold plated, just lint clean for Debian/compliant with their minimum requirements and to allow people to actually make use of the functionality you've added in without them really needing to think for themselves! |
Since most items are tagged using "uaccess", this is the most easy way to run the olad daemon, without creating additional system users. Note that DynamicUsers can be used in future along with a SupplementaryGroups=plugdev in order to implement an olad service that can be spawned off the main systemd process in charge of the whole system. Since this option is not supported in 232, that has not been enabled here.
[RFC] I've added in a systemd unit file for a user instance of systemd. Try copying the unit file from
When you test this. Note that if your Do try testing these few things:
Hopefully the options I've chosen are compatible across most systems. The unit file for running Note that the user who is running this service MUST have a physical seat on If anyone has problems, please post:
and
If everything works out we can copy the unit file during install. Once this whole shebang has been completed the next step is to make Shenghao A side note regarding the user instance of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of comments from my vague understanding of systemd.
Does systemd allow you to depend on networks being up? I think I've been round this loop before with it, but I thought I'd ask.
systemd/olad.service
Outdated
@@ -0,0 +1,30 @@ | |||
[Unit] | |||
Description=Ola Daemon (User mode) | |||
Documentation=https://github.com/OpenLightingProject/ola |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This says you can have multiple documentation ( https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Documentation= ), so I'd also suggest:
http://docs.openlighting.org/ola/conf/
https://www.openlighting.org/ola/
Probably both listed before GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problems. Any other suggestions? I probably ought to write some kind of documentation article showing how this systemd integration is accomplished as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I can immediately think of. Yeah some docs to put on the website or wiki would be good.
I guess also the man pages:
https://docs.openlighting.org/ola/man/
Actually reading the page I linked to, it says "It is a good idea to first reference documentation that explains what the unit's purpose is, followed by how it is configured, followed by any other related documentation.", so it should probably be the website, then the man pages, then conf, then GitHub.
SystemCallArchitectures=native | ||
|
||
# Directory specifications | ||
ConfigurationDirectory="~/.ola/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This varies on Debian when installed via a deb package, it says you can list multiple, but relative:
https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RuntimeDirectory=
So I don't know if it's worth adding /var/lib/ola/conf, or explicitly setting the --config-dir option in olad to the config directory you're listing? Although as your description says user mode, using ~ probably makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I chose not to add that in. Since olad
appears to default to the user configuration directory, it'd be safe to leave it as it is. If we're developing a unit file for a system wide daemon, then some modification would be required (as fitting distribution-specific group and user settings).
Yes, you can, but this is a bit of a mixed bag. There are targets you can add dependencies on to wait for the network, but the loopback interface should be initialized by You can take a look at some of the targets that can be added as dependencies for startup: EDIT: I've noticed the build to be failing quite a bit due to the placement of the files in the |
Some of the plugins (and indeed bits of the core), want our IP for various reasons. Firing off a SIGHUP will reload the plugins and should resolve most such issues.
I assume you mean this: https://travis-ci.org/OpenLightingProject/ola/jobs/427037785#L7207 You need to either add the file to a Makefile (add one in systemd folder), or add it to the ignore list. |
This helps users who run olad under the management of systemd to
order their services after olad startup, (i.e. starting up a daemon
that depends on the olad RPC socket being available), avoiding
the need for a manual delay.
If anyone's versed in autotools magic, please help check the
changes to
configure.ac
. I've only dabbled with autotools oneother time in my life, and I'm not sure if what I did was exactly correct.
Tested on my local machine with a unit file running the olad binary
as a notifying service. Reload messages tested as well. And I do
see the systemd generated reload notifications bounding the
output from olad during the reload process.
Not sure how we go about writing test cases for this, though.
Seems a bit pathological to formally test notification functionality,
but if the CI environment has
systemd
andlibsystemd
installed,we could cook up a test for the notification functionality using
systemd-run
- if the notification didn't go through, thensystemd-run
should notify on an error starting the service.Thanks for reviewing.
Shenghao