Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] (valgrind findings) Uninitialized memory use #72

Open
ghost opened this issue May 1, 2022 · 6 comments
Open

[bug] (valgrind findings) Uninitialized memory use #72

ghost opened this issue May 1, 2022 · 6 comments
Labels
◼ Type: Bug This issue relates to an encountered bug.

Comments

@ghost
Copy link

ghost commented May 1, 2022

🐛 BUG REPORT

Ran a quick valgrind on the xcash-dpops testnet branch. It popped up a few uninitialized memory findings I thought I'd share here.

Mostly just issues such as an attempt to delete a poll with this uninitialized epoll_ctl structure here, here, here, here:
==856572== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==856572== at 0x4AD2B2E: epoll_ctl (syscall-template.S:78)
==856572== by 0x26733F: new_socket_thread (server_functions.c:211)
==856572== by 0x26A7A1: socket_receive_data_thread (server_functions.c:923)
==856572== by 0x497B608: start_thread (pthread_create.c:477)
==856572== by 0x4AD2162: clone (clone.S:95)

And this socket close call on an unitialized structure from here, here:
==856572== Syscall param close(fd) contains uninitialised byte(s)
==856572== at 0x49863FB: close (close.c:27)
==856572== by 0x25E9F4: get_delegates_online_status (block_verifiers_update_functions.c:1300)
==856572== by 0x224C2E: start_new_round (block_verifiers_functions.c:115)
==856572== by 0x24791E: current_block_height_timer_thread (block_verifiers_thread_server_functions.c:186)
==856572== by 0x497B608: start_thread (pthread_create.c:477)
==856572== by 0x4AD2162: clone (clone.S:95)

And a couple missed stack buffer initializations here (missed the 'data' initialization) and here (missed the 'data3' initialization):
==856572== Conditional jump or move depends on uninitialised value(s)
==856572== at 0x25E555: get_delegates_online_status (block_verifiers_update_functions.c:1238)
==856572== by 0x224C2E: start_new_round (block_verifiers_functions.c:115)
==856572== by 0x24791E: current_block_height_timer_thread (block_verifiers_thread_server_functions.c:186)
==856572== by 0x497B608: start_thread (pthread_create.c:477)
==856572== by 0x4AD2162: clone (clone.S:95)
And here and here:
==900800== Conditional jump or move depends on uninitialised value(s)
==900800== at 0x483EFA9: __strlen_sse2 (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==900800== by 0x1E6739: parse_json_data (string_functions.c:57)
==900800== by 0x22418E: server_receive_data_socket_node_to_node (block_verifiers_server_functions.c:415)
==900800== by 0x269E2E: socket_thread (server_functions.c:856)
==900800== by 0x26A923: socket_receive_data_thread (server_functions.c:932)
==900800== by 0x497B608: start_thread (pthread_create.c:477)
==900800== by 0x4AD2162: clone (clone.S:95)
And here:
==901150== Conditional jump or move depends on uninitialised value(s)
==901150== at 0x483EFA9: __strlen_sse2 (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==901150== by 0x1EA987: get_delegates_total_voters_count (shared_delegate_website_thread_server_functions.c:241)
==901150== by 0x26B5A6: block_verifiers_add_reserve_proof_check_if_data_is_valid (delegate_server_functions.c:146)
==901150== by 0x26C008: server_receive_data_socket_node_to_block_verifiers_add_reserve_proof (delegate_server_functions.c:353)
==901150== by 0x269A32: socket_thread (server_functions.c:753)
==901150== by 0x26A93C: socket_receive_data_thread (server_functions.c:932)
==901150== by 0x497B608: start_thread (pthread_create.c:477)
==901150== by 0x4AD2162: clone (clone.S:95)

Also valgrind doesn't seem to like the individual structure member approach taken here:

  // initialize the reserve_proof struct
  memset(reserve_proof.block_verifier_public_address,0,sizeof(reserve_proof.block_verifier_public_address));
  memset(reserve_proof.public_address_created_reserve_proof,0,sizeof(reserve_proof.public_address_created_reserve_proof));
  memset(reserve_proof.public_address_voted_for,0,sizeof(reserve_proof.public_address_voted_for));
  memset(reserve_proof.reserve_proof_amount,0,sizeof(reserve_proof.reserve_proof_amount));
  memset(reserve_proof.reserve_proof,0,sizeof(reserve_proof.reserve_proof));

I would probably just clear it in one fell swoop:

  memset(&reserve_proof,0,sizeof(reserve_proof));

Expected Behavior
Memory should be initialized before referenced.

Current Behavior
Some missed initializations could potentially result in wrong file handles being closed. Also that one stack buffer could wind up with random data appended to the string. Maybe...

Possible Solution
valgrind seemed happy with a few simple memsets. i.e.:

memset(events,0,sizeof(events));
memset(block_verifiers_send_data_socket,0,sizeof(block_verifiers_send_data_socket));
memset(data,0,sizeof(data));
etc...

I also recommend wrapping the EPOLL_CTL_DEL and the close() calls in a check for a valid socket as such:

   for (count = 0; count < TOTAL_BLOCK_VERIFIERS; count++)
   {
-    epoll_ctl(epoll_fd_copy, EPOLL_CTL_DEL, block_verifiers_send_data_socket[count].socket, &events[count]);
-    close(block_verifiers_send_data_socket[count].socket);
+    if (block_verifiers_send_data_socket[count].socket > 0)
+    {
+      epoll_ctl(epoll_fd_copy, EPOLL_CTL_DEL, block_verifiers_send_data_socket[count].socket, &events[count]);
+      close(block_verifiers_send_data_socket[count].socket);
+    }
   }

Steps to Reproduce (for bugs)
run xcash-dpops under valgrind. I used this in my testnet machine /usr/lib/systemd/system/xcash-dpops.service:

ExecStart=/usr/bin/valgrind --max-stackframe=2100800 --num-callers=20 --leak-check=full --show-leak-kinds=all --track-origins=yes --verbose --log-file=/tmp/valgrind-dpops-out-%%p-%%n.val /home/xcash_user/xcash-official/xcash-dpops/build/xcash-dpops --block-verifiers-secret-key ***secret***  --shared-delegates-website --fee 12.345678 --minimum-amount 10000 --start-time 122 0 8 16 56

Context

Your Environment

  • Version used: latest testnet build
  • Server type and version: Crappy discount Contabo dedicated server :)
  • Operating System and version: Ubuntu 20.04.4 LTS
@ghost ghost added the ◼ Type: Bug This issue relates to an encountered bug. label May 1, 2022
@zachhildreth
Copy link
Member

Thanks a lot for this @CygnusMiner
I will get to looking at it, and applying fixes or asking questions

do you mind running one on the remote-data branch as well and posting any new findings

this way we can leave this issue open while we switch on the testnet to rd and close it when done

Thanks!

@ghost
Copy link
Author

ghost commented May 1, 2022

Sure. I'm attaching a full valgrind file (prior to any memory leak logging since this is an uninitialized memory defect). It does the initialization checking prior to gathering memory leaks (which really clutter up the file) so I can catch it here.

The initialization warnings start at line 206: (edit: .zip files are scary when downloading. Re-uploading it as normal .txt file instead)
valgrind-remote-data.txt

This is a valgrind run on a debug build of a clone of xcash-dpops with a git checkout remote-data using the following flags:

/usr/bin/valgrind --max-stackframe=2100800 --num-callers=20 --leak-check=full --show-leak-kinds=all --track-origins=yes --verbose

zachhildreth added a commit that referenced this issue May 2, 2022
Fix various community memory issues in #72
@zachhildreth
Copy link
Member

zachhildreth commented May 2, 2022

I looked over and fixed all but the epoll struct initializes since it does not say that in the documentation is necessary.

I will look at the attached file next

still need to test on the internal testnet

@ghost
Copy link
Author

ghost commented May 2, 2022

Thanks Zach! The latest clone of the remote-data branch looks good. The only outstanding uninitialized errors are the Syscall param epoll_ctl(event) points to uninitialised byte(s) and Googling that seems to show a consensus that it's a valgrind false positive. I would say this issue has been addressed.

@zachhildreth
Copy link
Member

oh does the txt file not show any more?

@ghost
Copy link
Author

ghost commented May 2, 2022

Not that I can tell. It seemed to show the same findings as what I got from the dpops-test branch which is what I created the first defect against. I believe you've addressed all of the outstanding uninitialized findings thus far.

I can't guarantee that all have been hit, though. It seems valgrind is slowing my dpops down enough that it cant make it past a simple reserve bytes check...

Checking reserve bytes
The database is not synced with the majority of block verifiers, syncing the database from a random block verifier that is in the majority
Syncing the reserve bytes database
Getting the database data from dpops-test-4.xcash.foundation
Could not receive data from dpops-test-4.xcash.foundation
Connecting to another block verifier
Getting the database data from franckiki2.ddns.net
Could not receive data from franckiki2.ddns.net
Connecting to another block verifier
Getting the database data from 5.255.102.225
Could not receive data from 5.255.102.225
Connecting to another block verifier
Getting the database data from testnode-nl.xcash.rocks
Could not receive data from testnode-nl.xcash.rocks
Connecting to another block verifier
Getting the database data from umxnet.ursamajor.eu
Could not receive data from umxnet.ursamajor.eu
Connecting to another block verifier
Getting the database data from 185.215.167.107
Could not receive data from 185.215.167.107
Connecting to another block verifier
etc...

I'm going to have to spend time trying to reduce the overhead. Maybe I'll turn memleak checking off completely...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
◼ Type: Bug This issue relates to an encountered bug.
Projects
None yet
Development

No branches or pull requests

1 participant