
gRPC usage in gems and our webstack

Oliver is trying to understand our stack. When I read up on gRPC, I get the idea of a “remote procedure call”: your program can run procedures in processes on remote machines (or on the same machine), and those processes can fail badly without killing your main program. I can’t quite make the connection from that to what Lachele has written, so here are some questions and answers:

gRPC has multiple purposes here. We use it independently in multiple places; Oliver wasn’t getting this at first and thought we had one gRPC doing everything, which is not the case. The gRPC-related code is in multiple places:

V_2/GRPC // Docker files for generating the gRPC container. No gRPC code here.
V_2/Web_Programs/gems/gRPC // For submitting to clusters. Slurm has no means for submitting from another machine.
V_2/Web_Programs/GRPC/JSON // The actual gRPC code for the stack is in here.

Note: There is also a BatchCompute folder in V_2/Web_Programs/GRPC/, at the same level as JSON. Lachele is not sure of its current status. Oliver found a BatchCompute submodule elsewhere in gems that is being used, so maybe this folder was an old idea (or it’s being used from there)?

OK, now some Q&A from Lachele on what gRPC does:

  1. Stop direct access to gems/gmml by Django. The Django container should have no direct access to GEMS. I see that GEMS is mounted into the container right now, and it should not be; I will try to figure out why. Django’s only access to GEMS should be through gRPC. Every interaction over gRPC must be explicitly defined, so if gRPC doesn’t have a way to do something, it can’t be done.
  2. Ensure that segfaults and such do not kill the website. A gRPC process can segfault and the website is still OK. If the website code imports a module from GEMS, and GEMS imports something from GMML, everything runs in one process and the operating system has no way to draw boundaries between them. From the OS point of view, the website is doing it all. The segfault comes from the OS, not the code: the OS sees a process ID, and that process gets killed. It’s possible that newer gRPC handles that situation automatically, but it did not when I first ran into this problem.
  3. Catch any errors from gems/gmml and package them nicely for the website. I’m talking about ensuring an API contract: gRPC always gives back a proper JSON object to the website. That significantly eases the coding and knowledge burden for the website.
  4. Provide access to services that are not daemons. Neither GEMS nor Slurm is a daemon listening on a port. This is essentially why gRPC exists here: to provide that sort of access. We could probably rewrite GEMS to run in a ‘daemon mode’, but we can also just use gRPC.
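
To make points 1 to 3 a bit more concrete, here is a minimal Python sketch of the idea. It is not our actual gRPC code and does not use the `grpc` package; it only uses the standard library to show the process boundary. The function `handle_request`, the `ALLOWED_OPERATIONS` table, and the `echo` and `crash` operations are all made up for illustration and stand in for whatever GEMS entry points the real service exposes.

```python
import json
import subprocess
import sys

# Hypothetical, explicitly defined operations (stand-ins for GEMS entry points).
# Each value is Python code that runs in its OWN process, not in the caller's.
ALLOWED_OPERATIONS = {
    # Reads a JSON payload from stdin and echoes it back as JSON.
    "echo": "import sys, json; print(json.dumps({'echo': json.load(sys.stdin)}))",
    # Simulates a segfault in native code by sending SIGSEGV to itself.
    "crash": "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
}


def error(reason, **extra):
    """Build the error half of the JSON contract."""
    return json.dumps({"status": "error", "reason": reason, **extra})


def handle_request(request_json):
    """Run one allowed operation in a child process and always return JSON."""
    try:
        request = json.loads(request_json)
        operation = request["operation"]
    except (ValueError, KeyError, TypeError):
        return error("malformed request")

    # Point 1: if the operation is not explicitly defined, it cannot be done.
    if not isinstance(operation, str) or operation not in ALLOWED_OPERATIONS:
        return error("undefined operation")

    # Point 2: the work happens in a separate process, so a crash there
    # kills the child's process ID, not the caller's.
    try:
        proc = subprocess.run(
            [sys.executable, "-c", ALLOWED_OPERATIONS[operation]],
            input=json.dumps(request.get("payload")),
            capture_output=True,
            text=True,
            timeout=30,
        )
    except subprocess.TimeoutExpired:
        return error("worker timed out")

    # Point 3: whatever went wrong, the caller still gets a JSON object.
    if proc.returncode != 0:
        return error("worker failed", returncode=proc.returncode)
    try:
        result = json.loads(proc.stdout)
    except ValueError:
        return error("worker returned non-JSON output")
    return json.dumps({"status": "ok", "result": result})


if __name__ == "__main__":
    print(handle_request('{"operation": "echo", "payload": {"x": 1}}'))
    print(handle_request('{"operation": "crash"}'))
    print(handle_request('{"operation": "something_else"}'))
```

In this sketch the `crash` request comes back as an ordinary error object, and the calling program keeps running; on POSIX systems the reported `returncode` is negative when the child was killed by a signal. The real stack gets the same separation by putting GEMS behind a gRPC service instead of importing it into the Django process.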
