Based on Ceph v16.2.5
Architecture
The Objecter is how Ceph clients, i.e. OSDC, package all their requests. It works asynchronously in a callback fashion: Context* objects, which are just Ceph’s own implementation of std::function, are associated with parameterized requests (Op). The Op is sent to the responsible OSD via Messenger, and when the reply is received, Messenger kicks off Objecter::ms_dispatch(), where the callbacks get executed.
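The callback pattern described above can be sketched roughly as follows. This is a toy model, not Ceph code: ToyOp, ToyObjecter, submit() and dispatch_reply() are hypothetical names, and std::function stands in for Context*.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a parameterized request; not Ceph's real Op.
struct ToyOp {
  std::string object;                  // target object name
  std::function<void(int)> on_finish;  // plays the role of Context*
};

// Hypothetical stand-in for the Objecter: tracks in-flight ops by id.
class ToyObjecter {
 public:
  // Register the op under a fresh transaction id, as if it had been sent.
  std::uint64_t submit(ToyOp op) {
    const std::uint64_t tid = ++last_tid_;
    inflight_.emplace(tid, std::move(op));
    return tid;
  }

  // Called by the transport when a reply for `tid` arrives.
  // Returns false if no such op is in flight.
  bool dispatch_reply(std::uint64_t tid, int rval) {
    auto it = inflight_.find(tid);
    if (it == inflight_.end()) return false;
    auto callback = std::move(it->second.on_finish);
    inflight_.erase(it);
    if (callback) callback(rval);  // the callback runs on reply, not on submit
    return true;
  }

  std::size_t inflight() const { return inflight_.size(); }

 private:
  std::uint64_t last_tid_ = 0;
  std::map<std::uint64_t, ToyOp> inflight_;
};

// Submits one op, delivers its reply, and returns the value the callback saw.
inline int demo_callback_roundtrip() {
  ToyObjecter objecter;
  int seen = -1;
  const auto tid = objecter.submit({"obj", [&seen](int r) { seen = r; }});
  assert(objecter.inflight() == 1);
  objecter.dispatch_reply(tid, 0);
  return seen;
}
```

The point of the sketch is only the ordering: submit() returns immediately, and the completion runs later from the dispatch path.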
Source
src/osdc/Objecter.(h|cc)
Parameterizing
Once an ObjectOperation is initialized, users may call its member functions to stuff requests into it.
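As a rough illustration of "stuffing requests" into an operation object, the following toy builder accumulates sub-operations in order. ToyObjectOperation, ToySubOp, and the method names here are hypothetical and only mimic the idea; the real ObjectOperation in src/osdc/Objecter.h has its own members and encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one sub-operation inside a compound request.
struct ToySubOp {
  std::string name;     // e.g. "writefull"
  std::string payload;  // input data for this sub-operation
};

// Hypothetical builder: each member function appends one sub-operation.
class ToyObjectOperation {
 public:
  ToyObjectOperation& write_full(std::string data) {
    ops_.push_back({"writefull", std::move(data)});
    return *this;
  }

  ToyObjectOperation& set_xattr(const std::string& key,
                                const std::string& value) {
    ops_.push_back({"setxattr", key + "=" + value});
    return *this;
  }

  std::size_t size() const { return ops_.size(); }
  const std::vector<ToySubOp>& ops() const { return ops_; }

 private:
  std::vector<ToySubOp> ops_;
};

// Builds an operation with two sub-operations and returns their count.
inline std::size_t demo_build_operation() {
  ToyObjectOperation op;
  op.write_full("hello").set_xattr("owner", "alice");
  assert(op.ops().front().name == "writefull");
  return op.size();
}
```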
Handling
OSDC sending request (exemplified by rados_write_full()):
- op_submit()
  - take Objecter lock
  - _op_submit_with_budget()
    - throttling?
    - set timeout callback
    - _op_submit()
      - calculate target OSDs, and get OSDSession
      - _send_op_account()
        - inflight OSD requests counter: inflight_ops++
        - pending completions counter: num_in_flight++
        - performance counters
      - _session_op_assign(): assign OSDSession to Op
      - _send_op()
        - convert Objecter request Op to Messenger request MOSDOp
        - send MOSDOp through OSDSession
      - unlock, release OSDSession
Messenger on the OSD gets the request and processes it.
OSD handling request:
Messenger on the OSDC listens for incoming responses.
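The submit path above can be approximated with a small model: take a lock, pick a target, account for the op, assign it to a session, and convert it into a message. Everything here is an assumption for illustration (ToySubmitter, ToySession, ToyMessage, and the byte-sum placement in calc_target()); real Ceph computes placement with CRUSH and sends through Messenger.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical wire message, standing in for MOSDOp.
struct ToyMessage {
  std::uint64_t tid;
  int osd;
  std::string payload;
};

// Hypothetical per-OSD session, standing in for OSDSession.
struct ToySession {
  std::map<std::uint64_t, std::string> ops;  // ops assigned to this session
  std::vector<ToyMessage> sent;              // stands in for the network
};

class ToySubmitter {
 public:
  explicit ToySubmitter(int num_osds) : num_osds_(num_osds) {}

  std::uint64_t op_submit(const std::string& object, const std::string& data) {
    std::lock_guard<std::mutex> guard(lock_);            // take the lock
    const int osd = calc_target(object);                 // pick the target OSD
    ToySession& session = sessions_[osd];                // get its session
    ++inflight_ops_;                                     // accounting
    const std::uint64_t tid = ++last_tid_;
    session.ops[tid] = data;                             // assign op to session
    session.sent.push_back(ToyMessage{tid, osd, data});  // convert and send
    return tid;
  }  // the lock is released when guard goes out of scope

  // Unsynchronized reads; acceptable only in this single-threaded demo.
  int inflight_ops() const { return inflight_ops_; }

  std::size_t messages_sent() const {
    std::size_t n = 0;
    for (const auto& entry : sessions_) n += entry.second.sent.size();
    return n;
  }

 private:
  // Placeholder placement: sum of bytes modulo the OSD count (not CRUSH).
  int calc_target(const std::string& object) const {
    unsigned sum = 0;
    for (unsigned char c : object) sum += c;
    return static_cast<int>(sum % static_cast<unsigned>(num_osds_));
  }

  int num_osds_;
  std::mutex lock_;
  std::uint64_t last_tid_ = 0;
  int inflight_ops_ = 0;
  std::map<int, ToySession> sessions_;
};

// Submits two ops and returns the in-flight counter.
inline int demo_submit() {
  ToySubmitter submitter(3);
  submitter.op_submit("obj-a", "hello");
  submitter.op_submit("obj-b", "world");
  assert(submitter.messages_sent() == 2);
  return submitter.inflight_ops();
}
```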
Ending
On successful return
Objecter::ms_dispatch() overriding Dispatcher
- if CEPH_MSG_OSD_OPREPLY, do handle_osd_op_reply()
  - some context validity checking
  - op->trace.event("osd op reply") for zipkin trace
  - re-_op_submit() if the reply asks for retry / redirect / -EAGAIN
  - copy the return data fields out_(bl/rval/ec) from the received MOSDOpReply to the local Objecter::Op, and call out_handler
    - out_bl pointers in Objecter::Op will be forced to point to the corresponding received OSDOp::outdata
    - rval and ec will be converted to the corresponding host OS error codes from the received OSDOp::rval
    - out_handler will be executed, all calling parameters are provided by the received OSDOp
  - num_in_flight-- if any callback
  - log l_osdc_op_reply
  - (get OSDSession lock and) _finish_op()
  - do callback
  - release MOSDOpReply
- … (handlers for several other types of Messages) …
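A reduced model of the reply path may help: validate the reply, copy the returned data and return value into the locations the caller registered, decrement the pending-completion counter, and run the callback. The types below (ToyReply, ToyOp, ToyReplyHandler) are hypothetical; retries, redirects, and error-code conversion are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical reply message, standing in for MOSDOpReply.
struct ToyReply {
  std::uint64_t tid;
  int rval;
  std::string outdata;
};

// Hypothetical op with caller-provided output locations and a callback.
struct ToyOp {
  std::string* out_bl = nullptr;  // where to copy returned data
  int* out_rval = nullptr;        // where to copy the return value
  std::function<void(int)> on_finish;
};

class ToyReplyHandler {
 public:
  void add(std::uint64_t tid, ToyOp op) {
    if (op.on_finish) ++num_in_flight_;  // one pending completion
    ops_.emplace(tid, std::move(op));
  }

  // Returns false if the reply does not match a known op (validity check).
  bool handle_reply(const ToyReply& reply) {
    auto it = ops_.find(reply.tid);
    if (it == ops_.end()) return false;
    ToyOp op = std::move(it->second);
    ops_.erase(it);                                 // finish the op
    if (op.out_bl) *op.out_bl = reply.outdata;      // copy returned data
    if (op.out_rval) *op.out_rval = reply.rval;     // copy return value
    if (op.on_finish) {
      --num_in_flight_;
      op.on_finish(reply.rval);                     // do the callback
    }
    return true;
  }

  int num_in_flight() const { return num_in_flight_; }

 private:
  std::map<std::uint64_t, ToyOp> ops_;
  int num_in_flight_ = 0;
};

// Registers one op, handles its reply, and returns the copied data.
inline std::string demo_reply() {
  ToyReplyHandler handler;
  std::string data;
  int rval = -1;
  int callback_result = -1;
  ToyOp op;
  op.out_bl = &data;
  op.out_rval = &rval;
  op.on_finish = [&callback_result](int r) { callback_result = r; };
  handler.add(7, std::move(op));
  const bool handled = handler.handle_reply(ToyReply{7, 0, "payload"});
  assert(handled && rval == 0 && callback_result == 0);
  assert(handler.num_in_flight() == 0);
  assert(!handler.handle_reply(ToyReply{7, 0, "late"}));  // already finished
  return data;
}
```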
On timeout
op_cancel(tid, -ETIMEDOUT)
- num_in_flight-- and execute the associated callback with -ETIMEDOUT
- _op_cancel_map_check()
  - erase from check_latest_map_ops
    - Ops that were waiting for the latest OSD map were pushed into this map during _op_submit() with _send_op_map_check(), if _calc_target() determines check_for_latest_map is true.
- _finish_op()
  - put_op_budget_bytes(): return budget to throttler
  - erase from timeout pool
  - _session_op_remove(): release OSDSession ref counter
  - inflight_ops-- (and l_osdc_op_active in logger)
  - release Op ref counter
The timeout case could be a compact implementation of properly ending and releasing an Op. We may end up just using op_cancel(tid, 0) instead of reimplementing everything in handle_osd_op_reply().
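The idea of one shared teardown routine serving both the reply and the timeout case can be sketched as below. This is a toy under stated assumptions: ToyTracker and its members are hypothetical, the cancel path simply passes a caller-chosen code (here -ETIMEDOUT from <cerrno>) to the same finish() routine, and budgets, sessions, and map checks are omitted.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical tracker in which replies and cancellations share one teardown.
class ToyTracker {
 public:
  std::uint64_t submit(std::function<void(int)> on_finish) {
    const std::uint64_t tid = ++last_tid_;
    ops_.emplace(tid, std::move(on_finish));
    ++inflight_ops_;
    return tid;
  }

  // Normal completion: the result code comes from the reply.
  bool on_reply(std::uint64_t tid, int rval) { return finish(tid, rval); }

  // Cancellation (e.g. on timeout): the result code is chosen by the caller.
  bool cancel(std::uint64_t tid, int code) { return finish(tid, code); }

  int inflight_ops() const { return inflight_ops_; }

 private:
  // Single teardown path: remove the op, fix the counter, run the callback.
  bool finish(std::uint64_t tid, int code) {
    auto it = ops_.find(tid);
    if (it == ops_.end()) return false;
    auto callback = std::move(it->second);
    ops_.erase(it);
    --inflight_ops_;
    if (callback) callback(code);
    return true;
  }

  std::uint64_t last_tid_ = 0;
  int inflight_ops_ = 0;
  std::map<std::uint64_t, std::function<void(int)>> ops_;
};

// One op completes normally, the other is cancelled as if it timed out.
inline std::vector<int> demo_cancel() {
  ToyTracker tracker;
  std::vector<int> results;
  auto record = [&results](int r) { results.push_back(r); };
  const auto replied = tracker.submit(record);
  const auto timed_out = tracker.submit(record);
  tracker.on_reply(replied, 0);
  tracker.cancel(timed_out, -ETIMEDOUT);
  assert(tracker.inflight_ops() == 0);
  return results;
}
```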