DESIGN
======

What follows is my best effort at giving the big ASCII picture
of what happens when `smd-pull` is run. `smd-push` simply
swaps `smd-server` and `smd-client`. Note that the sync direction is
from `smd-server` to `smd-client`, so running them on the opposite hosts
inverts the sync direction.

    Your mail server                     Your laptop
    ----------------                     -----------

                --- sync direction --->  smd-pull
                                             |
                                             |
    smd-server ---------- ssh ---------- smd-client
        |                                    |
        |                                    |
      mddiff                              (mddiff)

`smd-client` uses `mddiff` only to compute sha1 sums, not to compute
a diff as `smd-server` does.

Both endpoints hold a file (the `db-file` described below) that describes the
status of the mailbox on which they previously agreed. The server computes
the difference between the current mailbox status and the previous one on which
the client agreed. This diff is sent to the client, which tries to apply it,
possibly requesting some data from the server. If the client succeeds, both the
server and the client agree on the new mailbox status.

END USER tools
==============

smd-pull and smd-push
---------------------

The idea is quite simple. If `===` is a double pipe (a pair of pipes, one
for `stdin` and one for `stdout`), `smd-pull` simply performs the following:

    smd-client $CLIENTNAME $MAILBOX === tee log === \
        ssh $SERVERNAME smd-server $CLIENTNAME $MAILBOX

The `tee` command is used only for logging, and if $DEBUG is `false` it is
replaced by `cat`. Conversely, `smd-push` performs what follows:

    smd-server $CLIENTNAME $MAILBOX === tee log === \
        ssh $SERVERNAME smd-client $CLIENTNAME $MAILBOX

They are both implemented in `bash`, since their main activity is to
redirect standard file descriptors, call other tools, check their exit
status and possibly notify the user with an extract of their logs.

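`===` is notation, not a bash operator. The Python sketch below shows only the wiring it stands for. It is not the code of the real tools, which are bash scripts. The hypothetical helper `double_pipe` cross-connects two commands with `os.pipe()`, so that each command's standard output feeds the other's standard input. The `tee log` stage is left out.

```python
import os
import subprocess


def double_pipe(cmd_a, cmd_b):
    """Run cmd_a and cmd_b so that each one's stdout is the other's stdin.

    Hypothetical helper illustrating the `===` notation; returns both exit codes.
    """
    a_to_b_read, a_to_b_write = os.pipe()  # cmd_a writes, cmd_b reads
    b_to_a_read, b_to_a_write = os.pipe()  # cmd_b writes, cmd_a reads
    proc_a = subprocess.Popen(cmd_a, stdin=b_to_a_read, stdout=a_to_b_write)
    proc_b = subprocess.Popen(cmd_b, stdin=a_to_b_read, stdout=b_to_a_write)
    # The parent must drop its copies of the pipe ends; otherwise a reader
    # would never see end-of-file after the other process exits.
    for fd in (a_to_b_read, a_to_b_write, b_to_a_read, b_to_a_write):
        os.close(fd)
    return proc_a.wait(), proc_b.wait()
```

With placeholder arguments, the `smd-pull` line would correspond to something like `double_pipe(["smd-client", name, box], ["ssh", host, "smd-server", name, box])`.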
smd-loop
--------

The idea is to mimic cron, but to retry a failed sync attempt if the
error is transient. `smd-client` and `smd-server` output TAGS
that specify whether the error that occurred needs human intervention, and
also suggest some actions, like retrying. `smd-loop` understands these tags,
and gives a second chance to a command that fails with an error that does not
require human intervention and for which the suggested action is to retry.

It is implemented in `bash`, since it is mostly a `while true` loop. Arrays
(not POSIX shell compliant) are used to record failures, and give only a
second chance to every `smd-push` or `smd-pull` command.

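The retry policy can be sketched as follows. This is a hypothetical Python illustration, not the bash code of `smd-loop`. It treats a `TAGS:` error line containing `human-intervention(avoidable)` as retriable, which is an assumption, since how `smd-loop` recognizes a suggested retry is not spelled out here. A set plays the role of the bash arrays, so each command gets at most one second chance.

```python
import re

HUMAN = re.compile(r"human-intervention\(([^)]*)\)")


def is_transient(tags_line):
    """True if a TAGS: error line says no human intervention is needed.

    Assumption of this sketch: 'avoidable' is taken to mean 'retry'.
    """
    if not tags_line.startswith("TAGS:"):
        return False
    body = tags_line[len("TAGS:"):].strip()
    if not body.startswith("error::"):
        return False
    match = HUMAN.search(body)
    return match is not None and match.group(1) == "avoidable"


class SecondChance:
    """Remember failed commands so that each one is retried at most once."""

    def __init__(self):
        self.failed = set()

    def on_failure(self, command, tags_line):
        """Return True if the command should be retried on the next round."""
        if command in self.failed or not is_transient(tags_line):
            return False  # already retried once, or a human must step in
        self.failed.add(command)
        return True

    def on_success(self, command):
        """A successful run resets the command's second chance."""
        self.failed.discard(command)
```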
smd-applet
----------

To write a hopefully eye-candy applet for GNOME, the language Vala was an
intriguing choice, since it is based on smart and sound ideas (such as
avoiding the non-standardized C++ calling conventions) to provide a modern
object oriented programming language built around gobject and glib. Bindings
for GTK+, GConf, libnotify, etc. are available, and require no compiled
glue code, just bare text `.vapi` files.

If you are used to languages where writing bindings is not a trivial task,
you should take a look at Vala, where bindings are simple by design.

SERVER/CLIENT interaction
=========================

A server program (`smd-server`) and a client program (`smd-client`) are
used respectively to transmit the diff generated by `mddiff` (and, when
requested, mail headers or bodies), and to apply that diff, requesting any
necessary data from the other endpoint.

Since they mostly implement policies, like deciding whether a diff can be
applied or not, they are implemented in a high level scripting language called
[Lua](http://www.lua.org). The language choice is almost arbitrary: there
are no strong reasons for adopting Lua instead of Python or others, but its
installation is pretty small and it executes quite fast. Moreover, its
syntax is particularly simple, making it understandable to non-Lua experts
too. Finally, I find it elegant.

They send and receive data on their standard input and output channels,
delegating to external tools the transmission of data across a network, and
optimizations like compressing or encrypting the data.
[OpenSSH](http://www.openssh.com/) can do both, and is used by
`smd-pull` and `smd-push` to connect `smd-client` to `smd-server`.

A simple protocol defines how `smd-client` requests data from `smd-server`
and how `smd-client` notifies `smd-server` that all changes have been
applied correctly.

The protocol
------------

The protocol is line oriented for commands and chunk oriented for data
transmission.

1. Both client and server send the following two messages, and check that
   they are equal to the ones sent by the other endpoint:

        protocol NUMBER
        dbfile SHA1

   This part of the protocol is called the handshake.
2. The server sends the output of `mddiff` (which is line oriented) and then
   the following message to conclude the first phase of the protocol; at this
   point the client is expected to reply:

        END

3. From now on, the client can at any time send one of the following
   (alternative) messages:

        ABORT
        COMMIT

   The former informs the server that the client was unable to apply the
   diff generated by `mddiff`, while the latter informs the server that all
   changes were applied successfully.
4. In response to a `COMMIT` message, the server transmits an `xdelta`
   patch that the client has to apply to its db-file.
5. The client replies with `DONE` to complete the synchronization.
6. After step 2 and before step 3 (that is, before sending `ABORT` or
   `COMMIT`), the client can send the following commands to the server,
   which can reply by transmitting data or with `ABORT`. NAME is not URL
   encoded.

        GET NAME
        GETHEADER NAME
        GETBODY NAME

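Here is a minimal Python sketch of the client side of steps 1 to 3, to make the ordering concrete. It is not the Lua code of `smd-client`. It assumes text streams with `\n`-terminated lines, it assumes that `SHA1` in the handshake is the sha1 of the db-file contents, and it uses a made-up protocol number. Data requests (step 6) and the `xdelta` exchange are omitted.

```python
import hashlib
import io

PROTOCOL_VERSION = "1"  # hypothetical value, not the real protocol number


def handshake_messages(db_bytes):
    """The two lines each endpoint sends during the handshake (step 1)."""
    db_sha1 = hashlib.sha1(db_bytes).hexdigest()  # assumed meaning of SHA1
    return [f"protocol {PROTOCOL_VERSION}\n", f"dbfile {db_sha1}\n"]


def client_receive_diff(inp, out, db_bytes):
    """Steps 1-2 seen from the client: handshake, then read the diff up to END."""
    mine = handshake_messages(db_bytes)
    out.writelines(mine)
    out.flush()
    theirs = [inp.readline(), inp.readline()]
    if theirs != mine:
        raise ValueError("handshake failed: protocol or db-file mismatch")
    diff = []
    for line in inp:
        if line == "END\n":
            return diff
        diff.append(line.rstrip("\n"))
    raise EOFError("connection closed before END")


def client_conclude(out, applied):
    """Step 3: report whether the diff could be applied."""
    out.write("COMMIT\n" if applied else "ABORT\n")
    out.flush()


# Example against a scripted server that agrees and sends one command.
server = io.StringIO("".join(handshake_messages(b"db")) + "ADD a%20b h1 b1\nEND\n")
client_out = io.StringIO()
received = client_receive_diff(server, client_out, b"db")
client_conclude(client_out, applied=True)
```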
### Transmission

The server can transmit data or refuse. In the latter case it just sends
`ABORT`. In the former case it sends

    chunk NUMBER
    ...DATA...

First it declares with `chunk` the number of bytes to be sent, then
it sends the data.

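The chunk framing is a length prefix, so the data may safely contain newlines. Below is a minimal Python sketch over binary streams. The function names are hypothetical. It assumes that the `chunk NUMBER` header ends with `\n` and that the data follows immediately with no terminator; the text above does not say either way.

```python
import io


def send_chunk(out, data):
    """Send `chunk NUMBER` followed by exactly NUMBER bytes of data."""
    out.write(b"chunk %d\n" % len(data))
    out.write(data)


def send_abort(out):
    """Refuse a request."""
    out.write(b"ABORT\n")


def read_chunk(inp):
    """Return the transmitted bytes, or None if the server refused."""
    header = inp.readline()
    if header == b"ABORT\n":
        return None
    keyword, _, size = header.rstrip(b"\n").partition(b" ")
    if keyword != b"chunk" or not size.isdigit():
        raise ValueError(f"unexpected reply: {header!r}")
    data = inp.read(int(size))
    if len(data) != int(size):
        raise EOFError("connection closed in the middle of a chunk")
    return data


# Round trip through an in-memory stream: one chunk, then a refusal.
stream = io.BytesIO()
send_chunk(stream, b"Subject: hi\n\nbody\n")
send_abort(stream)
stream.seek(0)
first = read_chunk(stream)
second = read_chunk(stream)
```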
MAILDIR DIFF
============

Maildir diff (`mddiff`) computes the delta between an old status of a maildir
(previously recorded in the db-file) and its current status, generating a
set of commands (a diff) that a third party program can apply to
synchronize a (remote) copy of the maildir.

How it works
------------

This software uses sha1 to compute snapshots of a maildir, and computes the
set of actions a client should perform to sync with the mailbox status.
This software alone is unable to synchronize two maildirs. It has to be
supported by a higher level tool implementing the application of actions
and the data transfer over the network if the twin maildir is remote.

To cache the expensive sha1 calculation, a cache file is used. On every run
the program generates a new status file (appending `.new` to its name) that
must replace the old one if the generated actions are committed to the other
maildir. Cache files are specific to the twin maildir: if you have more
than one, you must use a different cache file for each of them.

The db-file (say `db.txt`) is paired with a timestamp file (`db.txt.mtime`)
that stores the timestamp of the last run; files whose mtime does not exceed
this timestamp will not be (re)processed the next time `mddiff` is run.
The `.mtime` companion file is updated only server side, since the mtime
concept is local to the host running `mddiff`.

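The role of the `.mtime` file can be illustrated with a small Python sketch. `split_work` is a hypothetical name, and this is not `mddiff`'s code. Files not modified since the recorded timestamp reuse their cached sums, and everything else is rehashed. A renamed file (for example after a maildir flag change) keeps its mtime but is missing from the cache under its new name, so this sketch rehashes it too. How `mddiff` itself treats that case is not described here.

```python
import os


def split_work(paths, last_run, cache, mtime=os.path.getmtime):
    """Split mail files into (reuse, rehash) lists.

    last_run is the number read from db.txt.mtime; cache maps file names to
    previously computed (hsha, bsha) pairs; mtime is injectable for testing.
    """
    reuse, rehash = [], []
    for path in paths:
        # "does not exceed" the timestamp: the cached sums are still valid
        if path in cache and mtime(path) <= last_run:
            reuse.append(path)
        else:
            rehash.append(path)
    return reuse, rehash
```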
The db-file format
------------------

The db-file is composed of two files, a real database file (extension `.txt`)
and a timestamp (extension `.txt.mtime`). The latter contains just a number
(as printed by `date +%s`). The former is line oriented; every line has three
space separated fields:

- the sha1 sum of the header
- the sha1 sum of the body
- the name of the file, not URL encoded

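The name is the last field and is not URL encoded, so it may itself contain spaces. A reader should therefore split each line at most twice. Here is a minimal Python sketch of reading and writing the `.txt` part; the function names are hypothetical.

```python
def parse_db(text):
    """Map each file name to its (header sha1, body sha1) pair."""
    entries = {}
    for line in text.splitlines():
        if not line:
            continue
        hsha, bsha, name = line.split(" ", 2)  # the name may contain spaces
        entries[name] = (hsha, bsha)
    return entries


def format_db(entries):
    """Inverse of parse_db: one 'hsha bsha name' line per file."""
    return "".join(f"{h} {b} {name}\n" for name, (h, b) in sorted(entries.items()))
```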
The commands
------------

From now on, name refers to a file name, hsha to the sha1 sum of its header
and bsha to the sha1 sum of its body.

- `ADD name hsha bsha` is generated whenever a new mail message is found,
  and there is no mail message with a different name but the same body.
- `COPY name hsha bsha TO newname` is generated if a new message is found,
  and the mailbox contains a copy of it.
- `MOVE name hsha bsha TO newname` is generated if a new message is found,
  and the mailbox does not contain a copy of it but it used to.
- `COPYBODY name bsha TO newname newhsha` is generated when a new file is
  created, and that file has the same body as an already existing file.
  If the mail has been moved, this message is followed by a `DELETE` command.
  This happens when a new message is moved to another directory and marked
  in some way that changes its header (for example when a new message is
  moved to the trash bin).
- `DELETE name hsha bsha` is emitted when a message is no longer present.
- `REPLACEHEADER name hsha bsha WITH newhsha` is emitted whenever
  a message that was already present has a different header but the same body.
- `REPLACE name hsha bsha WITH newhsha newbsha` is emitted whenever the body
  (and possibly the header) of a mail message changes. This never happens
  in practice, since MUAs should make a copy of the edited message, not replace
  it.
- `ERROR message` is emitted whenever an error is encountered; message is
  intended to be human readable.

Messages should be processed in order, with the exception of `ADD`, which can
be safely postponed. In particular, `DELETE` messages are always sent last, and
`COPY` or `COPYBODY` messages preceding them may refer to the same file
`name`. Performing deletions in advance is still sound (since the client
can always ask the server for the message) but clearly suboptimal, since
a local copy does not involve any network traffic.

File names are URL encoded, escaping only `' '` (`%20`) and `'%'` (`%25`).

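Here is a simplified Python sketch of how these commands fit together. It is not `mddiff`'s algorithm. It works on two in-memory snapshots that map names to `(hsha, bsha)`, as in the db-file, and picks a source arbitrarily when several files share content. It never emits `ERROR`, sends `DELETE`s last as described above, and URL encodes names as just described. All function names are hypothetical.

```python
def url_encode(name):
    """Escape only '%' and ' '; '%' must be replaced first."""
    return name.replace("%", "%25").replace(" ", "%20")


def diff(old, new):
    """old/new map file name -> (hsha, bsha); return the commands in order."""
    by_content = {}  # (hsha, bsha) -> a name holding that content
    by_body = {}     # bsha -> a name holding that body
    for name, (h, b) in old.items():
        # Prefer sources that still exist, so that COPY wins over MOVE.
        if (h, b) not in by_content or name in new:
            by_content[(h, b)] = name
        by_body.setdefault(b, name)
    commands, moved = [], set()
    for name in sorted(new.keys() - old.keys()):  # new files
        h, b = new[name]
        src = by_content.get((h, b))
        if src is not None and src in new:
            commands.append(f"COPY {url_encode(src)} {h} {b} TO {url_encode(name)}")
        elif src is not None:
            commands.append(f"MOVE {url_encode(src)} {h} {b} TO {url_encode(name)}")
            moved.add(src)  # a MOVE already removes the source
        elif b in by_body:
            src = by_body[b]
            commands.append(f"COPYBODY {url_encode(src)} {b} TO {url_encode(name)} {h}")
        else:
            commands.append(f"ADD {url_encode(name)} {h} {b}")
        # Once handled, the new file can serve as a local source as well.
        by_content[(h, b)] = name
        by_body[b] = name
    for name in sorted(old.keys() & new.keys()):  # files present in both
        (h, b), (nh, nb) = old[name], new[name]
        if b != nb:
            commands.append(f"REPLACE {url_encode(name)} {h} {b} WITH {nh} {nb}")
        elif h != nh:
            commands.append(f"REPLACEHEADER {url_encode(name)} {h} {b} WITH {nh}")
    for name in sorted(old.keys() - new.keys()):  # vanished files go last
        if name not in moved:
            h, b = old[name]
            commands.append(f"DELETE {url_encode(name)} {h} {b}")
    return commands
```

Registering each new file as a possible source lets later copies stay local, in the spirit of the remark above about local copies being cheaper than network transfers.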
`mddiff` as a hashing server
----------------------------

`mddiff` is also used by the client to compute the sha1 sums of header
and body of local mails, for example to check that the source of a copy
command holds the intended content. Since this operation may be very
frequent, `mddiff` can operate in server mode. If the argument is a single
file name and that file is a fifo, then `mddiff` reads file names (not URL
encoded, separated by `\n`) from that fifo and outputs the sha1 sums of
their header and body.

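Here is a minimal Python sketch of what each request computes. Two things are assumptions of this sketch, not a description of `mddiff`'s exact behaviour: the header/body boundary (here the first empty line, with `\n` line endings) and the reply format (`hsha bsha`). `serve` is a hypothetical name.

```python
import hashlib


def header_body_sha1(raw):
    """Return the sha1 hex digests of a message's header and body.

    Assumption: the header ends at the first empty line.
    """
    header, _, body = raw.partition(b"\n\n")
    return hashlib.sha1(header).hexdigest(), hashlib.sha1(body).hexdigest()


def serve(fifo_path, out):
    """Hypothetical server loop: read one file name per line, answer with its sums."""
    with open(fifo_path, "rb") as fifo:
        for raw_name in fifo:
            name = raw_name.rstrip(b"\n")
            with open(name, "rb") as mail:
                hsha, bsha = header_body_sha1(mail.read())
            out.write(f"{hsha} {bsha}\n")
            out.flush()
```

Two messages that differ only in their headers share the same body sum, which is what makes `COPYBODY` possible.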
`mddiff` as an `mkdir -p; ln` server
------------------------------------

`mddiff` is also used by the client to create the indirection layer
needed to rename mailbox folders. If the argument is a single
file name, that file is a fifo and the `-s` flag is passed, then `mddiff`
reads directory names (not URL encoded, separated by `\n`), 2 at a time,
from that fifo. The first one is the source path, the second the target.
Then it behaves like `mkdir -p $(dirname $target); ln -s $source $target`.
For example, if the source is `~/Mail/foo/cur` and the target is
`Maildir/.foo/cur`, then `mddiff` creates the directories `Maildir` and
`Maildir/.foo` and places in the latter a link named `cur` to `~/Mail/foo/cur`.

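The behaviour described above amounts to the following Python sketch. `mkdir_p_ln` and `serve_links` are hypothetical names, and the code is not taken from `mddiff`. Names are consumed in pairs, source first, then target.

```python
import os
import tempfile


def mkdir_p_ln(source, target):
    """Like `mkdir -p $(dirname $target); ln -s $source $target`."""
    parent = os.path.dirname(target)
    if parent:
        os.makedirs(parent, exist_ok=True)
    os.symlink(source, target)


def serve_links(lines):
    """Consume names two at a time: the source path, then the target path."""
    names = iter(lines)
    for source, target in zip(names, names):
        mkdir_p_ln(source.rstrip("\n"), target.rstrip("\n"))


# Example mirroring the one above, inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "Mail", "foo", "cur")
    target = os.path.join(tmp, "Maildir", ".foo", "cur")
    os.makedirs(source)
    serve_links([source + "\n", target + "\n"])
    linked = os.path.islink(target) and os.readlink(target) == source
```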
Easy to parse output messages
=============================

`smd-pull` and `smd-push` prefix all error messages with `ERROR:`, but
what follows is meant to be read by a human being. To let other tools
parse and react to error messages, a more formal output is given.

A single line, prefixed with `TAGS:`, is output if requested (`-v` option).
The prefix is followed by `error::` or `stats::`, which denote an error message
or a statistical one respectively. Then a list of (somewhat improperly named)
tags is output. Their meaning should be easy to guess.

    <M>    ::= "error::" <ET> | "stats::" <ST> | "stats::" <DR>
    <ET>   ::= "context(" <STR> ")"
               "probable-cause(" <STR> ")"
               "human-intervention(" <HI> ")"
               <SA>
    <SA>   ::= | "suggested-actions(" <ACTS> ")"
    <STR>  ::= `[^)]+`
    <HI>   ::= "necessary" | "avoidable"
    <ACTS> ::= <A> | <A> <ACTS>
    <A>    ::= "run(" <STR> ")"
             | "display-mail(" <STR> ")"
             | "display-permissions(" <STR> ")"
    <ST>   ::= "new-mails(" <NUM> ")" <SPC>
               "del-mails(" <NUM> ")" <SPC>
               "bytes-received(" <NUM> ")" <SPC>
               "xdelta-received(" <NUM> ")"
    <DR>   ::= "mail-transferred(" <ML> ")"
    <ML>   ::= <STR> | <STR> " , " <ML>
    <NUM>  ::= `[0-9]+`
    <SPC>  ::= ` *,? *`

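Tools consuming this output could start from the following minimal Python parser, which follows the grammar above. `parse_tags` is a hypothetical name. The parser is more lenient than the grammar, since it ignores whatever separates the items. The one subtle point is `suggested-actions`, whose argument nests further `name(STR)` items.

```python
import re

# name(STR) with STR = [^)]+, or a name wrapping a list of such items
TOP_ITEM = re.compile(r"([a-z-]+)\(((?:[a-z-]+\([^)]*\)[\s,]*)+|[^)]*)\)")
INNER_ITEM = re.compile(r"([a-z-]+)\(([^)]*)\)")


def parse_tags(line):
    """Split a TAGS: line into its kind ('error' or 'stats') and a dict of tags."""
    if not line.startswith("TAGS:"):
        raise ValueError("not a TAGS line")
    kind, sep, rest = line[len("TAGS:"):].strip().partition("::")
    if not sep:
        raise ValueError("missing '::' after the message kind")
    tags = {}
    for match in TOP_ITEM.finditer(rest):
        name, value = match.group(1), match.group(2)
        if name == "suggested-actions":
            tags[name] = INNER_ITEM.findall(value)  # [(action, argument), ...]
        elif name == "mail-transferred":
            tags[name] = value.split(" , ")
        else:
            tags[name] = value
    return kind, tags
```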