- DESIGN
- ======
-
- What follows is my best effort at giving the big-ascii-picture
- of what happens when `smd-pull` is run. `smd-push` simply
- swaps `smd-server` and `smd-client`. Note that the sync direction is
- from `smd-server` to `smd-client`, so running them on the opposite hosts
- inverts the sync direction.
-
-     Your mail server                        Your laptop
-     ----------------                        -----------
-
-
-         --- sync direction --->             smd-pull
-                                                 |
-                                                 |
-     smd-server ------- ssh -----            smd-client
-         |                                       |
-         |                                       |
-       mddiff                                 (mddiff)
-
- `smd-client` uses `mddiff` only to compute sha1 sums, not to compute
- a diff as `smd-server` does.
-
- Both endpoints hold a file (the `db-file` described below) that describes the
- status of the mailbox on which they previously agreed. The server computes
- the difference between the current mailbox status and the previous one on which
- the client agreed. This diff is sent to the client, which tries to apply it,
- possibly requesting some data from the server. If the client succeeds, both the
- server and the client now agree on the new mailbox status.
-
- END USER tools
- ==============
-
- smd-pull and smd-push
- ---------------------
-
- The idea is quite simple. If `===` is a double pipe (a pair of pipes, one
- for `stdin` and one for `stdout`), `smd-pull` simply performs the following
-
-     smd-client $CLIENTNAME $MAILBOX === tee log === \
-         ssh $SERVERNAME smd-server $CLIENTNAME $MAILBOX
-
- The `tee` command is used only for logging, and if `$DEBUG` is `false` it is
- replaced by `cat`. Conversely, `smd-push` performs what follows
-
-     smd-server $CLIENTNAME $MAILBOX === tee log === \
-         ssh $SERVERNAME smd-client $CLIENTNAME $MAILBOX
-
- They are both implemented in `bash`, since their main activity is to
- redirect standard file descriptors, call other tools, check their exit
- status and, if needed, notify the user with an extract of their logs.
-
- smd-loop
- --------
-
- The idea is to mimic cron, but retry a failed sync attempt if the
- given error is transient. `smd-client` and `smd-server` output TAGS
- that specify whether the error that occurred needs human intervention, and
- also suggest some actions, like retry. `smd-loop` understands these tags,
- and gives a second chance to a command that fails with an error that does not
- require human intervention and for which the suggested action is retry.
-
- It is implemented in `bash`, since it is mostly a while true loop. Arrays
- (non POSIX shell compliant) are used to record failures, and give only a
- second chance to every `smd-push` or `smd-pull` command.
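
The second-chance policy can be sketched as follows. This is only an illustration in Python (the real `smd-loop` is a `bash` script); the names `Outcome` and `decide` are hypothetical, and the assumption that a successful run clears the recorded failure is mine, not stated above.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Result of one smd-pull/smd-push run, as read from its TAGS line."""
    ok: bool
    needs_human: bool = False      # human-intervention(necessary)
    suggests_retry: bool = False   # the suggested action is retry


def decide(command: str, outcome: Outcome, failures: set) -> str:
    """Return "ok", "retry" or "give-up" for one run of `command`.

    `failures` records the commands that already used their second chance.
    """
    if outcome.ok:
        failures.discard(command)  # assumption: success resets the record
        return "ok"
    transient = not outcome.needs_human and outcome.suggests_retry
    if transient and command not in failures:
        failures.add(command)
        return "retry"
    return "give-up"
```

A caller would keep one `failures` set for the whole loop and stop (or notify the user) as soon as `decide` returns `"give-up"`.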
-
- smd-applet
- ----------
-
- To write a hopefully eye-candy applet for GNOME, the language Vala was an
- intriguing choice, since it is based on smart and sound ideas (that is,
- avoiding the C++ non-standardized calling conventions) to provide a modern
- object oriented programming language built around gobject and glib. Bindings
- for GTK+, GConf, libnotify, etc. are available, and require no compiled
- glue code, just bare text `.vapi` files.
-
- If you are used to languages where writing bindings is not a trivial task,
- you'd better look at Vala, where bindings are simple by design.
-
- SERVER/CLIENT interaction
- =========================
-
- A server program (`smd-server`) and a client program (`smd-client`) are
- used, respectively, to transmit the diff generated by `mddiff` and, when
- requested, mail headers or bodies, and to apply a diff, requesting any
- necessary data from the other endpoint.
-
- Since they mostly implement policies, like deciding if a diff can be
- applied or not, they are implemented in a high level scripting language called
- [Lua](http://www.lua.org). The language choice is almost arbitrary: there
- are no strong reasons for adopting Lua instead of Python or others, but its
- installation is pretty small and it executes quite fast. Moreover, its
- syntax is particularly simple, making it understandable to non Lua experts
- too. Finally, I find it elegant.
-
- They send and receive data on their standard input and output channels,
- delegating to external tools the transmission of data across a network, and
- optimizations like compressing the data, or encrypting it.
- [OpenSSH](http://www.openssh.com/) can do both, and is adopted by
- `smd-pull` and `smd-push` to connect `smd-client` to `smd-server`.
-
- A simple protocol defines how `smd-client` requests data from `smd-server`
- and how `smd-client` notifies `smd-server` that all changes have been
- applied correctly.
-
- The protocol
- ------------
-
- The protocol is line oriented for commands, chunk oriented for data
- transmission.
-
- 1. Both client and server send the following two messages, and check that
-    they are equal to the ones sent by the other endpoint
-
-         protocol NUMBER
-         dbfile SHA1
-
-    This part of the protocol is called handshake
-
- 2. The server sends the output of `mddiff` (that is line oriented)
-    and then the following message to conclude the first phase of the protocol;
-    at this point the client is expected to reply
-
-         END
-
- 3. The client, from now on, can at any time send one of the following
-    (alternative) messages
-
-         ABORT
-         COMMIT
-
-    The former informs the server that the client was unable to apply the
-    diff generated by `mddiff`, while the latter informs the server that all
-    changes were applied successfully.
-
- 4. In response to a `COMMIT` message, the server will transmit an `xdelta`
-    patch the client has to apply to its db-file.
-
- 5. The client replies with `DONE` to complete the synchronization
-
- 6. After point 2. and before point 3. the client can send the following
-    commands to the server, which can reply by transmitting data or with
-    `ABORT`. NAME is not URL encoded.
-
-         GET NAME
-         GETHEADER NAME
-         GETBODY NAME
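
The handshake of point 1 can be sketched like this. It is a Python illustration only (the real endpoints are written in Lua); `handshake` and `agree` are hypothetical names, and I am assuming that `SHA1` is the sha1 sum of the db-file content.

```python
import hashlib


def handshake(db_content: bytes, protocol_number: int) -> list:
    """Build the two handshake lines an endpoint sends."""
    db_sha = hashlib.sha1(db_content).hexdigest()  # assumption: sum of the db-file
    return [f"protocol {protocol_number}", f"dbfile {db_sha}"]


def agree(mine: list, theirs: list) -> bool:
    """Both endpoints must send exactly the same two lines."""
    return mine == theirs
```

If `agree` is false, the two endpoints either speak different protocol versions or no longer share the same db-file, so the diff computed by the server would not be meaningful for the client.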
-
- ### Transmission
-
- The server can transmit data or refuse. In the latter case it just sends
- `ABORT`. In the former case it sends
-
-     chunk NUMBER
-     ...DATA...
-
- First it declares with `chunk` the number of bytes to be sent, then
- it sends the data.
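
A minimal sketch of this framing, in Python for illustration. The function names are hypothetical, and I am assuming that the `chunk NUMBER` line ends with a single `\n` and that the data follows immediately, with nothing appended after it.

```python
import io


def write_chunk(out, data: bytes) -> None:
    """Server side: declare the size, then send the raw bytes."""
    out.write(b"chunk %d\n" % len(data))
    out.write(data)


def read_reply(inp):
    """Client side: return the payload, or None if the server sent ABORT."""
    line = inp.readline().rstrip(b"\n")
    if line == b"ABORT":
        return None
    keyword, _, size = line.partition(b" ")
    if keyword != b"chunk" or not size.isdigit():
        raise ValueError("unexpected reply: %r" % line)
    data = inp.read(int(size))
    if len(data) != int(size):
        raise EOFError("stream ended before the whole chunk arrived")
    return data
```

Declaring the size first lets the payload contain any byte, newlines included, without escaping, which is why only the commands are line oriented.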
-
- MAILDIR DIFF
- ============
-
- Maildir diff (`mddiff`) computes the delta from an old status of a maildir
- (previously recorded in the db-file) and the current status, generating a
- set of commands (a diff) that a third party software can apply to
- synchronize a (remote) copy of the maildir.
-
- How it works
- ------------
-
- This software uses sha1 to compute snapshots of a maildir, and computes a
- set of actions a client should perform to sync with the mailbox status.
- This software alone is unable to synchronize two maildirs. It has to be
- supported by a higher level tool implementing the application of actions
- and data transfer over the network if the twin maildir is remote.
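
To give a rough idea of what "computing a set of actions" means, here is a deliberately simplified sketch in Python. It is not the algorithm `mddiff` implements: it ignores `COPY`, `MOVE` and `COPYBODY` (see "The commands" below), it assumes file names contain neither spaces nor `%`, and the snapshot representation (a dict from name to a pair of sums) is hypothetical.

```python
def simple_diff(old: dict, new: dict) -> list:
    """Compare two snapshots {name: (hsha, bsha)} and list the actions.

    Simplification: only ADD, REPLACE, REPLACEHEADER and DELETE are emitted.
    """
    commands = []
    for name, (hsha, bsha) in new.items():
        if name not in old:
            commands.append(f"ADD {name} {hsha} {bsha}")
            continue
        old_hsha, old_bsha = old[name]
        if old_bsha != bsha:
            commands.append(
                f"REPLACE {name} {old_hsha} {old_bsha} WITH {hsha} {bsha}")
        elif old_hsha != hsha:
            commands.append(
                f"REPLACEHEADER {name} {old_hsha} {old_bsha} WITH {hsha}")
    # deletions go last, as the real tool does
    for name, (hsha, bsha) in old.items():
        if name not in new:
            commands.append(f"DELETE {name} {hsha} {bsha}")
    return commands
```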
-
- To cache the expensive sha1 calculation, a cache file is used. On every run
- the program generates a new status file (appending .new) that must
- replace the old one if the generated actions are committed to the other
- maildir. Cache files are specific to the twin maildir: if you have more
- than one, you must use a different cache file for each of them.
-
- The db-file (say db.txt) is paired with a timestamp (db.txt.mtime) that
- is used to store the timestamp of the last run and files whose mtime
- does not exceed this timestamp will not be (re)processed next time
- mddiff is run.
-
- The .mtime companion file is updated only server side, since the mtime
- concept is local to the host running mddiff.
-
- The db-file format
- ------------------
-
- The db-file is composed of two files, a real database file (extension .txt)
- and a timestamp (extension .txt.mtime). The latter contains just a number
- (date +%s). The former is line oriented; every line has three space-separated
- fields:
- - the sha1 sum of the header
- - the sha1 sum of the body
- - the name of the file, not URL encoded
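
Reading one line of the database file could look like the sketch below (Python, for illustration; `Entry` and `parse_db_line` are hypothetical names). Since the file name is not URL encoded it may contain spaces, so the line is split at the first two spaces only.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    hsha: str  # sha1 sum of the header
    bsha: str  # sha1 sum of the body
    name: str  # file name, not URL encoded


def parse_db_line(line: str) -> Entry:
    """Split a db-file line into its three fields."""
    hsha, bsha, name = line.rstrip("\n").split(" ", 2)
    return Entry(hsha, bsha, name)
```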
-
- The commands
- ------------
-
- From now on, name refers to a file name, hsha to the sha1 sum of its header
- and bsha to the sha1 sum of its body.
-
- - `ADD name hsha bsha` is generated whenever a new mail message is found,
- and there is no mail message with a different name but the same body.
- - `COPY name hsha bsha TO newname` is generated if a new message is found,
- and the mailbox contains a copy of it.
- - `MOVE name hsha bsha TO newname` is generated if a new message is found,
- and the mailbox does not contain a copy of it but it used to.
- - `COPYBODY name bsha TO newname newhsha` is generated when a new file is
- created, and that file has the same body as an already existing file.
- If the mail has been moved, this message is followed by a `DELETE` command.
- This happens when a new message is moved to another directory and marked
- in some way that changes its header (for example when a new message is
- moved to the trash bin).
- - `DELETE name hsha bsha` is emitted when a message is no longer present.
- - `REPLACEHEADER name hsha bsha WITH newhsha` is emitted whenever
- a message that was already present has a different header but the same body.
- - `REPLACE name hsha bsha WITH newhsha newbsha` is emitted whenever the body
- (and possibly the header) of a mail message changes. This never happens
- in practice, since MUAs should make a copy of the edited message, not replace
- it.
- - `ERROR message` is emitted whenever an error is encountered; message is
- intended to be human readable.
-
- Messages should be processed in order, with the exception of `ADD` that can be
- safely postponed. In particular `DELETE` messages are always sent last, and
- `COPY` or `COPYBODY` messages preceding them may refer to the same file
- `name`. Performing deletions in advance is still sound (since the client
- can always ask the server for the message) but clearly suboptimal, since
- a local copy does not involve any network traffic.
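
A client-side view of these commands can be sketched as follows, in Python for illustration (the real client is written in Lua). `FORMATS`, `parse_command` and `postpone_adds` are hypothetical names. The sketch relies on file names being URL encoded in commands, so that fields never contain a space.

```python
# Field layout of each command; upper-case entries are literal keywords.
FORMATS = {
    "ADD": ["name", "hsha", "bsha"],
    "COPY": ["name", "hsha", "bsha", "TO", "newname"],
    "MOVE": ["name", "hsha", "bsha", "TO", "newname"],
    "COPYBODY": ["name", "bsha", "TO", "newname", "newhsha"],
    "DELETE": ["name", "hsha", "bsha"],
    "REPLACEHEADER": ["name", "hsha", "bsha", "WITH", "newhsha"],
    "REPLACE": ["name", "hsha", "bsha", "WITH", "newhsha", "newbsha"],
}


def parse_command(line: str) -> dict:
    """Turn one line of mddiff output into a dict of named fields."""
    verb, _, rest = line.rstrip("\n").partition(" ")
    if verb == "ERROR":
        return {"verb": verb, "message": rest}
    if verb not in FORMATS:
        raise ValueError("unknown command: " + verb)
    fields, words = FORMATS[verb], rest.split(" ")
    if len(fields) != len(words):
        raise ValueError("malformed command: " + line)
    command = {"verb": verb}
    for field, word in zip(fields, words):
        if field.isupper():
            if word != field:
                raise ValueError("expected keyword " + field)
        else:
            command[field] = word
    return command


def postpone_adds(commands: list) -> list:
    """Keep the given order, but move every ADD to the end."""
    others = [c for c in commands if c["verb"] != "ADD"]
    adds = [c for c in commands if c["verb"] == "ADD"]
    return others + adds
```

Postponing `ADD` lets the client first perform the cheap local operations (copies, moves, deletions) and only then fetch new messages over the network.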
-
- File names are URL encoded escaping only `' '` (`%20`) and `'%'` (`%25`).
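
This restricted encoding is easy to reproduce. The sketch below is in Python for illustration, with hypothetical function names. Note the order of the replacements: `%` must be escaped first when encoding and restored last when decoding, otherwise a literal `%20` in a file name would be mangled.

```python
def encode_name(name: str) -> str:
    """Escape only '%' and ' ', in this order."""
    return name.replace("%", "%25").replace(" ", "%20")


def decode_name(encoded: str) -> str:
    """Inverse of encode_name: restore ' ' first, then '%'."""
    return encoded.replace("%20", " ").replace("%25", "%")
```

For example, `encode_name("50% off")` gives `50%25%20off`, and decoding it returns the original name.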
-
- `mddiff` as a hashing server
- ----------------------------
-
- `mddiff` is also used by the client to compute the sha1 sums of header
- and body of local mails, for example to check that the source of a copy
- command holds the intended content. Since this operation may be really
- frequent, `mddiff` can operate in server mode. If the argument is a single
- file name and that file is a fifo, then `mddiff` reads file names not URL
- encoded, separated by `\n` from that fifo and outputs the sha1 sums of
- their header and body.
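
The request/reply loop can be pictured with the following Python sketch. It is an illustration, not a description of the C implementation: the function names are hypothetical, the header is assumed to end at the first empty line, and the reply format (two sums separated by a space) is also an assumption.

```python
import hashlib


def sha_pair(mail: bytes) -> tuple:
    """Return (hsha, bsha); assumption: the header ends at the first empty line."""
    header, _, body = mail.partition(b"\n\n")
    return hashlib.sha1(header).hexdigest(), hashlib.sha1(body).hexdigest()


def read_file(name: str) -> bytes:
    with open(name, "rb") as f:
        return f.read()


def answer(requests, read=read_file):
    """For each requested file name (one per line), yield one reply line."""
    for line in requests:
        name = line.rstrip("\n")
        if name:
            hsha, bsha = sha_pair(read(name))
            yield f"{hsha} {bsha}"
```

With a real fifo, `requests` would be the file object obtained by opening it for reading; keeping one long-running process avoids paying the start-up cost for every single mail.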
-
- `mddiff` as an `mkdir -p; ln` server
- ------------------------------------
-
- `mddiff` is also used by the client to create the indirection layer
- needed to rename mailbox folders. If the argument is a single
- file name and that file is a fifo and the `-s` flag is passed, then `mddiff`
- reads directory names not URL encoded, separated by `\n`, 2 at a time,
- from that fifo. The first one is the source path, the second the target.
- Then it behaves like `mkdir -p $(dirname $target); ln -s $source $target`.
- For example if source is `~/Mail/foo/cur` and the target is `Maildir/.foo/cur`
- then `mddiff` will create the directories `Maildir` and `Maildir/.foo`
- and place in the latter a link named `cur` to `~/Mail/foo/cur`.
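
The same behaviour, sketched in Python for illustration (hypothetical function names; the real code is part of the C program `mddiff`):

```python
import os


def link_dir(source: str, target: str) -> None:
    """Equivalent of: mkdir -p $(dirname $target); ln -s $source $target"""
    parent = os.path.dirname(target)
    if parent:
        os.makedirs(parent, exist_ok=True)
    os.symlink(source, target)


def serve_links(lines) -> None:
    """Consume names two at a time: first the source, then the target."""
    names = (line.rstrip("\n") for line in lines)
    for source, target in zip(names, names):
        link_dir(source, target)
```

Passing the same iterator twice to `zip` is what pairs consecutive lines; a trailing unpaired line is silently ignored in this sketch.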
-
- Easy to parse output messages
- =============================
-
- `smd-pull` and `smd-push` prefix all error messages with `ERROR:`, but
- what follows is meant to be read by a human being. To make other tools able to
- parse and react to error messages, a more formal output is given.
- A single line, prefixed with `TAGS:` is output if requested (`-v` option).
- It can be followed by `error::` or `stats::`, that denote an error message or a
- statistical one respectively. Then a list of items, loosely called tags, is output.
- Their meaning should be easy to guess.
-
-     <M>    ::= "error::" <ET> | "stats::" <ST> | "stats::" <DR>
-     <ET>   ::= "context(" <STR> ")"
-                "probable-cause(" <STR> ")"
-                "human-intervention(" <HI> ")"
-                <SA>
-     <SA>   ::= | "suggested-actions(" <ACTS> ")"
-     <STR>  ::= `[^)]+`
-     <HI>   ::= "necessary" | "avoidable"
-     <ACTS> ::= <A> | <A> <ACTS>
-     <A>    ::= "run(" <STR> ")"
-              | "display-mail(" <STR> ")"
-              | "display-permissions(" <STR> ")"
-     <ST>   ::= "new-mails(" <NUM> ")" <SPC>
-                "del-mails(" <NUM> ")" <SPC>
-                "bytes-received(" <NUM> ")" <SPC>
-                "xdelta-received(" <NUM> ")" <SPC>
-                "xdelta-received(" <NUM> ")"
-     <DR>   ::= "mail-transferred(" <ML> ")"
-     <ML>   ::= <STR> | <STR> " , " <ML>
-     <NUM>  ::= `[0-9]+`
-     <SPC>  ::= ` *,? *`
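
A tool consuming these lines could split them as in the Python sketch below. `parse_tags` is a hypothetical name, the sample lines in the usage note are made up, and I am assuming that tags are separated by spaces and/or commas. Parentheses are matched by depth, so a nested `suggested-actions(run(...))` is kept whole.

```python
def parse_tags(line: str):
    """Return (kind, {tag: value}) for a TAGS line, or None for any other line."""
    prefix = "TAGS:"
    if not line.startswith(prefix):
        return None
    kind, sep, rest = line[len(prefix):].strip().partition("::")
    if not sep or kind not in ("error", "stats"):
        raise ValueError("malformed TAGS line: " + line)
    tags, i = {}, 0
    while i < len(rest):
        if rest[i] in " ,":            # separators between tags
            i += 1
            continue
        start = rest.index("(", i)     # raises ValueError if "(" is missing
        depth, j = 1, start + 1
        while depth:                   # find the matching ")"
            if j >= len(rest):
                raise ValueError("unbalanced parentheses: " + line)
            depth += (rest[j] == "(") - (rest[j] == ")")
            j += 1
        tags[rest[i:start]] = rest[start + 1:j - 1]
        i = j
    return kind, tags
```

For instance, given the made-up line `TAGS: stats::new-mails(2), del-mails(0)`, the function returns `("stats", {"new-mails": "2", "del-mails": "0"})`; lines that only carry the human-readable `ERROR:` prefix yield `None`.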