Though special syntax is not really necessary in scheme, it is common. An important concept is
quoting.
'(if a b c)
is the same as
(quote (if a b c))
If you are writing a scheme interpreter, you definitely want to handle quoting. Which leads to one of scheme's quirkier syntaxes:
#\a
is the character 'a'.
Now, it is pretty easy to write a parser that can tell the difference between
'a
and
'a'
, but people accustomed to scheme will find it confusing to use
'a'
instead of
#\a
. If you
must have the c-style character syntax, do it in
addition to the usual syntaxes, or make it very clear that this is not scheme.
(More on that in a moment.)
The same goes for using
#
as a comment delimiter. Most scheme interpreters are smart enough to know that the very first line of a file may begin with something like
#! /usr/env myfoo \
-a -b -c
and properly ignore it, but since the hash character is otherwise special, the semicolon ';' is used for end-of-line comments. Again, it is entirely possible to modify the interpreter to use '#', but at that point a simple heuristic is not enough to make the two uses coexist. Either it is Scheme or it is not.
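To make the syntax points above concrete, here is a rough tokenizer sketch in C++. Everything named in it (Tok, Token, tokenize, is_delimiter) is made up for illustration and is not taken from your code. It assumes single-character literals only (no #\space or #\newline), does not handle strings, and treats a leading "#!" line as ending at the first newline not preceded by a backslash.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch.
enum class Tok { LParen, RParen, Quote, Char, Atom };

struct Token {
    Tok kind;
    std::string text;  // the character for Char, the spelling for Atom
};

static bool is_delimiter(char c) {
    return std::isspace(static_cast<unsigned char>(c)) ||
           c == '(' || c == ')' || c == ';';
}

// Split Scheme source text into tokens.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;

    // Skip a leading "#!" line, including backslash-continued lines.
    if (src.compare(0, 2, "#!") == 0) {
        bool continued = true;
        while (continued && i < src.size()) {
            std::size_t eol = src.find('\n', i);
            if (eol == std::string::npos) eol = src.size();
            continued = eol > i && src[eol - 1] == '\\';
            i = eol < src.size() ? eol + 1 : eol;
        }
    }

    while (i < src.size()) {
        char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;
        } else if (c == ';') {
            // A comment runs to the end of the line.
            while (i < src.size() && src[i] != '\n') ++i;
        } else if (c == '(') {
            out.push_back({Tok::LParen, "("});
            ++i;
        } else if (c == ')') {
            out.push_back({Tok::RParen, ")"});
            ++i;
        } else if (c == '\'') {
            // The parser later turns 'x into (quote x).
            out.push_back({Tok::Quote, "'"});
            ++i;
        } else if (c == '#' && i + 2 < src.size() && src[i + 1] == '\\') {
            // #\a is the character a.
            out.push_back({Tok::Char, std::string(1, src[i + 2])});
            i += 3;
        } else {
            std::size_t start = i;
            while (i < src.size() && !is_delimiter(src[i])) ++i;
            assert(i > start);  // every atom consumes at least one character
            out.push_back({Tok::Atom, src.substr(start, i - start)});
        }
    }
    return out;
}
```

Because '#' is only special at the start of a token, a '#'-to-end-of-line comment rule would collide with #\a (and with #t, #f, and so on), which is the coexistence problem described above.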
every list will start with a symbol |
OK, but impractical, especially as it costs nothing to drop that requirement.
... make expressions fit in one line. Later I'll figure out [how to make progressive input] |
You might as well do this right at the start. It will save you grief later.
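One simple way to get progressive input, sketched below, is to keep reading lines until the parentheses balance. The names open_parens and read_expression are hypothetical, and the sketch assumes no string literals (a paren inside "..." would be miscounted); it does skip ';' comments and #\( style character literals.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Net number of unclosed '(' in text, ignoring ';' comments and #\x literals.
int open_parens(const std::string& text) {
    int depth = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (c == ';') {
            // Skip to the end of the line.
            while (i < text.size() && text[i] != '\n') ++i;
        } else if (c == '#' && i + 2 < text.size() && text[i + 1] == '\\') {
            i += 2;  // step over the literal character, e.g. the '(' in #\(
        } else if (c == '(') {
            ++depth;
        } else if (c == ')') {
            --depth;
        }
    }
    return depth;
}

// Read lines from `in` until the accumulated text has balanced parentheses.
// Returns false only when the input is exhausted and nothing was read.
bool read_expression(std::istream& in, std::string& expr) {
    expr.clear();
    std::string line;
    while (std::getline(in, line)) {
        // Invariant: whatever has been read so far is still incomplete.
        assert(expr.empty() || open_parens(expr) > 0);
        expr += line;
        expr += '\n';
        if (open_parens(expr) <= 0) return true;
    }
    return !expr.empty();
}
```

A REPL loop would call read_expression on std::cin and hand each returned string to the parser. A negative count (too many closing parens) is returned as-is so the parser can report the error.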
What do you mean by these questions? |
An s-expression in Scheme is actually a binary tree. (An array-like list, the kind of thing we C/C++ people are used to, is called a
vector in Scheme.) This leads to the most important predicate possible: is a node a pair?
If it is a pair, then there are two pointers in the data, one to a 'car' node and one to a 'cdr' node, either of which may be any valid s-expression node.
If it is not a pair, then the data must be an atomic type, like a symbol or number. In your case, you are trying to split between 'literals' and 'symbols', but if you are using on-demand type conversion, there is no point in distinguishing between the two at the tree level unless you plan to encode a symbol as an index into a lookup table or something behind the scenes (your sexp struct doesn't appear to provide for this).
Hence, the list
'(a b c)
is actually the nested pairs
'(a . (b . (c . ())))
, which is represented in memory as:
╔═══╤═══╗  ╔═══╤═══╗  ╔═══╤═══╗
║   │  ─╫─►║   │  ─╫─►║   │NUL║
╚═╪═╧═══╝  ╚═╪═╧═══╝  ╚═╪═╧═══╝
  │          │          │
  ▼          ▼          ▼
╔═══╗      ╔═══╗      ╔═══╗
║ a ║      ║ b ║      ║ c ║
╚═══╝      ╚═══╝      ╚═══╝
where
╔═══╤═══╗                     ╔═══╗
║   │   ║ is a pair node, and ║   ║ is an atom node
╚═══╧═══╝                     ╚═══╝
That NUL there could actually be represented as pointing to a canonical null node (a singleton); both representations are common.
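In C++ the pair/atom layout could look something like the sketch below. The names (Node, NodePtr, make_atom, cons, is_pair, to_dotted) are my own and not from your sexp struct, and I am assuming the "NUL as a null pointer" representation rather than a singleton null node.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct Node;
using NodePtr = std::shared_ptr<const Node>;  // const: nodes are never modified

struct Node {
    bool pair;         // true for a pair node, false for an atom node
    std::string atom;  // spelling of the atom (unused for pairs)
    NodePtr car, cdr;  // children (unused for atoms)
};

const NodePtr nil = nullptr;  // the empty list, i.e. the NUL in the diagram

NodePtr make_atom(const std::string& name) {
    assert(!name.empty());  // an atom always has a spelling
    return std::make_shared<Node>(Node{false, name, nullptr, nullptr});
}

NodePtr cons(NodePtr car, NodePtr cdr) {
    return std::make_shared<Node>(Node{true, "", std::move(car), std::move(cdr)});
}

// The most important predicate: is this node a pair?
bool is_pair(const NodePtr& n) { return n && n->pair; }

// Print a tree in fully dotted notation, e.g. (a . (b . (c . ()))).
std::string to_dotted(const NodePtr& n) {
    if (!n) return "()";
    if (!n->pair) return n->atom;
    return "(" + to_dotted(n->car) + " . " + to_dotted(n->cdr) + ")";
}
```

Building the list from the diagram is then cons(make_atom("a"), cons(make_atom("b"), cons(make_atom("c"), nil))), and to_dotted on it gives back the nested-pair form shown earlier.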
All nodes and data in an s-expression are typically considered to be immutable. This allows the interpreter to reason about programs more easily. In terms of memory management, it also allows the interpreter to simply pass references to data around instead of deep copying everything. Once we do that, we need reference counting (or some other form of garbage collection) to know when a shared node can be freed.
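Here is a small sketch of what sharing immutable nodes buys you, using std::shared_ptr as the reference counter. Cell, CellPtr, cons, and cdr are hypothetical names, and to keep it short each cell stores its head as a plain string instead of a general node.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct Cell;
using CellPtr = std::shared_ptr<const Cell>;  // shared, read-only cells

struct Cell {
    std::string head;  // simplified: the car is always an atom here
    CellPtr tail;      // the cdr; a null pointer marks the end of the list
};

// Build a new cell in front of an existing list. The tail is not copied;
// the new cell just holds another reference to it.
CellPtr cons(std::string head, CellPtr tail) {
    return std::make_shared<Cell>(Cell{std::move(head), std::move(tail)});
}

// Return the rest of the list: a reference to the existing tail, not a copy.
CellPtr cdr(const CellPtr& list) {
    assert(list);  // cdr of the empty list is an error
    return list->tail;
}
```

If bc is the list (b c), then cons("a", bc) and cons("x", bc) both point at the very same (b c) cells, and those cells are freed only when the last list referring to them goes away. One caveat of this approach: destroying a very long list recurses through the tails, so a real interpreter may want an iterative teardown.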
You may be reinventing the wheel. You should check out
Racket, a very nice dialect of Scheme that supports both R5RS and R6RS and makes it very easy to write executables with embedded interpreters in them, defaulting to the dialect you want. The PLT people are also very interested in secure systems, so it is easy to modify the dialect to support your options and forbid dangerous stuff to your users.
I sometimes usually like to write procedural code in C++ because it has a good balance of efficiency and high-level language and library features that C lacks. |
As you will, but you will make your task about a billion times easier for yourself if you use C++ facilities here by default. So unless you are writing something for a very restricted embedded device... in which case I again refer you to Racket.
An s-expression is really only realizable as a procedural object, but you can manage all that with a very simple wrapper class so that your C++ users only see it as a nice, standard C++ class type.
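As a rough idea of what such a wrapper might look like (SExp and all of its members are invented for this sketch, not an existing library), the node plumbing stays private and users only see a small value type:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class SExp {
    // Internal node; users of the class never see it.
    struct Node {
        bool pair;
        std::string symbol;
        std::shared_ptr<const Node> car, cdr;
    };
    std::shared_ptr<const Node> node_;  // null means the empty list

    explicit SExp(std::shared_ptr<const Node> n) : node_(std::move(n)) {}

public:
    SExp() = default;  // the empty list ()

    // An atom with the given spelling.
    explicit SExp(std::string symbol)
        : node_(std::make_shared<Node>(
              Node{false, std::move(symbol), nullptr, nullptr})) {}

    // A pair whose halves share the nodes of car and cdr.
    static SExp cons(const SExp& car, const SExp& cdr) {
        return SExp(std::make_shared<Node>(
            Node{true, "", car.node_, cdr.node_}));
    }

    bool is_null() const { return !node_; }
    bool is_pair() const { return node_ && node_->pair; }

    SExp car() const { assert(is_pair()); return SExp(node_->car); }
    SExp cdr() const { assert(is_pair()); return SExp(node_->cdr); }

    const std::string& symbol() const {
        assert(node_ && !node_->pair);  // only atoms have a spelling
        return node_->symbol;
    }
};
```

Copying an SExp only copies a pointer, so passing them around by value is cheap, and the interpreter's procedural helpers (eval, apply, and so on) can be free functions that take and return SExp.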
Oh, a reference:
http://racket-lang.org/
Let me know what you want to do.