While all of this discussion is useful, especially with regard to how WMA works, I don't see how any of it relates to either the Mathics scanner or to character tables. (I bundled the two in one project, but they could have been two separate projects, and maybe in the future that would be a good idea.) Character tables have no notion of MakeBoxes, formatting, or any of that. The only thing they care about is whether the properties of characters are represented. The scanner's job is just to produce tokens for a parser. Again, this has nothing to do with specific built-in functions like MakeBoxes or Infix, or with Expressions or formatting. Right now all of this happens in mathics-core.

The second thing I'd say about the "best option to store" is that it really doesn't matter. You could use anything that uniquely describes the operator: the operator name, its ASCII sequence, or the WL Unicode would all work. Standard Unicode would probably work too, but we'd have to make sure that no two operators map to the same Unicode character. I think that's the case, but it is best to avoid the problem altogether. We know, for example, that the operator names and ASCII sequences have to be unique. Since the code currently uses the ASCII sequence, I don't see a problem in keeping it. If you want to reduce the vagueness around this name without making massive changes (which is what switching to the WL Unicode would require), then rename "operator" to "ascii_operator_string".
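Whether a candidate field is a safe key can be checked mechanically. A minimal sketch, assuming the operator table has been loaded as a plain dict mapping operator names to property dicts; the field names `ascii` and `unicode`, the helper `find_collisions`, and the rows in `TOY_TABLE` are made up for illustration and are not the actual MathicsScanner schema.

```python
from collections import defaultdict


def find_collisions(table: dict, field: str) -> dict:
    """Return {value: [operator names]} for every value of `field`
    that is shared by more than one operator.

    An empty result means `field` uniquely identifies each operator
    that has it. Operators without the field are skipped.
    """
    owners = defaultdict(list)
    for name, props in table.items():
        value = props.get(field)
        if value is not None:
            owners[value].append(name)
    return {value: names for value, names in owners.items() if len(names) > 1}


# Made-up rows (not real operators): two entries share a Unicode form.
TOY_TABLE = {
    "OpA": {"ascii": "op_a", "unicode": "\u2192"},
    "OpB": {"ascii": "op_b", "unicode": "\u2192"},
    "OpC": {"ascii": "op_c"},
}

print(find_collisions(TOY_TABLE, "ascii"))    # {}
print(find_collisions(TOY_TABLE, "unicode"))  # {'→': ['OpA', 'OpB']}
```

Running the same check over the real table's standard-Unicode column would settle the "I think that's the case" question without relying on memory.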
-
Under "[Input]": for simplicity, let's start out assuming only character-string input. We have to learn to walk before we can run. This, too, has been a pervasive problem: complicating things so that they become harder at the outset. I don't think any generality is lost if we start out with string input only. For other kinds of input, other kinds of scanners and parsers can be written.
-
Right, and what I am saying is that it is grossly inefficient to have to go back to the scanner just to do a table lookup that converts a name like "LeftVector" into a particular character string. Instead, define a built-in function to do this, unless WMA already has one (which I doubt).
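A sketch of that suggestion, assuming the name-to-character table is loaded once when Mathics starts: the conversion then reduces to a dictionary access. `NAMED_CHARACTERS` (here a one-entry stand-in) and `named_character_to_string` are hypothetical names, and wiring the function up as a Mathics built-in is not shown.

```python
# Hypothetical stand-in for the named-character table shipped with the
# scanner: WL name -> WL-Unicode string. Loaded once, not per lookup.
NAMED_CHARACTERS = {"LeftVector": "\u21bc"}


def named_character_to_string(name: str) -> str:
    """Convert a name like "LeftVector" to its character string
    with a table lookup, without re-running the scanner."""
    try:
        return NAMED_CHARACTERS[name]
    except KeyError:
        raise ValueError(f"unknown named character: {name!r}") from None


print(named_character_to_string("LeftVector"))  # ↼
```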
-
Regarding the use of the attribute `operator` and the connection with MathicsScanner: I was thinking about it, and I guess that the best option is to store the WL-Unicode representation in it. I will try to explain here how this works in WMA, based on the description in WR (https://reference.wolfram.com/language/tutorial/TextualInputAndOutput.html), and how I think we can implement it in Mathics.

The main problem with this kind of description is that several non-equivalent steps have the same name (for example, "parsing"), and that I do not always know the specific technical terms, but I am sure you can fill in the gaps.
[Input]
The first step in the evaluation process is the user input, in a certain front-end. In a text-based front-end, the input is a `String`. In the graphical interface of WMA, the input could also be some kind of rich text, represented as a WL box structure.

In WMA, the next step is to apply the `MakeExpression` rules to the input. This should convert a `String` or a `BoxExpression` into an evaluable WL `Expression`. As our front-ends only allow "1D" input, let's consider just this case.

Unlike WMA, instead of applying `MakeExpression` rules, we send the string directly to the Mathics parser (which is also the basic rule for `MakeExpression`). The Mathics parser currently lives inside mathics-core, but it uses the tokenizer implemented in MathicsScanner. This tokenizer uses the tables defined in that package to produce a sequence of tokens, which are then parsed. At that point, we can tell the tokenizer the encoding of the input interface, so that it interprets the characters accordingly.
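A toy version of this input step, assuming the front end hands over raw bytes together with the name of its encoding. `NAMED_CHARACTERS`, `tokenize`, and the token pattern are hypothetical simplifications, not the MathicsScanner API; the real tokenizer recognizes far more token types. What the sketch shows is that, whatever the input encoding, the tokens come out with WL-Unicode characters.

```python
import re
from typing import List

# Hypothetical stand-in for the scanner's named-character table.
NAMED_CHARACTERS = {"LeftVector": "\u21bc"}

_NAMED_ESCAPE = re.compile(r"\\\[([A-Za-z0-9]+)\]")
# Toy token classes: identifiers, integers, or any other single non-space character.
_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9]*|\d+|\S")


def tokenize(raw: bytes, encoding: str) -> List[str]:
    """Decode input with the front-end encoding, replace \\[Name] escapes
    by WL-Unicode characters, and split the result into tokens."""
    text = raw.decode(encoding)

    def replace(match):
        name = match.group(1)
        if name not in NAMED_CHARACTERS:
            raise ValueError(f"unknown named character: {name}")
        return NAMED_CHARACTERS[name]

    text = _NAMED_ESCAPE.sub(replace, text)
    return _TOKEN.findall(text)


# The same input typed on an ASCII front end and on a UTF-8 one:
print(tokenize(b"a \\[LeftVector] b", "ascii"))
print(tokenize("a \u21bc b".encode("utf-8"), "utf-8"))
```

Both calls produce the same token list, so nothing after the scanner needs to know which front end the input came from.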
At the output of the parser, we then have a `mathics.core.Expression` (or a `mathics.core.Atom`), which we will call `expr`, whose `String` elements are encoded with WL-Unicode characters. In the rest of the evaluation (except when evaluating expressions like `MakeExpression[...]` or `ToString[...]`), there is no need to look at the encoding or to use any table.

[Processing]
Now the expression is evaluated by calling the method `expr.evaluate(evaluation)`. The result of this step is another expression, `result`. To get something that we can show in the front-end, we first have to pass `result` through the formatter.

[Output]
The formatter must convert `result` into a `mathics.builtin.core.BoxMixin` (`mathics.builtin.base.BoxExpression`) object (let's call it `boxedResult`). In Mathics, to do this, the expression `MakeBoxes[result, StandardForm]` is evaluated. Inside the `MakeBoxes` rules, an evaluation of `Format[result]` is done. This applies the formatting rules associated with the elements of the expression. At this point, an expression like `Equivalent[a,b,c]` is converted into `Infix[{"a","b","c"}, op]`, where `op` is a `String` (in the WL-Unicode encoding). Let's call the result of applying `Format` `formattedResult`.

Then, `MakeBoxes` rules are applied to `formattedResult` to produce a `String` or a `BoxExpression`, which is `boxedResult`. Here, `String` elements are still encoded as WL-Unicode.

Notice that it could be important here to distinguish regular evaluation from the specific sequence in which replacement rules are applied.
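The `Format` and `MakeBoxes` stages for `Equivalent[a,b,c]` can be sketched with a toy model. This assumes expressions are plain Python tuples `(head, *elements)` with atoms as `str`, rather than Mathics' `Expression`/`Atom` classes; `format_step` and `make_boxes_step` are hypothetical names, the `RowBox` output is only roughly what WMA produces, and U+29E6 (⧦) is assumed to be the WL-Unicode character for `\[Equivalent]`. The point is that the operator string stays WL-Unicode through both stages.

```python
# Assumed WL-Unicode character for \[Equivalent]; check against the scanner tables.
EQUIVALENT_OP = "\u29e6"


def format_step(expr):
    """Format stage: Equivalent[a, b, c] -> Infix[{"a", "b", "c"}, op]."""
    if isinstance(expr, tuple) and expr[0] == "Equivalent":
        return ("Infix", ("List", *expr[1:]), EQUIVALENT_OP)
    return expr


def make_boxes_step(expr):
    """MakeBoxes stage: Infix[{x, y, ...}, op] -> RowBox[{x, op, y, ...}]."""
    if isinstance(expr, tuple) and expr[0] == "Infix":
        _, (_, *items), op = expr
        row = []
        for i, item in enumerate(items):
            if i > 0:
                row.append(op)  # the operator is still a WL-Unicode string here
            row.append(item)
        return ("RowBox", row)
    return expr


formatted = format_step(("Equivalent", "a", "b", "c"))
boxed = make_boxes_step(formatted)
print(boxed)  # ('RowBox', ['a', '⧦', 'b', '⧦', 'c'])
```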
`Format[expr]` is not evaluated, in the sense that it is not equivalent to `Expression(SymbolFormat, expr).evaluate(evaluation)`. In formatting, `Format` rules are applied in a specific order over the expression. A similar process happens in `MakeBoxes`. (I have run several experiments to check this behavior. I am going to organize them and post them below as separate comments, to avoid making this presentation much longer than it already is.)
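The difference between applying rules in a fixed order and evaluating can be illustrated with a deliberately tiny model. It assumes only the general shape of the two processes and does not reproduce the actual rule ordering used by WMA or Mathics: expressions are strings, rules are functions, and `apply_rules_in_order` and `evaluate_to_fixed_point` are hypothetical names.

```python
from typing import Callable, List, Optional

# A rewrite rule returns the rewritten expression, or None if it does not match.
RewriteRule = Callable[[str], Optional[str]]


def apply_rules_in_order(expr: str, rules: List[RewriteRule]) -> str:
    """Formatting-style: try the rules in a fixed order and apply the first
    one that matches, once. The result is not re-evaluated."""
    for rule in rules:
        result = rule(expr)
        if result is not None:
            return result
    return expr


def evaluate_to_fixed_point(expr: str, rules: List[RewriteRule], limit: int = 100) -> str:
    """Evaluation-style: keep rewriting until no rule changes the expression."""
    for _ in range(limit):
        new = apply_rules_in_order(expr, rules)
        if new == expr:
            return expr
        expr = new
    raise RuntimeError("iteration limit reached")


RULES = [
    lambda e: "y" if e == "x" else None,
    lambda e: "z" if e == "y" else None,
]

print(apply_rules_in_order("x", RULES))     # y
print(evaluate_to_fixed_point("x", RULES))  # z
```

With the same two rules, the single ordered pass stops at `y`, while evaluation keeps going to `z`; this is the kind of discrepancy the experiments are meant to pin down.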
[Interpret the formatted output on the front-end]
In WMA, the `boxedResult` is processed by the front end to produce the textual/graphical output. In Mathics, this is done by calling the `boxedResult.boxes_to_format()` method. For text-only output, the result of this method should be a (Python) `str` encoded with the front-end encoding. This can be done by passing the encoding to the method as a parameter. Notice that at this last stage we no longer need the `mathics.core.Definitions` object, but we do need the tables in MathicsScanner.

Comment aside: the resulting `str` object should be equivalent to the value of the `String` object obtained from `ToString[expression, CharacterEncoding->encoding]`.

@rocky, regarding the comment at the end of #43 (this one):
So, in your example, if `ToExpression["\"\\[LeftVector]\""]` is entered in the interpreter, then the apply method of `ToExpression` parses the argument using mathics_scanner to produce the `String` `↼` (encoded as a WL-Unicode character), and returns that `String` as output. Now the expression is formatted (here, this consists of adding double quotes around the string value). Then, `MakeBoxes` rules are applied (in this case, they are trivial) and produce the string `"↼"`.

The next step is to call the `boxes_to_format` method. This is the point at which the (system) encoding matters: if the encoding is "ASCII", then the `String` `"↼"` should be translated to a (Python) `str` in which the WL-Unicode character is replaced by `\[LeftVector]` or by an ASCII equivalent. If the encoding is `UTF-8`, then the character should be replaced by its (standard) UTF-8 equivalent. And if there is another encoding, another replacement rule should be applied here.
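A minimal sketch of that last translation, assuming the tables have been loaded into plain dicts. The table names, the function `encode_for_frontend`, and the choice of preferring an ASCII equivalent (`->` for `Rule`, whose WL-Unicode character is the private-use U+F522) over a named escape are hypothetical. Characters with no entry fall back to WL's `\:xxxx` / `\|xxxxxx` hex escapes in ASCII, and pass through unchanged in UTF-8. Per the comment aside, the output should match what `ToString[..., CharacterEncoding -> ...]` returns.

```python
# Hypothetical tables standing in for the ones in MathicsScanner.
WL_TO_ASCII = {"\uf522": "->"}                             # preferred ASCII equivalent (Rule)
WL_TO_NAME = {"\u21bc": "LeftVector", "\uf522": "Rule"}    # WL-Unicode -> named character
WL_TO_UNICODE = {"\uf522": "\u2192"}                       # WL private-use char -> standard Unicode


def encode_for_frontend(s: str, encoding: str) -> str:
    """Translate a WL-Unicode string into a str for the given front-end encoding."""
    enc = encoding.upper().replace("-", "")
    if enc == "ASCII":
        out = []
        for ch in s:
            if ord(ch) < 128:
                out.append(ch)
            elif ch in WL_TO_ASCII:
                out.append(WL_TO_ASCII[ch])
            elif ch in WL_TO_NAME:
                out.append("\\[" + WL_TO_NAME[ch] + "]")
            elif ord(ch) <= 0xFFFF:
                out.append(f"\\:{ord(ch):04x}")  # WL 4-digit hex escape
            else:
                out.append(f"\\|{ord(ch):06x}")  # WL 6-digit hex escape
        return "".join(out)
    if enc == "UTF8":
        return "".join(WL_TO_UNICODE.get(ch, ch) for ch in s)
    raise ValueError(f"unsupported encoding: {encoding}")


print(encode_for_frontend('"\u21bc"', "ASCII"))  # "\[LeftVector]"
print(encode_for_frontend('"\u21bc"', "UTF-8"))  # "↼"
```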