Technology


2010-01-18: Prof. Sherry Mantyka will visit ActiveMath from March 15 to April 2, 2010

ActiveMath - Monday, 2010/01/18 - 11:10
The Canadian mathematics educator Prof. Sherry Mantyka, who organizes bridging courses at Memorial University of Newfoundland, is visiting Saarbrücken to help design a remedial scenario for ActiveMath.
Categories: Technology

tristanhuntjuricek

xbeta - Saturday, 2010/01/16 - 17:52
Updated by tristanhuntjuricek on 2010-01-16 at 15:52:45Z.
Categories: Technology

2009-12-24: Sergey has received EU Marie Curie International Incoming Fellowship

ActiveMath - Friday, 2010/01/15 - 16:11
Sergey Sosnovsky's proposal for an EU Marie Curie Fellowship has been approved by the EU Research Executive Agency. Funding will start in April 2010 and last until April 2012. The project "Intelligent Support for Authoring Semantic Learning Content" will focus on implementing author-friendly technologies for learning content development, including collaborative authoring support, metadata authoring support, open-corpus content discovery, interactivity authoring, and gap detection.
Categories: Technology

2009-10-15: Visiting Researcher from Russia

ActiveMath - Friday, 2010/01/15 - 16:11
Anatoly Belchusov from Russia is a DAAD visiting researcher at DFKI. The purpose of his visit is to develop domain reasoning services that support intelligent diagnosis and feedback generation in the domain of integrals. He is using the YACAS platform to encode the domain reasoner, which is connected to the ITS module of the ActiveMath Learning Environment. Anatoly is in Saarbrücken from 15 October until 15 December 2009.
Categories: Technology

2009-10-01: A new researcher joined ActiveMath lab

ActiveMath - Friday, 2010/01/15 - 16:10
Sergey Sosnovsky, a PhD candidate from the School of Information Sciences, University of Pittsburgh (Pittsburgh, PA, USA), has joined our group. Sergey received his MSc and BSc from Kazan State Technological University, Kazan, Russia. His research focuses on combining new trends in Web development with adaptive and user-modelling technologies. Sergey has co-authored about 60 peer-reviewed research publications and has served on the program committees of several workshops in the area of Semantic Web for Adaptation.
Categories: Technology

RichEdit Versions 1.0 through 3.0

Murray Sargent: Math in Office - Wednesday, 2010/01/13 - 03:47

Digging through old doc files, I ran across the following summary of RichEdit up through Version 3.0. It's more detailed than my post on RichEdit Versions, so it might be of interest to history buffs, anyhow. And it does describe the riched20.dll that still ships with Windows, mostly for purposes of backward compatibility. I wrote this document back in 1998 in preparing for an internal seminar on RichEdit 3.0. It even mentions that RichEdit 3.x would be an ideal development environment for WYSIWYG editing of built-up mathematical expressions! Sure hit that nail on the head. Naturally the statement "there are three main versions of RichEdit" is quite out of date.

What is RichEdit?

There are three main versions of RichEdit: 1.x, 2.x, and 3.0. Since all are in use, it makes sense to group the RichEdit features by the version that introduced them. In general, RichEdit adds selective character and paragraph formatting, along with embedded objects, to the plain-text editing facilities familiar from system edit controls.

A RichEdit instance consists of a single story, galley-like text that can be exported and imported using plain text or RTF. Each version of RichEdit is a superset of the preceding one, except that only FE (Far East) builds of RE 1.0 have a vertical text option (a relatively elegant vertical option could be added to RE 3.0 if there's sufficient demand).

RichEdit 1.0 was originally developed for rich-text email. The major differences between the various builds of RE 1.x and RE 2.0 are that the latter is based on Unicode, is a single worldwide binary (not including BiDi, Thai or Indic scripts), has multilevel undo, has a powerful set of COM interfaces, and is substantially more Word compatible. RE 2.1 adds BiDi capabilities.

Major differences between RE 2.x and RE 3.0 include the latter's better performance, richer text, outline view, zoom, font binding, more powerful IME support, and rich complex script support (BiDi, Indic, and Thai).  RE 3.0 is a single, scalable, world-wide binary that offers high performance and substantial Word compatibility in a small package.

RichEdit 2.0 also includes simpler plain-text and single-line controls. RE 3.0 adds rich/plain ListBox and ComboBox controls.

RichEdit 1.0 Features

1.     Text Entry/Selection. Mostly standard (system-edit control) selection and entry of text. Selection bar support. Word-wrap and auto-word-select options. Single, double, and triple click selection.

2.     ANSI (SBCS and MBCS) editing.  No Unicode

3.     Basic set of character/paragraph formatting properties

4.     Character formatting properties: font facename and size, bold, italic, solid underline, strikeout, protected, link, offset, and text color.

5.     Paragraph formatting properties: start indent, right indent, subsequent line offset, bullet, alignment (left, center, right), and tabs.

6.     Find forward: includes case-insensitive and match-whole-word options.

7.     Message-based interface: almost a superset of the system edit-control message set, plus the two OLE interfaces, IRichEditOle and IRichEditOleCallback.

8.     OLE embedded objects: requires client collaboration based on the IRichEditOle and IRichEditOleCallback interfaces.

9.     Right-button menu support: needs IRichEditOleCallback interface.

10.  Drag & Drop editing.

11.  Notifications: WM_COMMAND messages sent to client plus a number of others. Superset of common-control notifications

12.  Single-level undo/redo.

13.  Simple vertical text (Far East builds only)

14.  IME support. (Far East builds only)

15.  WYSIWYG editing using printer metrics. This is needed for WordPad, in particular.

16.  Cut/Copy/Paste/StreamIn/StreamOut with plain text (CF_TEXT) or RTF with and without objects.

17.  C code base

18.   Different builds for different scripts.
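Item 6's find options can be illustrated with a small, portable sketch. It is not RichEdit's code. The function name `FindForward`, the ASCII-only case folding, and the rule that letters, digits and underscore count as word characters are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: letters, digits and '_' count as word characters.
static bool IsWordChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Compare two characters, folding ASCII case unless matchCase is set.
static bool CharsEqual(char a, char b, bool matchCase)
{
    if (matchCase)
        return a == b;
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}

// Hypothetical sketch of a forward search. Returns the character position
// of the first match at or after cpStart, or -1 if there is none.
long FindForward(const std::string &text, const std::string &pattern,
                 long cpStart, bool matchCase, bool wholeWord)
{
    const long cchText = static_cast<long>(text.size());
    const long cchPat = static_cast<long>(pattern.size());
    if (cchPat == 0 || cpStart < 0)
        return -1;
    for (long cp = cpStart; cp + cchPat <= cchText; ++cp)
    {
        long i = 0;
        while (i < cchPat && CharsEqual(text[cp + i], pattern[i], matchCase))
            ++i;
        if (i < cchPat)
            continue;                       // mismatch at this position
        if (wholeWord)
        {
            bool startOk = cp == 0 || !IsWordChar(text[cp - 1]);
            bool endOk = cp + cchPat == cchText || !IsWordChar(text[cp + cchPat]);
            if (!startOk || !endOk)
                continue;                   // match is part of a longer word
        }
        return cp;
    }
    return -1;
}
```

For example, a case-insensitive search of "Cool tool, COOL!" for "cool" finds position 0, and repeating it from position 1 finds 11. A whole-word search for "ool" finds nothing.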

RichEdit 2.x Additions

1.     Unicode. A big effort was needed to maintain compatibility with existing non-Unicode documents, i.e., the ability to convert to/from non-Unicode plain and rich text. Substantial effort was needed to run correctly on Win95.

2.     General international support. General line breaking algorithm (extension of Kinsoku rules), simple font linking, keyboard font switching.

3.     FE support. E.g., Level 2 and 3 IME support

4.     Find up as well as down.

5.     BiDi support (RichEdit 2.1)

6.     Multilevel undo. Extensible undo architecture that allows client to participate in app-wide undo model.

7.     Magellan mouse support

8.     Dual-font support. Keyboard can automatically switch fonts when active font is inappropriate for current keyboard, e.g., Kanji characters in Times New Roman.

9.     Smart font apply.  Font change request doesn’t apply Western fonts to FE characters.

10.  Improved display.  An off-screen bitmap is used when multiple fonts occur on the same line.  This allows, for example, the last letter of the word “cool” not to be chopped off.

11.  Transparency support.  Also in windowless mode.

12.  System selection colors. Used for selecting text

13.  AutoURL recognition

14.  Word edit UI compatibility. Selection, cursor-keypad semantics.

15.  Word standard EOP (end-of-paragraph mark: CR). Can also handle CRLF

16.  Plain-text controls as well as rich-text. Single character format and single paragraph format.

17.  Single-line controls as well as multiline.  Truncate at first end-of-paragraph and no word wrap.

18.  Accelerator and Password Controls.

19.  Scalable architecture to reduce instance size.

20.  Windowless operation and interfaces (ITextHost/ITextServices). Added primarily for Forms^3.

21.  COM dual interfaces: TOM (Text Object Model). This powerful set of interfaces is described separately.

22.  CHARFORMAT2. Added font weight, background color, locale ID, underline type, superscript/subscript (in addition to offset), disabled effect. For RTF roundtripping only, added amount to space between letters, twip size above which to kern character pair, animated-text type, various effects: font shadow/outline, all caps, small caps, hidden, embossed, imprint, and revised.

23.  PARAFORMAT2. Added space before/after and Word line spacings. For RTF roundtripping only, added shading weight/style, numbering start/style/tab, border space/width/sides, tab alignment/leaders, various Word paragraph effects: RTL paragraph, keep, keep-next, page-break-before, no-line-number, no-widow-control, do-not-hyphenate, side-by-side.

24.  More RTF roundtripping. All of Word’s FormatFont and FormatParagraph properties.

25.  Improved OLE support.

26.  Code Stability and stabilization. E.g., parameter and object validation, function invariants, re-entrancy guards, object stabilization, etc.

27.  Strong testing infrastructure, including extensive regression tests and Genesis testing. Shipped with no priority 1 or 2 bugs and few postponed bugs.

28.  Improved Performance.  Smaller working set, faster load and redisplay times, etc.

29.   C++ code base. The code is written in C++. Provided a solid foundation on which to build RichEdit 3.0.
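Items 15 and 17 describe two small text transformations. A CRLF pair is read as the single CR that serves as the standard end-of-paragraph mark, and a single-line control keeps only the text before the first end-of-paragraph. The sketch below shows both under those assumptions. The function names are hypothetical, and a lone LF is deliberately left alone because the list above doesn't say how RichEdit treats it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: collapse each CRLF pair to the single CR that serves
// as the standard end-of-paragraph (EOP) mark. Other characters, including
// a lone LF, are copied unchanged.
std::u16string NormalizeEops(const std::u16string &in)
{
    std::u16string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        out += in[i];
        if (in[i] == u'\r' && i + 1 < in.size() && in[i + 1] == u'\n')
            ++i;                            // skip the LF of a CRLF pair
    }
    return out;
}

// Hypothetical sketch: a single-line control keeps only the text that
// precedes the first EOP.
std::u16string TruncateAtFirstEop(const std::u16string &in)
{
    const std::u16string norm = NormalizeEops(in);
    const std::size_t pos = norm.find(u'\r');
    return pos == std::u16string::npos ? norm : norm.substr(0, pos);
}
```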

RichEdit 3.0 Feature Additions

1.     Zoom. The zoom factor is given by the ratio of two longs (numerator/denominator).

2.     Paragraph numbering (single-level). Numeric, upper/lower alphabetic or Roman numeral.

3.     Simple tables (no wrap inside cells). Limited UI: no resizing, but can delete/insert rows. With LineServices, can align columns centered, flush right, and decimal.  Cells are simulated by tabs, so text tabs and carriage returns are replaced by blanks.

4.     Normal and heading styles. Built-in normal style and heading styles 1 through 9 are supported by the EM_SETPARAFORMAT and TOM APIs.

5.     Outline view (similar to Word’s).  Supports normal style and headings 1 through 9.  Can collapse to heading level n, promote/demote headings/text, move paragraphs up/down.  Can persist collapse status.

6.     More underline types (dashed, dash-dot, dash-dot-dot, dot)

7.     Underline coloring. Underlined text can be tagged with one of 15 document choices for underline colors.

8.     Hidden text. Marked by CHARFORMAT2 attribute. Handy for roundtripping of information that ordinarily shouldn’t be displayed.

9.     More default hot keys, which act as Word’s default hot keys act.  E.g., European accent dead keys (US keyboards only) and outline-view hot keys. Number hot key (Ctrl+L) cycles through numbering options available, starting with bullet.

10.  Smart-quotes (toggled on/off by Ctrl+") for US keyboards.

11.  Soft hyphens. (0xAD in plain text; \- in RTF).

12.  Italics Caret/Cursor.  Also hand cursor over URLs.

13.  LineServices Option: RichEdit 3.0 can use Office’s LineServices component for line breaking and display.  This elegant option was added primarily to facilitate handling complex scripts (BiDi, Indic, and Thai).  In addition a number of improvements occur for simple scripts, e.g., center, right, and decimal tabs, fully justified text, underline averaging giving a uniform underline even when adjacent text runs have different font sizes.  It opens the door to incorporating LineServices FE enhancements, such as Ruby, Warichu, Tatenakayoko, and vertical text. LineServices also paves the way for WYSIWYG editing of built-up mathematical expressions and RichEdit 3.x looks like the ideal development environment for this.

14.  Complex Script Support: RichEdit 3.0 will support BiDi (text with Arabic and/or Hebrew mixed with other scripts), Indic (Indian scripts like Devanagari), and Thai. For support of these complex scripts, the LineServices and NT Uniscribe components are used, which run on Win95 and later OSs.

15.  Font binding: RichEdit 3.0 will automatically choose an appropriate font for characters that clearly do not belong to the current charset stamp.  This is done by assigning charsets to runs and associating fonts with those charsets.  Please see the section on Font Binding below.

16.  Charset-specific plain-text read/write options, notably ability to read a file using one charset and write it with a different one.

17.  UTF-8 RTF. Used preferentially for cut/copy/paste and optionally externally, this file format is substantially more compact than ordinary RTF, faster, and is completely faithful to Unicode.

18.   Office 9 IME support (MSIME98). This more powerful IME capability has been factored out into an independent module (see RichEdit Architectural Improvements). Features include:
a. Reconversion - Previously, the user had to delete the final string and then type a new string to get to the correct candidate. This feature lets the user convert the final string back to composition mode, allowing easy selection of a different candidate string.
b. Document feed - This feature provides IME98 with the text for the current paragraph, which helps IME98 to do more accurate conversion during typing.
c. Mouse Operation - This feature allows the user to have better control over the candidate and UI windows during typing.
d. Caret position - This feature provides the current caret and line information, which IME98 uses to position UI windows (e.g., candidate list).

19.  AIMM support. Users can invoke the IE/AIMM object, which enables users to enter Far East characters on US systems (NT4.0 & Win95).

20.  More RTF round tripping.

21.  Improved 1.0 compatibility mode, e.g., MBCS to/from Unicode character-position (cp) mappings. Is being used to emulate RE 1.0 in NT 5.

22.  Increased Freeze Control. The display can be frozen over multiple API calls and then unfrozen to display the updates.

23.  Increased Undo Control. Undo can be suspended and resumed (needed for IME).

24.   Increase/Decrease Font Size. Increases or decreases font size to one of six standard values (12, 28, 36, 48, 72, 80 pts).
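Item 2 lists the single-level numbering styles: numeric, upper/lower alphabetic, and Roman numerals. The label text for a paragraph number could be computed as in the sketch below. The enum and function names are hypothetical. How alphabetic numbering continues past "z" is also an assumption: this sketch uses spreadsheet-style "aa, ab, ...", and Word-style numbering may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical numbering styles mirroring item 2.
enum class NumberStyle { Decimal, LowerAlpha, UpperAlpha, LowerRoman, UpperRoman };

static std::string ToRoman(int n, bool upper)
{
    // Subtractive notation (IV, IX, XL, ...), valid for 1..3999.
    static const int values[] = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    static const char *upperSyms[] = {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};
    static const char *lowerSyms[] = {"m", "cm", "d", "cd", "c", "xc", "l", "xl", "x", "ix", "v", "iv", "i"};
    std::string s;
    for (int i = 0; i < 13; ++i)
        while (n >= values[i])
        {
            s += upper ? upperSyms[i] : lowerSyms[i];
            n -= values[i];
        }
    return s;
}

static std::string ToAlpha(int n, bool upper)
{
    // Bijective base 26: 1 -> a, 26 -> z, 27 -> aa (assumed continuation).
    std::string s;
    while (n > 0)
    {
        --n;
        s.insert(s.begin(), static_cast<char>((upper ? 'A' : 'a') + n % 26));
        n /= 26;
    }
    return s;
}

// Hypothetical: label text for paragraph number n in the given style.
std::string NumberLabel(int n, NumberStyle style)
{
    assert(n >= 1 && n <= 3999);
    switch (style)
    {
    case NumberStyle::Decimal:    return std::to_string(n);
    case NumberStyle::LowerAlpha: return ToAlpha(n, false);
    case NumberStyle::UpperAlpha: return ToAlpha(n, true);
    case NumberStyle::LowerRoman: return ToRoman(n, false);
    case NumberStyle::UpperRoman: return ToRoman(n, true);
    }
    return std::string();
}
```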

RichEdit 3.0 Architectural Improvements

1.     Input module: IME has been factored out into separate generally usable input module that supports the latest Office 9 IMEs.  RichEdit 3.0 itself knows nothing of IMEs!  In principle other IME clients can use this input module.  Did need to add some methods to RichEdit’s object model (the approach is discussed in a separate section).

2.     Virtual Win32 Environment: OS-dependent calls have been separated out into a class of their own. RE 3.0 works in a virtual Win32 with some multilingual enhancements. Most calls are static, so no runtime overhead is encountered.  Facilitates building RichEdit with different OSs, e.g., Windows CE.

3.     Factored Rich Text status: allows aspects of rich text to be used with plain-text semantics.  E.g., multiple fonts, coloring, and underlining.  Useful for font binding and IME highlighting. Plain text UI remains the same, so EM_SETCHARFORMAT and EM_SETPARAFORMAT apply to whole control.

4.      Dual Line Methods. Lines can be broken, queried, and displayed with or without LineServices. Simple text can be handled with small instance size and higher speed. More sophisticated text can use the elegant LineServices component.
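Item 4's dual line methods imply a choice, line by line, between the fast simple path and LineServices. The sketch below is a minimal illustration of such a dispatch. It assumes the decision is made only by scanning for characters in the complex-script Unicode blocks mentioned earlier (Hebrew, Arabic, Devanagari, Thai). The names are hypothetical, and RichEdit's actual criterion is not described here. It may, for instance, also use LineServices for center, right and decimal tabs.

```cpp
#include <cassert>
#include <string>

// Hypothetical: which line-breaking/display path to use for a line.
enum class LinePath { Simple, LineServices };

// True for characters in a few complex-script Unicode blocks.
static bool IsComplexScript(char16_t ch)
{
    return (ch >= 0x0590 && ch <= 0x05FF)    // Hebrew
        || (ch >= 0x0600 && ch <= 0x06FF)    // Arabic
        || (ch >= 0x0900 && ch <= 0x097F)    // Devanagari
        || (ch >= 0x0E00 && ch <= 0x0E7F);   // Thai
}

// Use the small, fast simple path unless the line needs complex-script handling.
LinePath ChooseLinePath(const std::u16string &line)
{
    for (char16_t ch : line)
        if (IsComplexScript(ch))
            return LinePath::LineServices;
    return LinePath::Simple;
}
```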

RichEdit 3.0 Performance Improvements and Maintenance

1.     Many performance/size improvements.
a) reduced the internal versions of RichEdit 2.0's character and paragraph formatting structures (CHARFORMAT2 and PARAFORMAT2) to one third of their former size, and generalized them. It is easy to add properties to these important structures, although the additions typically won't be available to the message interface.
b) reduced size of many other structures as well.
c) declared constant data structures const, so that they are included in the code segment and are shared by all active processes.
d) reduced the number of system calls by more caching of frequently used data
e) eliminated redundant code.

2.     Faster startup time: most initialization is postponed to the creation of the first control.  C runtime is no longer needed.

3.     Cleaned up code base. Used the same notation (Hungarian, etc.) for local variable names throughout. Added many new comments and improved many old comments.  Counts are now LONGs rather than the nefarious DWORDs, which might be described as “wishful thinking”! Eliminated evolutionary dead code. Simplified C++ model: no more multiple inheritance and almost no operator overloading (except for new and assignment).

4.      Numerous bug fixes. Eliminated some memory leaks and reference counting errors. Fixed various bugs postponed from RichEdit 2.x.
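The remark in item 3 about LONG versus DWORD counts is easy to demonstrate. In the sketch below, `int32_t` and `uint32_t` stand in for the Win32 LONG and DWORD types (both 32 bits on Windows). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// With signed counts, the distance between two character positions can be
// negative, which is what you want when cpEnd precedes cpStart.
int32_t SignedDelta(int32_t cpStart, int32_t cpEnd)
{
    return cpEnd - cpStart;
}

// With unsigned counts, the same subtraction wraps around to a huge value,
// so a test like "delta < 0" can never succeed.
uint32_t UnsignedDelta(uint32_t cpStart, uint32_t cpEnd)
{
    return cpEnd - cpStart;
}

// Counting down with a signed index terminates normally; the unsigned
// equivalent "for (uint32_t i = n - 1; i >= 0; --i)" would never exit.
int32_t SumBackward(const int32_t *values, int32_t count)
{
    int32_t sum = 0;
    for (int32_t i = count - 1; i >= 0; --i)
        sum += values[i];
    return sum;
}
```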

RichEdit 3.0 Rich System Controls

1.     System edit-control mode that emulates the OS edit controls more accurately.

2.      ListBox and ComboBox controls similar to system versions, but supporting Unicode and font binding on Win95 as well as on NT.  These controls can be made rich, opening the door to substantially more elegant dialogs.

What RichEdit 3.0 Isn't

1.     Native HTML control. There are HTML/RTF converters that can be used with RichEdit. There's also the Trident control, which is substantially bigger. We have a prototype for direct HTML I/O that uses the TOM interfaces, but it hasn't been tested adequately for general use. This prototype only roundtrips HTML that RichEdit understands.

2.     ActiveX control. We have a prototype RichEdit ActiveX control (ATL), but it too hasn't undergone testing. Note that there is a RichEdit 1.0 ActiveX control, and in the future there may be a VB control based on RichEdit 3.0.

3.     MFC RichEdit class. Note there is a RichEdit 1.0 MFC class.

4.     Multistory editor (like Word).  Each RichEdit instance corresponds to a single story.  Word has many stories, e.g., body text, header, footer, footnote, textbox.  A RichEdit instance can be used for any one of those, but to handle more, you need one instance for each story.

RichEdit Clients

RichEdit client | Version
Office 97 SDM | 2.x
Office 9 SDM (3.0) | 3.0
Office Binder | 2.0, 3.0
Office 9 Command Bars (3.0) | 3.0
Word 97 (non-SDM dialogs) | 2.x
Default Exchange Client | 1.0
Outlook 97 body/to/from/subject/notes | 2.x
Outlook 9 body/to/from/subject/notes | 3.0
Pocket Word 2.0 | 3.0-
WordPad (Win95) | 1.x
WordPad (Win98) | 2.0
WordPad (NT 5.0) | 3.0
MFC RichText Control | 1.0
VB RichText Control | 1.0
Forms^3 97 edit engine | 2.0
Forms^3 9 edit engine | 3.0
Layout Control Pack for IE | 2.0
FrontPage source viewer | 2.0
Windows SDK | 1.0
Project 98 | 2.0
Publisher 98 | ???
Comic Chat | 1.0?

How to Create a RichEdit Instance (1)

     

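      HMODULE hRE = LoadLibrary(TEXT("RICHED20.DLL"));   // Load RichEdit 2.0/3.0
      if(!hRE)
            return FALSE;                                // DLL not found

      hwndRE = CreateWindow(TEXT("RichEdit20W"), TEXT(""), dwStyle,
                            rc.left, rc.top,
                            rc.right - rc.left, rc.bottom - rc.top,
                            hwndParent, NULL, hinst, NULL);
      ...                                                // Send messages to hwndRE

      DestroyWindow(hwndRE);                             // Destroy the control before unloading its DLL
      FreeLibrary(hRE);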

How to Create a RichEdit Instance (2)

A RichEdit control is based on an ITextHost object interacting with an ITextServices object.  The latter doesn’t have a window of its own. The CreateWindow() call above creates an ITextHost object, which, in turn, creates an ITextServices object.

Alternatively, you can create an ITextHost object directly that, in turn, creates as many ITextServices objects as you desire.  This is the way Forms^3 uses RichEdit for dialogs.  It’d also be a great way to make a table object, for which each cell would have its own ITextServices object.

The way to create an ITextServices object is to call the following function (it's a bit complicated, since it allows the object to be aggregated):

 

STDAPI CreateTextServices(
      IUnknown *punkOuter,      // Outer unknown, may be NULL
      ITextHost *phost,         // Client's ITextHost; must be valid
      IUnknown **ppUnk);        // Private IUnknown of text services engine

 

For example,

      IUnknown *pUnk = NULL;
      if(FAILED(CreateTextServices(NULL, this, &pUnk)))
            return FALSE;
      HRESULT hr = pUnk->QueryInterface(IID_ITextServices, (void **)&_pserv);
      pUnk->Release();                   // _pserv holds its own reference
      if(FAILED(hr))
            return FALSE;

 

You can then use the _pserv pointer to call any ITextServices method, including TxSendMessage(), which is a faster way to send messages to the control than the system SendMessage(). But be warned: CreateWindow() and the usual message interface are substantially easier to use, since you don't have to implement an ITextHost object. As shown below, if all you want to do is use some ITextServices methods, you can get an ITextServices interface to a control created by CreateWindow().

How to use RichEdit

There are five main ways to use a RichEdit 2.x or 3.0 control:

 

1.     Messages

2.     ITextServices methods

3.     Keyboard input including cut/copy/paste

4.     File read/write (plain text or RTF)

5.     TOM (Text Object Model) methods

 

The most familiar ways (messages and keyboard) are useful, but may not have the performance or functionality that you need. We describe each of these approaches in the remainder of this talk.

For ordinary keyboard input (not IME), RichEdit acts very similarly to Word.  Word has more hot keys, but the cursor keypad and letter/punctuation keys work essentially the same way.  Ditto for mouse operations.

RichEdit Message Interface

There are many RichEdit messages.  In addition to the system edit control messages defined in winuser.h, there are many new messages defined in richedit.h. All edit messages handled by RichEdit (specifically by ITextServices::TxSendMessage()) are listed below.  System edit and RichEdit 1.0 messages are defined in the system SDK.  RichEdit 2.0 and 3.0 messages aren’t documented in my copy of the SDK, but should be documented on http://richedit sometime soon, and in the SDK sometime later.  Note that a number of RichEdit 1.0 messages have been generalized in later versions.  E.g., EM_STREAMIN/OUT take an optional codepage value (which can be 1200, i.e., Unicode, or CP_UTF8, i.e., UTF-8).  RichEdit only understands enough about IME messages to know to invoke the IME input module (see Input Module). Hence not all IME messages are listed below.

 

System edit control messages not handled by RichEdit

EM_GETHANDLE, EM_SETHANDLE, EM_FMTLINES, EM_SETTABSTOPS, WM_GETFONT

 

System edit control messages handled by RichEdit

EM_GETFIRSTVISIBLELINE, EM_GETLINE, EM_GETLINECOUNT, EM_GETMODIFY, EM_GETSEL, EM_GETTHUMB, EM_GETWORDBREAKPROC, EM_LIMITTEXT, EM_LINEFROMCHAR, EM_LINEINDEX, EM_LINELENGTH, EM_LINESCROLL, EM_REPLACESEL, EM_SCROLL, EM_SETMODIFY, EM_SETSEL, EM_SETTARGETDEVICE, EM_SETWORDBREAKPROC, EM_UNDO

 

WM_CHAR, WM_CLEAR, WM_CONTEXTMENU, WM_COPY, WM_CUT, WM_DESTROYCLIPBOARD, WM_DROPFILES, WM_ERASEBKGND, WM_GETTEXT, WM_GETTEXTLENGTH, WM_HSCROLL, WM_IME_CHAR, WM_INPUTLANGCHANGE, WM_INPUTLANGCHANGEREQUEST, WM_KEYDOWN, WM_KEYUP, WM_KILLFOCUS, WM_LBUTTONDBLCLK, WM_LBUTTONDOWN, WM_LBUTTONUP, WM_MBUTTONDBLCLK, WM_MBUTTONDOWN, WM_MBUTTONUP, WM_MOUSEACTIVATE, WM_MOUSEMOVE, WM_MOUSEWHEEL, WM_NCMBUTTONDOWN, WM_PASTE, WM_RBUTTONDBLCLK, WM_RBUTTONDOWN, WM_RBUTTONUP, WM_RENDERALLFORMATS, WM_RENDERFORMAT, WM_SETFOCUS, WM_SETFONT, WM_SETTEXT, WM_SETTINGCHANGE, WM_SIZE, WM_SYSCHAR, WM_SYSCOLORCHANGE, WM_SYSKEYDOWN, WM_TIMER, WM_UNDO, WM_VSCROLL

 

RichEdit 1.0 messages

EM_CANPASTE, EM_CHARFROMPOS, EM_DISPLAYBAND, EM_EXGETSEL, EM_EXLIMITTEXT, EM_EXLINEFROMCHAR, EM_EXSETSEL, EM_FINDTEXT, EM_FINDTEXTEX, EM_FINDWORDBREAK, EM_FORMATRANGE, EM_GETEVENTMASK, EM_GETCHARFORMAT, EM_GETLIMITTEXT, EM_GETOLEINTERFACE, EM_GETOPTIONS, EM_GETPARAFORMAT, EM_GETSELTEXT, EM_GETTEXTRANGE, EM_GETWORDBREAKPROCEX, EM_HIDESELECTION, EM_PASTESPECIAL, EM_POSFROMCHAR, EM_REQUESTRESIZE, EM_SCROLLCARET, EM_SELECTIONTYPE, EM_SETBKGNDCOLOR, EM_SETCHARFORMAT, EM_SETEVENTMASK, EM_SETOLECALLBACK, EM_SETOPTIONS, EM_SETPARAFORMAT, EM_SETTARGETDEVICE, EM_SETWORDBREAKPROCEX, EM_STREAMIN, EM_STREAMOUT

 

RichEdit 2.0 messages

EM_SETUNDOLIMIT, EM_REDO, EM_CANREDO, EM_GETUNDONAME, EM_GETREDONAME, EM_STOPGROUPTYPING, EM_SETTEXTMODE, EM_GETTEXTMODE, EM_AUTOURLDETECT, EM_GETAUTOURLDETECT, EM_SETPALETTE, EM_GETTEXTEX, EM_GETTEXTLENGTHEX, EM_SHOWSCROLLBAR, EM_FINDTEXTW, EM_FINDTEXTEXW

 

Far East specific messages (some are RE 1.0)

EM_GETPUNCTUATION, EM_SETPUNCTUATION, EM_GETWORDWRAPMODE, EM_SETWORDWRAPMODE, EM_GETIMECOLOR, EM_SETIMECOLOR, EM_GETIMEOPTIONS, EM_SETIMEOPTIONS, EM_GETLANGOPTIONS, EM_SETLANGOPTIONS, EM_CONVPOSITION, EM_GETIMECOMPMODE

 

RichEdit 3.0 messages

FE messages

EM_GETIMEMODEBIAS, EM_SETIMEMODEBIAS, EM_RECONVERSION

 

BiDi specific messages

EM_GETBIDIOPTIONS, EM_SETBIDIOPTIONS

 

Extended edit style specific messages

EM_GETEDITSTYLE, EM_SETEDITSTYLE

 

Outline view message

EM_OUTLINE

 

Message for getting and restoring scroll pos

EM_GETSCROLLPOS, EM_SETSCROLLPOS

 

Zoom and increment/decrement fontsize

EM_GETZOOM, EM_SETZOOM, EM_SETFONTSIZE

 

LineServices messages

EM_GETTYPOGRAPHYOPTIONS, EM_SETTYPOGRAPHYOPTIONS
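The message-interface overview notes that EM_STREAMIN and EM_STREAMOUT take an optional code page, such as 1200 (Unicode) or CP_UTF8. In RichEdit 3.0 and later, the code page travels in the high word of wParam together with the SF_USECODEPAGE flag. The portable sketch below only builds and decodes that value. The constant values are assumed to match richedit.h and winnls.h, so check them against your SDK headers. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Values assumed to match richedit.h / winnls.h; verify against your SDK.
constexpr uint32_t kSF_TEXT        = 0x0001;
constexpr uint32_t kSF_RTF         = 0x0002;
constexpr uint32_t kSF_USECODEPAGE = 0x0020;
constexpr uint32_t kCP_UTF8        = 65001;
constexpr uint32_t kCP_UNICODE     = 1200;   // UTF-16LE, as mentioned in the text

// Hypothetical helper: build an EM_STREAMIN/EM_STREAMOUT wParam that carries
// a code page in its high word.
constexpr uint32_t StreamFlagsWithCodePage(uint32_t format, uint32_t codePage)
{
    return (codePage << 16) | kSF_USECODEPAGE | format;
}

// Hypothetical helpers: recover the code page and the format flags.
constexpr uint32_t StreamCodePage(uint32_t wparam) { return wparam >> 16; }
constexpr uint32_t StreamFormat(uint32_t wparam)
{
    return (wparam & 0xFFFFu) & ~kSF_USECODEPAGE;
}
```

On Windows, the resulting value would be passed as wParam, e.g., SendMessage(hwndRE, EM_STREAMIN, StreamFlagsWithCodePage(SF_RTF, CP_UTF8), (LPARAM)&es), where es is an EDITSTREAM whose callback supplies the bytes.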

 

RichEdit RTF

The RTF control words recognized by RichEdit are given below. Not all of these control words are fully implemented, but almost all are round tripped.

 

adeff, animtext, ansi, ansicpg, b, bgbdiag, bgcross, bgdcross, bgdkbdiag, bgdkcross, bgdkdcross, bgdkfdiag, bgdkhoriz, bgdkvert, bgfdiag, bghoriz, bgvert, bin, blue, box, brdrb, brdrbar, brdrbtw, brdrcf, brdrdash, brdrdashsm, brdrdb, brdrdot, brdrhair, brdrl, brdrr, brdrs, brdrsh, brdrt, brdrth, brdrtriple, brdrw, brsp, bullet, caps, cbpat, cell, cellx, cf, cfpat, clbrdrb, clbrdrl, clbrdrr, clbrdrt, collapsed, colortbl, cpg, cs, deff, deflang, deflangfe, deftab, deleted, dibitmap, disabled, dn, embo, emdash, emspace, endash, enspace, expndtw, f, fbidi, fchars, fcharset, fdecor, fi, field, fldinst, fldrslt, fmodern, fname, fnil, fonttbl, footer, footerf, footerl, footerr, footnote, fprq, froman, fs, fscript, fswiss, ftech, ftncn, ftnsep, ftnsepc, green, header, headerf, headerl, headerr, highlight, hyphpar, i, impr, info, intbl, keep, keepn, kerning, lang, lchars, ldblquote, li, line, lnkd, lquote, ltrch, ltrdoc, ltrmark, ltrpar, macpict, noline, nosupersub, nowidctlpar, objattph, objautlink, objclass, objcropb, objcropl, objcropr, objcropt, objdata, object, objemb, objh, objicemb, objlink, objname, objpub, objscalex, objscaley, objsetsize, objsub, objw, outl, page, pagebb, par, pard, piccropb, piccropl, piccropr, piccropt, pich, pichgoal, picscalex, picscaley, pict, picw, picwgoal, plain, pmmetafile, pn, pndec, pnindent, pnlcltr, pnlcrm, pnlvlblt, pnlvlbody, pnlvlcont, pnqc, pnqr, pnstart, pntext, pntxta, pntxtb, pnucltr, pnucrm, protect, pwd, qc, qj, ql, qr, rdblquote, red, result, revauth, revised, ri, row, rquote, rtf, rtlch, rtldoc, rtlmark, rtlpar, s, sa, sb, sbys, scaps, sect, sectd, shad, shading, sl, slmult, strike, stylesheet, sub, super, tab, tb, tc, tldot, tleq, tlhyph, tlth, tlul, tqc, tqdec, tqr, trbrdrb, trbrdrl, trbrdrr, trbrdrt, trgaph, trleft, trowd, trqc, trqr, tx, u, uc, ul, uld, uldash, uldashd, uldashdd, uldb, ulhair, ulnone, ulth, ulw, ulwave, up, utf, v, viewkind, viewscale, wbitmap, wbmbitspixel, wbmplanes, wbmwidthbytes, wmetafile, xe, zwj, zwnj.
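Each word in the list above is an RTF control word. Syntactically, that is a backslash, a run of lowercase letters, an optional signed integer parameter, and an optional single space delimiter that is consumed. The sketch below is a hypothetical tokenizer written from the RTF specification's syntax rules, not taken from RichEdit. It shows how input such as {\rtf1\ansi\b bold\b0} splits into tokens. It is deliberately simplified: \'hh escapes come out as a control symbol followed by text, and CR/LF characters are not discarded.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token type; not RichEdit's internal representation.
struct RtfToken
{
    enum Kind { GroupStart, GroupEnd, ControlWord, ControlSymbol, Text };
    Kind kind = Text;
    std::string name;       // control word/symbol name, or literal text
    bool hasParam = false;  // true if a numeric parameter followed the word
    long param = 0;
};

static RtfToken MakeToken(RtfToken::Kind kind, std::string name)
{
    RtfToken tok;
    tok.kind = kind;
    tok.name = std::move(name);
    return tok;
}

static bool IsLower(char c) { return std::islower(static_cast<unsigned char>(c)) != 0; }
static bool IsDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

std::vector<RtfToken> TokenizeRtf(const std::string &rtf)
{
    std::vector<RtfToken> tokens;
    const std::size_t n = rtf.size();
    std::size_t i = 0;
    while (i < n)
    {
        const char ch = rtf[i];
        if (ch == '{' || ch == '}')
        {
            tokens.push_back(MakeToken(ch == '{' ? RtfToken::GroupStart : RtfToken::GroupEnd,
                                       std::string(1, ch)));
            ++i;
        }
        else if (ch == '\\' && i + 1 < n)
        {
            ++i;                                        // skip the backslash
            if (!IsLower(rtf[i]))
            {
                // Control symbol, e.g. \- (soft hyphen), \~ or \*.
                tokens.push_back(MakeToken(RtfToken::ControlSymbol, std::string(1, rtf[i])));
                ++i;
                continue;
            }
            const std::size_t start = i;
            while (i < n && IsLower(rtf[i]))
                ++i;
            RtfToken tok = MakeToken(RtfToken::ControlWord, rtf.substr(start, i - start));
            const bool negative = i < n && rtf[i] == '-';
            const std::size_t digits = negative ? i + 1 : i;
            if (digits < n && IsDigit(rtf[digits]))
            {
                long value = 0;
                for (i = digits; i < n && IsDigit(rtf[i]); ++i)
                    value = value * 10 + (rtf[i] - '0');
                tok.hasParam = true;
                tok.param = negative ? -value : value;
            }
            if (i < n && rtf[i] == ' ')
                ++i;                                    // one space delimiter is consumed
            tokens.push_back(tok);
        }
        else
        {
            // Plain text up to the next group or control character. The first
            // character is always consumed, so the loop makes progress.
            const std::size_t start = i++;
            while (i < n && rtf[i] != '{' && rtf[i] != '}' && rtf[i] != '\\')
                ++i;
            tokens.push_back(MakeToken(RtfToken::Text, rtf.substr(start, i - start)));
        }
    }
    return tokens;
}
```

For {\rtf1\ansi\b bold\b0}, this yields seven tokens: a group start, rtf with parameter 1, ansi, b, the text "bold", b with parameter 0, and a group end.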

ITextServices Windowless Interface

As described above, you can get an ITextServices interface using CreateTextServices(), but this requires that you implement your own ITextHost object.  If you use CreateWindow() instead, you can still use ITextServices methods by using the following code:

 

      IUnknown *pUnk = NULL;
      SendMessage(hedit, EM_GETOLEINTERFACE, 0, (LPARAM)&pUnk);
      if(pUnk)
      {
            HRESULT hr = pUnk->QueryInterface(IID_ITextServices, (void **)&_pserv);
            pUnk->Release();
            if(SUCCEEDED(hr))
            {
                  ...                  // Use _pserv methods
                  _pserv->Release();
            }
      }

 

All ITextServices methods are typed simply as HRESULT. This differs from standard COM interface functions, which are typed HRESULT STDMETHODCALLTYPE. The methods are:

 

TxSendMessage(msg, wparam, lparam, plresult)
TxDraw(dwDrawAspect, lindex, pvAspect, ptd, hdcDraw, hicTargetDev, lprcBounds, lprcWBounds, lprcUpdate, pfnContinue, dwContinue, lViewId)
TxGetHScroll(plMin, plMax, plPos, plPage, pfEnabled)
TxGetVScroll(plMin, plMax, plPos, plPage, pfEnabled)
OnTxSetCursor(dwDrawAspect, lindex, pvAspect, ptd, hdcDraw, hicTargetDev, lprcClient, x, y)
TxQueryHitPoint(dwDrawAspect, lindex, pvAspect, ptd, hdcDraw, hicTargetDev, lprcClient, x, y, pHitResult)
OnTxInPlaceActivate(prcClient)
OnTxInPlaceDeactivate()
OnTxUIActivate()
OnTxUIDeactivate()
TxGetText(pbstrText)
TxSetText(pszText)
TxGetCurTargetX(pX)
TxGetBaseLinePos(pPos)
TxGetNaturalSize(dwAspect, hdcDraw, hicTargetDev, ptd, dwMode, psizelExtent, pwidth, pheight)
TxGetDropTarget(ppDropTarget)
OnTxPropertyBitsChange(dwMask, dwBits)
TxGetCachedSize(pdwWidth, pdwHeight)

Getting to the TOM Interfaces

 

// Skeleton function to manipulate text using TOM ITextRange interface
HRESULT Manipulate(HWND hedit)
{
      IUnknown *      punk = NULL;
      ITextDocument * pdoc = NULL;
      ITextRange *    prg  = NULL;

      SendMessage(hedit, EM_GETOLEINTERFACE, 0, (LPARAM)&punk);
      if(!punk)
            return E_NOINTERFACE;

      HRESULT hr = punk->QueryInterface(IID_ITextDocument, (void **)&pdoc);
      if(SUCCEEDED(hr))
      {
            hr = pdoc->Range(0, 0, &prg);          // Degenerate range at start of story
            if(SUCCEEDED(hr))
            {
                  ...                              // Manipulate text via prg
                  prg->Release();
            }
            pdoc->Release();
      }
      punk->Release();
      return hr;
}

 

Font Binding

RichEdit 3.0 will assign a charset to plain-text characters depending on their context.  E.g., Hangul symbols get HANGUL_CHARSET, nonneutral ANSI characters get ANSI_CHARSET in any event, Chinese characters get SHIFTJIS_CHARSET if kana characters are found nearby and GB2312_CHARSET if no kana are found nearby.  Greek characters get GREEK_CHARSET, etc.  Note that we’re using Unicode internally, so this use of charset differs from the original one used in font specifications.  But charset seems to be a pretty good match with what we want, which is a script, and our CHARFORMAT has a well-defined place for the charset.  It also helps with some anomalies in Win95, where we can't always use Unicode. Neutral characters like blanks and digits get assigned a charset depending on their context.  For example, a blank surrounded by characters of the same charset gets that charset.  More generally neutrals/digits for BiDi text are assigned charsets in a way based on the Unicode BiDi algorithm.  Once charsets are assigned, we scan the text around the insertion point forward and backward to find the nearest fonts that have been used for the charsets.  If no font is found for a charset, we use the font chosen by the client for that charset.  If the client hasn’t specified a font for the charset, we use the default Office 9 font for that charset. If the client wants some other font, it can always change it, but the hope is that this approach will work most of the time.  Our current default font choices are based on the following table:

 

| Code page | Languages | Font facename | Size (pt) |
| --- | --- | --- | --- |
| 125x | Western, CE, ME, ... | Times New Roman | 10 |
| 932 | Japanese | MS Mincho | 10.5 |
| 949 | Korean | Batang | 10 |
| 936 | Simplified Chinese | MS Song | 10 |
| 950 | Traditional Chinese | New MingLiU | 10 |
| 874 | Thai | Tahoma | 8 |

 

Hence our default font-binding table has entries with a charset, facename, and size. In it, ANSI_CHARSET matches all eight 125x charsets, while each other charset matches its font one to one. More precisely, we use the ANSI_CHARSET choice whenever no other alternative is found. The client will be able to specify a finer granularity than this. For example, it could assign a specific font to ARABIC_CHARSET (Arabic) runs, a specific Greek font to Greek runs, and so on. This finer granularity is also used if a font with the desired charset stamp is found somewhere in the document before the area being font-bound.
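The lookup order described above can be sketched in C. This is a minimal illustration, not RichEdit's code. The FontBinding struct, FindBinding, BindFont, and the half-point size encoding are hypothetical, and docFonts is assumed to be ordered nearest-first from the insertion point. Only the charset constant values are taken from the Windows SDK (wingdi.h).

```c
#include <stddef.h>

/* Charset values as defined in wingdi.h; guarded in case <windows.h> is included. */
#ifndef ANSI_CHARSET
#define ANSI_CHARSET        0
#define SHIFTJIS_CHARSET    128
#define HANGUL_CHARSET      129
#define GB2312_CHARSET      134
#define CHINESEBIG5_CHARSET 136
#define THAI_CHARSET        222
#endif

/* Hypothetical font-binding entry: charset, facename, and size in half points
   (so 10.5 pt is stored exactly as 21). */
typedef struct {
    int charset;
    const char *facename;
    int halfPoints;
} FontBinding;

/* Default table from the post; ANSI_CHARSET stands for all the 125x code pages. */
static const FontBinding g_defaultFonts[] = {
    { ANSI_CHARSET,        "Times New Roman", 20 },  /* 125x */
    { SHIFTJIS_CHARSET,    "MS Mincho",       21 },  /* 932 */
    { HANGUL_CHARSET,      "Batang",          20 },  /* 949 */
    { GB2312_CHARSET,      "MS Song",         20 },  /* 936 */
    { CHINESEBIG5_CHARSET, "New MingLiU",     20 },  /* 950 */
    { THAI_CHARSET,        "Tahoma",          16 },  /* 874 */
};

#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

/* Returns the first entry with the given charset, or NULL if there is none. */
static const FontBinding *FindBinding(const FontBinding *table, size_t count, int charset)
{
    for (size_t i = 0; i < count; i++)
        if (table[i].charset == charset)
            return &table[i];
    return NULL;
}

/* docFonts: fonts found near the insertion point, nearest first (assumption).
   clientFonts: the client's per-charset choices. */
const FontBinding *BindFont(int charset,
                            const FontBinding *docFonts, size_t nDoc,
                            const FontBinding *clientFonts, size_t nClient)
{
    const FontBinding *fb = FindBinding(docFonts, nDoc, charset);
    if (!fb)
        fb = FindBinding(clientFonts, nClient, charset);
    if (!fb)
        fb = FindBinding(g_defaultFonts, COUNT_OF(g_defaultFonts), charset);
    if (!fb)  /* no other alternative: fall back to the ANSI_CHARSET choice */
        fb = FindBinding(g_defaultFonts, COUNT_OF(g_defaultFonts), ANSI_CHARSET);
    return fb;
}
```

Some charsets have no row of their own, such as GREEK_CHARSET. These end up at the Times New Roman row, which mirrors the rule that the ANSI_CHARSET choice is used whenever nothing else is found.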

 

 


Special Capabilities of a Math Font

Murray Sargent: Math in Office - Tuesday, 2010/01/12 - 04:31

A fairly common inquiry is how a program can use and access the many special glyph variants of a math font. It’s clearly a much more intricate interaction than encountered in most text applications. This post outlines how the Office math layout software interacts with the Cambria Math font and, in principle, with any other math font that has similar capabilities. More specifically, this post describes the functionality of the special library, mathfont.dll, which is shipped with Office 2007/2010. This library, in turn, interacts with the OpenType and OpenType-like tables in a math font.


Cambria Math and the math tables were developed together with the Office 2007 math software, each influencing the other to obtain high quality results. Some history is given in the post High-Quality Editing and Display of Mathematical Text in Office 2007. The font contains extensive math tables, glyph variants and glyphs for much of the Unicode math character set. It was designed with ClearType and excellent screen readability in mind and enables the best screen-resolution display of math text available today.

 

The specialized math tables include values that control glyph placement in math zones. Many math constants are defined to handle quantities such as axis height, fraction rule thickness, etc. The math tables are formalized as OpenType tables, although they are not yet part of the OpenType standard. Refinements include entries for positioning subscripts and superscripts horizontally using cut-ins and italic corrections. With the cut-in tables, subscripts and superscripts are positioned horizontally better than by untweaked TeX. Each math character has four cut-in values, one for each corner, which allow sub/superscripts to be kerned with their bases. Other table entries give larger glyph variants for operators like the integral sign and square root, and for stretchy characters such as brackets and arrows.
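The OpenType MATH specification was published later. In it, each corner's cut-in is stored as a MathKern table with n ascending correction heights and n + 1 kern values. The kern at a given height is kernValue[i], where i is the number of correction heights less than or equal to that height. The sketch below assumes that layout. The struct and function names are hypothetical and do not correspond to mathfont.dll entry points.

```c
#include <stddef.h>

/* Corner order used by the MathKernInfoRecord in the published MATH table. */
typedef enum { KERN_TOP_RIGHT, KERN_TOP_LEFT, KERN_BOTTOM_RIGHT, KERN_BOTTOM_LEFT } MathKernCorner;

typedef struct {
    size_t heightCount;            /* n */
    const long *correctionHeight;  /* n ascending heights, design units */
    const long *kernValue;         /* n + 1 kern values, design units */
} MathKern;

/* One glyph's four cut-in tables; a NULL entry means no kerning at that corner. */
typedef struct { const MathKern *corner[4]; } MathKernInfoRecord;

/* Kern for a given height: below the first correction height use kernValue[0];
   at or above the last one use kernValue[n]. */
long MathKernAtHeight(const MathKern *mk, long height)
{
    size_t i = 0;
    while (i < mk->heightCount && height >= mk->correctionHeight[i])
        i++;
    return mk->kernValue[i];
}

long GlyphCornerKern(const MathKernInfoRecord *rec, MathKernCorner corner, long height)
{
    const MathKern *mk = rec->corner[corner];
    return mk ? MathKernAtHeight(mk, height) : 0;
}
```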

 

The math tables are organized as a hierarchy accessed via the OpenType ID "MATH". The names of the tables in the hierarchy are MathConstants, MathGlyphInfo, MathItalicsCorrectionInfo, MathTopAccentAttachment, ExtendedShapeCoverage, MathKernInfo, MathKern, MathVariants, MathGlyphConstruction, and GlyphAssembly. The MathConstants table includes parameters such as the em-size-dependent sub/superscript values:

 

LONG lSubscriptShiftDown;

LONG lSubscriptTopMax;

LONG lSubscriptBottomDropMin;

LONG lSuperscriptShiftUp;

LONG lSuperscriptShiftUpCramped;

LONG lSuperscriptBottomMin;

LONG lSuperscriptTopRiseMin;

LONG lSubSuperscriptMinGap;

LONG lSuperscriptBottomMaxWithSubscript;

LONG lSpaceAfterScript;
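The following is a simplified, TeX-style computation of sub/superscript shifts for a base carrying both scripts. It illustrates how such constants interact. It is a sketch under assumptions:

- It uses only six of the listed constants.
- It interprets them the way the published OpenType MATH specification describes its similarly named constants.
- The struct and function names are hypothetical.

```c
/* Hypothetical subset of the MathConstants values above, in design units. */
typedef struct {
    long lSubscriptShiftDown;
    long lSubscriptTopMax;
    long lSuperscriptShiftUp;
    long lSuperscriptBottomMin;
    long lSubSuperscriptMinGap;
    long lSuperscriptBottomMaxWithSubscript;
} ScriptConstants;

static long MaxLong(long a, long b) { return a > b ? a : b; }

/* subAscent: height of the subscript ink above its own baseline.
   supDescent: depth of the superscript ink below its own baseline (positive).
   *pSubDown: how far the subscript baseline drops below the base baseline.
   *pSupUp: how far the superscript baseline rises above the base baseline. */
void ComputeScriptShifts(const ScriptConstants *mc, long subAscent, long supDescent,
                         long *pSubDown, long *pSupUp)
{
    /* Keep the subscript top no higher than SubscriptTopMax. */
    long subDown = MaxLong(mc->lSubscriptShiftDown, subAscent - mc->lSubscriptTopMax);
    /* Keep the superscript bottom no lower than SuperscriptBottomMin. */
    long supUp = MaxLong(mc->lSuperscriptShiftUp, mc->lSuperscriptBottomMin + supDescent);

    long supBottom = supUp - supDescent;   /* height of the superscript bottom */
    long subTop = subAscent - subDown;     /* height of the subscript top */
    long need = mc->lSubSuperscriptMinGap - (supBottom - subTop);
    if (need > 0) {
        /* Raise the superscript first, but only until its bottom reaches
           SuperscriptBottomMaxWithSubscript... */
        long raise = mc->lSuperscriptBottomMaxWithSubscript - supBottom;
        if (raise < 0)
            raise = 0;
        if (raise > need)
            raise = need;
        supUp += raise;
        /* ...then push the subscript down for whatever gap remains. */
        subDown += need - raise;
    }
    *pSubDown = subDown;
    *pSupUp = supUp;
}
```

When only one script is present, the gap adjustment would simply be skipped.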

 

Cambria Math contains full sets of glyph variants that have heavier weights so that when scaled down to the script and scriptscript levels the stem widths match those of the text-level glyphs. The prime (U+2032) and multiple prime characters need to be superscripted and scaled down accordingly. The dotless i and j glyph variants are used in the bases of accent objects. Accents over larger bases are given by special flattened and/or widened glyph variants.

 

Brackets, braces, parentheses, and other stretchy characters have a number of larger glyph variants, and arbitrarily large sizes can be created using glyph assemblies. When the assemblies are displayed, the pieces are clipped to prevent overlap, since overlaps create ClearType artifacts.
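Consider a simple two-ended assembly: a bottom piece, a repeated extender, and a top piece. The number of extenders needed to reach a target height follows from the overlap at each joint. With k extenders there are k + 1 joints, so the height is top + bottom + k * extender - (k + 1) * overlap. This C sketch assumes a single fixed overlap. Real assemblies can also have middle pieces and a range of allowed overlaps. The names here are hypothetical.

```c
/* Hypothetical two-ended assembly: bottom piece, repeated extender, top piece. */
typedef struct {
    long top, bottom, extender;  /* full heights of the pieces, design units */
    long overlap;                /* overlap at each joint; must be < extender */
} Assembly;

/* Height of the assembly with k extenders (k + 1 joints). */
long AssemblyHeight(const Assembly *a, long k)
{
    return a->top + a->bottom + k * a->extender - (k + 1) * a->overlap;
}

/* Smallest number of extenders whose assembly reaches targetHeight. */
long ExtendersNeeded(const Assembly *a, long targetHeight)
{
    long step = a->extender - a->overlap;   /* growth per added extender */
    long deficit = targetHeight - AssemblyHeight(a, 0);
    if (deficit <= 0)
        return 0;
    return (deficit + step - 1) / step;     /* ceiling division */
}
```

Take 600-unit end pieces, a 500-unit extender, and 100-unit overlaps. A 2500-unit target then needs 4 extenders, giving 2700 units. The excess 200 units can be absorbed by overlapping the pieces more, and the pieces are then clipped for display as described above.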

 

One choice not handled by the math font tables is how to display the double-struck italic characters U+2145..U+2149: differential D and d, exponential e, and imaginary i and j. Depending on a document setting, software can display these characters in one of three ways:

- as themselves (useful for patent applications);
- as the corresponding math italic letters;
- as the corresponding ASCII letters.

Serif italic glyphs are used for these in most math publications, but serif upright glyphs are used in some European math publications and math calculation engines. The differential d (U+2146) automatically introduces a small space between itself and the preceding character if that character is alphabetic.
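The three display choices and the spacing rule can be sketched as a small mapping function. The DsiDisplayMode setting and both function names are hypothetical. The target code points are the Unicode Mathematical Italic letters. Treating only ASCII letters as "alphabetic" is a simplification.

```c
#include <stdint.h>

/* Hypothetical document setting for U+2145..U+2149. */
typedef enum {
    DSI_AS_IS,        /* show the double-struck italic characters themselves */
    DSI_MATH_ITALIC,  /* substitute math italic letters */
    DSI_ASCII         /* substitute plain ASCII letters */
} DsiDisplayMode;

uint32_t MapDoubleStruckItalic(uint32_t ch, DsiDisplayMode mode)
{
    /* U+2145 D, U+2146 d, U+2147 e, U+2148 i, U+2149 j */
    static const char ascii[] = { 'D', 'd', 'e', 'i', 'j' };
    /* MATHEMATICAL ITALIC CAPITAL D and SMALL D, E, I, J */
    static const uint32_t mathItalic[] = { 0x1D437, 0x1D451, 0x1D452, 0x1D456, 0x1D457 };

    if (ch < 0x2145 || ch > 0x2149 || mode == DSI_AS_IS)
        return ch;
    return mode == DSI_ASCII ? (uint32_t)ascii[ch - 0x2145] : mathItalic[ch - 0x2145];
}

/* Nonzero if a small space belongs before ch: the differential d after a letter.
   Only ASCII letters are tested here, a simplification. */
int NeedsSpaceBeforeDifferential(uint32_t prev, uint32_t ch)
{
    int prevIsLetter = (prev >= 'A' && prev <= 'Z') || (prev >= 'a' && prev <= 'z');
    return ch == 0x2146 && prevIsLetter;
}
```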

 

An OpenType table or feature is identified by a 32-bit constant equal to its four-byte tag string read as a little-endian integer. For example, the "MATH" table is identified by the value 0x4854414D. In C/C++, you can use the macro

 

#define MakeTag(a, b, c, d)   (((DWORD)(d) << 24) | ((DWORD)(c) << 16) | ((DWORD)(b) << 8) | (DWORD)(a))

#define tagMATH   MakeTag('M','A','T','H')

 

to create such IDs if you don’t want to type the ASCII values of the letters directly. Note that these IDs are case sensitive. In particular, “MATH” identifies the overall math table hierarchy, and “math” identifies the math script, which is used for math glyph-variant features such as subscripts, superscripts, and dotless i's.
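A portable variant of the macro makes the values and the case sensitivity easy to check. It uses fixed-width types so that it works without <windows.h>. MAKE_TAG and the constant names are hypothetical.

```c
#include <stdint.h>

/* Portable equivalent of MakeTag: the first letter lands in the low byte. */
#define MAKE_TAG(a, b, c, d) \
    (((uint32_t)(uint8_t)(d) << 24) | ((uint32_t)(uint8_t)(c) << 16) | \
     ((uint32_t)(uint8_t)(b) << 8) | (uint32_t)(uint8_t)(a))

static const uint32_t kTagMATH       = MAKE_TAG('M', 'A', 'T', 'H'); /* table hierarchy */
static const uint32_t kTagMathScript = MAKE_TAG('m', 'a', 't', 'h'); /* math script */
static const uint32_t kTagSsty       = MAKE_TAG('s', 's', 't', 'y'); /* script-style feature */
```

Here kTagMATH equals 0x4854414D, while kTagMathScript is a different value, 0x6874616D.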

mathfont.dll functions

 

The following table describes the functions exported by mathfont.dll. All functions return an HRESULT. Some entries in the table refer to the "current font metrics". These metrics depend on three things:

- the font height (point size);
- the script level: 0 for text size, 1 for script size, and 2 for scriptscript size or deeper nesting;
- the device mode (reference or presentation).

| mathfont.dll function | Purpose | OpenType table used |
| --- | --- | --- |
| GetMathConstants | Get a pointer to the math constants | MATH |
| GetMathGlyphItalicsCorrection | Get the italic correction for a glyph at current font metrics | MATH |
| GetMathGlyphTopAccentAttachment | Get the top accent attachment displacement for a glyph at current font metrics | MATH |
| GetMathGlyphIsExtendedShape | In [left]sub/sup math objects, determine whether the adjacent base glyph is extended, i.e., stretched vertically | MATH |
| GetMathGlyphKerning | Get the kerning for a given corner and height of a glyph at current font metrics | MATH |
| GetMathGlyphVariant | Get a possibly stretched glyph variant, or set of glyphs, for a glyph of desired size at current font metrics | MATH |
| GetMathGlyphVariantItalicsCorrection | Get the italic correction for a vertically stretched glyph (or set of glyphs) at current font metrics | MATH |
| GetMathGlyphScriptShape | Get the glyph variant for script or scriptscript size (uses the "ssty" feature for the "math" script and "dflt" language) | GDEF, GSUB |
| GetMathGlyphDotlessForm | Get the dotless glyph variant for i- or j-like glyphs (uses the "dtls" feature for the "math" script and "dflt" language) | GDEF, GSUB |
| GetMathGlyphAccentFlattenedShape | Get the flattened accent glyph variant if the base height exceeds the x-height (uses the "flac" feature for the "math" script and "dflt" language) | GDEF, GSUB |
| GetMathFontTextMetrics | If the font is a math font, get the math font text ascent, descent, and line gap at current font metrics | OS/2 |
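The script level that enters the current font metrics can be illustrated with a sketch. It assumes that the level scales the font size by the ScriptPercentScaleDown and ScriptScriptPercentScaleDown constants of the published OpenType MATH table. These constants are not among those listed above. The percentages a caller supplies are illustrative, not Cambria Math's values, and the struct and function names are hypothetical. The heavier script glyphs come separately from the "ssty" variants, which this sketch does not model.

```c
/* Hypothetical copies of the MATH table's two script scale-down percentages. */
typedef struct {
    int scriptPercentScaleDown;
    int scriptScriptPercentScaleDown;
} ScriptScaleConstants;

/* Nesting depth 0 = text, 1 = script, 2 or more = scriptscript. */
int ScriptLevel(int nestingDepth)
{
    if (nestingDepth <= 0)
        return 0;
    return nestingDepth == 1 ? 1 : 2;
}

/* Font size in hundredths of a point at a given nesting depth. */
long ScaledFontSize(const ScriptScaleConstants *sc, long textSize, int nestingDepth)
{
    switch (ScriptLevel(nestingDepth)) {
    case 0:
        return textSize;
    case 1:
        return textSize * sc->scriptPercentScaleDown / 100;
    default:
        return textSize * sc->scriptScriptPercentScaleDown / 100;
    }
}
```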

Right-to-Left Math Zone Considerations

 

Right-to-left math requires mirroring the images of parentheses, integrals, square roots, arrows, etc. Many such mirror images can be obtained by using corresponding Unicode characters. For example, the mirror image of a left parenthesis is a right parenthesis and vice versa. Such glyph variants are automatically returned by the Uniscribe function ScriptShape() if SCRIPT_ANALYSIS::fRTL = TRUE. But Unicode lacks mirror-image counterparts for many characters, such as integral signs and square roots. Furthermore, it seems that using glyph variants for these characters makes more sense than adding characters to serve as the mirror images. Other approaches include using world transforms and mirrored bitmaps. But these approaches don't solve the problem that the desired right-to-left character sometimes isn't a perfect mirror image, e.g., the contour integral.

In principle (and in a prototype I'm working on), the glyph variant approach works by following the ScriptShape() call with a call to Uniscribe's ScriptSubstituteSingleGlyph(). The call specifies tagScript as "math", tagLangSys as "dflt", and tagFeature as "rtlm". Here "math" identifies the script as math, "dflt" specifies the default language, and "rtlm" requests right-to-left mirroring. If no such special mirrored glyph exists, the call does nothing. In particular, if the appropriate mirrored glyph is given by a Unicode character, the call does nothing. So the ScriptShape() call can be followed by the ScriptSubstituteSingleGlyph() call without ever causing "double mirroring".
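Why the two-step sequence never double-mirrors can be shown with a platform-neutral simulation. This is not Uniscribe code:

- MirrorChar stands in for the character mirroring applied to RTL runs; only a few Unicode mirror pairs are listed.
- SubstituteRtlm stands in for the "rtlm" lookup.
- Glyph IDs are represented by code points, and the mirrored variants get made-up IDs in a private range.

```c
#include <stddef.h>
#include <stdint.h>

/* Stage 1: character-level mirroring for RTL runs (a few Bidi_Mirrored pairs). */
static uint32_t MirrorChar(uint32_t ch)
{
    switch (ch) {
    case '(':    return ')';
    case ')':    return '(';
    case '[':    return ']';
    case ']':    return '[';
    case 0x2208: return 0x220B;   /* ELEMENT OF -> CONTAINS AS MEMBER */
    case 0x220B: return 0x2208;
    default:     return ch;
    }
}

typedef struct { uint32_t from, to; } GlyphSub;

/* Stage 2: hypothetical "rtlm" lookup, listing only characters that have
   no Unicode mirror counterpart. */
static const GlyphSub g_rtlm[] = {
    { 0x222B, 0xF0001 },  /* INTEGRAL */
    { 0x221A, 0xF0002 },  /* SQUARE ROOT */
    { 0x222E, 0xF0003 },  /* CONTOUR INTEGRAL (arrow not simply mirrored) */
};

static uint32_t SubstituteRtlm(uint32_t glyph)
{
    for (size_t i = 0; i < sizeof g_rtlm / sizeof g_rtlm[0]; i++)
        if (g_rtlm[i].from == glyph)
            return g_rtlm[i].to;
    return glyph;   /* no special mirrored glyph: leave it unchanged */
}

/* Shape one character in an RTL math zone: character mirroring, then rtlm. */
uint32_t ShapeRtlMathChar(uint32_t ch)
{
    return SubstituteRtlm(MirrorChar(ch));
}
```

The rtlm table has no entry for characters that Unicode already mirrors. So applying it after character mirroring leaves ')' alone, while the integral sign picks up its mirrored variant.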

 

If you want a complete specification of the math tables, please email me. Hopefully someday the specification will be available as part of the official OpenType standard. The mathfont.dll code was written by Sergey Malkin.


MathML3 and the MathML for CSS profile are Candidate Recommendations

W3C Math - Tuesday, 2009/12/15 - 00:00
The W3C Director today advanced MathML3 from Working Draft to Candidate Recommendation (CR). This means that W3C is now asking people not only to send comments on the text, but also to implement the specification and send feedback on any problems found in actual use. The MathML for CSS profile describes a subset of MathML3 that can be rendered with existing CSS renderers. It was advanced to CR at the same time. The next step for both specifications is Proposed Recommendation (PR), once there are sufficiently many implementations. The working group expects to start testing implementations around March 2010. Feedback can be sent to the mailing list.

High Fonts and Math Fonts

Murray Sargent: Math in Office - Wednesday, 2009/12/02 - 01:45

Math fonts differ from other fonts in two ways. They have extensive coverage of math operators, symbols, and math alphanumerics. They also have a large number of glyph variants, which support two sizes of sub/superscripts and many sizes of stretchable characters like parentheses and brackets. Some of these glyphs can be many times as high as the normal line height. A font that contains glyphs larger than the line height is called a "high font". Fancy fonts like Gabriola have elegant swash variants that look great on wedding invitations and the like. Some of these glyph variants have ascents and descents much bigger than the line height, so those fonts are also high fonts.

 

High fonts must be handled in a special way, at least on Windows. The reason dates back to the early 1990s, when TrueType fonts were introduced in Windows 3.1. At that time, fonts didn't have glyphs larger than the line height, so two heights were treated as one: the height of the bitmap used to rasterize the glyphs, and the line height itself. The corresponding height information is contained in the TEXTMETRIC structure. The line height is given nominally by the sum TEXTMETRIC::tmAscent + TEXTMETRIC::tmDescent. Here tmAscent is the height above the text baseline and tmDescent is the (positive) depth below the baseline. One could also add leading, but the sum of the ascent and descent sufficed to determine the height of the rasterization bitmap.

 

Now consider a high font with its large glyphs. Clearly the bitmap has to have tmAscent and tmDescent large enough for the largest glyph. Accordingly, tmAscent and tmDescent can be considerably larger than the ascent and descent of ordinary characters such as A–Z and a–z. When these quantities are used to display a high font, the ordinary characters look very small inside a much taller line. You can see this by selecting Cambria Math or Gabriola in Notepad. At least up through Windows 7, the system edit control, which Notepad uses, doesn't know how to render high fonts correctly. But Word, RichEdit, PowerPoint, and other more sophisticated text programs do. They use the ascent and descent values given by the otmAscent and otmDescent members of the OUTLINETEXTMETRIC structure.

 

Suppose you display text with ExtTextOut() using baseline alignment (call SetTextAlign() with TA_BASELINE). Then you only need the OUTLINETEXTMETRIC values. But if you use the default text alignment, TA_TOP, you need to subtract tmAscent from the baseline y coordinate to get the "top" that GDI expects. This is a bit strange, since it may result in a negative y coordinate, but it works fine. RichEdit uses TA_TOP because it used to be faster than TA_BASELINE, although it may no longer be. Word uses TA_BASELINE.
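The coordinate arithmetic can be sketched as follows. The sketch assumes that lines are stacked using the typographic ascent and descent, both stored here as positive values. It also assumes that GDI's "top" lies tmAscent above the baseline. The LineMetrics struct and the function names are hypothetical.

```c
/* Hypothetical metrics record, all values positive, in pixels. */
typedef struct {
    long typoAscent;   /* typographic ascent, e.g. from otmAscent */
    long typoDescent;  /* magnitude of the typographic descent */
    long tmAscent;     /* TEXTMETRIC::tmAscent, large for a high font */
} LineMetrics;

/* Baseline of line number `line`, stacking lines by ascent + descent. */
long BaselineY(const LineMetrics *m, long top, int line)
{
    long lineHeight = m->typoAscent + m->typoDescent;
    return top + line * lineHeight + m->typoAscent;
}

/* y to pass to ExtTextOut: the baseline itself for TA_BASELINE, or
   tmAscent above the baseline for TA_TOP (possibly negative). */
long TextOutY(const LineMetrics *m, long baselineY, int useTaBaseline)
{
    return useTaBaseline ? baselineY : baselineY - m->tmAscent;
}
```

Take a typographic ascent of 16, a descent of 4, and a tmAscent of 60 pixels. The first baseline then sits at y = 16, and the TA_TOP coordinate is 16 - 60 = -44, the negative value mentioned above.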

 

You may be wondering how to find out if a font is a high font. Officially a font is a high font if bit 7 (0x80) of OUTLINETEXTMETRIC::otmfsSelection is nonzero. This bit is named USE_TYPO_METRICS. Since fonts might not set it when they should, you might also want to conclude that a font is a high font if (unsigned)tm.tmAscent > 2*(otm.otmAscent + otm.otmLineGap).
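Both tests can be combined into one predicate. FontMetricsSnapshot is a hypothetical struct that holds copies of the TEXTMETRIC and OUTLINETEXTMETRIC fields named above. The comparison uses signed arithmetic rather than the (unsigned) cast in the text. For ordinary positive metrics, both give the same answer.

```c
#include <stdbool.h>

#define FS_USE_TYPO_METRICS 0x80   /* bit 7 of fsSelection */

/* Hypothetical copies of the fields used above. */
typedef struct {
    long tmAscent;             /* TEXTMETRIC::tmAscent */
    long otmAscent;            /* OUTLINETEXTMETRIC::otmAscent */
    long otmLineGap;           /* OUTLINETEXTMETRIC::otmLineGap */
    unsigned otmfsSelection;   /* OUTLINETEXTMETRIC::otmfsSelection */
} FontMetricsSnapshot;

bool IsHighFont(const FontMetricsSnapshot *f)
{
    if (f->otmfsSelection & FS_USE_TYPO_METRICS)
        return true;   /* the official signal */
    /* Heuristic for fonts that should set the bit but don't. */
    return f->tmAscent > 2 * (f->otmAscent + f->otmLineGap);
}
```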

 

As a last note, you may wonder how to tell if a font is a math font, since not all high fonts are math fonts. One way is to find out if the font has an OpenType “MATH” table. To do this, call GetFontData(hdc, tagMATH, 0, NULL, 0), where tagMATH is defined by

 

#define MakeTag(a, b, c, d)        (((DWORD)(d) << 24) | ((DWORD)(c) << 16) | ((DWORD)(b) << 8) | (DWORD)(a))

#define tagMATH            MakeTag('M','A','T','H')

 

If the return value is neither GDI_ERROR nor 0, it is the count of bytes in the "MATH" table.
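You may have the raw bytes of a font file rather than a device context. In that case, the same check can be made by scanning the sfnt table directory. This is a swapped-in technique, not what the post describes. It assumes a single font (not a .ttc collection) and the standard directory layout: a 12-byte header with a big-endian table count at offset 4, followed by 16-byte records holding the tag, checksum, offset, and length. Tags are compared as four ASCII bytes, so "math" does not match.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t ReadBE32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3];
}

/* Returns the length of the "MATH" table, or 0 if it is absent or the data
   is too short to hold the directory. */
uint32_t MathTableLength(const unsigned char *font, size_t size)
{
    if (size < 12)
        return 0;
    unsigned numTables = ((unsigned)font[4] << 8) | font[5];
    if (size < 12 + (size_t)numTables * 16)
        return 0;
    for (unsigned i = 0; i < numTables; i++) {
        const unsigned char *rec = font + 12 + (size_t)i * 16;
        if (memcmp(rec, "MATH", 4) == 0)   /* case-sensitive tag match */
            return ReadBE32(rec + 12);     /* length field of the record */
    }
    return 0;
}
```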

 

There’s a lot more to a math font than what’s given in this post. An earlier post on High Quality Editing and Display of Mathematical Text in Office 2007 has a section on the math font, although it doesn’t contain the material in the present post. A full specification is available for people interested in developing math fonts.

 

Thanks to Koby Kahane and Adam Twardoch for helpful comments improving this post.


Directionality in Math Zones

Murray Sargent: Math in Office - Friday, 2009/11/13 - 02:07

In most places, mathematical text is written "left to right" (LTR). For example, in the expression x + y, the plus is displayed to the right of the x, and the y is displayed to the right of the plus. But in some Arabic locales, mathematical text is written right to left (RTL). Instead of E = mc², one would see ²cm = E, although the letters would be Arabic, not Latin.

In such RTL locales, square roots are mirrored, so that the surd symbol √ is flipped about the vertical axis. Similarly, integral signs are mirrored. The circular arrows in contour integrals are not mirrored, however, since they pertain to the 2D complex plane, not the 2D text plane.

The Presentation MathML 3.0 specification provides for RTL math zones. In fact, it allows a dir="ltr" or dir="rtl" attribute on the top-level <math> element as well as on <mrow>, <mstyle>, and token elements like <mi>. Except in rare cases, only the <math> direction needs to be specified, since all the elements inside have the same directionality (see Section 3.1.5 of the MathML 3.0 specification). The specification has now been through Last Call, so we need implementations of the new features. Accordingly, I'm interested in implementing at least part of the RTL functionality, namely RTL math zones.

First, consider what an LTR math zone is. This is what Word 2007 and the Office 2010 applications implement. It does display text RTL wherever Arabic or standard Hebrew characters appear adjacent to one another. But all operators and other "neutral" characters are treated as strong LTR; that is, each is displayed to the right of the character that precedes it. This can be quite different from a display that obeys the Unicode Bidirectional Algorithm.

Digits show the difference:

- A sequence of digits is always displayed LTR internally, inside or outside math zones, just as in the Unicode bidi algorithm.
- Inside LTR math zones, a sequence of digits is displayed to the right of the character that precedes it, even if that character is Arabic.
- By contrast, under the Unicode bidi algorithm, a number following an Arabic character is displayed to the left of the Arabic character in both LTR and RTL paragraphs.

Inside embedded normal text in a math zone, the usual rules for bidi text are followed. Except for such text, the math-zone bidi rules are much simpler than those of the Unicode bidi algorithm, which gets quite tricky in complicated scenarios.

Perhaps you noticed the term “standard Hebrew characters” above. By this I mean all Hebrew characters except the four Hebrew letter-like math symbols ALEF SYMBOL, BET SYMBOL, GIMEL SYMBOL, and DALET SYMBOL (U+2135..U+2138). These symbols are strong LTR characters, unlike their HEBREW LETTER counterparts located in the Unicode Hebrew block (U+0590..U+05FF).

Analogously in an RTL math zone and in the absence of directional overrides, operators and other neutrals are treated as strong RTL characters. A sequence of digits is still displayed LTR, but it appears on the left of the character that precedes it even if that character is Latin. Sequences of Arabic and standard Hebrew letters are RTL as usual. At least that’s how I think a typical RTL math zone should be displayed.
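The simplified rules above can be sketched as a reordering function over code points. It is a rough model under stated assumptions, not Office's implementation:

- Only the Hebrew (U+0590..U+05FF) and Arabic (U+0600..U+06FF) blocks count as RTL letters.
- Embedded normal text and directional overrides are ignored.
- No glyph mirroring is performed.
- Latin letters are treated as LTR runs inside an RTL zone, an assumption the text does not spell out.

```c
#include <stddef.h>
#include <stdint.h>

enum { RUN_RTL, RUN_LTR, RUN_DIGITS };

static int IsDigitChar(uint32_t ch)
{
    return (ch >= '0' && ch <= '9') || (ch >= 0x0660 && ch <= 0x0669) ||
           (ch >= 0x06F0 && ch <= 0x06F9);
}

/* Arabic and "standard" Hebrew letters. U+2135..U+2138 lie outside these
   blocks, so the Hebrew letter-like math symbols stay strong LTR. */
static int IsRtlLetter(uint32_t ch)
{
    if (IsDigitChar(ch))
        return 0;
    return (ch >= 0x0590 && ch <= 0x05FF) || (ch >= 0x0600 && ch <= 0x06FF);
}

static int RunKey(uint32_t ch, int rtlZone)
{
    if (IsRtlLetter(ch))
        return RUN_RTL;
    if (!rtlZone)
        return RUN_LTR;      /* LTR zone: digits, operators, neutrals are strong LTR */
    if (IsDigitChar(ch))
        return RUN_DIGITS;   /* digits stay LTR but form their own runs */
    if ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z') || (ch >= 0x2135 && ch <= 0x2138))
        return RUN_LTR;      /* assumption: Latin letters and U+2135..U+2138 stay LTR */
    return RUN_RTL;          /* RTL zone: operators and neutrals are strong RTL */
}

/* Writes the left-to-right display order of in[0..n) into out[0..n). */
void MathZoneVisualOrder(const uint32_t *in, size_t n, int rtlZone, uint32_t *out)
{
    size_t s = 0;
    while (s < n) {
        int key = RunKey(in[s], rtlZone);
        size_t e = s + 1;
        while (e < n && RunKey(in[e], rtlZone) == key)
            e++;
        for (size_t i = s; i < e; i++) {
            size_t pos;
            if (rtlZone)   /* run order reversed; only RTL runs reversed inside */
                pos = key == RUN_RTL ? n - 1 - i : n - e + (i - s);
            else           /* run order kept; RTL runs reversed inside */
                pos = key == RUN_RTL ? s + e - 1 - i : i;
            out[pos] = in[i];
        }
        s = e;
    }
}
```

In an RTL zone, the logical sequence alef = beh + 12 comes out as 12 + beh = alef when read left to right, with the digit run kept in LTR order.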

This description of math-zone directionality is somewhat simplified compared to the generality encountered in the real world. To see some of the special cases that can happen, please read the papers by Azzeddine Lazrek:

http://www.ucam.ac.ma/fssm/rydarab/doc/unicode/amassf.doc

http://www.ucam.ac.ma/fssm/rydarab/doc/unicode/amasl.pdf

http://www.ucam.ac.ma/fssm/rydarab/doc/unicode/amdsl.pdf

http://www.ucam.ac.ma/fssm/rydarab/doc/unicode/others.pdf

http://www.ucam.ac.ma/fssm/rydarab/doc/communic/unicodem.pdf

http://www.w3.org/TR/arabic-math/

http://www.ucam.ac.ma/fssm/rydarab/

 

The following overviews are excellent sources on RTL math:

http://en.wikipedia.org/wiki/Modern_Arabic_mathematical_notation

http://www.ima.umn.edu/2006-2007/SW12.8-9.06/activities/Lazrek-Azzeddine/MathArabIMAe.pdf




Neil Frazer

xbeta - Saturday, 2009/11/07 - 17:36
Updated by Neil Frazer on 2009-11-07 at 15:36:11Z.

Equation Numbering Prototype

Murray Sargent: Math in Office - Thursday, 2009/10/15 - 06:33

When writing the Math in Office 2010 post back in July, I could just imagine the disappointment various people would feel on discovering no mention of equation numbering. Until math came to PowerPoint, equation numbering had been the second most requested feature. Since PowerPoint 2010 now has the math facility, equation numbering has risen to the top of the wish list. Note that there is a way to display and manage equation numbering in Word, as described in the work of Dong Yu. To get a feel for a native numbering facility, I implemented a prototype in RichEdit. It interfaces to the equation numbering facility of the underlying Page/Table Services (PTS) math handler. This post describes that approach. It may not be the one we ship someday, but it does work pretty well. First I describe how equation numbers are represented in the file format and in memory, and then I consider equation number management.

 

The file format is one area a new feature has to consider. It would be nice not to change the file format for equation numbers, since changing it makes backward compatibility that much harder to deal with. For example, it would be very convenient if equation numbers from a future version of Word displayed reasonably well in Word 2010 or Word 2007, admittedly with small modifications to those versions. So I examined the nooks and crannies of the Office math format, OMML, for a math object that could be used to contain an equation number.

 

Lurking in that object space is a "no-op" phantom. To understand what this object is and why no one is likely to have used it, we need to know what a phantom object is. The phantom is characterized by five Boolean flags: 1) zero ascent, 2) zero descent, 3) zero width, 4) show, and 5) transparent. You can read about it in the post on MathML and Ecma Math (OMML) and in Sec. 3.17 of the linear format paper.

The "zero" flags are handy for suppressing any combination of the three object dimensions. Donald Knuth calls such suppression "smashing". You can "smash" the descent of a character with a descender like y so that it has the same size as one without a descender like x. To facilitate entering smashes, we have the control words \asmash, \dsmash, and \hsmash. They smash the ascent, descent, and width, respectively, of the phantom argument. Smashes have the show flag on, since the idea is to show the argument but give it one or more zero dimensions. If the show flag is off, you have a true phantom: the argument takes up space, but nothing is displayed.

The fifth flag, the transparent flag, means that the surrounding environment treats the argument as if no object were there. So an equals sign inside a transparent phantom gets the math spacing appropriate for an equals sign outside a math object, apart from changes imposed by smashing various dimensions.

 

A no-op phantom is one that displays its argument as if the phantom weren’t even there: the argument has its true dimensions and it is transparent. Since the no-op phantom doesn’t serve any purpose, we might as well give it one: house an equation number. That works for the file format. Now let’s see how it works for actual display in a document.
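The five flags and the no-op test can be sketched as bit flags. The enum values, Dims, and the function names are hypothetical. The control-word combinations follow the description above: show on plus one zeroed dimension.

```c
#include <stdbool.h>

/* The five phantom flags, as hypothetical bit values. */
enum {
    PHANT_ZERO_ASCENT  = 0x01,
    PHANT_ZERO_DESCENT = 0x02,
    PHANT_ZERO_WIDTH   = 0x04,
    PHANT_SHOW         = 0x08,
    PHANT_TRANSPARENT  = 0x10
};

/* Flags produced by the smash control words: show on, one dimension zeroed. */
enum {
    ASMASH_FLAGS = PHANT_SHOW | PHANT_ZERO_ASCENT,
    DSMASH_FLAGS = PHANT_SHOW | PHANT_ZERO_DESCENT,
    HSMASH_FLAGS = PHANT_SHOW | PHANT_ZERO_WIDTH
};

typedef struct { long ascent, descent, width; } Dims;

/* Dimensions the phantom contributes to layout: each "zero" flag smashes one. */
Dims PhantomDims(unsigned flags, Dims arg)
{
    if (flags & PHANT_ZERO_ASCENT)
        arg.ascent = 0;
    if (flags & PHANT_ZERO_DESCENT)
        arg.descent = 0;
    if (flags & PHANT_ZERO_WIDTH)
        arg.width = 0;
    return arg;
}

/* A no-op phantom shows its argument at its true size and is transparent. */
bool IsNoOpPhantom(unsigned flags)
{
    return flags == (PHANT_SHOW | PHANT_TRANSPARENT);
}
```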

 

The PTS math handler, which ships with Office 2007 and Office 2010, has a number of callbacks for equation numbers. We just need to answer their queries appropriately and presto, the equation number is displayed accordingly. Sophisticated choices need some document properties, such as placing the numbers on the left-hand side of equations instead of the right-hand side. Ignoring such generality for the moment, we tell the PTS math handler to display the equation number flush right on the last line of an equation. An equation may take up several lines, and the PTS math handler can center the equation number vertically if desired. In addition, we place the no-op phantom as if it were an equation itself, directly following the equation to be numbered. As such, the no-op phantom is the only thing inside a soft paragraph, separated from its equation by a Shift+Enter (ASCII VT character). The PTS callbacks needed to be generalized slightly to deal with this special kind of "equation", but that was straightforward.

 

The last thing to consider is how to manage equation numbers, namely how to insert, delete, and edit them. Section 3.21 of the linear format paper explains how to type in an equation number: enter the equation followed by a # (U+0023), then the desired equation number text, and then type [Shift]+Enter. The numbers can be edited in place. But underneath, one needs renumbering and synchronization with inline equation number references.

Such management could be accessed via a context menu or the math-zone acetate drop-down menu. Insert and Delete would work with a single click. Edit would open a dialog with options such as whether to include chapter and section numbers, and whether the resulting entries should be separated by periods or hyphens. Word would likely treat the equation numbers as bookmarks so that links to them would automatically show the current number. Alternatively, the insert/delete/edit command handler could update all inline no-op phantoms to agree with the corresponding no-op phantoms inside math paragraphs.
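Two pieces of that workflow can be sketched in C:

- splitting typed linear-format input at the equation-number marker;
- formatting a number with the chapter/section options the dialog might offer.

Both are hypothetical helpers. In particular, splitting at the last '#' is an assumption, not a documented rule of the linear format.

```c
#include <stdio.h>
#include <string.h>

/* Splits "equation#number" at the last '#'. Returns 1 and fills the outputs,
   or 0 when there is no '#'. */
int SplitEquationNumber(const char *linear, size_t *pEqLen, const char **pNumber)
{
    const char *hash = strrchr(linear, '#');
    if (!hash)
        return 0;
    *pEqLen = (size_t)(hash - linear);
    *pNumber = hash + 1;
    return 1;
}

/* Hypothetical numbering options like those the dialog might offer. */
typedef struct {
    int includeChapter;
    int includeSection;
    char separator;   /* '.' or '-' */
} EqNumberStyle;

/* Formats e.g. "(2.3.7)" into buf; returns the snprintf result. */
int FormatEquationNumber(char *buf, size_t size, const EqNumberStyle *st,
                         int chapter, int section, int n)
{
    char sep = st->separator;
    if (st->includeChapter && st->includeSection)
        return snprintf(buf, size, "(%d%c%d%c%d)", chapter, sep, section, sep, n);
    if (st->includeChapter)
        return snprintf(buf, size, "(%d%c%d)", chapter, sep, n);
    if (st->includeSection)
        return snprintf(buf, size, "(%d%c%d)", section, sep, n);
    return snprintf(buf, size, "(%d)", n);
}
```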

 

MathML doesn't have the concept of a math paragraph, but when MathML is embedded in a parent format, the math paragraph can be emulated fairly easily. Basically, a math paragraph is just one or more math zones separated by soft paragraph breaks. So the equation number for a displayed MathML math zone (<math>…</math>) would follow the math zone.

MathML has an <mphantom>, which suppresses the display of its argument but keeps its size. Since we want to show equation numbers, <mphantom> isn't useful for housing them. <mpadded> can express the other attributes of the OMML <phantom> except transparency. If it just displays its argument with no changes, it could be used to house an equation number. In addition, it's legal to add any attributes one wants to a MathML element, provided they are in a private namespace. So one can be quite precise as far as a given family of implementations goes. To be interoperable, MathML would have to add its own way of representing equation numbers, but it hasn't done so in MathML 2.0 or 3.0.

 
