News


Paragraphs and Paragraph Formatting

Murray Sargent: Math in Office - Sat, 2008/11/22 - 05:25

What paragraphs are and how they are formatted are questions that continually come up both inside and outside of Microsoft. So this post describes Word/RichEdit paragraphs in general. A subsequent post will describe the “math paragraph”, which is part of a regular paragraph and is used for displayed equations, as distinguished from inline mathematical expressions.

The paragraph is a very important structure in written language. About six years ago, I developed the RichEdit binary format, which shipped with RichEdit 5.0 (Office 2003) as RichEdit’s preferred copy/paste format and was used by OneNote 2003 and 2007. In the design stage I talked with Eliyezer Kohen, of TrueType, OpenType, LineServices, and Page/Table Services fame. I was inclined to have four parallel streams: plain text, character formatting, paragraph formatting, and embedded objects, a format corresponding to the internal RichEdit representation. Eliyezer agreed such parallel streams were important, but insisted that they should be broken up into paragraphs. At the time, this seemed like extra overhead to me, and naturally I didn’t want to slow things down. But I followed his advice, and it was right on! First, what’s a paragraph? Then what’s paragraph formatting? Then what’s a “soft” paragraph? And finally, what’s the final EOP?

What’s a paragraph?

From a natural language point of view, a paragraph is one or (preferably) more closely related sentences that naturally belong together without becoming too long. From the Word/RichEdit point of view, a paragraph is a string of text in any combination of scripts and inline objects, including possible “soft” line breaks and “math paragraphs”, with uniform “paragraph formatting” up to and including a carriage return. The carriage return (CR) is the Unicode character U+000D, which you insert by typing the Enter key. In plain text on PCs, a paragraph is usually terminated by a CRLF (U+000D U+000A) combination, but not ordinarily inside a Word document or RichEdit instance, where just the CR is used.
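To make the “up to and including a carriage return” rule concrete, here is a minimal Python sketch, assuming the text is already in the single-CR form Word and RichEdit use internally. The function name split_paragraphs is hypothetical, not part of any Word or RichEdit API; it just shows that each CR belongs to the paragraph it ends.

```python
CR = "\r"  # U+000D, the paragraph terminator used inside Word/RichEdit


def split_paragraphs(text: str) -> list[str]:
    """Split CR-terminated text into paragraphs, keeping each CR with its paragraph."""
    paragraphs = []
    start = 0
    while True:
        end = text.find(CR, start)
        if end == -1:
            break
        paragraphs.append(text[start:end + 1])  # include the terminating CR
        start = end + 1
    if start < len(text):
        paragraphs.append(text[start:])  # trailing text that has no CR yet
    return paragraphs


print(split_paragraphs("First paragraph.\rSecond one.\r"))
```

Keeping the CR inside its paragraph, rather than treating it as a separator between paragraphs, is what gives the paragraph formatting a character to attach to.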

<rant> It’s quite convenient to use a single character. It takes up less space than the CRLF and it’s easier to parse/manipulate, since it’s an atomic entity. In fact, Unix already used a single character, the line feed (LF, U+000A), back in 1972, several years before the PC operating systems were developed. Unfortunately, the PC with its DEC heritage preferred CRLF, a holdover from the old teletype days, and Word and the Mac shortened it to CR instead of LF. Windows Notepad still isn’t able to display Unix/Linux LF-terminated paragraphs correctly after all these years (note that 2008 > 1972). I’m on a mission to fix that, but please don’t hold your breath! Anyhow, I like CR better than LF, mostly out of habit. Clearly it would have been better to have a single standard. In this connection, it’s interesting to note that Word and RichEdit can handle CR-, LF-, and CRLF-terminated paragraphs, even though they prefer CR. </rant>
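Handling all three conventions mostly comes down to normalizing on the way in. The following Python sketch, assuming you want the single-CR form internally and CRLF for PC plain text on the way out, shows one way to do it; normalize_eops and to_crlf are hypothetical helper names, not how Word or RichEdit actually implement this.

```python
def normalize_eops(text: str) -> str:
    """Convert CRLF- or LF-terminated paragraphs to the single-CR convention."""
    # Replace CRLF first so its CR and LF are not turned into two separate CRs.
    return text.replace("\r\n", "\r").replace("\n", "\r")


def to_crlf(text: str) -> str:
    """Convert paragraph ends to CRLF, as used in plain text on PCs."""
    return normalize_eops(text).replace("\r", "\r\n")


print(repr(normalize_eops("Unix\nPC\r\nMac\r")))
```

The order of the replacements is the one design point that matters: converting LF before CRLF would turn every CRLF into an empty extra paragraph.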

What’s paragraph formatting?

A key characteristic of a paragraph is its formatting, which is represented by a pretty large set of properties. Most of these properties are settable using a paragraph formatting dialog. In particular, there’s alignment (left, right, center, justify, along with a variety of East Asian options), space before/after, line spacing (single, double, multiple, at least, exactly), left/right margins and wrapped line indent, line/page breaks, tabs (oh, how I wish HTML had tab support!), and bullets/numbering. Internally, paragraphs and their formatting get overloaded with such entities as tables and drop caps, but let’s not get distracted. Using hot keys like Ctrl+E for centering or the paragraph formatting dialog, you can set the formatting for the paragraph(s) in which the current selection occurs. If you just have an insertion point (the blinking caret), only the paragraph containing the insertion point gets the new formatting.
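Which paragraphs does a formatting command touch? Here is a small Python sketch of the rule described above, assuming single-CR paragraphs, character positions counted from 0, and a selection [start, end) that formats every paragraph containing at least one selected character. The function name is hypothetical, and Word and RichEdit may treat edge cases (such as a selection ending right after a CR) differently.

```python
def paragraphs_for_selection(text: str, start: int, end: int) -> range:
    """Return indices of the CR-terminated paragraphs affected by [start, end).

    An insertion point (start == end) affects only the paragraph containing it;
    otherwise every paragraph containing at least one selected character is affected.
    """
    if end < start:
        start, end = end, start
    last = end - 1 if end > start else start  # last position that counts
    first_para = text.count("\r", 0, start)   # CRs before start = paragraph index
    last_para = text.count("\r", 0, last)
    return range(first_para, last_para + 1)


story = "One\rTwo\rThree\r"
print(list(paragraphs_for_selection(story, 5, 5)))  # caret inside "Two"
print(list(paragraphs_for_selection(story, 2, 5)))  # spans "One" and "Two"
```

Counting the CRs that precede a position is enough to find its paragraph, because each CR belongs to the paragraph it terminates.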

What’s a soft paragraph?

When you create a numbered list, you may want to have an entry with one or more line breaks but no new number or bullet. To insert a line break without ending the paragraph, type Shift+Enter, which inserts a Vertical Tab (VT, U+000B). Even though you get a line break, you don’t end the current paragraph, so no new number appears. All the paragraph properties remain the same for the new line, and the space-before property doesn’t apply to it, since the line is inside the paragraph. Sometimes it’s handy to refer to a sequence of lines terminated by such a line break as a “soft paragraph”. In HTML, these “soft line breaks” are represented by the <BR> tag, whereas “hard” paragraphs are identified by the <P> tag.
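The CR/VT distinction maps directly onto HTML’s <P>/<BR>. This is a minimal Python sketch of that mapping, assuming single-CR paragraphs and ignoring all character and paragraph formatting. The function to_html is hypothetical and is not how Word or RichEdit export HTML.

```python
import html

CR, VT = "\r", "\v"  # hard paragraph end (U+000D) and soft line break (U+000B)


def to_html(text: str) -> str:
    """Render CR-terminated paragraphs as <p> elements and VT breaks as <br>."""
    paras = text.split(CR)
    if paras and paras[-1] == "":
        paras.pop()  # a trailing CR ends the last paragraph; it doesn't start a new one
    parts = []
    for para in paras:
        lines = [html.escape(line) for line in para.split(VT)]
        parts.append("<p>" + "<br>".join(lines) + "</p>")
    return "".join(parts)


print(to_html("Item one\vstill item one\rItem two\r"))
```

HTML tag names are case-insensitive, so the lowercase <p> and <br> here are equivalent to <P> and <BR>.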

Speaking of numbered lists, you might want to change the character formatting of the number or bullet out in front. For example, you might want to use a larger font size or a different font. To do this, change the appropriate character formatting of the CR that ends the paragraph.
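One way to picture this rule is with a toy Python model in which every character carries its own character formatting. The CharFormat and Paragraph classes below are hypothetical illustrations, not Word’s or RichEdit’s internal structures. The point is only that the number or bullet takes its look from the terminating CR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharFormat:
    font: str
    size_pt: float


@dataclass
class Paragraph:
    # Each character paired with its character formatting; the last one is the CR.
    chars: list[tuple[str, CharFormat]]

    def number_format(self) -> CharFormat:
        """Formatting used to draw the paragraph's number or bullet."""
        ch, fmt = self.chars[-1]
        if ch != "\r":
            raise ValueError("paragraph must end with its CR")
        return fmt


body = CharFormat("Calibri", 11)
big = CharFormat("Cambria", 16)
item = Paragraph([(c, body) for c in "First item"] + [("\r", big)])
print(item.number_format())  # the number is drawn in 16-pt Cambria; the text stays 11-pt
```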

Final EOP

To provide a place to attach paragraph formatting for the last paragraph, every Word document and every RichEdit rich-text instance has a “final EOP” (end of paragraph), represented by a CR (CRLF in RichEdit 1.0). You cannot delete the final EOP, nor can you move the insertion point past it. In the Word and RichEdit object models, ranges can select up through the final EOP, but they cannot collapse to an insertion point that follows the final EOP. The farthest they can go is up to just before the final EOP. Similarly, messages like EM_EXSETSEL cannot make the RichEdit selection go beyond the final EOP.
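The final-EOP rules can be summarized as a clamping function on character positions. This Python sketch assumes a rich-text story of story_len characters (always at least 1, the final EOP itself, at position story_len - 1). The name clamp_selection is hypothetical; it restates the rule described above and is not RichEdit’s code, and it ignores which end of the selection is active.

```python
def clamp_selection(story_len: int, start: int, end: int) -> tuple[int, int]:
    """Clamp a selection so it respects the final EOP of a rich-text story.

    A nondegenerate selection may extend through the final EOP
    (end == story_len), but an insertion point can go no further than
    just before it (position story_len - 1). Assumes story_len >= 1.
    """
    if start > end:
        start, end = end, start
    start = max(0, min(start, story_len))
    end = max(0, min(end, story_len))
    if start == end:  # collapsed: an insertion point
        start = end = min(start, story_len - 1)
    return start, end


print(clamp_selection(10, 3, 50))  # selection runs up through the final EOP
print(clamp_selection(10, 12, 12))  # caret stops just before the final EOP
```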

RichEdit also supports plain-text controls, which are characterized by uniform paragraph formatting and don’t need, or have, a final EOP. An empty plain-text control is really empty, whereas a rich-text control always has at least one character, the final EOP.

Categories: Technology

New media literacies

Otro blog mas - Fri, 2008/11/21 - 18:20

“Alfabetización en nuevos medios” (new media literacies)? I don’t like the Spanish translation, but it will have to do for now. newmedialiteracies.org is a project bearing the stamp of MIT’s Comparative Media Studies program, under the leadership of Henry Jenkins (though not for much longer), and it goes well beyond digital literacy… Very interesting, and it tries to answer many questions that are opening up far too quickly.

Via.

Categories: Planet JEM

Europeana

eLearning Europa - Fri, 2008/11/21 - 02:00
Categories: eLearning

2+2 math for kids

eLearning Europa - Fri, 2008/11/21 - 02:00
Categories: eLearning

Hulda Festival: Journey into Art & Science

eLearning Europa - Thu, 2008/11/20 - 02:00
The basic concept is to reveal the overlap between art and science and use it as an additional means of informal science education, enabling a wider public to reflect on and debate science and technology. It is expected to reach 150,000 people directly and millions through the media. Hulda will visit 10 European ports on her first trip, from Stockholm to Istanbul. Extracted from Seventh Framework...
Categories: eLearning

Creative design for innovation in science and technology

eLearning Europa - Thu, 2008/11/20 - 02:00
This will lead to the elaboration of: theories and models of creative processes in general, and of those involved in creative problem solving in particular; and methods, techniques and systems to support both creative design processes and creativity training. These 3-year positions are funded through the European Commission Framework 7 and are offered with excellent financial conditions including...
Categories: eLearning

Usability of music for social inclusion of children

eLearning Europa - Thu, 2008/11/20 - 02:00
The multidisciplinary and transnational S&T UMSIC project develops a system that opens interactive environments for children to communicate informally with their peers by using familiar modern technologies. With a special focus on child-centred usability, intelligent musical engineering and carefully developed pedagogical design that is allied to structured learning material, UMSIC allows...
Categories: eLearning

The Google Generation in Higher Education: Entrant Students and Learning Technologies

eLearning Europa - Wed, 2008/11/19 - 02:00
Introduction: The first year at university is a time of significant flux for many learners as they adjust to unfamiliar academic and social environments. They encounter new approaches to teaching and must start to develop fresh learning strategies as they take the first steps towards becoming autonomous learners. This sense of uncertainty may be compounded by the requirement to interact with...
Categories: eLearning

Meeting on Podcasts (Encontro sobre Podcasts)

eLearning Europa - Wed, 2008/11/19 - 02:00
Categories: eLearning