(*^
::[ Information =
"This is a Mathematica Notebook file. It contains ASCII text, and can be
transferred by email, ftp, or other text-file transfer utility. It should
be read or edited using a copy of Mathematica or MathReader. If you
received this as email, use your mail application or copy/paste to save
everything from the line containing (*^ down to the line containing ^*)
into a plain text file. On some systems you may have to give the file a
name ending with ".ma" to allow Mathematica to recognize it as a Notebook.
The line below identifies what version of Mathematica created this file,
but it can be opened using any other version as well.";
FrontEndVersion = "Macintosh Mathematica Notebook Front End Version 2.2";
MacintoshStandardFontEncoding;
fontset = title, inactive, noPageBreakBelow, nohscroll, preserveAspect,
groupLikeTitle, center, M7, bold, e8, 24, "Times";
fontset = subtitle, inactive, noPageBreakBelow, nohscroll, preserveAspect,
groupLikeTitle, center, M7, bold, e6, 18, "Times";
fontset = subsubtitle, inactive, noPageBreakBelow, nohscroll,
preserveAspect, groupLikeTitle, center, M7, italic, e6, 14, "Times";
fontset = section, inactive, noPageBreakBelow, nohscroll, preserveAspect,
groupLikeSection, grayBox, M22, bold, a20, 18, "Times";
fontset = subsection, inactive, noPageBreakBelow, nohscroll,
preserveAspect, groupLikeSection, blackBox, M19, bold, a15, 14, "Times";
fontset = subsubsection, inactive, noPageBreakBelow, nohscroll,
preserveAspect, groupLikeSection, whiteBox, M18, bold, a12, 12, "Times";
fontset = text, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M7,
12, "Times";
fontset = smalltext, inactive, nohscroll, noKeepOnOnePage, preserveAspect,
M7, 10, "Times";
fontset = input, noPageBreakInGroup, preserveAspect, groupLikeInput, M42,
N23, bold, B65535, L-5, 12, "Courier";
fontset = output, output, inactive, noPageBreakInGroup, preserveAspect,
groupLikeOutput, M42, N23, L-5, 12, "Courier";
fontset = message, inactive, noPageBreakInGroup, preserveAspect,
groupLikeOutput, M42, N23, R65535, L-5, 12, "Courier";
fontset = print, inactive, noPageBreakInGroup, preserveAspect,
groupLikeOutput, M42, N23, L-5, 12, "Courier";
fontset = info, inactive, noPageBreakInGroup, preserveAspect,
groupLikeOutput, M42, N23, B65535, L-5, 12, "Courier";
fontset = postscript, PostScript, formatAsPostScript, output, inactive,
noPageBreakInGroup, preserveAspect, groupLikeGraphics, M7, l34, w282, h287,
12, "Courier";
fontset = name, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M7,
italic, 10, "Geneva";
fontset = header, inactive, noKeepOnOnePage, preserveAspect, M7, 12, "Times";
fontset = leftheader, inactive, L2, 12, "Times";
fontset = footer, inactive, noKeepOnOnePage, preserveAspect, center, M7,
12, "Times";
fontset = leftfooter, inactive, L2, 12, "Times";
fontset = help, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M7,
10, "Times";
fontset = clipboard, inactive, nohscroll, noKeepOnOnePage, preserveAspect,
M7, 12, "Times";
fontset = completions, inactive, nohscroll, noKeepOnOnePage,
preserveAspect, M7, 12, "Times";
fontset = special1, inactive, nohscroll, noKeepOnOnePage, preserveAspect,
M7, 12, "Times";
fontset = special2, inactive, nohscroll, noKeepOnOnePage, preserveAspect,
M7, 12, "Times";
fontset = special3, inactive, nohscroll, noKeepOnOnePage, preserveAspect,
M7, 12, "Times";
fontset = special4, inactive, nohscroll, noKeepOnOnePage, preserveAspect,
M7, 12, "Times";
fontset = special5, inactive, nohscroll, noKeepOnOnePage, preserveAspect,
M7, 12, "Times";
currentKernel;
]
:[font = smalltext; inactive; preserveAspect]
Copyright (C) 1997 Rich Neidinger, John Swallow, and Todd Will. Free for
distribution to college and university instructors for personal,
non-commercial use. If these notebooks are used in a course, the authors
request $20 per student.
:[font = title; inactive; preserveAspect]
8. Strings; Strings as Lists
:[font = smalltext; inactive; preserveAspect]
Last revision: September 17, 1996
:[font = text; inactive; preserveAspect]
The topic for this lab is the data type String, together with some
functions Mathematica provides for manipulating strings. We'll also look
at the (data type) conversion functions "FromCharacterCode[ ]",
"ToCharacterCode[ ]", "StringJoin[ ]", and "Characters[ ]", and explore how
we can use the list functions we know to duplicate the string operations.
:[font = section; inactive; Cclosed; preserveAspect; startGroup]
Strings and Things
:[font = subsection; inactive; Cclosed; preserveAspect; startGroup]
The Data Type String
:[font = text; inactive; preserveAspect]
The data type String is used for values which are finite sequences of
characters, such as "Jane" or "Hello, Mike". Inside a standard computer
system, each character of a string is represented by an integer code
between 0 and 127, each code corresponding to some letter, digit, or
"special character" such as "+", "&", or "@". The standard code is ASCII,
the American Standard Code for Information Interchange. (On computer
systems providing larger character sets, such as sets including vowels with
accents, the maximum integer may exceed 127.)
:[font = text; inactive; preserveAspect]
To define or use a String does not, however, require you to know the codes
for each of the characters. The usual way to denote a String is simply to
write out the list of characters, one at a time (with no spaces in between
unless you mean them), placing a double quotation mark before the first and
after the last; for example, "Frayed Knot" is a String with 11 characters:
"F", "r", "a", "y", "e", "d", " " (the blank space), "K", "n", "o", and
"t". Let's get Mathematica to agree with us:
:[font = input; preserveAspect]
s1 = "Frayed Knot"
:[font = input; preserveAspect]
Head[s1]
:[font = input; preserveAspect]
StringLength[s1]
:[font = text; inactive; preserveAspect; endGroup]
Notice that "Length[ ]" applies to a List and "StringLength[ ]" to a String.
Recall, too, that the "Head[ ]" function gives us the data type of any
expression in Mathematica.
:[font = subsection; inactive; Cclosed; preserveAspect; startGroup]
ASCII Codes
:[font = text; inactive; preserveAspect]
How can we find out the codes Mathematica uses to represent the letters we
see in a String? We could consult a table, but instead we'll just ask
Mathematica. The function we want to use is a data type conversion
function, "ToCharacterCode[ ]". If we give a String to
"ToCharacterCode[ ]", it will return a List of the Integer ASCII codes
corresponding to each character in the String, in order. For instance:
:[font = input; preserveAspect]
ToCharacterCode[s1]
:[font = text; inactive; preserveAspect]
By investigating the codes, you can find that the blank space " " has code
32, "a" through "z" have codes 97 to 122, "A" through "Z" have codes 65 to
90, and "0" through "9" have codes 48 to 57. You won't need to memorize
these codes, but it is reassuring to know that the code for "b" is one
greater than the code for "a", and so on.
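:[font = text; inactive; preserveAspect]
As a quick check of these ranges, we can ask for the codes of the first and
last character of each range at once. (The test string "az AZ 09" is our
own choice, made purely for illustration.)
:[font = input; preserveAspect]
(* first and last letters and digits, separated by blank spaces *)
ToCharacterCode["az AZ 09"]
:[font = text; inactive; preserveAspect]
The result should be {97, 122, 32, 65, 90, 32, 48, 57}.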
:[font = text; inactive; preserveAspect]
What if we have the codes, and we would like to form a string of characters
from the codes? We need the opposite function to "ToCharacterCode[ ]",
which is "FromCharacterCode[ ]".
:[font = input; preserveAspect]
FromCharacterCode[32]
:[font = input; preserveAspect; endGroup]
FromCharacterCode[{49,57,57,54}]
:[font = subsection; inactive; Cclosed; preserveAspect; startGroup]
Special Characters
:[font = text; inactive; preserveAspect]
There are some special characters in ASCII: the characters which instruct
the computer to begin a new line or to tab over some spaces, for instance.
Inside a String, each is written as a backslash followed by another
character. The same device lets us put a double quotation mark in a String;
we need it, because Mathematica would have difficulty deciding whether """"
is two Strings with no characters in them, one String with two double
quotation marks, or one String with one double quotation mark alongside an
extraneous double quotation mark. Here's a partial list of special
characters:
:[font = text; inactive; preserveAspect]
\" denotes a double quotation mark
\\ denotes a backslash
\n denotes a forced new line
\t denotes a tab (implemented differently on different systems)
\b denotes a backspace
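:[font = text; inactive; preserveAspect]
Although each of these is typed with two keystrokes, each should count as a
single character inside the String. Here is a small check, using a test
string of our own choosing; we expect a length of 4 and the codes 34, 92,
10, and 9.
:[font = input; preserveAspect]
(* four special characters: quotation mark, backslash, new line, tab *)
StringLength["\"\\\n\t"]
:[font = input; preserveAspect]
(* the same four characters, shown as their Integer codes *)
ToCharacterCode["\"\\\n\t"]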
:[font = text; inactive; preserveAspect]
We'll use several of these in one String:
:[font = input; preserveAspect; endGroup]
"Here's a quotation mark: \", a backslash: \\, \nand a new line!"
:[font = subsection; inactive; Cclosed; preserveAspect; startGroup]
String Functions
:[font = text; inactive; preserveAspect]
There are many string functions which behave almost exactly like the
corresponding list functions. Here is a list of a few:
:[font = text; inactive; preserveAspect]
StringLength[ ]
StringJoin[ ]
StringReverse[ ]
StringTake[ ]
StringDrop[ ]
:[font = text; inactive; preserveAspect]
In fact, all of these functions can be defined using list operations along
with some data type conversion functions. (This is your homework!) Let's
make sure we know what each of them does:
:[font = input; preserveAspect]
StringJoin["Davidson","College"]
:[font = input; preserveAspect]
StringReverse["Wildcats"]
:[font = input; preserveAspect]
StringTake["Wildcats",4]
:[font = input; preserveAspect]
StringDrop["Wildcats",4]
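:[font = text; inactive; preserveAspect]
To see the resemblance to the list functions, compare the last two results
with "Take[ ]" and "Drop[ ]" applied to a List of the same characters. (We
type the List out by hand here, purely for illustration.)
:[font = input; preserveAspect]
(* the first four entries of the List *)
Take[{"W","i","l","d","c","a","t","s"},4]
:[font = input; preserveAspect]
(* everything except the first four entries *)
Drop[{"W","i","l","d","c","a","t","s"},4]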
:[font = text; inactive; preserveAspect]
We can also use the boolean operator "==" with Strings to test equality.
:[font = input; preserveAspect]
"austen" == "Austen"
:[font = text; inactive; preserveAspect]
Although we cannot use the operators "<" and ">" on Strings, we can at
least sort a List of several Strings using "Sort[ ]". How are Strings
ordered? It depends on a certain Mathematica "environment variable" called
"$StringOrder". By default, "$StringOrder" is {{"a","A"}, {"b","B"},
...{"z","Z"}, "0", "1", ... "9"}, and it is used to order strings by
comparing characters left-to-right. If the beginning letters are
different, then the "smaller" string is defined to be the one whose first
letter occurs first in $StringOrder. If the two first characters are the
same, then Mathematica looks at the two second characters, and so on. Here
are some examples:
:[font = input; preserveAspect]
Sort[{"Abe","Lincoln"}]
:[font = input; preserveAspect]
Sort[{"Jane","jane"}]
:[font = text; inactive; preserveAspect]
For those who would like Strings ordered strictly according to their ASCII
codes, there are two options: either reset "$StringOrder" to "{}" (by
evaluating "$StringOrder = {}"), or decompose the Strings into Integer
codes, sort those lists of numbers, and put the Strings back together.
:[font = text; inactive; preserveAspect]
There are some other boolean functions we can use on Strings to test
various properties. "DigitQ[ ]" tests whether or not all of the characters
are digits, "LetterQ[ ]" whether they are letters, "UpperCaseQ[ ]" whether
they are upper-case letters, and "LowerCaseQ[ ]" whether they are
lower-case letters.
:[font = input; preserveAspect]
DigitQ["01"]
:[font = input; preserveAspect]
DigitQ["0A"]
:[font = input; preserveAspect]
UpperCaseQ["HELLO"]
:[font = input; preserveAspect]
LowerCaseQ["xxx"]
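:[font = text; inactive; preserveAspect]
Keep in mind that these tests look at every character, including blank
spaces. A short illustration with strings of our own choosing: the first
two inputs should return False and the third True.
:[font = input; preserveAspect]
(* the blank space is not a letter *)
LetterQ["Frayed Knot"]
:[font = input; preserveAspect]
(* only the first letter is upper-case *)
UpperCaseQ["Hello"]
:[font = input; preserveAspect]
(* with the blank space removed, every character is a letter *)
LetterQ["FrayedKnot"]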
:[font = text; inactive; preserveAspect]
Finally, we can change the case of characters in a String by using
"ToUpperCase[ ]", which forces all of the letters in the String to be
upper-case, or "ToLowerCase[ ]", which forces them all to be lower-case.
:[font = input; preserveAspect; endGroup]
ToUpperCase["e. e. cummings"]
:[font = subsection; inactive; Cclosed; preserveAspect; startGroup]
Strings and Lists
:[font = text; inactive; preserveAspect]
We've already encountered functions which change Strings into Lists and
back again, namely "ToCharacterCode[ ]" and "FromCharacterCode[ ]".
Suppose, however, that we don't want to see the codes but simply want the
String broken up into a list of characters which we can then manipulate.
Mathematica will do so with the function "Characters[ ]", and we can undo
this operation by using "StringJoin[ ]", which puts them back together.
:[font = input; preserveAspect]
Characters["Characters"]
:[font = input; preserveAspect]
StringJoin[%]
:[font = text; inactive; preserveAspect; endGroup]
The function "Characters[ ]" permits us to perform list operations on
Strings without using the String functions and without breaking the Strings
into Integer codes.
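:[font = text; inactive; preserveAspect]
For instance, once a String has been broken into a List of characters, an
ordinary list function such as "Count[ ]" can be applied to it. In this
sketch (the word "Mississippi" is just our example) we count the
occurrences of "s", which should give 4.
:[font = input; preserveAspect]
(* break the String into characters, then count the entries equal to "s" *)
Count[Characters["Mississippi"],"s"]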
:[font = subsection; inactive; Cclosed; preserveAspect; startGroup]
Characters and Names
:[font = text; inactive; preserveAspect]
When Mathematica evaluates a string, it displays the string without
quotation marks. When Mathematica evaluates a name which has not been
given a value, it returns the name. Hence it may be difficult to
distinguish between the output for "i" the string and i the variable! Be
sure to enclose in quotation marks any character you intend to use as a
String, so that you don't run into trouble down the road. Here's an
example of a problem of this sort:
:[font = input; preserveAspect]
list1 = Insert[{"a","b","c"},"d",4]
list2 = Insert[{"a","b","c"},d,4]
StringJoin[list1]
StringJoin[list2]
:[font = text; inactive; preserveAspect; endGroup; endGroup]
Note that the peculiar symbol "<>" is the operator form of the
"StringJoin[ ]" command (much as "+" is the operator form of the "Plus[ ]"
command). For the last input, Mathematica returns "abc"<>d, which is
equivalent to "StringJoin["abc",d]", because, as usual, when Mathematica
cannot perform an operation (in this case, joining the string "abc" to the
variable d), it returns the command unevaluated.
:[font = section; inactive; Cclosed; preserveAspect; startGroup]
Converting Numeric Strings into Numbers
:[font = text; inactive; preserveAspect]
How could we convert the String "123" into the Integer 123? There are
several methods we might try. We could first find the ASCII codes from the
characters:
:[font = input; preserveAspect]
ToCharacterCode["123"]
:[font = text; inactive; preserveAspect]
Then we might realize that since the ASCII codes for "0" to "9" are 48 to
57, to convert from the codes to the actual numeric digits, we would simply
need to subtract 48 from each. Subtracting a number from a list subtracts
it from each entry, so we could do the following:
:[font = input; preserveAspect]
ToCharacterCode["123"] - 48
:[font = text; inactive; preserveAspect]
Now we'd like to put these together into a number. But we don't want to
add them; we first want to multiply each of them by the correct power of
ten. The rightmost digit would be multiplied by 1, the next-to-last by 10,
and so on. How can we create this list of powers of ten? Well, we can use
"Table[ ]":
:[font = input; preserveAspect]
Table[10^j,{j,0,2}]
:[font = text; inactive; preserveAspect]
How did we know to stop at 2? Well, that is the length of our list, minus
one: three digits need the powers 10^0, 10^1, and 10^2. Now that we have
the list of powers of ten, what can we do with it? We would like to
multiply corresponding elements of our list of digits with our list of
powers of ten, except that the powers of ten should be in reverse order.
How's this:
:[font = input; preserveAspect]
( ToCharacterCode["123"] - 48 ) *
Table[10^j,{j,2,0,-1}]
:[font = text; inactive; preserveAspect]
Aha! Now we simply want to add these together. We're almost ready to
define a conversion function from numeric Strings to Integers. (We haven't
dealt with decimals or fractions and so on.) Here's how to convert "123":
:[font = input; preserveAspect]
Apply[Plus,( ToCharacterCode["123"] - 48 ) *
Table[10^j,{j,2,0,-1}]]
:[font = text; inactive; preserveAspect]
To make this a general function, we should replace the 2 with one less than
the length of the String, that is, with "StringLength[s]-1".
:[font = input; preserveAspect; endGroup]
string2Num[s_] :=
Apply[Plus,( ToCharacterCode[s] - 48 ) *
Table[10^j,{j,StringLength[s]-1,0,-1}]]
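:[font = text; inactive; preserveAspect]
One possible variation, offered only as a sketch: the function below builds
the number from left to right, multiplying the running total by ten before
adding each new digit, so that no list of powers of ten is needed. The name
"string2NumFold" is our own invention, and the definition relies on
"Fold[ ]" and on pure functions, neither of which this lab has discussed.
The condition "DigitQ[s]" leaves the function unevaluated when the String
contains anything other than digits.
:[font = input; preserveAspect]
(* hypothetical alternative to string2Num; each step replaces the
running total by ten times the total plus the next digit *)
string2NumFold[s_String] :=
Fold[10 #1 + #2 &, 0, ToCharacterCode[s] - 48] /; DigitQ[s]
:[font = input; preserveAspect]
(* should return the Integer 1996 *)
string2NumFold["1996"]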
^*)