Cell[CellGroupData[{
Cell["Converting Raw Data into XML", "Title",
TextAlignment->Center],
Cell[TextData[{
"An XML Example Using ",
StyleBox["Mathematica",
FontSlant->"Italic"],
" 4.2"
}], "Subtitle",
TextAlignment->Center],
Cell[TextData[{
"One of the useful features of ",
StyleBox["Mathematica",
FontSlant->"Italic"],
" 4.2 is its ability to manipulate XML data. This notebook shows how to use ",
StyleBox["Mathematica",
FontSlant->"Italic"],
" to convert an ordinary row-and-column data set into an XML document and \
then export it for later use. The data set used here is the Ritter Catalog of \
Cataclysmic Variable Stars (Ritter, H., & Kolb, U., 1998, A&AS, 129, 83), \
which we convert into XML form. The original data file was first converted \
to a CSV file in Microsoft Excel so that ",
StyleBox["Mathematica",
FontSlant->"Italic"],
" can import it directly. ",
StyleBox["You will probably need to modify the path to the file to reflect \
its location on your system",
FontSlant->"Italic"],
"."
}], "Text"],
Cell[BoxData[
\(\(data = Import["\", "\"];\)\)], "Input"],
Cell["\<\
The next step is to define the list of 28 tag names, one for each field of an \
object, that will be used as element names in the final XML file.\
\>", "Text"],
Cell[BoxData[
\(\(atts = {"\", "\", "\", "\", "\", \
"\", "\", "\", "\", "\", "\", "\", "\", "\", "\", "\", "\", "\
\", "\", "\", "\", "\", "\", \
"\", "\", "\", "\", \
"\"};\)\)], "Input"],
Cell["\<\
The first two rows of the original data file are table headers, which are not \
needed in the final XML form, so we drop them.\
\>", "Text"],
Cell[BoxData[
\(\(data = Drop[data, 2];\)\)], "Input"],
Cell[TextData[{
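"The data file lists the objects and their parameters. In the original data \
set, every pair of rows corresponds to a single object. We can use ",
StyleBox["Mathematica",
FontSlant->"Italic"],
" to flatten the table, convert every field to a string (text content in \
SymbolicXML is given as strings), and regroup the fields into lists of 28, \
one list per object."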
}], "Text"],
Cell[BoxData[
\(\(data = Partition[ToString /@ Flatten[data], 28];\)\)], "Input"],
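Cell["\<\
The pairing step can be illustrated on a small scale. The sketch below \
applies the same Flatten, ToString, and Partition combination to a \
hypothetical toy table (toyTable is made up for this example and is not part \
of the catalog) in which each object spans two rows of two fields. The first \
resulting list holds the fields of rows 1 and 2 as strings, and the second \
holds those of rows 3 and 4.\
\>", "Text"],
Cell[BoxData[{
\( (*\ hypothetical\ toy\ table\ with\ two\ objects\ of\ two\ rows\ each\ *) \), "\n",
\(\(toyTable = {{1, "x"}, {2, "y"}, {3, "z"}, {4, "w"}};\)\), "\n",
\(Partition[ToString /@ Flatten[toyTable], 4]\)}], "Input"],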
Cell["\<\
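Next, we define a set of functions that convert the data into SymbolicXML. \
ValToSymbolicXML wraps a single value in an element named by the \
corresponding entry of atts, CVToSymbolicXML builds the element for one \
object from its 28 values, and DataToSymbolicXML wraps all of the objects in \
a single root element. Finally, we apply these functions to the whole data \
set.\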
\>", "Text"],
Cell[BoxData[{
\(\(ValToSymbolicXML[name_, val_] :=
XMLElement[name, {}, {val}];\)\), "\n",
\(\(CVToSymbolicXML[cv_] :=
XMLElement["\", {},
MapThread[ValToSymbolicXML[#1, #2] &, {atts, cv}]];\)\), "\n",
\(\(DataToSymbolicXML[data_] :=
XMLElement["\", {}, CVToSymbolicXML /@ data];\)\), "\n",
\(\(data = DataToSymbolicXML[data];\)\)}], "Input"],
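Cell["\<\
To see the shape of the SymbolicXML that these functions produce, here is a \
minimal sketch on a single made-up record. The tag names name, period, and cv \
and the values are hypothetical placeholders rather than the catalog's actual \
tag names; the MapThread call mirrors what CVToSymbolicXML does with atts. \
The result is a cv element containing one child element per field, each \
holding its value as a string.\
\>", "Text"],
Cell[BoxData[{
\( (*\ hypothetical\ tag\ names\ and\ values\ for\ illustration\ only\ *) \), "\n",
\(\(toyTags = {"name", "period"};\)\), "\n",
\(\(toyRecord = {"star1", "1.5"};\)\), "\n",
\(XMLElement["cv", {},
MapThread[XMLElement[#1, {}, {#2}] &, {toyTags, toyRecord}]]\)}], "Input"],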
Cell["\<\
The original data set has many blank fields, because not every object has \
every parameter defined. We can do some \"house cleaning\" by removing every \
element whose content is blank or consists of a single question mark. \
Question marks were added to the original data set as end-of-record markers \
for records whose final fields were blank.\
\>", "Text"],
Cell[BoxData[
\(\(data =
data /. {XMLElement[_, {}, {"\<\>"}] \[Rule] Sequence[],
XMLElement[_, {}, {"?"}] \[Rule] Sequence[]};\)\)], "Input"],
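Cell["\<\
The effect of these two rules can be checked on a hypothetical toy object \
(toyCV and its tag names are made up for this example) with one filled field, \
one blank field, and one field holding only a question mark. Because each \
matching element is replaced by Sequence[], it simply disappears from its \
parent's content list, leaving only the filled field.\
\>", "Text"],
Cell[BoxData[{
\( (*\ hypothetical\ toy\ object\ for\ illustration\ only\ *) \), "\n",
\(\(toyCV = XMLElement["cv", {}, {XMLElement["name", {}, {"star1"}],
XMLElement["period", {}, {""}], XMLElement["mag", {}, {"?"}]}];\)\), "\n",
\(toyCV /. {XMLElement[_, {}, {""}] -> Sequence[],
XMLElement[_, {}, {"?"}] -> Sequence[]}\)}], "Input"],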
Cell[TextData[{
"The original data set uses special characters in some fields to mark \
particular properties of a value. This notation is fine for visual inspection \
of the catalog, but to make the data easier to analyze in XML form, these \
markers are better represented as attributes of ",
StyleBox["XMLElement",
FontFamily->"Courier New"],
"s rather than as part of the content of the ",
StyleBox["XMLElement",
FontFamily->"Courier New"],
". The special characters used in the original data set are a trailing \
colon, question mark, or asterisk; a leading less-than or greater-than sign; \
and a trailing B (denoting that a blue filter was used). Next, we define a \
set of tests to determine whether a data entry carries one of these special \
characters."
}], "Text"],
Cell[BoxData[{
\(\(EndsInColon[s_String] := "\<:\>" ===
StringTake[s, \(-1\)];\)\), "\n",
\(\(EndsInQuestion[s_String] := "?" ===
StringTake[s, \(-1\)];\)\), "\n",
\(\(EndsInStar[s_String] := "\<*\>" ===
StringTake[s, \(-1\)];\)\), "\[IndentingNewLine]",
\(\(StartsWithLess[s_String] := "\<<\>" === StringTake[s, 1];\)\), "\n",
\(\(StartsWithGreater[s_String] := "\<>\>" ===
StringTake[s, 1];\)\), "\n",
\(\(EndsWithB[s_String] := "B" ===
StringTake[s, \(-1\)];\)\)}], "Input"],
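Cell["\<\
These tests are used in the rules below as pattern tests (the ? operator), so \
a rule fires only on values for which the test returns True. A minimal sketch \
with made-up values, using an inline pure function that checks the same \
condition as EndsInColon, shows the idea with Cases: only the two values that \
end in a colon are kept.\
\>", "Text"],
Cell[BoxData[{
\( (*\ made\ up\ values\ for\ illustration\ only\ *) \), "\n",
\(Cases[{"1.2:", "3.4", "5.6:"}, _?(StringTake[#, \(-1\)] === ":" &)]\)}], "Input"],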
Cell["\<\
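With these tests defined, we can apply them through replacement rules. Each \
rule strips the special character from the value and records it as an \
attribute of the element instead; the asterisk rule is applied to only one \
tag. The resulting data are then ready to export as a generic XML file.\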
\>", "Text"],
Cell[BoxData[{
\(\(data =
data /. {XMLElement[name_, atts_, {val_?EndsInColon}] :>
XMLElement[name,
Append[atts, "\" -> "\"], {StringDrop[
val, \(-1\)]}]};\)\), "\n",
\(\(data =
data /. {XMLElement[name_, atts_, {val_?EndsInQuestion}] :>
XMLElement[name,
Append[atts, "\" -> "\"], {StringDrop[
val, \(-1\)]}]};\)\), "\n",
\(\(data =
data /. {XMLElement["\", atts_, {val_?EndsInStar}] :>
XMLElement["\",
Append[atts, "\" -> "\"], {StringDrop[
val, \(-1\)]}]};\)\), "\n",
\(\(data =
data /. {XMLElement[name_, atts_, {val_?StartsWithLess}] :>
XMLElement[name,
Append[atts, "\" -> "\"], {StringDrop[val,
1]}]};\)\), "\n",
\(\(data =
data /. {XMLElement[name_, atts_, {val_?StartsWithGreater}] :>
XMLElement[name,
Append[atts, "\" -> "\"], {StringDrop[val,
1]}]};\)\), "\n",
\(\(data =
data /. {XMLElement[name_, atts_, {val_?EndsWithB}] :>
XMLElement[name,
Append[atts, "\" -> "\"], {StringDrop[
val, \(-1\)]}]};\)\)}], "Input"],
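Cell["\<\
Applied to a single element, a rule of this kind moves the marker out of the \
content and into the attribute list. The sketch below uses a hypothetical \
element toyElement, and the attribute name flag with value colon is likewise \
a hypothetical placeholder; the inline condition checks the same thing as \
EndsInColon. The result is a period element carrying the attribute flag with \
value colon, whose content is 0.12 with the colon removed.\
\>", "Text"],
Cell[BoxData[{
\( (*\ hypothetical\ element\ and\ attribute\ name\ for\ illustration\ only\ *) \), "\n",
\(\(toyElement = XMLElement["period", {}, {"0.12:"}];\)\), "\n",
\(toyElement /.
XMLElement[name_, attrs_, {val_String /; StringTake[val, \(-1\)] === ":"}] :>
XMLElement[name, Append[attrs, "flag" -> "colon"], {StringDrop[val, \(-1\)]}]\)}], "Input"],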
Cell[TextData[{
StyleBox["Mathematica",
FontSlant->"Italic"],
" 4.2 can use ",
StyleBox["Export",
FontFamily->"Courier New"],
" to write the SymbolicXML expression to an ordinary XML file. "
}], "Text"],
Cell[BoxData[
\(Export["\", data]\)], "Input"]
}, Open ]]
