You Haven't Been Forgotten
The View from the Vault
Processing of Input Files
Revisions of Records
Ron Rakowski and Roger Maris
The Variable Rules of Baseball
Game Account Acquisitions
Retrosheet Meeting in Pittsburgh
The Retrosheet Community
Future Editions of Newsletter
Although the plan was for this newsletter to be published quarterly, we have obviously slipped pretty far behind that schedule. Much has happened in the Retrosheet world since the first edition appeared last October and, working under the presumption that late is better than never, here is the next installment in our continuing saga.
Return to Table of Contents
Pete Palmieri graciously agreed to donate his talents in the editing and formatting of this edition and he provided some nice samples with impressive style improvements. However, given the delay in getting material to him, this edition will be in the same format as the first one. The text is still written by David Smith, who remains the ubiquitous "I" in what follows.
Return to Table of Contents
Pete and I agreed that it would be appropriate to have a separate section where I offer a general overview about the state of Retrosheet, similar to the way that David Pietrusza does in The Press Box section of the SABR Bulletin. I chose the hopefully catchy heading for my column based on the image of all of our scoresheets residing in our 14 file cabinets in my basement. Many of you have called me and waited patiently as I descended, cordless phone in hand, to the subterranean reaches.
The main topic I wish to address this time is data distribution. The central principle of Retrosheet's operation has always been that the data files will be available to all who wish to have them, but we do not yet have a Board-approved policy for the actual distribution. At the first Board meeting last year in Arlington the general features of a policy were discussed, but no formal action was taken. I admit that completing the policy has been a fairly low priority to me, certainly less pressing than the day to day activities of distributing scoresheets to volunteers and processing input files as they come in. However, based on the many requests I have received in the last six months, it has become obvious that I have misjudged the importance of this matter. We have an obligation to make good on our commitment to make the data available and the Board addressed the question in detail at our meeting in Pittsburgh in June. The final version is not quite in hand, although it is nearly so. David Vincent has prepared a draft which is currently being examined by the Board members. Final approval should be realized by September. I apologize for the delay and greatly appreciate the patience of everyone who has had their data request deferred.
I have one additional comment to make, and that concerns our volunteers. In the first edition I pointed out how special Retrosheet was and emphasized our unique mission. Retrosheet is unique, but the vitality of the organization resides in the fantastic attitude of its selfless volunteers. An enormous number of person-hours have already been donated and this tremendous effort has been supplied with enthusiasm and unmistakable good will. I am in the fortunate position to receive all the expressions of support but it is important that everyone knows that his or her feelings for baseball are shared throughout the organization; we all owe a great debt to each other.
Return to Table of Contents
Jim Wohlenhaus, who lives in Westminster, Colorado, asked me a few months ago to explain exactly what I did with the files after I receive them from inputters. This is a very reasonable request and I present here my answer to him followed by an extension that has an impact on the question of data distribution.
The first thing I do is analyze them with a program called CHECK that is a modification of DWCHECK. This program looks for all sorts of possible syntax and style problems. Then I examine each file in a text editor and read the comments, if any, that the inputter has inserted. Often I end up referring to the original scoresheet and, in difficult cases, a box score from The Sporting News or a daily newspaper. If there is anything I can't rationalize it goes on a list for further newspaper research, using microfilmed copies in the library. Sometimes this whole process takes a few days, but I almost always get everything tracked down. Most importantly, I definitely don't just load the games into a directory and forget about them, because these problems would then never get addressed. Detailed comments in letters from inputters are really helpful because they flag the areas where the uncertainties arose. That care and the inclusion of comments in the file are the two biggest helps I can get from inputters in getting our data as clean as possible.
Of course, this rudimentary processing is nowhere near the complete story. Some of the scoresheets we work from have mistakes in them, and occasionally scoring decisions are changed by official scorers after the game and these alterations don't necessarily show up on our scoresheets. In addition, translators and inputters can make errors in their processing, for example entering a double as a single, a mistake I have made myself more than once (well, "S" and "D" are adjacent keys after all). Ron Fisher and I independently made the same odd mistake over the winter, namely reversing the visiting and home teams. As a result the bottom of the 9th wasn't entered for those games and I only caught it with my data-logging program (more details below).
The point of all this is that we have a formidable job to do in reconciling our data with the official totals. This phase can't really begin until we have entered all the games for a team for a season, since only then is it possible to make comparisons against the official totals. We have the official totals in computer form and I have written a program to compare the numbers generated from our files for each batter and pitcher against the official ones (proofing of fielding data remains generally elusive, although the work of Gary Robbins on the 1964 Yankees is a striking exception - see the next section). When discrepancies are found for an individual player, the next stage is to generate running totals from our files and check against the weekly numbers printed in The Sporting News. This process usually identifies the week in which the discrepancy resides and then it is time for a day by day analysis of his numbers to find where the difference is.
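For readers curious about what this two-stage comparison looks like in practice, here is a minimal sketch in Python. It is not the program I actually use; the function names, the data layout, and the numbers are all made up for illustration.

```python
# Illustrative sketch only: names, data layout, and numbers are hypothetical.

def find_discrepancies(official, generated):
    """Return {player: {stat: (official_value, generated_value)}} for each mismatch."""
    diffs = {}
    for player, stats in official.items():
        ours = generated.get(player, {})
        mismatched = {
            stat: (value, ours.get(stat))
            for stat, value in stats.items()
            if ours.get(stat) != value
        }
        if mismatched:
            diffs[player] = mismatched
    return diffs


def first_divergent_week(published, generated):
    """Return the 1-based week in which two running totals first differ, else None."""
    for week, (pub, gen) in enumerate(zip(published, generated), start=1):
        if pub != gen:
            return week
    return None


# Stage 1: season totals, official versus totals generated from the event files.
official = {"batter_a": {"H": 150, "RBI": 80}, "batter_b": {"H": 120, "RBI": 55}}
generated = {"batter_a": {"H": 150, "RBI": 79}, "batter_b": {"H": 120, "RBI": 55}}
discrepancies = find_discrepancies(official, generated)

# Stage 2: weekly running RBI totals for the flagged player (made-up values).
week = first_divergent_week([3, 7, 12, 15], [3, 7, 11, 14])
```

Once the week is known, the same comparison can be repeated on daily totals within that week to isolate the game in question.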
The last phase of the checking is potentially the most frustrating, because all of the seasons that Retrosheet considers were played prior to computerized recording of the official totals. As I summarized in the first edition of the newsletter, the procedure for almost all of this century has been to keep a daily log for each player. The official scorer submits a summary report after each game and the numbers are transcribed by hand onto the individual players' sheets. Batting and fielding data are recorded on one sheet and pitching information on a separate one. Copies of these sheets are maintained on microfilm at the Hall of Fame and at The Sporting News (I believe the league offices have the paper originals). Needless to say, the chance for errors being made on these handwritten sheets is extraordinary, as can be seen for most seasons prior to 1980 or so when internal inconsistencies are pretty easy to find (more on this point below).
Here's the connection to data distribution. As I reflect on the progress of our organization, I have concluded that one of the reasons I dragged my feet on getting a formal policy implemented is that I wanted to wait until the files were as perfect as we could possibly make them. Putting the dime store psychoanalysis aside, I now understand it is more realistic to expect the data files to undergo several iterations as proofing proceeds. That doesn't mean that we will distribute data that we know to be wrong, but it does mean that continuous improvement of the information should be seen as the norm. The appropriate disclaimers will, of course, accompany any data we send out.
Return to Table of Contents
One of the more interesting responses I have gotten from many people when they first hear about Retrosheet is something like "Oh, so you're planning to rewrite the record books; how do you know that your numbers are correct?" The barely hidden, somewhat antagonistic subtext that I always hear in this question is that our information is somehow suspect since we do not make the assertion, which would be quite arrogant in my opinion, that we are 100% correct. This notion of "truth" in the matter of baseball statistics strikes me as really silly. No one's totals, not Elias, not Total Baseball, not the Baseball Guide, are any better than the raw data which went into them, and they can, in fact, be a lot worse than the raw data. In the last 25 years, since the inception of SABR to be more precise, there have been many claims put forward to change various totals from the past: Cap Anson's hit total, Walter Johnson's win total, the winner of the 1911 AL batting race, and many others. The details of each case are different and there are huge variations in the extent of the documentation provided in support of the requested change. Lyle Spatz and the members of the SABR Records Committee have done an extraordinary amount of fine work in cleaning things up. The official Major League response to proposed changes has varied all over the place, from agreement in some cases to vehement denial in others. Of recent note, of course, is that Total Baseball is now the holder of the imprimatur from Major League Baseball as the "official" source of records. Perhaps the most visible consequence has been that Cap Anson is now therefore acknowledged as having garnered 2995 hits, instead of the 3000 he was credited with for a long time.
So much for the intro; how does Retrosheet fit into this discussion? I offer here my personal opinion of Retrosheet's role. I feel strongly that it is in Retrosheet's interest for the organization itself not to take stands on any proposed changes, nor to advocate any changes in the official records. I addressed this area in the first edition of the newsletter, when I noted that Luke Kraemer had uncovered a number of cases in which he found that the official totals were wrong. Since we have processed many more games in the past nine months, there are even more documented discrepancies in hand, including a significant one that Ron Rakowski uncovered about the AL RBI leader in 1961 (more on that in the next section).
The most detailed analysis of a large number of games has been done by Gary Robbins of Crestwood, Kentucky, who translated all of the games of the 1964 Yankees, entered most of them into the computer (Dave Lamoureaux of Pittsburgh did some of the entry as well), and then generated daily records for each player using a program that I wrote. He compared these daily records against the microfilmed official totals for all categories, including the most dreaded of all: fielding. He found dozens of differences, including many literally impossible situations; I will summarize two examples from the fielding data (all the others are equally well-documented).
This is an appropriate time to ask "so what?" I quote from an Email message Gary sent me: "Does it really matter if Babe Ruth hit 714 or 713 home runs? Of course it does. Does it really matter if Phil Linz had 7 or 9 assists at second base during the 1964 season? Probably not to most people. But where do you draw the line? At what point is the information unimportant? And if Phil Linz's assist total is not correct, are the other records accurate?"
It has been a long way since the start of this section (about 800 words), so it seems reasonable to make some sort of summary. I think that Retrosheet has a very valuable contribution to make in finding discrepancies within the official totals. Widespread dissemination of these discoveries, along with all possible documentary evidence, would be an excellent service on our part. However, if we enter into the fray and argue that some numbers should be changed, then we will, in my opinion, run a huge risk of getting lost in the quagmire that results when politics mixes with the more objective world of tabulating numbers. Note carefully that I am not at all opposed to any individual taking our information and using it to advance any case he or she wishes to address. I would in fact be delighted for Retrosheet to be seen as a reliable source, but the pursuit of statistical adjustments has a long history of turning nasty and I see nothing for us to gain by jumping into the middle; maintaining the organization's credibility is a high priority for me.
The opinions here are definitely my own personal thoughts and should definitely not be taken as Retrosheet policy, and in fact the organization has no policy on this question. I welcome responses, challenges, attacks or whatever from any reader of this section. If anyone wishes to have his or her opinions on this topic printed in the next newsletter, send them along and I promise they will be distributed.
Return to Table of Contents
Those who are SABR members already know some of this story, especially if you heard Ron's excellent presentation at the Pittsburgh convention. In addition a very brief summary has appeared in the SABR Bulletin and Lyle Spatz featured it in the SABR Records Committee newsletter.
As mentioned in the first edition of the newsletter, Ron has essentially completed the 1961 season, both leagues. He is still tracking down a few games from some of our "dead spots", Cincinnati, Pittsburgh, Milwaukee, Kansas City, and Boston. As a major part of proofing the 1961 data, Ron compared daily records generated from his event files for each AL player to the microfilmed official data and he found a discrepancy for Roger Maris' RBI totals. Specifically he found that on July 5 (Cleveland at New York) the official records credit Maris with two RBI when the two scoresheets we have show him with only one. The play in question is described nicely in the paper Ron prepared for the meeting in Pittsburgh:
In the third inning of that game, Tony Kubek led off the inning by striking out, but got to first base on a passed ball by Cleveland catcher Johnny Romano on the third strike. Roger Maris then singled to right field, sending Kubek to third. Indian right fielder Willie Kirkland attempted to throw Kubek out at third, but his throw was late and Indian third baseman, Bubba Phillips, threw back towards first to try to get Maris rounding the bag. His throw went into the stands and Kubek was waved home by the umpire and Maris sent to third.
It seems impossible to award Maris an RBI on this play, but Maris is credited with two RBI on the official sheet (he hit a solo home run in the seventh inning). In addition to the accounts in our two scoresheets, Ron checked several newspapers and came up with a total of nine independent sources that indicate Maris had only one RBI that day. Even the box scores from the AP and The Sporting News had Maris with only one RBI.
What makes Ron's discovery all the more interesting is that, according to existing official records, Maris led the AL with 142 RBI in 1961, one more than Jim Gentile of the Orioles. Therefore, making a change in Maris' total alters the RBI results from an individual lead by Maris to a tie with Jim Gentile. The SABR Records Committee has endorsed Ron's findings and we may see an official change as a result.
One final footnote: Ron also found an incorrectly attributed run scored in the Yankees records for 1961. The official records have Maris and Mantle tying with 132 scored, but Ron has good evidence that Maris should have one more, because the official records incorrectly gave a run to Bill Skowron instead of to Maris on September 10 (Cleveland at New York). Therefore, Maris should have the undisputed lead in runs scored for 1961.
Return to Table of Contents
One type of question I get fairly often from translators and inputters has to do with scoring rules that have changed or come into being over the years. I will explore three examples here in some detail, but there are others out there as well. Date references below were obtained from the third edition of Total Baseball.
Save - The save became an official category in 1969 and the requirements were modified in 1973 and 1975. There are four intervals to consider: Pre-1969, 1969-1973, 1974-1975, and 1976-1983. The most recent period is no problem, since the rule for that time is the same as the current (1995) one. For the two previous periods with their own, now defunct, save rules, the decision (with which Total Baseball agrees) has been to enforce the rule that was in place at the time. There are three options that I see for the pre-1969 era:
I have heard essentially no support for option 1, although it is probably the "purest" in the sense of simply recording things as they happened at the time. The save category is so ingrained in most of us that I believe the consensus is that we should be recording saves under some sort of definition. Total Baseball has chosen option 3, while Retrosheet has been following option 2, largely because I was under the mistaken impression until recently that Total Baseball was following option 2. It probably makes more sense in the long run for us to be consistent with Total Baseball (they are "official" after all), but I would like to hear what other Retrosheetians feel about this. As a point of information I note that it would not be terribly difficult to make adjustments, since we have relatively few pre-1969 games entered so far.
Sacrifice Fly - The sacrifice fly rule has been all over the place this century. Before 1908 there had to be one out and a runner had to score for a batter to be credited with a sacrifice fly. From 1908 to 1925 a sacrifice fly could be awarded when there was either zero or one out, but there had to be a runner scoring. In 1920 the official scorer was instructed in the reporting of these events to make no distinction between sacrifice flies and sacrifice bunts. From 1926 to 1930 a sacrifice fly was awarded for any runner advance, not just when a runner scored. In 1931 sacrifice flies were abolished as an official category. In 1939 the rule was reinstated for one year with the pre-1926 definition (runner must score). In 1940 sacrifice flies were once again abolished. In 1954 sacrifice flies were resurrected once again with the requirement that they be reported separately from sacrifice bunts. There was minor fiddling with the rule in 1957, 1958, 1975, and 1984, but the basic criteria of what a sacrifice fly is and whether it should be reported separately have not changed since 1954.
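The year ranges above can be condensed into a small lookup. The following Python sketch is only an illustration: the function name and return values are my own invention, and it deliberately ignores the number-of-outs conditions and the minor adjustments of 1957, 1958, 1975, and 1984.

```python
# Illustrative sketch: sf_requirement and its return values are hypothetical names.

def sf_requirement(year):
    """Return what a fly ball had to accomplish to count as a sacrifice fly.

    "score"   - a runner had to score
    "advance" - any runner advance was enough
    None      - the sacrifice fly was not an official category that season
    """
    if 1931 <= year <= 1938 or 1940 <= year <= 1953:
        return None  # abolished in 1931, and again in 1940 until 1954
    if 1926 <= year <= 1930:
        return "advance"
    return "score"  # through 1925, the one-year return in 1939, and 1954 onward


requirement_1927 = sf_requirement(1927)
requirement_1945 = sf_requirement(1945)
```

A check like this could flag an SF entered for a season in which the category did not officially exist.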
What are we to do? I see this situation as quite different from the save, which was a new category in 1969 with no official antecedents. What I have done so far is to follow the on-again, off-again vagaries of the rule with one modification. This means that in the years when a fly ball was counted as a sacrifice we mark it in the event file as SF; when the sacrifice fly was not counted, then we just note the play as a fly ball. The modification I have made is that we are noting SF and SH separately, which means that for some years we have a category that the official totals don't. I consider this "extra" information to be a very good thing since it will allow analysis of old seasons in a way not previously possible, which is certainly a meaningful Retrosheet objective. For example, in 1927 Lou Gehrig had 21 sacrifice hits (according to the Spalding Guide) and I bet that most people would believe these were all fly balls. We have entered 60 or so games for the 1927 Yankees and have already discovered three sacrifice bunts for Gehrig, along with three for Lazzeri and Bob Meusel and one for Babe Ruth. The Babe's came in the first inning! Any comments on the sacrifice fly policy I have described here are welcome.
Caught Stealing - The modern definition of caught stealing includes plays started by the pitcher as pickoffs when the runner makes an attempt to advance. Although that isn't news to most people, what is a bit surprising is that this rule only went into effect in 1979. Prior to that year all the plays with a runner on first being out 136 or something similar should not be scored as CS. To be explicit with this example: Enter this play as PO1(136) before 1979 and as POCS2(136) starting in 1979. Although it would not be incorrect to mark it as CS2(136), I prefer that we capture both the pickoff and caught stealing if at all possible.
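To make the 1979 cutoff concrete, here is a small Python sketch that picks the event code for this one play (runner on first, retired 1-3-6 or similar). The function name is hypothetical, and it covers only the example discussed above, not pickoffs at other bases.

```python
# Illustrative sketch: pickoff_event is a hypothetical helper, not part of any Retrosheet program.

def pickoff_event(season, fielders="136"):
    """Event code for a runner on first picked off while trying to advance."""
    if season >= 1979:
        # From 1979 on the play also counts as a caught stealing.
        return f"POCS2({fielders})"
    # Before 1979 it is recorded as a pickoff only.
    return f"PO1({fielders})"


before_rule = pickoff_event(1978)
after_rule = pickoff_event(1979)
```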
Comments on any of these three situations are welcomed, as are examples of other changes that have occurred over the years.
Return to Table of Contents
Since the last newsletter there have been seven new collections of scoresheets added to our totals.
Toronto: After several years of discussion with the Blue Jays, we finally had success last December, thanks to David Vincent. The Blue Jays had not only heard from us with our request, but also from Rich Hacker, their former third base coach and a personal friend of David's. The Blue Jays called David with a request for information from the SABR home run log and offered to allow copying of their scoresheets in return. The books were sent to David who copied them just before Christmas.
San Diego: Brigg Hewitt finished copying the books of the Padres in January.
Milwaukee: Thanks to the persuasion of Gary Gillette (and the change of PR directors after the strike began), we finally got permission to copy the Brewers scorebooks last fall. Frank Schetski borrowed the books from the team, made the copies, for which I reimbursed him, and sent them to me.
Los Angeles: We already had copies of all the scorebooks that the Dodgers possess, but David Stephan continued his excellent help for Retrosheet by putting me in touch with Bill Hunter. Bill, who works in the Dodgers ticket office, is the son of the late Bob Hunter, who covered the Dodgers for the Los Angeles Herald Examiner from their arrival in California. The scorebooks in the Dodgers offices have many gaps and Bob Hunter's books fill in most of them with very clear accounts. During the winter David Stephan went to Dodger Stadium, collected the scorebooks from Bill Hunter and photocopied them on the machine in the Dodgers Publicity Department. The Dodgers then shipped the copies to me.
Chicago: Stuart Shea has copied the 1972 and 1973 White Sox books and is planning to finish 1974-1980 this summer.
Pittsburgh: We had previously believed that the team only had accounts dating from 1982, with a badly damaged 1981 scorebook that they hadn't allowed to be copied. While we were at the convention Ed Luteran arranged for me to have an appointment with Jim Trdinich, the Pirates Public Relations director. David Vincent and Sherri Nichols went with me and Jim had trouble finding the 1981 book. We then began forcefully but politely prowling around his bookshelves, led by Sherri, who discovered stacks of manila folders containing scoresheets back to 1970! He graciously allowed us to carry out the stacks of paper (46 pounds according to Fed Ex when I returned them) and I brought them back to Delaware to copy.
Montreal: After many conversations and discussions with the Expos, we finally made copying arrangements with the team in June. One of their interns is doing the copying for $25 per season. The first set of four seasons (1980-1983) arrived here on July 5 and the rest should be completed this summer.
Counting the copies in hand as well as those whose copying is underway, we now have arrangements with all Major League teams except the Tigers. Many different efforts have been made to convince the Tigers to cooperate, the most recent involving the welcome support of Cliff Kachline, one of the most respected men in the history of baseball research. Cliff made a telephone call to the Tigers and followed it up with a personal letter to them which was highly complimentary of our efforts. Hopefully by the time of the next newsletter I will have success to report on the Bengal front.
The extraordinary efforts of Ron Rakowski in ferreting out game accounts deserve special recognition and praise. Ron began his investigation into the 1961 season before he and I ever met. It is through SABR that we learned of our mutual interest in play by play accounts and we have shared resources since early 1991. Ron is incredibly persistent (Luke Kraemer calls Ron "The Bulldog"). Ron has contacted several hundred people over the last five years and was the first to talk to several of the teams. Initially he was just asking them for 1961 data, but he opened the door for me to follow up and get their information for other seasons. His most special contribution, in my opinion, is the number of contacts he has made with sportswriters and announcers and their families in the case of deceased individuals. For example, he persuaded Dick Young's widow, Harold Rosenthal, and Leonard Koppett to let us copy their material. He was also the first to contact Bob Stevens, Monte Moore, Gene Elston and several teams with whom I made the final arrangements. The early 1960s are a great gap for us, with the "dead spots" I mentioned above. Ron has collected several hundred accounts for those years from a number of writers in Cincinnati and Milwaukee. Along the way he has run into several roadblocks, but he remains enthusiastic and active. For example, while we were in Pittsburgh, he contacted the Pirates radio announcer and a local antique shop because he had a lead on some scoresheets. Ron has also contacted older SABR members such as Bob Littlejohn in Cincinnati and as a result we have copies of Bob's scorecards from Reds games in the late 1930s. I can think of no better expression of what our organization cares about than these efforts which preserve accounts that otherwise would certainly be lost.
There is one more aspect to the acquisition of game accounts and that is the newspaper accounts that were routinely printed earlier in this century. I mentioned these accounts in the first newsletter and indeed their existence is well known to most historical baseball researchers. As we near the end of what is available from teams, writers and announcers, it becomes increasingly important to collect as many of these newspaper accounts as possible. Since Retrosheet has volunteers all over the country, it should be possible for us to mount a coordinated effort to get this information from the microfilmed records that exist. We discussed this matter at the Board meeting and agreed that it would be very useful to have a systematic list of which papers carried accounts in which years. Therefore I ask readers to contact me with this information so I can compile it and thereby allow a more efficient attack. For example, the New York Evening World published these accounts from 1924 through 1930. I have obtained all seven years through interlibrary loan and copied them. Jim Weigand copied the Cleveland Plain Dealer accounts from 1917 through 1927; they began in 1917, but continued many years past 1927. Walter LeConte has collected about 100 games for the 1903, 1920, and 1921 Yankees from the New York Evening Telegram (wouldn't you love to see Babe Ruth's splits from 1920 and 1921?). Any offers of help in this area will be gratefully accepted.
Return to Table of Contents
Our annual Board meeting and report took place in Pittsburgh on Saturday, June 17 during the SABR convention, as is specified in our by-laws. There were four major items of business which we took care of before the annual report was made.
The annual report I presented had one item of old business followed by a summary of Retrosheet activities in the last year. The old business was the Brick for Retrosheet that was discussed in the first newsletter. I completed the order form with our request and sent it off to Arlington during the winter. As of the writing of this newsletter (early July, 1995), I still don't have a response from the team nor have they cashed my check. I will contact them soon to find out the status of our application. Thanks to Jim Wohlenhaus, John Rickert and Neal Traven for their contributions to the Brick Fund.
The remainder of my report was an update on our progress in game processing and a summary of publicity we have received. The publicity details are presented in the next section since some of the points there were not mentioned in Pittsburgh. Game processing information is contained in the following table, in which the column headings have these meanings:
Year is the season.
Play is the number of games played that season.
Trans is the number of games in the hands of translators.
Input is the number of games in the hands of inputters (a game will only be counted once in either the Translator or Inputter column).
Done is the number of games which have been entered into the computer.
Year  Play Trans Input  Done    Year  Play Trans Input  Done    Year  Play Trans Input  Done
1983  2109     0     0  2098    1967  1620     0    31   881    1951  1242     0     0    41
1982  2107     0    43  2001    1966  1615     0     0   245    1950  1239     0     0   155
1981  1394   140   319   319    1965  1623     0     5   218    1949  1241     0     0   314
1980  2105    27   595   620    1964  1627     0   350  1204    1948  1237     0    19   156
1979  2099   114    11   160    1963  1619     0    71   624    1947  1243     0     0   155
1978  2102     0     0   162    1962  1621     0    20   387    1946  1242     0     0    24
1977  2103     0     0   314    1961  1430     0     0  1430    1930  1226     0     0   360
1976  1939     0     0   262    1960  1237     0     0   231    1929  1229     0     0   417
1975  1934    11    20   339    1959  1241     0     0   317    1928  1231     0     0    88
1974  1945     0   155    76    1958  1238     0     0   229    1927  1236     0     0    59
1973  1943     0    55    88    1957  1238     0     0    72    1926  1234     0     0     1
1972  1860     0     0   178    1956  1242     0     0    16    1925  1228     0     0    24
1971  1938     0     0    63    1955  1240     0     0   164    1924  1231     0     0     1
1970  1943     0    10    73    1954  1240     0     0     5    1920  1233     0     0    66
1969  1946     0    12   338    1953  1242     0     0   155    1917  1249     0    20    72
1968  1594     0    31   233    1952  1240     0     0   172    1911  1238     0     0    97

Totals: Trans 292, Input 1767, Done 15704
Earlier this year Ron Fisher asked me how many games were being completed each week and I estimated that it was about 100, although I wasn't keeping count as the games came in. He pointed out that at that rate, some 5200 per year, we will use up our supply of game accounts in about 10 years! So I began calculating the "Fisher index", which is the total number received each week. There has been only one week in which our total fell below 100 (it was 86 in the week before the SABR convention in Pittsburgh). If you compare the above total (15704 completed) with the one I reported in October (9607), you will find that we have averaged about 172 games per week since October! That works out to about 24.6 games per day, which means that somewhere in the world one game is being entered into the computer for Retrosheet every hour of every day, a scary thought. Our biggest single week was that of the convention when several people brought disks with them to give to me; 691 were received in that 7 day period. There was a period in May when we got 527 and 438 in consecutive weeks. I hope everyone is as impressed with the amount of this community effort as I am; there is a great deal to be proud of.
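For anyone who wants to check the arithmetic, the figures above can be reproduced in a few lines of Python. The variable names are mine; the inputs are the numbers quoted in this section.

```python
# Arithmetic check using only figures quoted in the text; variable names are illustrative.

completed_now = 15704      # games entered as of this newsletter
completed_october = 9607   # games entered as of the October newsletter
per_week = 172             # average weekly rate quoted above

new_games = completed_now - completed_october  # 6097 games since October
weeks_elapsed = new_games / per_week           # number of weeks implied by that rate
per_day = per_week / 7                         # about 24.6 games per day
hours_per_game = 24 / per_day                  # just under one hour per game

estimated_yearly = 100 * 52                    # the original estimate of 100 per week

print(f"{new_games} new games, about {per_day:.1f} per day")
```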
By the way I still do not have a complete log of all the scoresheets we possess. Part of the reason I'm not done yet is that we keep getting more (about 3000 from Pittsburgh and Montreal in the last two weeks). Nonetheless, I am making progress in this regard and I expect that the next newsletter will present our holdings in a different way from the table above. In addition to showing how many games we've processed, it will be possible to see how many are in hand and how many are still missing.
The final point from the convention has to do with the award I received from SABR. The first Special Achievement Award, which was just instituted this winter, was given to me for establishing Retrosheet. This type of recognition from one's peers is very special and I am extremely grateful to the SABR Board of Directors for the honor. However, I am uncomfortable with the award being given solely to me and not to the organization as a whole. In my acceptance remarks at the banquet I pointed out that there couldn't be a better example of an entire group deserving the award and that I was accepting the award on behalf of the dozens of volunteers who have donated thousands of hours of their time.
The Hot Corner - Most of you have never heard of this publication, but it is an incisive, pithy product of Gord Fitzgerald of Toronto. Gord hawks The Hot Corner as an alternative scorecard outside Skydome and has promoted us more than once. If you ever attend a game at Skydome, seek him out; he does very good work.
Baseball Weekly - Last October Paul White used his column in Baseball Weekly to write about Retrosheet and endorse our work. He used several items from the first newsletter to spice things up. If you haven't seen that piece and would like a copy, let me know and I'll send you one.
The Scouting Report - Gary Gillette of the Baseball Workshop edited this book this year (Stuart Shea, Pete Palmer and I were Associate Editors as well). He mentioned Retrosheet in the acknowledgments as he had done in previous years when he published the Great American Baseball Stat Book.
Rotisserie Baseball Annual - John Benson, the editor of this widely read work, was kind enough to print an entire page which I gave him describing the organization and our purpose. He is also editing a new work that should appear this summer on the greatest 100 player-seasons ever. I wrote two of the essays (1965 Sandy Koufax and 1962 Maury Wills) and John is planning to promote Retrosheet again.
Baseball Quarterly Reviews - Herm Krabbenhoft contacted me over the winter to get some information on a few players who ended their careers just below the .300 batting average level. He was favorably impressed with Retrosheet's goals and has printed promotional announcements about our organization ever since.
Cleveland Indians - In the first newsletter I mentioned the work I did to help the Twins when Dave Winfield got his 3000th hit. I did the same thing in more detail for the Indians with Eddie Murray's hits. The Indians were very grateful and have been using the information I supplied in several ways. A recent issue of Baseball Weekly had a feature on Eddie's 3000th hit and the fine print acknowledging sources lists Retrosheet first, followed by the Hall of Fame! I liked the look of that.
Cliff Kachline - On his way to the SABR meeting, Cliff Kachline stopped in Cleveland to see the new stadium. The Indians Vice-President for Public Relations, Bob DiBiasio, showed the Eddie Murray work to Cliff and commented on how happy he was with it. When Cliff got to Pittsburgh he asked about me and was very generous in his praise. At the banquet Cliff won the annual Bob Davids award, the highest SABR award there is. In his acceptance remarks, he managed to find a way to work in a reference to me and to Retrosheet. It is hard to describe the pride I felt at recognition from someone of Cliff's stature. In addition, as I mentioned above, Cliff has contacted the Tigers on our behalf as we seek their scorebooks. Retrosheet is very lucky to have Cliff as a friend.
Delaware Today - This glossy tourist-oriented magazine is not likely to be on the reading list of anyone else in Retrosheet except me. However, they printed a half page article about Retrosheet (with my picture, which was better than the one in Baseball America last year).
Daily News of New York - Here I refer to a future publication, but one which is planned to appear in the very near future. The Daily News has decided to publish a book commemorating the 40th anniversary of the Brooklyn Dodgers' only World Series winner. Their sports editor, Kevin Whitmer, called the Dodgers to get some statistics for the book and found that they have almost nothing. However, they did refer him to me, which is another good indication that we are starting to have some credibility within Major League circles. It turns out that I had already entered all of the games of the 1955 Dodgers from the Allan Roth scoresheets, and remember that these games are recorded pitch by pitch. I processed these event files with every program I could think of and came up with a huge stack of numbers for them. I have seen the proofs of the book and they are going to print either 48 or 56 pages with the statistics I sent (there will also be about 50 pages of pictures and stories). The statistics section opens with a full page about Retrosheet and has my picture on it (they sent their own photographer from Washington). Luke Kraemer expressed his hope that we won't scare too many small children away. I also worry about the nation's youth, and I really am not becoming vain, but for a diehard Dodger lover like me it was not possible to pass up a chance to have my picture printed in a book about the 1955 team. They anticipate a press run of 100,000 copies for the book, which is planned to sell for $4.95 on newsstands in New York, Los Angeles, and Florida. They have promised me a number of free copies and I will send them to as many people as I can. Let me know if you are interested in having one.
Spike Lee - I'm not sure that "publicity" is the correct section for this item, since there has not yet been any public notice of our involvement, but I couldn't think of any other place to put it. As many of you know, Spike Lee has joined SABR and is working on a movie about Jackie Robinson, which is timed for release in 1997, the 50th anniversary of Jackie's debut. Norman Macht of SABR told Spike about Retrosheet and Spike called me last November to find out what kind of information we had about Jackie. We talked for a few minutes and I told him the basics, namely that we have every play of Robinson's career, thanks to the Allan Roth scoresheets along with those of Dick Young and Harold Rosenthal that Ron Rakowski arranged to get copied. Spike's staff has interviewed about 200 old ballplayers to get background information for the film. They then contacted me again with questions about specific incidents that the interviews revealed, such as the famous (notorious?) play on which Enos Slaughter spiked Robinson in 1947. It is apparent that the movie will not just be a baseball film, which is reasonable indeed, since Robinson's arrival in the Major Leagues had significance well beyond the fact that he was a very talented player. However, Spike is very concerned that the baseball scenes he shows are accurate and he is asking us to supply details where we can. Again, as a zealous Dodger and Robinson fan, I find this opportunity impossible to resist. I will do all I can to get prominent mention for Retrosheet.
This attitude was evident in Pittsburgh during our discussions of the data distribution policy. The discussion was very open and there were different opinions presented, but there was no question that everyone was of a like mind in furthering the objectives of the organization.
One of the big improvements that has occurred in the last six months is that I now have a computerized log of all games played in this century. This log not only allows us to identify the games that we still don't have, but it also is an excellent vehicle for me to keep track of the status of each game, how many scoresheets we have for it, who's working on it, etc. This was all made possible because of the generous donations of Bob Tiemann, Arnie Braunstein and Leo Leahy. These three spent a huge number of hours compiling this information and donated it to Retrosheet so that I could use it for managing the games as I have described. Many thanks to the three of you.
It is always a bit risky to single out a few volunteers who make special contributions, since the work of each person in the organization is truly valued. However, there are six whom I wish to recognize here. We have many more very active volunteers than these six, of course, and I hope that none of the others feels slighted by being left off this list. David Vincent has input hundreds of games in the last ten months, and has also been an invaluable consultant and advisor on many features of Retrosheet's operation (if you can imagine the word "nag" with a positive connotation, then you have the idea - would "conscience" be better?). Luke Kraemer has input well over 2000 games with exquisite attention to detail; his books on the 1967 season remain the only substantive products to come from Retrosheet data. Alan Boodman has translated and input over 2000 games in just over a year, mastering every scoring system along the way. Dave Lamoureaux has provided steady, reliable work with a good cheer and enthusiasm that makes reading my Email a delight. Clem Comly has processed well over 1500 games and even does personal pickup and delivery at my house! The indomitable (inscrutable?) Ron Fisher is also in the 2000 game range during the last two and a half years and he is a faithful reporter of humorous game events (especially from the newspaper accounts). Just to finish off the numerical aspect as I did in the Pittsburgh meeting, I have entered about 3000 games so far, but my rate has dropped sharply in the last year due to the requirements of administering the day to day operations of our organization. I am sure that I will be passed by two or three of the crowd above in the next few months.
However, there are a few parts of our system which have subtle syntax requirements that I would like to point out in the hope of getting even greater uniformity in the event files which are produced. Most of these can be seen as style preferences, but a few will actually cause errors in the way the files are processed. Even the style examples are important, since I think it is highly desirable for the event files we release to be as consistent as possible. There are seven specific examples I wish to address:
Home runs - The concern here is to make sure we mark inside the park home runs correctly. When a scoresheet indicates, for example, a home run to left field, it is reasonable at first glance to enter it as HR7. However, following the rules of our system, that notation would literally mean home run fielded by the left fielder, which would only be the case for an inside the park home run. For home runs hit over the left field fence, the entry should be HR/7 which means home run fielded by no one (no number after the HR), hit to zone 7. The only problem will arise when we have an inside the park home run without having a fielder indicated, a situation which I have indeed encountered. My solution has been to choose a fielder arbitrarily (the 7 for right-handed batters and the 9 for left-handers) and enter HR7 or HR9 followed by a comment that I have invented the fielder.
Fielder's Choices and force plays - Almost everyone has done very nicely with following the distinction between these two events as I described them in the inputting instructions. The only difficulty I have seen is that I sometimes get back files in which a play was made on a runner other than the batter, but neither FC nor FO was indicated. For example, 64(1) and 52(3).B-1 are both incomplete. For these two cases, my preferences would be 64(1)/FO and FC5.3XH(52);B-1
Errors on which a fielder should get an assist - Under the interesting notions of fairness that permeate the official baseball rules, there are plays in which assists can be awarded even though no out was recorded. The standard example would be a ground ball to the second baseman who threw to the first baseman who dropped the ball. Under the logic that the second baseman had "done his job", he gets an assist and the first baseman gets an error. We enter this play simply as 4E3 and there is no problem. Confusion has occurred when the play in question is not on the batter, although I attempted to cover it in Subsection 5.3 on page 15 of the instructions. Example: runner on 1st and the batter hits a single to right field. The runner on 1st tries for 3rd and the throw beats him, but the third baseman drops the ball for an error. The temptation is to enter the play as S9.1-3(9E5), which will be rejected by DWENTRY. The proper notation is S9.1X3(9E5), which is less intuitive. The logic is that the software is designed only to award assists if the play appears to retire someone, hence the 1X3 part. Then a final check is done to see if there was an error that actually allowed him to be safe. Although the reverse logic would work, it would be much less efficient to check every occurrence of 1-3 for the possibility of a dropped throw, since this will be the unusual occurrence. A parallel use of this notation would be a dropped throw on a steal attempt, which is charged as a caught stealing and an error: CS2(2E4) would be an example.
Explicit flagging of double plays - This is usually only a problem on the strike out, caught stealing double play: K+CS2(26)/DP, where the /DP must be added to cause proper crediting of the double play. Note also that not all double plays on ground balls should be marked as /GDP, since that is an official category that requires each runner who is retired to be out at a base to which he was forced to run. Example of a double play that is not a GDP: Runner on 1st and ground ball hit to the third baseman who throws the batter out at 1st. However, the runner from 1st tries to make it to 3rd and the first baseman throws him out there. Our entry would be: 53/DP.1X3(53) which flags the DP, but does not ding the batter for a GDP.
The # sign - DWENTRY allows the # sign to be used at the end of any play for which there is an uncertainty, either of an advance, or the timing of a steal, or the identity of a fielder, or some other things. I prefer for inputters to use this key frequently whenever they aren't sure, and to follow the play with a comment explaining what the uncertainty is. In this way things are flagged so that it will be much easier to find them during the proofing stage.
Comments after the play - There is no inherent logic to putting the comment before the play to which it refers or after, but it is good practice to be consistent. The 11 years of data files held by the Baseball Workshop all follow the practice of putting the comments after the play. Therefore, I request that all of our inputters do the same.
Ground ball with outfield location - DWENTRY will complain if you enter a play as S/G8 since it would mean Single fielded by an unknown fielder (no number after the S) on a ground ball which went through the infield in zone 8. Since zone 8 is in the outfield, this wouldn't make sense. This play should be entered as S8/G showing that the center fielder played it and it was a ground ball through the infield. Problems arise when a scoresheet marks the single as 78G or something like that. We can't enter S78/G so we are forced to enter S/G followed by a comment that the hit was fielded in left-center.
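Several of the points above are mechanical enough to check automatically before a file is sent in. The Python sketch below is only an illustration of that idea. It is not how DWENTRY works, the function name check_play is made up, and the patterns cover just the specific examples discussed above rather than the full scoring syntax.

```python
import re

def check_play(play: str) -> list[str]:
    """Return warnings for one play string (illustrative rules only)."""
    warnings = []
    basic = play.split(".", 1)[0]  # the part before any runner advances

    # HR followed directly by a fielder number means the ball was fielded,
    # i.e. an inside-the-park home run. Over the fence should be HR/7 style.
    if re.match(r"HR\d", basic):
        warnings.append("HR with a fielder: confirm it was inside the park")

    # A fielding play that retires a runner, with neither FC nor FO marked.
    # Plays flagged as double plays are skipped in this simplified check.
    if (re.match(r"\d+\([123]\)", basic)
            and "FO" not in play and "FC" not in play and "DP" not in play):
        warnings.append("out made on a runner without FC or FO")

    # A safe advance (1-3) carrying a throw plus an error, as in 1-3(9E5).
    # The text asks for 1X3(9E5) so that the assist is credited.
    if re.search(r"[B123]-[123H]\(\d+E\d\)", play):
        warnings.append("dropped throw: use X rather than - on the advance")

    # Strikeout plus a completed caught stealing needs an explicit /DP.
    if re.match(r"K\+CS[23H]\(\d+\)", basic) and "/DP" not in play:
        warnings.append("strikeout, caught stealing double play needs /DP")

    # Ground-ball single located in an outfield zone with no fielder given.
    if re.match(r"S/G[789]", basic):
        warnings.append("ground ball with outfield zone: use S8/G style")

    return warnings

# Try the checker on the problem entries discussed above.
for example in ["HR7", "52(3).B-1", "S9.1-3(9E5)", "K+CS2(26)", "S/G8"]:
    print(example, "->", check_play(example))
```

The preferred forms from the text, such as HR/7, 64(1)/FO, S9.1X3(9E5), K+CS2(26)/DP and S8/G, pass through this sketch without a warning. An HR7 entry is still legal for a true inside the park home run, so that warning is only a reminder to confirm.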
1964, both leagues - This season had the closest pair of pennant races in history. Luke Kraemer is the main operator here, with assistance on the Yankees from Gary Robbins and Dave Lamoureaux; on the Phillies from Clem Comly and me; on the Dodgers, which I entered; and on the Mets, primarily input by Mark Dobrow.
1960 - Ron Rakowski is leading the work here to provide comparison data for his 1961 work. Ron Fisher has done the 1960 Yankees and I am working on the 1960 Dodgers.
Jackie Robinson's career - Inspired at least in part by the Spike Lee contact, this project has had a high priority. Gord Gladman has done 1947, 1948, 1950, and is working on 1951. Alan Boodman did 1949 and 1952, Ron Fisher 1953 and I entered 1955. Only 1954 and 1956 remain relatively untouched from his career.
Willie Mays career - Alan Boodman, who did two of the Robinson seasons as well as almost all of the 1982 National League, has been pursuing the career of Willie Mays in San Francisco. Unfortunately our holdings for his New York Giant days are virtually non-existent (except when they played the Dodgers).
1963 - Clem Comly, who also did almost all of the 1980 NL, has been working his way through the 1963 NL, except for the Giants that Alan Boodman entered and the Cardinals, which Jim Leopardi has been working on. Yes, Clem is even entering the Dodger games from that year! Dave Lamoureaux is following the same pattern with the 1963 AL, with many of the White Sox home games being entered by Ron Rakowski.
1975 - David Vincent entered the 1975 (and 1977) Twins games for Larry Hisle, who had good seasons those two years and was interested in detailed breakdowns. Larry is working as a coach in Toronto and David met Larry while visiting his friend Rich Hacker. Greg Beston completed the 1975 Red Sox and is about one third of the way through the 1975 Indians.
1959, 1928, 1929, 1930 - Among other projects Ron Fisher has worked on these seasons. The 1928-1930 games are all from the newspaper accounts I photocopied from the New York Evening World microfilm. Ron has even discovered pitch by pitch accounts in some of the games from Chicago in that era.
1948 Indians - Bob Stieglitz recently completed the 1949 Indians, which is a team he remembers from his childhood. He is now working on the 1948 team, which piqued the interest of the Indians when I told them about it.
World Series - Since October John Booth has been at the South Pole, where he works as a scientist. It has been dark there since March and John manages to find time to enter World Series games from the Neft and Cohen book and send them back to me via the Internet (they have a satellite link). As I write he has completed all the series back to 1913. He has newspaper accounts of the 1926 season from the New York Evening World and may turn to them when he finishes the Series games.
1981 Mariners - Jim Herdman has nearly completed the Mariners in this strike-interrupted season and plans to turn his attention to the 1981 Twins after that.
1969 Pilots - Christopher Chestnut is working on 1981 and 1982 while chipping away at the Pilots. We have all of their games except, you guessed it, those against the Tigers.
1917 - Bruce Borey and Chuck Voas are making remarkable progress through the games of the 1917 Indians, with the accounts coming from the Cleveland Plain Dealer. This appears to be the earliest season that the Plain Dealer had game accounts and they are occasionally incomplete. What is most impressive is that Chuck has developed a wonderfully systematic technique for deducing the plays of a game from the box score and the newspaper stories. He can usually identify all the hits, walks and strike outs and deal with a large majority of the generic outs, certainly enough to create event files that will be quite usable for analysis.
1957 Cardinals - Jim Wohlenhaus has spent a lot of time on this team, and was into May before the start of the 1995 season slowed him down a bit.
1962 Mets - Jesse Seegmiller recently completed a multi-year teaching job at the University of Singapore, where he learned about Retrosheet from the articles that appeared in Baseball America and Strat-Fan. He has completed their games through May 22 and has found some amazing things, as you might imagine (see Odd and Cute below). Jesse has been working from the scoresheets we copied from Harold Rosenthal and Dick Young; he is returning to the US this summer, to Utah to be exact. How's THAT for culture shock?
Rich Hacker career - Rich only played in 16 games, all in 1971 for the Expos. As noted, Rich is a personal friend of David Vincent's and David entered Rich's games to prepare a nice summary for Rich to have. In addition, all of us computer types would just love to have "Hacker" as a name!
1973 Mets and Phillies - Chris Long (Phillies) and Scott Fischthal (Mets) are two recent Retrosheet volunteers, but they have each started on projects involving 1973. Scott wants to complete the pennant-winning Mets and Chris hopes to work through Mike Schmidt's career.
Nolan Ryan - In the first edition I requested help with the Robinson games and Nolan Ryan games, hoping to have them all entered before he enters the Hall of Fame (please don't assault me for this comment. I am simply noting what I predict will occur; I am not endorsing or belittling his qualifications). We got about half of Nolan's career done, but then several of those volunteers working on Nolan began doing other projects, which is certainly understandable. Is anyone else interested in continuing with Ryan's career?
George Brett - David Vincent travels a lot for his job, and spends many hours in hotel rooms inputting games. Starting in May he began work on Brett's career by entering all Kansas City games from George's debut (August of 1973). As with Ryan, it would be great to have all of Brett's games entered before his likely induction so that we can possibly get some publicity for Retrosheet out of it.
These focused projects are really great, because they represent logical and attractive packages. Since all the games need to be entered eventually, I don't see any reason that we shouldn't work first on those that have special interest associated with them. I can envision other "sets", such as all no-hitters that we have (there are dozens in our files), all pennant-winners, all likely Hall of Fame inductees in the near future, etc. Speaking of the Hall of Fame, in the first edition of the newsletter, I said that we had the complete play by play records (on paper for all but one) of 28 Hall of Famers. That total was too optimistic; in fact we have the full careers for 23 of the enshrined. They are:
Luis Aparicio     Don Drysdale       Harmon Killebrew   Brooks Robinson
Johnny Bench      Rollie Fingers     Sandy Koufax       Mike Schmidt
Lou Brock         Bob Gibson         Juan Marichal      Tom Seaver
Roy Campanella    Catfish Hunter     Willie McCovey     Duke Snider
Rod Carew         Reggie Jackson     Jim Palmer         Billy Williams
Steve Carlton     Ferguson Jenkins   Gaylord Perry
On December 18, 1994 Retrosheet reached what was for me a very important milestone when I finished the computer entry of the last of Sandy Koufax's 397 games. He is the first Hall of Famer whose career we have entered and he was always my personal favorite, so it meant a lot to me.
In the first newsletter I referred to our dog, Merlin, in the intro to Clem Comly's parody. What I forgot to point out is how perfect the name is. Recall that in the Arthurian legends, Merlin the Magician knew the future because he lived by going backwards in time. As he put it, he "youthened." What name could be better than Merlin for the Retrodog?! Amy (my wife) wants me to assure you that Retrosheet was not the inspiration for her choice of his name (he really is more attached to her than to me).
In April Bob Stieglitz entered a wonderful comment at the end of the event file for a 1982 game between California and Chicago for which we had two particularly difficult scoresheets as sources, "I WANT OVERTIME FOR SCORING THIS MESS!" I left the comment in the file and responded to Bob, "I hereby double your pay."
The accomplishments of Luke Kraemer and Ron Rakowski in doing whole seasons single-handedly are very impressive. Without diminishing their work, I am also impressed with Retrosheet volunteers collectively for getting the 1983 and 1982 seasons so nearly done. There have been over 40 different translators and 30 different inputters who did at least one game in those two seasons. I was expressing my admiration for all of this to Ron Fisher on the phone one night and he said something like "If you start singing 'I want to teach the world to sing', I'll have to drive to Delaware and kill you." You don't think he'd really waste all that time and gas driving from Michigan just to terminate me for being sentimental, do you?
Luke Kraemer wants all of you to know that he has come into possession
of a number of Baseball Guides, I believe from the 1950s. If you
are interested in them, contact him at:
P.O. Box 1544
Beaverton, OR 97075-1544
Until then, I close with a phrase I stole from a recent Email message that Gary Robbins sent me: "Retroly Yours".
Page Updated: 9/6/96
Copyrighted: Retrosheet, 1996