The event files contain game descriptions using the Retrosheet scoring system. This page will describe the scoring system in sufficient detail to allow working with these full play-by-play descriptions.
While not part of the scoring system, the files containing the play-by-play data follow a naming convention. Current downloads contain a single file for each team-season containing all of the team's games for the season and having names of the form ALLTTTYY.EVX. In this format, YY is the last two digits of the year and TTT is a three character team code. The zip archive downloaded contains a file named TEAMYYYY that contains the team codes and team names in the particular season. Each file contains the home and away games in chronological order for the specified team. (Note: previously, files had names such as YYTTT.EVA and YYTTT.EVN, depending on league, for teams' home games and YYTTTRD.EVA and YYTTTRD.EVN for teams' road games.)
Files are ASCII text files consisting of a series of records. Each record is a single line starting with a type designator and ending with the DOS new line sequence (a carriage return followed by a line feed).
For each game as many as eleven different record types may be used. Each record type has a unique designator, which is followed by several fields separated by commas. These are discussed in detail below.
The record type is not considered to be a field and starts in column 1. Following the record type are the record fields which are separated from the record type and each other by commas ' , '.
Field data such as names are normally enclosed in double quotes ' " '. Commas used in quoted fields are not field separators.
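Because quoted fields may contain commas, splitting a record on every comma can break it apart in the wrong places. The following is a minimal sketch in Python that leans on the standard csv module instead. The function name parse_record is chosen for illustration and is not part of any Retrosheet tool, and the sketch assumes the quoting follows ordinary CSV rules and that the line is not empty.

```python
import csv

def parse_record(line):
    """Split one event-file line into (record_type, fields).

    Assumes a non-empty line whose quoted fields follow ordinary CSV rules.
    """
    # Drop the trailing DOS line ending before handing the text to csv.
    row = next(csv.reader([line.rstrip("\r\n")]))
    return row[0], row[1:]

record_type, fields = parse_record('com,"Not sure if PB, may have been balk"\r\n')
print(record_type, fields)
```

The comma inside the quoted comment stays part of the single data field rather than producing an extra field.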
Retrosheet player id. All players are represented by a code that is unique for each player. This 8 character code is constructed from the first four letters of the player's last name, the first initial of his common name, and a three digit number. If a player's last name is fewer than 4 characters long, a dash "-" is used as a placeholder. Numbers starting with 0 are used for players appearing in games in or after 1983. Players completing their careers before 1983 are assigned numbers starting with 100.
joner002 is the Retrosheet player id for Ruppert Jones.
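The construction rule can be sketched in a few lines of Python. This is an illustration only: the helper name make_player_id is hypothetical, the sequence number must be supplied by the caller, and the sketch assumes that the dash placeholder pads a short last name on the right and that names contain only plain letters.

```python
def make_player_id(last_name, first_name, number):
    """Build an id such as 'joner002' from name parts and a sequence number.

    Assumptions: short last names are padded on the right with '-', and
    names are plain letters (no spaces or apostrophes).
    """
    stem = last_name.lower()[:4].ljust(4, "-")
    return f"{stem}{first_name.lower()[0]}{number:03d}"

print(make_player_id("Jones", "Ruppert", 2))  # joner002
```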
id Each game begins with a twelve character ID record which identifies the date, home team, and number of the game. For example, ATL198304080 should be read as follows. The first three characters identify the home team (the Braves). The next four are the year (1983). The next two are the month (April) using the standard numeric notation, 04, followed by the day (08). The last digit indicates if this is a single game (0), first game (1) or second game (2) if more than one game is played during a day, usually a double header. The id record starts the description of a game, thus ending the description of the preceding game in the file.
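The fixed layout of the game ID makes it easy to take apart by character position. A small Python sketch follows; the function name parse_game_id is an illustrative choice, not an established API.

```python
from datetime import date

def parse_game_id(game_id):
    """Split a twelve character game id into (home_team, date, game_number)."""
    if len(game_id) != 12:
        raise ValueError(f"expected 12 characters, got {len(game_id)}")
    home_team = game_id[:3]
    # Characters 4-7 are the year, 8-9 the month, 10-11 the day.
    game_date = date(int(game_id[3:7]), int(game_id[7:9]), int(game_id[9:11]))
    game_number = int(game_id[11])
    return home_team, game_date, game_number

print(parse_game_id("ATL198304080"))
```

For ATL198304080 this yields the team code ATL, the date 8 April 1983 and game number 0.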
version The version record is next, but is obsolete and can be ignored.
info There are up to 34 info records, each of which contains a single piece of information, such as the temperature, attendance, identity of each umpire, etc. The record format is info,type,data . The complete list of info record types is given below.
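Since each info record carries one type and one data value, a game's info records map naturally onto a dictionary. The sketch below assumes the records have already been read as text lines; the function name collect_info and the sample lines are illustrative (the sky and windspeed lines are made up from values described in this document, not copied from a real game).

```python
import csv

def collect_info(lines):
    """Return a dict mapping each info type to its data string."""
    info = {}
    for row in csv.reader(lines):
        # Only records whose type designator is 'info' are collected.
        if row and row[0] == "info":
            info[row[1]] = row[2]
    return info

sample = [
    "info,sky,sunny",                                    # illustrative line
    "info,windspeed,-1",                                 # illustrative line
    'info,inputprogvers,"version 7RS(19) of 07/07/92"',
    'com,"ML debut for Behenna"',
]
print(collect_info(sample))
```

Values are kept as strings here; converting numeric fields is left to the caller.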
start and sub There are 18 (for the NL and pre-DH AL) or 20 (for the AL with the DH) start records, which identify the starting lineups for the game. Each start or sub record has five fields. The sub records are used when a player is replaced during a game. The roster files that accompany the event files include throwing and batting handedness information.
1. The first field is the Retrosheet player id, which is unique for each player.
2. The second field is the player's name.
3. The next field is either 0 (for visiting team), or 1 (for home team).
4. The next field is the position in the batting order, 1 - 9. When a game is played using the DH rule the pitcher is given the batting order position 0.
5. The last field is the fielding position. The numbers are in the standard notation, with designated hitters being identified as position 10. On sub records 11 indicates a pinch hitter and 12 is used for a pinch runner.
play The play records contain the events of the game. Each play record has 6 fields following the record type.
1. The first field is the inning, an integer starting at 1.
2. The second field is either 0 (for visiting team) or 1 (for home team).
3. The third field is the Retrosheet player id of the player at the plate.
4. The fourth field is the count on the batter when this particular event (play) occurred. Most Retrosheet games do not have this information, and in such cases, "??" appears in this field.
5. The fifth field is of variable length and contains all pitches to this batter in this plate appearance and is described below. If pitches are unknown, this field is left empty (nothing appears between the commas).
6. The sixth field describes the play or event that occurred.
A play record ending in a number sign, #, indicates that there is some uncertainty in the play. Occasionally, a com record may follow providing additional information. A play record may also contain exclamation points, "!" indicating an exceptional play and question marks "?" indicating some uncertainty in the play. These characters can be safely ignored.
com,"Not sure if PB, may have been balk"
The event is the most complex of all the fields and is described in detail below.
badj This record is used to mark a plate appearance in which the batter bats from the side that is not expected ("badj" means "batting adjustment"). The syntax is:
The expectation is defined by the roster file. There are two general cases in which this is used:
1. Many switch-hitters bat right-handed against right-handed knuckle ball pitchers even though the default assumption is that they would be batting left-handed.
badj,bonib001,R indicates that switch-hitter Bobby Bonilla was batting right-handed against a right-handed pitcher.
2. Occasionally a player will be listed in a roster as batting "R" or "L" but will bat the other way. For example, Rick Dempsey did this 13 times in 1983. The syntax for this is: badj,dempr101,L
padj This record covers the very rare case in which a pitcher pitches to a batter with the hand opposite the one listed in the roster file. To date this has only happened once, when Greg Harris of the Expos, a right-hander, pitched left-handed to two Cincinnati batters on 9-28-1995. The syntax is parallel to that for the badj record: padj,harrg001,L
ladj This record is used when teams bat out of order.
data Data records appear after all play records from the game. At present, the only data type, field 2, that is defined specifies the number of earned runs allowed by a pitcher. Each such record contains the pitcher's Retrosheet player id and the number of earned runs he allowed. There is a data record for each pitcher that appeared in the game.
com The final record type is used primarily to add explanatory information for a play. However, it may occur anywhere in a file. The second field of the com record is quoted.
com,"ML debut for Behenna"
There is a standard record ordering for each game. An id record starts the description of a particular game. This is followed by the version and info records. The start records follow the info records. The game is described by a series of play, sub and com records. A sub record is always preceded by a play np record. data records follow the last play record for the game. A game description is terminated by an id record starting another game or the end of the file.
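Because an id record both opens a game and closes the previous one, a file can be divided into games with a single pass. The Python sketch below illustrates the idea; split_games is a hypothetical helper, and the sample lines (including the second game id and the version values) are made up for the demonstration.

```python
def split_games(lines):
    """Group record lines into games; each id record opens a new game."""
    games = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("id,"):
            games.append([line])        # a new game starts here
        elif games and line:
            games[-1].append(line)      # belongs to the game in progress
    return games

sample = [
    "id,ATL198304080\r\n",
    "version,1\r\n",                    # illustrative value
    "play,1,0,joner002,??,,NP\r\n",     # illustrative play record
    "id,ATL198304090\r\n",              # made-up second game
    "version,1\r\n",
]
games = split_games(sample)
print(len(games), [len(g) for g in games])  # 2 [3, 2]
```

Lines that appear before the first id record are ignored in this sketch.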
Complete records are shown. info records are of two general kinds, game-related and administrative. These records appear after the game id, but not necessarily in the order shown below. Game-related info records are:
The home and visiting teams are specified by their Retrosheet team codes.
The date is given in conventional yyyy/mm/dd style:
The number record indicates if this is a single game (0), first game (1) or second game (2) if more than one game is played during a day, usually this is a double header:
The hometeam, date and number records duplicate the information in the id record.
Game starting time is given by the two records (0:00 and unknown indicate missing information):
Use of the designated hitter is indicated with true or false:
The presence or absence of pitch information is given. For some games, the ball-strike counts of the plays are shown, but no pitch detail is provided. (pitches, count or none):
Each umpire and his position on the field are indicated individually by his Retrosheet ID. For games where umpires are stationed in the outfield, umplf and umprf are used. Retrosheet has umpire assignments for all games in history, except some games in 1979 in which replacement umpires were used.
Various field conditions are given:
Values used for fieldcond are: dry, soaked, wet, unknown;
for precip: drizzle, none, rain, showers, snow, unknown;
for sky: cloudy, dome, night, overcast, sunny, unknown;
for winddir: fromcf, fromlf, fromrf, ltor, rtol, tocf, tolf, torf, unknown.
Temp(erature) is in degrees Fahrenheit with 0 being the not known value.
An unknown windspeed is indicated by -1.
The BGAME.EXE program outputs these fields using numeric codes:
FieldCond: 0 Unknown, 1 Soaked, 2 Wet, 3 Damp, 4 Dry
Precip: 0 Unknown, 1 None, 2 Drizzle, 3 Showers, 4 Rain, 5 Snow
Sky: 0 Unknown, 1 Sunny, 2 Cloudy, 3 Overcast, 4 Night, 5 Dome
WindDir: 0 Unknown, 1 ToLeft, 2 ToCenter, 3 ToRight, 4 LeftToRight, 5 FromLeft, 6 FromCenter, 7 FromRight, 8 RightToLeft
WindSpeed: 0 Unknown, 1 Known, other value is the wind speed
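The numeric codes listed above can be held in lookup tables indexed by code. This is a sketch only: the table and function names are illustrative, the fallback to "Unknown" for out-of-range codes is an assumption of the sketch, and WindSpeed is left out because its value is mostly the speed itself rather than a code.

```python
# Tables indexed by the numeric code, in the order listed above.
FIELD_COND = ("Unknown", "Soaked", "Wet", "Damp", "Dry")
PRECIP = ("Unknown", "None", "Drizzle", "Showers", "Rain", "Snow")
SKY = ("Unknown", "Sunny", "Cloudy", "Overcast", "Night", "Dome")
WIND_DIR = ("Unknown", "ToLeft", "ToCenter", "ToRight", "LeftToRight",
            "FromLeft", "FromCenter", "FromRight", "RightToLeft")

def decode(table, code):
    """Look up a numeric code; out-of-range codes fall back to 'Unknown'
    (an assumption of this sketch, not documented behavior)."""
    return table[code] if 0 <= code < len(table) else "Unknown"

print(decode(SKY, 4), decode(WIND_DIR, 8))  # Night RightToLeft
```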
The length of the game in minutes and the attendance (0 used if these are not known) are given:
The game site is provided. The site symbols are defined in the file parkcode.txt:
Pitcher win, loss and save data are given as info records. The Retrosheet player id is used for identification. If no save is credited, the player id field is empty.
When it was used as an official statistic, game winning RBI credit is given:
If this information is unknown or a gwrbi was not credited, the data field is left empty.
info records that pertain to how the game account was obtained and processed (administrative data) are:
info,inputprogvers,"version 7RS(19) of 07/07/92"
synopsis: play,inning,home/visitor,player id,count,pitches,event
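The synopsis translates directly into a small record type. The Python sketch below is one possible reading of it: the class and function names are illustrative, the sample line is made up for the demonstration, and the sketch assumes play records contain no quoted fields, so a plain split on commas is enough.

```python
from typing import NamedTuple

class Play(NamedTuple):
    inning: int
    is_home: bool      # False for the visiting team (0), True for home (1)
    batter_id: str
    count: str         # "??" when the count is unknown
    pitches: str       # empty when pitches are unknown
    event: str

def parse_play(line):
    """Parse one play record; assumes the record has no quoted fields."""
    parts = line.rstrip("\r\n").split(",")
    # The record type plus six fields gives seven comma-separated items.
    if parts[0] != "play" or len(parts) != 7:
        raise ValueError(f"not a play record: {line!r}")
    return Play(int(parts[1]), parts[2] == "1", parts[3], parts[4], parts[5], parts[6])

print(parse_play("play,5,1,hortw101,??,,S7"))  # made-up sample line
```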
The fifth field, pitches, is a string of variable length and contains all pitches to this batter in this plate appearance. Most Retrosheet games do not have pitch data and consequently this field is blank for such games.
+ following pickoff throw by the catcher
* indicates the following pitch was blocked by the catcher
. marker for play not involving the batter
1 pickoff throw to first
2 pickoff throw to second
3 pickoff throw to third
> indicates a runner going on the pitch
B ball
C called strike
F foul
H hit batter
I intentional ball
K strike (unknown type)
L foul bunt
M missed bunt attempt
N no pitch (on balks and interference calls)
O foul tip on bunt
P pitchout
Q swinging on pitchout
R foul ball on pitchout
S swinging strike
T foul tip
U unknown or missed pitch
V called ball because pitcher went to his mouth
X ball put into play by batter
Y ball put into play on pitchout
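A pitch string can be expanded into readable descriptions with a lookup table built from the codes above. The following Python sketch skips the marker characters (pickoff throws, the runner-going marker and so on) and describes only the lettered codes. The function name and the sample string "CB1>FX" are illustrative and not taken from a real game.

```python
PITCH_CODES = {
    "B": "ball", "C": "called strike", "F": "foul", "H": "hit batter",
    "I": "intentional ball", "K": "strike (unknown type)", "L": "foul bunt",
    "M": "missed bunt attempt", "N": "no pitch", "O": "foul tip on bunt",
    "P": "pitchout", "Q": "swinging on pitchout", "R": "foul ball on pitchout",
    "S": "swinging strike", "T": "foul tip", "U": "unknown or missed pitch",
    "V": "called ball because pitcher went to his mouth",
    "X": "ball put into play by batter", "Y": "ball put into play on pitchout",
}
MARKERS = set("+*.123>")  # annotations that are not pitch codes

def describe_pitches(pitch_field):
    """Return descriptions of the lettered codes, skipping marker characters."""
    return [PITCH_CODES[ch] for ch in pitch_field if ch not in MARKERS]

print(describe_pitches("CB1>FX"))
```

An unrecognized character raises a KeyError in this sketch, which makes unexpected codes easy to spot.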
The sixth field, event, describes the play which occurred. This field is variable in length and has three main portions which define the Retrosheet scoring system.
The first part of an event is a description of the basic play.
The second part is a modifier for the first part and is separated from it with a forward slash, "/". In fact, there may be more than one modifier. A typical use of modifiers is to specify hit locations. For example, "D8/78" indicates a double fielded by the center fielder on a ball hit to left center. A complete list of modifiers excepting hit locations is given below. When more than one modifier is used, each is introduced by a "/".
The third part describes the advance of any runners, separated from the earlier parts by a period. A successful advance is indicated by a dash, "-". An out made while advancing is indicated by an X. 2-3 indicates a runner has advanced from second to third on the play. 1X2 indicates the runner was out at second advancing from first. If a base runner is not listed as advancing he remains on the base he was on. In some cases lack of advance is indicated explicitly by an advance starting and ending on the same base such as 3-3 . When put outs are made on base runners the advance field indicates fielding data and errors if they occur. See below for a complete description for advances. Note that any advances after the first are separated by semicolons.
For example, the event "S9/L9S.2-H;1-3" should be read as: single fielded by the right fielder, line drive to short right field. The runner on 2nd scored (advanced to home), and the runner on first advanced to third.
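The three portions of an event can be separated mechanically. The sketch below is a hedged illustration in Python: the helper names are hypothetical, and it assumes the first period in the event always introduces the advances. Separators inside parentheses are ignored, because parenthesized parameters such as (5/INT) can themselves contain a slash.

```python
def split_top_level(text, sep):
    """Split text on sep, ignoring separators inside parentheses."""
    parts, depth, current = [], 0, ""
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return parts

def split_event(event):
    """Return (basic_play, modifiers, advances) for an event string."""
    play_part, _, advance_part = event.partition(".")
    basic, *modifiers = split_top_level(play_part, "/")
    advances = split_top_level(advance_part, ";") if advance_part else []
    return basic, modifiers, advances

print(split_event("S9/L9S.2-H;1-3"))  # ('S9', ['L9S'], ['2-H', '1-3'])
```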
Many event descriptions require information in the form of numbers. The meaning of a particular number depends on where it appears in the event. For the descriptions that follow the following notation will be used:
Fielders will be represented by a number in the range 1 (pitcher) to 9 (right fielder) using a dollar sign, "$". When two $ symbols are used, $$, this is understood to mean a sequence of two or more fielders.
Bases are represented by a percent sign, "%", representing one of five characters, 1, 2 and 3 for first through third; B or H for home. B is used when a batter advance must be explicitly given. Scoring is indicated by an advance that reaches home, H.
Many examples of plays scored using the Retrosheet system will be given in this document. For some interesting and extreme cases check the Retrosheet strange and unusual plays listing.
The example plays have been chosen to illustrate how events are coded. Some of these events are exceedingly rare.
There is occasionally more than one event for each plate appearance, such as stolen bases, wild pitches, and balks in which the same batter remains at the plate. On these occasions the pitch sequence is interrupted by a period, and there is another play record for the resumption of the batter's plate appearance.
For purposes of description, it is convenient to separate the event types into two categories: those involving the batter at the plate and base running plays that do not involve the batter.
$ A single fielder represents a fly ball out made by the specified fielder. Modifiers can be added to indicate the batted ball trajectory: G for ground ball, L for line drive, P for pop up, F for fly ball, BG for bunt grounder, BP for bunt pop up. The ball trajectory code may be followed by a hit location code.
indicates a fly ball caught by the center fielder in left center field.
A sacrifice fly is indicated by the modifier SF following a fly out play. The runner scoring because of the sacrifice is coded in the advance part of the play.
In the case that a fielder makes an unassisted out on a ground ball a modifier G follows the event.
indicates an unassisted out made by the first baseman with the runner on second advancing to third.
$$ Strings of two or more fielders as an event specify a ground out where the put out is credited to the last fielder in the string. Other fielders are credited with assists.
indicates a ground ball out at first on a ball fielded by the shortstop.
More than one player can touch the ball before an out is made. In this case, the pitcher has deflected the ball before the second baseman threw to first base.
If the putout is made at a base not normally covered by the fielder the base runner, batter in this example, is given explicitly.
Force outs are indicated by adding the FO modifier and indicating the base runner forced.
The runner on first is forced at second by a throw from the third baseman. The runner on third scores and the batter is safe at first. The explicit advance indicated for the batter is optional. A second modifier is used to indicate the batted ball trajectory and location.
With the addition of a SH modifier this form is used to indicate sacrifice hits or bunts that advance a runner.
$(%)$ $$(%)$ Events of this form are used to code grounded into double plays.
indicates a grounded into double play. The parenthesized 1 indicates the base runner on first was the initial out on the play. The GDP modifier is followed by another / and a hit type and location.
An unassisted ground ball out by the second baseman starts this double play.
$(B)$(%) followed by the modifier LDP is used to indicate a lined into double play.
indicates a fly ball out to the center fielder with the runner on second doubled up.
indicates an unassisted double play by the first baseman who fielded the line drive and caught the runner off first base.
The double play notation can be extended in obvious ways to describe triple plays.
The double digit combination 99, which cannot arise in play, is used to code unknown plays, including forms that otherwise describe force outs and double plays. Additional fielders in the double play are assigned 9. No assist or putout credits are given.
C/E2 codes catcher interference. Implicitly, the batter is awarded first unless overridden by an advance indicating otherwise. A redundant B-1 is allowed.
C/E1 or C/E3 are used when the pitcher or first baseman is called for interfering with the batter, which puts the batter on first without charging him with an at bat. In these cases C is interpreted as interference by the fielder specified following the E, not the catcher.
A hit (excepting a home run) is indicated by one of S, D and T optionally followed by the fielder, $, initially handling the ball. If more than one fielder handles the ball the appropriate sequence of fielders is given. The fielder designation is omitted if that information is not known. The batter advance to the designated base is implicit.
is a minimal coding of a single showing that the left fielder first handled the ball. The ?? in the count field indicates the count at the time of the hit is unknown.
codes a bases loaded double fielded by the left fielder, a modifier showing the hit location code and advances for each of the base runners.
describes a triple to right field, a hit location and a runner on second scoring.
DGR is the code for a ground rule double. No fielding player is specified.
E$ is the code for an error allowing a batter to get on base. The fielder making the error is given by $. The batter advance to first is implicit but may be given explicitly.
indicates a throwing error (modifier "/TH") on the pitcher with the runner on first advancing to third. The batter advance to first is implicit.
indicates a fielding error by the first baseman. In this case the batter advance to first has been explicitly given.
FC$ Fielder's choice. $ is the fielder first fielding the ball. The batter advance to first is understood if it is not given explicitly.
The first baseman fielded the ball and threw home in time to retire the runner attempting to score. The batter was safe at first.
The first baseman fielded the ball and attempted to throw an unspecified runner out. No outs were made and the batter is safe at first.
Note that even though force outs are considered fielder's choices, the notation distinguishes between force outs and non-forced fielder's choices.
FLE$ Error on foul fly ball.
H or HR is the code for a home run leaving the park. The location modifier can be used to indicate where the ball left the playing field.
indicates a solo home run into left field.
shows a home run into center field with the runners on first and second scoring.
H$ or HR$ indicates an inside-the-park home run by giving a fielder as part of the code.
HP Batter hit by a pitch. The batter advance to first is implicit. Other advances are given if needed.
K Strike out
A dropped third strike with a putout at first base is given by the event K23.
K+event On third strikes various base running plays may also occur. The event can be SB%, CS%, OA, PO%, PB, WP and E$.
A passed ball on strike three allowed the runner on first to go to second.
An explicit batter advance is given when he reaches first on a third strike miscue. An alternative notation for WP and PB is given below.
Of course, a base running event can occur when the third strike is dropped.
NP no play. This event is used as a marker when substitutions are made.
I or IW intentional walk
W walk. In both cases base runner advances are given if needed. The batter advance to first base is implicit.
W+event, IW+event. On ball four various base running plays may also occur. The event can be SB%, CS%, PO%, PB, WP and E$.
The fourth ball was a wild pitch allowing the runner on second to advance.
The player specified in these plays is the batter at the plate, not the base runner or runners affected by the play.
The play pitches and count fields (if given) are for the batter at the time of the event. Unless the event is an inning-ending or game-ending out, it will be followed by another event listing the batter.
BK indicates a balk.
CS%($$) is the event code for caught stealing. The bases, %, for this play are 2,3 and H. The fielding data, $$, is considered part of the play. Other advances may be given.
The error negates the out with the advance field indicating a two base advance on the play.
DI is the defensive indifference code and is given when there is no attempt to prevent a stolen base. The advance field specifies which base the runner went to.
OA is coded for a base runner advance that is not covered by one of the other codes. A comment may be given explaining the advance.
com,"Thompson out trying to advance after ball eluded catcher"
PB passed ball.
WP wild pitch. In both cases the catcher is unable to handle a pitch and a base runner advances.
PO%($$) picked off of base % (1, 2 or 3) with the ($$) indicating the throw(s) and fielder making the putout.
indicates the runner on second was out by a pick off throw from the pitcher to second baseman.
shows an attempt at a pick off at first with the first baseman committing an error that allows the runner to advance to second. The presence of the error (E3) negates the out normally associated with the pickoff play.
POCS%($$) picked off of base % (1, 2 or 3) with the runner charged with a caught stealing. The ($$) is the sequence of throws resulting in the out.
SB% is the event code for a stolen base. The bases, %, for this play are 2,3 and H.
show double steals, second and third in one case, second and home in the other.
Each modifier is preceded by / in a play record. As always, % indicates one of the four bases and $ indicates a fielder.
BF fly ball bunt
BG ground ball bunt
BGDP bunt grounded into double play
BL line drive bunt
BP bunt pop up
BPDP bunt popped into double play
BR runner hit by batted ball
C called third strike
DP unspecified double play
E$ error on $
FDP fly ball double play
FO force out
G ground ball
GDP ground ball double play
GTP ground ball triple play
INT interference
L line drive
LDP lined into double play
LTP lined into triple play
P pop fly
R$ relay throw from the initial fielder to $ with no out made
SF sacrifice fly
SH sacrifice hit (bunt)
TH throw
TH% throw to base %
TP unspecified triple play
In addition to base runner movements, the advance portion of an event records fielding credits and errors, and carries indicators showing whether a run is unearned and whether an RBI is or is not credited.
Bases are represented by one of five characters, 1 for first, 2, 3 and B or H for home. B is used when a batter advance must be explicitly given. Scoring is indicated by a successful advance that reaches home, H.
Separate advances are given for each runner on base and are separated by a semicolon, ";". When more than one runner advance is given for a play they are ordered starting with the runner on third base and ending with the batter.
Advances may include additional information in the form of one or more parameters specified as parenthesized strings of characters. When more than one parameter is given on an advance they are individually parenthesized.
A successful advance is given in the form 1-2. The dash "-" indicates a successful advance. Multiple base advances are indicated with the same notation: B-2, 1-3, 1-H, 2-H.
A runner put out at a particular base is indicated by the "X": 2X3, 1XH. When a runner is out the advance gives the fielding information as a parameter specifying the fielders. The last fielder gets credit for the put out and the others get assists.
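A single advance can be recognized with a regular expression that captures the starting base, the dash or X, the ending base and any parenthesized parameters. The Python sketch below is illustrative: parse_advance is a hypothetical helper, and the fielding string in "1X2(26)" is made up for the demonstration. The third value reports only whether the advance is written with an X; an error parameter can still negate the out.

```python
import re

# start base, '-' or 'X', end base, then zero or more "(...)" parameters
ADVANCE = re.compile(r"^([B123])([-X])([123H])((?:\([^()]*\))*)$")

def parse_advance(text):
    """Return (start, end, marked_out, parameters) for one advance."""
    match = ADVANCE.match(text)
    if match is None:
        raise ValueError(f"unrecognized advance: {text!r}")
    start, kind, end, params = match.groups()
    # marked_out is True for an X advance; an error parameter may negate the out.
    return start, end, kind == "X", re.findall(r"\(([^()]*)\)", params)

print(parse_advance("2-3"))      # ('2', '3', False, [])
print(parse_advance("1X2(26)"))  # ('1', '2', True, ['26'])
```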
Fielding errors are indicated by including an E in the parameter following an advance. The fielder following the E is charged with the error.
Following a second baseman error the batter is safe at second. The error indicator negates the out. The left fielder is credited with an assist.
The parameter in this play attributes a throwing error to the third baseman. A base indicator may follow TH, TH2 for example.
Parameters are used to indicate if a run is unearned (UR) and if RBI is to be credited (RBI) or not (NR), (NORBI). When these parameters are not present, normal rules are followed.
The run scored on the passed ball is not credited as an RBI to the batter.
Three parameters are given on the 2-H advance. The first indicates a second baseman throwing error, the second indicates it is an unearned run and the third indicates no RBI.
In this play an RBI is given to the batter.
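Once the parameters of an advance have been separated, they can be sorted into error, unearned-run and RBI information. The sketch below is a simplified illustration in Python: summarize_parameters is a hypothetical helper, it takes the parameters as a list of strings without their parentheses, and it treats any "E" followed by a digit as an error charged to that fielder. A value of None for rbi means no RBI parameter was present, so normal rules apply.

```python
import re

def summarize_parameters(params):
    """Classify advance parameters into error, unearned-run and RBI flags."""
    summary = {"errors": [], "unearned": False, "rbi": None}
    for param in params:
        if param == "UR":
            summary["unearned"] = True
        elif param == "RBI":
            summary["rbi"] = True
        elif param in ("NR", "NORBI"):
            summary["rbi"] = False
        else:
            # Fielding strings such as "E4/TH": collect each charged fielder.
            summary["errors"].extend(re.findall(r"E(\d)", param))
    return summary

print(summarize_parameters(["E4/TH", "UR", "NR"]))
```

For the three parameters shown, the sketch reports an error on fielder 4, an unearned run and no RBI.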
Interference can be indicated with an advance parameter. An alternative way of writing this is (5/INT).
com,"$Gonzalez out for grabbing coach on way back to 3B"
Team unearned runs are indicated by TUR in cases where more than one pitcher appears in the inning and the current pitcher is to be charged with an earned run.
A U appearing in a fielding sequence indicates the fielder handling the ball is unknown.
In the 8U3 sequence most likely the U is the shortstop or second baseman.
Advance parameters provide an alternative way of indicating wild pitches and passed balls.
ladj. This record is used when teams bat out of order. The normal assumption is that the proper lineup sequence is followed; therefore, some special indication is needed when it is violated. The format is:
where "hv" is 0 for visiting or 1 for the home team and "pos" is 1-9 for the batting order position. Retrosheet has discovered quite a few cases of batting out of turn. You can see them in the Special Lists section: Batting Out of Turn. Here is a half inning of a typical, messy case from the game of 4-22-1980, Oakland at Seattle (game id: SEA198004220):
Interpretation: Willie Horton (hortw101) was the 5th place batter and batted at the correct time, grounding out. The proper person to bat next was Bill Stein, listed 6th in the starting lineup. However, Joe Simpson came up instead and doubled. Since Simpson was in the 7th spot in the batting order, the ladj record is needed to make this clear. So "ladj,1,7" means that the next batter will be the 7th place batter for the home team, no matter what was expected. In this case he doubled. According to the rules, once the improper batter completes his appearance, then he automatically becomes the correct batter. Therefore, the next batter should be the 8th place batter. However, the Mariners sent up Stein, who was in the 6th spot. The "ladj,1,6" record tells the user that the home team's 6th place batter is now up. He was hit by a pitch, and since his appearance was not challenged, the next proper batter is the 7th place batter. However, that is Simpson, who is on 2nd base! In the event, Larry Cox, the 8th place batter came up. The "ladj,1,8" record tells us that is who is batting. After his single, the proper batter is the 9th spot and that is what the Mariners followed, using Leon Roberts as a pinch-hitter for Mario Mendoza.
Conclusion: The effect of these manipulations is that the 6th and 7th place batters traded spots in this inning. It was not noticed by the A's and the plays went unchallenged. By the time the Mariners went around the order again, the mistake had been detected so that Stein and Simpson batted in the correct spots for the rest of the game.
In the fifth inning of Baltimore at the Seattle Pilots (SE1196905280), a game-long sequence of out of order batting ends:
com,"$Davis is called out for batting out of order;"
com,"he doubled in 2 runs which triggered the protest;"
com,"since Simpson was the one due up, he was charged with the out"
The simple fly out to catcher play, the second out for Simpson in the inning, also provides the rules satisfying catcher putout when the Pilots finally protested.
Note that every batting out of turn situation has its own character, including whether or not it is detected by the opposition and whether or not the incorrect batter makes an out or reaches safely.