This makes it possible for us to select all of the New York City ZIP Codes from the crosswalk, and will allow us to append the county codes to every record in the IRS file, so that we can summarize data by counties and so that users will be able to select records by county. Since some ZCTAs have no residential population, the option to include records with zero population is checked.
Once the file was downloaded and opened in a spreadsheet, duplicate records were deleted so that a ZCTA was assigned to only one county. A Python 3 script was written to serve two purposes: create a subset of the IRS file that contains just records for NYC, and create a summary table that counts the number of records for each county by type of organization. The function takes four arguments from the user. The first is the name of the text file that contains the ZIP Code to county relationship.
The second is a month and year combination (for example, a value beginning with "mar") that identifies which folder to draw the IRS data from and is appended to the name of each output file. The third is the two-letter state code that identifies which state-based IRS file to process. The last is a place name, chosen by the user, that generally describes the location of the records. The os.path module from the Python standard library is imported first; it allows cross-platform paths for reading and writing files to be constructed from the parameters supplied by the user and static strings.
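The path construction described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name build_paths, the folder layout, and the file naming patterns (such as eo_ny.csv) are all assumptions made for the example.

```python
import os

def build_paths(crosswalk_name, monthyear, state, place):
    """Build input and output paths from the four user-supplied arguments.

    The folder layout and file name patterns below are hypothetical.
    """
    # Assumed: the IRS state file sits in a folder named after the month/year.
    irs_path = os.path.join(monthyear, 'eo_' + state + '.csv')
    # Assumed: both output files carry the place name and the date.
    subset_path = os.path.join(monthyear, place + '_orgs_' + monthyear + '.txt')
    summary_path = os.path.join(monthyear, place + '_summary_' + monthyear + '.txt')
    return crosswalk_name, irs_path, subset_path, summary_path
```

Because os.path.join inserts the separator appropriate to the operating system, the same call works unchanged on Windows, macOS, and Linux.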
The first part of the program defines some general functions that will be called subsequently. The second part of the program creates the subset file that has the individual organization records for our area of interest.
The ZIP to county relationship text file is read in as a Python dictionary called zipcounty using DictReader (from Python's csv module). In other programming languages the dictionary data structure is known as a hash or an associative array, but the concept is the same. The IRS file is read in next. Each record is parsed using commas and stored as an individual list within keeplist, which is a master, nested list.
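As a rough sketch of this step, the two files can be read as shown below. In-memory text stands in for the real files, the records are made up, and the column names (zcta, county, ZIP) are assumptions; filtering the IRS records against the lookup while reading is one plausible way to build keeplist, not necessarily how the original script does it.

```python
import csv
import io

# Made-up stand-ins for the two input files; column names are assumptions.
zip_text = io.StringIO("zcta,county\n10010,36061\n11201,36047\n")
irs_text = io.StringIO(
    "EIN,NAME,ZIP\n"
    "000000001,EXAMPLE ORG A,10010-0000\n"
    "000000002,EXAMPLE ORG B,12345-0000\n"
)

# Build the ZIP-to-county lookup: one dictionary entry per row.
zipcounty = {row['zcta']: row['county'] for row in csv.DictReader(zip_text)}

# Read the IRS file; each record becomes a list of field values.
reader = csv.reader(irs_text)
header = next(reader)
zip_index = header.index('ZIP')

# Keep only records whose five-digit ZIP appears in the lookup.
keeplist = [rec for rec in reader if rec[zip_index][:5] in zipcounty]
```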
An individual record is a list of field values. This format makes it easy for the program to loop through keeplist and select the element that contains the ZIP Code by using the index number for that element (in this case the ZIP Code is index number 6; in Python the first index is always 0). Subsequently we modify keeplist to create a five-digit ZIP Code for each record as a new list element, and we get the associated county code for that ZIP from the zipcounty dictionary and append it as a new element.
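The modification of keeplist might look something like the sketch below. The records and the column names in header are invented for illustration; only the ZIP Code's position (index 6) comes from the text. The new header names ZIP5 and COUNTY are likewise placeholders.

```python
ZIP_INDEX = 6  # position of the ZIP Code in each record, per the text

# Illustrative lookup and records; every value here is made up.
zipcounty = {'10010': '36061', '11201': '36047'}
header = ['EIN', 'NAME', 'ICO', 'STREET', 'CITY', 'STATE', 'ZIP']  # assumed names
keeplist = [
    ['000000001', 'EXAMPLE ORG A', '', '1 EXAMPLE ST', 'NEW YORK', 'NY', '10010-0000'],
    ['000000002', 'EXAMPLE ORG B', '', '2 EXAMPLE AVE', 'BROOKLYN', 'NY', '11201-0000'],
]

# Add headers for the two new elements.
header.extend(['ZIP5', 'COUNTY'])

for record in keeplist:
    zip5 = record[ZIP_INDEX][:5]      # first five characters of the ZIP field
    record.append(zip5)               # new element: five-digit ZIP Code
    record.append(zipcounty[zip5])    # new element: county code from the lookup
```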
We also have to make sure we add headers for our two new elements in the header list. The list format also makes it simple to write the output; keeplist is passed through a simple function that writes the column header and each record out on its own line, with each element separated by a tab. The third and final part of the program creates a summary of the records. Most of the organizations in the Masterfile are classified as 501(c)(3) (abbreviated in the data as 03), which includes most public charities and private foundations.
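The tab-delimited writer mentioned above could be as simple as the following sketch. The function name write_records is hypothetical, and an in-memory buffer is used in place of a real output file so the example runs on its own.

```python
import io

def write_records(outfile, header, records):
    """Write the header and each record on its own line, tab-separated."""
    outfile.write('\t'.join(header) + '\n')
    for record in records:
        outfile.write('\t'.join(record) + '\n')

# Usage with made-up data; a real run would pass an open file object.
buffer = io.StringIO()
write_records(buffer,
              ['EIN', 'ZIP5', 'COUNTY'],
              [['000000001', '10010', '36061']])
```

This relies on every element being a string, which holds when the records come straight from a text file.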
This part of the program counts records by EO Code for each county. First, the county codes in the zipcounty dictionary are converted to a data structure called a set, which gives us a unique collection of every county code in our area of interest. That set is then converted into a sorted list; this conversion is necessary because sets are unordered and so cannot hold their elements in a sorted sequence. Next, a dictionary is created for each of the EO Codes, where the code is the key and the value is a list with a zero as the only element.
A second dictionary is created to hold the sum total of records by county. Each EO Code's list is then extended so that it holds one element per county. In this instance, since there are five counties in NYC, each EO Code in the dictionary eocodes will have a list value with five elements, and each element will hold a count of the number of records for one county. Once the dictionaries are ready, the program can loop over the keeplist of organization records, incrementing the count for each record's EO Code and county. Once that value is accounted for, the sum total dictionary is updated as well.
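The counting logic can be sketched as follows. To keep the example short, each record is reduced to a made-up (EO Code, county code) pair rather than a full keeplist record, only two counties are used, and the lists are created at full length on first use instead of being extended from a single zero. The name totals is an assumption; eocodes and zipcounty follow the text.

```python
# Illustrative lookup covering two counties.
zipcounty = {'10010': '36061', '10011': '36061', '11201': '36047'}

# Made-up (EO Code, county code) pairs standing in for keeplist records.
records = [('03', '36061'), ('03', '36047'), ('04', '36061'), ('03', '36061')]

# Unique county codes, sorted so each county has a stable list position.
counties = sorted(set(zipcounty.values()))

eocodes = {}                                  # EO Code -> one count per county
totals = {county: 0 for county in counties}   # county -> total records

for code, county in records:
    if code not in eocodes:
        eocodes[code] = [0] * len(counties)   # one zero per county
    position = counties.index(county)
    eocodes[code][position] += 1
    totals[county] += 1
```

Sorting the county codes once and reusing their list positions is what keeps every EO Code's counts aligned with the same column order on output.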
The remainder of the program formats the data for output. A list from the Census Bureau of all the county codes and their names [Census] is read in as a dictionary using DictReader, and the names are used as column headers. Additional headers are inserted for the EO Code number and for a description of the categories; the category names are also read into a dictionary using DictReader and are inserted into the eocodes dictionary in front of the county totals for each code.
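Because each value in the dictionary is a list, converting it to a string for output includes the list's square brackets. As a hack we can strip out the brackets, and by default the remaining commas will serve as delimiters. The date and place name supplied by the user are appended to the output file name.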
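A small sketch of the bracket-stripping hack, using a made-up summary row. It also shows a join-based alternative, which is offered here only for comparison and is not taken from the original script.

```python
# Made-up summary row: EO Code, category name, then two county counts.
row = ['03', 'Charitable organizations', 12, 7]

# The hack: convert the list to a string and strip the square brackets.
hacked = str(row).strip('[]')

# Alternative: join the elements explicitly, converting each to a string.
joined = ','.join(str(item) for item in row)
```

Note that the hack keeps the quotation marks Python places around string elements, whereas the explicit join writes the bare values.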
Originally a Google product called Google Refine, the project was reorganized and released as open source. It is a desktop application that works within a web browser. You connect to a data file, which can be stored in any number of formats, and operate on it in a spreadsheet-like view. For this project the text-facet tool was used to group records into categories based on city name, which gave us the ability to see every single variant and error associated with this field so that they could be corrected.
OpenRefine has a cluster tool that uses heuristic algorithms to select clusters of text values that it considers similar enough to possibly be identical, and the end user has the ability to correct them all en masse. For items the cluster tool does not catch, the user can select individual facets and correct all the records that use that facet.
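To give a sense of how such a heuristic can work, the sketch below groups values by a normalized key, in the spirit of the "fingerprint" key-collision method described in OpenRefine's documentation. It is a simplified approximation written for illustration, not OpenRefine's implementation, and the city values are made up.

```python
import string
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Reduce a string to a normalized key so near-identical values collide."""
    # Fold accented characters to plain ASCII.
    value = unicodedata.normalize('NFKD', value)
    value = value.encode('ascii', 'ignore').decode('ascii')
    # Trim, lowercase, and drop punctuation.
    value = value.strip().lower()
    value = value.translate(str.maketrans('', '', string.punctuation))
    # Split into tokens, remove duplicates, sort, and rejoin.
    return ' '.join(sorted(set(value.split())))

def cluster(values):
    """Group values that share a fingerprint; keep groups with 2+ variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]

cities = ['BROOKLYN', 'Brooklyn', 'brooklyn.', 'BROOKYLN', 'New York', 'NEW YORK,']
clusters = cluster(cities)
```

In this sketch the transposed spelling BROOKYLN does not share a key with the other variants, which is the kind of case that has to be caught through individual facets instead.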
For example, the different abbreviations and misspellings of Brooklyn could be quickly identified, and each cluster of records could be corrected en masse to match the correct term (see Figure 2). Using this tool, all misspellings were corrected, abbreviations were standardized, and data that had fallen into the wrong field were fixed. Unofficial variants of city names were left as is. The cleaned text file of organizational records was imported into Excel using the Text Import wizard, and care was taken to assign the ZIP and county code fields as text so that leading zeros would not be dropped.
The organizational summary file was imported into a second sheet in the workbook and cosmetic adjustments were made to each sheet. Lastly, a third sheet was inserted in front of the data sheets that contained a brief description of the file, some metadata, and links to the original source. The file was stored in an older Excel file format. A dedicated LibGuide box was created to hold the file, so that other librarians could link to it if their guide carried related content.
To generate data for other parts of the country, download the IRS csv file for your state and follow the steps in the ZIP Code section of this paper to generate a ZIP Code to County relation file for your area of interest.
Census Bureau [Internet].
Census Bureau [Internet] updated