Disambiguation and co-authorship networks of the U.S. Patent Inventor Database

Ronald Lai
Baker Library Bloomberg Center 280D
Harvard Business School
Boston, MA 02163
rolai@hbs.edu

Alexander D’Amour
Center for Government and International Studies Knafel K319
Harvard Institute for Quantitative Social Science
Cambridge, MA 02138
adamour@iq.harvard.edu

Amy Yu
Baker Library
Harvard Business School
Boston, MA 02163
ayu@hbs.edu

Lee Fleming
Morgan 485
Harvard Business School
Boston, MA 02163
lfleming@hbs.edu

August 26, 2010

We would like to thank the Harvard Business School Department of Research and the National Science Foundation (Grant #0830287) for supporting this research. Errors and omissions remain ours (though we ask that you bring them to our attention, in particular any coding issues). We would like to express our gratitude to Jerry Marschke and Kjersten Whittington-Bunker for providing their hand-disambiguated datasets for our algorithm checks, Vetle Torvik for his help in writing the disambiguation code, and Jim Bessen and the NBER for providing access to their assignee names database. This paper draws much material from, and replaces, an earlier paper by the same authors titled “The careers and co-authorship networks of U.S. patent-holders, since 1975.”

Abstract: Using a Bayesian supervised learning approach, we identify individual inventors from the U.S. utility patent database, from 1975 to the present. An interface to calculate and illustrate patent co-authorship networks and social network measures is also provided. The network representation does not require bounding the social network beforehand. We provide descriptive statistics of individual and collaborative variables and illustrate examples of networks for an individual, an organization, a technology, and a region.
The paper provides an overview of the technical algorithms and pointers to the data, code, and documentation, with the hope of further open development by the research community.

Introduction

Relatively complete and comprehensive patent data for the United States became widely available to researchers in the mid-1990s. These data enabled an outpouring of research in the fields of technology and innovation. Though individual researchers had developed their own databases from the raw data, the publication of a dataset from the National Bureau of Economic Research (hereafter referred to as the NBER; see Hall, Jaffe, and Trajtenberg, 2001) enabled a much broader set of researchers to use the patent data. This effort drastically reduced the barriers to entry and made patent data available to a larger community of researchers that lacked the resources, hardware, or programming skills to access the data.

The original NBER database included authorship and firm- and state-level data but did not identify unique inventors over time. Identifying them is a non-trivial task because the United States Patent and Trademark Office (USPTO) does not require consistent and unique identifiers for inventors (for example, the last author of this paper is listed on one of his patents as Lee O. Fleming and on another as Lee Fleming).

This paper contributes an improved disambiguation algorithm and a database with which we construct the co-authorship networks of utility patent inventors. The disambiguation algorithm uses a Bayesian supervised learning approach, which enables investigators to avoid making parametric assumptions about the importance of any particular matching field. The algorithms and code to accomplish this are public as well, in the hope that the community of researchers will take up responsibility for future updates and continuous improvement.
The work builds directly on prior efforts by a variety of researchers (Fleming and Juda 2004; Trajtenberg, Shiff, and Melamed 2006; Fleming, King, and Juda 2007; Singh 2007; Lai, D’Amour, and Fleming 2009).

In addition to improved disambiguation, we provide a network interface which enables a researcher to construct the co-authorship networks of inventors. Output formats support both regression analysis and graphical network diagrams. Applications of openly available network tools enable the researcher to develop the network without regard to computational boundary; in other words, the researcher begins with the entire co-authorship database and need not specify a social, geographical, technological, or any other kind of boundary before identifying the co-authorship network.

Figure 1: Block Diagram of disambiguation and network variable calculation process.

Figure 1 provides a high-level block diagram of the process. Source data come from the NBER database (Hall, Jaffe, and Trajtenberg, 2001) and directly from the US Patent and Trademark Office (USPTO) weekly publications. [1] The data preparation step generates two datasets: first, data relevant to the inventor name and, second, associated data that will be used to supplement the inventor identification process. These datasets are then fed into the disambiguation algorithm, which outputs the matched inventor dataset.

[1] Some of the early NBER data are missing and are supplemented by the 1998 Micropatent CD product (). We would like to acknowledge the donation of these data from Corey Billington and Ellen King of Hewlett-Packard. This completes approximately 70,000 gaps in data for records from 1975-1978.

The first section (“Overview of dataset creation”) explains how the inventor dataset is created; the second section (“Matching and disambiguation algorithms”) details the matching processes; the third section (“Results and accuracy metrics”) describes how we report results and accuracy, along with data examples; the fourth section (“Potential applications of the data to substantive problems”) describes and illustrates the data and offers ideas for applying the data in research. Various appendices include formatting information, algorithm details, user documentation, and a variable dictionary.

1. Overview of dataset creation

Figure 2: Source files for disambiguation process.

Primary data sources

The final inventor, assignee, citation, patent and classification datasets were built using primary data sources from the USPTO and the NBER. The USPTO makes up-to-date patent data available on their public web resource through collaborations with the European and Asian patent offices. [2] The weekly data aggregates granted patents, each represented by a separate XML file. [3] In 2001, the USPTO also began providing weekly updates on patent applications. (The authors hope to eventually automate the processing of these data into a weekly update.)

[2] USPTO provides weekly Bibliographic Information for Patent grants through its Sales Order Management System (SOMS) Catalog.
[3] There have been several format changes in the data in 1998, 2001 and 2002.

While the USPTO resource provides the data which allows us to keep this dataset current, we extend the dataset to 1975 with the NBER patent database. The NBER patent database contains patents granted from 1975-1999 and is publicly available. [4] Since the patent office only began automating data storage in 1975, we are utilizing information from 1975 onwards. [5] To the best of our knowledge, there is not yet a comprehensive computer database which contains U.S. inventor information before 1975, though images are available and could probably be scanned for the pertinent fields.

Secondary data sources

In addition to the primary data sources, we merged in data from secondary data sources to create better parameters for identifying inventors.
These secondary data sources include the USPTO CASSIS dataset [6], the National Geospatial-Intelligence Agency country files [7], the US Board on Geographic Names [8] and the NBER File of Patent Assignees.

[4] See Hall, Jaffe, and Trajtenberg, 2001 at
[5] NBER provides limited data from 1963-1999 but only provides inventor data from 1975-1999. Since inventor information is necessary in our disambiguation algorithms, we have only matched inventors to patents granted after 1975. Further information about the inventor dataset can be found at:
[6] 'Patents CLASS: Current Classifications of US Patent Grant Publications 1790 to Present' (Code: EIP-2050P-DD): #classP2050dd
[7] Country Files (GNS) is a public database that contains longitude and latitude information for cities and locations around the world.
[8] States, Territories, Associated Areas of the United States is a national file that contains longitude and latitude information for cities across the states.

When a patent is granted, the USPTO assigns multiple alphanumeric codes to classify the technology. As advancements warrant new technical fields, the USPTO creates new classifications and updates previously coded patents. These classification changes are indicated in CASSIS, a dataset that is updated in two-month cycles. Classifications reflect data up to November 2009.

Geographic metrics have proven useful in the patent-inventor disambiguation. These data are sourced from public databases such as the National Geospatial-Intelligence Agency and the US Board on Geographic Names. End-of-2009 datasets have been used for this update.

Since assignees are oftentimes public firms, we have chosen to leverage the NBER Patent Data Project (PDP). One goal of the project is to create a key linkage to Compustat to allow further economic analysis. The project also attempts to classify firm M&A activity.
Through a combination of the NBER PDP data and our own fuzzy string algorithms, we have incorporated their unique identifier, PDPASS, into our patent database. [10]

[10] See

Preparing the inventor dataset

To minimize redundancy, rather than generating one large dataset containing all unique combinations of patent information, we created several smaller datasets that can be joined together on unique patent and inventor identifiers. [11] These smaller independent datasets are organized by information type. They are assignees, citations, classes, inventors and patents. A detailed description of each of these datasets (Primary, Assignee, Citations, Classes, Patent) is provided in the Appendices.

[11] USPTO patent data contain 60+ fields of information. If we were to constrain our data into one primary dataset, unique permutations of each field would be difficult to manage. For example, many patents contain several inventors (INV), several classifications (CLS) and several citations (CIT). At the very minimum, a dataset would require INV x CLS x CIT records of data. Clearly, this can create an unnecessarily large and clunky dataset.

2. Matching and disambiguation algorithms

Figure 3: Disambiguation algorithm block diagram.

The major challenge in studying inventor careers from the raw patent data is determining which patents belong to the same inventor career. The patent data include unique identifiers for each patent, but do not include unique identifiers for inventors, so the clustering of patents into inventor careers must be done probabilistically. This process of clustering together similar records into likely inventor careers is called disambiguation.

Feature Background Information

Figure 4: Disambiguation variables.

The unit of analysis in disambiguation is an author listing on a patent, which we call an inventor-patent instance.
We created this inventor-patent dataset by merging in data from our relevant databases to create a raw table containing over 8 million inventor-patent instances. These records are structured such that a patent appears multiple times, once for each listed inventor (for example, a patent with three inventors would create three inventor-patent instances). We included more than 25 variables that would be relevant for disambiguation for each inventor-patent instance. A short summary of the included variables and their potential usefulness in clustering records follows.

While by no means perfect, an inventor's name is still the most obvious feature to start with in disambiguation. In the raw dataset, the author name is split into first name (which includes middle names) and last name (which includes any suffix).

As patents are issued, we assume the owner of the patent (the assignee) is the employer of the inventor. For most inventors, the assignee is often constant over significant stretches of a career, making it an important attribute in disambiguation. The raw data contain a variable for assignee type (a numerical value corresponding to one of 16 categories) in addition to the assignee's name. The assignee name would ideally be enough to identify the firm that holds the patent, but problems arise from misspellings, different forms of the same company's name (IBM and International Business Machines, for example) and the fact that subsidiaries often have completely different names from the parent. Because of these issues, we included the assignee's type in the assignee category.

Inventors include their home address in a patent application, providing another set of variables that are often constant over stretches of their careers. Location information is available in five variables: street, city, state, ZIP code and country.
A combination of these existing variables matched against public geographic databases allows us to extend the data to contain longitude and latitude metrics.

Each patent also has technology class and coauthor data, which are simple lists of classes and co-authors. These categories provide important information about the inventor's area of expertise and co-authorship network, respectively. We weighted each entry in the list equally and, because of time and space constraints, chose to truncate each of these lists at the four most relevant primary classes and co-inventors.

Disambiguation Algorithm

The disambiguation algorithms in previous work and this paper cluster records by calculating similarities between pairs of records and then grouping together sets of records that exceed a certain threshold of similarity. The variation between disambiguation approaches arises from the ways in which pairs of records are assessed for similarity (paired record comparison), the methods used (if any) for enforcing consistency between comparisons (triplet correction), and the scheme by which records are chosen to be compared to each other in the first place (blocking). The approach that we present here differs substantially on all three fronts from previous work in the patent disambiguation literature, and builds on disambiguation of the PubMed database (Torvik et al., 2005; Torvik and Smalheiser, 2009).

Paired Record Comparison

The most basic building block of any disambiguation algorithm is a method for comparing pairs of records and determining the probability that they originated from the same author. Previous work in patent author disambiguation (Fleming and Juda, 2004; Fleming, King, and Juda, 2007; Singh 2007; Trajtenberg, Shiff, and Melamed, 2006; Lai, D’Amour, and Fleming 2009) approached record comparison by assigning ad hoc weights across all available fields (e.g. author name, assignee, technology class, coauthors), in order to determine a match score.
A score threshold could then be set above which two records would be declared a match, allowing the researcher to produce more or less stringent matches. This weighting scheme could then be tuned to maximize results with a hand-curated “golden” dataset.

While manual optimization can provide surprisingly accurate results, there are a number of problems with this approach that our disambiguation scheme attempts to alleviate. The first is the model-dependence of this method, which requires strong assumptions about the relationships between field similarities; for example, each of the previous papers cited here used a linear specification which implied total independence between similarity scores. However, there exist clear and non-trivial interactions between certain feature similarities. If two records match on assignee, but the assignee is large and works in multiple fields, for instance, the technology class overlap can have a large impact on how significant the assignee match ought to be considered. Linear specifications could handle these dependencies by introducing interaction terms, but this model-selection problem would be cumbersome.

The second problem is that the dataset being used to train the weights, no matter how accurate, typically represents a small, biased sample. Inventors in these golden datasets tend to belong to the same communities (e.g. the BIIS dataset in Trajtenberg et al.) or tend to be more prolific than average, making them more visible to researchers doing a manual survey. Despite the best efforts by researchers, hand-curated datasets are often incomplete, as many inventors do not provide an exhaustive list of patents that they have authored. While still useful for verification, these biases make hand-curated datasets a poor choice for training a comparison algorithm.

Finally, some valuable information is lost by assigning each pair of records a unitless match score rather than a true probability.
When disambiguating a pair of records that are part of a larger set, one can find information about that pair’s match probability not only in their direct comparison, but also in comparing their similarities to other records in the group. Representing the similarity between records as the probability of a match rather than an arbitrary similarity score allows this idea to be formalized and enables substantially improved triplet adjustments.

In order to alleviate these problems, the matching algorithm developed here applies a variation of the algorithm presented in Torvik and Smalheiser (2009). This approach avoids training set bias by automatically generating large sets of highly probable matches and non-matches. It then uses the frequency with which certain similarity profiles (see below) appear in these training sets to infer the probability of a match between an arbitrary pair of records without the need for a model specification. Details of the Torvik-Smalheiser algorithm can be found in the 2005 and 2009 papers detailing their disambiguation of MEDLINE, but a short and less technical overview is provided here.

The Torvik-Smalheiser algorithm is distinct from previous approaches because it represents the similarity between two records with a multidimensional similarity profile rather than a one-dimensional similarity score. Each element of a similarity profile takes a discrete value corresponding to the result of a fieldwise comparison between the two records. For the patent data, we defined a 7-dimensional similarity profile (first name, middle initial, last name, author location, assignee, technology class, and coauthors), corresponding to each of the seven features we compared between records. [12] The only assumption applied to these similarity profiles is monotonicity: if one profile dominates another profile (that is, each of its elements is greater than or equal to the corresponding element of the other similarity profile), then it must map to a higher match probability.
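
The dominance relation and the monotonicity assumption can be made concrete with a small sketch. It is illustrative only and is not taken from the disambiguation engine: the integer comparison levels, the example probabilities, and the function names `dominates` and `monotonicity_violations` are assumptions introduced here.

```python
from itertools import combinations

# Field order follows the seven features named in the text. The integer
# levels are hypothetical: a higher value means a closer fieldwise match.
FIELDS = ("first_name", "middle_initial", "last_name", "location",
          "assignee", "tech_class", "coauthors")


def dominates(p, q):
    """True if profile p is greater than or equal to profile q in every field."""
    return all(a >= b for a, b in zip(p, q))


def monotonicity_violations(prob_by_profile):
    """List pairs (hi, lo) where hi dominates lo but has a lower probability."""
    bad = []
    for p, q in combinations(prob_by_profile, 2):
        for hi, lo in ((p, q), (q, p)):
            if dominates(hi, lo) and prob_by_profile[hi] < prob_by_profile[lo]:
                bad.append((hi, lo))
    return bad


# Hypothetical estimated match probabilities for three profiles.
probs = {
    (2, 1, 2, 1, 1, 1, 0): 0.90,
    (2, 1, 2, 0, 1, 1, 0): 0.95,  # dominated by the first profile, yet higher
    (0, 0, 2, 0, 0, 0, 0): 0.05,
}
violations = monotonicity_violations(probs)
```

In this toy input the first profile dominates the second but carries a lower probability, so that pair is reported as a violation that a smoothing step would need to repair.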
Once defined, similarity profiles are mapped to an estimated match probability by observing the frequency with which they appear in automatically generated training sets. As per Torvik et al. (2005), these training sets are generated by dividing the feature space into two sets of features that are assumed to be independent. In our particular specification, we divided our seven features into two groups: name attributes (first name, middle initials, and last name) and patent attributes (author address, assignee, technology class, and coauthors). Unbiased training sets are created by exploiting this independence, conditioning on one set of features to create a sample of obvious matches or non-matches with which we could learn about the other set of features without bias.

[12] The full specification of the similarity profiles used in this disambiguation can be found in the Appendices.

For the patent data, to learn about how similarity in name attributes predicts a match between two records, we selected pairs of records that shared two or more coauthors. Likewise, to learn about how similarity between the patent attributes predicts a match, we selected pairs of records where the author name was rare and matched exactly. We followed similar procedures to create non-match training sets. See Table 1 for a more detailed account of the training set conditions we used in this disambiguation.

Condition on: Name. To train: Other.
  Match Set: Pairs of records where the author name matches perfectly and the name is rare (the first name only ever appears with one last name, or the last name only ever appears with one first name).
  Non-Match Set: Pairs of records where the last name is different.

Condition on: Other. To train: Name.
  Match Set: Pairs of records that appear in the same block and share two or more coauthors.
  Non-Match Set: Pairs of records that appear in the same block, but share no classes or coauthors, have different assignees, and are listed in different cities in the same year.
Table 1: Summary of training sets, detailing how record pairs were selected, and which feature sets they were intended to train.

Having created training sets, the Torvik-Smalheiser algorithm proceeds by counting how often each similarity profile appears in the match and non-match training sets. This yields an empirical probability of seeing a particular profile given that two records match and the empirical probability of seeing it given that two records do not match. A simple algebraic reorganization of Bayes' theorem reveals that the ratio of these probabilities, the likelihood ratio, is monotonically related to the posterior probability that two records match given that they produce this particular profile.

Because some similarity profiles appear infrequently in the data, the empirical ratios occasionally violate the monotonicity criterion stated above. The ratios are therefore adjusted using observation-weighted quadratic loss to meet these constraints (see Torvik et al., 2005 for details of these specifications and their representations as a quadratic programming problem).

Given a mapping from every possible similarity profile to its likelihood ratio, calculating the probability that any two records match becomes a simple procedure. Before comparing the two records, the researcher chooses a prior match probability. The two records are compared fieldwise to generate a similarity profile. The ratio associated with that profile is then combined with the prior to produce a posterior probability of a match.

Triplet Correction

As noted above, the result of the direct comparison between two records does not tell the whole story about whether they ought to be matched together. In some cases, a pair of records that seem superficially similar may disagree on how similar they seem to a number of third records, providing evidence that they ought to be attributed to separate authors.
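
The likelihood-ratio calculation from the paired record comparison (counting profiles in the match and non-match training sets, then combining the ratio with a prior) can be sketched as follows. This is a minimal illustration under stated assumptions, not the engine's code: the function names and toy two-field profiles are hypothetical, and additive smoothing stands in for the monotone quadratic-programming adjustment described in Torvik et al. (2005).

```python
from collections import Counter


def likelihood_ratios(match_profiles, nonmatch_profiles, smoothing=1.0):
    """Estimate r(x) = P(x | match) / P(x | non-match) from training counts.

    Additive smoothing is an assumption of this sketch; it keeps profiles
    unseen in one training set from producing a zero numerator or denominator.
    """
    m, n = Counter(match_profiles), Counter(nonmatch_profiles)
    profiles = set(m) | set(n)
    m_total = len(match_profiles) + smoothing * len(profiles)
    n_total = len(nonmatch_profiles) + smoothing * len(profiles)
    return {x: ((m[x] + smoothing) / m_total) / ((n[x] + smoothing) / n_total)
            for x in profiles}


def posterior_match_probability(ratio, prior):
    """Bayes' theorem: posterior = r * p / (r * p + (1 - p)) for prior p."""
    return ratio * prior / (ratio * prior + (1.0 - prior))


# Toy two-field profiles (hypothetical), far smaller than the real 7 fields.
match_set = [(1, 1)] * 8 + [(0, 1)] * 2
nonmatch_set = [(1, 1)] * 1 + [(0, 1)] * 3 + [(0, 0)] * 6

ratios = likelihood_ratios(match_set, nonmatch_set)
posterior = posterior_match_probability(ratios[(1, 1)], prior=0.1)
```

With these toy counts the profile (1, 1) has a smoothed ratio of 4.5, so a prior of 0.1 becomes a posterior of one third; a ratio of exactly 1 would leave the prior unchanged.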
Similarly, a pair of superficially different records may share a number of similarities to third records, suggesting that they are actually more similar than their direct comparison suggests. To exploit this information, the Torvik-Smalheiser algorithm incorporates a triplet correction step that adjusts the match probability between every pair of records so it is consistent with the match probabilities generated with the rest of the group. [13]

[13] Consistent here means that for every set of three records, their match probabilities obey the triangle inequality of probability, namely that (1-A) + (1-B) ≥ (1-C), where A, B, and C are the three pairwise match probabilities. For details, see Torvik et al., 2005.

Clustering

Once a final set of match probabilities has been determined, they can be converted into clusters. At this point, the Torvik-Smalheiser algorithm follows a standard single-linkage agglomerative clustering procedure; a threshold value is set that classifies a pair of records as a match if their match probability exceeds it and a non-match if it does not. Any two records that are declared to match are assigned to the same author. It should be noted that this is a “lossy” procedure that gives up a substantial amount of information (we are projecting a matrix of n^2 probabilities into a vector of n memberships) and is highly sensitive to the threshold value. For this reason, we provide two sets of clusterings and a number of descriptive statistics that quantify how much information was lost in this procedure.

Blocking and Consolidation

As with most large-scale clustering problems, our procedure at its lowest level depends on pairwise comparisons between records. However, it is infeasible to do an exhaustive comparison of every pair of records in the patent database. Blocking, or partitioning the dataset into groups that contain likely matches, has been a popular approach to making disambiguation feasible.
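
A small sketch shows how blocking reduces the number of pairwise comparisons. It is a toy illustration: the records, the two key functions, and the names `block_records` and `blocked_pairs` are assumptions made for this example and are not the engine's interface.

```python
from collections import defaultdict


def pair_count(n):
    """Number of unordered pairs among n records."""
    return n * (n - 1) // 2


def block_records(records, key):
    """Group records by a blocking key; only records in a block are compared."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key(rec)].append(rec)
    return dict(blocks)


def strict_key(rec):
    """Hypothetical strict blocking: full first name and full last name."""
    first, last = rec
    return (first.lower(), last.lower())


def permissive_key(rec):
    """Hypothetical permissive blocking: first initial, 5 letters of last name."""
    first, last = rec
    return (first[:1].lower(), last[:5].lower())


def blocked_pairs(records, key):
    """Total within-block comparisons under a given blocking key."""
    return sum(pair_count(len(b)) for b in block_records(records, key).values())


# Toy (first name, last name) records.
records = [("Lee", "Fleming"), ("Lee O.", "Fleming"), ("John", "Smith"),
           ("Jon", "Smith"), ("Jane", "Doe"), ("Maria", "Garcia")]

exhaustive = pair_count(len(records))
strict_pairs = blocked_pairs(records, strict_key)
permissive_pairs = blocked_pairs(records, permissive_key)
```

On these six records an exhaustive comparison needs 15 pairs. The strict key compares none, so "Lee Fleming" and "Lee O. Fleming" can never be matched, while the permissive key compares 2 pairs and keeps both name variants together.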
However, any single blocking scheme is a balancing act: on one hand, creating blocks that are too big does little to reduce the prohibitive O(n^2) runtime, while on the other hand creating blocks that are too small can rule out many potential matches, resulting in assigning patents from the same inventor to different people. [14] Defining a blocking scheme that provides an effective balance can be difficult.

[14] A variety of terms have been used to label wrong matching. Avoiding the less intuitive terms of “precision and recall”, we follow Torvik and Smalheiser (2009) and define “lumping” as matching patents from two different individuals and “splitting” as not matching patents that were invented by the same person. We use clumping and lumping interchangeably.

To deal with this time/accuracy tradeoff, we developed a novel sequential blocking scheme. We applied the Torvik-Smalheiser algorithm to our database multiple times, and with each pass applied a different blocking. After each pass, we consolidated records that were assigned the same author identifier into a single record before proceeding to the next pass. Because of this consolidation step, we refer to these passes as “consolidation runs”. On each subsequent pass, we applied a more permissive blocking, but because of the record consolidation that had been applied after the last pass, we still achieved reasonable runtimes. This allowed us to explore more comparisons than would be feasible in the single-blocking scheme. A summary of the passes made over the data is provided in Table 2. Note especially the steep drop in the number of records after the first few runs, which made more permissive blocking feasible.

Sequential blocking, however, comes with its costs. With each consolidation, information is lost, as the match probabilities between all of the records that are consolidated are effectively rounded to 1.
This makes the approach very susceptible to lumping, or incorrectly attributing patents that were authored by different people to the same person. In particular, records that might have been pulled apart during triplet correction are left lumped together. To alleviate this problem, after the final consolidation run, we returned to the original dataset and blocked the database by the author identifiers applied by the last consolidation pass through the data. This provided a blocking that was very strict, but also very dense in matching comparisons. Because this run can only result in clusters of records being split apart into multiple authors, we referred to this pass as the “splitting run”. For this run, we applied a very high threshold during clustering to ensure that spurious matches were filtered out. Table 2 summarizes each run.

Run # | Type         | Block1                                             | Block2                                            | Number
1     | Consolidated | Full first name.                                   | Full last name.                                   | 8 million
2     | Consolidated | First 5 characters of first name.                  | First 8 characters of last name.                  | 3.4 million
3     | Consolidated | First 5 characters of first name, omitting spaces. | First 8 characters of last name, omitting spaces. | 2.8 million
4     | Consolidated | Initials of first and middle names.                | First 5 characters of last name, omitting spaces. | 2.2 million
5     | Consolidated | First initial                                      | First 5 characters of last name, omitting spaces. | 2 million
6     | Consolidated | Initials of first and middle names.                | First 5 characters of last name, omitting spaces. | 2 million
7     | Consolidated | First initial                                      | Last 5 characters of last name, omitting spaces.  | 1.9 million
8     | Splitting    | 1.9 million from step 7                            |                                                   | 2.8 million

Table 2: Description of disambiguation steps.

Software

In order to accomplish this disambiguation of the patent database, we developed a generic disambiguation engine which implements the Torvik-Smalheiser algorithm. The engine is written in C, and provides developers with a modular way to specify any disambiguation strategy on any database.
The code base is currently available online, and others can use this implementation to attempt their own disambiguation of the patent database or any other.

3. Results and accuracy metrics

The goal of the disambiguation algorithm is to properly match and classify all of an inventor’s patents to a sole and unique inventor number. Our current disambiguation is a work in progress, so to report an honest assessment of our results, we report two clusterings of the data: one which captures inventor careers in their entirety at the cost of occasionally lumping distinct inventors together (which we refer to as overclumping), and another which ensures that each cluster corresponds to a distinct inventor at the cost of occasionally splitting a single inventor over multiple clusters (which we refer to as underclumping). Balancing these two tendencies is akin to balancing Type I and Type II error in statistical hypothesis testing, and requires significant calibration.

We first report a lower-bound disambiguation, which represents a lower number of inventors, and is intended for use when the penalty of underclumping is high. Our upper-bound disambiguation results from the splitting run and contains very few instances where multiple inventors are overclumped. It has a higher risk of splitting inventors across blocks and is intended for use when there is a high penalty to overclumping. Since the upper-bound disambiguation used the lower-bound disambiguation’s clustering as its blocking scheme, the clusters reported in the upper-bound disambiguation are strict subsets of those reported in the lower-bound. We recommend that researchers either run both sets of estimates or manually verify the accuracy, if possible.

Verification

We use a manually validated dataset compiled by NBER researcher Jerry Marschke in order to measure the accuracy of the dataset (referred to as our “benchmark dataset”). This benchmark dataset contains the patent history of approximately one hundred US-based academic inventors.
Furthermore, to bolster the accuracy, for each step in the disambiguation algorithm, we have cross-checked our output with online resources and human pattern recognition. While most patent comparisons contain several bits of confirming evidence which imply a certain match, data challenges such as misspellings, firm-level merger and acquisition activity, missing data, rarity of data, abundance of data, etc., present algorithmic complexity. This benchmark dataset contains several of these data complexities and is thereby a worthy sample. For example, over 33% of the inventors in the benchmark dataset have patented in more than one location and nearly 50% have first name spelling variations.

We calculated underclumping by taking each inventor in the golden dataset and finding the largest cluster of records in our disambiguation that corresponded to this inventor. We declared all records that should have fallen in this cluster but did not “underclumped”. Summing the number of underclumped records and dividing by the total number of records in our golden dataset yields our underclumping statistic.

We calculated overclumping by taking each cluster in our disambiguation and identifying the largest subcluster that corresponded to a single real inventor in our golden dataset. We then marked all other records in this cluster, which were incorrectly grouped into it, as “overclumped”. Summing the number of overclumped records and dividing by the total number of records in our golden dataset yields our overclumping statistic.

The 2009 patent datasets, which we compiled using the Torvik-Smalheiser algorithm, show underclumping of 1.6% and overclumping of 17.7% for the lower-bound dataset and underclumping of 2.1% and overclumping of 7.9% for the upper-bound dataset when compared to the benchmark data. Our previous work using a manual linear weighting scheme (Lai, D’Amour, and Fleming 2008) showed underclumping of 2.7% and overclumping of 3.9%.
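
The two verification statistics can be sketched in a few lines. This is an illustrative reading of the definitions above, not the code used to produce the reported figures: the function name `clumping_stats`, the dictionary inputs, and the toy labels are assumptions, and the sketch only scores records that appear in the benchmark.

```python
from collections import Counter, defaultdict


def clumping_stats(truth, predicted):
    """Return (underclumping, overclumping) rates.

    truth maps record id -> true inventor (benchmark label).
    predicted maps record id -> cluster id from the disambiguation.
    """
    clusters_by_inventor = defaultdict(list)
    inventors_by_cluster = defaultdict(list)
    for rec, inventor in truth.items():
        clusters_by_inventor[inventor].append(predicted[rec])
        inventors_by_cluster[predicted[rec]].append(inventor)

    total = len(truth)
    # Records of an inventor that fall outside that inventor's largest cluster.
    under = sum(len(c) - max(Counter(c).values())
                for c in clusters_by_inventor.values())
    # Records of a cluster that fall outside its largest single-inventor part.
    over = sum(len(i) - max(Counter(i).values())
               for i in inventors_by_cluster.values())
    return under / total, over / total


# Toy benchmark: inventor A has three records, B has two, C has one.
truth = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "C"}
# Toy disambiguation: record 3 of inventor A was placed in B's cluster.
predicted = {1: "c1", 2: "c1", 3: "c2", 4: "c2", 5: "c2", 6: "c3"}

under_rate, over_rate = clumping_stats(truth, predicted)
```

In this toy case the misplaced record counts once as underclumped (it is missing from A's largest cluster) and once as overclumped (it pollutes B's cluster), so both rates are one in six.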
While we were able to beat our previous disambiguation’s error rates one at a time with each of our datasets, work remains to create a single disambiguation that can match the previous manual-weighting approach’s accuracy. It should be noted that nearly all the overclumping errors in the 2009 dataset are attributable to a few common names, namely David Johnson, Eric Anderson and Stephen Smith. Because of the benchmark dataset’s small size, even small improvements in disambiguating these names can cause drastic swings in the accuracy rating. Thus, as we work to improve the results of the 2009 dataset, we also invite community members to share their own benchmark data that all can use to get a better assessment of algorithmic accuracy.

Confidence Scores

There is an admittedly large distance between the two disambiguations that we report, and depending on the subset of the dataset being considered, one of the disambiguations will almost certainly be closer to the truth than the other. To aid community members in deciding which disambiguation to use for their particular application, we have developed a set of descriptive statistics that describe our confidence in the reported inventor clusterings. These statistics are calculated at the level of the lower-bound cluster, allowing researchers to decide which disambiguation to trust at this very granular level. As described above, the thresholding process that converts the full set of pairwise match probabilities to a clustering throws away a substantial amount of information. This procedure effectively rounds all match probabilities for records within the same cluster to 1 while rounding all match probabilities for records in different clusters to 0, allowing the reported disambiguation to diverge from what the data support in two ways.
In the first, match probabilities that are relatively low can be rounded up as a side effect of the clustering process, resulting in some records within the same cluster that have very low similarity. In the second, match probabilities that are relatively high can be rounded down because they do not meet the match threshold, resulting in records that have real similarity being split into different blocks. To measure how significantly these errors affect a clustering, one can calculate the within-cluster density and the out-of-cluster density, respectively. The within-cluster density is the average match probability between every pair of records within a cluster. A high within-cluster density indicates that the data strongly support the clustering, while a low density suggests that the cluster is largely held together by spurious associations. The out-of-cluster density, on the other hand, is the average match probability between every pair of records in a block that were not assigned to the same cluster. A low score here indicates that the records were correctly split into multiple clusters, whereas a high score indicates that the clustering may be too granular, as a large amount of similarity was left unrecognized by the clustering. To make this discussion more intuitive, we deal with out-of-cluster sparsity (one minus the out-of-cluster density) instead so that we can say that a high score for both of these statistics indicates high confidence. For our confidence statistics, we calculated a within-cluster density for each lower-bound cluster. Within each lower-bound cluster we also calculated an average within-cluster density for the upper-bound clusters and the out-of-cluster sparsity that resulted from splitting the lower-bound cluster into the upper-bound cluster. The first score can be used to gauge our confidence in the lower-bound clusters, and the other two can be used to gauge our confidence in the upper-bound data. 
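As a rough illustration of these three confidence statistics, the following Python sketch computes them for one lower-bound cluster that has been split into upper-bound clusters. It is a sketch under stated assumptions, not our production code: the function names, the `prob` dictionary keyed by unordered record pairs, and the toy probabilities are hypothetical, and treating a single-record cluster as having density 1 is a convention chosen here for convenience.

```python
from itertools import combinations


def within_cluster_density(records, prob):
    """Average pairwise match probability among the records of one cluster."""
    pairs = [frozenset(p) for p in combinations(records, 2)]
    if not pairs:
        return 1.0  # assumption: a single-record cluster is treated as fully dense
    return sum(prob[p] for p in pairs) / len(pairs)


def out_of_cluster_sparsity(clusters, prob):
    """One minus the average match probability over pairs split across clusters."""
    cross = [frozenset((a, b))
             for c1, c2 in combinations(clusters, 2)
             for a in c1
             for b in c2]
    if not cross:
        return 1.0  # nothing was split, so no similarity was left unrecognized
    return 1.0 - sum(prob[p] for p in cross) / len(cross)


def confidence_statistics(upper_clusters, prob):
    """Three statistics for one lower-bound cluster and its upper-bound split."""
    lower_cluster = [r for cluster in upper_clusters for r in cluster]
    upper = [within_cluster_density(c, prob) for c in upper_clusters]
    return {
        "lower_density": within_cluster_density(lower_cluster, prob),
        "upper_density": sum(upper) / len(upper),
        "upper_sparsity": out_of_cluster_sparsity(upper_clusters, prob),
    }


# Toy match probabilities for three records in one lower-bound cluster.
prob = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.2,
    frozenset(("b", "c")): 0.1,
}
# The upper-bound run split the cluster into {a, b} and {c}.
stats = confidence_statistics([["a", "b"], ["c"]], prob)
```

In this toy case the lower-bound density is low (about 0.4) while the upper-bound density and sparsity are both high, which is the pattern that favors the upper-bound clusters.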
When choosing which disambiguation to use for a particular application, researchers can begin by choosing the lower-bound clusters that correspond to their area of interest, and checking the three confidence statistics. The researcher should compare the upper- and lower-bound density measures – if there is little difference, the lower-bound is likely safe to use. If there is a large difference, the researcher should check the out-of-cluster sparsity of the upper-bound blocks to see how much underclumping accuracy was sacrificed to improve overclumping performance when splitting the larger cluster into smaller clumps. If the sparsity of the upper-bound blocks is high, then it is likely safe to use the upper-bound blocks. In cases where all three confidence measures are low, the researcher might consider performing a manual disambiguation on the subset of interest using the lower-bound cluster as a starting point. Caveats and Plans for an Improved Specification The current specification has a number of shortcomings that will be addressed by ourselves and the research community. While our work is based mostly on the Torvik-Smalheiser disambiguation specification from 2005, their work in 2009 provided a number of suggestions for more rigorously setting parameters like block priors and the weighting coefficient in triplet correction, and for handling correlations between fields in the data (e.g. living in Korea and working for Samsung) that can bias disambiguation results. In studying the confidence statistics that we generated for our dataset, we also discovered how the clustering threshold can be set automatically at a block level rather than being a fixed constant that is applied uniformly over all blocks, which should improve accuracy considerably. In other issues, the name comparison algorithm is well-suited for names of European descent, but badly suited for Asian names. 
The assignee comparison algorithm could take into account firm size, and the location comparison algorithm could do more to separate large and small cities. Because of the specificity of the name comparison functions to European names, and the assignee comparison’s difficulty in dealing with assignees in highly concentrated industries, we were unable to perform a satisfactory disambiguation of inventors residing in most Asian countries, specifically Korea, Japan, China, and Taiwan. These records appear in the full data, but are not clustered in any way. One potential improvement is to perform region-specific disambiguation first, so that training data can provide a more accurate picture for each region, and then merge these datasets together to capture inventors who migrated between regions. This work would lay the groundwork for the heterogeneous database merging that we wish to perform with the MEDLINE and other document database data (Torvik and Fleming, 2009). Descriptive Statistics Figure 5 illustrates year by year statistics for the number of patents and unique inventors, for both lower and upper disambiguations. The number of unique inventors closely tracks the number of patents as might be expected, and there appear to be a few more unique inventors than patents, though these data do not include Asia, with the exception of Japan. Figure 5: Number of patents and unique inventors from upper and lower disambiguations. Data prior to 1982 is inaccurate, due to lack of inventor history before 1975, and data do not include Asia, with exception of Japan. Figure 6 illustrates the average number of co-authors for each inventor in each year. The results are consistent with Wuchty and co-authors’ (2007) evidence that patenting and scientific publication are becoming more collaborative over time. Figure 6: Number of unique co-authors from upper and lower disambiguations. 
Data prior to 1982 is inaccurate, due to lack of inventor history before 1975, and data do not include Asia, with exception of Japan.

Figure 7 illustrates the size of the largest component for each year (Fleming, King, and Juda, 2007). The largest component is the largest part of the co-authorship network in which every inventor can trace a path to every other inventor. As with earlier investigations of regional patenting co-authorship networks, the graph shows a dramatic upward jump after some triggering threshold. Current research attempts to model and explain this threshold and phase transitions in inventor networks (Airoldi et al. 2010).

Figure 7: Size of largest component, by grant year, for all U.S. patents.

4. Potential applications of the data to substantive problems

We sketch ideas on how these data might be used in our area of inquiry, namely creativity, collaboration, and inventor mobility, but our hope is that a diverse set of researchers will apply these data in completely unforeseen, novel, and productive ways.

Individual level of analysis research that will be enabled by the data

At the individual level, the primary benefit of a large and longitudinal database of collaborative relationships will be to complement the survey and field-based social network research that has occurred to date. Much social network research relies upon questionnaires and field work. While field methods enable rich and real-time data collection, they are limited in other ways. For example, they are expensive to administer repeatedly, must inevitably bound their networks, usually sample without an understanding of the underlying distribution of nodes, and often rely upon self-report. Furthermore, because data collection is so expensive, field studies tend to analyze all collected data. This can lead to spurious correlation in statistical models, due to a lack of independence between adjacent nodes.
Archival databases can complement field survey networks, because they can observe an individual over a lifetime, do not require bounding (for example, these data have revealed one connected component with close to 500,000 authors, over a period of ten years, see Marx, Singh, and Fleming 2010), and can more easily measure phenomena across large and indirect networks (for example, in studying the generation and subsequent diffusion of ideas). Furthermore, if enough data exist, they can be sampled randomly, and hence avoid network autocorrelation problems.

The database should allow the exploration of many questions in creativity, inventor mobility, and the diffusion of ideas. For example, what portion of the advantage of collaboration is the enhanced opportunity to generate new ideas vs. the benefit that comes from easier diffusion of the ideas that are generated? In other words, does collaboration result in higher quality ideas, or does collaboration simply make it more likely that those ideas will be transmitted to and used by others, and be perceived as higher quality? What is the creative benefit from spanning a technological or organizational boundary, as opposed to spanning social space? Social and technological boundaries correlate, but they are distinct constructs, with different effects and interactions (Fleming and Waguespack, 2007). What is the optimal collaborative structure at different points in an inventive career? In other words, should a recently graduated inventor collaborate differently than an experienced inventor? Do collaboration networks vary by gender or ethnicity and is there a creative benefit to such differences? Finally, how does knowledge diffuse from scientific networks to technological networks (this will require merging of the patent and paper databases, as proposed by Torvik and Fleming 2009)?
In addition to basic correlations between social network position and a variety of outcomes, the large database will enable matching and other approaches for developing appropriate controls. The database will also greatly facilitate field work and the selection of interviews to complement the archival data (Marx, 2010). The influence of social networks on inventor mobility can be studied across 50 years (thus enabling empirical refinement of Granovetter, 1973), and the influence of direct and indirect networks upon the diffusion of knowledge can be traced across individuals, organizations, regions, and technology domains.

Perhaps most importantly, the archival nature of this database enables the study of social networks over the entire careers of inventors, and how those networks enable creativity, mobility, and the co-evolution of science and technology with careers. For example, inventor births and mobility, both domestic and international, could be tracked across firms, technologies, and countries. One application of this would be to study the equilibrium of brain drains caused by noncompetes: for example, are inventors (particularly the best inventors) born in states which enforce noncompetes more likely to end up in states which do not (Marx, Singh, and Fleming, 2010)? Ultimately, if ethical concerns about the identification of individuals can be resolved (perhaps through the merging of datasets behind “enclave” firewalls), these data could be merged with individual data on births, deaths, and family and employment events. This would enable stronger causal inference, based on natural experiments (for example, see Azoulay, Zivin and Wang, 2007).

Organizational level of analysis research that will be enabled by the database

The availability of the NBER database has enabled a great deal of research on firms and patenting and some of this research has considered social networks.
For the most part, however, this research has aggregated patent data to the organizational level, for example, citations or counts. These data will enable organizational level analysis to use the underlying patent co-authorships in a much more detailed and flexible way. For example, density measures of collaboration within the firm could be calculated (number of co-authorships divided by number of patents). More interestingly, the data would enable the study of the typologies and topologies of collaboration within and across organizations.

Figures 8 and 9 provide an example of the radical differences in collaborative structure across firms. They illustrate two firms within the carburetion industry, circa 1990, when the industry was undergoing a radical technology shift from mechanical to electronic carburetion. The figures offer a stark contrast in how the firms’ inventors collaborated. Future work that merges data from outside the patent record would seek to connect these graph types with technological, market, and strategic outcomes, possibly using field interviews, product performance archives, and statistical methods. With these data, such research questions could be quickly and efficiently studied in a variety of industries, both visually and statistically.

Figure 8: Bosch carburetor patents, circa 1980 (unpublished, developed with Dan Snow and Venkat Kuppuswamy). Note the difference with Figure 9, in that Bosch is much more collaborative. Nodes represent inventors and node size corresponds to the number of patents. Black nodes represent inventors who work in physical technologies, dark grey nodes represent inventors in electronic technologies, and light grey nodes represent inventors in both technologies. Tie width corresponds to the number of co-authored patents. Light grey ties represent later ties, black ties earlier ties, and dark grey ties intermediate ties.

Figure 9: Ford carburetor patents, circa 1980 (unpublished, developed with Dan Snow and Venkat Kuppuswamy).
Ford inventors are much more isolated and less collaborative than the Bosch inventors illustrated in Figure 8.

Regional level of analysis research that will be enabled by the database

Social networks have long been argued to be an integral part of regional innovative dynamics (Marshall, 1919; Saxenian, 1994; Owen-Smith and Powell 2004). Complete archival data to study such arguments are difficult to compile, however, so wide and long panel datasets remain rare (for example, see Fleming, King and Juda, 2007). Due to zip code data associated with the first inventor (though early data are spotty), many patents can now be identified at the regional level (at the very least, they can be assigned to a state or country). These networks enable visualization and estimation of the correlation between networks and regional innovation dynamics.

Figure 11 encapsulates the results of a first exploration of regional networks in the U.S. from 1975 to 2000. It illustrates the networks of Silicon Valley and Boston in the late 1980s and graphically demonstrates the argument that California is more highly networked. It illustrates the connecting importance of educational institutions (such as Stanford and MIT) and corporate postdoctoral fellowships (see Fleming and Marx, 2006, and Fleming, King and Juda, 2007). As an example of the mixed methods approaches encouraged above, this work also used the network diagrams to identify crucial cut-point inventors and similar inventors who did not create cut-points. Both sets of inventors were then interviewed to provide a structural history of the regions’ collaboration dynamics. One possibility for future research would be to replicate these analyses across many U.S. regions. The combination of illustrations, estimations, and field work would greatly facilitate our understanding of regional innovative dynamics.
Combined with instrumental variables or natural experiments, the data could illuminate many important policy questions in science and innovation policy. For example, exogenous policy changes that influence mobility (Marx et al, 2009 and 2010), could be used to understand the structural dynamics of social networks. A variety of national innovation policy questions could be pursued with the data as well, for example, the influence of international mobility upon knowledge flows and subsequent innovation, or similarly, the influence of international collaboration networks. The impact of foreign investment on the geography of innovation could be studied. Having a long series of geographical data linked to many of the patents will enable a host of longitudinal analyses. Technology domain level of analysis research that will be enabled by the database The data would also afford a better understanding of the emergence of entire fields of technology. New disciplines and breakthroughs are often thought to occur from the intersection of formerly distinct disciplines and technologies. This can only occur, however, through the collaboration of individuals across boundaries or the crossing of individuals across disciplines. One of the biggest challenges to studying this idea, however, is the difficulty of observing – and predicting the success – of such collaborations. Stated another way, it is easy to look backwards and trace the emergence of DNA arrays to a fusion of semiconductor and biotechnology. Given all potential fusions at the time, however, could we have predicted which one would have occurred – and been successful? By developing the social networks of the entire patenting record, we can avoid sampling on the dependent variable of success. The data would enable both illustration and description and inference on these issues. 
The data would also facilitate the measurement of the diffusion of knowledge across these four levels of analysis (individual, organizational, regional, and technological).

Figure 10: Graphical comparison of Silicon Valley and Boston collaboration networks in 1990 (unpublished, developed with Ivin Baker). The text describes why Silicon Valley networks became connected and Boston did not, based on interviews and field research of the cut-point inventors (circled) and similar “counter-factual” inventors who did not create cut-points.

Conclusion

Social scientists around the world have begun to disambiguate patent records (Raffo and Lhuillery 2009). We provide a disambiguation of the U.S. patent record and make our code and algorithms public, to enable critique and future improvement. In contrast to previous ad hoc methods, this approach drew from computer and information science (Torvik and Smalheiser 2009) and applied a Bayesian supervised learning approach. This work provides a public database and tools that enable identification of any co-authorship network in the USPTO database. We provided illustrative descriptive and other statistics for the database, along with network illustrations.

As individual databases of patents and scientific papers become available, the next obvious step would be to link such databases. If grant data can be integrated, this would enable tracing the impact of science research funding on scientific publications and patents. The gatekeepers between science and technology could be easily and comprehensively identified, along with knowledge and personnel flows. Finally, if economic and social outcome databases could be developed and integrated, we might be able to comprehensively quantify the impact of science on society (for example, how many web hits does a patent generate, and can they be categorized as informational, legal controversy, or financial?). Figure 12 provides an idealized schematic.
Obviously, such an approach would miss much (and probably most) of the action; in some ways, as with the patent databases, it would direct research to where the light was shining. Extensive field work, however, could characterize these deficiencies, and indicate what problems were poorly addressed. Ideally, open and public access to such a database would enable researchers to improve it over time, to eventually become a valuable community resource.

[Figure 12 schematic. Column headings: GRANTS, PAPERS, PATENTS, ECON/SOCIAL. Labels appearing in the figure: IEEE, Web, DOE, inferable outcomes, NIH, Compustat, USPTO, NSF, VC dbases, Pubmed, Private, ESF, EPO.]

Figure 12: An idealized schematic of how databases might be built and linked in order to understand how science investment influences economic outcomes (from Torvik and Fleming, 2009). The intent is to implement comprehensive links between the Pubmed and US patent databases, of same people, cites from patents to papers, cites from papers to patents (for a subset of papers), and co-authors. With public critique and contributions from various fields, in time it might ultimately cover most of science and invention and societal impact.

References

Azoulay, P. and J. Zivin, J. Wang (2007). “Superstar Extinction.” Working paper, MIT Sloan School of Management.
Fleming, L. and A. Juda, “A Network of Invention,” Harvard Business Review 82 (2004): 6.
Fleming, L. and C. King, A. Juda, “Small Worlds and Regional Innovation.” Organization Science, Vol. 18, No. 2 (2007), pp. 938-954.
Fleming, L. and S. Mingo, D. Chen (2007). “Collaborative Brokerage, Generative Creativity, and Creative Success.” Administrative Science Quarterly, 52 (2007): 443-475.
Fleming, L., and D. Waguespack (2007). “Boundary spanning, brokerage, and the emergence of leadership in open innovation communities.” Organization Science, Vol. 18: 165-180.
Granovetter, M. (1973) “The strength of weak ties.” American Journal of Sociology, 78: 1360-1379.
Hall, B. H., A. B. Jaffe, and M. Trajtenberg. (2001).
The NBER Patent Citations Data File: Lessons, Insights and Methodological Tools, NBER.
Herzog, T. and F. Scheuren, W. Winkler (2007). Data Quality and Record Linkage Techniques. New York, Springer Press.
Ronald Lai; Alexander D'Amour; Lee Fleming, (2009). 2009-02-18, "The careers and co-authorship networks of U.S. patent-holders, since 1975", :5:daJuoNgCZlcYY8RqU+/j2Q== Harvard Business School; Harvard Institute for Quantitative Social Science [Distributor] V1 [Version]
Marx, M. (2010). “Great work if you can get it – again.” Working paper, MIT Sloan School of Management.
Marx, M. and D. Strumsky, L. Fleming “Mobility, Skills, and the Michigan Non-compete Experiment,” Management Science, 55 (2009): 875-889.
Marx, M. and J. Singh, L. Fleming (2010). “Regional Disadvantage? Non-competes and Brain-Drain.” Working paper, MIT Sloan School and Harvard Business School.
Marshall, A. Industry and Trade. London, MacMillan, 1919.
Owen-Smith, Jason & Walter W. Powell (2004) "Knowledge Networks as Channels and Conduits: The Effects of Spillovers in the Boston Biotechnology Community." Organization Science. 15(1):5-21.
Raffo, J. and S. Lhuillery (2009). “How to play the “Names Game”: Patent retrieval comparing different heuristics.” Research Policy 38 (2009) 1617–1627.
Saxenian, A. Regional Advantage. Cambridge, MA: Harvard University Press, 1994.
Singh, J. 2005 “Collaborative networks as determinants of knowledge diffusion patterns.” Management Science, 51(5): 756–770.
Singh, J. (2007). "Asymmetry of Knowledge Spillovers between MNCs and Host Country Firms." Journal of International Business Studies, 38(5): 764-786.
Singh, J. (2008). "Distributed R&D, Cross-regional Knowledge Integration and Quality of Innovative Output." Research Policy, 37(1): 77-96.
Sorenson, O., and L. Fleming 2004 “Science and the diffusion of knowledge.” Research Policy, 33(10): 1615-1634.
Torvik, V. and M. Weeber, D. Swanson, N. Smalheiser (2005).
“A Probabilistic Similarity Metric for Medline Records: A Model for Author Name Disambiguation,” Journal of the American Society for Information Science and Technology, 56(2): 140–158, 2005.
Torvik, V. and N. Smalheiser (2009). “Author Name Disambiguation in MEDLINE.” ACM Transactions on Knowledge Discovery from Data, Vol. 3., No. 3, Article 11.
Torvik, V. and L. Fleming (2009). “From grant to commercialization: an integrated database which can trace, assess, and measure the impact of scientific funding.” NSF grant #0965259. Available from last author upon request.
Trajtenberg, M., G. Shiff, and R. Melamed (2006). The Names Game: Harnessing Inventors Patent Data for Economic Research, NBER.

Appendix 1

Example of a single USPTO XML file

Introduction

XML stands for Extensible Markup Language. It is a format that has gained popularity due to its minimal size, structure, flexibility and ease of understanding. Data items within an XML file are defined through the usage of tags and attributes.

XML example through HTML based websites

For those who have created basic HTML based websites, the terms tag and attribute may be easy to understand. For those who have not, we have created a basic example below to demonstrate the basic structure. This initial note is not meant to serve as an XML primer. For a comprehensive overview, there are several online resources to enhance understanding.

<html>
  <head>
    <title>This is a Title</title>
  </head>
  <body>
    <img src="picture.jpg" width="100px" />
  </body>
</html>

Tags indicate the start and end of information. In our example, we find several tags. Examples of start tags are <html>, <head>, <title>, and <body>. End tags are identified through usage of a forward slash “/” and appear in our example as </html>, </head>, and </title>. If we concentrate on <title>This is a Title</title>, the tags indicate that the variable “title” contains the value “This is a Title”. The <img> (image) tag contains both start and end while containing separate attributes, namely src and width. These attributes define the image.
Tags can further separate data into parent-child structures, as our example shows. The <title> tag is enclosed within the <head> tag, which creates a relationship between the two variables. This allows us to determine that <title> is the child of <head> and <head> is a parent to <title>. If this example were an HTML page it would communicate to a modern browser that the title of the page should be labeled as “This is a Title” and the body of the browser should contain an image with a width of 100px and the filename picture.jpg. This powerful but simple structure allows both information and relationships to be easily determined.

USPTO XML file formats and data

The USPTO publishes patent data as weekly XML files. A word of caution: the USPTO creates an XML file for each unique patent. Because several hundred patents are granted every week, the USPTO consolidates these XML files into one file. The structure is not ideal, but the file can be easily separated to allow for easier data manipulation. We have taken a sample XML file from the first patent granted on December 30, 2008 and will describe how it aids in the dataset creation process.

To increase efficiency, we have determined that the creation of smaller, independent datasets is a reasonable approach. Several datasets are created from the USPTO patent data, including: assignees (patent owner information), citations (referenced patents that aided in development), classes (patent classification scheme), inventors (demographic information about the authors of the patent) and finally, patent (basic chronological information to describe the patent). There are other forms of information which can be found within the XML file, but these items serve as the basis for our current iteration of the inventor disambiguation algorithm.
<publication-reference>
  <document-id>
    ……
    <doc-number>D0583526</doc-number>
    ……
  </document-id>
</publication-reference>

The unique identifier for each patent is defined through the document number. The document number in our current patent is D0583526. The patent type has been defined as “D” or “Design” and the patent number is 583526, yielding the document number D|0583526. The combination of patent type and number will be associated with each of these individual datasets.

USPTO XML | Assignee

<assignees>
  <assignee>
    <addressbook>
      <orgname>Kellogg Company</orgname>
      <role>02</role>
      <address>
        <city>Battle Creek</city>
        <state>MI</state>
        <country>US</country>
      </address>
    </addressbook>
  </assignee>
</assignees>

Assignee information can be easily determined from the XML extract. Specifically, the assignee is Kellogg Company, which is located in Battle Creek, MI, USA, with a role or type defined as 2. From our understanding, assignee type further classifies the firm, but we have yet to receive clarification.

USPTO XML | Citations

<references-cited>
  <citation>
    <patcit num="00001">
      <document-id>
        ……
        <doc-number>D20662</doc-number>
        ……
      </document-id>
    </patcit>
    ……
  </citation>
  ……
  <citation>
    <patcit num="00038">
      <document-id>
        ……
        <doc-number>D540507</doc-number>
        ……
      </document-id>
    </patcit>
    ……
  </citation>
</references-cited>

The development of patents is oftentimes based on research from previous patents; a reference to such a patent is known simply as a citation. Within the USPTO XML file, citations are organized numerically through a patent citation number reference. The document number mimics the format that constitutes the previously mentioned patent number and type. Here we are able to determine that this patent’s first citation is to design patent #20662 and its 38th citation is to design patent #540507.
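To make the extraction steps above concrete, the following Python sketch reads the document number, assignee, and citations from a cut-down copy of the sample record using the standard library's xml.etree.ElementTree. It is an illustration under stated assumptions rather than our actual parsing code: the wrapping <patent> element, the helper `split_doc_number`, and the rule that any leading letters form the patent type are hypothetical, and the elided portions of the real file (the …… markers) are simply omitted.

```python
import re
import xml.etree.ElementTree as ET

# Cut-down copy of the sample record; <patent> is a hypothetical wrapper element.
SAMPLE = """
<patent>
  <publication-reference>
    <document-id><doc-number>D0583526</doc-number></document-id>
  </publication-reference>
  <assignees>
    <assignee>
      <addressbook>
        <orgname>Kellogg Company</orgname>
        <role>02</role>
        <address>
          <city>Battle Creek</city><state>MI</state><country>US</country>
        </address>
      </addressbook>
    </assignee>
  </assignees>
  <references-cited>
    <citation>
      <patcit num="00001">
        <document-id><doc-number>D20662</doc-number></document-id>
      </patcit>
    </citation>
    <citation>
      <patcit num="00038">
        <document-id><doc-number>D540507</doc-number></document-id>
      </patcit>
    </citation>
  </references-cited>
</patent>
"""


def split_doc_number(doc_number):
    """Split e.g. 'D0583526' into ('D', 583526).

    Assumption: leading letters give the patent type, the digits give the number.
    """
    match = re.fullmatch(r"([A-Za-z]*)(\d+)", doc_number.strip())
    if match is None:
        raise ValueError("unexpected document number: %r" % doc_number)
    return match.group(1), int(match.group(2))


root = ET.fromstring(SAMPLE)

# Patent dataset key: type and number from the publication reference.
patent_type, patent_number = split_doc_number(
    root.findtext("publication-reference/document-id/doc-number"))

# Assignee dataset: organization name, role, and location.
book = root.find("assignees/assignee/addressbook")
assignee = {
    "orgname": book.findtext("orgname"),
    "role": book.findtext("role"),
    "city": book.findtext("address/city"),
    "state": book.findtext("address/state"),
    "country": book.findtext("address/country"),
}

# Citation dataset: (sequence number, cited patent type, cited patent number).
citations = [
    (int(patcit.get("num")),) + split_doc_number(patcit.findtext("document-id/doc-number"))
    for patcit in root.iter("patcit")
]
```

Each extracted row would then be keyed by the patent type and number, in line with the statement above that this combination is associated with every individual dataset.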
USPTO XML | Classes

<classification-national>
  <country>US</country>
  <main-classification>D 1128</main-classification>
</classification-national>

Patent classification can be derived from the main classification field into a class (D1) and subclass (128) resulting in the entry “D 1|128”. Oftentimes, there can be multiple classifications for a patent. If this is the case, we attribute the first classification as primary for the patent.

USPTO XML | Inventors

<applicants>
  <applicant sequence="001" app-type="applicant-inventor" ……>
    <addressbook>
      <last-name>Roach</last-name>
      <first-name>Richard</first-name>
      <address>
        <city>Schaumburg</city>
        <state>IL</state>
        <country>US</country>
      </address>
    </addressbook>
    ……
  </applicant>
  ……
  <applicant sequence="005" app-type="applicant-inventor" ……>
    <addressbook>
      <last-name>Barnes</last-name>
      <first-name>Donald</first-name>
      <address>
        <city>Augusta</city>
        <state>MI</state>
        <country>US</country>
      </address>
    </addressbook>
    ……
  </applicant>
</applicants>

Several applicants, or inventors, are associated with each patent. The USPTO XML file provides basic information on the inventors including their names, general location, and their sequence within each patent. For our example, we are able to determine that the first author or inventor of the patent is Richard Roach from Schaumburg, IL US. The fifth inventor is Donald Barnes from Augusta, MI US. This information is further consolidated and forms the basis of our inventor disambiguation algorithm.

USPTO XML | Patents

<publication-reference>
  <document-id>
    ……
    <date>20081230</date>
  </document-id>
</publication-reference>
<application-reference appl-type="design">
  <document-id>
    ……
    <date>20070730</date>
  </document-id>
</application-reference>

Finally, we are interested in considering patent information to understand the timing of the patent.
The publication reference provides the week (12/30/2008) in which the patent was granted, and the application reference provides the week (7/30/2007) in which the patent was applied for.

A simple search for this patent confirms the information presented within this Appendix.[15] Below, we reproduce the design patent as it appears on the USPTO’s website as of January 7, 2009.

United States Patent D583,526
Roach, et al.
December 30, 2008

Description: Curved trapezoid food product
Claim: We claim the ornamental design for the curved trapezoid food product, as shown and described.
Inventors: Roach; Richard (Schaumburg, IL), Almeida; Helbert (Battle Creek, MI), Anderson; Brian (Augusta, MI), Howrey; Bruce (Battle Creek, MI), Barnes; Donald (Augusta, MI)
Assignee: Kellogg Company (Battle Creek, MI)
Appl. No.: D/282,813
Filed: July 30, 2007
Current U.S. Class: D1/128
Current International Class: 0101
Field of Search: D1/100-130, 199 426/94-95, 89, 128, 144, 293, 543, 549-550, 556, 383, 496, 438-439, 446, 450, 502-504, 512, 516, 619, 805, 808

U.S. Patent Documents
1. D20662 April 1891 Pearson
2. D22990 December 1893 Mackey
3. D31777 October 1899 Fox
4. 3384495 May 1968 Potter, Jr.
5. D212542 October 1968 McCarthy
6. D213945 April 1969 Cooper
7. D219002 October 1970 Gronberg
8. 3545979 December 1970 Ghafoori
9. D247071 January 1978 Neidenberg et al.
10. D268539 April 1983 Hamann
11. D273814 May 1984 Gellman et al.
12. D286919 November 1986 Flockhart
13. D311472 October 1990 Giles
14. D324290 March 1992 Stein
15. D328964 September 1992 Karppinen
16. D343494 January 1994 Thorniley
17. 5366749 November 1994 Frazee et al.
18. D353032 December 1994 Mistretta
19. D376039 December 1996 Pike
20. D395535 June 1998 Reichkitzer
21. 5843503 December 1998 Clanton
22. D403485 January 1999 Clanton
23. D421827 March 2000 Doyle
24. 6120827 September 2000 Rocca
25. D443953 June 2001 Slaboden
26. D445237 July 2001 Boselli et al.
27. D445992 August 2001 Reinhart
28. D446254 August 2001 Azar
29. 6387421 May 2002 Clanton
30. D487951 April 2004 Barry et al.
31. D490590 June 2004 Ferguson et al.
32. D498034 November 2004 Schwartzberg et al.
33. D503771 April 2005 Costa
34. D506302 June 2005 Schwarzberg
35. D516668 March 2006 Costa
36. D518272 April 2006 Schwartzberg
37. D532181 November 2006 Almeida
38. D540507 April 2007 Aleman et al.

Other References
Golden Stoneground Wheat Crackers packaging. cited by examiner.

Primary Examiner: Webster; Robin V
Assistant Examiner: Kearney; Karen E
Attorney, Agent or Firm: Dickinson Wright PLLC

The XML files contain a tremendous amount of detail and can quickly become overwhelming. To better understand the files and their terminology, we approached the USPTO and obtained clarification through correspondence. We provide this clarification in Appendix 2.

[15] Users are able to search for patents:

Appendix 2
Further USPTO XML file clarification

This Appendix is based on correspondence with the USPTO and further clarifies the XML patent file. As of July 18, 2008, the USPTO had not created a formal data dictionary for the XML file, but we have received some clarification. The USPTO recognizes the need for a formal data dictionary and may create one in the near future. The language below is that provided by the USPTO and has not been altered; only the presentation has been changed to remain consistent with the paper.

Table 1 - U.S. Patent Grant and Published Applications Document Numbers: Patent Grant Patent Number

- Design Patents
  - Position 1 – A constant “D” identifying the granted document as a Design Patent.
  - Positions 2-8 – Seven-position numeric, right justified, with a leading zero.
- SIR Patents
  - Position 1 – A constant “H” identifying the granted document as a Statutory Invention Registration (SIR).
  - Positions 2-8 – Seven-position numeric, right justified, with a leading zero.
- Plant Patents
  - Positions 1-2 – A constant “PP” identifying the granted document as a Plant Patent.
  - Positions 3-8 – Six-position numeric, right justified, with a leading zero.
- Reissue Patents
  - Position 1-2 – A constant “RE” identifying the granted document as a Reissue Patent.
  - Positions 3-8 – Six-position numeric, right justified, with a leading zero.
- Utility Patents
  - Positions 1-8 – Eight-position numeric, right justified, with a leading zero.
- X-Series
  - Patents issued between July 31, 1790 and July 4, 1836. They were not originally numbered, but have since been assigned numbers in the sequence in which they were issued.
  - Positions 1-8 – Eight-position, right justified, with a leading “X”.

Table 2 - U.S. Patent Grants and Patent Published Applications Kind Codes

Note: The following 2-position kind codes will be present in the XML <kind> tags of Red Book and Yellow Book. These 2-positions kind codes will also be present on the printed documents with the following exceptions: Reissues will contain a single position “E”, SIR documents will contain a single position “H”, and Designs will contain a single position “S”.

- A1 - Utility Patent Grant issued prior to January 2, 2001.
- A1 - Utility Patent Application published on or after January 2, 2001
- A2 - Second or subsequent publication of a Utility Patent Application
- A9 - Correction published Utility Patent Application
- Bn - Reexamination Certificate issued prior to January 2, 2001.
  - NOTE: “n” represents a value 1 through 9.
- B1 - Utility Patent Grant (no published application) issued on or after January 2, 2001.
- B2 - Utility Patent Grant (with a published application) issued on or after January 2, 2001
- Cn - Reexamination Certificate issued on or after January 2, 2001.
  - NOTE: “n” Represents a value 1 through 9 denoting the publication level.
- E1 - Reissue Patent
- H1 - Statutory Invention Registration (SIR) Patent Documents.
  - Note: SIR documents began with the December 3, 1985 issue
- I1 - “X” Patents issued from July 31, 1790 to July 13, 1836
- I2 - “X” Reissue Patents issued from July 31, 1790 to July 4, 1836
- I3 - Additional Improvements – Patents issued between 1838 and 1861.
- I4 - Defensive Publication – Documents issued from Nov 5, 1968 through May 5, 1987
- I5 - Trial Voluntary Protest Program (TVPP) Patent Documents
- NP - Non-Patent Literature
- P1 - Plant Patent Grant issued prior to January 2, 2001
- P1 - Plant Patent Application published on or after January 2, 2001
- P2 - Plant Patent Grant (no published application) issued on or after January 2, 2001
- P3 - Plant Patent Grant (with a published application) issued on or after January 2, 2001
- P4 - Second or subsequent publication of a Plant Patent Application
- P9 - Correction publication of a Plant Patent Application
- S1 - Design Patent

Table 3 - U.S. Application Series Codes

Code | Filing Dates
02 | Filed prior to January 1, 1948
03 | January 1, 1948 through December 31, 1959
04 | January 1, 1960 through December 31, 1969
05 | January 1, 1970 through December 31, 1978
06 | January 1, 1979 through December 31, 1986
07 | January 1, 1987 through January 21, 1993
08 | January 22, 1993 through January 20, 1998
09 | January 21, 1998 through October 23, 2001
10 | October 24, 2001 through November 30, 2004
11 | December 1, 2004 through December 5, 2007
12 | December 6, 2007 through Current

Design Patents

Code | Filing Dates
07 | Filed prior to October 1, 1992
29 | Filed after October 1, 1992

Note: The Design Series Coded “29” is present in the XML data as “29” and is displayed as a “D” on Patent on the Web.

Table 4 - U.S. Patent Classifications

Class
- A 3-position alphanumeric field right justified with leading spaces.
- Design Patents
  - The first position will contain a “D”.
  - Positions 2 and 3, right justified, with a leading space when required for a single digit class.
- Plant Patents
  - Positions 1-3 will contain a “PLT”
- All Other Patents
  - Three alphanumeric positions, right justified, with leading spaces

Sub-Class
- Three alphanumeric positions, right justified with leading spaces, and, if present, one to three positions to the right of the decimal point (assumed decimal in the Red Book XML), left justified.
- A digest entry as a sub-class would appear as follows:
  - Three positions containing “DIG”, followed by one to three alphanumeric positions, left justified.

Appendix 3
Dataset Data dictionaries

The primary dataset, on which the disambiguation algorithm operates, is the consolidated inventor dataset. The other, supporting datasets either contribute to creating the consolidated inventor dataset or enhance the algorithm. Because of the portability of its file type, we now employ Sqlite3.

Table 1: Primary Dataset – Consolidated Inventor Dataset

This dataset is a consolidation of the original inventor dataset and supporting datasets such as assignee, patent, and classes. The inventor disambiguation algorithm processes the data within this dataset, which makes it the dataset most relevant to this paper. Generated variables are italicized.

Variable | Type | Description
Patent | Text | 8 character alphanumeric identification assigned by the USPTO
Assignee | Text | Name of the owner of the patent
AsgNum | Number | Unique assignee value per the NBER PDP project, PDPASS
AppDate | Text | Week of patent application
AppYear | Number | Year of patent application
GYear | Number | Year of patent grant
City | Text | Primary City of inventor (Standardized)
State | Text | Primary State of inventor (US Only, Standardized)
Country | Text | Primary Country of inventor (Standardized)
Zipcode | Text | Zipcode, relevant only for US Locations. (Standardized)
Lng | Number | Longitude (degrees) of the centroid
Lat | Number | Latitude (degrees) of the centroid
Invnum | Text | Unique Inventor Identifier for specific record
Invnum_N | Text | Unique Inventor Identifier after match algorithm (pre splitting)
Invnum_N_UC | Text | Unique Inventor Identifier after match algorithm (post splitting)
Firstname | Text | Inventor Firstname (includes potentially Middle name and Other name)
Lastname | Text | Inventor Lastname (includes potentially Suffix and Profession)
InvSeq | Number | Patent Author sequence. (1 = lead author)
Class | Text | Primary patent classification (Up to 10, separated by "|"). Secondary classification not included.
LB_Density | Text | ALEX
UB_Density | Text | ALEX
UB_Sparsity | Text | ALEX

Table 2: Supporting Dataset – Assignee Dataset

The assignee dataset has undergone minor updates, mostly to make its character strings conform to basic standards. These standards include removing excess whitespace, removing tags, and translating Unicode characters.

Variable | Type | Description
Patent | Text | 8 character alphanumeric identification assigned by the USPTO
Assignee | Text | Name of the owner of the patent
AsgType | Number | Numerical (1-16) USPTO classification type for owner (values are currently unexplained by USPTO)
AsgSeq | Number | Patent Assignee sequence (1 = primary assignee)
City | Text | Primary City of assignee
State | Text | Primary State code of assignee (US only)
Country | Text | Primary Country code of assignee
NCity | Text | Primary City of assignee (Standardized)
NState | Text | Primary State code of assignee (US only, Standardized)
NCountry | Text | Primary Country code of assignee (Standardized)
NLat | Number | Latitude (degrees) of the centroid
NLong | Number | Longitude (degrees) of the centroid
Nationality | Text | Assignee Nationality (Mostly Blank)
Residence | Text | Assignee Residence (Mostly Blank)
AsgNum | Number | Unique assignee value per the NBER PDP project, PDPASS

Table 3: Supporting Dataset – Citations Dataset

The citation dataset depends primarily on the Patent, Pat_Type, Citation, and Cit_Type. Other variables, such as Cit_Date, Cit_Name, and Kind, are not used and are redundant with fields found in the patent dataset. These fields have been preserved to remain consistent with the USPTO’s structure. The network created from patents and citations is used to increase the number of match pairs determined. No additional variables have been generated.

Variable | Type | Description
Patent | Text | 8 character alphanumeric identification assigned by the USPTO
Citation | Text | Patent that is cited by the defined patent.
Cit_Date | Text | Patent grant date cited by defined patent
Cit_Name | Text | Patent primary inventor surname cited by defined patent
Cit_Kind | Text | Patent kind codes (defined in Appendix 2) cited by defined patent
Category | Text | Cited source of patent (Cited by examiner, other)

Table 4: Supporting Dataset – Classes Dataset

The classes dataset is based on a DVD provided by the USPTO known as CASSIS. We purchased this DVD because classifications are known to change over time, and it is the most comprehensive source that reflects such changes. The primary class and up to three subsequent classes are used to support the disambiguation algorithm. No additional variables have been generated.

Variable | Type | Description
Patent | Text | 8 character alphanumeric identification assigned by the USPTO
Class | Number | General patent classification from the USPTO.
Subclass | Number | More detailed classification from the USPTO
Count | Number | The sequence in which a class exists in the dataset

Table 5: Supporting Dataset – Patent Dataset

The patent dataset contains basic information such as when the patent was applied for and when it was granted. This information lets us place the patents on a timeline, and the application dates are used by the sorting steps within the adjacency matching algorithm.
No additional variables have been generated.

Variable | Type | Description
Patent | Text | 8 character alphanumeric identification assigned by the USPTO
AppDate | Date | Week of patent application
AppYear | Number | Year of patent application
GDate | Date | Week of patent grant status
GYear | Number | Year of patent grant status
Kind | String | Patent kind code (defined in Appendix 2)
Class | String | General patent classification from the USPTO.

Appendix 4
Disambiguation Algorithm Specification Details

Comparison Functions

String Comparison

Because the primary data contained spelling errors and inconsistencies, we employed an approximate matching technique to lessen the effect of typographical differences when comparing character strings. Within the SAS platform, we implemented the Jaro-Winkler method[16] (Herzog, Scheuren, and Winkler, 2007) for string comparisons. This approximate matching method compares the individual characters of two strings through a statistical approach, resulting in a calculated probability that two distinct strings are equivalent (for a detailed description of the Jaro-Winkler method, see Chapter 13 of Herzog, Scheuren, and Winkler, 2007). Presented in the table below are a series of example string pairs and calculated probabilities.

Table 3

The Jaro-Winkler method has a few disadvantages. It is fairly generous in assessing match probabilities. As an illustration, two strings that contain clearly conflicting information, such as “SPEECHWORKS INTERNATIONAL INC” and “TELLME NETWORKS INC”, earn a 74.7% probability match score based on their underlying characters.

[16] A string is defined as a variable type which contains a sequence of alphanumeric/symbolic characters. A character is defined as a single alphanumeric or symbolic element within a string construct.
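To make the string comparison concrete, the following is a minimal Python sketch of the textbook Jaro-Winkler similarity. It is not the SAS implementation used for this paper. The prefix weight of 0.1 and the maximum prefix length of 4 are the commonly used defaults and are assumptions here, so the scores this sketch produces need not match those reported above.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are equal and no farther
    # apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order in the two strings;
    # half of that count is the number of transpositions.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2

    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1, s2, prefix_weight=0.1, max_prefix=4):
    """Jaro similarity boosted for strings that share a common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1.0 - base)


if __name__ == "__main__":
    pairs = [
        ("MARTHA", "MARHTA"),
        ("SPEECHWORKS INTERNATIONAL INC", "TELLME NETWORKS INC"),
    ]
    for left, right in pairs:
        print(f"{left!r} vs {right!r}: {jaro_winkler(left, right):.3f}")
```

Because the score rewards shared characters regardless of meaning, long strings with common suffixes such as “INC” can score well even when they name different firms, which is the generosity described above.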
</div> </div> </div> </div> </body> </html>