Boundary Objects in Archaeological Finds and Environmental Research: Using Bowker and Star’s Concept of the Boundary Object to Analyse and Interpret Disparate Archaeological Legacy Data

Recently the concept of big data has been employed in archaeology to push forward research on very large-scale legacy datasets, often with a developer-funded component; however, relatively little of this effort has focused on artefact and ecofact data. This paper explores the possibility of using Bowker and Star’s concept of the boundary object to manage the issues caused by data scale, complexity, diversity, and variable information standards when attempting to carry out large-scale research on artefacts and ecofacts. The critique of archaeological data standards as it impacts research into artefacts and ecofacts is reviewed. A methodology for the construction and use of a large database of legacy data is described, and a case study on the regional histories of food production/consumption in southern Britain, using datasets derived from the author’s PhD as part of the English Landscapes and Identities (EngLaId) Project (Gosden et al. 2021) at Oxford University is presented.


Introduction
Issues of collection and analysis of very large and complex datasets, often glossed as big data, have become increasingly important in recent years. The exploitation of big data, which has been defined as being 'less about data that is big than it is about a capacity to search aggregate and cross reference large datasets' (Boyd and Crawford 2012: 663), is argued to provide the impetus for a new paradigm, or at least a revolutionary new approach to scholarship (Guldi and Armitage 2014: 111-116).
This movement has overlapped with a vast increase in the amount of available archaeological data generated by the introduction of developer funding for fieldwork in the United Kingdom (Donnelly 2016: 107, Fig. 13; Gosden et al. 2021: 1), which has made possible a number of big data projects in British archaeology (Darvill and Wainwright 1995; Bradley 2007; Rippon et al. 2015; Smith et al. 2016; Smith et al. 2018; Gosden et al. 2021). However, within the ongoing debate generated by these changes, relatively little consideration has so far been given, in comparison to analyses of landscape and settlement, to the importance of large digital datasets of artefacts (objects modified or manufactured according to a set of humanly imposed attributes; Darvill 2002: 25) and ecofacts (natural materials that have been used by humans; Darvill 2002: 129). This pattern is clear when considering the number of volumes largely devoted to landscape and settlement studies produced by the big data projects (e.g., Bradley 2007; Rippon et al. 2015; Bradley et al. 2016; Smith et al. 2016; Gosden et al. 2021). There are, however, important exceptions: see, for example, Hurley (2018), Orton et al. (2014), Holdaway et al. (2019), Reed et al. (2015), Marchetti et al. (2018), Kintigh et al. (2018), Smith et al. (2018), and papers in Allison et al. (2018).
In considering the nature of developer-funded archaeological artefact and ecofact datasets it is tempting to critique the practices of data collection and curation employed, with the aim of promoting a consensus on standards (e.g., Fulford 2017: 281, 363). However, while greater standardisation is undoubtedly important for the ongoing creation of archaeological data, it cannot address the issue of variable data standards employed in legacy datasets. All data are the product of specific and variable historical and social conditions, including changing levels of funding, differing technologies, and changing theoretical concepts, and are therefore contingent and partial (Cooper and Green 2016: 273-274). This being so, and given the impossibility of 'retro-engineering' legacy datasets to fit contemporary data standards, it becomes important to address the issue of how to employ them so that the effort expended in creating them is not wasted. It may, therefore, be helpful to think of such datasets as 'characterful', meaning that they have 'histories and flaws of various kinds', and as providing imperfect but interesting affordances for research (Gattiglia 2015: 114; Cooper and Green 2016: 271).
The question of how to collate such disparate data in ways that are useful for research therefore becomes urgent.
This paper attempts to give one possible answer to that question by applying Bowker and Star's (1999: 297) concept of the boundary object to one such disparate dataset, made up of digital artefact and ecofact legacy data from a range of different sources, gathered as part of the English Landscapes and Identities Project. Boundary objects as defined by Bowker and Star have a crucial role to play in mediating between different 'communities of practice' (Star 2010: 604), such as those represented by the different sub-disciplines of Prehistoric, Roman, and Medieval ceramics, archaeozoology, and archaeobotany. The paper attempts to show how the concept of the boundary object can be used to identify elements of datasets that can be used to enable interoperability, defined as the exchange of information between different datasets. It presents a case study exploring the construction and use of a database created using legacy datasets generated by developer-funded excavations carried out on the route of the High Speed 1 railway (HS1) and along the Thames Valley from Gloucestershire to Essex.
The case study draws on legacy data relating to time periods outside the scope of the study of Roman Britain, namely the Later Prehistoric and Early Medieval periods.
However, the long-term view enabled by this approach is crucial. It demonstrates the continued importance to the population of Roman Britain of regional patterns of eating, drinking, and food production established, at the latest, by the Iron Age, and the continuing influence of these regional patterns into the Early Medieval period, and thereby validates the use of the boundary object concept.

The Critique of Digital Data and Digital Standards in Archaeology
Until recently attempts to disseminate large digital artefact and ecofact databases were rare and research based on such datasets was therefore uncommon. The following section presents a selective history of digital artefact and ecofact research and the methodological critiques to which this research has given rise. Huggett (2012: 543) notes that archaeology, in common with the other humanities and social sciences and in contrast to most natural science disciplines, is reticent about discussing data, preferring instead to focus on interpretation and narrative, a point highlighted by Faniel et al. (2020: 10-11) in their ethnographic study of data integration practices on four archaeological excavation projects. This observation is qualified, however, by the existence of a specialist subfield of archaeological data management research that exists in a degree of isolation from mainstream archaeological discourse and calls for the development of cyber infrastructure for archaeology, and a focus on the development of common standards for archaeological data (e.g., Kintigh 2006: 567-568; Kintigh et al. 2018: 31).
In contrast to a perceived focus on the gaps in and inadequacies of archaeological data management, Cooper and Green (2016) have developed theoretical and methodological ideas for a more nuanced understanding of archaeological big data, drawing on the extensive literature on standards and data in other disciplines (e.g., Bowker and Star 1999). In doing so, they illustrate their argument through an empirical case study developed in the course of the EngLaId project. This case study highlights a critique that has developed over the course of the last two decades amongst archaeological data specialists, which has foregrounded a perceived lack of digital infrastructure, the need to develop standardized metadata and digital ontologies, and a problematic overlap in datasets. In addition, this critique has addressed the quality of archaeological digital data, arguing that variability of recording, diversity in the format and structure of databases, and attitudes to data accessibility all present barriers to the use of digital data for research, and therefore limit the interpretative potential of archaeological big data (Kintigh 2006: 570; Roskams and Whyman 2007; Huggett 2012: 542; Evans 2013: 31; Kintigh et al. 2018: 31; Marchetti et al. 2018: 461). While accepting much of the substance of this critique, Cooper and Green (2016: 294-299) also point out the importance of working with data as they are and the opportunities offered by 'flawed' datasets for the understanding of disciplinary histories and archaeological working practices.
Since 2016 the development of tools for the analysis of digital archaeological data has continued, notably with several successful attempts to construct databases employing flexible and open standards, which aggregate archaeological data generated by different researchers (Reed et al. 2015; Kintigh et al. 2018; Marchetti et al. 2018; Holdaway et al. 2019). The published output of these projects includes descriptions of the architecture of databases designed to share data, including finds data, generated by several research teams and, in the case of papers by Kintigh et al. (2018: 32-39) and Marchetti et al. (2018: 461-465), to be used by researchers outside of the immediate research milieu of the authors. They therefore necessarily engage with the problems of data standards and interoperability of datasets, and all arrive at a similar solution: the construction of a new database to which outside researchers can be invited to contribute their data, in the process either adapting those data or mapping them to a set of predetermined, if flexible, ontologies. In developing their open standards/ontologies and the workflows necessary to map data to them, the authors of all four papers have arguably been engaged in the creation of boundary objects (Star 1989; 2010: 602-603; Star and Griesemer 1989: 393; Bowker and Star 1999: 15-16). This concept, first discussed in an archaeological context by Jones (2002: 75), has strong affinities with the concept of a trading zone (Galison 1999: 146-147), which is based on the idea of a common language, or creole, used by different communities of practice.
Bowker and Star define boundary objects as: 'those objects that both inhabit several communities of practice and satisfy the informational requirements of each of them'. They argue that boundary objects are 'customisable' and yet retain 'common identities across settings' (1999: 16). This is achieved by 'allowing the (boundary) objects to be weakly structured in common use' and by 'imposing stronger structures in the individual-site tailored use' (Bowker and Star 1999: 16). For Bowker and Star, boundary objects proliferate in human societies. However, they find a particularly rich source of examples in the International Classification of Diseases (ICD), a system for the classification of disease published by the World Health Organization that institutes a global system of alphanumeric codes. Despite its global reach the ICD has been widely adapted to suit local circumstances.
The original context for the creation of the boundary object concept was the field of science and technology studies, where it enabled its creators to understand the ways in which different communities of practice worked together within a given scientific field (Star 2010: 604). This clearly makes it relevant to understanding the creation and use of information schemas in archaeology, which is a complex scientific field with its own communities of practice, and therefore worth unpacking in more detail. Firstly, the sense in which the words object and boundary are used is important. A boundary is conceived of as a shared space, where different communities of practice interact (Star 2010: 602-603); an object has both a computer programming and a material aspect. It is something 'people or... other objects and programs act toward and with' (Star 2010: 603). Star (2010: 602) identifies three dimensions to boundary objects: 'interpretative flexibility, the material/organizational structure of different types of boundary objects and the question of scale/granularity'. Within this conceptual structure interpretative flexibility, whereby a given category may mean different things within different but related research milieus, is related to different kinds of work practices and informatics structures, essentially how particular categories are differently coded for within different milieus. These questions lead in turn to further questions, concerning how researchers tack between tightly structured and loosely structured definitions of boundary objects as they go about their work. The operation of boundary objects in the above ways can, and often does, lead to the construction of what Bowker and Star term boundary infrastructures (Bowker and Star 1999: 286), which are discussed further below.
Within archaeology the operation of the three dimensions of boundary objects can be widely observed. However, one area with direct relevance to the subject of this paper is the operation of ceramic vessel typologies. For example, there are no general ceramic typologies describing both fabric and vessel forms for Roman pottery. Instead, different regional typologies intersect (Webster and Dannell 1984; Perrin 1999; Young 2000) and must be made interoperable to facilitate comparison between and within assemblages.
Ceramic typologies are therefore made up of hierarchical schemas capable of broad-brush identification and with the potential for the addition of almost infinite detail.
By way of example, we may take the National Roman Fabric Reference Collection, which divides Roman Samian fabrics into south, central, and eastern Gaulish, further subdividing these into the products of specific groups of kilns, e.g., Les-Martres-de-Veyre and Lezoux (Tomber and Dore 1998). For most fieldworkers, the identification of material as south or central Gaulish is sufficient to establish chronology or identify the potential status of a site. For Samian specialists interested in the development of the industry, more specific detail may be required.
Ceramic typologies, therefore, represent a classic example of the operation of boundary objects/infrastructures: their use leads to the creation of specific workflows and informatics structures. The process of tacking back and forth between well-structured and weakly-structured aspects of boundary objects that results from these workflows/infrastructures can be illustrated by the comparison of any published pottery report, with the notes and data records on which the report is based. It should, therefore, be clear that the concept of the boundary object is highly applicable to developing improved ways of integrating large and diverse archaeological datasets.
The recent creation of very large-scale databases in archaeology has had a strong orientation toward the future (Cooper and Green 2016: 298). Because these projects have been concerned with how best to organize newly created data, the incorporation of legacy datasets has been less of a priority. So, although in many ways the creation of these databases offers a solution to the problems explored in this paper, there is also a gap in the model: namely, it remains very difficult (Kintigh et al. 2018: 39) to incorporate large amounts of legacy data from disparate sources. This problem is particularly acute in the field of Roman pottery studies, where recent attempts at big data synthesis of material from rural Roman Britain have been forced to neglect pottery to a degree, because of the difficulty of gathering comparable quantified data (Allen et al. 2017: 267; Fulford and Holbrook 2018: 14). This gap is the main concern of the present paper and, while not offering anything resembling a complete solution, the following sections go on to explore the problems created by it and to relate it to current research.

Artefacts and ecofacts and the critique of digital standards
Two kinds of archaeological data that have been relatively neglected within the above critiques are digital artefact and environmental data. For example, while Cooper and Green (2016: 275) incorporate two specific collections of artefact data into their analysis, the focus of their commentary is on attitudes to data in general. The authors of more recent papers on digital data standards focus heavily on the collection and sharing of data in the field (e.g., Marchetti et al. 2018). However, users of artefact and ecofact data face, if anything, greater problems of variation in standards and the application of digital ontologies than fieldworkers. This is demonstrated by the history of attempts to impose centralised standards on data collection for archaeological ceramics in the United Kingdom (PCRG 1997; Irving 2011; Barclay et al. 2016), which, despite intensive effort on the part of ceramicists, do not seem to have resulted in datasets that can be easily combined to facilitate research across multiple assemblages (Allen et al. 2017: 281; Fulford and Holbrook 2018: 14).
It is clear that while a critique of digital standards for archaeological datasets has emerged over the course of the last two decades, this critique has neither engaged widely with debates in informatics and science and technology studies (e.g., Bowker and Star 1999), nor been widely engaged by archaeological finds and environmental specialists.
As will be seen below, this has consequences for the use of finds and environmental data in the syntheses of developer-funded data that have emerged under the banner of archaeological big data.

Archaeological Big Data: Artefacts and Ecofacts
In parallel to the critique of data standards and digital ontologies in archaeology, several projects seeking to deploy the immense archive of data amassed by developer-funded archaeologists in north-western Europe since 1990 (Bradley et al. 2016: 38) have been completed since the turn of the twenty-first century. Arguably the first of these was Bradley's (2007) survey of British and Irish prehistory using developer-funded data.
This project was followed in the 2010s by the Fields of Britannia Project (Rippon et al. 2015), the Roman Rural Settlement Project (Smith et al. 2016;Smith et al. 2018), the English Landscapes and Identities Project (Gosden et al. 2021), and Bradley et al.'s study of the later prehistory of north-western Europe (Bradley et al. 2016). These projects were all primarily concerned with using developer-funded data to study past landscapes, and while seeking to deploy archaeological data on a large scale, and in some cases to use artefact and environmental data, they were for the most part developed in isolation from the critique of data standards and cyber infrastructure explored above. Nor did these projects, with a few important exceptions (e.g., Fulford and Holbrook 2018), devote much space to in-depth discussion of methodological issues. However, it is clear from their results that all grappled with the issues described above, in particular, although not exclusively, when dealing with data derived from artefact or biological assemblages.
Despite over a decade of wrestling with issues of data standardisation and the use of developer-funded data for archaeological research, it is felt by many that little progress has been made in developing tools for the exploitation of large-scale digital data, although this has recently begun to be addressed for certain categories of data. The remainder of this paper argues that the different communities of practice that study and produce data on ceramics, animal bones, and charred plant remains in contemporary archaeology commonly deploy boundary objects tacitly. Furthermore, if boundary objects can be identified within different digital datasets, then they can be used to make those datasets interoperable. This is not to argue in Bowker and Star's (1999: 313) terms that 'the chimera of a totally unified and universally applicable information system' should be replaced with 'the chimera of a distributed, boundary object driven information system'.
The identification of boundary objects is key to the stitching together of different digital datasets and provides a potential alternative to the common practice of constructing entirely new databases, with bespoke standards, from scratch.

The concept of the 'boundary object' in ceramic, archaeozoological, and archaeobotanical studies
In ceramic studies, boundary objects may be equated with the commonly deployed categories of jar, bowl, cup, etc. These are used to categorize vessels and are common to the different sub-disciplines of Prehistoric, Roman, Medieval, and Post-Medieval ceramics, while being 'tailored' by different researchers with the use of specific type codes, including sub-discipline specific types such as mortaria. A good example of a boundary object can be seen in the use of the Dragendorff typology by Romanists. Roman pottery specialists use this typology to classify a type of mass-produced Roman tableware known as Samian ware (Webster and Dannell 1984).
Within the typology, vessel forms are given numbers, for example: Drag. 36 or Drag. 37 (Webster and Dannell 1984: 46-47). Certain vessel forms are common within Samian assemblages and can be recognised as having influenced the shape of vessels in regional industries, e.g., the Nene Valley industry (Perrin 1999: Fig. 102). While specialists do not use the Dragendorff typology to formally classify grey wares, they do note the presence of Dragendorff-like forms in the comments fields of their databases (Edward Biddulph pers. comm.). Thus, the type Drag. 36, for example, crosses the boundary of its originating community of practice (Roman Samian specialists) into the broader community of practice represented by Roman pottery specialists in general and fulfils all three of Star's (2010: 602) dimensions of a boundary object (see above).
In archaeozoology, boundary objects are arguably more prevalent (see Kintigh et al. 2018: 33). Indeed, the species classification sheep/goat provides a good example of an archaeozoological boundary object in that within the community of specialist archaeozoologists it is commonly deployed because of the difficulties of distinguishing between highly fragmented sheep and goat bones (Martyn Allen pers. comm.). However, both among specialists and the wider community of archaeologists engaged in more synthetic work it is often used as shorthand for sheep, which it is widely assumed to represent (e.g., Cool 2006: 87; Maltby 2016: 795-797).
In archaeobotany, boundary objects might be identified at the level of species and sub-species such as emmer wheat, spelt wheat, and barley. However, a more tailored approach may be taken within species and sub-species, e.g., two-rowed and six-rowed hulled barley, which are used by some specialists, but can be assimilated to the broader category of hulled barley, or simply barley, when aggregating datasets produced by different specialists. Indeed, several of the big data projects described above, including the Roman Rural Settlement Project and the Fields of Britannia Project, used these kinds of information schemas to order their data.
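The assimilation of tailored categories to broader shared ones described above can be sketched in a few lines of Python. This is a minimal illustration rather than the schema used by any of the projects cited: the `PARENT` table, the function names, and the counts are all hypothetical, and only the barley hierarchy itself is taken from the text.

```python
from collections import Counter

# Hypothetical hierarchy: each term points to its broader parent category.
# Terms at the top of the hierarchy have no parent (None).
PARENT = {
    "two-rowed hulled barley": "hulled barley",
    "six-rowed hulled barley": "hulled barley",
    "hulled barley": "barley",
    "barley": None,
    "emmer wheat": None,
    "spelt wheat": None,
}


def ancestors(term):
    """Return the term followed by each successively broader category."""
    chain = [term]
    while PARENT.get(term) is not None:
        term = PARENT[term]
        chain.append(term)
    return chain


def roll_up(term, shared_terms):
    """Return the most specific category in the chain that all datasets share."""
    for candidate in ancestors(term):
        if candidate in shared_terms:
            return candidate
    raise KeyError(f"no shared category found for {term!r}")


def aggregate(records, shared_terms):
    """Sum counts from several specialists under the shared categories."""
    counts = Counter()
    for term, n in records:
        counts[roll_up(term, shared_terms)] += n
    return counts


# Invented counts, for illustration only.
specialist_a = [("two-rowed hulled barley", 12), ("six-rowed hulled barley", 30)]
specialist_b = [("barley", 55), ("emmer wheat", 8)]
shared = {"barley", "emmer wheat", "spelt wheat"}
combined = aggregate(specialist_a + specialist_b, shared)
```

The finer identifications remain in the source records, so an analysis restricted to datasets that distinguish two-rowed from six-rowed barley can still use them; only the aggregated view is coarsened.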
In order to explore the use of boundary objects in the context of archaeological artefact and ecofact data in more depth, the following sections describe the collection of data and the construction of a database designed to aid the investigation of histories of food production and consumption during the Later Prehistoric, Roman, and Early Medieval periods in the Thames Valley and on the route of HS1 in Southern England. This is presented primarily as a case study on the use of boundary objects in the analysis of specialist data and so does not go into detail on the nature of food production and consumption.

Case Study: Food Production and Consumption in the Thames Valley and on the Route of HS1
The nature of the case study areas
The datasets discussed below were collected from three case study areas in Southern England (Figure 1).

The Thames Valley and HS1 case study areas compared
The three case study areas provide contrasting environments for analysis, in terms of geology, topography, settlement record, and the nature of fieldwork carried out on them.
The Upper Thames Valley is environmentally homogeneous, with a largely low-lying and flat topography overlain by deposits of river sands and gravels (Lambrick and Robinson 2009: 17-20).

The settlement records of the case study areas
The Upper Thames Valley was dominated throughout the period 1500 BC-AD 1086 by extensive and shifting rural settlements revealed by large-scale gravel quarrying (Booth et al. 2007; Lambrick and Robinson 2009) with relatively little evidence for differences in social status, e.g., the multi-period settlement at Yarnton in Oxfordshire (Hey 2004; Hey et al. 2011). In contrast, the Middle Thames Valley appears extensively [...] scope for excavating multi-period settlements with shifting foci was somewhat lower than in the latter two case study areas.
The environmental and archaeological contrasts between the case studies are clearly relevant to the nature of the find assemblages recovered from them. The nature of the soils may have influenced the choice of crops and animals, while woodland cover may also have affected the choice of agricultural regime. Finally, the range of settlement functions and status will have influenced the amount of functional variation in ceramic assemblages (Evans 2001). Clearly the nature of the excavations and fieldwork methodologies (including sampling strategies) in the different case study areas has also affected our ability to understand the nature of sites, the kinds and quantities of artefact and biological data collected, and the nature of the archives produced from them (Donnelly 2016: 85, 139).

The Database
The data gathered for the project were entered into a single relational (FileMaker) database. The database comprises 16 linked tables: seven containing primary data on sites, contexts, and artefacts, and nine containing supporting data, including lists of pottery types, animal and plant species, and translations of the different dating schemes into a single overarching scheme. The primary data tables store the data, while the supporting tables make the primary tables interoperable.
A schematic representation of the structure of the database is shown in Figure 2. The following text concentrates on the function of supporting tables, including (briefly) the lookup instructions and calculations which some contain. Data comprising site data, context data, ceramic data, animal bone data, and charred plant remains were entered into the database. These data are split into seven tables, with charred plant remains divided into two tables (Ctx Plant Remains and Ctx MOLA plant remains). Information on the area that the data come from is stored in a separate table labelled Regions.
The tables illustrated in the lower two rows of Figure 2 all contribute to enhancing the interoperability of the tables in the top three rows. Concordance tables translate alphanumeric codes encountered in the primary data into standardized terms.
For example, the context phasing concordance table contains two fields, context phase and database phase, which reduce 288 different phasing terms from the primary data to 29 terms. The database phase field, having been imported from the phasing concordance table into the primary data tables, sorts the data by period for the purpose of analysis. Meanwhile, the Ctx_pot_spot_date and Ctx_pot_dates_MOLA tables translate pottery spot dates and the end dates assigned to some MOLA contexts into the same 29 phasing terms.
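The role of a concordance table of this kind can be illustrated with a short Python sketch. It is not a reproduction of the FileMaker lookups used in the project: the field names follow the two fields named above, but the phasing codes, the sample contexts, and the function names are hypothetical, and the case-insensitive matching is an assumption made for the example.

```python
import csv
import io

# Hypothetical concordance rows: many source terms map to one database phase.
CONCORDANCE_CSV = """context_phase,database_phase
LIA,Late Iron Age
Late Iron Age,Late Iron Age
ERB,Early Roman
Early Roman,Early Roman
RO1,Early Roman
"""


def load_concordance(text):
    """Read the two-column concordance into a dictionary keyed on the source term."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["context_phase"].strip().lower(): row["database_phase"] for row in reader}


def apply_concordance(contexts, concordance):
    """Attach a database phase to each context; keep unmatched terms for review."""
    matched, unmatched = [], []
    for ctx in contexts:
        key = ctx["context_phase"].strip().lower()
        if key in concordance:
            matched.append({**ctx, "database_phase": concordance[key]})
        else:
            unmatched.append(ctx)
    return matched, unmatched


concordance = load_concordance(CONCORDANCE_CSV)
contexts = [
    {"context_id": "A1", "context_phase": "LIA"},
    {"context_id": "B7", "context_phase": " early roman "},
    {"context_id": "C3", "context_phase": "Phase 4b"},
]
matched, unmatched = apply_concordance(contexts, concordance)
```

Returning the unmatched contexts separately, rather than discarding them, keeps the terms that still need a concordance entry visible to whoever maintains the table.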
The sequence of calculations carried out in order to arrive at a phase for any given context is complex because it can involve choosing among contradictory dates for the same context. The process is illustrated in Figures 3 and 4 below. As this brief summary demonstrates, the database brings together digital data from different archives with disparate digital ontologies but common boundary objects.
For example, the concept of phase can be represented as a period through different alphanumeric codes or as a date range using calendar dates. Because such representations can be mapped onto one another, large amounts of legacy data can be uploaded quickly using CSV files, without the need for the entry of large amounts of data from published sources or the creation of bespoke digital ontologies. However, the collection of data using these methods inevitably incorporates the flaws present in the original data. The resulting database is therefore limited by the questions asked and methods used by the originators of the data, as well as those of the creator of the database.
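The point that a phase may arrive either as an alphanumeric period code or as a calendar date range can be sketched as follows. This is an illustrative simplification and not the procedure shown in Figures 3 and 4: the three broad periods stand in for the 29 database phasing terms, the AD 43 boundary and the period codes are assumptions introduced for the example, and assigning a date range to the period with the greatest overlap is only one possible rule.

```python
# Illustrative period boundaries only (negative years are BC). The dates
# 1500 BC, AD 410, and AD 1086 appear in the text; AD 43 is an assumption.
PERIODS = [
    ("Later Prehistoric", -1500, 43),
    ("Roman", 43, 410),
    ("Early Medieval", 410, 1086),
]

# Hypothetical period codes of the kind found in legacy datasets.
CODES = {"LPR": "Later Prehistoric", "RO": "Roman", "EMED": "Early Medieval"}


def phase_from_date_range(start, end):
    """Return the period sharing the most years with the range, or None."""
    if start > end:
        raise ValueError("start year must not be later than end year")
    # Treat a single-year date as a one-year range so that it can overlap.
    end = max(end, start + 1)
    best_name, best_overlap = None, 0
    for name, p_start, p_end in PERIODS:
        overlap = min(end, p_end) - max(start, p_start)
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    return best_name


def phase_from_code(code, codes=CODES):
    """Look up a period code, ignoring case and surrounding spaces."""
    return codes.get(code.strip().upper())


def database_phase(record):
    """Use the period code when one is present, otherwise the date range."""
    if record.get("period_code"):
        return phase_from_code(record["period_code"])
    return phase_from_date_range(record["start"], record["end"])
```

Under these assumptions, `database_phase({"period_code": "ro"})` and `database_phase({"start": 50, "end": 120})` both resolve to the same shared term, which is the sense in which phase acts as a boundary object between differently coded datasets.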
This section has given a simplified outline of how the database is structured to answer questions on the regional histories of food production and consumption. The following section presents a brief overview of those histories contrasting the three regions in Later Prehistory, the Roman period, and the Early Medieval period, drawing out aspects of the analysis.

Case Study: The Regional Production and Consumption of Cereals, Meat, and Ceramics in Iron Age, Roman, and Early Medieval Central Southern England
The analysis of data on this scale reveals subtle differences of emphasis between case study areas in the varieties of cereals and animals grown or raised and eaten, which in certain cases persist from the beginning of the Later Prehistoric period around 1500 BC until at least the end of the Roman period in around AD 410 and perhaps beyond.
These regional patterns of production and consumption only changed significantly with the rise of new combinations of crops and an emphasis on different combinations of animals towards the end of the study period, beginning in the seventh century AD (McKerracher 2016: 98). Over the same period there were also subtle regional differences in the production and consumption of ceramic vessels, which followed an alternating pattern of diversity and homogeneity in vessel form that, apart from its regionality, was seemingly unrelated to the differences seen in crops and animals. The following summary emphasizes the Roman period data.
Analysis shows that in parts of central Kent transected by the HS1 railway there was a subtle, but very definite emphasis upon emmer wheat alongside other crops (Figure 5).
The analysis reveals an emphasis on emmer wheat that, while reduced from a Later Prehistoric peak, persists into the Roman period, indicating a long-term regional interest in this particular variety of crop. This preference for emmer wheat in Kent is also shown by recent work carried out as part of the Roman Rural Settlement Project by Lodwick (Allen and Lodwick 2017: 155), who also demonstrates a preference for emmer in parts of western and south-western England and the Welsh Marches.
The pattern revealed in animal husbandry and consumption is different (Figure 6).
In the parts of Kent transected by HS1 in Later Prehistory, in the same areas showing an emphasis on emmer wheat, the analysis shows a preference for sheep/goat over cattle.
However, this emphasis did not continue into the Roman period, when cattle were preferred to sheep/goat on the route of HS1 and in the Middle and Lower Thames Valley.
The ceramic data (Figure 7) show cyclical variation over time between assemblages dominated by suites of cooking and communal eating vessels such as jars and those which also have a significant element of individual drinking and eating vessels, such as dishes, cups, and beakers. The Late Iron Age and Roman periods are characterized by assemblages with a wide range of vessel types, including several types of vessels designed for eating and drinking and other more specialist vessels for food processing, such as mortaria. However, regional variation within the Roman period is also apparent. For example, the part of the Upper Thames Valley assemblage that does not comprise jars is dominated by dishes and bowls to a greater extent than the HS1 transect through Kent, which has assemblages with a greater proportion of drinking equipment.

Patterns of variation in ceramic repertoires
The broad patterns of regional variation in the ceramic analysis raise further questions, for example that of the varying regional importance of drinking in southern England from Later Prehistory through to the Early Medieval period. This question is highlighted through the recognition of a particular boundary object in the dataset: the category of cup, attached to various forms of ceramic vessels in databases constructed by members of different ceramic specialist communities. Drinking alcohol in a ceremonial or ritual context has been viewed as important to certain communities at particular times in the period concerned, particularly at the beginning of the Roman period (Cool 2006: 163). The evidence collected during this project suggests an even more nuanced picture, in which drinking was more important in some regions than in others. It is referenced through the deposition of cups from the Bronze Age to the Early Medieval period in the South East, but is less in evidence in the Upper Thames Valley, even in the Roman period, when cups of various kinds are common in the wider ceramic repertoire.
Evidently, the conclusions drawn from this analysis, which are presented in summary form, are provisional. They depend to a degree on the way in which the data, presented here at a very broad scale, have been divided up, and on variation in the quantities of data available from different regions. In this analysis, data from Late Iron Age contexts have been included in the Roman period and contexts dating from the mid-fifth century onwards have been included in the Early Medieval period. It is possible that both of these decisions could have had the effect of emphasising differences in the data at the expense of continuity, by assigning data with 'Roman' or 'Early Medieval' characteristics, such as Late Iron Age Gallo-Belgic pottery or free-threshing wheat, to the Roman period rather than to Later Prehistory or the Early Medieval period. However, the fact that the patterns observed reflect varying degrees of continuity rather than radical difference gives confidence that the chronological boundaries selected are not completely arbitrary. Similarly, large variations in the numbers of samples available between regions, such as the Middle Thames Valley and Kent, which are partly related to the differing nature of archaeological investigations (Donnelly 2016: 85, 139), may have resulted in arbitrary patterns generated from very small datasets being compared to more robust patterns from larger ones. It is therefore reassuring to note that there are strong similarities between the regional patterns seen in this study and those seen in others (e.g., McKerracher 2016; Lodwick 2017), even when large datasets are being compared against smaller datasets. Examples include the dominance of wheat and barley over oats between the regions (Figure 5) and the salience of emmer wheat in the south-east.
Nevertheless, it should be remembered that this analysis is intended to be illustrative of the methodological and theoretical approach that is the subject of this paper and it is hoped that it will be possible to publish a more detailed analysis of the data elsewhere in the future.
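The dependence of the results on the chronological grouping described above can be examined by recounting the same records under an alternative grouping rule. The following is a minimal sketch only: the phase labels, the function name `count_by_period`, and the records are invented for illustration and are not drawn from the project database. Only the rule that Late Iron Age contexts are counted as Roman is taken from the text.

```python
from collections import Counter

# Hypothetical phase labels; only the Late Iron Age rule is taken from the text.
BASE_GROUPS = {
    "Middle Iron Age": "Later Prehistory",
    "Late Iron Age": "Roman",  # grouping used in this analysis
    "Roman": "Roman",
    "Early Medieval": "Early Medieval",
}


def count_by_period(phases, late_iron_age_as="Roman"):
    """Count records per broad period under a chosen Late Iron Age rule."""
    groups = {**BASE_GROUPS, "Late Iron Age": late_iron_age_as}
    return Counter(groups[phase] for phase in phases)


# Invented records, for illustration only.
phases = ["Middle Iron Age", "Late Iron Age", "Late Iron Age", "Roman", "Early Medieval"]
as_analysed = count_by_period(phases)
alternative = count_by_period(phases, late_iron_age_as="Later Prehistory")
```

Comparing `as_analysed` with `alternative` indicates how far a pattern rests on the position of the period boundary rather than on the data themselves.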
Having discussed the kinds of analysis that can be carried out using the methodology outlined in this paper and having presented an analytical case study, the following section moves on to consider the wider archaeological implications of the method.

Discussion: The Implications of the Concepts of Boundary Object and Boundary Infrastructure for Finds and Environmental Archaeology
Having presented a case study on the analysis of a specific large dataset, this section moves on to bring out the implications of the analysis by focusing on the identification of the tacit use of boundary objects and boundary infrastructures as defined by Bowker and Star (1999: 297) to structure the data.
Bearing in mind the definition of a boundary object as 'an object for classification that spans more than one community of practice and satisfies the informational requirements of each of them' (Bowker and Star 1999: 297), it follows that the datasets discussed above can be seen as tacitly constructed around a number of boundary objects/infrastructures, if it is accepted that the sub-disciplines of Prehistoric, Roman, and Medieval ceramics, archaeozoology, and archaeobotany represent different communities of practice. In chronological terms, these include the concepts of Later Prehistory, the Roman period, and the Early Medieval period, all of which are subdivided in different ways by different specialist communities and are chronologically elastic, but also recognisable by different period specialists.
Whether an object counts as a boundary object depends on whether it is employed by scholars working on more than one of these chronological periods or, in the case of ceramics, archaeobotany, and archaeozoology, on whether it is employed by scholars who do not belong to one of these communities of specialists. Equally, given that a boundary object has been defined as 'a sort of arrangement that allows different groups to work together without consensus' (Star 2010: 602), it becomes clear what a boundary object is not: any object that is not used by more than one specialist community, e.g., the categories of two- or six-rowed hulled barley. In ceramics the most common boundary objects are articulated around functional vessel classes: jars, bowls, cups, and flagons or jugs, as well as several less 'translatable' forms such as amphorae.
Importantly, most of these classifications, and especially the first three, allow for a common understanding between different specialist communities, whilst also allowing for a proliferation of sub-types for local use. In archaeozoology and archaeobotany the most commonly deployed boundary objects are located at the level of genera or species, so that all the datasets tended to deploy the genus or species classifications of cattle, sheep/goat, horse, pig, wheat, barley, and oat, but to employ more local and contingent classifications at the sub-species level.
Boundary objects are articulated by what Bowker and Star refer to as boundary infrastructures. The definition of boundary infrastructures is complex (Bowker and Star 1999: 34-35; Star 2010: 611) but can be glossed as the assemblage of work processes and tools that allow knowledge work to be carried out. Boundary infrastructures, which assemble boundary objects together, both enable the production of knowledge and constrain it along particular paths. This is significant when thinking about the role that digital developer-funded data has in the production of archaeological knowledge, as it suggests that the use of digital data at a large scale will inevitably shape the kinds of knowledge that archaeologists produce. In the examples given above, boundary infrastructure incorporates abstract schemes of classification such as ceramic typologies and the alphanumeric codes used to translate the typologies into digital form, but also the physical reference collections of pot sherds and animal bones which researchers use to relate their specimens to the typologies, the microscopes they use to examine the details on which classification depends, and database architectures which enable the recording of certain kinds of information and exclude others.
The presence of these boundary objects and infrastructures structuring the datasets described above allowed the construction of a database containing a vast amount of data from multiple organisations, representing the work of many individuals. The challenge lay in isolating them and attaching common identifiers to them. Once achieved, however, this resulted in a database that provides a very clear idea of the 'big picture' on a regional scale. Admittedly, it is more difficult to obtain a nuanced picture at the contextual level because of the sheer volume of data involved. However, these problems are inherent in using data on a very large scale and can be understood with reference to the concept of boundary infrastructure. Big data requires and develops a particular kind of boundary infrastructure, one that tends to favour simplified, standardised information schema that can be easily coded for use with commercially available databases and GIS software packages. Such boundary infrastructure more easily facilitates the production of 'big picture' analysis than the production of small-scale, nuanced understandings. For example, digital recording tends to exclude the nuance found in the comment sections of paper ceramics recording sheets, which might sometimes include sketches of vessel profiles. Arguably, exclusion of this kind of data can steer analysis toward a more general analysis based on broad vessel class. However, it is equally arguable that the 'big picture' analysis achieved in this project was worth the exclusion of a more nuanced approach, so long as it is remembered that balancing attempts need to be made to produce a more detailed understanding in other contexts.
It is important to note that the above is not an attempt to argue that the effort to create common standards and infrastructures should be abandoned, but rather that efforts at agreeing on common standards should be focused on boundary objects: the jars, cattle, wheat, etc. Additional effort should be made to devise commonly accepted digital identifiers for those objects, while allowing a multiplicity of local 'standards' and modifications to flourish.
Additionally, the use of big data is bound up with the role of openness and the public availability of data, both for researchers and others. The alternative, namely the aggregation of data within research institutions, erodes trust in the knowledge created by these disciplines and deprives researchers of the opportunity to maximise the value of the data they themselves create. There is, therefore, a role for the concepts of the boundary object and of boundary infrastructure here, in dealing with the ethical risks inherent in big data. The interpretative flexibility which boundary objects offer can facilitate a tacking back and forth between ill-structured and well-structured aspects of the data, which supports the analysis of 'the big picture' without necessarily obfuscating the presentation of local interpretations. However, what must be resisted in this approach is the hardening of boundary objects into standards, as this is the point at which they lose their flexibility (Star 2010: 613-614).
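The arrangement proposed here, shared identifiers for boundary objects alongside untouched local classifications, can be illustrated with a minimal sketch. Everything named below is an assumption made for illustration: the dataset names, the local codes, the identifier strings, and the function `add_shared_class` are hypothetical and do not reproduce the structure of the project database.

```python
from collections import Counter

# Hypothetical local codes from two specialist datasets, each mapped to a
# shared boundary-object identifier. Codes with no agreed equivalent are
# simply left out of the mapping.
LOCAL_TO_SHARED = {
    ("dataset_a", "JAR-3B"): "jar",
    ("dataset_a", "CUP-1"): "cup",
    ("dataset_b", "necked jar"): "jar",
    ("dataset_b", "carinated cup"): "cup",
}


def add_shared_class(record):
    """Return a copy of the record with a shared class added.

    The local code is kept unchanged; unmapped codes receive None.
    """
    key = (record["source"], record["local_code"])
    return {**record, "shared_class": LOCAL_TO_SHARED.get(key)}


# Invented records, for illustration only.
records = [
    {"source": "dataset_a", "local_code": "JAR-3B"},
    {"source": "dataset_b", "local_code": "necked jar"},
    {"source": "dataset_b", "local_code": "carinated cup"},
    {"source": "dataset_b", "local_code": "unclassified form"},
]
harmonised = [add_shared_class(r) for r in records]
totals = Counter(
    r["shared_class"] for r in harmonised if r["shared_class"] is not None
)
```

Because the shared identifier is added as an extra field rather than replacing the local code, 'big picture' counts can be drawn from `shared_class` while each specialist community's own sub-types remain available for local use.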

Conclusions: How to Organize Digital Finds and Environmental Data in Developer-Funded Archaeology for Research
It is clear, both from the results of the recent big data initiatives in UK archaeology and elsewhere and from the more modest findings of the PhD project on which this paper is based, that large and complex archaeological datasets can be successfully drawn on to generate new and interesting 'big picture' understandings of the past and to tease out previously unseen patterns in the data. However, the emphasis on 'characterful data' (Cooper and Green 2016: 294), which foregrounds the local, contingent, and partial nature of archaeological data, suggests a desire and opportunity to go beyond the big picture, which has only been partially realised. This desire to draw out more local and contextual archaeological narratives from the large datasets gathered for these projects may be viewed as the 'next phase' of archaeological big data research. However, the methodological tools and concepts developed from these projects, including those of the boundary object and boundary infrastructure, will remain crucial in the development of a more nuanced approach to the interrogation and integration of both legacy and newly generated datasets. An approach driven by boundary objects or boundary infrastructure also has its limitations. Principal among these is probably the fact that fuzziness and ambiguity within primary datasets will always be a cause of anxiety for some scholars and be perceived as indicating a lack of rigour inherent within the data. It is therefore also important that archaeologists aim to be as rigorous as possible in the collection, dissemination, and archiving of data. In addition, some 'quantified' data is simply not scalable without a high level of standardisation. For example, if different units of measurement are used to quantify rim diameters of pottery, then the resulting datasets cannot be made interoperable without a great deal of extra work.
However, use of an approach driven by boundary objects to interrogate legacy datasets at a local level, or for very specific objects across larger regions, has the potential to produce much more nuanced analyses using large datasets. Future work might, for example, attempt to trace the depositional contexts of a particular Roman ceramic vessel type across northwest Europe or the presence of a particular cereal variety across the entirety of post-Roman and Early Medieval Britain. In the context of Roman Britain, the boundary object concept could be used to build up our knowledge of the regional differences identified in the case studies above, for example by tracing the distributional and depositional differences of particular ceramic forms within and between regions. The case studies suggest that a particularly fruitful area of research could be the difference in composition between eating and drinking assemblages in the south-east of Roman Britain and more central areas such as the Upper Thames Valley.
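The rim diameter example can be made concrete with a minimal sketch of the kind of extra work involved. The unit labels, the table `TO_MM`, and the function `rim_diameter_mm` are assumptions made for illustration. The sketch also presumes that each record states its unit, which legacy data may not do.

```python
# Conversion factors to millimetres for a few units that might plausibly
# appear in legacy records (an assumed list, not taken from the datasets).
TO_MM = {"mm": 1.0, "cm": 10.0, "in": 25.4}


def rim_diameter_mm(value, unit):
    """Convert a recorded rim diameter to millimetres.

    Unknown units raise an error rather than being guessed, so that
    ambiguous records are flagged for a specialist to resolve.
    """
    try:
        factor = TO_MM[unit]
    except KeyError:
        raise ValueError(f"unrecognised unit: {unit!r}") from None
    return value * factor


# Invented measurements, for illustration only.
measurements = [(12, "cm"), (140, "mm"), (5, "in")]
converted = [rim_diameter_mm(value, unit) for value, unit in measurements]
```

Refusing to convert unrecognised units is a deliberate choice in this sketch: it keeps the ambiguity visible instead of silently absorbing it into the combined dataset.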
Boundary objects and their associated infrastructures can act as a gateway to much more subtle and locally contextualised narratives to do with cuisine, consumption, and regional identity. The task that remains, therefore, is to develop these narratives by bringing together the increasingly sophisticated computational techniques now available, with the growing quantities of data generated by developer-funding of archaeological investigation.