Empirical researchers often do not know which datasets already exist, which are currently being worked on, or which dataset ideas were discarded because they were not feasible for a single person or team to finish.
Polydata aims to be a platform where researchers, before they begin the often time-consuming work of creating a dataset, can find and collaborate with others who are planning or already working on the same or a similar dataset. A time-consuming dataset should not become public knowledge only after it is finished and published, because by then a lot of everyone's time may have been wasted.
The aim is to be a better, more reliable platform for exchanging information about datasets than rumors, hearsay, or conversations at seminars, conferences, and other venues where researchers infrequently meet.
Apart from being a place to propose dataset projects and find others interested in creating them, the platform also offers a community-updated list of completed datasets that can be filtered by keyword or field of interest. Such datasets you might otherwise find only through the right combination of search-engine terms, or summarized in an occasional meta-study.
What polydata groups can do
- Find other people who are interested in a dataset, have already started creating it, or need collaborators to realistically start or finish it, thereby reducing duplicated effort.
- Share experiences about the creation process through a group-specific wiki to increase efficiency.
- Exchange information, coordinate, and collaborate in a group-specific forum.
- Find suggested polydata groups that your friends are also in.
Why I set up polydata
As a (historical) urban/geographical economist, I often had to digitize historical economic activity or population ‘proxy’ data at a specific unit of observation (city level, individual household level) for a research project.
Sometimes, by pure chance, I learned that colleagues were working on interesting datasets that I had also planned to create (but did not pursue, primarily due to time constraints), or that they were working on the same or a similar dataset.
For example, one proposed idea was to use aerial photographs to pinpoint the exact locations of WWII bombing destruction within German cities. So far, only city-wide bombing statistics (rubble per capita, percentage of housing destroyed) were available, which limits the within-city research that is possible. However, I did not think digitization by hand was feasible within a reasonable time, so I gave up on the idea. Then, by chance, I recently met Siegfried To at a seminar; he is using computer vision to digitize aerial photographs. Through conversations, I have since found many other researchers who are interested in such a dataset.
Another idea was to use city directories to see where people with different socioeconomic backgrounds lived within a city, and how different factors may have influenced segregation and inequality within cities from the mid-19th century to today. By hearsay, I was told that some digitization work on German ‘Adressbücher’ city directories had already been done by Hans-Joachim Voth and Tommy Krieger. Again, this information reached me only indirectly, and the lack of it may have led to some work being done twice.
Lastly, I attended a seminar by Eric Chaney, who created a proxy variable for historical city population by painstakingly tracing historical authors' death locations through modern library holdings. I only learned about this tremendous dataset once it was presentable at seminars, i.e. once it was basically done. Given that creating such a dataset often takes years, there is a real risk of duplicated work when someone else pursues the same or a similar project in parallel. For example, before that seminar I had been thinking about digitizing painters' (not authors') biographies to proxy city population, but (very luckily) did not pursue the idea due to time constraints during my PhD.
Because of that, I initially set up this platform to facilitate dataset collaboration among economic historians, urban/geographical economists, and other empirically minded fields of economics. However, researchers from any discipline that similarly suffers from a lack of information on which datasets exist or are currently being worked on are welcome to propose a dataset as a polydata project, as long as its purpose is academic research. Completely free of charge, of course.
About the name ‘polydata’
The name polydata is a reference to the very successful collaborative platform ‘polymath’, which coordinates mathematicians to work efficiently on the same problem and together find the best route to a solution. For example, Yitang Zhang first proved that the gap between consecutive primes is infinitely often bounded by 70 million, and the polymath project together eventually lowered that bound to 246. I have always wished that empirical researchers had a similar platform for collaborating on huge datasets.
cheers
Duc Nguyen