GenMerge Frequently Asked Questions (FAQs)
1. What kind of Family Data does GenMerge work with?
GenMerge operates on GEDCOM files. Future releases of GenMerge will access popular file formats (like PAF) directly. To convert your family data to a GEDCOM file, refer to the directions for exporting data from your genealogy desktop program.
Most desktop genealogy programs have some type of match/merge functionality. However, most do not use family information to evaluate potential matches. This means that you end up reviewing many potential matches that can't possibly be the same person. Also, because the match functions usually use last name as part of the matching criteria, many potential matches with last name variations are missed, even if the “sounds like” functionality is used. GenMerge does only one job: finding and merging duplicates. It doesn't replace your desktop program, but provides an important tool for you to use in managing your family data.
GenMerge will run on any machine that runs a version of the Windows operating system and will merge files with an aggregate of up to about 200,000 individuals. The question is how long the processing will take. GenMerge keeps all the individuals in memory, so it is memory intensive. A run with thousands of individuals can take a long time if you don't have adequate memory.
How much memory is enough? That depends on the operating system you are running. The newer versions of Windows (like XP) require so much memory on their own that even with 512 MB of memory, there is very little left over for applications to use. As an example, processing a 44 MB file with about 200,000 individuals on an XP machine with 512 MB of memory takes about 73 minutes. The same file on a machine with 1 GB of memory takes 6 minutes. Small files will process quickly on any size machine, but as you move to larger and larger GEDCOM files, the amount of memory you have available becomes critical.
If you have questions about using GenMerge on your machine, e-mail us with your configuration (processor and memory) and the size of your GEDCOM files (MB and number of individuals) and we can help you determine whether you have enough processing power. In any case, if you are using GenMerge with larger files, make sure you close all your other applications and have adequate disk space for your swap file.
We have found that it is very important to have clean, consistent GEDCOM files to use for merging. Any problems in the input are compounded by the merging process. That is why GenMerge generates a very detailed report of what the data looks like. For each file you choose, the Initial Report contains information about the original file and the Final Report contains the same information for the cleaned file. This information is divided into the following sections:
This section contains several counts which can tell you about the quality of data in your file. These counts are:
GenMerge processes the name fields to eliminate extraneous words and characters. The resulting name is allocated to first, middle and last names using a set of rules. The last names that result from this processing are counted and the 50 most common names are listed with the count and the percentage of the total.
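As a rough illustration of the last-name tally described above, the following Python sketch counts last names and reports the most common ones with their share of the total. The function name `top_surnames` and the input format (a plain list of last names that have already been cleaned and split) are assumptions made for this example; GenMerge's own name-cleaning rules are not reproduced here.

```python
from collections import Counter

def top_surnames(last_names, limit=50):
    """Return (name, count, percent) for the most common last names.

    Hypothetical helper: assumes names were already cleaned and split.
    """
    # Normalize case and whitespace so "jones " and "Jones" count together.
    counts = Counter(name.strip().upper() for name in last_names if name.strip())
    total = sum(counts.values())
    return [
        (name, count, round(100.0 * count / total, 1))
        for name, count in counts.most_common(limit)
    ]
```

Calling `top_surnames(names)` with the default limit gives a list comparable in shape to the 50-name list in the report.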
The counts for male and female individuals reflect the declared sex. If sex is not given or is unknown, the person is counted as "unknown".
Birth and Death Year Analysis
The birth and death year information is aggregated from the individual birth and death years. The numbers are shown by century until the 1900s and by decade after that. The initial count shows the number of individuals that have no recorded birth or death year. Each count is shown with the percentage of total individuals it represents.
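The grouping described above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the sketch assumes that decade buckets begin with 1900 and that a missing year is represented by `None`.

```python
from collections import Counter

def year_bucket(year):
    """Label a year by century before 1900 and by decade from 1900 on."""
    if year is None:
        return "unknown"                     # no recorded year
    if year < 1900:
        return f"{(year // 100) * 100}s"     # e.g. 1869 falls in the 1800s
    return f"{(year // 10) * 10}s"           # e.g. 1923 falls in the 1920s

def bucket_report(years):
    """Return {bucket: (count, percent of all individuals)}."""
    counts = Counter(year_bucket(y) for y in years)
    total = len(years)
    return {b: (n, round(100.0 * n / total, 1)) for b, n in counts.items()}
```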
The list of marriage counts shows the number of individuals that have no marriages, a single marriage, two marriages, etc.
Totals are listed for:
The final list in this section is the count of the number of children per marriage.
Birth Place Analysis
GenMerge processes the birth (and death) places to standardize the location. The values shown in the list are the most common birth states and birth countries found in the database.
If any problems are noted in the input GEDCOM file (family references that are not reflexive, poorly formed GEDCOM tags, etc.), they will be listed here, as will the birth year inconsistencies and loops. A birth year inconsistency occurs when someone has a birth year earlier than that of one of his ancestors. These errors are listed with the heading “Birth year inconsistency” and show the descending tree where the error exists. The person with the inconsistent birth year is removed as a child from his parents' marriage. If you agree with this, you need take no action. If the birth year is incorrect or there is another problem, you can fix this in your original data and repeat the analysis.
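To make the birth year check concrete, here is a minimal Python sketch that walks each person's ancestors and flags anyone born before one of them. The data layout (two dictionaries keyed by individual id) and the function name are assumptions for this example, not GenMerge's internal representation.

```python
def birth_year_inconsistencies(birth_year, parents):
    """Return sorted (descendant, ancestor) pairs where the descendant
    was born before the ancestor.

    birth_year: dict id -> year or None
    parents:    dict id -> list of parent ids
    """
    problems = []
    for person, year in birth_year.items():
        if year is None:
            continue
        seen = set()
        stack = list(parents.get(person, []))
        while stack:
            anc = stack.pop()
            if anc in seen:
                continue                  # guard against loops in the pedigree
            seen.add(anc)
            anc_year = birth_year.get(anc)
            if anc_year is not None and year < anc_year:
                problems.append((person, anc))
            stack.extend(parents.get(anc, []))
    return sorted(problems)
```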
A loop is any individual that appears as his own ancestor or as the ancestor of his spouse. The error listing shows the type of loop, the id number of the individual that is causing the loop and the descending tree where the problem was found.
A loop is a serious problem in a family database and most often reflects confusion about where an individual should be in a pedigree. GenMerge breaks these loops by removing the individual as a child in a marriage (for loops where the individual is his own ancestor) or removing the individual from a marriage (where a spouse is his ancestor). GenMerge will proceed with linking files after breaking these loops and if you agree with the solution, there is no need to do anything else. However, if you are uncertain about this part of your database, you may want to do some research and repair the problem before merging this family database with another.
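The two kinds of loop described above can be detected with a simple ancestor walk. The Python sketch below is illustrative only: the dictionaries `parents` and `spouses` and both function names are assumptions, and it only reports loops rather than breaking them.

```python
def ancestors(person, parents):
    """Collect every ancestor id reachable from person through parent links."""
    found = set()
    stack = list(parents.get(person, []))
    while stack:
        anc = stack.pop()
        if anc not in found:
            found.add(anc)
            stack.extend(parents.get(anc, []))
    return found

def find_loops(parents, spouses):
    """Return sorted (kind, id) pairs for the two loop types.

    parents: dict id -> list of parent ids
    spouses: dict id -> list of spouse ids
    """
    loops = []
    for person in set(parents) | set(spouses):
        if person in ancestors(person, parents):
            loops.append(("own ancestor", person))
        for spouse in spouses.get(person, []):
            if person in ancestors(spouse, parents):
                loops.append(("ancestor of spouse", person))
    return sorted(loops)
```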
A connected component is a group of individuals that are connected by family relationships. Some GEDCOM files are a single family unit. Others may have different family groups that are not connected. Sometimes, there are individuals or small groups of people who are added to the family database, but never attached to the main family unit. This section of the report lists the size of the connected components and the number of components of this size. An optional tag is added to the output GEDCOM file that has the connected component size and the number of the connected component of that size. For example, if you have three connected components of size two and you would like to see these individuals, they will be labeled 3-1, 3-2 and 3-3. The two individuals labeled 3-1 are connected, the two individuals labeled 3-2 are connected, etc.
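For readers unfamiliar with the term, the following Python sketch shows one common way to find connected components and tally them by size. It is a generic graph traversal under assumed inputs (a list of ids and a list of id pairs for parent, child and spouse links), not a description of how GenMerge computes them.

```python
from collections import Counter

def component_sizes(people, links):
    """Return {component size: number of components of that size}.

    people: iterable of individual ids
    links:  iterable of (id, id) family relationships
    """
    neighbors = {p: set() for p in people}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    seen = set()
    sizes = Counter()
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:                      # depth-first walk of one component
            node = stack.pop()
            size += 1
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        sizes[size] += 1
    return dict(sizes)
```

An individual with no links at all shows up as a component of size one.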
The generation analysis shows the generation number and the number of individuals at that generation level. This analysis can show the type of GEDCOM data you have. A family tree with a large number of generations, but only a few individuals at each generation, indicates someone working on a specific line of their pedigree. A small number of generations with a large number of individuals in those generations is more reflective of work done to find all the descendants of a particular individual.
Finding duplicates is a two-step process. First, all individuals with similar last names are compared to one another and two scores are computed: the individual score and the family score. If a pair of individuals has a good individual score and a good family score, these individuals are considered matches. GenMerge then merges these pairs. Step two looks at the relatives of these initial matches. This means that if two individuals match, GenMerge looks at the set of fathers, mothers, wives and children and finds additional duplicates in these smaller sets. For example, if we merge two John Johnsons born in 1869, and one has a wife named Mary Jones born in 1872 and the other has a wife named Mary Unknown born in 1872, we can confidently merge these two women, even though we didn't find them in our initial step because their last names are not similar.
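The second step can be pictured with the Mary Jones example above. The Python sketch below is a loose illustration only: GenMerge's actual scoring rules are not documented here, so the matching test (same given name and same birth year, last name ignored), the record layout and both function names are assumptions. It also handles spouses only, whereas the description above also covers fathers, mothers and children.

```python
def relatives_match(a, b):
    """Assumed step-two test: same given name and birth year, last name ignored."""
    return (
        a["given"].lower() == b["given"].lower()
        and a["birth"] is not None
        and a["birth"] == b["birth"]
    )

def expand_matches(matched_pair, spouses, people):
    """Given two individuals already judged duplicates, pair up their
    spouses that also look alike.

    spouses: dict id -> list of spouse ids
    people:  dict id -> {"given": str, "surname": str, "birth": int or None}
    """
    left, right = matched_pair
    extra = []
    for s1 in spouses.get(left, []):
        for s2 in spouses.get(right, []):
            if relatives_match(people[s1], people[s2]):
                extra.append((s1, s2))
    return extra
```

Because the comparison runs only over the relatives of an already matched pair, a looser test than the one used in step one is tolerable in this sketch.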
The merge options are
The first option is the most common. This is the option you would choose if you have your own family database and you want to add information from one or more other files. The result will be one merged file. Any new individuals that you didn't have in your original file will be listed in the New People report. Any individuals that overlapped between your file and the other files will be listed in the Duplicate Report.
The other options are useful for investigating several files that you have no experience with. You can select several files and by choosing the second option (merge by pairs) you can in one run evaluate the overlap between these files. The third option is useful if you have a core file and you would like to see the effects of merging several files to it, one step at a time.
When the merge process is complete, the New People report shows all the individuals that were not in the main file, but are now in the merged file. If you are working with another person on a family, this is the report that will show you what people they added. The Duplicate Report shows the overlap; the New People report shows the individuals that did not overlap.
The High Scoring Failure report lists the pairs of individuals that have very high individual scores, but were not chosen in the initial step as duplicates because they have a family inconsistency. These are interesting because they point out problems in the family information. This can be because of different research or just simple data entry problems. This report gives you a concise list of individuals to review.
The Parent Problem Report is similar, but in these cases the individuals were merged. Because a person can have only one set of parents, when a person merges but the parents don't, one of the sets of parents is chosen for the resulting merged file, and this report shows both sets of parents. Most of the time, there is a simple error in one or the other of the parents, but sometimes this can also be the result of differing research. This report is another that gives you important information about differences between the two files.
When GenMerge decides two individuals are duplicates, it combines the information into a new merged individual. The way tags are generated in the output file depends on the option chosen on the initial Setup tab. The default is to keep all the non-duplicated tags from all the individuals. Each tag is copied to the new individual. If the tag is not for the main file, a source tag is inserted for the file from which it came. A note is also generated listing the id numbers of all the individuals that merged. This means that you have access to all the name variations, all the place and date variations as well as all the notes and source information from each individual.
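The default tag handling described above can be sketched as follows. This Python example is an assumption-laden illustration: tags are flattened to simple (tag, value) pairs, the function name `merge_tags` is hypothetical, and the exact form of the source tag and note that GenMerge writes is not specified in this FAQ.

```python
def merge_tags(records, main_file):
    """Combine the tags of duplicate individuals into one record.

    records: list of {"id": str, "file": str, "tags": [(tag, value), ...]}
    The output layout is illustrative, not GenMerge's actual GEDCOM output.
    """
    merged, seen = [], set()
    for rec in records:
        for tag, value in rec["tags"]:
            if (tag, value) in seen:
                continue                          # skip exact duplicates
            seen.add((tag, value))
            merged.append((tag, value))
            if rec["file"] != main_file:
                # Record which file a non-main tag came from.
                merged.append(("SOUR", rec["file"]))
    ids = ", ".join(rec["id"] for rec in records)
    merged.append(("NOTE", f"Merged from: {ids}"))
    return merged
```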
You can also opt to have only the tags from the Main file copied to the output for any duplicated individuals. This means that if you are adding information from other sources, the people you already had in your file will be unchanged. Any new people will have all their tags from whatever file they came from.
During the merging process, GenMerge cleans names, dates and places. If you want to preserve these cleaned data items, you may select that option.
GenMerge “automatically” merges individuals. Because it produces a detailed report of what merging was done, you still have the option to approve or disapprove of each merge by changing your data to more accurately reflect the individuals. The process used by GenMerge considers ALL the family information for each potential match. This is very hard to duplicate when doing a manual merge, because it is difficult to show a complete ascending and descending tree for each individual so that you have all the information GenMerge has to make a decision. We believe that you will be pleased with the choices made by GenMerge and by your ability to review each choice after the fact.
The preprocessing step does find and eliminate duplicates from each file. This is the same process that is used when merging two or more files, so you've already done what you wanted to do! The duplicates found are listed in the Duplicate Report for the file you chose.
GenMerge should preserve all the tags that you have in your GEDCOM file including multi-media objects.
If you don't agree with a choice made by GenMerge, we would like to think it is because of information you have that the program didn't have. If you enter this information into your data, it will help GenMerge decide correctly whether or not to merge. For example, if you know that one John Jones was from England while another was from Ireland, then enter this information in the birth information so the two will be differentiated. If you know that one name is a common alternate spelling for another, making both people have the same variation will probably improve the score enough to merge them. If you have specific problems that you think GenMerge should handle automatically, let us know. We want GenMerge to work as well as you do!
The demonstration copies are good for a week, but we realize that sometimes you need a longer period of time to evaluate a product. Contact customer support and we'll help you out.
When you purchased your license, the product key came with instructions for registering. If you have followed these instructions and are still getting an error, please contact customer support for help.
Occasionally, when GenMerge installs, the permissions on the installed directories are not correct. To set the permissions, do the following:
1. Navigate to the directory c:\Program Files and right click on the GenMerge directory, then click Properties.
2. Click the Security tab, and then click Edit.
3. From the list of users, highlight your username and check the Allow box next to Full Control, or check all the boxes for the other privileges individually. Click Apply. If you get an error adjusting these settings that says you must be an administrator, you will need to log off and log back on as a user with administrative privileges. Most of the time the user you log on as is already an administrator.
4. Repeat this process to check the permissions on the logs directory. Change the permissions if necessary so your user name has full privileges to the logs directory.
a. In the GenMerge installation directory, rename _localjvm to save_localjvm. By default the GenMerge installation directory is c:\Program Files\GenMerge (c:\Program Files (x86) on 64-bit machines).
b. Start GenMerge. If you have a version of Java on your machine, GenMerge will start without a problem and without the display reset. If not, you will see a message that there is no Java.
c. To download Java, start your browser and navigate to www.java.com. Press the “Free Java Download” button. Once the installation completes, GenMerge will start without a problem.