-
Publication number: US20190034437A1
Publication date: 2019-01-31
Application number: US15663575
Application date: 2017-07-28
Applicant: Microsoft Technology Licensing, LLC
Inventor: Sumit GULWANI, Prateek JAIN, Daniel Adam PERELMAN, Saswat PADHI, Oleksandr POLOZOV
CPC classification number: G06F16/355, G06F17/2264, G06F17/271
Abstract: A computing device includes a storage machine holding instructions executable by a logic machine to generate multi-string clusters, each containing alphanumeric strings of a dataset. Further multi-string clusters are generated via iterative performance of a combination operation in which a hierarchically-superior cluster is generated from a set of multi-string clusters. The combination operation includes, for candidate pairs of multi-string clusters, generating syntactic profiles describing an alphanumeric string from each multi-string cluster of the candidate pair. For each of the candidate pairs, a cost factor is determined for at least one of its syntactic profiles. Based on the cost factors determined for the syntactic profiles, one of the candidate pairs is selected. The multi-string clusters from the selected candidate pair are combined to generate the hierarchically-superior cluster including all of the alphanumeric strings from the selected candidate pair of multi-string clusters.
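The combination operation in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `shape` profile (runs of character classes), the mismatch-based `profile_cost`, and the choice to profile only the first string of each cluster are all assumptions made for illustration. The sketch repeatedly picks the candidate pair of clusters whose profile has the lowest cost and merges them into a hierarchically-superior cluster.

```python
from itertools import combinations, groupby


def shape(s):
    """Hypothetical syntactic profile: runs of character classes.

    Digits become 'D', letters become 'L', anything else is kept literally,
    and consecutive repeats are collapsed ("2017-07-28" -> D - D - D).
    """
    def char_class(ch):
        if ch.isdigit():
            return "D"
        if ch.isalpha():
            return "L"
        return ch
    return tuple(key for key, _ in groupby(char_class(c) for c in s))


def profile_cost(a, b):
    """Assumed cost factor: how badly one shared profile fits both strings."""
    sa, sb = shape(a), shape(b)
    mismatches = sum(x != y for x, y in zip(sa, sb))
    return mismatches + abs(len(sa) - len(sb))


def pair_cost(c1, c2):
    # Profile one string from each cluster of the candidate pair
    # (here simply the first string of each; an illustrative choice).
    return profile_cost(c1[0], c2[0])


def build_hierarchy(clusters):
    """Merge the cheapest candidate pair until one cluster remains.

    Returns the hierarchically-superior clusters in creation order.
    """
    clusters = [list(c) for c in clusters]
    merges = []
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: pair_cost(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[i] + clusters[j]  # keeps all strings of both clusters
        merges.append(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


merges = build_hierarchy([
    ["2017-07-28", "2019-01-31"],
    ["US15663575", "US16448805"],
    ["2024-06-05", "2025-02-27"],
])
```

With these inputs the two date clusters share the same profile, so they are merged first; the application-number cluster joins only in the final step.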
-
Publication number: US20190311004A1
Publication date: 2019-10-10
Application number: US16448805
Application date: 2019-06-21
Applicant: Microsoft Technology Licensing, LLC
Inventor: Sumit GULWANI, Prateek JAIN, Daniel Adam PERELMAN, Saswat PADHI, Oleksandr POLOZOV
Abstract: A computing device includes a storage machine holding instructions executable by a logic machine to generate multi-string clusters, each containing alphanumeric strings of a dataset. Further multi-string clusters are generated via iterative performance of a combination operation in which a hierarchically-superior cluster is generated from a set of multi-string clusters. The combination operation includes, for candidate pairs of multi-string clusters, generating syntactic profiles describing an alphanumeric string from each multi-string cluster of the candidate pair. For each of the candidate pairs, a cost factor is determined for at least one of its syntactic profiles. Based on the cost factors determined for the syntactic profiles, one of the candidate pairs is selected. The multi-string clusters from the selected candidate pair are combined to generate the hierarchically-superior cluster including all of the alphanumeric strings from the selected candidate pair of multi-string clusters.
-
Publication number: US20250068837A1
Publication date: 2025-02-27
Application number: US18734203
Application date: 2024-06-05
Applicant: Microsoft Technology Licensing, LLC
Inventor: Benjamin Goth ZORN, Marc Manuel Johannes BROCKSCHMIDT, Pallavi CHOUDHURY, Oleksandr POLOZOV, Rishabh SINGH, Saswat PADHI
IPC: G06F40/18, G06F16/338, G06N3/04, G06N3/08, G06N5/046
Abstract: Systems, methods, and computer-readable storage devices are disclosed for improved table identification in a spreadsheet. One method including: receiving a spreadsheet including at least one table; identifying, using machine learning, one or more classes of a plurality of classes for each cell of the received spreadsheet, wherein the plurality of classes include corners and not-a-corner; and inducing at least one table in the received spreadsheet based on the one or more identified classes for each cell of the received spreadsheet.
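The induction step in this abstract can be sketched in Python under stated assumptions. The abstract only says the classes include corners and not-a-corner; the label names `"TL"`, `"TR"`, `"BL"`, `"BR"`, `"NaC"` and the rectangle-matching rule below are hypothetical. The machine-learning classifier is taken as given, so the sketch starts from a grid of predicted labels, and it assumes every table spans at least two rows and two columns.

```python
def induce_tables(labels):
    """Induce table rectangles from per-cell corner predictions.

    labels: 2D list of predicted class names, one per spreadsheet cell.
    Returns a list of (top, left, bottom, right) index tuples.
    """
    corners = {name: [] for name in ("TL", "TR", "BL", "BR")}
    for r, row in enumerate(labels):
        for c, label in enumerate(row):
            if label in corners:  # "NaC" (not-a-corner) cells are skipped
                corners[label].append((r, c))

    top_right = set(corners["TR"])
    bottom_left = set(corners["BL"])
    tables = []
    for r1, c1 in corners["TL"]:
        # A bottom-right corner completes a table only if the other two
        # corners of the same rectangle were also predicted.
        candidates = [
            (r2, c2)
            for r2, c2 in corners["BR"]
            if r2 > r1 and c2 > c1
            and (r1, c2) in top_right
            and (r2, c1) in bottom_left
        ]
        if candidates:
            # Assumed tie-break: prefer the smallest enclosing rectangle.
            r2, c2 = min(
                candidates,
                key=lambda p: (p[0] - r1 + 1) * (p[1] - c1 + 1),
            )
            tables.append((r1, c1, r2, c2))
    return tables


N = "NaC"
grid = [
    ["TL", N, "TR", N],
    [N, N, N, N],
    ["BL", N, "BR", N],
]
tables = induce_tables(grid)
```

Requiring all four corners makes the sketch conservative: one missed corner prediction drops the whole table, which a real system would likely handle with a more tolerant matching rule.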
-
Publication number: US20200019603A1
Publication date: 2020-01-16
Application number: US16034447
Application date: 2018-07-13
Applicant: Microsoft Technology Licensing, LLC
Inventor: Benjamin Goth ZORN, Marc Manuel Johannes BROCKSCHMIDT, Pallavi CHOUDHURY, Oleksandr POLOZOV, Rishabh SINGH, Saswat PADHI
Abstract: Systems, methods, and computer-readable storage devices are disclosed for improved table identification in a spreadsheet. One method including: receiving a spreadsheet including at least one table; identifying, using machine learning, one or more classes of a plurality of classes for each cell of the received spreadsheet, wherein the plurality of classes include corners and not-a-corner; and inducing at least one table in the received spreadsheet based on the one or more identified classes for each cell of the received spreadsheet.
-